Deliverable: API documentation
This week I mainly worked on finishing the Linked Data API and making it work for all types of entities available for export. I also worked on the API documentation. To test the API, I wanted to export the entire database, or at least a large part of it, and upload it to a SPARQL server. To make such an export possible I added pagination functionality to that API and, as a result, to similar APIs as well.
Because the requested data can come in various specific formats, I could not simply embed URLs to the next and previous pages in the response body, as JSON-only APIs often do. What I could do, however, is use the Link HTTP header to send those URLs. That looks like the following:
```http
HTTP/1.1 200 OK
Server: nginx/1.13.9
Date: Thu, 20 Aug 2020 14:32:32 GMT
Content-Type: text/turtle; charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate
Pragma: no-cache
Link: <http://127.0.0.1:2354/artifacts?page=1&limit=20>; rel="first"
Link: <http://127.0.0.1:2354/artifacts?page=1&limit=20>; rel="current"
Link: <http://127.0.0.1:2354/artifacts?page=16268&limit=20>; rel="last"
Link: <http://127.0.0.1:2354/artifacts?page=2&limit=20>; rel="next"
X-DNS-Prefetch-Control: off
```
If you get this response and want to go to the next page, you find the link with `rel="next"` and request that URL, repeating until you reach the last page, which has no "next" link. Since this is the first page, there is no "prev" link either.
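For context, this is roughly what producing those headers looks like on the server side. The sketch below is a minimal illustration, assuming a hypothetical Express-style route; `countArtifacts` and `renderArtifacts` are made-up placeholders, not the actual framework code.

```js
const express = require('express')
const app = express()

app.get('/artifacts', async (req, res) => {
  const page = Math.max(parseInt(req.query.page, 10) || 1, 1)
  const limit = Math.min(parseInt(req.query.limit, 10) || 20, 100)
  const last = Math.max(Math.ceil(await countArtifacts() / limit), 1) // made-up helper

  // One Link header value per relation that applies to this page.
  const link = (p, rel) =>
    `<http://127.0.0.1:2354/artifacts?page=${p}&limit=${limit}>; rel="${rel}"`
  const links = [link(1, 'first'), link(page, 'current'), link(last, 'last')]
  if (page > 1) links.push(link(page - 1, 'prev'))
  if (page < last) links.push(link(page + 1, 'next'))
  res.set('Link', links)

  res.type('text/turtle')
  res.send(await renderArtifacts(page, limit)) // made-up serializer
})

app.listen(2354)
```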
I then made a Node.js client for exporting the linked data in the N-Triples format, mainly because in that format you can reorder the lines without changing the meaning, unlike XML, JSON or Turtle. This allowed for easily streaming each page directly to a file, keeping the memory footprint low. I used an existing library, parse-link-header, to parse the Link headers, but to actually go from page to page I wrote a small asynchronous generator function:
```js
const fetch = require('node-fetch')
const parseLinks = require('parse-link-header')

async function* fetchPages (url, format) {
  let next = url + '?limit=100'
  while (next) {
    // Request the page in the desired serialization format.
    const response = await fetch(next, {
      headers: { Accept: format }
    })
    // Follow the pagination trail: stop when there is no rel="next" link.
    const links = parseLinks(response.headers.get('Link'))
    next = links && links.next && links.next.url
    yield response.text()
  }
}
```
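For reference, parse-link-header turns a header like the one shown earlier into an object keyed by rel, which is what makes the `links.next.url` lookup work. Roughly:

```js
parseLinks('<http://127.0.0.1:2354/artifacts?page=2&limit=20>; rel="next"')
// → { next: { page: '2', limit: '20', rel: 'next',
//             url: 'http://127.0.0.1:2354/artifacts?page=2&limit=20' } }
```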
There was some additional code to log the state and handle API mirrors, but this is essentially all that is needed. The usage is simple:
```js
const fs = require('fs')

// for await...of needs to run in an async context
async function main () {
  const file = fs.createWriteStream('./artifacts.nt')
  const pages = fetchPages('http://127.0.0.1:2354/artifacts', 'application/n-triples')
  for await (const page of pages) {
    file.write(page)
  }
  file.end()
}
main()
```
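Plain concatenation works here because every N-Triples line is a complete, self-contained triple, so page boundaries (and even line order) don't matter. A made-up two-line example:

```
<http://example.org/artifact/1> <http://purl.org/dc/terms/title> "Example tablet" .
<http://example.org/artifact/2> <http://purl.org/dc/terms/title> "Another tablet" .
```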
Again, the actual client works slightly differently to handle parallel downloads of different entity types, though at that point file I/O is probably the main bottleneck. The full code is available at cdli-gh/framework-api-client.
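As a rough illustration of that parallel setup, something along these lines would export several entity types at once, reusing the fetchPages generator from above; the entity names here are hypothetical stand-ins for the API's actual list:

```js
const fs = require('fs')

async function exportEntity (entity) {
  const file = fs.createWriteStream(`./${entity}.nt`)
  const pages = fetchPages(`http://127.0.0.1:2354/${entity}`, 'application/n-triples')
  for await (const page of pages) {
    file.write(page)
  }
  file.end()
}

// Hypothetical entity types; the real client enumerates the API's own endpoints.
Promise.all(['artifacts', 'materials', 'inscriptions'].map(exportEntity))
  .then(() => console.log('export complete'))
```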
This week's deliverable is complete, or will be as soon as the documentation is actually merged into the framework.
| # | Day | Date | A short description of the work done |
|---|-----|------------|--------------------------------------|
| 1 | Mon | 2020/08/10 | |
| 2 | Tue | 2020/08/11 | Review search merge request (!156) |
| 3 | Wed | 2020/08/12 | Work on API documentation; review search merge request (!156) |
| 4 | Thu | 2020/08/13 | Add more mappings; add API documentation |
| 5 | Fri | 2020/08/14 | Implement LD exports for all relevant entities; implement RESTful pagination |
| 6 | Sat | 2020/08/15 | Fix minor mapping issues; start working on API client |
| 7 | Sun | 2020/08/16 | Support LD exports for API client |