API Week 11: API documentation

by Lars Willighagen

week
gsoc
gsoc2020
api
eval#3
week#11

Deliverable: API documentation

Week Summary

This week I mainly worked on finishing the Linked Data API and making it work for all types of entities available for export. I also worked on the API documentation. For testing the API, I wanted to export the entire database, or at least a large part of it, and upload it to a SPARQL server. To make such an export possible I added pagination functionality to that API and, as a result, to similar APIs as well.

Because the requested data can come in various specific formats, I could not simply embed URLs to the next and the previous page in the response body, as JSON-only APIs often do. What I could do, however, is use the Link HTTP header to send those URLs. That looks like the following:

HTTP/1.1 200 OK
Server: nginx/1.13.9
Date: Thu, 20 Aug 2020 14:32:32 GMT
Content-Type: text/turtle; charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate
Pragma: no-cache
Link: <http://127.0.0.1:2354/artifacts?page=1&limit=20>; rel="first"
Link: <http://127.0.0.1:2354/artifacts?page=1&limit=20>; rel="current"
Link: <http://127.0.0.1:2354/artifacts?page=16268&limit=20>; rel="last"
Link: <http://127.0.0.1:2354/artifacts?page=2&limit=20>; rel="next"
X-DNS-Prefetch-Control: off

If you get this response and want to go to the next page, you find the Link with rel="next" and request that URL, repeating until you reach the last page, which has no “next” link. Since this is the first page, there is no “prev” link either.
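
On the server side, generating these headers is straightforward. As a rough sketch of the idea (a hypothetical Express handler, not the actual framework code; the artifact count and the serialization step are stand-ins):

const express = require('express')
const app = express()

// Stand-in for the real database count of artifacts
const countArtifacts = async () => 325356

// Hypothetical route sketch: compute the page window and advertise
// first/current/last/next/prev pages through Link headers.
app.get('/artifacts', async (req, res) => {
    const limit = parseInt(req.query.limit, 10) || 20
    const page = parseInt(req.query.page, 10) || 1
    const total = await countArtifacts()
    const last = Math.max(1, Math.ceil(total / limit))

    const pageUrl = p =>
        `http://${req.get('host')}${req.path}?page=${p}&limit=${limit}`
    const links = [
        `<${pageUrl(1)}>; rel="first"`,
        `<${pageUrl(page)}>; rel="current"`,
        `<${pageUrl(last)}>; rel="last"`
    ]
    if (page < last) links.push(`<${pageUrl(page + 1)}>; rel="next"`)
    if (page > 1) links.push(`<${pageUrl(page - 1)}>; rel="prev"`)

    res.set('Link', links)
    // ... fetch the requested page and serialize it in the
    // format negotiated through the Accept header ...
})

app.listen(2354)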

I then made a Node.js client for exporting the linked data in the N-Triples format, mainly because in that format you can switch around the lines without changing the meaning, unlike XML, JSON or Turtle. This allowed each page to be streamed directly to a file, keeping the memory footprint low. I used an existing library, parse-link-header, to parse the Link headers, but to actually go from page to page I wrote a small asynchronous generator function:

const fetch = require('node-fetch')
const parseLinks = require('parse-link-header')

// Yield the body of each page in turn, following rel="next"
// links until the last page is reached.
async function* fetchPages (url, format) {
    let next = url + '?limit=100'
    while (next) {
        const response = await fetch(next, {
            headers: { Accept: format }
        })

        // parseLinks() returns null if there is no Link header
        const links = parseLinks(response.headers.get('Link'))
        next = links && links.next && links.next.url

        yield response.text()
    }
}
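
For reference, parse-link-header turns the header of the response shown earlier into a plain object keyed by rel, which is what the links.next && links.next.url check relies on (abbreviated):

{
    first: { rel: 'first', page: '1', limit: '20',
             url: 'http://127.0.0.1:2354/artifacts?page=1&limit=20' },
    next:  { rel: 'next',  page: '2', limit: '20',
             url: 'http://127.0.0.1:2354/artifacts?page=2&limit=20' }
    // ...plus "current" and "last" entries of the same shape
}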

There was some additional code to log the state and handle API mirrors, but this was essentially all that is needed. The usage is simple:

const fs = require('fs')

const file = fs.createWriteStream('./artifacts.nt')
const pages = fetchPages('http://127.0.0.1:2354/artifacts', 'application/n-triples')

for await (const page of pages) {
    file.write(page)
}

file.end()

Again, the actual client works slightly differently to handle parallel downloads of different entity types, though at that point file I/O is probably the main bottleneck. The full code is available at cdli-gh/framework-api-client.
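
As an illustration of that last point, exporting several entity types in parallel can be as simple as the following sketch, reusing fetchPages from above (the list of entity types and the file naming are assumptions, not the actual client's interface):

const fs = require('fs')

// Hypothetical sketch: run one export per entity type concurrently,
// writing each type to its own N-Triples file.
async function exportAll (base, types) {
    await Promise.all(types.map(async type => {
        const file = fs.createWriteStream(`./${type}.nt`)
        const pages = fetchPages(`${base}/${type}`, 'application/n-triples')
        for await (const page of pages) {
            file.write(page)
        }
        file.end()
    }))
}

exportAll('http://127.0.0.1:2354', ['artifacts', 'materials'])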

This week's deliverable is complete, at least as soon as the documentation is actually merged into the framework.

Daily Work Update

#   Day   Date         A short description of the work done
1   Mon   2020/08/10
2   Tue   2020/08/11   Review search merge request (!156)
3   Wed   2020/08/12   Work on API documentation; review search merge request (!156)
4   Thu   2020/08/13   Add more mappings; add API documentation
5   Fri   2020/08/14   Implement LD exports for all relevant entities; implement RESTful pagination
6   Sat   2020/08/15   Fix minor mapping issues; start working on API client
7   Sun   2020/08/16   Support LD exports for API client