Occasionally, there is a need to search for and return a large number of annotations. Previously, Hypothesis’ API relied on a parameter called offset that allowed the user to skip a number of initial annotations. By using offset, users could page through all annotations and return a specific subset of them at a time. The time it takes to perform such a request is proportional to the offset plus the number of annotations returned. So, while this method works well when offset is only a couple thousand, it becomes very slow for larger offsets. In some cases, if offset is large enough, the request can fail completely. To combat this problem, a new method of searching for bulk annotations has been introduced: the parameter search_after. Hypothesis recommends changing any requests to the /api/search endpoint that currently use offset to page through thousands of annotations to use search_after instead.
Why We Made the Change
Previously, Hypothesis searched for bulk annotations with a sliding window: the /api/search endpoint would return limit annotations starting at offset.
offset | integer [ 0 .. 9800 ] | The number of initial annotations to skip. This is used for pagination.
limit | integer [ 0 .. 200 ] | The maximum number of annotations to return.
For example, if there were a total of 100 annotations, then with offset=10 and limit=20 the search endpoint would skip the first 10 annotations and return annotations 11-30.
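The sliding window can be pictured with a small in-memory sketch. This is only an illustration of the arithmetic, not the API's implementation; the function name sliding_window is hypothetical, and the annotations are stand-in integers numbered from 1. Skipping 10 results and returning the next 20 yields annotations 11 through 30.

```python
def sliding_window(annotations, offset, limit):
    """Return the page an offset/limit search would produce (illustrative only)."""
    # Skip the first `offset` results, then keep at most `limit` of the rest.
    return annotations[offset:offset + limit]


# Stand-ins for 100 annotations, numbered 1 to 100.
annotations = list(range(1, 101))

page = sliding_window(annotations, offset=10, limit=20)
print(page[0], page[-1], len(page))
```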
Newer versions of Elasticsearch impose a restriction on offset + limit such that offset + limit cannot be greater than 10,000. This means that the /api/search endpoint will not return any annotations beyond the 10,000th annotation by using offset and limit. Regardless of the value passed, offset is capped at 9,800, and so search_after became the new standard in Hypothesis to search for bulk annotations:
Returns results after the annotation whose sort field has this value. If specifying a date, use the format yyyy-MM-dd'T'HH:mm:ss.SSX or time in milliseconds since the epoch.
This is used for iteration through large collections of results.
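As a minimal sketch of the "milliseconds since the epoch" form, the helper below converts a timezone-aware Python datetime into an integer that could be passed as search_after. The helper name to_epoch_millis is hypothetical and is not part of the Hypothesis API; it uses integer arithmetic to avoid floating-point rounding.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(moment):
    """Convert a timezone-aware datetime to whole milliseconds since the epoch."""
    # Subtracting two aware datetimes gives a timedelta; floor-dividing by
    # one millisecond yields an exact integer count.
    return (moment - EPOCH) // timedelta(milliseconds=1)


search_after = to_epoch_millis(datetime(2018, 10, 5, tzinfo=timezone.utc))
print(search_after)
```

The datetime must carry a timezone; subtracting a naive datetime from the aware EPOCH raises a TypeError in Python.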
How it Works
search_after is based on sort and order. sort defaults to the updated field, that is, the last time the annotation was updated, and order defaults to descending (so the most recently updated annotations will be found and returned first). search_after will return annotations that occur after the annotation whose sort field has the search_after value. For example:
- If there are 31 annotations, one for each day in October, a search in ascending date order with search_after=2018-10-05 will retrieve annotations starting from the 6th of October (the 6th to the 16th of October if 11 results are returned).
- If there are 31 annotations with IDs 1-31, a search in ascending ID order with search_after=5 will return annotations starting from ID 6 (IDs 6-16 if 11 results are returned).
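The paging pattern this enables can be sketched as a loop that feeds the sort value of the last result back in as the next search_after. The sketch below makes several assumptions: the response is a dictionary with a "rows" list, each row carries the sort field, and fake_fetch is a hypothetical in-memory stand-in for a real request to /api/search that only supports ascending order. In real use, fetch_page would be replaced by a function that performs the HTTP request.

```python
def iter_all(fetch_page, sort="updated", order="asc", limit=200):
    """Yield every annotation by following search_after from page to page."""
    params = {"sort": sort, "order": order, "limit": limit}
    while True:
        rows = fetch_page(params)["rows"]
        if not rows:
            return  # an empty page means there is nothing left to read
        yield from rows
        # The sort field of the last row becomes the next search_after value.
        params = {**params, "search_after": rows[-1][sort]}


def fake_fetch(params):
    """Hypothetical stand-in for /api/search: 31 annotations, one per October day."""
    days = [{"id": str(n), "updated": f"2018-10-{n:02d}"} for n in range(1, 32)]
    after = params.get("search_after")
    # Ascending order only: keep rows strictly after the search_after value.
    rows = [a for a in days if after is None or a[params["sort"]] > after]
    return {"rows": rows[:params["limit"]]}


found = list(iter_all(fake_fetch, limit=10))
print(len(found), found[0]["updated"], found[-1]["updated"])
```

Because each request is filtered by the previous page's last value, no request ever needs an offset, however deep into the results the loop goes.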
Paging with offset and limit is inefficient because Elasticsearch must load offset + limit annotations into memory and sort them before returning the window of annotations defined by offset and limit. search_after does not require all of those annotations to be loaded and sorted, because it can be applied as a filter on the search query itself, whereas offset must be applied after the initial search. This is why search_after is more efficient and why, although the old offset parameter remains available, using it is not recommended.
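The difference can be illustrated with a toy model. This is not how Elasticsearch is implemented; it only contrasts discarding results after the search with filtering as part of the search. Both function names are hypothetical, and the annotations are stand-in integers.

```python
def page_by_offset(values, offset, limit):
    """Toy model of offset paging: order the results, then throw away the first `offset`."""
    return sorted(values)[offset:offset + limit]


def page_by_search_after(values, after, limit):
    """Toy model of search_after: filter first, so earlier results are never collected."""
    return sorted(v for v in values if v > after)[:limit]


values = list(range(1, 1001))  # stand-ins for 1,000 annotations

by_offset = page_by_offset(values, offset=500, limit=5)
by_search_after = page_by_search_after(values, after=500, limit=5)
print(by_offset == by_search_after, by_offset)
```

Both calls return the same page, but the offset version has to handle the 500 skipped results first, and that work grows with the offset.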