I'm interested in exposing a direct REST interface to collections of JSON documents (think CouchDB or Persevere). The problem I'm running into is how to handle the GET
operation on the collection root if the collection is large.
As an example pretend I'm exposing StackOverflow's Questions
table where each row is exposed as a document (not that there necessarily is such a table, just a concrete example of a sizable collection of 'documents'). The collection would be made available at /db/questions
with the usual CRUD api GET /db/questions/XXX
, PUT /db/questions/XXX
, POST /db/questions
is in play. The standard way to get the entire collection is to GET /db/questions
but if that naively dumps each row as a JSON object, you'll get a rather sizeable download and a lot of work on the part of the server.
The solution is, of course, paging. Dojo has solved this problem in its JsonRestStore via a clever RFC2616-compliant extension of using the Range
header with a custom range unit items
. The result is a 206 Partial Content
that returns only the requested range. The advantage of this approach over a query parameter is that it leaves the query string for...queries (e.g. GET /db/questions/?score>200
or somesuch, and yes that'd be encoded %3E
).
This approach completely covers the behavior I want. The problem is that RFC 2616 specifies that on a 206 response (emphasis mine):
The request MUST have included a Range header field (section 14.35)
indicating the desired range, and MAY have included an If-Range
header field (section 14.27) to make the request conditional.
This makes sense in the context of the standard usage of the header but is a problem because I'd like the 206 response to be the default to handle naive clients/random people exploring.
I've gone over the RFC in detail looking for a solution but have been unhappy with my solutions and am interested in SO's take on the problem.
Ideas I've had:
- Return
200
with a Content-Range
header! - I don't think that this is wrong, but I'd prefer if a more obvious indicator that the response is only Partial Content.
- Return
400 Range Required
- There is not a special 400 response code for required headers, so the default error has to be used and read by hand. This also makes exploration via web browser (or some other client like Resty) more difficult.
- Use a query parameter - The standard approach, but I'm hoping to allow queries a la Persevere and this cuts into the query namespace.
- Just return
206
! - I think most clients wouldn't freak out, but I'd rather not go against a MUST in the RFC
- Extend the spec! Return
266 Partial Content
- Behaves exactly like 206 but is in response to a request that MUST NOT contain the Range
header. I figure that 266 is high enough that I shouldn't run into collision issues and it makes sense to me but I'm not clear on whether this is considered taboo or not.
I'd think this is a fairly common problem and I'd like to see this done in a sort of de facto fashion so I or someone else isn't reinventing the wheel.
What's the best way to expose a full collection via HTTP when the collection is large?
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…