Part query support
When an index service response is very large, the time required to process the request and produce the complete response can exceed certain thresholds, causing timeouts or network errors.
Another problem with very large responses is that, if an error occurs, the incomplete response must be discarded and a new request issued. This is expensive and wasteful.
Finally, when the response is large, a single client may not be able to hold all the data. For example, 1 billion records amount to about 200 GB, which could cause an OutOfMemory error in a single-JVM client.
To address these problems, the Index Layer provides part query support. You can split a single large query into multiple parts and then query the parts in parallel. Each part query is smaller, which helps avoid timeouts and network errors. If an individual part fails, only that part query needs to be retried. The part queries can be executed on a distributed cluster, for example with Apache Spark. The combined results are consistent and reflect a snapshot at a single point in time.
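The idea of running part queries in parallel and retrying only the part that failed can be sketched in Python as follows. This is an illustration under stated assumptions, not part of the Index API: `fetch_part` is a hypothetical callable standing in for whatever HTTP client issues a single part query, and the retry budget and thread count are arbitrary values.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def query_all_parts(
    part_ids: List[str],
    fetch_part: Callable[[str], list],  # hypothetical: runs one part query
    max_attempts: int = 3,              # assumed retry budget per part
    max_workers: int = 4,               # assumed degree of parallelism
) -> Dict[str, list]:
    """Query every part in parallel, retrying only the parts that fail."""

    def fetch_with_retry(part_id: str) -> list:
        last_error = None
        for _ in range(max_attempts):
            try:
                return fetch_part(part_id)
            except Exception as error:  # e.g. a timeout or network error
                last_error = error
        raise RuntimeError(
            f"part {part_id} failed after {max_attempts} attempts"
        ) from last_error

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with part_ids.
        results = list(pool.map(fetch_with_retry, part_ids))
    return dict(zip(part_ids, results))
```

A failure in one part leaves the results of the other parts untouched, which is the main saving compared with reissuing a single large query. On a cluster, the same per-part function could be mapped over a distributed collection of partIds instead of a thread pool.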
To use part queries, you must first send a getParts request. The format of a getParts request is:
GET /<Base path for the index API from the API Lookup Service>/layers/<Layer ID>/parts?numRequestedParts=<Positive Integer> HTTP/1.1
Example of response:
numRequestedParts must be a positive integer.
- If the layer you query contains only a small amount of data, the actual number of partIds returned may be less than the number requested.
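As a rough sketch, the getParts URL can be assembled and the positive-integer rule checked on the client before the request is sent. The function name `build_get_parts_url` and the example host are hypothetical; the base URL is assumed to be the value obtained from the API Lookup Service.

```python
from urllib.parse import quote


def build_get_parts_url(base_url: str, layer_id: str, num_requested_parts: int) -> str:
    """Build the getParts URL; base_url is assumed to come from the API Lookup Service."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(num_requested_parts, bool) or not isinstance(num_requested_parts, int):
        raise TypeError("numRequestedParts must be an integer")
    if num_requested_parts <= 0:
        raise ValueError("numRequestedParts must be a positive integer")
    return (
        f"{base_url.rstrip('/')}/layers/{quote(layer_id, safe='')}"
        f"/parts?numRequestedParts={num_requested_parts}"
    )


# Hypothetical host and layer ID, for illustration only.
print(build_get_parts_url("https://index.example.com/v1", "my-layer", 8))
```

Because fewer partIds may come back than were requested, downstream code is safer iterating over the partIds actually returned than over the requested count.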
Once you get the partIds, you can query each part using the following format:
GET /<Base path for the index API from the API Lookup Service>/layers/<Layer ID>?query=<RSQL>&part=<partId> HTTP/1.1
- The query must be performed within two hours of the partIds being generated.
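The part query URL and the two-hour limit can be handled with two small helpers, sketched below. The function names, the example host, and the RSQL field `hour_from` are hypothetical. The sketch also assumes the client records the time at which it received the partIds, and it treats the exact two-hour mark as already expired, which is a conservative choice rather than documented behaviour.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import quote, urlencode

PART_ID_LIFETIME = timedelta(hours=2)  # taken from the two-hour rule above


def build_part_query_url(base_url: str, layer_id: str, rsql: str, part_id: str) -> str:
    """Build the URL for one part query, percent-encoding the RSQL expression."""
    # RSQL operators such as "=gt=" contain "=", which must be encoded in a query value.
    params = urlencode({"query": rsql, "part": part_id})
    return f"{base_url.rstrip('/')}/layers/{quote(layer_id, safe='')}?{params}"


def part_ids_still_valid(obtained_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True while the partIds are less than two hours old."""
    now = now or datetime.now(timezone.utc)
    return now - obtained_at < PART_ID_LIFETIME


# Hypothetical values, for illustration only.
obtained_at = datetime.now(timezone.utc)
if part_ids_still_valid(obtained_at):
    print(build_part_query_url("https://index.example.com/v1", "my-layer", "hour_from=gt=10", "abc"))
```

If the check fails, the remaining parts can no longer be queried with the old partIds, so a new getParts request would be needed first.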