MarkLogic REST API search for latest document version

We need to restrict a MarkLogic search to the latest version of managed documents, using MarkLogic's REST API. We're using MarkLogic 6.
Using straight XQuery, you can use dls:documents-query() as an additional-query option (see "Is there any way to restrict marklogic search on specific version of the document").
But the REST API requires XML, not arbitrary XQuery. You can turn ordinary cts queries into XML easily enough (execute <some-element>{cts:word-query("hello world")}</some-element> in QConsole).
If I try that with dls:documents-query() I get this:
<cts:properties-query xmlns:cts="http://marklogic.com/cts">
  <cts:registered-query>
    <cts:id>17524193535823153377</cts:id>
  </cts:registered-query>
</cts:properties-query>
Apart from being less than totally transparent... how safe is that number? We'll need to put it in our query options, so it's not something we can regenerate every time we need it. I've looked on two different installations here and the number's the same, but is it guaranteed to be the same, and will it ever change? On a MarkLogic upgrade, for example?
Also, assuming the number is safe, will the registered-query always be there? The documentation says that registered queries may be cleared by the system at various times, but it's talking about user-defined registered queries, and I'm not sure how much of that applies to internal queries.
Is this even the right approach? If we can't do this we can always set up collections and restrict the search that way, but we'd rather use dls:documents-query if possible.

The number is a registered query id, and is deterministic. That is, it will be the same every time the query is registered. That behavior has been invariant across a couple of major releases, but is not guaranteed. And as you already know, the server can unregister a query at any time. If that happens, any query using that id will throw an XDMP-UNREGISTERED error. So it's best to regenerate the query when you need it, perhaps by calling dls:documents-query again. It's safest to do this in the same request as the subsequent search.
So I'd suggest extending the REST API with your own version of the search endpoint. Your new endpoint could add dls:documents-query to the input query. That way the registered query would be generated in the same request with the subsequent search. For ML6, http://docs.marklogic.com/6.0/guide/rest-dev/extensions explains how to do this.

The call to dls:documents-query() makes sure the query is actually registered (on the fly if necessary), but that won't work from the REST API. You could extend the REST API with a custom extension as suggested by Mike, but you could also use the following:
cts:properties-query(
  cts:and-not-query(
    cts:element-value-query(
      xs:QName("dls:latest"),
      "true",
      (),
      0
    ),
    cts:element-query(
      xs:QName("dls:version-id"),
      cts:and-query(())
    )
  )
)
That is the query that is registered by dls:documents-query(). Might not be future proof though, so check at each upgrade. You can find the definition of the function in /Modules/MarkLogic/dls.xqy
HTH!

Related

Complex requests with REST API

I am wondering if it is possible to adhere to REST principles when creating what will essentially amount to a BI tool.
In my scenario I have a high data volume with hundreds of thousands of IDs (frankly more than this, but for the sake of this example let's go with that). These are presented in a traditional table that allows for the features needed when accessing large data sets, such as pagination. The user also has the ability to filter by one or many of these IDs to drill down into the data set as they see fit.
It is theoretically possible that the user would want to filter on 100 of the IDs, thus rendering a GET URI incredibly long. As I understand it, that would kind of break the resource identification principle of a REST API. It could also bump into the URL length limit for GET requests in certain browsers, since the IDs may be quite long. Normally I would just use a POST, since I can add all of the applied filters in the body and generate a where clause on the server.
Since a POST in a REST API is supposed to "create a new entry in the collection", it would appear, at least to me, that generating a complex query for something like this means that a REST API is not possible. Or perhaps it means that I am approaching the solution wrong (totally plausible).
It would seem that in my scenario using a GET simply isn't possible due to the potential length of the parameters. Thus I am forced to use a POST. Though using a POST as I am seems to violate REST style, which isn't the end of the world. I mostly just wanted to double check that I am not missing something and there is not a solution using a GET.
To follow the resources principle, make a search-like resource. POST your IDs in a body to this endpoint and it will prepare a list of results for you and redirect you to searchresults/{id}.
See for example: https://gooroo.io/GoorooTHINK/Article/16583/HTTP-Patterns---Bouncer/25829#.W3aBsugzaUk
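The POST-then-redirect idea above can be sketched roughly as follows. This is only an illustration in plain Python with no web framework: the function names, the in-memory store, and the /searches path for the POST are assumptions, not something the answer specifies. Only the searchresults/{id} path comes from the answer.

```python
import uuid

# In-memory store of prepared searches (hypothetical; a real service
# would persist these and probably expire them).
_searches = {}


def create_search(ids):
    """Handle a POST to a hypothetical /searches endpoint.

    Stores the posted ID filter and answers with a redirect to the
    resource that represents the prepared result list.
    """
    search_id = uuid.uuid4().hex
    _searches[search_id] = sorted(set(ids))
    # 303 See Other asks the client to GET the new resource.
    return 303, {"Location": f"/searchresults/{search_id}"}


def get_search_results(search_id, page=1, page_size=50):
    """Handle GET /searchresults/{id}: return one page of the stored IDs."""
    if search_id not in _searches:
        return 404, None
    ids = _searches[search_id]
    start = (page - 1) * page_size
    return 200, ids[start:start + page_size]
```

The long filter travels once in the POST body; after that the client pages through a short, cacheable GET URL.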

REST API Get single latest resource

I'm designing a REST api and interested if anyone can help with best practice in the following scenario.
I have...
GET Customers/{customerId}/Orders - to get all customer orders
GET Customers/{customerId}/Orders/{orderId} - to get a particular order
I need to provide the ability to get a customers most recent order. What is best practice in this scenario? Simply get all and sort by date or provide a specific method?
I need to provide the ability to get a customers most recent order.
Of course you could provide query parameters to filter, sort and slice the orders collection, but why not make it simpler and give the latest order if the client needs it?
You could use something like (returning a representation of a single order):
GET /customers/{customerId}/orders/latest
The above URL will map an order that will change over the time and it's perfectly fine.
Say there is also a case where you need the last 5 orders. What would your route(s) look like?
The above approach focuses on the requirement to get a customer's most recent order. If a requirement to return the last 5 orders eventually comes up, I would probably introduce another mapping such as /recent that returns a representation of a collection with the recent orders and accepts a query parameter that indicates the number of orders to be returned (5 would be the default value if the parameter is omitted).
The /latest mapping would still be valid and would return a representation of the very latest order only.
Providing query parameters to filter, sort and slice the orders collection is still a valid approach.
The key is: If you know the client who will consume the API, target it to their needs. Otherwise, make it more generic. And when modifying the API, be careful with breaking changes and versioning the API is also welcome.
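As a rough sketch of how /latest and /recent could sit side by side, here is some plain Python. The order records, field names, and function names are made up for illustration; only the default of 5 for /recent comes from the answer.

```python
from datetime import date

# Hypothetical order records for one customer.
ORDERS = [
    {"id": 1, "created_at": date(2023, 1, 5)},
    {"id": 2, "created_at": date(2023, 3, 9)},
    {"id": 3, "created_at": date(2023, 2, 1)},
]


def recent_orders(orders, limit=5):
    """Back a /recent mapping: newest orders first, `limit` defaults to 5."""
    newest_first = sorted(orders, key=lambda o: o["created_at"], reverse=True)
    return newest_first[:limit]


def latest_order(orders):
    """Back a /latest mapping: a single order, or None if there are none."""
    recent = recent_orders(orders, limit=1)
    return recent[0] if recent else None
```

Here /latest returns a single order representation while /recent returns a collection, so the two mappings stay distinct even though they share the same sort.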
I think there is no need for another route.
Pass something like &order=-created_at&limit=1 in your get request
Or &order=created_at&orderby=DESC&limit=1 (note I'm not sure about naming your params so maybe you could use &count=1 instead of &limit=1, ditto order params)
I think it also depends whether you are using pagination or not on that route, so perhaps additional params are required
Customers/{customerId}/Orders?order=-created_at&limit=1
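A minimal sketch of how a server might interpret those parameters, assuming the order=-created_at convention (a leading minus means descending) and a limit parameter. The function name and record shape are illustrative only.

```python
from urllib.parse import parse_qs


def apply_query(orders, query_string):
    """Sort and slice `orders` according to `order` and `limit` parameters."""
    params = parse_qs(query_string)
    result = list(orders)

    order = params.get("order", [None])[0]
    if order:
        # Assumed convention: "-field" sorts descending, "field" ascending.
        descending = order.startswith("-")
        field = order.lstrip("-")
        result.sort(key=lambda o: o[field], reverse=descending)

    limit = params.get("limit", [None])[0]
    if limit is not None:
        result = result[:int(limit)]
    return result
```

For example, apply_query(orders, "order=-created_at&limit=1") returns a one-element list holding the newest order, so the client still receives a collection rather than a single resource.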
The GitHub API uses latest for a similar use case, to fetch the single most recent resource.
https://docs.github.com/en/rest/reference/repos#get-the-latest-release
So to fetch a single resource which is latest you can use.
GET /customers/{customerId}/orders/latest
However, I would like to know what the community thinks about this.
IMO the resource/latest gives an impression that the response will be a list of resource sorted by latest to oldest.

Accessing versions/revisions of an object in a RESTful API

While designing a RESTful API we came across the problem of how to access different versions of the "same object". Let us say a page object is identified by a unique key and accessed by GET /api/page/pagekey. It can be updated by sending PUT /api/page/pagekey and an appropriate document in the body.
Now our system keeps track of old versions of the page, that we also want to access via the API. Let us assume that an older version of the document is version 1. There seem to be at least two ways to design the API to access this particular version of the page:
GET /api/page/pagekey/1
GET /api/page/pagekey?version=1
The first variant renders the particular version as its own resource; the second variant gives the existing resource an optional version context.
Is variant (1) or (2) a better solution? Or is there an even better way to do it?
In variant (1) a request for a version number that does not exist, e.g. /api/page/pagekey/7, could trigger an HTTP 404 Not Found, which is quite handy. Would this also be a valid status response when considering variant (2), where we only change the context "version" of the existing resource, which without the version parameter would return an HTTP 200 OK response?
Each resource url should be a permalink to identify that resource.
GET /api/page/{id}/{rev}
That certainly is a permalink to a specific version of the resource. So, that's fine. But note that the permalink does not require the content to be the same over time:
GET /api/page/{id}
That will return the latest revision which is fine and will change contents over time. To expand on that, you can even have temporal resources like this and be RESTful:
GET /api/page/latest
But, /api/page/{id}?version={rev} will also work and doesn't break any RESTful concepts.
I think /{id}/{rev} is a bit purer, since it specifically identifies that resource in the addressable URL and feels a little more correct than making it a param. The reason is that params should be modifiers on how to retrieve the contents and not necessarily mutate the distinct resource you're retrieving. In your case, since each version is distinct, it seems more appropriate to distinctly address the resource. But even the param variant doesn't break any RESTful URL rules or concepts, and if you asked 10 folks you might get a different answer :)
Either way, you should likely ensure the temporal resource /api/page/{id} returns the latest revision.
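Both URL variants can end up in the same lookup on the server. The sketch below is an assumption about how that might look (the PAGES store and get_page are hypothetical): /api/page/{id} resolves to the latest revision, and an unknown revision yields 404 whether it arrived as a path segment or as ?version=.

```python
# Hypothetical revision store: page key -> {revision number: content}.
PAGES = {"pagekey": {1: "first draft", 2: "second draft"}}


def get_page(key, version=None):
    """Resolve /api/page/{key}, /api/page/{key}/{rev} or ?version={rev}."""
    revisions = PAGES.get(key)
    if not revisions:
        return 404, None
    if version is None:
        # No revision requested: serve the latest one.
        version = max(revisions)
    if version not in revisions:
        return 404, None
    return 200, revisions[version]
```

Returning 404 for ?version=7 is a reasonable choice because the query string is part of the URI, so a URI with an unknown version simply identifies something that does not exist.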
Almost by definition, REST will have no notion of "same object". If you need this in your protocol, then you'll need to have some kind of "identifier". As simple as that ;)
A URL parameter is one obvious way to go. "/1" or "?version=1" are certainly two good alternatives - which you choose is just a matter of preference (as well as a question of how much "other stuff" you might also want).
Either way, you're still going to have to cope with "version not found" kinds of errors, and recover gracefully.
IMHO...

REST Webservices - GET but for multiple objects

I have already gone through this
How best to design a REST API with multiple filters?
This does help when you have, say, 3 or 4 filtering criteria and you can accommodate them in the query string.
However let's take this example
You want to get call details about 20 telephone numbers, between a certain startdate and enddate.
Now I do agree ideally one should be advised to make individual queries for each number and then on the client side collate all data.
However, for certain live systems that would mean 20 rounds of queries on the switches or CDR databases. That is 20 request-response cycles, plus the client having to collate and order the results again by time. At the database level it would have been a single simple query that returns ordered data, which can be transformed into a REST XML response that the client can embed in their system.
If we are to use GET the query string will get really confusing and has a limit as well.
Any suggestions to get around this issue.
Of course we can send a POST request with an xml having all numbers in it but that is against REST Get principles.
If you use GET, consider OData queries. For example, when your start and end dates are represented as numbers (Unix time), the URI could look like:
GET http://operatorcalls.com/Calls/Details?$filter=Date le 1342699200 and Date gt 1342526400
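The spaces in the $filter expression have to be percent-encoded before the request goes on the wire. A small sketch of building that URL with the Python standard library follows; calls_url is a hypothetical helper, and the host and path are the ones from the example above.

```python
from urllib.parse import urlencode, quote, urlsplit, parse_qs


def calls_url(start, end):
    """Build the call-details URL with an encoded OData-style $filter."""
    filter_expr = f"Date le {end} and Date gt {start}"
    # quote_via=quote encodes spaces as %20 rather than "+".
    query = urlencode({"$filter": filter_expr}, quote_via=quote)
    return f"http://operatorcalls.com/Calls/Details?{query}"
```

Decoding the query with parse_qs(urlsplit(url).query) gives back the original filter expression, which is a quick way to check the encoding.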
What you seem to be missing is an important concept of REST, caching. This can be done, as an example, in the browser, for a single client. Or it can be done as a shared cache between all the clients and the live production system (whatever it may be). Thus reducing queries against a live production system, or in your example, actual switches.
You should really take some time to read Fielding's thesis, and understand that REST is an architectural style.
I found a solution here Handling multiple parameters in a URI (RESTfully) in Java
but not quite happy with it.
So in effect we will end up using /cdr?numbers=number1,number2,number3 ...
However, I'm not too pleased with it, as there is a limit to the query string length in the URL and it doesn't really seem to be an elegant solution. Has anyone found a solution to this in their own implementation?
Basically, we want to avoid both POST for this kind of fetch request and cumbersome, lengthy query strings.
We are using Jersey but also open to using CXF or Spring REST
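For illustration, the comma-separated numbers parameter could be handled along these lines. The sketch is in Python rather than Jersey, and the record layout and function names are assumptions; the point is only that one request yields one filtered, time-ordered result.

```python
from urllib.parse import parse_qs


def parse_numbers(query_string):
    """Split the comma-separated `numbers` parameter into a list."""
    raw = parse_qs(query_string).get("numbers", [""])[0]
    return [n.strip() for n in raw.split(",") if n.strip()]


def call_details(records, numbers, start, end):
    """Return records for the given numbers within [start, end], by time."""
    wanted = set(numbers)
    hits = [
        r for r in records
        if r["number"] in wanted and start <= r["time"] <= end
    ]
    return sorted(hits, key=lambda r: r["time"])
```

The URL length limit still applies, so this only helps while the list of numbers stays reasonably short.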

Exposing database query parameters via REST interface

I have the basics of a REST service done, with "standard" list and GET/POST/PUT/DELETE verbs implemented around my nouns.
However, the client base I'm working with also wants to have more powerful operations. I'm using Mongo DB on the back-end, and it'd be easy to expose an "update" operation. This page describes how Mongo can do updates.
It'd be easy to write a page that takes a couple of JSON/XML/whatever arguments for the "criteria" and the "objNew" parts of the Mongo update function. Maybe I make a page like http://myserver.com/collection/update that takes a POST (or PUT?) request, with a request body that contains that data. Scrub the input for malicious querying and to enforce security, and we're done. Piece of cake.
My question is: what's the "best" way to expose this in a RESTful manner? Obviously, the approach I described above isn't kosher because "update" isn't a noun. This sort of thing seems much more suitable for a SOAP/RPC method, but the rest of the service is already using REST over HTTP, and I don't want users to have to make two different types of calls.
Thoughts?
Typically, I would handle this as:
url/collection
url/collection/item
GET collection: Returns a representation of the collection resource
GET collection/item: Returns a representation of the item resource
(optional URI params for content-types: json, xml, txt, etc)
POST collection/: Creates a new item (if via XML, I use XSD to validate)
PUT collection/item: Update an existing item
DELETE collection/item: Delete an existing item
Does that help?
Since as you're aware it isn't a good fit for REST, you're just going to have to do your best and invent a standard to make it work. Mongo's update functionality is so far removed from REST, I'd actually allow PUTs on the collection. Ignore the parameters in my examples, I haven't thought too hard about them.
PUT collection?set={field:value}
PUT collection?pop={field:1}
Or:
PUT collection/pop?field=1
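One way to keep such an endpoint from exposing arbitrary queries is to translate a small whitelist of parameters into update operators on the server. The sketch below is an assumption about how that could look: build_update and the whitelist are hypothetical, parameter values are assumed to be JSON objects, and $set and $pop are existing MongoDB update operators.

```python
import json

# Only these request parameters are accepted; each maps to one operator.
ALLOWED_OPERATORS = {"set": "$set", "pop": "$pop"}


def build_update(params):
    """Turn request parameters into a Mongo-style update document."""
    update = {}
    for name, raw in params.items():
        if name not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {name}")
        fields = json.loads(raw)
        if not isinstance(fields, dict):
            raise ValueError(f"{name} must be a JSON object")
        update[ALLOWED_OPERATORS[name]] = fields
    return update
```

For example, build_update({"set": '{"status": "done"}'}) produces {"$set": {"status": "done"}}, and any parameter outside the whitelist is rejected before it reaches the database.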