Query node-label topology from YARN via REST API [MapR 6.1/Hadoop-2.7]

There are Java and CLI interfaces for querying the YARN RM for node-to-node-label (and inverse) mappings. Is there a way to do this via the REST API as well?
An initial search of the RM API turned up only node-label-based job submission as an option.
Sadly, that is actually broken in MapR-Hadoop (6.1 as of 6/6/19), so my code has to work around it by implementing the correct scheduling itself. This works (barely, as more APIs are broken here as well) using the YarnClient Java API.
But since I want to schedule jobs against different resource managers at the same time, behind firewalls, the REST API is the most compelling option: the YarnClient API's RPC backend can't easily be transported.
My current worst-case solution would be to parse the YARN web UI in some way.

The only solution I found so far:
Request /ws/v1/cluster/nodes - this gets you all nodes.
FlatMap/Distinct on each node's nodeLabels, if you need just the list of node labels. Filter by nodeLabel, if you need all nodes for a specified label.
This does mean that you always have to query all nodes and then sort/filter/arrange by node label, which is a lot of client-side magic. But apparently there's no GetNodesToLabel or even GetClusterNodeLabels to help us out.
I assume getLabelsToNodes is just a client-side implementation, as the protocol doesn't define that call, so it is right out the window for REST unless it is implemented in the WebService.
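The client-side grouping described above can be sketched as follows. This is a minimal sketch, not the RM's own API: it assumes the parsed JSON response has the shape nodes -> node -> list of node objects, each with an "id" field and an optional "nodeLabels" list, which should be checked against the actual output of your cluster. The host names in the sample are made up.

```python
from collections import defaultdict


def labels_to_nodes(payload):
    """Group node IDs by label from a parsed /ws/v1/cluster/nodes response.

    Assumed shape: {"nodes": {"node": [{"id": ..., "nodeLabels": [...]}, ...]}}
    """
    nodes = (payload.get("nodes") or {}).get("node") or []
    mapping = defaultdict(list)
    for node in nodes:
        # Nodes without labels may omit the field entirely.
        for label in node.get("nodeLabels") or []:
            mapping[label].append(node["id"])
    return dict(mapping)


def cluster_labels(payload):
    """Distinct labels seen on any node (the flatMap/distinct step)."""
    return sorted(labels_to_nodes(payload))


# Hypothetical sample payload standing in for the real HTTP response.
sample = {
    "nodes": {"node": [
        {"id": "host-a:8041", "nodeLabels": ["gpu"]},
        {"id": "host-b:8041", "nodeLabels": ["gpu", "ssd"]},
        {"id": "host-c:8041"},
    ]}
}

print(labels_to_nodes(sample))
print(cluster_labels(sample))
```

In real use the payload would come from an HTTP GET of /ws/v1/cluster/nodes on each resource manager, requested as JSON and parsed before being passed in. Note that labels which exist in the cluster but are not assigned to any node will not show up this way.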

Related

Does the unique ID for a resource have to be a single string?

Most online tutorials have an endpoint that looks like this one:
/users/{id}
- get
- post
I am currently working on a platform where third-party plugins can be integrated/installed, and we do not know which third-party plugins the customer has installed. To get around this problem, we are thinking of converting the example above to something like this:
/users/{vendorID}/{pluginID}/{artifactID}
- get
- post
A vendor can have multiple products/plugins, and each plugin is made of multiple artifacts. So we assume {vendorID}/{pluginID}/{artifactID} identifies a unique resource. But this has the side effect of adding two extra path parameters. I'm not sure if this is the right way.
I'm looking for some insights.
Thank you.
An endpoint path can include any number of path parameters. Multiple path parameters are very common when expressing the hierarchy of resources and subresources in an API. For example, GitHub's "Get a branch" endpoint is /repos/{owner}/{repo}/branches/{branch}.
It is perfectly fine, though you can define a function which merges the 3 strings into a single one.
The final addition to our constraint set for REST comes from the
code-on-demand style of Section 3.5.3 (Figure 5-8). REST allows client
functionality to be extended by downloading and executing code in the
form of applets or scripts. This simplifies clients by reducing the
number of features required to be pre-implemented. Allowing features
to be downloaded after deployment improves system extensibility.
However, it also reduces visibility, and thus is only an optional
constraint within REST.
https://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm
If I were a client developer, I would not allow this from a security perspective, but it is just as easy to do something like:
userID = base64Encode(JSON.stringify({vendorID, pluginID, artifactID}))
and state in your documentation that userIDs are generated this way; or, if you distribute the userID, just give the generated one to your users. It can even be made irreversible if you use a hash such as SHA-1 instead of Base64. So it is not a big deal to generate a unique ID if you have multiple IDs which are unique together. One problem with the approach above is that the keys of a JSON object might be unordered, so a JSON array, or something else that certainly keeps order, may be a better solution. Do not use a simple template string like "{vendorID}-{pluginID}-{artifactID}", though, because unlike a serialization method it is injectable.
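To make the ordering and injection points concrete, here is a minimal sketch that serializes the three IDs as a JSON array before Base64-encoding them, next to the naive template string. The function names and the sample IDs are hypothetical, and URL-safe Base64 is chosen here only so the value can sit in a path segment.

```python
import base64
import json


def encode_id(vendor_id, plugin_id, artifact_id):
    """Serialize the three parts as an ordered JSON array, then Base64-encode."""
    raw = json.dumps([vendor_id, plugin_id, artifact_id], separators=(",", ":"))
    return base64.urlsafe_b64encode(raw.encode("utf-8")).decode("ascii")


def decode_id(composite_id):
    """Reverse encode_id and return the three parts in their original order."""
    raw = base64.urlsafe_b64decode(composite_id.encode("ascii"))
    vendor_id, plugin_id, artifact_id = json.loads(raw)
    return vendor_id, plugin_id, artifact_id


def naive_id(vendor_id, plugin_id, artifact_id):
    """Template-string variant, shown only to demonstrate the collision."""
    return f"{vendor_id}-{plugin_id}-{artifact_id}"


# Two different triples collide in the template form...
print(naive_id("acme", "a-b", "c") == naive_id("acme", "a", "b-c"))
# ...but stay distinct, and recoverable, in the serialized form.
print(encode_id("acme", "a-b", "c") == encode_id("acme", "a", "b-c"))
print(decode_id(encode_id("acme", "a-b", "c")))
```

Because the separator is chosen by the serializer rather than by string concatenation, a dash inside one of the IDs cannot shift the boundaries between the parts.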

How to use RabbitMQ message as a "Rest Api" to find entities?

I have a problem to solve in an application. I'll show an example.
I have a RabbitMQ queue on a system that is responsible for returning Orders and is called by other systems (communication among these systems is only through messages). Until now, the only possible Order search was by order code.
It works well. When I search by order code, I also restrict the results to orders that have contracts and are not (logically) deleted. So, if the order has no contracts or was deleted, the query returns no records.
Now, one of those systems needs to find Orders that have no contracts and/or are deleted.
Basically, I believe I need to build the same logic used in a REST API like this one, but using a queue message:
/api/orders?id=123455&deleted=true&hasContracts=true
Doing that is easy with a message. I just need to send a message in this format:
{
"code": 123,
"deleted": true,
"hasContract": true
}
The values are mapped to the Long and Boolean classes. If a field is null, that filter is ignored by the query, except for the code, which is mandatory.
My question is: does this make sense? I didn't find anything about this subject on the Internet. Creating a queue for each case is not an option, because it would be hard for us to implement many queues.
It makes sense to me; using RPC with RabbitMQ is like RPC with HTTP/gRPC/..., so you have many options here:
- if you need great flexibility, you can
  - create your own query language (like in the example above)
  - use something like GraphQL
- if your use cases are limited, you can choose to segregate the API endpoints with several routing keys (REST-over-AMQP).
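A minimal sketch of the first option, using the message format from the question: the consumer parses the message body, requires the code, and drops any filter whose value is null. The function names and the in-memory order list are hypothetical stand-ins for the real consumer and database query, and no RabbitMQ client is involved here.

```python
import json


def parse_order_query(body):
    """Turn a message body into a filter dict, ignoring null optional fields."""
    msg = json.loads(body)
    if msg.get("code") is None:
        raise ValueError("'code' is mandatory")
    query = {"code": int(msg["code"])}
    for field in ("deleted", "hasContract"):
        value = msg.get(field)
        if value is not None:  # null or missing means "do not filter on this"
            query[field] = bool(value)
    return query


def find_orders(orders, query):
    """Keep the orders that match every filter present in the query."""
    return [o for o in orders if all(o.get(k) == v for k, v in query.items())]


# Hypothetical data standing in for the order store.
orders = [
    {"code": 123, "deleted": True, "hasContract": False},
    {"code": 123, "deleted": False, "hasContract": True},
]

query = parse_order_query('{"code": 123, "deleted": true, "hasContract": null}')
print(query)
print(find_orders(orders, query))
```

One design point to settle early is what an absent filter means: here it means "no restriction", which differs from the original behaviour of always requiring contracts and excluding deleted orders, so existing callers would need to send those two flags explicitly.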
Hope this helps.

Api naming in microservices design

Let's say that there are two microservices representing the resources orders (/orders) and customers (/customers). My requirement is to get all the orders made by a customer.
Had it been a monolithic application, I would have modeled my URI as /customers/{id}/orders. This would have hit the customers resource and made an in-memory service call to get the corresponding orders.
Now, in the case of microservices, this isn't possible. So, is making a remote service call the only way to get the orders, or is there a better way of doing it?
Can we create another resource with the representation /ordersByCustomers/{customerid}?
You can pass some query parameters as filters (this is the most common way I've seen). For example
/orders?customerId=123
I think that's quite clear, that you want to retrieve all customer orders filtered by customer id. In the same way you can add pagination or other filters.
The important thing to remember is that you want the order resource, so the URL should remain the same. I'm mentioning this, because this has been the most difficult thing for me to change... to think about resources rather than remote calls.
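As a rough illustration of the query-parameter approach, the sketch below keeps a single /orders resource and applies the customerId filter and a simple page parameter from the URL. The handler name, the page parameter, the page size, and the in-memory data are all hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical data standing in for the orders service's store.
ORDERS = [
    {"id": 1, "customerId": "123"},
    {"id": 2, "customerId": "456"},
    {"id": 3, "customerId": "123"},
    {"id": 4, "customerId": "123"},
]


def list_orders(url, page_size=2):
    """Handle GET /orders with optional customerId and page query parameters."""
    params = parse_qs(urlsplit(url).query)
    customer_id = params.get("customerId", [None])[0]
    page = int(params.get("page", ["1"])[0])
    matches = [
        o for o in ORDERS
        if customer_id is None or o["customerId"] == customer_id
    ]
    start = (page - 1) * page_size
    return matches[start:start + page_size]


print(list_orders("/orders?customerId=123"))
print(list_orders("/orders?customerId=123&page=2"))
```

The resource stays /orders in every call; only the filters change, so adding another filter later does not require a new endpoint.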
In general you should beware of using endpoints like the one you suggested:
/ordersByCustomers/{customerid}
Why? Because this is not RESTful in general (even in a microservices environment) and makes the API difficult for consumers to understand and use. What if you need ordersByWhatever? Will you introduce a new endpoint every single time you need a new set of data? Try to avoid such opinionated endpoints.
What @Augutsto suggested is fully correct. If you're afraid of having complicated logic in a GET request, this is a situation where you can break REST rules. I mean introducing:
POST /orders/filter/
Here all the filtering logic is passed in the request body, so it's easier to carry complicated logic as well.

How to design a RESTful api for slow-generated resources or job status?

I am trying to design a RESTful api for a service that accepts a bunch of parameters and generates a large result. This is my first RESTful project. One tricky part is that the server needs some time (up to a few minutes) to generate the result. My current thought is to use POST to send in all the parameters. The server response can be a job id.
I can then retrieve the result using GET /result/{job_id}. The problem is that the result is not available for the first few minutes. Maybe I can return "resource unavailable" at the beginning and the result once it is available. But this feels odd and adds some odd logic to the client.
An alternative is to retrieve the job status with GET /job_status/{job_id}, where the status might be running/error/done, similar to an HTTP status code, and the done status also comes with a result_id. Then I can retrieve the result with GET /result/{result_id}.
Either approach conflicts with what I have read about GET. In both cases, the GET result is not fixed and not cacheable while the job is still running. On the other hand, I read somewhere that it is OK to do things like GET /currentWeather or GET /currentTime, which are similar to at least my second approach. So my questions are:
Which one is better? Why?
Should I use GET in such a situation?
Or is neither one OK? What would you do?
Thank you very much.
Should I use GET?
For long-running operations, one approach is to set the Expires header or the Cache-Control max-age directive on your response appropriately. An example is given in "Best practice for implementing long-running searches with REST".
But I recommend "The RESTy Long-op Protocol" for your case.
Your solution will be more robust and more client-friendly.
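The second approach from the question (a job status resource that points to the result once it is done) could be modeled roughly as below. This is an in-memory sketch, not an implementation of either linked article: the class and method names are hypothetical, answering the submission with 202 Accepted and a Location header is a common convention that the question does not mention, and the job ID is reused as the result ID for brevity.

```python
import itertools


class JobService:
    """In-memory model of the job, status, and result resources."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._jobs = {}

    def submit(self, params):
        """POST with parameters: accept the job and point at its status resource."""
        job_id = str(next(self._ids))
        self._jobs[job_id] = {"status": "running", "params": params, "result": None}
        return 202, {"Location": f"/job_status/{job_id}"}

    def complete(self, job_id, result):
        """Called by the worker when the result has been generated."""
        job = self._jobs[job_id]
        job["status"], job["result"] = "done", result

    def get_status(self, job_id):
        """GET /job_status/{job_id}: always answers, links to the result when done."""
        job = self._jobs.get(job_id)
        if job is None:
            return 404, {"error": "unknown job"}
        body = {"status": job["status"]}
        if job["status"] == "done":
            body["result"] = f"/result/{job_id}"
        return 200, body

    def get_result(self, job_id):
        """GET /result/{id}: the resource only exists once the job has finished."""
        job = self._jobs.get(job_id)
        if job is None or job["status"] != "done":
            return 404, None
        return 200, job["result"]


service = JobService()
print(service.submit({"size": "large"}))
print(service.get_status("1"))
service.complete("1", "generated report")
print(service.get_status("1"))
print(service.get_result("1"))
```

With this split, the status resource is the one that changes over time and should be served with short or no caching, while the result resource is stable once it exists and can be cached freely.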

Marklogic REST API search for latest document version

We need to restrict a MarkLogic search to the latest version of managed documents, using MarkLogic's REST API. We're using MarkLogic 6.
Using straight XQuery, you can use dls:documents-query() as an additional-query option (see "Is there any way to restrict marklogic search on specific version of the document").
But the REST API requires XML, not arbitrary XQuery. You can turn ordinary cts queries into XML easily enough (execute <some-element>{cts:word-query("hello world")}</some-element> in QConsole).
If I try that with dls:documents-query() I get this:
<cts:properties-query xmlns:cts="http://marklogic.com/cts">
  <cts:registered-query>
    <cts:id>17524193535823153377</cts:id>
  </cts:registered-query>
</cts:properties-query>
Apart from being less than totally transparent... how safe is that number? We'll need to put it in our query options, so it's not something we can regenerate every time we need it. I've looked on two different installations here and the number's the same, but is it guaranteed to be the same, and will it ever change? For example, on a MarkLogic upgrade?
Also, assuming the number is safe, will the registered-query always be there? The documentation says that registered queries may be cleared by the system at various times, but it's talking about user-defined registered queries, and I'm not sure how much of that applies to internal queries.
Is this even the right approach? If we can't do this we can always set up collections and restrict the search that way, but we'd rather use dls:documents-query if possible.
The number is a registered query id, and is deterministic. That is, it will be the same every time the query is registered. That behavior has been invariant across a couple of major releases, but is not guaranteed. And as you already know, the server can unregister a query at any time. If that happens, any query using that id will throw an XDMP-UNREGISTERED error. So it's best to regenerate the query when you need it, perhaps by calling dls:documents-query again. It's safest to do this in the same request as the subsequent search.
So I'd suggest extending the REST API with your own version of the search endpoint. Your new endpoint could add dls:documents-query to the input query. That way the registered query would be generated in the same request with the subsequent search. For ML6, http://docs.marklogic.com/6.0/guide/rest-dev/extensions explains how to do this.
The call to dls:documents-query() makes sure the query is actually registered (on the fly if necessary), but that won't work from the REST API. You could extend the REST API with a custom extension as suggested by Mike, but you could also use the following:
cts:properties-query(
  cts:and-not-query(
    cts:element-value-query(
      xs:QName("dls:latest"),
      "true",
      (),
      0
    ),
    cts:element-query(
      xs:QName("dls:version-id"),
      cts:and-query(())
    )
  )
)
That is the query that is registered by dls:documents-query(). It might not be future-proof though, so check at each upgrade. You can find the definition of the function in /Modules/MarkLogic/dls.xqy
HTH!