What's the pattern for requesting all records in a RESTful app?

I would like to know what's the best approach for my RESTful app when requesting all records.
For example, I limit my responses to 10 if no $top value is provided to avoid overload. However, how can I format my request? Which is better, $top=allrows or $top=all? Is there any pattern I should check for?
If the $top value is not provided, I only return up to 10 rows.
GET /products?$top=
I just want to avoid this:
GET /products/all

There's no official pattern and any choice would depend on the size of your data.
Whatever you do, always put a maximum limit on the number of items you'll return, regardless of the parameters the client provides in the request.
Also, define a default count to return when no count is provided in the parameters.
If you don't have tons of items to return, you could set your default count to the max limit; that would be enough to always return everything, and a URL without any explicit count would just return all items.
GET /products (no count provided)
If you have hundreds or thousands of items and a default count of, say, 100, maybe use an explicit count to extend that limit (up to the max, of course; if the client asks for a count greater than the max, return a 400 Bad Request with a message indicating the count can't be higher than the max).
GET /products?count=1000000
However, this could be horrible for your server(s) and/or the client if you keep pushing the max limit higher and higher.
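A hedged server-side sketch of that guard (Python/Flask; the route, parameter name, and the DEFAULT_COUNT/MAX_COUNT values are illustrative assumptions, not a prescribed API):

from flask import Flask, jsonify, request

app = Flask(__name__)
DEFAULT_COUNT, MAX_COUNT = 100, 1000
# Stand-in data; a real service would query its datastore instead.
PRODUCTS = [{"id": str(i), "name": f"product {i}"} for i in range(5000)]

@app.route("/products")
def products():
    count = request.args.get("count", DEFAULT_COUNT, type=int)
    if count > MAX_COUNT:
        # Reject rather than silently clamp, so clients learn the limit.
        return jsonify({"error": f"count can't be higher than {MAX_COUNT}"}), 400
    return jsonify(PRODUCTS[:count])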
Typically, if you have a lot of records, you chunk them up and use a count and offset to pull them down in bite-sized chunks. Also add metadata to the response object letting the requester know the current position, the total number of records, and the offset supplied.
A little pseudo-code:
$count = 1000
$offset = 0
while $offset < $total_records:
    GET /products?count=$count&offset=$offset
    $offset = $offset + $count
Assuming one of the requests looks like:
GET /products?count=1000&offset=1000
Then in the response body you'd expect something like:
{
    "result": [
        {
            "id": "123",
            "name": "some product",
            "link": "/product/123"
        },
        ... many more products ...
        {
            "id": "465",
            "name": "another product",
            "link": "/product/465"
        }
    ],
    "meta": {
        "count": 1000,
        "offset": 1000,
        "total_count": 3000,
        "next_link": "/products?count=1000&offset=2000",
        "prev_link": "/products?count=1000&offset=0"
    },
    "status": 200
}
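A minimal client-side sketch (Python + requests) that walks this paginated endpoint by following meta.next_link; the host and the exact response shape are assumptions taken from the sample response above, not a real API:

import requests

BASE = "https://api.example.com"  # hypothetical host

def fetch_all_products():
    products = []
    url = "/products?count=1000&offset=0"
    while url is not None:
        body = requests.get(BASE + url).json()
        products.extend(body["result"])
        meta = body["meta"]
        # Stop once this chunk reaches the end of the collection.
        done = meta["offset"] + meta["count"] >= meta["total_count"]
        url = None if done else meta["next_link"]
    return products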
If you really want a gold star you can make your resources adhere to HATEOAS ( https://en.wikipedia.org/wiki/HATEOAS ) and include links to the individual resources in the list, and maybe include links in the meta to the next and prior chunks if you're walking a large list of items. I've put some example links in the JSON sample above.

I would do it like this:
For all products: GET /products
For a certain product: GET /products/{id}
For all products with filtering or sorting,
use GET /products but let the client send you a Filter object in the request body.
There the client can specify whether it wants a certain page of the pagination, a filter on some records, etc.
The Filter object could look like:
{"pageNumber":1,"pageSize":12,"freeText":"","conditions":[],"sortings":{}}
In your service, map it to your inner service filter and return the requested records.
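A hedged sketch of that mapping on the server side (Python/Flask; the Filter field names come from the example above, while the in-memory "inner service" is purely illustrative):

from flask import Flask, jsonify, request

app = Flask(__name__)
# Stand-in for the inner service's data.
PRODUCTS = [{"id": i, "name": f"product {i}"} for i in range(100)]

@app.route("/products", methods=["GET"])
def list_products():
    f = request.get_json(silent=True) or {}
    page, size = f.get("pageNumber", 1), f.get("pageSize", 12)
    text = f.get("freeText", "")
    # Map the client's Filter object onto the inner query.
    rows = [p for p in PRODUCTS if text.lower() in p["name"].lower()]
    start = (page - 1) * size
    return jsonify(rows[start:start + size])

One caveat with this design: many HTTP stacks ignore or reject bodies on GET requests, so a POST /products/search endpoint is a common alternative for the same Filter object.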


How can I get the categories a product is in without including grandparents?

I'm trying to get a list of all products from the Magento 2 REST API and the categories the products are in (I want to build a tree-view in another application).
I can query for all SKUs and then run through the SKUs one by one, but this takes a long time; I'd rather make 1 API call and get all product/category relationships in one go.
If I query all SKUs at once, I don't get the category_ids attribute which shows which categories the item belongs to.
GET: http://someurl/rest/V1/products?searchCriteria=
// e.g. this is missing when querying for multiple SKUs
{
    "attribute_code": "category_ids",
    "value": [
        "557"
    ]
},
If I use the category API instead, it gives me products that belong to sub-categories under the current category, which I don't want.
GET: http://someurl/rest/V1/categories/555/products
// e.g. the below aren't leaf-level on category 555, they live in a subcategory
[
    {
        "sku": "BC000018",
        "position": 1,
        "category_id": "555"
    },
    {
        "sku": "BC000022",
        "position": 1,
        "category_id": "555"
    },
    {
        "sku": "BC000023",
        "position": 1,
        "category_id": "555"
    },
    // and so on
Is there any way to query for only leaf-level items when inspecting categories, or any way to include detailed attributes when using the products API?
To achieve what you're looking for, you will need to build a custom Product Search / Product Details API that returns a JSON result set containing ALL pertinent data points of interest (e.g. the list of all categories a product is assigned to).
My 2 cents:
You might want to consider adding the additional data to the Product Details API response and not into the "Product Node" of the Product Search API.
Here's why:
Additional SQL queries may be needed to source the extra information you're looking to add to the Product JSON.
E.g. the category-product mappings are stored in the catalog_category_product table and will require at least 1 extra SQL query, along the lines of the sketch below.
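A hedged illustration of that extra lookup (Python + PyMySQL; the catalog_category_product columns, product_id and category_id, are standard Magento schema, while the connection details are assumptions):

import pymysql

conn = pymysql.connect(host="localhost", user="magento",
                       password="secret", database="magento")

def categories_for_product(product_id):
    # One extra round-trip like this per product adds up quickly
    # when it runs row-by-row over a search result.
    with conn.cursor() as cur:
        cur.execute(
            "SELECT category_id FROM catalog_category_product"
            " WHERE product_id = %s",
            (product_id,),
        )
        return [row[0] for row in cur.fetchall()]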
Depending on your catalog size and/or category definitions, this additional query could add anywhere from 0.01 seconds (best-case scenario) to 0.5 seconds (in the case of complex/unoptimized table joins) per row to process.
If you have a catalog of 100 products, that means an extra 1-5 seconds to execute the Product Search API.
If you have a catalog of 1,000 products, that means an extra 10-50 seconds.
If you have a catalog of 10,000 products, that means an extra 100-500 seconds.
You get the picture! :)
Good Luck!

Is it possible to process objects in a Google Cloud Storage bucket in FIFO order?

In my web app, I need to pull objects from gcs one by one and process them.
So the question is,
"How do I send a request to gcs to get the next unprocessed object?"
What I’d like to do is to simply rely on the sort order provided by gcs and then just process the objects in this sorted list one by one.
That way, I only need to keep track of the last processed item in my app.
I’d like to rely on the sort order provided by the timeCreated timestamp on each individual object in the bucket.
When I query my bucket via the JSON API, I notice that the objects are returned sorted by timeCreated from oldest to newest.
For example, this query ...
returns this list ...
{
    "items": [
        {
            "name": "cars_train/00001.jpg",
            "timeCreated": "2016-03-23T19:19:47.506Z"
        },
        {
            "name": "cars_train/00002.jpg",
            "timeCreated": "2016-03-23T19:19:49.320Z"
        },
        {
            "name": "cars_train/00003.jpg",
            "timeCreated": "2016-03-23T19:19:50.228Z"
        },
        {
            "name": "cars_train/00004.jpg",
            "timeCreated": "2016-03-23T19:19:51.377Z"
        },
        {
            "name": "cars_train/00005.jpg",
            "timeCreated": "2016-03-23T19:19:51.778Z"
        },
        {
            "name": "cars_train/00006.jpg",
            "timeCreated": "2016-03-23T19:19:52.817Z"
        },
        {
            "name": "cars_train/00007.jpg",
            "timeCreated": "2016-03-23T19:19:53.868Z"
        },
        {
            "name": "cars_train/00008.jpg",
            "timeCreated": "2016-03-23T19:19:54.925Z"
        },
        {
            "name": "cars_train/00009.jpg",
            "timeCreated": "2016-03-23T19:19:58.426Z"
        },
        {
            "name": "cars_train/00010.jpg",
            "timeCreated": "2016-03-23T19:19:59.323Z"
        }
    ]
}
This sort order by timeCreated is exactly what I need, though I'm not certain I can rely on this always being true.
So, I could code my app to process this list by simply searching for the first timeCreated value greater than that of the last object processed.
The problem is this list can be very large and searching through a huge list every single time the user presses the NEXT button is too computationally expensive.
I would like to be able to specify in my query to gcs to filter the list so that I return only the single item that I need.
The API does allow me to set the maxResults returned to a value of 1.
However, I do not see an option that would allow me to return only objects whose timeCreated value is greater than the value I specified.
I think what I am trying to do is probably fairly common, so I’m guessing that a solution may exist for this problem.
One work around for this problem is to physically move an object that has been processed to another bucket.
That way the first item in the list would always be the oldest unprocessed one, and I could simply send the request with maxResults=1.
But this adds complexity because it forces me to have 2 separate buckets for every project instead of 1.
Is there a way to filter this list of objects to only include ones whose timeCreated date is above a specified value?
In MySQL, it might be something like ...
SELECT name
FROM bucket
WHERE timeCreated > X
ORDER BY timeCreated
LIMIT 1
You can configure object change notifications on the bucket, and get a notification each time a new object arrives. That would allow you to process new objects without scanning a long listing each time. It also avoids the problem that listing a bucket is only eventually consistent (so, recently uploaded objects may not show up immediately when you list objects; I don't know if that's a problem for your app).
Details about object change notification are documented at https://cloud.google.com/storage/docs/object-change-notification.
Object listing in GCS is not sorted by timeCreated. Object listing results are always in alphabetical order. In your example, the two orderings merely happen to coincide.
If you want to get a list of objects in the order they were uploaded, you must ensure that each object has a name alphabetically later than the name of any object uploaded before it. Even then, however, you must take care, as object listing is eventually consistent, which means that objects you upload may not immediately show up in a listing.
If some ordering of objects is critically important, it would be a good idea to maintain a separate index of the objects and their timestamps in a separate data structure, perhaps populated via object change notifications as Mike suggested.
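A minimal sketch (Python + google-cloud-storage) of the name-based approach, assuming objects are given zero-padded names in upload order (as in the cars_train example) so that the alphabetical listing is the upload order; start_offset maps to the JSON API's startOffset parameter, and the bucket name is an assumption:

from google.cloud import storage

client = storage.Client()

def next_unprocessed(bucket_name, last_processed_name):
    # start_offset is inclusive, so fetch two results and skip the
    # object we already handled.
    for blob in client.list_blobs(bucket_name, max_results=2,
                                  start_offset=last_processed_name):
        if blob.name != last_processed_name:
            return blob  # e.g. "cars_train/00042.jpg"
    return None  # nothing new yet (listing is eventually consistent)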

How to create an index in MongoDB which calls a JS function via system.js?

I have two collections viz. whitelist (id, count, expiry) and blacklist (id).
Now I would like to create an index such that when count >= 200, a JS function is called which will remove the document from whitelist and add the id to blacklist.
So can I do this in Mongo using db.collection.createIndex({"count":1}, ???);
or do I need to write a daemon to scan the entire collection? Or is there a better method for this?
You seem to be asking for what in a SQL relational database we would call a "trigger", which is something completely different from an "index" even in that world.
In the NoSQL world typically, and especially with MongoDB, that sort of "server logic" is relegated to the "client" code operations rather than the server. Think of it as another part of the "scalability" philosophy of these products, where certain functions like "triggers" are taken away due to the stance that they "cost" a lot with distributed data.
So in order to do what you want, you do it in "code" instead of defining a database "trigger". The process is simple enough, via .findAndModify() and the other wrapping variants available to language APIs:
// Increment the count while it is still below 200 and return the modified document
var doc = db.whitelist.findAndModify({
    "query": { "_id": myId, "count": { "$lt": 200 } },
    "update": { "$inc": { "count": 1 } },
    "new": true
});

// findAndModify returns null when no document matched the query
if ( doc != null && doc.count >= 200 ) {
    // Threshold reached: add the id to the blacklist and remove it from the whitelist
    db.blacklist.insert({ "_id": myId });
    db.whitelist.remove({ "_id": myId });
}
Be careful with the actual language API method variant, as the structure typically differs from the "query/update" keys as provided in the shell method.
The basic principles remain the same. Modify and fetch, then move the document to the other collection if your conditions are met. But it is "two" trips to the server, and there is no way to make the server "trigger" by itself when such a condition is met.
db.whitelist.insert(doc);

if (db.whitelist.find(criterion).count() >= 200) {
    var bulkRemove = db.whitelist.initializeUnorderedBulkOp();
    var bulkInsert = db.blacklist.initializeUnorderedBulkOp();
    db.whitelist.find(criterion).forEach(
        function(doc) {
            bulkInsert.insert({ _id: doc._id });
            bulkRemove.find({ _id: doc._id }).removeOne();
        }
    );
    bulkInsert.execute();
    bulkRemove.execute();
}
First, you insert the document as usual. Since criterion is going to use an index, the if clause should be evaluated quickly and efficiently.
In case we have 200 or more documents matching that criterion, we use bulk operations to insert the ids into the blacklist and remove the documents from the whitelist, which will be executed in parallel.
The problem with only writing the _id to the blacklist is that you need to check whether the criterion for being blacklisted is matched, so the _id needs to contain that criterion.
A better solution IMHO is to flag entries of a single collection using a field named blacklisted for individual entries, or to use the aggregation framework to find blacklisted documents and write them to a collection using the $out pipeline stage. Sadly, you didn't give example data or a proper description of your use case, so you get an unspecific answer.
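A hedged pymongo sketch of that $out approach; the collection names and the count >= 200 criterion follow the question, everything else is illustrative. Note that $out replaces the target collection wholesale on each run:

from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["app"]  # assumed DB name

db.whitelist.aggregate([
    {"$match": {"count": {"$gte": 200}}},  # the blacklisting criterion
    {"$project": {"_id": 1}},              # keep only the id
    {"$out": "blacklist"},                 # (re)write the blacklist collection
])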

Given a list of unique discount codes, how to send each one only once

I'm wondering the best way to solve this problem: given a pre-generated list of 500 unique discount codes for an e-commerce site, how do I ensure that each of the first 500 users to receive a discount code gets a unique one? The e-commerce site would be making an asynchronous request to a separate server with the list of discount codes stored in its database. It's this server's job to make sure that it sends back each discount code only once, in chronological order as requests are received.
As this seems like a rather primitive problem, I wonder if there is a clever and elegant way to do this with a relatively low level of effort.
A simple way is to have a collection of your codes and remove items as you select them. Here is a simple example with .findAndModify().
A basic collection example:
db.codes.insert([
{ "a": 1 },
{ "a": 2 },
{ "a": 3 }
])
Issue a .findAndModify():
db.codes.findAndModify({
"query": {},
"remove": true,
"new": false
})
Returns:
{ "_id" : ObjectId("550caf3f7d9c3dc0eab83334"), "a" : 1 }
And the new state of the collection is:
{ "a": 2 }
{ "a": 3 }
So as the document is retrieved it is removed from the collection, preventing further selection. Since .findAndModify() is an atomic operation, no other request can see the same document and every request will get its own unique response.
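The same pattern from a driver's point of view, as a hedged pymongo sketch; the connection string and the database/collection names are assumptions:

from pymongo import MongoClient

codes = MongoClient("mongodb://localhost:27017")["shop"]["codes"]

def claim_code():
    # find_one_and_delete is atomic on the server, so two concurrent
    # requests can never receive the same code document.
    return codes.find_one_and_delete({})  # returns None once all codes are gone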
If your DB has atomic transactions, this is no problem. Just make a table discount with 2 fields, code (a varchar wide enough to hold the code) and used (boolean), indexed by used and then by code. Initially INSERT 500 rows, each with used = false of course. Whenever a request comes in, just SELECT min(code) FROM discount WHERE NOT used FOR UPDATE, and then UPDATE discount SET used = true WHERE NOT used AND code = <that code>, all inside a single DB transaction. (The NOT used part of the UPDATE is not necessary for correctness, but may speed things up by enabling the index to be used.)
If contention is a problem (and I don't see how it could be for 500 requests, but maybe it somehow could be), then add an integer id field containing a unique integer between 1 and 500 to the table. Then on each request, pick a random number r between 1 and 500, and SELECT min(code) FROM discount WHERE NOT used AND (id >= <r> OR id + 500 >= <r>) FOR UPDATE. The condition in parentheses is meant to let the search "wrap around" to lower-numbered discounts if (and only if) all discounts >= r have already been taken.
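A hedged sketch of the transactional approach (PostgreSQL + psycopg2): the table and columns follow the answer, but the SELECT is written as ORDER BY ... LIMIT 1 rather than min(), since FOR UPDATE cannot lock through an aggregate; the DSN is an assumption:

import psycopg2

conn = psycopg2.connect("dbname=shop")

def claim_code():
    with conn:  # one transaction: commits on success, rolls back on error
        with conn.cursor() as cur:
            # Lock exactly one unused row...
            cur.execute(
                "SELECT code FROM discount WHERE NOT used"
                " ORDER BY code LIMIT 1 FOR UPDATE"
            )
            row = cur.fetchone()
            if row is None:
                return None  # all 500 codes have been handed out
            # ...then mark it used inside the same transaction.
            cur.execute(
                "UPDATE discount SET used = true WHERE code = %s",
                (row[0],),
            )
            return row[0]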

How to find ID for existing Fi-Ware sensors

I'm working with Fi-Ware and I would like to include existing information from smart cities in my project. Clicking on the link below, I could find information about the ID pattern and type of different devices (for example OUTSMART.NODE.).
https://forge.fi-ware.org/plugins/mediawiki/wiki/fiware/index.php/Publish/Subscribe_Broker_-_Orion_Context_Broker_-_User_and_Programmers_Guide#Sample_code
However, I don't know what comes after that pattern.
I've tried random numbers (OUTSMART.NODE.1 or OUTSMART.NODE.0001).
Is there some kind of list, or somewhere to find that information?
Thank you!
In order to know the particular entity IDs for a given type, you can use a "discovery" query on the type associated to the sensor with the .* global pattern. E.g., in order to get the IDs associated to type "santander:traffic" you could use:
{
    "entities": [
        {
            "type": "santander:traffic",
            "isPattern": "true",
            "id": ".*"
        }
    ],
    "attributes": [
        "TimeInstant"
    ]
}
Using "TimeInstant" in the "attributes" field is not strictly needed. You can leave "attribute" empty, in order to get all the attributes from each sensor. However, if you are insterested only in the IDs, "TimeInstant" would suffice and you will save length in the JSON response (the respone of the above query is around 17KB, while if you use an empty "attributes" field, the response will be around 48KB).
EDIT: since the update to Orion 0.14.0 in orion.lab.fi-ware.org on July 2nd, 2014, the NGSI API implements pagination. The default limit is 20 entities, so if you want to get all of them you will need to implement pagination in your client, using the limit and details URI parameters. Have a look at the pagination section in the user manual for details.
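A hedged sketch (Python + requests) of sending the discovery query above to an Orion Context Broker. The /v1/queryContext path and port 1026 follow NGSI v1 conventions, offset is assumed as the usual companion to the limit/details parameters, and FIWARE Lab normally also requires an auth token; check all of these against your deployment:

import requests

query = {
    "entities": [
        {"type": "santander:traffic", "isPattern": "true", "id": ".*"}
    ],
    "attributes": ["TimeInstant"],
}

resp = requests.post(
    "http://orion.lab.fi-ware.org:1026/v1/queryContext",
    params={"limit": 20, "offset": 0, "details": "on"},
    json=query,
    headers={"Accept": "application/json"},
)
print(resp.json())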