IBM Cloudant DB - get historical data - best way? - nosql

I'm pretty confused concerning this hip thing called NoSQL, especially CloudantDB by Bluemix. As you know, this DB doesn't store the values chronologically. It's the programmer's task to sort the entries in case he wants the data to.. well.. be sorted.
What I am trying to achieve is simply to get the last, let's say, 100 values a sensor has sent to Watson IoT (which saves everything in the connected CloudantDB) in an ORDERED way. In the end it would be nice to show them in a D3.js style graph, but that's another task. I first need the values in an ordered array.
What I tried so far: I used curl to get the data via PHP from https://averylongID-bluemix.cloudant.com/iotp_orgID_iotdb_2018-01-25/_all_docs?limit=20&include_docs=true
What I get is an unsorted array of 20 row entries with random timestamps. The last 20 entries in the DB. But not in terms of timestamps.
My question is now: Do you know of a way to get the "last" 20 entries? Sorted by timestamp? I did a POST request with a JSON string where I wanted the data to be sorted by the timestamp, but that doesn't work, maybe because of the ISO timestamp string.
Do I really have to write a javascript or PHP script to get ALL the database entries and then look for the 20 or 100 last entries by parsing the timestamp, sorting the array again and then get the (now really) last entries? I can't believe that.
Many thanks in advance!

I finally found out how to get the data in a nice ordered way. The key is to use the _design api together with the _view api.
So a curl request with the following URL / attributes and a query string did the job:
https://alphanumerical_something-bluemix.cloudant.com/iotp_orgID_iotdb_2018-01-25/_design/iotp/_view/by-date?limit=120&q=name:%27timestamp%27
The curl result gets me the first (in terms of time) 120 entries. I just have to find out how to get the last entries, but that's already a pretty good result. I can now pass the data on to a nice JS chart and display it.
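For the remaining step (newest entries instead of oldest), CouchDB-style views accept a descending=true parameter that reverses the key order. Below is a minimal sketch that only builds such a URL. It assumes the by-date view emits the timestamp as its key, and the host and database names are the placeholders from this post, not real values.

```python
from urllib.parse import urlencode

def latest_rows_url(base_url, database, design_doc, view, limit):
    """Build a view URL that asks for the newest `limit` rows first."""
    # descending=true reverses the key order of the view; limit caps the row count
    params = urlencode({"descending": "true", "limit": limit})
    return f"{base_url}/{database}/_design/{design_doc}/_view/{view}?{params}"

url = latest_rows_url(
    "https://alphanumerical_something-bluemix.cloudant.com",  # placeholder host
    "iotp_orgID_iotdb_2018-01-25",                            # placeholder database
    "iotp",
    "by-date",
    100,
)
print(url)
```

The rows would then arrive newest first, so the array probably needs to be reversed on the client before it is passed to a chart.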

One option may be to include the timestamp as part of the ID. The _all_docs query returns documents in order by id.
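A small sketch of why this can work: fixed-width ISO 8601 timestamps in UTC sort the same way as strings as they do as points in time. The ID layout (timestamp, underscore, device name) is a made-up convention for illustration, and Python's string sorting stands in for the database's ordering by ID.

```python
from datetime import datetime, timedelta, timezone

def make_doc_id(ts, device_id):
    """Prefix the ID with a fixed-width UTC ISO 8601 timestamp (hypothetical layout)."""
    return f"{ts.strftime('%Y-%m-%dT%H:%M:%S.%fZ')}_{device_id}"

start = datetime(2018, 1, 25, 12, 0, 0, tzinfo=timezone.utc)
# Readings in arrival order, deliberately not chronological
readings = [start + timedelta(seconds=s) for s in (30, 0, 90, 60)]
ids = [make_doc_id(ts, "sensor1") for ts in readings]

# Plain string sorting puts the IDs in chronological order
ordered = sorted(ids)
latest_two = ordered[-2:]
print(latest_two)
```

This only holds if every timestamp uses the same format and time zone; mixed offsets or variable-width fields would break the ordering.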
If that approach does not work for you, you could look at creating a secondary index based on the timestamp field. One type of index is Cloudant Query:
https://console.bluemix.net/docs/services/Cloudant/api/cloudant_query.html#query
Cloudant query allows you to specify a sort argument:
https://console.bluemix.net/docs/services/Cloudant/api/cloudant_query.html#sort-syntax
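As a rough sketch, the two JSON bodies involved might look like the following. The field name timestamp and the index name are assumptions about how the documents are stored; the bodies would be POSTed to the database's _index and _find endpoints respectively.

```python
import json

# Index definition for POST /<database>/_index
# ("timestamp" as the field name is an assumption about the stored documents)
index_definition = {
    "index": {"fields": ["timestamp"]},
    "name": "timestamp-index",  # hypothetical index name
    "type": "json",
}

# Query for POST /<database>/_find: newest 100 documents that have a timestamp
find_query = {
    "selector": {"timestamp": {"$gt": None}},  # serialized as null: any non-null value
    "sort": [{"timestamp": "desc"}],
    "limit": 100,
}

print(json.dumps(index_definition))
print(json.dumps(find_query))
```

A sort on a field generally requires an index on that field, which may be why a sorted POST without an index does not work.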
Another approach that may be useful for you is the _changes api:
https://console.bluemix.net/docs/services/Cloudant/api/database.html#get-changes
The changes API allows you to receive a continuous feed of changes in your database. You could feed these changes into a D3 chart for example.

Firestore pagination by offset

I would like to create two queries with pagination. The first should return the first ten records, and the second should return all the remaining records:
.startAt(0)
.limit(10)
.startAt(9)
.limit(null)
Can anyone confirm that the code above is correct for both cases?
Firestore does not support index or offset based pagination. Your query will not work with these values.
Please read the documentation on pagination carefully. Pagination requires that you provide a document reference (or field values in that document) that defines the next page to query. This means that your pagination will typically start at the beginning of the query results, then progress through them using the last document you see in the prior page.
From CollectionReference:
offset(offset) → {Query}
Specifies the offset of the returned results.
As Doug mentioned, Firestore does not support Index/offset - BUT you can get similar effects using combinations of what it does support.
Firestore has its own internal sort order (usually the document.id), but any query can be sorted with .orderBy(), and the first document will be relative to that sorting - only an orderBy() query has a real concept of a "0" position.
Firestore also allows you to limit the number of documents returned with .limit(n)
.endAt(), .endBefore(), .startAt(), .startAfter() all need either an object of the same fields as the orderBy, or a DocumentSnapshot - NOT an index
What I would do is create a Query:
const MyOrderedQuery = FirebaseInstance.collection().orderBy()
Then first execute
MyOrderedQuery.limit(n).get()
or
MyOrderedQuery.limit(n).onSnapshot()
which will return one way or the other a QuerySnapshot, which will contain an array of the DocumentSnapshots. Let's save that array
let ArrayOfDocumentSnapshots = QuerySnapshot.docs;
Warning, Will Robinson! JavaScript assigns objects and arrays by reference,
and even the spread operator only makes a shallow copy - make sure your code actually
copies the full deep structure or that the reference is kept around!
Then to get the "rest" of the documents as you ask above, I would do:
MyOrderedQuery.startAfter(ArrayOfDocumentSnapshots[n-1]).get()
or
MyOrderedQuery.startAfter(ArrayOfDocumentSnapshots[n-1]).onSnapshot()
which will start AFTER the last returned document snapshot of the FIRST query. Note the re-use of the MyOrderedQuery
You can get something like a "pagination" by saving the ordered Query as above, then repeatedly use the returned Snapshot and the original query
MyOrderedQuery.startAfter(ArrayOfDocumentSnapshots[n-1]).limit(n).get() // page forward
MyOrderedQuery.endBefore(ArrayOfDocumentSnapshots[0]).limitToLast(n).get() // page back
This does make your state management more complex - you have to hold onto the ordered Query, and the last returned QuerySnapshot - but hey, now you're paginating.
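To make the cursor idea concrete, here is a small in-memory model of it. This is not the Firestore API: plain Python lists stand in for an ordered query, and the function names are invented for the illustration. It only shows that "the rest" is defined by the last document of the previous page, not by a numeric offset.

```python
def first_page(ordered_docs, n):
    """Stand-in for orderedQuery.limit(n): the first n documents."""
    return ordered_docs[:n]

def start_after(ordered_docs, cursor_doc, n=None):
    """Stand-in for orderedQuery.startAfter(doc): everything after cursor_doc,
    optionally limited to n documents."""
    position = ordered_docs.index(cursor_doc) + 1
    rest = ordered_docs[position:]
    return rest if n is None else rest[:n]

# 25 fake documents, ordered by a "created" field
docs = sorted(
    [{"id": f"doc{i}", "created": i} for i in range(25)],
    key=lambda d: d["created"],
)

page_one = first_page(docs, 10)                  # first ten records
remaining = start_after(docs, page_one[-1])      # all the other records
page_two = start_after(docs, page_one[-1], 10)   # or just the next page
print(len(page_one), len(remaining), page_two[0]["id"])
```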
BIG NOTE
This is not terribly efficient - setting up a listener is fairly "expensive" for Firestore, so you don't want to do it often. Depending on your document size(s), you may want to "listen" to larger sections of your collections, and handle more of the paging locally (Redux or whatever) - Firestore Documentation indicates you want your listeners around at least 30 seconds for efficiency. For some applications, even pages of 10 can be efficient; for others you may need 500 or more stored locally and paged in smaller chunks.

Structuring MongoDb Collections for a tracking system

I need to build a system that will track our users in all of our websites.
Each new user coming to our website will be getting an ID that will be stored in a cookie.
On every activity in the site, we would like to save the relevant data.
For example, when a user registers, we will expose an API for adding the activity to the database. Later, we will build a reporting back end on the data.
We haven't yet decided on the technology, but we assume we will go for nodejs + express + mongoose.
We believe that the first collection (see below) will have around 6 million rows a month. The other collections might have half of that.
I don't know if the following data structure will work well in mongodb.
SessionCollection
Id mongo ObjectId - generated, will be the cookie Id eventually.
Referer - string (length of full query string uri)
LandingUrl - string (length of full query string uri)
DateTime
Params - KeyValue data; it's the parsed data from LandingUrl, supposed to be a nested json tree.
If the LandingUrl was http://s.com?a=1&b=2&c=3, the params will be:
params : {a:'1',b:'2',c:'3'}
ActivityCollection
Id mongo ObjectId
SessionId - "foreign key" to SessionCollection
ActivityType - Short free string
DateTime
ActivityData - free KeyValue data (similar to the explanation above).
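The Params field described above can be derived from LandingUrl with a standard query-string parser. A minimal sketch (the function name is just for illustration):

```python
from urllib.parse import urlsplit, parse_qsl

def parse_params(landing_url):
    """Return the query string of landing_url as a flat key/value dict."""
    query = urlsplit(landing_url).query
    # parse_qsl yields (key, value) pairs; dict() keeps the last value per key
    return dict(parse_qsl(query))

params = parse_params("http://s.com?a=1&b=2&c=3")
print(params)
```

If a key can appear more than once in the URL, a dict of lists (as returned by parse_qs) would be a safer shape for the stored data.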
Both collections will be searchable on all fields; when I say all, I mean all.
1. Is this a good structure for mongo?
2. Do you recognize a bad pattern here?
3. Do you have suggestions to make it better?
4. Can a full url be indexed in mongodb?
thanks
I will answer #4, since it is an interesting question without an obvious answer.
Can a full url be indexed in mongodb?
The answer is: most of the time, but not always.
Explanation: A URL cannot always get indexed in MongoDB since MongoDB has a limit on the length of the index (1024 bytes). If the length is more than that then it will not get indexed or may get an error (depends on the version and the case). A full URL may exceed this limit (since at least 2000 chars are supported in almost all browsers). If you have the possibility of such long URLs a solution would be to use a hash approach for the index.
For more info on MongoDB limits and how it handles indexing for >1024 bytes (behavior has significantly changed from 2.6 and onward) see https://docs.mongodb.org/manual/reference/limits/
For URL lengths see What is the maximum length of a URL in different browsers?
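One way the hash approach could look: store the full URL as-is and index a fixed-length digest of it instead. The field name LandingUrlHash and the choice of SHA-256 are assumptions for this sketch, not something prescribed by MongoDB.

```python
import hashlib

def url_index_key(url):
    """Return a fixed-length SHA-256 hex digest to index instead of the raw URL."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()

# A URL whose query string pushes it past the 1024-byte index key limit
long_url = "http://s.com/?" + "&".join(f"p{i}=value{i}" for i in range(300))

session_doc = {
    "LandingUrl": long_url,                     # stored in full, not indexed
    "LandingUrlHash": url_index_key(long_url),  # hypothetical indexed field
}
print(len(long_url), len(session_doc["LandingUrlHash"]))
```

The trade-off is that a hashed field only supports exact-match lookups: to find a session by URL you hash the search value first, and prefix or range searches on the URL are not possible through this index.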

Can response data from core reporting api be grouped?

Explanation:
I am able to query the Google Core Reporting API v3 using the client library to get data on pageviews for specific URLs of a website I am working on. I want to get data (pageviews) for each day within a specified range. So far I am simply looping through the range, sending an individual request to the API for each day. In each request I set the same value for the start date and the end date.
Problem:
Obviously this gets the job done, BUT it is certainly not the best way to go about it. Assuming I want to get data for the past 3 months for each of about 2000 URIs, I will need roughly 180,000 requests (about 90 days × 2,000 URIs), and that is well over the quota limit defined by Google.
Potential solution: So one way I thought of solving this issue is probably to send a request setting start-date and end-date to be a week apart but the API will return a sum of the values rather than the individual values.
Main question: Is there a way to insist that these values are not added up and returned as a sum, but rather returned separately for each day (as an associative array or something like that)?
I hope the question is clear and that there is a solution! Thank you!
Very straightforward:
Metric: ga:pageviews, Dimension: ga:date. Set a filter for your page path, and set a start-date and end-date.
Example:
https://www.googleapis.com/analytics/v3/data/ga?ids=ga%3Axxyyzz&dimensions=ga%3Adate&metrics=ga%3Apageviews&filters=ga%3Apagepath%3D%3D%2Ffaq.html&start-date=2013-06-27&end-date=2013-07-11&max-results=50
This will return the pageviews for the faq.html page for each day in the time-frame.
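If you prefer to assemble that request in code rather than by hand, a sketch along these lines produces the same URL. The profile ID ga:xxyyzz is the placeholder from the example, and authentication is left out entirely.

```python
from urllib.parse import urlencode

BASE = "https://www.googleapis.com/analytics/v3/data/ga"

def daily_pageviews_url(profile_id, page_path, start_date, end_date, max_results=50):
    """Build a query that returns one row per day instead of a single sum."""
    params = [
        ("ids", profile_id),
        ("dimensions", "ga:date"),      # the dimension is what splits the result by day
        ("metrics", "ga:pageviews"),
        ("filters", f"ga:pagepath=={page_path}"),
        ("start-date", start_date),
        ("end-date", end_date),
        ("max-results", max_results),
    ]
    return f"{BASE}?{urlencode(params)}"

url = daily_pageviews_url("ga:xxyyzz", "/faq.html", "2013-06-27", "2013-07-11")
print(url)
```

One request per URI for the whole date range replaces one request per URI per day.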
You should check out the QueryExplorer. Great tool to find out how to structure queries.

RESTful URI for Filtering Nested Collections

I'm looking for a good way to form a URI for a resource that filters on a collection of values contained within records. For example, say you have a recipe database and you want to search for recipes that contain "cherry" (obviously most recipes would contain multiple ingredients).
If I just want to filter on single values, I could do something similar to the following:
/recipe/search/?name=Spaghetti
But what about filtering on multiple values? I was considering something like the following:
/recipe/search/?ingredients=contains=cherry
Any thoughts on this? Is there a "standard" for a filter of this kind?
Update: One problem I have with my idea is the way it gets handled on the backend (in my case Rails). When querying the server with this particular format, Rails generates a Ruby hash that could get ugly like the following:
{"ingredients"=>"contains=cherry",
"action"=>"search",
"controller"=>"recipe"}
Your URI
First of all, in contrast to other answers I'll start from a REST perspective and then find appropriate additions to it. I am not strong in Ruby, so bear with the lack of backend code.
Recipes are the entities you want to present
Your users find them at /recipies
HTTP has query strings for filtering
Want those recipes sorted by date? Use /recipies/?sortby=date&sort=asc
Only recipes with cherries works similarly: /recipies/?ingredients=cherry
So that's the REST way of structuring your basic URL.
For multiple matches there is no official way to do it, but I'd follow user1758003. This is an intuitive construction of the URL and easy to parse on the backend, so we have /recipies/?ingredients=cherry,chocolate
Don't forget /recipies/search is not RESTful because it mirrors recipies and does not represent an independent entity. However, it is a great place to host a search form for visitors to your site.
Now there are some additional questions packed into the first, let me address them one by one:
I have a website consuming this API; what should the search form look like?
Give your visitors a /recipies/search page or a quick filtering possibility on /recipies.
Just set the <form action="/recipies" method="GET">. The browser will add all parameters as a query string after /recipies.
Advanced functionality
A request to /recipies should list all. If you add a query, every parameter of the query must be respected, so /recipies/?ingredients=cheese MUST only return cheesy recipes.
For multiple query parameter values there is no fixed standard, but I'd like a service to be intuitive.
I read GET /recipies/?ingredients=cherry,chocolate as Get me the recipies which have ingredients of cherry and chocolate. To get a list of recipies containing either cherry or chocolate I'd want to write the URI like /recipies/?ingredients=cherry|chocolate which makes it visually very different from a comma and has a predefined meaning (OR) in programming contexts.
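The comma-means-AND, pipe-means-OR reading could be implemented roughly as follows. This is a sketch of the convention suggested here, not a standard; the recipe data and function names are invented, and mixing both separators in one value is not handled.

```python
def parse_ingredients(value):
    """Return (mode, terms): 'any' for a|b, 'all' for a,b or a single term."""
    if "|" in value:
        return "any", value.split("|")
    return "all", value.split(",")

def matches(recipe_ingredients, value):
    """Check one recipe's ingredient set against the raw parameter value."""
    mode, terms = parse_ingredients(value)
    check = any if mode == "any" else all
    return check(term in recipe_ingredients for term in terms)

recipes = {
    "black forest cake": {"cherry", "chocolate", "flour"},
    "cherry pie": {"cherry", "flour"},
    "brownie": {"chocolate", "flour"},
}

both = [name for name, ing in recipes.items() if matches(ing, "cherry,chocolate")]
either = [name for name, ing in recipes.items() if matches(ing, "cherry|chocolate")]
print(both, either)
```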
I'm not familiar with the specifics of Ruby hashes, but I'm guessing the hash is created to uniquely identify the query at both the HTTP and data access levels?
Regardless, you want to be careful about URL encoding if you wish to use json in a query parameter. Another thing to keep in mind is the term "search" could be considered repetitive. If your server is being accessed using a GET method and you have criteria then hopefully you're not modifying any state in the back end. Not your question but just thought I'd mention it.
So...
https://yourserver.com/approot/recipe/search?ingredients=cherry,cheese&type=cake
HTTP doesn't define commas as a query string separator so your framework should be able to parse 2 query string parameters:
ingredients: "cherry,cheese"
which you should be able to easily convert to an array using split or whichever equivalent function Ruby provides.
and
type: "cake" (extra query term added to illustrate a point and because cherry cheese cake is awesome and cake is not an ingredient)
If I understand your example correctly you would end up with:
{
"ingredients"=>"cherry,cheese",
"type"=>"cake",
"action"=>"search",
"controller"=>"recipe"
}
Is this what you were looking for?
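For illustration, here is the same parsing step outside of Rails, using only a standard query-string parser and a split on the comma:

```python
from urllib.parse import urlsplit, parse_qs

url = "https://yourserver.com/approot/recipe/search?ingredients=cherry,cheese&type=cake"

# parse_qs treats & as the separator, so the comma stays inside the value
query = parse_qs(urlsplit(url).query)

ingredients = query["ingredients"][0].split(",")
recipe_type = query["type"][0]
print(ingredients, recipe_type)
```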
Most REST web services exchange JSON. A query parameter can only carry a single string value, but that string can itself be JSON, which lets you send an array or a nested structure as well.
To send an array, convert it to JSON like this
from php:
$ingredients = array('contains'=>array('fruits'=>'cherry,apple','vitamins'=>'a,d,e,k'));
$ingredients_json = json_encode($ingredients);
echo $ingredients_json;
it will return json format like this:
{"contains":{"fruits":"cherry,apple","vitamins":"a,d,e,k"}}
and you can use this JSON string in the url
/recipe/search/?ingredients={"contains":{"fruits":"cherry,apple","vitamins":"a,d,e,k"}}
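Braces and quotes are not safe to put into a URL literally, so in practice the JSON value would need to be percent-encoded on the way out and decoded again on the way in. A minimal sketch of that round trip:

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs

ingredients = {"contains": {"fruits": "cherry,apple", "vitamins": "a,d,e,k"}}

# Serialize compactly, then let urlencode percent-encode the JSON string
query_string = urlencode({"ingredients": json.dumps(ingredients, separators=(",", ":"))})
url = "/recipe/search/?" + query_string
print(url)

# Receiving side: parse the query string, then decode the JSON value
raw_value = parse_qs(urlsplit(url).query)["ingredients"][0]
decoded = json.loads(raw_value)
```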
in the server side we have the option to decode this JSON format value to the array.
{"ingredients"=>"{\"contains\":{\"fruits\":\"cherry,apple\",\"vitamins\":\"a,d,e,k\"}}",
"action"=>"search",
"controller"=>"recipe"}

REST parameters vs URI

I'm just learning REST and trying to figure out how to apply it in practice. I have a sampling of data that I want to query, but I'm not sure how the URLs are meant to be formed, i.e. where I put the query. For example, for querying the most recent 100 data records:
GET http://data.com/data/latest/100
GET http://data.com/data?amount=100
Which of the previous two queries is better, and why? And the same for the following:
GET http://data.com/data/latest-days/2
GET http://data.com/data?days=2
GET http://data.com/data?fromDate=01-01-2000
Thanks in advance.
Personally, I would use the query string format in this case. If your /data path is returning all of the data, and you would like to perform this type of query, I believe it makes the most sense. You could also pass query string parameters such as ?since=01-01-2000 to get entries after a specified date or pass column names such as ?category=clothing to retrieve all entries with category equaling clothing.
Additionally, you would want paths such as /data/{id} to be available to retrieve certain entries given their unique id.
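As a sketch of how the query string version behaves on the server, the handler below treats every parameter as an optional filter on the full /data collection. The record layout is invented, and the records are assumed to be ordered oldest to newest.

```python
from urllib.parse import parse_qs

def select_records(records, query_string):
    """Apply optional `category` and `amount` filters from a query string."""
    params = parse_qs(query_string)
    result = records
    if "category" in params:
        result = [r for r in result if r["category"] == params["category"][0]]
    if "amount" in params:
        n = int(params["amount"][0])
        # keep the n most recent records (records are ordered oldest to newest)
        result = result[max(len(result) - n, 0):]
    return result

records = [
    {"id": 1, "category": "clothing"},
    {"id": 2, "category": "food"},
    {"id": 3, "category": "clothing"},
    {"id": 4, "category": "food"},
    {"id": 5, "category": "clothing"},
]

everything = select_records(records, "")                       # GET /data
latest_two = select_records(records, "amount=2")               # GET /data?amount=2
latest_clothing = select_records(records, "category=clothing&amount=1")
print([r["id"] for r in latest_two])
```

With no parameters the full collection comes back, and each parameter only narrows it, which is the main argument for the query string form here.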
It really depends on a lot of things. If you're using any sort of MVC framework, you'd use the URI segments to define your get request to your API which I personally prefer.
It's not a big deal either way, it's all based on preference and how predictable you want the URL to be to your user. In some cases, I'd say go with the REST parameters, but more often than not a URI based GET is quite clean if your setup supports it.