How can I use Buckets with AEM QueryBuilder - aem

I've implemented a custom PredicateEvaluator to create facets with buckets, based on examples like this one
It seems to be working. I can see the count in my buckets changing depending on the criteria.
But what comes next? The query returns a SearchResult, which has a Hit collection and a Facet collection (which in turn has a Bucket collection), but I don't see a way to combine these sets - I can't find a Hit.getBuckets() or a Buckets.getHits() method. I assumed that I'd be able to use buckets to categorize my search results, but I don't see any way to associate a Hit with a Bucket. How do I figure out which hits fall in which buckets, or is this just not their purpose?

A Bucket can be used to refine a Query.
SearchResult unrefinedResult = query.getResults();
// get a bucket from the unrefined result - in real code you'd iterate.
Bucket bucket = unrefinedResult.getFacets().get("my_facet").getBuckets().get(0);
SearchResult refinedResult = query.refine(bucket).getResults();
This approach requires you to execute another search, which is possibly more overhead than you want to incur.

Firestore pagination by offset

I would like to create two queries for pagination. The first should get the first ten records, and the second should get all the remaining records:
.startAt(0)
.limit(10)
.startAt(9)
.limit(null)
Can anyone confirm that the above code is correct for both cases?
Firestore does not support index or offset based pagination. Your query will not work with these values.
Please read the documentation on pagination carefully. Pagination requires that you provide a document reference (or field values in that document) that defines the next page to query. This means that your pagination will typically start at the beginning of the query results, then progress through them using the last document you see in the prior page.
From the CollectionReference documentation (Node.js server SDK):
offset(offset) → {Query}
Specifies the offset of the returned results.
As Doug mentioned, Firestore does not support Index/offset - BUT you can get similar effects using combinations of what it does support.
Firestore has its own internal sort order (usually the document ID), but any query can be sorted with .orderBy(), and the first document will be relative to that sorting. Only an orderBy() query has a real concept of a "0" position.
Firestore also allows you to limit the number of documents returned with .limit(n).
.startAt(), .startAfter(), .endAt() and .endBefore() all need either values for the same fields as the orderBy, or a DocumentSnapshot, NOT an index.
What I would do is create a Query:
const MyOrderedQuery = FirebaseInstance.collection().orderBy()
Then first execute
MyOrderedQuery.limit(n).get()
or
MyOrderedQuery.limit(n).onSnapshot()
which will return one way or the other a QuerySnapshot, which will contain an array of the DocumentSnapshots. Let's save that array
let ArrayOfDocumentSnapshots = QuerySnapshot.docs;
Warning, Will Robinson! JavaScript assignment is usually by reference, and even the spread operator only makes a shallow copy. Make sure your code actually copies the full deep structure, or that the reference is kept around!
Then to get the "rest" of the documents as you ask above, I would do:
MyOrderedQuery.startAfter(ArrayOfDocumentSnapshots[n-1]).get()
or
MyOrderedQuery.startAfter(ArrayOfDocumentSnapshots[n-1]).onSnapshot()
which will start AFTER the last returned document snapshot of the FIRST query. Note the re-use of the MyOrderedQuery
You can get something like a "pagination" by saving the ordered Query as above, then repeatedly use the returned Snapshot and the original query
MyOrderedQuery.startAfter(ArrayOfDocumentSnapshots[n-1]).limit(n).get() // page forward
MyOrderedQuery.endBefore(ArrayOfDocumentSnapshots[0]).limitToLast(n).get() // page back: limitToLast keeps the n documents nearest the cursor
This does make your state management more complex - you have to hold onto the ordered Query, and the last returned QuerySnapshot - but hey, now you're paginating.
BIG NOTE
This is not terribly efficient. Setting up a listener is fairly "expensive" for Firestore, so you don't want to do it often. Depending on your document size(s), you may want to "listen" to larger sections of your collections and handle more of the paging locally (Redux or whatever). The Firestore documentation indicates you want your listeners around for at least 30 seconds for efficiency. For some applications, even pages of 10 can be efficient; for others you may need 500 or more stored locally and paged in smaller chunks.
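The "first ten, then everything else" split from the question can be sketched as one small helper. This is a minimal sketch under assumptions, not a definitive implementation: it assumes the chained (namespaced) query API used in this answer, that `orderedQuery` already has an orderBy() applied, and the helper name `fetchFirstPageAndRest` is hypothetical.

```javascript
// Hypothetical helper: fetch the first `pageSize` documents, then everything after them.
// Assumes `orderedQuery` already has an orderBy() applied, for example
// firestore.collection("items").orderBy("createdAt").
async function fetchFirstPageAndRest(orderedQuery, pageSize) {
  // First query: only the first page.
  const firstSnapshot = await orderedQuery.limit(pageSize).get();
  const firstDocs = firstSnapshot.docs;

  // An empty or short first page means there is nothing left to fetch.
  if (firstDocs.length < pageSize) {
    return { firstDocs, restDocs: [] };
  }

  // Second query: the last DocumentSnapshot of page one is the cursor.
  const lastDoc = firstDocs[firstDocs.length - 1];
  const restSnapshot = await orderedQuery.startAfter(lastDoc).get();
  return { firstDocs, restDocs: restSnapshot.docs };
}
```

Because the cursor is a DocumentSnapshot rather than a position, the second query stays correct even if documents are inserted before the cursor between the two calls.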

IBM Cloudant DB - get historical data - best way?

I'm pretty confused concerning this hip thing called NoSQL, especially CloudantDB by Bluemix. As you know, this DB doesn't store the values chronologically. It's the programmer's task to sort the entries in case he wants the data to.. well.. be sorted.
What I'm trying to achieve is simply to get the last, let's say, 100 values a sensor has sent to Watson IoT (which saves everything in the connected CloudantDB) in an ORDERED way. In the end it would be nice to show them in a D3.js-style graph, but that's another task. I first need the values in an ordered array.
What I tried so far: I used curl to get the data via PHP from https://averylongID-bluemix.cloudant.com/iotp_orgID_iotdb_2018-01-25/_all_docs?limit=20&include_docs=true
What I get is an unsorted array of 20 row entries with random timestamps. The last 20 entries in the DB. But not in terms of timestamps.
My question is now: Do you know of a way to get the "last" 20 entries? Sorted by timestamp? I did a POST request with a JSON string where I wanted the data to be sorted by the timestamp, but that doesn't work, maybe because of the ISO timestamp string.
Do I really have to write a javascript or PHP script to get ALL the database entries and then look for the 20 or 100 last entries by parsing the timestamp, sorting the array again and then get the (now really) last entries? I can't believe that.
Many thanks in advance!
I finally found out how to get the data in a nice ordered way. The key is to use the _design api together with the _view api.
So a curl request with the following URL / attributes and a query string did the job:
https://alphanumerical_something-bluemix.cloudant.com/iotp_orgID_iotdb_2018-01-25/_design/iotp/_view/by-date?limit=120&q=name:%27timestamp%27
The curl result gets me the first (in terms of time) 120 entries. I just have to find out how to get the last entries, but that's already a pretty good result. I can now pass the data on to a nice JS chart and display it.
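For the still-open part (the last entries rather than the first), CouchDB-style views accept a `descending=true` query parameter that reverses the key order, so combining it with `limit` should return the newest rows first. Here is a minimal sketch that only builds the URL. It assumes the `by-date` view emits the timestamp as its key; the helper name and the `example.cloudant.com` host are placeholders.

```javascript
// Hypothetical helper: build a view URL that returns the newest `count` rows.
// Assumes the view's key is the timestamp, so descending key order = newest first.
function latestEntriesUrl(baseUrl, database, count) {
  const url = new URL(`${baseUrl}/${database}/_design/iotp/_view/by-date`);
  url.searchParams.set("descending", "true");
  url.searchParams.set("limit", String(count));
  return url.toString();
}

console.log(latestEntriesUrl("https://example.cloudant.com", "iotp_orgID_iotdb_2018-01-25", 20));
```

The rows come back newest first, so a chart that expects chronological order would need the array reversed on the client.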
One option may be to include the timestamp as part of the ID. The _all_docs query returns documents in order by id.
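Here is a minimal sketch of that idea, assuming you control the document IDs and each reading has a device identifier and a JavaScript `Date`. The ID layout (`timestamp:deviceId`) and the function name are illustrative only. It relies on ISO 8601 UTC strings from `toISOString()` having a fixed width, so plain string order matches chronological order.

```javascript
// Hypothetical ID scheme: ISO timestamp first, so sorting by _id sorts by time.
function makeDocId(deviceId, date) {
  return `${date.toISOString()}:${deviceId}`;
}

const ids = [
  makeDocId("sensor-1", new Date("2018-01-25T10:00:00Z")),
  makeDocId("sensor-1", new Date("2018-01-24T23:59:59Z")),
  makeDocId("sensor-1", new Date("2018-01-25T09:30:00Z")),
];

// A plain string sort is enough because the timestamp format is fixed-width.
const sortedIds = [...ids].sort();
console.log(sortedIds);
```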
If that approach does not work for you, you could look at creating a secondary index based on the timestamp field. One type of index is Cloudant Query:
https://console.bluemix.net/docs/services/Cloudant/api/cloudant_query.html#query
Cloudant query allows you to specify a sort argument:
https://console.bluemix.net/docs/services/Cloudant/api/cloudant_query.html#sort-syntax
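Here is a sketch of what such a request body could look like for the "last 20 readings" case. It assumes the documents carry a top-level `timestamp` field (a hypothetical name; the real Watson IoT documents may nest it differently) and that a JSON index on that field already exists, since Cloudant Query needs an index to sort on. The body would be POSTed to the database's `_find` endpoint.

```javascript
// Build a Cloudant Query (_find) body: newest `count` documents by timestamp.
// `timestamp` is an assumed field name; adjust it to the real document layout.
function latestReadingsQuery(count) {
  return {
    selector: { timestamp: { $gt: null } }, // match every doc that has a timestamp
    sort: [{ timestamp: "desc" }],
    limit: count,
  };
}

console.log(JSON.stringify(latestReadingsQuery(20)));
```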
Another approach that may be useful for you is the _changes api:
https://console.bluemix.net/docs/services/Cloudant/api/database.html#get-changes
The changes API allows you to receive a continuous feed of changes in your database. You could feed these changes into a D3 chart for example.

REST parameters vs URI

I'm just learning REST and trying to figure out how to apply it in practice. I have a sampling of data that I want to query, but I'm not sure how the URLs are meant to be formed, i.e. where I put the query. For example, for querying the most recent 100 data records:
GET http://data.com/data/latest/100
GET http://data.com/data?amount=100
Which of the two queries above is better, and why? And the same for the following:
GET http://data.com/data/latest-days/2
GET http://data.com/data?days=2
GET http://data.com/data?fromDate=01-01-2000
Thanks in advance.
Personally, I would use the query string format in this case. If your /data path is returning all of the data, and you would like to perform this type of query, I believe it makes the most sense. You could also pass query string parameters such as ?since=01-01-2000 to get entries after a specified date or pass column names such as ?category=clothing to retrieve all entries with category equaling clothing.
Additionally, you would want paths such as /data/{id} to be available to retrieve certain entries given their unique id.
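As a sketch of the query-string style, the standard `URL` and `URLSearchParams` APIs can assemble such requests. The parameter names `amount`, `since` and `category` simply follow the examples in this thread and are not a fixed convention; `buildDataUrl` is a hypothetical helper.

```javascript
// Build a URL for the /data collection with optional query-string filters.
function buildDataUrl(base, filters = {}) {
  const url = new URL("/data", base);
  for (const [name, value] of Object.entries(filters)) {
    url.searchParams.set(name, String(value));
  }
  return url.toString();
}

console.log(buildDataUrl("http://data.com", { amount: 100 }));
console.log(buildDataUrl("http://data.com", { since: "01-01-2000", category: "clothing" }));
```

Filters stay optional and combinable this way, while the path keeps identifying the resource itself.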
It really depends on a lot of things. If you're using any sort of MVC framework, you'd use the URI segments to define your get request to your API which I personally prefer.
It's not a big deal either way, it's all based on preference and how predictable you want the URL to be to your user. In some cases, I'd say go with the REST parameters, but more often than not a URI based GET is quite clean if your setup supports it.

MapReduce on child objects not embedded

I am having a problem creating a MapReduce algorithm that will get me the stats I need. I have a user object that can create a post, and a post can have many likes by other users.
User
--Post
----Likes
The Post is not embedded in the user because we access posts separately and not just in a user context. The stat I need is the number of likes an author has gotten, and I need to get this through the likes on the posts of a user. The problem is that because the posts are not embedded, I cannot access them in my map function. Here are the map and reduce functions I currently have:
def reputation_map
  <<-MAP
    function() {
      var posts = db.posts.find({user_id:this._id});
      emit(this._id, {posts:posts});
    }
  MAP
end
def reputation_reduce
  <<-REDUCE
    function(key, values) {
      var count = 0;
      while(values.hasNext()){
        values.next();
        count+=1;
      }
      return {posts:count};
    }
  REDUCE
end
This should only return the posts for each user, so I have not even gotten to the likes level yet. But instead of a count, it only returns a DBQuery for posts. What is the correct way of doing this?
Map Reduce is really designed to operate on a single collection at a time.
Technically, it is possible to query a separate collection from inside a Map function as you have done, but be cautious: this is neither recommended nor supported. You may run into issues, especially if the collection is sharded.
A similar question was asked a while back: How to call to mongodb inside my map/reduce functions? Is it a good practice?
If you are aggregating results from multiple collections, you may find that the safest and most straight-forward way to do it is in the application.
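Here is a minimal sketch of the application-side approach, written as plain JavaScript over documents that have already been fetched. It assumes each post has a `user_id` and a `likes` array; the `likes` field is a guess at the schema, which the question does not show.

```javascript
// Count likes per author from an array of post documents.
// Assumed shape: { user_id: <author id>, likes: [<liker id>, ...] }.
function likesPerAuthor(posts) {
  const totals = new Map();
  for (const post of posts) {
    const likeCount = Array.isArray(post.likes) ? post.likes.length : 0;
    totals.set(post.user_id, (totals.get(post.user_id) || 0) + likeCount);
  }
  return totals;
}

const samplePosts = [
  { user_id: "alice", likes: ["bob", "carol"] },
  { user_id: "bob", likes: [] },
  { user_id: "alice", likes: ["dave"] },
];
console.log(likesPerAuthor(samplePosts)); // alice has 3 likes, bob has 0
```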
Alternatively, if likes per author is a value that will be searched for with some frequency, it may be preferable to include it as a value in each document, and spend a little more overhead on each update to increment this value, rather than periodically performing a potentially resource-heavy calculation of all the votes per author.
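Here is a sketch of that counter approach, using MongoDB's `$inc` update operator. The `likes_received` field on the user document is hypothetical, and the helper only builds the filter and update documents; they would then be passed to whatever update call your driver provides.

```javascript
// Build the filter and update documents for bumping an author's like counter.
// `likes_received` is a hypothetical field on the user document.
function likeCounterUpdate(authorId, delta = 1) {
  return {
    filter: { _id: authorId },
    update: { $inc: { likes_received: delta } },
  };
}

console.log(JSON.stringify(likeCounterUpdate("author-42")));
```

Passing a negative `delta` would cover the case where a like is removed.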
Hopefully this gives you some food for thought on retrieving the values you need.
If you would like some assistance writing a Map Reduce operation for a single collection, the Community is here to help. Please include a sample input document, and a description of the desired output.
For more information on Map Reduce, the documentation may be found here:
http://www.mongodb.org/display/DOCS/MapReduce
Additionally, there are some good Map Reduce examples in the MongoDB Cookbook:
http://cookbook.mongodb.org/
The "Extras" section of the cookbook article "Finding Max And Min Values with Versioned Documents" http://cookbook.mongodb.org/patterns/finding_max_and_min/ contains a good step-by-step walkthrough of a Map Reduce operation, explaining how the functions are executed.

Multiple tags / folders in Google Reader

I want to be able to grab data from multiple tags / folders in a user's Google Reader.
I know how to do one http://www.google.com/reader/atom/user/-/label/SOMELABEL but how would you do two or three or ten?
Doesn't look like you can get multiple tags/folders in one request. If it's feasible you should iterate over the different tags/folders and aggregate them in your application.
[edit]
Since it looks like you have a large list of tags/folders you need to query, an alternative is to get the full list of entries, then sort out the ones the user wants. It looks like each entry has a category element that will tell you what tag is associated with it. This might be feasible in your case.
(Source: http://code.google.com/p/pyrfeed/wiki/GoogleReaderAPI)
(Source: http://www.google.com/reader/atom/user/-/state/com.google/starred)
I don't think you can get aggregated data the way you hope. If you think about it, even Google lets you browse folders or tags only one at a time and does not aggregate a subset of them.
You can choose to have a list of all the items (for each one of their available statuses) or a list of a particular tag/folder.
You could do it in 2 requests. First you need to perform a GET request to http://www.google.com/reader/stream/items/ids. It supports several parameters like
s (required parameter; stream id to fetch; may be defined more than one time),
n (required; number of items to fetch)
r for ranking (optional)
and others (see more under /ids section)
And then you should perform a POST request (this is because there could be a lot of ids, and therefore the request could be cut off) to http://www.google.com/reader/api/0/stream/items/contents. The required parameter is i which holds the feed item identifier (could be defined more than once).
This should return data from several feeds (as returned for me).
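The first of the two requests can be sketched as a URL builder that repeats the `s` parameter once per label. This is only a sketch: the helper name is hypothetical, the endpoint is passed in rather than hard-coded, and the `user/-/label/NAME` stream id form is taken from the single-label URL in the question.

```javascript
// Build the item-ids request URL with one `s` parameter per label.
// Stream ids follow the user/-/label/NAME form from the question's URL.
function buildItemIdsUrl(endpoint, labels, count) {
  const url = new URL(endpoint);
  for (const label of labels) {
    // append (not set) so every label gets its own `s` parameter
    url.searchParams.append("s", `user/-/label/${label}`);
  }
  url.searchParams.set("n", String(count));
  return url.toString();
}

console.log(buildItemIdsUrl("http://www.google.com/reader/stream/items/ids", ["tech", "news"], 20));
```

The ids returned by that request would then go into the `i` parameters of the POST described above.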