How to sort data based on one of the timestamp fields in a Druid Scan query

I'm using a Druid Scan query with the ordering parameter set to "ascending". It returns data ordered by the configured timestamp field, serverReceiveTime. I want to sort my data on a different timestamp field (streamingSegmentStartTime). According to the Scan query documentation, there is no sort argument that accepts an arbitrary column.
ScanDruidQuery.builder()
.dataSource(route.getDataSource())
.intervals(IntervalParser.getIntervals(getSessionsQuery.getStartTime(), getSessionsQuery.getEndTime()))
.filter(filterTranslator.translate(getSessionsQuery.getFilter()))
.order(DRUID_DATA_SORT_ORDER)
.columns(columnList)
.context(new DruidQueryContext(genericQuery.getRequestId()))
.limit(getSessionsQuery.getResultSize())
.offset(NumberUtils.toInt(getSessionsQuery.getNextToken(), 0))
.build();
Is there any way to sort this data by streamingSegmentStartTime on the Druid side?

I'm not sure what your query is doing, so this might not help, but you can sort by other columns if you use a Group By query.
Take a look at the sortByDimsFirst query context property of the Group By query here: https://druid.apache.org/docs/latest/querying/groupbyquery.html#groupby-v2-configurations
If you make streamingSegmentStartTime the first dimension in the query's dimension list and set sortByDimsFirst to true, I think you can achieve what you want.
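As a rough illustration of the suggestion above, here is a minimal sketch of a native Group By query body built as a plain JavaScript object. The data source name "sessions", the interval, and the extra dimension "sessionId" are placeholders invented for the example, not values from the question.

```javascript
// Sketch of a Druid native groupBy query body.
// Assumption: the caller supplies the data source, interval and any
// additional dimensions; the values used below are placeholders.
function buildGroupByQuery(dataSource, interval, extraDimensions) {
  return {
    queryType: 'groupBy',
    dataSource: dataSource,
    intervals: [interval],
    granularity: 'all',
    // The column to sort on goes first in the dimension list.
    dimensions: ['streamingSegmentStartTime'].concat(extraDimensions),
    // Sort by dimension values before the row timestamp.
    context: { sortByDimsFirst: true }
  };
}

const query = buildGroupByQuery('sessions', '2020-01-01/2020-01-02', ['sessionId']);
console.log(JSON.stringify(query, null, 2));
```

Keep in mind that a Group By query merges rows that share the same dimension values, so the result only resembles Scan output if the chosen dimensions are unique per row.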

Related

What does the distinct on clause mean in Cloud Datastore and how does it affect reads?

This is what the cloud datastore doc says but I'm having a hard time understanding what exactly this means:
A projection query that does not use the distinct on clause is a small operation and counts as only a single entity read for the query itself.
Grouping
Projection queries can use the distinct on clause to ensure that only the first result for each distinct combination of values for the specified properties will be returned. This will return only the first result for entities which have the same values for the properties that are being projected.
Let's say I have a table of questions and I only want to get the question text, sorted by created date. Would this be counted as a single read, with the rest as small operations?
If your goal is just to project the date and text fields, you can create a composite index on those two fields. You are not trying to de-duplicate in this case (so no distinct on), which means the query is a small operation and all the results count as a single read.

Which is more efficient: sorting with a JPA order by clause or sorting with the Collections sort() method?

I have a table called 'userProfileEmployment'. It has a start_date column which stores values in yyyy-mm-dd format. From this table I have to fetch the list of employments in ascending order.
I have two approaches for this:
1. Fetch the rows already sorted from the DB through a JPA query.
2. Fetch the rows as they are from the DB, store them in a List, and then sort it using the Collections sort method.
Here is the code for approach 1:
List employmentList = userProfileEmploymentRepository.findAllByProfileIdSorted(userProfileId);
Snippet from UserProfileEmploymentRepository.java class:
@Query("select upe from UserProfileEmployment upe where upe.profileId = :profileId and (upe.deleted = 0 or upe.deleted is null) order by upe.startDate")
List<UserProfileEmployment> findAllByProfileIdSorted(@Param("profileId") Long profileId);
Both approaches give the sorted output. So my question is: which of the two is better? Is sorting with the order by clause more costly, or is sorting with the Collections sort method better?
Sorting in the DB is preferable to sorting in memory (i.e. with Collections) in most cases.
Firstly, database sorting is designed to handle large data sets, and you can optimize it further by using indexes.
Secondly, if you later want to return paginated data (i.e. a slice of the whole result), you can load only those slices from the DB by using Pageable and save heap memory.

Algolia: Best way to query slave index to get sort by date ranking functionality

I have a data set where I want to dynamically sort by date (both ascending and descending) on the fly. I read through the docs and as instructed I've created a slave index of my master index, where the top ranking value is my 'date' ordered by DESC. The date is in the correct integer and unix timestamp format.
My question is: how do I query this new index on the fly using the front-end JavaScript Algolia API?
Right now, my code looks like the following:
this.client = algoliasearch("xxxx", "xxxxx");
this.index = this.client.initIndex('master_index');
this.index.search(
  this.query, {
    hitsPerPage: 10,
    page: this.pagination,
    facets: '*',
    facetFilters: facetArray
  },
  function(error, results) {
    // do stuff
  }.bind(this));
What I've tried doing is to just change the initIndex to use my slave index instead and this does work...but I'm thinking that this is slow and inefficient if I need to reinitialize the index every time the user just wants to sort by date. Isn't there a parameter instead that I can insert in the query to sort by date?
Also, my second question is that even when I change the index to the slave index, it only sorts by descending. How can I have it sort by ascending as well?
I really do not want to create ANOTHER slave index just to sort by ascending date since I have many thousands of rows and am already close to exceeding my record limit. Surely there must be another way here?
Thanks!
What I've tried doing is to just change the initIndex to use my slave index instead and this does work...but I'm thinking that this is slow and inefficient if I need to reinitialize the index every time the user just wants to sort by date. Isn't there a parameter instead that I can insert in the query to sort by date?
You should initialize every index you want to sort with once and keep them as properties on the this object:
this.indices = {
  mostRelevant: this.client.initIndex('master_index'),
  desc: this.client.initIndex('slave_desc')
};
Then you can use this.indices.mostRelevant.search() or this.indices.desc.search().
Doing so is not a performance issue.
Also see the dedicated library to create instant-search experiences: https://community.algolia.com/instantsearch.js/
Also, my second question is that even when I change the index to the slave index, it only sorts by descending. How can I have it sort by ascending as well?
I really do not want to create ANOTHER slave index just to sort by ascending date since I have many thousands of rows and am already close to exceeding my record limit. Surely there must be another way here?
Creating a separate index per sort order is the only true way to sort in Algolia. This is by design, and it is what makes Algolia so fast.
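To make the "initialize once, pick an index per search" idea concrete, here is a minimal sketch. The helper name createSortedSearch and the index name 'slave_asc' are made up for illustration; only initIndex() and search() are calls used in the question and answer above, and the client is passed in so the helper does not depend on how it was created.

```javascript
// Sketch: initialize each index once, then choose one per search call.
// Assumption: `client` is an Algolia client exposing initIndex(name),
// and each index exposes search(query, params) as in the snippets above.
function createSortedSearch(client, indexNames) {
  const indices = {};
  for (const key of Object.keys(indexNames)) {
    // initIndex is called a single time per sort order.
    indices[key] = client.initIndex(indexNames[key]);
  }
  return function search(sortKey, query, params) {
    const index = indices[sortKey];
    if (!index) {
      throw new Error('Unknown sort key: ' + sortKey);
    }
    return index.search(query, params);
  };
}

// Possible usage ('slave_asc' is a hypothetical second replica):
// const search = createSortedSearch(algoliasearch('xxxx', 'xxxxx'), {
//   mostRelevant: 'master_index',
//   desc: 'slave_desc',
//   asc: 'slave_asc'
// });
// search('desc', 'some text', { hitsPerPage: 10 });
```

Switching the sort order then only changes which already-initialized index receives the search call.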

Can we use a find query inside map in MongoDB?

I need to perform some aggregation on an existing table and then run a map-reduce on the aggregated table.
The aggregated table is a temporary one that exists only so it can be used in the map-reduce. The record set in the temporary table reaches around 8M.
How can I avoid the temporary table?
One way could be to write a find() query inside the map() function and emit the aggregated result (which is currently stored in the aggregation table).
However, I am not able to implement this.
Is there a way? Please help.
You can use the "query" parameter on MongoDB MapReduce. With this parameter the data sent to map function is filtered before processing.
More info on MapReduce documentation
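A minimal sketch of passing a filter through the "query" option might look like the following. The field names (customerId, amount), the helper name runFilteredMapReduce, and the inline output are assumptions for illustration; the collection is passed in and is expected to expose mapReduce(map, reduce, options) as the mongo shell does.

```javascript
// Map function: runs on the server for each document that passed the filter.
// `emit` is provided by MongoDB inside map-reduce; field names are hypothetical.
function mapFn() {
  emit(this.customerId, this.amount);
}

// Reduce function: sums all values emitted for the same key.
function reduceFn(key, values) {
  return values.reduce(function (sum, v) { return sum + v; }, 0);
}

// Assumption: `collection` exposes mapReduce(map, reduce, options).
function runFilteredMapReduce(collection, filter) {
  return collection.mapReduce(mapFn, reduceFn, {
    query: filter,      // only matching documents reach mapFn
    out: { inline: 1 }  // inline output, chosen here only for illustration
  });
}

// Possible usage in the mongo shell (the filter is a made-up example):
// runFilteredMapReduce(db.orders, { status: 'A' });
```

Note that "query" only filters the input documents; it does not perform the aggregation step itself.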

MongoDB computed field based on another query

I have a MongoDB query, and I want to add a computed field. The computed field is based on whether or not the item is in the results of another query. So my query returns the columns a, b, c, d, and then column e should be based on whether or not the current row would be matched by another query.
Is there an efficient way to do this in mongo? I'm not really sure how to do this one...
There is currently no way to execute a function like the one you describe within the database when returning a document via standard functions such as find. Computed fields have been requested by the community, but the general request is for them to operate only on a single document.
The aggregation framework supports calculated fields through $project, but they only operate on the current document in the pipeline, so they can't draw on the results of other queries.
You'll likely need to build your e value as part of your data access layer.
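One way to do this in the data access layer is to run both queries, collect the _id values of the second result into a set, and then tag each row of the first result. The sketch below works on plain arrays, so it assumes both result sets have already been fetched; the function name addComputedField is invented for the example.

```javascript
// Sketch: compute field `e` in application code.
// Assumption: `rows` and `matchingRows` are arrays of documents that were
// already fetched by the two queries, and each document has an _id.
function addComputedField(rows, matchingRows) {
  // Compare ids as strings so ObjectId-like values compare by value.
  const matchedIds = new Set(matchingRows.map(function (r) {
    return String(r._id);
  }));
  return rows.map(function (r) {
    // Copy the row and add e = true when the other query also matched it.
    return Object.assign({}, r, { e: matchedIds.has(String(r._id)) });
  });
}

const rows = [{ _id: 1, a: 'x' }, { _id: 2, a: 'y' }];
const matchingRows = [{ _id: 2 }];
console.log(addComputedField(rows, matchingRows));
```

If the second query can match many documents, fetching only the _id field for it keeps the extra round trip small.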