Hazelcast: Distributed query aggregations with group by support

We need to run the following query against IMDG using Hazelcast 3.8-EA:
select sum(salary), sum(bonus), dept from Employee where birthYear > 1989 group by dept
Where clause: SqlPredicate("birthYear > 1989")
Aggregation:
Using Aggregators.doubleSum("salary") and Aggregators.doubleSum("bonus") on the Employee map,
or by extending AbstractAggregator.
The question is how to handle multiple aggregations using the built-in aggregators, and how to handle the group by clause.

There's no official group by support yet but what you could do is to create your own SumWithGroupBy aggregation that sums the salary and bonus per group in the way you want it to be grouped.
You can have a look at the Aggregators.doubleSum code to see how the aggregation can be implemented.
It's a bit of manual coding but it will be just a couple of lines of custom logic.
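The idea above can be sketched in plain Python. This is not Hazelcast API code: it only mirrors the accumulate / combine / aggregate lifecycle that Hazelcast aggregators appear to follow, and the class name SumWithGroupBy, the dictionary-based employee records, and the two simulated partitions are all assumptions made for illustration.

```python
from collections import defaultdict


class SumWithGroupBy:
    """Hypothetical group-by aggregation: sums salary and bonus per dept."""

    def __init__(self):
        # dept -> [salary_sum, bonus_sum]
        self.totals = defaultdict(lambda: [0.0, 0.0])

    def accumulate(self, employee):
        # Called once per entry that passed the predicate on one partition.
        sums = self.totals[employee["dept"]]
        sums[0] += employee["salary"]
        sums[1] += employee["bonus"]

    def combine(self, other):
        # Merges the partial result computed on another partition.
        for dept, (salary, bonus) in other.totals.items():
            sums = self.totals[dept]
            sums[0] += salary
            sums[1] += bonus

    def aggregate(self):
        # Final result: {dept: (sum(salary), sum(bonus))}
        return {dept: tuple(sums) for dept, sums in self.totals.items()}


# Made-up sample data, for illustration only.
employees = [
    {"dept": "eng", "salary": 100.0, "bonus": 10.0, "birthYear": 1990},
    {"dept": "eng", "salary": 200.0, "bonus": 20.0, "birthYear": 1992},
    {"dept": "ops", "salary": 50.0, "bonus": 5.0, "birthYear": 1995},
    {"dept": "ops", "salary": 70.0, "bonus": 7.0, "birthYear": 1980},
]

# Simulate two partitions, each applying the birthYear > 1989 predicate.
part_a, part_b = SumWithGroupBy(), SumWithGroupBy()
for e in employees[:2]:
    if e["birthYear"] > 1989:
        part_a.accumulate(e)
for e in employees[2:]:
    if e["birthYear"] > 1989:
        part_b.accumulate(e)

part_a.combine(part_b)
result = part_a.aggregate()
```

A Java version would presumably keep the same per-group map as its state and be passed to the map's aggregate call together with the SqlPredicate. Both sums are handled in one pass, so the multiple-aggregation part of the question is covered by the same class.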

Related

Mongo find all documents from a particular month

I have a collection with documents containing a date.
Is there a way to find all documents from a particular month (all years), using db.myCollection.find()?
I managed to achieve approximately what I need by using the aggregation pipeline and the $month operator, but I couldn't find anything similar among the 'query and projection' operators.
I do not need to group the documents (I only need to filter them).
I am modifying some code that creates filters dynamically based on the user selection and the code uses find with the generated filters.
I would like to be able to give a user the ability to see only entries from particular months (say, only entries for the summer season).
Finally, is there a performance difference between using a filter with find and using the equivalent filter in a $match stage of the aggregation pipeline?
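One possible approach, sketched under assumptions: since MongoDB 3.6 the $expr query operator allows aggregation expressions such as $month inside a find filter. The helper below only builds the filter document as a Python dict; the function name month_filter and the field name "date" are hypothetical, and the resulting dict would still have to be passed to find by whatever driver the code uses.

```python
def month_filter(date_field, months):
    """Build a find() filter matching documents whose date falls in any of `months`.

    `date_field` is the (assumed) name of the date field; `months` are 1-12.
    """
    return {"$expr": {"$in": [{"$month": "$" + date_field}, list(months)]}}


# Only entries for the summer season, across all years.
summer = month_filter("date", [6, 7, 8])
```

Because the result is an ordinary filter document, it can be combined with the other dynamically generated filters. A filter computed from $month is unlikely to benefit from a plain index on the date field, so it is worth checking the query plan on real data.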

Query performance issues when using MATCH and SELECT together OrientDB

I'm facing a problem with a query in OrientDB.
SELECT FROM (
MATCH
{class: article, as: article}.in('authorOf'){as: author}
RETURN article, author
) ORDER BY createdAt desc SKIP 0 LIMIT 50
As you can see, I want to fetch the 50 most recent articles with their corresponding authors. The problem is that the subquery first iterates over all my articles and passes them up to the parent query, and only then are they filtered. This is obviously not very efficient, because all the articles are loaded into memory when I only need 50 of them.
Does anyone know a better approach that does not require multiple queries?
You could try:
select @rid as article, in('authorOf')[0] as author from article order by createdAt desc SKIP 0 LIMIT 50
With this one I'm getting slightly better performance, but nothing extreme.
EDIT, following Luigi's comment:
Create an index on the createdAt property:
CREATE INDEX article.createdAt ON article (createdAt) NOTUNIQUE
PS: I'm not sure that the ORDER BY in your query works as intended.

Doctrine ODM Distinct With Limit

I have the following problem: I can't limit the number of results when using distinct. Example:
$stores = $this->dm->createQueryBuilder('Application\Document\Item')
->distinct('storeName')
->limit(10)
->getQuery()
->execute();
This query returns 100 entries, but I want only 10 results.
With the query builder class in ORM, you need to use:
->setMaxResults(10);
As @Siol and @john Smith said, in ODM you could use limit:
->limit(10);
I don't think distinct will work with limit, as suggested in the MongoDB Jira ticket "Ability to use Limit() with Distinct()":
The current Distinct() implementation only allows for bringing back
ALL distinct values in the collection or matching a query, but there
is no way to limit these results. This would be very convenient and
there are many use cases.
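A workaround that is sometimes used when distinct cannot be limited is to group on the field in an aggregation pipeline and limit the groups. The sketch below only builds the pipeline as plain Python data; the helper name distinct_with_limit is hypothetical, the sort stage is an assumption added to make the choice of 10 values deterministic, and running it through Doctrine ODM's aggregation support is left out.

```python
def distinct_with_limit(field, limit):
    """Build a pipeline returning at most `limit` distinct values of `field`."""
    return [
        {"$group": {"_id": "$" + field}},  # one output document per distinct value
        {"$sort": {"_id": 1}},             # assumed ordering, so the cut is stable
        {"$limit": limit},
    ]


pipeline = distinct_with_limit("storeName", 10)
```

Each resulting document carries the distinct value in its _id field.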

How can I effectively join 2 huge collections in MongoDb?

I have two huge collections, Col1 and Col2 (a few hundred thousand records each), and I need to fetch joined data from both of them. There is a join criterion that lets me dramatically reduce the number of records returned to a few hundred, so in SQL I would run something like
SELECT ... FROM Col1 INNER JOIN Col2 ON Col1.field1 = Col2.field2
and it would run pretty fast, as Col1.field1 and Col2.field2 are indexed fields. Is there a direct way or a workaround to do the same thing quickly in MongoDB using indexes, without scanning all the items?
Note: I cannot redesign collections to merge them into one.
MongoDB has no JOIN so there is not a fast equivalent. It is most likely a schema design issue but you said you can't change that. You can't query multiple collections in one query.
You can either do the join client-side in two queries, or do it offline by running a map-reduce that generates a third collection.
See this other question for details on how to do a map-reduce.
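The client-side join in two queries can be sketched as follows. This is an illustration in plain Python under stated assumptions: the function client_side_join, the stand-in fetch callback, and the sample documents are hypothetical, and in real code the callback would issue one indexed query on Col2.field2 with an $in condition.

```python
def client_side_join(col1_docs, fetch_col2_by_field2):
    """Join already-filtered Col1 documents to Col2 on field1 == field2."""
    # Collect the join keys from the result of the first query.
    keys = {doc["field1"] for doc in col1_docs}
    # Second query: fetch only the Col2 documents whose field2 is one of the keys.
    col2_docs = fetch_col2_by_field2(keys)
    # Index the second result by join key, then pair up matching documents.
    by_key = {}
    for doc in col2_docs:
        by_key.setdefault(doc["field2"], []).append(doc)
    return [
        (left, right)
        for left in col1_docs
        for right in by_key.get(left["field1"], [])
    ]


# In-memory stand-ins for the two collections (made-up data).
col1 = [{"_id": 1, "field1": "a"}, {"_id": 2, "field1": "b"}]
col2 = [
    {"_id": 10, "field2": "a"},
    {"_id": 11, "field2": "c"},
    {"_id": 12, "field2": "a"},
]

joined = client_side_join(col1, lambda keys: [d for d in col2 if d["field2"] in keys])
pairs = [(left["_id"], right["_id"]) for left, right in joined]
```

This behaves like an inner join: Col1 documents without a match in Col2 are dropped.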
To join in MongoDB 4.2, you can use the aggregation pipeline with $lookup, like this:
db.collection.aggregate([
{ $lookup: { from: "...", ... } }
])
This was useful for me.
More information: https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/
Joins in MongoDB are expensive. Two solutions:
Redesign the collections and merge them into one
Apply match and limit before you join
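The second suggestion, filtering before joining, can be sketched as a pipeline in which $match and $limit come before $lookup, so that only the reduced set of documents is joined. The helper name join_pipeline, the "status" filter, and the output field "joined" are hypothetical; Col2, field1 and field2 are taken from the question.

```python
def join_pipeline(match_filter, limit, from_collection, local_field, foreign_field, as_field):
    """Build a pipeline that narrows the input before performing the $lookup."""
    return [
        {"$match": match_filter},  # narrow down first, ideally on an indexed field
        {"$limit": limit},         # cap the number of documents to be joined
        {
            "$lookup": {
                "from": from_collection,
                "localField": local_field,
                "foreignField": foreign_field,
                "as": as_field,
            }
        },
    ]


pipeline = join_pipeline({"status": "active"}, 100, "Col2", "field1", "field2", "joined")
```

The pipeline would be run against Col1. An index on Col2.field2 should help the lookup, since the foreign collection is queried once per input document.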

MongoDB: How to execute a query to result of another query (nested queries)?

I need to apply a set of filters (queries) to a collection. By default, MongoDB applies the AND operator to all conditions submitted to the find function. Instead of a single AND, I need to apply each query sequentially (one by one). That is, I need to run the first query and get a set of documents, run the second query on the result of the first query, and so on.
Is this possible?
db.list.find({..q1..}).find({..q2..}).find({..q3..});
Instead Of:
db.list.find({..q1..}, {..q2..}, {..q3..});
Why do I need this?
Because the second query needs to apply an aggregate function to the result of the first query, instead of applying the aggregate to the whole collection.
Yes, this is possible in MongoDB. You can write nested queries as required. I created nested MongoDB queries in my own application. If you are familiar with SQL syntax, compare this with the IN clause in SQL:
select cname from table where cid in (select .....)
In the same way, you can also create nested MongoDB queries across different collections.
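One way to get the sequential behaviour described in the question, sketched under assumptions, is an aggregation pipeline: each stage runs on the output of the previous one, so an aggregate placed after a $match sees only the matched documents. The helper name staged_pipeline and all field names below (type, owner, amount, total) are hypothetical.

```python
def staged_pipeline(first_query, group_stage, second_query):
    """Build a pipeline that filters, aggregates the filtered set, then filters again."""
    return [
        {"$match": first_query},   # q1: runs on the whole collection
        {"$group": group_stage},   # aggregate over the result of q1 only
        {"$match": second_query},  # q2: runs on the aggregated result
    ]


pipeline = staged_pipeline(
    {"type": "a"},
    {"_id": "$owner", "total": {"$sum": "$amount"}},
    {"total": {"$gt": 100}},
)
```

The pipeline would be passed to the collection's aggregate method rather than to find.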