I have two collections, companies and products, that share a partnerId field. I need to find the names of the companies (companies.name) that have no matching partnerId in the products collection. How can I do that? I have tried both an aggregation and SQL, as follows:
db.companies.aggregate([{$lookup:
{
from: "products",
localField: "partner",
foreignField: "_id",
as: "products"
}
}])
and
mb.runSQLQuery(`
SELECT name
FROM companies
WHERE NOT EXISTS (
SELECT *
FROM products
WHERE partnerId=companies.partnerId
`);
I am using NoSQLBooster. The first query returns all the documents in companies, and the second query throws an error. Thanks in advance.
I don't recommend using a third-party tool to generate your MongoDB queries from SQL. SQL and the MongoDB query language are very different, and you will not be able to use the latter effectively or efficiently that way.
As a rule, data in document databases should be stored denormalized, unlike in transactional tabular databases, where the data should be normalized. So a better design would be to embed the data as sub-documents, i.e. as a list or a dictionary/object inside the document.
With the data design you have implemented, a simple query will not help, since each query runs on a single collection only. However, you can use the aggregation pipeline's $lookup stage to enrich the results from one collection with results from another.
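As a rough sketch of that approach (not tested against your data), assuming both collections store the shared key in a field named partnerId as the question describes: join on partnerId, then keep only the companies whose joined array is empty. The variable name companiesWithoutProducts is just illustrative.

```javascript
// Left-join products onto companies by partnerId,
// then keep only the companies that matched no product.
const companiesWithoutProducts = [
  {
    $lookup: {
      from: "products",
      localField: "partnerId",   // assumed field name in companies
      foreignField: "partnerId", // assumed field name in products
      as: "products"
    }
  },
  { $match: { products: { $size: 0 } } }, // empty array: no matching product
  { $project: { _id: 0, name: 1 } }       // return only the company name
];

// Runs only inside a mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.companies.aggregate(companiesWithoutProducts).forEach(doc => print(doc.name));
}
```

The attempt in the question joined localField "partner" to foreignField "_id", which is probably why every company came back unchanged; the $match on the empty array is the step that plays the role of NOT EXISTS.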
Related
I am trying to take an extract from a huge MongoDB collection.
In particular, the collection contains 2.65TB data (unzipped), i.e., 600GB data (zipped). Each document has a deep hierarchy and a couple of arrays and I want to extract some parts out of them. In this collection we have multiple documents for each customer id. Since I want to export the most active document for each customer, I need to group and take the records with the maximum timestamp field and perform some further processing on them. I need some help in forming the query for the export. I have tried to sort the documents per customer id, but this could not be achieved in an acceptable time when combined with a 'match' construct (this is needed since it is a huge collection and we try to create the export in parts). Currently the query looks like this:
db.getCollection('CEM').aggregate([
{'$match' : {'LiveFeed.customer.profile.id':'TCAYT2RY2PF93R93JVSUGU7D3'}},
{'$project':{'LiveFeed.customer.profile.id':1,'LiveFeed.customer.profile.products.air.flights':1, 'LiveFeed.context.timestamp':1}},
{'$sort':{'LiveFeed.customer.profile.id':1,"LiveFeed.context.timestamp":1}},
{'$group':{'_id':'$LiveFeed.customer.profile.id',
'products':{'$last':'$LiveFeed.customer.profile.products.air.flights'}}},
{'$unwind': '$products'},
{'$unwind': '$products.sources'},
{'$project':{'_id':0,
'ceid': '$_id',
'coupon_no':{'$ifNull':['$products.couponId.couponNumber', ""]},
'ticket_no':{'$ifNull':['$products.couponId.ticketId.number','']},
'pnr_id':'$products.sources.id',
'departure_date':'$products.segment.departure.at',
'departure_airport':'$products.segment.departure.code',
'arrival_airport':'$products.segment.arrival.code',
'created_date':'$products.createdAt'}}])
Any ideas or suggestions on how to improve this query would be very helpful. Thanks in advance!
It is difficult to answer this without knowing the indexes on your collection. However, you can save some time by eliminating stage 3. The $sort is undone by the $group in stage 4. See $group does not preserve order
QUERYING MONGODB: RETRIEVE SHOPS BY NAME AND BY LOCATION WITH ONE SINGLE QUERY
Hi folks!
I'm building a "search shops" application using MEAN Stack.
I store shops documents in MongoDB "location" collection like this:
{
_id: .....
name: ...//shop name
location : //...GEOJson
}
The UI gives users one single input for searching shops. Basically, I would like to perform one single query that returns, in the same results array:
All shops near the user (eventually limit to x)
All shops named "like" the input value
On the logical side, I think this is an "$or like" query.
Based on this answer
Using full text search with geospatial index on Mongodb
assigning two special indexes (2dsphere and full text) to the collection is probably not the right way to achieve this. Still, I think this is a different case, because I really don't want to apply sequential filters to the results; I "simply" want to retrieve data with 2 distinct criteria.
If I do set both indexes on my collection, the obvious approach is to perform two distinct queries with two distinct methods ($near for locations and $text for names), and then merge the results with some server-side logic to remove duplicate documents and sort them in some way that is useful for the user experience. But I'm still wondering whether there is a way to achieve this result with one single query.
So the question is: is this possible, or is this kind of approach outside what MongoDB is meant for?
Hope this is clear, and hope that someone can teach me something today!
Thanks
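For illustration, the merge step of the two-query fallback described above could look roughly like this. It is only a sketch: mergeShopResults is a hypothetical helper, and it assumes every result document carries an _id that can be compared after converting it to a string.

```javascript
// Merge the results of a $near query and a $text query, dropping duplicates.
// Nearby shops come first, followed by name matches not already listed.
function mergeShopResults(nearbyShops, nameMatches) {
  const seen = new Set();
  const merged = [];
  for (const shop of [...nearbyShops, ...nameMatches]) {
    const key = String(shop._id); // assumes _id has a stable string form
    if (!seen.has(key)) {
      seen.add(key);
      merged.push(shop);
    }
  }
  return merged;
}

// Usage with plain objects standing in for query results:
const nearby = [{ _id: "a", name: "Corner Shop" }, { _id: "b", name: "Book Nook" }];
const byName = [{ _id: "b", name: "Book Nook" }, { _id: "c", name: "Book Barn" }];
const shops = mergeShopResults(nearby, byName);
```

Keeping the nearby shops first is just one possible ordering; any other ranking would be applied after the duplicates are removed.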
I have two huge collections (a few hundred thousand records each), Col1 and Col2, and I need to fetch joined data from both of them. There is a join criterion that dramatically reduces the number of records returned, to a few hundred, so in SQL I would run something like
SELECT ... FROM Col1 INNER JOIN Col2 ON Col1.field1 = Col2.field2
and it would run pretty fast, as Col1.field1 and Col2.field2 are indexed fields. Is there any direct way or workaround to do the same thing fast in MongoDB, using the indexes rather than scanning all the items?
Note: I cannot redesign collections to merge them into one.
MongoDB has no JOIN so there is not a fast equivalent. It is most likely a schema design issue but you said you can't change that. You can't query multiple collections in one query.
You can either do the join client-side in 2 queries or you can do it in non-live style by doing a map-reduce and generating a 3rd collection.
Reference this other question for details on how to do a map-reduce
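A minimal sketch of the client-side option, assuming the Col1/Col2 and field1/field2 names from the question and join keys that are primitives (strings or numbers). hashJoin is a hypothetical helper, not part of any driver.

```javascript
// Client-side inner join: index one result set by its join key,
// then probe that index with each document from the other set.
// Assumes the join keys are primitives (strings or numbers).
function hashJoin(leftDocs, rightDocs, leftKey, rightKey) {
  const index = new Map();
  for (const r of rightDocs) {
    const k = r[rightKey];
    if (!index.has(k)) index.set(k, []);
    index.get(k).push(r);
  }
  const joined = [];
  for (const l of leftDocs) {
    for (const r of index.get(l[leftKey]) || []) {
      joined.push({ left: l, right: r });
    }
  }
  return joined;
}

// Runs only inside a mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  const left = db.Col1.find({ /* your selective filter goes here */ }).toArray();
  const keys = left.map(d => d.field1);
  // The $in lookup can use the index on Col2.field2.
  const right = db.Col2.find({ field2: { $in: keys } }).toArray();
  hashJoin(left, right, "field1", "field2").forEach(printjson);
}
```

This only pays off when the first query is selective enough that the list of keys passed to $in stays small.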
To join in MongoDB 4.2, you can use an aggregation with $lookup, like this query:
db.collection.aggregate([
{ $lookup: { from: "...", ... } }
])
It was useful for me.
More information: https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/
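Filled in for the Col1/Col2 case from the question, a sketch might look like the following. The output field name "joined" and the variable name are my own choices, and I am assuming an index on Col2.field2 exists as the question states.

```javascript
// Rough equivalent of:
//   SELECT ... FROM Col1 INNER JOIN Col2 ON Col1.field1 = Col2.field2
const innerJoinPipeline = [
  {
    $lookup: {
      from: "Col2",
      localField: "field1",
      foreignField: "field2",
      as: "joined" // illustrative name for the array of matches
    }
  },
  // $unwind drops documents whose "joined" array is empty by default,
  // which gives inner-join semantics instead of a left outer join.
  { $unwind: "$joined" }
];

// Runs only inside a mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.Col1.aggregate(innerJoinPipeline).forEach(printjson);
}
```

Without the $unwind stage, every Col1 document is returned, with an empty array where nothing matched.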
Joins in MongoDB are expensive. Two solutions:
Redesign: merge the collections into one
Apply $limit and $match before you join
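The second suggestion, filtering before the join, might look roughly like this. The status filter and the limit of 100 are hypothetical placeholders; only the stage order is the point.

```javascript
// Shrink the input first, then join only the documents that survive.
const filteredJoinPipeline = [
  { $match: { status: "active" } }, // hypothetical filter; ideally backed by an index
  { $limit: 100 },                  // hypothetical cap on documents to join
  {
    $lookup: {
      from: "Col2",
      localField: "field1",
      foreignField: "field2",
      as: "joined"
    }
  }
];

// Runs only inside a mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.Col1.aggregate(filteredJoinPipeline).forEach(printjson);
}
```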
I'm learning MongoDB these days, and I've found that MongoDB doesn't support joins.
I just want to know why MongoDB chose to do this.
Thanks in advance.
MongoDB is not a relational database and has no physical relations or constraints.
Joins kill scalability.
Usually denormalization replaces the SQL join.
For example, on Stack Overflow you have a question and its owner; in MongoDB it is normal to denormalize the owner data into the question and avoid a join:
question
{
_id,
text,
user_short :
{
id,
full_name
}
}
This certainly adds complexity on updates, but it gives you significant performance improvements when you read the data. And for most applications, reads are 95% of the load and writes only 5% or even less.
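To give an idea of that update complexity: when a user's name changes, every question that embeds it has to be rewritten. A sketch, assuming the question collection and the user_short fields shown above; ownerRenameUpdate, the id 42 and the name "Jane Doe" are made up for the example.

```javascript
// Build the filter and update needed to propagate a renamed user
// into every question document that embeds a copy of the name.
function ownerRenameUpdate(userId, newFullName) {
  return {
    filter: { "user_short.id": userId },
    update: { $set: { "user_short.full_name": newFullName } }
  };
}

// Runs only inside a mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  const { filter, update } = ownerRenameUpdate(42, "Jane Doe"); // hypothetical values
  db.question.updateMany(filter, update);
}
```

The read path stays a single find on the question collection, which is the trade-off this answer is describing.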
Because MongoDB is a non-relational database. Non-relational databases do not support joins; that is by design.
You can now do it in Mongo 3.2 using $lookup
$lookup takes four arguments
from: Specifies the collection in the same database to perform the join with. The from collection cannot be sharded (this restriction was lifted in MongoDB 5.1).
localField: Specifies the field from the documents input to the $lookup stage. $lookup performs an equality match on the localField to the foreignField from the documents of the from collection.
foreignField: Specifies the field from the documents in the from collection.
as: Specifies the name of the new array field to add to the input documents. The new array field contains the matching documents from the from collection.
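db.Foo.aggregate([
  // One output document per element of the bars array
  {$unwind: "$bars"},
  // Join each element against bar._id
  {$lookup: {
    from: "bar",
    localField: "bars",
    foreignField: "_id",
    as: "bar"
  }},
  // Keep only documents whose joined bar has testprop set to true
  {$match: {
    "bar.testprop": true
  }}
])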
For example, we have two collections
users {userId, firstName, lastName}
votes {userId, voteDate}
I need a report with the names of all users who have more than 20 votes a day.
How can I write a query to get this data from MongoDB?
The easiest way to do this is to cache the number of votes for each user in the user documents. Then you can get the answer with a single query.
If you don't want to do that, then map-reduce the results into a results collection, and query that collection. You can then run incremental map-reduces that only calculate new votes to keep your results up to date: http://www.mongodb.org/display/DOCS/MapReduce#MapReduce-IncrementalMapreduce
You shouldn't really be trying to do joins with Mongo. If you are, you've designed your schema in a relational manner.
In this instance I would store the vote as an embedded document on the user.
In some scenarios using embedded documents isn't feasible, and in that situation I would do two database queries and join the results at the client rather than using MapReduce.
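A sketch of the two-query approach for the users/votes example. It uses the aggregation framework for the first query (my choice, not something this answer prescribes), assumes voteDate is stored as a BSON Date, and reads "more than 20 votes a day" as "more than 20 votes on at least one calendar day".

```javascript
// Query 1: find the userIds with more than 20 votes on some day.
const heavyVotersPipeline = [
  {
    $group: {
      _id: {
        userId: "$userId",
        // Truncate the timestamp to a calendar day (assumes voteDate is a Date)
        day: { $dateToString: { format: "%Y-%m-%d", date: "$voteDate" } }
      },
      votes: { $sum: 1 }
    }
  },
  { $match: { votes: { $gt: 20 } } },
  { $group: { _id: "$_id.userId" } } // one row per user, however many busy days
];

// Runs only inside a mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  const userIds = db.votes.aggregate(heavyVotersPipeline).toArray().map(d => d._id);
  // Query 2: fetch the names for those users; this is the client-side "join".
  db.users
    .find({ userId: { $in: userIds } }, { _id: 0, firstName: 1, lastName: 1 })
    .forEach(u => print(u.firstName + " " + u.lastName));
}
```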
I can't provide a fuller answer now, but you should be able to achieve this using MapReduce. The Map step would return the userIds of the users who have more than 20 votes, the reduce step would return the firstName and lastName, I think...have a look here.