I have a collection vehiclelocation in MongoDB which is like this :
{
_id: ...,
Vehicle: ...,
Location: ....,
Date: ....
}
It contains an history of vehicle locations.
I have two criteria for my query : a location, and a date.
I need to get the list of vehicles of which the last known location at a given date is the one from my criteria.
I was able to do that in SQL, but I'm trying to migrate to Mongo, and can't figure out how to write that.
A more simple alternative is to get the last known location of the vehicles at a given date.
It's easy to do that for one specific vehicle, but I don't how to do that for all the vehicles in one query.
Is there any way to do it with MongoDB ?
EDIT:
I tried this:
db.runCommand( {aggregate:"vehiclelocationhistory", pipeline: [
{$match: {DateTime:{$lte:ISODate("2013-08-02T10:50:00Z")}}},
{$sort:{DateTime:-1}},
{$group:{_id:"$VehicleId", LocationId: {$first: "$LocationId"} } }], allowDiskUse:true})
But I had to add "allowDiskUse" which means it is consuming a lot of memory.
Related
I am building a website using Next.js and MongoDB. On one of my website page, I have implemented filters to help search for products. To retrieve and update the filters (update item count each time a filter is changing), I have an api endpoint which query my MongoDB Collection. This specific collection contains ~200.000 items. Each item have several fields such as brand, model, place etc...
I have 9 fields which I use to filter and thus must fetch through my api each time there's a change. Therefore I have 9 queries running through my api, on for each field/filter and the query on MongoDB looks like :
var models = await db_collection
.aggregate([
{
$match: {
$and: [filter],
},
},
{
$group: { _id: '$model', count: { $sum: 1 } },
},
{ $sort: { _id: 1 } },
])
.toArray();
The problem is that, as 9 queries are running, the update of the page (mainly due to the queries) takes ~4secs which is too long. I would like to reach <1sec. I would like to now if there is a good practice I am missing such as doing one query instead of one for each filter or maybe a database optimization on my database.
Thank you,
I have tried using a $project argument before $groupon aggregate pipeline for the query to reduce the number of field returned, using distinct and then sorting instead of aggregate but none of these solutions seem to improve efficiency.
EDIT :
As suggested by R2D2, I am posting the structure of a document on MongoDB in my collection :
{
_id : ObjectId('example_id')
source : string
date : date
brand : string
family : string
model : string
size : string
color : string
condition : string
contact : string
SKU : string
}
Depending on the pages, I query unique values of each field of interest (source, date, brand, family, model, size, color, condition, contact) and their count depending on filters (e.g. Number for each unique values of model for selected brands, I also query documents based on specific values of these fields.
As mentioned, you indexes are important and if you are querying by those field I recomand to create compound indexes, see here for indexes optimisation : https://learnmongodbthehardway.com/schema/indexes/
As far as the aggregation pipeline goes, nothing is out of the ordinary, but this specific aggregation just return the number of items per model matching the criteria, not the matching document. If it is all the data you need you might find it usefull to create a new collection when you perform pre-caculation for common search daily (how many items have the color black, ...) this way, when the page loads, you don't have to look in you 200k+ items, but just in your pre-calculated statistical collection. Schedule a cron task or use a lambda function to invoke a route on your api that will calculate all your stats once a day and upsert them in a new collection.
Also I believe the "and" is useless useless since you can use the implicit $and. You can look for an object like :
{
color : {$in : ['BLACK', 'BLUE']},
size : 3
}
rather than :
[{color : 'BLACK'}, {color : 'BLUE'}, {size : 3}]
Reserve the explicit $and for when you really need it.
I have a single collection named assets that contains documents in 2+ formats, ParentObject and ChildObject. I am currently associating ParentObject to ChildObject with two queries. Can this be done with an aggregate query?
ParentObject
{
"_id" : {
"oid" : "ParentFooABC",
"brand" : "acme"
},
"type": "com.ParentClass",
"title": "Title1234",
"availableDate": Date,
"expirationDate": Date
}
ChildObject
{
"_id" : {
"oid" : "ChildFoo",
"brand" : "acme"
},
"type": "com.ChildClass",
"parentObject": "ParentFooABC",
"title": "Title1234",
"modelNumber": "8HE56",
"modelLine": "Metro",
"availableDate": Date,
"expirationDate": Date,
"IDRequired": true
}
Currently I filter data like this
val parent = db.asset.find(MongoDBObject("_id.brand": MongoDBObject($eq: "acme")),MongoDBObject("type":"com.ParentClass"))
val children = db.asset.find(MongoDBObject("_id.brand": MongoDBObject($eq: "acme")),MongoDBObject("type":"com.ChildClass"), MongoDBObject("parentObject": "${parent._id.oid}"))
if(childs.nonEmpty) {
//I have confirmed this parent has a child associated and should be returned
val childModelNumbers = childs.map(child -> child.modelNumber)
val response = ResponseObject(parent, childModelNumbers)
}
Can I do this in an aggregate query?
Updated:
Mongo Version: db version v2.6.11
Language: Scala
Driver: Casbah 2.8.1
Technically yes, however what you're doing now is standard practice with mongodb. If you need to join collections of data frequently you probably should use an RDBMS. However if you occasionally need to aggregate data from 2 separate collection then there is $lookup. In general you'll find yourself populating data into your document from another document pretty frequently, but that's typically how a document database works. Using $lookup when you're really looking for an equivalent to a SQL JOIN isn't something I would recommend. It's not really the intended purpose, and you'd be better off doing exactly what you're doing now or moving to an RDBMS.
Addition:
Since the data is in the same collection you can use an aggregation to $group that data together. You just have to group them based on _id.brand.
A quick example(in mongodb shell):
db.asset.aggregate([
{ //start with a match to narrow down
$match: {"_id.brand": "acme"}
},
{ //group by `_id.brand`
$group: {
_id: "$_id.brand",
"parentObject": {$first: "$parentObject"},
title: {$first: "$title"},
modelNumer: {$addToSet: "$modelNumer"}, //accumulate child model # as array
modelLine: {$addToSet: "$modelLine"}, //accumulate child model line as array
availableDate: {$first: "$availableDate"},
expirationDate: {$first: "$expirationDate"}
}
}
]);
This should return documents where all the children's properties have been grouped into the parent document as an array(modelNumber,modelLine). It's probably better to do what you're doing above, it might even be better yet if you put the children into their own collection instead of keeping track of their type with a type field in the document. This way you know that the documents in each collection represent a given data structure. However, if you were to do that you would not be able to perform the aggregation in the example.
I'm currently working with mongodb (mongoose). One of my example documents is:
myuser{
_id: 7777...,
money: 1000,
ships:[{
_id: 7777...
name: "myshipname",
products:[{
product_id: 7777....,
quantity: 24
}]
}
}
What I aim to do is to get a certain product given user id, ship id and product id, being the result something like: { product_id: 777..,quantity:24}
So far I got to find a certain user ship with:
findOne(userId,ships:{ $elemMatch: {_id:shipId}})
Which returns the information of the ship with shipId inside the array ships from user with userId. However, I cannot find the way to get only a certain product from that ship
What you want can probably best be done using the aggregation framework. Something like:
db.users.aggregate([
{$match: { _id : <user>}},
{$unwind: "$ships"},
{$unwind: "$ships.products"},
{$match: { "ships._id": <ship>}},
{$match: { "ships.products.product_id": <product>}}
]);
Note I'm not on a computer with mongo right now, so my syntax might be a bit off.
I have no experience with NoSQL. So, I think, if I just try to ask about the code, my question can be incorrect. Instead, let me explain my problem.
Suppose I have e-store. I have catalogs
Catalogs = new Mongo.Collection('catalogs);
and products in that catalogs
Products = new Mongo.Collection('products');
Then, people add there orders to temporary collection
Order = new Mongo.Collection();
Then, people submit their comments, phone, etc and order. I save it to collection Operations:
Operations.insert({
phone: "phone",
comment: "comment",
etc: "etc"
savedOrder: Order //<- Array, right? Or Object will be better?
});
Nice, but when i want to get stats by every product, in what Operations product have used. How can I search thru my Operations and find every operation with that product?
Or this way is bad? How real pro's made this in real world?
If I understand it well, here is a sample document as stored in your Operation collection:
{
clientRef: "john-001",
phone: "12345678",
other: "etc.",
savedOrder: {
"someMetadataAboutOrder": "...",
"lines" : [
{ qty: 1, itemRef: "XYZ001", unitPriceInCts: 1050, desc: "USB Pen Drive 8G" },
{ qty: 1, itemRef: "ABC002", unitPriceInCts: 19995, desc: "Entry level motherboard" },
]
}
},
{
clientRef: "paul-002",
phone: null,
other: "etc.",
savedOrder: {
"someMetadataAboutOrder": "...",
"lines" : [
{ qty: 3, itemRef: "XYZ001", unitPriceInCts: 950, desc: "USB Pen Drive 8G" },
]
}
},
Given that, to find all operations having item reference XYZ001 you simply have to query:
> db.operations.find({"savedOrder.lines.itemRef":"XYZ001"})
This will return the whole document. If instead you are only interested in the client reference (and operation _id), you will use a projection as an extra argument to find:
> db.operations.find({"savedOrder.lines.itemRef":"XYZ001"}, {"clientRef": 1})
{ "_id" : ObjectId("556f07b5d5f2fb3f94b8c179"), "clientRef" : "john-001" }
{ "_id" : ObjectId("556f07b5d5f2fb3f94b8c17a"), "clientRef" : "paul-002" }
If you need to perform multi-documents (incl. multi-embedded documents) operations, you should take a look at the aggregation framework:
For example, to calculate the total of an order:
> db.operations.aggregate([
{$match: { "_id" : ObjectId("556f07b5d5f2fb3f94b8c179") }},
{$unwind: "$savedOrder.lines" },
{$group: { _id: "$_id",
total: {$sum: {$multiply: ["$savedOrder.lines.qty",
"$savedOrder.lines.unitPriceInCts"]}}
}}
])
{ "_id" : ObjectId("556f07b5d5f2fb3f94b8c179"), "total" : 21045 }
I'm an eternal newbie, but since no answer is posted, I'll give it a try.
First, start by installing robomongo or a similar software, it will allow you to have a look at your collections directly in mongoDB (btw, the default port is 3001)
The way I deal with your kind of problem is by using the _id field. It is a field automatically generated by mongoDB, and you can safely use it as an ID for any item in your collections.
Your catalog collection should have a string array field called product where you find all your products collection items _id. Same thing for the operations: if an order is an array of products _id, you can do the same and store this array of products _id in your savedOrder field. Feel free to add more fields in savedOrder if necessary, e.g. you make an array of objects products with additional fields such as discount.
Concerning your queries code, I assume you will find all you need on the web as soon as you figure out what your structure is.
For example, if you have a product array in your savedorder array, you can pull it out like that:
Operations.find({_id: "your operation ID"},{"savedOrder.products":1)
Basically, you ask for all the products _id in a specific operation. If you have several savedOrders in only one operation, you can specify too the savedOrder _id, if you used the one you had in your local collection.
Operations.find({_id: "your_operation_ID", "savedOrder._id": "your_savedOrder_ID"},{"savedOrder.products":1)
ps: to bad-ass coders here, if I'm doing it wrong, please tell me.
I find an answer :) Of course, this is not a reveal for real professionals, but is a big step for me. Maybe my experience someone find useful. All magic in using correct mongo operators. Let solve this problem in pseudocode.
We have a structure like this:
Operations:
1. Operation: {
_id: <- Mongo create this unique for us
phone: "phone1",
comment: "comment1",
savedOrder: [
{
_id: <- and again
productId: <- whe should save our product ID from 'products'
name: "Banana",
quantity: 100
},
{
_id:,
productId: <- Another ID, that we should save if order
name: "apple",
quantity: 50
}
]
And if we want to know, in what Operation user take "banana", we should use mongoDB operator"elemMatch" in Mongo docs
db.getCollection('operations').find({}, {savedOrder: {$elemMatch:{productId: "f5mhs8c2pLnNNiC5v"}}});
In simple, we get documents our saved order have products with id that we want to find. I don't know is it the best way, but it works for me :) Thank you!
I have two collections - shoppers (everyone in shop on a given day) and beach-goers (everyone on beach on a given day). There are entries for each day, and person can be on a beach, or shopping or doing both, or doing neither on any day. I want to now do query - all shoppers in last 7 days who did not go to beach.
I am new to Mongo, so it might be that my schema design is not appropriate for nosql DBs. I saw similar questions around join and in most cases it was suggested to denormalize. So one solution, I could think of is to create collection - activity, index on date, embed actions of user. So something like
{
user_id
date
actions {
[action_type, ..]
}
}
Insertion now becomes costly, as now I will have to query before insert.
A few of suggestions.
Figure out all the queries you'll be running, and all the types of data you will need to store. For example, do you expect to add activities in the future or will beach and shop be all?
Consider how many writes vs. reads you will have and which has to be faster.
Determine how your documents will grow over time to make sure your schema is scalable in the long term.
Here is one possible approach, if you will only have these two activities ever. One record per user per day.
{ user: "user1",
date: "2012-12-01",
shopped: 0,
beached: 1
}
Now your query becomes even simpler, whether you have two or ten activities.
When new activity comes in you always have to update the correct record based on it.
If you were thinking you could just append a record to your collection indicating user, date, activity then your inserts are much faster but your queries now have to do a LOT of work querying for both users, dates and activities.
With proposed schema, here is the insert/update statement:
db.coll.update({"user":"username", "date": "somedate"}, {"shopped":{$inc:1}}, true)
What that's saying is: "for username on somedate increment their shopped attribute by 1 and create it if it doesn't exist aka "upsert" (that's the last 'true' argument).
Here is the query for all users on a particular day who did activity1 more than once but didn't do any of activity2.
db.coll.find({"date":"somedate","shopped":0,"danced":{$gt:1}})
Be wary of picking a schema where a single document can have continuous and unbounded growth.
For example, storing everything in a users collection where the array of dates and activities keeps growing will run into this problem. See the highlighted section here for explanation of this - and keep in mind that large documents will keep getting into your working data set and if they are huge and have a lot of useless (old) data in them, that will hurt the performance of your application, as will fragmentation of data on disk.
Remember, you don't have to put all the data into a single collection. It may be best to have a users collection with a fixed set of attributes of that user where you track how many friends they have or other semi-stable information about them and also have a user_activity collection where you add records for each day per user what activities they did. The amount or normalizing or denormalizing of your data is very tightly coupled to the types of queries you will be running on it, which is why figure out what those are is the first suggestion I made.
Insertion now becomes costly, as now I will have to query before insert.
Keep in mind that even with RDBMS, insertion can be (relatively) costly when there are indices in place on the table (ie, usually). I don't think using embedded documents in Mongo is much different in this respect.
For the query, as Asya Kamsky suggest you can use the $nin operator to find everyone who didn't go to the beach. Eg:
db.people.find({
actions: { $nin: ["beach"] }
});
Using embedded documents probably isn't the best approach in this case though. I think the best would be to have a "flat" activities collection with documents like this:
{
user_id
date
action
}
Then you could run a query like this:
var start = new Date(2012, 6, 3);
var end = new Date(2012, 5, 27);
db.activities.find({
date: {$gte: start, $lt: end },
action: { $in: ["beach", "shopping" ] }
});
The last step would be on your client driver, to find user ids where records exist for "shopping", but not for "beach" activities.
One possible structure is to use an embedded array of documents (a users collection):
{
user_id: 1234,
actions: [
{ action_type: "beach", date: "6/1/2012" },
{ action_type: "shopping", date: "6/2/2012" }
]
},
{ another user }
Then you can do a query like this, using $elemMatch to find users matching certain criteria (in this case, people who went shopping in the last three days:
var start = new Date(2012, 6, 1);
db.people.find( {
actions : {
$elemMatch : {
action_type : { $in: ["shopping"] },
date : { $gt : start }
}
}
});
Expanding on this, you can use the $and operator to find all people went shopping, but did not go to the beach in the past three days:
var start = new Date(2012, 6, 1);
db.people.find( {
$and: [
actions : {
$elemMatch : {
action_type : { $in: ["shopping"] },
date : { $gt : start }
}
},
actions : {
$not: {
$elemMatch : {
action_type : { $in: ["beach"] },
date : { $gt : start }
}
}
}
]
});