I am using MongoDB for a store, and I need to find how frequently an item sells. I know the logic but not the MongoDB syntax. Assume I have 3 items: itemA was sold at "2015-08-25 00:28:41", itemB at "2015-08-25 00:29:05", and itemC at "2015-08-25 00:30:02". I need to compute C-B and B-A, add the two differences, and divide by 2. How can I write a query that does this for many items, for example 100? Thanks.
It seems your question is more basic: how to query MongoDB.
If your collection is named 'store', you would use:
db.store.find() // This returns all records.
To sort by date, add .sort({ date: -1 }), which orders the results newest first.
Adding .limit(100) then caps the number of results returned, and you can carry on with whatever logic you need:
db.store.find().sort({ date: -1}).limit(100)
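Once the sale timestamps are fetched, the averaging step described in the question can be done in application code. Here is a minimal sketch in plain JavaScript, assuming the timestamps are available as ISO 8601 strings; the helper name averageIntervalSeconds is made up for illustration and is not a MongoDB function:

```javascript
// Hypothetical helper: average gap, in seconds, between consecutive sales.
function averageIntervalSeconds(timestamps) {
  // Convert to milliseconds and sort ascending so gaps are never negative.
  const times = timestamps
    .map((t) => new Date(t).getTime())
    .sort((a, b) => a - b);
  if (times.length < 2) return null; // no interval with fewer than two sales

  let total = 0;
  for (let i = 1; i < times.length; i++) {
    total += times[i] - times[i - 1]; // gap between neighbouring sales
  }
  return total / (times.length - 1) / 1000;
}

// The three sales from the question, written as UTC ISO strings (assumption).
const sales = [
  "2015-08-25T00:28:41Z", // itemA
  "2015-08-25T00:29:05Z", // itemB
  "2015-08-25T00:30:02Z", // itemC
];
const avg = averageIntervalSeconds(sales);
console.log(avg); // (24 s + 57 s) / 2 = 40.5
```

Because the consecutive gaps add up to (last - first), the same value is (last - first) / (n - 1), so for 100 items only the earliest timestamp, the latest timestamp, and the count are strictly needed.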
Related
I am trying to take an extract from a huge MongoDB collection.
In particular, the collection contains 2.65TB data (unzipped), i.e., 600GB data (zipped). Each document has a deep hierarchy and a couple of arrays and I want to extract some parts out of them. In this collection we have multiple documents for each customer id. Since I want to export the most active document for each customer, I need to group and take the records with the maximum timestamp field and perform some further processing on them. I need some help in forming the query for the export. I have tried to sort the documents per customer id, but this could not be achieved in an acceptable time when combined with a 'match' construct (this is needed since it is a huge collection and we try to create the export in parts). Currently the query looks like this:
db.getCollection('CEM').aggregate([
{'$match' : {'LiveFeed.customer.profile.id':'TCAYT2RY2PF93R93JVSUGU7D3'}},
{'$project':{'LiveFeed.customer.profile.id':1,'LiveFeed.customer.profile.products.air.flights':1, 'LiveFeed.context.timestamp':1}},
{'$sort':{'LiveFeed.customer.profile.id':1,"LiveFeed.context.timestamp":1}},
{'$group':{'_id':'$LiveFeed.customer.profile.id',
'products':{'$last':'$LiveFeed.customer.profile.products.air.flights'}}},
{'$unwind': '$products'},
{'$unwind': '$products.sources'},
{'$project':{'_id':0,
'ceid': '$_id',
'coupon_no':{'$ifNull':['$products.couponId.couponNumber', ""]},
'ticket_no':{'$ifNull':['$products.couponId.ticketId.number','']},
'pnr_id':'$products.sources.id',
'departure_date':'$products.segment.departure.at',
'departure_airport':'$products.segment.departure.code',
'arrival_airport':'$products.segment.arrival.code',
'created_date':'$products.createdAt'}}])
Any ideas or suggestions on how to improve this query would be very helpful. Thanks in advance!
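It is difficult to answer this without knowing the indexes on your collection. However, you may be able to save some time by eliminating stage 3: $group does not guarantee the order of its output documents (see "$group does not preserve order"). Be aware, though, that $last picks the last document in the order the documents arrive at the $group stage, so check that the results still match before dropping the $sort.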
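To make the intent of the $sort and $last stages concrete, here is a small in-memory sketch in plain JavaScript (not a MongoDB query) that keeps the document with the greatest timestamp for each customer id. The field names customerId and timestamp are simplified stand-ins for the nested paths in the question, and the sample data is invented:

```javascript
// Hypothetical helper: pick the most recent document per customer in one pass.
function latestPerCustomer(docs) {
  const latest = new Map();
  for (const doc of docs) {
    const current = latest.get(doc.customerId);
    // Keep the document only if it is newer than the one seen so far.
    if (current === undefined || doc.timestamp > current.timestamp) {
      latest.set(doc.customerId, doc);
    }
  }
  return [...latest.values()];
}

// Invented sample data for illustration only.
const docs = [
  { customerId: "X1", timestamp: 10, flights: ["old"] },
  { customerId: "X1", timestamp: 30, flights: ["newest"] },
  { customerId: "X2", timestamp: 20, flights: ["only"] },
  { customerId: "X1", timestamp: 25, flights: ["middle"] },
];
const latestDocs = latestPerCustomer(docs);
console.log(latestDocs);
```

This only shows the result the pipeline is after; whether the server can reach it quickly still depends on the indexes.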
I am struggling to work out how to properly sort data from MongoDB. The collection uses a 2dsphere index and has a createdAt timestamp. The goal is to show the latest pictures (that is what this collection is about; each document is essentially just a mediaUrl field), but they have to be close to the user. I'm not very familiar with complex MongoDB aggregation queries, so I thought this would be a good place to ask. Sorting with $near returns items sorted by distance only. But upload time matters too: for example, an item that is 5 minutes old but 500 meters farther away than an older item should still be sorted higher.
An ugly way would be to iterate every few hundred meters and collect data, but maybe there's a smarter way?
So if I understand correctly, you want to sort on 2 fields?
distance
timestamp
You should check out the $sort aggregation stage:
https://docs.mongodb.com/manual/reference/operator/aggregation/sort/
It allows you to sort on multiple fields.
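One possible way to combine the two criteria, sketched under assumptions: limit the search radius with a $geoNear stage, which writes the computed distance into a field, and then $sort by createdAt with distance as a tie-breaker. The function name, the distance field name, and the 500 meter radius below are illustrative choices, not something from the question. $geoNear has to be the first stage of the pipeline and needs a geospatial index such as the 2dsphere one mentioned.

```javascript
// Hypothetical builder for an aggregation pipeline: nearby pictures, newest first.
function buildNearbyLatestPipeline(lng, lat, maxMeters) {
  return [
    {
      $geoNear: {
        near: { type: "Point", coordinates: [lng, lat] }, // GeoJSON order: lng, lat
        distanceField: "distance", // assumed output field name
        maxDistance: maxMeters,    // meters when "near" is a GeoJSON point
        spherical: true,
      },
    },
    // Newest first; among equal timestamps, closer items first.
    { $sort: { createdAt: -1, distance: 1 } },
  ];
}

const pipeline = buildNearbyLatestPipeline(13.4, 52.5, 500);
console.log(JSON.stringify(pipeline, null, 2));
// In the shell this would be passed to aggregate() on the pictures collection.
```

If recency and distance should be traded off against each other rather than applied one after the other, a computed score field would be needed before the $sort; that weighting is application-specific.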
I'm trying to store "Votes" in MongoDB and I am stuck on how to proceed in an efficient way.
Basically, I have a question with several options like A B C D ... (6 total).
I am giving voters the option to choose one and want to save the "Vote" with fields like:
MongoDate, option, voter name, and maybe a couple more fields.
I expect an unbounded number of "Votes" on a given question, in the thousands or even millions.
In terms of retrieving the data: I would like to be able to query it mainly by date and present it in charts, like a stock price with hourly, daily, monthly... intervals.
In other words, it is like a time series.
I am not sure about the "format" of the document in MongoDB.
One reasonable way to do it would be to have a votes collection, where each document looks like:
{
v: 'a', //voted for the first option
d: new Date(), //the date
n: 'Bob',
...
}
Then, index on the date field. If you end up having to shard this collection, be careful not to shard on the date field alone. I used single-character field names because MongoDB stores the name of every field in every document, so shorter names are more space-efficient. If you aren't concerned about space, longer, more informative names are probably fine.
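For the charting side, the hourly counts can be derived from documents of this shape. The following is a minimal in-memory sketch in plain JavaScript, assuming each vote has the v and d fields shown above; the function name countVotesPerHour and the sample votes are invented for illustration:

```javascript
// Hypothetical helper: count votes per option within each UTC hour.
function countVotesPerHour(votes) {
  const counts = {};
  for (const vote of votes) {
    // Truncate the ISO timestamp to the hour, e.g. "2015-08-25T00".
    const hour = vote.d.toISOString().slice(0, 13);
    counts[hour] = counts[hour] || {};
    counts[hour][vote.v] = (counts[hour][vote.v] || 0) + 1;
  }
  return counts;
}

// Invented sample votes.
const votes = [
  { v: "a", d: new Date("2015-08-25T00:28:41Z"), n: "Bob" },
  { v: "a", d: new Date("2015-08-25T00:29:05Z"), n: "Ann" },
  { v: "b", d: new Date("2015-08-25T00:30:02Z"), n: "Cid" },
  { v: "a", d: new Date("2015-08-25T01:05:00Z"), n: "Dee" },
];
const hourly = countVotesPerHour(votes);
console.log(hourly);
```

Daily or monthly buckets follow the same pattern with a shorter slice of the ISO string (10 or 7 characters).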
Let's say we have a user collection and a post collection. In the post collection, the vote field stores the names of the users who voted.
db.user.insert({name:'a', age:12});
db.user.insert({name:'b', age:12});
db.user.insert({name:'c', age:22});
db.user.insert({name:'d', age:22});
db.post.insert({Title:'Title1', vote:['a']});
db.post.insert({Title:'Title2', vote:['a','b']});
db.post.insert({Title:'Title3', vote:['a','b','c']});
db.post.insert({Title:'Title4', vote:['a','b','c','d']});
We would like to group by post.Title and count the votes for each user age.
> {_id:'Title1', value:{ ages:[{age:12, Count:1},{age:22, Count:0}]} }
> {_id:'Title2', value:{ ages:[{age:12, Count:2},{age:22, Count:0}]} }
> {_id:'Title3', value:{ ages:[{age:12, Count:2},{age:22, Count:1}]} }
> {_id:'Title4', value:{ ages:[{age:12, Count:2},{age:22, Count:2}]} }
I have searched and have not found a way to access two collections in a MongoDB map-reduce.
Could this be achieved in a re-reduce?
I know it would be much simpler to embed the user document in the post, but that is not a good approach because the real user document has many properties. If we embed a simplified version of the user document, it will limit the dimensions available for analysis.
{Title:'Title1', vote:[{name:'a', age:12}]}
MongoDB does not have a multi-collection Map / Reduce. MongoDB does not have any JOIN syntax and may not be very good for ad-hoc joins. You will need to denormalize this data in some way.
You have a few options:
Option #1: Embed the age with the vote.
{Title:'Title1', vote:[{name:'a', age:12}]}
Option #2: Keep a counter of the ages
{Title:'Title2', vote:['a', 'b'], age: { "12" : 2, "22" : 0 }}
Option #3: Do a "manual" join
Your last option is to write a script that loops over both collections and merges the data correctly.
So you would loop over post and output a collection with the title and the list of votes. Then you would loop through the new collection and update the ages by looking up each user.
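A rough sketch of such a manual join in plain JavaScript, run here against in-memory copies of the sample data from the question rather than against the database; the function name countVotesByAge is made up for illustration:

```javascript
// In-memory copies of the sample documents from the question.
const users = [
  { name: "a", age: 12 },
  { name: "b", age: 12 },
  { name: "c", age: 22 },
  { name: "d", age: 22 },
];
const posts = [
  { Title: "Title1", vote: ["a"] },
  { Title: "Title2", vote: ["a", "b"] },
  { Title: "Title3", vote: ["a", "b", "c"] },
  { Title: "Title4", vote: ["a", "b", "c", "d"] },
];

// Hypothetical helper: join votes to users by name and count votes per age.
function countVotesByAge(posts, users) {
  const ageByName = new Map(users.map((u) => [u.name, u.age]));
  // Every distinct age, so that ages with zero votes are still reported.
  const allAges = [...new Set(users.map((u) => u.age))].sort((x, y) => x - y);

  return posts.map((post) => {
    const counts = new Map(allAges.map((age) => [age, 0]));
    for (const name of post.vote) {
      const age = ageByName.get(name);
      if (age !== undefined) counts.set(age, counts.get(age) + 1);
    }
    return {
      _id: post.Title,
      value: { ages: allAges.map((age) => ({ age, Count: counts.get(age) })) },
    };
  });
}

const joined = countVotesByAge(posts, users);
console.log(JSON.stringify(joined[2]));
```

Building the name-to-age lookup once keeps the loop over posts cheap, but it assumes the user list fits in memory.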
My suggestion
Go with #1 or #2.
Instead of
{name:'a', age:12}
it is easier to add a new field to the user document and maintain it on each vote update. You can then still use map-reduce to analyze your data:
{name:'a', age:12, voteTitle:["Title1","Title2","Title3","Title4"]}
I have a collection named Events. Each Event document has a collection of Participants as embedded documents.
My question is: is there a way to query an Event and get only the Participants that match a condition, e.g. Age > 18?
When you query a collection in MongoDB, by default it returns the entire document which matches the query. You could slice it and retrieve a single subdocument if you want.
If all you want is the Participants who are older than 18, it would probably be best to do one of two things:
Store them in a subdocument inside the event document, called "Over18" or something similar. Insert them into that subdocument (and possibly the other one too, if you want), and then when you query the collection, you can instruct the database to return only the "Over18" subdocument. The downside is that you store your participants in two different subdocuments and you have to work out their age before inserting. This may or may not be feasible depending on your application. If you need to check against arbitrary ages (i.e. sometimes it's 18 but sometimes it's 21 or 25, etc.), this will not work.
Query the collection, retrieve the Participants subdocument, and then filter it in your application code. Despite what some people may believe, this isn't terrible, because you don't want your database doing too much work all the time. Offloading the computation to your application can actually benefit your database, because it can spend more time querying and less time filtering. That leads to better scalability in the long run.
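The second approach, filtering in application code, can be sketched in a few lines of plain JavaScript. This assumes the event has already been fetched and that the embedded array is called participants with a numeric age on each entry; the function name and the sample event are invented for illustration:

```javascript
// Hypothetical helper: keep only participants strictly older than minAge.
function participantsOlderThan(event, minAge) {
  // Tolerate events that have no participants array at all.
  return (event.participants || []).filter((p) => p.age > minAge);
}

// Invented sample event, shaped like a fetched document.
const event = {
  name: "Summer party",
  participants: [
    { firstName: "Ann", age: 17 },
    { firstName: "Bob", age: 18 },
    { firstName: "Cid", age: 25 },
  ],
};
const adults = participantsOlderThan(event, 18);
console.log(adults);
```

Because the threshold is a parameter, the same function covers the arbitrary-age case (21, 25, and so on) that the first approach cannot handle.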
Short answer: no. I tried to do the same a couple of months back, but MongoDB does not support it (at least in versions <= 1.8). The same question has almost certainly been asked in their Google Group. You can either store the participants in a separate collection or fetch the whole documents and then filter them on the client. Far from ideal, I know. I'm still trying to figure out the best way around this limitation.
For future reference: This will be possible in MongoDB 2.2 using the new aggregation framework, by aggregating like this:
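db.events.aggregate([
  { $unwind: '$participants' },                     // one document per participant
  { $match: { 'participants.age': { $gt: 18 } } },  // keep participants older than 18
  { $project: { participants: 1 } }                 // return only _id and the participant
])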
This will return a list of n documents, where n is the number of participants older than 18 and each entry looks like this (note that the "participants" field now holds a single entry instead of an array):
{
_id: objectIdOfTheEvent,
participants: { firstName: 'only one', lastName: 'participant'}
}
It could probably even be flattened on the server to return a plain list of participants. See the official documentation for more information.