MongoDB versioning with timestamps as keys - mongodb

I'm trying to implement versioning (based on storing the diffs between documents) in MongoDB as described in this post: Ways to implement data versioning in MongoDB.
I'm using the same approach so far but want to make queries like: "Give me all the changes between Date1 and Date2". The schema presented in the post uses just a plain incremental numbering as the key for the changes which doesn't allow this kind of querying:
{
_id : "id of address book record",
changes : {
1234567 : { "city" : "Omaha", "state" : "Nebraska" },
1234568 : { "city" : "Kansas City", "state" : "Missouri" }
}
}
My only guess is that to make time based queries I would have to include some time information in the key value. Perhaps an ObjectId or ISODate like so:
{
_id : "id of address book record",
changes : {
ISODate(Date) : { "city" : "Omaha", "state" : "Nebraska" },
ObjectId(id) : { "city" : "Kansas City", "state" : "Missouri" }
}
}
I googled quite a while for a solution but couldn't find anything really useful. If someone has an idea on how to do this, it would be great if you can help me out. It might even be that this is the completely wrong way to approach the problem and that there is a better way to solve this issue ... I am open to any suggestions!
Thanks for helping me out...

Using a IsoDate (or simple Date) for versioning is a good approach. However, in order to search for it efficiently, you shouldn't use it as a field identifier, but instead make the changes an array and put the date into the sub-documents in this array:
{
_id : "id of address book record",
changes : [
{ "change_date": ISODate(Date), "city" : "Omaha", "state" : "Nebraska" },
{ "change_date": ISODate(Date), "city" : "Kansas City", "state" : "Missouri" }
]
}
This allows you to use the changes.change_date field for queries with $gte or $lte. You can also find the most recent one with the $max aggregation grouping operator (not to be confused with the $max query operator, which does something completely different).
You might also want to add an index to changes.change_date to speed up these queries.

Related

Query on all descendants of node

Given a collection of MongoDb documents with a property "myContacts" like this:
{
"_id": 123,
"myContacts" : {
"contacts" : {
"10" : {
"_id" : NumberLong(10),
"name" : "c1",
"prop" : true
},
"20" : {
"_id" : NumberLong(20),
"name" : "c2"
},
}
}
}
I want to select all documents, where at least one contact lacks the "prop" field.
I figured out a general query:
db.getCollection('xyz').find({ 'myContacts.contacts.???.prop': { $exists: false } })
The problem is that IDs of the contacts are part of the path and I cannot know them ahead. I want sth like 'myContacts.contacts.$anyChild.prop', but cannot find sth similar in the mongo docs.
Does it mean there is no way to do it?
PS: I cannot change the document structure, a live app uses it. I've spent some time with Google and my bet it's not possible. I however would like an opinion from people who have experience with Mongo.
Thank you guys for helpful comments, this got me going! I could get the results I wanted with:
db.getCollection('xyz').aggregate([{$project: {_id:1, contacts:{$objectToArray: "$myContacts.contacts"}}}, {$match: {"contacts.v.prop" : null}}])

Using $last on Mongo Aggregation Pipeline

I searched for similar questions but couldn't find any. Feel free to point me in their direction.
Say I have this data:
{ "_id" : ObjectId("5694c9eed4c65e923780f28e"), "name" : "foo1", "attr" : "foo" }
{ "_id" : ObjectId("5694ca3ad4c65e923780f290"), "name" : "foo2", "attr" : "foo" }
{ "_id" : ObjectId("5694ca47d4c65e923780f294"), "name" : "bar1", "attr" : "bar" }
{ "_id" : ObjectId("5694ca53d4c65e923780f296"), "name" : "bar2", "attr" : "bar" }
If I want to get the latest record for each attribute group, I can do this:
> db.content.aggregate({$group: {_id: '$attr', name: {$last: '$name'}}})
{ "_id" : "bar", "name" : "bar2" }
{ "_id" : "foo", "name" : "foo2" }
I would like to have my data grouped by attr and then sorted by _id so that only the latest record remains in each group, and that's how I can achieve this. BUT I need a way to avoid naming all the fields that I want in the result (in this example "name") because in my real use case they are not known ahead.
So, is there a way to achieve this, but without having to explicitly name each field using $last and just taking all fields instead? Of course, I would sort my data prior to grouping and I just need to somehow tell Mongo "take all values from the latest one".
See some possible options here:
Do multiple find().sort() queries for each of the attr values you
want to search.
Grab the original _id of the $last doc, then do a findOne() for each of those values (this is the more extensible option).
Use the $$ROOT system variable as shown here.
This wouldn't be the quickest operation, but I assume you're using this more for analytics, not in response to a user behavior.
Edited to add slouc's example posted in comments:
db.content.aggregate({$group: {_id: '$attr', lastItem: { $last: "$$ROOT" }}}).

I have big database on mongodb and can't find and use my info

This my code:
db.test.find() {
"_id" : ObjectId("4d3ed089fb60ab534684b7e9"),
"title" : "Sir",
"name" : {
"_id" : ObjectId("4d3ed089fb60ab534684b7ff"),
"first_name" : "Farid"
},
"addresses" : [
{
"city" : "Baku",
"country" : "Azerbaijan"
},{
"city" : "Susha",
"country" : "Azerbaijan"
},{
"city" : "Istanbul",
"country" : "Turkey"
}
]
}
I want get output only all city. Or I want get output only all country. How can i do it?
I'm not 100% about your code example, because if your 'find' by ID there's no need to search by anything else... but I wonder whether the following can help:
db.test.insert({name:'farid', addresses:[
{"city":"Baku", "country":"Azerbaijan"},
{"city":"Susha", "country":"Azerbaijan"},
{"city" : "Istanbul","country" : "Turkey"}
]});
db.test.insert({name:'elena', addresses:[
{"city" : "Ankara","country" : "Turkey"},
{"city":"Baku", "country":"Azerbaijan"}
]});
Then the following will show all countries:
db.test.aggregate(
{$unwind: "$addresses"},
{$group: {_id:"$country", countries:{$addToSet:"$addresses.country"}}}
);
result will be
{ "result" : [
{ "_id" : null,
"countries" : [ "Turkey", "Azerbaijan"]
}
],
"ok" : 1
}
Maybe there are other ways, but that's one I know.
With 'cities' you might want to take more care (because I know cities with the same name in different countries...).
Based on your question, there may be two underlying issues here:
First, it looks like you are trying to query a Collection called "test". Often times, "test" is the name of an actual database you are using. My concern, then, is that you are trying to query the database "test" to find any collections that have the key "city" or "country" on any of the internal documents. If this is the case, what you actually need to do is identify all of the collections in your database, and search them individually to see if any of these collections contain documents that include the keys you are looking for.
(For more information on how the db.collection.find() method works, check the MongoDB documentation here: http://docs.mongodb.org/manual/reference/method/db.collection.find/#db.collection.find)
Second, if this is actually what you are trying to do, all you need to for each collection is define a query that only returns the key of the document you are looking for. If you get more than 0 results from the query, you know documents have the "city" key. If they don't return results, you can ignore these collections. One caveat here is if data about "city" is in embedded documents within a collection. If this is the case, you may actually need to have some idea of which embedded documents may contain the key you are looking for.

Retrieving only the relevant part of a stored document

I'm a newbie with MongoDB, and am trying to store user activity performed on a site. My data is currently structured as:
{ "_id" : ObjectId("4decfb0fc7c6ff7ff77d615e"),
"activity" : [
{
"action" : "added",
"item_name" : "iPhone",
"item_id" : 6140,
},
{
"action" : "added",
"item_name" : "iPad",
"item_id" : 7220,
}
],
"name" : "Smith,
"user_id" : 2
}
If I want to retrieve, for example, all the activity concerning item_id 7220, I would use a query like:
db.find( { "activity.item_id" : 7220 } );
However, this seems to return the entire document, including the record for item 6140.
Can anyone suggest how this might be done correctly? I'm not sure if it's a problem with my query, or with the structure of the data itself.
Many thanks.
You have to wait the following dev: https://jira.mongodb.org/browse/SERVER-828
You can use $slice only if you know insertion order and position of your element.
Standard queries on MongoDb always return all document.
(question also available here: MongoDB query to return only embedded document)

Suitability of MongoDB for hierarchial type queries

I have a particular data manipulation requirement that I have worked out how to do in SQL Server and PostgreSQL. However, I'm not too happy with the speed, so I am investigating MongoDB.
The best way to describe the query is as follows. Picture the hierarchical data of the USA: Country, State, County, City. Let's say a particular vendor can service the whole of California. Another can perhaps service only Los Angeles. There are potentially hundreds of thousands of vendors and they all can service from some point(s) in this hierarchy down. I am not confusing this with Geo - I am using this to illustrate the need.
Using recursive queries, it is quite simple to get a list of all vendors who could service a particular user. If he were in say Pasadena, Los Angeles, California, we would walk up the hierarchy to get the applicable IDs, then query back down to find the vendors.
I know this can be optimized. Again, this is just a simple query example.
I know MongoDB is a document store. That suits other needs I have very well. The question is how well suited is it to the query type I describe? (I know it doesn't have joins - those are simulated).
I get that this is a "how long is a piece of string" question. I just want to know if anyone has any experience with MongoDB doing this sort of thing. It could take me quite some time to go from 0 to tested, and I'm looking to save time if MongoDB is not suited to this.
EXAMPLE
A local movie store "A" can supply Blu-Rays in Springfield. A chain store "B" with state-wide distribution can supply Blu-Rays to all of IL. And a download-on-demand store "C" can supply to all of the US.
If we wanted to get all applicable movie suppliers for Springfield, IL, the answer would be [A, B, C].
In other words, there are numerous vendors attached at differing levels on the hierarchy.
I realize this question was asked nearly a year ago, but since then MongoDB has an officially supported solution for this problem, and I just used their solution. Refer to their documentation here: https://docs.mongodb.com/manual/tutorial/model-tree-structures-with-materialized-paths/
The concept relating closest to your question is named "partial path."
While it may feel a bit heavy to embed ancestor data; this approach is the most suitable way to solve your problem in MongoDB. The only pitfall to this, that I've experienced so far, is that if you're storing all of this in a single document you can hit the, as of this time, 16MB document size limit when working with enough data (although, I can only see this happening if you're using this structure to track user referrals [which could reach millions] rather than US cities [which is upwards of 26,000 according to the latest US Census]).
References:
http://www.mongodb.org/display/DOCS/Schema+Design
http://www.census.gov/geo/www/gazetteer/places2k.html
Modifications:
Replaced link: http://www.mongodb.org/display/DOCS/Trees+in+MongoDB
Note that this question was also asked on the google group. See http://groups.google.com/group/mongodb-user/browse_thread/thread/5cd5edd549813148 for that disucssion.
One option is to use an array key. You can store the hierarchy as an
array of values (for example ['US','CA','Los Angeles']). Then you can
query against records based on individual elements in that array key
For example:
First, store some documents with the array value representing the
hierarchy
> db.hierarchical.save({ location: ['US','CA','LA'], name: 'foo'} )
> db.hierarchical.save({ location: ['US','CA','SF'], name: 'bar'} )
> db.hierarchical.save({ location: ['US','MA','BOS'], name: 'baz'} )
Make sure we have an index on the location field so we can perform
fast queries against its values
> db.hierarchical.ensureIndex({'location':1})
Find all records in California
> db.hierarchical.find({location: 'CA'})
{ "_id" : ObjectId("4d9f69cbf88aea89d1492c55"), "location" : [ "US", "CA", "LA" ], "name" : "foo" }
{ "_id" : ObjectId("4d9f69dcf88aea89d1492c56"), "location" : [ "US", "CA", "SF" ], "name" : "bar" }
Find all records in Massachusetts
> db.hierarchical.find({location: 'MA'})
{ "_id" : ObjectId("4d9f6a21f88aea89d1492c5a"), "location" : [ "US", "MA", "BOS" ], "name" : "baz" }
Find all records in the US
> db.hierarchical.find({location: 'US'})
{ "_id" : ObjectId("4d9f69cbf88aea89d1492c55"), "location" : [ "US", "CA", "LA" ], "name" : "foo" }
{ "_id" : ObjectId("4d9f69dcf88aea89d1492c56"), "location" : [ "US", "CA", "SF" ], "name" : "bar" }
{ "_id" : ObjectId("4d9f6a21f88aea89d1492c5a"), "location" : [ "US", "MA", "BOS" ], "name" : "baz" }
Note that in this model, your values in the array would need to be
unique. So for example, if you had 'springfield' in different states,
then you would need to do some extra work to differentiate.
> db.hierarchical.save({location:['US','MA','Springfield'], name: 'one' })
> db.hierarchical.save({location:['US','IL','Springfield'], name: 'two' })
> db.hierarchical.find({location: 'Springfield'})
{ "_id" : ObjectId("4d9f6b7cf88aea89d1492c5b"), "location" : [ "US", "MA", "Springfield"], "name" : "one" }
{ "_id" : ObjectId("4d9f6b86f88aea89d1492c5c"), "location" : [ "US", "IL", "Springfield"], "name" : "two" }
You can overcome this by using the $all operator and specifying more
levels of the hierarchy. For example:
> db.hierarchical.find({location: { $all : ['US','MA','Springfield']} })
{ "_id" : ObjectId("4d9f6b7cf88aea89d1492c5b"), "location" : [ "US", "MA", "Springfield"], "name" : "one" }
> db.hierarchical.find({location: { $all : ['US','IL','Springfield']} })
{ "_id" : ObjectId("4d9f6b86f88aea89d1492c5c"), "location" : [ "US", "IL", "Springfield"], "name" : "two" }