MongoDB C driver _id generation

I use mongo_insert() three times to insert my data in three different collections. The problem is that the "_id" field must be exactly the same in each of the collections, but I do not know how to (ideally) recover and reuse the "_id" field generated in my first mongo_insert...
Please advise me how to do it.

Normally, you would keep a separate field, such as CustomId, for your own needs and leave _id for MongoDB to generate.
But if you still need it to be exactly the same, there are two options:
1) Generate the _id yourself and set it on each document before inserting.
2) Save the first document, read it back, take its _id, and set it on the other documents.
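
Option 1 can be sketched without a server. The snippet below is only an illustration in Python, not C driver code: plain dicts stand in for the documents, `uuid.uuid4().hex` stands in for whatever id generator you prefer, and `build_docs` is a hypothetical helper name.

```python
import uuid


def build_docs(payloads):
    """Return one document per payload, all sharing a single pre-generated _id."""
    # Assumption: a unique string is used as _id instead of a driver-generated ObjectId.
    shared_id = uuid.uuid4().hex
    # Copy each payload so the caller's dicts are left untouched.
    return [dict(payload, _id=shared_id) for payload in payloads]


docs = build_docs([{"name": "a"}, {"total": 3}, {"tags": ["x"]}])
# Each element of docs would then be inserted into its own collection.
```

Because the id exists before any insert happens, nothing has to be read back from the first collection.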

Related

create new collection in mongodb by adding index in system.indexes

When I manually insert a new document into the system.indexes collection in MongoDB, a new collection is created. Here is the document:
{
"v" : 1,
"key" : {
"code" : 1
},
"name" : "code_1",
"ns" : "mydb.collection"
}
where collection is my collection name, which does not already exist in the database, and mydb is my database name. Why is a new collection being created?
Is it possible to create a collection by adding an index manually to system.indexes?
Why are you asking us this? You already tried to add a new index to system.indexes. Was a new collection created? If yes, then it is possible; if no, then it is not.
Is this a correct way?
What do you think? Have you read anywhere in the documentation that, in order to create a new collection, you need to dance around and manually create indexes in some system-defined collection? Or maybe the documentation says that db.createCollection(name, options) is what you should use, or, if you prefer, that you can just insert a document into a non-existent collection and it will be created.
So why, after all this, would one think that the correct way is to manipulate system.indexes?
As a complement to @Salvador Dali's answer, which strongly discourages modifying system.indexes directly: if for some reason you really don't want to or can't use createCollection, remember that it is just a wrapper around the create command.
You can issue that command yourself to create a new collection:
db.runCommand( { create: "collection" } )
As for inserting an entry in system.indexes, from the docs:
Deprecated since version 3.0: Access this data using listIndexes.
The <database>.system.indexes collection lists all the indexes in the database.
From that, it appears that system.indexes should be considered read-only (its direct use has even been deprecated since 3.0). The behavior you observed should be considered unspecified, and therefore unreliable and subject to change without notice.
If you really need to understand why it behaves that way, you could look at the source code or ask on the MongoDB developer mailing list, where you are more likely to get the full picture.

How to compare all records of two collections in mongodb using mapreduce?

I have a use case in which I want to compare each record of two collections in MongoDB and, after comparing them, find the mismatched fields of every record.
For example, in collection1 I have the record {id : 1, name : "bks"}
and in collection2 I have the record {id : 1, name : "abc"}
When I compare these two records, which share the same key, name is a mismatched field because its value differs.
I am thinking of implementing this with mapReduce in MongoDB, but I am having trouble accessing the other collection by name in the map function. When I tried to compare inside the map function, I got this error: "errmsg" : "exception: ReferenceError: db is not defined near '
Can anyone give me some thoughts on how to compare records using mapreduce?
It might have helped you to read the documentation:
When upgrading to MongoDB 2.4, you will need to refactor your code if your map-reduce operations, group commands, or $where operator expressions include any global shell functions or properties that are no longer available, such as db.
So from your error fragment, you appear to be referencing db in order to access another collection. You cannot do that.
If you do intend to "compare" items in one collection with those in another, then there is no approach other than looping code:
db.collection.find().forEach(function(doc) {
    // Look up the document with the same _id in the other collection
    var another = db.anothercollection.findOne({ "_id": doc._id });
    // Code to compare
})
There is simply no concept of "joins" as such available to MongoDB, and operations such as mapReduce or aggregate or others strictly work with one collection only.
The exception is db.eval(), but as per all the strict warnings in the documentation, this is almost always a very bad idea.
Live with your comparison in looping code.
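
The "code to compare" step is left open above. One way it might look, sketched in Python on plain dicts rather than on live collections (`mismatched_fields` is a hypothetical helper name, and treating a missing field as a mismatch is an assumption):

```python
def mismatched_fields(left, right):
    """Return the sorted names of fields whose values differ between two records."""
    missing = object()  # sentinel so an absent field differs from a field set to None
    keys = set(left) | set(right)
    return sorted(k for k in keys if left.get(k, missing) != right.get(k, missing))


print(mismatched_fields({"id": 1, "name": "bks"}, {"id": 1, "name": "abc"}))  # ['name']
```

Inside the loop, the two arguments would be the document from the first collection and the one fetched from the second.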

MongoDB: Dynamically generated field value

Suppose that I have a database with student information:
{'student_name' : 'Alen', 'subjects' : {'cse101' : 4, 'cse102' : 3, 'cse201' : 4}}
Suppose I need to store the aggregate information of the student as well. I can add the field 'aggregate' : 3.67 to the record. But the aggregate changes when another subject is added to the subjects list. Is there a way I can write a "dynamic field" which could calculate the aggregate whenever requested? Something like student['aggregate'] which is not persistent but available when needed?
P.S: Aggregate is just a simple example. I am dealing with something more complex involving various other fields of the element.
There are no dynamic or calculated fields in MongoDB at the moment (although there are some tickets in JIRA).
But you can always implement this functionality in the app code.
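
As a rough idea of what "in the app code" could mean for the example record, here is a Python sketch; `with_aggregate` is a hypothetical helper, and rounding the mean to two decimals is an assumption taken from the 3.67 in the question.

```python
def with_aggregate(student):
    """Return a copy of the student record with 'aggregate' computed on demand."""
    grades = list(student.get("subjects", {}).values())
    # The mean is recomputed on every call, so it never goes stale and is never stored.
    aggregate = round(sum(grades) / len(grades), 2) if grades else None
    return dict(student, aggregate=aggregate)


student = {"student_name": "Alen", "subjects": {"cse101": 4, "cse102": 3, "cse201": 4}}
print(with_aggregate(student)["aggregate"])  # 3.67
```

The stored document is left unchanged; the derived value exists only in the copy handed to the caller.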

The fastest way to show Documents with certain property first in MongoDB

I have collections with a huge number of Documents on which I need to run custom searches with various queries.
Each Document has a boolean property. Let's call it "isInTop".
I need to show Documents that have this property set first in all queries.
Yes, I can easily sort on this field like this:
.sort( { isInTop: -1 } );
And create a proper index with "isInTop" as the last field in it. But this will be slow, as indexes in Mongo work best with unique fields.
So is there a solution for showing Documents with the "isInTop" field at the top of every query?
I see two solutions here.
First: give the Documents which need to be on top an _id from the "future". As you know, ObjectId contains a timestamp, so I can create an ObjectId with a timestamp from the future and use natural order.
Second: create a separate collection for the Documents which need to be on top, and query it first.
Are there any other solutions to this problem? Which will work faster?
UPDATE
I solved this by sorting on a custom field which represents rank.
Using the _id field trick you mention has the problem that at some point in time you will reach the special time, and you can't change the _id field (without inserting a new document and removing the old one).
Creating a special collection which just holds the ones you care about is probably the best option. It gives you the ability to logically (and to some extent, physically) separate the documents.
MongoDB also recently introduced support for "sparse" indexes, which may fulfill your needs as well. You could set the "isInTop" field only on the documents you want to be special, and then create a sparse index on it, which would not have the problems you would normally have with a single indexed boolean field (in btrees).

id autoincrement/sequence emulation with CassandraDB/MongoDB etc

I'm trying to build a small web system (URL shortening) using the NoSQL database Cassandra; the problem I'm stuck on is id auto-generation.
Has anyone already run into this problem?
Thanks.
P.S. UUIDs do not work for me; I need to use ALL numbers from 0 to Long.MAX_VALUE (Java), so I need something that works exactly like an SQL sequence.
UPDATED:
The reason I'm not OK with GUID ids lies in the scope of my application.
My app has a URL-shortening part, and I need to make the URL as short as possible. So I take the following approach: I take numbers starting from 0 and convert them to a base64 string. As a result I get a URL like mysite.com/QA (where QA is the base64 string).
This was very easy to implement with an SQL DB: I just took the auto-incremented ID, converted it to a URL, and was 100 percent sure that the URL was unique.
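
The number-to-short-string step described above might look roughly like this in Python. The URL-safe alphabet and the positional (most significant digit first) encoding are assumptions, not necessarily the scheme the question used, and `encode_id` / `decode_id` are hypothetical names.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"


def encode_id(n):
    """Encode a non-negative integer as a base-64 positional string."""
    if n < 0:
        raise ValueError("id must be non-negative")
    digits = []
    while True:
        n, rem = divmod(n, 64)
        digits.append(ALPHABET[rem])
        if n == 0:
            break
    # Digits were collected least significant first, so reverse them.
    return "".join(reversed(digits))


def decode_id(s):
    """Invert encode_id: turn the short string back into the integer."""
    n = 0
    for ch in s:
        n = n * 64 + ALPHABET.index(ch)
    return n
```

Because the mapping is one-to-one, the short strings are unique as long as the underlying sequence numbers are.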
I don't know about Cassandra, but with Mongo you can have an atomic sequence (it won't scale, but it will work the way it should, even in a sharded environment if the query includes the shard key).
It can be done by using the findandmodify command.
Suppose we have a special collection named sequences and we want a sequence for post numbers (named postid). You could use code similar to this:
> db.runCommand( { "findandmodify" : "sequences",
                   "query" : { "name" : "postid" },
                   "update" : { $inc : { "id" : 1 } },
                   "new" : true } );
This command will return atomically the updated (new) document together with status. The value field contains the returned document if the command completed successfully.
Autoincrement IDs inherently don't scale well as they need a single source to generate the numbers. This is why shardable/replicatable databases such as MongoDB use longer, GUID-like identifiers for objects. Why do you need LONG values so badly?
You might be able to do it using atomic increments, retaining the old value, but I'm not sure. This would be limited to single server setups only.
I'm not sure I follow you. What language are you using? Are we talking about UUIDs?
The following is how you generate UUIDs in some languages:
java.util.UUID.randomUUID(); // (Java) variant 2, version 4
import uuid  # (Python)
uuid.uuid1()  # version 1