mgo NewObjectId corrupt on insert - mongodb

If I generate a new object id for a document in mgo:
obId := bson.NewObjectId()
and then insert it, it ends up in mongo (looking via the cli) as
"_id" : "U�`�\u0006#�\rU\u0000\u0000\u0001"
When it should be
"_id" : ObjectId("559a47643d9827f0d9405420")
Same goes if I try and update an existing document where I generate the id by
obId := bson.ObjectIdHex(stringId)
It still gets serialized to the corrupted format.
My struct which I'm trying to insert looks like this:
type MyStruct struct {
Id bson.ObjectId `bson:"_id,omitempty" json:"id"`
...
}

The representation "U�`�\u0006#�\rU\u0000\u0000\u0001" clearly indicates that an ObjectId was sent to the database as a string rather than as a properly typed object id. Every such case reported so far turned out to be a code path on the application side explicitly sending the value as a string by mistake. I recommend investigating every code path that inserts objects into that collection; if you can find none that sends an actual string, try to create a reproducer and report it upstream to the mgo driver.
Update: Per your comment below, the issue occurs because some part of the application is using an ObjectId type from a package other than the one actually used to communicate with the database. This has the effect described above: as far as the correct bson package is concerned, the ObjectId type coming from the wrong package is just a normal string.
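To see why the stored value looks like garbage: an ObjectId is 12 raw bytes, which the shell normally displays as 24 hex characters, and mgo keeps those raw bytes in a string-based type. If they are written out as a plain string, most of them are unprintable. A minimal sketch in Python, using the example id from the question; the helper name raw_object_id is hypothetical and is not part of any driver:

```python
def raw_object_id(hex_id: str) -> bytes:
    """Return the 12 raw bytes behind a 24-character hex ObjectId."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex characters)")
    return raw


raw = raw_object_id("559a47643d9827f0d9405420")
print(len(raw))   # number of raw bytes
print(raw.hex())  # the readable form the shell shows inside ObjectId(...)
print(repr(raw))  # the same bytes viewed as text: mostly escape sequences
```

The first byte of the example id is 0x55, which is the letter "U", the same character the corrupted value in the question starts with.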

Related

Redefine the bson tag in golang, how to be compatible with the fields in the database with lowercase field names?

There are many structs (roughly hundreds, all generated automatically by a tool) like this:
type ChatInfo struct {
Name string `json:"name"`
IconFrame uint64 `json:"iconFrame"`
}
The bson tag was forgotten, causing the field name in the database to become lowercase.
Like this:
{
"name" : "myname",
"iconframe" :101
}
Now, I want to add the bson tag:
type ChatInfo struct {
Name string `json:"name" bson:"name"`
IconFrame uint64 `json:"iconFrame" bson:"iconFrame"`
}
But when I read the data from the database, I find that the value of the IconFrame field is 0.
I want to find a way to maintain compatibility with the original lowercase field names. Because there is a lot of data and the storage is somewhat disorganized, it is not very practical to modify the data in the database.
Thanks.
You could implement custom BSON unmarshaling logic that decodes both the iconframe and iconFrame MongoDB fields into the ChatInfo.IconFrame field, but that custom logic would then run on every document decode.
You don't want that unnecessary overhead. The simplest fix is to rename the existing fields in MongoDB. You only have to run a few update operations once, and then you'll be good to go forever.
For example, to rename the iconframe field to iconFrame, use this:
db.chatinfo.updateMany(
{"iconframe": {$exists: true}},
{$rename: {"iconframe": "iconFrame"}}
)
(The filter part may even be empty, as non-existing fields will not be updated.)
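For comparison, the fallback decoding mentioned at the start of this answer amounts to checking both spellings every time a document is read. Here is a minimal sketch of that idea in Python, working on plain dicts rather than the Go driver; the names ChatInfo (as a dataclass) and decode_chat_info are hypothetical and only illustrate the logic:

```python
from dataclasses import dataclass


@dataclass
class ChatInfo:
    name: str = ""
    icon_frame: int = 0


def decode_chat_info(doc: dict) -> ChatInfo:
    """Build a ChatInfo from a raw document, accepting both field spellings."""
    # Prefer the new camelCase key; fall back to the legacy lowercase key.
    if "iconFrame" in doc:
        icon_frame = doc["iconFrame"]
    else:
        icon_frame = doc.get("iconframe", 0)
    return ChatInfo(name=doc.get("name", ""), icon_frame=icon_frame)


old_doc = {"name": "myname", "iconframe": 101}  # written before the tag was added
new_doc = {"name": "myname", "iconFrame": 101}  # written after the tag was added
print(decode_chat_info(old_doc))
print(decode_chat_info(new_doc))
```

Every read pays for the extra lookup, which is the overhead the one-time $rename avoids.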

How do you use MongoDB/bson in repository pattern

Good morning,
I'm currently trying to build a REST-API project for learning purposes and want to use the repository pattern as practice.
I have a model module where I define structs to use. Like user and posts (for both I have set the ID to a string). MongoDB uses ObjectIDs as IDs though. So my current solution is converting that in the MongoDB repository awkwardly to a user/post-struct using that ObjectID and then back again to the user/post-model that the rest of the program is using.
Is there a better way to achieve that same thing without awkward struct conversions?
Here is the code for that:
https://github.com/schattenbrot/mini-blog-api
Thanks in advance already for any ideas and tips I can try out.
Mongo uses the field _id as an object's id. When you insert a document without this field, mongo auto generates its own object id which is what is happening in your case. The solution is to insert your struct with the ID field marked as _id.
This can be done by adding a bson tag on your struct with _id. Mongo will use your id instead of generating one on its own. When you marshal to json, the bson tag is ignored so everything else in your application stays the same.
type Post struct {
ID string `json:"id,omitempty" bson:"_id"`
Title string `json:"title,omitempty" validate:"omitempty,min=3,max=40"`
Text string `json:"text,omitempty" validate:"omitempty,min=5,max=700"`
Creator string `json:"user,omitempty" validate:"omitempty"`
CreatedAt time.Time `json:"created_at,omitempty"`
UpdatedAt time.Time `json:"updated_at,omitempty"`
}

Configure pymongo to use string _id instead of ObjectId

I'm using pymongo to seed a database with old information from a different system, and I have a lot of queries like this:
studentId = studentsRemote.insert({'price': price})
In the actual python script, that studentId prints as a string, but in the javascript Meteor application I'm using this data in, it shows up everywhere as ObjectId(...).
I want to configure pymongo to generate the _id as a string and not bother with ObjectId's
Any objects I create with the Meteor specification will use the string format, and not the ObjectId format. I don't want to have mixing of id types in my application, because it's causing me interoperability headaches.
I'm aware I can create ObjectId's from Meteor but frankly I'd much rather use the string format. It's the Meteor default, it's much simpler, and I can't find any good reason to use ObjectId's in my particular app.
The valueOf() mongo function or something similar could parse the _id and be used to update the document once it's in the database, but it would be nice to have something more direct.
In your .py file, generate the _id yourself as a string before inserting:
from bson.objectid import ObjectId
......
kvdict['_id'] = str(ObjectId())
......
mongoCollection.insert(kvdict)
That works.
It ended up being fairly simple.
The son_manipulator module can be used to change incoming documents to a different form. Most of the time this is used to encode custom objects, but it worked for this as well.
With the manipulator in place, it was just a matter of calling the str() function on the ObjectId to make the transformation.
from pymongo.son_manipulator import SONManipulator
class ObjectIdManipulator(SONManipulator):
    def transform_incoming(self, son, collection):
        # Convert the ObjectId to its string form before the document is saved
        son[u'_id'] = str(son[u'_id'])
        return son
db.add_son_manipulator(ObjectIdManipulator())

How do I describe a collection in Mongo?

So this is Day 3 of learning Mongo Db. I'm coming from the MySql universe...
A lot of times when I need to write a query for a MySql table I'm unfamiliar with, I would use the "desc" command - basically telling me what fields I should include in my query.
How would I do that for a Mongo db? I know, I know...I'm searching for a schema in a schema-less database. =) But how else would users know what fields to use in their queries?
Am I going at this the wrong way? Obviously I'm trying to use a MySql way of doing things in a Mongo db. What's the Mongo way?
Type the query below in an editor or the mongo shell:
var col_list= db.emp.findOne();
for (var col in col_list) { print (col) ; }
The output gives you the names of the fields in the collection:
_id
name
salary
There is no good answer here. Because there is no schema, you can't 'describe' the collection. In many (most?) MongoDb applications, however, the schema is defined by the structure of the object hierarchy used in the writing application (java or c# or whatever), so you may be able to reflect over the object library to get that information. Otherwise there is a bit of trial and error.
This is my day 30 or something like that of playing around with MongoDB. Unfortunately, we have switched back to MySQL after working with MongoDB because of my company's current infrastructure issues. But having implemented the same model on both MongoDB and MySQL, I can clearly see the difference now.
Of course, there is a schema involved when dealing with schema-less databases like MongoDB, but the schema is dictated by the application, not the database. The database will shove in whatever it is given. As long as you know that admins are not secretly logging into Mongo and making changes, and all access to the database is controlled through some wrapper, the only place you should look for the schema is your model classes. For instance, in our Rails application, these are two of the models we have in Mongo:
class Consumer
include MongoMapper::Document
key :name, String
key :phone_number, String
one :address
end
class Address
include MongoMapper::EmbeddedDocument
key :street, String
key :city, String
key :state, String
key :zip, String
key :country, String
end
Now after switching to MySQL, our classes look like this,
class Consumer < ActiveRecord::Base
has_one :address
end
class Address < ActiveRecord::Base
belongs_to :consumer
end
Don't get fooled by the brevity of the classes. In the latter version with MySQL, the fields are being pulled from the database directly. In the former example, the fields are right there in front of our eyes.
With MongoDB, if we had to change a particular model, we simply add, remove, or modify the fields in the class itself and it works right off the bat. We don't have to worry about keeping the database tables/columns in-sync with the class structure. So if you're looking for the schema in MongoDB, look towards your application for answers and not the database.
Essentially I am saying exactly the same thing as @Chris Shain :)
While factually correct, you're all making this too complex. I think the OP just wants to know what his/her data looks like. If that's the case, you can just
db.collectionName.findOne()
This will show one document (aka. record) in the database in a pretty format.
I had this need too, Cavachon. So I created an open source tool called Variety which does exactly this: link
Hopefully you'll find it to be useful. Let me know if you have questions, or any issues using it.
Good luck!
AFAIK, there isn't a way and it is logical for it to be so.
MongoDB being schema-less allows a single collection to have documents with different fields. So there can't really be a description of a collection, like the description of a table in relational databases.
Though this is the case, most applications do maintain a schema for their collections and as said by Chris this is enforced by your application.
As such you wouldn't have to worry about first fetching the available keys to make a query. You can just ask MongoDB for any set of keys (i.e the projection part of the query) or query on any set of keys. In both cases if the keys specified exist on a document they are used, otherwise they aren't. You will not get any error.
For instance (On the mongo shell) :
If this is a sample document in your people collection and all documents follow the same schema:
{
name : "My Name",
place : "My Place",
city : "My City"
}
The following are perfectly valid queries :
These two will return the above document :
db.people.find({name : "My Name"})
db.people.find({name : "My Name"}, {name : 1, place :1})
This will not return anything, but will not raise an error either :
db.people.find({first_name : "My Name"})
This will match the above document, but you will have only the default "_id" property on the returned document.
db.people.find({name : "My Name"}, {first_name : 1, location :1})
print('\n--->', Object.getOwnPropertyNames(db.users.findOne())
.toString()
.replace(/,/g, '\n---> ') + '\n');
---> _id
---> firstName
---> lastName
---> email
---> password
---> terms
---> confirmed
---> userAgent
---> createdAt
This is an incomplete solution because it doesn't give you the exact types, but useful for a quick view.
const doc = db.collectionName.findOne();
// Print each field name with the JavaScript type of its value
for (const x in doc) {
  print(`${x}: ${typeof doc[x]}`);
}
If you're OK with running a Map / Reduce, you can gather all of the possible document fields.
Start with this post.
The only problem here is that you're running a Map / Reduce, which can be resource intensive. Instead, as others have suggested, you'll want to look at the code that writes the actual data.
Just because the database doesn't have a schema doesn't mean that there is no schema. Generally speaking the schema information will be in the code.
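The idea of gathering every field that appears anywhere in a collection can be sketched without Map / Reduce. The Python sketch below works on an in-memory list of dicts standing in for documents already fetched from the database; the function name collect_fields and the sample data are hypothetical, and on a large collection you would want to feed it a sample rather than everything:

```python
from collections import defaultdict


def collect_fields(documents):
    """Map each field name to the set of Python type names seen for it."""
    fields = defaultdict(set)
    for doc in documents:
        for key, value in doc.items():
            fields[key].add(type(value).__name__)
    return dict(fields)


# Stand-in for documents read from a collection (made-up sample data).
sample = [
    {"_id": 1, "name": "Ann", "salary": 1000},
    {"_id": 2, "name": "Bob", "salary": 1200.5, "city": "Oslo"},
]
for field, types in sorted(collect_fields(sample).items()):
    print(field, sorted(types))
```

Unlike looking at a single findOne() result, this also surfaces fields that only some documents have and fields whose type varies between documents.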
I wrote a small mongo shell script that may help you.
https://gist.github.com/hkasera/9386709
Let me know if it helps.
You can use the GUI tool MongoDB Compass. It shows all the fields in a collection and also the variation of the data in them.
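If you are using Node.js and want to get all the field names through an API request, this code works for me:
let arrayResult = [];
// db is assumed to be a Mongoose model here; callback is supplied by the caller
db.findOne().exec(function (err, doc) {
  if (err) {
    return callback(err); // report the error instead of continuing
  }
  // Round-trip through JSON to get a plain object with only the stored fields
  const JSONobj = JSON.parse(JSON.stringify(doc));
  for (const key in JSONobj) {
    arrayResult.push(key);
  }
  return callback(null, arrayResult);
});
The arrayResult will contain all the field (column) names.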
Output-
[
"_id",
"emp_id",
"emp_type",
"emp_status",
"emp_payment"
]
Hope this works for you!
Suppose you have a collection called people and you want to find its fields and their data types. You can use the query below:
function printSchema(obj) {
for (var key in obj) {
print( key, typeof obj[key]) ;
}
};
var obj = db.people.findOne();
printSchema(obj)
The result of this query will be like below,
You can use Object.keys, just like in JavaScript:
Object.keys(db.movies.findOne())

id autoincrement/sequence emulation with CassandraDB/MongoDB etc

I'm trying to build a small web system (URL shortening) using the NoSQL Cassandra DB, and the problem I'm stuck on is ID auto-generation.
Has anyone already run into this problem?
Thanks.
P.S. UUIDs don't work for me; I need to use ALL numbers from 0 to Long.MAX_VALUE (Java), so I need something that works exactly like an SQL sequence.
UPDATED:
The reason why I'm not OK with GUID ids lies in the scope of my application.
My app has a URL-shortening part, and I need to make the URL as short as possible. So I follow this approach: I take numbers starting from 0 and convert each one to a base64 string. As a result I get a URL like mysite.com/QA (where QA is the base64 string).
This was very easy to implement using an SQL DB: I just took the auto-incremented ID, converted it to a URL, and was 100 percent sure that the URL was unique.
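The number-to-short-string step described above can be sketched as follows. This is a positional base-64 encoding of the integer, not RFC 4648 base64 of bytes; the alphabet, the digit order, and the function names encode_id and decode_id are assumptions, since the question does not specify them, so the output need not match the QA example:

```python
# URL-safe 64-character alphabet (an assumption; any 64 distinct characters work).
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
BASE = len(ALPHABET)


def encode_id(n: int) -> str:
    """Encode a non-negative integer as a short string in base 64."""
    if n < 0:
        raise ValueError("id must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(digits))


def decode_id(s: str) -> int:
    """Invert encode_id: turn the short string back into the integer."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n


for i in (0, 1, 63, 64, 2**63 - 1):
    print(i, encode_id(i))
```

Because the mapping is a bijection on non-negative integers, unique sequence numbers give unique URLs, and even Long.MAX_VALUE needs only 11 characters.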
Don't know about Cassandra, but with mongo you can have an atomic sequence (it won't scale, but will work the way it should, even in sharded environment if the query has the sharded field).
It can be done by using the findandmodify command.
Let's consider we have a special collection named sequences and we want to have a sequence for post numbers (named postid), you could use code similar to this:
> db.runCommand( { "findandmodify" : "sequences",
"query" : { "name" : "postid"},
"update" : { $inc : { "id" : 1 }},
"new" : true } );
This command will return atomically the updated (new) document together with status. The value field contains the returned document if the command completed successfully.
Autoincrement IDs inherently don't scale well as they need a single source to generate the numbers. This is why shardable/replicatable databases such as MongoDB use longer, GUID-like identifiers for objects. Why do you need LONG values so badly?
You might be able to do it using atomic increments, retaining the old value, but I'm not sure. This would be limited to single server setups only.
I'm not sure I follow you. What language are you using? Are we talking about UUIDs?
The following is how you generate UUIDs in some languages:
java.util.UUID.randomUUID(); // (Java) variant 2, version 4
import uuid  # (Python)
uuid.uuid1()  # version 1