How to find by partial binary over a BinData field in MongoDB?

I have a BinData field in a MongoDB collection and I need to query it using only part of its value.
Let's say that the bindata that I have looks like this:
{ "_id" : ObjectId("5480356518e91efd34e9b5f9"), "test" : BinData(0,"dGVzdA==") }
If I do this query I get the result:
> db.test.find({"test" : BinData(0,"dGVzdA==")})
{ "_id" : ObjectId("5480356518e91efd34e9b5f9"), "test" : BinData(0,"dGVzdA==") }
However, I would like to find it using only part of the binary object.
Is it possible?
Thanks!

"partial" is a vague term - if you're searching for a contiguous block of binary data (needle) at any point in the haystack, you're going to need a very different solution I think, maybe something based on a suffix tree / suffix array for binary data.
If you want to find binary data that starts with specific bytes, you might consider storing the data as a hex- or base64-encoded string and using a rooted regex (one anchored with ^) so that the query can use an index. But that is fraught with its own perils (padding, endianness, etc.) and incredibly ugly...
Isn't there a way to store the data in a form that MongoDB understands? That might be easier...
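A minimal sketch of the hex-string idea, in plain Node.js with no database calls. It assumes each document carries an extra, hypothetical field (called testHex here) holding the lowercase hex encoding of the binary value; the helper names are illustrative only.

```javascript
// Encode binary data as lowercase hex so that a byte prefix becomes a string prefix.
function toHex(base64) {
  return Buffer.from(base64, 'base64').toString('hex');
}

// Build a rooted (^-anchored) regex for a byte prefix. Every byte maps to
// exactly two hex characters, so a byte prefix is always a string prefix.
function prefixRegex(prefixBytes) {
  return new RegExp('^' + Buffer.from(prefixBytes).toString('hex'));
}

const storedHex = toHex('dGVzdA==');          // the bytes of "test"
const query = { testHex: prefixRegex('te') }; // testHex is a hypothetical field name

console.log(storedHex);                       // 74657374
console.log(query.testHex.test(storedHex));   // true
```

In the shell the corresponding filter would look something like db.test.find({ "testHex" : /^7465/ }). Hex doubles the stored size, but unlike base64 it needs no padding.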

Related

how to set the data type of mongoexport

The problem is that mongoexport does not preserve the data types in the database. For example, there is a field named "tweetID" that should be a string of digits, like "23465478". After exporting a collection to a CSV file, I found that for some entries the tweetID is exported as a decimal, like "254323467.0", while for others it is not. To avoid mistakes, I just want to export all fields as plain strings. Does anyone know how to set this in the mongoexport command? Thanks in advance.
You can't. If mongoexport exported 123 as 123.0, then 123 was stored as a Double in the document. You should try inserting the value as a 32- or 64-bit integer:
db.collection.insert({ "tweetId" : NumberLong(1234567) })
mongoexport exports JSON using the strict mode representation, which embeds some type information so that MongoDB JSON parsers (like mongoimport) can reproduce the correct BSON data types while the exported file still conforms to the JSON standard:
{ "tweetId" : { "$numberLong" : "1234567" } }
To preserve all the type information, use mongodump/mongorestore instead. To export all field values as strings, you'll need to write your own script with a driver that fetches each doc and stringifies all the values.
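A rough sketch of such a script, in plain JavaScript. The fetch step is replaced by a hard-coded array standing in for documents read through a driver, and the helper names (stringifyValues, toCsv) are made up for illustration. It assumes plain JavaScript values; driver-specific types such as ObjectId would need their own handling.

```javascript
// Recursively convert every leaf value of a document to a string.
function stringifyValues(value) {
  if (Array.isArray(value)) return value.map(stringifyValues);
  if (value instanceof Date) return value.toISOString();
  if (value !== null && typeof value === 'object') {
    const out = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = stringifyValues(inner);
    }
    return out;
  }
  return String(value);
}

// Turn a list of flat documents into CSV text with every field quoted.
function toCsv(docs, fields) {
  const quote = (s) => '"' + String(s).replace(/"/g, '""') + '"';
  const rows = docs.map((doc) =>
    fields.map((f) => quote(doc[f] === undefined ? '' : doc[f])).join(','));
  return [fields.map(quote).join(','), ...rows].join('\n');
}

// Stand-in for documents fetched through a driver: one numeric ID, one string ID.
const docs = [{ tweetID: 23465478, user: 'a' }, { tweetID: '254323467', user: 'b' }];
const csv = toCsv(docs.map(stringifyValues), ['tweetID', 'user']);
console.log(csv);
```

Quoting every field keeps spreadsheet tools from reinterpreting the IDs as numbers when the CSV is opened.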

MongoDB and big numerical document IDs

MongoDB uses the BSON format to store data on disk. BSON defines different data types, including a signed int64 for storing big integers.
Let's try to save a document with a big ID (887190505588098573) that fits in the signed int64 range (its absolute value is less than 2^63):
> db.query.insert({_id: 887190505588098573, 'q': 'zzz'})
> db.query.find({_id: 887190505588098573})
{ "_id" : 887190505588098600, "q" : "zzz" }
Well, we got a response with a document ID that differs from the ID we requested.
What am I missing?
JavaScript can't handle a number that big: its Number type is a 64-bit float, so it represents integers exactly only up to 2^53.
You can see this by putting 887190505588098573 into a JS console and you'll receive 887190505588098600 back.
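The rounding can be reproduced in any JavaScript runtime, not just the mongo shell. A small sketch follows; BigInt appears only to show an exact representation and is not something the answer itself relies on.

```javascript
// Integers are exact in a JavaScript Number only up to 2^53 - 1.
const big = 887190505588098573;            // parsed as a 64-bit float
console.log(Number.MAX_SAFE_INTEGER);       // 9007199254740991
console.log(Number.isSafeInteger(big));     // false
console.log(String(big));                   // 887190505588098600

// A BigInt (or a plain string) keeps every digit.
const exact = BigInt('887190505588098573');
console.log(exact.toString());              // 887190505588098573
```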
Non-JS clients handle this just fine. For example, Ruby:
jruby-1.7.12 :013 > c["test"]["query"].insert({_id: 887190505588098574, q: 'zzz'})
=> 887190505588098574
jruby-1.7.12 :016 > c["test"]["query"].find({_id: 887190505588098574}).next()
=> {"_id"=>887190505588098574, "q"=>"zzz"}
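MongoDB does have the NumberLong type, which is a 64-bit integer (BSON type 18). Pass the value as a string, because a numeric literal has already been rounded by JavaScript before NumberLong sees it:
db.collection.insert({ "_id": NumberLong("887190505588098573") })
That then matches on $type: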
db.collection.find({ "_id": { "$type": 18 } })
Mileage may vary as to where you can use this, though. A browser client might receive an extended JSON form of the value, and there are limits on what it can do with it unless it wraps the value in a class that handles 64-bit integers.
Most other languages should be fine, as the driver casts to the native type. So practicality really comes down to your language implementation; MongoDB itself will handle it.

MongoDB: Using *NOT-dot* notation in sort

I'm having an issue with Mongo sort on a nested collection, and a Google search didn't help:
Dot notation works (returns first element from sorted collection):
db.myCollection.find().sort({ 'comments.Comment' : -1 })[0]
Array (Not-dot) notation doesn't work (always returns first element from un-sorted collection):
db.myCollection.find().sort({ "comments['Comment']" : -1 })[0]
For business reasons I would like my app to be dynamic and handle spaces, pluses and a few other non-standard characters as keys in the documents.
So far that has worked, but sort always returns the first (unordered) result if it can't understand the key I want to sort on.
Simply put:
"For business reasons I would like my app to be dynamic and handle spaces, pluses and a few other non-standard characters as keys in the documents."
Yeah, well, bad luck: that's not valid JSON notation. It may be JavaScript notation, but that doesn't mean it's valid JSON, and the BSON spec derives from this fact.
You have dot (.) notation and that is it. So basically your condition is parsed as "invalid" and ignored, hence no sorting is done the way you expect.
Feel free to raise a JIRA issue with MongoDB if you believe this is important.
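If keys reach the application in bracket form, one possible workaround is to rewrite them into dot notation before building the sort document. A rough sketch with a hypothetical helper, toDotPath; it assumes the keys themselves contain no dots, quotes or square brackets.

```javascript
// Rewrite JavaScript-style bracket access into MongoDB dot notation.
// The optional quote is captured and must be matched again before the "]".
function toDotPath(path) {
  return path.replace(/\[\s*(['"]?)(.+?)\1\s*\]/g, '.$2');
}

const sortSpec = { [toDotPath("comments['Comment']")]: -1 };
console.log(JSON.stringify(sortSpec)); // {"comments.Comment":-1}
```

Since a dot path is just a string, keys containing spaces or plus signs pass through unchanged, for example toDotPath("comments['My Key']") gives "comments.My Key".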

Am I correctly using indexes for this MongoDB collection?

So I need some advice as to what I'm doing incorrectly.
My database is set up exactly like a file system consisting of folders and files.
It begins with a folder, which can contain a practically unlimited number of subfolders and/or files.
{
"name":"folder1",
"uniqueID":"zzz0",
"subcontents": [ {"name":"subfolder1", "uniqueID":"zzz1"},
{"name":"subfile1", "uniqueID":"zzz2"},
{"name":"subfile2", "uniqueID":"zzz3"},
{"name":"subfolder2", "subcontents": [...etc...], "uniqueID":"zzz4"},
]
}
Each folder/file document has a uniqueID so that I can reference it (shown above as zzz#). My question is: can I make a MongoDB query that pulls out only a single document?
For example, would db.fileSystemCollection.find({"uniqueID":"zzz4"}) give me the following result? Do I have to use indexes to do this? I've been trying, but the query returns empty every time.
intended result ---> {"name":"subfolder2", "subcontents": [...etc...], "uniqueID":"zzz4"}
[EDIT]
Based on the responses below, I will consider an XML database instead of MongoDB. The JSON structure can't be rearranged to work with MongoDB (too much data).
The short answer is no, as Chris states.
Your embedded representation of a tree is good for intuitive understanding (and for implementation as well). But if you want efficient indexed searches on the tree in MongoDB, you might consider other ways of storing it. Several are listed at http://docs.mongodb.org/manual/tutorial/model-tree-structures/
Please keep in mind that every representation has its own pros and cons depending on your access patterns.
Since a filesystem-like structure will probably need to find all the subcontents of a given folder, you could use the child references pattern for this:
{
"name":"folder1",
"uniqueID":"zzz0",
"subcontents": [ "zzz1",
"zzz2",
"zzz3",
"zzz4"
]
}
{
"name":"subfolder1",
"uniqueID":"zzz1"
}
...
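To illustrate the conversion, here is a small sketch that flattens a nested tree into one document per node, in the child references shape. It runs in memory only; flattenTree and the extra node zzz5 are invented for the example.

```javascript
// Flatten a nested folder tree into one document per node,
// where each folder keeps only the uniqueIDs of its direct children.
function flattenTree(node, out = []) {
  const doc = { name: node.name, uniqueID: node.uniqueID };
  if (Array.isArray(node.subcontents)) {
    doc.subcontents = node.subcontents.map((child) => child.uniqueID);
  }
  out.push(doc);
  (node.subcontents || []).forEach((child) => flattenTree(child, out));
  return out;
}

const tree = {
  name: 'folder1', uniqueID: 'zzz0',
  subcontents: [
    { name: 'subfolder1', uniqueID: 'zzz1' },
    { name: 'subfile1', uniqueID: 'zzz2' },
    { name: 'subfolder2', uniqueID: 'zzz4',
      subcontents: [{ name: 'deepfile', uniqueID: 'zzz5' }] }, // zzz5 is made up
  ],
};

const flatDocs = flattenTree(tree);
console.log(flatDocs.find((d) => d.uniqueID === 'zzz4'));
```

With one document per node, a query such as db.fileSystemCollection.find({"uniqueID":"zzz4"}) matches directly, and an ordinary index on uniqueID applies.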
No; searching for {uniqueID: "zzz4"} will only get you documents whose top-level uniqueID matches.
What you probably want is to maintain an array on the document which lists all the unique IDs in that tree. So your document would be:
{
"name":"folder1",
"uniqueID":"zzz0",
"idList": ["zzz0", "zzz1", "zzz2", "zzz3", "zzz4"],
"subcontents": [ {"name":"subfolder1", "uniqueID":"zzz1"},
{"name":"subfile1", "uniqueID":"zzz2"},
{"name":"subfile2", "uniqueID":"zzz3"},
{"name":"subfolder2", "subcontents": [...etc...], "uniqueID":"zzz4"},
]
}
Then you can index that:
db.fileSystemCollection.ensureIndex({"idList": 1})
Then you can find on it:
db.fileSystemCollection.find({"idList": "zzz4"})
That will return the top-level documents whose tree contains that ID.
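A sketch of how the idList array might be built, and of how the nested node could then be pulled out of the returned top-level document on the client side. Both helpers (collectIds, findNode) are hypothetical names, and the code runs in memory only.

```javascript
// Collect the uniqueID of a node and of everything nested beneath it.
function collectIds(node) {
  const ids = [node.uniqueID];
  for (const child of node.subcontents || []) {
    ids.push(...collectIds(child));
  }
  return ids;
}

// Walk a top-level document to find the nested node with a given ID.
function findNode(node, id) {
  if (node.uniqueID === id) return node;
  for (const child of node.subcontents || []) {
    const hit = findNode(child, id);
    if (hit) return hit;
  }
  return null;
}

const folder = {
  name: 'folder1', uniqueID: 'zzz0',
  subcontents: [
    { name: 'subfolder1', uniqueID: 'zzz1' },
    { name: 'subfile1', uniqueID: 'zzz2' },
    { name: 'subfile2', uniqueID: 'zzz3' },
    { name: 'subfolder2', uniqueID: 'zzz4', subcontents: [] },
  ],
};

folder.idList = collectIds(folder);
console.log(folder.idList);
console.log(findNode(folder, 'zzz4').name);
```

The idList has to be recomputed whenever the tree changes, which is the price of keeping the nested layout.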
As an aside, if you're trying to store files in Mongo, have you looked at GridFS?

How do you insert into mongoDB with specific data type in Perl?

In MySQL for example you have data types such as varchar, int, etc.
I googled and found the http://docs.mongodb.org/manual/core/document/#bson-types page. It seems that for strings you just use '' or "". Integers seem to be recognized automatically without specifying the type. What would inserting something like this into a MongoDB collection look like in Perl?
Example:
{
"Name" : "John",
"Age" : 20,
"Weight" : 180.5,
"Dateofbirth" : 01/01/1990
}
The reason I want the data type specified in the database is so that I can use operators to compare numbers, for example. If the value is text, I cannot do that.
So far I am thinking in Perl:
$my_collection->insert({
    'Name'        => "$Name",
    'Age'         => $age,
    'Weight'      => $weight,
    'Dateofbirth' => $datevar,
});
In the above code I am not sure how to specify the data type. For example, how do I say that Weight is a Double rather than an integer or a string?
For numeric types, the Perl MongoDB driver will go by whatever Perl thinks the number is. Perl has an internal flag for keeping track of whether something is a float or an int. The MongoDB driver will use 32 or 64-bit ints depending on your platform. If it looks like a string to Perl, it will be stored as a string in MongoDB.
For date types, you need to wrap the date in a DateTime object, or DateTime::Tiny if you use the dt_type attribute.
Use the looks_like_number setting in the Perl MongoDB driver, which tells the driver to save numeric-looking strings as numbers:
use MongoDB::BSON;
$MongoDB::BSON::looks_like_number = 1;
https://metacpan.org/pod/MongoDB::BSON#looks_like_number