MongoDB and big numerical document IDs

MongoDB uses the BSON format to store data on disk. BSON defines several data types, including a signed int64 for storing big integers.
Let's try to save a document with a big ID (887190505588098573) that fits in the signed int64 range (its absolute value is less than 2^63):
> db.query.insert({_id: 887190505588098573, 'q': 'zzz'})
> db.query.find({_id: 887190505588098573})
{ "_id" : 887190505588098600, "q" : "zzz" }
Well, we got a response with a document ID that differs from the ID we requested.
What am I missing?

JavaScript can't represent an integer that big exactly. Its numbers are IEEE 754 doubles, so integers are only exact up to 2^53, and the mongo shell is a JavaScript environment.
You can see this by typing 887190505588098573 into a JS console: you'll get 887190505588098600 back.
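A minimal sketch of that precision loss in plain JavaScript (runnable in Node.js or a browser console, no MongoDB involved). The BigInt literal is only there for comparison; it is an assumption of this sketch, not something the question's shell session uses.

```javascript
// The literal is parsed as an IEEE 754 double, which cannot hold this value exactly.
const asDouble = 887190505588098573;
console.log(Number.MAX_SAFE_INTEGER);        // 9007199254740991, i.e. 2^53 - 1
console.log(Number.isSafeInteger(asDouble)); // false
console.log(String(asDouble));               // 887190505588098600

// A BigInt literal (note the trailing n) keeps every digit.
const asBigInt = 887190505588098573n;
console.log(asBigInt.toString());            // 887190505588098573
```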
Non-JS clients handle this just fine. For example, Ruby:
jruby-1.7.12 :013 > c["test"]["query"].insert({_id: 887190505588098574, q: 'zzz'})
=> 887190505588098574
jruby-1.7.12 :016 > c["test"]["query"].find({_id: 887190505588098574}).next()
=> {"_id"=>887190505588098574, "q"=>"zzz"}

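MongoDB has the NumberLong type, which is a 64-bit integer (BSON type 18). Pass the value as a string: a bare numeric literal is parsed as a JavaScript double first and loses precision before NumberLong ever sees it.
db.collection.insert({ "_id": NumberLong("887190505588098573") })
The stored value then matches on $type:
db.collection.find({ "_id": { "$type": 18 } })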
Your mileage may vary as to where you can use this, though. A browser client might receive an extended JSON form of the value, but it can't do much with it unless it wraps the value in a class of its own that handles 64-bit integers.
Most other languages should be fine, as the driver will cast to the native type. So the practicality really comes down to your language implementation; MongoDB itself will handle it.
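As a rough illustration of that kind of wrapping, here is one way a JavaScript client could read the extended JSON form without losing digits. The reviveNumberLong name is made up for this sketch, and BigInt stands in for whatever 64-bit wrapper class your client actually uses.

```javascript
// Extended JSON carries the 64-bit value as a string, so no digits are lost in transit.
const extendedJson = '{"_id":{"$numberLong":"887190505588098573"},"q":"zzz"}';

// Hypothetical reviver: turns {"$numberLong": "<digits>"} wrappers into BigInt values.
function reviveNumberLong(key, value) {
  const isWrapper =
    value !== null &&
    typeof value === "object" &&
    Object.keys(value).length === 1 &&
    typeof value.$numberLong === "string";
  return isWrapper ? BigInt(value.$numberLong) : value;
}

const doc = JSON.parse(extendedJson, reviveNumberLong);
console.log(doc._id === 887190505588098573n); // true
console.log(typeof doc._id);                  // bigint
```

JSON.parse calls the reviver on the innermost values first, so the wrapper object is already complete by the time it is inspected.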

Related

MongoJsonSchema.builder() does not generate SchemaMap with same specification as MongoDB CSFLE extended JSON Schema

I am trying to implement Automatic Client-Side Field Level Encryption (CSFLE) in MongoDB (Enterprise Edition). The steps provided in the MongoDB docs work perfectly fine. However, spring-data-mongodb provides a way to generate the $jsonSchema using MongoJsonSchema.builder(), which avoids writing the schema in raw JSON.
The problem I am facing is that the schema generated by MongoJsonSchema.builder() differs from the specification and the example provided in the MongoDB docs.
To be specific, the example has
"keyId": [
{
"$binary": {
"base64": "<paste_your_key_id_here>",
"subType": "04"
}
}
]
but the schema generated by builder has
"keyId": [
{
"$binary": "<base64 encoded uuid>",
"$type": "03"
}
]
The execution fails only because the format of the key is not what the driver expects.
Unfortunately, EncryptedJsonSchemaProperty only has keyId(), which accepts a string, and keys(), which accepts an array of UUIDs. Both methods generate a schema that doesn't match the example.
Is there anything I am missing, or is the builder not yet meant to be used to generate a schemaMap that can be supplied to AutoEncryptionSettings?
The $type syntax is legacy extended JSON. In theory, whatever parses the extended JSON should be capable of understanding both formats (the subType and $type varieties).
The UUID representations have recently been standardized. No automatic conversion from subtype 3 to subtype 4 is possible, because several implementations use subtype 3 but store the bytes in different orders. So this needs to be fixed on the producing side.
I do not write Java myself but hopefully this helps identify where things are going wrong.
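To see why a subtype 3 value cannot be converted blindly, here is a small Node.js sketch comparing the standard byte layout with the layout the legacy Java driver is generally described as using (each 8-byte half reversed). Treat that legacy layout as an assumption to verify against your driver's documentation; the function names and the sample UUID are made up for illustration.

```javascript
// Standard (subtype 4) layout: the 16 bytes in the order they appear in the UUID string.
function uuidToStandardBytes(uuid) {
  return Buffer.from(uuid.replace(/-/g, ""), "hex");
}

// Assumed legacy Java (subtype 3) layout: each 8-byte half is byte-reversed.
function uuidToJavaLegacyBytes(uuid) {
  const std = uuidToStandardBytes(uuid);
  const high = Buffer.from(std.subarray(0, 8)).reverse();  // copy, then reverse the copy
  const low = Buffer.from(std.subarray(8, 16)).reverse();
  return Buffer.concat([high, low]);
}

const uuid = "00112233-4455-6677-8899-aabbccddeeff"; // sample value, not a real key id
console.log(uuidToStandardBytes(uuid).toString("hex"));   // 00112233445566778899aabbccddeeff
console.log(uuidToJavaLegacyBytes(uuid).toString("hex")); // 7766554433221100ffeeddccbbaa9988
```

The same 16 stored bytes therefore decode to different UUIDs depending on which layout the writer used, and the reader has no way to tell from the bytes alone.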
Apparently, the problem was not in MongoJsonSchemaBuilder at all. It builds the schema just fine. It was the conversion to BsonDocument that was converting the UUID to the subtype 3 representation. AutoEncryptionSettings needs a BsonDocument for schemaMap().
Finally, I had to supply a CodecRegistry containing a UuidCodec with the STANDARD UUID representation. I also had to build the MongoClient with this codec registry, and then things worked just fine.
Sample code:
// Encode UUIDs as STANDARD (binary subtype 4) rather than the legacy subtype 3.
final CodecRegistry codecRegistry = CodecRegistries.fromRegistries(
        CodecRegistries.fromCodecs(new UuidCodec(STANDARD)), getDefaultCodecRegistry());
final BsonDocument document = schema.toDocument()
        .toBsonDocument(BsonDocument.class, codecRegistry).getDocument("$jsonSchema");
This was finally supplied to AutoEncryptionSettings.builder().schemaMap()
A full sample of the code is here: https://github.com/nishkarsh/mongodb-auto-csfle-demo

How to find by partial binary over BinData field in mongodb?

I have a BinData field in MongoDB and I need to query it with only partial information.
Let's say that the bindata that I have looks like this:
{ "_id" : ObjectId("5480356518e91efd34e9b5f9"), "test" : BinData(0,"dGVzdA==") }
If I do this query I get the result:
> db.test.find({"test" : BinData(0,"dGVzdA==")})
{ "_id" : ObjectId("5480356518e91efd34e9b5f9"), "test" : BinData(0,"dGVzdA==") }
However I would like to find it with only a part of the binary object.
Is it possible?
Thanks!
"partial" is a vague term - if you're searching for a contiguous block of binary data (needle) at any point in the haystack, you're going to need a very different solution I think, maybe something based on a suffix tree / suffix array for binary data.
If you want to find binary data that starts with specific bytes, consider storing the data as hex- or base64-encoded strings and using a rooted regex so the query can use an index. But that is fraught with its own perils (padding, endianness, etc.) and incredibly ugly...
Isn't there a way to store the binary data in a way that MongoDB understands it? That might be easier...
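A small Node.js sketch of the prefix idea, with no database involved: it shows why a byte prefix maps cleanly to a string prefix in hex but not always in base64 (the padding peril). The variable names are illustrative; in a real collection the encoded string would have to be stored in a field of its own.

```javascript
// Same bytes as BinData(0,"dGVzdA==") from the question.
const stored = Buffer.from("dGVzdA==", "base64");
const needle = Buffer.from("te", "utf8"); // the first two bytes we want to match

// Hex: every byte is exactly two characters, so a byte prefix is a string prefix.
const storedHex = stored.toString("hex");                     // 74657374
const rootedRegex = new RegExp("^" + needle.toString("hex")); // /^7465/
console.log(rootedRegex.test(storedHex));                     // true

// Base64: bytes are packed in groups of three, so padding can break the prefix match.
const storedB64 = stored.toString("base64");                  // dGVzdA==
const needleB64 = needle.toString("base64");                  // dGU=
console.log(storedB64.startsWith(needleB64));                 // false
```

With base64, a prefix only lines up when its length is a multiple of three bytes, which is one reason hex is the simpler choice here despite doubling the size.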

How to set the data type of mongoexport

The problem is that mongoexport does not seem to preserve the data types in the db. For example, there is a field named "tweetID" that should be a string of digits, like "23465478". After exporting a collection to a CSV file, I found that for some entries the tweetID is exported as a decimal, like "254323467.0", while for others it is not. To avoid unnecessary mistakes, I just want to export all fields as plain strings. Does anyone know how to set this in the mongoexport command? Thanks in advance.
You can't. If mongoexport exported 123 as 123.0, then 123 was stored as a Double in the document. You should try inserting the value as a 32- or 64-bit integer:
db.collection.insert({ "tweetId" : NumberLong(1234567) })
mongoexport exports JSON in the strict mode representation, which embeds some type information in the JSON so that MongoDB JSON parsers (like mongoimport) can reproduce the correct BSON data types while the exported JSON still conforms to the JSON standard:
{ "tweetId" : { "$numberLong" : "1234567" } }
To preserve all the type information, use mongodump/mongorestore instead. To export all field values as strings, you'll need to write your own script with a driver that fetches each doc and stringifies all the values.
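A minimal sketch of the "stringify everything" step in JavaScript, assuming the documents have already been fetched with a driver. The stringifyValues name is hypothetical, and how null, dates, and driver-specific wrapper objects are rendered is a choice made here for illustration only.

```javascript
// Recursively convert every leaf value of a document to a string.
function stringifyValues(value) {
  if (value === null || value === undefined) return ""; // choice: empty string
  if (Array.isArray(value)) return value.map(stringifyValues);
  if (value instanceof Date) return value.toISOString();
  if (typeof value === "object") {
    const proto = Object.getPrototypeOf(value);
    // Plain objects are treated as sub-documents: recurse into them.
    if (proto === Object.prototype || proto === null) {
      return Object.fromEntries(
        Object.entries(value).map(([key, inner]) => [key, stringifyValues(inner)])
      );
    }
  }
  // Numbers, booleans, strings, and other objects fall back to String(),
  // which relies on the object's own toString().
  return String(value);
}

const doc = { tweetID: 254323467.0, user: { id: 42, verified: true }, tags: [1, "a"] };
console.log(JSON.stringify(stringifyValues(doc)));
// {"tweetID":"254323467","user":{"id":"42","verified":"true"},"tags":["1","a"]}
```

Each converted document can then be written out as a CSV row by whatever CSV writer you prefer.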

Mongodb $strcasecmp. Strange behaviour when the field content has dollar signs

I'm trying to compare two strings in the MongoDB Aggregation Framework. This is the query I'm using:
db.people.aggregate({
    $project: {
        name: 1,
        balance: 1,
        compareBalance: { $strcasecmp: ["$balance", "$2,500.00"] }
    }
});
My problem is that each "balance" field has a dollar sign at the beginning of the string, and the results returned by the query seem to be incorrect. For example:
{
"_id" : ObjectId("5257e2e7834a87e7ea509665"),
"balance" : "$1,416.00",
"name" : "Houston Monroe",
"compareBalance" : 1
}
As you can see in the results, the field comparison is 1, but it should be -1 because $2,500.00 is higher than $1,416.00. In fact, all comparisons have a value of 1.
There is a workaround by using $substr to remove the dollar sign at the beginning of all fields, but I want to know who is doing this wrong, MongoDB or me.
Thanks in advance.
It sounds like you are trying to use the "balance" field as a number, for example to compare $10 to $100.
The best way to do this is to store the actual value and add the formatting (the $, the commas, etc.) when displaying it to the user.
So, you would have balance: 2500
Slightly unrelated...
I'm not sure whether you do much calculation on the value, but using binary floating-point numbers for currency is a bad idea (they can't represent all decimal amounts exactly), so it's often better to store an integer number of cents (or, if higher precision is required, an integer number of hundredths of a cent).
This could give: balanceCents: 250000 or balanceFourDec: 25000000
Then you can use $gt, $lt, and arithmetic.
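A sketch of the integer-cents approach in JavaScript, assuming non-negative amounts with US-style formatting like the strings in the question. toCents and formatCents are made-up helper names.

```javascript
// Parse a string such as "$2,500.00" into an integer number of cents.
function toCents(formatted) {
  const match = /^\$?(\d{1,3}(?:,\d{3})*|\d+)(?:\.(\d{2}))?$/.exec(formatted);
  if (!match) throw new Error("Unrecognized amount: " + formatted);
  const dollars = Number(match[1].replace(/,/g, ""));
  const cents = match[2] ? Number(match[2]) : 0;
  return dollars * 100 + cents;
}

// Format integer cents back into a string such as "$2,500.00" for display.
function formatCents(totalCents) {
  const dollars = Math.floor(totalCents / 100);
  const cents = String(totalCents % 100).padStart(2, "0");
  const grouped = String(dollars).replace(/\B(?=(\d{3})+(?!\d))/g, ",");
  return "$" + grouped + "." + cents;
}

console.log(toCents("$1,416.00") < toCents("$2,500.00")); // true
console.log(formatCents(toCents("$2,500.00")));           // $2,500.00
console.log(0.1 + 0.2 === 0.3);                           // false: the floating-point pitfall
```

Storing the integer and formatting only at display time keeps comparisons and sums exact.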
A leading $ in a string marks a field reference. So, the aggregation pipeline is trying to compare the field "balance" with a field named "2,500.00", which doesn't exist:
{
"balance": "$5,000.00",
"2,500.00": undefined
}
Of course, that's not what you are looking for.
You shouldn't start with the $ in the data. Also, unless you've got fixed-length strings, sorting and comparisons aren't going to work the way you would expect if you're trying to store numbers as strings. If you're just doing this as an example, I'd suggest you use the actual math operators for numbers, and leave $strcasecmp to actual strings.
You can use the { $literal: <value> } expression so that the dollar sign is not interpreted as a field reference.
https://docs.mongodb.com/manual/reference/operator/aggregation/literal/
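For reference, the question's pipeline with $literal applied might look like the sketch below, written as a plain JavaScript value so it can be inspected without a server. The strcasecmpSketch helper is a rough, made-up stand-in for $strcasecmp, included only to show that the comparison stays textual rather than numeric even once the literal is fixed.

```javascript
// $literal keeps "$2,500.00" from being parsed as a field reference.
const pipeline = [
  {
    $project: {
      name: 1,
      balance: 1,
      compareBalance: { $strcasecmp: ["$balance", { $literal: "$2,500.00" }] },
    },
  },
];

// Hypothetical stand-in for $strcasecmp: case-insensitive, returns -1, 0, or 1.
function strcasecmpSketch(a, b) {
  const x = a.toUpperCase();
  const y = b.toUpperCase();
  return x < y ? -1 : x > y ? 1 : 0;
}

console.log(strcasecmpSketch("$1,416.00", "$2,500.00"));  // -1, as the question expected
console.log(strcasecmpSketch("$10,000.00", "$2,500.00")); // -1, although 10000 > 2500
```

In the shell, the pipeline would be passed as db.people.aggregate(pipeline).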

How do you insert into mongoDB with specific data type in Perl?

In MySQL for example you have data types such as varchar, int, etc.
I googled and found the http://docs.mongodb.org/manual/core/document/#bson-types page. It seems that for strings you just use '' or "". Integers seem to be recognized automatically without specifying the type. What would inserting something like this into a MongoDB collection look like in Perl?
Example:
{
"Name" : "John"
"Age" : 20
"Weight" : 180.5
"Dateofbirth" : 01/01/1990
}
The reason why I want data type specified in the db is that I can use operators to compare numbers for example. If it is text I cannot do that.
So far I am thinking in Perl:
$my_collection->insert({
    'Name'        => "$Name",
    'Age'         => $age,
    'Weight'      => $weight,
    'Dateofbirth' => $datevar,
});
In the above code I am not sure how to specify the data type, for example to say that Weight is a Double and not an integer or a string.
For numeric types, the Perl MongoDB driver will go by whatever Perl thinks the number is. Perl has an internal flag for keeping track of whether something is a float or an int. The MongoDB driver will use 32 or 64-bit ints depending on your platform. If it looks like a string to Perl, it will be stored as a string in MongoDB.
For date types, you need to wrap the date in a DateTime object, or DateTime::Tiny if you use the dt_type attribute.
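Use the looks_like_number setting in the Perl MongoDB driver. When it is true, the driver checks whether a string looks like a number and, if so, stores it as a number:
use MongoDB::BSON;
$MongoDB::BSON::looks_like_number = 1;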
https://metacpan.org/pod/MongoDB::BSON#looks_like_number