Creating short, unique object IDs in MongoDB

I'm making an app similar to Instagram using Rails/Mongoid. I want a unique ID that I can use in a URL like http://instagr.am/p/DJmU8/
What's the easiest way to do that? Can I derive such an ID from the default BSON ObjectID Mongo creates?

You may try to use the first 4 bytes of the ObjectID (they represent the timestamp).
But to be 100% safe, it's better to produce a truly unique short id by implementing a counter. You can use a separate collection to maintain the current value of your counter.
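For illustration, here is a minimal sketch of that counter idea in mongo shell syntax (the collection name counters and the key url_ids are hypothetical):
// Sketch only: atomic counter kept in a separate collection.
db.counters.insertOne({ _id: 'url_ids', seq: 0 });

function nextShortId() {
  // Atomically increment the counter and read back the new value.
  var doc = db.counters.findOneAndUpdate(
    { _id: 'url_ids' },
    { $inc: { seq: 1 } },
    { returnNewDocument: true }
  );
  // Encode the number in base 36 to keep the URL segment short.
  return doc.seq.toString(36);
}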
More details on mongo's ObjectID structure can be found here: http://www.mongodb.org/display/DOCS/Object+IDs
As an alternative, you can convert the hex string representation of the id to a representation based on 36 symbols (26 Latin letters + 10 digits). It will obviously be shorter.
It seems that there is a Ruby library that can do such conversions: http://rubyworks.github.com/radix/
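If you are working in JavaScript rather than Ruby, the same hex-to-base-36 re-encoding can be sketched in a few lines; this is only an illustration and assumes a runtime with BigInt support:
// Sketch: re-encode the 24-character hex ObjectID string in base 36 (0-9, a-z).
function objectIdToBase36(hexId) {
  return BigInt('0x' + hexId).toString(36);
}

function base36ToObjectId(shortId) {
  let n = 0n;
  for (const c of shortId) n = n * 36n + BigInt(parseInt(c, 36));
  return n.toString(16).padStart(24, '0');
}

console.log(objectIdToBase36('4f2b6f2f1d41c85e2c000001')); // a shorter, URL-safe string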

Why not use dylang/shortid?
Install it from npm (npmjs.com/package/shortid):
npm i shortid
Then require:
const shortid = require('shortid');
In a Mongoose schema:
new Schema({
  _id: {
    type: String,
    default: shortid.generate
  }
});
or just insert directly:
users.insert({
  _id: shortid.generate(),
  name: ...,
  email: ...
});

You could try Mongoid::Token
https://github.com/thetron/mongoid_token
From the docs:
This library is a quick and simple way to generate unique, random
tokens for your mongoid documents, in the cases where you can't, or
don't want to use slugs, or the default MongoDB IDs.
Mongoid::Token can help turn this:
http://myawesomewebapp.com/video/4dcfbb3c6a4f1d4c4a000012/edit
Into something more like this:
http://myawesomewebapp.com/video/83xQ3r/edit

@aav mentioned that you can use the first 4 bytes, but that value only has second precision, and you can easily get 10,000 or more inserts per second. Also, the ObjectID is unique, so you would need to check when you get a duplicate-key error (see "Write Concerns").
new Date().getTime() is in milliseconds, e.g. 1557702577900, so why not use the last 4 bytes? That timestamp in base62 is rqiJgPq.
This code looks interesting:
https://github.com/treygriffith/short-mongo-id/blob/master/lib/objectIdToShortId.js
Also check this ObjectID timestamp parser:
https://steveridout.github.io/mongo-object-time/
Or you can execute ObjectId().toString() and, based on that string, create a new id with hashids (available for Node.js, PHP, and many more languages).
Maybe the best option is to use 4-5 bytes from the JS timestamp plus the counter (INC) from the BSON ObjectID, and then hash that value with hashids:
// Assumes: var ObjectID = require('mongodb').ObjectID;
var id = ObjectID('61e33b8467a45920f80eba52').toString();
console.log("id:", id);
console.log("timestamp:", parseInt(id.substring(0, 8), 16).toString());
console.log("machineID:", parseInt(id.substring(8, 14), 16));
console.log("processID:", parseInt(id.substring(14, 18), 16));
console.log("counter:", parseInt(id.slice(-6), 16));

var ObjTimestamp = parseInt(id.substring(0, 8), 16).toString().slice(-5);
var Counter = parseInt(id.slice(-6), 16);
// https://github.com/base62/base62.js/
console.log('Final:', base62.encode(Counter + parseInt(ObjTimestamp)));
Output:
vI7o
15V9L
5t4l
You can get collisions if more processes are running (then consider adding the PID to the value to keep it unique) or when you run multiple instances on different machines.

Try the gem https://github.com/jffjs/mongoid_auto_inc

The Hashids library is meant for generating IDs like this. Check it out here ☞ https://github.com/peterhellberg/hashids.rb
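For example, a rough sketch with the JavaScript port (the hashids npm package); the salt, minimum length, and input number below are arbitrary:
// Sketch only: encode a numeric counter into a short, URL-friendly id.
import Hashids from 'hashids';

const hashids = new Hashids('my app salt', 6); // salt and minimum length are examples
const short = hashids.encode(13163);           // a short alphanumeric string
const [original] = hashids.decode(short);      // back to 13163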

Related

Insert field name with dot in mongo document

A Meteor server code tries to insert an object into a Mongo collection. The value of one of the properties is a string which contains a dot, i.e. ".".
The Meteor terminal is complaining:
Error: key Food 1.1 and drinks must not contain '.'
What does this mean and how to fix it?
let obj = {
  food: group,
  rest: rule,
  item: item[0],
  key: i
};
FoodCol.insert(obj);
Edit:
The answer suggested by Kishor, replacing the "." with "\uff0E", will produce a space after the dot, which is not what a user expects.
From this link: How to use dot in field name?
You can replace dot symbols of your field name to Unicode equivalent "\uff0E":
Update: As Fred suggested, please use "\u002E" for "."
We solved this issue by Base64-encoding the key before insertion and decoding it after reading it back from the db. This works for us since we consume the document as-is, and the query fields are different and their keys are not encoded.
But if you want to query using this key, or the key has to be readable to the user, this solution will not be suitable.
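For illustration, a minimal sketch of that Base64-key approach in plain Node.js (encodeKey/decodeKey are hypothetical helper names):
// Sketch only: Base64-encode object keys so they no longer contain "."
const encodeKey = (key) => Buffer.from(key, 'utf8').toString('base64');
const decodeKey = (key) => Buffer.from(key, 'base64').toString('utf8');

const raw = { 'Food 1.1 and drinks': 'salad' };
const encoded = {};
for (const [k, v] of Object.entries(raw)) {
  encoded[encodeKey(k)] = v; // the encoded key is dot-free
}
// FoodCol.insert(encoded);  // store the encoded document
// ...and map the keys back with decodeKey() after reading it out of the db.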

call custom python function on every document in a collection Mongo DB

I want to call a custom Python function on some existing attribute of every document in the entire collection and store the result as a new key-value pair in that (same) document. May I know if there's any way to do that (since each call is independent of the others)?
I noticed cursor.forEach, but can't it be done efficiently using just Python?
A simple example would be to split the string in text and store the number of words as a new attribute.
def split_count(text):
    # some complex preprocessing...
    return len(text.split())

# Need something like this...
db.collection.update_many({}, {'$set': {"split": split_count('$text')}}, upsert=True)
But it seems like setting a new attribute in a document based on the value of another attribute in the same document is not possible this way yet. This post is old but the issues seem to be still open.
I found a way to call any custom python function on a collection using parallel_scan in PyMongo.
def process_text(cursor):
    for row in cursor.batch_size(200):
        # Any complex preprocessing here...
        split_text = row['text'].split()
        db.collection.update_one({'_id': row['_id']},
                                 {'$set': {'split_text': split_text,
                                           'num_words': len(split_text)}},
                                 upsert=True)

def preprocess(num_threads=4):
    # Get up to max 'num_threads' cursors.
    cursors = db.collection.parallel_scan(num_threads)
    threads = [threading.Thread(target=process_text, args=(cursor,)) for cursor in cursors]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
This is not really faster than cursor.forEach (but not that slow either), but it helps me execute any arbitrarily complex python code and save the results from within Python itself.
Also if I have an array of ints in one of the attributes, doing cursor.forEach converts them to floats which I don't want. So I preferred this way.
But I would be glad to know if there're any better ways than this :)
It is quite unlikely that it will ever be efficient to do this kind of thing in python. This is because the document would have to make a round trip and go through the python function on the client machine.
In your example code, you are passing the result of a function to a mongodb update query, which won't work. You can't run any python code inside mongodb queries on the db server.
As the answer to your linked question suggests, this type of action has to be performed in the mongo shell, e.g.:
db.collection.find().snapshot().forEach(
  function (elem) {
    var splitLength = elem.text.split(" ").length;
    db.collection.update(
      { _id: elem._id },
      { $set: { split: splitLength } }
    );
  }
);

Though I have a record in database, it's giving "Mongoid::Errors::DocumentNotFound"

Though I have the record with id 13163 (db.locations.find({_id: 13163})), it's giving me an error:
Mongoid::Errors::DocumentNotFound in LocationsController#show
Problem: Document(s) not found for class Location with id(s) 13163.
Summary: When calling Location.find with an id or array of ids, each
parameter must match a document in the database or this error will be
raised. The search was for the id(s): 13163 ... (1 total) and the
following ids were not found: 13163. Resolution: Search for an id that
is in the database or set the Mongoid.raise_not_found_error
configuration option to false, which will cause a nil to be returned
instead of raising this error when searching for a single id, or only
the matched documents when searching for multiples.
# Use callbacks to share common setup or constraints between actions.
def set_location
  @location = Location.find(params[:id])
end
locations_controller.rb:
class LocationsController < ApplicationController
  before_action :set_location, only: [:show, :edit, :update, :destroy]

  # GET /locations
  # GET /locations.json
  def index
    @locations = Location.all
  end

  # GET /locations/1
  # GET /locations/1.json
  def show
  end

  private

  # Use callbacks to share common setup or constraints between actions.
  def set_location
    @location = Location.find(params[:id])
  end

  # Never trust parameters from the scary internet, only allow the white list through.
  def location_params
    params.require(:location).permit(:loc_name_en, :loc_name_jp, :channel)
  end
end
Setting the option raise_not_found_error: false is not the solution, as I do have the document in the database.
SOLUTION:
Big thanks to @mu is too short for giving me a hint.
The problem can be solved in 2 ways:
Declare field :_id, type: Integer in the model location.rb
Or convert the passed parameter to an Integer, like Location.find(params[:id].to_i), in locations_controller.rb, as shown below in @mu is too short's answer
I'd guess that you have a type problem. You say that this:
db.locations.find({_id: 13163})
finds the document in the MongoDB shell. That means that you have a document in the locations collection whose _id is the number 13163. If you used the string '13163':
db.locations.find({_id: '13163'})
you wouldn't find your document. The value in params[:id] is probably a string, so you're saying:
Location.find('13163')
when you want to say:
Location.find(13163)
If the _id really is a number then you'll need to make sure you call find with a number:
Location.find(params[:id].to_i)
You're probably being confused because sometimes Mongoid will convert between Strings and Moped::BSON::ObjectIds (and sometimes it won't) so if your _id is the usual ObjectId you can say:
Model.find('5016cd8b30f1b95cb300004d')
and Mongoid will convert that string to an ObjectId for you. Mongoid won't convert a String to a number for you, you have to do that yourself.

Play with Meteor MongoID in Monk

I'm building an API using Express and Monk that connects to a database where writes are mainly handled by a Meteor application.
I know that Meteor uses its own algorithm to generate IDs. So when I do something like that:
id = "aczXLTjzjjn3PchX6" // this is an ID generated by Meteor (not a valid MongoID)
Users.findOne({ _id: id }, function(err, doc) {
  console.log(doc);
});
Monk outputs:
Argument passed in must be a single String of 12 bytes or a string of 24 hex characters.
This way, it seems very tricky to me to design a solid and reliable REST API. Thus, I have two questions:
How can I handle the difference in my queries between ids generated by Meteor and valid MongoID()? Is there a simple way to get JSON results from a Meteor database?
Will it be a problem to insert documents from the API, which this time will have a valid MongoId()? I will end up with both types of ids in my database, which seems very bad to me. :/
As I said here in a similar issue, you can just override the id converter part of monk:
var idConverter = Users.id; // Keep a reference in case...
Users.id = function (str) { return str; };
But don't expect monk to convert Ids automatically anymore.
How can I handle the difference in my queries between ids generated by Meteor and valid MongoID()? Is there a simple way to get JSON results from a Meteor database?
There isn't much you need to do. When it is a valid ObjectId (a MongoDB id) and you got a string, just convert it to an ObjectId:
id = ObjectId(id);
User.find(id, ...)
Here's the implementation of monk's id method (this.col.id is a reference to the MongoDB native ObjectId):
Collection.prototype.id =
Collection.prototype.oid = function (str) {
  if (null == str) return this.col.id();
  return 'string' == typeof str ? this.col.id(str) : str;
};
Will it be a problem to insert documents from the API which this time will have a valid MongoId()? I will end up with both type of ids in my database, seems very bad to me. :/
It is bad, though it won't cause much trouble (in my experience in Node.js) if you are careful. You won't be careful all the time (programmer errors happen a lot), but it's manageable. In statically typed languages (like Java) this is a big NO, because a field can have only one type (either string or ObjectId).
My suggestion is to not use MongoDB's ObjectId at all and just use strings as ids. On inserts, just provide a string _id, so the driver won't assign an ObjectId (see the sketch below). You could also stop the driver from doing this by overriding pkFactory, but that doesn't seem to be easy with monk.
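A minimal sketch of that, assuming a monk version that supports the castIds option (the collection and field names are just examples):
// Sketch only: insert with an explicit string _id so the driver never assigns an ObjectId.
const db = require('monk')('localhost/mydb');
const users = db.get('users', { castIds: false });

users.insert({ _id: 'aczXLTjzjjn3PchX6', name: 'Alice' })
  .then((doc) => console.log(doc._id)); // the string id is stored as-is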
One more thing: monk is not actively maintained and is just a thin layer on top of the mongodb driver. In my experience, if you have multiple collections and a large/complex code base, mongoose will be much better to use.
Just to keep this question updated.
As stated in the docs, Monk automatically casts strings into ObjectIDs.
To disable this behaviour without resorting to hacky solutions, you just need to set castIds to false when getting the collection.
So:
const Users = db.get('users', { castIds: false });
Now this will work:
Users.findOne({ _id: "aczXLTjzjjn3PchX6" }, function(err, doc) {
console.log(doc);
});

Autocomplete with Firebase

How does one use Firebase to do basic auto-completion/text preview?
For example, imagine a blog backed by Firebase where the blogger can tag posts with tags. As the blogger is tagging a new post, it would be helpful if they could see all currently-existing tags that matched the first few keystrokes they've entered. So if "blog," "black," "blazing saddles," and "bulldogs" were tags, if the user types "bl" they get the first three but not "bulldogs."
My initial thought was that we could set the tag with the priority of the tag, and use startAt, such that our query would look something like:
fb.child('tags').startAt('bl').limitToFirst(5).once('value', function(snap) {
  console.log(snap.val());
});
But this would also return "bulldog" as one of the results (not the end of the world, but not the best either). Using startAt('bl').endAt('bl') returns no results. Is there another way to accomplish this?
(I know that one option is that this is something we could use a search server, like ElasticSearch, for -- see https://www.firebase.com/blog/2014-01-02-queries-part-two.html -- but I'd love to keep as much in Firebase as possible.)
Edit
As Kato suggested, here's a concrete example. We have 20,000 users, with their names stored as such:
/users/$userId/name
Oftentimes, users will be looking up another user by name. As a user is looking up their buddy, we'd like a drop-down to populate a list of users whose names start with the letters that the searcher has inputted. So if I typed in "Ja" I would expect to see "Jake Heller," "jake gyllenhaal," "Jack Donaghy," etc. in the drop-down.
I know this is an old topic, but it's still relevant. Based on Neil's answer above, you can search more easily by doing the following:
fb.child('tags').startAt(queryString).endAt(queryString + '\uf8ff').limit(5)
See Firebase Retrieving Data.
The \uf8ff character used in the query above is a very high code point
in the Unicode range. Because it is after most regular characters in
Unicode, the query matches all values that start with queryString.
As inspired by Kato's comments, one way to approach this problem is to set the priority to the field you want to search on for your autocomplete and use startAt(), limit(), and client-side filtering to return only the results that you want. You'll want to make sure that the priority and the search term are lower-cased, since Firebase is case-sensitive.
This is a crude example to demonstrate this using the Users example I laid out in the question:
For a search for "ja", assuming all users have their priority set to the lowercased version of the user's name:
fb.child('users').
  startAt('ja'). // The user-inputted search
  limitToFirst(20).
  once('value', function(snap) {
    for (var key in snap.val()) {
      if (snap.val()[key].indexOf('ja') === 0) {
        console.log(snap.val()[key]);
      }
    }
  });
This should only return the names that actually begin with "ja" (even if Firebase actually returns names alphabetically after "ja").
I chose to use limitToFirst(20) to keep the response size small and because, realistically, you'll never need more than 20 for the autocomplete drop-down. There are probably better ways to do the filtering, but this should at least demonstrate the concept.
Hope this helps someone! And it's quite possible the Firebase guys have a better answer.
(Note that this is very limited -- if someone searches for the last name, it won't return what they're looking for. Hence the "best" answer is probably to use a search backend with something like Kato's Flashlight.)
It strikes me that there's a much simpler and more elegant way of achieving this than client-side filtering or hacking ElasticSearch.
By converting the search key into its Unicode value and storing that as the priority, you can search with startAt() and endAt() by incrementing the value by one.
var start = "ABA";
var pad = "AAAAAAAAAA";
start += pad.substring(0, pad.length - start.length);
var blob = new Blob([start]);
var reader = new FileReader();
reader.onload = function(e) {
  var typedArray = new Uint8Array(e.target.result);
  var array = Array.prototype.slice.call(typedArray);
  var priority = parseInt(array.join(""));
  console.log("Priority of", start, "is:", priority);
};
reader.readAsArrayBuffer(blob);
You can then limit your search priority to the key "ABB" by incrementing the last charCode by one and doing the same conversion:
var limit = String.fromCharCode(start.charCodeAt(start.length - 1) + 1);
limit = start.substring(0, start.length - 1) + limit;
"ABA..." to "ABB..." ends up with priorities of:
Start: 65666565656565650000
End: 65666665656565650000
Simples!
Based on Jake's and Matt's answers, here is an updated version for SDK 3.1 ('.limit' no longer works):
firebaseDb.ref('users')
  .orderByChild('name')
  .startAt(query)
  .endAt(`${query}\uf8ff`)
  .limitToFirst(5)
  .on('child_added', (child) => {
    console.log({
      id: child.key,
      name: child.val().name
    });
  });