Compound indexes in MongoDB efficiency unclear - mongodb

Im looking for a structure to save userdata for a discord bot.
The context is that i need a unique save for a user for each discord sever (aka. guild) he is on.
Therefore neither userID nor guildID should be unique, but i could use them as compound index to quickly find users inside the users collection.
Is my train of thought correct until now?
My actual question is:
Which ID should be the first index its "sorted" by?
there are multiple hundred or thousand users per guild, but a single user is on about 1-5 guilds the bot is on.
Therefore first searching by guildID would make the amount of data to search in by userID somewhat smaller.
But first searching for userID would make the amount of data to search in by guildID even smaller.
Since the DB will search both indexes completely anyway, so step1 will be similarly quick for both, the second idea with first filtering by userID and then by guildID seems more efficient to me.
I'd like to know if my assumption seems viable, and if not, why not.
Or if there would be a better way that i haven't thought of.
Thanks in advance!

Compound indexes worked fine.
Still not big enough to see any difference in implementation of them, so i don't know about that.

Related

Looking up submissions by ID with MongoDB/Mongoose

So I'm used to looking up submissions by an auto increment primary ID with MySQL, but after using the MongoDB ORM wrapper, Mongoose, I'm finding that since Mongo stores data within collections differently, there is not really any concept of a traditional auto-increment ID.
I'm stuck trying to figure out how to grab a submission now because normally I'd structure my URL like so:
submission/34/category/slug-goes-here.
Since the 34 now becomes an ugly string based UUID with Mongo, I don't necessary want to display that in my URLs, but I want a unique URL in order to look up my submissions.
I'm thinking of maybe having a set method that when I insert the submission into my database, it generates some kind of 6 character hash e.g. zhXk40 and looks it up like that.
I'm wondering if I do it like this what the performance trade-offs would be. If I made constraints on the slug and then looked them up with the slug, and verified that the category matched, would that be more efficient? Either way I'm going to have to check if the category and slug match, but I'm not sure if an ID is even really necessary in this case.
What's the best practice for creating a route + looking up some piece of data from the db based on that route?
The first thing you should know is:
The _id property doesn't necessarily is that "ugly" ObjectId string.
Actually, the _id just need to be unique within its collection, so if you want to use auto incrementing IDs, there are no problems, however...
If you plan to use sharding within your database, then using auto incrementing field as the _id is overkill.
Why? Read the accepted answer here: Should I implement auto-incrementing in MongoDB?
In my application, as we're not going to shard it, then we use a indexed, numeric ID just for easier usability to the final user, and internally all references are ObjectIds.
Also, here's an good tuto for creating a auto incrementing field in MongoDB: http://docs.mongodb.org/manual/tutorial/create-an-auto-incrementing-field/
Sheesh! If we were able to truly know best practice here we'd all be better off. The talking heads are still talking and will be for some time.
How I would approach this is to go as pretty as I could. If I had a usable text string to make it semantic that'd be best case.
If I couldn't do that I'd go with the hash thing you suggested.
With both solutions the challenge will be to ensure that it remains unique. That means lookups before you save.
On performance, it's the same as SQL. Index what you use to look things up with. Mongo does good with composite indexes so category name and hash will look up pretty quick.

MongoDB, sort() and pagination

I known there is already some patterns on pagination with mongo (skip() on few documents, ranged queries on many), but in my situation i need live sorting.
update:
For clarity i'll change point of question. Can i make query like this:
db.collection.find().sort({key: 1}).limit(n).sort({key: -1}).limit(1)
The main point, is to sort query in "usual" order, limit the returned set of data and then reverse this with sorting to get the last index of paginated data. I tried this approach, but it seems that mongo somehow optimise query and ignores first sort() operator.
I am having a huge problem attempting to grasp your question.
From what I can tell when a user refreshes the page, say 6 hours later, it should show not only the results that were there but also the results that are there now.
As #JohnnyHK says MongoDB does "live" sorting naturally whereby this would be the case and MongoDB, for your queries would give you back the right results.
Now I think one problem you might trying to get at here (the question needs clarification, massively) is that due to the data change the last _id you saw might no longer truely represent the page numbers etc or even the diversity of the information, i.e. the last _id you saw is now in fact half way through page 13.
These sorts of things you would probably spend more time and performance trying to solve than just letting the user understand that they have been AFAK for a long time.
Edit
Aha, I think I see what your trying to do now, your trying to be sneaky by getting both the page and the last item in the list at the same time. Unfortunately just like SQL this is not possible. Even if sort worked like that the sort would not function like it should since you can only sort one way on a single field.
However for future reference the sort() function is exactly that on a cursor and until you actually open the cursor by starting to iterate it calling sort() multiple times will just overwrite the cursor property.
I am afraid that this has to be done with two queries, so you get your page first and then client side (I think your looking for the max of that page) scroll through the records to find the last _id or just do a second query to get the last _id. It should be super dupa fast.

MongoDB. Use cursor as value for $in in next query

Is there a way to use the cursor returned by the previous query as a value for $in in the next query? For example, something like this:
var users = db.user.find({state:1})
var offers = db.offer.find({user:{$in:users}})
I think this can reduce the traffic between mongodb and client in case the client doesn't need user information at all, just offers. Am i wrong?
Basically you want to do a join between two collections which Mongo doesn't support. You can reduce the amount of data being transferred from the server by limiting the fields returned from the first query to only the unique user information (i.e. the _id) that you need to get data from the offers collection.
If you really just want to make one query then you should store more information in the offers collection. For example, if you're trying to find offers for active users then you would store the active state of the user in the offers collection.
To work from your comment:
Yes, that's why I used tag 'join' in a question. The idea is that I
can make a first query more сomplex using a bunch of fields and
regexes without storing user data in other collections except
references. In these cases I always have to perform two consecutive
queries, but transfering of the results of the first query is not
necessary neither for me nor for the mongodb itself. I just want to
understand could it be done now, will it be possible to do so in the
future or it cannot be implemented for some technical reasons
As far as I understand it there is no immediate hurry to make this possible. Also the way it is coded atm will make this quite a big change to the way cursors work and are defined. A change big enough to possibly cause implementation breaks for other people. It is really a case of whether to set safe for inserts and updates for all future drivers. It is recognised that safe should be default but this will break implementation for other people who expect it the other way around.
It is rather inefficient if you don't require the results of the first query at all however since most networks are prepped with high traffic in mind and the traffic is cheap there hasn't been a demand to make it able to do chained queries server side in the cursor.
However subselects (which this basically is, it is selecting a set of rows based upon a sub selection of previous rows) have been on mongodb-user a couple of times and there might even be a JIRA for it somewhere, if not might be useful to make one.
As for doing it right now: there is no way.

Query for set complement in CouchDB

I'm not sure that there is a good way to do with with the facilities CouchDB provides, but I'd like to somehow extract the relative complement of the sets of two different document types over a particular key.
For example, let's say that I have documents representing users and posts, both of which have a (unique) username field. There's a validation in place ensuring that a user document exists for the username in every post, but there may be any number post documents with a given username, include none. It's trivial to create a view which counts the number of posts per username. The view can even include zero-counts by emitting zero post-counts for the user documents in the view map function. What I want to do though is retrieve just the list of users who have zero associated posts.
It's possible to build the view I described above and filter client-side for zero-value results, but in my actual situation the number of results could be very, very large, and the interesting results a relatively small proportion of the total. Is there a way to do this sever-side and retrieve back just the interesting results?
I would write a map function to iterate through the documents and emit the users (or just usersnames) with 0 posts.
Then I would write a list function to iterate through the map function results and format them however you want (JSON, csv, etc).
(I would NOT use a reduce function to format the results, even if a reduce function appears to work OK in development. That is just my own experience from lessons learned the hard way.)
Personally I would filter on the client-side until I had performance issues. Next I would probably use Teddy's _filter technique—all pretty standard CouchDB stuff.
However, I stumbled across (IMO) an elegant way to find set complements. I described it when exploring how to find documents missing a field.
The basic idea
Finding non-members of your view obviously can't be done with a simple query (and a straightforward index scan.) However, it can be done in constant memory, and linear time, by simultaneously iterating through two query results at the same time.
One query is for all possible document ids. The other query is for matching documents (those you don't want). Importantly, CouchDB sorts query results, therefore you can calculate the complement efficiently.
See my details in the previous question. The basic idea is you iterate through both (sorted) lists simultaneously and when you say "hey, this document id is listed in the full set but it's missing in the sub-set, that is a hit.
(You don't have to query _all_docs, you just need two queries to CouchDB: one returning all possible values, and the other returning values not to be counted.)

Mongodb: object id as short primary key within a collection

How to make better use of objectId generate by MongoDB. I am not an expert user, but so far i ended up creating seperate id for my object (userid, postid) etc because the object id is too long and makes the url ugly if use as the main ID. I keep the _id intact as it help indexing etc. I was wondering about any better strategy so that one can use mongo objectId as more url friendly and easy to remember key. I read the key was a combination of date etc, so any of the part can be used unique within a collection for this purpose.
thanks,
bsr/
If you have an existing ID (say from an existing data set), then it's perfectly OK to override _id with the one you have.
...keeo the _id intact as it help indexing etc
MongoDB indexes the _id field by default. If you start putting integers in the _id field, they will be indexed like everything else.
So most RDBMs provide an "auto-increment" ID. This is nice for small datasets, but really poor in terms of scalability. If you're trying to insert data to 20 servers at once, how do you keep the "auto-increment" intact?
The normal answer is that you don't. Instead, you end up using things like GUIDs for those IDs. In the case of MongoDB, the ObjectId is already provided.
I was wondering about any better strategy so that one can use mongo objectId as more url friendly and easy to remember key
So the problem here is that "easy to remember" ID doesn't really mesh with "highly scalable database". When you have a billion documents, the IDs are not really "easy to remember".
So you have to make the trade-off here. If you have a table that can get really big, I suggest using the ObjectId. If you have a table that's relatively small and doesn't get updated often, (like a "lookup" table) then you can build your own auto-increment.
The choice is really up to you.
You can overwrite the _id yourself. There is no obligation for using the auto-generated object id. What is the problem with overriding _id inside your app according to your own needs?