Best practices for finding objects from a list of ids? - mongodb

I need to retrieve all documents associated with a list of ids.
I thought of a couple of ways to do it:
Use a filter, then add a or clause and for each id add a id equals x condition to that or.
Use the in operator, like query.field("_id").in(ids)
something else?
Is there a method considered to be a best practice for this kind of case?
Also, which method performs best in large data sets?

Related

Best Way to Implement Unique ID DynamoDB Swift

I am working on an app where users can create posts that uses Amazon DynamoDB. One of the attributes of a post item in the database is postId. I am searching for the best practice to set this value upon creation. So far, I have thought of:
Counting the current items in the DB and then assigning the value as postId = dbcount + 1. I cannot find a count method for DynamoDB using Swift, and the ways I have found (scan & description) are either inefficient or accurate. Also, I thought of the scenario of 2 users posting at the same time.
I could create a UUID with Swift and set the postId to this value.
Upon these 2 options, which route is better? Is there a preferred industry standard? Option 2 seems to be the better choice, but I am not sure. Are there any other potential alternatives? Thank you!
I would definitely stay away from option 1 - as you said the potential for a race condition is too high and it could be expensive to implement too.
A UUID would certain work and is likely to be the least painful. However, there are other options too. An atomic counter would work. A bit more complicated but you could even use a conditional write. But the logic for that would be a pain.
The advantage of the UUID is that you generate it so that it can be used for, as an example, a row of data in a child table.

what is the best way to retrive information in a graph through has Step

I'm using titan graph db with tinkerpop plugin. What is the best way to retrieve a vertex using has step?
Assuming employeeId is a unique attribute which has a unique vertex centric index defined.
Is it through label
i.e g.V().has(label,'employee').has('employeeId','emp123')
g.V().has('employee','employeeId','emp123')
(or)
is it better to retrieve a vertex based on Unique properties directly?
i.e g.V().has('employeeId','emp123')
Which one of the two is the quickest and better way?
First you have 2 options to create the index:
mgmt.buildIndex('byEmployeeId', Vertex.class).addKey(employeeId).buildCompositeIndex()
mgmt.buildIndex('byEmployeeId', Vertex.class).addKey(employeeId).indexOnly(employee).buildCompositeIndex()
For option 1 it doesn't really matter which query you're going to use. For option 2 it's mandatory to use g.V().has('employee','employeeId','emp123').
Note that g.V().hasLabel('employee').has('employeeId','emp123') will NOT select all employees first. Titan is smart enough to apply those filter conditions, that can leverage an index, first.
One more thing I want to point out is this: The whole point of indexOnly() is to allow to share properties between different types of vertices. So instead of calling the property employeeId, you could call it uuid and also use it for employers, companies, etc:
mgmt.buildIndex('employeeById', Vertex.class).addKey(uuid).indexOnly(employee).buildCompositeIndex()
mgmt.buildIndex('employerById', Vertex.class).addKey(uuid).indexOnly(employer).buildCompositeIndex()
mgmt.buildIndex('companyById', Vertex.class).addKey(uuid).indexOnly(company).buildCompositeIndex()
Your queries will then always have this pattern: g.V().has('<label>','<prop-key>','<prop-value>'). This is in fact the only way to go in DSE Graph, since we got completely rid of global indexes that span across all types of vertices. At first I really didn't like this decision, but meanwhile I have to agree that this is so much cleaner.
The second option g.V().has('employeeId','emp123') is better as long as the property employeeId has been indexed for better performance.
This is because each step in a gremlin traversal acts a filter. So when you say:
g.V().has(label,'employee').has('employeeId','emp123')
You first go to all the vertices with the label employee and then from the employee vertices you find emp123.
With g.V().has('employeeId','emp123') a composite index allows you to go directly to the correct vertex.
Edit:
As Daniel has pointed out in his answer, Titan is actually smart enough to not visit all employees and leverages the index immediately. So in this case it appears there is little difference between the traversals. I personally favour using direct global indices without labels (i.e. the first traversal) but that is just a preference when using Titan, I like to keep steps and filters to a minimum.

Efficient way to get a random element in Scala?

What is an efficient way to get a random element from a collection in Scala? There's a related question here, but like one of the comments pointed out, "[that] question does not specify any efficiency needs".
An arbitrary collection cannot be accessed in constant time. So you need some special collection with the desired property. For instance — Vector or Array. See Performance Characteristics of collections for others.
util.Random.shuffle(List.range(1,100)) take 3
Use a collection with a constant-time size() and get() method.
If you need random order of all collection elements, then Random.shuffle is what you need. (You'd better convert the original collection to array to avoid forward and backward conversion.)

Advanced Queries in REST

I'm trying to create a more advanced query mechanism for REST. Assume I have the following:
GET /data/users
and it returns a list of users. Then to filter the users returned for example I'd say:
GET /data/users?age=30
to get a list of 30 year old users. Now lets say I want users aged 30 - 40. I'd like to have essentially a set of reusable operators such as:
GET /data/users?greaterThan(age)=30&lessThan(age)=40
The greaterThan and lessThan would be reusable on other numeric, date, etc fields. This would also allow me to add other operators (contains, starts with, ends with, etc). I'm a REST noob so I'm not sure if this violates any of the core principles REST follows. Any thoughts?
Alternately, you might simply be better off with optional parameters "minAge" and "maxAge".
Alternative 2: encode the value(s) for parameters to indicate the test to be performed: inequalities, pattern matching etc.
This gets messy no matter what you do for complex boolean expressions. At some point, you almost want to make a document format for the query description itself, but it's hard to think of it as a "GET" anymore.
I would look into setting the value of the query parameter to include syntax for operators and such .. something like this for a range of values
/data/users?age=[30,40]
or
/data/users?age=>30&age=<40
would make it a little easier to read, just make sure to url encode if you are using any reserved characters

MongoDB: What's a good way to get a list of all unique tags?

What's the best way to keep track of unique tags for a collection of documents millions of items large? The normal way of doing tagging seems to be indexing multikeys. I will frequently need to get all the unique keys, though. I don't have access to mongodb's new "distinct" command, either, since my driver, erlmongo, doesn't seem to implement it, yet.
Even if your driver doesn't implement distinct, you can implement it yourself. In JavaScript (sorry, I don't know Erlang, but it should translate pretty directly) can say:
result = db.$cmd.findOne({"distinct" : "collection_name", "key" : "tags"})
So, that is: you do a findOne on the "$cmd" collection of whatever database you're using. Pass it the collection name and the key you want to run distinct on.
If you ever need a command your driver doesn't provide a helper for, you can look at http://www.mongodb.org/display/DOCS/List+of+Database+Commands for a somewhat complete list of database commands.
I know this is an old question, but I had the same issue and could not find a real solution in PHP for it.
So I came up with this:
http://snipplr.com/view/59334/list-of-keys-used-in-mongodb-collection/
John, you may find it useful to use Variety, an open source tool for analyzing a collection's schema: https://github.com/jamescropcho/variety
Perhaps you could run Variety every N hours in the background, and query the newly-created varietyResults database to retrieve a listing of unique keys which begin with a given string (i.e. are descendants of a specific parent).
Let me know if you have any questions, or need additional advice.
Good luck!