MongoDB - is this efficient? - mongodb

I have a collection users in mongodb. I am using the node/mongodb API.
In my node.js if I go:
var users = db.collection("users");
and then
users.findOne({ ... })
Is this idiomatic and efficient?
I want to avoid loading all users into application memory.

In Mongo, collection.find() returns a cursor, which is an abstract representation of the results that match your query. It doesn't return your results over the wire until you iterate over the cursor by calling cursor.next(). Before iterating over it, you can, for instance, call cursor.limit(x) to limit the number of results that it is allowed to return.
collection.findOne() is effectively just a shortcut version of collection.find().limit(1).next(). So the cursor is never allowed to return more than the one result.
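To make the laziness concrete, here is a minimal in-memory sketch. ToyCursor and ToyCollection are hypothetical stand-ins, not the MongoDB driver API, and they take a predicate function instead of a query document. The sketch only illustrates that building a cursor touches nothing, and that findOne() stops at the first match.

```javascript
// Hypothetical toy classes for illustration only; not the MongoDB driver.
class ToyCursor {
  constructor(collection, predicate) {
    this.collection = collection;
    this.predicate = predicate;
    this.max = Infinity; // maximum number of documents this cursor may return
    this.position = 0;   // index of the next document to examine
    this.returned = 0;   // documents handed to the caller so far
  }
  limit(n) {
    this.max = n;
    return this;
  }
  // Work only happens here, one matching document per call.
  next() {
    const docs = this.collection.docs;
    while (this.returned < this.max && this.position < docs.length) {
      const doc = docs[this.position++];
      this.collection.examined++;
      if (this.predicate(doc)) {
        this.returned++;
        return doc;
      }
    }
    return null;
  }
}

class ToyCollection {
  constructor(docs) {
    this.docs = docs;
    this.examined = 0; // how many documents any cursor has looked at
  }
  find(predicate) {
    return new ToyCursor(this, predicate);
  }
  findOne(predicate) {
    return this.find(predicate).limit(1).next();
  }
}

const toyUsers = new ToyCollection([{ name: 'ann' }, { name: 'bob' }, { name: 'cy' }]);
const pending = toyUsers.find((u) => u.name !== 'ann'); // builds a cursor only
const examinedBeforeIteration = toyUsers.examined;
const bob = toyUsers.findOne((u) => u.name === 'bob');
const examinedAfterFindOne = toyUsers.examined;
```

In this toy, the document after the first match is never examined, which mirrors why findOne() does not pull the whole collection into memory.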
As already explained, the collection object itself is a facade allowing access to the collection, but which doesn't hold any actual documents in memory.
findOne() is very efficient, and it is indeed idiomatic, though IME it's more used in dev/debugging than in real application code. It's very useful in the CLI, but how often does a real application need to just grab any one document from a collection? The only case I can think of is when a given query can only ever return one document (i.e. an effective primary key).
Further reading:
Cursors,
Read Operations

Yes, that should only load one user into memory.
The collection object is just a handle: creating it doesn't fetch any users. Documents are only loaded when you call findOne or iterate the cursor returned by find.


Firestore pagination by offset

I would like to create two queries with pagination. The first should return the first ten records, and the second should return all the remaining records:
.startAt(0)
.limit(10)
.startAt(9)
.limit(null)
Can anyone confirm that the above code is correct for both cases?
Firestore does not support index or offset based pagination. Your query will not work with these values.
Please read the documentation on pagination carefully. Pagination requires that you provide a document reference (or field values in that document) that defines the next page to query. This means that your pagination will typically start at the beginning of the query results, then progress through them using the last document you see in the prior page.
From CollectionReference:
offset(offset) → {Query}
Specifies the offset of the returned results.
As Doug mentioned, Firestore does not support Index/offset - BUT you can get similar effects using combinations of what it does support.
Firestore has its own internal sort order (usually the document ID), but any query can be sorted with .orderBy(), and the first document will be relative to that sorting - only an orderBy() query has a real concept of a "0" position.
Firestore also allows you to limit the number of documents returned with .limit(n).
.startAt(), .startAfter(), .endAt() and .endBefore() all need either values for the same fields as the orderBy, or a DocumentSnapshot - NOT an index.
What I would do is create a Query:
const MyOrderedQuery = FirebaseInstance.collection().orderBy()
Then first execute
MyOrderedQuery.limit(n).get()
or
MyOrderedQuery.limit(n).onSnapshot()
Either way you end up with a QuerySnapshot (from the resolved Promise or in the listener), whose .docs property is an array of DocumentSnapshots. Let's save that array:
let ArrayOfDocumentSnapshots = QuerySnapshot.docs;
Warning, Will Robinson! JavaScript assignment of objects is by reference,
and even the spread operator only makes a shallow copy - make sure your code actually
copies the full deep structure or that the reference is kept around!
Then to get the "rest" of the documents as you ask above, I would do:
MyOrderedQuery.startAfter(ArrayOfDocumentSnapshots[n-1]).get()
or
MyOrderedQuery.startAfter(ArrayOfDocumentSnapshots[n-1]).onSnapshot()
which will start AFTER the last returned document snapshot of the FIRST query. Note the re-use of the MyOrderedQuery
You can get something like a "pagination" by saving the ordered Query as above, then repeatedly use the returned Snapshot and the original query
MyOrderedQuery.startAfter(ArrayOfDocumentSnapshots[n-1]).limit(n).get() // page forward
MyOrderedQuery.endBefore(ArrayOfDocumentSnapshots[0]).limitToLast(n).get() // page back (limit(n) here would return the first n documents of the whole query)
This does make your state management more complex - you have to hold onto the ordered Query, and the last returned QuerySnapshot - but hey, now you're paginating.
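The cursor idea above can be sketched without Firestore at all. The helper below is a hypothetical in-memory stand-in, not the Firestore SDK: makeOrderedQuery plays the role of the saved ordered query, and page(cursorDoc, limit) plays the role of startAfter(snapshot).limit(n). It assumes the caller passes back the very same object it received from the previous page, which is the "keep the reference around" point made earlier.

```javascript
// Hypothetical in-memory stand-in for an ordered query; not the Firestore SDK.
function makeOrderedQuery(docs, field) {
  // Sort once, like orderBy(field); the copy keeps the original objects by reference.
  const sorted = [...docs].sort((a, b) =>
    a[field] < b[field] ? -1 : a[field] > b[field] ? 1 : 0
  );
  return {
    // Return at most `limit` documents strictly after `cursorDoc` in sort order.
    // A null cursor means "start from the beginning".
    page(cursorDoc = null, limit = Infinity) {
      const start = cursorDoc === null ? 0 : sorted.indexOf(cursorDoc) + 1;
      return sorted.slice(start, start + limit);
    },
  };
}

// 25 made-up documents, inserted in reverse order to show that sorting matters.
const sampleDocs = Array.from({ length: 25 }, (_, i) => ({ rank: 25 - i }));
const orderedQuery = makeOrderedQuery(sampleDocs, 'rank');

const firstPage = orderedQuery.page(null, 10);                        // first ten
const theRest = orderedQuery.page(firstPage[firstPage.length - 1]);   // everything after them
```

The cursor is a document, not a position, so the second call keeps working even if you no longer know how many documents came before it.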
BIG NOTE
This is not terribly efficient - setting up a listener is fairly "expensive" for Firestore, so you don't want to do it often. Depending on your document size(s), you may want to "listen" to larger sections of your collections, and handle more of the paging locally (Redux or whatever) - the Firestore documentation indicates you want your listeners around for at least 30 seconds for efficiency. For some applications, even pages of 10 can be efficient; for others you may need 500 or more stored locally and paged in smaller chunks.

Get Deleted Object in AbstractMongoEventListener

I want to run some logic when an object gets deleted from MongoDB. I am using Spring Data MongoDB.
I am using AbstractMongoEventListener, as the object can be deleted from the collection in a number of ways, and I am overriding the
public void onBeforeDelete(BeforeDeleteEvent<Object> event)
method. But there is no method on the event object that returns the object I am about to delete.
event.getSource() and event.getDocument() both return the document. How can I get the object?
Somehow this event seems to be messed up. Unlike the other MongoMappingEvent<T> descendants, this one inherits from MongoMappingEvent<Document> through AbstractDeleteEvent<T>. I cannot explain this difference.
But as I also needed to retrieve the documents before deleting them, I used the debugger and found that it is possible to retrieve the document IDs using a hackish, undocumented get("Key") chain.
event.getDocument()
.get("_id", Document.class) // BSON Document!
.getList("$in", ObjectId.class) // ObjectId.class or what ever Type your Id is.
With that you can retrieve a list of the IDs of your documents. Take the repository or whatever you use, and fetch the documents by those IDs.
I really do not like using those string keys that I have not found in any documentation, as who knows when they will be removed.
I would love to remove this answer as soon as someone provides a less hackish way.
Be aware that when you are using an @EventListener, it cannot take the type parameter into account.

MongoDb and expressJS trying to understand Code

function _allUsers(callback) {
  var db = connect.get();
  db.collection("users").find({}).toArray(function (err, data) {
    if (err) {
      callback(err);
    } else {
      callback(null, data);
    }
  });
}
I am trying to understand this code. I have been looking around the web, but I find the explanations kind of difficult to understand (I am new to the MEAN stack), so my questions are:
What does the collection method do? I am not sure, but is the string "users" just the name of our collection with all users?
Why do we have to use a callback in this situation? (I find callbacks very confusing.)
And why do we have to give the toArray function an anonymous function?
Instead of toArray, could I use the pretty() method without any anonymous function as a parameter?
The MEAN stack is a bundle of software for building applications written entirely in JavaScript. This means you can use JavaScript from your database through your back end to your front end.
MEAN actually stands for the first characters of each software program included in the stack. MongoDB, Expressjs, AngularJS and NodeJS.
1
MongoDB is a NoSQL database which uses BSON (similar to JSON) to store so called documents. Look at a document as if it is a single entity or row in a traditional database. These entities (or rows) are stored in collections (a collection of documents) which can be compared to tables.
So the answer to your 1st question is: it opens up the users collection, which grants access to all the user documents.
2
NodeJS is asynchronous by design. This allows NodeJS to perform a lot of operations while running on a single thread*. Because NodeJS is single-threaded we need a way to write our code non-blocking meaning we can start an operation, proceed with executing other code and come back whenever that operation is finished.
In your case we request access to the users collection, this takes some time. In order to allow other parts of our application to continue processing we use a callback. When we have access to our collection, our callback is executed and we can perform whatever operation we wanted to do when we first requested access.
*NodeJS actually runs on multiple threads but a developer never has to worry about multithreading, NodeJS does that for us.
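A small sketch of that ordering, assuming a made-up fakeFindAll function that stands in for a slow database read (it is not part of any driver). It uses setImmediate to deliver its result on a later turn of the event loop, the way real I/O would.

```javascript
const order = [];

// Hypothetical stand-in for a slow database read; it calls back on a later tick.
function fakeFindAll(callback) {
  setImmediate(() => callback(null, [{ name: 'ann' }]));
}

order.push('before');
fakeFindAll((err, users) => {
  // Runs only after the currently executing code has finished.
  order.push(err ? 'error' : `got ${users.length} user(s)`);
});
order.push('after');
```

'after' is recorded before the callback fires, because the call to fakeFindAll returns immediately and the rest of the code carries on.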
3
This is exactly what the previous point is about.
The .toArray() method returns an array that contains all the documents from a cursor. The method iterates completely the cursor, loading all the documents into RAM and exhausting the cursor. Source
.toArray() is a computationally intensive operation. Since we do not want to wait until .toArray() is finished, but want to carry on processing the rest of our code, we give it a callback so that we can come back to our collection processing whenever it's ready.
4
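From what I can read in the docs, you can indeed call toArray() without a callback:
var users = db.collection("users").find({}).toArray();
This does not block, however. Without a callback, the Node.js driver's toArray() returns a Promise, so users is a Promise rather than an array, and you still need .then() or await to get the documents. pretty() is a mongo shell helper for formatting output, so it is not a replacement for toArray() here.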
Disclaimer: I left out or oversimplified details in this explanation for ease of understanding.
db.collection('users') returns the users collection instance.
We use a callback because the operation is asynchronous.
The anonymous function passed to toArray is its callback.
Whether you can omit the anonymous function depends on the library in use.
Express/Node.js code is asynchronous, so we need callbacks or Promises.
You can think of the collection as of table in MySQL. A collection consists of documents (rows/items/records in MySQL). Your example calls the Users collection and finds all documents (records) in it.
About the callbacks - NodeJS/Express are commonly callbacks-oriented. This is the pattern they use and most of the code is using it, because it is asynchronous. If you need to be sure that some snippet is executed right after some other snippet, you have to use callback (or promise).
Whether you call toArray() depends on what your callback expects. You can skip it if the callback expects the cursor returned by the find() method instead of an array of documents.
You can use a named (non-anonymous) function, too, but you have to keep the asynchronous logic in mind and continue using callbacks/promises. You can read more about callbacks and promises in this Quora article.
Here you can find more about the find() method.

What is the most optimal method of querying a reliable dictionary collection

I would like to use service fabric reliable dictionaries to store data that we intend to query via a rest interface. I was wondering what was the most optimal approach to querying data held in those collections.
The documentation doesn't seem to provide any methods to support querying a IReliableDictionary interface apart from simply enumerating over the collection.
Is the only option to fetch every object out of the reliable dictionary and pop into a List<> collection for querying?
Thanks
As Vaclav mentions in his answer, you can always enumerate the entire dictionary and filter as you wish. Additionally, take a look at the docs for IReliableDictionary, specifically CreateEnumerableAsync(ITransaction, Func<TKey, Boolean>, EnumerationMode). This allows you to filter the keys you wish to include in the enumeration up-front, which may improve perf (e.g. by avoiding paging unnecessary values back in from disk).
Simple Example (exception handling/retry/etc. omitted for brevity):
var myDictionary = await this.StateManager.GetOrAddAsync<IReliableDictionary<int, string>>(new Uri("fabric:/myDictionary"));
using (var tx = this.StateManager.CreateTransaction())
{
    // Fetch all key-value pairs where the key is an integer less than 10
    var enumerable = await myDictionary.CreateEnumerableAsync(tx, key => key < 10, EnumerationMode.Ordered);
    var asyncEnumerator = enumerable.GetAsyncEnumerator();
    while (await asyncEnumerator.MoveNextAsync(cancellationToken))
    {
        // Process asyncEnumerator.Current.Key and asyncEnumerator.Current.Value as you wish
    }
}
You could create canned or dynamic key filters corresponding to your REST queries which search based on key - for value-based queries you will need to fall back to full enumeration (e.g. by using the same pattern as above, but omitting the key filter) or use notifications in order to build your own secondary index.
IReliableDictionary implements IEnumerable, which means you can query it with LINQ expressions, same way you would with an IDictionary.
Note that an enumeration over IReliableDictionary uses snapshot isolation, so it is lock free. Basically this means when you start the enumeration you are enumerating over the dictionary as it is at the time you started. If items are added or removed while you're enumerating, those changes won't appear in that enumeration. More info about this and Reliable Collections in general here: https://azure.microsoft.com/en-us/documentation/articles/service-fabric-reliable-services-reliable-collections/
Look here What would be the best way to search an IReliableDictionary?
There are two ways to do it:
Eli Arbel has async extension methods on Gist that provide a bridge to Ix-Async and also async implementations of Select, SelectMany, and Where.
You can wrap IAsyncEnumerable in a regular IEnumerable that simply wraps the asynchronous
This answer: Convert IReliableDictionary to IList shows how to convert ServiceFabric's IAsyncEnumerable to a generic IAsyncEnumerable. Then you can use Async Linq provided by dotnet.

Returning custom fields in MongoDB

I have a mongoDB collection with an array field that represents the lists the user is member of.
user {
screen_name: string
listed_in: ['list1', 'list2', 'list3', ...] //Could be more than 10000 elements (I'm aware of the BSON 16MB limits)
}
I am using the *listed_in* field to get the members list
db.user.find({'listed_in': 'list2'});
I also need to query for a specific user and know if he is member of certain lists
var user1 = db.user.findOne({'screen_name': 'user1'});
In this case I will get the *listed_in* field with all its members.
My question is:
Is there a way to pre-compute custom fields in mongoDB?
I would need to be able to get fields like these, user1.isInList1, user1.isInList2
Right now I have to do it in the client side by iterating through the *listed_in* array to know if the user is member of "list1" but *listed_in* could have thousand elements.
My question is: Is there a way to pre-compute custom fields in mongoDB?
Not really. MongoDB does not have any notion of "computed columns". So the query you're looking for doesn't exist.
Right now I have to do it in the client side by iterating through the *listed_in* array to know if the user is member of "list1" but *listed_in* could have thousand elements
In your case you're basically trying to push a client-side for loop onto the server. However, some process still has to do the for loop. And frankly, looping through 10k items is not really that much work for either client or server.
The only real savings here is preventing extra data on the network.
If you really want to save that network traffic, you will need to restructure your data model. This re-structure will likely involve two queries to read and write, but less data over the wire. But that's the trade-off.
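For a sense of scale, the client-side check discussed above can be sketched like this. The document is made up to match the shape in the question, and isInList is a hypothetical helper name; building a Set once means each later membership test does not rescan the array.

```javascript
// Hypothetical user document shaped like the one in the question,
// with 10000 made-up list names: 'list1' ... 'list10000'.
const user1 = {
  screen_name: 'user1',
  listed_in: Array.from({ length: 10000 }, (_, i) => `list${i + 1}`),
};

// One pass over the array to build the Set, then constant-time lookups.
const memberships = new Set(user1.listed_in);
const isInList = (listName) => memberships.has(listName);

const inList2 = isInList('list2');
const inMissingList = isInList('list10001');
```

This does nothing about the network cost of fetching listed_in in the first place, which is the part only a different data model would change.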