MongoDB: global read preference (set via connection string) vs. query-level read preference (set as query options), which takes precedence?

I am running a MongoDB cluster. A Node.js application connects to this DB and I have set readPreference=secondaryPreferred. However, there is one critical flow that writes a document and then immediately reads it back, and those reads are coming back stale. To avoid this, we want to set the read preference to primary on the specific query that fetches that document. Since the global read preference (via the connection string) is set to secondary and one query sets its own read preference, does MongoDB prioritise between the two, or will the global preference override the query-specific preference?
I couldn't find the answer in the MongoDB docs below:
https://www.mongodb.com/docs/manual/core/read-preference/

We were able to pass the read preference as an option to Mongoose's findOne function:
await UserModel.findOne(query, {}, { readPreference: 'primary' });
The global read preference was set to secondary in the connection string, as below:
mongodb://db0.example.com,db1.example.com,db2.example.com/?replicaSet=myRepl&readPreference=secondary
The query honored the readPreference passed in the query options and read from the primary, so the query-level setting takes precedence over the connection-string default.
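A minimal end-to-end sketch of that behaviour. Only the connection string is from the original post; the model, schema, database name, and email values are made up for illustration:

const mongoose = require('mongoose');

// Connection-string default: reads may go to a secondary.
const uri = 'mongodb://db0.example.com,db1.example.com,db2.example.com/mydb?replicaSet=myRepl&readPreference=secondary';

async function main() {
  await mongoose.connect(uri);
  const UserModel = mongoose.model('User', new mongoose.Schema({ email: String }));

  // Inherits the connection-string default: may return a stale document.
  const maybeStale = await UserModel.findOne({ email: 'a@example.com' });

  // Per-query override: this read goes to the primary.
  const fresh = await UserModel.findOne({ email: 'a@example.com' }).read('primary');

  // Equivalent, passing the override through query options as in the answer above:
  // await UserModel.findOne({ email: 'a@example.com' }, {}, { readPreference: 'primary' });
}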

Related

Is it mandatory to restart MongoDB after adding a new index on a collection?

A MongoDB collection has become slow to return data because it has grown huge over time.
I need to add an index on a few fields and have it take effect immediately in searches, so I'm seeking clarification on the following:
Is it mandatory to restart MongoDB after indexing?
If yes, is there any way to add an index without restarting the server? I don't want any downtime...
MongoDB does not need to be restarted after indexing.
However, by default the createIndex operation blocks reads and writes on the affected database (note that it is not only the collection but the whole db). You can change this behaviour by building the index in the background, like this:
db.collectionName.createIndex( { collectionKey: 1 }, { background: true } )
It might seem that your client is blocked when creating the index: the mongo shell session or connection where you are creating the index will block, but other connections to the database will still be able to query and operate on it.
Docs: https://docs.mongodb.com/manual/core/index-creation/
There is no need to restart MongoDB after you add an index!
However, by default an index is created in the foreground.
What does that mean? The MongoDB documentation states: 'By default, creating an index on a populated collection blocks all other operations on a database. When building an index on a populated collection, the database that holds the collection is unavailable for read or write operations until the index build completes. Any operation that requires a read or write lock on all databases will wait for the foreground index build to complete.'
For potentially long-running index builds on standalone deployments, the background option should be used. In that case, the MongoDB database remains available during the index build.
To create an index in the background, a snippet like the following should be used.
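In the mongo shell that looks like this (the collection and key names are placeholders, as in the previous answer):

// Build the index in the background so the database stays available
// for reads and writes while the build runs.
db.collectionName.createIndex({ collectionKey: 1 }, { background: true })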

Caching query results in MongoDB

I will be working on a large data set that changes slowly, so I want to optimize query response time with a caching mechanism. For example, if I want to see some metrics about the data from the last 360 days, I don't need to query the database again, because I can reuse the last query result.
Does MongoDB natively support caching, or do I have to use another database, for example Redis as mentioned here?
EDIT: my question is different from 'Caching repeating query results in MongoDB' because I asked about external caching systems, and the answer to the latter question was specific to working with MongoDB and Tornado.
The author of the Motor (MOngo + TORnado) package gives an example of caching his list of categories here: http://emptysquare.net/blog/refactoring-tornado-code-with-gen-engine/
Basically, he defines a global list of categories and queries the database to fill it in; then, whenever he needs the categories in his pages, he checks the list: if it exists, he uses it, and if not, he queries again and fills it in. He has it set up to invalidate the list whenever he inserts into the database, but depending on your usage you could keep a global timeout variable to track when you next need to re-query. If you're doing something complicated, this could get out of hand, but if it's just a list of the most recent posts or something similar, I think it would be fine; see the sketch below.
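A minimal sketch of that pattern, transplanted to Node.js (the original example is Python/Motor; the collection name and the 10-minute timeout here are illustrative):

const { MongoClient } = require('mongodb');

let cachedCategories = null;
let cachedAt = 0;
const TTL_MS = 10 * 60 * 1000; // re-query after 10 minutes

async function getCategories(db) {
  const now = Date.now();
  if (cachedCategories && now - cachedAt < TTL_MS) {
    return cachedCategories; // serve from the in-process cache
  }
  cachedCategories = await db.collection('categories').find().toArray();
  cachedAt = now;
  return cachedCategories;
}

// On insert, invalidate the cache so the next read re-queries:
async function addCategory(db, doc) {
  await db.collection('categories').insertOne(doc);
  cachedCategories = null;
}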

Dynamically changing the database shard that I am connecting to

I want to have a pool of database connections, connecting to various sharded databases.
On a per-query basis I will pass in the tenant/customer ID, and based on that customerId I will choose which database to connect to and use for the current query.
Is this something that can be done with Slick out of the box?
Not supported out of the box, but it shouldn't be too hard to implement, I think. I created a ticket: https://github.com/slick/slick/issues/703
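The routing idea itself is small enough to sketch. This is not Slick (which, per the answer, has no built-in support); it is a language-agnostic illustration written in JavaScript, and every name in it (openPool, the shard URLs, the modulo routing rule) is hypothetical:

// Hypothetical sketch of per-tenant routing: keep one connection pool
// per shard and pick the pool per query based on customerId.
function openPool(url) {
  // Stand-in for a real driver/pool; returns a placeholder handle so
  // the sketch is self-contained.
  return { url, query: async (sql) => console.log(`(${url}) ${sql}`) };
}

const shards = [
  openPool('db-shard0.example.com'),
  openPool('db-shard1.example.com'),
];

// Static routing rule: map the tenant onto a shard.
function poolFor(customerId) {
  return shards[customerId % shards.length];
}

async function queryForTenant(customerId, sql) {
  return poolFor(customerId).query(sql);
}

// Usage: tenant 7 lands on shard 1, tenant 10 on shard 0.
queryForTenant(7, 'SELECT 1');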

Any way to get streaming reads from a Database?

Is there some way, when I read data from the database with certain constraints, to have the database start "streaming" its results to me instead of making me wait for all the results at once?
Think of a large list.
Instead of making users wait for the entire list, I want to start filling in data quickly, even if I only get one row at a time.
I only know of MongoDB with limit(x) and skip(y).
Is there any way to get streaming results from any database? I want to know out of curiosity, and for a project I'm currently thinking about.
Here's an example of a Python connection to MongoDB that reads data one document at a time:
from pymongo import MongoClient

client = MongoClient()
db = client.blog
col = db.posts
for r in col.find():  # the cursor pulls documents on demand, in batches
    print(r)
    input("press any key to continue...")
All standard MongoDB drivers return a cursor from a query (the find() command), which lets your application stream the documents by using the cursor to pull back results on demand. Check the cursor documentation for the specific driver you're planning to use, as the syntax varies between programming languages.
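For instance, with the Node.js driver the cursor is an async iterable, so each document can be handled as soon as the driver fetches it (the connection string and database/collection names here are illustrative):

const { MongoClient } = require('mongodb');

async function streamPosts() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  // find() returns a cursor immediately; documents arrive in batches
  // as the loop consumes them, not all at once.
  for await (const doc of client.db('blog').collection('posts').find()) {
    console.log(doc);
  }
  await client.close();
}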
There's also a special type of cursor for certain streaming use cases. MongoDB has the concept of a "tailable cursor", which streams documents to the client as they are inserted into a collection (see also the AWAIT_DATA option). Note that tailable cursors only work on "capped collections", as they've been optimized for this special usage. Documentation can be found on the www.mongodb.org site; below is a link to some code samples for tailable cursors:
http://docs.mongodb.org/manual/tutorial/create-tailable-cursor/
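A hedged sketch of a tailable cursor with the Node.js driver (the collection name, cap size, and connection string are illustrative, and the capped collection is assumed not to exist yet):

const { MongoClient } = require('mongodb');

async function tailLog() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const db = client.db('blog');

  // Tailable cursors require a capped collection.
  await db.createCollection('log', { capped: true, size: 1024 * 1024 });

  const cursor = db.collection('log').find({}, {
    tailable: true,  // keep the cursor open at the end of the results
    awaitData: true, // block briefly for new documents instead of polling
  });

  // This loop keeps yielding documents as writers insert them.
  for await (const doc of cursor) {
    console.log(doc);
  }
}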

Can MongoDB be a consistent event store?

When storing events in an event store, the order in which the events are stored is very important, especially when projecting the events later to restore an entity's current state.
MongoDB seems to be a good choice for persisting the event store, given its speed and flexible schema (and it is often recommended as such), but there is no such thing as a transaction in MongoDB, meaning the correct event order cannot be guaranteed.
Given that fact, should you avoid MongoDB if you are looking for a consistent event store and stick with a conventional RDBMS instead, or is there a way around this problem?
I'm not familiar with the term "event store" as you are using it, but I can address some of the issues in your question. I believe it is probably reasonable to use MongoDB for what you want, with a little bit of care.
In MongoDB, each document has an _id field, which by default is an ObjectId: a timestamp, then a machine identifier and process id, then a sequence counter. So you can sort on that field and get your objects roughly in creation order, provided the ObjectIds are all created on the same machine.
Most MongoDB client drivers create the _id field locally before sending the insert command to the database. So if you have multiple clients connecting to the database, sorting by _id won't reflect the order in which the server actually received the inserts: it reflects each client's clock, and within the same second the tie is broken by the machine hash rather than by arrival order.
But if you can convince your MongoDB client driver not to include the _id in the insert command, then the server will generate the ObjectId for each document, and the ids will have the properties you want. How to do this depends on the language you're working in, since each language has its own client driver. Read the driver docs carefully or dive into their source code; they're all open source. Most drivers also include a way to send a raw command to the server, so if you construct an insert command by hand you can certainly do what you want.
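As one concrete possibility, the Node.js driver exposes a client option for exactly this; a sketch (the connection string and database/collection names are illustrative):

const { MongoClient } = require('mongodb');

async function main() {
  // forceServerObjectId tells the driver NOT to generate _id client-side,
  // so mongod assigns it on insert and _id order tracks the server's
  // insertion order.
  const client = await MongoClient.connect('mongodb://localhost:27017', {
    forceServerObjectId: true,
  });
  const events = client.db('eventstore').collection('events');

  await events.insertOne({ type: 'AccountOpened' });
  // Caveat: the driver never sees the generated _id, so the insert
  // result may not report an insertedId.

  // Replay in creation order:
  const ordered = await events.find().sort({ _id: 1 }).toArray();
  console.log(ordered);
}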
This breaks down if your system is so massive that a single database server can't handle all of your write traffic. The MongoDB solution for writing thousands of records per second is a sharded cluster, and in that case the ObjectIds will again be created by different machines and won't have the nice sorting property you want. If you're concerned about outgrowing a single server for writes, you should look to another technology that provides distributed sequence numbers.