How to locally test MongoDB multi-collection transactions when standalone mode does not support them

I am just starting out with MongoDB and am using the docker mongo instance for local development and testing.
My code has to update 2 collections in the same transaction so that the data is logically consistent:
using (var session = _client.StartSession())
{
    session.StartTransaction();
    // Pass the session to each operation so that it runs inside the transaction.
    ec.InsertOne(session, evt);
    sc.InsertMany(session, snapshot.Selections.Select(ms => new SelectionEntity(snapshot.Id, ms)));
    session.CommitTransaction();
}
This is failing with the error:
'Standalone servers do not support transactions'
The error is obvious: my standalone Docker container does not support transactions. I am confused, though, as this means it's impossible to test code such as the above unless I have a replica set running. This doesn't appear to be listed as a requirement in the documentation, which refers to the fact that transactions could be multi-document OR distributed:
For situations that require atomicity of reads and writes to multiple documents (in a single or multiple collections), MongoDB supports multi-document transactions. With distributed transactions, transactions can be used across multiple operations, collections, databases, documents, and shards.
It's not clear to me how to create a multi-document transaction that does not require a replica based server to exist or how to properly test code locally that may not have a mongo replica cluster to work against.
How do people handle this?

For testing purposes, you could set up a local replica set using docker-compose. There are various blog posts available on the topic, e.g. Create a replica set in MongoDB with docker-compose.
Another option is to use a cluster on MongoDB Atlas. There is a free tier available so you can test this without any extra cost.
In addition, you could change your code so that transactions can be disabled depending on the configuration. This way, you can test the code without transactions locally and enable them on staging or production.
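The last option, toggling transactions through configuration, could look roughly like the sketch below. It is written in Python rather than the C# of the question, and it assumes a client that behaves like PyMongo's MongoClient (start_session() and ClientSession.with_transaction()). The names run_unit_of_work and use_transactions are hypothetical, not part of any driver.

```python
from typing import Any, Callable, Optional


def run_unit_of_work(
    client: Any,
    work: Callable[[Optional[Any]], Any],
    use_transactions: bool,
) -> Any:
    """Run `work` inside a transaction when enabled, otherwise run it directly.

    `client` is assumed to behave like pymongo.MongoClient. `work` receives
    the session (or None) and must pass it on to every collection call,
    e.g. collection.insert_one(doc, session=session).
    """
    if not use_transactions:
        # Standalone server: no session, so no atomicity across collections.
        return work(None)

    with client.start_session() as session:
        # with_transaction starts the transaction, runs the callback,
        # and commits; it returns whatever the callback returns.
        return session.with_transaction(lambda s: work(s))
```

Note that with the flag off, the two inserts are not atomic, so this only tests the happy path. A local replica set is still needed to exercise rollback behaviour.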

Related

Will an application written for standalone MongoDB work for replica-set or sharded cluster without any changes?

Currently we are working with standalone MongoDB without any replication or sharding. Now we are considering moving to a replica set for production purposes.
Will an application written for standalone MongoDB work for a replica set or sharded replica set without any changes, or are there some standalone/replica-set specific features in MongoDB?
Provided the MongoDB uses the default ports (27017 for standalone mongod and mongos) you don't need to touch your client application at all, it will work in either case.
Of course, a sharded cluster gives you more options when you connect to MongoDB, but the defaults are fine.
Will an application written for standalone MongoDB work for a
replica set or sharded replica set without any changes, or are there
some standalone/replica-set specific features in MongoDB?
Here are some things to think about when an application is to run on a replica set or a sharded cluster. In addition, replica sets and sharded clusters have some features not available in a standalone deployment (see the Transactions and Change Streams topic at the bottom).
Replica Sets
A replica set is a cluster with multiple database servers, with replicated data on each server. The topology of the replica set has one primary node (or member), and the remaining members are secondaries (there can be other special-purpose nodes like arbiters).
The data redundancy and failover features of replica sets give your applications additional capabilities - for example, an application can keep running even if a server is down.
By default, data is always written to and read from the primary. You can configure your application to also read from the secondary nodes - this is the Read Preference. This configuration can be used by applications accessing a replica set in some scenarios (see Read Preference Use Cases). It applies to replica sets only and has no use in a standalone deployment.
Also, see Replica Set Read and Write Semantics:
From the perspective of a client application, whether a MongoDB
instance is running as a single server (i.e. “standalone”) or a
replica set is transparent. However, MongoDB provides additional read
and write configurations for replica sets.
Then there are things like the Connection String URI, which applications use to connect and which has a different format for replica sets and sharded clusters.
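As a rough illustration of how the connection string differs between deployments, here is a small Python sketch. The helper name build_mongo_uri and the host names are made up for the example. The replicaSet and readPreference options are standard connection string options.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode


def build_mongo_uri(
    hosts: Sequence[str],
    replica_set: Optional[str] = None,
    read_preference: Optional[str] = None,
) -> str:
    """Build a mongodb:// URI from a seed list of host:port strings."""
    options = {}
    if replica_set:
        # Tells the driver which replica set the seed hosts belong to.
        options["replicaSet"] = replica_set
    if read_preference:
        # For example "secondaryPreferred" to allow reads from secondaries.
        options["readPreference"] = read_preference
    uri = "mongodb://" + ",".join(hosts) + "/"
    if options:
        uri += "?" + urlencode(options)
    return uri


# Standalone: a single host and no replica set options.
standalone_uri = build_mongo_uri(["localhost:27017"])

# Replica set: several seed hosts plus the set name and a read preference.
replica_set_uri = build_mongo_uri(
    ["db1.example.net:27017", "db2.example.net:27017"],
    replica_set="rs0",
    read_preference="secondaryPreferred",
)
```

For a sharded cluster the host list would name the mongos routers instead, without a replicaSet option.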
Sharded Cluster
The application should not be run against a sharded cluster deployment as it is. It will require design-level changes, and these will affect the queries. Sharding is about distributing the data among shards. Note that in a sharded cluster each shard is a replica set. A sharded database can have sharded and un-sharded collections. Sharded collections hold the distributed data.
To create a sharded collection, you must choose a shard key - this is the most important aspect of your application accessing a sharded collection. The shard key determines how queries reach a particular shard to get the data. So, your application must take the shard key into consideration - the queries need to be written to use the shard key. The shard key primarily affects the performance of your application's queries.
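To make "queries need to use the shard key" a little more concrete, here is a deliberately simplified Python sketch. It only checks whether a filter constrains the leading fields of a compound shard key at the top level. The real routing done by mongos is more involved, and the function names and the customerId/orderDate key are hypothetical.

```python
from typing import Any, Mapping, Sequence


def shard_key_prefix_length(
    query_filter: Mapping[str, Any], shard_key: Sequence[str]
) -> int:
    """Count how many leading shard key fields the filter constrains."""
    count = 0
    for field in shard_key:
        if field not in query_filter:
            break
        count += 1
    return count


def is_broadcast_query(
    query_filter: Mapping[str, Any], shard_key: Sequence[str]
) -> bool:
    """True when the filter uses no shard key prefix at all.

    Such a query has to be sent to every shard (simplified model that
    ignores operators like $or and nested conditions).
    """
    return shard_key_prefix_length(query_filter, shard_key) == 0


# Hypothetical compound shard key for an orders collection.
ORDERS_SHARD_KEY = ["customerId", "orderDate"]
```

In this model a filter on customerId can be routed to a subset of shards, while a filter on orderDate alone is broadcast to all of them.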
Also, in the sharded cluster environment the application accesses the database via a mongos router - not the servers directly.
There are many other finer aspects when working with sharded databases and accessing for applications - the topic is too broad to discuss here. Changing from standalone to sharded cluster is an architectural change. Some aspects that can affect the application due to migrating from standalone to a replica-set also apply here (as each shard is a replica-set).
Also, see Operational Restrictions in Sharded Clusters - these are specific to sharded clusters and not applicable to standalone deployments.
Transactions and Change Streams
Features like transactions and change streams are available with replica sets and sharded clusters only (and not on single standalone servers). They give your applications additional capabilities and can help with complex business logic and scenarios.

In MongoDB, can I run the compact command without shutting down each instance?

In our server setup, the primary, secondary, and arbiter each run on a separate physical machine.
The MongoDB version is 4.2.3.
The oldest documents were deleted because too many documents had accumulated in a specific collection.
However, even deleting documents did not release the storage area.
Upon checking, I found that mongodb's mechanism retains reusable bytes even if the document is deleted.
Also, I found out that unnecessary disk space can be freed with the compact command in the WiredTiger engine.
Currently, all clients connected to the db are querying using the arbiter ip and port.
Since the deployment uses only replication, not sharding, I expect that if I run the compact command on each instance independently, the arbiter will distribute queries to the currently available instances even while one instance is locked.
Is this possible?
Or should I shut down each instance, run it standalone, run the compact command, and then reconfigure the PSA replica set?
You may upgrade your MongoDB to the latest version, 4.4. Documentation of compact:
Blocking
Changed in version 4.4.
Starting in v4.4, on WiredTiger, compact only blocks the following
metadata operations:
db.collection.drop
db.collection.createIndex and db.collection.createIndexes
db.collection.dropIndex and db.collection.dropIndexes
compact does not block MongoDB CRUD Operations for the database it is
currently operating on.
Before v4.4, compact blocked all operations for the database it was
compacting, including MongoDB CRUD Operations, and was therefore
recommended for use only during scheduled maintenance periods.
Starting in v4.4, the compact command is appropriate for use at any
time.
To anyone looking for the answer with 4.4: please see this bug and the documentation entry, as the compact routine still forces the node into the recovery state if you are running in a replica set (and I assume this is the default use case for most projects).

Why is replica set mandatory for transactions in MongoDB?

As per the MongoDB documentation, transactions only work for replica sets and not on a single node. Why such a requirement? Isn't it easier to do transaction stuff on a single node rather than a distributed system?
Implementation of transactions uses sessions which in turn require an oplog. Oplog is provided by replica sets for data synchronization between nodes.
Isn't it easier to do transaction stuff on a single node rather than a distributed system?
This is true but in practice, MongoDB positions itself as a high-availability database therefore there are rather few production deployments using a standalone server (as far as I know this isn't even an option in Atlas, for example). Hence lack of transaction support on standalone servers typically doesn't affect anything.
Conversely, implementing transactions only on standalone servers would not address the needs of the vast majority of MongoDB deployments/customers that use replica sets and sharded clusters.
For development purposes you can run a single-node replica set which gives you an oplog required for sessions and transactions but still only one mongod process.
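A single-node replica set needs two things: mongod started with a replica set name (for example via the --replSet option), and a one-time initiation with a config document listing the single member. The Python sketch below only builds that config document. The set name rs0, the host, and the function name are assumptions for illustration.

```python
from typing import Any, Dict


def single_node_replset_config(
    name: str = "rs0", host: str = "localhost:27017"
) -> Dict[str, Any]:
    """Config document for a replica set with exactly one member.

    The name must match the replica set name mongod was started with.
    """
    return {
        "_id": name,
        "members": [
            {"_id": 0, "host": host},
        ],
    }


config = single_node_replset_config()

# Assumption: with PyMongo and a direct connection to the not-yet-initiated
# mongod, the document would be sent once as an admin command, e.g.
#   client.admin.command("replSetInitiate", config)
# which corresponds to rs.initiate(config) in the mongo shell.
```

After initiation the node elects itself primary and the oplog exists, which is what sessions and transactions need.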

Mongo DB - difference between standalone & 1-node replica set

I needed to use Mongo DB transactions, and recently I understood that transactions don't work for Mongo standalone mode, but only for replica sets
(Mongo DB with C# - document added regardless of transaction).
Also, I read that standalone mode is not recommended for production.
So I found out that simply defining a replica set name in the mongod.cfg is enough to run Mongo DB as a replica set instead of standalone.
After changing that, Mongo transactions started working.
However, it feels a bit strange using it as replica-set although I'm not really using the replication functionality, and I want to make sure I'm using a valid configuration.
So my questions are:
Is there any problem/disadvantage with running Mongo as a 1-node replica set, assuming I don't really need the replication, load balancing or any other scalable functionality? (as said I need it to allow transactions)
What are the functionality and performance differences, if any, between running as standalone vs. running as a 1-node replica set?
I've read that standalone mode is not recommended for production, although it sounds like it's the most basic configuration. I understand that this configuration is not used in most scenarios, but sometimes you may want to use it as a standard DB on a local machine. So why is standalone mode not recommended? Is it not stable enough, or other reasons?
Is there any problem/disadvantage with running Mongo as a 1-node replica set, assuming I don't really need the replication, load balancing or any other scalable functionality?
You don't have high availability afforded by a proper replica set. Thus it's not recommended for a production deployment. This is fine for development though.
Note that a replica set's function is primarily about high availability instead of scaling.
What are the functionality and performance differences, if any, between running as standalone vs. running as a 1-node replica set?
A single-node replica set would have the oplog. This means that you'll use more disk space to store the oplog, and also any insert/update operation would be written to the oplog as well (write amplification).
So why is standalone mode not recommended? Is it not stable enough, or other reasons?
MongoDB in production was designed with a replica set deployment in mind, for:
High availability in the face of node failures
Rolling maintenance/upgrades with no downtime
Possibility to scale-out reads
Possibility to have a replica of data in a special-purpose node that is not part of the high availability nodes
In short, MongoDB was designed to be a fault-tolerant distributed database (scales horizontally) instead of the typical SQL monolithic database (scales vertically). The idea is, if you lose one node of your replica set, the others will immediately take over. Most of the time your application doesn't even know there's a failure on the database side. In contrast, a failure in a monolithic database server would immediately disrupt your application.
I think kevinadi answered well, but I still want to add it.
A standalone is an instance of mongod that runs on a single server but is not part of a replica set. Standalone instances are used for testing and development, but it is always recommended to use replica sets in production.
A single-node replica set would have the oplog, which records all changes to its data sets. This means that you'll use more disk space to store the oplog, and also any insert/update operation would be written to the oplog as well (write amplification). It also supports point-in-time recovery.
Please follow Convert a Standalone to a Replica Set if you would like to convert the standalone database to a replica set.
Transactions were introduced in MongoDB version 4.0. Starting in version 4.0, for situations that require atomicity for updates to multiple documents or consistency between reads to multiple documents, MongoDB provides multi-document transactions for replica sets. Transactions are not available on a standalone because they require the oplog to maintain strong consistency within a cluster.

Local MongoDB instance with index in remote server

One of our clients has a server running a MongoDB instance, and we have to build an analytical application using the data stored in their MongoDB database, which changes frequently.
The client's requirements are:
That we do not connect to their MongoDB instance directly or run another instance of MongoDB on their server but just somehow run our own MongoDB instance on our machine in our office using their MongoDB database directory with read only access remotely.
We've suggested deploying a REST application or getting a copy of their database dump, but they did not want that. They just want us to run our own MongoDB instance which is hooked up to their MongoDB instance directory. Is this even possible?
I've been searching for a solution for the past two days and we have to submit a solution by Monday. I really need some help.
I think this is a normal request because analytical queries could cause too much load on the production server. It is pretty normal to separate production and analytical databases.
The easiest option is to use MongoDB replication. Set up MongoDB replica set with production database instance as primary and analytical database instance as secondary, also configure the analytical instance to never become primary.
If it is not possible to use replication (for example, the client doesn't want it, or the servers cannot connect directly to each other), there is another option. You can read the oplog from the remote database and apply the operations to your database instance. This is exactly the low-level mechanism by which a replica set works, but you can do it manually too. For example, MMS (Mongo Monitoring Service) Backup reads the oplog for online backups of MongoDB.
Update: mongooplog could be the right tool for real-time application of replication oplog pulled from remote server on local server.
I don't think that running two databases that point to the same database files is possible or even recommended.
You could use mongorestore to restore from their data files directly, but this will only work if their mongod instance is not running (because mongorestore will need to lock the directory).
Another solution will be to do file system snapshots and then restore to your local database.
The downside to these backup/restore solutions is that your data will not be synced all the time.
Probably the best solution will be to use replica sets with hidden members.
You can create a replica set with just two members:
Primary - this will be the client server.
Secondary - hidden, with votes and priority set to 0. This will be your local instance.
Their server will always be primary (because hidden members cannot become primaries). Clients cannot see hidden members so for all intents and purposes your server will be read only.
Another upside to this is that the MongoDB replication will do all the "heavy" work of syncing the data between servers and your instance will always have the latest data.
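The two-member layout described above could be expressed as a replica set config document like the one built in this Python sketch. The host names, the set name, and the function name are placeholders. The hidden, priority, and votes fields are the standard member settings mentioned in the answer.

```python
from typing import Any, Dict


def analytics_replset_config(
    name: str, primary_host: str, hidden_host: str
) -> Dict[str, Any]:
    """Config with the client's server as primary and a hidden local copy."""
    return {
        "_id": name,
        "members": [
            # The client's server: the only member that can become primary.
            {"_id": 0, "host": primary_host, "priority": 1},
            # Our local instance: invisible to clients, never primary,
            # and without a vote in elections.
            {
                "_id": 1,
                "host": hidden_host,
                "priority": 0,
                "votes": 0,
                "hidden": True,
            },
        ],
    }


# Placeholder host names for illustration only.
config = analytics_replset_config(
    "rs0", "client-db.example.com:27017", "office-db.example.com:27017"
)
```

This still requires network connectivity between the two servers, so it only works if the client accepts that.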