MongoDB C# driver freezes and never returns a value on Update()

I have a long-running operation that inserts thousands of sets of entries; each set is inserted using the code below.
After this code has been running for a while, the collection.Update() method freezes (does not return) and the entire process grinds to a halt.
I can't find any reasonable explanation for this anywhere.
I've looked at the mongod logs; nothing unusual, it just stops receiving requests from this process.
Mongo version: 2.4.1, C# driver version: 1.8.0
using (_mongoServer.RequestStart(_database))
{
    var collection = GetCollection<BsonDocument>(collectionName);

    // Iterate over all records
    foreach (var recordToInsert in recordsDescriptorsToInsert)
    {
        var query = new QueryDocument();
        var update = new UpdateBuilder();
        foreach (var property in recordToInsert)
        {
            var field = property.Item1;
            var value = BsonValue.Create(property.Item2);
            if (keys.Contains(field))
                query.Add(field, value);
            update.Set(field, value);
        }
        collection.Update(query, update, UpdateFlags.Upsert); // ** NEVER RETURNS **
    }
}

This may be related to CSHARP-717.
It was fixed in driver 1.8.1.
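For reference, the query/update split the loop performs (key fields go into the filter, every field into the update) can be sketched independently of the driver. Below is an illustrative Python sketch using plain dicts; the helper name is mine, and the commented `update_one` call shows how the result would map onto PyMongo:

```python
def build_upsert(record, keys):
    """Split a record's (field, value) pairs into a filter document
    (key fields only) and a $set update document (all fields)."""
    query = {}
    update = {"$set": {}}
    for field, value in record:
        if field in keys:
            query[field] = value       # key fields identify the document
        update["$set"][field] = value  # every field is written
    return query, update

# Example: "id" is the key field, so it alone drives the upsert match.
query, update = build_upsert([("id", 7), ("name", "a")], keys={"id"})
# With PyMongo this would be applied as:
# collection.update_one(query, update, upsert=True)
```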

Related

Official JavaScript MongoDB driver is slower than Python PyMongo for the same query?

I've been trying out JavaScript for my new backend, but I noticed something odd. When I use Python's PyMongo library, fetching all my data runs twice as fast (33.5s -> 16.44s) as when I use the official JavaScript MongoDB module. The test setups are as follows:
mongo = MongoClient(URI)  # initializing client
rn = time()  # starting timer
for ID in LIST_OF_IDS:
    results = mongo["current_day_info"][ID].find()  # fetches all the documents in the collection named <ID>
    results = [[result["time"].timestamp() / 10, result["buy_price"], result["sell_price"], result["buy_volume"], result["sell_volume"], result["week_buy_amount"], result["week_sell_amount"]] for result in results]  # expands list of documents into array of their values
print("mongodb", time() - rn)  # ending timer
const client = new MongoClient(URI); // initializing client

async function main() {
    await client.connect(); // connecting to client
    const database = client.db("current_day_info"); // goes to proper database
    console.time("mongodb"); // starting timer
    for (let ID of LIST_OF_IDS) {
        let results = [];
        const documents = database.collection(ID).find({}); // fetches all the documents in the collection named <ID>
        documents.forEach(item => results.push([new Date(item.time).getTime(), item.buy_price, item.sell_price, item.buy_volume, item.sell_volume, item.week_buy_amount, item.week_sell_amount])); // expands list of documents into array of their values
    }
    console.timeEnd("mongodb"); // ending timer
}
main();
My best guess is that PyMongo has some parts written in C, but I wouldn't think that would result in such a drastic difference?
Versions:
Python 3.8.10, PyMongo 4.0.2, MongoDB version 5.0.6, MongoDB Node Driver version 4.4.1, NodeJS version 17.7.1
Copied from comments below:
Example document:
{
    "time": ISODate("2022-03-12T23:23:45.000Z"),
    "_id": ObjectId("622d2b8c83d4c06e792895cb"),
    "buy_volume": 2079625,
    "sell_price": 5.5,
    "sell_volume": 10210328,
    "week_sell_amount": 68184205,
    "week_buy_amount": 10783950,
    "buy_price": 7.4
}
Also, I tried the same in mongosh and it took significantly longer (7 minutes); I assume I messed something up there:
async function main() {
    console.time("mongodb");
    for (let ID of LIST_OF_IDS) {
        let results = [];
        const collection = db[ID];
        await collection.find({}).forEach((item) => results.push([cut for charlimit]));
    }
    console.timeEnd("mongodb");
}

How do I confirm I am reading data from a Mongo secondary server from Java?

For performance optimisation we are trying to read data from a Mongo secondary server for selected scenarios. I am using an inline query with withReadPreference(ReadPreference.secondaryPreferred()) to read the data; the code snippet is below.
What I want to confirm is that the data we are getting is coming from the secondary server after executing the highlighted inline query. Is there any method available to check this from Java or Spring Boot?
public User read(final String userId) {
    final ObjectId objectId = new ObjectId(userId);
    final User user = collection.withReadPreference(ReadPreference.secondaryPreferred()).findOne(objectId).as(User.class);
    return user;
}
Pretty much the same way in Java. Note we use secondary(), not secondaryPreferred(); this guarantees reads from a secondary ONLY:
import com.mongodb.ReadPreference;
{
    // This is your "regular" primaryPreferred collection:
    MongoCollection<BsonDocument> tcoll = db.getCollection("myCollection", BsonDocument.class);

    // ... various operations on tcoll, then create a new
    // handle that FORCES reads from secondary and will time out and
    // fail if no secondary can be found:
    MongoCollection<BsonDocument> xcoll = tcoll.withReadPreference(ReadPreference.secondary());
    BsonDocument f7 = xcoll.find(queryExpr).first();
}
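To see what ReadPreference.secondary() guarantees versus secondaryPreferred(), here is a deliberately simplified toy model of server selection in Python (not the driver's real selection algorithm; the names are mine): secondary raises when no secondary is available, while secondaryPreferred falls back to the primary.

```python
def select_node(nodes, mode):
    """Toy server selection: nodes is a list of (host, is_primary) pairs."""
    secondaries = [h for h, is_primary in nodes if not is_primary]
    primaries = [h for h, is_primary in nodes if is_primary]
    if mode == "secondary":
        if not secondaries:
            raise RuntimeError("no secondary available")  # read fails outright
        return secondaries[0]
    if mode == "secondaryPreferred":
        return (secondaries or primaries)[0]  # silently falls back to primary
    raise ValueError("unknown mode: " + mode)
```

This is why a read through a secondary() handle failing when you stop the secondaries is itself a confirmation of where the reads were going.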

Parallel.Foreach and BulkCopy

I have a C# library which connects to 59 servers with the same database structure and imports data into the same table in my local db. At the moment I am retrieving data server by server in a foreach loop:
foreach (var systemDto in systems)
{
    var sourceConnectionString = _systemService.GetConnectionStringAsync(systemDto.Ip).Result;
    var dbConnectionFactory = new DbConnectionFactory(sourceConnectionString,
        "System.Data.SqlClient");
    var dbContext = new DbContext(dbConnectionFactory);
    var storageRepository = new StorageRepository(dbContext);
    var usedStorage = storageRepository.GetUsedStorageForCurrentMonth();

    var dtUsedStorage = new DataTable();
    dtUsedStorage.Load(usedStorage);

    var dcIp = new DataColumn("IP", typeof(string)) { DefaultValue = systemDto.Ip };
    var dcBatchDateTime = new DataColumn("BatchDateTime", typeof(string))
    {
        DefaultValue = batchDateTime
    };
    dtUsedStorage.Columns.Add(dcIp);
    dtUsedStorage.Columns.Add(dcBatchDateTime);

    using (var blkCopy = new SqlBulkCopy(destinationConnectionString))
    {
        blkCopy.DestinationTableName = "dbo.tbl";
        blkCopy.WriteToServer(dtUsedStorage);
    }
}
Because there are many systems to retrieve data from, I wonder if it is possible to use a Parallel.ForEach loop? Will BulkCopy lock the table during WriteToServer, so that the next WriteToServer waits until the previous one completes?
-- EDIT 1
I've changed the foreach to Parallel.ForEach, but I face one problem. Inside this loop I have an async method: _systemService.GetConnectionStringAsync(systemDto.Ip),
and this line returns the error:
System.NotSupportedException: A second operation started on this
context before a previous asynchronous operation completed. Use
'await' to ensure that any asynchronous operations have completed
before calling another method on this context. Any instance members
are not guaranteed to be thread safe.
Any ideas how I can handle this?
In general, it will get blocked and will wait until the previous operation completes.
There are some factors that may affect whether SqlBulkCopy can be run in parallel or not.
I remember when adding the Parallel feature to my .NET Bulk Operations, I had a hard time making it work correctly in parallel, but it worked well when the table had no index (which is likely never the case).
Even when it worked, the performance gain was not a lot.
Perhaps you will find more information here: MSDN - Importing Data in Parallel with Table Level Locking
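A common shape for this kind of job is to parallelize only the per-server fetch (each worker creating its own connection/context, which also avoids the DbContext thread-safety error) and to serialize the bulk writes so they never contend for the destination table. A hedged Python sketch of that structure, with stand-in fetch/write functions in place of the real database calls:

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock

write_lock = Lock()  # serialize bulk writes to the destination table
loaded = []          # stands in for the destination table

def fetch_usage(server_ip):
    """Stand-in for the per-server fetch; each call owns its connection."""
    return [(server_ip, row) for row in range(3)]

def bulk_write(rows):
    """Stand-in for SqlBulkCopy.WriteToServer."""
    loaded.extend(rows)

def import_server(server_ip):
    rows = fetch_usage(server_ip)  # fetches run in parallel
    with write_lock:               # writes happen one at a time
        bulk_write(rows)

with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(import_server, ["10.0.0.%d" % i for i in range(59)]))
```

The design choice here is that the slow part (59 remote reads) overlaps, while the contended part (the single destination table) stays strictly sequential.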

At what point does the MongoDB C# driver open a connection?

I'm having a problem with lots of connections being opened to the mongo db.
The readme on the Github page for the C# driver gives the following code:
using MongoDB.Bson;
using MongoDB.Driver;

var client = new MongoClient("mongodb://localhost:27017");
var server = client.GetServer();
var database = server.GetDatabase("foo");
var collection = database.GetCollection("bar");

collection.Insert(new BsonDocument("Name", "Jack"));

foreach (var document in collection.FindAll())
{
    Console.WriteLine(document["Name"]);
}
At what point does the driver open the connection to the server? Is it at the GetServer() method or is it the Insert() method?
I know that we should have a static object for the client, but should we also have a static object for the server and database as well?
Late answer... but the server connection is created at this point:
var client = new MongoClient("mongodb://localhost:27017");
Everything else is just getting references for various objects.
See: http://docs.mongodb.org/ecosystem/tutorial/getting-started-with-csharp-driver/
While using the latest MongoDB drivers for C#, the connection happens at the actual database operation, e.g. at db.Collection.Find() or at db.collection.InsertOne().
{
    // code for initialization
    // for a localhost connection there is no need to specify the db server url and port.
    var client = new MongoClient("mongodb://localhost:27017/");
    var db = client.GetDatabase("TestDb");
    Collection = db.GetCollection<T>("testCollection");
}

// Code for db operations
{
    // The connection happens here.
    var collection = db.Collection;

    // Your find operation
    var model = collection.Find(Builders<Model>.Filter.Empty).ToList();

    // Your insert operation
    collection.InsertOne(Model);
}
I found this out after I stopped my mongod server and debugged the code with a breakpoint. Initialization happened smoothly, but an error was thrown at the db operation.
Hope this helps.
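The behavior described in both answers (constructing the client is cheap; the first real operation pays for the connection) is a general lazy-initialization pattern. A minimal Python sketch of the idea, with a string standing in for the actual socket work:

```python
class LazyClient:
    """Constructing the client only records settings; it opens nothing."""

    def __init__(self, uri):
        self.uri = uri
        self._conn = None  # no I/O has happened yet

    def _ensure_connected(self):
        if self._conn is None:  # the first operation pays the cost
            self._conn = "socket to " + self.uri  # stand-in for real I/O
        return self._conn

    def find(self, query):
        conn = self._ensure_connected()  # connection happens here
        return (conn, query)

client = LazyClient("mongodb://localhost:27017")
assert client._conn is None  # nothing opened at construction...
client.find({})
assert client._conn is not None  # ...until the first operation
```

This also explains the debugging observation above: with the server stopped, construction still succeeds and only the operation raises.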

Error inserting document into mongodb from scala

Trying to insert into a MongoDB database from Scala. The code below doesn't create a db or collection. I tried using the default test db too. How do I perform CRUD operations?
object Store {
  def main(args: Array[String]) = {
    def addMongo(): Unit = {
      var mongo = new Mongo()
      var db = mongo.getDB("mybd")
      var coll = db.getCollection("somecollection")
      var obj = new BasicDBObject()
      obj.put("name", "Mongo")
      obj.put("type", "db")
      coll.insert(obj)
      coll.save(obj)
      println("Saved") // to print to console
    }
  }
}
On first glance things look OK in your code, although you do have that stray def addMongo(): Unit = { at the top. As for tracking down the actual error, two items of note:
1) save() and insert() are complementary operations - you only need one. insert() will always attempt to create a new document; save() will create one if the _id field isn't set, and update the document with that _id if it is.
2) Mongo clients do not wait for an answer to a write operation by default. It is very possible and likely that an error is occurring within MongoDB, causing your write to fail. The getLastError() command returns the result of the last write operation on the current connection. Because MongoDB's Java driver uses connection pools, you have to tell it to lock you onto a single connection for the duration of any operation you want to run 'safely' (i.e. check the result of). This is the easiest way from the Java driver (the sample below is Scala-flavored, though):
mongo.requestStart() // lock the connection in
coll.insert(obj) // attempt the insert
getLastError.throwOnError() // This tells the getLastError command to throw an exception in case of an error
mongo.requestDone() // release the connection lock
Take a look at this excellent writeup on MongoDB's Write Durability, which focuses specifically on the Java driver.
You may also want to take a look at the Scala driver I maintain (Casbah), which wraps the Java driver and provides more Scala functionality.
We provide, among other things, an execute-around-method version of the safe-write concept in safely(), which makes testing for write success a lot easier.
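The requestStart()/requestDone() bracketing above is an execute-around pattern, and a safely()-style wrapper ensures the connection is released even when the write throws. A rough Python sketch of that shape (the names here are illustrative, not Casbah's actual API):

```python
from contextlib import contextmanager

class FakePool:
    """Stand-in for the driver's connection pool."""
    def __init__(self):
        self.pinned = False

    def request_start(self):
        self.pinned = True   # pin this caller to one connection

    def request_done(self):
        self.pinned = False  # release it back to the pool

@contextmanager
def safely(pool):
    """Execute-around: pin a connection, run the body, always release."""
    pool.request_start()
    try:
        yield pool
    finally:
        pool.request_done()  # released even if the write raised

pool = FakePool()
with safely(pool):
    assert pool.pinned  # writes here go over a single connection,
                        # so an error check sees *their* result
assert not pool.pinned
```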
You just missed the addMongo call in main. The fix is trivial:
object Store {
  def main(args: Array[String]) = {
    def addMongo(): Unit = {
      var mongo = new Mongo()
      var db = mongo.getDB("mybd")
      var coll = db.getCollection("somecollection")
      var obj = new BasicDBObject()
      obj.put("name", "Mongo")
      obj.put("type", "db")
      coll.insert(obj)
      coll.save(obj)
      println("Saved") // to print to console
    }
    addMongo() // call it!
  }
}