I'm new to Rust and I'm using the default MongoDB driver
https://docs.rs/mongodb/2.0.0/mongodb/
I remember that when coding with Node.js, it was possible to send a transaction's writes with Promise.all() so that they all executed at the same time for optimization purposes and, if there were no errors, to commit the transaction.
(Node.js example here: https://medium.com/@alkor_shikyaro/transactions-and-promises-in-node-js-ca5a3aeb6b74)
I'm trying to implement the same logic in Rust now, using try_join!, but I always run into this problem:
error: cannot borrow `session` as mutable more than once at a time
note: first mutable borrow occurs here
use std::time::Duration;

use mongodb::{
    bson::{doc, oid::ObjectId, Document},
    options, Client, Database,
};
use async_graphql::{
    validators::{Email, StringMaxLength, StringMinLength},
    Context, ErrorExtensions, Object, Result,
};
use futures::try_join;
// use tokio::try_join; -> same thing

#[derive(Default)]
pub struct UserMutations;

#[Object]
impl UserMutations {
    async fn user_followed<'ctx>(
        &self,
        ctx: &Context<'ctx>,
        other_user_id: ObjectId,
        current_user_id: ObjectId,
    ) -> Result<bool> {
        let mut session = Client::with_uri_str(dotenv!("URI"))
            .await
            .expect("DB not accessible!")
            .start_session(Some(session_options))
            .await?;

        session
            .start_transaction(Some(
                options::TransactionOptions::builder()
                    .read_concern(Some(options::ReadConcern::majority()))
                    .write_concern(Some(
                        options::WriteConcern::builder()
                            .w(Some(options::Acknowledgment::Majority))
                            .w_timeout(Some(Duration::new(3, 0)))
                            .journal(Some(false))
                            .build(),
                    ))
                    .selection_criteria(Some(options::SelectionCriteria::ReadPreference(
                        options::ReadPreference::Primary,
                    )))
                    .max_commit_time(Some(Duration::new(3, 0)))
                    .build(),
            ))
            .await?;

        let db = Client::with_uri_str(dotenv!("URI"))
            .await
            .expect("DB not accessible!")
            .database("database")
            .collection::<Document>("collection");

        try_join!(
            db.update_one_with_session(
                doc! { "_id": other_user_id },
                doc! { "$inc": { "following_number": -1 } },
                None,
                &mut session,
            ),
            db.update_one_with_session(
                doc! { "_id": current_user_id },
                doc! { "$inc": { "followers_number": -1 } },
                None,
                &mut session,
            )
        )?;

        Ok(true)
    }
}
849 | |                 &mut session,
    | |                 ------------ first mutable borrow occurs here
...   |
859 | |                 &mut session,
    | |                 ^^^^^^^^^^^^ second mutable borrow occurs here
860 | |             )
861 | |         )?;
    | |_____________- first borrow later captured here by closure
Is there any way to run these transactional writes concurrently so that no time is lost on independent mutations? Does anyone have any ideas?
Thanks in advance!
Thanks, Patrick and Zeppi, for your answers. I did some more research on this topic and also did my own testing. So, let's start.
First, my desire was to optimize transactional writes as much as possible, since I wanted the complete rollback capability that my code logic requires.
In case you missed my comments to Patrick, I'll restate them here to better reflect my way of thinking about this:
I understand why this would be a limitation for multiple reads, but if all actions are on separate collections (or are independent atomic writes to multiple documents with different payloads), I don't see why it's impossible to retain causal consistency while executing them concurrently. This kind of transaction should never create race conditions / conflicts / weird lock behaviour, and in case of error the entire transaction is rolled back before being committed anyway. Making an analogy with Git (which might be wrong), no merge conflicts are created when separate files / folders are updated. Sorry for being meticulous, this just sounds like a major speed-boost opportunity.
But after some digging I came across this documentation:
https://github.com/mongodb/specifications/blob/master/source/sessions/driver-sessions.rst#why-does-a-network-error-cause-the-serversession-to-be-discarded-from-the-pool
An otherwise unrelated operation that just happens to use that same
server session will potentially block waiting for the previous
operation to complete. For example, a transactional write will block a
subsequent transactional write.
Basically, this means that even if you send transactional writes concurrently, you won't gain much efficiency, because MongoDB itself is the blocker. I decided to check whether this was true, and since the Node.js driver setup allows you to send transactions concurrently (as per https://medium.com/@alkor_shikyaro/transactions-and-promises-in-node-js-ca5a3aeb6b74), I did a quick setup with Node.js pointing to the same database hosted by Atlas on the free tier.
Second, statistics and code. This is the Node.js mutation I will be using for the tests (each test has 4 transactional writes). I enabled GraphQL tracing to benchmark this, and here are the results of my tests...
export const testMutFollowUser = async (_parent, _args, _context, _info) => {
  try {
    const { user, dbClient } = _context;
    isLoggedIn(user);
    const { _id } = _args;

    const session = dbClient.startSession();
    const db = dbClient.db("DB");

    await verifyObjectId().required().validateAsync(_id);

    // making sure the requested user exists
    const otherUser = await db.collection("users").findOne(
      { _id: _id },
      { projection: { _id: 1 } }
    );
    if (!otherUser) throw new Error("User was not found");

    const transactionResult = await session.withTransaction(async () => {
      //-----using this part when doing the concurrency test------
      await Promise.all([
        createObjectIdLink({ db_name: 'links', from: user._id, to: _id, db }),
        db.collection('users').updateOne(
          { _id: user._id },
          { $inc: { following_number: 1 } },
        ),
        db.collection('users').updateOne(
          { _id },
          { $inc: { followers_number: 1, unread_notifications_number: 1 } },
        ),
        createNotification({
          action: 'USER_FOLLOWED',
          to: _id
        }, _context)
      ]);
      //-----------end of concurrency part--------------------

      //------using this part when doing the sync test--------
      // this is a helper for db.insertOne(...)
      const insertedId = await createObjectIdLink({ db_name: 'links', from: user._id, to: _id, db });
      const updDocMe = await db.collection('users').updateOne(
        { _id: user._id },
        { $inc: { following_number: 1 } },
      );
      const updDocOther = await db.collection('users').updateOne(
        { _id },
        { $inc: { followers_number: 1, unread_notifications_number: 1 } },
      );
      // this is another helper for db.insertOne(...)
      await createNotification({
        action: 'USER_FOLLOWED',
        to: _id
      }, _context);
      //-----------end of sync part---------------------------

      return true;
    }, transactionOptions);

    if (transactionResult) {
      console.log("The reservation was successfully created.");
    } else {
      console.log("The transaction was intentionally aborted.");
    }
    await session.endSession();
    return true;
  } catch (error) {
    throw error;
  }
};
And related performance results:
format:
Request/Mutation/Response = Total (all in ms)
1) For sync writes in the transaction:
4/91/32 = 127
4/77/30 = 111
7/71/7 = 85
6/66/8 = 80
2/74/9 = 85
4/70/8 = 82
4/70/11 = 85
--waiting more time (~10secs)
9/73/34 = 116
totals/8 = **96.375 ms on average**
//---------------------------------
2) For concurrent writes in transaction:
3/85/7 = 95
2/81/14 = 97
2/70/10 = 82
5/81/11 = 97
5/73/15 = 93
2/82/27 = 111
5/69/7 = 81
--waiting more time (~10secs)
6/80/32 = 118
totals/8 = **96.75 ms on average**
Conclusion: the difference between the two is within the margin of error (but still on the sync side).
My assumption is that with the sync approach you spend the time waiting for each DB request/response, while with the concurrent approach you wait for MongoDB to order the requests and then execute them all, which at the end of the day costs the same amount of time.
So with current MongoDB policies, I guess the answer to my question is "there is no need for concurrency, because it won't affect performance anyway." However, it would be incredible if MongoDB allowed parallelization of writes in transactions in future releases, with locks at the document level (at least for the WiredTiger engine) instead of the database level, as is currently the case for transactions (because you're waiting for the whole write to finish before the next one).
Feel free to correct me if I missed/misinterpreted something. Thanks!
This limitation is actually by design. In MongoDB, client sessions cannot be used concurrently (see here and here), and so the Rust driver accepts them as &mut to prevent this from happening at compile time. The Node example is only working by chance and is definitely not recommended or supported behavior. If you would like to perform both updates as part of a transaction, you'll have to run one update after the other. If you'd like to run them concurrently, you'll need to execute them without a session or transaction.
As a side note, a client session can only be used with the client it was created from. In the provided example, the session is being used with a different client, which will cause an error.
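For reference, since the question's benchmarks use the Node.js driver, here is a minimal sketch of the supported pattern in Node terms: the two updates run one after the other inside the transaction, and only non-transactional writes are issued concurrently. This is an illustration rather than the Rust answer itself; `client`, `otherUserId`, and `currentUserId` are assumed to exist, and the collection/field names are taken from the question.

const users = client.db("database").collection("users");
const session = client.startSession();
try {
  // Supported: sequential writes sharing one session/transaction.
  await session.withTransaction(async () => {
    await users.updateOne(
      { _id: otherUserId },
      { $inc: { following_number: -1 } },
      { session },
    );
    await users.updateOne(
      { _id: currentUserId },
      { $inc: { followers_number: -1 } },
      { session },
    );
  });
} finally {
  await session.endSession();
}

// If the two writes don't need to be atomic together, they can run
// concurrently, but then without a session/transaction:
// await Promise.all([
//   users.updateOne({ _id: otherUserId }, { $inc: { following_number: -1 } }),
//   users.updateOne({ _id: currentUserId }, { $inc: { followers_number: -1 } }),
// ]);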
Related
Imagine the following scenario:
1. Start a session
2. Start a transaction for that session
3. Run a read on Document A
4. A different session makes an update on Document A (during execution)
5. Write Document B based on the original read of Document A
6. Commit the transaction
7. End the session
Will the read and the write be atomic with respect to the concurrent update on Document A, or is there a concurrency problem? I understand a transaction takes a snapshot for all write operations, but I'm not sure what happens on the reading side.
await session.withTransaction(async () => {
  const coll1 = client.db('mydb1').collection('foo');
  const coll2 = client.db('mydb2').collection('bar');

  const docA = await coll1.findOne({ abc: 1 }, { session });
  // docA is deleted by the other session at this point
  if (docA) {
    // Does this run on an outdated condition?
    await coll2.insertOne({ xyz: 999 }, { session });
  }
}, transactionOptions);
I have the following query in Prisma that basically returns all users whose campaign is one of the values from the array I provide and who were added to the system within the defined time range. I also have another entity, Click, for each user, which should be included in the response.
const users = await this.prisma.user.findMany({
  where: {
    campaign: {
      in: [
        ...campaigns.map((campaign) => campaign.id),
        ...campaigns.map((campaign) => campaign.name),
      ],
    },
    createdAt: {
      gte: dateRange.since,
      lt: dateRange.until,
    },
  },
  include: {
    clicks: true,
  },
});
The problem is that this query runs fine on localhost, where I don't have much data, but the production database has nearly 500,000 users and 250,000 clicks in total. I am not sure if that is the root cause, but there the query fails with the following exception:
Error:
Invalid `this.prisma.user.findMany()` invocation in
/usr/src/app/dist/xyx/xyx.service.js:135:58
132 }
133 async getUsers(campaigns, dateRange) {
134 try {
→ 135 const users = await this.prisma.user.findMany(
Can't reach database server at `xyz`:`25060`
Please make sure your database server is running at `xyz`:`25060`.
Prisma error code is P1001.
xyz replaced for obvious reasons in the paths and connection string to the DB.
The only solution we found is to check what limit your query can handle and then use the pagination params (skip, take) in a loop to download the data part by part and glue it back together. Not optimal, but it works. See this existing bug report for an example:
https://github.com/prisma/prisma/issues/8832
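For illustration, a minimal sketch of that workaround, assuming the same `prisma` client and `user` model from the question are available; the batch size is an arbitrary assumption and should be tuned to whatever your connection can handle.

const BATCH_SIZE = 10000; // assumption: tune to what the server/connection tolerates
const users = [];
for (let skip = 0; ; skip += BATCH_SIZE) {
  const batch = await this.prisma.user.findMany({
    where: { /* same campaign / createdAt filters as above */ },
    include: { clicks: true },
    skip,
    take: BATCH_SIZE,
  });
  users.push(...batch);                 // glue the pages back together
  if (batch.length < BATCH_SIZE) break; // last page reached
}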
I am having a bit of an issue trying to come up with the logic for this. So, what I want to do is:
Bulk update/upsert a bunch of posts to my remote MongoDB instance, BUT
only update a document if the lastModified field in the remote collection is less than the lastModified field of the same document I am about to update/insert.
Basically, I want to update my list of documents if they have been modified since the last time I updated them.
I can think of two brute force ways to do it...
First, querying my entire collection, manually removing and replacing the documents that match the criteria, adding the new ones, and then mass-inserting everything back into the remote collection after deleting everything there.
Second, querying each item and then deciding, if one exists in the remote collection, whether I want to update it or not. This seems like it would be very taxing when dealing with remote collections.
If relevant, I am working in a Node.js environment, using the mongodb npm package for database operations.
You can use the bulkWrite API to carry out the updates based on the logic you specified as it handles this better.
For example, the following snippet shows how to go about this assuming you already have the data from the web service you need to update the remote collection with:
mongodb.connect(mongo_url, function(err, db) {
    if (err) console.log(err);
    else {
        var mongo_remote_collection = db.collection("remote_collection_name");
        /* data is from the http call to an external service; ideally
           place this within the service callback
        */
        mongoUpsert(mongo_remote_collection, data, function() {
            db.close();
        });
    }
});

function mongoUpsert(collection, data_array, cb) {
    var ops = data_array.map(function(data) {
        return {
            "updateOne": {
                "filter": {
                    "_id": data._id, // or any other filtering mechanism to identify a doc
                    "lastModified": { "$lt": data.lastModified }
                },
                "update": { "$set": data },
                "upsert": true
            }
        };
    });

    collection.bulkWrite(ops, function(err, r) {
        // do something with result
        cb(false);
    });
}
If the data from the external service is huge, then consider sending the writes to the server in batches of 500, which gives you better performance because you are not sending every request to the server individually, just one request per 500 documents.
For bulk operations MongoDB imposes a default internal limit of 1000 operations per batch, so the choice of 500 documents is good in the sense that you have some control over the batch size rather than letting MongoDB impose the default; this matters for larger operations in the magnitude of > 1000 documents. So for the case above, with the first approach, one could just write the whole array at once since it is small, but the choice of 500 is for larger arrays.
var ops = [],
    counter = 0;

data_array.forEach(function(data) {
    ops.push({
        "updateOne": {
            "filter": {
                "_id": data._id,
                "lastModified": { "$lt": data.lastModified }
            },
            "update": { "$set": data },
            "upsert": true
        }
    });
    counter++;

    if (counter % 500 === 0) {
        collection.bulkWrite(ops, function(err, r) {
            // do something with result
        });
        ops = [];
    }
});

if (counter % 500 !== 0) {
    collection.bulkWrite(ops, function(err, r) {
        // do something with result
    });
}
Ok, still in my toy app, I want to find out the average mileage on a group of car owners' odometers. This is pretty easy on the client but doesn't scale. Right? But on the server, I don't exactly see how to accomplish it.
Questions:
1. How do you implement something on the server and then use it on the client?
2. How do you use the $avg aggregation function of mongo to leverage its optimized aggregation?
3. Or, alternatively to (2), how do you do a map/reduce on the server and make it available to the client?
The suggestion by @HubertOG was to use Meteor.call, which makes sense, and I did this:
# Client side
Template.mileage.average_miles = ->
  answer = null
  Meteor.call "average_mileage", (error, result) ->
    console.log "got average mileage result #{result}"
    answer = result
  console.log "but wait, answer = #{answer}"
  answer

# Server side
Meteor.methods average_mileage: ->
  console.log "server mileage called"
  total = count = 0
  r = Mileage.find({}).forEach (mileage) ->
    total += mileage.mileage
    count += 1
  console.log "server about to return #{total / count}"
  total / count
That would seem to work fine, but it doesn't, because as near as I can tell Meteor.call is an asynchronous call and answer will always be returned as null. Handling stuff on the server seems like a common enough use case that I must have just overlooked something. What would that be?
Thanks!
As of Meteor 0.6.5, the collection API doesn't support aggregation queries yet because there's no (straightforward) way to do live updates on them. However, you can still write them yourself, and make them available in a Meteor.publish, although the result will be static. In my opinion, doing it this way is still preferable because you can merge multiple aggregations and use the client-side collection API.
Meteor.publish("someAggregation", function (args) {
var sub = this;
// This works for Meteor 0.6.5
var db = MongoInternals.defaultRemoteCollectionDriver().mongo.db;
// Your arguments to Mongo's aggregation. Make these however you want.
var pipeline = [
{ $match: doSomethingWith(args) },
{ $group: {
_id: whatWeAreGroupingWith(args),
count: { $sum: 1 }
}}
];
db.collection("server_collection_name").aggregate(
pipeline,
// Need to wrap the callback so it gets called in a Fiber.
Meteor.bindEnvironment(
function(err, result) {
// Add each of the results to the subscription.
_.each(result, function(e) {
// Generate a random disposable id for aggregated documents
sub.added("client_collection_name", Random.id(), {
key: e._id.somethingOfInterest,
count: e.count
});
});
sub.ready();
},
function(error) {
Meteor._debug( "Error doing aggregation: " + error);
}
)
);
});
The above is an example grouping/count aggregation. Some things of note:
When you do this, you'll naturally be doing an aggregation on server_collection_name and pushing the results to a different collection called client_collection_name.
This subscription isn't going to be live, and will probably be updated whenever the arguments change, so we use a really simple loop that just pushes all the results out.
The results of the aggregation don't have Mongo ObjectIDs, so we generate some arbitrary ones of our own.
The callback to the aggregation needs to be wrapped in a Fiber. I use Meteor.bindEnvironment here but one can also use a Future for more low-level control.
If you start combining the results of publications like these, you'll need to carefully consider how the randomly generated ids impact the merge box. However, a straightforward implementation of this is just a standard database query, except it is more convenient to use with Meteor APIs client-side.
TL;DR version: Almost anytime you are pushing data out from the server, a publish is preferable to a method.
For more information about different ways to do aggregation, check out this post.
I did this with the 'aggregate' method (ver 0.7.x).
if (Meteor.isServer) {
    Future = Npm.require('fibers/future');

    Meteor.methods({
        'aggregate': function(param) {
            var fut = new Future();
            MongoInternals.defaultRemoteCollectionDriver().mongo._getCollection(param.collection)
                .aggregate(param.pipe, function(err, result) {
                    fut.return(result);
                });
            return fut.wait();
        },
        'test': function(param) {
            var _param = {
                pipe: [
                    { $unwind: '$data' },
                    { $match: {
                        'data.y': "2031",
                        'data.m': '01',
                        'data.d': '01'
                    }},
                    { $project: {
                        '_id': 0,
                        'project_id': "$project_id",
                        'idx': "$data.idx",
                        'y': '$data.y',
                        'm': '$data.m',
                        'd': '$data.d'
                    }}
                ],
                collection: "yourCollection"
            };
            Meteor.call('aggregate', _param);
        }
    });
}
If you want reactivity, use Meteor.publish instead of Meteor.call. There's an example in the docs where they publish the number of messages in a given room (just above the documentation for this.userId); you should be able to do something similar.
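For reference, here is a rough sketch of that docs example (the counts-by-room pattern), adapted from the Meteor documentation; the Messages collection and the 'counts' client-side collection name are assumptions for illustration.

Meteor.publish("counts-by-room", function (roomId) {
  var self = this;
  var count = 0;
  var initializing = true;

  // observeChanges only returns after the initial `added` callbacks have run,
  // so `count` is correct by the time it is published below.
  var handle = Messages.find({ roomId: roomId }).observeChanges({
    added: function () {
      count++;
      if (!initializing) self.changed("counts", roomId, { count: count });
    },
    removed: function () {
      count--;
      if (!initializing) self.changed("counts", roomId, { count: count });
    }
  });

  initializing = false;
  self.added("counts", roomId, { count: count });
  self.ready();

  // Stop observing the cursor when the client unsubscribes.
  self.onStop(function () {
    handle.stop();
  });
});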
You can use Meteor.methods for that.
// server
Meteor.methods({
    average: function() {
        // ...
        return something;
    },
});

// client
var _avg = { /* Create an object to store the value and its dependency */
    dep: new Deps.Dependency()
};

Template.mileage.rendered = function() {
    _avg.init = true;
};

Template.mileage.averageMiles = function() {
    _avg.dep.depend(); /* Make the function rerun when _avg.dep is touched */
    if (_avg.init) { /* Fetch the value from the server if not yet done */
        _avg.init = false;
        Meteor.call('average', function(error, result) {
            _avg.val = result;
            _avg.dep.changed(); /* Rerun the helper */
        });
    }
    return _avg.val;
};
I am making an analytics system; the API call provides a unique user ID, but it's not sequential and is too sparse.
I need to give each unique user ID an auto-increment id to mark an analytics datapoint in a bit array/bitset. So the first user encountered would correspond to the first bit of the bit array, the second user would be the second bit, etc.
So is there a solid and fast way to generate incremental unique user IDs in MongoDB?
As the selected answer says, you can use findAndModify to generate sequential IDs.
But I strongly disagree with the opinion that you should not do that. It all depends on your business needs. Having a 12-byte ID may be very resource-consuming and cause significant scalability issues in the future.
I have a detailed answer here.
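For illustration, a minimal sketch of that counter pattern with the Node.js driver (v4/v5 result shape assumed); the counters collection, the getNextSequence helper, and the other names are illustrative rather than taken from the answer above.

async function getNextSequence(db, name) {
  // Atomically increment the named counter and read back the new value.
  const result = await db.collection('counters').findOneAndUpdate(
    { _id: name },
    { $inc: { seq: 1 } },
    { upsert: true, returnDocument: 'after' }
  );
  return result.value.seq;
}

// usage: assign an incrementing user id at insert time
// const userId = await getNextSequence(db, 'userid');
// await db.collection('users').insertOne({ _id: userId, name: 'Ada' });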
You can, but you should not
https://web.archive.org/web/20151009224806/http://docs.mongodb.org/manual/tutorial/create-an-auto-incrementing-field/
Each object in Mongo already has an id, and they are sortable in insertion order. What is wrong with getting the collection of user objects, iterating over it, and using the index as the incremented ID? Or go for some kind of map-reduce job entirely.
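To illustrate the idea, a sketch only, assuming the Node.js driver, a users collection with default ObjectId _ids, and a collection small enough to load at once:

const users = await db.collection('users').find().sort({ _id: 1 }).toArray();
users.forEach(function (user, index) {
  // index reflects insertion order and can serve as the bit offset
  console.log(user._id.toString(), '->', index);
});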
I know this is an old question, but I shall post my answer for posterity...
It depends on the system that you are building and the particular business rules in place.
I am building a moderate to large scale CRM in MongoDB, C# (backend API), and Angular (frontend web app) and found ObjectId utterly terrible for use in Angular routing to select particular entities. The same goes for API controller routing.
The suggestion above worked perfectly for my project.
db.contacts.insert({
    "id": db.contacts.find().count() + 1,
    "name": "John Doe",
    "emails": [
        "john@doe.com",
        "john.doe@business.com"
    ],
    "phone": "555111322",
    "status": "Active"
});
The reason it is perfect for my case, but not all cases, is that, as the comment above states, if you delete 3 records from the collection, you will get collisions.
My business rules state that, due to our in-house SLAs, we are not allowed to delete correspondence data or client records for the potential lifespan of the application I'm writing; therefore, I simply mark records with a "Status" enum which is either "Active" or "Deleted". You can delete something from the UI, and it will say "Contact has been deleted", but all the application has actually done is change the status of the contact to "Deleted", and when the app calls the repository for a list of contacts, I filter out deleted records before pushing the data to the client app.
Therefore, db.collection.find().count() + 1 is a perfect solution for me...
It won't work for everyone, but if you will not be deleting data, it works fine.
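In shell terms, the soft-delete flow described above amounts to something like the following sketch (field names taken from the example insert, not the author's exact code):

// "deleting" from the UI just flips the status flag
db.contacts.updateOne({ "id": 1 }, { "$set": { "status": "Deleted" } });
// the repository's list query filters deleted records out
db.contacts.find({ "status": "Active" });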
Edit:
In the latest versions of pymongo:
db.contacts.count() + 1
The first record should be added with "_id" = 1 in your db.

$database = "demo";
$collections = "democollaction";
echo getnextid($database, $collections);

function getnextid($database, $collections) {
    $m = new MongoClient();
    $db = $m->selectDB($database);
    $collection = $db->selectCollection($collections); // select the collection to read the last _id from

    $cursor = $collection->find()->sort(array("_id" => -1))->limit(1);
    $array = iterator_to_array($cursor);
    foreach ($array as $value) {
        return $value["_id"] + 1;
    }
}
I had a similar issue, namely I was interested in generating unique numbers which can be used as identifiers, but don't have to be. I came up with the following solution. First, initialize the collection:
fun create(mongo: MongoTemplate) {
    mongo.db.getCollection("sequence")
        .insertOne(Document(mapOf("_id" to "globalCounter", "sequenceValue" to 0L)))
}
And then a service that returns unique (and ascending) numbers:
@Service
class IdCounter(val mongoTemplate: MongoTemplate) {

    companion object {
        const val collection = "sequence"
    }

    private val idField = "_id"
    private val idValue = "globalCounter"
    private val sequence = "sequenceValue"

    fun nextValue(): Long {
        val filter = Document(mapOf(idField to idValue))
        val update = Document("\$inc", Document(mapOf(sequence to 1)))
        val updated: Document = mongoTemplate.db.getCollection(collection).findOneAndUpdate(filter, update)!!
        return updated[sequence] as Long
    }
}
I believe this approach doesn't have the weaknesses related to concurrent environments that some of the other solutions may suffer from.
// await collection.insertOne({ autoIncrementId: 1 });
const { value: { autoIncrementId } } = await collection.findOneAndUpdate(
    { autoIncrementId: { $exists: true } },
    {
        $inc: { autoIncrementId: 1 },
    },
);
return collection.insertOne({ id: autoIncrementId, ...data });
I used something like nested queries in MySQL to simulate auto-increment, and it worked for me. To get the latest id and add one to it you can use:
lastContact = db.contacts.find().sort({ $natural: -1 }).limit(1)[0];
db.contacts.insert({
    "id": lastContact ? lastContact["id"] + 1 : 1,
    "name": "John Doe",
    "emails": ["john@doe.com", "john.doe@business.com"],
    "phone": "555111322",
    "status": "Active"
})
It solves the removal issue of Alex's answer, so no duplicate id will appear if any record is removed.
More explanation: I just get the id of the latest inserted document, add one to it, and then set that as the id of the new record. The ternary is for the case where we don't have any records yet or all of the records have been removed.
This could be another approach:
const mongoose = require("mongoose");

const contractSchema = mongoose.Schema(
    {
        account: {
            type: mongoose.Schema.Types.ObjectId,
            required: true,
        },
        idContract: {
            type: Number,
            default: 0,
        },
    },
    { timestamps: true }
);

contractSchema.pre("save", function (next) {
    var docs = this;
    mongoose
        .model("contract", contractSchema)
        .countDocuments({ account: docs.account }, function (error, counter) {
            if (error) return next(error);
            docs.idContract = counter + 1;
            next();
        });
});

module.exports = mongoose.model("contract", contractSchema);
// First check the table length
const data = await table.find();
if (data.length === 0) {
    const id = 1;
    // then post your query along with your id
} else {
    // find the last item and then its id
    const length = data.length;
    const lastItem = data[length - 1];
    const lastItemId = lastItem.id; // or { id } = lastItem
    const id = lastItemId + 1;
    // now apply the new id to your new item
    // even if you delete an item from the middle, this still works
}