MongoDbIO Apache Beam GCP Dataflow with Mongo upsert pipeline example

I am looking for an example of an Apache Beam pipeline on GCP Dataflow that updates data in MongoDB using an upsert operation, i.e. if the value exists it should be updated, and if not it should be inserted.
The syntax is like below:
pipeline.apply(...)
.apply(MongoDbIO.write()
.withUri("mongodb://localhost:27017")
.withDatabase("my-database")
.withCollection("my-collection")
.withUpdateConfiguration(UpdateConfiguration.create().withUpdateKey("key1")
.withUpdateFields(UpdateField.fieldUpdate("$set", "source-field1", "dest-field1"),
UpdateField.fieldUpdate("$set","source-field2", "dest-field2"),
//pushes entire input doc to the dest field
UpdateField.fullUpdate("$push", "dest-field3") )));
Below is my pipeline code, where I am currently inserting documents after preparing them like this:
{"_id":{"$oid":"619632693261e80017c44145"},"vin":"SATESTCAVA74621","timestamp":"2021-11-18T10:48:59.889Z","key":"EV_CHARGE_NOW_SETTING","value":"DEFAULT"}
Now I want to update 'value' and 'timestamp' if the combination of 'vin' and 'key' is already present; if that combination is not present, a new document should be inserted using upsert.
PCollection<PubsubMessage> pubsubMessagePCollection= pubsubMessagePCollectionMap.get(topic);
pubsubMessagePCollection.apply("Convert pubsub to kv,k=vin", ParDo.of(new ConvertPubsubToKVFn()))
.apply("group by vin key",GroupByKey.<String,String>create())
.apply("filter data for alerts, status and vehicle data", ParDo.of(new filterMessages()))
.apply("converting message to document type", ParDo.of(
new ConvertMessageToDocumentTypeFn(list_of_keys_str, collection, options.getMongoDBHostName(),options.getMongoDBDatabaseName())).withSideInputs(list_of_keys_str))
.apply(MongoDbIO.write()
.withUri(options.getMongoDBHostName())
.withDatabase(options.getMongoDBDatabaseName())
.withCollection(collection));
Now, if I want to use the lines of code below:
.withUpdateConfiguration(UpdateConfiguration.create().withUpdateKey("key1")
.withUpdateFields(UpdateField.fieldUpdate("$set", "source-field1", "dest-field1"),
UpdateField.fieldUpdate("$set","source-field2", "dest-field2"),
//pushes entire input doc to the dest field
UpdateField.fullUpdate("$push", "dest-field3") )));
What should key1, "source-field1", "dest-field1", "source-field2", "dest-field2" and "dest-field3" be in my case?
I am confused by these values. Please help!
Below is the code I am trying for the update:
MongoDbIO.write()
.withUri(options.getMongoDBHostName())
.withDatabase(options.getMongoDBDatabaseName())
.withCollection(collection)
.withUpdateConfiguration(UpdateConfiguration.create()
.withIsUpsert(true)
.withUpdateKey("vin")
.withUpdateKey("key")
.withUpdateFields(UpdateField.fieldUpdate("$set", "vin", "vin"),
UpdateField.fieldUpdate("$set", "key", "key"),
UpdateField.fieldUpdate("$set", "timestamp", "timestamp"),
UpdateField.fieldUpdate("$set", "value", "value")))
With the above code my document is not updated; instead a new document is added with id = vin. It should update the existing record where vin and key match, and when it inserts, it should insert with an auto-generated _id value.
Please suggest what to do here.

The upsert setting is part of the update configuration; you can enable it with withIsUpsert(true).
In your original syntax, add the extra line to enable upsert:
pipeline.apply(...)
.apply(MongoDbIO.write()
.withUri("mongodb://localhost:27017")
.withDatabase("my-database")
.withCollection("my-collection")
.withUpdateConfiguration(
UpdateConfiguration.create()
.withIsUpsert(true)
.withUpdateKey("key1")
.withUpdateFields(
UpdateField.fieldUpdate("$set", "source-field1", "dest-field1"),
UpdateField.fieldUpdate("$set","source-field2", "dest-field2"),
//pushes entire input doc to the dest field
UpdateField.fullUpdate("$push", "dest-field3"))));
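For reference, the behaviour the question is after (match on the vin and key pair, update value and timestamp, otherwise insert with a generated _id) can be sketched in plain Java. This is only an in-memory model of the semantics, not MongoDbIO or driver code; the class name, method names and the "generated-N" id format are all hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory model of an upsert keyed on the (vin, key) pair.
// It does NOT call MongoDB or Beam; it only illustrates the intended semantics.
final class CompoundKeyUpsertSketch {

    // Stands in for the collection: [vin, key] -> stored document.
    private final Map<List<String>, Map<String, String>> collection = new LinkedHashMap<>();
    private int nextId = 1; // hypothetical stand-in for an auto-generated _id

    // Returns true when a new document was inserted, false when an existing one was updated.
    boolean upsert(String vin, String key, String value, String timestamp) {
        List<String> compoundKey = Arrays.asList(vin, key);
        Map<String, String> existing = collection.get(compoundKey);
        if (existing == null) {
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("_id", "generated-" + nextId++);
            doc.put("vin", vin);
            doc.put("key", key);
            doc.put("value", value);
            doc.put("timestamp", timestamp);
            collection.put(compoundKey, doc);
            return true;
        }
        // Match found: only value and timestamp change; _id, vin and key stay as they were.
        existing.put("value", value);
        existing.put("timestamp", timestamp);
        return false;
    }

    Map<String, String> find(String vin, String key) {
        return collection.get(Arrays.asList(vin, key));
    }

    int size() {
        return collection.size();
    }

    public static void main(String[] args) {
        CompoundKeyUpsertSketch store = new CompoundKeyUpsertSketch();
        store.upsert("SATESTCAVA74621", "EV_CHARGE_NOW_SETTING", "DEFAULT", "2021-11-18T10:48:59.889Z");
        store.upsert("SATESTCAVA74621", "EV_CHARGE_NOW_SETTING", "ON", "2021-11-18T11:00:00.000Z");
        System.out.println(store.find("SATESTCAVA74621", "EV_CHARGE_NOW_SETTING"));
    }
}
```

The second call with the same vin and key changes value and timestamp on the stored document and leaves its _id untouched, which is the outcome the pipeline is expected to produce in the real collection.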

Related

MongoDB Complex Query with Java

We have following structure in MongoDB documents.
{
"id":"1111",
"keys":[
{
"name":"Country",
"value":"USA"
},
{
"name":"City",
"value":"LongIsland"
},
{
"name":"State",
"value":"NewYork"
}
]
}
Using the Spring Framework Query object, I figured out a way to pull the details with the syntax below:
query.addCriteria(
Criteria.where("keys.value").is(countryparam).
andOperator(
Criteria.where("keys.value").is(stateparam)
)
);
There are two issues with this query model.
First, it does not check that countryparam and stateparam are matched against the Country and State key names respectively. If just the values match, the query returns the document. This means that if I have Country and City params, the query still matches when the user passes the Country and City values swapped. So how can I compare City to cityparam and State to stateparam exactly?
Second, if I have to fetch the document based on multiple key-value pairs, I should be able to match each key name with its respective value. How can I do this?
Thanks in advance!
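To make the difference concrete, here is a small plain-Java sketch of the two matching rules over the keys array: a loose check on the value alone (what a condition on keys.value amounts to) versus a check that a single element carries both the expected name and value. In MongoDB itself that per-element condition is what the $elemMatch operator expresses. The class and method names are hypothetical, and this models the semantics only; it is not Spring or driver code.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class KeyValueMatchSketch {

    // Loose match: some element has this value, whatever its name is.
    static boolean anyValueMatches(List<Map<String, String>> keys, String value) {
        for (Map<String, String> k : keys) {
            if (value.equals(k.get("value"))) {
                return true;
            }
        }
        return false;
    }

    // Strict match: one single element carries both the expected name and value.
    static boolean pairMatches(List<Map<String, String>> keys, String name, String value) {
        for (Map<String, String> k : keys) {
            if (name.equals(k.get("name")) && value.equals(k.get("value"))) {
                return true;
            }
        }
        return false;
    }

    // The document matches only if every requested name/value pair is found.
    static boolean allPairsMatch(List<Map<String, String>> keys, Map<String, String> wanted) {
        for (Map.Entry<String, String> e : wanted.entrySet()) {
            if (!pairMatches(keys, e.getKey(), e.getValue())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Map<String, String>> keys = Arrays.asList(
                Map.of("name", "Country", "value", "USA"),
                Map.of("name", "State", "value", "NewYork"));
        Map<String, String> swapped = new LinkedHashMap<>();
        swapped.put("Country", "NewYork");
        swapped.put("State", "USA");
        System.out.println(anyValueMatches(keys, "NewYork") + " " + allPairsMatch(keys, swapped));
    }
}
```

With swapped parameters the loose check still succeeds for each value, while the pair-wise check rejects the document, which is the behaviour the question asks for.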

How can I upsert a record and array element at the same time?

That is meant to be read as a dual upsert operation, upsert the document then the array element.
So MongoDB is a denormalized store for me (we're event sourced) and one of the things I'm trying to deal with is the concurrent nature of that. The problem is this:
Events can come in out of order, so each update to the database needs to be an upsert.
I need to be able to not only upsert the parent document but an element in an array property of that document.
For example:
If the document doesn't exist, create it. All events in this stream have the document's ID but only part of the information depending on the event.
If the document does exist, then update it. This is the easy part. The update command is just written as UpdateOneAsync and as an upsert.
If the event is actually to update a list, then that list element needs to be upserted. So if the document doesn't exist, it needs to be created and the list item will be upserted (resulting in an insert); if the document does exist, then we need to find the element and update it as an upsert, so if the element exists then it is updated otherwise it is inserted.
If at all possible, having it execute as a single atomic operation would be ideal, but if it can only be done in multiple steps, then so be it. I'm getting a number of mixed examples on the net due to the large change in the 2.x driver. Not sure what I'm looking for beyond the UpdateOneAsync. Currently using 2.4.x. Explained examples would be appreciated. TIA
Note:
Reiterating that this is a question regarding the MongoDB C# driver 2.4.x
Took some tinkering, but I got it.
var notificationData = new NotificationData
{
ReferenceId = e.ReferenceId,
NotificationId = e.NotificationId,
DeliveredDateUtc = e.SentDate.DateTime
};
var matchDocument = Builders<SurveyData>.Filter.Eq(s => s.SurveyId, e.EntityId);
// first upsert the document to make sure that you have a collection to write to
var surveyUpsert = new UpdateOneModel<SurveyData>(
matchDocument,
Builders<SurveyData>.Update
.SetOnInsert(f => f.SurveyId, e.EntityId)
.SetOnInsert(f => f.Notifications, new List<NotificationData>())){ IsUpsert = true};
// then push a new element if none of the existing elements match
var noMatchReferenceId = Builders<SurveyData>.Filter
.Not(Builders<SurveyData>.Filter.ElemMatch(s => s.Notifications, n => n.ReferenceId.Equals(e.ReferenceId)));
var insertNewNotification = new UpdateOneModel<SurveyData>(
matchDocument & noMatchReferenceId,
Builders<SurveyData>.Update
.Push(s => s.Notifications, notificationData));
// then update the element that does match the reference ID (if any)
var matchReferenceId = Builders<SurveyData>.Filter
.ElemMatch(s => s.Notifications, Builders<NotificationData>.Filter.Eq(n => n.ReferenceId, notificationData.ReferenceId));
var updateExistingNotification = new UpdateOneModel<SurveyData>(
matchDocument & matchReferenceId,
Builders<SurveyData>.Update
// apparently the mongo C# driver will convert any negative index into an index symbol ('$')
.Set(s => s.Notifications[-1].NotificationId, e.NotificationId)
.Set(s => s.Notifications[-1].DeliveredDateUtc, notificationData.DeliveredDateUtc));
// execute these as a batch and in order
var result = await _surveyRepository.DatabaseCollection
.BulkWriteAsync(
new []{ surveyUpsert, insertNewNotification, updateExistingNotification },
new BulkWriteOptions { IsOrdered = true })
.ConfigureAwait(false);
The post linked as being a dupe was absolutely helpful, but it was not the answer. There were a few things that needed to be discovered:
The 'second statement' in the linked example didn't work correctly, at least when translated literally. To get it to work, I had to match on the element and then invert the logic by wrapping it in the Not() filter.
In order to use 'this index' on the match, you have to use a negative index on the array. As it turns out, the C# driver will convert any negative index to the '$' character when the query is rendered.
In order to ensure they are run in order, you must include bulk write options with IsOrdered set to true.
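The order of the three writes is what makes the batch work, so here is a rough in-memory model of that sequence in plain Java: create the parent if missing, push the element only when no element has the reference id, then update the element that matches. It is a sketch of the logic only, not driver code; the class, field and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class NestedUpsertSketch {

    static final class Notification {
        final String referenceId;
        String notificationId;

        Notification(String referenceId, String notificationId) {
            this.referenceId = referenceId;
            this.notificationId = notificationId;
        }
    }

    // Stands in for the collection: surveyId -> embedded notifications array.
    private final Map<String, List<Notification>> surveys = new LinkedHashMap<>();

    void upsertNotification(String surveyId, String referenceId, String notificationId) {
        // Step 1: upsert the parent, starting with an empty array if it is missing.
        List<Notification> notifications =
                surveys.computeIfAbsent(surveyId, id -> new ArrayList<>());

        // Step 2: push a new element only when no existing element has this reference id.
        boolean found = false;
        for (Notification n : notifications) {
            if (n.referenceId.equals(referenceId)) {
                found = true;
                break;
            }
        }
        if (!found) {
            notifications.add(new Notification(referenceId, notificationId));
        }

        // Step 3: update the element that matches the reference id.
        // After a push in step 2 this rewrites the same values, which is harmless.
        for (Notification n : notifications) {
            if (n.referenceId.equals(referenceId)) {
                n.notificationId = notificationId;
                break;
            }
        }
    }

    List<Notification> notifications(String surveyId) {
        return surveys.getOrDefault(surveyId, new ArrayList<>());
    }

    public static void main(String[] args) {
        NestedUpsertSketch store = new NestedUpsertSketch();
        store.upsertNotification("survey-1", "ref-1", "n-1");
        store.upsertNotification("survey-1", "ref-1", "n-2");
        System.out.println(store.notifications("survey-1").size());
    }
}
```

Running the steps out of order would break the model in the same way it breaks the real batch: step 2 needs the parent from step 1, and step 3 only has something to match once step 2 has had its chance to push.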

Doctrine ODM (mongo) - upsert embedded document if it does not exist

A mongo document exists which links Users and Cars. It contains the following fields:
User.userId
User.cars[]
User.updated
User.created
User.cars is an array of embedded documents. A query needs to be written that only inserts a Car into this field if it does not already contain a Car with id $carId.
The following query creates a new record each time for $userId and $carId. What it should do is either not insert a new record or update the value of driven.
$qb
->findAndUpdate()
->upsert(true)
->field('userId')->equals($userId)
->field('cars.id')->notEqual($carId)
->field('cars')->addToSet([
'id' => $carId,
'driven' => new \DateTime(),
])
->field('updated')->set(new \DateTime())
->field('created')->setOnInsert(new \DateTime())
->limit(1)
;
return $qb->getQuery()->execute();
Removing notEqual($carId) from the query will always insert the record into User.cars, which is also not desired.
The end result should be that User.cars only contains one record for each carId.
If you don't need to search the array of cars later you could store it as an object instead and use $carId as key:
$qb
->findAndUpdate()
->upsert(true)
->field('userId')->equals($userId)
->field("cars.$carId")->set([
'id' => $carId,
'driven' => new \DateTime(),
])
->field('updated')->set(new \DateTime())
->field('created')->setOnInsert(new \DateTime())
->limit(1)
;
return $qb->getQuery()->execute();
If cars are mapped in the document as EmbedMany you may need to change your collection's strategy to either set or atomicSet (I'd suggest the latter) as otherwise ODM will reindex your array each time it's saved.
Once again, this solution has its downsides which can be considered serious depending on how you want to use your data but it solves the problem in question.
As an aside, ->limit(1) is superfluous: ->findAndUpdate() makes the query run the findAndModify command, which by its nature operates on a single document.

Update an array using Jongo

I have a mongodb collection of the form
{
"_id":"id",
"userEmail":"userEmailFromCustomerCollection",
"customerFavs":[
"www.xyz.com",
"www.xyz.com",
"www.xyz.com"
]
}
I need to add an element to the customerFavs array using Jongo. I am using the following code snippet to do so:
String query = "{userEmail:'"+emailId+"'}";
customerFavCollection.update(query).with("{$addToSet:{customerFavs:#}}", favUrl);
My problem is that I need to upsert the document if it does not already exist. How can I do so using Jongo? I know an easier option would be to retrieve the document by id and, if it does not exist, insert it using save(), but I am trying to avoid the extra retrieve.
You can add upsert() to the update, before calling with():
customerFavCollection.update("{userEmail:#}", emailId)
.upsert()
.with("{$addToSet:{customerFavs:#}}", favUrl);
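As a rough illustration of what the combination of $addToSet and upsert achieves, here is an in-memory model in plain Java: the entry for the email is created on first use, and the favourite is added only if it is not already present. This is not Jongo code; the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class AddToSetUpsertSketch {

    // Stands in for the collection: userEmail -> customerFavs (kept free of duplicates).
    private final Map<String, Set<String>> favsByEmail = new LinkedHashMap<>();

    // Creates the entry when the email is unknown (the upsert part), then adds the
    // URL only if it is not there yet (the $addToSet part).
    // Returns true when the URL was newly added.
    boolean addFavourite(String email, String url) {
        return favsByEmail.computeIfAbsent(email, e -> new LinkedHashSet<>()).add(url);
    }

    Set<String> favourites(String email) {
        return favsByEmail.getOrDefault(email, Collections.emptySet());
    }

    public static void main(String[] args) {
        AddToSetUpsertSketch store = new AddToSetUpsertSketch();
        store.addFavourite("user@example.com", "www.xyz.com");
        store.addFavourite("user@example.com", "www.xyz.com");
        System.out.println(store.favourites("user@example.com"));
    }
}
```

No separate read is needed before the write, which is the extra retrieve the question wants to avoid.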

Multi level MongoDB object querying by key

Suppose you only know the key name (say "nickname"), but not the exact path to that key in the object.
For example, nickname may be at the first level:
{"nickname":"Howie"}
or at the second level:
{"user":{"nickname":"Howie"}}
Is it possible to query for nickname equal to "Howie" so that both documents are returned?
Unfortunately there is no wildcard that lets you search for a field at any level of a document. If the position is not relevant and you can modify the document structure, you have two choices. You can store your documents as
{ fieldname:"nickname", value : "Howie" }
{ fieldname:"user/nickname", value: "Howie" }
You can then query using
db.so.find({fieldname:/nickname/, value:"Howie"})
Alternatively you can store as
db.so.insert({value:"Howie", fieldpath:["nickname"]})
db.so.insert({value:"Howie", fieldpath:["user", "nickname"]})
The advantage with the second approach is that you can now index {fieldpath:1, value:1} and a query on it such as
db.so.find({fieldpath:"nickname", value:"Howie"})
will be an indexed query.
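Producing the fieldpath form from an existing nested document is a simple recursive walk. Below is a minimal plain-Java sketch, assuming documents are represented as nested java.util.Map values; the class names FieldPathFlattener and PathValue are hypothetical, and arrays inside documents are treated as leaf values here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FieldPathFlattener {

    // One flattened entry: the path of keys leading to a leaf, plus the leaf value.
    static final class PathValue {
        final List<String> fieldpath;
        final Object value;

        PathValue(List<String> fieldpath, Object value) {
            this.fieldpath = fieldpath;
            this.value = value;
        }
    }

    static List<PathValue> flatten(Map<String, Object> doc) {
        List<PathValue> out = new ArrayList<>();
        walk(doc, new ArrayList<>(), out);
        return out;
    }

    private static void walk(Map<String, Object> node, List<String> prefix, List<PathValue> out) {
        for (Map.Entry<String, Object> e : node.entrySet()) {
            List<String> path = new ArrayList<>(prefix);
            path.add(e.getKey());
            Object v = e.getValue();
            if (v instanceof Map) {
                // Nested object: keep descending and extend the path.
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) v;
                walk(child, path, out);
            } else {
                // Leaf value: emit one {fieldpath, value} entry.
                out.add(new PathValue(path, v));
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> user = new LinkedHashMap<>();
        user.put("nickname", "Howie");
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("user", user);
        for (PathValue pv : flatten(doc)) {
            System.out.println(pv.fieldpath + " = " + pv.value);
        }
    }
}
```

Each emitted entry corresponds to one inserted document of the second form, so a query on fieldpath containing "nickname" finds the value regardless of how deeply it was nested.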