Error inserting embedded documents into MongoDB using Scala - mongodb

I use mongo-scala-driver 2.9.0, and this is a function that saves a user's recommendation list to MongoDB. The argument streamRecs is an Array of (productId: Int, score: Double). I want to insert a document consisting of a userId and its recommendation list recs. However, there is an error on the line val doc:Document = Document("userId" -> userId,"recs"->recs). Does anyone know what is going wrong?
    def saveRecsToMongoDB(userId: Int, streamRecs: Array[(Int, Double)])(implicit mongoConfig: MongoConfig): Unit = {
      val streamRecsCollection = ConnHelper.mongoClient.getDatabase(mongoConfig.db).getCollection(STREAM_RECS_COLLECTION)
      streamRecsCollection.findOneAndDelete(equal("userId", userId))
      val recs: Array[Document] = streamRecs.map(item => Document("productId" -> item._1, "score" -> item._2))
      val doc: Document = Document("userId" -> userId, "recs" -> recs)
      streamRecsCollection.insertOne(doc)
    }
The document I want to insert into MongoDB looks like this (a user together with the recommended products and their scores):
    {
      "_id": ****,
      "userId": ****,
      "recs": [
        {
          "productId": ****,
          "score": ****
        },
        {
          "productId": ****,
          "score": ****
        },
        {
          "productId": ****,
          "score": ****
        },
        {
          "productId": ****,
          "score": ****
        }
      ]
    }

When creating a BSON document, declare the Bson type explicitly for each value in the key/value pair, like so:
    /* Compose Bson document */
    val document = Document(
      "email" -> BsonString("email@domain.com"),
      "token" -> BsonString("some_random_string"),
      "created" -> BsonDateTime(org.joda.time.DateTime.now.toDate)
    )
To see an example, please check https://code.linedrop.io/articles/Storing-Object-in-MongoDB.
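To make the target shape from the question concrete, here is a minimal, language-neutral sketch in Python rather than Scala. It only builds the nested structure from (productId, score) pairs and does not use the Scala driver API. The function name build_recs_document and the sample values are hypothetical.

```python
def build_recs_document(user_id, stream_recs):
    """Build a document with an embedded array of {productId, score} sub-documents."""
    return {
        "userId": user_id,
        "recs": [
            {"productId": product_id, "score": score}
            for product_id, score in stream_recs
        ],
    }


# Hypothetical sample data: two recommendations for user 42.
doc = build_recs_document(42, [(1001, 4.5), (1002, 3.8)])
```

The point is that "recs" is an array whose elements are themselves documents, so in the Scala driver each level has to be a value that the driver can convert to BSON.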

Related

How to optimize MongoDB find query?

I have a mongo read query which looks like the following.
    data class Resource(val assignmentId: String, val referrerId: String, val studentIds: Set<String>)

    fun findSubmissions(resources: List<Resource>): Flux<Submission> {
        // One criteria per resource; all of them are combined with OR below.
        val criteria = resources.map { (assignmentId, referrerId, students) ->
            Criteria.where("assignmentId").`is`(assignmentId)
                .and("referrerId").`is`(referrerId)
                .and("studentId").`in`(students)
        }.toTypedArray()
        val query = Query().addCriteria(Criteria().orOperator(*criteria))
        return template.find(query, Submission::class.java)
    }
The document schema of Submission looks like this:
    {
      assignmentId: "",
      referrerId: "",
      studentId: "",
      status: ""
    }
This query seems to be hurting performance and hogging the CPU, possibly because of the way the OR clause is written.
Any ideas on how this query could be rewritten in a better way?
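For reference, the criteria built above should correspond to a raw filter with one $or branch per resource. The following is a minimal Python sketch of that filter, assuming resources are plain dicts with the same three fields; build_submission_filter is a hypothetical helper, not part of Spring Data or any driver.

```python
def build_submission_filter(resources):
    """Return a MongoDB filter dict with one $or branch per resource."""
    if not resources:
        # MongoDB rejects an empty $or array, so fail early instead.
        raise ValueError("at least one resource is required")
    branches = [
        {
            "assignmentId": r["assignmentId"],
            "referrerId": r["referrerId"],
            # Sorted only to make the output deterministic.
            "studentId": {"$in": sorted(r["studentIds"])},
        }
        for r in resources
    ]
    return {"$or": branches}


# Hypothetical sample input.
submission_filter = build_submission_filter(
    [{"assignmentId": "a1", "referrerId": "r1", "studentIds": {"s2", "s1"}}]
)
```

Seeing the filter in this form makes it easier to check with explain() whether each branch can use an index; a compound index on assignmentId, referrerId and studentId is one candidate to try, but whether it helps here is an assumption to verify.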

Strict parsing into POJOs with KMongo

When I find documents in my collections and parse them into POJOs, I would like to get an exception if the MongoDB document contains additional keys that do not correspond to my POJO.
I can't find a way to configure that.
What I do:
    data class MyPojo(var a: Int)
    val mongoClient = KMongo.createClient(...)
    val collection = mongoClient...
    val results = collection.aggregate<MyPojo>(...)
and if a result document is
    { "a": 1, "b": 2 }
What I get:
    MyPojo(a=1)
I would like to see an exception such as
    kotlinx.serialization.json.JsonDecodingException: Invalid JSON...: Encountered an unknown key b
Does anyone know how to do that?
You have to specify strictMode = true in your JsonConfiguration, for example:
    install(ContentNegotiation) {
        serialization(
            contentType = ContentType.Application.Json,
            json = Json(
                JsonConfiguration(
                    strictMode = true,
                    prettyPrint = true
                )
            )
        )
    }

How to avoid adding duplicate data in Scrapy using MongoDB?

I want to avoid adding duplicate data and instead just 1) update one field (the number of views) or 2) update all the fields that have changed on the website. To do so, I'm using an ID (origin_id) that I found on the website I'm scraping.
Pipelines
    import pymongo
    from scrapy.exceptions import DropItem

    # `settings` is assumed to be the project's Scrapy settings object.

    class MongoDBPipeline(object):
        def __init__(self):
            connection = pymongo.MongoClient(
                settings['MONGODB_SERVER'],
                settings['MONGODB_PORT']
            )
            db = connection[settings['MONGODB_DB']]
            self.collection = db[settings['MONGODB_COLLECTION']]

        def process_item(self, item, spider):
            valid = True
            for data in item:
                if not data:
                    valid = False
                    raise DropItem("Missing {0}!".format(data))
            if valid:
                # Update item if it is in the database and insert otherwise.
                self.collection.update({'origin_id': item['origin_id']}, dict(item), upsert=True)
            return item
MongoDB record
    {
        "_id" : ObjectId("59725e919a1a6b7f0350027a"),
        "origin_id" : "12256699",
        "views" : "556",
        "url" : "...",
        "title" : "...",
    }
Please let me know if you want more details ...
You need to increment the views field by 1 if a document with that origin_id already exists.
Note that the other fields can only be set, because they hold non-non-numeric values is not the case for them: they are non-numeric.
Using an upsert with update operators also lets you skip the extra query that would check whether a document with that origin_id exists in the collection.
    self.collection.update(
        {'origin_id': item['origin_id']},
        {
            '$set': {'url': item['url'], 'title': item['title']},
            '$inc': {'views': 1}
        },
        upsert=True)
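As a small, driver-free sketch of the same idea, the filter and update documents can be built separately and inspected before they are sent. build_upsert is a hypothetical helper; the field names come from the record in the question. Two caveats worth labelling: $inc only works on numeric fields, while the record shown stores views as the string "556", so that field would need to hold a number; and in pymongo 3 and later, update_one is the replacement for the older update method.

```python
def build_upsert(item):
    """Return (filter, update) for an upsert keyed on origin_id."""
    query = {"origin_id": item["origin_id"]}
    update = {
        # Non-numeric fields are overwritten with the scraped values.
        "$set": {"url": item["url"], "title": item["title"]},
        # The counter is incremented; on insert it starts at 1.
        "$inc": {"views": 1},
    }
    return query, update


# Sample item mirroring the record in the question.
query, update = build_upsert(
    {"origin_id": "12256699", "url": "...", "title": "...", "views": "556"}
)
# With pymongo 3+, this pair would be passed as:
#   collection.update_one(query, update, upsert=True)
```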

Invoke db.eval in FindAndModify using MongoDB C# Client

I have the following document:
    {
        "_id": 100,
        "Version": 1,
        "Data": "Hello"
    }
I have a function which returns a number from a sequence:
    function getNextSequence(name) {
        var ret = db.Counter.findAndModify(
            {
                query: { _id: name },
                update: { $inc: { value: 1 } },
                new: true,
                upsert: true
            }
        );
        return ret.value;
    }
I can use this for optimistic concurrency by performing the following Mongo command:
    db.CollectionName.findAndModify({
        query: { "_id" : NumberLong(100), "Version" : 1 },
        update: { "$set" : {
            "Data": "Here is new data!",
            "Version" : db.eval('getNextSequence("CollectionName")') }
        },
        new: true
    });
This will update the document (as the _id and Version match) with the new Data field and with the new number returned by the eval call.
It also returns the modified document, from which I can retrieve the new Version value if I want to make another update later (in the same 'session').
My problem is:
You cannot create an Update document using the MongoDB C# client that will serialize to this command.
I used:
    var update = Update.Combine(
        new UpdateDocument("$set", doc),
        Update.Set(versionMap.ElementName, new BsonJavaScript("db.eval('getNextSequence(\"Version:CollectionName\")')"))
    );
If you use what I first expected to perform this task, BsonJavaScript, you get the following document, which incorrectly sets Version to a string of JavaScript.
    update: { "$set" : {
            "Data": "Here is new data!",
            "Version" : { "$code" : "db.eval('getNextSequence(\"Version:CollectionName\")')" }
        }
    }
How can I get MongoDB C# client to serialize an Update document with my db.eval function call in it?
I have tried to add a new BsonValue type in my assembly which I would serialize down to db.eval(''). However, there is a BsonType enum which I cannot modify without changing MongoDB itself, which I would rather not do in case of any issues with the change, compatibility, etc.
I have also tried simply creating the Update document myself as a BsonDocument. However, FindAndModify will only accept an IMongoUpdate interface, which is simply a marker that at present I find superfluous.
I have just tried to construct the command manually by creating a BsonDocument myself to set the Value: db.eval, but I get the following exception:
A String value cannot be written to the root level of a BSON document.
I see no other way now than to drop down to the Mongo stream level to accomplish this.
So I gave up trying to get the Mongo C# client to do what I needed and instead wrote the following MongoDB function to do this for me:
    db.system.js.save(
        {
            _id : "optimisticFindAndModify",
            value : function optimisticFindAndModify(collectionName, operationArgs) {
                var collection = db.getCollection(collectionName);
                var ret = collection.findAndModify(operationArgs);
                return ret;
            }
        }
    );
This will get the collection to operate over, and execute the passed operationArgs in a FindAndModify operation.
Because I could not get the shell to set a literal value (i.e., not a "quoted string") on a JavaScript object, I had to do this in my C# code:
    var counterName = "Version:" + CollectionName;
    var sequenceJs = string.Format("getNextSequence(\"{0}\")", counterName);
    var doc = entity.ToBsonDocument();
    doc.Remove("_id");
    doc.Remove(versionMap.ElementName);
    // Placeholder that is swapped for the JavaScript call further down.
    doc.Add(versionMap.ElementName, "SEQUENCEJS");
    var findAndModifyDocument = new BsonDocument
    {
        {"query", query.ToBsonDocument()},
        {"update", doc},
        {"new", true},
        {"fields", Fields.Include(versionMap.ElementName).ToBsonDocument() }
    };
    // We have to strip the quotes from getNextSequence.
    var findAndModifyArgs = findAndModifyDocument.ToString();
    findAndModifyArgs = findAndModifyArgs.Replace("\"SEQUENCEJS\"", sequenceJs);
    var evalCommand = string.Format("db.eval('optimisticFindAndModify(\"{0}\", {1})');", CollectionName, findAndModifyArgs);
    var modifiedDocument = Database.Eval(new EvalArgs
    {
        Code = new BsonJavaScript(evalCommand)
    });
The result of this is that I can now call my Sequence Javascript, the getNextSequence function, inside the optimisticFindAndModify function.
Unfortunately I had to use a string replace in C#, as again there is no way of setting a BsonDocument to use the literal db.eval call that is needed, although the Mongo shell likes it just fine.
All is now working.
EDIT:
Although, if you really want to push boundaries, and are actually awake, you will realize that the same thing can be accomplished by performing an $inc on the Version field, and none of this is necessary.
However: if you want to follow along with the MongoDB tutorial on how they say to implement concurrency, or you just want to use a function in a FindAndModify, this will help you. I know I'll probably refer back to it a few times in this project!
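To illustrate the $inc variant mentioned in the edit, here is a minimal Python sketch. It builds the query and update documents and applies them to an in-memory dict; find_and_modify below is a tiny stand-in written for this example, not a driver call, and both function names are hypothetical. One difference from the approach above is that the version then counts up per document instead of coming from a shared sequence.

```python
def build_versioned_update(doc_id, expected_version, new_data):
    """Match on _id and the expected Version; bump Version with $inc."""
    query = {"_id": doc_id, "Version": expected_version}
    update = {"$set": {"Data": new_data}, "$inc": {"Version": 1}}
    return query, update


def find_and_modify(doc, query, update):
    """In-memory stand-in: apply the update only if every query field matches."""
    if any(doc.get(key) != value for key, value in query.items()):
        return None  # stale version or wrong _id: nothing is modified
    for key, value in update.get("$set", {}).items():
        doc[key] = value
    for key, value in update.get("$inc", {}).items():
        doc[key] = doc.get(key, 0) + value
    return doc


stored = {"_id": 100, "Version": 1, "Data": "Hello"}
query, update = build_versioned_update(100, 1, "Here is new data!")
first = find_and_modify(stored, query, update)   # matches Version 1
second = find_and_modify(stored, query, update)  # Version is now 2, so no match
```

The second call returns None because the stored Version has already moved on, which is the signal a caller would use to reload the document and retry.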

How should I structure my nested reactivemongo calls in my play2 application?

I'm in the process of trying to combine some nested calls with reactivemongo in my play2 application.
I get a list of objects returned from createObjects. I then loop over them, check whether each object exists in the collection, and insert it if it does not:
    def dostuff() = Action {
      implicit request =>
        form.bindFromRequest.fold(
          errors => BadRequest(views.html.invite(errors)),
          form => {
            val objectsReadyForSave = createObjects(form.companyId, form.companyName, sms_pattern.findAllIn(form.phoneNumbers).toSet)
            Async {
              // `object` is a reserved word in Scala, so the loop variable is named obj.
              for (obj <- objectsReadyForSave) {
                collection.find(BSONDocument("cId" -> obj.cId.get, "userId" -> obj.userId.get)).cursor.headOption.map { maybeFound =>
                  maybeFound.map { found =>
                    Logger.info("Found record, do not insert")
                  } getOrElse {
                    collection.insert(obj)
                  }
                }
              }
              Future(Ok(views.html.invite(form)))
            }
          })
    }
I feel that this is not as good as it could be, and it does not feel like idiomatic "play2" or "reactivemongo".
So my question is: how should I structure my nested calls to get the result I want
and to find out which objects have been inserted?
I am not an expert in MongoDB or ReactiveMongo, but it seems that you are trying to use a NoSQL database in the same way as you would use a standard SQL database. Note that MongoDB is asynchronous, which means that operations may be executed at some point in the future; this is why insert/update operations do not return the affected documents. Regarding your questions:
1 To insert the objects if they do not exist and get the information of which objects that have been inserted?
You should probably look at the mongoDB db.collection.update() method and call it with the upsert parameter as true. If you can afford it, this will either update documents if they already exist in database or insert them otherwise. Again, this operation does not return affected documents but you can check how many documents have been affected by accessing the last error. See reactivemongo.api.collections.GenericCollection#update which returns a Future[LastError].
2 For all the objects that are inserted, add them to a list and then return it with the Ok() call.
Once again, inserted/updated documents will not be returned. If you really need to return the complete affected document back, you will need to make another query to retrieve matching documents.
I would probably rewrite your code this way (without error/failure handling):
    def dostuff() = Action {
      implicit request =>
        form.bindFromRequest.fold(
          errors => BadRequest(views.html.invite(errors)),
          form => {
            val objectsReadyForSave = createObjects(form.companyId, form.companyName, sms_pattern.findAllIn(form.phoneNumbers).toSet)
            Async {
              val operations = for {
                data <- objectsReadyForSave
              } yield collection.update(BSONDocument("cId" -> data.cId.get, "userId" -> data.userId.get), data, upsert = true)
              Future.sequence(operations).map {
                lastErrors =>
                  Ok("Documents probably inserted/updated!")
              }
            }
          }
        )
    }
See also Scala Futures: http://docs.scala-lang.org/overviews/core/futures.html
This is really useful! ;)
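The pattern in the rewrite above (start every upsert, then wait for all of them) can be sketched without a database. The following Python example uses asyncio.gather in the role that Future.sequence plays in Scala. FakeCollection and its upsert method are hypothetical in-memory stand-ins, not a ReactiveMongo or driver API; a real driver would report whether an upsert inserted a new document through its own result object.

```python
import asyncio


class FakeCollection:
    """In-memory stand-in for a collection; not a real driver API."""

    def __init__(self):
        self.docs = {}

    async def upsert(self, key, doc):
        """Insert or replace doc under key; return True if it was newly inserted."""
        await asyncio.sleep(0)  # yield control, as a real network call would
        inserted = key not in self.docs
        self.docs[key] = doc
        return inserted


async def save_all(collection, objects):
    """Run every upsert, wait for all, and return the objects that were inserted."""
    flags = await asyncio.gather(
        *(collection.upsert((o["cId"], o["userId"]), o) for o in objects)
    )
    # gather returns results in the same order as its inputs.
    return [o for o, inserted in zip(objects, flags) if inserted]


collection = FakeCollection()
collection.docs[("c1", "u1")] = {"cId": "c1", "userId": "u1"}  # already present
objects = [{"cId": "c1", "userId": "u1"}, {"cId": "c1", "userId": "u2"}]
inserted = asyncio.run(save_all(collection, objects))
```

Here inserted contains only the second object, because the first key was already in the collection.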
Here's how I'd rewrite it.
    def dostuff() = Action { implicit request =>
      form.bindFromRequest.fold(
        errors => BadRequest(views.html.invite(errors)),
        form => {
          createObjects(form.companyId,
            form.companyName,
            sms_pattern.findAllIn(form.phoneNumbers).toSet).map(ƒ)
          Ok(views.html.invite(form))
        }
      )
    }
    // ...
    // In the model
    // ...
    def ƒ(cId: Option[String], userId: Option[String], logger: Logger) = {
      // You need to handle the case where obj.cId or obj.userId are None
      collection.find(BSONDocument("cId" -> obj.cId.get, "userId" -> obj.userId.get))
        .cursor
        .headOption
        .map { maybeFound =>
          maybeFound map { _ =>
            logger.info("Record found, do not insert")
          } getOrElse {
            collection.insert(obj)
          }
        }
    }
There may be some syntax errors, but the idea is there.