How to return the document from Elasticsearch after an update? - scala

I am trying to update several document fields and return the full document after the update.
I use elastic4s 1.3.4 with Elasticsearch 1.4.3 as the server.
Here is the code:
import scala.concurrent.Await
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ElasticsearchTester extends App {
  private val settings: Settings = ImmutableSettings.settingsBuilder().put("cluster.name", "clustername").build()
  private val client: ElasticClient = ElasticClient.remote(settings, ("localhost", 9300))

  val initial = """
    |{
    | "name":"jojn",
    | "surname":"olol"
    |}
  """.stripMargin

  val updateString = """
    |{
    | "surname":"123",
    | "global": {
    | "new":"fiedl"
    | }
    |}
  """.stripMargin

  import com.sksamuel.elastic4s.ElasticDsl._

  val future = client.execute {
    create index "my_index"
  }.flatMap { r =>
    client.execute {
      index into "my_index/user" doc StringDocumentSource(initial)
    }.flatMap { re =>
      println("Ololo indexed is: " + initial)
      println("Ololo indexed id: " + re.getId)
      client.execute {
        update id re.getId in "my_index/user" doc StringDocumentSource(updateString) docAsUpsert true params ("fields" -> "_source")
      }.map { res =>
        println("Ololo result is: " + res.getGetResult.sourceAsString())
      }
    }
  }

  Await.result(future, 20.seconds)
  println("Ololo ok")
}
Why do I get a NullPointerException on the line res.getGetResult.sourceAsString()? It seems that the update response does not contain the document after the update operation.
Is it possible to return the document _source from the update response?

Elastic4s seems to have no API in UpdateDefinition (as of 23.07.2015) to set fields. However, its builder supports this operation. The code below is a dirty hack, but it works as expected: just set the fields directly on _builder:
val updateRequest = update id re.getId in "my_index/user" doc StringDocumentSource(updateString) docAsUpsert true
updateRequest._builder.setFields("_source")
client.execute {
  updateRequest
}.map { res =>
  println("Ololo result is: " + res.getGetResult.sourceAsString())
}
This prints:
Ololo indexed id: AU66n1yiYVxOgU2h4AoG
Ololo result is: {"name":"jojn","surname":"123","global":{"new":"fiedl"}}
Ololo ok
Note
Elasticsearch does support returning fields after an update request.
Update
Since this commit, Elastic4s supports this via the UpdateDsl.includeSource or UpdateDsl.setFields methods.

Related

Error inserting embedded documents into MongoDB using Scala

I use mongo-scala-driver 2.9.0, and this is a function that saves a user's recommendation list to MongoDB. The argument streamRecs is an Array of (productId: Int, score: Double). Now I want to insert a document consisting of a userId and its recommendation list recs. However, there is an error on the line val doc: Document = Document("userId" -> userId, "recs" -> recs). Does anyone know what goes wrong?
def saveRecsToMongoDB(userId: Int, streamRecs: Array[(Int, Double)])(implicit mongoConfig: MongoConfig): Unit = {
  val streamRecsCollection = ConnHelper.mongoClient.getDatabase(mongoConfig.db).getCollection(STREAM_RECS_COLLECTION)
  streamRecsCollection.findOneAndDelete(equal("userId", userId))
  val recs: Array[Document] = streamRecs.map(item => Document("productId" -> item._1, "score" -> item._2))
  val doc: Document = Document("userId" -> userId, "recs" -> recs)
  streamRecsCollection.insertOne(doc)
}
The document I want to insert into MongoDB looks like this (a user with their recommended products and scores):
{
  "_id":****,
  "userId":****,
  "recs":[
    {
      "productId":****,
      "score":****
    },
    {
      "productId":****,
      "score":****
    },
    {
      "productId":****,
      "score":****
    },
    {
      "productId":****,
      "score":****
    }
  ]
}
When creating a BSON document, declare the Bson type explicitly for each value in the key/value pair, like so:
/* Compose Bson document */
val document = Document(
  "email" -> BsonString("email@domain.com"),
  "token" -> BsonString("some_random_string"),
  "created" -> BsonDateTime(org.joda.time.DateTime.now.toDate)
)
To see an example, please check https://code.linedrop.io/articles/Storing-Object-in-MongoDB.

Strict parsing into POJOs with KMongo

When I find documents in my collections and parse them into POJOs, I would like to get an exception if a MongoDB document contains additional keys that do not correspond to my POJO.
I can't find a way to configure that.
What I do:
data class MyPojo(var a: Int)
val mongoClient = KMongo.createClient(...)
val collection = mongoClient...
val results = collection.aggregate<MyPojo>(...)
and if a result document is
{ "a": 1, "b": 2 }
What I get:
MyPojo(a=1)
I would like to see an exception such as:
kotlinx.serialization.json.JsonDecodingException: Invalid JSON...: Encountered an unknown key b
Does anyone know how to do that?
You have to specify strictMode = true in your JsonConfiguration, for example:
install(ContentNegotiation) {
    serialization(
        contentType = ContentType.Application.Json,
        json = Json(
            JsonConfiguration(
                strictMode = true,
                prettyPrint = true
            )
        )
    )
}

SpringBoot ReactiveMongoTemplate updating document partially

I am working on a Kotlin reactive Spring Boot MongoDB project. I'm trying to update a document, but it does not work as expected.
My problem is pretty similar to the following Stack Overflow question:
Spring reactive mongodb template update document partially with objects
So I have a document in Mongo:
{
  "id": 1,
  "name": "MYNAME",
  "email": "MYEMAIL",
  "encryptedPassword": "12345",
  ...........................
}
And when I call PATCH on the URI localhost:8080/user/1 with one of the following request bodies
{
  "name": "NEW NAME"
}
{
  "email": "NEW EMAIL"
}
I want to update my document with received fields only.
My handler code:
fun update(serverRequest: ServerRequest) =
    userService
        .updateUser(serverRequest.pathVariable("id").toLong(), serverRequest.bodyToMono())
        .flatMap {
            ok().build()
        }
My service implementation code:
override fun updateUser(id: Long, request: Mono<User>): Mono<UpdateResult> {
    val changes = request.map { it -> PropertyUtils.describe(it) }
    val updateFields: Update = Update()
    changes.subscribe {
        for (entry in it.entries) {
            updateFields.set(entry.key, entry.value)
        }
    }
    return userRepository.updateById(id, updateFields)
}
My repository code:
fun updateById(id: Long, partial: Update) = template.updateFirst(Query(where("id").isEqualTo(id)), partial, User::class.java)
My user code:
@Document
data class User(
    @Id
    val id: Long = 0,
    var name: String = "",
    val email: String = "",
    val encryptedPassword: String = ""
)
I have followed the advice given in "Spring reactive mongodb template update document partially with objects".
My code does update the document, but it resets the fields to the default constructor values of my User class.
Could anyone help with this?
I guess you should treat this as a general problem of patching an object in Java/Kotlin. I found an article about this: https://cassiomolin.com/2019/06/10/using-http-patch-in-spring/#json-merge-patch. Even if you don't update the object partially, there should not be a big impact on the performance of your application.
I figured out how to partially update my data.
First, I changed the request body to a string (using bodyToMono(String::class.java)).
Then I converted the JSON string to a JSONObject (org.json).
For each key of the JSONObject, I added an entry to an Update that holds the partial data to apply to my entity.
This is how I implemented it:
override fun updateUser(id: Long, request: Mono<String>): Mono<UpdateResult> {
    val update = Update()
    return request.map { JSONObject(it) }
        .map {
            it.keys().forEach { key -> update.set(key, it[key]) }
            update
        }
        .flatMap { it -> userRepository.updateById(id, it) }
}
Please share your ideas if you have a cleaner way to do this. Thank you.
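The same idea can be shown as a language-neutral sketch in Python 3: parse the raw body, keep only the keys the client actually sent, and wrap them in a $set document. The allow-list PATCHABLE_FIELDS and the function name build_set_update are assumptions added for illustration; the Kotlin code above sets every key it receives without such a check.

```python
import json

# Hypothetical allow-list: fields a client may change through PATCH.
PATCHABLE_FIELDS = {"name", "email"}


def build_set_update(raw_body, allowed=PATCHABLE_FIELDS):
    """Turn a JSON PATCH body into a MongoDB-style {"$set": {...}} document.

    Only keys present in the body are included, so fields the client did
    not send are never overwritten with default values.
    """
    body = json.loads(raw_body)
    if not isinstance(body, dict):
        raise ValueError("PATCH body must be a JSON object")
    if not body:
        raise ValueError("PATCH body contains no fields")
    unknown = set(body) - set(allowed)
    if unknown:
        raise ValueError("fields not patchable: " + ", ".join(sorted(unknown)))
    return {"$set": body}
```

Working from the raw JSON rather than a deserialized User is what avoids the original problem: a deserialized object cannot distinguish "field not sent" from "field equals its default".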

Read data from huge Mongo DB

Scenario:
Collection A has 40 million records, and each record has almost 20 fields.
Get 5 (defined) fields from A, change the field names, and populate collection B.
Example:
A
"_id" is the primary key here.
{
  "_id": 123,
  "id": 123,
  "title": "test",
  "summary": "test",
  "version": 1,
  "parentid": 12
}
B
{
  "_id": 123,
  "p$id": 123,
  "p$parentid": 12,
  "p$title": "test"
}
Can someone please suggest a good way to write code for this scenario?
I wrote the code, but it took 5 hours to complete.
My code:
config.py:
It has all the MongoDB-related details.
Actual code:
from pymongo import MongoClient
import operator
import datetime
import config  # config.py holds the MongoDB connection details

print "Start time", datetime.datetime.now()
primary_dict = {}
primary_list = []
secondary_dict = {}
secondary_list = []
missing_id = []
mismatch_id = []
# Maps field names in the primary collection to names in the secondary one.
alias_dict = {
    "_id": "_id",
    "id": "p$id",
    "title": "p$title",
    "parentid": "p$parentid"
}

def mongo_connect(host, port, db, collection):
    client = MongoClient(host, port)
    db_obj = client[db]
    collection_obj = db_obj[collection]
    return collection_obj

def primary():
    global primary_list
    global primary_dict
    global secondary_dict
    global secondary_list
    global missing_id
    primary_collection = mongo_connect(config.mongo_host, config.mongo_port, config.mongo_primary_db, config.mongo_primary_collection)
    secondary_collection = mongo_connect(config.mongo_host, config.mongo_port, config.mongo_secondary_db, config.mongo_secondary_collection)
    for dict1 in primary_collection.find({}, {"_id": 1, "title": 1}).batch_size(1000):
        count = 0
        target_id = ''
        primary_list = []
        secondary_list = []
        target_id = dict1['_id']
        primary_list.insert(count, dict1)
        if (secondary_collection.find_one({"_id": target_id})) is None:
            missing_id.append(target_id)
            continue
        else:
            secondary_list.insert(count, secondary_collection.find_one({"_id": target_id}))
        compare(primary_list, secondary_list)

def compare(list1, list2):
    global alias_dict
    global mismatch_id
    global missing_id
    # Compare the documents passed in rather than the module-level lists.
    for l1, l2 in zip(list1, list2):
        if len(l1) != len(l2):
            mismatch_id.append(l1['_id'])
            continue
        else:
            for key, value in l1.items():
                if value != l2[alias_dict[key]]:
                    mismatch_id.append(l1['_id'])

primary()
print "Mismatch id list", mismatch_id
print "Missing Id list", missing_id
print "End time", datetime.datetime.now()
Well, you could do this:
db.eval(function(){
    db.primary_collection.find({},
        { id: 1, parentid: 1, title: 1 }).forEach(function(doc){
        var newDoc = {};
        Object.keys(doc).forEach(function(key) {
            var newKey = ( key == "_id" ) ? key : "p$" + key;
            newDoc[newKey] = doc[key];
        });
        db.secondary_collection.insert(newDoc);
    });
})
This uses db.eval() to execute the code on the server, which is about as fast as you will get.
But please read the documentation on this as you will be "locking" the database while this operation takes place. And of course you cannot do this across servers if that is your intent.
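If the copy has to run from a client instead, the renaming can be sketched in Python 3 as a pure transformation applied in batches. This is a minimal sketch under assumptions: ALIASES mirrors the alias_dict from the question, while rename_fields, batched, copy_renamed, and the write_batch callback are hypothetical names introduced here. With pymongo, write_batch could be the target collection's insert_many.

```python
from itertools import islice

# Mapping from source field names to target names, taken from the question.
ALIASES = {"_id": "_id", "id": "p$id", "title": "p$title", "parentid": "p$parentid"}


def rename_fields(doc, aliases=ALIASES):
    """Keep only the aliased fields of doc and rename them."""
    return {new: doc[old] for old, new in aliases.items() if old in doc}


def batched(iterable, size):
    """Yield lists of at most `size` items from iterable."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def copy_renamed(source_docs, write_batch, batch_size=1000):
    """Rename every document and hand the results to write_batch in chunks.

    write_batch is a hypothetical callback that receives a list of
    documents, e.g. target_collection.insert_many when using pymongo.
    Returns the number of documents written.
    """
    total = 0
    for chunk in batched(source_docs, batch_size):
        renamed = [rename_fields(doc) for doc in chunk]
        write_batch(renamed)
        total += len(renamed)
    return total
```

Writing one batch per 1000 documents avoids a separate round trip for every document, which is one likely contributor to the long run time of the per-document loop in the question.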

How should I structure my nested reactivemongo calls in my play2 application?

I'm trying to combine some nested calls with ReactiveMongo in my Play 2 application.
I get a list of objects returned from createObjects. I then loop over them, check whether each object exists in the collection, and insert it if it does not:
def dostuff() = Action {
  implicit request =>
    form.bindFromRequest.fold(
      errors => BadRequest(views.html.invite(errors)),
      form => {
        val objectsReadyForSave = createObjects(form.companyId, form.companyName, sms_pattern.findAllIn(form.phoneNumbers).toSet)
        Async {
          // `object` is a reserved word in Scala, so the loop variable is named `obj`.
          for (obj <- objectsReadyForSave) {
            collection.find(BSONDocument("cId" -> obj.cId.get, "userId" ->
              obj.userId.get)).cursor.headOption.map { maybeFound =>
              maybeFound.map { found =>
                Logger.info("Found record, do not insert")
              } getOrElse {
                collection.insert(obj)
              }
            }
          }
          Future(Ok(views.html.invite(form)))
        }
      })
}
I feel that this approach is not as good as it could be, and it does not feel idiomatic for Play 2 or ReactiveMongo.
So my question is: how should I structure my nested calls to get the result I want
and find out which objects have been inserted?
I am not an expert in MongoDB or in ReactiveMongo, but it seems that you are trying to use a NoSQL database in the same way as you would use a standard SQL database. Note that ReactiveMongo operations are asynchronous, which means they complete at some point in the future, and insert/update operations do not return the affected documents. Regarding your questions:
1. To insert the objects if they do not exist and find out which objects have been inserted:
You should probably look at the MongoDB db.collection.update() method and call it with the upsert parameter set to true. If you can afford it, this will either update documents that already exist in the database or insert them otherwise. Again, this operation does not return the affected documents, but you can check how many documents were affected by inspecting the last error. See reactivemongo.api.collections.GenericCollection#update, which returns a Future[LastError].
2. For all the objects that are inserted, add them to a list and then return it with the Ok() call:
Once again, inserted/updated documents will not be returned. If you really need to return the complete affected documents, you will need to make another query to retrieve the matching documents.
I would probably rewrite your code this way (without error/failure handling):
def dostuff() = Action {
  implicit request =>
    form.bindFromRequest.fold(
      errors => BadRequest(views.html.invite(errors)),
      form => {
        val objectsReadyForSave = createObjects(form.companyId, form.companyName, sms_pattern.findAllIn(form.phoneNumbers).toSet)
        Async {
          val operations = for {
            data <- objectsReadyForSave
          } yield collection.update(BSONDocument("cId" -> data.cId.get, "userId" -> data.userId.get), data, upsert = true)
          Future.sequence(operations).map {
            lastErrors =>
              Ok("Documents probably inserted/updated!")
          }
        }
      }
    )
}
See also Scala Futures: http://docs.scala-lang.org/overviews/core/futures.html
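The pattern in that rewrite (start one upsert per object, combine the futures, then inspect the results) can be sketched in Python 3 with asyncio, where asyncio.gather plays the role of Future.sequence. InMemoryCollection is a hypothetical stand-in and not ReactiveMongo. The boolean it returns is an assumption made for illustration; a real driver reports an insert through its write result instead.

```python
import asyncio


class InMemoryCollection:
    """Hypothetical stand-in for a collection keyed by (cId, userId)."""

    def __init__(self):
        self.docs = {}

    async def upsert(self, doc):
        """Insert or replace doc; return True if it was newly inserted."""
        key = (doc["cId"], doc["userId"])
        inserted = key not in self.docs
        self.docs[key] = doc
        return inserted


async def save_all(collection, docs):
    """Run all upserts concurrently and return the docs that were inserted."""
    # gather returns results in the same order as the awaitables it was given.
    flags = await asyncio.gather(*(collection.upsert(d) for d in docs))
    return [d for d, inserted in zip(docs, flags) if inserted]
```

Because the combined result keeps the input order, each flag can be matched back to its object, which gives the list of inserted objects without issuing a second query.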
This is really useful! ;)
Here's how I'd rewrite it.
def dostuff() = Action { implicit request =>
  form.bindFromRequest.fold(
    errors => BadRequest(views.html.invite(errors)),
    form => {
      createObjects(form.companyId,
        form.companyName,
        sms_pattern.findAllIn(form.phoneNumbers).toSet).map(ƒ)
      Ok(views.html.invite(form))
    }
  )
}
// ...
// In the model
// ...
def ƒ(cId: Option[String], userId: Option[String], logger: Logger) = {
  // You need to handle the case where obj.cId or obj.userId are None
  collection.find(BSONDocument("cId" -> obj.cId.get, "userId" -> obj.userId.get))
    .cursor
    .headOption
    .map { maybeFound =>
      maybeFound map { _ =>
        logger.info("Record found, do not insert")
      } getOrElse {
        collection.insert(obj)
      }
    }
}
There may be some syntax errors, but the idea is there.