Avoid MongoDB read skew in reactive transactions with Spring?

I'm using MongoDB 4.2 with Spring Boot 2.3.1 and I'm looking for a way to avoid read skew in my scenario.
I have a collection named "metadata" and one named "messages". The latter contains messages like this:
{
    "aggregateId" : "myAggregateId",
    "messageId" : "uuid",
    "message" : "some message"
}
and "metadata" contains the version for each "aggregate":
{
    "aggregateId" : "myAggregateId",
    "version" : NumberLong(40)
}
The reason for not just storing the messages in a subarray is, among other things, that the total size of the messages for an aggregate can exceed 16 MB (MongoDB's maximum document size).
When issuing a query, I think I'd like to expose an interface like this to the users:
public interface MyRepository {
    Mono<Aggregate> findByAggregateId(String aggregateId);
}
where Aggregate is defined like this:
public class Aggregate {
    private final String aggregateId;
    private final int version;
    private Flux<Message> messages;
}
The problem now is that I'd like the Aggregate to be read consistently! I.e., if there are writes to the same aggregate after I've subscribed to the Mono<Aggregate> but before the messages Flux is subscribed to, I don't want those new messages to be included.
Let's look at an example. This is one attempt at an implementation:
public Mono<Aggregate> findByAggregateId(String aggregateId) {
    return transactionalOperator.execute(status ->
        reactiveMongoTemplate.findOne(query(where("aggregateId").is(aggregateId)), Document.class, "metadata")
            .map(metadata -> {
                Aggregate aggregate = new Aggregate(metadata.getString("aggregateId"), metadata.getLong("version").intValue());
                // the Flux is only assembled here; it is not subscribed to inside the transaction
                Flux<Message> messages = reactiveMongoTemplate.find(query(where("aggregateId").is(aggregateId)), Message.class, "messages");
                aggregate.setMessages(messages);
                return aggregate;
            }))
        .next(); // execute returns a Flux, so take its single element
}
I totally understand that this won't work, since the messages Flux is not subscribed to within the transaction. But I can't figure out how to combine the outer Mono<Aggregate> with the inner Flux<Message> while retaining both the non-blocking behavior AND consistency (i.e. avoiding read skew)?
One approach would be to change the Aggregate class to this:
public class Aggregate {
    private final String aggregateId;
    private final int version;
    private Stream<Message> messages;
}
and change the findByAggregateId implementation to this:
public Mono<Aggregate> findByAggregateId(String aggregateId) {
    return transactionalOperator.execute(status ->
        reactiveMongoTemplate.findOne(query(where("aggregateId").is(aggregateId)), Document.class, "metadata")
            .map(metadata -> {
                Aggregate aggregate = new Aggregate(metadata.getString("aggregateId"), metadata.getLong("version").intValue());
                Stream<Message> messages = reactiveMongoTemplate.find(query(where("aggregateId").is(aggregateId)), Message.class, "messages").toStream();
                aggregate.setMessages(messages);
                return aggregate;
            }))
        .next();
}
but calling toStream() is a blocking operation, so this is not right either.
So what is the correct way to deal with this?
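EDIT: the closest I've gotten is to materialize the messages inside the transaction with collectList(), so that both reads run in the same transaction. This is only a sketch under the assumption that an aggregate's messages fit in memory, since it gives up streaming:
public Mono<Aggregate> findByAggregateId(String aggregateId) {
    return transactionalOperator.execute(status ->
        reactiveMongoTemplate.findOne(query(where("aggregateId").is(aggregateId)), Document.class, "metadata")
            .flatMap(metadata ->
                // collectList() forces the messages query to complete inside the
                // transaction, so both reads see the same snapshot
                reactiveMongoTemplate.find(query(where("aggregateId").is(aggregateId)), Message.class, "messages")
                    .collectList()
                    .map(messages -> {
                        Aggregate aggregate = new Aggregate(metadata.getString("aggregateId"),
                                metadata.getLong("version").intValue());
                        aggregate.setMessages(Flux.fromIterable(messages));
                        return aggregate;
                    })))
        .next();
}
As far as I can tell, any approach that keeps the messages Flux lazy would subscribe to it after the transaction has completed, which is exactly what reintroduces the read skew.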

Related

SpringBoot ReactiveMongoTemplate updating document partially

I am working on a Kotlin reactive Spring Boot MongoDB project. I'm trying to update a document, but it does not work well.
My problem is pretty similar to the following Stack Overflow question:
Spring reactive mongodb template update document partially with objects
So I have a document in mongo
{
    "id": 1,
    "name": "MYNAME",
    "email": "MYEMAIL",
    "encryptedPassword": "12345",
    ...
}
And when I call PATCH on the URI localhost:8080/user/1 with one of the following request bodies:
{
    "name": "NEW NAME"
}
{
    "email": "NEW EMAIL"
}
I want to update my document with received fields only.
My handler code
fun update(serverRequest: ServerRequest) =
    userService
        .updateUser(serverRequest.pathVariable("id").toLong(), serverRequest.bodyToMono())
        .flatMap {
            ok().build()
        }
My Service Implement code
override fun updateUser(id: Long, request: Mono<User>): Mono<UpdateResult> {
    // PropertyUtils.describe(it) returns every readable property of the
    // deserialized User, including the ones left at their constructor defaults
    val changes = request.map { PropertyUtils.describe(it) }
    val updateFields: Update = Update()
    changes.subscribe {
        for (entry in it.entries) {
            updateFields.set(entry.key, entry.value)
        }
    }
    return userRepository.updateById(id, updateFields)
}
My repository code
fun updateById(id: Long, partial: Update) = template.updateFirst(Query(where("id").isEqualTo(id)), partial, User::class.java)
My user code
@Document
data class User(
    @Id
    val id: Long = 0,
    var name: String = "",
    val email: String = "",
    val encryptedPassword: String = ""
)
I have followed the advice from Spring reactive mongodb template update document partially with objects.
My code does perform an update, but it overwrites the other fields with the defaults from my User class's constructor.
Could anyone help with this?
I guess you should consider this as the general problem of patching an object in Java/Kotlin. I found an article about this: https://cassiomolin.com/2019/06/10/using-http-patch-in-spring/#json-merge-patch. And even if you end up not patching the object partially, the performance impact on your application should not be that big.
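For illustration, here is a minimal Java sketch of the JSON Merge Patch idea from that article, using the jakarta.json API (the literals are assumptions, not code from the original post):
import jakarta.json.Json;
import jakarta.json.JsonMergePatch;
import jakarta.json.JsonObject;
import java.io.StringReader;

// RFC 7386 merge patch: fields present in the patch overwrite the target,
// fields absent from the patch are left untouched
JsonObject target = Json.createReader(new StringReader(
        "{\"id\": 1, \"name\": \"MYNAME\", \"email\": \"MYEMAIL\"}")).readObject();
JsonObject patch = Json.createReader(new StringReader(
        "{\"name\": \"NEW NAME\"}")).readObject();

JsonMergePatch mergePatch = Json.createMergePatch(patch);
JsonObject merged = mergePatch.apply(target).asJsonObject();
// merged is {"id": 1, "name": "NEW NAME", "email": "MYEMAIL"}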
I figured out how to partially update my data.
First I changed the request body to a String (using bodyToMono(String::class.java)).
Then I parsed that JSON string into a JSONObject (org.json).
And for each of the JSONObject's keys I added an entry to the Update that carries the partial data for my entity.
The following is how I implemented this.
override fun updateUser(id: Long, request: Mono<String>): Mono<UpdateResult> {
    val update = Update()
    return request.map { JSONObject(it) }
        .map {
            it.keys().forEach { key -> update.set(key, it[key]) }
            update
        }
        .flatMap { userRepository.updateById(id, it) }
}
Please share if you have a cleaner way to do this. Thank you.

How MongoClient::save(...) might change the _id field of document parameter

I have a class User that embeds a JsonObject to represent the user's fields. This class looks like that:
class User {
    private JsonObject data;
    public User(...) {
        data = new JsonObject();
        data.put("...", ...).put(..., ...);
    }
    public String getID() { return data.getString("_id"); }
    // more getters, setters
    // DB access methods
    public static void userSave(MongoClient mc, User user) {
        // some house keeping
        mc.save("users", user.jsonObject(), ar -> {
            if (ar.succeeded()) { ... } else { ... }
        });
    }
}
I've just spent more than half a day trying to figure out why a call to user.getID() sometimes produced the following error: ClassCastException: class io.vertx.core.json.JsonObject cannot be cast to class java.lang.CharSequence. I narrowed it down to the userSave() method, and more specifically to MongoClient::save(), which has a side effect that transforms data._id from something like
"_id" : "5ceb8ebb9790855fad9be2fc"
into something like
"_id" : {
"$oid" : "5ceb8ebb9790855fad9be2fc"
}
This is confirmed by the Vert.x documentation, which states that "This operation might change _id field of document parameter". The same is true for other write methods, like insert.
I came up with two solutions and a few questions about doing the save() properly while keeping the _id field up to date.
S1: One way to achieve this is to save a copy of the JsonObject rather than the object itself, in other words: mc.save("users", user.jsonObject().copy(), ar -> {...});. This might be expensive in the long run.
S2: Another way is to "remember" _id and then reinsert it into the data object in the if(ar.succeeded()) {data.put("_id", oidValue); ...} section. But since this is asynchronous, I don't think the interval between save() and data.put(...) is atomic?
Q1: Solution S1 makes the assumption that the ID doesn't change, i.e., the string 5ceb8ebb9790855fad9be2fc will not change. Do we have a guarantee of this?
Q2: What is the right way to implement userSave() properly?
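To make S1 concrete, here is a sketch of what I mean (assuming user.jsonObject() returns the backing data object; the before/after example above also suggests the hex value itself survives, only its representation changes):
public static void userSave(MongoClient mc, User user) {
    // save a defensive copy so the driver's _id rewrite never touches our own data
    JsonObject copy = user.jsonObject().copy();
    mc.save("users", copy, ar -> {
        if (ar.succeeded()) {
            // ar.result() is the generated id on a fresh insert,
            // or null if the document already had an _id
            String generatedId = ar.result();
            if (generatedId != null) {
                user.jsonObject().put("_id", generatedId);
            }
        } else {
            ar.cause().printStackTrace();
        }
    });
}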
EDIT: The configuration JSON object used for the creation of the MongoClient is as follows (in case there is something wrong):
"main_pool" : {
"pool_name" : "mongodb",
"host" : "localhost",
"port" : 27017,
"db_name" : "appdb",
"username" : "xxxxxxxxx",
"password" : "xxxxxxxxx",
"authSource" : "admin",
"maxPoolSize" : 5,
"minPoolSize" : 1,
"useObjectId" : true,
"connectTimeoutMS" : 5000,
"socketTimeoutMS" : 5000,
"serverSelectionTimeoutMS" : 5000
}

How to do raw queries in Laravel Jenssegers Mongodb ORM?

I am using Laravel with Jenssegers MongoDB (https://github.com/jenssegers/laravel-mongodb), and I have the following collections in MongoDB:
User: (id,name,email)
and
Message (from_id,to_id,text)
I need to run the following query over the two collections:
db.User.aggregate([
    {
        $lookup: {
            from: "Message",
            localField: "id",
            foreignField: "from_id",
            as: "user_message"
        }
    },
    { $match: { id: 1 } }
])
What I am wondering is how to do this in Jenssegers Laravel (ORM/object-oriented style). Normally I would do Message::where('from_id', '=', $user->_id)->get(['to_id']), etc., but how do I run or translate the above query? Thanks.
In your User model you can define the relationships to messages using hasMany:
/**
 * Messages that were sent by the user
 */
public function messagesSent()
{
    return $this->hasMany(Message::class, 'from_id');
}
/**
 * Messages that were sent to the user
 */
public function messagesReceived()
{
    return $this->hasMany(Message::class, 'to_id');
}
You will also want to have the relationships on the messages model to indicate who the sender/receiver were, so that you can eager load those:
public function sender()
{
    return $this->belongsTo(User::class, 'from_id');
}
You can define the receiver as well:
public function receiver()
{
    return $this->belongsTo(User::class, 'to_id');
}
Now if you want, say, the latest messages from top to bottom, don't start from the User model; start with the Message model, so you're not stuck trying to order by a relationship when it's easier in your case not to:
Message::with(['sender'])
    ->where('from_id', $user->id)
    ->orderBy('created_at', 'desc')
    ->take(50)
    ->get();
If you just want all messages for the user with their senders relationship loaded:
User::with(['messagesReceived', 'messagesReceived.sender'])
    ->where('id', $user_id)
    ->firstOrFail();

How to search for list of keyword with Spring Data Elasticsearch?

I am using ElasticsearchRepository and I want to search for some keywords. What I want to query is something like:
// Get all results which contain at least one of these keywords
public List<Student> searchInBody(List<String> keywords);
I have already created a query for a single keyword and it works, but I don't know how to create a query for multiple keywords. Is there any way to do this?
@Repository
public interface StudentRepository extends ElasticsearchRepository<Student, String> {
    public List<Student> findByNameOrderByCreateDate(String name);

    @Query("{\"query\" : {\"match\" : {\"_all\" : \"?0\"}}}")
    List<ParsedContent> searchInBody(String keyword);
}
Yes, you can pass an array of String objects to an ElasticsearchRepository method.
Elasticsearch provides the terms query for this.
You also have to use a JSONArray instead of a List<String>, i.e., you have to convert your List<String> to a JSONArray (reason: check the syntax of the Elasticsearch query provided below).
Here is how you can use it in your code:
@Query("{\"bool\": {\"must\": {\"terms\": {\"your_field_name\":?0}}}}")
List<ParsedContent> searchInBody(JSONArray keyword);
The result will contain objects with at least one keyword from your keyword array.
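For illustration, a hypothetical call site (the repository field and keyword values here are assumptions):
// org.json's JSONArray has a Collection-based constructor, so the conversion is one line
List<String> keywords = List.of("keyword_1", "keyword_2");
JSONArray asArray = new JSONArray(keywords);
List<ParsedContent> hits = studentRepository.searchInBody(asArray);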
The following is the REST request representation of the above Java code, which you can use in your Kibana console or in a terminal:
GET your_index_name/_search
{
    "query" : {
        "bool": {
            "must": {
                "terms": {
                    "your_field_name": ["keyword_1", "keyword_2"]
                }
            }
        }
    }
}
Note: For more options, you can check terms-set-query.

How to paginate with Spring Data Mongo @DBRef

I want to paginate with Spring Data Mongo. From the docs, Spring Data Mongo can do:
public interface TwitterRepository extends MongoRepository<Twitter, String> {
    List<Twitter> findByNameIn(List<String> names, Pageable pageable);
}
If Twitter Document Object like this:
@Document
public class Twitter {
    String name;
    @DBRef
    List<Comment> comments;
}
Does spring data mongo support pagination with comments?
Note: the code below is not tested; it should just serve as a pointer for you.
The following Mongo query limits the size of the array that is returned:
db.Twitter.find( {}, { comments: { $slice: 6 } } )
The above mechanism can be used to enforce pagination like so:
db.Twitter.find( {}, { comments: { $slice: [skip, limit] } } )
You can try annotating your method:
@Query(value="{ 'name' : {'$in': ?0} }", fields="{ 'comments': { '$slice': [?1,?2] } }")
List<Twitter> findByNameIn(List<String> names, int skip, int limit);
You can specify that in your query like so:
Query query = new Query();
query.fields().slice("comments", 1, 1);
mongoTemplate.find(query, DocumentClass.class);
or you can try to execute the command directly (note that executeCommand takes a MongoDB command document, not shell syntax):
mongoTemplate.executeCommand("{ find: 'Twitter', projection: { comments: { $slice: [skip, limit] } } }")
General Pagination Mechanisms:
General Pagination mechanisms only work at the document level, examples of which are given below.
With these you will have to manually slice the returned comments at the application level.
If you are using the MongoTemplate class (Spring Data docs), then use the org.springframework.data.mongodb.core.query.Query class's skip() and limit() methods to perform pagination:
Query query = new Query();
query.limit(10);
query.skip(10);
mongoTemplate.find(query, DocumentClass.class);
If you are using a Repository (Spring Data repositories), then use PagingAndSortingRepository.
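For example, a minimal sketch (the repository and method names are assumed, not from the original question):
// MongoRepository extends PagingAndSortingRepository, so a Pageable parameter
// plus a Page return type gives document-level pagination out of the box
public interface TwitterRepository extends MongoRepository<Twitter, String> {
    Page<Twitter> findByNameIn(List<String> names, Pageable pageable);
}

// usage: first page of 20 documents, sorted by name
Page<Twitter> page = twitterRepository.findByNameIn(names, PageRequest.of(0, 20, Sort.by("name")));
Note that this paginates the Twitter documents themselves; the @DBRef comments still need the $slice approach above to be paginated.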