Vert.x to MongoDB connections

I'm working on a Java/Vert.x project where the backend is MongoDB (I've been working with Elixir/Erlang for some time, and I'm quite new to Vert.x, but I believe it's the best fit). Basically, I have an HTTP API handled by some HttpServerVerticles which need to store data to (or retrieve data from) the Mongo database and send the appropriate reply to the API caller. I'm looking for the right pattern to implement the queries and the handling of the replies.
From the official guide and some tutorials, I see that for a relational JDBC database, you need to define a dedicated verticle that handles queries asynchronously. That was my first try with the Mongo client, but it introduces a lot of boilerplate.
On the other hand, the Mongo client documentation says that it is "completely non-blocking" and that it has its own connection pool. Does that mean we can safely (from the Vert.x event loop's point of view) define and use the Mongo client directly in the HTTP verticle?
Is there any alternative pattern?
Versions: vertx 3.5.4 / mongodb 4.0.3

It works like this: the Mongo connection pool, exactly like an SQL connection pool, is synchronous and blocking in its nature, but it is wrapped in a non-blocking Vert.x API.
So, instead of the normal blocking call
JsonObject obj = mongo.get(someQuery);
you get a non-blocking call out of the box:
mongo.findOne("collectionName", someQuery, null, res -> {
    JsonObject obj = res.result();
    doStuff(obj);
});
That means you can safely use it directly on the event loop, in any type of verticle, without reinventing the asynchronous wheel over and over again.
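For illustration, here is a minimal sketch of an HTTP verticle using the Vert.x Mongo client directly on the event loop (the database name, collection name, and port are placeholders):

import io.vertx.core.AbstractVerticle;
import io.vertx.core.json.JsonObject;
import io.vertx.ext.mongo.MongoClient;

public class HttpServerVerticle extends AbstractVerticle {

  @Override
  public void start() {
    // createShared reuses one underlying pool across verticle instances
    MongoClient mongo = MongoClient.createShared(vertx,
        new JsonObject().put("db_name", "mydb"));

    vertx.createHttpServer().requestHandler(req -> {
      JsonObject query = new JsonObject().put("name", req.getParam("name"));
      // findOne(collection, query, fields, handler); fields == null means all fields
      mongo.findOne("users", query, null, res -> {
        if (res.succeeded() && res.result() != null) {
          req.response()
             .putHeader("content-type", "application/json")
             .end(res.result().encode());
        } else {
          req.response().setStatusCode(res.succeeded() ? 404 : 500).end();
        }
      });
    }).listen(8080);
  }
}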

At our client we use mongodb-driver-rx. Vert.x has support for Rx (vertx-rx-java), and it fits mongodb-driver-rx pretty well.
For more information see:
https://mongodb.github.io/mongo-java-driver-rx/
https://vertx.io/docs/vertx-rx/java/
https://github.com/vert-x3/vertx-examples/blob/master/rxjava-2-examples/src/main/java/io/vertx/example/reactivex/database/mongo/Client.java
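As a rough illustration of the Rx style, here is a minimal sketch using Vert.x's own rxified Mongo client from vertx-rx-java2 (the mongodb-driver-rx API differs in detail, but the overall shape is similar; the db and collection names are placeholders):

import io.vertx.core.json.JsonObject;
import io.vertx.reactivex.core.AbstractVerticle;
import io.vertx.reactivex.ext.mongo.MongoClient;

public class RxMongoVerticle extends AbstractVerticle {

  @Override
  public void start() {
    MongoClient mongo = MongoClient.createShared(vertx,
        new JsonObject().put("db_name", "mydb"));

    // rxFind returns a Single<List<JsonObject>> instead of taking a callback
    mongo.rxFind("users", new JsonObject().put("active", true))
         .subscribe(
             docs -> System.out.println("found " + docs.size() + " documents"),
             Throwable::printStackTrace);
  }
}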

Related

Can I assume that all databases will return a promise?

A friend and I are designing a set of unit and integration tests, and a doubt came up. We think we know the answer, or at least what is more likely to be true, but we would like to hear your thoughts.
We are designing a test for MongoDB, and we expect to receive a promise after asking it to save a document. So far so good.
But what if we change the database? Can we assume for sure that all databases, when queried, will return a promise?
I guess the _id part depends on the database; we are using MongoDB's _id for testing purposes.
We are using the following mock in Jest:
create: // this method does exist on the service; at the moment of testing it is empty, just a placeholder
  jest.fn().mockImplementation((cat: CreateCatDto) =>
    Promise.resolve({ _id: 'a uuid', ...cat })
  )
The idea is to design backends that do not depend on the database, but for testing and development reasons we are using MongoDB and PostgreSQL.
Keep in mind that the term "promise" doesn't exist as a concept for all databases, so a conclusive answer to your question for all databases is not possible.
That being said, if by promise you mean the primary key or identity (in general database terms) returned after inserting new data, then the answer is no, there is no guarantee, not even in PostgreSQL, that you'll be able to do that. Tables can exist, even in PostgreSQL, without those constraints.
Otherwise, if by promise you mean specifically the concept in a procedural or functional language like JavaScript (as the example code in your update indicates), then yes, you should always receive a promise object, provided your application code is using asynchronous calls appropriately.
But that holds regardless of what the asynchronous call was to: a database directly (whatever the database system), an API endpoint, or another piece of application code. Also, in that case, your question (or any follow-up questions) would be better suited for StackOverflow.com.
Can I assume that all databases return a promise?
No. Most if not all database wire protocols are synchronous, meaning the client blocks until it gets a response. Even the databases that expose some sort of RESTful API are synchronous, because HTTP itself is a synchronous request/response protocol.
Some client-side drivers may wrap this synchronous logic and exhibit asynchronous behaviour by returning something like JavaScript Promises or Java Futures, but that is entirely up to the driver implementation you choose to use.
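To make that concrete, here is a minimal sketch of such a wrapper; rawInsert stands in for a hypothetical callback-style driver call, and the Promise is created by the wrapper, not by the database:

interface CreateCatDto { name: string; age: number; }

// Hypothetical callback-style driver call; not a real driver API.
declare function rawInsert(
  collection: string,
  doc: object,
  cb: (err: Error | null, id: string) => void
): void;

// The Promise is manufactured here, in application/driver code.
function insertCat(cat: CreateCatDto): Promise<{ _id: string } & CreateCatDto> {
  return new Promise((resolve, reject) => {
    rawInsert('cats', cat, (err, id) => {
      if (err) reject(err);
      else resolve({ _id: id, ...cat });
    });
  });
}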

Can I reuse a connection in MongoDB? How do these connections actually work?

While trying some simple things with MongoDB, my mind got stuck on something that feels kind of strange to me.
client = MongoClient(connection_string)
db = client.database
print(db)
client.close()
I thought that when you make a connection, that single connection is used throughout the rest of the code until the close() method. But it doesn't seem to work that way... I don't know how I ended up with 9 connections when it was supposed to be a single one, and even if each 'request' were a connection, that's too many of them.
For now it's not a big problem; it just bothers me that I don't know exactly how this works!
When you call MongoClient(), you are not establishing just one connection. You are creating the client, which will manage a connection pool. When you make one or more requests, the driver uses an available connection from the pool; when the operation is complete, the connection goes back to the pool.
Calling the MongoClient constructor every time you need to talk to the db is very bad practice and incurs the handshake penalty each time. Use dependency injection or a singleton to hold the MongoClient.
According to the documentation, you should create one client per process.
Your code is the correct way if it is a single-threaded process. If you don't need more connections to the server, you can limit the pool size by specifying it explicitly:
client = MongoClient(host, port, maxPoolSize=<num>)
On the other hand, if later code might reuse the same client, it is better to create the client once at the beginning and use it across the code.
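A minimal sketch of the singleton approach in Python (the module name, URI, and collection names are placeholders):

# db.py - create the client once per process; it manages its own pool
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017", maxPoolSize=10)
db = client["mydatabase"]

def get_user(user_id):
    # Each call borrows a pooled connection and returns it when done
    return db.users.find_one({"_id": user_id})

Any module that imports db.py then reuses the same client and pool instead of opening new connections.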

Rust TCP Socket Pooling

I'm writing a client library for a custom TCP-based protocol. I'd like the library to easily permit users to make requests on multiple threads, which is the reason for my question. My library is intended to replace a slow, Python-based implementation which I've identified as a bottleneck in our pipeline, so request throughput is crucial here. So far, I've used tokio's networking facilities so that, once I've got concurrent sockets, I'll be able to execute requests in parallel. After the application opens a socket, the protocol allows it to re-use the connection for subsequent requests.
When a session begins, the application sends an initial request and receives an authorization token in return, which it must include with future requests. Requests are frequent and small but irregular in their contents, so I'd like to pool my TCP sockets, like (I imagine) a web browser does to support the keep-alive mechanism. I very much want to avoid having each request spin up a new socket -- even though it could re-use the existing authorization token -- because of the delays associated with TCP's three-way handshake and slow start.
The classic solution to this problem is a connection pool, like you'd use in a web application to connect to a database. The trouble is, I haven't been able to find a good connection pool for sockets, and I'd prefer to avoid introducing the complexity of one I write myself. There's r2d2 (see here), for instance, but that only supports database connections. Meanwhile, tk_pool (here) hasn't been updated in two years, which is not encouraging.
This feels like a common task, so I'm surprised I haven't yet found a simple way to do this. I'm new to Rust's async/await features and tokio, so I may well be missing something essential. Here's the question, simply:
How can I distribute many bits of IO across several sockets, each connected to the same host? Put another way, how can I have a pool of workers take temporary ownership of (or gain a mutable reference to) the first available of a set of equivalent resources?
I'm open to all manner of suggestions, but to avoid making this question opinion based, I think the central question is one of fact: What is the idiomatic, async Rust way to do connection pooling?
Thanks!
Here's some pseudo-code that outlines how I'm imagining my code would look, once I've identified the right way to do this:
struct Session {
    pool: ConnectionPool<tokio::net::TcpStream>,
    // authorization token, etc.
}

impl Session {
    async fn make_request(&mut self, /* parameters... */) -> Result<Response, Error> {
        // borrow_socket() probably requires &mut self. Won't be able to distribute
        // the Session object if it requires mutable references. Will I need to use Cell?
        let sock = self.pool.borrow_socket();
        sock.write(format!("request: {}", parameters)).await?;
        let results = sock.read().await?;
        Ok(parse(results)?)
        // sock is dropped, which returns it to the pool; alternatively, maybe
        // you've got to call, e.g., sock.release().
    }
}
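One way to sketch this with plain tokio primitives is to keep idle sockets in a shared, mutex-guarded queue: a request pops a socket (taking temporary ownership), uses it, and pushes it back when done. This is an illustrative sketch under those assumptions, not a production pool (a real one would await on a semaphore instead of failing when empty; generic async pool crates such as bb8 or deadpool take this further):

use std::collections::VecDeque;
use std::sync::Arc;
use tokio::net::TcpStream;
use tokio::sync::Mutex;

// Cloning the pool clones the Arc, so many tasks can share it without &mut.
#[derive(Clone)]
struct Pool {
    idle: Arc<Mutex<VecDeque<TcpStream>>>,
}

impl Pool {
    // Open `size` connections up front so requests never pay the handshake.
    async fn connect(addr: &str, size: usize) -> std::io::Result<Pool> {
        let mut idle = VecDeque::with_capacity(size);
        for _ in 0..size {
            idle.push_back(TcpStream::connect(addr).await?);
        }
        Ok(Pool { idle: Arc::new(Mutex::new(idle)) })
    }

    // Take ownership of an idle socket; None means the pool is exhausted.
    async fn get(&self) -> Option<TcpStream> {
        self.idle.lock().await.pop_front()
    }

    // Return the socket so another task can reuse the connection.
    async fn put(&self, sock: TcpStream) {
        self.idle.lock().await.push_back(sock);
    }
}

Usage inside make_request would then look like:

let mut sock = pool.get().await.expect("pool exhausted");
// ... write the request and read the response on `sock` ...
pool.put(sock).await;

Because Pool is Clone and get() takes &self, the Session no longer needs &mut self to hand out sockets.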

Better solution than TimerTask with Scala in Play Framework

I need to regularly check the database for updated records. I currently use TimerTask, which works fine. However, I've found its efficiency is not good, and it consumes a lot of server resources. Is there a solution which fulfills my requirement but is better?
def checknewmessages() = Action { request =>
  TimerTask(5000) {
    // code to check database
  }
}
I can think of two solutions:
You can use the ReactiveMongo driver for Play, which is completely non-blocking and async, together with a capped collection in MongoDB.
Please see this demo for an example:
https://github.com/sgodbillon/reactivemongo-tailablecursor-demo
How to listen for changes to a MongoDB collection?
If you are using a database that doesn't support a push mechanism, you can implement polling with an Actor that schedules messages to itself at regular intervals, as sketched below.
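A minimal sketch of that actor, assuming classic Akka actors as bundled with Play (the actor name and the 5-second interval are placeholders):

import akka.actor.Actor
import scala.concurrent.duration._

case object Tick

class PollingActor extends Actor {
  import context.dispatcher // execution context for the scheduler

  // Schedule a Tick to ourselves every 5 seconds, starting immediately.
  private val ticker =
    context.system.scheduler.schedule(0.seconds, 5.seconds, self, Tick)

  override def postStop(): Unit = ticker.cancel()

  def receive = {
    case Tick =>
      // code to check the database goes here
  }
}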
If your logic is in your database (stored procedures, etc.), you could simply create a cron job.
You could also create a command line script that encapsulates the logic and schedule (cron again).
If you have your logic in your web application, you could again create a cron job that simply makes an API call to your app.

MSMQ querying for a specific message

I have a question regarding MSMQ...
I designed an async architecture like this:
Client -> WCF Service (hosted in a Windows Service) -> MSMQ
So basically the WCF service takes the requests, processes them, adds them to an INPUT queue, and returns a GUID. The same WCF service (through a listener) takes the first message from the queue (does some stuff...) and then puts it into another queue (OUTPUT).
The problem is how I can retrieve the result from the OUTPUT queue when a client requests it... because MSMQ does not allow random access to its messages, and the only solution would be to iterate through all the messages and push them back in until I find the exact one I need. I do not want to use a DB for this OUTPUT queue, because of some limitations imposed by the client...
You can look in your OUTPUT queue for your message by using:
var mq = new MessageQueue(outputQueueName);
mq.PeekById(yourId);
Receiving by Id:
mq.ReceiveById(yourId);
A queue is inherently a "first-in-first-out" kind of data structure, while what you want is a "random access" data structure. It's just not designed for what you're trying to achieve here, so there isn't any "clean" way of doing this. Even if there was a way, it would be a hack.
If you elaborate on the limitations imposed by the client perhaps there might be other alternatives. Why don't you want to use a DB? Can you use a local SQLite DB, perhaps, or even an in-memory one?
Edit: If you have a client dictating implementation details to their own detriment then there are really only three ways you can go:
Work around them. In this case, that could involve using a SQLite DB - it's just a file and the client probably wouldn't even think of it as a "database".
Probe deeper and find out just what the underlying issue is, ie. why don't they want to use a DB? What are their real concerns and underlying assumptions?
Accept a poor solution and explain to the client that this is due to their own restriction. This is never nice and never easy, so it's really a last resort.
You could use CorrelationId and set it when you send the message. Then, to receive that same message, you can pick out the specific message with ReceiveByCorrelationId as follows:
message = queue.ReceiveByCorrelationId(correlationId);
Moreover, CorrelationId is a string with the following format:
Guid()\\Number
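For illustration, here is a minimal sketch of the full round trip (queue paths are placeholders; in this variant the GUID handed back to the client is the MSMQ Id of the request message, which the worker copies into CorrelationId on the reply):

using System.Messaging;

var input = new MessageQueue(@".\private$\input");
var output = new MessageQueue(@".\private$\output");

// WCF service side: enqueue the request and hand its Id back to the caller.
var request = new Message("payload");
input.Send(request);
string ticket = request.Id; // MSMQ assigns the Id on Send

// Listener side: process the request, then tag the reply with the ticket.
var reply = new Message("result") { CorrelationId = ticket };
output.Send(reply);

// Client side, later: fetch exactly that reply from the OUTPUT queue.
Message result = output.ReceiveByCorrelationId(ticket);

Copying the message Id into CorrelationId works because Id is already in the Guid\Number format that CorrelationId requires.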