Reusing context.WithTimeout in deferred function - mongodb

The code snippet below (reduced for brevity), from MongoDB's Go quickstart blog post, creates a context with context.WithTimeout at the time of connecting to the database and reuses it for the deferred Disconnect call, which I think is buggy.
func main() {
	client, _ := mongo.NewClient(options.Client().ApplyURI("<ATLAS_URI_HERE>"))
	ctx, _ := context.WithTimeout(context.Background(), 10*time.Second)
	_ = client.Connect(ctx)
	defer client.Disconnect(ctx)
}
My train of thought:
context.WithTimeout sets an absolute deadline (a point in UNIX time) at the moment it is created.
So passing it to Connect makes sense: we want to abort establishing the connection if it exceeds the time limit (i.e., that derived deadline).
Now, passing the same ctx to the deferred Disconnect, which will most probably be called far in the future, means the ctx's deadline will lie in the past: it is already expired when the function starts executing. That is not what is intended, and it breaks the logic. Quoting the doc for Disconnect:
If the context expires via cancellation, deadline, or timeout before the in use connections have returned, the in use connections will be closed, resulting in the failure of any in flight read or write operations.
Please tell me if and how I am wrong and/or missing something.

Your understanding is correct.
In the example it is sufficient, because the example just connects to the database, performs some sample operation (e.g. listing databases), and then main() ends, so running the deferred Disconnect with the same context causes no trouble (the example will/should finish well under 10 seconds).
In "real-world" applications this won't be the case of course. So you will likely not use the same context for connecting and disconnecting (unless that context has no timeout).


Non-blocking TCP socket fails to connect using socket2

I am currently in the process of converting some of my code from blocking to non-blocking using the socket2 crate; however, I am running into issues with connecting the socket. The socket always fails to connect before the timeout is exceeded. Despite my attempts to search for examples, I have yet to find any Rust code showing how a non-blocking TCP stream is created.
To give you an idea of what I am attempting to do, the code I am currently converting looks roughly like this. It gives me no issues and works fine, but it is getting too costly to create a new thread for every socket.
let address = SocketAddr::from(([x, y, z, v], port));
let mut socket = TcpStream::connect_timeout(&address, timeout)?;
At the moment, my code to connect the socket looks like this. Since connect_timeout can only be executed in blocking mode, I use connect instead and regularly poll the socket to check whether it is connected. I keep getting WouldBlock errors when calling connect, but I do not know what they mean. At first I assumed the connect was proceeding in the background, and that since returning its result immediately would require blocking, a WouldBlock error was returned instead. However, given the trouble getting the socket to connect, I am second-guessing those assumptions.
let address = SocketAddr::from(([x, y, z, v], port));
// Create socket equivalent to TcpStream
let socket = Socket::new(Domain::IPV4, Type::STREAM, Some(Protocol::TCP))?;
// Enable non-blocking mode on the socket
socket.set_nonblocking(true)?;
// What response should I expect? Do I need to bind an address first?
match socket.connect(&address.into()) {
    Ok(_) => {}
    Err(e) if e.kind() == ErrorKind::WouldBlock => {
        // I keep getting this error, but I don't know what it means.
        // Is non-blocking connect unavailable?
        // Do I need to keep trying to connect until it succeeds?
    }
    // Are there any other types of errors I should be looking for before failing the connection?
    Err(e) => return Err(e),
}
I am also unsure what the correct approach is for determining whether a socket is connected. At the moment, I attempt to read into a zero-length buffer and check whether I get a NotConnected error. However, I am unsure what WouldBlock means in this context, and I have never gotten a positive response from this approach.
let mut buffer = [0u8; 0];
// I also tried self.socket.peer_addr(), but ran into issues where it returned a positive
// response despite not being connected.
match self.socket.read(&mut buffer) {
    Ok(_) => Ok(true),
    // What does WouldBlock mean in this context?
    Err(e) if e.kind() == ErrorKind::WouldBlock => Ok(false),
    Err(e) if e.kind() == ErrorKind::NotConnected => Ok(false),
    Err(e) => Err(e),
}
Each socket is periodically checked until an arbitrary timeout is reached to determine whether it has connected. So far, no socket has connected before reaching its timeout (20 sec). These tests are all performed in a single-threaded application on Windows, using a known-good server that has been verified to work with the blocking version of my program.
Edit: Here is a minimum reproducible example for this issue. However, it likely won't work if you run it on Rust playground due to network restrictions. https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=a08c22574a971c0032fd9dd37e10fd94
WouldBlock is the expected error when a non-blocking connect() (or any other operation) has been successfully started in the background. You can then wait, up to your desired timeout interval, for the operation to finish (use select() or epoll() or another platform-specific notification mechanism to detect this). If the timeout elapses, close the socket and handle the timeout accordingly. Otherwise, check the socket's SO_ERROR option to see whether the operation succeeded or failed, and act accordingly.
To give you an idea of what I am attempting to do, the code I am currently converting looks roughly like this. This gives me no issues and works fine, but it is getting too costly to create a new thread for every socket.
This sounds to me strongly like an XY problem.
I think you misunderstand what 'non-blocking' means. It does not mean that you can simply run multiple sockets in parallel without worrying. What it does mean is that every operation that would block returns an error instead, and you have to retry it at a later time.
Non-blocking sockets usually aren't used directly at the application level. They are meant for libraries that build on them and provide some higher-level interface for asynchrony. Non-blocking sockets are hard to get right. They need to be paired with event notification, because otherwise you can only drive them with CPU-hungry busy loops, which is most likely not what you want.
There's good news, though! Remember the high-level libraries I mentioned that use non-blocking sockets internally? The most famous one right now is called tokio, and it does exactly what you want. It will require you to learn asynchronous programming, but you will grasp it, I'm sure :)
I recommend this read: https://tokio.rs/tokio/tutorial

Should I pass context.Context to underlying DB methods in Go?

I use pseudo-code here just to show the intention of what's going on in the code, without complicating things in the question.
I have a main.go file that calls a method that connects to mongoDB database:
mStore := store.NewMongoStore()
In NewMongoStore I have a context that client.Connect uses to connect to the database:
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
Now in main.go I pass the store to my router controller file this way:
routes.GenericRoute(router, mStore)
In GenericRoute I get the mStore and pass it to function handlers:
func GenericRoute(router *gin.Engine, mStore store.Store) {
	router.POST("/users", controllers.CreateUser(mStore))
}
Now in CreateUser I again create a context, as below, to insert a document into MongoDB:
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
insertedId, err := repo.CreateUser(ctx, newUser{"John", "Doe"})
Here I passed the context to repo.CreateUser to insert a new document.
As you can see, in some places I have passed a context and in some places I did not. I really have no idea what I should do. What is the correct way to work with contexts? Should I always pass a context around, or is it totally OK to create new contexts like this so that I don't have to pass a context in method parameters?
What is the best practice for this kind of code? And which one is better from a performance point of view?
Based on my experience, Context has two major use cases:
1. Passing down information. For your question, you might want to generate a request_id for each request and pass it down to the lowest layer of your code, logging this request_id to do error tracing across the whole code base.
This feature is not always useful. For example, when you initialise a MongoDB connection during service start-up, there is no meaningful context yet, so a context.Background with a timeout is good enough.
Be cautious with mutating values retrieved from a Context; this could cause concurrent access if you're passing the Context all around.
2. Auto-cancellation and timeouts. These two features don't come for free: you need to tune your code to handle this information from the Context. But most third-party and standard libraries with a Context parameter handle these two features gracefully (e.g. database libraries, HTTP client libraries). With them you can automatically reclaim resources once the Context is invalidated.
Sometimes you'll want to stop this cascading behaviour, for example when writing logs in a background goroutine; then you need to create a new context.Background() so those writes aren't cancelled once the upstream context is cancelled. context.Background() also clears the contextual information, so sometimes you need to extract the information from the upstream context and manually append it to the new one.
It's a bit overkill to force a Context parameter on all functions (there's no point adding a Context to a simple greatestCommonDivisor function), but adding a Context parameter anywhere you need it never hurts. Context has good enough performance; for your use case (an HTTP server writing to a database), it should not cause visible overhead in your service.
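To make the second use case concrete, here is a minimal sketch of the question's CreateUser handler deriving its timeout from the incoming request's context (gin exposes it as c.Request.Context()) instead of context.Background(), so the insert is also cancelled if the client goes away. mStore.CreateUser and newUser are stand-ins for the question's own types, not a definitive implementation:

func CreateUser(mStore store.Store) gin.HandlerFunc {
	return func(c *gin.Context) {
		// Derive from the request context: cancelled automatically when the
		// client disconnects, with our own 30-second upper bound on top.
		ctx, cancel := context.WithTimeout(c.Request.Context(), 30*time.Second)
		defer cancel()

		insertedId, err := mStore.CreateUser(ctx, newUser{"John", "Doe"})
		if err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
			return
		}
		c.JSON(http.StatusCreated, gin.H{"id": insertedId})
	}
}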
I arrived at an interesting answer to my own question, so I'll put it here for future users who have the same question in mind.
If I pass the SAME context that I used to connect to Mongo down to userController and further down to the CreateUser function:
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
NOTE: instead of cancelling the context in the NewMongoStore function, I defer cancel() in the main function.
After 10 seconds, if you call POST /users, you will get context deadline exceeded. So basically you cannot reuse this context for other work; you have to create a new context on each CreateUser call.
So what I have written is fine: I wait up to 10 seconds for the Mongo connection in my example, and 1 second for my insert operation's context.

Go Unit Test irrelevant error "hostname resolving error"

I am trying to write unit tests for this project.
It appears that I need to refactor a lot, and I am currently working on it. While writing unit tests for the functions in project/api/handlers.go, I always get an error related to DB initialization. The DB is a PostgreSQL Docker container. The error says the given hostname is not valid; however, outside of testing everything works without a problem. Also, for Dockerized PostgreSQL, the container name is used as the hostname, and this shouldn't be a problem.
The error is:
DB connection error: failed to connect to host=postgresdbT user=postgres database=worth2watchdb: hostname resolving error (lookup postgresdbT: no such host)
Process finished with the exit code 1
Anyway, I did a couple of refactors and managed to abstract these functions away from the DB query functions, but the error still occurs and I cannot run the test. So finally I decided to run a completely DB-free test within the same package that simply exercises the bcrypt package.
func TestCheckPasswordHash(t *testing.T) {
	ret, err := HashPassword("password")
	assert.Nil(t, err)
	ok := CheckPasswordHash("password", ret)
	if !ok {
		t.Fail()
	}
}

// HashPassword hashes the password with the bcrypt algorithm (cost 4) and
// returns the hashed string value along with an error.
func HashPassword(password string) (string, error) {
	bytes, err := bcrypt.GenerateFromPassword([]byte(password), 4)
	return string(bytes), err
}

// CheckPasswordHash compares the two inputs and returns true if they match.
func CheckPasswordHash(password, hash string) bool {
	err := bcrypt.CompareHashAndPassword([]byte(hash), []byte(password))
	return err == nil
}
However, when I try to run only the TestCheckPasswordHash function with go test -run TestCheckPasswordHash ./api, it still gives the same error. By the way, the file is handlers_test.go, the functions are in handlers.go, and the package name is api for both.
The test does not touch any DB-related functions, yet I am getting the same error again and again. When I run this TestCheckPasswordHash code in another project, or in project/util/util_test.go, it passes as expected.
I don't know what to do; it seems I cannot run any test in this package unless I figure this out.
Thanks in advance. Best wishes.
I was checking your repo; nice implementation, neat and simple, good job!
I think your issue is in the init function; please try commenting it out and see if that works for this single test.
It is a bit complex to explain how the init function works without a graph of the files as an example, but you can check the official documentation:
https://go.dev/doc/effective_go#init
PS: if this doesn't work, please write me back.
I've found the reason why this error occurred, and now it's solved.
This is partially related to how Docker expects the DB hostname to be declared, but we can do nothing about that, so we need a better approach to work around it.
When the go test command is executed, even for a single function, Go initializes the whole package (not the whole project) in its normal order: package-level variable declarations first, then the init() function. If an item from an external package is referenced, the runtime initializes that package for the item as well.
So even if you are testing only an unrelated function, keep in mind that before the test runs, the Go runtime will evaluate the whole package.
To prevent this, I wrapped the package-level variables (even the ones living in other packages) that directly reach the DB and cache. At start-up they are merely allocated as fresh values; their connections are established later by main() or main.init().
Now, prior to testing, all the relevant variables (in the same package or external) still compile: if a DB agent (redis.Client or pgxpool.Pool) is needed, we allocate a new one, the compiler doesn't complain, and testing begins. The agent becomes operational only through a function call from main or wherever we choose.
This is a better (maybe not the best) practice for more maintainable code: initialization of dependencies can be handled cautiously, in line with functional needs. At least, with a simple refactor, the problem is solvable.
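A minimal sketch of that shape, assuming pgx's pgxpool; the package layout and the names store, DB and Connect are mine for illustration, not taken from the repo:

package store

import (
	"context"

	"github.com/jackc/pgx/v4/pgxpool"
)

// DB is merely declared at package initialization; nothing dials the
// database, so `go test` on any package importing store stays offline.
var DB *pgxpool.Pool

// Connect establishes the real connection. It is called explicitly from
// main(), never from init(), so tests are free to skip it.
func Connect(ctx context.Context, url string) error {
	pool, err := pgxpool.Connect(ctx, url)
	if err != nil {
		return err
	}
	DB = pool
	return nil
}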

Is it safe to invoke a workflow activity from defer() within the workflow code?

func MyWorkflow(ctx workflow.Context) (retErr error) {
	log := workflow.GetLogger(ctx)
	log.Info("starting my workflow")
	defer func() {
		if retErr != nil {
			DoActivityCleanupError(ctx, ...)
		} else {
			DoActivityCleanupNoError(ctx, ...)
		}
	}()
	err := DoActivityA(ctx, ...)
	if err != nil {
		return err
	}
	...
	err = DoActivityB(ctx, ...)
	if err != nil {
		return err
	}
	...
}
Basically there are catch-all activities, DoActivityCleanupNoError and DoActivityCleanupError, that we want to execute whenever the workflow exits (particularly in the error case, since we don't want to call DoActivityCleanupError repeatedly at every error return).
Does this work with distributed decision making? For example, if ownership of workflow decisions moves from one worker to another, is it going to trigger the defer on the original worker?
Bonus question: Does the logger enforce logging only once per workflow run, even if decisions are moved from one worker to another? Do you expect to see the log line appear in both workers' logs, or is there magic behind the scenes to prevent this from happening?
Yes, it is.
But it's quite complicated to understand why it is safe. This is how to reach that conclusion:
1. In no-sticky mode (without the sticky cache), the Cadence SDK always executes the workflow code to make (collect) workflow decisions and then releases all the goroutines/stacks. When releasing them, the defer is executed, which means the cleanup-activity code path will run; HOWEVER, those decisions will be ignored, so this does not impact actual correctness.
2. In sticky mode, if the workflow is not closing, the Cadence SDK blocks somewhere; if the workflow is actually closing, then the defer is executed and the decisions are collected.
3. When the sticky cache (goroutines/stacks) is evicted, what happens is exactly the same as in 1, so it is also safe.
Does the logger enforce logging only once per workflow run, even if decisions are moved from one worker to another? Do you expect to see the log line appear in both workers' logs, or is there magic behind the scenes to prevent this from happening?
Each log line will only appear in the log of the worker that actually executed the code while making the decision, in other words, in non-replay mode. That's the only magic :)

How to deal with ZMQ sockets lack of thread safety?

I've been using ZMQ in some Python applications for a while, but only very recently did I decide to reimplement one of them in Go, and I realized that ZMQ sockets are not thread-safe.
The original Python implementation uses an event loop that looks like this:
while running:
    socks = dict(poller.poll(TIMEOUT))
    if socks.get(router) == zmq.POLLIN:
        client_id = router.recv()
        _ = router.recv()
        data = router.recv()
        requests.append((client_id, data))
    for req in requests:
        rep = handle_request(req)
        if rep:
            replies.append(rep)
            requests.remove(req)
    for client_id, data in replies:
        router.send(client_id, zmq.SNDMORE)
        router.send(b'', zmq.SNDMORE)
        router.send(data)
    del replies[:]
The problem is that the reply might not be ready on the first pass, so whenever I have pending requests I have to poll with a very short timeout, or the clients end up waiting longer than they should, and the application burns a lot of CPU polling.
When I decided to reimplement it in Go, I thought it would be as simple as the following, avoiding the problem by using an infinite timeout when polling:
for {
	sockets, _ := poller.Poll(-1)
	for _, socket := range sockets {
		switch s := socket.Socket; s {
		case router:
			msg, _ := s.RecvMessage(0)
			client_id := msg[0]
			data := msg[2]
			go handleRequest(router, client_id, data)
		}
	}
}
But that ideal implementation only works when I have a single client connected, or under light load. Under heavy load I get random assertion errors inside libzmq. I tried the following:
1. Following the zmq4 docs, I tried adding a sync.Mutex and locking/unlocking around all socket operations. It fails. I assume it's because ZMQ uses its own threads for flushing.
2. Creating one goroutine for polling/receiving and one for sending, using channels the same way I used the req/rep queues in the Python version. It fails, as I'm still sharing the socket.
3. Same as 2, but setting GOMAXPROCS=1. It fails, and throughput was very limited because replies were held back until the Poll() call returned.
4. Using the req/rep channels as in 2, but with runtime.LockOSThread to keep all socket operations in the same thread as the socket. It has the same throughput problem as above: it doesn't fail, but throughput was very limited.
5. Same as 4, but using the poll-timeout strategy from the Python version. It works, but has the same problem the Python version does.
6. Sharing the context instead of the socket, and creating one socket for sending and one for receiving in separate goroutines, communicating via channels. It works, but I'll have to rewrite the client libs to use two sockets instead of one.
7. Getting rid of ZMQ and using raw TCP sockets, which are thread-safe. It works perfectly, but I'll also have to rewrite the client libs.
So, it looks like 6 is how ZMQ was really intended to be used, as that's the only way I got it to work seamlessly with goroutines, but I wonder if there's any other way I haven't tried. Any ideas?
Update
With the answers here I realized I can just add an inproc PULL socket to the poller and have a goroutine connect to it and push a byte to break out of the infinite wait. It's not as versatile as the solutions suggested here, but it works, and I can even backport it to the Python version.
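For reference, a minimal sketch of that wake-up trick with pebbe/zmq4; the inproc://wakeup endpoint and the surrounding wiring are my own assumptions, not code from the original application:

// The wake-up PULL socket lives with the poller, in the only goroutine
// that ever touches the router socket.
wakePull, _ := zmq.NewSocket(zmq.PULL)
wakePull.Bind("inproc://wakeup")

poller := zmq.NewPoller()
poller.Add(router, zmq.POLLIN)
poller.Add(wakePull, zmq.POLLIN)

// A handler goroutine signals readiness through its own PUSH socket,
// so no socket is ever shared between goroutines.
go func() {
	wake, _ := zmq.NewSocket(zmq.PUSH)
	defer wake.Close()
	wake.Connect("inproc://wakeup")
	wake.SendBytes([]byte{0}, 0)
}()

for {
	polled, _ := poller.Poll(-1) // infinite wait, now interruptible
	for _, p := range polled {
		switch p.Socket {
		case wakePull:
			p.Socket.RecvBytes(0) // drain the signal, then flush pending replies
		case router:
			// receive and queue the request as before
		}
	}
}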
I opened an issue about 1.5 years ago to introduce a port of https://github.com/vaughan0/go-zmq/blob/master/channels.go to pebbe/zmq4. Ultimately the author decided against it, but we have used this in production (under VERY heavy workloads) for a long time now.
This is a gist of the file that had to be added to the pebbe/zmq4 package (since it adds methods to the Socket). It could be rewritten so that the methods took a Socket as an argument instead of being defined on the Socket receiver, but since we vendor our code anyway, this was an easy way forward.
The basic usage is to create your Socket as normal (call it s, for example); then you can:
channels := s.Channels()
outBound := channels.Out()
inBound := channels.In()
Now you have two channels of type [][]byte that you can use between goroutines, while a single goroutine, managed within the channels abstraction, is responsible for managing the Poller and communicating with the socket.
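For illustration, a hedged sketch of wiring those channels to a handler; handleRequest is a stand-in for your own logic, and the three-frame layout matches the ROUTER messages from the question:

go func() {
	for msg := range inBound {
		clientID, data := msg[0], msg[2]
		reply := handleRequest(clientID, data)
		// Sending on the channel is safe from any goroutine; only the
		// abstraction's internal goroutine touches the socket itself.
		outBound <- [][]byte{clientID, {}, reply}
	}
}()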
The blessed way to do this with pebbe/zmq4 is with a Reactor. Reactors have the ability to listen on Go channels, but you don't want to do that because they do so by polling the channel periodically using a poll timeout, which reintroduces the same exact problem you have in your Python version. Instead you can use zmq inproc sockets, with one end held by the reactor and the other end held by a goroutine that passes data in from a channel. It's complicated, verbose, and unpleasant, but I have used it successfully.