What is the best way to tell MongoDB has started?

I have some automated tests that I run to exercise a MongoDB-related library. To do that, I start a Mongo server with a temporary data directory on an ephemeral port, connect to it, and run some tests.
This leads to a race condition, obviously. So in the first version of these tests, I paused for a fixed amount of time to make sure mongod had time to start before the tests began.
This was frustrating (and inefficient), so I decided to monitor mongod's standard output stream and wait for a line matching the regular expression:
/\[initandlisten\] waiting for connections/
This got it working, so I prepared to circle back and try to find a more robust way to do it. I recalled that a Java library called "embedmongo" runs MongoDB-based tests, and figured it must solve the same problem. It does this (from its source on GitHub):
protected String successMessage() {
    return "waiting for connections on port";
}
... and uses that to figure out whether the process has started correctly.
So, are we right? Is examining the mongod process output log (is it ever internationalized? could the wording of the message ever change?) the very best way to do this? Or is there something more robust that we're both missing?

What we do in a similar scenario is:
- Try to connect to the configured port (simply new Socket(host, port)) in a loop with a 10 ms delay until it succeeds. This ensures that the Mongo client, which starts an internal monitoring thread, does not throw exceptions due to "connection refused".
- Connect to MongoDB and query something. This is important, as all Mongo client objects are lazily initialized. (A simple listDatabaseNames() on the client is enough, but make sure to actually read the result.)
- All the while, check that the process has not terminated. (A sketch of these steps follows below.)
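A minimal Java sketch of those three steps. This is illustrative rather than library code: waitForMongod and the timeout values are made-up names, and the client calls assume the modern mongodb-driver-sync API.

import java.io.IOException;
import java.net.Socket;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;

public class WaitForMongod {

    static void waitForMongod(Process mongod, String host, int port, long timeoutMs)
            throws IOException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;

        // Step 1: raw socket connects in a loop, 10 ms apart, until one succeeds.
        while (true) {
            // Step 3 (interleaved): make sure the process has not terminated.
            if (!mongod.isAlive()) {
                throw new IllegalStateException("mongod exited before accepting connections");
            }
            try (Socket probe = new Socket(host, port)) {
                break; // the port is accepting connections
            } catch (IOException connectionRefused) {
                if (System.currentTimeMillis() > deadline) throw connectionRefused;
                Thread.sleep(10);
            }
        }

        // Step 2: a real query, since client objects are lazily initialized.
        try (MongoClient client = MongoClients.create("mongodb://" + host + ":" + port)) {
            for (String dbName : client.listDatabaseNames()) {
                // Iterating actually reads the result from the server.
            }
        }
    }
}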

I just wrote a small untilMongod command that does just that and can be used in bash scripting: https://github.com/FGM/untilMongod
It includes a bash + Node.js example use case.

Related

Perl IO::Socket::UNIX Connect with Timeout gives EAGAIN/EWOULDBLOCK

Ubuntu Linux, 2.6.32-45 kernel, 64b, Perl 5.10.1
I connect many new IO::Socket::UNIX stream sockets to a server, and mostly they work fine. But sometimes, in a heavily threaded environment on a faster processor, they return "Resource temporarily unavailable" (EAGAIN/EWOULDBLOCK). I use a timeout on the connect, which causes the sockets to be put into non-blocking mode during the connect. But my timeout period isn't being honored - it doesn't wait any noticeable time; it returns quickly.
I see that inside IO::Socket, it tries the connect, and if it fails with EINPROGRESS or EAGAIN/EWOULDBLOCK, it does a select to wait for the write bit to be set. This seems normal so far. In my case the select quickly succeeds, implying that the write bit is set, and the code then tries a re-connect. (I guess this is an attempt to get any error via error slippage?) Anyway, the re-connect fails again with the EAGAIN/EWOULDBLOCK.
In my code this is easy to fix with a retry loop. But I don't understand why, when the socket becomes writable, the socket is not re-connectable. I thought the select guard was always sufficient for a non-blocking connect. Apparently not; so my questions are:
What conditions cause the connect to fail when the select works (the write bit gets set)?
Is there a better way than spinning and retrying, to wait for the connect to succeed? The spinning is wasting cycles. Instead I'd like it to block on something like a select/poll, but I still need a timeout.
But I don't understand why, when the socket becomes writable, the socket is not re-connectable.
I imagine it's because whatever needed resource briefly became free and was snatched up again before you were able to connect. Replacing the select with a spin loop would not help with that.
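As an illustration of that retry pattern - a minimal Java 16+ sketch using Unix-domain sockets rather than Perl; the path and timeout values are assumptions - the idea is to sleep briefly between attempts and bound the whole thing with a deadline instead of busy-spinning:

import java.io.IOException;
import java.net.UnixDomainSocketAddress;
import java.nio.channels.SocketChannel;

public class ConnectRetry {
    // Retry the connect until it succeeds or an overall deadline passes,
    // sleeping briefly between attempts instead of busy-spinning.
    static SocketChannel connectWithDeadline(String socketPath, long timeoutMs)
            throws IOException, InterruptedException {
        UnixDomainSocketAddress addr = UnixDomainSocketAddress.of(socketPath);
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            try {
                return SocketChannel.open(addr); // connect attempt
            } catch (IOException e) {
                // Transient, EAGAIN-style failures surface here; retry until the deadline.
                if (System.currentTimeMillis() >= deadline) throw e;
                Thread.sleep(10);
            }
        }
    }
}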

Memcached memory leak

I'm building an application where I have to use memcached.
I found a quite nice client:
net.spy.memcached.MemcachedClient
With this client everything works great except one thing - I have a problem closing the connection, and after a while I start fighting a memory leak.
I was looking for a way to close the connection, and I found the "shutdown" method. But if I use this method like this:
MemcachedClient c = new MemcachedClient(
        new InetSocketAddress(memcachedIp, memcachedPort));
c.set(something, sessionLifeTime, memcache.toJSONString());
c.shutdown();
I have a problem adding anything to memcached - in the logs I see that this code opens a connection and then closes it before anything is added to memcached.
Do you have any idea what to do?
Additionally, I found the method c.shutdown(2, TimeUnit.SECONDS), which should close the connection after 2 seconds. But I have a JMX monitor connected to my Tomcat, and I see that the memcached thread isn't finished after 2 seconds - this thread isn't finished at all...
The reason you are having an issue adding things to memcached like this is that the set(...) function is asynchronous, and all it does is put that operation into a queue to be sent to memcached. Since you call shutdown right after this, the operation doesn't actually have time to make it out onto the wire. You need to call set(...).get() to make your application thread actually wait for the operation to complete before calling shutdown, as shown below.
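A corrected version of the question's snippet (same hypothetical variables as above):

MemcachedClient c = new MemcachedClient(
        new InetSocketAddress(memcachedIp, memcachedPort));
// set(...) returns an OperationFuture; get() blocks until the operation has
// actually been sent to memcached. Note that get() throws the checked
// InterruptedException and ExecutionException, so handle or declare them.
c.set(something, sessionLifeTime, memcache.toJSONString()).get();
c.shutdown();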
Also, I haven't experienced IO threads not dying after calling shutdown with a timeout. One way you can confirm that this is an actual bug is by running a standalone program with spymemcached. If the process doesn't terminate when it's completed, then you've found an issue.

perl File::Tail synchronization

I'm having this situation:
I'm parsing some log files with a Perl daemon. The daemon writes data to a MySQL DB.
The log file can:
be rotated ('solved by file size and some logic')
not exist (the 'ignore_nonexistant' parameter in Tail)
The daemon:
Can be killed
Can die for some reason
I'm using File::Tail to tail the file. For file rotation, a mechanism based on creation date or file size can help. But what mechanism should I use to start tailing from some position in the file? (Assume that there are a lot of such daemons and no write access to the filesystem.)
I've thought about a position variable in the DB, but this won't help me.
Maybe some mechanism to pass a position parameter to the parent process?
I just don't want to reinvent the wheel.
File::Tail already detects rotation and continues reading from the new file.
To deal with the daemon dying and restarting, can you query the database for the last record written when the daemon restarts, and just skip logfile lines until you get to a later one? (A sketch of that skip logic follows below.)
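A rough sketch of that catch-up logic - in Java purely for illustration, since the daemon in question is Perl; the assumed line format and the lastWrittenTs value from the DB are both hypothetical:

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CatchUp {
    // Skip already-processed lines after a restart. Assumes each log line starts
    // with a sortable timestamp and that lastWrittenTs came from the database.
    static void resume(Path logFile, String lastWrittenTs) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(logFile)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String ts = line.split(" ", 2)[0]; // assumed format: "<timestamp> <message>"
                if (ts.compareTo(lastWrittenTs) <= 0) {
                    continue; // already written to the DB on a previous run
                }
                process(line); // hypothetical handler that writes to the DB
            }
        }
    }

    static void process(String line) { /* write to MySQL, etc. */ }
}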
Try http://search.cpan.org/dist/Log-Unrotate/.
You'll have to implement your own Log::Unrotate::Cursor class if you wish to store position files in a DB instead of the local filesystem, but that should be trivial.
We wrote and have used Log::Unrotate for 5 years in production, and it tries really hard to never skip any data. (It tries so hard that it throws an exception if your cursor becomes invalid, for example if the log got rotated several times while the reader wasn't running for some reason. You may want to enable the autofix_cursor option to change this behavior.)
Also take a look at http://search.cpan.org/dist/File-LogReader/. I've never used it, but it's supposed to solve the same task.

Embedded Linux LED-flashing daemon: does it exist?

I've seen embedded boards before that have an LED that flashes like a heartbeat to show that the board is still executing code. I'd like to do something similar on an embedded Linux board I'm working on. Given that it's a fairly trivial bit of code, it seems likely to me that someone has already written a daemon for Linux that does this, but I haven't been able to find any evidence.
Note that OS X Server's heartbeatd and the High-Availability Linux heartbeat daemon are not what I'm looking for-- they both coordinate system availability over IP networks, or something like that.
Assuming what I'm looking for doesn't exist, I'm also interested in advice about how to write a daemon that toggles a pin while minimizing resource usage. At what update rate does cron become a stupid idea?
(I'd also rather not hear gushing about the LED on the sleeping MacBook Pro, if that seems relevant for some reason.)
Thanks.
The LED heartbeat is a built-in kernel function. Assuming you have a device driver for your LED, turning on the heartbeat is done thus:
$ echo "heartbeat" > /sys/class/leds/MyLed/trigger
To see the list of available triggers (MMC activity, heartbeat, etc.):
$ cat /sys/class/leds/MyLed/trigger
See drivers/leds/ledtrig-heartbeat.c and http://www.avrfreaks.net/wiki/index.php/Documentation:Linux/LEDs
The interesting thing about the heartbeat is that the pattern is dynamic. The basic pattern is thump-thump-pause, just like a human heartbeat. But the rate of the heartbeat is controlled by the load average! Light loads beat at about 50 beats per minute. Heavier loads cause faster beating until it maxes out at about 180 bpm.
I wouldn't use cron. It's just not the right tool. A very simple solution is to just run a shell script from your inittab.
Example:
#!/bin/sh
while true
do
    logger "blink!" # to be replaced with the actual LED toggle
    sleep 1
done
Save this to /bin/blink.sh, add the following line to your inittab, and have init reread the tab by running init q.
bl:2345:respawn:/bin/blink.sh
Of course you have to adjust the blink.sh script to your environment. How an LED can be toggled from user space (device driver file, sysfs entry, ...) is highly dependent on the particular board.
If you need something more efficient you might redo the while loop in C, but it might not be worth the effort.
One thing to think about is what you want to signal with a pulsing LED. With the approach outlined above we can only show that the board is still alive (the kernel is running, the process executing blink.sh is scheduled, and blink.sh is doing what it is supposed to do). For some use cases this might be fine, but more often you actually want to signal that the application running on the embedded board is still OK (hasn't hung, hasn't crashed, ...). To implement such functionality you need to integrate the code that toggles the LED into the main loop of your application, as sketched below.
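A minimal sketch of such an in-application heartbeat, in Java purely for illustration; the sysfs path is an assumption that depends on your board, and it presumes the LED's trigger is set to "none" so user space controls brightness:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class AppHeartbeat {
    // Assumed board-specific sysfs path; requires the LED trigger to be "none".
    private static final Path LED = Path.of("/sys/class/leds/MyLed/brightness");

    private static volatile long lastProgress = System.currentTimeMillis();

    // The application's main loop calls this whenever it makes real progress.
    static void progress() {
        lastProgress = System.currentTimeMillis();
    }

    // Blink only while the main loop keeps reporting progress, so a hung
    // application stops the LED even though the kernel is still fine.
    static void startHeartbeat() {
        Thread blinker = new Thread(() -> {
            boolean on = false;
            while (true) {
                try {
                    if (System.currentTimeMillis() - lastProgress < 5_000) {
                        on = !on;
                        Files.writeString(LED, on ? "1" : "0");
                    }
                    Thread.sleep(500);
                } catch (IOException | InterruptedException e) {
                    return; // real code would log and possibly retry
                }
            }
        });
        blinker.setDaemon(true);
        blinker.start();
    }
}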

Socket Read In Multi-Threaded Application Returns Zero Bytes or EINTR (104)

I've been a C coder for a while now - neither a newbie nor an expert. I have a certain daemonized application in C on PPC Linux. I use PHP's socket_connect as a client to connect to this service locally. The server uses epoll for multiplexing connections via a Unix socket. A user-submitted string is parsed for certain characters/words using strstr() and, if found, spawns 4 joinable threads to different websites simultaneously. I use socket, connect, write and read to interact with the said webservers via TCP on their port 80 in each thread. All connections and writes seem successful. Reads from the webserver sockets fail, however, in one of two ways: (A) 3 threads seem to hang, and only one thread returns -1 with errno set to 104; the responding thread takes something like 10 minutes - an eternity:-(. I read somewhere that 104 (is it EINTR?) in the network context suggests 'the connection was reset by peer'. Or (B) 3 threads return 0 bytes, and only 1 of the 4 threads actually returns some data. Isn't socket read/write thread-safe? I use thread-safe (and reentrant) libc functions such as strtok_r, gethostbyname_r, etc.
I doubt that the said webhosts are actually resetting the connection, because when I run a single-threaded standalone version (everything else equal) everything works perfectly - though of course in series, not in parallel.
There's a second problem too (oops): I can't write back to the client that connects to my epoll-ed Unix socket. My daemon application hangs and hogs the CPU at over 100% forever, yet nothing is written to the client's end. I'm sure the client (a very typical PHP socket application) hasn't closed the connection when this happens - no error(s) detected either. Any ideas?
I cannot figure out what is wrong even with Valgrind, GDB, or extensive logging. Kindly help where you can.
Yes, read/write are thread-safe. But beware of gethostbyname() and getservbyname() if you're using them - they return pointers to static data, and may not be thread-safe.
errno 104 is ECONNRESET (not EINTR). Use strerror or perror to get the textual error message (like 'Connection reset by peer') for a particular errno code.
The best way to figure out what's going wrong is often to do very detailed logging - log the results of every operation, plus details like the IP address/port connecting to, the number of bytes read/written, the thread id, and so forth. And, of course, make sure your logging code is thread-safe :-)
Getting an ECONNRESET after 10 minutes sounds like the result of your connection timing out. Either the web server isn't sending the data or your app isn't receiving it.
To test the former, hook up a program like Wireshark to the local loopback device and look for traffic to and from the port you are using.
For the latter, take a look at the epoll() man page. It mentions a scenario where using edge-triggered events can result in a lockup, because there is still data in the buffer but no new data comes in, so no new event is triggered.
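The usual remedy for that edge-triggered scenario is to drain the socket until a read would block before waiting for the next event. A sketch of that drain pattern, shown with Java NIO purely for illustration (in the question's C/epoll server, the equivalent is looping on read() until it returns -1 with errno EAGAIN):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

public class DrainOnEvent {
    // With edge-triggered notification you are told about readability once;
    // read everything buffered now, or you may never be woken up again.
    static void drain(SocketChannel ch, ByteBuffer buf) throws IOException {
        // ch must be non-blocking (ch.configureBlocking(false)) for read() to return 0.
        while (true) {
            buf.clear();
            int n = ch.read(buf);
            if (n > 0) {
                buf.flip();
                handle(buf); // hypothetical handler for the bytes just read
            } else if (n == 0) {
                return; // nothing left buffered; safe to wait for the next event
            } else {
                ch.close(); // n == -1: peer closed the connection
                return;
            }
        }
    }

    static void handle(ByteBuffer buf) { /* process data */ }
}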