IPC: Ramdisk V.S. socket - sockets

I need to transfer huge amount of data between Java and C++ programs under Linux(CentOS). Performance is the first concern.
What will be the best choice? RAMDisk (/dev/shm/) or local socket?

A socket is fastest because the other end can start processing the data (on a separate cpu core) before you have finished sending data.
Say you're sending 100KB of data, the other end can begin processing as soon as it recieves a couple of kilobytes. And by the time all 100KB has been sent, it has probably finished processing 90KB or thereabouts, so it only has 10KB left.
While with a RAM disk, you have to write the entire 100KB before it can even start processing data. Making it about 10x faster to use a socket than a ram disk, assuming both ends need to do about the same amount of work.
Maybe it takes 1 millisecond to write 100KB to a RAM disk and then 1 millisecond to process it. With a socket it would take 1 millisecond to send the data but only 0.1 millisecond to finish processing after all the data has been sent.
The larger the amount of data being sent, the bigger the performance gain for sockets. 10 seconds to write all the data, and another 0.1 millisecond to fnish processing after all data has been sent.
However, a RAM disk is easier to work with. Sockets use streams of data, which is more time consuming in terms of writing the code and debugging/testing it.
Also, don't assume you need a ram disk. Depending on how the operating system has been configured writing 100MB to a spinning platter hard drive might simply write it to a RAM cache and then put it on the hard drive later on. You can read it from the temporary RAM cache immediately without waiting for the data to be written to the HDD. Always test before making performance assumptions. Do not assume a HDD is slower than RAM, because it might be optimised out for you silently.
The mac I'm typing this on, which is UNIX just like CentOS, currently has about 8GB of RAM dedicated to holding copies of files it guesses I'm going to read at some point in the near future. I didn't have to create a RAM disk manually, it just put them in RAM heuristically. CentOS does the same sort of thing, you have to test it to see how fast it actually is.
But sockets are definitely the fastest option, since you do not need to write all the data to start processing it.

Related

Why program is executed on Memory not HardDisk?

when I study in Computer Architecture and System Programming, some question rises up.
First of all, program is in SSD or Hard Disk but when it executed, this load to memory(RAM). Why program is not executed on HardDisk directly?? Why need to load on RAM?
Thanks
This is simply done because your RAM is way faster than your Hard Disk.
When your computer executes a program, the CPU reads all the instructions from memory one after another and executes them. The CPU itself cannot store the whole program while executing it, so it has to be read from somewhere else. If the CPU had to read the instructions from a hard disk, it would be crazy slow.
Now that we have SSDs this is becoming somewhat less relevant, but in the old days the difference between RAM ("Random Access Memory") and HDD ("Hard Disk Drive") was that the RAM could access any memory address at any point in time, thus "Random Access". The HDD would have to rotate the hard disk your data is stored on to read from a certain address. Accessing random memory addresses is very hard for an HDD.
When the CPU executes a program it has to jump around all the time. It also has to store the program's memory somewhere and access that as quickly as possible whenever needed. An HDD is very bad at those two things, a RAM is very good.
So why did we use HDDs at all? Because RAM
is way to expensive
does not persist data when turned off
And what about SSDs? They are a lot better at random access that HDDs, but they're still considerably slower than RAM.
Also, you have to take swap files into account. The computer can use some of your HDD or SSD storage as system memory if it needs to. This can be very useful if the data that's using up your RAM does not get accessed by the CPU very often.

Scala concurrency performance issues

I have a data mining app.
There is 1 Mining Actor which receives and processes a Json containing 1000 objects. I put this into a list and foreach, I log the data by sending it to 1 Logger Actor which logs data into many files.
Processing the list sequentially, my app uses 700MB and takes ~15 seconds of 20% cpu power to process (4 core cpu). When I parallelize the list, my app uses 2GB and ~ the same amount of time and cpu to process.
My questions are:
Since I parallelized the list and thus the computation, shouldn't the compute-time decrease?
I think having only one Logger Actor is a bottleneck in this case. The computation may be faster but the bottleneck hides the speed increase. So if I add more Loggers to the pool, the app time should decrease?
Why does the memory usage jump to 2GB? Does the JVM have to store the entire collection in memory to parallelize it? And after the computation is done, the JVM garbage collector should deal with it?
Without more details, any answer is a guess. However, even a guess might point you to the right direction.
Parallelized execution should decrease the running time but your problem might lie elsewhere. For some reason, your CPU is idling a lot even in the single-threaded mode. You do not specify whether you read the input from disk or the network or where you write your output to. You explicitly say that you write logs to a lot of files. Disk and network reading/writing might in your case take much longer than data processing. Most probably your process is idle due to this I/O waiting. You should not expect any speedups from parallelizing a job that spends 80% of its time waiting on I/O. I therefore also suspect that loggers are not the bottleneck here.
The memory usage might jump if your threads allocate a lot of memory each. In that case, the more threads you have the more memory will be required. I don't know what kind of collection you are parallelizing on, but most are stored in memory, completely. Yes, the garbage collector will free any resources that do not require you to explicitly free them, such as files.
How many threads for reading and writing to the hard disk?
The memory increases because I send messages faster than the Logger can write, so the Mailbox balloons in size until the Logger has processed the messages and the GC kicks in.
I solved this by writing state to a protocol buffer file. Before doing any writes, I compare with the protobuf file because reads are significantly cheaper than writes. My resource usage is now 10% for 2 seconds, and less than 400MB RAM.

Most efficient memory type for kdb+

I am currently configuring a server that will run a kdb+ tickerplant with several subscription processes. Is there an optimal physical memory type for realtime kdb data?
Checkout the type sizes at http://code.kx.com/q/ref/card/#datatypes
Answer depends on what you mean by "efficient" - by the far the largest hit you take in latency is memory allocation, so the less you have to allocate the better. That means smaller types.
But of course you have to weigh that up against your use cases.
For your realtime always make sure the tickerplant inserts the time column so that #s is maintained on the time column for efficient querying.
The tickerplant itself publishes on a timer - the longer the timer the less hit on cpu, but then the tp is collecting data for a while before publishing. Again, weigh up against use cases. BTW make sure your tickerplant is writing the log file to a fast local disk so as to decrease pub delay and iowait.
If you're operating high load from multiple sources, consider OS tweaks too like tcp quickack ( http://www.techrepublic.com/article/take-advantage-of-tcp-ip-options-to-optimize-data-transmission/). There's similar tweaks for memory allocation and disk i/o.

PostgreSQL Table in memory

I created a database containing a total of 3 tables for a specific purpose. The total size of all tables is about 850 MB - very lean... out of which one single table contains about 800 MB (including index) of data and 5 million records (daily addition of about 6000 records).
The system is PG-Windows with 8 GB RAM Windows 7 laptop with SSD.
I allocated 2048MB as shared_buffers, 256MB as temp_buffers and 128MB as work_mem.
I execute a single query multiple times against the single table - hoping that the table stays in RAM (hence the above parameters).
But, although I see a spike in memory usage during execution (by about 200 MB), I do not see memory consumption remaining at at least 500 MB (for the data to stay in memory). All postgres exe running show 2-6 MB size in task manager. Hence, I suspect the LRU does not keep the data in memory.
Average query execution time is about 2 seconds (very simple single table query)... but I need to get it down to about 10-20 ms or even lesser if possible, purely because there are just too many times, the same is going to be executed and can be achieved only by keeping stuff in memory.
Any advice?
Regards,
Kapil
You should not expect postgres processes to show large memory use, even if the whole database is cached in RAM.
That is because PostgreSQL relies on buffered reads from the operating system buffer cache. In simplified terms, when PostgreSQL does a read(), the OS looks to see whether the requested blocks are cached in the "free" RAM that it uses for disk cache. If the block is in cache, the OS returns it almost instantly. If the block is not in cache the OS reads it from disk, adds it to the disk cache, and returns the block. Subsequent reads will fetch it from the cache unless it's displaced from the cache by other blocks.
That means that if you have enough free memory to fit the whole database in "free" operating system memory, you won't tend to hit the disk for reads.
Depending on the OS, behaviour for disk writes may differ. Linux will write-back cache "dirty" buffers, and will still return blocks from cache even if they've been written to. It'll write these back to the disk lazily unless forced to write them immediately by an fsync() as Pg uses at COMMIT time. When it does that it marks the cached blocks clean, but doesn't flush them. I don't know how Windows behaves here.
The point is that PostgreSQL can be running entirely out of RAM with a 1GB database, even though no PostgreSQL process seems to be using much RAM. Having shared_buffers too high just leads to double-caching and can reduce the amount of RAM available for the OS to cache blocks.
It isn't easy to see exactly what's cached in RAM because Pg relies on the OS cache. That's why I referred you to pg_fincore.
If you're on Windows and this won't work, you really just have to rely on observing disk activity. Does performance monitor show lots of uncached disk reads? Does operating system memory monitoring show lots of memory used for disk cache in the OS?
Make sure that effective_cache_size correctly reflects the RAM used for disk cache. It will help PostgreSQL choose appropriate query plans.
You are making the assumption, without apparent evidence, that the query performance you are experiencing is explained by disk read delays, and that it can be improved by in-memory caching. This may not be the case at all. You need to look at explain analyze output and system performance metrics to see what's going on.

MongoDB Stops Responding During Background Flush

Mongodb Background Flushing blocks all the requests:
Server: Windows server 2008 R2
CPU Usage: 10 %
Memory: 64G, Used 7%, 250MB for Mongod
Disk % Read/Write Time: less than 5% (According to Perfmon)
Mongodb Version: 2.4.6
Mongostat Normally:
insert:509 query:608 update:331 delete:*0 command:852|0 flushes:0 mapped:63.1g vsize:127g faults:6449 locked db:Radius:12.0%
Mongostat Before(maybe while) Flushing:
insert:1 query:4 update:3 delete:*0 command:7|0 flushes:0 mapped:63.1g vsize:127g faults:313 locked db:local:0.0%
And Mongostat After Flushing:
insert:1572 query:1849 update:1028 delete:*0 command:2673|0 flushes:1 mapped:63.1g vsize:127g faults:21065 locked db:.:99.0%
As you see when flushes happening lock is 99% just at this point mongod stops responding any read/write operation (mongotop and mongostat also stop). The flushing takes about 7 to 8 seconds to complete which does not increase disk load more than 10%.
Is there any suggestions?
Under Windows server 2008 R2 (and other versions of Windows I would suspect, although I don't know for sure), MongoDB's (2.4 and older) background flush process imposes a global lock, doing substantial blocking of reads and writes, and the length of the flush time tends to be proportional to the amount of memory MongoDB is using (both resident and system cache for memory-mapped files), even if very little actual write activity is going on. This is a phenomenon we ran into at our shop.
In one replica set where we were using MongoDB version 2.2.2, on a host with some 128 GBs of RAM, when most of the RAM was in use either as resident memory or as standby system cache, the flush time was reliably between 10 and 15 seconds under almost no load and could go as high as 30 to 40 seconds under load. This could cause Mongo to go into long pauses of unresponsiveness every minute. Our storage did not show signs of being stressed.
The basic problem, it seems, is that Windows handles flushing to memory-mapped files differently than Linux. Apparently, the process is synchronous under Windows and this has a number of side effects, although I don't understand the technical details well enough to comment.
MongoDb, Inc., is aware of this issue and is working on optimizations to address it. The problem is documented in a couple of tickets:
https://jira.mongodb.org/browse/SERVER-13444
https://jira.mongodb.org/browse/SERVER-12401
What to do?
The phenomenon is tied, to some degree, to the minimum latency of the disk subsystem as measured under low stress, so you might try experimenting with faster disks, if you can. Some improvements have been reported with this approach.
A strategy that worked for us in some limited degree is avoiding provisioning too much RAM. It happened that we really didn't need 128 GBs of RAM, so by dialing back on the RAM, we were able to reduce the flush time. Naturally, that wouldn't work for everyone.
The latest versions of MongoDB (2.6.0 and later) seem to handle the
situation better in that writes are still blocked during the long
flush but reads are able to proceed.
If you are working with a sharded cluster, you could try dividing the RAM by putting multiple shards on the same host. We didn't try this ourselves, but it seems like it might have worked. On the other hand, careful design and testing would be highly recommended in any such scenario to avoid compromising performance and/or high availability
We tried playing with syncdelay. Reducing it didn't help (the long flush times just happened more frequently). Increasing it helped a little (there was more time between flushes to get work done), but increasing it too much can exacerbate the problem severely. We boosted the syncdelay to five minutes (300 seconds), at one point, and were rewarded with a background flush of 20 minutes.
Some optimizations are in the works at MongoDB, Inc. These may be available soon.
In our case, to relieve the pressure on the primary host, we periodically rebooted one of the secondaries (clearing all memory) and then failed over to it. Naturally, there is some performance hit due to re-caching, and I think this only worked for us because our workload is write-heavy. Moreover, this technique not in any sense a solution. But if high flush times are causing serious disruption, this may be one way to "reduce the fever" so to speak.
Consider running on Linux... :-)
Background flush by default does not block read/write. mongod does flush every 60s, unless otherwise specified with -syncDelay parameter. syncDelay uses fsync() operation, which can set to block write while in-memory pages flush to disk. A blocked write could have potential to block reads as well. Read more: http://docs.mongodb.org/manual/reference/command/fsync/
However, normally a flush should not take more than 1000ms (1 second). If it does, it is likely the amount of data flushing to disk is too large for your disk to handle.
Solution: upgrade to a faster disk like SSD, or decrease flush interval (try 30s, rather than the default 60s).