Specman: debugging unaccessible memory

We have a huge environment, built from sub-environments that are maintained by many users.
When we run a test, we see a GC every 10us. When we use "show mem", we see about 3GB of unaccessible memory, which is removed after the GC.
How can we determine what causes this huge memory consumption?
Using iprof mem didn't point to any single big memory consumer.

Are you using Specman automatic GC? You can check by running "config mem" at the Specman prompt and verifying that -automatic_gc_settings=STANDARD. If not, try the automatic GC and see if it makes any difference; if it does, you may need to increase the process size. Are you running in 32-bit or 64-bit mode?
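For example (assuming the "config mem -option=value" syntax shown in the related answer below; check the exact option name in your Specman release), the check and the change might look like this at the Specman prompt:

config mem
config mem -automatic_gc_settings=STANDARD;
show mem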
To better understand the problem and assist you, it would be best if you run with the SPECMAN_MEMORY_FULL_DEBUG environment variable set and send Cadence support the resulting log.
If you open a case with Cadence support and send me the number, I can assist you further.
Regards,
Semadar
Customer Support Manager, Cadence

Related

Specman memory configuration

I have a server with 20GB of RAM available.
I need to run a regression with Specman and wish to optimize it to run at least 5 tests in parallel.
I know my RTL needs a static 2GB memory size, but the testbench size varies.
How can I control Specman so that one test doesn't take the entire memory at the expense of the others?
The way to let all 5 simulations share the server's memory without running out of memory is to set optimal_process_size to 3-4G.
The automatic GC mechanism of Specman will do the work and make sure that each process doesn't run out of memory.
You can set the optimal_process_size parameter to control the amount of memory used by the simulator. This way you take control of the GC process.
Use config mem to specify the Specman optimal and max process size, for example:
config mem -max_process_size=2000M;
If needed, use GC debug options to determine optimal parameters for GC threshold, increments and disk usage.
You can setenv SPECMAN_MEMORY_FULL_DEBUG.
This environment variable sets the debug flag.
This way you can explore your test and set the optimal process size.
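For example, for the 20GB-server / 5-parallel-test case above, a hedged starting point (the values are only illustrative and should be tuned against the debug log) would be to set the environment variable in your shell, then at the Specman prompt:

setenv SPECMAN_MEMORY_FULL_DEBUG
config mem -optimal_process_size=3500M;
config mem -max_process_size=4000M;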
Also, try using 32-bit mode. It usually consumes less memory, though it has an overall memory limit compared to 64-bit mode.

How long does an SD card last with Raspbian Linux for ARM (Raspberry Pi board)?

Folks, this question is for anyone familiar with Debian Linux, more precisely with Raspbian, which is a version that runs on the Raspberry Pi board.
As all Raspberry Pi users should know, the operating system is installed on an SD card. The problem is that the SD card is flash memory, and this type of memory supports only a limited number of write operations.
I would like to know if Raspbian writes to the SD card when it is idle. If this happens, how can I disable it?
I found this:
Tips for running Linux on a flash device by David Härdeman
If you are running your NSLU2 on a USB flash key, there are a number
of things you might want to do in order to reduce the wear and tear on
the underlying flash device (as it only supports a limited number of
writes).
Note: this document currently describes Debian etch (4.0) and needs to
be updated to Debian squeeze (6.0) and Debian wheezy (7.0). Some of
the hints may still apply, but some may not.
The ext3 filesystem by default writes metadata changes to disk every five
seconds. This interval can be increased by mounting the root filesystem
with the commit=N parameter, which tells the kernel to delay writes to
every N seconds.
The kernel writes a new atime for each file that has been read which
generates one write for each read. This can be disabled by mounting
the filesystem with the noatime option.
Both of the above can be done by adding e.g. noatime,commit=120,... to /etc/fstab. This can also be done on an
already mounted filesystem by running the command:
mount -o remount,noatime,commit=120 /
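For example, the corresponding /etc/fstab entry might look like this (the device name /dev/mmcblk0p2 and the ext4 filesystem are what a typical Raspbian SD card uses, but check your own setup):

/dev/mmcblk0p2  /  ext4  defaults,noatime,commit=120  0  1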
The system will run updatedb every day which creates a database of all
files on the system for use with the locate command. This will also
put some stress on the filesystem, so you might want to disable it by
adding
exit 0
early in the /etc/cron.daily/find script.
syslogd will in the default installation sync a lot of log files to
disk directly after logging some new information. You might want to
change /etc/syslog.conf so that every filename starts with a - (minus)
which means that writes are not synced immediately (which increases
the risk that some log messages are lost if your system crashes). For
example, a line such as:
kern.* /var/log/kern.log
would be changed to:
kern.* -/var/log/kern.log
You also might want to disable some classes of messages altogether by
logging them to /dev/null instead, see syslog.conf(5) for details.
In addition, syslogd likes to write -- MARK -- lines to log files
every 20 minutes to show that syslog is still running. This can be
disabled by changing SYSLOGD in /etc/default/syslogd so that it reads
SYSLOGD="-m 0"
After you've made any changes, you need to restart syslogd by running
/etc/init.d/syslogd restart
If you have a swap partition or swap file on the flash device, you
might want to move it to a different part of the disk every now and
then to make sure that different parts of the disk get hit by the
frequent writes that it can generate. For a swap file this can be done
by creating a new swap file before you remove the old one.
If you have a swap partition or swap file stored on the flash device,
you can make sure that it is used as little as possible by setting
/proc/sys/vm/swappiness to zero.
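For example, to apply this at runtime (run as root), or put vm.swappiness = 0 in /etc/sysctl.conf to make it persistent:

echo 0 > /proc/sys/vm/swappiness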
The kernel also has a setting known as laptop_mode, which makes it
delay writes to disk (initially intended to allow laptop disks to spin
down while not in use, hence the name). A number of files under
/proc/sys/vm/ controls how this works:
/proc/sys/vm/laptop_mode: How many seconds after a read should a
writeout of changed files start (this is based on the assumption that
a read will cause an otherwise spun down disk to spin up again).
/proc/sys/vm/dirty_writeback_centisecs: How often the kernel should
check if there is "dirty" (changed) data to write out to disk (in
centiseconds).
/proc/sys/vm/dirty_expire_centisecs: How old "dirty" data should be
before the kernel considers it old enough to be written to disk. It is
in general a good idea to set this to the same value as
dirty_writeback_centisecs above.
/proc/sys/vm/dirty_ratio: The maximum amount of memory (in percent) to
be used to store dirty data before the process that generates the data
will be forced to write it out. Setting this to a high value should
not be a problem as writeouts will also occur if the system is low on
memory.
/proc/sys/vm/dirty_background_ratio: The lower amount of memory (in
percent) where a writeout of dirty data to disk is allowed to stop.
This should be quite a bit lower than the above dirty_ratio to allow
the kernel to write out chunks of dirty data in one go.
All of the above kernel parameters can be tuned by using a custom init
script, such as this example script. Store it to e.g.
/etc/init.d/kernel-params, make it executable with
chmod a+x /etc/init.d/kernel-params
and make sure it is executed by running
update-rc.d kernel-params defaults
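The example script linked above is not reproduced here, but a minimal sketch of what such a script could contain looks like this (the values are purely illustrative, not recommendations):

#!/bin/sh
# Illustrative values only; tune them for your own workload.
echo 5    > /proc/sys/vm/laptop_mode
echo 6000 > /proc/sys/vm/dirty_writeback_centisecs
echo 6000 > /proc/sys/vm/dirty_expire_centisecs
echo 40   > /proc/sys/vm/dirty_ratio
echo 10   > /proc/sys/vm/dirty_background_ratio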
Note: Most of these settings reduce the number of writes to disk by
increasing memory usage. This increases the risk for out of memory
situations (which can trigger the dreaded OOM killer in the kernel).
This can even happen when there is free memory available (for example
when the kernel needs to allocate more than one contiguous page and
there are only fragmented free pages available).
As with any tweaks, you are advised to keep a close eye on the amount
of free memory and adapt the tweaks (e.g. by using less aggressive
caching and increasing the swappiness) depending on your workload.
This article has been contributed by David Härdeman
http://www.cyrius.com/debian/nslu2/linux-on-flash/
Does anyone have any more tips?
I have been using various Raspberry Pi setups and haven't had SD card troubles to date (fingers crossed). That being said, there is some evidence of SD card lifespan issues.
A quick google search does show a few more tips though:
Bigger is better - a larger card reduces the write load on any specific section
Write temporary files to RAM
Only store the boot partition on the SD card and keep the OS on a USB drive
(http://www.makeuseof.com/tag/extend-life-raspberry-pis-sd-card/)
Anyway, it'll be interesting to hear from someone who has a raspberry cluster or some such on their SD card lifespans!
(https://resin.io/blog/what-would-you-do-with-a-120-raspberry-pi-cluster/)
You can put files in tmpfs after boot and write them back before shutdown using the script from http://www.observium.org/wiki/Persistent_RAM_disk_RRD_storage
But it can be detrimental:
tmpfs loses all changes on a power outage, so you must use a UPS;
Raspberry Pi RAM is far from big, so don't waste it.
If your Pi often writes small files, this can work for you.
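For reference, a plain (non-persistent) tmpfs mount without the write-back script can be declared in /etc/fstab like this; the mount point and size are only illustrative:

tmpfs  /var/log  tmpfs  defaults,noatime,size=32m  0  0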

Is it possible to set a maximum memory consumption for a PowerShell script, in % or megabytes?

I sometimes have scripts which eat up all the memory. Since I don't want to monitor them all the time or set the CPU priority to low manually, I am wondering if there is an option to give a specific script a memory limit (maybe in MB).
Does this option exist?
PowerShell doesn't provide any built-in way to control system resources like the memory used by a script.
Windows does provide a way to limit system resources for groups of processes; you can learn more about that here: http://msdn.microsoft.com/en-us/library/windows/desktop/ms684161(v=vs.85).aspx
If your scripts are consuming too much memory, I'd suggest investigating the memory leak. There are many tools that help track memory leaks. Some are low level (e.g. using !dumpheap from SOS in windbg - http://msdn.microsoft.com/en-us/library/bb190764(v=vs.110).aspx). Others are pretty smart, letting you take multiple snapshots and show you just the newly allocated objects between the snapshots. You can search for ".Net memory profiler" to get an idea of what's available.
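As a crude script-side workaround (this is only a sketch, not a built-in limit; the 500MB threshold is illustrative), a long-running script can periodically check its own working set and stop itself:

# Bail out if this PowerShell process has grown past the threshold.
if ((Get-Process -Id $PID).WorkingSet64 -gt 500MB) {
    Write-Warning "Memory threshold exceeded, stopping."
    exit 1
}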

Writing a daemon in perl

I'm writing a daemon for a newsletter in Perl.
The daemon will be running 24/7 on the server. It'll have an active connection to a PostgreSQL database almost all the time.
I don't have that much experience with Perl, so I would love it if some of you could share information about the following:
How do I limit the RAM usage? I don't want to run out of RAM. As I said, this program will be running all the time as a daemon without being stopped.
What should I be aware of when writing such daemons?
As far as the SQL connection goes, make sure you don't leak memory. Retrieve the least amount of data you need from the query, and ensure that the data structures storing the data go out of scope immediately so the garbage collector can reclaim them.
Please note that there may be memory leaks you have no control over (e.g. in Postgresql connectivity code). It's been known to happen. The best solution for that problem (short of doing precise memory profiling and fixing the leaks in underlying libraries) is for your daemon to pull a Phoenix - stop doing what it's doing and exec() a new copy of itself.
As far as writing Perl daemons, some resources:
http://www.webreference.com/perl/tutorial/9/
Proc::Daemon - Run Perl program(s) as a daemon process.
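Pulling those ideas together, here is a minimal sketch of such a daemon (assuming Proc::Daemon is installed and a Linux /proc filesystem; the newsletter work itself and the memory limit are placeholders). It daemonizes, does its work in a loop, and uses the "Phoenix" re-exec trick from the first answer when its resident set grows too large:

use strict;
use warnings;
use Proc::Daemon;

# Detach from the terminal and run in the background.
Proc::Daemon::Init();

my $MAX_RSS_KB = 200_000;   # illustrative limit; tune for your host

while (1) {
    # ... fetch subscribers, send the next newsletter batch, etc. ...

    # "Phoenix" restart: if our resident set has grown past the limit
    # (for example because of leaks in code we do not control),
    # re-exec a fresh copy of this script.
    if (current_rss_kb() > $MAX_RSS_KB) {
        exec($^X, $0, @ARGV) or die "re-exec failed: $!";
    }
    sleep 60;
}

# Linux-specific: read our own resident set size (in kB) from /proc.
sub current_rss_kb {
    open my $fh, '<', "/proc/$$/status" or return 0;
    while (<$fh>) {
        return $1 if /^VmRSS:\s+(\d+)/;
    }
    return 0;
}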
Regarding #1: Perl is garbage collected.
The effective meaning of that is you should ensure all references to data are cleaned up when you are done with them, thus allowing the garbage collector to run.
http://perldoc.perl.org/perlobj.html#Two-Phased-Garbage-Collection
One thing to watch for is memory leaks. There's a very nice thread about memory leaks in Perl already on SO.

What is the fastest way to read 10 GB file from the disk?

We need to read and count different types of messages and run
some statistics on a 10 GB text file, e.g. a FIX engine
log. We use Linux, 32-bit, 4 CPUs, Intel, coding in Perl, but
the language doesn't really matter.
I have found some interesting tips in Tim Bray's
WideFinder project. However, we've found that using memory mapping
is inherently limited by the 32 bit architecture.
We tried using multiple processes, which seems to work
faster if we process the file in parallel using 4 processes
on 4 CPUs. Adding multi-threading slows it down, maybe
because of the cost of context switching. We tried changing
the size of thread pool, but that is still slower than
simple multi-process version.
The memory mapping part is not very stable, sometimes it
takes 80 sec and sometimes 7 sec on a 2 GB file, maybe from
page faults or something related to virtual memory usage.
Anyway, mmap cannot scale beyond 4 GB on a 32-bit
architecture.
We tried Perl's IPC::Mmap and Sys::Mmap. Looked
into Map-Reduce as well, but the problem is really I/O
bound, the processing itself is sufficiently fast.
So we decided to try to optimize the basic I/O by tuning
the buffering size, type, etc.
Can anyone who is aware of an existing project where this
problem was efficiently solved in any language/platform
point to a useful link or suggest a direction?
Most of the time you will be I/O bound, not CPU bound, so just read the file through normal Perl I/O and process it in a single thread. Unless you can prove that you do more I/O than a single CPU's worth of work, don't waste your time on anything more. Anyway, you should ask: why on Earth is this in one huge file? Why on Earth don't they split it in a reasonable way when they generate it? That would be far more worthwhile work. Then you could put the pieces on separate I/O channels and use more CPUs (if you don't use some sort of RAID 0 or NAS or ...).
Measure, don't assume. Don't forget to flush caches before each test. Remember that serialized I/O is an order of magnitude faster than random I/O.
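As a baseline, a plain single-threaded Perl reader of the kind this answer describes might look like this (the file name and the message-type pattern are illustrative placeholders for your real FIX log format):

use strict;
use warnings;

my $file = shift // 'fix_engine.log';
open my $fh, '<', $file or die "open $file: $!";

# Count messages by type, line by line.
my %count;
while (my $line = <$fh>) {
    my ($type) = $line =~ /35=(\w+)/;   # FIX tag 35 = MsgType; adjust to your format
    $count{ $type // 'other' }++;
}
close $fh;

printf "%-8s %d\n", $_, $count{$_} for sort keys %count;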
This all depends on what kind of preprocessing you can do and when.
On some of the systems we have, we gzip such large text files, reducing them to 1/5 to 1/7 of their original size. Part of what makes this possible is that we don't need to process these files
until hours after they're created, and at creation time we don't really have any other load on the machines.
Processing them is done more or less in the fashion of zcat thosefiles | ourprocessing (well, it's done over Unix sockets, with a custom-made zcat). It trades CPU time for disk I/O time, and for our system that has been well worth it. There are of course a lot of variables that can make this a very poor design for a particular system.
Perhaps you've already read this forum thread, but if not:
http://www.perlmonks.org/?node_id=512221
It describes using Perl to do it line-by-line, and the users seem to think Perl is quite capable of it.
Oh, is it possible to process the file from a RAID array? If you have several mirrored disks, then the read speed can be improved. Competition for disk resources may be what makes your multiple-threads attempt not work.
Best of luck.
I wish I knew more about the content of your file, but not knowing other than that it is text, this sounds like an excellent MapReduce kind of problem.
PS, the fastest read of any file is a linear read. cat file > /dev/null should show the maximum speed at which the file can be read.
Have you thought of streaming the file and filtering out any interesting results to a secondary file? (Repeat until you have a manageable-size file.)
Basically you need to "divide and conquer". If you have a network of computers, copy the 10G file to as many client PCs as possible and get each client PC to read an offset of the file. For an added bonus, get EACH PC to implement multithreading in addition to distributed reading.
Parse the file once, reading line by line. Put the results in a table in a decent database. Run as many queries as you wish. Feed the beast regularly with new incoming data.
Realize that manipulating a 10 GB file, transferring it across the (even if local) network, exploring complicated solutions, etc. all take time.
I have a co-worker who sped up his FIX reading by going to 64-bit Linux. If it's something worthwhile, drop a little cash to get some fancier hardware.
Hmmm, but what's wrong with the read() command in C? It usually has a 2GB limit,
so just call it 5 times in sequence. That should be fairly fast.
If you are I/O bound and your file is on a single disk, then there isn't much to do. A straightforward single-threaded linear scan across the whole file is the fastest way to get the data off of the disk. Using large buffer sizes might help a bit.
If you can convince the writer of the file to stripe it across multiple disks / machines, then you could think about multithreading the reader (one thread per read head, each thread reading the data from a single stripe).
Since you said platform and language doesn't matter...
If you want stable performance that is as fast as the source medium allows, the only way I am aware that this can be done on Windows is by overlapped, non-OS-buffered, aligned sequential reads. You can probably get to some GB/s with two or three buffers; beyond that, at some point you need a ring buffer (one writer, 1+ readers) to avoid any copying. The exact implementation depends on the driver/APIs. If there's any memory copying going on in the thread (both in kernel and user mode) dealing with the IO, obviously the larger the buffer to copy, the more time is wasted on that rather than doing the IO. So the optimal buffer size depends on the firmware and driver. On Windows, good values to try are multiples of 32 KB for disk IO. Windows file buffering, memory mapping and all that stuff add overhead. They are only good if you are doing multiple reads of the same data, random access, or both. So for reading a large file sequentially a single time, you don't want the OS to buffer anything or do any memcpy's. If using C# there are also penalties for calling into the OS due to marshaling, so the interop code may need a bit of optimization unless you use C++/CLI.
Some people prefer throwing hardware at problems, but if you have more time than money, in some scenarios it's possible to optimize things to perform 100-1000x better on a single consumer-level computer than on 1000 enterprise-priced computers. The reason is that if the processing is also latency-sensitive, going beyond two cores probably adds latency. This is why drivers can push gigabytes/s while enterprise software ends up stuck at megabytes/s by the time it's all done. Whatever reporting, business logic and such the enterprise software does can probably also be done at gigabytes/s on a two-core consumer CPU, if written like you were back in the 80's writing a game. The most famous example I've heard of approaching their entire business logic in this manner is the LMAX forex exchange, which published some of their ring-buffer-based code, which was said to be inspired by network card drivers.
Forgetting all the theory, if you are happy with < 1 GB/s, one possible starting point on Windows I've found is looking at readfile source from winimage, unless you want to dig into sdk/driver samples. It may need some source code fixes to calculate perf correctly at SSD speeds. Experiment with buffer sizes also.
The switches /h multi-threaded and /o overlapped (completion port) IO with optimal buffer size (try 32,64,128 KB etc) using no windows file buffering in my experience give best perf when reading from SSD (cold data) while simultaneously processing (use the /a for Adler processing as otherwise it's too CPU-bound).
I seem to recall a project in which we were reading big files. Our implementation used multithreading: basically n worker threads started at incrementing offsets of the file (0, chunk_size, 2x chunk_size, ... (n-1)x chunk_size) and read smaller chunks of information. I can't exactly recall our reasoning for this, as someone else designed the whole thing; the workers weren't the only part of it, but that's roughly how we did it.
Hope it helps.
It's not stated in the problem whether sequence really matters or not. So
divide the file into equal parts, say 1GB each. Since you are using multiple CPUs, multiple threads won't be a problem, so read each part using a separate thread. If you have more than 10 GB of RAM, all the contents can be held in RAM while being read by multiple threads.
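Combining the offset idea from the last two answers, a rough single-machine Perl sketch of the fork-per-chunk approach might look like this (the per-line work and the merging of per-worker counts are placeholders):

use strict;
use warnings;

my $file  = shift // 'big.log';       # illustrative file name
my $nproc = 4;                        # one worker per CPU
my $size  = -s $file or die "cannot stat $file";
my $chunk = int($size / $nproc) + 1;

for my $i (0 .. $nproc - 1) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    next if $pid;                     # parent: keep spawning workers

    # Child: handle bytes [$start, $end).
    my $start = $i * $chunk;
    my $end   = $start + $chunk;
    open my $fh, '<', $file or die "open $file: $!";
    seek $fh, $start, 0;
    <$fh> if $start > 0;              # skip the partial line at the boundary;
                                      # the previous worker finishes it
    my %count;
    while (my $line = <$fh>) {
        my ($type) = split /\s+/, $line;   # count by first field, as a placeholder
        $count{ $type // 'other' }++;
        last if tell($fh) > $end;
    }
    # A real implementation would write the per-worker counts somewhere
    # (a file, a pipe back to the parent, ...) and merge them afterwards.
    exit 0;
}

wait() for 1 .. $nproc;               # parent waits for all workers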