Run stable baselines with limited memory - stable-baselines

I am trying to train a SAC agent with stable baselines over an environment (Walker2d from mujoco) fam for 5M time steps.
model = SAC("MlpPolicy", env, verbose=1, **hyperparam)
save_every = 5000000
model.learn(total_timesteps=save_every, log_interval=save_every//10,)
However, this procedure suffers from a memory leak since SAC is saving something at every step, and the memory keeps growing and growing. Is there a way to tell stable baselines to limit the memory used?

Related

Ballpark value for `--jobs` in `pg_restore` command

I'm using pg_restore to restore a database to its original state for load testing. I see it has a --jobs=number-of-jobs option.
How can I get a ballpark estimate of what this value should be on my machine? I know this is dependent on a bunch of factors, e.g. machine, dataset, but it would be great to get a conceptual starting point.
I'm on a MacBook Pro so maybe I can use the number of physical CPU cores:
sysctl -n hw.physicalcpu
# 10
If there is no concurrent activity at all, don't exceed the number of parallel I/O requests your disk can handle. This applies to the COPY part of the dump, which is likely I/O bound. But a restore also creates indexes, which uses CPU time (in addition to I/O). So you should also not exceed the number of CPU cores available.

How to calculate the number of CPU, memory and storage that my Google Cloud SQL needs

My DB is reaching the 100% of CPU utilization and increasing the number of CPU is not working anymore.
What kind of information should I consider to create my Google Cloud SQL? How do you set up the DB configuration?
Info I have:
For 10-50 minute a day I have 120 request/seconds and the CPU reaches 100% of utilization
Memory usage is the maximum 2.5GB during this critical period
Storage usage is currently around 1.3GB
Current configuration:
vCPUs: 10
Memory: 10 GB
SSD storage: 50 GB
Unfortunately, there is no magic formula for determining the correct database size. This is because queries have variable load - some are small and simple and take no time at all, others are complex or huge and take lots of resources to complete.
There are generally two strategies to dealing with high load: Reduce your load (use connection pooling, optimize your queries, cache results), or increase the size of your database (add additional CPUs, Storage, or Read replicas).
Usually, when we have CPU utilization, it is because the CPU is overloaded or we have too many database tables in the same instances. Here are some common issues and fixes provided by Google’s documentation:
If CPU utilization is over 98% for 6 hours, your instance is not properly sized for your workload, and it is not covered by the SLA.
If you have 10,000 or more database tables on a single instance, it could result in the instance becoming unresponsive or unable to perform maintenance operations, and the instance is not covered by the SLA.
When the CPU is overloaded, it is recommended to use this documentation to view the percentage of available CPU your instance is using on the Instance details page in the Google Cloud Console.
It is also recommended to monitor your CPU usage and receive alerts at a specified threshold, set up a Stackdriver alert.
Increasing the number of CPUs for your instance should reduce the strain of your instance. Note that changing CPUs requires an instance restart. If your instance is already at the maximum number of CPUs, shard your database to multiple instances.
Google has this very interesting documentation about investigating high utilization and determining whether a system or user task is causing high CPU utilization. You could use it to troubleshoot your instance and find what's causing the high CPU utilization.

why is kdb process showing high memory usage on system?

I am running into serious memory issues with my kdb process. Here is the architecture in brief.
The process runs in slave mode (4 slaves). It loads a ton of data from database into memory initially (total size of all variables loaded in memory calculated from -22! is approx 11G). Initially this matches .Q.w[] and close to unix process memory usage. This data set increases by very little in incremental operations. However, after a long operation, although the kdb internal memory stats (.Q.w[]) show expected memory usage (both used and heap) ~ 13 G, the process is consuming close to 25G on the system (unix /proc, top) eventually running out of physical memory.
Now, when I run garbage collection manually (.Q.gc[]), it frees up memory and brings unix process usage close to heap number displayed by .Q.w[].
I am running Q 2.7 version with -g 1 option to run garbage collection in immediate mode.
Why is unix process usage so significantly differently from kdb internal statistic -- where is the difference coming from? Why is "-g 1" option not working? When i run a simple example, it works fine. But in this case, it seems to leak a lot of memory.
I tried with 2.6 version which is supposed to have automated garbage collection. Suprisingly, there is still a huge difference between used and heap numbers from .Q.w when running with version 2.6 both in single threaded (each) and multi threaded modes (peach). Any ideas?
I am not sure of the concrete answer but this is my deduction based on following information (and some practical experiments) which is mentioned on wiki:
http://code.kx.com/q/ref/control/#peach
It says:
Memory Usage
Each slave thread has its own heap, a minimum of 64MB.
Since kdb 2.7 2011.09.21, .Q.gc[] in the main thread executes gc in the slave threads too.
Automatic garbage collection within each thread (triggered by a wsful, or hitting the artificial heap limit as specified with -w on the command line) is only executed for that particular thread, not across all threads.
Symbols are internalized from a single memory area common to all threads.
My observations:
Thread Specific Memory:
.Q.w[] only shows stats of main thread and not the summation of all the threads (total process memory). This could be tested by starting 'q' with 2 threads. Total memory in that case should be at least 128MB as per point 1 but .Q.w[] it still shows 64 MB.
That's why in your case at the start memory stats were close to unix stats as all the data was in main thread and nothing on other threads. After doing some operations some threads might have taken some memory (used/garbage) which is not shown by .Q.w[].
Garbage collector call
As mentioned on wiki, calling garbage collector on main thread calls GC on all threads. So that might have collected the garbage memory from threads and reduced the total memory usage which was reflected by reduced unix memory stats.

Best CPU for GWT compile for a new build server

When building our current project the GWT compiling needs quite a large amount of the overall time (currently ~25min overall, 2/3 gwt compile). We reserched how to optimize that (e.g. here)
however in the end we decided to buy a new build server. GWT compiling is a quite CPU intensive task so we did some tests to analyze the improvement per core:
1 cores = 197s
2 cores = 165s
3 cores = 149s
4 cores = 157s (can be that the last core was busy with other tasks)
Judging from those numbers its seems that adding more cores doesn't necessarily improve performance since those numbers seem to flatten.
1.)
So now i would be interessted if someone of you can confirm / disprove that? So 8 or 12 cores doesn't necessarily make a difference - but the individual cpu speed (mhz) does?
2.)
After seeing some benchmarks our sales tend to buy *ntel xeon - any experience with AMD? (I am more of an AMD guy however currently it seems hard to disregard the benchmarks)
3.) Any other suggestions regarding memory, IO etc are welcome
Update: When we get the new server I'll post the updated numbers...
We are using an AMD FX-8350 (#4.00 Ghz) with a Samsung 830 Pro SSD. and we've set localWorkers=4 as well as -Xmx2048m. Previously we used a Intel XEON E5-2609 (#2.40 Ghz). That reduced compilation time from ~440s down to ~310s.
So we also experienced that raw CPU speed matters most in case of a single compilation process (with localWorkers=4). In case of multiple compilation processes running at the same time on this machine a SSD improves the IO wait time which increases with the count of concurrent compilation processes.
Our current hardware supports up to 4 maven builds at the same time (each one with localWorkers=4) and uses then up to 20GB of RAM. With the increasing count of concurrent builds the build time increases. But it is not a linear increase, so we try to reduce the idle time in periods where not all resources are used by a single maven process (Java class compiling, tests, ...).
As we compared the hardware prices, we decided to buy a consumer PC used as a slave in our Jenkins buildfarm. The overall price is much cheaper than server hardware and can easily replaced with a new one in case of a hardware failure.

How to keep 32 bit mongodb memory usage down on changing dataset

I'm using MongoDB on a 32 bit production system, which sucks but it's out of my control right now. The challenge is to keep the memory usage under ~2.5GB since going over this will cause 32 bit systems to crash.
According to the mongoDB team, the best way to track the memory usage is to use your operating system's process tracking system (i.e. ps or htop on Unix systems; Process Explorer on Windows.) for virtual memory size.
The DB mainly consists of one table which is continually cycling data, i.e. receiving data at regular intervals from sensors, and every day a cron job wipes all data from before the last 3 days. Over a period of time, the memory usage slowly increases. I took some notes over time using db.serverStats(), db.lectura.totalSize() and ps, shown in the chart below. Note that the size of the table in question has reduced in the last month but the memory usage increased nonetheless.
Now, there is some scope for adjustment in how many days of data I store. Today I deleted basically half of the data, and then restarted mongodb, and yet the mem virtual / mem mapped and most importantly memory usage according to ps have hardly changed! Why do these not reduce when I wipe data (and restart)? I read some other questions where people said that mongo isn't really using all the memory that it might appear to be using, and that you can't clear the cache or limit memory use. But then how can I ensure I stay under the 2.5GB limit?
Unless there is a way to stem this dataset-size-irrespective gradual increase in memory usage, it seems to me that the 32-bit version of Mongo is unuseable. Note: I don't mind losing a bit of performance if it solves the problem.
To answer regarding why the mapped and virtual memory usage does not decrease with the deletes, the mapped number is actually what you get when you mmap() the entire set of data files. This does not shrink when you delete records, because although the space is freed up inside the data files, they are not themselves reduced in size - the files are just more empty afterwards.
Virtual will include journal files, and connections, and other non-data related memory usage also, but the same principle applies there. This, and more, is described here:
http://www.mongodb.org/display/DOCS/Checking+Server+Memory+Usage
So, the 2GB storage size limitation on 32-bit will actually apply to the data files whether or not there is data in them. To reclaim deleted space, you will have to run a repair. This is a blocking operation and will require the database to be offline/unavailable while it was run. It will also need up to 2x the original size in terms of free disk space to be able to run the repair, since it essentially represents writing out the files again from scratch.
This limitation, and the problems it causes, is why the 32-bit version should not be run in production, it is just not suitable. I would recommend getting onto a 64-bit version as soon as possible.
By the way, neither of these figures (mapped or virtual) actually represents your resident memory usage, which is what you really want to look at. The best way to do this over time is via MMS, which is the free monitoring service provided by 10gen - it will graph virtual, mapped and resident memory for you over time as well as plenty of other stats.
If you want an immediate view, run mongostat and check out the corresponding memory columns (res, mapped, virtual).
In general, when using 64-bit builds with essentially unlimited storage, the data will usually greatly exceed the available memory. Therefore, mongod will use all of the available memory it can in terms of resident memory (which is why you should always have swap configured to the OOM Killer does not come into play).
Once that is used, the OS does not stop allocating memory, it will just have the oldest items paged out to make room for the new data (LRU). In other words, the recycling of memory will be done for you, and the resident memory level will remain fairly constant.
Your options for stretching 32-bit are limited, but you can try some things. The thing that you run out of is address space, and the increases in the sizes of additional database files mean that you would like to avoid crossing over the boundary from "n" files to "n+1". It may be worth structuring your data into more or fewer databases so that you can get the maximum amount of actual data into memory and as little as possible "dead space".
For example, if your database named "mydatabase" consists of the files mydatabase.ns (the namespace file) at 16 MB, mydatabase.0 at 64 MB, mydatabase.1 at 128 MB and mydatabase.2 at 256 MB, then the next file created for this database will be mydatabase.3 at 512 MB. If instead of adding to mydatabase you instead created an additional database "mynewdatabase" it would start life with mynewdatabase.ns at 16 MB and mynewdatabase.0 at 64 MB ... quite a bit smaller than the 512 MB that adding to the original database would be. In fact, you could create 4 new databases for less space than would be consumed by adding a new file to the original database, and because the files are smaller they would be easier to fit into contiguous blocks of memory.
It is a well-known message that 32-bit should not be used for production.
Use 64-bit systems.
Point.