Optimization experiment technical error in AnyLogic

I'm trying to minimize the time people wait in the queue for a truck. I provided 25 trucks, but only 1 is being used, so I set up an optimization experiment with the objective of minimizing the waiting time in the queue and a requirement of 95% truck utilization, so that more than one truck can deliver people at once. When I run the optimization experiment, it gives me this error: "OpenJDK 64-Bit Server warning: there is insufficient memory for the Java Runtime Environment to continue", even though I set the maximum available memory to 16343. How can I solve this issue so that the experiment gives me the best number of trucks?
Thanks

"Insufficient memory for the Java Runtime Environment" simply means that the model needs more memory to run than is available.
More often than not, the issue in AnyLogic is that there are too many agents, especially if you run a parameter variation or optimization experiment, which tries to run multiple simulation runs in parallel and so increases the memory requirement.
One option is to convert as many agents to Java classes as possible. Start by converting all agents that are not used in flow blocks and for which you don't need any animation.
Check this blog post for an example and more information:
https://www.theanylogicmodeler.com/post/why-use-java-classes-in-anylogic
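To make the suggestion concrete, here is a minimal sketch of what such a converted entity could look like as a plain Java class. The class name `Passenger`, its fields, and the `waitingTime` method are hypothetical and not taken from the model in the question or from the AnyLogic API; the point is only that a plain object carries nothing beyond the data the model actually needs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical plain data holder: no animation, no flowchart logic, only the needed fields. */
public class Passenger {
    public final int id;
    public final double arrivalTime; // model time at which the passenger joined the queue

    public Passenger(int id, double arrivalTime) {
        this.id = id;
        this.arrivalTime = arrivalTime;
    }

    /** Waiting time if the passenger is picked up at the given model time. */
    public double waitingTime(double pickupTime) {
        return pickupTime - arrivalTime;
    }

    public static void main(String[] args) {
        List<Passenger> queue = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            queue.add(new Passenger(i, i * 2.0)); // arrivals at t = 0, 2, 4 (made-up values)
        }
        double pickupTime = 10.0; // made-up pickup time
        double total = 0;
        for (Passenger p : queue) {
            total += p.waitingTime(pickupTime);
        }
        System.out.println("Mean wait: " + total / queue.size());
    }
}
```

Whether this saves enough memory depends on how many such entities the model creates per run.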

Related

Best CPU for GWT compile for a new build server

When building our current project, GWT compilation takes up a large share of the overall time (currently ~25 min overall, 2/3 of it GWT compile). We researched how to optimize that, but in the end we decided to buy a new build server. GWT compilation is quite a CPU-intensive task, so we ran some tests to analyze the improvement per core:
1 core = 197s
2 cores = 165s
3 cores = 149s
4 cores = 157s (can be that the last core was busy with other tasks)
Judging from those numbers, it seems that adding more cores doesn't necessarily improve performance, since the times flatten out.
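One way to put a number on that flattening is Amdahl's law, S(n) = 1 / ((1 - p) + p / n), which is not mentioned in the question and is brought in here only as an illustration. The sketch below computes the speedup for each measurement above and solves for the parallel fraction p it would imply. The class and method names are made up for this example.

```java
public class GwtScaling {
    /** Speedup relative to the single-core time. */
    public static double speedup(double singleCoreSeconds, double multiCoreSeconds) {
        return singleCoreSeconds / multiCoreSeconds;
    }

    /**
     * Parallel fraction p implied by Amdahl's law, S(n) = 1 / ((1 - p) + p / n),
     * solved for p given a measured speedup on n cores (n > 1).
     */
    public static double parallelFraction(double speedup, int cores) {
        return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores);
    }

    public static void main(String[] args) {
        double[] seconds = {197, 165, 149, 157}; // measured times for 1..4 cores
        for (int i = 1; i < seconds.length; i++) {
            int cores = i + 1;
            double s = speedup(seconds[0], seconds[i]);
            System.out.printf("%d cores: speedup %.2f, implied parallel fraction %.2f%n",
                    cores, s, parallelFraction(s, cores));
        }
    }
}
```

If the implied parallel fraction stays well below 1, as it appears to for these timings, extra cores can only shorten that part of the compile, which would be consistent with per-core speed mattering more.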
1.) Can anyone confirm or disprove that? That is, 8 or 12 cores don't necessarily make a difference, but the individual CPU speed (MHz) does?
2.) After seeing some benchmarks, our sales department tends toward buying Intel Xeon. Any experience with AMD? (I am more of an AMD guy, but currently it seems hard to disregard the benchmarks.)
3.) Any other suggestions regarding memory, I/O, etc. are welcome.
Update: When we get the new server I'll post the updated numbers...
We are using an AMD FX-8350 (@ 4.00 GHz) with a Samsung 830 Pro SSD, and we've set localWorkers=4 as well as -Xmx2048m. Previously we used an Intel Xeon E5-2609 (@ 2.40 GHz). The change reduced compilation time from ~440s to ~310s.
So we also found that raw CPU speed matters most for a single compilation process (with localWorkers=4). When multiple compilation processes run at the same time on this machine, an SSD reduces the I/O wait time, which grows with the number of concurrent compilation processes.
Our current hardware supports up to 4 Maven builds at the same time (each one with localWorkers=4) and then uses up to 20 GB of RAM. The build time increases with the number of concurrent builds, but the increase is not linear, so we try to fill the idle periods in which a single Maven process does not use all resources (Java class compiling, tests, ...).
After comparing hardware prices, we decided to buy a consumer PC and use it as a slave in our Jenkins build farm. The overall price is much lower than that of server hardware, and the machine can easily be replaced with a new one in case of a hardware failure.

NUMA awareness of JVM

My question concerns the extent to which a JVM application can exploit the NUMA layout of a host.
I have an Akka application in which actors concurrently process requests by combining incoming data with 'common' data already loaded into an immutable (Scala) object. The application scales well in the cloud, using many dual core VMs, but performs poorly on a single 64 core machine. I presume this is because the common data object resides in one NUMA cell and many threads concurrently accessing from other cells is too much for the interconnects.
If I run 64 separate JVM applications, each containing 1 actor, then performance is good again. A more moderate approach might be to run as many JVM applications as there are NUMA cells (8 in my case), giving the host OS a chance to keep the threads and memory together?
But is there a smarter way to achieve the same effect within a single JVM? E.g. if I replaced my common data object with several instances of a case class, would the JVM have the capability to place them on the optimal NUMA cell?
Update:
I'm using Oracle JDK 1.7.0_05, and Akka 2.1.4
I've now tried the UseNUMA and UseParallelGC JVM options. Neither seemed to have any significant impact on the slow performance when using one or a few JVMs. I've also tried using a PinnedDispatcher and the thread-pool-executor, with no effect. I'm not sure whether the configuration is having an effect, though, since nothing seems different in the startup logs.
The biggest improvement remains when I use a single JVM per worker (~50). However, the problem with this appears to be a long delay (up to a couple of minutes) before the FailureDetector registers the successful exchange of the 'first heartbeat' between Akka cluster JVMs. I suspect there is some other issue here that I've not yet uncovered. I already had to increase the ulimit -u, since I was hitting the default maximum number of processes (1024).
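Since the startup logs show nothing, one simple check is to ask the running JVM which arguments it actually received. The sketch below uses the standard `RuntimeMXBean.getInputArguments()` call; the class name `JvmFlagCheck` and the helper `hasFlag` are made up for this example. It only confirms that the flags reached the JVM, not that NUMA-aware allocation is making a difference.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class JvmFlagCheck {
    /** True if the exact flag (e.g. "-XX:+UseNUMA") appears among the JVM arguments. */
    public static boolean hasFlag(List<String> jvmArgs, String flag) {
        return jvmArgs.contains(flag);
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself (not the program arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("UseNUMA requested: " + hasFlag(jvmArgs, "-XX:+UseNUMA"));
        System.out.println("UseParallelGC requested: " + hasFlag(jvmArgs, "-XX:+UseParallelGC"));
    }
}
```

Running it with the same launcher configuration as the Akka application would show whether the options are picked up at all.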
Just to clarify, I'm not trying to achieve large numbers of messages, just trying to have lots of separate actors concurrently access an immutable object.
I think that if you are sure the problem is not in the message-processing algorithms, then you should take into account not only the NUMA option but the whole environment configuration: starting with the JVM version (latest is better; Oracle JDK also mostly performs better than OpenJDK), then the JVM options (including GC, memory, concurrency options, etc.), then the Scala and Akka versions (the latest release candidates and milestones can be much better), and also the Akka configuration.
From here you can borrow all the things that matter for reaching 50M messages per second of total throughput for Akka actors on contemporary laptops.
I have never had a chance to run these benchmarks on a 64-core server, so any feedback will be greatly appreciated.
One of my findings that may help: current implementations of ForkJoinPool increase message send latency as the number of threads in the pool increases. This is very noticeable when the rate of request-response calls between actors is high. For example, on my laptop, increasing the pool size from 4 to 64 makes the message send latency of Akka actors in such cases grow by up to 2-3x for most executor services (Scala's ForkJoinPool, JDK's ForkJoinPool, ThreadPoolExecutor).
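A crude way to see whether pool size affects hand-off latency on a given machine is to time a submit-and-wait round trip on JDK ForkJoinPools of different sizes. The sketch below is not the Akka benchmark referred to above; the class and method names are made up, and the timing method is only a rough proxy for request-response latency between actors.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;

public class PoolLatency {
    /** Average nanoseconds for one submit-and-wait round trip on a pool of the given size. */
    public static double avgRoundTripNanos(int parallelism, int iterations) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            // Warm-up so that thread creation and JIT compilation are not measured.
            for (int i = 0; i < iterations; i++) {
                pool.submit(() -> 1).get();
            }
            long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                pool.submit(() -> 1).get(); // hand a trivial task to the pool and wait for it
            }
            return (System.nanoTime() - start) / (double) iterations;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        for (int size : new int[] {4, 16, 64}) {
            System.out.printf("parallelism %d: %.0f ns per round trip%n",
                    size, avgRoundTripNanos(size, 20_000));
        }
    }
}
```

The absolute numbers will vary from run to run; only the trend across pool sizes is of interest.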
You can check if there are any differences by running mvnAll.sh with the benchmark.parallelism system variable set to different values.

Can two processes simultaneously run on one CPU core?

Can two processes run simultaneously on one CPU core that has hyper-threading? I have searched the Internet but have not found a clear, straight answer.
Edit:
Thanks for the discussion and sharing! My purpose in posting this question is not to discuss parallel computing; that topic is too big to cover here. I just want to know whether a multithreaded application can benefit more from hyper-threading than a multi-process application. After further reading, I have the following learning notes.
1) A Hyper-Threading Technology enabled CPU core has two sets of CPU state and interrupt logic, but only one set of execution units and cache. (I have not yet studied what a pipeline is.)
2) Multithreading benefits from Hyper-Threading only if some executing thread incurs latency. I think this maps exactly to the common reason why and when software programmers use multiple threads. If the multithreaded application has already been optimized, it may not gain any benefit from Hyper-Threading.
3) If the CPU state maps to process state, I believe Marc is correct that a multi-process application can benefit even more from Hyper-Threading technology.
4) When a CPU vendor says "thread", it looks like their "thread" is different from the thread that I know as a Java programmer?
No, a hyperthreaded CPU core still only has a single execution pipeline. Even though it appears as two CPUs to the overlying OS, there's still only ever one instruction being executed at any given time.
Hyperthreading was intended to allow the CPU to continue executing one thread while another thread was stalled waiting for a resource or other operation to complete, without leaving too many stages of the pipeline empty and useless. This goes back to the Pentium 4 days, with its absurdly long pipeline - a stall was essentially catastrophic for efficiency and throughput, and hyperthreading allowed Intel to keep the cpu busy doing other things while it cleaned up from the stall.
While Marc B's answer is pretty much the definitive summary of how HT works, I just want to make a small contribution by linking this article, which should clear up a lot of things about HT: http://software.intel.com/en-us/articles/performance-insights-to-intel-hyper-threading-technology/
Short answer, yes.
A single-core CPU (a processor) can run 2 or more threads simultaneously. These threads may belong to one program, or they may belong to different programs and thus to different processes. This type of multithreading is called Simultaneous MultiThreading (SMT).
The claim that a CPU core can execute only one instruction at any given time is also not true. Modern CPUs exploit Instruction Level Parallelism (ILP) by duplicating pipeline resources (e.g. 2 ALUs instead of 1). This type of pipeline is called a "superscalar" pipeline.
See the Wikipedia page on simultaneous multithreading.
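From the Java side, hardware threads usually show up as logical processors. The minimal sketch below prints the count the JVM reports through the standard `Runtime.availableProcessors()` call; on a machine with hyper-threading enabled this is typically the number of hardware threads rather than physical cores, though the exact value depends on the OS and any CPU restrictions placed on the process. The class name is made up for this example.

```java
public class LogicalCpus {
    /** Number of processors the JVM reports as available to it. */
    public static int logicalProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("Logical processors visible to the JVM: " + logicalProcessors());
    }
}
```

A Java thread is a software thread that the OS schedules onto one of these logical processors, which is why the two uses of "thread" differ.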

Determine maximum number of threads that run on different windows systems

Can anyone tell me if there is a way to find out the maximum number of threads that can run on different windows systems?
For example (an assumption): a 32-bit Windows system can run a maximum of 4000 threads.
I doubt there is a maximum number. Well, since we're using a finite amount of memory, it would be as many threads as you can fit into memory or as many as you can keep track of. Each system is different and I know Java and C don't have a function to provide this. C# can tell you how much memory a specific object/app needs so you could go calculate the estimate.
You could test this on your system. Write a sample app which spawns threads and see when you run out of memory. Use a counter to count them. This will give you roughly the range for your system.
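A minimal Java sketch of such a test app follows. It starts idle threads and counts them until either a cap is reached or the JVM reports that it cannot create another native thread. The class name, the method name, and the cap of 1,000 in `main` are arbitrary choices for this example; the cap is there so that the probe does not exhaust the machine by accident.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class ThreadLimitProbe {
    /** Starts idle threads until the limit is reached or the JVM cannot create more. */
    public static int spawnUpTo(int limit) {
        CountDownLatch release = new CountDownLatch(1);
        List<Thread> threads = new ArrayList<>();
        try {
            while (threads.size() < limit) {
                Thread t = new Thread(() -> {
                    try {
                        release.await(); // stay alive but idle until released
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                t.setDaemon(true);
                t.start(); // throws OutOfMemoryError when no more native threads can be created
                threads.add(t);
            }
        } catch (OutOfMemoryError e) {
            // The system refused to create another thread; the count so far is the estimate.
        }
        int started = threads.size();
        release.countDown(); // let all probe threads finish
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return started;
    }

    public static void main(String[] args) {
        // Raise the cap deliberately if you really want to find the machine's limit.
        System.out.println("Threads started: " + spawnUpTo(1_000));
    }
}
```

The result depends on the thread stack size, the available memory, and OS limits, so it is an estimate for one configuration rather than a fixed property of the Windows version.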
In Java, you can use an ExecutorService with a thread pool. Depending on which executor service you use, it can keep spawning threads as you submit more jobs.
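The difference between executor services can be sketched with two standard pools: a cached pool, which creates a new thread whenever all existing ones are busy, and a fixed pool, which queues the extra jobs instead. The class and method names below are made up for this example. The cast relies on both factory methods returning a `ThreadPoolExecutor`, which holds for the standard JDK implementation.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

public class PoolGrowth {
    /** Submits blocking jobs and returns the largest number of threads the pool ever held. */
    public static int largestPoolSize(ThreadPoolExecutor pool, int jobs) {
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < jobs; i++) {
            pool.execute(() -> {
                try {
                    release.await(); // keep the worker busy so it cannot be reused
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        int largest = pool.getLargestPoolSize();
        release.countDown(); // unblock all jobs
        pool.shutdown();     // queued jobs still run, then the threads exit
        return largest;
    }

    public static void main(String[] args) {
        ThreadPoolExecutor cached = (ThreadPoolExecutor) Executors.newCachedThreadPool();
        ThreadPoolExecutor fixed = (ThreadPoolExecutor) Executors.newFixedThreadPool(4);
        System.out.println("cached pool threads: " + largestPoolSize(cached, 10)); // 10
        System.out.println("fixed pool threads:  " + largestPoolSize(fixed, 10));  // 4
    }
}
```

With ten blocking jobs, the cached pool ends up with one thread per job, while the fixed pool never exceeds its configured four.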
A similar technique exists in C#.
A better question is: what is the maximum number of threads you can spawn while still avoiding thrashing?
Are you trying to take over the OS and do your own process/thread management? You should not be doing this.

What is your experience with Sun CoolThreads technology?

My project has some money to spend before the end of the fiscal year and we are considering replacing a Sun-Fire-V490 server we've had for a few years. One option we are looking at is the CoolThreads technology. All I know is the Sun marketing, which may not be 100% unbiased. Has anyone actually played with one of these?
I suspect it will be no value to us, since we don't use threads or virtual machines much and we can't spend a lot of time retrofitting code. We do spawn a ton of processes, but I doubt CoolThreads will be of help there.
(And yes, the money would be better spent on bonuses or something, but that's not going to happen.)
IIRC, the CoolThreads technology refers to the fact that, rather than just ramping the clock speed ever higher to improve performance, they are now looking at multi-core processors with hyperthreading, effectively giving you loads of processors on one chip. Overall, the available processing capacity is higher, but without the additional electrical power and air-conditioning requirements you would expect (hence "cool"). Its usefulness definitely depends on what you are planning to run on it. If you are running Apache with the multi-threaded core, it will love it, as it can run the individual response threads on the individual CPU cores. If you are simply running single-threaded processes, you will get some performance increase over a single-CPU box, but not as great (any old-fashioned non-mod_perl/mod_python CGID processes would still be sharing the CPU a bit). If your application consists of one single-threaded process running maxed out on the box, you will get very little improvement over a single-core CPU running at the same speed.
Peter
Edit:
Oh, and for a benchmark: we compared a T2000 in our server farm to our current V240s (they may have been V480s, I don't recall). The T2000 took the load of 12-13 of the older boxes in a live test without any OS tweaking for performance. As I said, Apache loves it :-)
Disclosure: I work for Sun (but as an engineer in client software).
You don't necessarily need multithreaded code to make use of these machines. Having multiple processes will make use of multiple hardware threads on multiple cores.
The old T1 processors (T1000 and T2000 boxes) did have only a single FPU, and weren't really suitable for tasks with much more than about 1% floating point. The newer T2 and T2+ processors have an FPU per core. That's probably still not great for massive floating point crunching, but is much more respectable.
(Note: Hyper-Threading Technology is a trademark of Intel. Sun uses the term Chip MultiThreading (CMT).)
We used Sun Fire T2000s for my last system. The boxes themselves far exceeded our capacity requirements in terms of processing power. For us, the decision was based on the lower power consumption and space requirement. We successfully ran WebSphere 6, Oracle 10g and SunONE Directory Server on the same box.
My info may be a bit out of date (I last used these servers 2 years ago), but as I recall, one big gotcha was that all the cores on a single CPU shared the same FPU, so if your code did a lot of floating point (we were doing GIS), the FPU was a massive bottleneck and you didn't get much benefit from the large number of threads.
For any process with high parallelism these machines (eg, the t1000/t2000) are great for their cost. I've been running oracle on them for about 18 months now and it works great.
If your task is single-threaded/single-process, then you'd be better off with a high-speed dual/quad-core Intel machine.
If your application has lots of threads/lots of processes then these machines will likely be great for it.
Best of all, Sun will send you one for 60 days to evaluate. That is what we did before committing to it; we ended up getting 2 T2000s and have recently purchased another 4 T1000s.
It hit me last night that our core processes aren't multi-threaded, but the machine in question does have a bunch of system processes that are. In particular, it acts as an NFS server. It sounds like running hundreds of processes will benefit from all those cores, as well.
I'll see if we can get a demo unit to test on first.
Sun has been selling the Niagara machines as all things to all comers. They do have their place, web services being the best deployment. We have run Oracle on some T2000s, and it worked well for highly parallelized operations. But the machines fall flat on single-threaded operations, where performance is rather bad. If you have floating-point work to do, look elsewhere. Even the newer chips with an FPU per core are inadequate. Also, these machines cannot take an enterprise-class pounding for long, and we've had reliability problems. Multi-core technology is more hype than substance. Sandia National Labs did research on it and found that four to eight cores is about the top end of usefulness and that a 16-core chip has the same throughput as a dual-core chip. So a 16-core chip is a waste of a lot of money. Also, as the number of cores increases, the clock speed must decrease because of the thermal wall. Most manufacturers will probably settle on quad-core chips until memory technology improves (you can't keep 16 cores fed with memory, so most of the cores are stalled). Finally, given the chaos at Sun, you'd do better to look elsewhere.