Which algorithms are the computational bottleneck for medical imaging applications? We are trying to figure out whether there is a benefit to running these algorithms on regular cloud server instances or on GPU-accelerated server instances.
Unless the software has been specifically designed with GPU processing in mind, GPU-accelerated instances will perform about the same as regular commodity server instances, only at a higher price.
I'm willing to gamble and say that the bottleneck of any algorithm, medical or not, imaging or not, is the rate at which you can feed data to the CPU, the number of cores, and the clock rate.
Get some fast CPUs, insanely fast RAM, blindingly fast striped/mirrored storage, and do it that way.
I suspect you'll find that running on "the cloud" is actually counter-intuitive, or at least counterproductive, as many cloud service providers don't tune their storage backends for high-performance computing, but rather for providing a little bit of I/O to the masses.
I think you'd be better off with owned, dedicated hardware; that way, you can spend your time and money on tuning the hardware stack to match your software stack. Any cloud service provider (including Amazon) will impose trade-offs and compromises.
Oh, and don't put all your eggs in one basket. What happens when Amazon goes offline and nobody can examine any X-rays, or when the poor schmuck who put a heart-monitoring application on Amazon cloud instances finds that Amazon has gone down in a massive outage?
Aside from the compromises of cloud hosting and the problems of being redundant and resilient to provider outages (i.e. not putting critical infrastructure on the cloud), there are other questions surrounding the architecture of your application itself. Will it scale linearly?
I bet it won't.
Benchmarking a GPU implementation against cloud server instances shows huge FPS differences [1, 2] for operations on large images (e.g., CR). On the other hand, the GPU's memory can become heavily occupied, which introduces delays and continuous frame dropouts. A cloud server solution can therefore be more stable, with fewer dropouts and a smoother feel, but at a lower FPS.
[1] Zhang, Lequan, et al. "A high-frequency, high frame rate duplex ultrasound linear array imaging system for small animal imaging." IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control 57.7 (2010).
[2] Miguez, D., et al. "A technical note on variable inter-frame interval as a cause of non-physiological experimental artefacts in ultrasound." Royal Society Open Science 4.5 (2017): 170245.
I model on an ancient PC and recently got some lab funds for a new modeling computer. The choice of processor confounds me. For optimal AnyLogic simulation modeling, should I focus on maxing out the single-core speed or the number of processor cores? Also, would a high-end graphics card help? I have heard from my engineering colleagues that for certain modeling tools they do help with the workload. Any advice helps. Thanks.
This is what AnyLogic answered when I asked for the perfect computer to buy:
The recommended platform for AnyLogic is a powerful PC/laptop running a 64-bit operating system (Windows preferable), plus a CPU with multiple cores (like an i7) and at least 8 GB of RAM.
In general, a faster CPU (3 GHz or more recommended) means faster single-run execution. More cores mean faster execution of experiments that run the model multiple times in parallel (optimization, parameter variation, Monte Carlo, etc.). Pedestrians and transporters also benefit from many cores (even in a single run, since the algorithm driving the movement of pedestrians and transporters uses all available cores). For the time being, AnyLogic doesn't support GPU processing. RAM is crucial when you have a lot of agents and many parallel runs (e.g. if a single run takes 1 GB, then 8 parallel runs will take 8 GB). For working with GIS maps, a good Internet connection may be needed, for example if the model requests a lot of routes from an online route provider.
On average, a middle-end PC/laptop is sufficient for most models; a high-end PC or server/instance will be useful for really heavy models.
Just to add to Felipe's reply: the graphics card is completely irrelevant; AnyLogic does not support outsourcing computations to its tensor cores.
Focus on decent processor speed and 8-12 cores, as well as at least 16 GB of RAM and (crucial!!) an SSD. Good to go :)
Oh, and you may want to use Windows. Linux and macOS seem to have more problems/bugs in AnyLogic than Windows does.
I am practicing System Design concepts and I am not clear on what configuration (CPU, memory, disk storage) to pick for an application instance. Also, how many instances are needed (assuming you are running your application on a Kubernetes cluster)?
For back-of-the-envelope calculations, I have seen examples of calculating TPS for read and write calls, bandwidth needs, database storage needs, etc., but I have not seen how to determine CPU and memory needs, or how many instances are enough. Is there a procedure that guides you through this problem?
My hunch says we pick a small-to-medium-sized server instance (if we use a cloud provider like AWS), run stress tests at the calculated TPS, observe CPU and memory usage, and then increase or decrease the server configuration based on the results?
I would greatly appreciate any inputs you may have.
I am not clear on what configuration (CPU, memory, disk storage) to pick for an application instance? Also, how many instances are needed (assuming you are running your application on a Kubernetes cluster)
This is mostly a question about economics. If resources were very cheap, you could use a lot of them - but unfortunately, they have an economic cost.
Scale out horizontally or scale up vertically
The first fundamental question to ask is: should you scale your app up vertically (e.g. to bigger instances), or should you scale it out horizontally?
The most important thing here is that scaling out horizontally is much easier. But whether you can scale out horizontally or have to scale up vertically depends on your app. If your app is a stateless web server, it typically is very easy to scale out, but if you have a stateful cache or database, scaling up vertically might be your only short-term option. Try to design so that you can scale out horizontally, since that is much easier.
Accurate sizing - use observability
To find the right size, use observability: investigate your bottlenecks and adjust accordingly.
E.g. if you allocate too little memory, your app will be terminated; if you allocate too little CPU, your response time will be slow. Just start somewhere and adjust.
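For the "how many instances" part, the adjustment rule that Kubernetes' own Horizontal Pod Autoscaler uses is a handy mental model: scale replicas in proportion to observed versus target load. A minimal sketch, with made-up numbers:

```scala
object ReplicaSizing {
  // Simplified version of the Horizontal Pod Autoscaler rule:
  // desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
  def desiredReplicas(currentReplicas: Int, currentMetric: Double, targetMetric: Double): Int =
    math.ceil(currentReplicas * currentMetric / targetMetric).toInt

  def main(args: Array[String]): Unit = {
    // Hypothetical observation: 4 pods running at 85% average CPU, target is 60%.
    println(desiredReplicas(4, 85.0, 60.0)) // => 6 replicas
  }
}
```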
In addition to Jonas's answer:
You have two approaches (which are not mutually exclusive):
Estimate your needs based on expected load, etc.
Adjust your needs based on what you observe in production.
Regarding the first approach:
Have you done any analysis into what your expected load is? E.g. how many users (unique sessions), how many requests on average per hour (page views, API calls, etc), potential peaks in activity leading to increased load, etc.
Have you done any benchmarking?
Have you looked at your system and what it does, and worked out if it has any specific resource (CPU, memory, disk, etc) needs?
Estimating resources ahead of time requires some knowledge (or informed guesses) regarding what the load will be, as per the 3 points above. Having an idea of what the daily or hourly request average is isn't a bad place to start.
Also make sure you are aware of any potential spikes that might catch you out (end of month for financial systems/services). Whether or not these are significant enough to be worth worrying about is another thing. A friend of mine was working on a ticketing system once, and they had massive traffic spikes for major events that did warrant serious scaling out and back... but your average system probably won't need to be that extreme.
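As a rough illustration of that kind of back-of-the-envelope estimate (every number below is a made-up assumption, not a measurement):

```scala
object LoadEstimate {
  def main(args: Array[String]): Unit = {
    val dailyActiveUsers   = 50000   // assumed
    val requestsPerUserDay = 40.0    // assumed average API calls per user per day
    val peakFactor         = 5.0     // assumed ratio of peak-hour rate to daily average

    val avgRps  = dailyActiveUsers * requestsPerUserDay / 86400.0  // seconds in a day
    val peakRps = avgRps * peakFactor

    println(f"average: $avgRps%.1f req/s, peak: $peakRps%.1f req/s")
    // => average: 23.1 req/s, peak: 115.7 req/s
  }
}
```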
CPU is probably only worth "worrying" about if you have anything that does above-average processing - this should be obvious through benchmarking or if you/your team has good knowledge of your code.
Disk usage can be calculated - e.g.
If on average a user generates 1 MB of data in a session (not including system logs), and you get 100 sessions a day, then that's 100 MB a day, 500 MB a working week, roughly 2 GB a month, etc.
If a user profile takes on average 200 KB of data and 300 KB of storage space (images), then you can calculate that too.
You can also do this for records, especially for records that you know are "large" (e.g. >25 MB) or where there will be lots of them (e.g. millions).
You can also start to forecast growth over time if you allow for a growth rate (e.g. in the number of users and their sessions, and the amount of data generated). A simple way to do that is to have a spreadsheet with some simple formulas that take various inputs like number of users, average requests per user, disk space per user, etc. You can then do what-if modelling by playing with the inputs - a short script like the sketch below works just as well.
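For example, a minimal what-if sketch along those lines (all inputs are illustrative assumptions, the kind of numbers you would otherwise keep in spreadsheet cells):

```scala
object StorageForecast {
  def main(args: Array[String]): Unit = {
    val startingUsers     = 10000.0  // assumed current user count
    val monthlyGrowth     = 0.05     // assumed 5% user growth per month
    val mbPerProfile      = 0.5      // assumed profile + images, in MB
    val mbPerUserPerMonth = 20.0     // assumed data generated per user per month
    val months            = 12

    var users   = startingUsers
    var totalMb = users * mbPerProfile
    for (_ <- 1 to months) {
      val newUsers = users * monthlyGrowth
      totalMb += newUsers * mbPerProfile    // profiles for this month's new users
      users   += newUsers
      totalMb += users * mbPerUserPerMonth  // data generated by everyone this month
    }
    println(f"after $months months: ~${users.toInt} users, ~${totalMb / 1024}%.1f GB of data")
  }
}
```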
In terms of the second approach - as Jonas says, observe and adjust. Make sure you know how to do that, and that your solution provides the data you need. This might be via metrics provided by your cloud provider (if applicable) or instrumentation/reporting you have custom-built into your solution.
Scaling up is probably more relevant in scenarios where you have a central point/resource that cannot be scaled out, like a central database.
What is the von Neumann bottleneck and how does functional programming reduce its effect? Can someone explain in a simple way, through a practical and comprehensive example, the advantage of using Scala over Java, if there is any?
More importantly, why is avoiding imperative control structures and preferring functions so significant to improving performance? Ideally, an actual coding example that shows how a problem solved with a function and without one is affected by the von Neumann bottleneck would be very helpful.
Using Scala will not necessarily fix your performance problems, even if you use functional programming.
More importantly, there are many causes of poor performance, and you don't know the right solution without profiling.
The von Neumann bottleneck has to do with the fact that, in a von Neumann architecture, the CPU and memory are separate, and therefore the CPU often has to wait for memory. Modern CPUs address this by caching memory. This isn't a perfect fix, since it requires the CPU to guess correctly about which memory it needs to cache. However, high-performance code makes it easy for the CPU to guess correctly by structuring data efficiently and iterating over data linearly (i.e. good data locality).
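To make "good data locality" concrete, here is a toy micro-benchmark (not a rigorous one - JIT warm-up and GC add noise) that sums the same matrix in cache-friendly and cache-hostile order; the class name and sizes are just illustrative:

```scala
object Locality {
  def main(args: Array[String]): Unit = {
    val n = 4000
    val matrix = Array.fill(n, n)(1L)  // ~128 MB of Longs

    def time(label: String)(body: => Long): Unit = {
      val start  = System.nanoTime()
      val result = body
      println(f"$label: sum=$result, ${(System.nanoTime() - start) / 1e6}%.1f ms")
    }

    // Row-major: walks each inner array linearly, easy for the CPU to prefetch.
    time("row-major") {
      var sum = 0L
      for (i <- 0 until n; j <- 0 until n) sum += matrix(i)(j)
      sum
    }

    // Column-major: hops between inner arrays on every access, defeating the cache.
    time("column-major") {
      var sum = 0L
      for (j <- 0 until n; i <- 0 until n) sum += matrix(i)(j)
      sum
    }
  }
}
```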
Scala can simplify parallel programming, which is probably what you are looking for. This is not directly related to the von Neumann Bottleneck.
Even so, Scala is not automatically the answer if you want to do parallel programming. There are several reasons for this.
Java is also capable of parallel programming, and has many types of parallel collections for that purpose.
Java 8 Streams are Java's answer to Scala's parallel collections. They can be used for functional programming.
Parallel programming is not guaranteed to improve performance, and can make a program slower on small data sets, due to setup costs.
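A minimal sketch of the sequential-versus-parallel point, using Scala parallel collections (this assumes the scala-parallel-collections module is on the classpath for Scala 2.13+; on 2.12 `.par` is built in and the import is not needed):

```scala
import scala.collection.parallel.CollectionConverters._

object ParallelDemo {
  def main(args: Array[String]): Unit = {
    val xs = (1 to 2000000).toVector

    val sequential = xs.map(x => x.toLong * x).sum      // runs on one core
    val parallel   = xs.par.map(x => x.toLong * x).sum  // split across all cores

    println(sequential == parallel)  // same result; only the scheduling differs
    // Note: for small collections the parallel version is often slower,
    // because splitting the work and merging the results has a fixed cost.
  }
}
```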
There is one case where you are correct that Scala overcomes the von Neumann Bottleneck, and that is with big data. When the data won't fit easily on a single machine, you can store the data on many machines, such as a Hadoop cluster. Hadoop's distributed filesystem is designed to keep data and CPUs close together to avoid network traffic. The easiest way to program for Hadoop is currently with Apache Spark in Scala. Here are some Spark examples; as of Spark 2.x, the Scala examples are much simpler than the Java examples.
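For flavour, here is the canonical Spark word count in Scala - a sketch only; the app name and input path are placeholders, not anything specific to the linked examples:

```scala
import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("word-count-sketch")   // placeholder name
      .getOrCreate()

    // Spark ships the computation to wherever the data lives,
    // which is how it sidesteps moving big data over the network.
    val counts = spark.sparkContext
      .textFile("hdfs:///data/corpus/*.txt")  // placeholder path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1L))
      .reduceByKey(_ + _)

    counts.take(10).foreach(println)
    spark.stop()
  }
}
```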
On DigitalOcean I came across this message when I wanted to add swap:
Although swap is generally recommended for systems utilizing traditional spinning hard drives, using swap with SSDs can cause issues with hardware degradation over time. Due to this consideration, we do not recommend enabling swap on DigitalOcean or any other provider that utilizes SSD storage. Doing so can impact the reliability of the underlying hardware for you and your neighbors. This guide is provided as reference for users who may have spinning disk systems elsewhere.
If you need to improve the performance of your server on DigitalOcean, we recommend upgrading your Droplet. This will lead to better results in general and will decrease the likelihood of contributing to hardware issues that can affect your service.
Why is that? I thought swap was necessary for creating a stable server (i.e. for not running into memory issues).
I believe that here's your answer.
Early SSDs had a reputation for failing after fewer writes than HDDs. If the swap was used often, then the SSD may fail sooner. This might be why you heard it could be bad to use an SSD for swap.
Modern SSDs don't have this issue, and they should not fail any faster than a comparable HDD. Placing swap on an SSD will result in better performance than placing it on an HDD due to its faster speeds.
I believe this is referring to the fact that SSDs have a relatively limited lifetime, measured in the number of times data can be written to each memory location. Although that number has gotten big enough that using SSDs as storage drives should not be a concern anymore, swap, as an overflow for RAM, can potentially be written to quite frequently, thus reducing the overall life of the SSD.
SSD endurance is measured in so-called DWPD units, which stands for Drive full Writes Per Day. For the mobile, client, and enterprise storage market segments, DWPD requirements are very different. SSD vendors usually state the warranty as, for example, 0.8 DWPD / 3 years or 3.0 DWPD / 5 years. The first example means that writing 80% of the drive capacity every single day gives a 3-year lifetime. Technically you can kill a 480 GB drive (say, with a 1 DWPD / 3 years warranty) within about 12 days by performing non-stop write access at 500 MB/s.
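That 12-day figure follows directly from the rating; a quick sketch of the arithmetic:

```scala
object EnduranceMath {
  def main(args: Array[String]): Unit = {
    val capacityGB   = 480.0
    val dwpd         = 1.0       // rated drive writes per day
    val warrantyDays = 3 * 365   // 3-year warranty period

    val ratedWritesGB   = capacityGB * dwpd * warrantyDays  // total rated writes: ~525,600 GB
    val writeSpeedGBs   = 0.5                               // 500 MB/s sustained writes
    val gbWrittenPerDay = writeSpeedGBs * 86400             // ~43,200 GB per day

    println(f"rated endurance: $ratedWritesGB%.0f GB")
    println(f"days to exhaust it at 500 MB/s: ${ratedWritesGB / gbWrittenPerDay}%.1f")
    // => about 12 days
  }
}
```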
SSDs offer much higher throughput than HDDs, but at the same time a rather low endurance level. This is partly due to the physical structure of the media and the mapping. For example, when writing 1 GB of user data to an HDD, the physical media internally receives around 10% more data (metadata, error-protection data, etc.). The ratio between the host data amount and the internal data amount is called the Write Amplification Factor (WAF). By comparison, an SSD may need to write 4 times more data than it received from the host. Pure random access is the worst-case scenario, where writing 1 GB of host data results in 4 GB being written to the internal flash media. With purely sequential write access, the WAF for SSDs will be close to 1.0, as for HDDs.
Enabling system swap and using it intensively (probably due to a DRAM shortage) will generate more random access to the SSD, and endurance will degrade more quickly than with swap disabled. Unless you are running an enterprise system with non-stop I/O traffic to the SSD, I would not expect enabling swap to affect SSD endurance much. You can always monitor the SSD SMART health parameter called "SSD Life Left"; seeing how it changes over time with and without swap enabled will help you make a decision.
There is another question with this same title, but the question is asked differently than what's troubling me, and the answer is not sufficient.
The most prominent analogies I hear to explain bandwidth are the highway example and the pipe example. In the highway example, bandwidth is the number of cars that can drive on the highway in a given amount of time, and in the pipe example it's the amount of water that can flow through.
My question is - by measuring by cars per second, or liters per second, does that mean that a longer highway, pipe or copper wire has a higher bandwidth than a shorter one? That seems strange to me.
Wouldn't it make more sense to give the highway's bandwidth as the number of lanes it has, irrespective of a unit of time? It just makes more sense to me, and is simpler, to say that the pipe is "1 foot in diameter" rather than "it carries 100 litres per second".
Why do we measure bandwidth in bits per second and not just in bits?
"My question is - by measuring by cars per second, or liters per second, does that mean that a longer highway, pipe or copper wire has a higher bandwidth than a shorter one?"
No!
Bandwidth is not about how many cars can fit on the road. It's about how many cars can pass a point on the road during a certain time. How many cars per second can pass under a bridge, for example.
No, it wouldn't. You quote a highway in terms of lanes, because it's more understandable, and a reasonable approximation to assume 4 lanes = 4x as much traffic. But even then, you might have a traffic jam, and then 4 lanes is 'transmitting' fewer cars per minute than it would otherwise.
With a hose pipe, the width of the pipe determines the rate of transmission, if you assume the same water pressure.
These assumptions don't apply to communications - when I'm transmitting 'a bit' nothing physical is moving *. A 'bit' is the smallest piece that 'information' can be broken down into, and in order to transmit it, something needs to change.
If I turn on my torch and shine it at you, I've sent one 'message' (my torch is on). To send you anything more detailed, I would need to turn it off and on again - Morse code is an example of doing this. The pattern of switching it off and on gives you some letters. How fast I can switch it off and on again is how fast I can send a message.
So it is with bandwidth. I need to change things to communicate. If I can change things faster, I can communicate faster.
"bits" would be a measure of the number of torches I own. Bits per second is how fast I can flick them on and off to send a message.
* Electrons and photons do move, as does air to carry sound. But the signal isn't the thing moving - I don't have to move an atom of air from my mouth to your ear to 'talk' to you, the wave propagates through the medium.
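To put numbers on it (the file size and link speed here are arbitrary examples): what matters in practice is how long a transfer takes, and that comes straight from the rate, not from any count of bits.

```scala
object TransferTime {
  def main(args: Array[String]): Unit = {
    val fileSizeBytes  = 1024L * 1024 * 1024  // a 1 GiB file (example size)
    val linkBitsPerSec = 100000000L           // a 100 Mbit/s link (example rate)

    val fileSizeBits = fileSizeBytes * 8
    val seconds      = fileSizeBits.toDouble / linkBitsPerSec

    println(f"$seconds%.1f seconds to transfer")  // ~85.9 s on this link
  }
}
```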