Matlab, parallel computing and Amazon EC2 - matlab

I'm completely new to Amazon EC2. I have read the website documentation but I'm still confused.
At the moment I'm estimating my model using MATLAB R2014b. In my MATLAB code I use parallel computing ("parfor") on the local cluster. I run my model on my university's HPC, which gives me access to 1 node with 40GB of memory and 12 cores.
My questions are the following:
(1) Does Amazon EC2 offer a machine with more than 40GB of memory and 12 cores where I can run my Matlab code?
(2) Prices and instructions?

Short answer: yes. The c3.8xlarge instance type, for example, comes with 60GB of RAM and 32 vCPUs (each vCPU is a hyperthread rather than a full physical core).
Per hour pricing is available here: https://aws.amazon.com/ec2/pricing/ along with all the other sizes and options.

Related

Cockroach DB is slower in 64GB RAM cluster as compared to 32GB RAM cluster

I have installed CockroachDB on the following clusters:
a. a 3-node cluster in which every node has 32GB RAM
b. a 3-node cluster in which every node has 64GB RAM
I am testing performance by running the same queries (select, join, insert, delete, aggregate functions, nested queries, concurrent queries) on both clusters.
After running the tests 3 times, I found that the 64GB cluster is slower than the 32GB cluster.
I was expecting the 64GB RAM cluster to be faster than the 32GB RAM cluster.
I have not been able to find an explanation for this.
Any answers or insights would be greatly appreciated.
Thanks in Advance!
Thanks for the post! Without knowing the exact machine specs, configuration settings, and workload it would be hard to figure out what could be happening here :)
We have a Slack channel that might be a bit easier for back-and-forth on performance optimization, or you could open a support ticket, which comes with a best-effort SLA. If you have any more information on the run configuration, that would be great too!
https://www.cockroachlabs.com/join-community/
https://support.cockroachlabs.com/hc/en-us

Cassandra and MongoDB minimum system requirements for Windows 10 Pro

RAM: 4GB
Processor: Intel Core i3-5010U CPU @ 2.10 GHz
OS: 64-bit
Can Cassandra and MongoDB be installed on such a laptop? Will it run successfully?
The proposed hardware does not meet the minimum requirements. For Cassandra, the documentation calls for a minimum of 8GB of RAM and at least 2 cores.
MongoDB's documentation also states that it needs at least 2 real cores or one multi-core physical CPU. With 4GB of RAM, WiredTiger will by default allocate 1.5GB for its cache (half of what remains after subtracting 1GB from total RAM). Also note that on hardware with Non-Uniform Memory Access (NUMA), MongoDB's production notes recommend enabling memory interleaving, which on Windows is a BIOS setting; such a change can affect the performance of other processes on the laptop.
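As a rough check of the 1.5GB figure, here is a minimal Python sketch of WiredTiger's documented default cache sizing rule: the larger of 50% of (RAM - 1GB) and 256MB. The function name is made up for illustration, and the sketch ignores any explicit cache size setting or container memory limits.

```python
def default_wiredtiger_cache_gb(total_ram_gb: float) -> float:
    """Default WiredTiger cache size in GB for a host with total_ram_gb of RAM.

    Rule as documented by MongoDB: the larger of 50% of (RAM - 1 GB) and 256 MB.
    """
    half_of_ram_minus_one = 0.5 * (total_ram_gb - 1.0)
    floor_gb = 0.25  # 256 MB expressed in GB
    return max(half_of_ram_minus_one, floor_gb)


if __name__ == "__main__":
    # Print the default cache size for a few example RAM sizes.
    for ram_gb in (1, 4, 8):
        print(f"{ram_gb} GB RAM -> {default_wiredtiger_cache_gb(ram_gb)} GB cache")
```

On the 4GB laptop from the question this leaves roughly 2.5GB for Windows, MongoDB's other memory use, and everything else running on the machine.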
Will it run successfully?
This depends on the workload you expect to run. There are documented examples of Cassandra installed on a Raspberry Pi array; by design, such a cluster was expected to be slow and to hold only a limited amount of data.
If you are looking for a small sandbox to start using these databases, there are other options. MongoDB has a database-as-a-service offering named Atlas, with a free tier that provides a 3-node replica set and up to 512MB of storage. For Cassandra there are similar options: AWS offers a small cluster of its Managed Cassandra Service (MCS) in the free tier, and DataStax is also planning to offer a similar service with Constellation.

MongoDB Response Slower on Amazon EC2 Server Than Localhost

I'm loading the same amount of data (~100kb) on both my local server and a test Amazon EC2 server, but the response is 2x slower on EC2. Both are running Apache 2 and MongoDB on the same machine. On my local server, the response is about 209ms versus approximately 455ms on EC2.
I've set up a simple query and AJAX call that grabs point data to display on the map based on the current viewport of the device.
How can I debug this issue? How can I make it as fast as my local server? I even tried experimenting with different instance types to make sure the specs are the same, but no luck. I also realize it could be because of network latency.
Local computer specs:
Intel Core i5 @ 3.30GHz
8GB RAM
64-bit Windows 8
Amazon EC2 specs (m4.large):
2.4 GHz Intel Xeon Haswell (2 vCPUs)
8GB RAM
Amazon Linux
A remote query to EC2 is unlikely to return the result of the AJAX query as fast as your local server because it has network latency, while your local server does not. Measure the time in your AJAX handler from the start of the query to the point where it is ready to return data to get a meaningful baseline for comparison.
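A minimal sketch of that server-side measurement, in Python for illustration (the original stack is Apache plus MongoDB, and the handler language is not stated). `fetch_points` is a hypothetical stand-in for the real MongoDB query; only the timing wrapper matters here.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Run fn and return (result, elapsed milliseconds) measured on the server."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


def fetch_points(viewport):
    # Hypothetical stand-in for the real MongoDB query in the AJAX handler.
    return [{"lat": viewport["south"], "lng": viewport["west"]}]


if __name__ == "__main__":
    points, ms = timed_call(fetch_points, {"south": 40.0, "west": -74.0})
    print(f"query returned {len(points)} point(s) in {ms:.3f} ms")
```

Logging this number on both machines separates the database time from the network round trip seen by the browser.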
MongoDB is very sensitive to data being in RAM vs on disk. Depending on how you configured your EC2 instance, and on your local hardware, chances are pretty good that your local hardware is faster. EC2 instances can be configured to use SSD storage, and you can configure a guaranteed IOPS figure.
Is the 100KB the size of the result set, or the amount of data needed to form the result set? If you process 4GB of data down to get a 100KB result set, there's a good chance that disk IO is involved. If the amount of data you need to pull is small, repeat the test a few times to ensure data is entirely in RAM.
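One way to run that repeated test, sketched in Python under the assumption that the query can be wrapped in a zero-argument callable (`time_repeated` and `summarize` are illustrative names, not part of any library). The first run may hit disk; later runs are more likely to be served from RAM.

```python
import statistics
import time


def time_repeated(fn, runs=5):
    """Call fn `runs` times; return per-run wall-clock times in milliseconds."""
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return timings_ms


def summarize(timings_ms):
    """Separate the first (possibly cold) run from the later (likely warm) runs."""
    if len(timings_ms) < 2:
        raise ValueError("need at least two runs to compare cold and warm timings")
    first, rest = timings_ms[0], timings_ms[1:]
    return {"first_ms": first, "warm_median_ms": statistics.median(rest)}
```

A first run that is much slower than the warm median points at disk I/O; similar numbers across runs suggest the data was already in memory.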
Finally, if both local and EC2 are pulling data from RAM, there's a good chance that your local CPU core is just faster than the EC2 CPU core, and that your RAM access is faster as well. EC2 is designed to provide low-cost commodity hardware. Developer setups are often much faster.
If you cannot account for the speed differences given the factors above, update your question with time measurements that exclude network latency and provide more detailed specifications about your hardware. Also indicate whether the data you are retrieving from MongoDB should fit entirely in RAM, given its size and the amount of RAM on your instance.

Connecting laptop(s)/desktop(s) to form a MATLAB computing cluster?

I have experience running parallel jobs on a remote cluster, and parallel (parfor) jobs on a single local machine, but I have never tried making a cluster of my own. I have access to a couple of laptops/desktops/servers (root access on all except one server), and was wondering if I could connect them all (or some) to form a local cluster (about 30 cores total).
Once you move beyond working with one machine, you move from the Parallel Computing Toolbox license to a Distributed Computing Server license. The licenses are sold in packs of 8 workers and up. List price for an 8-worker cluster is $6K; 32 workers are $21K. You can get more information on the MathWorks product page. Also note that submitting jobs to the workers still requires the Parallel Computing Toolbox.
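To put those list prices in per-worker terms, here is a small arithmetic sketch in Python. It uses only the two prices quoted above, which may be out of date, and assumes for illustration that these are the only pack sizes on offer; other sizes may exist.

```python
# List prices quoted in the answer above (USD); they may be out of date.
# Assumption: only these two pack sizes are considered.
PACKS = {8: 6_000, 32: 21_000}


def cost_per_worker(workers: int) -> float:
    """List price divided by the number of workers in the pack."""
    return PACKS[workers] / workers


def smallest_pack_for(cores: int) -> int:
    """Smallest quoted pack size that covers the requested number of cores."""
    candidates = [size for size in sorted(PACKS) if size >= cores]
    if not candidates:
        raise ValueError(f"no quoted pack covers {cores} cores")
    return candidates[0]


if __name__ == "__main__":
    for size in sorted(PACKS):
        print(f"{size} workers: ${cost_per_worker(size):.2f} per worker")
    print(f"30 cores -> {smallest_pack_for(30)}-worker pack")
```

Under these assumptions, the roughly 30 cores mentioned in the question would call for the 32-worker pack.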
Once you have the worker licenses, the only supported way to distribute jobs to the workers is through a scheduler. The server licenses come with a basic MathWorks scheduler that has some limitations but is well suited to single users or small groups. Beyond that you would need one of the higher-end schedulers such as LSF. A full list of supported schedulers is on the product page. Moving from a PCT setup on a single machine to a distributed setup can be fairly involved.
Are you prepared to pay the license cost for this? You can use a local cluster (up to 8 workers) with 1 copy of the Parallel Computing Toolbox license. But to use a distributed cluster, you need a distributed computing license for each "node" (processor core) on the cluster. I'm not familiar with how to set this up. I have access to a few of these clusters, and I also use local clusters extensively. We opted not to create our own distributed cluster because of the license cost. We also have data showing that distributed clusters were slow for our particular tasks (a lot of file I/O was happening in our case).
I know this doesn't answer your question, just a few things to think about.

How to setup matlabpool for multiple processors?

I just set up an Extra Large Heavy Computation EC2 instance to throw at my Genetic Algorithms problem, hoping to speed things up.
This instance has 8 Intel Xeon processors (around 2.4 GHz each) and 7 GB of RAM.
On my machine I have an Intel Core Duo, and MATLAB is able to work with my two cores just fine by running:
matlabpool open 2
On the EC2 instance, though, MATLAB is only capable of detecting 1 of the 8 processors, and if I try running:
matlabpool open 8
I get an error saying that the ClusterSize is 1 since there's only 1 core on my CPU. True, there is only 1 core on each CPU, but I have 8 CPUs on the given EC2 instance!
So the difference between my machine and the EC2 instance is that I have 2 cores on a single processor locally, while the EC2 instance has 8 distinct processors.
My question is, how do I get MATLAB to work with those 8 processors?
I found this paper, but it seems related to setting up matlab with multiple EC2 instances (not related to multiple processors on the same instance, EC2 or not), which is not my problem.
Any help appreciated!
Note: the point is not EC2, I am remoting into it and running matlab on it as if it was any other machine. The point is that I can't get matlab to see the 8 processors!
MATLAB isn't seeing all 8 cores. Set it manually. Parallel menu -> Manage Configurations. Right-click on the "local" line. In the scheduler tab, set the "Number of workers available to scheduler" to 8.
Original answer was a question getting more detail:
Are you trying to use MDCS on EC2 (and MATLAB's user interface on your PC), or are you trying to run MATLAB's user interface and PCT on EC2 (via ssh or vnc or the like)?
This post adds information in response to one part of the original poster's question:
[OP] I found this paper, but it seems related to setting up matlab with multiple EC2 instances (not related to multiple processors on the same instance, EC2 or not)...
The paper mentioned above is no longer available.
In its place, MathWorks offers MATLAB users a way to set up and distribute computations on a cluster running MATLAB Distributed Computing Server (MDCS) on Amazon EC2. More information is available here: http://www.mathworks.com/ec2