Large data file in MATLAB doesn't load/import

I have been trying to load a data file (CSV) into 64-bit MATLAB running on Windows 7 (64-bit), but I get memory-related errors. The file is about 3 GB and contains the date (dd/mm/yyyy hh:mm:ss) in the first column and the bid and ask prices in two more columns. The memory command returns the following:
Maximum possible array: 19629 MB (2.058e+010 bytes) *
Memory available for all arrays: 19629 MB (2.058e+010 bytes) *
Memory used by MATLAB: 522 MB (5.475e+008 bytes)
Physical Memory (RAM): 16367 MB (1.716e+010 bytes)
* Limited by System Memory (physical + swap file) available.
Can somebody please explain why, if the maximum possible array size is 19.6 GB, MATLAB throws a memory error while importing a data array that is only about 3 GB? Apologies if this is a simple question for the experienced; I have little experience with process/application memory management.
I would greatly appreciate it if someone could also suggest a way to load this dataset into the MATLAB workspace.
Thank you.

I am no expert in memory management but from experience I can tell you that you will run into all kinds of problems if you're importing/exporting 3GB text files.
I would either use an external tool to split your data before you read it, or look into storing the data in another format that is better suited to large datasets. Personally, I have used HDF5 in the past. It is designed for large sets of data and is also supported by MATLAB.
In the meantime, these links may help:
Working with a big CSV file in MATLAB
Handling Large Data Sets Efficiently in MATLAB
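To illustrate the HDF5 suggestion, here is a minimal sketch rather than a definitive recipe. It assumes the numeric columns have already been parsed; the dataset name `/prices`, the chunk size, and the random bid/ask values are made up for the example. It writes a two-column array to an HDF5 file and then reads back only a slice of rows, so the whole dataset never has to be in memory at once.

```matlab
% Hypothetical stand-in for parsed bid/ask data (values are invented).
h5File = [tempname '.h5'];
nRows  = 5000;
prices = [1.10 + rand(nRows, 1) * 0.01, 1.11 + rand(nRows, 1) * 0.01];

% Create a dataset whose first dimension is unlimited, so that more rows
% could be appended later. Unlimited dimensions require a chunk size.
h5create(h5File, '/prices', [Inf 2], 'ChunkSize', [1000 2]);

% Write the block starting at row 1, column 1.
h5write(h5File, '/prices', prices, [1 1], size(prices));

% Read back only rows 101..150 (start = [101 1], count = [50 2]).
part = h5read(h5File, '/prices', [101 1], [50 2]);

delete(h5File);   % clean up the temporary file
```

Appending a later block would use the same `h5write` call with a different start row.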

I've posted before showing how to use memmapfile() to read huge text files in matlab. This technique may help you as well.
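A rough sketch of the memmapfile() idea, not the code from the earlier post: the file is mapped as raw bytes, line breaks are located, and only the wanted lines are converted to numbers. The tiny three-column file and its values are invented for the example, and the date column is left out to keep the parsing simple.

```matlab
% Write a tiny text file as a stand-in for the large CSV (invented values).
txtFile = [tempname '.csv'];
bid = 1.10 + (0:4) * 0.01;
ask = 1.20 + (0:4) * 0.01;
fid = fopen(txtFile, 'w');
fprintf(fid, '%d,%.2f,%.2f\n', [1:5; bid; ask]);
fclose(fid);

% Map the file as bytes; nothing is read until m.Data is indexed.
m = memmapfile(txtFile, 'Format', 'uint8');

% Locate line feeds (ASCII 10). For a multi-gigabyte file this search
% should itself be done block by block rather than on all of m.Data.
newlines = find(m.Data == 10);

% Convert only the first three lines to text and parse them.
chunk = char(m.Data(1:newlines(3))');
vals  = sscanf(chunk, '%f,%f,%f', [3 Inf])';   % one row per line

clear m           % release the mapping before deleting the file
delete(txtFile);
```

The same indexing pattern lets you walk through the file a block of lines at a time and keep only the parsed numbers.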

Related

Matlab issues with large image files

I am currently facing a new batch of hyperspectral images in MATLAB. Usually I have no problems, but this time images of around 2000x2000x324 (double) have flooded me with "Out of memory" messages for all the image processing and statistical analyses I was happily doing before.
My PC is relatively good (Intel Core i7 with 15.8 GB of RAM in total, although only 7 GB is reported as available), so I am surprised.
I was wondering if you have any suggestions or strategies for dealing with such files, or whether I am missing something here.
It has been suggested that I switch to Python for larger files, but I have had an intimate relationship with MATLAB for a while now.
Thank you very much for your support.
E

Error using zeros Out of memory

When I try running
Adj = zeros(x*y);
I am receiving the following error:
Error using zeros
Out of memory. Type HELP MEMORY for your options.
where x*y = 37901. The occupancy of my PC storage is: (screenshot not included)
I know the C drive doesn't have much space but 34.2 GB should be more than enough for creating a 37901*37901 matrix.
When I run the memory command this is what I got:
>> memory
Maximum possible array: 4825 MB (5.059e+09 bytes) *
Memory available for all arrays: 4825 MB (5.059e+09 bytes) *
Memory used by MATLAB: 12369 MB (1.297e+10 bytes)
Physical Memory (RAM): 12218 MB (1.281e+10 bytes)
* Limited by System Memory (physical + swap file) available.
How can I solve this issue? (I am using MATLAB 2017b)
On the coding side, variables are normally stored in memory (your computer's RAM) rather than on the hard disk. That's what the error is complaining about: you don't have enough memory to store the variable you want to allocate.
The default numerical variable used by Matlab is double, which is used to represent double precision floating-point values and takes up 8 bytes of memory. Hence, you are trying to allocate:
37901 * 37901 * 8 = 11491886408 bytes
~= 10.7 gigabytes
That is more than you can allocate: you only have about 11.9 gigabytes of physical memory, and MATLAB is telling you that it can't allocate an array larger than about 4.7 gigabytes. As a workaround, I suggest you take a look at Tall Arrays, a MATLAB feature tailored to handling very large data sets:
Tall arrays are used to work with out-of-memory data that is backed by
a datastore. Datastores enable you to work with large data sets in
small chunks that individually fit in memory, instead of loading the
entire data set into memory at once. Tall arrays extend this
capability to enable you to work with out-of-memory data using common
functions.
What is a Tall Array?
Since the data is not loaded into memory all at once, tall arrays can be arbitrarily large in the first dimension
(that is, they can have any number of rows). Instead of writing
special code that takes into account the huge size of the data, such
as with techniques like MapReduce, tall arrays let you work with large
data sets in an intuitive manner that is similar to the way you would
work with in-memory MATLAB® arrays. Many core operators and functions
work the same with tall arrays as they do with in-memory arrays.
MATLAB works with small chunks of the data at a time, handling all of
the data chunking and processing in the background, so that common
expressions, such as A+B, work with big data sets.
Benefits of Tall Arrays
Unlike in-memory arrays, tall arrays typically remain unevaluated until you request that the calculations
be performed using the gather function. This deferred evaluation
allows you to work quickly with large data sets. When you eventually
request output using gather, MATLAB combines the queued calculations
where possible and takes the minimum number of passes through the
data. The number of passes through the data greatly affects execution
time, so it is recommended that you request output only when
necessary.
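The deferred evaluation described in the quoted documentation can be sketched as follows. This assumes R2016b or later (when tall arrays were introduced); the small CSV file and the column names `id` and `value` are invented for the example and stand in for data too large for memory.

```matlab
% Write a small CSV as a stand-in for an out-of-memory data set.
csvFile = [tempname '.csv'];
T = table((1:1000)', rand(1000, 1), 'VariableNames', {'id', 'value'});
writetable(T, csvFile);

mapreducer(0);             % run tall computations in the local session

ds = datastore(csvFile);   % reads the file in chunks, not all at once
tt = tall(ds);             % tall table backed by the datastore

m = mean(tt.value);        % queued only; nothing is computed yet
mValue = gather(m);        % one pass through the data, result in memory

delete(csvFile);
```

Note that a tall array needs a datastore behind it, so this helps with data that lives in files; it is not a drop-in replacement for a `zeros(n)` call.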

loading large csv files in Matlab

I have CSV files of about 6 GB and I tried using the import function in MATLAB to load them, but it failed due to memory issues. Is there a way to reduce the size of the files?
I think the number of columns is causing the problem. I have 133076 rows by 2329 columns. I had another file with the same number of rows but only 12 columns, and MATLAB could handle that. However, once the number of columns increases, the files get really big.
Ultimately, if I could read the data column-wise, so that I end up with 2329 column vectors of length 133076, that would be great.
I am using Matlab 2014a
Numeric data are by default stored by Matlab in double precision format, which takes up 8 bytes per number. Data of size 133076 x 2329 therefore take up 2.3 GiB in memory. Do you have that much free memory? If not, reducing the file size won't help.
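As a quick check of that figure, the arithmetic can be sketched like this (only the dimensions from the question are used; the single-precision line is added for comparison):

```matlab
nRows = 133076;
nCols = 2329;

bytesDouble = nRows * nCols * 8;   % double: 8 bytes per element
bytesSingle = nRows * nCols * 4;   % single: 4 bytes per element

fprintf('double: %.2f GiB, single: %.2f GiB\n', ...
        bytesDouble / 2^30, bytesSingle / 2^30);
% prints: double: 2.31 GiB, single: 1.15 GiB
```

On Windows, the output of the `memory` command can then be compared against these numbers before attempting the import.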
If the problem is not that the data themselves don't fit into memory, but is really about the process of reading such a large csv-file, then maybe using the syntax
M = csvread(filename,R1,C1,[R1 C1 R2 C2])
might help, which allows you to only read part of the data at one time. Read the data in chunks and assemble them in a (preallocated!) array.
If you do not have enough memory, another possibility is to read chunkwise and then convert each chunk to single precision before storing it. This reduces memory consumption by a factor of two.
And finally, if you don't process the data all at once, but can implement your algorithm such that it uses only a few rows or columns at a time, that same syntax may help you to avoid having all the data in memory at the same time.
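A minimal sketch of the chunked approach, assuming a purely numeric CSV without a header line. The small test file, its size, and the chunk size of 32 rows are made up for the example. Note that the row and column offsets passed to csvread are zero-based.

```matlab
% Create a small numeric CSV as a stand-in for the large file.
csvFile = [tempname '.csv'];
nRows = 100;
nCols = 6;
A = reshape(1:nRows * nCols, nRows, nCols);
dlmwrite(csvFile, A, 'precision', '%.10g');

chunkRows = 32;                        % rows per read (hypothetical value)
M = zeros(nRows, nCols, 'single');     % preallocate, already in single

for r1 = 0:chunkRows:nRows - 1         % zero-based first row of the chunk
    r2 = min(r1 + chunkRows, nRows) - 1;               % zero-based last row
    chunk = csvread(csvFile, r1, 0, [r1 0 r2 nCols - 1]);
    M(r1 + 1:r2 + 1, :) = single(chunk);               % store as single
end

delete(csvFile);
```

Only one chunk is held in double precision at any time. Each call probably still has to skip over the earlier lines of the file, so fewer, larger chunks are likely to be faster. Reading a block of columns instead of rows works the same way by varying C1 and C2.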

What are the maximum number of columns in the input data in MATLAB

I must import a big data file into MATLAB; its size is about 300 MB.
I want to know the maximum number of columns that I can import into MATLAB, so that I can divide the file into several smaller files.
Please help me.
There is no "maximum" number of columns that you can create for a matrix. The limiting factors are your RAM (à la knedlsepp), the data type of the matrix (which is also important... a lot of people overlook this), your operating system, and also what version of MATLAB you're using, specifically whether it's 32 or 64 bit.
If you want a more definitive answer, here's a comprehensive chart from MathWorks forums on what you can allocate given your OS version, MATLAB version and the data type of the matrix you want to create:
The link to this post is here: http://www.mathworks.com/matlabcentral/answers/91711-what-is-the-maximum-matrix-size-for-each-platform
Even though the above chart is for MATLAB R2007a, the sizes will most likely not have changed over the evolution of the software.
There are a few caveats with the above figure that you need to take into account:
The above table also takes your workspace size into account. As such, if you have other variables in memory and you try to allocate a matrix that approaches the limit shown in the chart, the allocation will not succeed.
The above table assumes that MATLAB has just been launched with no major processing carried out in a startup.m file.
The above table assumes that there is unlimited system memory, so RAM plus any virtual memory or swap file being available.
The above table's actual limits will be less if there is insufficient system memory available, usually due to the swap file being too small.
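To make the data-type point concrete, here is a small sketch (the 1000x1000 size is arbitrary) showing how the same number of elements takes a different amount of memory depending on the class, together with the element-count limit that MATLAB itself reports:

```matlab
a = zeros(1000, 1000);             % double: 8 bytes per element
b = zeros(1000, 1000, 'single');   % single: 4 bytes per element
c = zeros(1000, 1000, 'uint8');    % uint8:  1 byte per element

infoA = whos('a');
infoB = whos('b');
infoC = whos('c');
fprintf('double: %d bytes, single: %d bytes, uint8: %d bytes\n', ...
        infoA.bytes, infoB.bytes, infoC.bytes);

% Theoretical maximum number of elements in one array for this MATLAB
% build; available memory is usually the tighter limit in practice.
[~, maxElements] = computer;
fprintf('max elements per array: %g\n', maxElements);
```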

Matlab .mat file saving

I have identical MATLAB code and identical data that were analyzed on two different computers. Both run Windows 7 64-bit, and both MATLAB installations are version 2014a. After the code finishes its run, I save the variables using the save command, which outputs a .mat file.
Is it possible to get two very different file sizes for these files? Like one being 170 MB and the other being 2.4 GB? This seems absurd, because when I check the variables in MATLAB they add up to maybe 1.5 GB at most. What can be the reason for this?
Does saving to a .mat file compress the variables (still with the regular .mat extension)? I think it does, because when I check the individual variables they add up to around 1.5 GB.
So why would one computer output a small file, but the other one such a huge file?
The MAT format in recent versions is based on HDF5, which includes gzip compression. Probably on one PC the default MAT format has been changed to an older version that does not support compression. Try specifying the version explicitly when saving; both PCs should then produce files of the same size.
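Specifying the version explicitly can be sketched like this. The variable `x` and the temporary file names are placeholders; the resulting sizes depend on the data, so none are claimed here.

```matlab
x = zeros(1000, 1000);             % placeholder data

f7  = [tempname '.mat'];
f73 = [tempname '.mat'];

save(f7,  'x', '-v7');             % version 7 MAT-file
save(f73, 'x', '-v7.3');           % version 7.3 (HDF5-based) MAT-file

d7  = dir(f7);
d73 = dir(f73);
fprintf('-v7: %d bytes, -v7.3: %d bytes\n', d7.bytes, d73.bytes);

delete(f7);
delete(f73);
```

Running this on both machines with the real variables shows whether the size difference comes from the format alone. The default format used by a plain `save` call is set in the MATLAB preferences under General > MAT-Files.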
I found the reason for this based on the following stackoverflow thread: MATLAB: Differences between .mat versions
Apparently one of the computers was using the -v7 format, which produces much smaller files; -v7.3 just inflates the files significantly. This is ironic in my opinion, since -v7.3 is the format that allows saving variables larger than 2 GB, which means they will end up much larger still when saved to a .mat file.
Anyway this link is very useful.
Update:
I implemented the serialization mentioned in the above link, and it increased the file size. In my case the best option will be using -v7 format since it provides the smallest file size, and is also able to save structures and cell arrays that I use a lot.