Read a very large text file in Matlab (~30Gb) - matlab

I have some results in text format, including a text header. The files are about 15-50 GB. I want to import them into MATLAB for processing. Could you advise me which command I should use for files this large?

You may use the ROOT software, which is available from the CERN website.
It was developed to work with very large files (say, > 1 TB).

I solved this problem with the textscan function, although my text files were only around 5 GB. The computer had 16 GB of RAM, which was not enough, so it had to fall back on pagefile.sys. Reading took about 60 minutes.
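Running out of RAM on a 5 GB file suggests the whole file was parsed in one go. A common alternative is to read the file in fixed-size blocks and process each block before reading the next. In MATLAB this is usually done by calling textscan repeatedly on one open file identifier with a repeat count. The sketch below shows the same idea in Python; the function name `iter_chunks`, the one-line header, and the whitespace-separated numeric columns are all assumptions for illustration, since the question does not describe the file layout.

```python
from itertools import islice


def iter_chunks(path, header_lines=1, chunk_lines=100_000):
    """Yield blocks of parsed rows, holding only one block in memory at a time.

    Assumes (hypothetically) that the file starts with `header_lines` lines of
    text header followed by rows of whitespace-separated numbers.
    """
    with open(path, "r") as fh:
        # Skip the text header.
        for _ in range(header_lines):
            next(fh, None)
        while True:
            # islice pulls at most chunk_lines lines from the open file.
            block = list(islice(fh, chunk_lines))
            if not block:
                break
            yield [
                [float(tok) for tok in line.split()]
                for line in block
                if line.strip()  # ignore blank lines
            ]


# Example use: accumulate a running row count without loading the whole file.
# total = sum(len(rows) for rows in iter_chunks("results.txt"))
```

Peak memory then depends on `chunk_lines` rather than on the file size, at the cost of having to express the processing block by block.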

Related

MATLAB stops responding when using xlsread to read a large spreadsheet

I am trying to use the xlsread function to read spreadsheets of about 6000x2700 cells (xlsx files).
I have two questions:
First, when I use something like
[num,txt,~]=xlsread(input_file,input_sheet,'A1:CYY6596')
MATLAB keeps showing 'busy' and stops responding (while I can open the file in Excel within 30 seconds).
Is there any solution if I don't want to loop through ranges of the xlsx file? In other words, can I just dump a spreadsheet of this size into MATLAB using xlsread?
Alternatively, maybe I can use loops to read these files range by range, but I cannot identify the last column of each spreadsheet unless I read the whole file first. If I cannot identify the last column, it is hard to set up the loops and interpret the file.
So my second question is: is there a way to identify the last column of the spreadsheet without reading the whole spreadsheet?
Thanks.
EDIT: However, if I run similar code that reads only the first 400 columns ('A1:RY6596') of the spreadsheet, the problem doesn't happen.
Which version of MATLAB are you using?
MATLAB has problems loading big Excel files.
Convert the Excel file to CSV and use M = csvread(filename).
You can also try converting the .xlsx to .xls, but note that the .xls format is limited to 256 columns, so a sheet this wide will not fit.
You can also try the tool on the File Exchange.
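On the second question (finding the last column without reading the whole sheet): an .xlsx file is a ZIP archive, and the worksheet XML usually begins with a `<dimension ref="..."/>` element giving the used range. The sketch below reads only the first bytes of that XML. It is an illustration under assumptions: the element is optional and may be missing or stale in files not written by Excel, and the member name `xl/worksheets/sheet1.xml` and both function names are hypothetical choices, not something the thread specifies.

```python
import re
import zipfile


def column_index(letters):
    """Convert Excel column letters (e.g. 'CYY') to a 1-based column number."""
    n = 0
    for ch in letters.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def sheet_dimension(xlsx_path, member="xl/worksheets/sheet1.xml"):
    """Return (last_column, last_row) from the sheet's <dimension> element.

    Only the first 64 KiB of the worksheet XML is decompressed. Returns None
    if no <dimension> element is found there. The member name is an
    assumption; other sheets are stored under different names.
    """
    with zipfile.ZipFile(xlsx_path) as zf, zf.open(member) as fh:
        head = fh.read(65536).decode("utf-8", errors="replace")
    m = re.search(r'<dimension\s+ref="([A-Z]+)(\d+)(?::([A-Z]+)(\d+))?"', head)
    if m is None:
        return None
    # A single-cell sheet has a ref like "A1" with no ":" part.
    last_col = m.group(3) or m.group(1)
    last_row = m.group(4) or m.group(2)
    return column_index(last_col), int(last_row)
```

With the last column known, the range-by-range loop described in the question can be set up without a full read. The same helper shows that 'CYY' is column 2703 and 'RY' is column 493.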

Xlsread returning zero values....?

I am getting zero values while using the xlsread command in MATLAB. I am using a real-world dataset taken from the UCI repository, which has both integer and float values.
[Train,textData,rawData] = xlsread('C:\Users\pooja\Documents\project\breastcancer.csv');
I have tried the xls format too:
[Train,textData,rawData] = xlsread('C:\Users\pooja\Documents\project\breastcancer.xls');
Thanx in Advance..!
In the wide world of computers there are many data formats, and they differ from one another. Software such as MATLAB generally lets you open many of them, each with its own function.
As the name suggests, xlsread is for reading Excel spreadsheets. If you want to read CSV files or any other type of file, please (I think this is obvious) do not use xlsread!
Specifically for CSV files, MATLAB has csvread. Likewise, please do not use csvread to open files that are not CSV.

MATLAB used up all my disk space! How can I get it back?

I left MATLAB running a simple ode45 + plot, and when I came back I saw that the 5 GB of free space I had on my drive (C:) was gone! MATLAB had stopped due to "no memory".
Can someone please tell me what happened and how I can get my space back???
Thank You.
You can visually inspect hard disk usage and find folders and files which take up a lot of space with a tool such as TreeSize Free.
P.S. You can also try clearing temporary folders, either through the built-in disk cleaner or with other tools such as CCleaner.
MATLAB covers a whole world of computing science, while you probably only work in one small island of it, and its Help folder is huge. Here are some things you can do to make it slimmer on disk:
1. Install only the packages you need.
2. Use JPEGMini to compress the JPEG collection in the huge Help folder.
3. Use Pngyu to compress the huge collection of PNG files to 8-bit depth.
Steps 2 and 3 will get you back about a gigabyte, if not more.
4. Use NTFS compression on the MATLAB folder.
This will get you back another 2 gigabytes.
Steps 2 and 3 must both be done with admin privileges. Dragging and dropping the folder onto these tools must also be done from an app running with admin privileges; you can use Explorer++ as an alternative to Windows File Explorer.

Extract .mat data without matlab - tried scilab unsuccessfully

I've downloaded a data set that I am interested in. However, it is in .mat format and I do not have access to Matlab.
I've done some googling and it says I can open it in SciLab.
I tried a few things, but I haven't found any good tutorials on this.
I did
fd = matfile_open("file.mat")
matfile_listvar(fd)
and that prints out the filename without the extension. I tried
var1 = matfile_varreadnext(fd)
and that just gives me "var1 = "
I don't really know how the data is organized. The repository described the data it contains, but not how it is organized.
So, my question is, what am I doing wrong in extracting/viewing this data? I'm not committed to SciLab, if there is a better tool for this I am open to that.
One option is to use Octave, which can read .mat files and run most MATLAB .m files. Octave is open source, with binaries available for Linux, Mac, and Windows. Inside Octave you can load the file using:
load file
See Octave's manual section 14.1.3 Simple File I/O for more details.
In Scilab:
loadmatfile('file.mat');
(Source)
I had this same interest a few years back. I used this question as a guide. It uses Python and SciPy. There are options for NumPy and HDF5 as well. Another option is to write your own reader for the .mat format in whatever language you need. Here is the link to the .mat file format definition.

editing / splitting / saving data in a text file

I have a text file called playlist.pls which is dynamically created, and in the text file I have thousands of lines that look like this:
File000001=/home/ubu32sc/Documents/octave/pre/wavefn_0001.wav
File000002=/home/ubu32sc/Documents/octave/pre/wavefn_0002.wav
File000003=/home/ubu32sc/Documents/octave/pre/wavefn_0003.wav
File000004=/home/ubu32sc/Documents/octave/pre/wavefn_0004.wav
File000005=/home/ubu32sc/Documents/octave/pre/wavefn_0005.wav
File000006=/home/ubu32sc/Documents/octave/pre/wavefn_0006.wav
File000007=/home/ubu32sc/Documents/octave/pre/wavefn_0007.wav
File000008=/home/ubu32sc/Documents/octave/pre/wavefn_0008.wav
File000009=/home/ubu32sc/Documents/octave/pre/wavefn_0009.wav
File000010=/home/ubu32sc/Documents/octave/pre/wavefn_0010.wav etc...
I need to have the data in the text file split into several different files.
example:
The play1.pls file would contain:
File000001=/home/ubu32sc/Documents/octave/pre/wavefn_0001.wav
File000002=/home/ubu32sc/Documents/octave/pre/wavefn_0002.wav
File000003=/home/ubu32sc/Documents/octave/pre/wavefn_0003.wav
The play2.pls file would contain:
File000004=/home/ubu32sc/Documents/octave/pre/wavefn_0004.wav
File000005=/home/ubu32sc/Documents/octave/pre/wavefn_0005.wav
File000006=/home/ubu32sc/Documents/octave/pre/wavefn_0006.wav
The play3.pls file would contain:
File000007=/home/ubu32sc/Documents/octave/pre/wavefn_0007.wav
File000008=/home/ubu32sc/Documents/octave/pre/wavefn_0008.wav
File000009=/home/ubu32sc/Documents/octave/pre/wavefn_0009.wav
The play4.pls file would contain:
File000010=/home/ubu32sc/Documents/octave/pre/wavefn_0010.wav etc...
What's the best way to go about doing this? I was thinking about using Octave/MATLAB, but I think that would be overkill and resource-intensive: it means running a for loop over a text file with tens of thousands of lines. Is grep or Perl the proper thing to use, or should I use another type of program? If so, how could I do this with it?
I'm using 32-bit Ubuntu 10.04 with 6 GB of RAM.
Thanks
As you mentioned, MATLAB/Octave seems to be overkill if you just want to split a text file into multiple files.
There are a thousand ways to do this (especially on a Unix system), so just pick one.
One of the possibilities is to use split which goes like this:
split --lines=3 file prefix
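By default, split names its output files with alphabetic suffixes (prefixaa, prefixab, and so on) rather than play1.pls, play2.pls. If the exact names from the question matter, the same splitting can be sketched in a few lines of Python. The function name `split_playlist` and its parameters are hypothetical, chosen only for this illustration.

```python
from itertools import islice
from pathlib import Path


def split_playlist(src, out_dir, lines_per_file=3, pattern="play{}.pls"):
    """Split `src` into numbered files of `lines_per_file` lines each.

    Reads the source lazily, so tens of thousands of lines are no problem.
    Returns the list of paths written, in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(src, "r") as fh:
        index = 1
        while True:
            # Take the next group of lines; the last group may be shorter.
            block = list(islice(fh, lines_per_file))
            if not block:
                break
            target = out_dir / pattern.format(index)
            target.write_text("".join(block))
            written.append(target)
            index += 1
    return written


# Example use:
# split_playlist("playlist.pls", "out", lines_per_file=3)
```

For a ten-line playlist this writes play1.pls through play4.pls, with the last file holding the single remaining line.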