Fast csv import - matlab

I wish to import a large number of csv files to MATLAB. I can do this without any difficulty except it takes a lot of time - about 3 seconds per file with the following code. Is there a way to do it faster? Here A is a matrix with 15 rows and 250 columns. There are 150 files.
tic
file_name = [];
for w = scenario_size:-1:1
file_name = sprintf('monthly_population_%d.csv',w) ; % read file name f
A = xlsread(file_name);
pop(:,:,w) = A' ;
end
clear A
toc

You may get better performance by using readmatrix instead of xlsread. For example:
A = readmatrix(file_name);
Or, if you are on a Matlab release which doesn't have readmatrix, try readtable:
A = table2array(readtable(file_name));
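As a rough sketch, this is how the loop from the question might look with that change, plus preallocation of pop. The temporary folder, the dummy file contents and the csvread fallback are assumptions added only so the example runs on its own; they are not part of the original question.

```matlab
% Hypothetical setup: write three small CSV files so the sketch is runnable.
workDir = tempname();
mkdir(workDir);
scenario_size = 3;
for w = 1:scenario_size
    file_name = fullfile(workDir, sprintf('monthly_population_%d.csv', w));
    dlmwrite(file_name, w * ones(15, 250));
end

% Prefer readmatrix where it exists; csvread is an assumed fallback
% for releases (or Octave) that do not provide it.
if exist('readmatrix') > 0
    reader = @readmatrix;
else
    reader = @csvread;
end

% Preallocate the result: each page holds the transpose of one 15-by-250 file.
pop = zeros(250, 15, scenario_size);
for w = 1:scenario_size
    file_name = fullfile(workDir, sprintf('monthly_population_%d.csv', w));
    A = reader(file_name);
    pop(:, :, w) = A.';
    delete(file_name);   % clean up the dummy file
end
rmdir(workDir);
```

Timing the loop with tic/toc on a handful of real files is the easiest way to check whether the reader makes a difference on your machine.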

Related

Matlab interp1 gives last row as NaN

I have a problem similar to here. However, it doesn't seem that there is a resolution.
My problem is as follows: I need to import some files, for example, 5. There are 20 columns in each file, but the number of lines varies. Column 1 is time in terms of crank-angle degrees, and the rest are data.
So my code first imports all of the files, finds the file with the largest number of rows, then creates a multidimensional array with that many rows. The timing is in engine cycles, so I then remove lines from the imported file that go beyond a whole engine cycle. This way, I always have data in terms of X whole engine cycles. Then I interpolate the data onto the pre-allocated array to get one giant multidimensional array for the 5 data files.
However, this always seems to result in the last row of every column of every page being filled with NaNs. Please have a look at the code below. I can't see what I'm doing wrong. Oh, and by the way, as I have been screwed over before, this is NOT homework.
maxlines = 0;
maxcycle = 999;
for i = 1:1
filename = sprintf('C:\\Directory\\%s\\file.out',folder{i});
file = filelines(filename); % Import file clean
lines = size(file,1); % Find number of lines of file
if lines > maxlines
maxlines = lines; % If number of lines in this file is the most, save it
end
lastCAD = file(end,1); % Add simstart to shift the start of the cycle to 0 CAD
lastcycle = fix((lastCAD-simstart)./cycle); % Find number of whole engine cycles
if lastcycle < maxcycle
maxcycle = lastcycle; % Find lowest number of whole engine cycles amongst all designs
end
cols = size(file,2); % Find number of columns in files
end
lastcycleCAD = maxcycle.*cycle+simstart; % Define last CAD of whole cycle that can be used for analysis
% Import files
thermo = zeros(maxlines,cols,designs); % Initialize array to proper size
xq = linspace(simstart,lastcycleCAD,maxlines); % Define the CAD degrees
for i = 1:designs
filename = sprintf('C:\\Directory\\%s\\file.out',folder{i});
file = importthermo(filename, 6, inf); % Import the file clean
[~,lastcycleindex] = min(abs(file(:,1)-lastcycleCAD)); % Find index of end of last whole cycle
file = file(1:lastcycleindex,:); % Remove all CAD after that
thermo(:,1,i) = xq;
for j = 2:17
thermo(:,j,i) = interp1(file(:,1),file(:,j),xq);
end
sprintf('file from folder %s imported OK',folder{i})
end
thermo(end,:,:) = []; % Remove NaN row
Thank you very much for your help!
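Are you sampling outside the range of the data? By default, interp1 returns NaN for query points that lie outside the range of the sample points, so if the last value of xq is even slightly beyond file(end,1), the last row will be NaN. If so, you need to tell interp1 that you want extrapolation: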
interp1(file(:,1),file(:,j),xq,'linear','extrap');
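A small illustration of the difference, using made-up sample points rather than the data from the question:

```matlab
% Made-up sample points; the last query point lies just outside [0, 3].
x  = [0 1 2 3];
y  = [0 10 20 30];
xq = [0 1.5 3.2];

% Default linear interpolation: out-of-range queries are not extrapolated.
yDefault = interp1(x, y, xq);

% Linear interpolation with extrapolation enabled.
yExtrap = interp1(x, y, xq, 'linear', 'extrap');
```

Here the third element of yDefault is NaN, while yExtrap continues the last line segment beyond x = 3. If the final query point is only meant to coincide with the last sample, it may be cleaner to build xq from the actual first and last values in file(:,1) than to extrapolate.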

Fortran and Matlab: Change the data format

I have saved my results from Fortran in the following format:
OPEN(50,file ='h.dat',form='formatted')
WRITE(50,'(101F12.6)')(u(k),k=1,nx)
CLOSE(50)
Since nx = 201, the data is saved in 2 lines. The first line has 101 columns and the second has 100. As a result, MATLAB cannot read h.dat and fails with the message “... must be the same as previous lines”.
Would it be possible to change this 2-line data to be 1-line data (201 columns) by using Matlab?
hh = importdata('h.dat');
size(hh) % ans = 2 101
nx = 201;
p = 0;
for i = 1:2;
for j = 1:101;
p = p+1;
ha(p) = hh(i,j);
end
end
ha = ha(1:nx);
save haa.dat ha -ascii
But, I think, it is much easier to use Fortran to solve it...
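If you stay in MATLAB, the double loop above can probably be replaced by a transpose and a reshape. This is only a sketch: the hh matrix here is a hypothetical stand-in for what importdata returns, with the missing last entry of the second row assumed to be padded with NaN.

```matlab
% Hypothetical stand-in for the 2-by-101 matrix read from h.dat;
% the final entry of row 2 is padding, not data.
nx = 201;
hh = [1:101; 102:201, NaN];

% Transpose first so that reshape walks along each row in turn,
% then keep only the nx real values.
ha = reshape(hh.', 1, []);
ha = ha(1:nx);
```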

Performing logical OR between multiple CSVs with 32 bit hex values using MATLAB

I am trying to read multiple (50+) CSV files within the same folder using MATLAB. Each row of these CSVs contains three 32-bit hex values, and the format of the data is the same for all files. Each CSV contains the data in 2 rows and 3 columns with no headers. For example:
00000800,D404002C,4447538F
000008FF,D404002C,4447538F
After ORing the 2 rows from all files, the final 2 rows of three 32-bit hex values need to be written out to a CSV.
Now, before jumping in at the deep end and trying to process multiple files, I have started by just trying to OR Row 1 with Row 2 of the same file. So, 00000800 | 000008FF, D404002C | D404002C, and so on. I have been able to convert them to binary and do a logical OR between the 3 values, but I currently have the following issues:
1) If the most significant hex digit is 3 or 4 (binary 0011 or 0100), then the leading 0's are dropped, or if the hex value happens to be 800, then the leading 00000's are dropped.
2) I cannot convert the integer cell array back to hex.
I have seen many posts on stackoverflow and matlabcentral about reading CSVs using MATLAB, separating the data and so on, but I have not been able to adapt any of them to solve my issue. Any help would be much appreciated. Below is what I have so far:
fid = fopen('File1.csv');
c = textscan(fid,'%s','Delimiter','\n');
fclose(fid);
contents = c{1};
row1 = strsplit(contents{1},',','CollapseDelimiters',0);
row2 = strsplit(contents{2},',','CollapseDelimiters',0);
x = 1;
y = 1;
while x <= length(row1)
column1{x} = hex2dec(row1(x));
column2{x} = hex2dec(row2(x));
x = x + 1;
end
while y <= length(column1)
bin1{y} = zeros(1,32);
bin2{y} = zeros(1,32);
bin1{y} = dec2bin(column1{y});
bin2{y} = dec2bin(column2{y});
result{y} = bitor(bitget(uint8(bin1{y}),1),bitget(uint8(bin2{y}),1));
y = y+ 1 ;
end
Also, I eventually need to be able to do this process with multiple CSVs, so I have attached a link to File1.csv and File2.csv in case someone wants to try to OR row 1 of File1.csv with row 2 of File2.csv and so on.
CSV Files
Apologies if I have missed anything. Please leave a comment and I'll try to explain it further.
Thanks!
EDIT: Hope the image below explains what I am trying to do better.
You can try the following approach:
1) Use the dir function to get the list of files to be processed.
2) Loop over the files to be processed. In the loop:
   a) read the input file
   b) convert the hexadecimal values read from the file into a character matrix using the char function
   c) convert the data stored in the char matrix from hex to decimal and then to uint32 using the hex2dec and uint32 functions
   d) OR the values with the running result using the bitor function
   e) go to the next iteration
3) At the end of the loop, write the output.
The approach described above is implemented in the following code:
% Get the list of CSV files
hex_files=dir('O_File*.csv');
% Open the output file
fp_out=fopen('new_hex_file.csv','wt');
% Loop over the CSV files
for i=1:length(hex_files)
% Read the i-th CSV file
fid = fopen(hex_files(i).name);
c = textscan(fid,'%s','Delimiter','\n');
fclose(fid);
% Get the 2 rows
contents = c{1};
row_1=char(strsplit(contents{1},',','CollapseDelimiters',0));
row_2=char(strsplit(contents{2},',','CollapseDelimiters',0));
% Convert from hex to uint32
row_d_1=uint32(hex2dec(row_1));
row_d_2=uint32(hex2dec(row_2));
if(i == 1)
% Store the row of the first file and continue
tmp_var_1=row_d_1;
tmp_var_2=row_d_2;
continue
else
% OR the two rows
tmp_var_1=bitor(tmp_var_1,row_d_1);
tmp_var_2=bitor(tmp_var_2,row_d_2);
end
end
% Write the OR values into the new file
fprintf(fp_out,'%08X,%08X,%08X\n',tmp_var_1);
fprintf(fp_out,'%08X,%08X,%08X\n',tmp_var_2);
% Close the output file
fclose(fp_out);
The following input files have been used to test it:
File1.csv
00000800,D404002C,4447538F
000008FF,D404002C,4447538F
File2.csv
000008FF,D404DD2C,49475115
11100800,D411EC2C,3ACD1266
File3.csv
123456FF,ABCDEF2C,369ABC15
01012369,00110033,36936966
The output is:
12345EFF,FFCDFF2C,7FDFFF9F
11112BFF,D415EC3F,7EDF7BEF
Hope this helps.
Qapla'
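On the two specific issues in the question (lost leading zeros, and getting back from integers to hex), here is a minimal sketch. It relies on the optional minimum-length argument of dec2bin and dec2hex; the two input values are taken from the first column of File1.csv, and the conversion to double before formatting is just a conservative choice on my part.

```matlab
% First column of File1.csv, rows 1 and 2.
a = uint32(hex2dec('00000800'));
b = uint32(hex2dec('000008FF'));

% Bitwise OR on the integers; no detour through binary strings is needed.
r = bitor(a, b);

% Pad to a fixed width so leading zeros are kept.
hexStr = dec2hex(double(r), 8);    % 8 hex digits
binStr = dec2bin(double(r), 32);   % 32 binary digits
```

The '%08X' format used with fprintf in the code above achieves the same zero padding when writing to a file.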

How to sparsely read a large file in Matlab?

I ran a simulation which wrote a huge file to disk. The file is a big matrix v. I can't read it all, but I really only need a portion of the matrix, say, 1:100 of the columns and rows. I'd like to do something like
vtag = dlmread('v',1:100:end, 1:100:end);
Of course, that doesn't work. I know I should have only done the following when writing to the file
dlmwrite('vtag',v(1:100:end, 1:100:end));
But I did not, and running everything again would take two more days.
Thanks
Amir
Thankfully, the dlmread function supports specifying a range to read as the third input. So if you want to read all N columns for the first 100 rows, you can specify that with the following commands:
startRow = 1;
startColumn = 1;
endRow = 100;
endColumn = N;
readRange = [startRow, startColumn, endRow, endColumn] - 1; % dlmread ranges are zero-based
vtag = dlmread(filename, ',', readRange);
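To make the zero-based range concrete, here is a tiny self-contained check. The temporary file and the magic(6) test matrix are made up for the illustration and have nothing to do with the real simulation output.

```matlab
% Write a small made-up matrix to a temporary comma-separated file.
fname = [tempname() '.csv'];
M = magic(6);
dlmwrite(fname, M);

% Rows 1-3 and columns 1-4 in one-based terms become
% [R1 C1 R2 C2] = [0 0 2 3] in dlmread's zero-based range.
block = dlmread(fname, ',', [0 0 2 3]);

delete(fname);   % clean up the temporary file
```

Here block should match M(1:3, 1:4), which is why the answer subtracts 1 from the one-based row and column numbers.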
EDIT Based on your clarification
Since you don't want 1:100 rows but rather 1:100:end rows, the following approach should work better for you.
You can use textscan to read chunks of data at a time. You can read a "good" row and then read in the next "chunk" of data to ignore (discarding it in the process), and continue until you reach the end of the file.
The code below is a slight modification of that idea: it uses the HeaderLines input to textscan, which tells the function how many lines to ignore before reading in the data. The first time through the loop, no lines will be skipped; on every later pass, rows2skip lines will be skipped. This allows us to "jump" through the file very rapidly without calling any additional file operations.
startRow = 1;
rows2skip = 99;
columns = 3000;
fid = fopen(filename, 'rb');
% For now, we'll just assume you're reading in floating-point numbers
format = repmat('%f ', [1 columns]);
count = 1;
lines2discard = startRow - 1;
while ~feof(fid)
% Use "HeaderLines" to skip data before reading in data we care about
row = textscan(fid, format, 1, 'Delimiter', ',', 'HeaderLines', lines2discard);
data{count} = [row{:}];
% After the first time through, set the "HeaderLines" (i.e. lines to ignore)
% to be the # we want to skip between lines (much faster than alternatives!)
lines2discard = rows2skip;
count = count + 1;
end
fclose(fid);
data = cat(1, data{:});
You may need to adjust your format specifier for your own type of input.

Problem with loop MATLAB

no time scores
1 10 123
2 11 22
3 12 22
4 50 55
5 60 22
6 70 66
. . .
. . .
n n n
Above is the content of my txt file (thousands of lines).
1st column - number of samples
2nd column - time (from beginning to end ->accumulated)
3rd column - scores
I want to create a new file containing the total of the scores for every three samples, divided by the time difference across those same samples.
e.g. (123+22+22)/ (12-10) = 167/2 = 83.5
(55+22+66)/(70-50) = 143/20 = 7.15
new txt file
83.5
7.15
.
.
.
n
so far I have this code:
fid=fopen('data.txt')
data = textscan(fid,'%*d %d %d')
time = (data{1})
score= (data{2})
for sample=1:length(score)
..... % I'm stuck here ..
end
....
If you are feeling adventurous, here's a vectorized one-line solution using ACCUMARRAY (assuming you already read the file in a matrix variable data like the others have shown):
NUM = 3;
result = accumarray(reshape(repmat(1:size(data,1)/NUM,NUM,1),[],1),data(:,3)) ...
./ (data(NUM:NUM:end,2)-data(1:NUM:end,2))
Note that here the number of samples NUM=3 is a parameter and can be substituted by any other value.
Also, reading your comment above, if the number of samples is not a multiple of this number (3), then simply discard the remaining samples by doing this beforehand:
data = data(1:fix(size(data,1)/NUM)*NUM,:);
I'm sorry, here's a much simpler one :P
result = sum(reshape(data(:,3), NUM, []))' ./ (data(NUM:NUM:end,2)-data(1:NUM:end,2));
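As a quick check, here is the reshape version applied to the six sample rows from the question. The data matrix is typed in by hand, and the sum is given an explicit dimension so that it also behaves when NUM is 1.

```matlab
% The six sample rows from the question: [no, time, score].
data = [1 10 123; 2 11 22; 3 12 22; 4 50 55; 5 60 22; 6 70 66];
NUM = 3;

% One column per group of NUM samples, summed down each column.
totals = sum(reshape(data(:,3), NUM, []), 1).';

% Time difference between the last and first sample of each group.
dt = data(NUM:NUM:end,2) - data(1:NUM:end,2);

result = totals ./ dt;
```

This gives 83.5 and 7.15, matching the expected values in the question.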
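%# Easier to load with importdata; with a header line it returns a struct,
%# so take the numeric part from its data field
imported = importdata('data.txt',' ',1);
data = imported.data;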
%# Get the number of rows
n = size(data,1);
%# Column IDs
time = 2;score = 3;
%# The interval size (3 in your example)
interval = 3;
%# Pre-allocate space
new_data = zeros(numel(interval:interval:n),1);
%# For each new element in the new data
index = 1;
%# This will ignore elements past the closest (floor) multiple of 3 as requested
for i = interval:interval:n
%# First and last elements in a batch
a = i-interval+1;
b = i;
%# Compute the new data
new_data(index) = sum( data(a:b,score) )/(data(b,time)-data(a,time));
%# Increment
index = index+1;
end
For what it's worth, here is how you would go about to do that in Python. It is probably adaptable to Matlab.
import numpy
no, time, scores = numpy.loadtxt('data', skiprows=1).T
# here I assume that your n is a multiple of 3! otherwise you have to adjust
sums = scores[::3]+scores[1::3]+scores[2::3]
dt = time[2::3]-time[::3]
result = sums/dt
I suggest you use the importdata() function to get your data into a variable called data. Something like this:
imported = importdata('data.txt',' ', 1);
data = imported.data;
Replace ' ' with the delimiter your file uses; the 1 specifies that Matlab should ignore 1 header line. Because the file has a header line, importdata returns a struct, and the numeric values are in its data field. Then, to compute your results, try this statement:
(data(1:3:end,3)+data(2:3:end,3)+data(3:3:end,3))./(data(3:3:end,2)-data(1:3:end,2))
This worked on your sample data, should work on the real data you have. If you figure it out yourself you'll learn some useful Matlab.
Then use save() to write the results back to a file.
PS If you find yourself writing loops in Matlab you are probably doing something wrong.