How can you split import text to different variables - matlab

I have text like this (there could be 100K lines):
time,10 a b,20 c d
(time = HH:mm:ss.ffff with milliseconds)
I want to import it into 2 arrays
time,a,b
time,c,d
What's the shortest way? I also need to keep the script/code for future use.

MATLAB has several text input options. While format-string-based options (like textscan) are often effective, it sounds like you have a fixed format that might be better handled by manually reading the lines sequentially, as in the loop below. I've found that performance with this method is more consistent than with textscan or the import tools. If a, b, c, d are not fixed width, you'll need to do something else. In that case, I'd just use the Import Wizard to set up the input, then save the generated import code and modify it as needed to automate the process.
array1 = NaN(<numberoflines>,6);
array2 = NaN(<numberoflines>,6);
fname = 'path_to_some_file';
fid = fopen(fname);
stop = 0;
jj = 1;
while ~stop
cline = fgetl(fid);
if ischar(cline)
HH = str2double(cline(1:2));
MM = str2double(cline(...));
...
array1(jj,:) = [HH MM SS MS a b];
array2(jj,:) = ...;
jj = jj + 1; % advance to the next output row
else
disp('End of file')
stop = 1;
end
end
fclose(fid)

Try using regexp; it is a very powerful tool for parsing strings in MATLAB.
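As a rough sketch of the regexp route, assuming each line looks like HH:mm:ss.ffff,10 a b,20 c d with numeric a, b, c, d and with 10 and 20 acting as literal tags (the sample line and the variable names below are made up for illustration):

```matlab
% Hypothetical sample line; the real file layout is an assumption.
txt = '12:34:56.7890,10 1.5 2.5,20 3.5 4.5';

% Capture hours, minutes, seconds, then the two values after each tag.
pat = '^(\d+):(\d+):([0-9.]+),\d+ (\S+) (\S+),\d+ (\S+) (\S+)$';
tok = regexp(txt, pat, 'tokens', 'once');   % 1x7 cell of char vectors
vals = str2double(tok);                     % 1x7 numeric row

t = vals(1)*3600 + vals(2)*60 + vals(3);    % time as seconds since midnight
row1 = [t vals(4) vals(5)];                 % time, a, b
row2 = [t vals(6) vals(7)];                 % time, c, d
```

For a whole file, the same pattern could be applied line by line inside an fgetl loop, writing row1 and row2 into two preallocated arrays.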

Related

How to sparsely read a large file in Matlab?

I ran a simulation which wrote a huge file to disk. The file is a big matrix v. I can't read it all, but I really only need a portion of the matrix, say, 1:100 of the columns and rows. I'd like to do something like
vtag = dlmread('v',1:100:end, 1:100:end);
Of course, that doesn't work. I know I should have only done the following when writing to the file
dlmwrite('vtag',v(1:100:end, 1:100:end));
But I did not, and running everything again would take two more days.
Thanks
Amir
Thankfully the dlmread function supports specifying a range to read as the third input. The range is given as zero-based offsets, which is why 1 is subtracted below. So if you want to read all N columns for the first 100 rows, you can specify that with the following command
startRow = 1;
startColumn = 1;
endRow = 100;
endColumn = N;
% Named readRange rather than rng to avoid shadowing the built-in rng function
readRange = [startRow, startColumn, endRow, endColumn] - 1;
vtag = dlmread(filename, ',', readRange);
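To see the zero-based range in action, here is a small sketch on a temporary file. The file name, the magic(4) test matrix, and the variable names are made up for the example:

```matlab
% Write a small 4x4 test matrix to a temporary comma-separated file.
fname = [tempname '.csv'];
M = magic(4);
dlmwrite(fname, M);

% Ask for rows 1-2 and columns 1-3 (one-based), converted to zero-based offsets.
startRow = 1; startColumn = 1; endRow = 2; endColumn = 3;
readRange = [startRow, startColumn, endRow, endColumn] - 1;
sub = dlmread(fname, ',', readRange);   % should match M(1:2, 1:3)

delete(fname);                          % clean up the temporary file
```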
EDIT Based on your clarification
Since you don't want 1:100 rows but rather 1:100:end rows, the following approach should work better for you.
You can use textscan to read chunks of data at a time. You can read a "good" row and then read in the next "chunk" of data to ignore (discarding it in the process), and continue until you reach the end of the file.
The code below is a slight modification of that idea: it uses the HeaderLines input to textscan, which tells the function how many lines to skip before reading data. The first time through the loop no lines are skipped; on every later pass, rows2skip lines are skipped. This lets us "jump" through the file very rapidly without any additional file operations.
startRow = 1;
rows2skip = 99;
columns = 3000;
fid = fopen(filename, 'rb');
% For now, we'll just assume you're reading in floating-point numbers
format = repmat('%f ', [1 columns]);
count = 1;
lines2discard = startRow - 1;
while ~feof(fid)
% Use "HeaderLines" to skip data before reading in data we care about
row = textscan(fid, format, 1, 'Delimiter', ',', 'HeaderLines', lines2discard);
data{count} = [row{:}];
% After the first time through, set the "HeaderLines" (i.e. lines to ignore)
% to be the # we want to skip between lines (much faster than alternatives!)
lines2discard = rows2skip;
count = count + 1;
end
fclose(fid);
data = cat(1, data{:});
You may need to adjust your format specifier for your own type of input.
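If the HeaderLines behaviour is hard to verify on your file, a plainer (and probably slower) fallback is to walk the file with fgetl and parse only every step-th line. This is a different technique from the textscan loop above, shown here on a small temporary file; fname, step, and the 10x4 test matrix are made up for the example:

```matlab
% Build a small 10x4 comma-separated test file.
fname = [tempname '.csv'];
M = reshape(1:40, 4, 10)';
dlmwrite(fname, M);

step = 3;                 % keep every 3rd row and column (100 in the question)
rows = {};
k = 0;                    % zero-based index of the current line
fid = fopen(fname, 'r');
tline = fgetl(fid);
while ischar(tline)       % fgetl returns -1 at end of file
    if mod(k, step) == 0
        % Parse only the lines we keep; the format is reused for each value.
        rows{end+1} = sscanf(tline, '%f,')'; %#ok<SAGROW>
    end
    k = k + 1;
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);

sub = cat(1, rows{:});
sub = sub(:, 1:step:end); % thin the columns after the rows
```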

Looping a process, outputting numerically labelled variables each time

I have about 50 different arrays and I want to perform the following operation on all of them:
data1(isnan(data1)) = 0;
coldata1 = nonzeros(data1);
avgdata1 = mean(coldata1);
and so on for data2, data3 etc... the goal being to turn data1 into a vector without NaNs and then take a mean, saving the vector and the mean into coldata1 and avgdata1.
I'm looking for a way to automate this for all 50, rather than copy it 50 times and change the numbers... any ideas? I've been playing with eval but no luck so far. Also tried:
for y = 1:50
data(y)(isnan(data(y))) = 0;
coldata(y) = nonzeros(data(y));
avgdata(y) = mean(coldata(y));
end
You can do it with eval but really should not. Rather use a cell array as suggested here: Create variables with names from strings
i.e.
for y = 1:50
data{y}(isnan(data{y})) = 0;
coldata{y} = nonzeros(data{y});
avgdata{y} = mean(coldata{y});
end
Also read How can I create variables A1, A2,...,A10 in a loop? for alternative options.
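A self-contained sketch of the cell-array approach with made-up data. It indexes with ~isnan instead of zeroing the NaNs and calling nonzeros, on the assumption that genuine zeros in the data should be kept:

```matlab
% Hypothetical input: one cell per former dataN variable.
data = {[1 NaN 3], [NaN 4; 6 NaN], [2 2 NaN 2]};

coldata = cell(size(data));    % NaN-free column vectors
avgdata = zeros(size(data));   % one mean per array

for y = 1:numel(data)
    v = data{y}(:);                  % flatten to a column
    coldata{y} = v(~isnan(v));       % drop NaNs, keep real zeros
    avgdata(y) = mean(coldata{y});
end
```

Storing the means in a plain numeric vector rather than a cell makes later steps such as plotting or comparing the averages a little simpler.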

Read (m x n) comma separated lines of a .txt file

Hello, I have this kind of data in a text file and I want to read it.
2003,04,15,15,15,00,38.4279,-76.61,1565,3.7,0.0,38.19,-999,-999,3.9455,3.1457,2.9253
2003,04,15,16,50,00,38.368,-76.5,1566,3.7,0.0,35.01,-999
2003,04,15,17,50,00,38.3074,-76.44
I have used the following code:
a= zeros(4460,216);
nl = a(:,1);
nc = a(1,:);
if fid>0
for i = 1:length(nl)
d = textscan(Ligne,'%f','whitespace',',');
numbers = d{:}';
D = a(i) + numbers;
i = i+1;
end
Ligne = fgetl(fid);
end
The problem is that I can't build up the matrix D: the data are overwritten on each iteration. Can somebody help me, please?
Assuming your file looks like:
Header
Header
Header
Header
2003,04,15,15,15,00,38.4279,-76.61,1565,3.7,0.0,38.19,-999,-999,3.9455,3.1457,2.9253
2003,04,15,16,50,00,38.368,-76.5,1566,3.7,0.0,35.01,-999
2003,04,15,17,50,00,38.3074,-76.44
In this example there are 4 header lines and the delimiter is ','. importdata is a very convenient function for this:
X = importdata('myData.txt',',',4)
which returns:
X =
data: [3x17 double]
textdata: {4x17 cell}
colheaders: {1x17 cell}
X.data contains your numeric data. As the data in your file has a different number of entries in every row, missing values are filled with NaN. X.textdata contains the skipped header lines as strings.
You can process them, if needed, with textscan:
additionalInformation = textscan(X.textdata, ... )
The alternative suggested by Shai, using csvread with the row offset set to 4, does the job as well. But be aware that missing values are replaced with zeros, which I personally dislike for further processing, especially as your actual data also contains zeros.
X = csvread('myData.txt',4)
You already said it: D is replaced every time. This is happening since you don't specify indices when accessing D. You should do something like
D = zeros(size(a))
....
if ...
for ...
...
D(i,:) = a(i,:) + numbers;
...
end
end
But as Shai pointed out, there might be a simpler solution to your problem.
Have you considered using csvread?
D = csvread( filename );
Regarding your code, you have two major bugs
D = a(i)+numbers; - you actually overwrite D at each iteration. Try D(i,:) = a(i,:)+numbers; instead.
i=i+1; - you change the loop variable inside the loop! If you are using a for-loop over i, you do not need to increment it manually.
And some comments:
It is best not to use i as a variable name in Matlab, since i is also the built-in imaginary unit.
You pre-allocated a but not D, consider pre-allocating D as well.
Finally, I used code along these lines:
D = NaN(size(a));
i=1;
while ischar(Ligne) % fgetl returns -1 (not a char) at end of file
d = textscan(Ligne,'%f','whitespace',',');
numbers = d{:}';
D(i,1:numel(numbers)) = numbers; % shorter rows keep trailing NaNs
Ligne = fgetl(fid);
i=i+1;
end
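For reference, a self-contained sketch of the same line-by-line idea, run on a small temporary file with rows of different lengths. The file contents, maxCols, and the variable names are made up for the example:

```matlab
% Write a small ragged test file (rows have 5, 4 and 2 values).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '2003,04,15,38.4279,-76.61\n2003,04,15,38.368\n2003,04\n');
fclose(fid);

maxCols = 5;                 % assumed upper bound on values per line
D = NaN(3, maxCols);         % preallocate; missing entries stay NaN

fid = fopen(fname, 'r');
row = 0;
tline = fgetl(fid);
while ischar(tline)          % fgetl returns -1 at end of file
    row = row + 1;
    c = textscan(tline, '%f', 'Delimiter', ',');
    vals = c{1}';            % row vector of the numbers on this line
    D(row, 1:numel(vals)) = vals;
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);
```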

sprintf confusion (Matlab)

Quick question,
I would like to make a count from 50-70 using sprintf in Matlab. This example prints 0101-0120
for i = 1:20
filename = sprintf('Brain_01%02d.dcm', i);
[X(:,:,1,i), amap] = dicomread(filename);
end
How would I change this to print 0151-0170?
The answer seemed obvious at first, but it seems like another issue might be related to the indexing of X getting broken if i doesn't start at one. Here's one way to address that while handling pre-allocation of X,
imgInds = 151:170;
di = dicominfo(sprintf('Brain_%04d.dcm',imgInds(1)));
X = zeros(di.Height,di.Width,1,numel(imgInds),class(dicomread(di))); % modify
for i = 1:numel(imgInds),
filename = sprintf('Brain_%04d.dcm', imgInds(i));
[X(:,:,1,i), amap] = dicomread(filename);
end
For clarity, I think it is better to build your sprintf with %04d instead of 01%02d. You should set the size of X accordingly on the line labeled modify, particularly the third dimension since I assume your actual code will not have this be 1.
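A quick way to check the two format strings without any DICOM files is to build only the names. This sketch assumes nothing beyond sprintf itself; names and same are illustrative variables:

```matlab
imgInds = 151:170;

% Build every file name with the zero-padded four-digit format.
names = arrayfun(@(k) sprintf('Brain_%04d.dcm', k), imgInds, ...
                 'UniformOutput', false);

% The original '01%02d' format gives the same name when fed 51 instead of 151.
same = strcmp(names{1}, sprintf('Brain_01%02d.dcm', 51));
```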
I'm guessing this should do it:
for i = 51:70
filename = sprintf('Brain_01%02d.dcm', i);
[X(:,:,1,i), amap] = dicomread(filename);
end
Thank you for your responses! Actually all I needed to do (for my purposes) was:
for i = 1:20
filename = sprintf('Brain_01%02d.dcm', i + 49);
[X(:,:,1,i), amap] = dicomread(filename);
end
which made the count start from 50.

Parse text file into structure

I want to pass an external text file as input to one of my MATLAB functions.
Generally this text file shows the following layout:
-----------------------------------------------------
HubHt = 90;
GridWidth = 220;
GridHeight = 220;
Ny = 35;
Nz = 37;
Nfft = 8192;
time = 620;
Uhub = 15;
Coherence = Bladed;
-----------------------------------------------------
To read it in, I'm currently calling this piece of code:
fid = fopen('test.inp','r+');
A = textscan(fid,'%s','Delimiter',';','commentStyle', '-','CollectOutput',1);
fclose(fid);
A = A{1};
inputs = regexp(A,' = ','split');
The last variable, inputs, is a <9x1> cell; each element is a <1x2> cell.
The first element of each <1x2> cell is supposed to become a field of an overall INPUT structure, whereas the second element is the associated parameter value.
At the moment, I'm using a rather static and awful way to achieve my goal:
inp = struct(char(inputs{1}(1)),str2double(inputs{1}(2)),char(inputs{2}(1)),str2double(inputs{2}(2)),char(inputs{3}(1)),str2double(inputs{3}(2)),char(inputs{4}(1)),str2double(inputs{4}(2)),char(inputs{5}(1)),str2double(inputs{5}(2)),char(inputs{6}(1)),str2double(inputs{6}(2)),char(inputs{7}(1)),str2double(inputs{7}(2)),char(inputs{8}(1)),str2double(inputs{8}(2)),char(inputs{9}(1)),char(inputs{9}(2)));
I believe there are much better ways to do this; I'd appreciate it if you could share one.
You can use cell2struct:
% create cell vector where fieldnames and values alternate
tmp = [inputs{:}];
inp = cell2struct(tmp(2:2:end), tmp(1:2:end), 2);
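Note that cell2struct as used above keeps every value as text. If numeric fields are wanted, one possible sketch is to convert whatever str2double can parse before building the struct. The three sample lines and the helper variables (fieldNames, vals, nums, isNum) are made up for the example:

```matlab
% Hypothetical subset of the parsed lines, with the trailing ';' already removed.
lines = {'HubHt = 90'; 'Nfft = 8192'; 'Coherence = Bladed'};
inputs = regexp(lines, ' = ', 'split');   % 3x1 cell of 1x2 cells

tmp = [inputs{:}];                        % names and values alternate
fieldNames = tmp(1:2:end);
vals = tmp(2:2:end);

nums = str2double(vals);                  % NaN where the text is not numeric
isNum = ~isnan(nums);
vals(isNum) = num2cell(nums(isNum));      % replace numeric text by doubles

inp = cell2struct(vals, fieldNames, 2);
```

A value that is legitimately the text NaN would be treated as numeric by this check, so it is only a reasonable shortcut for files like the one shown.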
Since what you have written is (nearly) valid Matlab source code, why not give it the file extension .m and just run it? Or call it from inside your function.
This is an approach which we've used a lot; it's straightforward and simple. Obviously you have to make sure that it is (entirely) valid Matlab source, but that's not difficult.