Matlab: How to read and extract a matrix by specifying its header name?

Is it possible to read a matrix under a specified headline from a text file?
I have a text file like this:
Header A (2x3):
3 6 7
5 8 8
Header B (4x4):
23 65 2 6
4 6 7 8
33 7 8 9
So what I want to accomplish is to pass the header name as an argument and grab the matrix under it. Is it possible to do this in MATLAB?
Thanks in advance!

You can try this code:
infilename = '1.txt'; % name of your file
m = memmapfile(infilename); % map the file into memory
instrings = strsplit(char(m.Data.'),'\n','CollapseDelimiters',true).';
checkstr = 'Header B';
% find the indices of all lines starting with checkstr (case-insensitive)
ind = find(strncmpi(instrings,checkstr,length(checkstr)));
data = [];
if isempty(ind)
    fprintf('No lines start with %s\n',checkstr)
else
    % start at the line right after the first matching header
    n = ind(1)+1;
    N = length(instrings);
    while n<=N % collect all numerical lines after the line with checkstr
        convert = str2num(instrings{n});
        if isempty(convert), break, end % stop at the first non-numerical line
        % rows can have different numbers of columns; shorter rows are zero-padded
        data(end+1,1:length(convert)) = convert;
        n = n+1;
    end
end
data % display the loaded data
output
23 65 2 6 7
4 6 7 8 0
33 7 8 9 0
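for the following file 1.txt (the first row under Header B has an extra value, 7, so the shorter rows are padded with zeros):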
Header A (2x3):
3 6 7
5 8 8
Header B (4x4):
23 65 2 6 7
4 6 7 8
33 7 8 9

The following would work, but might not be all that fast if you are dealing with a lot of data:
function [ matrixOut ] = readLineBasedOnHeader( headerString, FileName )
%readLineBasedOnHeader: Scan through a text file, and return the matrix below
% the row that exactly matches the string `headerString`
% Read each line of the file into a cell array:
cellStrings = regexp(fileread(FileName), '\r?\n', 'split').';
% Find the row matching headerString (ignoring leading/trailing spaces)
headerIndex = strcmp(strtrim(cellStrings), headerString);
if sum(headerIndex) == 1
    % We've found 1 match; return the numeric rows beneath it
    rowIdx = find(headerIndex)+1;
    if rowIdx > numel(cellStrings)
        error('Header is the last line of the file');
    end
    matrixOut = str2num(cellStrings{rowIdx}); %#ok<ST2NM>
    if isempty(matrixOut)
        error('Row beneath header is not numeric');
    end
    % keep appending rows until a non-numeric row or the end of the file
    while rowIdx < numel(cellStrings)
        rowIdx = rowIdx+1;
        nextRow = str2num(cellStrings{rowIdx}); %#ok<ST2NM>
        if isempty(nextRow), break, end
        matrixOut = [matrixOut; nextRow]; %#ok<AGROW>
    end
elseif sum(headerIndex) > 1
    % More than 1 match; throw an error
    error('More than 1 copy of header string found');
else % Less than 1 match; throw an error
    error('Header string not found');
end
end
Assuming you have a file text_file.txt with the content you have given above, then running:
readLineBasedOnHeader('Header A (2x3):', 'text_file.txt')
should return:
ans =
3 6 7
5 8 8
And running:
readLineBasedOnHeader('Header B (4x4):', 'text_file.txt')
should return:
ans =
23 65 2 6
4 6 7 8
33 7 8 9
Note that this requires you to input the full header line (i.e. an exact match for the row); you could adapt it to match just the "Header A" part, for example by comparing only the beginning of each line with strncmp.
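A minimal sketch of the prefix matching mentioned above, written as a script rather than a function: it compares only the first numel(headerPrefix) characters of each line using strncmp, so 'Header A' is enough to find Header A (2x3):. The temporary file and the names headerPrefix and txtLines are illustrative, not part of the answer.

```matlab
% Write the question's sample data to a temporary file (illustrative only)
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'Header A (2x3):\n3 6 7\n5 8 8\nHeader B (4x4):\n23 65 2 6\n4 6 7 8\n33 7 8 9\n');
fclose(fid);

headerPrefix = 'Header A';   % only the beginning of the header line
txtLines = strtrim(regexp(fileread(fname), '\r?\n', 'split'));
% first line whose beginning matches the prefix
hdr = find(strncmp(txtLines, headerPrefix, numel(headerPrefix)), 1);
if isempty(hdr)
    error('Header "%s" not found', headerPrefix);
end
matrixOut = [];
for k = hdr+1:numel(txtLines)
    row = str2num(txtLines{k}); %#ok<ST2NM>
    if isempty(row), break, end   % stop at the next header or a blank line
    matrixOut = [matrixOut; row]; %#ok<AGROW>
end
disp(matrixOut)
```

If several headers share the same prefix, this takes the first one, whereas the function above raises an error for duplicate exact matches.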

Related

How to read desired ranges of rows in a text file and assign the elements of these ranges to different matrices in MATLAB?

I have a text file which has some elements:
5
4
4 3 1 4
3 1 2 1
9 8 1 3
4 Inf Inf 4
13 9 Inf 6
1 3
2 3
3 4
4 5
-1 -1
I need to create 2 different matrices with these elements. The numbers in the first two rows (5 and 4 here) give the size (mxn) of the first matrix (in this example, a 5x4 matrix).
I should assign the mxn elements below them (rows 3 to 7 of the file, 20 elements here) to that matrix.
After that, the remaining values up to the final row (which contains -1 -1) should be assigned to another pxt matrix (in this example, a 4x2 matrix; the row with -1 -1 marks the end of the data).
I will use many text files, and their numbers of rows and columns differ (the matrices to be created have different sizes), so I need code that works for all of the text files. I've tried to write a piece of code, but its results are wrong because there are spaces between the values and the program treats these spaces as characters. Also, 13 and Inf have more than one character. Here is my code and the result for the first matrix.
Also, I need to create the second matrix as explained, but I don't know how to do that.
clear;
clc;
fileID=fopen('1.txt', 'r');
nrow = fscanf(fileID,'%d',1);
ncolumn = fscanf(fileID,'%d',1);
maxrix1 = zeros(nrow,ncolumn);
a = 1;
nline = 1;
while ~feof(fileID) && nline<nrow+2
line = fgetl(fileID);
if(nline > 1 && nline<=nrow+2)
for b = 1:ncolumn
if ~ischar(line), break, end
maxrix1(a, b) = str2double(line(b));
end
a = a + 1;
end
nline = nline + 1;
end
fclose(fileID);
Here is the result I get. It is wrong because of the spaces and the elements that have more than one character (Inf and 13):
4 NaN 3 NaN
3 NaN 1 NaN
9 NaN 8 NaN
4 NaN NaN NaN
1 3 NaN 9
It should be:
4 3 1 4
3 1 2 1
9 8 1 3
4 Inf Inf 4
13 9 Inf 6
After correcting the code for matrix1, I need to create matrix2 like this:
1 3
2 3
3 4
4 5
This is how I would approach the problem:
fid = fopen('file.txt');
M = str2double(fgetl(fid)); % number of rows of matrix1
N = str2double(fgetl(fid)); % number of columns of matrix1
matrix1 = NaN(M,N); % initialize and preallocate
for m = 1:M
    li = fgetl(fid); % read next line
    % split on whitespace; str2double converts 'Inf' to Inf (avoid str2num)
    matrix1(m,:) = str2double(strsplit(strtrim(li)));
end
matrix2 = []; % initialize. We cannot preallocate because the size is unknown
while true % we will exit explicitly with a break statement
    li = fgetl(fid); % read next line. Gives -1 at end of file
    if ~isequal(li, -1)
        matrix2(end+1,:) = str2double(strsplit(strtrim(li))); % avoid str2num
    else
        break
    end
end
matrix2(end,:) = []; % remove last row, which contains [-1 -1]
fclose(fid);
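Since fscanf skips the whitespace between numbers, the first matrix can also be read with a single call once its size is known. This is a sketch assuming the file looks exactly like the example (sizes on separate lines, a -1 -1 terminator row), and it relies on %f accepting the text Inf, which to my knowledge MATLAB's and Octave's fscanf do. The second matrix is still read line by line, because its width is not stored in the file; the temporary file name is illustrative.

```matlab
% Recreate the question's example file in a temporary location (illustrative)
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '5\n4\n4 3 1 4\n3 1 2 1\n9 8 1 3\n4 Inf Inf 4\n13 9 Inf 6\n1 3\n2 3\n3 4\n4 5\n-1 -1\n');
fclose(fid);

fid = fopen(fname, 'r');
sz = fscanf(fid, '%d', 2);                      % [rows; columns] of matrix1
% fscanf fills its output column by column, so read N-by-M and transpose
matrix1 = fscanf(fid, '%f', [sz(2) sz(1)]).';

% The width of matrix2 is not in the file, so read it line by line
matrix2 = [];
while true
    li = fgetl(fid);
    if ~ischar(li), break, end                  % end of file
    vals = sscanf(li, '%f').';
    if isempty(vals), continue, end             % e.g. the rest of the last matrix1 line
    if all(vals == -1), break, end              % the -1 -1 terminator row
    matrix2(end+1, :) = vals; %#ok<AGROW>
end
fclose(fid);
```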

Import into MATLAB with space and tab delimiters

I need to import a txt file into Matlab which has this format
text text text
1 0 1 2 3
4 5 6 7
2 10 11 15 18
15 1 18 3
The first column is separated with the second one by a tab delimiter, while the rest of the data are separated by a space.
I tried to import it using this:
delimiterIn = ' ';
headerlinesIn = 1;
g = importdata('file.txt',delimiterIn,headerlinesIn);
but then the extracted table is like this:
text text text
1 0 1 2 3
4 5 6 7 nan
2 10 11 15 18
15 1 18 3 nan
What I want is a table that maintains the format, with the first column of g.data on its own and then all the others.
I want an output matrix like
1 0 1 2 3
4 5 6 7
2 10 11 15 18
15 1 18 3
Then, if I need to extract the data labelled 2 in the first column, I can put it into another matrix with the values
10 11 15 18
15 1 18 3
with each number in its own element of the matrix.
How can I do it?
A solution might be:
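fid = fopen('test.txt');
M = {[]}; Midx = 1;           % M{k} collects the rows of group k
l = fgetl(fid);               % header line (skipped)
l = fgetl(fid);
while ~isnumeric(l)           % fgetl returns -1 (numeric) at end of file
    idx = str2double(l(1));   % single-digit group number, or NaN if the line starts with a tab
    if ~isnan(idx)
        Midx = idx;
        M{Midx} = [];
        l = l(2:end);         % drop the group number
    end
    val = cell2mat(textscan(l,'%f'))';
    M{Midx} = [M{Midx}; val];
    l = fgetl(fid);
end
fclose(fid);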
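The answer above takes the group number from the first character only, so a group numbered 10 or higher would be misread. A sketch that splits each line at the tab instead, assuming (as the question describes) that the group number is followed by a tab and that continuation lines start with a tab, i.e. have an empty first column. The file contents are reconstructed from the question for illustration.

```matlab
% Recreate the question's example; ASSUMPTION: continuation lines begin
% with a tab, i.e. their first (group) column is empty
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'text text text\n1\t0 1 2 3\n\t4 5 6 7\n2\t10 11 15 18\n\t15 1 18 3\n');
fclose(fid);

txtLines = regexp(fileread(fname), '\r?\n', 'split');
groups = {};             % groups{g} collects the rows belonging to group g
g = NaN;
for k = 2:numel(txtLines)                  % line 1 is the text header
    li = txtLines{k};
    t = find(li == sprintf('\t'), 1);      % position of the tab delimiter
    if isempty(t), continue, end           % skip blank or malformed lines
    label = strtrim(li(1:t-1));
    if ~isempty(label)                     % a new group starts on this line
        g = str2double(label);
        groups{g} = [];
    end
    groups{g} = [groups{g}; sscanf(li(t+1:end), '%f').'];
end
B = groups{2}   % the rows belonging to group 2
```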
Maybe a bit too pragmatic, but this might help:
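A = g.data;   % A is the numeric matrix returned by importdata above
for i=1:size(A,1)
    if isnan(A(i,end))   % continuation row: shift its values one column to the right
        A(i,2:end) = A(i,1:end-1);
        A(i,1) = NaN;
    end
end
for i=1:size(A,1)
    if A(i,1)==2         % the two rows that belong to group 2
        B = A(i:i+1,2:end);
    end
end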

Import table with readtable if row is shifted in .txt file

I have a table that looks like this
x x x x x x
x x
y y y y y y
y y
z z z z z z
z z
I want to import it using readtable such that all the x are in one row, all the y in the next row, etc. In other words, in the .txt file the last two values that belong on each line have been shifted onto the next line. I think I need to change something in DelimitedTextImportOptions, but I cannot figure out what exactly.
Would be glad if someone could help me with this, thank you very much in advance!
If it is a requirement to use readtable, one option would be to transform the original file to a new format and then apply readtable to the new file.
Here are sample contents of the file in.txt that can be used in the example below:
1 2 3 abc 5 6
7 8
3 4 5 def 7 8
9 0
9 1 0 ghi 3 2
1 4
Here is the code:
% FIRST, TRANSFORM THE INPUT FILE INTO A FILE WHERE THE SPLIT LINES ARE
% COMBINED INTO SINGLE LINES
% open input and output files
in = fopen('in.txt', 'r');
out = fopen('out.txt', 'w');
% read the first line of the input file
currline = fgetl(in);
% while we haven't reached the end of the file
while ~isequal(currline, -1)
    % read the following line of the input file
    currline_append = fgetl(in);
    % ... if it doesn't exist, throw an error; the file is not as expected
    if isequal(currline_append, -1)
        error('Bad file');
    end
    % print this pair of lines to the output file as a single line.
    % Note: if using Windows Notepad or similar application to read the
    % file, you may want to replace '\n' by '\r\n' in the format string
    fprintf(out, '%s %s\n', currline, currline_append);
    % get the next line of the input file
    currline = fgetl(in);
end
% close input and output files
fclose(in);
fclose(out);
% NEXT, READ THE TABLE FROM THE OUTPUT FILE
t = readtable('out.txt');
Actually, if your text file is shaped EXACTLY and STRICTLY as you described (every row has the same number of elements, and two of them overflow into the next line), you can read it very easily as follows:
fid = fopen('data.txt','r');
data = textscan(fid,'%d','CollectOutput',true);
fclose(fid);
data_inner = data{1,1};
cols = 8; % predefined number of elements per row
rows = numel(data_inner) / cols;
C = reshape(data_inner.',cols,rows).';
Example of input:
1 2 3 4 5 6
7 8
2 3 4 5 6 7
8 9
3 4 5 6 7 8
9 10
Example of output:
C =
1 2 3 4 5 6 7 8
2 3 4 5 6 7 8 9
3 4 5 6 7 8 9 10
Once this is done, the matrix can be easily converted into a table as follows:
T = array2table(C)
T =
C1 C2 C3 C4 C5 C6 C7 C8
__ __ __ __ __ __ __ __
1 2 3 4 5 6 7 8
2 3 4 5 6 7 8 9
3 4 5 6 7 8 9 10
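If the number of values per record is not known in advance, it can be taken from the first two lines instead of hard-coding cols = 8. A sketch, assuming every record is split over exactly two lines and all values are numeric; the temporary file holds the example input from this answer.

```matlab
% Example input from this answer, written to a temporary file (illustrative)
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '1 2 3 4 5 6\n7 8\n2 3 4 5 6 7\n8 9\n3 4 5 6 7 8\n9 10\n');
fclose(fid);

% Split the file into lines (strtrim drops the trailing newline)
txtLines = regexp(strtrim(fileread(fname)), '\r?\n', 'split');
% values per record = values on the first line + values on its continuation
cols = numel(sscanf(txtLines{1}, '%f')) + numel(sscanf(txtLines{2}, '%f'));
vals = sscanf(strjoin(txtLines, ' '), '%f');   % every number, in file order
C = reshape(vals, cols, []).'                  % one record per row
```

C can then be converted with array2table as shown above.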

Insert a certain value after each occurrence of a set of n equal values

Example:
input = [1 255 0 0 0 9 9 9 1 6 6 6 6 6 6 1]; % array of numbers (uint8)
output = [1 255 0 0 0 255 9 9 9 255 1 6 6 6 255 6 6 6 255 1];
% output must have 255 inserted at positions 6, 10, 15, 19
% because 0, 9, 6, 6 have occurred three times respectively
outputIndex = [6 10 15 19];
% outputIndex must indicate the positions where 255 was inserted
This could be one vectorized approach to get this done efficiently:
%// Input
A = [1 255 0 0 0 9 9 9 1 6 6 6 6 6 6 1]
%// Input parameter (how many times a value must be repeated for detection)
search_count = 3;
%// Find differences between consecutive elements and set all nonzero
%// differences to ones, otherwise to zeros, in a binary array
diffA = diff(A)~=0
%// Find start and end indices of "islands" of the same value
starts = strfind([1 diffA],[1 zeros(1,search_count-1)])
ends = strfind([diffA 1],[zeros(1,search_count-1) 1])+search_count
%// For each island of same-valued elements, find out where the first group ends
firstgrp = starts + search_count
%// Find how many times a group of search_count equal values repeats
%// within each "island" of same-valued elements. Also get the max repeats.
pattern_repeats = floor((ends - starts)./search_count)
max_repeat = max(pattern_repeats)
%// All possible repeat indices within all islands
all_repeats = bsxfun(@plus,firstgrp,[0:max_repeat-1]'*(search_count)) %//'
%// Use a binary mask to select only those repeats allowed by pattern_repeats
out_idx = all_repeats(bsxfun(@lt,[0:max_repeat-1]',pattern_repeats)) %//'
out_idx = out_idx + [0:numel(out_idx)-1]' %//'
%// Create the output array, insert 255 at out_idx locations and put values
%// from the input array into the rest of the locations
out = zeros(1,numel(A)+numel(out_idx));
out(out_idx) = 255
out(out==0) = A
Code run:
>> A
A =
Columns 1 through 13
1 255 0 0 0 9 9 9 1 6 6 6 6
Columns 14 through 16
6 6 1
>> out_idx
out_idx =
6
10
15
19
>> out
out =
Columns 1 through 13
1 255 0 0 0 255 9 9 9 255 1 6 6
Columns 14 through 20
6 255 6 6 6 255 1
I don't understand the downvotes, it's actually an interesting question.
Here is the long answer:
n = 3;
subst = 255;
input = [1 255 0 0 0 9 9 9 1 6 6 6 6 6 6 1];
%// mask
X = NaN(1,numel(input));
%// something complicated (see below)
X(filter(ones(1,n-1),1,~([0. diff(input)])) == n-1) = 1;
%// loop to split multiple occurences of n-tuples
for ii = 1:numel(input)
if X(ii) == 1 && ii < numel(X)-n+1
X(ii+1:ii+n-1) = NaN(1,n-1);
end
end
%// output vector
D = [input; X.*subst];
E = D(:);
output = E(isfinite(E))
%// indices of inserted 255
D = [input.*0; X.*subst];
E = D(:);
outputIndex = find(E(isfinite(E)))
Explanation of the complicated part:
%// finite differences of input
A = [0 diff(input)];
%// conversion to logical
B = ~A;
%// mask consecutive values
mask = filter(ones(1,n-1),1,B) == n-1;
%// set masked values to 1
X(mask) = 1;
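To see what the masking step produces, here it is applied to the question's input, with x standing in for input to avoid shadowing the built-in function. It only reproduces the explanation above; the names runSum and hits are illustrative.

```matlab
n = 3;
x = [1 255 0 0 0 9 9 9 1 6 6 6 6 6 6 1];   % the question's input
A = [0 diff(x)];                  % finite differences, padded with a leading 0
B = double(~A);                   % 1 where an element equals its predecessor (and at position 1)
runSum = filter(ones(1,n-1), 1, B);   % sum of the current and previous n-2 entries of B
mask = runSum == n-1;             % true where x(k-n+1:k) are all equal, for k >= n
hits = find(mask)
```

The hits at 12 to 15 all come from the run of six 6s, because each of those positions closes a triple of equal values. That is why the loop in the answer clears the n-1 positions after each accepted hit, leaving 5, 8, 12 and 15, the positions after which 255 is inserted.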
If you have the Image Processing Toolbox, you can avoid the loop with this one-liner for computing the mask:
mask = accumarray(bwlabel(filter(ones(1,n-1),1,~([0. diff(input)])) == n-1).'+1,1:numel(input),[],@(x) {getfield(sort(x),{find(mod(cumsum(1:numel(x)),n) == 1)})});
X = NaN(1,numel(input));
X(vertcat(mask{2:end})) = subst;
%// output vector
D = [input; X];
E = D(:);
output = E(isfinite(E))
%// indices of inserted 255
D = [input.*0; X];
E = D(:);
outputIndex = find(E(isfinite(E)))
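For checking the vectorized versions, a plain loop can serve as a reference: it tracks the length of the current run of equal values, and each time the count reaches n it emits the marker value and restarts the count. A sketch using the question's example; the names x and runLen are illustrative.

```matlab
x = [1 255 0 0 0 9 9 9 1 6 6 6 6 6 6 1];   % the question's input
n = 3;          % run length that triggers an insertion
subst = 255;    % value to insert
output = zeros(1, 0);
outputIndex = zeros(1, 0);
runLen = 0;
for k = 1:numel(x)
    if k > 1 && x(k) == x(k-1)
        runLen = runLen + 1;
    else
        runLen = 1;                          % a new run starts here
    end
    output(end+1) = x(k); %#ok<AGROW>
    if runLen == n                           % n equal values in a row
        output(end+1) = subst; %#ok<AGROW>
        outputIndex(end+1) = numel(output); %#ok<AGROW>
        runLen = 0;                          % count the next group from scratch
    end
end
disp(output); disp(outputIndex)
```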

Matrix 1, 2, 3: how can I generate them?

I want to control the creation of random numbers in this matrix:
Mp = floor(1+(10*rand(2,20)));
mp1 = sort(Mp,2);
I want to modify this code in order to get an output like this:
1 1 2 2 3 3 3 4 5 5 6 7 7 8 9 9 10 10 10 10
1 2 3 3 3 3 3 3 4 5 6 6 6 6 7 8 9 9 9 10
I have to fill each row with all the numbers from 1 to 10 in increasing order, and the second matrix, which counts the occurrences of each number, should look like this:
1 2 1 2 1 2 3 1 1 2 1 1 2 1 1 2 1 2 3 4
1 1 1 2 3 4 5 6 1 1 1 2 3 4 1 1 1 2 3 1
The trickiest matrix, which I've been trying to build since last week, is the third one: it should scan each row of the first matrix and record, for each number, its number of occurrences at the position of its last occurrence. Here is an example of how the code should work; it shows the intended result after running through the first row of the first matrix:
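    |  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20  (positions)
  1 |  0  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  2 |  0  0  0  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  3 |  0  0  0  0  0  0  3  0  0  0  0  0  0  0  0  0  0  0  0  0
  4 |  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0
  5 |  0  0  0  0  0  0  0  0  0  2  0  0  0  0  0  0  0  0  0  0
  6 |  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0
  7 |  0  0  0  0  0  0  0  0  0  0  0  0  2  0  0  0  0  0  0  0
  8 |  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0
  9 |  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  2  0  0  0  0
 10 |  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  4
(numbers)
This example shows the intended result after running through the second row of the first matrix:
    |  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20  (positions)
  1 |  1  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  2 |  0  1  0  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  3 |  0  0  0  0  0  0  3  6  0  0  0  0  0  0  0  0  0  0  0  0
  4 |  0  0  0  0  0  0  0  1  1  0  0  0  0  0  0  0  0  0  0  0
  5 |  0  0  0  0  0  0  0  0  0  3  0  0  0  0  0  0  0  0  0  0
  6 |  0  0  0  0  0  0  0  0  0  0  1  0  0  4  0  0  0  0  0  0
  7 |  0  0  0  0  0  0  0  0  0  0  0  0  2  0  1  0  0  0  0  0
  8 |  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  1  0  0  0  0
  9 |  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  2  0  0  3  0
 10 |  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  4
(numbers)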
So the wanted matrix starts filled with zeros, and each time after running through a row of the first matrix, we add the new result to the previous one.
I believe the following code does everything you asked for. If I didn't understand, you need to get a lot clearer in how you pose your question...
Note: I hard-coded some values and sizes. In "real code" you would never do that, obviously.
% the bit of code that generates and sorts the initial matrix:
Mp = floor(1+(10*rand(2,20)));
mp1 = sort(Mp, 2);
clc
disp(mp1)
occCount = zeros(size(mp1));
for ii = 1:size(mp1,1)
    for jj = 1:size(mp1,2)
        if (jj == 1)
            occCount(ii,jj) = 1;
        else
            if (mp1(ii,jj) == mp1(ii,jj-1))
                occCount(ii,jj) = occCount(ii, jj-1) + 1;
            else
                occCount(ii,jj) = 1;
            end
        end
    end
end
% this is the second matrix you asked for
disp(occCount)
% now the third:
big = zeros(10, 20);
for ii = 1:size(mp1,1)
    for jj = 1:10
        f = find(mp1(ii,:) == jj); % index of all of them
        if numel(f) > 0
            last = f(end);
            n = numel(f);
            big(jj, last) = big(jj, last) + n;
        end
    end
end
disp(big)
Please see if this is indeed what you had in mind.
The following code solves both the second and third matrix generation problems with a single loop. For clarity, the second matrix M2 is the 2-by-20 array in the example containing the cumulative occurrence count. The third matrix M3 is the sparse matrix of size 10-by-20 in the example that encodes the number and position of the last occurrence of each unique value. The code only loops over the rows, using accumarray to do most of the work. It is generalized to any size and content of mp1, as long as the rows are sorted first.
% data
mp1 = [1 1 2 2 3 3 3 4 5 5 6 7 7 8 9 9 10 10 10 10;
1 2 3 3 3 3 3 3 4 5 6 6 6 6 7 8 9 9 9 10]; % the example first matrix
nuniq = max(mp1(:));
% accumulate
M2 = zeros(size(mp1));
M3 = zeros(nuniq,size(mp1,2));
for ir=1:size(mp1,1),
    cumSums = accumarray(mp1(ir,:)',1:size(mp1,2),[],@numel,[],true)';
    segments = arrayfun(@(x)1:x,nonzeros(cumSums),'uni',false);
    M2(ir,:) = [segments{:}];
    countCoords = accumarray(mp1(ir,:)',1:size(mp1,2),[],@max,[],true);
    [ii,jj] = find(countCoords);
    nzinds = sub2ind(size(M3),ii,nonzeros(countCoords));
    M3(nzinds) = M3(nzinds) + nonzeros(cumSums);
end
I won't print the outputs because they are a bit big for the answer, and the code is runnable as is.
NOTE: For new test data, I suggest using the commands Mp = randi(10,[2,20]); mp1 = sort(Mp,2);. Or based on your request to user2875617 and his response, ensure all numbers with mp1 = sort([repmat(1:10,2,1) randi(10,[2,10])],2); but that isn't really random...
EDIT: Error in code fixed.
I am editing the previous answer to check if it is fast when mp1 is large, and apparently it is:
N = 20000; M = 200; P = 100;
mp1 = sort([repmat(1:P, M, 1), ceil(P*rand(M,N-P))], 2);
tic
% Initialise output matrices
out1 = zeros(M, N); out2 = zeros(P, N);
for gg = 1:M
    % Frequencies of each row (reset, since rows can have different numbers of unique values)
    freqs = [];
    freqs(:, 1) = mp1(gg, [find(diff(mp1(gg, :))), end]);
    freqs(:, 2) = histc(mp1(gg, :), freqs(:, 1));
    cumfreqs = cumsum(freqs(:, 2));
    k = 1;
    for hh = 1:numel(freqs(:, 1))
        out1(gg, k:cumfreqs(hh)) = 1:freqs(hh, 2);
        out2(freqs(hh, 1), cumfreqs(hh)) = out2(freqs(hh, 1), cumfreqs(hh)) + freqs(hh, 2);
        k = cumfreqs(hh) + 1;
    end
end
toc
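Both answers above work on the runs of equal values in each sorted row. A sketch of the same run-length idea using only diff and find (no histc or accumarray), run on the example first matrix from the question. Note that with the accumulation rule described in the question, number 10 ends up with 4 + 1 = 5 at position 20 after the second row.

```matlab
% The question's example first matrix (rows already sorted)
mp1 = [1 1 2 2 3 3 3 4 5 5 6 7 7 8 9 9 10 10 10 10;
       1 2 3 3 3 3 3 3 4 5 6 6 6 6 7 8 9 9 9 10];
[nRows, nCols] = size(mp1);
M2 = zeros(nRows, nCols);           % running occurrence count within each run
M3 = zeros(max(mp1(:)), nCols);     % counts placed at each value's last occurrence
for r = 1:nRows
    row = mp1(r, :);
    runEnds = [find(diff(row)), nCols];   % last index of each run of equal values
    runLens = diff([0, runEnds]);         % length of each run
    runStarts = runEnds - runLens + 1;
    for k = 1:numel(runEnds)
        M2(r, runStarts(k):runEnds(k)) = 1:runLens(k);
        v = row(runEnds(k));
        M3(v, runEnds(k)) = M3(v, runEnds(k)) + runLens(k);
    end
end
```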