How can I merge this data in MATLAB?

In the sample text file below, if column 3 contains a 1, the value in column 2 of that row should be added to the column 2 value of the previous row. For example, the 40 in row 2 should be added to the 10 in row 1, and then row 2 should be set to 0, with its column 3 flag cleared (as shown in the modified sample text file). The problem with my code below is that it only records the change to the current value time(i,1), not the change made to the previous one.
original.txt
a time c
1 10 0
2 40 1
3 20 0
4 11 0
5 40 1
modified.txt
a time c
1 50 0
2 0 0
3 20 0
4 51 0
5 0 0
fid=fopen('data.txt');
A=textscan(fid,'%f%f%f','HeaderLines',1); % skip the "a time c" header line
a =A{1};
time=A{2};
c =A{3};
fclose(fid);
fid=fopen('newData.txt','wt');
for i=1:numel(a)
    if c(i,1)==1
        time(i-1,1)=time(i,1)+time(i-1,1); % merge the time of the current and the previous
        time(i,1) =0; %set the time to 0
    end
    array = []; %empty the array
    array = [a(i,1) time(i,1) c(i,1)]; % add new data
    format short g;
    fprintf(fid,'%g\t %g\t %g\n',array);
end
fclose(fid)

The current time value is written correctly but the previous one isn't because you have already written the previous row to the file on the previous loop iteration, so there is no way to change it afterwards. You need to move the printing out of the loop and do it after you have adjusted all of the time values.
You can also take advantage of vectorization by using the FIND function instead of a for loop, and you only need one call to FPRINTF to output all the data. Try this:
a = [1; 2; 3; 4; 5]; %# Sample data
time = [10; 40; 20; 11; 40]; %# Sample data
c = [0; 1; 0; 0; 1]; %# Sample data
index = find(c == 1); %# Find indices where c equals 1
temp = time(index); %# Temporarily store the time values
time(index) = 0; %# Zero-out the time points
time(index-1) = time(index-1)+temp; %# Add to the previous time points
c(index) = 0; %# Zero-out the entries of c
fid = fopen('newData.txt','wt'); %# Open the file
fprintf(fid,'%g\t %g\t %g\n',[a time c].'); %# Write the data to the file
fclose(fid); %# Close the file
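To run the vectorized version directly on the files from the question, a minimal sketch could look like the following. It assumes the input is whitespace-delimited with the single "a time c" header line shown above; the file names original.txt and modified.txt come from the question, and the block first writes the sample data so it runs on its own. It also skips a flag on row 1, since that row has no previous row to merge into (an assumption about the desired behavior).

```matlab
% File names from the question; adjust to your own paths
inFile  = 'original.txt';
outFile = 'modified.txt';

% Write the sample data so this sketch is self-contained
fid = fopen(inFile, 'wt');
fprintf(fid, 'a time c\n1 10 0\n2 40 1\n3 20 0\n4 11 0\n5 40 1\n');
fclose(fid);

% Read the three numeric columns, skipping the "a time c" header line
fid = fopen(inFile, 'rt');
C = textscan(fid, '%f %f %f', 'HeaderLines', 1);
fclose(fid);
a = C{1}; time = C{2}; c = C{3};

% Rows with c == 1 are merged into the row above; a flag on row 1 is left alone
index = find(c == 1);
index = index(index > 1);
temp = time(index);                     % keep the values before zeroing them
time(index) = 0;
time(index-1) = time(index-1) + temp;
c(index) = 0;

% Write the header back, then all rows with a single FPRINTF call
fid = fopen(outFile, 'wt');
fprintf(fid, 'a time c\n');
fprintf(fid, '%g %g %g\n', [a time c].');
fclose(fid);
```

Zeroing time(index) before adding temp follows the answer's ordering, so a flagged row that directly follows another flagged row still keeps the value passed to it, exactly as in the original vectorized code.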

Related

MATLAB: summing values in matrix up to a threshold level of one column

So, I have a matrix with 2 columns and 5 rows (as an example).
2 1
5 1
3 1
4 1
7 1
What I want to do is:
Starting from position (1,1) and moving down the first column, find the consecutive cells whose values add up to 10. In this case I would have:
step 1: 2 = 10? No, continue
step 2: 2+5 = 10? No, continue
step 3: 2+5+3 = 10? Yes, stop and return the sum of the corresponding values in the second column
step 4: 4 = 10? No, continue
step 5: 4+7 = 10? No, it's larger, so we save the previous step as it is and return the 1 from the second column.
At the end of this process I would need to obtain a new matrix that looks like this:
10 3
4 1
7 1
You can perform exactly the logic you described in a loop.
For each row, test the "recent sum", where "recent" here means from the last output row to the current row.
If the sum is 10 or more, add to the output as described. Otherwise continue to the next row.
Code:
% Original data
x =[2 1; 5 1; 3 1; 4 1; 7 1];
% Output for result
output = [];
% idx is the row we start sum from on each test
idx = 1;
% Loop over all rows
for ii = 1:size(x,1)
    % Get sum in column 1
    s = sum(x(idx:ii, 1));
    if s == 10
        % Append row with sums
        output = [output; 10 sum(x(idx:ii,2))];
        idx = ii+1;
    elseif s > 10
        % Append all "recent" rows
        output = [output; x(idx:ii,:)];
        idx = ii+1;
    end
end
Result:
>> disp(output)
10 3
4 1
7 1
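One thing to be aware of with the loop above: rows after the last completed group (rows whose running sum never reaches 10) are silently dropped. If you want to keep them, a minimal variation could look like this. The threshold variable and the choice to copy leftover rows unchanged are assumptions, not part of the original question, and the two extra sample rows 1 1 and 2 1 are made up to exercise that case.

```matlab
% Sample data with two extra rows that never reach the threshold
x = [2 1; 5 1; 3 1; 4 1; 7 1; 1 1; 2 1];
threshold = 10;          % assumption: threshold as a parameter instead of a literal
output = zeros(0, 2);    % start with an empty 0-by-2 result
idx = 1;                 % first row of the current "recent" group

for ii = 1:size(x,1)
    s = sum(x(idx:ii, 1));
    if s == threshold
        % Group sums exactly to the threshold: collapse it into one row
        output = [output; threshold sum(x(idx:ii,2))]; %#ok<AGROW>
        idx = ii + 1;
    elseif s > threshold
        % Group overshoots: keep its rows unchanged
        output = [output; x(idx:ii,:)]; %#ok<AGROW>
        idx = ii + 1;
    end
end

% Assumption: rows left over after the last group are kept as they are
output = [output; x(idx:end,:)];
```

With these inputs the first five rows produce the same 10 3, 4 1 and 7 1 as before, and the trailing 1 1 and 2 1 are appended instead of being lost.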
This is my proposed solution:
% Create A and the cumulative sum of its first column...
A = [2 1; 5 1; 3 1; 4 1; 7 1];
A_cs = cumsum(A(:,1));
% Create a variable R to store the result and an indexer to it...
R = NaN(size(A_cs,1),2);
R_off = 1;
% Perform the computation...
while (~isempty(A_cs))
    idx = find(A_cs <= 10);
    if isempty(idx)
        idx = 1; % a single value above the threshold forms its own group
    end
    idx_max = max(idx);
    R(R_off,:) = [A_cs(idx_max) sum(A(idx,2))];
    A_cs = A_cs - A_cs(idx_max);
    A_cs(idx) = [];
    A(idx,:) = []; % drop the same rows from A so A and A_cs stay aligned
    R_off = R_off + 1;
end
% Clear unused slots in R...
R(any(isnan(R),2),:) = [];
The computation is performed by taking the maximum index of the first cumsum group that lies within the specified threshold (10 in this case). Once it has been found, its corresponding value is inserted into the result matrix and the cumsum array is updated by removing the group entries and subtracting their sum. When the cumsum array is empty, the iteration finishes and R contains the desired result.

MATLAB separating array [duplicate]

I'm trying to elegantly split a vector. For example,
vec = [1 2 3 4 5 6 7 8 9 10]
According to another vector of 0's and 1's of the same length where the 1's indicate where the vector should be split - or rather cut:
cut = [0 0 0 1 0 0 0 0 1 0]
Giving us a cell output similar to the following:
[1 2 3] [5 6 7 8] [10]
Solution code
You can use cumsum & accumarray for an efficient solution -
%// Create ID/labels for use with accumarray later on
id = cumsum(cut)+1
%// Mask to get valid values from cut and vec corresponding to ones in cut
mask = cut==0
%// Finally get the output with accumarray using masked IDs and vec values
out = accumarray(id(mask).',vec(mask).',[],@(x) {x})
Benchmarking
Here are some performance numbers when using a large input on the three most popular approaches listed to solve this problem -
N = 100000; %// Input Datasize
vec = randi(100,1,N); %// Random inputs
cut = randi(2,1,N)-1;
disp('-------------------- With CUMSUM + ACCUMARRAY')
tic
id = cumsum(cut)+1;
mask = cut==0;
out = accumarray(id(mask).',vec(mask).',[],@(x) {x});
toc
disp('-------------------- With FIND + ARRAYFUN')
tic
N = numel(vec);
ind = find(cut);
ind_before = [ind-1 N]; ind_before(ind_before < 1) = 1;
ind_after = [1 ind+1]; ind_after(ind_after > N) = N;
out = arrayfun(@(x,y) vec(x:y), ind_after, ind_before, 'uni', 0);
toc
disp('-------------------- With CUMSUM + ARRAYFUN')
tic
cutsum = cumsum(cut);
cutsum(cut == 1) = NaN; %Don't include the cut indices themselves
sumvals = unique(cutsum); % Find the values to use in indexing vec for the output
sumvals(isnan(sumvals)) = []; %Remove NaN values from sumvals
output = arrayfun(@(val) vec(cutsum == val), sumvals, 'UniformOutput', 0);
toc
Runtimes
-------------------- With CUMSUM + ACCUMARRAY
Elapsed time is 0.068102 seconds.
-------------------- With FIND + ARRAYFUN
Elapsed time is 0.117953 seconds.
-------------------- With CUMSUM + ARRAYFUN
Elapsed time is 12.560973 seconds.
Special case scenario: In cases where you might have runs of 1's, you need to modify a few things, as listed next -
%// Mask to get valid values from cut and vec corresponding to ones in cut
mask = cut==0
%// Setup IDs differently this time. The idea is to have successive IDs.
id = cumsum(cut)+1
[~,~,id] = unique(id(mask))
%// Finally get the output with accumarray using masked IDs and vec values
out = accumarray(id(:),vec(mask).',[],@(x) {x})
Sample run with such a case -
>> vec
vec =
1 2 3 4 5 6 7 8 9 10
>> cut
cut =
1 0 0 1 1 0 0 0 1 0
>> celldisp(out)
out{1} =
2
3
out{2} =
6
7
8
out{3} =
10
For this problem, a handy function is cumsum, which can create a cumulative sum of the cut array. The code that produces an output cell array is as follows:
vec = [1 2 3 4 5 6 7 8 9 10];
cut = [0 0 0 1 0 0 0 0 1 0];
cutsum = cumsum(cut);
cutsum(cut == 1) = NaN; %Don't include the cut indices themselves
sumvals = unique(cutsum); % Find the values to use in indexing vec for the output
sumvals(isnan(sumvals)) = []; %Remove NaN values from sumvals
output = {};
for i=1:numel(sumvals)
    output{i} = vec(cutsum == sumvals(i)); %#ok<SAGROW>
end
As another answer shows, you can use arrayfun to create a cell array with the results. To apply that here, you'd replace the for loop (and the initialization of output) with the following line:
output = arrayfun(@(val) vec(cutsum == val), sumvals, 'UniformOutput', 0);
That's nice because it doesn't end up growing the output cell array.
The key feature of this routine is the variable cutsum, which ends up looking like this:
cutsum =
0 0 0 NaN 1 1 1 1 NaN 2
Then all we need to do is use it to create indices to pull the data out of the original vec array. We loop over the values in sumvals and pull the matching elements. Notice that this routine handles some situations that may arise. For instance, it handles 1 values at the very beginning and very end of the cut array, and it gracefully handles repeated ones in the cut array without creating empty arrays in the output. This is because of the use of unique to create the set of values to search for in cutsum, and the fact that we throw out the NaN values in the sumvals array.
You could use -1 instead of NaN as the signal flag for the cut locations to not use, but I like NaN for readability. The -1 value would probably be more efficient, as all you'd have to do is truncate the first element from the sumvals array. It's just my preference to use NaN as a signal flag.
The output of this is a cell array with the results:
output{1} =
1 2 3
output{2} =
5 6 7 8
output{3} =
10
There are some odd conditions we need to handle. Consider the situation:
vec = [1 2 3 4 5 6 7 8 9 10 11 12 13 14];
cut = [1 0 0 1 1 0 0 0 0 1 0 0 0 1];
There are repeated 1's in there, as well as a 1 at the beginning and end. This routine properly handles all this without any empty sets:
output{1} =
2 3
output{2} =
6 7 8 9
output{3} =
11 12 13
You can do this with a combination of find and arrayfun:
vec = [1 2 3 4 5 6 7 8 9 10];
N = numel(vec);
cut = [0 0 0 1 0 0 0 0 1 0];
ind = find(cut);
ind_before = [ind-1 N]; ind_before(ind_before < 1) = 1;
ind_after = [1 ind+1]; ind_after(ind_after > N) = N;
out = arrayfun(@(x,y) vec(x:y), ind_after, ind_before, 'uni', 0);
We thus get:
>> celldisp(out)
out{1} =
1 2 3
out{2} =
5 6 7 8
out{3} =
10
So how does this work? Well, the first line defines your input vector, the second line finds how many elements are in this vector and the third line defines your cut vector, which marks where we need to cut our vector. Next, we use find to determine the locations that are non-zero in cut, which correspond to the split points in the vector. Notice that each split point marks where we stop collecting elements and where we begin collecting them again.
However, we need to account for the beginning of the vector as well as the end. ind_after tells us where we need to start collecting values and ind_before tells us where we need to stop. To calculate these starting and ending positions, you simply take the result of find and add and subtract 1, respectively.
Each corresponding pair of positions in ind_after and ind_before tells us where to start and stop collecting values. To account for the beginning of the vector, ind_after needs index 1 inserted at the beginning, because index 1 is where we start collecting values. Similarly, N needs to be inserted at the end of ind_before because this is where we stop collecting values at the end of the array.
Now for ind_after and ind_before, there is a degenerate case where the cut point may be at the beginning or end of the vector. If this is the case, then adding or subtracting 1 will generate a start or stop position that's out of bounds. We check for this in the 5th and 6th lines of code and simply clamp these to 1 or N depending on whether we're at the beginning or end of the array.
The last line of code uses arrayfun and iterates through each pair of ind_after and ind_before to slice into our vector. Each result is placed into a cell array, and our output follows.
We can check for the degenerate case by placing a 1 at the beginning and end of cut and some values in between:
vec = [1 2 3 4 5 6 7 8 9 10];
cut = [1 0 0 1 0 0 0 1 0 1];
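Using this example and the above code, we get the following. Note that the clamping only keeps the indices in bounds, so the cut elements at the very ends (1 and 10) still show up as single-element cells: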
>> celldisp(out)
out{1} =
1
out{2} =
2 3
out{3} =
5 6 7
out{4} =
9
out{5} =
10
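If the cut elements themselves should never appear in the output, a variation of the same FIND + ARRAYFUN idea can take the start and stop of each run of zeros directly from cut, which also avoids empty cells for consecutive 1's. This is a sketch rather than part of the original answer, and it assumes vec and cut are row vectors of the same length.

```matlab
vec = [1 2 3 4 5 6 7 8 9 10];
cut = [1 0 0 1 0 0 0 1 0 1];

% A run of zeros starts where cut steps from 1 to 0 (a virtual 1 is prepended)
starts = find(diff([1 cut]) == -1);
% A run of zeros stops where cut steps from 0 to 1 (a virtual 1 is appended)
stops  = find(diff([cut 1]) == 1);

% Slice vec between each start/stop pair
out = arrayfun(@(x,y) vec(x:y), starts, stops, 'UniformOutput', false);
```

For this input, starts is [2 5 9] and stops is [3 7 9], so out holds [2 3], [5 6 7] and 9.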
Yet another way, but this time without any loops or accumulating at all...
lengths = diff(find([1 cut 1])) - 1; % assuming a row vector
lengths = lengths(lengths > 0);
data = vec(~cut);
result = mat2cell(data, 1, lengths); % also assuming a row vector
The diff(find(...)) construct gives us the distance from each marker to the next - we append boundary markers with [1 cut 1] to catch any runs of zeros which touch the ends. Each length is inclusive of its marker, though, so we subtract 1 to account for that, and remove any which just cover consecutive markers, so that we won't get any undesired empty cells in the output.
For the data, we mask out any elements corresponding to markers, so we just have the valid parts we want to partition up. Finally, with the data ready to split and the lengths into which to split it, that's precisely what mat2cell is for.
Also, using @Divakar's benchmark code:
-------------------- With CUMSUM + ACCUMARRAY
Elapsed time is 0.272810 seconds.
-------------------- With FIND + ARRAYFUN
Elapsed time is 0.436276 seconds.
-------------------- With CUMSUM + ARRAYFUN
Elapsed time is 17.112259 seconds.
-------------------- With mat2cell
Elapsed time is 0.084207 seconds.
...just sayin' ;)
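As a quick check of the claim that the boundary markers handle runs touching the ends, here are the same four mat2cell lines applied to the edge-case input from the CUMSUM answer (a 1 at both ends and two consecutive 1's). Only the input is new; the four lines are unchanged apart from comments.

```matlab
vec = 1:14;
cut = [1 0 0 1 1 0 0 0 0 1 0 0 0 1];

lengths = diff(find([1 cut 1])) - 1;   % run lengths between consecutive markers
lengths = lengths(lengths > 0);        % drop zero-length runs between adjacent markers
data = vec(~cut);                      % keep only the non-marker elements
result = mat2cell(data, 1, lengths);   % split data into those runs
```

This gives the same three cells as the CUMSUM answer: [2 3], [6 7 8 9] and [11 12 13].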
Here's what you need:
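function spl = Splitting(vec,cut)
spl = {}; % empty result if every element is a cut point
n=1;
j=1;
for i=1:numel(vec)
    if cut(i)==0
        spl{n}(j)=vec(i); % append to the current piece
        j=j+1;
    else
        n=n+1; % start a new piece after each cut point
        j=1;
    end
end
end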
Despite how simple my method is, it's in 2nd place for performance:
-------------------- With CUMSUM + ACCUMARRAY
Elapsed time is 0.264428 seconds.
-------------------- With FIND + ARRAYFUN
Elapsed time is 0.407963 seconds.
-------------------- With CUMSUM + ARRAYFUN
Elapsed time is 18.337940 seconds.
-------------------- SIMPLE
Elapsed time is 0.271942 seconds.
Unfortunately there is no 'inverse concatenate' in MATLAB. If you wish to solve a question like this, you can try the code below. It gives you what you are looking for when there are two split points producing three vectors. If you want more splits, you will need to modify the code after the loop.
The results are plain vectors F, G and H. To collect them in a cell array, use {F, G, H}.
pos_of_one = 0;
% The loop finds the split points and puts their positions into a vector.
for kk = 1 : length(cut)
    if cut(1,kk) == 1
        pos_of_one = pos_of_one + 1;
        A(1,pos_of_one) = kk;
    end
end
F = vec(1 : A(1,1) - 1);
G = vec(A(1,1) + 1 : A(1,2) - 1);
H = vec(A(1,2) + 1 : end);

Importing Multiple text File to Matlab

I need to import some text files as matrices in MATLAB. Can anyone help me with the code, please? Here are my text file names:
elist_S06n1.txt
elist_S06n2.txt
elist_S06n3.txt
elist_S06n4.txt
elist_S07n1.txt
elist_S07n2.txt
elist_S07n3.txt
elist_S07n4.txt
.
.
.
elist_S27n5.txt
So, up to elist_S09, n goes from 1 through 4; from elist_S10 on, it goes from 1 through 5.
Thank you in advance.
Thanks for your update so we can see what you have tried so far.
It seems to me that you have difficulties generating the proper filename. Instead of looping over your cell array index, you could use two loops, one from 6 to 27 and the other from 1 to 4 or 5. Based on these values, you can easily generate the desired filename (mind the leading zero!). Within the loop, you keep track of an index for the resulting cell array.
By the way, if I count the number of files, I arrive at a total of 18*5 + 4*4 = 106 and not 95.
The code:
numfiles = (27-9)*5 + (9-5)*4;
mydata = cell(1, numfiles);
idx = 0; % index for mydata
n = 4;
for k1 = 6:27
    if k1 == 10
        n = 5; % switch to 5 files if k1 reaches 10
    end
    for k2 = 1:n
        idx = idx+1;
        myfilename = sprintf('elist_S%02dn%d.txt', k1, k2);
        mydata{idx} = importdata(myfilename);
    end
end
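If you would rather not hard-code the file counts, an alternative is to let dir list every file matching a wildcard pattern. This is a different technique from the nested loops above, sketched under the assumptions that all files live in one folder and that the pattern elist_S*n*.txt matches only the files you want. The temporary folder and the small sample files are made up so the sketch runs on its own.

```matlab
% Hypothetical folder; replace with the folder that holds your files
folder = tempname;
mkdir(folder);

% Create a few small sample files (S06 and S07, n = 1..4) for demonstration
for k1 = 6:7
    for k2 = 1:4
        fid = fopen(fullfile(folder, sprintf('elist_S%02dn%d.txt', k1, k2)), 'wt');
        fprintf(fid, '%d %d\n', [k1 k2; k1 k2].');   % a 2-by-2 numeric matrix
        fclose(fid);
    end
end

% List all matching files and sort the names so the order is predictable
files = dir(fullfile(folder, 'elist_S*n*.txt'));
[~, order] = sort({files.name});
files = files(order);

% Import each file into one cell of mydata
mydata = cell(1, numel(files));
for k = 1:numel(files)
    mydata{k} = importdata(fullfile(folder, files(k).name));
end
```

Because the numbers in the names are zero-padded (S06, S07, ...), sorting the names alphabetically also sorts them numerically.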

Using Matlab to make modification to a text file

Essentially I am writing a Matlab file to change the 2nd, 3rd and 4th numbers in the line below "STR" and above "CON" in the text file (which is given below and is called '0.dat'). Currently, my Matlab code makes no changes to the text file.
Text File
pri
3
len 0.03
vic 5 5
MAT
1 147E9 0.3 0 4.9E9 8.5E9
LAY
1 0.000125 1 45
2 0.000125 1 0
3 0.000125 1 -45
4 0.000125 1 90
5 0.000125 1 45
WAL
1 1 2 3 4 5
PLATE
1 0.005 1 1
STR
1 32217.442335442 3010.34241024889 2689.48842888812
CON
1 2 1 2 3 1 3 4 1 4 5 1 5 6 1 6 7 1
ATT
1 901 7 901
LON
34
POI
123456
1 7
X 0.015
123456
2 6
X 0.00381966011250105 0.026180339887499
123456
3 5
X 0.000857864376269049 0.0291421356237309
123456
4
X 0
PLO
2 3
CRO
0
RES
INMOD=1
END
Matlab code:
impafp = importdata('0.dat','\t');
afp = impafp.textdata;
fileID = fopen('0.dat','r+');
for i = 1:length(afp)
    if (strncmpi(afp{i},'con',3))
        newNx = 100;
        newNxy = 50;
        newNy = 500;
        myformat = '%0.6f %0.9f %0.9f %0.9f\n';
        newData = [1 newNx newNxy newNy];
        afp{i-1} = fprintf(fileID, myformat, newData);
        fclose(fileID);
    end
end
From the help for importdata:
For ASCII files and spreadsheets, importdata expects to find numeric
data in a rectangular form (that is, like a matrix). Text headers can
appear above or to the left of numeric data. To import ASCII files
with numeric characters anywhere else, including columns of character
data or formatted dates or times, use TEXTSCAN instead of import data.
Indeed, if you print out the value of afp, you'll see that it just contains the first line. Because of that, the if condition never matches and nothing is ever written to the file. You were also not closing the file ID if the if statement wasn't triggered.
Here is one way to do this with textscan (which is probably faster too):
% Read in data as strings using textscan
fid = fopen('0.dat','r');
afp = textscan(fid,'%s','Delimiter','');
fclose(fid);
isSTR = strncmpi(afp{:},'str',3); % True for all lines starting with STR
isCON = strncmpi(afp{:},'con',3); % True for all lines starting with CON
% Find indices to replace - create logical indices
% True if line before is STR and line after is CON
% Offset isSTR and isCON by 2 elements in opposite directions to align
% Use & to perform vectorized AND
% Pad with FALSE on either side to make output the same length as afp{1}{:}
datIDX = [false;(isSTR(1:end-2)&isCON(3:end));false];
% Overwrite data using sprintf
myformat = '%0.6f %0.9f %0.9f %0.9f';
newNx = 100;
newNxy = 50;
newNy = 500;
newData = [1 newNx newNxy newNy];
afp{1}{datIDX} = sprintf(myformat, newData); % Set only elements that pass test
% Overwrite old file using fprintf (or change filename to new one)
fid = fopen('0.dat','w');
fprintf(fid,'%s\r\n',afp{1}{1:end-1});
fprintf(fid,'%s',afp{1}{end}); % Avoid blank line at end
fclose(fid);
If you're unfamiliar with logical indexing, you might read this blog post and this.
I would recommend just reading the entire file in, finding which lines contain your "keywords", modifying specific lines, and then writing it back out to a file, which can have the same name or a different one.
file = fileread('file.dat');                                % read the whole file as one string
parts = regexp(file,'\n','split');                          % split it into lines
startIndex = find(~cellfun('isempty',regexp(parts,'STR'))); % line containing STR
endIndex = find(~cellfun('isempty',regexp(parts,'CON')));   % line containing CON
ind2Change = startIndex+1:endIndex-1;                       % lines between STR and CON
tempCell{1} = sprintf('%0.6f %0.9f %0.9f %0.9f',[1,100,50,500]);
parts(ind2Change) = deal(tempCell);                         % replace those lines
out = sprintf('%s\n',parts{:});                             % join the lines again
out = out(1:end-1);                                         % drop the trailing newline
fh = fopen('file2.dat','w');
fwrite(fh,out);
fclose(fh);

how to read a matrix from a text file in matlab

I have a text file with 500 columns and 500 rows of numerical (integer) values. The elements in each row are separated by tabs. I want to read this file as a matrix in MATLAB. Example (my text file looks like this):
1 2 2 1 1 2
0 0 0 1 2 0
1 2 2 1 1 2
0 0 0 1 2 0
After reading this text file into a matrix (a) in MATLAB, I want to transpose it.
Help me.
You can use importdata.
Something like:
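filename = 'myfile01.txt';
delimiterIn = '\t';
headerlinesIn = 1;
A = importdata(filename,delimiterIn,headerlinesIn);
A_trans = A.data'; % with header lines, importdata returns a structure; the numbers are in A.data
You can leave out headerlinesIn if your file does not have any header (it is the number of lines before the actual data starts). In that case importdata returns the numeric matrix directly, so use A_trans = A'; instead.
Taken from the MATLAB documentation for importdata.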
Have you tried load with the -ascii option?
For example
a = load('myfile.txt', '-ascii'); % read the data
a = a.'; % transpose
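Alternatively, pre-allocate the matrix and read it one row at a time with textscan:
% Pre-allocate matrix
Nrow=500; Ncol=500;
a = zeros(Nrow,Ncol);
% Read file
fid = fopen('yourfile.txt','r');
for i = 1:Nrow
    % Read exactly one row of Ncol integers
    a(i,:) = cell2mat(textscan(fid,repmat('%d',1,Ncol),1));
end
fclose(fid);
% Transpose matrix
a_trans = a.';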
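If the number of columns is known, the row-by-row loop can also be replaced by a single textscan call; with 'CollectOutput' set to true, all the numeric columns come back as one matrix. A minimal sketch, assuming a tab-delimited file with no header; the file name matrix_sample.txt is a placeholder, and the block writes the 4-by-6 example from the question to it so it runs on its own.

```matlab
% Placeholder file holding the 4-by-6 example from the question
fname = 'matrix_sample.txt';
sample = [1 2 2 1 1 2; 0 0 0 1 2 0; 1 2 2 1 1 2; 0 0 0 1 2 0];
fid = fopen(fname, 'wt');
fprintf(fid, [repmat('%d\t', 1, 5) '%d\n'], sample.');   % tab-delimited rows
fclose(fid);

Ncol = 6;   % use 500 for the real file
fid = fopen(fname, 'rt');
% One %f per column; CollectOutput merges the columns into a single matrix
C = textscan(fid, repmat('%f', 1, Ncol), 'Delimiter', '\t', 'CollectOutput', true);
fclose(fid);

a = C{1};        % Nrow-by-Ncol double matrix
a_trans = a.';   % transpose
```

Reading everything in one call avoids both the loop and the need to know the number of rows in advance.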
You could simply do:
yourVariable = importdata('yourFile.txt')';
%Loads data from file, transposes it and stores it into 'yourVariable'.