Creating plots based on single column variable - matlab

I'm fairly new to the Matlab community and need help with a particular plotting task! Any assistance would be greatly appreciated.
I've been tasked with creating an automated process that produces numerous 2D line graphs, using X (Elevation) and Y (Chainage) data from surveys we have carried out in the field. This XY data needs to be split into separate figures according to the value in a 'Profile_ID' column. An example of the data is shown below:
Easting      Northing     Elevation  Chainage  FC  Name  Profile_ID
219578.603   101400.293   6.675      133.393   CE  N/A   7b01346
219577.925   101400.621   6.088      134.146   X   N/A   7b01346
219577.833   101400.709   6.037      134.267   X   N/A   7b01346
219577.378   101400.789   5.904      134.714   X   N/A   7b01346
219577.319   101400.987   5.887      134.850   X   N/A   7b01346
The Profile_ID changes throughout the .txt file. The file is ordered by Profile_ID and then by chainage.
However, I also need to overlay previous survey data on the graph for the corresponding 'Profile_ID'. So, essentially, I have 2 data sets with an identical column layout, just with differing X and Y data. One is from a previous survey and one is from the newest survey. I was hoping to find a way to run a for loop that creates a figure for every 'Profile_ID' and then also overlays the previous survey's data for the same 'Profile_ID'.
I hope that this all makes sense. I've linked an example here: Example of desired graph produced by the script, for one iteration of 'Profile_ID'
Cheers!
clc
clear
close all
%Import inputs
point_file_old = readtable('7b7B3-2_20170627tp.csv'); %Input older file name here
matrix_profile_old = table2array(point_file_old(:,3:4)); %Extracting elevation & chainage column
id_old = point_file_old(:,7);
L_old = length(matrix_profile_old(:,1));
point_file_new = readtable('20220430_7b7B3-2tp.csv'); %Input newer file name here
matrix_profile_new = table2array(point_file_new(:,3:4)); %Extracting elevation & chainage column
id_new = point_file_new(:,7);
L_new = length(matrix_profile_new(:,1));
%Settings
chainage_old = matrix_profile_old(:,2); %Identifying old chainage column
elevation_old = matrix_profile_old(:,1); %Identifying old elevation column
chain_old_num = length(chainage_old); %Amount of rows in chainage
elev_old_num = length(elevation_old); %Amount of rows in elevation
chainage_new = matrix_profile_new(:,2); %Identifying new chainage column
elevation_new = matrix_profile_new(:,1); %Identifying new elevation column
chain_new_num = length(chainage_new); %Amount of rows in chainage
elev_new_num = length(elevation_new); %Amount of rows in elevation
t_old = table(chainage_old(:,1), elevation_old(:,1), table2array(id_old(:,1)));
t_new = table(chainage_new(:,1), elevation_new(:,1), table2array(id_new(:,1)));
G = findgroups(t_new(:,3));
temp = splitapply(@(varargin) {sortrows(table(varargin{:}),3)}, t_new, G); %order separated groups in terms of chainage
So I currently have the data sorted into groups and now need to plot each individual group and then finally overlay the previous data to corresponding group data.

One approach is to combine all of the data (old and new) into a single table, then find the unique IDs and loop over them. For each ID you can identify the relevant rows and plot them.
Note that the table2array usage in your current code makes your life harder. Tables are nice because you can index columns using the column names directly, so this:
table2array(point_file_new(:,3));
becomes this:
point_file_new.Elevation;
Commented code below:
data_old = readtable('7b7B3-2_20170627tp.csv'); %Input older file name here
data_old.Source(:) = {'Old'}; % Add this column so we can track the source later
data_new = readtable('20220430_7b7B3-2tp.csv'); %Input newer file name here
data_new.Source(:) = {'New'}; % Add this column so we can track the source later
data_all = [data_old; data_new]; % combine data into single table
IDs = unique( data_all.Profile_ID ); % Get unique Profile_ID values
NID = numel(IDs); % Number of unique IDs
for ii = 1:NID
    ID = IDs{ii}; % Current ID
    idxID = ismember( data_all.Profile_ID, ID ); % Rows with this ID
    idxOld = strcmp(data_all.Source, 'Old'); % Rows from old data
    idxOld = idxOld & idxID; % Rows from old data and this ID
    idxNew = strcmp(data_all.Source, 'New'); % Rows from new data
    idxNew = idxNew & idxID; % Rows from new data and this ID
    figure(); % Make a new figure for this ID
    hold on; % hold so we can plot multiple lines
    % Chainage goes on x and elevation on y, so the data match the axis labels below
    plot( data_all.Chainage(idxOld), data_all.Elevation(idxOld), 'displayname', 'Old data' ); % plot old
    plot( data_all.Chainage(idxNew), data_all.Elevation(idxNew), 'displayname', 'New data' ); % plot new
    % Add labels/title
    xlabel( 'Chainage (m)' );
    ylabel( 'Elevation (m)' );
    title( ID );
    grid on;
    hold off; % done plotting
    legend('show','location','best');
end
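For trying the loop without the CSV files, here is a minimal sketch on made-up data. It keeps the two surveys in separate tables rather than concatenating them, which is just a variation on the approach above, not the answer's exact method. The table contents and the counter nOverlaid are hypothetical and only there for illustration.

```matlab
% Synthetic stand-ins for the two survey files (all values are made up)
ids = {'7b01346'; '7b01346'; '7b01346'; '7b01347'; '7b01347'};
data_old = table([0; 1; 2; 0; 1], [5.0; 4.5; 4.0; 6.0; 5.5], ids, ...
    'VariableNames', {'Chainage', 'Elevation', 'Profile_ID'});
data_new = table([0; 1; 2; 0; 1], [5.2; 4.4; 3.9; 6.1; 5.2], ids, ...
    'VariableNames', {'Chainage', 'Elevation', 'Profile_ID'});

IDs = unique(data_new.Profile_ID);      % profiles present in the new survey
nOverlaid = zeros(numel(IDs), 1);       % old rows found for each profile
for ii = 1:numel(IDs)
    isOld = strcmp(data_old.Profile_ID, IDs{ii});   % old rows for this profile
    isNew = strcmp(data_new.Profile_ID, IDs{ii});   % new rows for this profile
    nOverlaid(ii) = nnz(isOld);

    figure();                           % one figure per profile
    hold on;
    plot(data_old.Chainage(isOld), data_old.Elevation(isOld), 'DisplayName', 'Old data');
    plot(data_new.Chainage(isNew), data_new.Elevation(isNew), 'DisplayName', 'New data');
    hold off;
    xlabel('Chainage (m)');
    ylabel('Elevation (m)');
    title(IDs{ii});
    legend('show', 'Location', 'best');
end
```

A profile that only exists in the new survey simply produces an empty "Old data" line, so the loop does not need a special case for it.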

Related

How can I color-label the cluster data after GMM is fitted?

I am trying to do some labelling on cluster data following GMMs but haven't found a way to do it.
Let me explain:
I have some x,y data pairs stored in a 30000x2 array X. In reality the array contains data from different (known) sources, and each source contributes the same number of points (so source 1 has 500 (x,y) pairs, source 2 has 500 (x,y) pairs, and so on, all appended into the X array above).
I have fitted a GMM on X. The cluster results are fine and as expected, but now that the data are clustered I want to be able to color code them based on their original source.
So let's say I want to show in black the data points of source 1 that are in cluster 2.
Is that possible?
Example:
In the original array we have three sources for the data. Source 1 is data from 1-10000, source 2 10001-20000 and source 3 20001-30000.
After GMM fitting and clustering I have clustered my data as per figure 1 and I got two clusters. The red colour in all of them is irrelevant.
I want to modify the color of the data points in cluster 2 based on their index and the original array X.
E.g., if a data point belongs to cluster 2 (clusteridx=2), then I want to check which source it belongs to and then color and label it accordingly, so that you can tell which source each data point in cluster 2 comes from, as shown in the second figure.
[Figure 1: Original clusters]
[Figure 2: Desired labelling]
You could add a "source_id" column and then plot through a loop on that. For example:
% setup fake data
source1 = rand(10,2);
source2 = rand(15,2);
source3 = rand(8,2);
% end setup
% append column with source_id (you could do this in a loop if you have many sources)
source1 = [source1, repmat(1, size(source1, 1), 1)]; % size(...,1) counts rows even for very short sources
source2 = [source2, repmat(2, size(source2, 1), 1)];
source3 = [source3, repmat(3, size(source3, 1), 1)];
mytable = array2table([source1; source2; source3]);
mytable.Properties.VariableNames = {'X' 'Y' 'source_id'};
figure
hold on;
for ii = 1:max(mytable.source_id)
    rows = mytable.source_id==ii;
    x = mytable.X(rows);
    y = mytable.Y(rows);
    label = char(strcat('Source ID =', {' '}, num2str(ii)));
    mycolor = rand(1,3);
    scatter(x,y, 'MarkerEdgeColor', mycolor, 'MarkerFaceColor', mycolor, 'DisplayName', label);
end
set(legend, 'Location', 'best')
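To restrict the coloring to one cluster, the source mask can be combined with a cluster mask. The sketch below assumes the sources are stacked in order with equal sizes, as the question describes. The clusterIdx vector is a hypothetical stand-in for the labels you would get from the fitted GMM, and the sizes are shrunk so the result is easy to check.

```matlab
% Synthetic stand-in: 3 sources with 4 points each, stacked in order
nPerSource = 4;
nSources   = 3;
X = rand(nPerSource * nSources, 2);

% Source of each row, derived from its position in X (1,1,1,1,2,2,...)
sourceIdx = ceil((1:size(X, 1))' / nPerSource);

% Hypothetical cluster labels; in practice these come from the fitted GMM
clusterIdx = [1 2 2 1 2 1 1 2 2 2 1 1]';

figure;
hold on;
inCluster1 = clusterIdx == 1;           % cluster 1 stays a single color
scatter(X(inCluster1, 1), X(inCluster1, 2), ...
    'MarkerEdgeColor', 'r', 'DisplayName', 'Cluster 1');
colors = lines(nSources);               % one distinct color per source
for s = 1:nSources
    rows = clusterIdx == 2 & sourceIdx == s;    % in cluster 2 AND from source s
    scatter(X(rows, 1), X(rows, 2), ...
        'MarkerEdgeColor', colors(s, :), 'MarkerFaceColor', colors(s, :), ...
        'DisplayName', sprintf('Cluster 2, source %d', s));
end
hold off;
legend('show', 'Location', 'best');
```

Because sourceIdx is computed from the row position, no extra column has to be stored alongside X, but this only works while the rows keep their original order.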

How to store .csv data and calculate average value in MATLAB

Can someone help me understand how I can read a group of .csv files into MATLAB, select only the columns I am interested in, and get as output a final file containing the average value and the standard deviation of the y columns? I am not very experienced with MATLAB, so I would appreciate any help with this.
Here what I tried to do till now:
clear all;
clc;
which_column = 5;
dirstats = dir('*.csv');
col3Complete=0;
col4Complete=0;
for K = 1:length(dirstats)
    [num,txt,raw] = xlsread(dirstats(K).name);
    col3=num(:,3);
    col4=num(:,4);
    col3Complete=[col3Complete;col3];
    col4Complete=[col4Complete;col4];
    avgVal(K)=mean(col4(:));
end
col3Complete(1)=[];
col4Complete(1)=[];
%columnavg = mean(col4Complete);
%columnstd = std(col4Complete);
% xvals = 1 : size(columnavg,1);
% plot(xvals, columnavg, 'b-', xvals, columnavg-columnstd, 'r--', xvals, columnavg+columstd, 'r--');
B = reshape(col4Complete,[5000,K]);
m=mean(B,2);
C = reshape (col4Complete,[5000,K]);
S=std(C,0,2);
Now I know that I should compute the mean and standard deviation inside the for loop, using the mean() function, but I am not sure how to use it.
which_column = 5;
dirstats = dir('*.csv');
col3Complete=[]; % Initialise as empty matrix
col4Complete=[];
avgVal = zeros(length(dirstats),2); % initialise as a two-column matrix, one row per file
for K = 1:length(dirstats)
    [num,txt,raw] = xlsread(dirstats(K).name);
    col3=num(:,3);
    col4=num(:,4);
    col3Complete=[col3Complete;col3];
    col4Complete=[col4Complete;col4];
    avgVal(K,1)=mean(col4(:)); % 1st column contains mean
    avgVal(K,2)=std(col4(:)); % 2nd column contains standard deviation
end
%columnavg = mean(col4Complete);
%columnstd = std(col4Complete);
% xvals = 1 : size(columnavg,1);
% plot(xvals, columnavg, 'b-', xvals, columnavg-columnstd, 'r--', xvals, columnavg+columstd, 'r--');
B = reshape(col4Complete,[5000,K]);
meanVals=mean(B,2);
I didn't change much: I initialised your arrays as empty arrays so you do not have to delete the first entry later on, and made avgVal a two-column matrix with the mean in column 1 and the standard deviation in column 2. You can of course add two more columns if you want to collect those statistics for the 3rd column in the csv as well.
As a side note: xlsread is rather heavy for reading files, since Excel is horribly inefficient. If you want to read a structured text file such as a csv, it's faster to use a plain-text reader such as importdata or csvread.
Create some random matrix to store in a comma-separated file with a header:
A = rand(1e3,5);
out = fopen('output.csv','w');
fprintf(out,'ColumnA,ColumnB,ColumnC,ColumnD,ColumnE\n');
fclose(out);
dlmwrite('output.csv', A, 'delimiter',',','-append');
Load it using csvread, skipping the header row:
data = csvread('output.csv',1,0);
data now contains your five columns, without any headers.
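The per-row mean and standard deviation across files, which the question computes with reshape and a hard-coded 5000, can also be collected directly into a matrix with one column per file. This is a minimal sketch on made-up data. It assumes every file has the same number of rows, and the variable names allCol4, rowMean and rowStd are hypothetical.

```matlab
% Synthetic stand-in for column 4 of three files, each with 5 rows
nFiles = 3;
nRows  = 5;
allCol4 = zeros(nRows, nFiles);         % one column per file
for K = 1:nFiles
    col4 = (1:nRows)' * K;              % made-up data: file K holds K, 2K, ..., 5K
    allCol4(:, K) = col4;               % in practice: num(:,4) read from file K
end

rowMean = mean(allCol4, 2);             % mean across files, one value per row
rowStd  = std(allCol4, 0, 2);           % sample standard deviation across files
```

Here row i holds the values i, 2i and 3i, so rowMean is 2i and rowStd is i, which makes the result easy to verify by hand.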

Matlab: Removing struct values via an index array

I have a large struct (total) that I need to separate into 3 structs whose values are randomly selected from the original struct. I need a struct with 60% (trainingData), and two structs that are 20% each (testData & crossValData), but none of the values can overlap.
indexRand1 = zeros((size(total,1)*.2),1); % index of test data: 20%
indexRand2 = zeros((size(total,1)*.2),1); % index of cross validation data: 20%
for index_i = 1:size(indexRand1,1)
    temp1 = randi([1 size(total,1)],1,1); % get a random value from total
    temp2 = randi([1 size(total,1)],1,1); % get another random value from total
    while ismember(temp1,indexRand1) % make sure 1st value is not already in test data
        temp1 = randi([1 size(total,1)],1,1);
    end
    indexRand1(index_i,1) = temp1; % add 1st value to test data
    while ismember(temp2,indexRand1) % make sure 2nd value is not already in test data
        temp2 = randi([1 size(total,1)],1,1);
        while ismember(temp2,indexRand2) % or cross validation data
            temp2 = randi([1 size(total,1)],1,1);
        end
    end
    indexRand2(index_i,1) = temp2; % add 2nd value to cross validation data
end
indexRand3 =[indexRand1;indexRand2]; % index of test and cross validation data
testData = total(indexRand1,:); % use index to get test data
crossValData = total(indexRand2,:); % use index to get cross validation data
total(indexRand3,1) = []; % remove test and cross validation data
trainingData = total; % save training data to new name
My problem comes at 'total(indexRand3,1) = []; % remove test and cross validation data' the error I get is 'A null assignment can have only one non-colon index.' How do I remove values from a struct using an index? (or how do you separate a struct randomly into 3 unequal structs?)
I think you are making this problem much harder than it needs to be. A more Matlab-y solution seems to be:
%Make some test data
% (Well, I guess you already have data. I need data to test with)
total(1000).sampleData = 1; % creates a 1x1000 struct array; the semicolon suppresses the printout
%Determine the number of elements in each derived set
nTraining = round(numel(total)*0.6);
nTest = round(numel(total)*0.2);
nCrossVal = numel(total) - nTest - nTraining;
%Create a random order vector
ixsRandomOrder = randperm(numel(total));
%Use the random order vector to create distinct, derived sets
testData = total(ixsRandomOrder( (1:nTest) ));
trainingData = total(ixsRandomOrder(nTest + (1:nTraining) ));
crossValData = total(ixsRandomOrder(nTest + nTraining + (1:nCrossVal) ));
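On the error itself: the message quoted in the question says that a deletion may use only one non-colon subscript. A minimal sketch of deleting whole elements by index follows, using a colon for the dimension that is kept whole. The struct contents and idxRemove are made up for illustration.

```matlab
% Made-up 10x1 struct array standing in for 'total'
total = struct('sampleData', num2cell((1:10)'));

idxRemove = [2; 5; 7];                  % hypothetical indices to take out

removed = total(idxRemove);             % keep a copy of the selected elements
remaining = total;
remaining(idxRemove, :) = [];           % colon in the second dimension removes whole rows
```

With the randperm approach above no deletion is needed at all, since each derived set is picked by its own slice of the shuffled index vector.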

MATLAB: Dividing a year-length varying-resolution time vector into months

I have a time series in the following format:
time data value
733408.33 x1
733409.21 x2
733409.56 x3
etc..
The data runs from approximately 01-Jan-2008 to 31-Dec-2010.
I want to separate the data into columns of monthly length.
For example, the first column (January 2008) will consist of the corresponding data values:
(first 01-Jan-2008 data value):(data value immediately preceding the first 01-Feb-2008 value)
Then the second column (February 2008):
(first 01-Feb-2008 data value):(data value immediately preceding the first 01-Mar-2008 value)
et cetera...
Some ideas I've been thinking of but don't know how to put together:
1. Convert all serial time numbers (e.g. 733408.33) to character strings with datestr
2. Use strmatch('01-January-2008',DatesInChars) to find the indices of the rows corresponding to 01-January-2008
3. Tricky part (?): TransformedData(:,i) = OriginalData(start:end) ? end = strmatch(1) - 1 and start = 1. Then change start at the end of the loop to strmatch(1) and then run step 2 again to find the next "starting index" and change end to the "new" strmatch(1)-1 ?
Having it speed optimized would be nice; I am going to apply it on data sampled ~2 million times.
Thanks!
I would use histc with a list of month boundaries as the second parameter (note: call histc with two output arguments, so that you also get the bin index of every sample).
The edge list can easily be created with datenum or datevec.
This way you avoid any string operations, so it should be fast.
EDIT:
Example with result in a simple data structure (including some code from @Rody):
% Generate some test times/data
tstart = datenum('01-Jan-2008');
tend = datenum('31-Dec-2010');
tspan = tstart : tend;
tspan = tspan(:) + randn(size(tspan(:))); % add some noise so it's non-uniform
data = randn(size(tspan));
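% Generate list of edges: the first day of every month
edge = [];
for y = 2008:2010
    for m = 1:12
        edge = [edge datenum(y, m, 1)];
    end
end
% Close the last bin: histc's final bin only counts exact matches of the
% last edge, so without this edge the December 2010 samples would be lost
edge = [edge datenum(2011, 1, 1)];
% Histogram
[number, bin] = histc(tspan, edge);
% Setup of result: one cell per month
result = {};
for n = 1:length(edge)-1
    result{n} = [tspan(bin == n), data(bin == n)];
end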
% Test
% 04-Aug-2008 17:25:20
datestr(result{8}(4,1))
tspan(data == result{8}(4,2))
datestr(tspan(data == result{8}(4,2)))
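In newer MATLAB releases histc is marked as not recommended, and discretize returns the bin index directly. Here is a minimal sketch of the same binning idea, assuming a release that ships discretize (R2015a or later). The sample times are made up.

```matlab
% Month boundaries for January to March 2008, plus the closing edge
edges = [datenum(2008, 1, 1), datenum(2008, 2, 1), ...
         datenum(2008, 3, 1), datenum(2008, 4, 1)];

% Made-up sample times: mid-January, exactly 01-Feb, and the end of March
t = [datenum(2008, 1, 15); datenum(2008, 2, 1); datenum(2008, 3, 31)];

bin = discretize(t, edges);             % bin(i) = month index of sample i
inFebruary = t(bin == 2);               % samples that fall in the second bin
```

Samples outside the edge range come back as NaN rather than 0, so a check such as bin == n skips them without extra handling.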
Assuming you have sorted, non-equally-spaced date numbers, the way to go here is to put the relevant data in a cell array, so that each entry corresponds to the next month, and can hold a different amount of elements.
Here's how to do that quite efficiently:
% generate some test times/data
tstart = datenum('01-Jan-2008');
tend = datenum('31-Dec-2010');
tspan = tstart : tend;
tspan = sort(tspan(:) + randn(size(tspan(:)))); % add some noise so it's non-uniform, then sort again
data = randn(size(tspan));
% find month numbers
[~,M] = datevec(tspan);
% find indices where the month changes, plus one past the end to close the last month
inds = [find(diff([0; M])); numel(M)+1];
% extract data in columns
sz = numel(inds)-1;
cols = cell(sz,1);
for ii = 1:sz
    cols{ii} = data( inds(ii) : inds(ii+1)-1 );
end
Note that it can be difficult to determine which entry in cols belongs to which month and year, so here's how to do it in a more human-readable way:
% change this line:
[y,M] = datevec(tspan);
% and change these lines:
cols = cell(sz,3);
for ii = 1:sz
    cols{ii,1} = data( inds(ii) : inds(ii+1)-1 );
    % also store the year and month
    cols{ii,2} = y(inds(ii));
    cols{ii,3} = M(inds(ii));
end
I'll assume you have timeVals, an Nx1 double vector holding the time value of each datum, and that data is also an Nx1 array. I also assume data and timeVals are sorted according to time: that is, the samples you have are ordered according to the time they were taken.
How about:
subs = @(x,i) x(:,i);
months = subs( datevec(timeVals), 2 ); % extract the month of year as a number from the time
r = find( months ~= [months(2:end); months(end)+1] ); % last index of each month (months is a column)
monthOfCell = months( r );
r( 2:end ) = r( 2:end ) - r( 1:end-1 ); % convert end indices into the number of samples per month
dataByMonth = mat2cell( data, r ); % data is Nx1, so split it along rows
timeByMonth = mat2cell( timeVals, r );
After running this code, you have a cell array dataByMonth in which each cell contains all data relevant to a specific month. The corresponding cell of timeByMonth holds the sampling times of the data of the respective month. Finally, monthOfCell tells you the month number (1-12) of each cell.
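Another way to group by month without a loop is to build one key per (year, month) pair and let accumarray collect the samples. This is a minimal sketch on made-up, time-sorted data. The names key, group and repRow are hypothetical.

```matlab
% Made-up, sorted sample times (datenum) and data spanning three months
timeVals = [datenum(2008, 1, 5); datenum(2008, 1, 20); datenum(2008, 2, 3); ...
            datenum(2008, 3, 1); datenum(2008, 3, 15, 12, 0, 0)];
data     = [10; 20; 30; 40; 50];

dv  = datevec(timeVals);                % columns: year, month, day, hour, min, sec
key = dv(:, 1) * 12 + dv(:, 2);         % one distinct number per (year, month) pair
[~, repRow, group] = unique(key);       % group(i) = index of the month of sample i

% Collect the samples of each month into one cell
dataByMonth = accumarray(group, data, [], @(v) {v});
yearOfCell  = dv(repRow, 1);            % year belonging to each cell
monthOfCell = dv(repRow, 2);            % month (1-12) belonging to each cell
```

Using year*12 + month as the key keeps January 2008 and January 2009 apart, which a plain month number would not. The order of samples inside each cell is only reliable because the input is already sorted by time.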

Plotting multiple lines within a FOR loop in MATLAB

Okay so this sounds easy but no matter how many times I have tried I still cannot get it to plot correctly. I need only 3 lines on the same graph however still have an issue with it.
iO = 2.0e-6;
k = 1.38e-23;
q = 1.602e-19;
for temp_f = [75 100 125]
T = ((5/9)*temp_f-32)+273.15;
vd = -1.0:0.01:0.6;
Id = iO*(exp((q*vd)/(k*T))-1);
plot(vd,Id,'r',vd,Id,'y',vd,Id,'g');
legend('amps at 75 F', 'amps at 100 F','amps at 125 F');
end;
ylabel('Amps');
xlabel('Volts');
title('Current through diode');
Now I know the plot function that is currently in there isn't working and that some kind of variable needs to be set up, like (vd,Id1,'r',vd,Id2,'y',vd,Id3,'g'); however, I really can't grasp the concept of changing it and am seeking help.
You can use the hold on command so that each plot command draws into the same axes as the last.
It would be better to skip the for loop and just do this all in one step though.
iO = 2.0e-6;
k = 1.38e-23;
q = 1.602e-19;
temp_f = [75 100 125];
T = (5/9)*(temp_f-32)+273.15; % Fahrenheit to kelvin (note the parentheses around temp_f-32)
vd = -1.0:0.01:0.6;
% Convert this 1xlength(vd) vector to a 3xlength(vd) matrix by copying it down two rows.
vd = repmat(vd,3,1);
% Convert this 1x3 array to a 3x1 array.
T=T';
% and then copy it across size(vd,2) columns so each row is all the same value from the original T
T=repmat(T,1,size(vd,2));
%Now we can calculate Id all at once.
Id = iO*(exp((q*vd)./(k*T))-1);
%Then plot each row of the Id matrix as a separate line. Id(1,:) means 1st row, all columns.
plot(vd(1,:),Id(1,:),'r',vd(2,:),Id(2,:),'y',vd(3,:),Id(3,:),'g');
legend('amps at 75 F','amps at 100 F','amps at 125 F');
ylabel('Amps');
xlabel('Volts');
title('Current through diode');
And that should get what you want.
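If you prefer to keep the for loop, a minimal sketch of the hold on variant looks like this. It stores the converted temperatures in T so they can be inspected afterwards. The color list and the DisplayName labels are illustrative choices, and the Fahrenheit conversion is written as (5/9)*(F-32)+273.15.

```matlab
iO = 2.0e-6;
k  = 1.38e-23;
q  = 1.602e-19;
vd = -1.0:0.01:0.6;                     % diode voltage (V)

temps_f = [75 100 125];                 % temperatures in Fahrenheit
colors  = {'r', 'y', 'g'};              % one line color per temperature
T = zeros(size(temps_f));               % converted temperatures in kelvin

figure;
hold on;                                % keep earlier lines when adding new ones
for n = 1:numel(temps_f)
    T(n) = (5/9) * (temps_f(n) - 32) + 273.15;     % Fahrenheit to kelvin
    Id = iO * (exp((q * vd) / (k * T(n))) - 1);    % current at this temperature
    plot(vd, Id, colors{n}, 'DisplayName', sprintf('amps at %d F', temps_f(n)));
end
hold off;
legend('show', 'Location', 'best');
ylabel('Amps');
xlabel('Volts');
title('Current through diode');
```

Each pass through the loop adds exactly one line, and the legend entries come from DisplayName, so the legend stays in step with the curves if a temperature is added or removed.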