I am trying to create Pareto charts from a dataset stored in Excel. The dataset has three columns: "Comment", "part", and "number". The values in "Comment" and "number" repeat because they are general, while the part is independent, so I need to group the data by part.
I've been able to create two Pareto charts. By getting the unique part numbers and counting the occurrences of unique comments, I have been able to plot the number of comments (y-axis) against the part (x-axis). The part I'm struggling with is plotting the number of comments (y-axis) against the number (x-axis) for a specified part.
Data = readtable('Example_Dataset.xlsx')
Data = Data{:,:}
part = Data(:,2) %Gets part
number = Data(:,3) %Gets number
comments = Data(:,1) %gets comment
Unique_Part= unique(part,'stable')
b = cellfun(@(x) sum(ismember(part,x)),Unique_Part,'un',0)
Unique_number = unique(number,'stable')
c = cellfun(@(x) sum(ismember(number,x)),Unique_number,'un',0)
Unique_comments = unique(comments,'stable')
comment_type =cell2mat(Unique_comments)
comments_parts = cell2mat(b)
comments_number = cell2mat(c)
figure
pareto(comments_parts,Unique_Part)
figure
pareto(comments_number,Unique_number)
A simplified dataset is shown here. Note that the groups are not equal in size: some values occur only once, others repeat numerous times. Sometimes the part is not numeric.
https://imgur.com/a/V3MxeTD
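One possible way to get the counts per number for a single part is to filter the rows first and then count within the filtered subset. This is only a sketch: the sample values, the `target` part name, and the assumption that `part` is a cell array of strings while `number` is numeric are all made up for illustration and may not match the real spreadsheet.

```matlab
% Made-up sample data (assumption): part as strings, number as numeric values
part   = {'A1'; 'A1'; 'B2'; 'A1'; 'B2'; 'A1'};
number = [10; 20; 10; 10; 30; 20];

% Hypothetical part to inspect
target = 'A1';

% Logical mask of the rows that belong to the chosen part
sel = strcmp(part, target);

% Unique numbers within that part, and a group index for every selected row
[uNum, ~, grp] = unique(number(sel), 'stable');

% Count how many selected rows fall into each group
counts = accumarray(grp, 1);

% pareto expects text labels, so convert the numbers to strings
labels = arrayfun(@num2str, uNum, 'UniformOutput', false);

figure
pareto(counts, labels)
```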
Problem
I have a data set describing geological structures. Each structure has a row with two attributes: its length and its orientation (0-360 degrees).
Within this data set, there are two types of structure.
Type 1: fewer data points, but the structures are physically larger (large length, and so more significant).
Type 2: more data points, but the structures are physically smaller (small length, and so less significant).
I want to create a rose plot to show the spread of the structures' orientations. However, I want this plot to also represent the significance of the structures in combination with the direction they face - taking into account the lengths.
Is it possible to scale the plot by length in MATLAB so that the less numerous subset is not underrepresented when its structures are large?
Example
A data set might contain:
10 structures orientated North-South, 50km long.
100 structures orientated East-West, 0.5km long.
In this situation the East-West population would look more significant than the North-South population based on absolute numbers. In reality, however, the members of the East-West population are much shorter, so those structures are less significant.
Code
This is the code I have so far:
load('WG_rose_data.xy')
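azimuth = WG_rose_data(:,2);   % orientation in degrees
len = WG_rose_data(:,1);       % named len to avoid shadowing the built-in length
rose(deg2rad(azimuth),20);     % rose expects angles in radians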
Where WG_rose_data.xy is a data file with 2 columns containing the length and azimuth (orientation) data for the geological structures.
For each row in your data, you could duplicate it a given number of times, according to its length value. Therefore, if you had a structure with length 50, it counts for 50 data points, whereas a structure with length 1 only counts as 1 data point. Of course you have to round your lengths since you can only have integer numbers of rows.
This could be achieved like so, with your example data in the matrix d:
% Set up example data: 10 large vertical structures, 100 small ones perpendicular
% (column 1 = orientation in degrees, column 2 = length)
d = [repmat([0, 50], 10, 1); repmat([90, .5], 100, 1)];
% For each row, duplicate the data in column 1, according to the length in column 2
d1 = [];
for ii = 1:size(d,1)
    % make ceil(d(ii,2)) copies of the orientation d(ii,1)
    d1(end+1:end+ceil(d(ii,2))) = d(ii,1);
end
Output rose plot: [figure not shown]
You could fine tune how to duplicate the data to achieve the desired balance of actual data and length weighting.
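The loop above can also be written without explicit iteration. This is a minimal sketch, assuming `repelem` is available (R2015a or later); the `scale` factor is a hypothetical knob for the fine tuning just mentioned and is not part of the original answer.

```matlab
% Example data: column 1 = azimuth (degrees), column 2 = length
d = [repmat([0, 50], 10, 1); repmat([90, 0.5], 100, 1)];

% Hypothetical tuning factor: copies per unit of length (assumption)
scale = 1;

% Number of copies per structure, rounded up so short structures still count once
nCopies = ceil(scale * d(:,2));

% Repeat each azimuth according to its copy count
d1 = repelem(d(:,1), nCopies);

% rose expects angles in radians
rose(deg2rad(d1), 20);
```

Raising or lowering `scale` changes how strongly length dominates over the raw number of structures.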
Thanks for all the help with this. This code is my final working version for reference:
clear all
close all
% Input dataset
original_data = load('WG_rose_data.xy');
d = [];
% Reformat azimuth
d(:,1) = original_data(:,2);
% Reformat length
d(:,2) = original_data(:,1);
% For each row, duplicate the data in column 1, according to the length in column 2
d1 = [];
for a = 1:size(d,1)
    d1(end+1:end+ceil(d(a,2))) = d(a,1);
end
% Create opposite directions for rose diagram
length_d1_azi = length(d1);
d1_op_azi = zeros(1,length_d1_azi);
for i = 1:length_d1_azi
    d1_op_azi(i) = d1(i) - 180;
    if d1_op_azi(i) < 1
        d1_op_azi(i) = 360 - (d1_op_azi(i)*-1);
    end
end
% Join calculated opposites to original input
% (named all_azi rather than all, to avoid shadowing the built-in all)
new_length = length_d1_azi*2;
all_azi = zeros(new_length,1);
for i = 1:length_d1_azi
    all_azi(i) = d1(i);
end
for j = length_d1_azi+1:new_length
    all_azi(j) = d1_op_azi(j-length_d1_azi);
end
% Convert input array into radians to plot
d1_rad = deg2rad(all_azi);
rose(d1_rad,24)
% Rotate the axes so the plot reads like a compass
set(gca,'View',[-90 90],'YDir','reverse');
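The loops that build the opposite directions can be condensed with `mod`. This is a sketch of the same idea rather than a drop-in replacement: it maps an azimuth of 180 to 0, where the loop version produces 360, which is the same direction on the plot. The sample azimuths are made up for illustration.

```matlab
% Made-up azimuths in degrees (assumption, for illustration only)
az = [0; 45; 180; 270];

% Opposite direction of each azimuth, wrapped into the range [0, 360)
az_op = mod(az + 180, 360);

% Original and mirrored directions together, converted to radians for rose
az_all = [az; az_op];
rose(deg2rad(az_all), 24)
set(gca, 'View', [-90 90], 'YDir', 'reverse');
```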
I've got an ASCII file and I'm trying to import it into MATLAB in order to make some plots. Is there any way of importing the data even though it uses , (comma) rather than . (dot) as the decimal separator?
00:00:00,000;-2,14;
00:00:00,001;-1,80;
The first column I want to create is the time, corresponding to 00:00:00,001; 00:00:00,002; etc.
The second column should be the amplitude of the sample, i.e. -2,14; -1,80 etc.
Yup. First use importdata so that you can read each row of your text file as a cell in a cell array. Afterwards, so that the times can be processed in MATLAB, you'll need to replace each , character with a .. This will allow you to use MATLAB's commands for date and time processing. Regular expressions, which find patterns in strings, help with both this step and extracting the data you need. Use regexprep to replace all , characters with a ..
For the purposes of this answer, the example data that I'm going to be using is:
00:00:00,000;-2,14;
00:00:00,001;-1,80;
00:00:00,002;-0,80;
00:00:00,003;2,40;
00:00:00,004;3,78;
Therefore, assuming that your data is stored in a text file called data.txt, do:
%// Load in each row as a cell array
A = importdata('data.txt');
%// Each row has , replaced with .
Arep = regexprep(A, ',', '\.');
Now, what we can do is split up all of the quantities by using ; as the delimiter. We can use regexp to help us split up the quantities. We can further decompose the data by:
Arep_decomp = regexp(Arep, '[^;]+', 'match');
The first parameter is the cell array that contains each of the rows in the text file (with the commas converted to periods). The second parameter is a pattern that specifies what you're looking for in each string in the cell array. [^;]+ means that you want to find every run of one or more characters that are not a semicolon; each match stops as soon as a semicolon is reached. 'match' means that you want to retrieve the matched strings themselves, which are returned in cell arrays.
The result after the above line's execution gives:
Arep_decomp{1}{1} =
00:00:00.000
Arep_decomp{1}{2} =
-2.14
Arep_decomp{2}{1} =
00:00:00.001
Arep_decomp{2}{2} =
-1.80
Arep_decomp{3}{1} =
00:00:00.002
Arep_decomp{3}{2} =
-0.80
Arep_decomp{4}{1} =
00:00:00.003
Arep_decomp{4}{2} =
2.40
Arep_decomp{5}{1} =
00:00:00.004
Arep_decomp{5}{2} =
3.78
You can see that the output cell array, Arep_decomp is a 5 element cell array, where each cell is a nested 2 element cell array, where the first element is the time, and the second element is the magnitude. Note that these are all strings.
What you can do now is convert these quantities into two numeric arrays. Specifically, the time format that you have has the form:
HH:MM:SS.FFF
H is for hours, M is for minutes, S is for seconds and F is for milliseconds. Use datenum to convert these time representations into date numbers so that you can plot them on a graph; the original time strings can then be shown as tick labels by adjusting a few plot properties. We use cellfun to extract the time strings into a separate cell array for labelling the plot later, to convert those time strings into date numbers via datenum, and to convert the magnitude strings into actual numbers.
Therefore:
datestr = cellfun(@(x) x{1}, Arep_decomp, 'uni', 0);
datenums = cellfun(@(x) datenum(x, 'HH:MM:SS.FFF'), datestr);
mags = cellfun(@(x) str2double(x{2}), Arep_decomp);
The first line of code extracts each of the time strings into a single cell array; the 'uni', 0 flag (short for 'UniformOutput', false) is needed for this. Next, we convert each time string into a date number with datenum, and we convert the magnitude strings into numeric values with str2double.
Now, all you have to do is plot the data. That can be done by:
plot(datenums, mags);
set(gca, 'XTick', datenums);
set(gca, 'XTickLabel', datestr);
The above code plots the data with the date numbers on the horizontal axis and the magnitudes on the vertical axis. We probably want the horizontal axis to show the original time strings instead, so we use two calls to set: the first ensures that the only visible ticks are at the date numbers themselves, and the second relabels those ticks with the string representations of the times.
Once we run the above code, we get: [figure not shown]
Because the time step in between times is so small, it may clutter the horizontal axis as the labels are long, yet the interval is short. Therefore, you may consider only displaying times at a certain interval and you can do that by doing something like:
step_size = 5;
plot(datenums, mags);
set(gca, 'XTick', datenums(1:step_size:end));
set(gca, 'XTickLabel', datestr(1:step_size:end));
step_size controls the spacing of the ticks: only every step_size-th point gets a tick and a label. Make sure that step_size is smaller than the total number of points in your data.
For your copying and pasting pleasure, this is what the full code I wrote looks like:
%// Load in each row as a cell array
A = importdata('data.txt');
%// Each row has , replaced with .
Arep = regexprep(A, ',', '\.');
Arep_decomp = regexp(Arep, '[^;]+', 'match');
datestr = cellfun(@(x) x{1}, Arep_decomp, 'uni', 0);
datenums = cellfun(@(x) datenum(x, 'HH:MM:SS.FFF'), datestr);
mags = cellfun(@(x) str2double(x{2}), Arep_decomp);
step_size = 1;
%step_size = 5;
plot(datenums, mags);
set(gca, 'XTick', datenums(1:step_size:end));
set(gca, 'XTickLabel', datestr(1:step_size:end));
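As an alternative sketch of the same parsing steps without regular expressions, `strrep` and `strsplit` can do the replacement and the splitting. The rows are typed in directly here instead of being read from data.txt, and the variable names are illustrative rather than taken from the code above.

```matlab
% Sample rows as they would appear in the file (typed in for illustration)
raw = {'00:00:00,000;-2,14;'; '00:00:00,001;-1,80;'; '00:00:00,002;-0,80;'};

% Replace decimal commas with dots in every row
fixed = strrep(raw, ',', '.');

% Split each row at the semicolons; the trailing semicolon leaves an empty last piece
tokens = cellfun(@(s) strsplit(s, ';'), fixed, 'UniformOutput', false);

% First piece is the time string, second piece is the amplitude
timeStr = cellfun(@(t) t{1}, tokens, 'UniformOutput', false);
mags = cellfun(@(t) str2double(t{2}), tokens);

% Elapsed time in seconds since the first sample (datenum counts in days)
dn = datenum(timeStr, 'HH:MM:SS.FFF');
tSec = (dn - dn(1)) * 86400;

plot(tSec, mags);
xlabel('Time since first sample (s)');
```

Plotting against elapsed seconds avoids the long tick labels, at the cost of losing the original clock times on the axis.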
I have a matrix, X, that I want to plot after clustering it with the kmeans function. What I would like:
If a row has a value of 1 in column 4, its marker should be a square.
If a row has a value of 2 in column 4, its marker should be a +.
But if the row has a value of 0 in column 5 it must be blue, and if the row has a value of 1 in column 5 it must be yellow.
(You don't need to use these exact colors and shapes, I just want to distinguish these.) I tried this and it did not work:
plot(X(idx==2,1),X(idx==2,2),X(:,4)==1,'k.');
Thanks!!
Based on the example on the kmeans documentation page, I propose combining two logical conditions in each plot call:
X = [randn(100,2)+ones(100,2);...
    randn(100,2)-ones(100,2)];
opts = statset('Display','final');
% This gives a random distribution of 0s and 1s in column 5:
X(:,5) = round(rand(size(X,1),1));
[idx,ctrs] = kmeans(X,2,...
    'Distance','city',...
    'Replicates',5,...
    'Options',opts);
hold on
plot(X(idx==1,1),X(idx==1,2),'rs','MarkerSize',12)
plot(X(idx==2,1),X(idx==2,2),'r+','MarkerSize',12)
% after plotting the results of kmeans,
% plot new symbols with a different logic on top:
plot(X(idx==1 & X(:,5)==0,1),X(idx==1 & X(:,5)==0,2),'bs','MarkerSize',12)
plot(X(idx==1 & X(:,5)==1,1),X(idx==1 & X(:,5)==1,2),'gs','MarkerSize',12)
plot(X(idx==2 & X(:,5)==0,1),X(idx==2 & X(:,5)==0,2),'b+','MarkerSize',12)
plot(X(idx==2 & X(:,5)==1,1),X(idx==2 & X(:,5)==1,2),'g+','MarkerSize',12)
The above code is a minimal working example, given that the Statistics Toolbox is available.
The key feature is the combined logical mask used for the plotting. For example:
X(idx==1 & X(:,5)==0,1)
Here idx==1 marks the rows assigned to cluster 1, and X(:,5)==0 marks the rows whose value in column 5 is 0. The & keeps only the rows that satisfy both conditions. Note that the second condition must be evaluated on X(:,5), not on the subset X(idx==1,5), so that the mask has one entry per row of X. Based on the logic in the question, this should be a blue square: bs.
You have four cases, hence there are four additional plot lines.
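The row selection can be checked on a tiny example that needs no toolbox. The matrix and the `idx` vector below are made up to stand in for real data and for the kmeans output; each condition is evaluated over all rows, so the combined mask has one entry per row of X.

```matlab
% Made-up data: columns 1-2 are coordinates, column 5 is a 0/1 flag
X = [1 1 0 0 0;
     2 2 0 0 1;
     3 3 0 0 0;
     4 4 0 0 1];

% Made-up cluster labels standing in for the kmeans output
idx = [1; 2; 2; 1];

% Rows in cluster 2 whose flag in column 5 is 0
mask = (idx == 2) & (X(:,5) == 0);

% Coordinates of the selected rows
selected = X(mask, 1:2);

plot(selected(:,1), selected(:,2), 'b+', 'MarkerSize', 12)
```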
I have a data file from which I read all of the numbers in 2 columns. The 2 columns of data, which I call 'a' and 'r', each start with an initial value that I would like to discard. The data also contains several runs, so the initialization is repeated about 15 times. Is there a way to plot the log of the points where a is greater than the initial value (in this case 4) against the log of the corresponding r values? It would be along the lines of
fia = fopen('data.txt');
A = fscanf(fia, '%f %f', [2,inf]);
fclose(fia);
a=A(1,:);
r=A(2,:);
plot(log(a(4:end)),log(r)); % I know this won't work, as it breaks down the matrix, but something like this.
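One possible approach is logical indexing: build a mask from `a` and apply the same mask to both vectors so the pairs stay aligned. This is a sketch with made-up values in place of the file contents, assuming the initial value is 4 and that all kept values are positive.

```matlab
% Made-up data with the initial value 4 repeated at the start of each run
a = [4 5 6 4 7 8];
r = [1 2 3 1 4 5];

% Keep only the points where a exceeds the initial value
keep = a > 4;

% Apply the same mask to both vectors so a and r stay paired
logA = log(a(keep));
logR = log(r(keep));

plot(logA, logR, 'o');
```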