MATLAB loop over unique function output - matlab

following problem:
I have a very large matrix and several rows share the same identifier in column 1. For these rows I need to do some averaging, reformatting etc.
Currently I am identifying all unique identifier values in column 1 by using the function unique and then do averaging, reformatting of values in other columns within this loop for each set of rows sharing the same column 1 value within a loop.
ID = unique(data.1);
for i = 1:length(ID);
do stuff
end
I guess this is highly inefficient and slow but I cannot think of a better way of handling this.

Related

How to get a collective output of multiple loop run using a selection condition in Matlab?

I have a table (L-arrival) of 279 rows and 252 columns. Only the first column has values while others are just NaN. The cells in the first column have multiple values (i.e. some have 1, some have 4 number of values). First of all, I am trying to select a single maximum value from each cell of the first column so that I can have a column of a single value for each cell only. Then I want to do this in a loop so that for every new value that I get, they are sorted and only the maximum values are chosen. Finally, I want to make a collection of these values obtained from multiple runs for each cell. Can anyone suggest to me how it can be approached in MatLab?I tried using the following code but didn't work well.
for b=1:279
m = numel(cell2mat(L_arrival(b,1)));
g(b)=mat2cell([cell2mat(g(b)); cell(L_arrival(b,1))]',[1 2]);
end

Splitting a table in MATLAB by Sensor ID

I have a large table in MATLAB which contains over 1000 rows of data, in two columns. Column 1 is the ID of the sensor which gathered the data, and column two is the data itself (in this case a voltage).
I have been able to sort my table to gather all the data for sensors together. So, all the data from Sensor 1 is in rows 1 to 100, the data for Sensor 2 is in rows 101 to 179, the data for Sensor 3 is in rows 180 to 310, and so on. In other words, the number of rows which contain data for a given sensor is never the same.
Now, I want to split this main table into separate tables for each sensor ID, and I am having trouble figuring out a way to do it. I imagine I could do it with a loop, where my I cycle through the various IDs, but that doesn't seem like a very, MATLAB way of doing it.
What would be an efficient way to complete this task? Or would a loop really be the only way?
I have attached a small screenshot of some of my data.
The screenshot you shared shows a 1244x1 structure array with 2 fields but the question describes a table. You could convert the structure array to a table using,
T = struct2table(S); % Assuming S is the name of your structure
Whether the variable is a structure or table, it's better to not separate the variable and to use indexing instead. For example, assuming the variable is a table, you can compute the mean voltage for sensor1 using,
mean(T.reported_voltage(strcmp(T.sensor_id,'Sensor1')))
and you could report the mean of all groups using,
groupsummary(T,'sensor_id', 'mean')
or
splitapply(#mean,T.reported_voltage,findgroups(T.sensor_id))
But if you absolutely must break apart and tidy, well-organized table, you can do so by splitting the table into sub-tables stored within a cell array using,
unqSensorID = unique(T.sensor_id);
C = arrayfun(#(id){T(strcmp(T.sensor_id, id),:)},unqSensorID)
In this case the for loop is fine because (I guess) there aren't that many different sensors and your code will likely spend most of its time processing the data anyway - the loop won't give you a significant overhead.
Assuming your table is called t, the following should do what you want.
unique_sensors = unique(t.sensor_id)
for i = 1:length(unique_sensors)
sensor_data = t(t.sensor_id == unique_sensors(i), :);
% save or do some processing on this data
end

When to use a cell, matrix, or table in Matlab

I am fairly new to matlab and I am trying to figure out when it is best to use cells, tables, or matrixes to store sets of data and then work with the data.
What I want is to store data that has multiple lines that include strings and numbers and then want to work with the numbers.
For example a line would look like
'string 1' , time, number1, number 2
. I know a matrix works best if al elements are numbers, but when I use a cell I keep having to convert the numbers or strings to a matrix in order to work with them. I am running matlab 2012 so maybe that is a part of the problem. Any help is appreciated. Thanks!
Use a matrix when :
the tabular data has a uniform type (all are floating points like double, or integers like int32);
& either the amount of data is small, or is big and has static (predefined) size;
& you care about the speed of accessing data, or you need matrix operations performed on data, or some function requires the data organized as such.
Use a cell array when:
the tabular data has heterogeneous type (mixed element types, "jagged" arrays etc.);
| there's a lot of data and has dynamic size;
| you need only indexing the data numerically (no algebraic operations);
| a function requires the data as such.
Same argument for structs, only the indexing is by name, not by number.
Not sure about tables, I don't think is offered by the language itself; might be an UDT that I don't know of...
Later edit
These three types may be combined, in the sense that cell arrays and structs may have matrices and cell arrays and structs as elements (because thy're heterogeneous containers). In your case, you might have 2 approaches, depending on how you need to access the data:
if you access the data mostly by row, then an array of N structs (one struct per row) with 4 fields (one field per column) would be the most effective in terms of performance;
if you access the data mostly by column, then a single struct with 4 fields (one field per column) would do; first field would be a cell array of strings for the first column, second field would be a cell array of strings or a 1D matrix of doubles depending on how you want to store you dates, the rest of the fields are 1D matrices of doubles.
Concerning tables: I always used matrices or cell arrays until I
had to do database related things such as joining datasets by a unique key; the only way I found to do this in was by using tables. It takes a while to get used to them and it's a bit annoying that some functions that work on cell arrays don't work on tables vice versa. MATLAB could have done a better job explaining when to use one or the other because it's not super clear from the documentation.
The situation that you describe, seems to be as follows:
You have several columns. Entire columns consist of 1 datatype each, and all columns have an equal number of rows.
This seems to match exactly with the recommended situation for using a [table][1]
T = table(var1,...,varN) creates a table from the input variables,
var1,...,varN . Variables can be of different sizes and data types,
but all variables must have the same number of rows.
Actually I don't have much experience with tables, but if you can't figure it out you can always switch to using 1 cell array for the first column, and a matrix for all others (in your example).

Extract a specific row from a combination matrix

Suppose I have 121 elements and want to get all combinations of 4 elements taken at a time, i.e. 121c4.
Since combnk(1:121, 4) takes a lot of time, I want to go for 2% of that combination by providing:
z = 1:50:length(121c4(:, 1))
For example: 1st row, 5th row, 100th row and so on, up to 121c4, picking only those rows from a 121c4 matrix without generating the complete combination (it's consuming too much for large numbers like 625c4).
If you haven't defined an ordering on the combinations, why not just use
randi(121,p,4)
where p is the number of combinations you want in your set ? With this approach you may, or may not, want to replace duplicates.
If you have defined an ordering on the combinations, tell us what it is.

Matlab: Code Performance Issue Using "ismember"

I need this section of my code to run faster, as it is called many many times. I am new to Matlab and I feel as though there MUST be a way to do this that is not so round-about. Any help you could give on how to improve the speed of what I have or other functions to look into that would help me perform this task would be appreciated.
(Task is to get only lines of "alldata" where the first column is in the set of "minuteintervals" into "alldataMinutes". "minuteintervals" is just the minimum value of "alldata" column one increasing by twenty to the maximum of alldata.
minuteintervals= min(alldata(:,1)):20:max(alldata(:,1)); %20 second intervals
alldataMinutes= zeros(30000,4);
counter=1;
for x=1:length(alldata)
if ismember(alldata(x,1), minuteintervals)
alldataMinutes(counter,:)= alldata(x,:);
counter= counter+1;
end
end
alldataMinutes(counter:length(alldataMinutes),:)= [];
This should give you what you want, and it should be substantially faster:
minuteintervals = min(alldata(:,1)):20:max(alldata(:,1)); %# Interval set
index = ismember(alldata(:,1),minuteintervals); %# Logical index showing first
%# column values in the set
alldataMinutes = alldata(index,:); %# Extract the corresponding rows
This works by passing a vector of values to the function ISMEMBER, instead of passing values one at a time. The output index is a logical vector the same size as alldata(:,1), with a value of 1 (i.e. true) for elements of alldata(:,1) that are in the set minuteintervals, and a value of 0 (i.e. false) otherwise. You can then use logical indexing to easily extract the rows corresponding to the ones in index, placing them in alldataMinutes.