Separating and splitting table by numeric values and writing to multiple files - matlab

I have a single CSV file with data uploaded from multiple IoT sensors (all within the same file), and each device's rows are distinguished by a unique numeric ID. I convert this into a MATLAB table using the readtable() function. I can then group the rows by device by calling sortrows() on the device ID column.
However, how would I split the grouped devices into separate tables? Currently I use the following algorithm:
for g = 1:numDevices % create a new file for each device type
    outputFileIndoorData1 = sprintf("C:\Users\Documents\duetTestFile%d",g);
    for h = 1:height(indoorDataSorted) % iterate through the entire CSV, splitting by device ID
        if strcmp(indoorDataSorted.device_id(h),deviceIDs(g,1))
            writelines(indoorDataSorted(h,1:17),outputFileIndoorData1); % write to the individual file specified
        end
    end
end
This is extremely resource intensive, however. What would be a more efficient way of separating each device's data into a different file?

You should be able to do this fairly efficiently with a couple of lines based on the unique function.
% Set up a small sample data table to work with.
exampleData = cell2table({...
'd1' 1 2; ...
'd1' 3 4; ...
'd1' 5 6; ...
'd1' 7 8; ...
'd2' 9 10; ...
'd2' 11 12; ...
'd3' 13 14; ...
'd3' 15 16; ...
'd3' 17 18; ...
'd3' 19 20; ...
}, 'VariableNames', { ...
'DeviceName', 'data1', 'data2'} );
% The built-in "unique" function is pretty efficient, and it
% returns some useful secondary outputs. We're going to use
% the 3rd output, which I have named ixs2.
[deviceNames, ixs1, ixs2] = unique( exampleData.DeviceName );
% Now, based on the "deviceNames" and "ixs2" outputs, we can just loop
% through and save the output.
for ixDevice = 1:length(deviceNames)
    curDevice = deviceNames{ixDevice};
    curMask = (ixs2 == ixDevice); % rows whose unique-index equals this device's index
    curData = exampleData(curMask,:);
    % Save data here. Save the whole thing at once.
    % Name, if needed, is: curDevice
    % Data table is: curData
end
For anyone who is not running MATLAB alongside, the outputs of the unique call in this case are as follows:
%The standard output, a list of unique names
deviceNames =
3×1 cell array
{'d1'}
{'d2'}
{'d3'}
%A set of indexes into the original set, such that
% deviceNames is equal to exampleData.DeviceName(ixs1). When a name occurs
% more than once, the function returns the index of its first occurrence.
ixs1 =
1
5
7
%A set of indexes that map each element of the original set to the
% unique set from output argument #1, i.e. exampleData.DeviceName is equal to
% deviceNames(ixs2). This is often the most useful output, and this question
% is a decent example. It can also be used as the first input to an
% "accumarray" function call, which can be incredibly powerful.
ixs2 =
1
1
1
1
2
2
3
3
3
3
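To make the "Save data here" placeholder concrete, here is a minimal sketch that writes one CSV per device with writetable, calling it once per device rather than once per row. The output folder (tempdir) and the file-name pattern duetTestFile_<name>.csv are hypothetical choices for illustration; if your IDs are numeric, as in the question, you would build the name with something like sprintf('duetTestFile%d.csv', id) instead.

```matlab
% Small sample table (a subset of the example above)
exampleData = cell2table({ ...
    'd1' 1 2; ...
    'd1' 3 4; ...
    'd2' 9 10; ...
    'd3' 13 14; ...
    'd3' 15 16}, ...
    'VariableNames', {'DeviceName', 'data1', 'data2'});

% deviceNames: sorted unique names; ixs2: group index of every row
[deviceNames, ~, ixs2] = unique(exampleData.DeviceName);

outDir = tempdir;  % hypothetical output folder, replace with your own
for ixDevice = 1:numel(deviceNames)
    % Logical mask selecting all rows of this device in one vectorized step
    curData = exampleData(ixs2 == ixDevice, :);
    % Hypothetical file-name pattern, e.g. duetTestFile_d1.csv
    outFile = fullfile(outDir, sprintf('duetTestFile_%s.csv', deviceNames{ixDevice}));
    writetable(curData, outFile);  % one write per device, not one per row
end
```

Because each device's rows are selected with a single vectorized comparison and written with a single call, the per-row file I/O of the original loop disappears.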


How can I convert this table to a cell array as shown in the screenshot?

I am trying to convert a table imported from a CSV file (see below) into two cell arrays. As shown in the screenshots, Cell 1 contains the "measure" values of the table, with measures of the same ID grouped in the same cell. Similarly, Cell 2 contains "t" grouped the same way.
MATLAB is not my language, but I have a function to test that is written only in MATLAB, so I am really unsure how to achieve this task.
Let the data be defined as
data = array2table([1 100 1; 1 200 2; 1 300 3; 2 500 3; 2 600 4; 2 700 5; 2 800 6], ...
'VariableNames', {'id' 'measure' 't'}); % example data
You can use findgroups and splitapply as follows:
g = findgroups(data.id); % grouping variable
result_measure = splitapply(@(x){x.'}, data.measure, g).'; % split measure as per g
result_t = splitapply(@(x){x.'}, data.t, g).'; % split t as per g
Alternatively, findgroups for a single grouping variable can be replaced by unique (third output), and splitapply for a single data variable can be replaced by accumarray:
[~, ~, g] = unique(data.id); % grouping variable
result_measure = accumarray(g, data.measure, [], @(x){x.'}).'; % split measure as per g
result_t = accumarray(g, data.t, [], @(x){x.'}).'; % split t as per g
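As a quick check, the sketch below runs both variants on the example data and compares them. One caveat worth knowing: the accumarray documentation warns that when the subscripts are not sorted, the function passed to it should not depend on the order of the values within a group. Here the ids are already sorted, so the order is kept; splitapply passes each group's values in their original row order.

```matlab
data = array2table([1 100 1; 1 200 2; 1 300 3; 2 500 3; 2 600 4; 2 700 5; 2 800 6], ...
    'VariableNames', {'id' 'measure' 't'});

% Variant 1: findgroups + splitapply (wrap each group in a cell so sizes may differ)
g1 = findgroups(data.id);
viaSplitapply = splitapply(@(x){x.'}, data.measure, g1).';

% Variant 2: unique (3rd output) + accumarray with the same cell-wrapping function
[~, ~, g2] = unique(data.id);
viaAccumarray = accumarray(g2, data.measure, [], @(x){x.'}).';
```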
In case someone is interested, here is my non-idiomatic MATLAB solution to the problem:
array_id=table2array(data(:,'id'));
array_t=table2array(data(:,'t'));
array_measure=table2array(data(:,'measure'));
uni_id=unique(array_id);
t_cell=cell(1,length(uni_id));
measure_cell=cell(1,length(uni_id));
for i=1:length(uni_id)
    temp_t=table2array(data(data.id==uni_id(i),'t'));
    temp_measure=table2array(data(data.id==uni_id(i),'measure'));
    t_cell{i}=temp_t';
    measure_cell{i}=temp_measure';
end
Admittedly, this is nowhere near as concise as what Luis has, but it gets the job done.

Importing Data from a .txt file into Matlab

First of all, thank you for reading my question.
I am trying to import data from a file with the following format into MATLAB:
#Text
#Text: Number
...
#Text: Number
Set1:
1 2
3 4
Set2:
5 6
7 8
...
I would like to get those numbers into two matrices of the form:
(1 5
3 7)
and
(2 6
4 8)
I started by only building the first of those two matrices.
Winkel = 15;
xp = 30;
M = readtable('Ebene_1.txt')
M([1:4],:) = [];
M(:,3) = [];
for i=0:Winkel-1
A = table2array(M((2+i*31:31+i*31),1))
end
But this solution only gave me cell arrays, which I could not transform into normal vectors.
I also tried to use the importdata command, but could not find a way to get it to work either. I know there are many questions similar to mine, but I could not find one where all the data were in a single column. Also, there are many MATLAB commands for importing data, and I am not sure which would be best.
This is my first time asking such a question online, so feel free to ask me for more details.
You can import the data you provided in your sample using readtable; however, because of the format of your file, you will need to tweak the import options a bit.
You can use detectImportOptions to tell the function how to import the data.
%Detect import options for your text file.
opts = detectImportOptions('Ebene_1.txt')
%Specify variable names for your table.
opts.VariableNames = {'Text','Number'};
%Ignore last column of your text file as it does not contain data you are interested in.
opts.ExtraColumnsRule = 'ignore';
%You can confirm that the function has successfully identified that the data is numeric by inspecting the VariableTypes property.
%opts.VariableTypes
%Read your text file using the import options in opts.
M = readtable('Ebene_1.txt',opts)
Now that you have table M, simply apply basic Matlab operations to obtain the matrices as you specified.
%Find numerical values in Text and Number variables. Ignore NaN values.
A = M.Text(~isnan(M.Text));
B = M.Number(~isnan(M.Number));
%Build matrices.
A = [A(1:2:end)';A(2:2:end)']
B = [B(1:2:end)';B(2:2:end)']
Output:
A =
1 5
3 7
B =
2 6
4 8
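The A(1:2:end) / A(2:2:end) indexing is specific to two rows per set. A minimal sketch of the same step with reshape, which fills the result column by column, assuming every SetN: block has the same number of rows (rowsPerSet is a hypothetical parameter name):

```matlab
% Numbers from the first column, in file order (the question's sample values)
A = [1; 3; 5; 7];
rowsPerSet = 2;  % assumption: each SetN: block contains this many rows

% reshape fills column by column, so column k holds the rows of set k
A_reshaped = reshape(A, rowsPerSet, []);

% The indexing from the answer, for comparison
A_indexed = [A(1:2:end)'; A(2:2:end)'];
```

This only works if all sets have the same length; otherwise the SetN: labels are needed to split the data.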

Read a complex, and long text file in Matlab

I have a very long text file which contains the data from 4 different stations with different time steps:
1:00
station 1
a number 1 (e.g.0.6E-06)
matrix1 (41x36)
station 2
number 2 (e.g.0.1E-06)
matrix2 (41x36)
station 3
number 3 (e.g.0.2E-06)
matrix3 (41x36)
station 4
number 4 (e.g.0.4E-06)
matrix4 (41x36)
2:00
station 1
a number (e.g.0.24E-06)
matrix5 (41x36)
station 2
a number (e.g.0.3E-06)
matrix6 (41x36)
station 3
number (e.g.0.12E-06)
matrix7 (41x36)
station 4
number (e.g.0.14E-06)
matrix8 (41x36)
.....
and so on
I need to read this data by station and by time step. Note that each matrix should be scaled by multiplying it by the number above it. An example is here: https://files.fm/u/sn447ttc#/view/example.txt
Could you please help?
Thank you a lot.
My idea here would be to read the text file using fopen and textscan. Afterwards, you can search for appearances of the keyword FACTOR to subdivide the output. Here's the code:
fid=fopen('example.txt'); % open the document
dataRaw=textscan(fid,'%s','Delimiter',''); % read the file with no delimiter to achieve a cell array with 1 cell per line of the text file
fclose(fid); % close the document
rows=cellfun(@(x) strfind(x,'FACTOR'),dataRaw,'uni',0); % search for appearances of 'FACTOR'
hasFactor=find(~cellfun(@isempty,rows{1})); % get row numbers of the lines that contain the word FACTOR
dataRaw=dataRaw{1}; % unwrap the cell array for easier indexing
output={}; % one scaled matrix per FACTOR block
for ii=1:(numel(hasFactor)-1) % loop over appearances of the word FACTOR
    array=cellfun(@str2num,dataRaw(hasFactor(ii)+2:hasFactor(ii+1)-1),'uni',0); % extract numerical data
    output{ii}=str2num(dataRaw{hasFactor(ii)+1})*cat(1,array{:}); % create output scaled by the factor
end
array=cellfun(@str2num,dataRaw(hasFactor(end)+2:end),'uni',0);
output{end+1}=str2num(dataRaw{hasFactor(end)+1})*cat(1,array{:}); % these last 2 lines add the last matrix to the output
outputMat=cat(3,output{:}); % convert to a 3-dimensional matrix
outputStations=[{output(1:4:end)} {output(2:4:end)} {output(3:4:end)} {output(4:4:end)}]; % sort the output to have 1 cell for each station
outputColumnSums=cellfun(@(x) cellfun(@sum,x,'uni',0),outputStations,'uni',0); % sum up all the columns of each matrix
outputRowSums=cellfun(@(x) cellfun(@(y) sum(y,2),x,'uni',0),outputStations,'uni',0); % sum up all the rows of each matrix
This approach is pretty slow and could probably be vectorized, but if you don't need it to be fast, it should do the job. I created a cell output with one cell per matrix, plus a 3-dimensional array as an optional output. Hope that's fine with you.
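For readers without access to example.txt, the sketch below runs the same FACTOR-based idea on a tiny synthetic file. The layout is an assumption inferred from the code above (a line containing FACTOR, then a line with the scale factor, then the matrix rows up to the next FACTOR line), not the real file format. It swaps textscan for fileread and strsplit, and computes the end of every block up front so the last block needs no special case.

```matlab
% Write a tiny synthetic file (hypothetical layout, not the real example.txt)
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'FACTOR\n2\n1 2\n3 4\nFACTOR\n10\n5 6\n7 8\n');
fclose(fid);

% One cell per line; strtrim also removes a trailing \r from Windows line endings
txtLines = strtrim(strsplit(fileread(fname), newline));
txtLines = txtLines(~cellfun(@isempty, txtLines));  % drop blank lines

% Line numbers containing FACTOR, and the last line of each block
hasFactor = find(~cellfun(@isempty, strfind(txtLines, 'FACTOR')));
blockEnds = [hasFactor(2:end) - 1, numel(txtLines)];

output = cell(1, numel(hasFactor));
for ii = 1:numel(hasFactor)
    scale = str2double(txtLines{hasFactor(ii) + 1});           % factor line
    blockLines = txtLines(hasFactor(ii) + 2 : blockEnds(ii));  % matrix rows
    rowsNum = cellfun(@str2num, blockLines(:), 'UniformOutput', false);
    output{ii} = scale * cell2mat(rowsNum);                    % scaled matrix
end
delete(fname);
```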
I have looked into your situation, and it seems that the problem is not as trivial as anticipated. Keep in mind that if I have made mistakes in my assumptions about the location of the data, you can let me know so I can edit it, or you can just change the numbers to suit your case. In this case, I initially loaded the delimited file into an Excel spreadsheet, just to visualize it.
After reading up on dlmread, I found that one can specify the exact rows and columns to read from example.txt (the range [r1 c1 r2 c2] is zero-based and inclusive), as shown here:
data = dlmread('example.txt', ' ', [4 1 44 36]); % [r1 c1 r2 c2]
data2 = dlmread('example.txt', ' ', [47 1 87 36]);
The result is two matrices that are 41-by-36, containing only numbers. I started data at row 4 to bypass the header information/strings. Noticing the pattern, I set it up as a loop:
No_of_matrices_expected = 4;
dataCell = cell(No_of_matrices_expected, 1);
iterations = length(dataCell)
% Initial Conditions
rowBeginning = 4;
col1 = 1; % Constant
rowEnd = rowBeginning + 40; % == 44, right before next header information
col2 = 36; % Constant
for n = 1 : iterations
    dataCell{n} = dlmread('example.txt', ' ', [rowBeginning, col1, rowEnd, col2]);
    rowBeginning = rowBeginning + 41 + 2; % skip previous matrix and skip header info
    rowEnd = rowBeginning + 40;
end
However, I then noticed what you stated earlier: there are four different stations, each with their own timestamps. So running this loop more than 4 times gave unexpected results, and MATLAB crashed. The reason is that each new timestamp adds an extra row for the date. You could either change the loop above to compensate for this extra row, or write a separate for loop for each station. This will be your decision to make.
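One way to compensate for the extra timestamp row is to compute every start row up front. This is a sketch under assumptions taken from the loop above and the question: matrices of 41 rows, 2 non-numeric lines skipped between consecutive matrices, 4 stations per time step, and exactly one extra timestamp line at each new time step. Check these numbers against your file before relying on them.

```matlab
rowsPerMatrix   = 41;  % each matrix is 41x36 (from the question)
headerRows      = 2;   % assumption: lines skipped between consecutive matrices
stationsPerStep = 4;   % 4 stations per time step
extraPerStep    = 1;   % assumption: one timestamp line at each new time step
firstRow        = 4;   % zero-based start row of the first matrix, as above
nMatrices       = 8;   % e.g. two time steps x four stations

k = (0:nMatrices-1)';  % zero-based matrix counter
rowBeginning = firstRow + k * (rowsPerMatrix + headerRows) ...
    + floor(k / stationsPerStep) * extraPerStep;
rowEnd = rowBeginning + rowsPerMatrix - 1;

% Each matrix could then be read with the same call as in the loop above, e.g.
% dataCell{n} = dlmread('example.txt', ' ', [rowBeginning(n), 1, rowEnd(n), 36]);
```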
Now, if you want to save the header information, I would recommend taking a look at textscan. You can use this function to pull the first column of all the data into a cell array of strings, and then pull out the header information that you want. Keep in mind that textscan reads from a file identifier, so you need to open the file with fopen first.
I'll let you use what I have found thus far, but let me know if you need more help.

Matlab matching first column of a row as index and then averaging all columns in that row

I need help with taking the following data, which is organized in a large matrix, and averaging all of the values that have a matching ID (index), then outputting another matrix with just the ID and the averaged values that trail it.
File with data format:
(This is the StarData variable)
ID>>>>Values
002141865 3.867144e-03 742.000000 0.001121 16.155089 6.297494 0.001677
002141865 5.429278e-03 1940.000000 0.000477 16.583748 11.945627 0.001622
002141865 4.360715e-03 1897.000000 0.000667 16.863406 13.438383 0.001460
002141865 3.972467e-03 2127.000000 0.000459 16.103060 21.966853 0.001196
002141865 8.542932e-03 2094.000000 0.000421 17.452007 18.067214 0.002490
Do not be misled by the example I posted: the first number is repeated for about 15 lines, then the ID changes, and so on for an entire set of different IDs; then the whole group of IDs repeats again. Think of the first block as [1 2 3; 1 5 9; 2 5 7; 2 4 6], after which the block repeats with different values in every column except the index. The values trailing the ID are what I need to average in MATLAB, to output a clean matrix with only one row per ID, fully averaged over all occurrences of that ID.
Thanks for any help given.
A modification of this answer does the job, as follows:
[value_sort, ind_sort] = sort(StarData(:,1));
[~, ii, jj] = unique(value_sort);
n = diff([0; ii]); % rows per ID; requires ii to hold the LAST occurrence of each ID
averages = NaN(length(n),size(StarData,2)); % preallocate
averages(:,1) = value_sort(ii); % ii indexes the sorted IDs, not StarData
for col = 2:size(StarData,2)
    averages(:,col) = accumarray(jj,StarData(ind_sort,col))./n;
end
The result is in variable averages. Its first column contains the values used as indices, and each subsequent column contains the average for that column according to the index value.
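Compatibility issue for MATLAB R2013a onwards:
The function unique changed in R2013a: its second output now holds the first occurrence of each value by default, whereas n = diff([0; ii]) above requires the last occurrence (the pre-R2013a behavior). From R2013a onwards, add the 'legacy' flag to unique, i.e. replace the second line by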
[~, ii, jj] = unique(value_sort,'legacy')
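Alternatively, the sort and the 'legacy' flag can be avoided altogether by also counting the rows per ID with accumarray. A minimal sketch, with a small made-up StarData matrix standing in for the real file:

```matlab
% Small hypothetical StarData: column 1 is the ID, the rest are values
StarData = [1  2  4;
            2 10 20;
            1  4  8;
            2 30 40];

% ids: sorted unique IDs; g: group number of every row (input need not be sorted)
[ids, ~, g] = unique(StarData(:,1));
counts = accumarray(g, 1);  % number of rows per ID

nCols = size(StarData, 2);
averages = zeros(numel(ids), nCols);
averages(:,1) = ids;
for col = 2:nCols
    % per-ID sum of this column divided by the per-ID row count
    averages(:,col) = accumarray(g, StarData(:,col)) ./ counts;
end
```

accumarray(g, StarData(:,col), [], @mean) would give the same column, at the cost of one function call per group.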

for loop in matlab

Hi, I have the following code, which I believe I have indexed wrongly, so I'm not getting the answer I am looking for:
Diesel_matrix = xlsread('test_diesel.xlsx','Sheet2');
Diesel_supply = Diesel_matrix(:,1); % Power output of diesel generator
hourly_cost = Diesel_matrix(:,2); % Diesel cost of running generator at that output
for z = 1:21
    A = [-PV_supply -WT_supply -Diesel_supply(z)*ones(24,1)];
    f = [CRF_PV*CC_PV; CRF_WT*CC_WT; (CRF_Diesel_generator*CC_Diesel)+sum(hourly_cost(1:z))] ;
    b = -Demand;
    [x,fval,exitflag] = linprog(f,A,b,[],[],lb,ub)
end
I am trying to loop only over the third column of matrix A.
I would like to loop over all the rows of "Diesel_supply" for each row of matrix A.
At the moment, the code produces 21 sets of x outputs, but in each one column 3 comes from a single row of "Diesel_supply" (row 1, 2, 3, etc. up to row 21), whereas I am trying to get it for row 1 and 2 and 3 and 4, etc., up to row 21 of "Diesel_supply".
This will allow me to examine all the elements in "Diesel_supply".
Per the conversation @user643469 and I had in chat (see link in comments section), and after looking at the documentation for linprog, I think you need to store the results of each z-iteration in a data structure and then pick the best one after the loop has finished.
As I understand it, the generator has 21 different modes you can run it in, and it is subject to 24 different constraints. Each mode changes the constraints a little.
Inside the loop, instead of only displaying
[x,fval,exitflag] = linprog(f,A,b,[],[],lb,ub)
store each iteration's outputs:
[x,fval,exitflag] = linprog(f,A,b,[],[],lb,ub);
results(z).x = x;
results(z).fval = fval;
results(z).exitflag = exitflag;
After the loop has finished, you will be left with a 1x21 struct array, results, where element z holds the x, fval and exitflag values of generator mode z. You can then go through this results array to determine which of the 21 modes you should run the generator in.
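Assuming results is a struct array with fields x, fval and exitflag (one element per mode), picking the mode could look like the sketch below: ignore runs where linprog did not converge (exitflag ~= 1) and take the lowest fval among the rest. The three-element results array here is hypothetical placeholder data standing in for the 21 linprog runs, since linprog itself needs the Optimization Toolbox.

```matlab
% Hypothetical placeholder results standing in for the 21 linprog runs
results = struct('x',        {[1; 2; 3], [2; 1; 0], [0; 0; 5]}, ...
                 'fval',     {10, 7, 4}, ...
                 'exitflag', {1, 1, -2});

fvals = [results.fval];
fvals([results.exitflag] ~= 1) = Inf;   % exclude runs that did not converge
[bestFval, bestMode] = min(fvals);

if isinf(bestFval)
    disp('No generator mode produced a converged solution.');
else
    bestX = results(bestMode).x;
    fprintf('Best mode: %d (fval = %g)\n', bestMode, bestFval);
end
```

Setting the failed runs to Inf, rather than deleting them, keeps bestMode equal to the original z index of the chosen mode.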