Aggregating second-by-second sampling interval to 30 sec interval, POSIXct - aggregate

I'm new to RStudio and would appreciate some help.
Goal: I'd like to take data collected at 1-second intervals, collapse it to 30-second intervals, and then have the mean of each variable for each interval.
Here is what my data looks like:
line datetime AA BB CC
1 2016-06-27 14:13:16 6 0 0.0
2 2016-06-27 14:13:17 10 0 48.6
3 2016-06-27 14:13:18 7 0 52.0
4 2016-06-27 14:13:19 13 0 54.4
5 2016-06-27 14:13:20 16 0 60.8
6 2016-06-27 14:13:21 6 0 65.5
7 2016-06-27 14:13:22 6 0 47.5
8 2016-06-27 14:13:23 6 1 46.8
9 2016-06-27 14:13:24 4 1 55.5
10 2016-06-27 14:13:25 4 1 51.1
11 2016-06-27 14:13:26 4 1 53.4
What I'd like to see is this:
line datetime AA BB CC
1 2016-06-27 14:13:16 18 1 50.5
2 2016-06-27 14:13:46 19 1 52.8
(here, variables AA, BB, and CC were averaged).
There have been questions similar to this, but none similar enough to give me a foundation to work from with my limited coding knowledge. I've been going back and forth between probable base R solutions and probable package solutions, to no avail, mainly because the language and syntax are still a bit foreign to me.

I think you want something like this (a base R solution):
etw
datetime AA BB CC
1 2016-06-27 14:13:16 6 0 0.0
2 2016-06-27 14:13:17 10 0 48.6
3 2016-06-27 14:13:18 7 0 52.0
4 2016-06-27 14:13:19 13 0 54.4
5 2016-06-27 14:13:20 16 0 60.8
6 2016-06-27 14:13:21 6 0 65.5
7 2016-06-27 14:13:22 6 0 47.5
8 2016-06-27 14:13:23 6 1 46.8
9 2016-06-27 14:13:24 4 1 55.5
10 2016-06-27 14:13:25 4 1 51.1
11 2016-06-27 14:13:26 4 1 53.4
aggregate(x = etw, by = list(cut(etw$datetime,breaks = "10 sec")), FUN=mean )
Group.1 datetime AA BB CC
1 2016-06-27 14:13:16 2016-06-27 14:13:20 7.8 0.3 48.22
2 2016-06-27 14:13:26 2016-06-27 14:13:26 4.0 1.0 53.40
You can change the "10 sec" part to "30 sec". However, take care: breaks = "10 sec" cuts the range into 10-second slices starting at the minimum time, so with "30 sec" your 11 seconds of sample data end up in a single slice.
You can also define the breaks manually using
breaks = seq.POSIXt(from = as.POSIXct("2016-06-27 14:13:00"),to = as.POSIXct("2016-06-27 14:14:00"),by="10 sec")
aggregate(x = etw,FUN=mean, by = list(cut(etw$datetime,breaks = seq.POSIXt(from = as.POSIXct("2016-06-27 14:13:00"),to = as.POSIXct("2016-06-27 14:14:00"),by="10 sec"))) )
Group.1 datetime AA BB CC
1 2016-06-27 14:13:10 2016-06-27 14:13:17 9.000000 0.0000000 38.75000
2 2016-06-27 14:13:20 2016-06-27 14:13:23 6.571429 0.5714286 54.37143
This is not exactly what you wanted, but IMHO your sample data does not correspond to the desired output. :)
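For comparison, here is a minimal Python sketch of the same binning idea using only the standard library (a translation for illustration, not R; the function name bin_means, the (timestamp, values) row layout and the ROWS sample are assumptions). Each timestamp goes into bin floor((t - origin) / width) and every value column is averaged within its bin. Leaving origin unset anchors the bins at the earliest timestamp, as the answer describes for breaks = "30 sec"; passing a fixed origin mirrors the seq.POSIXt breaks.

```python
from datetime import datetime, timedelta
from statistics import mean


def bin_means(rows, width_s=30, origin=None):
    """Average every value column within fixed-width time bins.

    rows:    list of (datetime, [v1, v2, ...]) tuples (hypothetical layout).
    width_s: bin width in seconds.
    origin:  bin anchor; defaults to the earliest timestamp.
    Returns [(bin_start, [mean_v1, mean_v2, ...]), ...] in time order;
    bins without data are simply absent, as with aggregate().
    """
    if origin is None:
        origin = min(t for t, _ in rows)
    groups = {}
    for t, vals in rows:
        k = int((t - origin).total_seconds() // width_s)  # bin index
        groups.setdefault(k, []).append(vals)
    result = []
    for k in sorted(groups):
        columns = zip(*groups[k])  # transpose: rows -> columns
        start = origin + timedelta(seconds=k * width_s)
        result.append((start, [mean(c) for c in columns]))
    return result


# Sample data from the question: AA, BB, CC at 1-second spacing.
T0 = datetime(2016, 6, 27, 14, 13, 16)
AA = [6, 10, 7, 13, 16, 6, 6, 6, 4, 4, 4]
BB = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
CC = [0.0, 48.6, 52.0, 54.4, 60.8, 65.5, 47.5, 46.8, 55.5, 51.1, 53.4]
ROWS = [(T0 + timedelta(seconds=i), [a, b, c])
        for i, (a, b, c) in enumerate(zip(AA, BB, CC))]

if __name__ == "__main__":
    # 10-second bins anchored at 14:13:00, like the seq.POSIXt breaks above
    for start, means in bin_means(ROWS, 10, origin=datetime(2016, 6, 27, 14, 13, 0)):
        print(start, [round(m, 2) for m in means])
```

With a fixed origin the bin edges fall on round times regardless of when sampling began, which is usually what you want when comparing several recordings.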


Adding 0 for missing data rather than excluding the category in matlab

I have the following two tables of data, one named data1 and the other named data2. The left-hand column is a categorical variable and the right-hand column is its frequency. I would like to rewrite these tables so that, wherever a category is missing from the left-hand column, the missing category is inserted with a '0' in the right-hand frequency column.
data1 = [
1 170
2 120
3 100
4 40
5 30
6 20
7 10
9 8
10 2
11 1
14 1
];
data2 = [
1 240
2 200
3 180
4 60
5 50
6 40
7 30
8 20
9 8
10 2
12 1
19 1
];
To be clearer, I will explain with an example. In data1, 8, 12 and 13 are missing from the left-hand column. I would like MATLAB to recreate this table with 0 values for 8, 12 and 13, as shown below. I would also like it to add empty categories after '14', because data2 is longer and has more categories. I have also included what data2 should look like with the gaps filled in.
data1 = [
1 170
2 120
3 100
4 40
5 30
6 20
7 10
8 0
9 8
10 2
11 1
12 0
13 0
14 1
15 0
16 0
17 0
18 0
19 0
];
data2 = [
1 240
2 200
3 180
4 60
5 50
6 40
7 30
8 20
9 8
10 2
11 0
12 1
13 0
14 0
15 0
16 0
17 0
18 0
19 1
];
I have a handful of datasets that all start with 1, 2, 3, 4, 5, ... but then have slightly different categories in the left-hand column, because where a value is missing the category is simply omitted rather than given a 0. How do I write code that automatically fills in any gaps with a 0? Ideally, the code would identify the highest category number across all the datasets and fill in the gaps up to that.
My aim is to build a grouped bar chart with data series that are all the same length.
UPDATED OUTPUT WITH 3 DATASETS
This is what your AllJoins code outputs in my MATLAB:
A table1 table2 table3
__ ______ ______ ______
1 170 240 2400
2 120 200 2000
3 100 180 0
4 40 60 0
5 30 50 0
6 20 40 0
7 10 30 0
8 0 20 0
9 8 8 0
10 2 2 0
11 1 0 0
12 0 1 0
14 1 0 0
19 0 1 0
20 0 0 1800
I would like the code to fill in the missing consecutive numbers in column A so that it looks as follows:
A table1 table2 table3
__ ______ ______ ______
1 170 240 2400
2 120 200 2000
3 100 180 0
4 40 60 0
5 30 50 0
6 20 40 0
7 10 30 0
8 0 20 0
9 8 8 0
10 2 2 0
11 1 0 0
12 0 1 0
13 0 0 0
14 1 0 0
15 0 0 0
16 0 0 0
17 0 0 0
18 0 0 0
19 0 1 0
20 0 0 1800
You can convert the datasets to a table and then use outerjoin. Then you can replace the NaNs with whatever you want using fillmissing.
table1 = array2table(data1);
table1.Properties.VariableNames = {'A', 'B'};
table2 = array2table(data2);
table2.Properties.VariableNames = {'A', 'B'};
newTable = outerjoin(table1, table2, 'LeftKeys', {'A'}, 'RightKeys', {'A'}, 'MergeKeys', true)
which produces:
A B_table1 B_table2
__ ________ ________
1 170 240
2 120 200
3 100 180
4 40 60
5 30 50
6 20 40
7 10 30
8 NaN 20
9 8 8
10 2 2
11 1 NaN
12 NaN 1
14 1 NaN
19 NaN 1
And then get your zeros with newTable2 = fillmissing(newTable, 'constant', 0), which prints:
A B_table1 B_table2
__ ________ ________
1 170 240
2 120 200
3 100 180
4 40 60
5 30 50
6 20 40
7 10 30
8 0 20
9 8 8
10 2 2
11 1 0
12 0 1
14 1 0
19 0 1
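The outerjoin + fillmissing combination can be sketched in Python with dictionaries (an illustration only; the function name outer_join_fill and the list-of-pairs input format are assumptions). Like outerjoin with 'MergeKeys', it keeps only keys that occur in at least one input, so category 13, present in neither table, does not appear.

```python
def outer_join_fill(left, right, fill=0):
    """Outer-join two lists of (category, count) pairs on category.

    Keys present in either input are kept, in ascending order; a count
    missing from one side is replaced by `fill` (like fillmissing with 0).
    """
    a, b = dict(left), dict(right)
    return [(k, a.get(k, fill), b.get(k, fill)) for k in sorted(set(a) | set(b))]


data1 = [(1, 170), (2, 120), (3, 100), (4, 40), (5, 30), (6, 20), (7, 10),
         (9, 8), (10, 2), (11, 1), (14, 1)]
data2 = [(1, 240), (2, 200), (3, 180), (4, 60), (5, 50), (6, 40), (7, 30),
         (8, 20), (9, 8), (10, 2), (12, 1), (19, 1)]

for row in outer_join_fill(data1, data2):
    print(*row)
```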
UPDATE
To combine multiple tables, you can either nest the outerjoin or write a function to loop over it (see similar Matlab forum question). Here's an example.
Given data1 and data2 in OP, plus a new data3:
data3 = [
1 2400
2 2000
20 1800
];
Contents of myscript.m:
table1 = MakeTable(data1);
table2 = MakeTable(data2);
table3 = MakeTable(data3);
AllJoins = MultiOuterJoin(table1, table2, table3);
% Functions
function Table = MakeTable(Array)
Table = array2table(Array);
Table.Properties.VariableNames = {'A', 'B'}; % set your column names, e.g. {'freq', 'count'}
end
function Joined = MultiOuterJoin(varargin)
Joined = varargin{1};
Joined.Properties.VariableNames{end} = inputname(1); % set #2 column name to be based on table name
for k = 2:nargin
Joined = outerjoin(Joined, varargin{k}, 'LeftKeys', {'A'}, 'RightKeys', {'A'}, 'MergeKeys', true);
name = inputname(k);
Joined.Properties.VariableNames{end} = name; % set merged column name to be based on table name
end
end
Which returns AllJoins:
A table1 table2 table3
__ ______ ______ ______
1 170 240 2400
2 120 200 2000
3 100 180 NaN
4 40 60 NaN
5 30 50 NaN
6 20 40 NaN
7 10 30 NaN
8 0 20 NaN
9 8 8 NaN
10 2 2 NaN
11 1 0 NaN
12 0 1 NaN
13 0 0 NaN
14 1 0 NaN
15 0 0 NaN
16 0 0 NaN
17 0 0 NaN
18 0 0 NaN
19 0 1 NaN
20 NaN NaN 1800
Feel free to change the maximum length of the array; this is a generic answer. Here the maximum length is max(data1(:,1)), but you can compute it any way you like, e.g. as the maximum value across multiple arrays.
% make new data: column 1 holds categories 1..N, column 2 the frequencies
N = max(data1(:,1));
new_data = zeros(N, 2);
new_data(:,1) = 1:N;
% Fill data. You can do this in a loop if it's easier for you to understand.
% In essence, it says: in rows data1(:,1) of new_data's second column, put data1(:,2)
new_data(data1(:,1),2) = data1(:,2);
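The same "preallocate zeros, then write each value at its category index" idea extends to several datasets at once, with N taken as the largest category in any of them, which produces the contiguous 1..20 table asked for in the update. A minimal Python sketch, assuming categories are positive integers; the name fill_categories is hypothetical:

```python
def fill_categories(*datasets, fill=0):
    """Build one row per category 1..N, N = largest category in any dataset.

    Each row is [category, count_in_dataset_1, count_in_dataset_2, ...];
    categories absent from a dataset get `fill`. Assumes categories are
    positive integers.
    """
    n = max(k for ds in datasets for k, _ in ds)
    table = [[k] + [fill] * len(datasets) for k in range(1, n + 1)]
    for j, ds in enumerate(datasets, start=1):
        for k, v in ds:
            table[k - 1][j] = v  # category k is stored in row k-1
    return table


data1 = [(1, 170), (2, 120), (3, 100), (4, 40), (5, 30), (6, 20), (7, 10),
         (9, 8), (10, 2), (11, 1), (14, 1)]
data2 = [(1, 240), (2, 200), (3, 180), (4, 60), (5, 50), (6, 40), (7, 30),
         (8, 20), (9, 8), (10, 2), (12, 1), (19, 1)]
data3 = [(1, 2400), (2, 2000), (20, 1800)]

for row in fill_categories(data1, data2, data3):
    print(*row)
```

Because every series has exactly N rows, the result can be passed straight to a grouped bar chart.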

Assign new matrices for a certain condition, in Matlab

I have an 8x8 matrix, DataFile. One of its columns (column 6, the "coarse event") can only be 0 or 1: it is 0 for a non-stable condition and 1 for a stable condition. Here is an example:
DataFile = [ 11 5 66 1.2 14.1 0 -1 0.1;...
12 6 67 1.4 15.1 0 -1 0.1;...
13 7 68 1.6 16.1 1 -1 0.2;...
14 8 69 1.7 16.5 1 -2 0.1;...
15 9 68 1.6 16.2 0 -1 0.3;...
16 8 66 1.3 15.7 1 -2 0.0;...
17 5 65 1.5 16.1 1 0 0.0;...
18 6 66 1.2 16.6 0 1 1.0];
With slight changes from the code in the comments:
DataFile =[zeros(1,size(DataFile,2)); DataFile; zeros(1,size(DataFile,2))];
startInd = [find(diff(DataFile(:,6))==1)];
endInd = [find(diff(DataFile(:,6)) <0)];
B={};
for n=1:1:numel(endInd)
B(n)={DataFile(startInd(n):endInd(n),:)};
end
FirstBlock=B{1};
SecondBlock=B{2};
The result is 2 matrices (FirstBlock = 3x8, SecondBlock = 3x8), which wrongly include 0's in the 6th column. It should give two 2x8 matrices (dataIs1(1) and dataIs1(2)) with only 1's in the 6th column.
In reality I would like to get n matrices, one for each run of rows where the "coarse event" is 1. Thank you for the help!
The magic word is logical indexing:
If you have a matrix A:
A=[1 2 3 4 5;...
0 6 7 8 9;...
1 7 8 9 10]
you can extract rows 1 and 3 (the rows whose first column is 1) with:
B=A(A(:,1)==1,:)
Hope that's what you're looking for. Have fun.
To separate the groups, we need to know where they start and end (this assumes the first and last rows of A belong to a block):
endInd = [find(diff(A(:,1))<0); size(A,1)]
startInd = [1; find(diff(A(:,1))==1)+1]
Then assign the data to a cell array:
B={};
for n=1:1:numel(endInd)
B(n)={A(startInd(n):endInd(n),:)};
end
Edit:
Your new Data:
DataFile = [ 11 5 66 1.2 14.1 0 -1 0.1;...
12 6 67 1.4 15.1 0 -1 0.1;...
13 7 68 1.6 16.1 1 -1 0.2;...
14 8 69 1.7 16.5 1 -2 0.1;...
15 9 68 1.6 16.2 0 -1 0.3;...
16 8 66 1.3 15.7 1 -2 0.0;...
17 5 65 1.5 16.1 1 0 0.0;...
18 6 66 1.2 16.6 0 1 1.0];
I add some padding to avoid mistakes:
DataFile =[zeros(1,size(DataFile,2)); DataFile; zeros(1,size(DataFile,2))]
Now, as before, we look for the starts and ends of the blocks:
endInd = find(diff(DataFile(:,6)) < 0);
startInd = find(diff(DataFile(:,6)) == 1) + 1;
Then assign each block to a cell of a cell array:
B = cell(1, numel(endInd));
for n=1:numel(endInd)
B{n}=DataFile(startInd(n):endInd(n),:);
end
If you want to retrieve, say, the second block:
secondBlock=B{2};
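The padding + diff trick can be checked with a short Python sketch (illustrative only; blocks_where is a hypothetical name, and column 6 in MATLAB becomes index 5 here because Python is 0-based). The 0/1 indicator is padded with a 0 at each end, so a +1 step marks the first row of a block and a -1 step marks one past its last row.

```python
def blocks_where(rows, col, flag=1):
    """Return runs of consecutive rows whose value in column `col` equals `flag`.

    Pads the 0/1 indicator with a 0 at each end; +1 steps mark block
    starts and -1 steps mark (exclusive) block ends.
    """
    ind = [0] + [1 if r[col] == flag else 0 for r in rows] + [0]
    steps = [ind[i + 1] - ind[i] for i in range(len(ind) - 1)]
    starts = [i for i, s in enumerate(steps) if s == 1]   # first row of a block
    ends = [i for i, s in enumerate(steps) if s == -1]    # one past the last row
    return [rows[s:e] for s, e in zip(starts, ends)]


DATA_FILE = [
    [11, 5, 66, 1.2, 14.1, 0, -1, 0.1],
    [12, 6, 67, 1.4, 15.1, 0, -1, 0.1],
    [13, 7, 68, 1.6, 16.1, 1, -1, 0.2],
    [14, 8, 69, 1.7, 16.5, 1, -2, 0.1],
    [15, 9, 68, 1.6, 16.2, 0, -1, 0.3],
    [16, 8, 66, 1.3, 15.7, 1, -2, 0.0],
    [17, 5, 65, 1.5, 16.1, 1, 0, 0.0],
    [18, 6, 66, 1.2, 16.6, 0, 1, 1.0],
]

for block in blocks_where(DATA_FILE, col=5):  # column 6 in 1-based MATLAB terms
    print(block)
```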

How to calculate intensity inhomogeneity based on average filter by matlab

I have a question about intensity inhomogeneity. I read a paper that defines a way to calculate intensity inhomogeneity based on an average filter:
Here is my problem: I have an image I (in the code below) and an average filter with r=3. I want to calculate the transformed image J based on formula (17). Could you help me implement it in MATLAB? Thank you so much.
This is my code
%Create image I
I=[3 5 5 2 0 0 6 13 1
0 3 7 5 0 0 2 8 6
4 5 5 4 2 1 3 5 9
17 10 3 1 3 7 9 9 0
7 25 0 0 5 0 10 13 2
111 105 25 19 13 11 11 8 0
103 105 15 26 0 12 2 6 0
234 238 144 140 51 44 7 8 8
231 227 150 146 43 50 8 16 9
];
%% Create filter AF
size=3; % scale parameter in Average kernel
AF=fspecial('average',[size,size]); % Average kernel
%%How to calculate CN and J
CN=mean(I(:));%Correct?
J=???
You're pretty close! The mean intensity is calculated correctly; all you are missing to calculate J is to apply the filter defined with fspecial to your image:
Here is the code:
clc
clear
%Create image I
I=[3 5 5 2 0 0 6 13 1
0 3 7 5 0 0 2 8 6
4 5 5 4 2 1 3 5 9
17 10 3 1 3 7 9 9 0
7 25 0 0 5 0 10 13 2
111 105 25 19 13 11 11 8 0
103 105 15 26 0 12 2 6 0
234 238 144 140 51 44 7 8 8
231 227 150 146 43 50 8 16 9
];
% Create filter AF
r=3; % scale parameter in average kernel (named r rather than size, which would shadow the built-in size function)
AF=fspecial('average',[r,r]); % Average kernel
%%How to calculate CN and J
CN=mean(I(:)); % This is correct
J = (CN*I)./imfilter(I,AF); % Apply the filter to the image
figure;
subplot(1,2,1)
image(I)
subplot(1,2,2)
image(J)
This displays I and J side by side.
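To make the arithmetic explicit, here is a pure-Python sketch of the computation in the answer's code, J = CN * I ./ imfilter(I, AF). It assumes an odd r, zero padding outside the image (imfilter's default) and weights of 1/r^2 (what fspecial('average') produces); the function names are hypothetical. Where the whole neighbourhood is zero the pixel itself is zero, so the result is 0/0, returned here as NaN as MATLAB would.

```python
def mean_filter(img, r=3):
    """r x r average filter, zero padding outside the image (odd r assumed)."""
    h, w = len(img), len(img[0])
    half = r // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for di in range(-half, half + 1):
                for dj in range(-half, half + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        s += img[y][x]
            out[i][j] = s / (r * r)  # always divide by r*r: padded cells count as 0
    return out


def inhomogeneity_correct(img, r=3):
    """J = CN * I / mean_filter(I), with CN the global mean intensity."""
    h, w = len(img), len(img[0])
    cn = sum(map(sum, img)) / (h * w)  # CN = mean(I(:))
    f = mean_filter(img, r)
    return [[cn * img[i][j] / f[i][j] if f[i][j] != 0 else float("nan")
             for j in range(w)] for i in range(h)]


I = [[3, 5, 5, 2, 0, 0, 6, 13, 1],
     [0, 3, 7, 5, 0, 0, 2, 8, 6],
     [4, 5, 5, 4, 2, 1, 3, 5, 9],
     [17, 10, 3, 1, 3, 7, 9, 9, 0],
     [7, 25, 0, 0, 5, 0, 10, 13, 2],
     [111, 105, 25, 19, 13, 11, 11, 8, 0],
     [103, 105, 15, 26, 0, 12, 2, 6, 0],
     [234, 238, 144, 140, 51, 44, 7, 8, 8],
     [231, 227, 150, 146, 43, 50, 8, 16, 9]]

J = inhomogeneity_correct(I)
```

Note that zero padding lowers the filtered value at the borders, which inflates J there; imfilter's 'replicate' option is one way to reduce that effect in MATLAB.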

comparing lines and columns for zero data

I have a file containing the following data.
File1:
Server counter 1:00 2:00 3:00 4:00
site1 serverdowntime 15 0 3 500
site1 serverdowntimesuc 15 0 3 500
...
site12 serverdowntime 2 7 8 5
site12 serverdowntimesuc 2 7 8 5
...
site50 serverdowntime 2 12 8 45
site50 serverdowntimesuc 2 0 0 45
...
site57 serverdowntime 2 12 8 45
site57 serverdowntimesuc 2 0 0 0
Each pair of lines belongs to the same site. The first column is the equipment, the second is the problem, and from the third column onward there is one column per hour pulled. I'm looking for a way to scan the hourly data and print each pair of lines in which at least one value is exactly 0 (a standalone 0, not a 0 inside a number such as 500).
Output after parsing data:
site57 serverdowntime 2 12 8 45
site57 serverdowntimesuc 2 0 0 0
site1 serverdowntime 15 0 3 500
site1 serverdowntimesuc 15 0 3 500
site50 serverdowntime 2 12 8 45
site50 serverdowntimesuc 2 0 0 45
$ awk 'NR==1{next} !(NR%2){line1=$0;next} {$0=line1"\n"$0} /\<0\>/' file
site1 serverdowntime 15 0 3 500
site1 serverdowntimesuc 15 0 3 500
site50 serverdowntime 2 12 8 45
site50 serverdowntimesuc 2 0 0 45
site57 serverdowntime 2 12 8 45
site57 serverdowntimesuc 2 0 0 0
This might work for you (GNU sed):
sed -r '$!N;/^(\S+)\s.*\n\1/!D;/(^|\n)(\S+\s+){2}[^\n]*\s0(\s+|\n|$)/p;d' file
This reads a pair of lines, uses the first field as the key to check that both lines belong to the same site, and then prints the pair if a 0 field appears from the 3rd field onward.
perl -ne '($k)=/^(\w+)/; if (/\b0\b/){ print $v{$k}, $_ }else{ $v{$k}=$_ }' file
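The same pairwise check can be sketched in Python for readers less familiar with awk, sed or Perl. It assumes a header line, then lines in pairs for the same site, with hourly values from the 3rd field on; the name pairs_with_zero and the skipping of the '...' placeholder lines are assumptions for the sample. Comparing whole fields to "0" avoids matching values like 500 or 10.

```python
def pairs_with_zero(lines):
    """Yield (first, second) line pairs in which any hourly value is exactly 0.

    Assumes a header line, then pairs of lines for the same site, with
    hourly values starting at the 3rd whitespace-separated field.
    Placeholder '...' lines from the sample are skipped.
    """
    body = [ln for ln in lines[1:] if ln.strip() and ln.strip() != "..."]
    for first, second in zip(body[0::2], body[1::2]):
        hours = first.split()[2:] + second.split()[2:]
        if "0" in hours:  # whole-field match, so 500 or 10 do not count
            yield first, second


SAMPLE = """Server counter 1:00 2:00 3:00 4:00
site1 serverdowntime 15 0 3 500
site1 serverdowntimesuc 15 0 3 500
...
site12 serverdowntime 2 7 8 5
site12 serverdowntimesuc 2 7 8 5
...
site50 serverdowntime 2 12 8 45
site50 serverdowntimesuc 2 0 0 45
...
site57 serverdowntime 2 12 8 45
site57 serverdowntimesuc 2 0 0 0""".splitlines()

# For a real file: with open("file") as fh: lines = fh.read().splitlines()
for a, b in pairs_with_zero(SAMPLE):
    print(a)
    print(b)
```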

Extracting portions of matrix into cell array

I have a pretty large matrix M and I am only interested in a few of the columns. I have a boolean vector V where a value of 1 represents a column that is of interest. Example:
M = [-1 -1 -1 7 7 -1 -1 -1 7 7 7
     -1 -1 7 7 7 -1 -1 7 7 7 7
     -1 -1 7 7 7 -1 -1 -1 7 7 -1]
V = [0 0 1 1 1 0 0 1 1 1 1]
If multiple adjacent values of V are all 1, then I want the corresponding columns of M to be extracted into another matrix. Here's an example, using the matrices from before.
M1 = [-1 7 7
       7 7 7
       7 7 7]
M2 = [-1 7 7 7
       7 7 7 7
      -1 7 7 -1]
How might I do this efficiently? I would like all these portions of the matrix M to be stored in a cell array, or at least have an efficient way to generate them one after the other. Currently I'm doing this in a while loop and it is not as efficient as I'd like it to be.
(Note that my examples only include the values -1 and 7 just for clarity; this isn't the actual data I use.)
You can use the diff function to break your V vector into blocks:
% find where block differences exist
diffs = diff(V);
% move start index one value forward, as first value in
% diff represents diff between first and second in original vector
startPoints = find(diffs == 1) + 1;
endPoints = find(diffs == -1);
% if the first block begins with the first element diff won't have
% found start
if V(1) == 1
startPoints = [1 startPoints];
end
% if last block lasts until the end of the array, diff won't have found end
if length(startPoints) > length(endPoints)
endPoints(end+1) = length(V);
end
% subset original matrix into cell array with indices
results = cell(size(startPoints));
for c = 1:length(results)
results{c} = M(:,startPoints(c):endPoints(c));
end
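The two edge-case checks above (a block starting at the first element, or running to the last) can be avoided by padding V with a 0 at both ends before taking differences, so every block has both a +1 and a -1 step. A minimal Python sketch of that variant, with a hypothetical column_blocks function and M stored as a list of rows:

```python
def column_blocks(M, V):
    """Split the columns of M (a list of rows) into blocks of adjacent columns where V is 1.

    Padding V with a 0 at both ends gives every block a +1 step (start)
    and a -1 step (exclusive end), so no special cases are needed for
    blocks touching the first or last column.
    """
    padded = [0] + [int(v) for v in V] + [0]
    steps = [padded[i + 1] - padded[i] for i in range(len(padded) - 1)]
    starts = [i for i, s in enumerate(steps) if s == 1]
    ends = [i for i, s in enumerate(steps) if s == -1]  # exclusive
    return [[row[s:e] for row in M] for s, e in zip(starts, ends)]


M = [[-1, -1, -1, 7, 7, -1, -1, -1, 7, 7, 7],
     [-1, -1, 7, 7, 7, -1, -1, 7, 7, 7, 7],
     [-1, -1, 7, 7, 7, -1, -1, -1, 7, 7, -1]]
V = [0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1]

M1, M2 = column_blocks(M, V)
print(M1)
print(M2)
```

In MATLAB the same idea is diff([0 V 0]), with starts at find(d == 1) and ends at find(d == -1) - 1.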
The one thing I'm not sure about is whether there's a better way to find begin_indices and end_indices.
Code:
X = [1 2 3 4 5 1 2 3 4 5
6 7 8 9 10 6 7 8 9 10
11 12 13 14 15 11 12 13 14 15
16 17 18 19 20 16 17 18 19 20
1 2 3 4 5 1 2 3 4 5
6 7 8 9 10 6 7 8 9 10
11 12 13 14 15 11 12 13 14 15
16 17 18 19 20 16 17 18 19 20];
V = logical([ 1 1 0 0 1 1 1 0 1 1]);
find_indices = find(V);
begin_indices = [find_indices(1) find_indices(find(diff(find_indices) ~= 1)+1)];
end_indices = [find_indices(find(diff(find_indices) ~= 1)) find_indices(end)];
X_truncated = mat2cell(X(:,V),size(X,1),[end_indices-begin_indices]+1);
X_truncated{:}
Output:
ans =
1 2
6 7
11 12
16 17
1 2
6 7
11 12
16 17
ans =
5 1 2
10 6 7
15 11 12
20 16 17
5 1 2
10 6 7
15 11 12
20 16 17
ans =
4 5
9 10
14 15
19 20
4 5
9 10
14 15
19 20