I have two timetables, each with 4 columns, of which the first 2 are of particular interest to me. The first column is a date and the second is an hour.
How can I find which observations (by date and hour) are in timetable 1 but not in timetable 2, and then drop those observations from timetable 1?
For example, just by looking I noticed that timetable 1 includes the day 25/05/2015 with hours 1 and 2, but timetable 2 does not, so I would like to drop those observations from timetable 1.
I tried the command groups_timetable1 = findgroups(timetable1.Date,timetable1.Hour); but unfortunately it does not help much in telling which observations are missing from the other timetable.
Thank you!
Call ismember to find one set of data in another.
To match multi-column records as whole rows against another set of records, call ismember(..., 'rows').
For example:
baseline=[
100, 2.1
200, 7.5
120, 11.0
];
isin=ismember(baseline,[200, 7.5],'rows');
pos=find(isin)
If you have date strings or datetime objects, convert them to numeric values first, for example by calling datenum or posixtime.
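Applied to the date-and-hour case in the question, the idea could look like the following minimal sketch. The variable names and the four sample rows are made up for illustration; only ismember(..., 'rows') and datenum come from the answer above.

```matlab
% Hypothetical data: one date and one hour per observation.
dates1 = datetime(2015, 5, [25; 25; 26; 26]);
hours1 = [1; 2; 1; 2];
dates2 = datetime(2015, 5, [26; 26]);
hours2 = [1; 2];

% Build numeric [date, hour] keys so that ismember can compare whole rows.
keys1 = [datenum(dates1), hours1];
keys2 = [datenum(dates2), hours2];

% inBoth(k) is true when row k of set 1 also appears in set 2.
inBoth = ismember(keys1, keys2, 'rows');

% Keep only the matching observations; the two 25 May rows are dropped.
keptDates = dates1(inBoth);
keptHours = hours1(inBoth);
```

With real timetables, the same logical vector could be used as a row index, for example timetable1 = timetable1(inBoth, :).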
You can use the timetable method innerjoin to do this. Like so:
% Fabricate some data
dates1 = datetime(2015, 5, ones(10,1));
hours1 = (1:10)';
timetable1 = timetable(dates1(:), hours1, rand(10,1), rand(10,1), ...
'VariableNames', {'Hour', 'Price', 'Volume'});
% Subselect a few rows for timetable2
timetable2 = timetable1([1:3, 6:10],:);
% Use innerjoin to pick rows where Time & Hour intersect:
innerjoin(timetable1, timetable2, 'Keys', {'Time', 'Hour'})
By default, the result of innerjoin contains the table variables from both input tables - that may or may not be what you want.
I have a bunch of daily change % data. I would like to calculate cumulative change, which should just be (1+change)*previous day in a chart in Tableau.
Seems simple enough right? I can do it in a few seconds in Excel, but I've tried for hours to get it to work in Tableau and cannot do it.
My thought was that I can create a column that is (1+daily change%), then try to do a compound product. However, I can't seem to get it to work.
I can't attach any files here so I pasted the data, along with a column that is "cum change", which is what I would like the calculation to be.
Thank you much in advance!
Date Daily Change Cum Change
4/1/2015 0.47% 1
4/2/2015 0.56% 1.0056
4/3/2015 -0.72% 0.99835968
4/6/2015 -0.56% 0.992768866
4/7/2015 -0.80% 0.984826715
4/8/2015 0.44% 0.989159952
4/9/2015 -0.66% 0.982631497
4/10/2015 0.99% 0.992359549
4/13/2015 0.92% 1.001489256
4/14/2015 0.73% 1.008800128
4/15/2015 0.95% 1.018383729
4/16/2015 0.42% 1.022660941
4/17/2015 0.52% 1.027978778
4/20/2015 0.02% 1.028184373
4/21/2015 0.56% 1.033942206
4/22/2015 0.35% 1.037561004
4/23/2015 -0.34% 1.034033296
4/24/2015 0.18% 1.035894556
4/27/2015 0.61% 1.042213513
4/28/2015 0.46% 1.047007695
4/29/2015 0.94% 1.056849568
Create a calculated field:
IF INDEX() = 1
THEN 1
ELSE
(1 + AVG([Daily Change])) * PREVIOUS_VALUE(1)
END
The condition checking to see if it's the first row of the partition (INDEX() = 1) is necessary to ensure that the first value of the field is a 1. After that, you can just use the self-referential PREVIOUS_VALUE() to get the previous value of this same calculation.
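The recurrence itself can be checked outside Tableau. The following is a minimal MATLAB sketch (not Tableau syntax) that uses the first five rows of the pasted data; the variable names are illustrative.

```matlab
% First five daily changes from the pasted data, converted to fractions.
change = [0.47; 0.56; -0.72; -0.56; -0.80] / 100;

% The first value is fixed at 1; every later value is
% (1 + daily change) times the previous value, i.e. a running product.
cumChange = cumprod([1; 1 + change(2:end)]);
```

The second and third entries of cumChange reproduce 1.0056 and 0.99835968 from the "Cum Change" column.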
I've produced a code which separates data within a text file into the required format, filters the data and averages the output (in this case, the value in the fourth column)
I am trying to filter the data in column one for a list of values at the same time, with no strict pattern for the values. e.g 1001, 1007, 1048, 1192, 1200 ....
Currently my code only filters by a single value (1001). Is there a way of incorporating a list of values into this line?
C_f = C(C(:,1) == 1001 , :);
Any help would be much appreciated!
See if this is what you want,
val = [1000 1001];
ind = ismember(C(:,1),val);
C_f = C(ind,:)
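A self-contained sketch of the same approach with the kind of list mentioned in the question. The matrix C below is made-up sample data, and averaging the fourth column is assumed from the question's description.

```matlab
% Hypothetical matrix: column 1 holds the IDs, column 4 the value to average.
C = [1001 0 0 2.0;
     1002 0 0 9.0;
     1007 0 0 4.0;
     1048 0 0 6.0];

% IDs to keep; values that never occur in C (1192, 1200) are simply ignored.
wanted = [1001 1007 1048 1192 1200];

% Logical mask of the rows whose first column is in the list.
keep = ismember(C(:,1), wanted);
C_f = C(keep, :);

% Average of the fourth column over the retained rows.
avg4 = mean(C_f(:,4));
```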
I am trying to find the indices in Date3 (a column vector of date numbers from 01/01/2008 to 01/31/2014, repeated many times) that match day. I basically want to organize idx into a cell array idx{i} in which each cell is one day, i.e. return all the indices where Date3 equals day, where day is one of the days between 01/01/2008 and 01/31/2014. Eventually, I want to pull out the data for each day by applying those indices to the variable Data2, reshaping Data2 so that instead of a long column vector of concentrations I have a cell array in which each cell holds all the data from one day.
This is what I have been doing:
for day = datenum(2008,01,01):1:datenum(2014,01,31); % All the days under consideration
idx = find(Date3, day); % Index of where Date3 equals the day under consideration
Data_PM25 = Data2(idx); % Pull out the data based on the idx
end
Example:
If Date3 looks like the following (It's actually much larger and repeats many many more times)
733408
733409
733410
733411
733412
733413
733414
733415
733416
733417
733418
733419
733420
733421
733408
733409
733410
733411
733412
733413
733414
733415
733416
733417
733418
733419
733420
733421
I want idx to be
idx{1} = (1, 15) % Where 733408 repeats
idx{2} = (2, 16) % Where 733409 repeats
...
And then Data2, which looked like:
[NaN]
[NaN]
[NaN]
[NaN]
[NaN]
[NaN]
[NaN]
[NaN]
[NaN]
[NaN]
'25.8'
'26.1'
'28.9'
'37.5'
'25.2'
'20'
'32.3'
'41'
'46.7'
'28.2'
'34.5'
'31.8'
'37.6'
'45.5'
'54.9'
'54.8'
'36.3'
'18.5'
Will now look like
Data_PM25{1} = ([NaN], '25.2')
Data_PM25{2} = ([NaN], '20')
...
Of course, the actual outputs will be much longer than just two matches.
What appears to be happening though is that I am comparing every day against Date3, a list of days, so I am getting all the days back.
This question expands off of a previous question: Find where a value matches and concatentate into column vector MATLAB
I used find(ismember…) to find where a given day shows up in the long column Date3, and then used that index to pull out all the data, Lat, and Lon for that day.
For some reason, group2cell created double the number of days there were supposed to be. It somehow pulled out two years, and the first year it pulled out (1:365 or 1:366) was random.
day = datenum(years(y), 01, 01):datenum(years(y), 12, 31); % Create a column vector of all days in one year
for d = 1:length(day)
% Find index where Date3 matches one day, one day at a time
ind{d} = find(ismember(datenum(Date3), day(d)) == 1);
data_O3{d} = Data2(ind{d});
end
How about this: The result should be a cell array of vectors, each vector corresponding to the data from a given day.
This of course assumes Date3 and Data2 are the same size
Date3=[733408
733409
733410
733411
733412
733413
733414
733415
733416
733417
733418
733419
733420
733421
733408
733409
733410
733411
733412
733413
733414
733415
733416
733417
733418
733419
733420
733421];
Data2 = Date3.*50; % //just some dummy data for testing
Data_pm25=cell(0);
for day=datenum(2008,01,01):datenum(2014,01,01)
idx=day==Date3;
Data_pm25 = [Data_pm25,Data2(idx)];
end
The only problem I can see with this is if you don't have data for every day, you wouldn't necessarily know which day each cell is representing. This could easily be solved by storing a vector of dates with data.
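One way to keep a vector of dates alongside the grouped data is to group by the unique values of Date3 instead of looping over every calendar day. This is a minimal sketch with shortened, made-up data; it swaps the loop for unique and accumarray, so it is an alternative and not the code above.

```matlab
% Shortened dummy data: three days, each appearing twice.
Date3 = [733408; 733409; 733410; 733408; 733409; 733410];
Data2 = [10; 20; 30; 40; 50; 60];

% uniqueDays lists each day once; group(k) is the position of Date3(k) in it.
[uniqueDays, ~, group] = unique(Date3);

% Collect, for every day, the (sorted) row indices where that day occurs.
idx = accumarray(group, (1:numel(Date3))', [], @(rows) {sort(rows)});

% Apply those indices to Data2, giving one cell per day.
Data_by_day = cellfun(@(rows) Data2(rows), idx, 'UniformOutput', false);
```

Here uniqueDays(i) is the day that idx{i} and Data_by_day{i} belong to, and days without data simply do not appear.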
As I mentioned in your other question try GROUP2CELL function from FileExchange.
You can use it on index as
[newidx, uniqueDate] = group2cell(idx,Date2);
or directly on data
[Data_PM25, uniqueDate] = group2cell(Data2,Date2);
uniqueDate will be an array of all unique Dates. uniqueDate(i) will correspond to Data_PM25(i).
I have a .mat file that contains data from the years 2006-2100. Each year, there is a different number of lines. I need to count how many lines are 2006, how many are 2007, etc.
The set up, by column, is: Year, Month, Day, Lat, Long
I just want to count the number of rows that share the same Year entry and get back an array containing those counts.
I'm thinking a for or while loop should work, but I don't know how to write it.
If we assume your data are in a numeric matrix, you can just do:
num_lines2006 = sum(data(:,1)==2006);
data2006 = data(data(:,1)==2006,:);
If you want to add a column with number of rows for corresponding year, here is a solution with a loop:
for k=size(data,1):-1:1
num_year(k,1) = sum(data(:,1)==data(k,1));
end
data = [data num_year];
Here is a solution without loop:
[unq_year,~,idx] = unique(data(:,1),'stable');
num_year = grpstats(data(:,1),idx,@numel); % group by idx so counts line up with unq_year
data = [data num_year(idx)];
To count numeric entries, you may want to use histc
years = unique(data(:,1));
counts = histc(data(:,1),years);
Since you just want to count the number of rows you could just write something simple like:
years = unique(data(:, 1));
counts = arrayfun(@(year) nnz(data(:, 1) == year), years);
years contains the unique years, and counts the number of times each one is found.
You could also use a one-liner inspired by Jonas' answer:
[counts, years] = hist(data(:,1), unique(data(:,1))');
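Another option, sketched here on a small made-up matrix in the column layout from the question (Year, Month, Day, Lat, Long), is to let unique return group indices and count them with accumarray. The names yearList and counts are illustrative.

```matlab
% Hypothetical data: Year, Month, Day, Lat, Long.
data = [2006 1 1 10 20;
        2006 1 2 11 21;
        2007 3 5 12 22;
        2006 7 9 13 23;
        2008 2 2 14 24];

% yearList holds each year once (sorted); idx maps every row to its year.
[yearList, ~, idx] = unique(data(:,1));

% Add 1 for every row into the slot of its year.
counts = accumarray(idx, 1);
```

counts(i) is then the number of rows whose year is yearList(i).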
I simply want to generate a series of dates 1 year apart from today.
I tried this
CurveLength=30;
t=zeros(CurveLength);
t(1)=datestr(today);
x=2:CurveLength-1;
t=addtodate(t(1),x,'year');
I am getting two errors so far?
??? In an assignment A(I) = B, the number of elements in B and
Which I am guessing is related to the fact that the date is a string, but when I modified the string to be the same length as the date dd-mmm-yyyy i.e. 11 letters I still get the same error.
Lastly, I get the error
??? Error using ==> addtodate at 45
Quantity must be a numeric scalar.
This seems to suggest that the function can't be vectorised. If so, is there any way to tell in advance which functions can be vectorised and which cannot?
To add n years to a date x, you do this:
y = addtodate(x, n, 'year');
However, addtodate requires the following:
x must be a scalar number, not a string.
n must be a scalar number, not a vector.
Hence the errors you get.
I suggest you use a loop to do this:
CurveLength = 30;
t = zeros(CurveLength, 1);
t(1) = today; % # Whatever today equals to...
for ii = 2:CurveLength
t(ii) = addtodate(t(1), ii - 1, 'year');
end
Now that you have all your date values, you can convert it to strings with:
datestr(t);
And here's a neat one-liner using arrayfun:
datestr(arrayfun(@(n)addtodate(today, n, 'year'), 0:CurveLength-1))
If your sequence has a constant known start, you can use datenum in the following way:
t = datenum( startYear:endYear, 1, 1)
This works fine also with months, days, hours etc. as long as the sequence doesn't run into negative numbers (like 1:-1:-10). Then months and days behave in a non-standard way.
Here is a solution without a loop (possibly faster):
CurveLength=30;
t=datevec(repmat(now(),CurveLength,1));
x=[0:CurveLength-1]';
t(:,1)=t(:,1)+x;
t=datestr(t)
datevec splits the date into six columns [year, month, day, hour, min, sec]. So if you want to change e.g. the year you can just add or subtract from it.
If you want to change the month just add to t(:,2). You can even add numbers > 12 to the month and it will increase the year and month correctly if you transfer it back to a datenum or datestr.
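In releases that have the datetime type (R2014b or later, which is an assumption about your setup), the same series can be sketched with calendar durations, which do accept vectors. This is an alternative to the datenum-based answers above, not a replacement for them.

```matlab
CurveLength = 30;

% calyears builds calendar-year offsets 0, 1, ..., CurveLength-1;
% adding them to today's date gives one date per year.
t = datetime('today') + calyears(0:CurveLength-1)';
```

t is a 30-by-1 datetime column whose first element is today's date.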