DAX Calculate with Filter?

I need to calculate the % of anonymous cases by quarter.
I created a measure that calculates the % of anonymous cases, but when I place it into a matrix table with quarters as rows and the measure as a column, the % numbers become really small. This makes me wonder whether the measure is calculating across all rows rather than per row.
Please take a look at the image of the issue: Matrix Table.
Here is my DAX. How do I account for the measure being placed against year and quarter?
% of Anon Cases =
DIVIDE (
    SUM ( 'Cases'[Custom.Anon] ),
    CALCULATE (
        DISTINCTCOUNT ( 'Cases'[IemCaseKey] ),
        ALLSELECTED ()
    )
)
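If the problem is that ALLSELECTED() in the denominator discards the quarter coming from the matrix row, every row ends up divided by the distinct case count across all selected quarters, which would explain the very small percentages. A minimal sketch of one possible fix, keeping your column names (whether this matches the grain you intend is an assumption), is to let the denominator follow the current row context:
% of Anon Cases =
DIVIDE (
    SUM ( 'Cases'[Custom.Anon] ),
    DISTINCTCOUNT ( 'Cases'[IemCaseKey] )
)
With no CALCULATE/ALLSELECTED wrapper, DISTINCTCOUNT is evaluated in each row's filter context, so each quarter divides its own anonymous cases by its own total cases.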

Pairwise Similarity and Sorting Samples

The following is a problem from an assignment that I am trying to solve:
Visualization of similarity matrix. Represent every sample with a four-dimensional vector (sepal length, sepal width, petal length, petal width). For every two samples, compute their pairwise similarity. You may do so using the Euclidean distance or other metrics. This leads to a similarity matrix whose element (i,j) stores the similarity between samples i and j. Please sort all samples so that samples from the same category appear together. Visualize the matrix using the function imagesc() or any other function.
Here is the code I have written so far:
load('iris.mat'); % create a table of the data
iris.Properties.VariableNames = {'Sepal_Length' 'Sepal_Width' 'Petal_Length' 'Petal_Width' 'Class'}; % change the variable names to their actual meaning
iris_copy = iris(1:150,{'Sepal_Length' 'Sepal_Width' 'Petal_Length' 'Petal_Width'}); % make a copy of the (numerical) features of the table
iris_distance = table2array(iris_copy); % convert the table to an array
% pairwise similarity
D = pdist(iris_distance); % calculate the Euclidean distance and store the result in D
W = squareform(D); % convert to squareform
figure()
imagesc(W); % visualize the matrix
Now, I think I've got the coding mostly right to answer the question. My issue is how to sort all the samples so that samples from the same category appear together, because I got rid of the names when I created the copy. Is it already sorted by converting to squareform? Any other suggestions? Thank you!
It should be in the same order as the original data. While you could sort it afterwards, the easiest solution is to actually sort your data by class after line 2 and before line 3.
load('iris.mat'); % create a table of the data
iris.Properties.VariableNames = {'Sepal_Length' 'Sepal_Width' 'Petal_Length' 'Petal_Width' 'Class'}; % change the variable names to their actual meaning
% Sort the table here on the "Class" attribute. Don't forget to change the table name
% in the next line too if you need to.
iris_copy = iris(1:150,{'Sepal_Length' 'Sepal_Width' 'Petal_Length' 'Petal_Width'}); % make a copy of the (numerical) features of the table
Consider using sortrows:
tblB = sortrows(tblA,'RowNames') sorts a table based on its row names. Row names of a table label the rows along the first dimension of the table. If tblA does not have row names, that is, if tblA.Properties.RowNames is empty, then sortrows returns tblA.
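For this table the sort key is the 'Class' variable rather than the row names, so a minimal sketch of the suggested change could look like the following (variable names taken from the question, the rest assumed):
iris_sorted = sortrows(iris, 'Class'); % group rows of the same species together
iris_copy = iris_sorted(:, {'Sepal_Length' 'Sepal_Width' 'Petal_Length' 'Petal_Width'}); % numeric features only
W = squareform(pdist(table2array(iris_copy))); % pairwise Euclidean distances in square form
figure()
imagesc(W); % same-class blocks now appear along the diagonal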

Plotting x-axis and y-axis with different (indep) limits in Matlab

I developed an Android app in which each scan lasts 1 minute, and during this time the sensor collects many readings at random times. I want to plot the sensor data of one scan only, as follows:
The time of the scan is set manually in seconds for only 1 minute (from 1 to 60 s) on the x-axis, while the vector of readings collected from the sensor (sometimes hundreds of values) goes on the y-axis.
How can I do this in MATLAB?
I tried using this code, but it gives me the error "Vectors must be the same length."
This is my code:
x1 = linspace(0,60);
plot(x1,vector1,'o-r',x1,vector2,'+-k','LineWidth',lw,'MarkerSize',msz);
xlabel('Time (s)');
ylabel('sensor readings')
In order to match the number of values, you have to modify the input to linspace:
x1 = linspace(0,60,length(vector1));
This way you automatically get the right number of entries for your x-axis vector.
You basically tell linspace to create a vector that ranges from 0 to 60 with length(vector1) entries, so that it matches the length of your data set.
Note that if your second data set has a different number of entries than your first, you will need to create a separate x-axis vector that matches its number of values, as in the sketch below.
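Putting it together, a short sketch (assuming vector1 and vector2 are the two reading vectors from the question and that lw and msz are already defined as in the original script):
x1 = linspace(0,60,length(vector1)); % one time stamp per reading in vector1
x2 = linspace(0,60,length(vector2)); % separate axis in case vector2 has a different length
plot(x1,vector1,'o-r',x2,vector2,'+-k','LineWidth',lw,'MarkerSize',msz);
xlabel('Time (s)');
ylabel('Sensor readings');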

How to quickly/easily merge and average data in matrix in MATLAB?

I have got a matrix of AirFuelRatio (AFR) values at certain engine speeds and throttle positions (e.g. the AFR is 14 at 2500 rpm and 60% throttle).
The matrix is 25x10: the engine speed ranges from 1200 to 6000 rpm in steps of 200 rpm, and the throttle ranges from 0.1 to 1 in steps of 0.1.
Say I have measured new values, e.g. an AFR of 13.5 at 2138 rpm and 74.3% throttle. How do I merge that into the matrix? The closest matrix values are 2000 or 2200 rpm and 70 or 80% throttle. Also, I don't want the new data to simply replace the older data. How can I make the matrix take this value in and adjust its values to account for the new measurement?
Simplified, I have the following x-axis values (top row) and 1x4 matrix (below):
2 4 6 8
14 16 18 20
I just measured an AFR value of 15.5 at 3 rpm. If you interpolated the AFR matrix you would have gotten 15, so this value is out of the ordinary.
I want the matrix to take this data in and adjust the other values to it, i.e. average everything so that the more data I put in, the more reliable and accurate the matrix becomes. So in the simplified case the matrix would become something like:
2 4 6 8
14.3 16.3 18.2 20.1
So it averages between old and new data. I've read the documentation about concatenation, but I believe my problem can't be solved with that function.
EDIT: To clarify my question, here is a visual clarification.
The 'matrix' keeps the same size of 5 points while a new data point is added. It takes the new data into account and adjusts the matrix accordingly. This is what I'm trying to achieve. The more scattered data I get, the more accurate the matrix becomes (and yes, the green dot in this case would be an outlier, but it illustrates my case).
Cheers
This is not a matter of simple merge/average. I don't think there's a quick method to do this unless you have simplifying assumptions. What you want is a statistical inference of the underlying trend. I suggest using Gaussian process regression to solve this problem. There's a great MATLAB toolbox by Rasmussen and Williams called GPML. http://www.gaussianprocess.org/gpml/
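If the GPML toolbox is not available, a rough sketch of the same idea with MATLAB's built-in fitrgp (Statistics and Machine Learning Toolbox) is below; the measurement values are made up purely for illustration:
% Hypothetical scattered measurements: [rpm, throttle] inputs and AFR outputs
X = [2500 0.60; 2200 0.70; 2138 0.743; 4000 0.50; 3000 0.30];
y = [14.0; 13.8; 13.5; 13.2; 14.6];
mdl = fitrgp(X, y); % fit a Gaussian process regression model to all measurements
% Evaluate the fitted surface on the 25x10 rpm/throttle grid from the question
[rpmGrid, thrGrid] = ndgrid(1200:200:6000, 0.1:0.1:1);
afrGrid = reshape(predict(mdl, [rpmGrid(:) thrGrid(:)]), size(rpmGrid));
Adding new measurements then just means appending rows to X and y and refitting, which gives the "more data makes the table more accurate" behaviour asked for.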
This sounds more like a data-fitting task to me. What you are describing is a set of measurements for which you wish to get the best linear fit. Instead of maintaining a table of merged data, you keep a table of all the recorded values and then find the best fit to those values. So, for example, I could create a matrix, A, which holds all of the recorded values. Let's start with:
A=[2,14;3,15.5;4,16;6,18;8,20];
I now need a matrix of inputs for my fitting curve (which, in this instance, let's assume is linear, so the inputs are the set of values 1 and x):
B=[ones(size(A,1),1), A(:,1)];
We can find the linear fit parameters (where it cuts the y-axis and the gradient) using:
B\A(:,2)
Or, if you want the points that the line goes through for the values of x:
B*(B\A(:,2))
This results in the points:
(2, 14.1897), (3, 15.1552), (4, 16.1207), (6, 18.0517), (8, 19.9828)
which represents the best fit line through these points.
You can manually extend this to polynomial fitting if you want, or you can use the Matlab function polyfit. To manually extend the process you should use a revised B matrix. You can also produce only a specified set of points in the last line. The complete code would then be:
% Original measurements - could be read in from a file,
% but for this example we will set it to a matrix
% Note that not all tabulated values need to be present
A=[2,14; 3,15.5; 4,16; 5,17; 8,20];
% Now create the polynomial values of x corresponding to
% the data points. Choosing a second order polynomial...
B=[ones(size(A,1),1), A(:,1), A(:,1).^2];
% Find the polynomial coefficients for the best fit curve
coeffs=B\A(:,2);
% Now generate a table of values at specific points
% First define the x-values
tabinds = 2:2:8;
% Then generate the polynomial values of x
tabpolys=[ones(length(tabinds),1), tabinds', (tabinds').^2];
% Finally, multiply by the coefficients found
curve_table = [tabinds', tabpolys*coeffs];
% and display the results
disp(curve_table);
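For comparison, here is a sketch of the same second-order fit done with polyfit/polyval, reusing the A and tabinds defined above:
% polyfit returns the coefficients highest order first,
% unlike the [1 x x^2] ordering used with B\ above
p = polyfit(A(:,1), A(:,2), 2);
curve_table2 = [tabinds', polyval(p, tabinds')];
disp(curve_table2);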

How to average values in plot, to make a plot with fewer values

I have a script that plots wind speed in m/s (measured every second) against time in minutes over a period of 24 hours. I want to make a new plot that, instead of plotting the wind speed every second, averages the wind speed over periods of 10 minutes and plots these averages against time.
Here is a sample image of my data:
Any ideas of how I can do this?
You can use a Moving Average filter using the smooth function as suggested by m.s. in a comment. This is fairly simple:
y = smooth(x,span);
This uses a symmetric smoothing filter, so the span (i.e. the number of samples used for smoothing) must be odd: the current sample plus n samples before and n samples after it. That way you still have one sample for every second; they are just smoothed to damp noise and measurement errors.
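For example, a roughly 10-minute window on once-per-second data is about 600 samples, rounded to the nearest odd span:
y = smooth(x, 601); % symmetric moving average over roughly 10 minutes of 1 Hz readings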
If you want to reduce the number of points, such that only one point exists for every 10 minutes, you can do the following: take the first 10 min * 60 samples/min = 600 samples of the vector and put them in the first column of a new matrix, then take the next 600 samples and put them in the second column, and so on. Now you can take the column-wise mean of the matrix. That way you get a new vector where every element is the mean of 600 samples.
In MATLAB this is easily possible:
X = reshape(x,600,[]); % create matrix with 600 elements per column
y = mean(X,1); % take column-wise mean
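To plot these block means against time in minutes, a small sketch (assuming 24 hours of once-per-second data, i.e. 86400 samples, which splits evenly into 144 columns of 600):
t = 5:10:24*60; % centre of each 10-minute window, in minutes
plot(t, y, 'o-');
xlabel('Time (min)');
ylabel('Mean wind speed (m/s)');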

How to resample with interp1 in Matlab when input vectors are of different length

I have two variables in a .mat file here:
https://www.yousendit.com/download/UW13UGhVQXA4NVVQWWNUQw
testz is a vector of cumulative distance (in meters, monotonically and regularly increasing).
testSDT is a vector of integrated (cumulative) sound-wave travel time (in milliseconds), generated using the distance vector and a vector of velocities
(there is an intermediate step of creating interval travel times).
Since velocity is a continuously varying function, the resulting interval travel times, and also the integrated travel times, are non-integer and variable in magnitude.
What I want is to resample the distance vector at regular time intervals (e.g. 1 ms, 2 ms, ..., n ms).
What makes it difficult is that the maximum travel time, 994.6659 ms, is less than the number of samples in the two vectors, so it is not straightforward to use interp1.
i.e.:
X=testSDT -> 1680 samples
Y=testz -> 1680 samples
XI=[1:1:994] -> 994 samples
This is the code I've come up with. It is working code and I think it is not too bad.
%% Initial chores
M=fix(max(testSDT));
L=(1:1:M);
%% Create indices
% this loops finds the samples in the integrated travel time vector
% that are closest to integer milliseconds and their sample number
for i=1:M
[cl(i), ind(i)] = min(abs(testSDT - L(i)));
nearest(i) = testSDT(ind(i));
end
%% Remove duplicates
% this is necessary to remove duplicates in the index vector (happens in this test).
% For example: 2.5 ms would be the closest sample to both 2 ms and 3 ms.
[clsst,ia,ic] = unique(nearest);
idx=(ind(ia));
%% Interpolation
% this uses the index vectors to resample the depth vectors at
% integer times
newz=interp1(clsst,testz(idx),[1:1:length(idx)],'cubic')';
As far as I can see there is one issue with this code:
I rely on the vector idx as my XI for interpolation. Vector idx is 1 sample shorter than vector ind (one duplicate was removed).
Therefore my new times will stop one millisecond short. This is a very small issue, and duplicates are unlikely, but I am wondering if anybody can think of a workaround, or of a different way to approach the problem altogether.
Thank you
If I understand you correctly, you want to extrapolate to that extra point.
You can do this in many ways; one is to add that extra point to the interp1 line.
If you have some function you expect your data to follow, you can fit it to the data and then obtain that extra point from the fit, or use a tool like fnxtr.
But I have a problem understanding what you want because of the way you used the line. The third argument you use, [1:1:length(idx)], is just the series [1 2 3 ...]. Usually, when interpolating, one uses some vector x_i of points of interest, and I doubt your points of interest happen to be the series of integers 1:length(idx). What you want is just [1:length(idx) xi], where xi is the x-axis value of that extra point.
EDIT:
Instead of the loop, just produce matrix forms of L and testSDT; the matrix operation is somewhat faster at doing the min(abs(...)):
MM = ones(numel(testSDT),1)*L;  % replicate the integer-millisecond targets L across rows
TT = testSDT*ones(1,numel(L));  % replicate the travel-time column vector across columns
[cl, ind] = min(abs(TT-MM));    % column-wise minimum: closest sample to each integer ms
nearest = testSDT(ind);
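As an alternative to the nearest-sample search altogether, and assuming testSDT is strictly increasing (it should be, being a cumulative travel time), a sketch of a more direct route is to pass testSDT itself to interp1 as the x-grid:
tq = 1:floor(max(testSDT)); % query times at every integer millisecond (1..994 here)
newz_direct = interp1(testSDT, testz, tq, 'pchip'); % distance resampled at regular 1 ms steps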