I have a large data set of offline and online data. The offline data is only taken every two hours, so I wish to interpolate across the gaps. Where the data is missing it is replaced with -9.999, and I wish to interpolate in order to estimate these values. My idea is to find the missing values in the set and compare them against the time intervals, but I cannot get it to work.
This is what I have so far:
iv = 33; % column which holds cell weight
ind = find (Data(:,iv)<0); % find the indices of missing values
Interp_iv = interp1 (Data(ind,2),Data(ind,2),Data(:,2),'spline')
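Your x and v arguments are the same. Note also that ind holds the rows with missing values, so the interpolation should be built from the valid rows instead. Try this:
valid = Data(:,iv) >= 0; % rows that hold real measurements
Interp_iv = interp1 (Data(valid,2),Data(valid,iv),Data(:,2),'spline')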
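A minimal sketch of the same idea on made-up data, in case it helps to see it end to end. The time vector, the weights, and the variable names are assumptions for illustration only. The key point is that the interpolant is built from the rows that hold real measurements, then evaluated at every time point. 'linear' is used here; 'spline' or 'pchip' can be swapped in, keeping in mind that 'spline' can overshoot between sparse samples.

```matlab
% Made-up example: a sample every 0.5 h, real measurements only every 2 h
t = (0:0.5:6)';                      % time stamps (hours)
w = -9.999 * ones(size(t));          % -9.999 marks a missing value
known = 1:4:numel(t);                % rows 1, 5, 9, 13 (t = 0, 2, 4, 6)
w(known) = [1.0; 1.4; 2.1; 3.0];     % hypothetical cell weights

valid = w >= 0;                      % logical mask of real measurements
wInterp = interp1(t(valid), w(valid), t, 'linear');

disp([t, w, wInterp])                % compare raw and filled-in columns
```

Using a logical mask rather than find keeps the valid and missing rows easy to tell apart (valid and ~valid).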
I have a temporal dataset (1000000x70) containing information about the activities of 20 subjects. I need to subsample the dataset as it has more than a million rows. How should I best select a set of observations for each subject? Later, I need to apply PCA and k-means to it. Kindly help me with the steps to follow. I'm working in MATLAB.
I'm not really clear on what you're looking for. If you just want to subsample a matrix in MATLAB, here is a way to do it:
myData; % 70 x 1000000 data
nbDataPts = size(myData, 2); % Get the number of points in the data
subsampleRatio = 0.1; % Ratio of data you want to keep
nbSamples = round(subsampleRatio * nbDataPts); % How many points to keep
sampleIdx = round(linspace(1, nbDataPts, nbSamples)); % Evenly space indices of the points to keep
sampledData = myData(:, sampleIdx); % Sampling data
Then, if you want to apply PCA and k-means, I suggest you take a look at the relevant documentation for the pca and kmeans functions.
Try to work with it, and open a new question if a specific problem arises.
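If the goal is to keep observations from every subject, one possible sketch is to subsample within each subject separately. This assumes a layout the question does not confirm: rows are observations and one column (here column 1, a hypothetical choice) holds the subject ID. The random test data and the names subjCol and keepEvery are made up for illustration.

```matlab
% Hypothetical layout: rows are observations, column 1 is the subject ID
nRows = 1000;                                  % small stand-in for 1000000
myData = [randi(20, nRows, 1), rand(nRows, 69)];
subjCol = 1;                                   % assumed subject ID column
keepEvery = 10;                                % keep every 10th row per subject

subjects = unique(myData(:, subjCol));
keepIdx = [];
for s = subjects'                              % loop over subject IDs
    rows = find(myData(:, subjCol) == s);      % rows of this subject, in time order
    keepIdx = [keepIdx; rows(1:keepEvery:end)]; %#ok<AGROW>
end
keepIdx = sort(keepIdx);                       % restore the original temporal order
sampledData = myData(keepIdx, :);
```

Every subject keeps at least its first row, so no subject disappears from the subsample.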
In my plot the x-axis is in datetime format and y holds the corresponding observations. There are a few clusters of anomalies which I can recognise visually. I tried to select the anomalies using the 'Brush/Select data' tool in the figure, but when I copied the data to the clipboard and pasted it into Notepad, the data was not in datetime format and I could not interpret it.
I would like to select the data from the plot and remove the corresponding indices from the dataset. Here is a sample of the data I copied from the brush tool.
56.5868518518519 463.32834344035
56.6596759259259 463.337
56.6603240740741 463.335
56.6608217592593 463.326
Thanks
Have you looked at rmoutliers(A)? If all you need is to remove the outliers this function will do exactly that.
If you for whatever reason cannot use the function, you can use this:
% Compute the mean of the data
meanValue = mean(vector);
% Compute the absolute deviations from the mean. It will be a vector.
absoluteDeviation = abs(vector - meanValue);
% Compute the median of the absolute deviations
% (named madValue so it does not shadow the built-in mad function)
madValue = median(absoluteDeviation);
% Find outliers. They're outliers if the absolute deviation
% is more than some factor times the madValue.
sensitivityFactor = 6; % Whatever you want.
thresholdValue = sensitivityFactor * madValue;
outlierIndexes = absoluteDeviation > thresholdValue;
% Extract outlier values:
outliers = vector(outlierIndexes)
% Extract non-outlier values:
nonOutliers = vector(~outlierIndexes)
Credit goes to this guy, but it's a very simple approach and should do exactly what you need.
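Since the question is about removing the matching time stamps as well, here is a small sketch that applies one logical mask to both the datetime vector and the observations. The data, the factor of 6, and the variable names are made up for illustration, and the deviations are measured from the median rather than the mean, which is a swapped-in variant of the approach described here.

```matlab
% Made-up data: hourly time stamps with two obvious anomalies
t = datetime(2020,1,1) + hours(0:9)';
y = [463.3; 463.4; 463.3; 480.0; 463.2; 463.3; 463.4; 463.3; 440.0; 463.3];

dev = abs(y - median(y));          % absolute deviation from the median
madValue = median(dev);            % median absolute deviation
isOut = dev > 6 * madValue;        % logical mask, true for outliers

tClean = t(~isOut);                % the same mask removes the time stamps...
yClean = y(~isOut);                % ...and the observations
plot(tClean, yClean, '.')
```

One caveat: if more than half of the values are identical, madValue becomes zero and every other point is flagged, so it is worth checking the mask before deleting anything.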
The following is a problem from an assignment that I am trying to solve:
Visualization of similarity matrix. Represent every sample with a four-dimension vector (sepal length, sepal width, petal length, petal width). For every two samples, compute their pair-wise similarity. You may do so using the Euclidean distance or other metrics. This leads to a similarity matrix where the element (i,j) stores the similarity between samples i and j. Please sort all samples so that samples from the same category appear together. Visualize the matrix using the function imagesc() or any other function.
Here is the code I have written so far:
load('iris.mat'); % create a table of the data
iris.Properties.VariableNames = {'Sepal_Length' 'Sepal_Width' 'Petal_Length' 'Petal_Width' 'Class'}; % change the variable names to their actual meaning
iris_copy = iris(1:150,{'Sepal_Length' 'Sepal_Width' 'Petal_Length' 'Petal_Width'}); % make a copy of the (numerical) features of the table
iris_distance = table2array(iris_copy); % convert the table to an array
% pairwise similarity
D = pdist(iris_distance); % calculate the Euclidean distance and store the result in D
W = squareform(D); % convert to squareform
figure()
imagesc(W); % visualize the matrix
Now, I think I've got the coding mostly right to answer the question. My issue is how to sort all the samples so that samples from the same category appear together because I got rid of the names when I created the copy. Is it already sorted by converting to squareform? Other suggestions? Thank you!
It should be in the same order as the original data. While you could sort it afterwards, the easiest solution is to actually sort your data by class after line 2 and before line 3.
load('iris.mat'); % create a table of the data
iris.Properties.VariableNames = {'Sepal_Length' 'Sepal_Width' 'Petal_Length' 'Petal_Width' 'Class'}; % change the variable names to their actual meaning
% Sort the table here on the "Class" attribute. Don't forget to change the table name
% in the next line too if you need to.
iris_copy = iris(1:150,{'Sepal_Length' 'Sepal_Width' 'Petal_Length' 'Petal_Width'}); % make a copy of the (numerical) features of the table
Consider using sortrows:
tblB = sortrows(tblA,'RowNames') sorts a table based on its row names. Row names of a table label the rows along the first dimension of the table. If tblA does not have row names, that is, if tblA.Properties.RowNames is empty, then sortrows returns tblA.
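For this question the table would be sorted on a variable rather than on row names. A minimal sketch with a made-up four-row table follows; the values are illustrative only, and pdist and squareform need the Statistics and Machine Learning Toolbox, as in the question's code.

```matlab
% Made-up miniature of the iris table (values are illustrative only)
iris = table([5.1; 7.0; 6.3; 4.9], [3.5; 3.2; 3.3; 3.0], ...
             [1.4; 4.7; 6.0; 1.4], [0.2; 1.4; 2.5; 0.2], ...
             {'setosa'; 'versicolor'; 'virginica'; 'setosa'}, ...
             'VariableNames', {'Sepal_Length' 'Sepal_Width' ...
                               'Petal_Length' 'Petal_Width' 'Class'});

iris_sorted = sortrows(iris, 'Class');   % group samples of the same class
X = iris_sorted{:, 1:4};                 % numeric features as a matrix
W = squareform(pdist(X));                % pairwise Euclidean distances

figure()
imagesc(W);                              % same-class samples form blocks
colorbar
```

After sorting, samples of the same class sit next to each other, so the low-distance blocks appear along the diagonal of the image.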
I'm fairly new to Matlab so any help would be appreciated.
I'm trying to write a function using simple logic operators to create a number of 2D scatter graphs. The problem I've been having is that I cannot work out how to use an input from the user (the number of figures) to actually create that number of figures.
*edit (Just for the sake of clarity I'm plotting multiple sets of data ie columns on each figure but the important bit is that there will be multiple figures as the user specifies how many figures they want, this is the bit I cannot understand. I understand how to use hold on to plot more than one graph on each figure but how do I vary the number of figures depending on the input of the user?)
The user inputs are a matrix with dimensions 4000x30 (this will remain constant for my use) and the number of figures (this will change from 1-30) to plot from this data set. Each column represents a different sensor so the columns represent 1 set of data each.
The simpler the answer the better as I'm not a very experienced coder.
Thanks
GibGib
See if this works for you:
Data = rand(40,30); %// Just a small data set for testing.
%// Ask user how many figures are desired
prompt = {'Enter desired number of figures:'};
dlg_title = 'Input';
num_lines = 1;
def = {'5'};
NumFigures = inputdlg(prompt,dlg_title,num_lines,def);
%// Get # of figures. If the entry is not valid (i.e. remainder of division 30/entry is not 0), ask again.
while rem(size(Data,2),str2double(NumFigures{1})) ~= 0
NumFigures = inputdlg(prompt,dlg_title,num_lines,def);
end
NumFigures = str2double(NumFigures{1}); %// Convert to number
ColPerFig = size(Data,2)/NumFigures; %// Number of columns to plot per figure
ColStart = 1:ColPerFig:size(Data,2) %// Indices of the starting columns to plot
ColStart looks like this:
ColStart =
1 7 13 19 25
This makes it easier in the loop to index into Data and fetch the appropriate columns.
%// Plot
for k = 1:NumFigures;
hFig(k) = figure;
plot(Data(:,ColStart(k):ColStart(k)+ColPerFig-1));
end
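The code above only accepts a number of figures that divides the number of columns evenly. If any number from 1 to 30 should be allowed, one possible sketch is to split the columns into groups of nearly equal size. The value of NumFigures below is a hypothetical stand-in for the user's input, and the variable edges is an assumed name.

```matlab
Data = rand(40,30);          % small data set for testing
NumFigures = 4;              % hypothetical user input, need not divide 30

nCols = size(Data,2);
% Group boundaries: columns edges(k)+1 to edges(k+1) go on figure k
edges = round(linspace(0, nCols, NumFigures+1));

for k = 1:NumFigures
    cols = edges(k)+1 : edges(k+1);
    figure;
    plot(Data(:,cols), '.');                 % one marker series per column
    title(sprintf('Columns %d to %d', cols(1), cols(end)));
end
```

As long as NumFigures is no larger than the number of columns, every figure gets at least one column and every column is plotted exactly once.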
OK, it seems that you have a data matrix M, the user supplies a number U, and you want to plot the first U columns of M as 2D plots?
In that case, will this do?
figure;
hold on %is optional depending how you want your plot
for i = 1:U
plot(M(:,i))
end
If this is not what you are looking for, please specify your question further.
I am trying to make a histogram in MATLAB. My data set is huge (3.5 million elements); the x and y data are the same size (both 3.5 million).
My original data is a 200x200x88 3D matrix, which I reshaped into one column.
the code for this:
[dose , size] = Dose('C:\R1')
s = size(1)*size(2)*size(3)
t = reshape(dose, s, [])
When I try the command hist(t), I get only one bar.
My workspace is as the following:
dose <200x200x88 double>
s 3520000
size [200,200,88]
t <3520000x1 double>
Could you tell me how to make a histogram with this data?
I'm able to generate a vector of size 3520000x1 and build a histogram with it.
val=rand(3520000,1);
hist(val)
It's possible your data has a few singular outliers causing your bins to look something like (1,0,0,...,3519999).
If you save your histogram bins like h=hist(data); you can see what happened.
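To illustrate the outlier effect, here is a small sketch with made-up data: 1000 values between 0 and 1 plus a single extreme value. With the default 10 equally spaced bins, almost everything lands in the first bin, so the plot looks like a single bar.

```matlab
val = [rand(1000,1); 1e6];   % made-up data with one extreme outlier
h = hist(val);               % 10 bins spanning min(val) to max(val)
disp(h)                      % counts per bin

% Plotting only the bulk of the data shows the real distribution
figure;
hist(val(val < 10));
```

Inspecting h like this is a quick way to tell whether the data or the binning is responsible for the single bar.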
In order to get a single long vector from your 3D array you can use just the (:) operator. Try the following code:
num_of_bins = 100 ; %change to whatever # you want
hist(dose(:),linspace(min(dose(:)),max(dose(:)),num_of_bins));
The hist will take only the relevant limits of dose (min to max) and you can control the # of bins at will. I've used linspace to create a linearly spaced bin vector, but this can be modified also to a different set of bins by assigning a different range vector.