How to compute a histogram using three variables in MATLAB?

I have three variables, e.g., latitude, longitude and temperature. For each latitude and longitude, I have a corresponding temperature value. I want to plot latitude vs. longitude on a 5 degree x 5 degree grid, with the mean temperature of each grid cell shown instead of the occurrence frequency.
Data:
[latGrid,lonGrid] = meshgrid(25:45,125:145);
T = table(latGrid(:),lonGrid(:),randi([0,35],size(latGrid(:))),...
'VariableNames',{'lat','lon','temp'});
In the end, I need something like the following image:

Sounds to me like you want to scale your grid. The easiest way to do this is to smooth and downsample.
While 2D histograms also bin values into a grid, a histogram is not the way to find the mean of data points in a smooth grid. A histogram counts the occurrences of values in a set of ranges. In a 1D example, a histogram would take the input measurements [1, 3, 3, 5] and count the number of ones, the number of threes, and so on. A 2D histogram counts occurrences of pairs of numbers. (You might want to use a histogram to help organize measurements taken at irregular intervals, but that would be a different question.)
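For the irregular case mentioned in the parenthesis, for example scattered lat/lon/temp triples like the table in the question, you can bin the points directly instead of smoothing. The following is a minimal sketch, not part of the original answer: it assigns each point to a 5 x 5 degree cell with floor and averages each cell with accumarray. The random temperatures, the cell size binSize, and the choice to fold the upper 45/145 degree edge into the last cell are illustrative assumptions.

```matlab
% Sample data laid out like the question's table (random temperatures)
[latGrid, lonGrid] = meshgrid(25:45, 125:145);
lat  = latGrid(:);
lon  = lonGrid(:);
temp = randi([0, 35], numel(lat), 1);

binSize = 5;                          % cell size in degrees (assumption)
nLat = floor((45 - 25) / binSize);    % 4 latitude cells
nLon = floor((145 - 125) / binSize);  % 4 longitude cells

% Cell index of every point; min() folds the upper edge (45, 145) into the last cell
iLat = min(floor((lat - 25) / binSize) + 1, nLat);
iLon = min(floor((lon - 125) / binSize) + 1, nLon);

% Mean temperature per cell (NaN where a cell is empty) and point count per cell
meanTemp = accumarray([iLat, iLon], temp, [nLat, nLon], @mean, NaN);
counts   = accumarray([iLat, iLon], 1, [nLat, nLon]);

% Cell centers, handy for labeling a plot
latCenters = 25 + binSize * ((1:nLat) - 0.5);
lonCenters = 125 + binSize * ((1:nLon) - 0.5);
```

Rows of meanTemp follow latitude and columns follow longitude, so something like imagesc(lonCenters, latCenters, meanTemp); axis xy; colorbar should display the cell means as a map.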
How to smooth and downsample without the Image Processing Toolbox
Keep your data in the 2d matrix format rather than reshaping it into a table. This makes it easier to find the neighbors of each grid location.
%% Sample Data
[latGrid,lonGrid] = meshgrid(25:45,125:145);
temp = rand(size(latGrid));
There are many tools in Matlab for smoothing matrices. If you want the mean over a 5x5 window, you can write a for-loop, use a convolution, or use filter2. My example uses convolution. For more on convolutional filters, I suggest the Wikipedia page.
%% Mean filter with conv2
M = ones(5) ./ 25; % 5x5 mean or box blur filter
C_temp = conv2(temp, M, 'valid');
C_temp is a blurry version of the original temperature variable. It is slightly smaller because a full 5x5 mean can't be taken at the edges: the 'valid' option trims a border of 2 measurements on each side. Now we just need to take every fifth measurement from C_temp to scale down the grid.
%% Subsample result
C_temp = C_temp(1:5:end, 1:5:end);
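% conv2 'valid' trimmed a border of 2 from C_temp, so subsampled window k is
% centered on original row/column 3 + 5*(k-1). Pick the matching lat/lon values
% so latGrid and lonGrid end up the same size as C_temp.
[h, w] = size(latGrid);
latGrid = latGrid(3:5:h-2, 3:5:w-2);
lonGrid = lonGrid(3:5:h-2, 3:5:w-2);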
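The three options mentioned earlier (a for-loop, a convolution, or filter2) should give the same non-overlapping 5x5 block means once the filtered result is subsampled. The sketch below checks that on random data; the names C_conv, C_filt and C_loop are just for illustration.

```matlab
temp = rand(21, 21);                       % same size as the 25:45 x 125:145 grid

% Option 1: convolution, then keep every 5th window (as in the answer)
C_conv = conv2(temp, ones(5) / 25, 'valid');
C_conv = C_conv(1:5:end, 1:5:end);         % 4 x 4

% Option 2: filter2 computes a correlation; identical here because the kernel is symmetric
C_filt = filter2(ones(5) / 25, temp, 'valid');
C_filt = C_filt(1:5:end, 1:5:end);

% Option 3: explicit loop over non-overlapping 5x5 blocks
nBlk = floor(size(temp) / 5);              % [4 4]; the 21st row/column is dropped
C_loop = zeros(nBlk);
for i = 1:nBlk(1)
    for j = 1:nBlk(2)
        rows = (i - 1) * 5 + (1:5);
        cols = (j - 1) * 5 + (1:5);
        block = temp(rows, cols);
        C_loop(i, j) = mean(block(:));
    end
end
```

The loop is the easiest to read, while conv2 and filter2 are vectorized; all three agree up to floating-point round-off.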
Here's what the steps look like
If you use a slightly more organized temp variable, it's easier to see that the result is correct.
With Image Processing Toolbox
imresize has a box filter method option that is equivalent to a mean filter. However, you have to do a little calculation to find the scaling factor that is equivalent to using a 5x5 window.
scale = 1/5; % shrink by a factor of 5, so each output pixel averages roughly a 5x5 block
C_temp = imresize(temp, scale, 'box');


How do I interpret the SOM weight positions plot?

I am a newbie to SOMs and am using the Matlab SOM package to examine sea level pressure over time. My 2D input array is (row x column): pressure (function of latitude and longitude) x time. When training is complete, and I plot the SOM weight positions, I get the following:
Is this correct? All of the weight position plots that I see are not so 1:1, so my plot appears strange.
Here is my code (note: code will not execute, just for conceptual purposes)
slp = somedata; % placeholder for the real data; dim: 30200 x 1550 [pressure x time]
% Calculate mean for each location over time
mean_slp = nanmean(slp,2);
% Calculate anomalies for each location over time
slp_anom = nan(size(slp));
for i = 1:size(slp,2)
slp_anom(:,i) = slp(:,i) - mean_slp; % subtract each location's own mean
end
% Normalize data
[slp_anom2,PS] = mapminmax(slp_anom);
net = selforgmap([4 4]);
net.trainParam.epochs = 1000;
net = train(net,slp_anom2);
I appreciate any and all feedback. Thanks!
What is a Weight Positions plot in the context of the SOM algorithm's training?
How SOMs are trained
The SOM algorithm essentially computes a set of prototype (codebook) vectors of the same dimension as the input data. It does so by initializing a number of neurons inside the input space according to some rule (random, PCA, etc.) and then shifting their positions inside the input space so as to minimize a distance metric, under the constraint of a neighborhood function that determines how much each data point influences the neurons around its best-matching neuron at each iteration.
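To make the description above concrete, here is a minimal online SOM sketch in plain MATLAB on toy 2-D data. It is not the Neural Network Toolbox implementation (selforgmap/train may use a different update rule and schedules); the 4x4 grid mirrors selforgmap([4 4]), while the learning-rate and neighborhood schedules (lr0, sigma0) are illustrative assumptions.

```matlab
X = rand(2, 500);                 % 500 toy 2-D input vectors, one sample per column
gridRows = 4; gridCols = 4;       % 4x4 map, like selforgmap([4 4])
nNeurons = gridRows * gridCols;
[gc, gr] = meshgrid(1:gridCols, 1:gridRows);
gridPos = [gr(:), gc(:)];         % each neuron's coordinates on the map lattice

% Initialize each neuron at a randomly chosen data point
W = X(:, randi(size(X, 2), 1, nNeurons));   % 2 x 16 weight (prototype) vectors

nIter = 2000;
lr0 = 0.5; sigma0 = 2;            % hypothetical starting values
for t = 1:nIter
    frac  = (t - 1) / nIter;
    lr    = lr0 * (1 - frac);               % learning rate decays linearly
    sigma = sigma0 * (1 - frac) + 0.5;      % neighborhood radius shrinks
    x = X(:, randi(size(X, 2)));            % pick one random sample
    % Best-matching unit: neuron whose weight vector is closest to x
    d2 = sum(bsxfun(@minus, W, x) .^ 2, 1);
    [~, bmu] = min(d2);
    % Gaussian neighborhood measured on the map lattice, not in input space
    gd2 = sum(bsxfun(@minus, gridPos, gridPos(bmu, :)) .^ 2, 2);
    h = exp(-gd2' / (2 * sigma ^ 2));       % 1 x nNeurons, values in (0, 1]
    % Move every neuron toward x, scaled by learning rate and neighborhood
    W = W + lr * bsxfun(@times, h, bsxfun(@minus, x, W));
end
```

Each update is a convex combination of the old weight and the sample, so the neurons stay inside the data's range while the lattice unfolds over it; plotting W(1,:) against W(2,:) gives a hand-made version of the weight positions picture.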
Weight Position Plot
The Weight Positions Plot is a 3D plot (!) so you need to use the rotate 3D tool to be able to make sense of the map.
What you then see, depending on dimensionality, is a collection of pale-blue dots and red lines. The pale-blue dots are the neuron positions (which have been shifted around the input space during training), projected onto the two dimensions selected for the plot, and the red lines connect neighboring neurons.
So the plot looks different depending on which weight dimensions (a.k.a. input columns) you choose to compute it. Matlab typically chooses the first two input columns.
What about the sea level pressure?
I cannot help here, as your dataset appears to be extremely wide for a simple pressure/time vector. Are the columns representing different measurement points on the globe? If so, you should ask yourself what would be gained by having a SOM model of that. What would you do when you got a new vector from a new timestamp? What would you like to do with it? What additional information would you gain?

How to make smooth plot with matrix that don't have the same column and line [duplicate]

Let's say we have the following data:
A1= [41.3251
18.2350
9.9891
36.1722
50.8702
32.1519
44.6284
60.0892
58.1297
34.7482
34.6447
6.7361
1.2960
1.9778
2.0422];
A2=[86.3924
86.4882
86.1717
85.8506
85.8634
86.1267
86.4304
86.6406
86.5022
86.1384
86.5500
86.2765
86.7044
86.8075
86.9007];
When I plot the above data using plot(A1,A2), I get this graph:
Is there any way to make the graph look smooth like a cubic plot?
Yes, you can. You can interpolate between the key points, but it requires a bit of trickery. Blindly interpolating A2 as a function of A1 with MATLAB's interpolation commands won't work, because they treat the independent variable (the x-axis in your case) as an ordered, single-valued axis, and your A1 values go back and forth. You can't do this with your data as it is, at least not out of the box. Instead, create a dummy list of values that runs from 1 up to the number of elements in A1 (or A2, as both are the same size) and use it as the independent axis. Then interpolate each array independently against this dummy list, evaluated on a finer grid. The fineness of that grid is controlled by the total number of new points you want in the plot: the points span the same range as the dummy list, and the spacing between them shrinks as you add more points. As a general rule, the more points you add, the smaller the spacing and the smoother the plot. Once you've done that, plot the interpolated values against each other.
Here's some code for you to run. We will be using interp1 to do most of the work of the interpolation. The function linspace creates the finer grid of points in the dummy list to facilitate the interpolation. N is the total number of points you want to plot. I've set it to 500 for now, meaning that 500 points will be interpolated from your original data. Experiment by increasing (or decreasing) the total number of points and see what effect this has on the smoothness of your data.
I'll also be using the Piecewise Cubic Hermite Interpolating Polynomial, or pchip, as the interpolation method. It is a shape-preserving piecewise cubic: unlike a true cubic spline (the 'spline' method), it does not overshoot between data points, at the cost of having only a continuous first derivative. Assuming that A1 and A2 are already created:
%// Specify number of interpolating points
N = 500;
%// Specify dummy list of points
D = 1 : numel(A1);
%// Generate finer grid of points
NN = linspace(1, numel(A1), N);
%// Interpolate each set of points independently
A1interp = interp1(D, A1, NN, 'pchip');
A2interp = interp1(D, A2, NN, 'pchip');
%// Plot the data
plot(A1interp, A2interp);
I now get the following:
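A follow-up sketch, not from the original answer: the same dummy-parameter idea works with interp1's 'spline' method, which gives a true cubic spline. The data below is just the first five points of A1/A2 from the question, for brevity. The variable names A1p/A1s etc. are illustrative.

```matlab
% First five points of A1/A2 from the question (a subset, for brevity)
A1 = [41.3251; 18.2350; 9.9891; 36.1722; 50.8702];
A2 = [86.3924; 86.4882; 86.1717; 85.8506; 85.8634];

N  = 200;                                % number of interpolated points
D  = (1:numel(A1))';                     % dummy parameter: the point index
NN = linspace(1, numel(A1), N)';         % finer parameter grid

% Shape-preserving piecewise cubic: no overshoot between data points
A1p = interp1(D, A1, NN, 'pchip');
A2p = interp1(D, A2, NN, 'pchip');

% Cubic spline: smoother (continuous second derivative) but may overshoot
A1s = interp1(D, A1, NN, 'spline');
A2s = interp1(D, A2, NN, 'spline');
```

plot(A1p, A2p, A1s, A2s, '--', A1, A2, 'o') overlays both curves on the original points; any bulge past the data points comes from the spline, since pchip cannot overshoot within an interval.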

How to compute distance and estimate quality of heterogeneous grids in Matlab?

I want to evaluate grid quality when, as in the real case, all coordinates differ.
The signal is an ECG signal, where the average lifetime is 75 years.
My task is to evaluate its age at the moment of measurement, which is an inverse problem.
I think a 2D approximation of the 3D case is hard (done here by Abo-Zahhad) with 3 leads (2 on the chest and one on the left leg; MIT-BIH arrhythmia database):
where f is a piecewise continuous function in R^2, \epsilon is the error matrix and A is a 2D matrix.
Now, I evaluate the average grid distance in x-axis (time) and average grid distance in y-axis (energy).
I think this can be done with Matlab's Image Processing Toolbox.
However, I am not sure how complete the toolbox's approaches are.
I think a transform approach must be used in the setting of uneven and noncontinuous grids. One approach is "Exact linear time Euclidean distance transforms of grid line sampled shapes" by Joakim Lindblad et al.
The method presents a distance transform (DT) which assigns to each image point its smallest distance to a selected subset of image points.
This kind of approach is often a basis of algorithms for many methods in image analysis.
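For readers without the Image Processing Toolbox, a brute-force version of the distance transform described above is easy to write: for every pixel, take the minimum Euclidean distance to the selected (true) pixels. This is a sketch for small images only, since its cost grows with pixels x selected pixels; the linear-time algorithms in the cited paper exist precisely to avoid that. The tiny image BW is made up, and for a binary image the result should agree with bwdist(BW) using its default Euclidean metric.

```matlab
% Small binary image: true marks the "selected subset" of pixels
BW = false(5, 7);
BW(2, 3) = true;
BW(4, 6) = true;

[r, c] = find(BW);                               % coordinates of selected pixels
[C, R] = meshgrid(1:size(BW, 2), 1:size(BW, 1)); % column and row index of every pixel

D = inf(size(BW));
for k = 1:numel(r)
    % Euclidean distance from every pixel to selected pixel k
    Dk = sqrt((R - r(k)) .^ 2 + (C - c(k)) .^ 2);
    D  = min(D, Dk);                             % keep the distance to the nearest one
end
```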
I tested bwdist (distance transform of a binary image) without success: the chessboard method returns an empty square matrix, while cityblock, euclidean, and quasi-euclidean all return a full matrix.
Another attempt, as pseudocode:
% https://stackoverflow.com/a/29956008/54964
%// retrieve picture
imgRGB = imread('dummy.png');
%// detect lines
imgHSV = rgb2hsv(imgRGB);
BW = (imgHSV(:,:,3) < 1);
BW = imclose(imclose(BW, strel('line',40,0)), strel('line',10,90));
%// clear those masked pixels by setting them to background white color
imgRGB2 = imgRGB;
imgRGB2(repmat(BW,[1 1 3])) = 255;
%// show extracted signal
imshow(imgRGB2)
I think this approach will not work here because the grids are not necessarily continuous and not necessarily ideal.
pdist, based on Lumbreras' answer
In the real examples all coordinates differ, so the pdist hamming and jaccard distances are always 1 with real data.
The options euclidean, cityblock, minkowski, chebychev, mahalanobis, cosine, correlation, and spearman offer some description of the data.
However, these options make little sense to me for such full matrices.
I want to estimate how long the signal can live.
There is a function in Matlab called pdist which computes the pairwise distance between all pairs of rows in a matrix and lets you choose the type of distance you want to use (Euclidean, cityblock, correlation). Are you after something like this? I'm not sure I understood your question!
cheers!
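If you don't have the Statistics Toolbox, the Euclidean case of pdist can be sketched in base MATLAB. The example points in X are made up; the ordering of Dvec is meant to follow pdist's documented pair order (1,2), (1,3), ..., (2,3), ..., which is why the lower triangle is read column by column.

```matlab
X = [0 0; 3 4; 6 8];                  % three 2-D points, one per row
n = size(X, 1);

% ||a - b||^2 = ||a||^2 + ||b||^2 - 2*a.b, computed for all pairs at once
sq = sum(X .^ 2, 2);
D2 = bsxfun(@plus, sq, sq') - 2 * (X * X');
Dfull = sqrt(max(D2, 0));             % clamp tiny negatives caused by round-off

% Condensed vector of distances for pairs (1,2), (1,3), ..., (2,3), ...
mask = tril(true(n), -1);
Dvec = Dfull(mask)';
```

Dfull is the square matrix you would get from squareform(pdist(X)); for large data sets the full n x n matrix can get big, which is one reason pdist returns only the condensed vector.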
Simply put, do not do it in post-processing. Those artifacts can come from the raster images, from the viewer, and/or ... Do quality assurance in the signal generation/processing step.
It is much easier to evaluate the original signal than its views.

Matlab cdfplot: how to control the marker spacing

I have a Matlab figure I want to use in a paper. This figure contains multiple cdfplots.
Now the problem is that I cannot use markers because they become very dense in the plot.
If I want to make the samples sparse, I have to drop some samples from the cdfplot, which results in a different cdfplot line.
How can I add enough markers while maintaining the actual line?
One method is to get the XData/YData properties from your curves, follow solution 1) from @ephsmith (keep every nth sample), and set them back. Here is an example for one curve.
y = evrnd(0,3,100,1); %# random data
%# original data
subplot(1,2,1)
h = cdfplot(y);
set(h,'Marker','*','MarkerSize',8,'MarkerEdgeColor','r','LineStyle','none')
%# reduced data
subplot(1,2,2)
h = cdfplot(y);
set(h,'Marker','*','MarkerSize',8,'MarkerEdgeColor','r','LineStyle','none')
xdata = get(h,'XData');
ydata = get(h,'YData');
set(h,'XData',xdata(1:5:end),'YData',ydata(1:5:end)); %# set both at once so their lengths stay equal
Another method is to calculate the empirical CDF separately using the ecdf function, then reduce the results before plotting with plot.
y = evrnd(0,3,100,1); %# random data
[f, x] = ecdf(y);
%# original data
subplot(1,2,1)
plot(x,f,'*')
%# reduced data
subplot(1,2,2)
plot(x(1:5:end),f(1:5:end),'r*')
Result
I know this is potentially unnecessary given MATLAB's built-in functions (in the Statistics Toolbox anyway) but it may be of use to other viewers who do not have access to the toolbox.
The empirical CMF (CDF) is essentially the cumulative sum of the empirical PMF. The latter is attainable in MATLAB via the hist function. In order to get a nice approximation to the empirical PMF, the number of bins must be selected appropriately. In the following example, I assume that 64 bins is good enough for your data.
%# compute a histogram with 64 bins for the data points stored in y
[f,x]=hist(y,64);
%# convert the frequency points in f to proportions
f = f./sum(f);
%# compute the cumulative sum of the empirical PMF
cmf = cumsum(f);
Now you can choose how many points you'd like to plot by using the reduced data example given by yuk.
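Another toolbox-free option, sketched here as an assumption rather than taken from the answers above, is to skip binning entirely: sort the data, and the empirical CDF at the k-th sorted value is k/n. The sample y below is hypothetical; keeping every 5th point mirrors the reduced-data example.

```matlab
y = randn(100, 1);                 % hypothetical sample data
n = numel(y);

xs = sort(y);                      % sorted sample values
F  = (1:n)' / n;                   % ECDF value at each sorted sample (no ties assumed)

% Keep only every 5th point for sparse markers
idx = 1:5:n;
xr  = xs(idx);
Fr  = F(idx);
```

stairs(xs, F) draws the full step curve, and after hold on, plot(xr, Fr, 'r*') adds the sparse markers on top without changing the line itself.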
n=20 ; % number of total data markers in the curve graph
M_n = round(linspace(1,numel(y),n)) ; % indices of markers
% plot the whole line, and markers for selected data points
plot(x,y,'b-',x(M_n),y(M_n),'rs')
Very simple:
try reducing the marker size.
x = rand(10000,1);
y = x + rand(10000,1);
plot(x,y,'b.','markersize',1);
For publishing purposes I tend to use the plot tools in the figure window. They allow you to tweak all of the plot parameters and immediately see the result.
If the problem is that you have too many data points, you can:
1) Plot every nth sample of the data. Experiment to find an n that gives the look you want.
2) Fit curves to the data (as I typically do) and add a few sparsely placed markers to the plots of the fits to differentiate the curves.
Honestly, for publishing purposes I have always found that choosing different 'LineStyle' or 'LineWidth' properties for the lines gives much cleaner results than using different markers. This would also be a lot easier than trying to downsample your data, and for plots made with CDFPLOT I find that markers simply occlude the stairstep nature of the lines.