Any ideas on visualizing multiple correlation matrices?

I'd like to compare the correlation structure of a group of variables across three different cases by producing a graph like this: sample graph
It's a combination of three correlation matrices.
Hopefully someone here is more creative than others...
P.S. It's also similar to this one: data-visualization

Here is something similar from R-bloggers; you might find it helpful:
http://www.r-bloggers.com/visualizing-the-correlations-of-a-matrix/
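The linked post uses R. If you are working in MATLAB instead, a minimal sketch is to compute one correlation matrix per case and draw them as side-by-side heatmaps that share the same color limits ([-1 1]), so that a given color means the same correlation in every panel. The data below are random stand-ins, and the case labels and the way case B is constructed are hypothetical.

```matlab
% Hypothetical data: the same 5 variables observed in three cases
rng(0);                                   % reproducible example
nObs = 200;
nVars = 5;
base = randn(nObs, nVars);
cases = {base, ...
         base + 0.8 * base(:, 1), ...     % case B: every variable picks up variable 1
         randn(nObs, nVars)};             % case C: independent variables
names = {'Case A', 'Case B', 'Case C'};   % hypothetical labels

% corrcoef treats columns as variables and returns an nVars-by-nVars matrix
R = cellfun(@corrcoef, cases, 'UniformOutput', false);

figure
for k = 1:numel(R)
    subplot(1, numel(R), k)
    imagesc(R{k}, [-1 1])                 % identical color limits for every case
    axis square
    title(names{k})
    xticks(1:nVars)
    yticks(1:nVars)
end
colorbar                                  % one colorbar is enough: the scale is shared
```

Fixing the color limits is the key design choice. If each panel is auto-scaled instead, the same color can stand for different correlation values in different cases.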

Related

MATLAB scatterplot showing the relation between two arrays with 10 data points each?

I am trying to create a scatterplot in MATLAB like the one seen below, but I cannot get any MATLAB scatterplot technique to cooperate with me in the slightest.
I have used my data to create boxplots with no issue, so I suppose the problem is with my plotting approach.
My code currently looks like this:
Data = readtable("Dataset.xlsx");
gscatter(Data.Var1,Data.Var2)
Does anyone have guides or code advice that could help me create the intended plot?
Thanks in advance.
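gscatter is designed for grouped data and is documented with a grouping argument, gscatter(x,y,g). For a single, ungrouped pair of arrays, plain scatter is the more direct call. Below is a minimal sketch. A small hand-made table stands in for readtable("Dataset.xlsx"), and its values are hypothetical.

```matlab
% Hypothetical stand-in for readtable("Dataset.xlsx"): two columns of 10 values
Var1 = (1:10)';
Var2 = [2.1 3.9 6.2 8.1 9.8 12.2 13.9 16.1 18.0 20.2]';
Data = table(Var1, Var2);          % variable names are taken from the inputs

figure
h = scatter(Data.Var1, Data.Var2, 36, 'filled');   % one marker per (Var1, Var2) pair
xlabel('Var1')
ylabel('Var2')
grid on
```

If the points really do belong to groups, add a grouping column to the spreadsheet and pass it as the third argument to gscatter.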

Is there an effective way to fit the following two datasets with lsqcurvefit?

I have two complex datasets, and I want to find a suitable function to fit each of them. The first dataset is presented as follows:
As you can see, although complicated, this dataset appears to be a combination of rectangle functions. These data describe the 'Amplitude' of complex numbers over time. The second picture looks like this:
This relation describes the 'Phase' of the same complex numbers over time, and it also seems to be a combination of rectangle functions. At first, I tried to fit the amplitude and phase with combinations of Fourier cosine and sine series using lsqcurvefit in MATLAB, but the parameters fail to converge to the correct values. (I have tried a number of options, such as adjusting FiniteDifferenceStepSize, FiniteDifferenceType, StepTolerance, and so on.) After many failures, I saw someone say that a normal cumulative distribution function (CDF) can be used to fit a step function, and I thought it might be possible to fit the data successfully with combinations of parameterized CDFs and y = erfc(x). So, could anyone suggest solutions or approaches for fitting the above two relations? Any valuable ideas would also be very helpful.
PS: For now I don't care about any hidden physics in these data; all I want is a mathematical way to fit the above two relations in MATLAB.
Thanks!
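Below is a sketch of the CDF idea, assuming the signal is a constant offset plus n jumps. Each jump is modeled as h_k * (1 + erf((t - t0_k)/w)) / 2, which is a scaled normal CDF with sigma = w/sqrt(2). A rectangle is simply two such jumps of opposite sign. With step models, lsqcurvefit mostly fails because of poor starting values for the jump locations, not because of solver options. The sketch therefore derives the starting values from the largest jumps in a lightly smoothed copy of the data.

The data are synthetic. The number of jumps, the smoothing window (15), the suppression radius (25 samples), and the bounds are all assumptions to adapt. The sketch requires the Optimization Toolbox and R2016b or later (implicit expansion).

```matlab
% Synthetic stand-in (hypothetical): a noisy staircase with three jumps
rng(1);
t = linspace(0, 10, 500)';
y = 1 + 2*(t > 2) - 1.5*(t > 5) + 0.8*(t > 7.5) + 0.05*randn(size(t));

n = 3;   % assumed number of jumps; in practice read it off the plot
% Parameter vector p = [offset, h_1..h_n, t0_1..t0_n, w]
% Each jump is a smoothed step h_k * (1 + erf((t - t0_k)/w)) / 2
stepModel = @(p, t) p(1) + sum(reshape(p(2:n+1), 1, []) .* ...
    (1 + erf((t - reshape(p(n+2:2*n+1), 1, [])) ./ p(end))) / 2, 2);

% Initial guesses matter most for the jump locations: take the n largest
% jumps of a lightly smoothed copy of the data
ys = movmean(y, 15);
dy = abs(diff(ys));
idx = zeros(1, n);
for k = 1:n
    [~, idx(k)] = max(dy);
    dy(max(1, idx(k) - 25):min(numel(dy), idx(k) + 25)) = 0;  % ignore this jump from now on
end
idx = sort(idx);
h0 = (ys(min(idx + 20, numel(ys))) - ys(max(idx - 20, 1))).';  % jump heights
p0 = [ys(1), h0, t(idx).', 0.05];

% Bounds keep the jump locations inside the data and the width positive
lb = [-inf(1, n + 1), repmat(t(1), 1, n), 1e-3];
ub = [ inf(1, n + 1), repmat(t(end), 1, n), 1];
opts = optimoptions('lsqcurvefit', 'Display', 'off');
p = lsqcurvefit(stepModel, p0, t, y, lb, ub, opts);

plot(t, y, '.', t, stepModel(p, t), '-', 'LineWidth', 1.5)
legend('data', 'sum of erf steps')
```

The same model could be applied separately to the amplitude and to the phase, each with its own n.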

What is the official Matlab way to plot the values of histcounts into a histogram with any normalization option?

Assume that I have an array of counts (ideally returned by histcounts). Is there an official Matlab way to plot such a histogram with all the standard normalization options available?
The best approach I have found is to get the counts from histcounts and then plot them with bar. Something like:
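edges = linspace(0,bound,nbins);   % nbins edges define nbins-1 bins
hist_c = histcounts(X,edges);      % bin with the same edges that are plotted
bar(edges(1:nbins-1),hist_c);      % one bar per bin, placed at its left edge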
Unfortunately, as far as I know, using bar is not recommended according to this link. This is probably because, as is obvious from the code, it moves a lot of implementation details into user code (for example, producing the edges array manually when only nbins is needed, or having to know whether to use 1:nbins-1 or 2:nbins).
Worse still, in my opinion, it leaves users to implement the normalization options on their own. One may point out that histcounts can do the normalization for you; however, it can only do so given the data matrix X. If X is extremely large, this becomes a problem: the histogram counts of X can be accumulated on the fly (as done in this question), but the other normalization options cannot easily be computed on the fly. Users could implement each normalization option from the equations in the documentation, but making every user do this by hand seems extremely inefficient. Is there a way to access the code that actually performs this normalization?
What I am really asking is: is there an official MATLAB way to produce a histogram from only the histogram counts? In particular, one that hides all the implementation details of producing the counts, normalization, binning, edges, etc.?
The ideal code in my mind should look like this to the user:
histogram_counts = get_hist_count(X)
plot_histogram(histogram_counts,'Normalization',normalization)
which would produce the desired histogram plot.
Related questions:
https://www.mathworks.com/matlabcentral/answers/332178-how-does-one-plot-a-histogram-from-the-histogram-counts
https://www.mathworks.com/matlabcentral/answers/275278-what-is-the-recommended-practice-for-plotting-the-outputs-of-histcounts
https://www.mathworks.com/matlabcentral/answers/91944-how-can-i-combine-the-options-histc-and-stack-in-a-bar-plot-in-matlab-7-4-r2007a#answer_101295
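For what it's worth, histogram can plot precomputed bins directly with histogram('BinEdges', edges, 'BinCounts', counts), which draws the given heights without binning any data. The normalization formulas listed in the histcounts documentation depend only on the counts, the bin widths, and N. So if every data point fell inside the edges (sum(counts) == N), they can be applied after the counts have been accumulated.

Below is a minimal sketch under that assumption. The data and bin count are hypothetical. If your release rejects non-integer BinCounts, bar(edges(1:end-1) + diff(edges)/2, values, 1) draws the same bars at the bin centers.

```matlab
% Hypothetical data; for a huge X the counts could be accumulated chunk by chunk
rng(0);
X = randn(1e5, 1);
edges = linspace(min(X), max(X), 41);     % 40 bins spanning all the data
counts = histcounts(X, edges);

% Each Normalization option needs only the counts and the bin widths,
% assuming every data point fell inside the edges so that sum(counts) == N
normalization = 'pdf';
w = diff(edges);
N = sum(counts);
switch normalization
    case 'count',        values = counts;
    case 'probability',  values = counts / N;
    case 'countdensity', values = counts ./ w;
    case 'pdf',          values = counts ./ (N * w);
    case 'cumcount',     values = cumsum(counts);
    case 'cdf',          values = cumsum(counts) / N;
end

% Plot precomputed bin heights without re-binning any data
histogram('BinEdges', edges, 'BinCounts', values);
```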

Matlab: Eliminating freak values in dataset

I am looking for a method to eliminate freak values from a given dataset. For example:
All these peaks should be eliminated. I've tried different filters like medfilt, but the peaks are still there. I've also tried a lowpass filter, but it didn't work either. I am a beginner in filtering signals, so I probably did it wrong.
You can download the data for the x array here and for the y array here.
I could also imagine a loop that compares neighboring values, but surely there must be a built-in function?
Here is the result using medfilt1(input,15):
The peaks vanish, but then I get these ugly steps, which I don't want.
Just use a median filter! medfilt1(data,3) will do if this is a one-pixel "cosmic" spike. If the peaks remain, increase the window size to 5 or more.
EDIT:
This is what the OP's data looks like:
So the data is neither uniformly sampled nor ordered, and there are many data points in the spikes, unlike what one first understands from the question (guys, please plot your data correctly!). The question now is: is the data in the spikes or in the baseline?
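If the spikes are the unwanted part (the open question in the EDIT above), one way to avoid the steps produced by a wide median window is to replace only the flagged points and leave every other sample untouched. Below is a minimal sketch that assumes R2017a or later for filloutliers. It uses made-up data in place of the OP's x and y arrays, and the 25-sample window is an arbitrary value to tune. Because the EDIT notes that the data are not ordered, the sketch sorts by x before applying any sliding-window method.

```matlab
% Hypothetical stand-in for the OP's data: smooth baseline plus isolated spikes
rng(2);
n = 1000;
xSorted = linspace(0, 10, n)';
ySorted = sin(xSorted);
spikeIdx = 10 + randperm(n - 20, 20);     % spikes away from the ends
ySorted(spikeIdx) = ySorted(spikeIdx) + 3;
perm = randperm(n);
x = xSorted(perm);                        % unordered, like the OP's data
y = ySorted(perm);

% Sort by x first; a moving window only makes sense on ordered samples
[xs, order] = sort(x);
ys = y(order);

% Flag points more than 3 scaled MAD from a 25-sample moving median and
% replace only those by linear interpolation of their non-outlier neighbours
yClean = filloutliers(ys, 'linear', 'movmedian', 25);

plot(xs, ys, '.', xs, yClean, '-')
legend('raw', 'filloutliers')
```

Unlike medfilt1(input,15), which replaces every sample with a local median, this keeps the original values wherever no outlier is detected.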

MANOVA - huge matrices

First, sorry for the "ANOVA" tag; this is about MANOVA (which is not yet a tag...).
All the examples in the tutorials I found use small matrices, and following them is not feasible for large matrices, which many studies have.
I have two matrices for my 14 sampling points: one for the organism IDs (4493 IDs) and the other for the chemical profile (190 variables).
The two matrices were correlated using Spearman correlation and, based on the correlation, split into 4 clusters (k-means on squared Euclidean distances), with the IDs on the rows and the chemical profile on the columns.
The differences among the clusters are somewhat clear, but to show them more robustly I want to perform a MANOVA to test the differences between and within the clusters, which is, of course, a key factor for the conclusion.
The problem is that, after 8 hours of trying, I could not even get the data into a format the analysis accepts.
The tutorials I found are designed for very few variables, and even when I think I have overcome that, the program says that my matrices can't be compared because they differ in length.
Each cluster has its own set of IDs sharing all same set of variables.
What should I do?
Thanks in advance.
Diogo Ogawa
If you have missing values in your data (which practically all data sets seem to contain), you can either remove those observations or create a model that includes them. Use the first approach if something about your methodology convinces you that those observations are different in some way. Most of the time, it is better to run the model including the observations with missing values. In that case, use the general linear model instead of a balanced ANOVA model, because the balanced model will struggle with the missing data.
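The question doesn't say which software is used. If it is MATLAB (with the Statistics and Machine Learning Toolbox), the "different lengths" error usually points to the input layout. manova1 expects a single observations-by-variables matrix with one row per ID, plus a single grouping vector that holds one cluster label per row. It does not expect one matrix per cluster.

Below is a minimal sketch with made-up data. The sizes, cluster labels, and mean shifts are hypothetical.

```matlab
% Hypothetical stand-in: 400 IDs x 6 variables, each ID already assigned to a cluster
rng(3);
nIds = 400;
nVars = 6;
cluster = randi(4, nIds, 1);              % one cluster label per row (ID)
X = randn(nIds, nVars) + 0.5 * cluster;   % shift the means by cluster

% Key point: ONE matrix (rows = IDs, columns = variables) and ONE grouping
% vector with the same number of rows; clusters of unequal size are fine
[d, p, stats] = manova1(X, cluster);
fprintf('Estimated dimension of the group-mean space: %d\n', d);
disp(p.')                                 % p(1) tests whether all cluster means are equal
```

With 190 variables, the pooled within-cluster covariance must be non-singular. This requires comfortably more IDs than variables in total, a condition that 4493 IDs satisfy.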