Histogram plot for very large data - matlab

I have a very large data set of size 1 x 23750811 that I would like to visualise as a histogram in MATLAB.
Because the data is so large, I get only a single dot in my plot. I can, however, visualise it piecewise: the first quarter of the data, then the next, and so on.
Is there any way to visualise the entire data set in a single plot?
Thanks !

Loading all of your data into MATLAB is inefficient. You can try DuckDB instead: it lets you query very large datasets with SQL, in formats such as CSV or Parquet. Pre-compute the bins and their heights there, export them, and plot the result in MATLAB.
This is a snippet you can use:
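-- "column" and bin_size are placeholders: substitute your column name and bin width
select
    floor(column / bin_size) * bin_size as bin_start,
    count(*) as count
from "path/to/file.csv"
group by 1
order by 1;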
Alternatively, you can try sampling your data.
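The sampling idea, and the "pre-compute the bins" idea, can both be sketched directly in MATLAB. This is only a rough sketch under assumptions: the vector data below is synthetic stand-in data (the real 1 x 23750811 vector is not available here), and nSample, chunkSize and the 100-bin edge grid are arbitrary illustrative choices.

```matlab
% Synthetic stand-in for the real data (assumption: the real vector is numeric)
rng(0);
data = randn(1, 1e6);

% Option 1: histogram a random subsample instead of the full vector
nSample = 1e5;                           % illustrative sample size
idx = randperm(numel(data), nSample);    % nSample distinct random indices
sample = data(idx);

% Option 2: fix the bin edges once, then accumulate counts chunk by chunk
edges = linspace(min(data), max(data), 101);   % 100 bins
counts = zeros(1, numel(edges) - 1);
chunkSize = 2.5e5;                       % illustrative chunk size
for startIdx = 1:chunkSize:numel(data)
    stopIdx = min(startIdx + chunkSize - 1, numel(data));
    counts = counts + histcounts(data(startIdx:stopIdx), edges);
end

figure;
subplot(1, 2, 1);
histogram(sample, edges);
title('Random subsample');
subplot(1, 2, 2);
histogram('BinEdges', edges, 'BinCounts', counts);
title('Counts accumulated in chunks');
```

If the real data contains a few extreme outliers, edges spanning min to max may squeeze almost everything into one bin; in that case it may help to choose the edge range by hand.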

Related

How to apply tsne() to MATLAB tabular data?

I have a 33000 x 1975 table in MATLAB, which obviously requires dimensionality reduction before I do any further analysis. The features are the 1975 columns and the rows are instances of the data. I tried using the tsne() function on the MATLAB table, but it seems tsne() only works on numeric arrays. Is there a way to apply tsne to my MATLAB table? The table contains both numeric and string data types, so table2array() doesn't work in my case for converting the table to a numeric array.
Moreover, it seems from the MathWorks documentation, which uses the fisheriris dataset as an example, that tsne() takes the feature columns as the function argument. So I would need to separate the predictors from the responses, which shouldn't be a problem. But it is not clear to me how to proceed from there to use tsne. Any suggestions in this regard would be highly appreciated.
You can probably use table indexing using {} to get out the data that you want. Here's a simple example adapted from the tsne reference page:
load fisheriris
% Make a table where the first variable is the species name,
% and the other variables are the measurements
data = table(species, meas(:,1), meas(:,2), meas(:,3), meas(:,4))
% Use {} indexing on 'data' to extract a numeric matrix, then
% call 'tsne' on that
Y = tsne(data{:, 2:end});
% plot as per example.
gscatter(Y(:,1),Y(:,2),data.species)
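When the numeric variables are not in a known contiguous range such as 2:end, one option is to detect them programmatically. A minimal sketch, assuming the same fisheriris example table as above (the variable names isNum and X are made up for illustration, and tsne needs the Statistics and Machine Learning Toolbox):

```matlab
load fisheriris
% Same example table as above: one text variable, four numeric ones
data = table(species, meas(:,1), meas(:,2), meas(:,3), meas(:,4));

% Logical mask with one entry per table variable: true where numeric
isNum = varfun(@isnumeric, data, 'OutputFormat', 'uniform');

% {} indexing with the mask extracts only the numeric variables as a matrix
X = data{:, isNum};

rng default          % for a reproducible embedding
Y = tsne(X);
gscatter(Y(:,1), Y(:,2), data.species)
```

For a table with string columns that should also act as features, those would first have to be encoded numerically; that step is not shown here.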

Getting Values from Excel Table on Matlab

I've been trying to simulate a chemical reactor on Matlab. For this to work I need to be able to get some values from excel tables with thermodynamic properties. Each table is for a chemical species and the first row is always the temperature.
I would like to input the temperature so that Matlab would search the temperature column for the value and return, for example, the associated enthalpy.
I used xlsread to load the tables as matrices, but I am having trouble telling Matlab what to do next.
I've been told to try
[row, ~] = find(temp<=data(:,2)); %temp is the given temperature
but this just returns me
row = 1, 2, ...
I would also like to be able to handle intermediate temperatures that are not explicit in the table, it could be some kind of interpolation or just getting the nearest value.
Any help will be truly appreciated.
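One way to handle both tabulated and intermediate temperatures is interp1. A minimal sketch, assuming a hypothetical layout in which column 1 of the matrix holds temperature (sorted, no duplicates) and column 2 holds enthalpy; the numbers below are invented purely for illustration and the real column positions may differ:

```matlab
% Hypothetical table: column 1 = temperature, column 2 = enthalpy
data = [300 10.0;
        400 12.5;
        500 15.5;
        600 19.0];

temp = 450;   % the given temperature

% Linear interpolation between the two neighbouring table rows
hLinear = interp1(data(:,1), data(:,2), temp, 'linear');

% Nearest tabulated value instead of interpolating
hNearest = interp1(data(:,1), data(:,2), 425, 'nearest');

% Row index of the first tabulated temperature at or above temp
row = find(data(:,1) >= temp, 1, 'first');
```

Note that find(...) without the count argument returns every matching row, which is why the original attempt printed a list of row numbers. By default interp1 returns NaN for temperatures outside the tabulated range.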

How to use 3D surface data from cftool in Simulink lookup table?

I am designing a battery model with an internal resistance which is dependent on two variables: SoC and temperature.
I have interpolated the data I have (x, y and z basically, a total of 131 points each) with MATLAB's Curve Fitting Toolbox and was able to generate the desired 3D map of that dependence (see the picture below):
My question is: how can I now use that map in my Simulink model? The inputs will be SoC and temperature, and the output should be the resistance in ohms. My first guess is that I should use a 2-D lookup table, but I have not found a convenient way to export the data into a suitable lookup table (or something similarly useful) in Simulink. I am quite new to this and do not know how to generate the table data for the Simulink LUT.
Simulink LUT:
Table data is your interpolated z-data from curve fitting. I guess it will have a value for every combination of breakpoints (i.e. it covers every grid intersection in your first diagram). So if Breakpoint 1 is 100 elements and Breakpoint 2 is 40 elements, Table data is 100x40.
If you can't get the data out of the GUI-based interactive curve fit, I guess you can extract it from the command line. The following is an excerpt from MathWorks' curve fitting documentation. Please verify it, because I don't have the toolbox to test it.
•Interpolation: fittedmodel = fit([Time,Temperature], Energy, 'cubicinterp');
•Evaluation: fittedmodel(80, 40)
Based on your LUT inputs u1 and u2, the table will interpolate or extrapolate the grid to get your output value.
Hope that helps.
I did find a solution after all. Thanks, Tom, for your help: the fittedmodel() function was indeed the key. After the cftool interpolation, I used two nested for loops to populate my 49x51 matrix (matching the grid in the image) with the z values for my SoC and T parameters.
TableData = zeros(49, 51); % preallocate the LUT matrix
for x = 1:49
    for y = 1:51
        TableData(x,y) = fittedmodel(B_SoC(x), B_Temp(y));
    end
end
Where TableData is the 49x51 matrix required for my LUT, B_SoC and B_Temp being [0:2.083:100] and [-10:1.1:45] respectively (determined as the desired start and end of my x and y axis with the spacing taken from the image with the data cursor).
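The same table can probably be built without explicit loops by evaluating the model on a grid. The sketch below is an assumption-laden illustration: fittedmodel here is a made-up function handle standing in for the real cftool/fit object (so the snippet runs without the Curve Fitting Toolbox), and linspace is used for the breakpoints so the 49 and 51 element counts are explicit rather than depending on the step size.

```matlab
% Stand-in for the fit object (assumption: NOT the real fitted surface)
fittedmodel = @(soc, temp) 0.05 + 1e-4*(100 - soc) + 2e-4*(25 - temp).^2;

% Breakpoints: 49 SoC values and 51 temperature values
B_SoC  = linspace(0, 100, 49);
B_Temp = linspace(-10, 45, 51);

% ndgrid puts SoC along rows and temperature along columns,
% which matches TableData(x,y) in the loop version
[SoCGrid, TempGrid] = ndgrid(B_SoC, B_Temp);

% Evaluate at all grid points as column vectors, then restore the grid shape
TableData = reshape(fittedmodel(SoCGrid(:), TempGrid(:)), size(SoCGrid));
```

B_SoC, B_Temp and TableData can then be entered as Breakpoints 1, Breakpoints 2 and Table data in the 2-D Lookup Table block. With the real fit object it is worth checking a few entries against the loop version.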

Plot Condensed Dataset

So I have a sample dataset which I need to plot using Matlab.
The columns look like this:
Obviously due to this data set the plot looks exceptionally condensed.
Now I am totally new to plotting and statistical data processing.
What can be done to make the plot easier to read and compare (plotting at larger intervals, perhaps)?
Here's the code I wrote:
fid=fopen('me.dat', 'r');
s=textscan(fid,'%s %s %f %f', 'headerlines', 1);
fclose(fid);
a=s{1};
b=s{2};
c=s{3};
d=s{4};
plot(c,d)
Thanks.
When I have this kind of problem, I usually use the following methods:
1) Plot only every Nth point. If you have 1D arrays a and b and you want to plot, say, every 5th point, use plot(a(1:5:end),b(1:5:end)) instead of plot(a,b). This works because a(1:5:end) returns a(1), a(6), a(11), ..., so you plot roughly 1/5 of your data points. Here you simply omit most of your data points, so I prefer the second method.
2) If you have the Image Processing Toolbox, use imresize. Before plotting, resize your data with aplot=imresize(a,0.2); and do the same for b. To shrink an array by a factor of N, pass 1/N as the second argument of imresize. This generally works better, because the result is interpolated from neighbouring points rather than discarding them, so you still have an idea of what's going on in your full dataset.
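Both ideas can be sketched without any toolbox. In the sketch below, block averaging is used as a simple stand-in for the imresize approach (it is not equivalent to what imresize computes), and x, y and N are made-up example data rather than the columns from me.dat.

```matlab
% Made-up example data standing in for columns c and d
x = linspace(0, 10, 1000).';
y = sin(x) + 0.1*cos(50*x);

N = 5;   % reduction factor

% Method 1: keep every Nth point
xThin = x(1:N:end);
yThin = y(1:N:end);

% Method 2 (stand-in): average each block of N consecutive points
nBlocks = floor(numel(y) / N);
xAvg = mean(reshape(x(1:nBlocks*N), N, nBlocks), 1).';
yAvg = mean(reshape(y(1:nBlocks*N), N, nBlocks), 1).';

plot(x, y, ':', xThin, yThin, 'o', xAvg, yAvg, '-')
legend('full data', 'every 5th point', 'block mean')
```

Averaging smooths out fast variations, while plain thinning can miss them entirely, so which one is preferable depends on what the plot is meant to show.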

MATLAB: Split large array into smaller arrays, and plotting them as part of the same series

Problem
Running out of memory when running the scatter plot function with large arrays as inputs (6.7E6 elements).
Approach
I have two large sets of data in cells with size (n x 1).
xCell = cell(n,1);
yCell = cell(n,1);
The data inside of the cells are multiple arrays of variable size (VariableSize x 1). I concatenate all of the arrays from each cell into one array each.
% Combine cells into one array
x = cat(1,xCell{:});
y = cat(1,yCell{:});
% Clear unnecessary variables
clear xCell yCell
I end up with two arrays x and y with the same size (6.7E6 elements) ready as inputs for a scatter plot. After executing my code, I end up with a memory error.
Output
??? Out of memory. Type HELP MEMORY for your options.
I have maxed out the amount of heap space available for my computer and I have nothing else running on the computer.
Objective
I would like to load only parts (sub-arrays) of the data at a time while creating the scatter plot and conserving the fact that the smaller sub-arrays are all part of only one larger series.
You can use tools such as cloudPlot and plot(Big) from the MATLAB File Exchange (FEX). cloudPlot helps visualize the distribution of a 2-dimensional dataset. It is especially helpful for extremely large datasets, where a regular plot(x,y,'.') just fills the plot with a solid color because the measurement points overlap each other. plot(Big) intercepts data going into a plot and reduces it to the smallest possible set that looks identical given the number of pixels available on the screen. It then updates the data as the user zooms or pans. This is useful when you must plot a very large amount of data and explore it visually.
See more here on how to visualize distributions of 2d data.
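A similar density view can be approximated with built-in functions, processing one cell at a time so the full arrays never need to be concatenated. This is only a sketch of the general idea, not how cloudPlot or plot(Big) are implemented; the cell contents below are synthetic, and the 200 x 200 bin grid is an arbitrary choice. histcounts2 requires R2015b or later.

```matlab
% Synthetic stand-in for xCell / yCell (assumption: numeric column vectors)
rng(1);
n = 20;
xCell = cell(n, 1);
yCell = cell(n, 1);
for k = 1:n
    m = randi([1000 5000]);              % variable-size sub-arrays
    xCell{k} = randn(m, 1);
    yCell{k} = 0.5*xCell{k} + randn(m, 1);
end

% First pass: global data range, computed cell by cell
xMin = min(cellfun(@min, xCell));  xMax = max(cellfun(@max, xCell));
yMin = min(cellfun(@min, yCell));  yMax = max(cellfun(@max, yCell));
xEdges = linspace(xMin, xMax, 201);
yEdges = linspace(yMin, yMax, 201);

% Second pass: accumulate 2-D bin counts one cell at a time
counts = zeros(numel(xEdges) - 1, numel(yEdges) - 1);
for k = 1:n
    counts = counts + histcounts2(xCell{k}, yCell{k}, xEdges, yEdges);
end

% Show the density; rows of counts are x bins, so transpose for imagesc
xCenters = (xEdges(1:end-1) + xEdges(2:end)) / 2;
yCenters = (yEdges(1:end-1) + yEdges(2:end)) / 2;
imagesc(xCenters, yCenters, counts.');
axis xy
colorbar
```

Memory use is then set by the bin grid rather than by the number of points, at the cost of losing the individual markers.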