I have 10,000 data points in column A that hold values from a normal distribution; I have to estimate the variance using the series
series here
I've already estimated the sample mean x̄, but my question is: how do I apply the series to all the cells in a column? Or, if I have to split it up into parts (subtraction, then squaring, then summation), how do I change the data in each cell by the same amount?
Related
I have some raw spreadsheet data that's in a format, like:
12/7/2016 3:07:00, 88.05,
12/7/2016 3:08:00, 89.10,
12/7/2016 3:13:00, 87.00,
etc
These data points are not sampled at a regular interval, but are randomly collected throughout the day.
Using Google Sheets I'm able to graph this easily onto a Timeline chart. This puts the values at the correct position on the timeline and takes the uneven sampling intervals into account.
I would like to generate a histogram of the timeline data while taking into account the timestamps and calculate an average value over a timeframe. I believe if I simply run this through the built-in histogram chart or select my data values and run it through an averaging function, it will be skewed by the uneven sampling intervals.
What's the easiest way to quantize the sampling intervals (ideally within Google Sheets) for generating my histogram and averaging?
or
Is there a built-in method to generate histograms/averaging of values while taking timestamp data into account, eliminating the need for quantized data?
You can calculate the appropriate average as follows (assuming your data is in the range A2:B50):
=sum(arrayformula((A3:A50-A2:A49)*(B3:B50+B2:B49)/2))/(A50-A2)
This formula implements the Trapezoidal rule: the value assigned to each time interval is the average of observed values at the ends of that interval.
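As a cross-check outside the spreadsheet, the same trapezoidal average can be sketched in plain Python. The function name and the sample numbers are illustrative assumptions: the three readings are the ones from the question, with timestamps converted to minutes since the first reading.

```python
def time_weighted_average(times, values):
    """Trapezoidal-rule average of irregularly sampled values.

    `times` must be sorted in increasing order, with first < last.
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two (time, value) pairs of equal length")
    area = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        # Each interval contributes its width times the mean of its endpoints.
        area += dt * (values[i] + values[i - 1]) / 2
    return area / (times[-1] - times[0])


# Readings at 3:07, 3:08 and 3:13, as minutes since the first one.
minutes = [0, 1, 6]
readings = [88.05, 89.10, 87.00]
print(time_weighted_average(minutes, readings))
```

For these three points the result is close to 88.14, while the unweighted mean is 88.05, because the five-minute gap carries more weight than the one-minute gap.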
There isn't a built-in "weighted histogram" tool, so the data needs to be resampled to create a representative histogram. Here is one way to resample. Let's say you want 20 samples; then in C2 enter
=arrayformula(A2+(row(1:20)-1)*(A50-A2)/19)
to get 20 uniformly distributed time values. (Division by 19 because of the fence-post distinction.) Then in D2,
=arrayformula(vlookup(C2:C21, A2:B50, 2))
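will look up a value for each sample time. Because the last argument of vlookup is omitted, it assumes column A is sorted and returns the value at the latest timestamp at or before the sample time. Then you can build a histogram from column D.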
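The resampling idea can also be sketched in Python, assuming the timestamps are already sorted. The function name and the sample data are hypothetical; `bisect_right` stands in for the approximate lookup by finding the last observation at or before each evenly spaced time.

```python
from bisect import bisect_right


def resample_last_observation(times, values, n_samples):
    """Take n_samples evenly spaced times and carry the latest reading forward."""
    if n_samples < 2:
        raise ValueError("n_samples must be at least 2")
    start, end = times[0], times[-1]
    step = (end - start) / (n_samples - 1)  # n points span n - 1 gaps
    resampled = []
    for k in range(n_samples):
        # Pin the final sample to the last timestamp to avoid rounding drift.
        t = end if k == n_samples - 1 else start + k * step
        # Index of the last observation at or before t.
        i = max(bisect_right(times, t) - 1, 0)
        resampled.append((t, values[i]))
    return resampled


minutes = [0, 1, 6]
readings = [88.05, 89.10, 87.00]
for t, v in resample_last_observation(minutes, readings, 7):
    print(t, v)
```

A histogram built from the resampled values then counts each reading in proportion to how long it stayed current.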
I have a very large data set of size (1 x 23750811) that I would like to visualise as a histogram in Matlab.
As the data is very large, I am getting only a single dot in my plot. But I could visualise it in separate parts, the first quarter of the data and so on.
Any suggestions for visualising the entire data set in a single plot at once?
Thanks !
Loading all your data into MATLAB is inefficient. You can try DuckDB instead: it lets you use SQL to query very large datasets in several formats, such as CSV or Parquet. Pre-compute the bins and their counts there, then export them and plot them in MATLAB.
This is a snippet you can use:
-- "column" and bin_size are placeholders for your column name and bin width
select
floor(column/bin_size)*bin_size as bin_start,
count(*) as count
from 'path/to/file.csv'
group by 1
order by 1;
Alternatively, you can try sampling your data.
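The binning step in the query above, where each value is labelled by the lower edge of its bin and then counted, can be sketched in Python as follows. The function name and the sample values are made up for illustration; this only shows the arithmetic and is not a replacement for running the query on the full file.

```python
import math
from collections import Counter


def bin_counts(values, bin_size):
    """Count values per bin; each bin is labelled by its lower edge."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    counts = Counter()
    for v in values:
        # Same expression as floor(column / bin_size) * bin_size in the query.
        counts[math.floor(v / bin_size) * bin_size] += 1
    return sorted(counts.items())


print(bin_counts([0.5, 1.2, 1.9, 3.0, -0.1], 1))
```

Only the (bin, count) pairs need to be kept in memory, which is why the pre-computed result is small enough to plot even when the raw data is not.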
I am currently working on a project in which I will be creating a vertical scatter plot using on average 6 points of y-axis data using Sigmaplot. The units of the graph are depth of snow in cm vs time. However the data I have collected is gathered over a range of days (i.e. 173-176) and I am having trouble applying my data sets to their respective ranged abscissa. I've noticed inputting the data in this manner finds the difference in the abscissa (i.e. 173-176 would correspond to 3) rather than interpreting the data as a range. Can anyone help me find a way in which to input abscissa not as singular values but rather ranges of those values using Sigmaplot?
Leaving the abscissa ranges intact, plot all of the data. Sigmaplot does not sort these data but places them in their respective chronological order. I did not find a way to make the width of the points as wide as the abscissa.
I have two sets of data. One set of data is a matrix containing different samples in each row and information regarding each sample in the columns, one of these columns contains longitude data and another one contains latitude data for the sample. The other dataset consists of three grids. One grid contains the latitude of the data, the second grid contains the longitude of the data and the third grid, the data for the 1° latitude longitude grid.
What I would like is to find out which data in the second dataset corresponds with the data in the first dataset. What I mean by this is that if a sample falls into a particular grid cell of the second dataset, the data in this cell needs to be extracted, and it needs to be known which sample the data applies to.
So just say in the grid between latitudes 60 and 59, and longitudes 100 and 101 sample x falls. Just say the data in the gridded dataset is 10 for this particular grid. I would like to know that 10 (the data in the grid) applies to sample x.
In the end I would like to have the grid data that corresponds to a sample in a new matrix that could act as a partner to the sample dataset (ie. if sample x is in row 40 then in the matrix 10 is in row 40), or alternatively added to the same dataset as a new column. Keeping in mind that some samples will fall into the same grid.
I'm fairly inexperienced in matlab, I have tried the brushing tool but this does not work for this example. All I could think of that could potentially work is round each long and lat in the sample data to an even number and then find which samples overlap in long and lat and then intersect the long in the sample data with the long grid and then do the same for the lat grid finding the row and column each sample falls into and then find the data for each sample. This seems like a long way to go about it and I'm not too sure how well it would work as well.
I have completed this method and it has worked to an extent. I have the rows and columns in which the data is for each sample (ie. sample x can be found in row 8 column 100). However, when I try to extract this data from the grid, the result is not a matrix containing one column but many columns; the answer is still in the same place of the matrix. How do I go about taking one data point from each row of a grid and ending up with a matrix of only one column (or one row which I can turn into a column)?
Thank you
Presuming your first data set is in a matrix X, where the first column is latitude and the second is longitude and any other columns are existing data:
xlat = floor(X(:,1)); % latitude is the first column of X
xlon = floor(X(:,2)); % longitude is the second column of X
lat = % a list of the latitudes covered in the grid
lon = % a list of the longitudes covered in the grid
data = % the matrix of data you want to extract - a 2D grid
Such that lat(n),lon(m) together form a reference to the bottom-left corner of the grid square whose data is in data(n,m). floor rounds down, so that anything between 100 and 101 will be linked to 100, etc.
Now:
[~, n] = ismember(xlat, lat);
[~, m] = ismember(xlon, lon);
n and m are not latitudes or longitudes but indexes which relate the points in your data set X to some values in lat and lon which then refer to some value in the grid data.
Here's the last trick - use sub2ind to convert your n and m references to a single reference to the position in the grid, then extract all the required data in one go:
ind = sub2ind(size(data),n,m); % presuming the size of data is lat x lon
Xdata = data(ind);
Xdata should be a single column, with the same size as the number of rows in X.
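The same floor-then-index idea can be written out in Python to make the bookkeeping explicit. This is only a sketch under stated assumptions: the function name, the grid values and the sample coordinates are invented, the grid edges are whole degrees, and samples outside the grid are given None rather than raising an error.

```python
import math


def lookup_grid_values(samples, lat_edges, lon_edges, grid):
    """Return one grid value per sample, in sample order.

    samples   : list of (lat, lon) pairs
    lat_edges : lat_edges[n] is the lower latitude edge of row n
    lon_edges : lon_edges[m] is the lower longitude edge of column m
    grid      : grid[n][m] holds the data for that 1-degree cell
    """
    lat_index = {lat: n for n, lat in enumerate(lat_edges)}
    lon_index = {lon: m for m, lon in enumerate(lon_edges)}
    matched = []
    for lat, lon in samples:
        # floor links anything between 100 and 101 to the cell starting at 100.
        n = lat_index.get(math.floor(lat))
        m = lon_index.get(math.floor(lon))
        matched.append(grid[n][m] if n is not None and m is not None else None)
    return matched


lat_edges = [58, 59, 60]
lon_edges = [99, 100, 101]
grid = [[1, 2, 3],
        [4, 10, 6],
        [7, 8, 9]]
samples = [(59.4, 100.7), (59.9, 100.1), (70.0, 100.5)]
print(lookup_grid_values(samples, lat_edges, lon_edges, grid))
```

The result has one entry per sample, so it can sit alongside the sample table as a new column, and two samples in the same cell simply receive the same value.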
I'm currently using Matlab and I am plotting the contents of the rows of a matrix, where each column is an independent data set. As the matrix is large I don't want to have to go through the tedious task of writing up the plot labels for each data set individually, so I was wondering if there is a specific way to include a handle/name for each column in such a way that it will automatically apply the plot label, and will adjust accordingly if columns are added or removed from the matrix?
Thanks!
Specifics, if they help:
Amplified spontaneous emission (ASE) in an optical fibre amplifier. Rows act as storage for a discretised ASE spectrum, columns are a given position along the fibre amplifier (it is this position -- the distance along the fibre corresponding to the column -- which I want to use as the label) and each element contains power information. The plot gives spectral power of ASE in the fibre for different positions along its length.
If by labels you mean the plot legend, you can do that by using cells. Consider matrix A
A = repmat([1:3], 3, 1)
A =
1 2 3
1 2 3
1 2 3
You can call plot to plot the columns of the matrix
plot(A);
Here, you will get 3 horizontal lines at y=1, 2 and 3. You can create your legend as follows
l{1} = 'dataset1';
l{2} = 'dataset2';
l{3} = 'dataset3';
Then you type
legend(l)
to show the legend. However, no one will create the legend for you, so you must create the cell array yourself. You can do it automatically, of course, e.g. the above legend can be created by a simple loop
for i=1:size(A, 2)
l{i} = ['dataset' num2str(i)];
end
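Since the question asks for labels taken from the position along the fibre rather than a running index, the same loop can build them from a list of positions. Here is a language-neutral sketch in Python; the function name, the "z = ... m" label format and the position values are all hypothetical.

```python
def position_labels(positions, unit="m"):
    """Build one legend label per column from its position along the fibre."""
    return [f"z = {z:g} {unit}" for z in positions]


# Hypothetical positions, one per column of the matrix.
positions = [0, 0.5, 1.25]
print(position_labels(positions))
```

Because the labels are derived from the position list, adding or removing a column only requires updating that list.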