Rearrange distribution function Matlab - matlab

I have the following data representing values over a 12 month period:
1. 0
2. 253
3. 168
4. 323
5. 556
6. 470
7. 225
8. 445
9. 98
10. 114
11. 381
12. 187
How can I smooth this line forward?
What I need is that going through the list sequentially any value that is above the mean (268) be evenly distributed amongst the remaining months- but in such a way that it produces as smooth a line as possible. I need to go through from Jan to Dec in order. Looking forward I want to sweep any excess (peaks) into the months still to come such that the distribution is as even as possible (such that the troughs are filled first). So the issue is to, at each point, determine what the "excess" for that month is and secondly how to distribute that amongst the months still to come.
I have used
p = find(Diff>0);
n = find(Diff<=0);
POS = Diff(p,1);
NEG = Diff(n,1)
to see where shortfall/ excesses against the mean exist but unsure how to construct a code that will redistribute forward by allocating to the "troughs" of the distribution first. An analogy is that these numbers represent harvest quantities and I must give out the harvest to the population. How do I redistribute the incoming harvest over the year such that I minimise excess supply/ under supply? I obviously cannot give out anything I haven't received in a particular month unless I have stored some harvest from previous months.
e.g. I start in Jan, I see that I cannot give anything to the months Feb to Dec so the value for Jan is 0. In Feb I have 253- do I adjust 253 downwards or give it all out? If so by how much? and where do I redistribute the excess I trim between Mar to Dec? And so on and so forth.. How do I do this to give as smooth (even) a distribution as possible?
For any month the new value assigned to that month cannot exceed the current value. The sum for the 12 months must be equal before and after smoothing. As the first position January will always be 0.

Simple version, just loops through and if the next month is lower than the current month, passes value forward to equalise them.
for n = 1:11
if y(n)>y(n+1);
y(n:n+1)=(y(n)+y(n+1))/2;
end
end

It's not very clear to me what you're asking...It sounds a bit like a roundabout way of asking how to fit a straight line to data. If that is the case, see below. Otherwise: please clarify a bit more what you want. Provide a toy example input data, and expected output data.
y = [ 0 253 168 323 556 470 225 445 98 114 381 187 ].';
x = (0:numel(y)-1).';
A = [ones(size(x)) x];
plot(...
x, y, 'b.',...
x, A*(A\y), 'r')
xlabel('Month'), ylabel('Data')
legend('original data', 'fit')

I dont get exactly what you want either, maybe something simple like this?
year= [0 253 168 323 556 470 225 445 98 114 381 187];
m= mean(year);
total_before = sum(year)
linear_year = linspace(0,m*2,12);
toal_after= sum(linear_year)
this gives you a line, the sum stays the same and the line is perfectly smooth ...

Related

Are there any instances where a negative time type could give unexpected results?

Are there any instances where a negative time type could give unexpected results if used for specific purposes? When time deltas are calculated between negative time values and non-negative time values for example, there do not appear to be any issues.
time
val
00:00:31.384
-0.3170017
00:06:00.139
0.9033492
00:07:01.099
-0.7661049
Then, for the purpose of a window join later over a 10-min window
win:00:10:00;
winForJoin: (neg win;00:00:00) +\:(exec time from data);
first[winForJoin] gives -00:09:28.616 -00:03:59.861 -00:02:58.901
winForJoin[1]-winForJoin[0] gives 10 minutes as expected
If I understand correctly, you're asking how would a window join behave if the opening interval was a negative time? (due to the interval subtraction taking the values into negative territory, relative to 00:00).
The simple answer is that it won't behave any differently than if the times were numbers, but in practice you may see results you don't expect depending on how your table is set up and what you're trying to achieve.
Taking the example in the official wiki as a starting point: https://code.kx.com/q/ref/wj/
q)t:([]sym:3#`ibm;time:10:01:01 10:01:04 10:01:08;price:100 101 105)
q)a:101 103 103 104 104 107 108 107 108
q)b:98 99 102 103 103 104 106 106 107
q)q:([]sym:`ibm; time:10:01:01+til 9; ask:a; bid:b)
q)f:`sym`time
q)w:-2 1+\:t.time
/add volume too so it's easier to follow:
q)s:908 360 522 257 858 585 90 683 90;
q)update size:s from `q
/add an alternative range which has negative starting time
q)w2:(-11:00;1)+\:t.time
The window join takes all rows in q whose times are between the pairs of time ranges:
q)q[`time]within/:flip w
110000000b
011110000b
000001111b
Under the covers it's asking: are these positive numbers (the quote times) in between those two positive numbers (the window range). There's no reason it can't also ask: are these positive numbers in between this negative number and this positive number
q)q[`time]within/:flip w2
110000000b
111110000b
111111111b
You'll notice that all of them are greater than the negative time - meaning that it will include all rows from the beginning of the q table, up until the end time of that pair. This can be considered expected behaviour - if your start time is negative you must mean "from the beginning of time" - aka, all rows from the beginning of the table.
Comparing sum of size shows how the results differ:
q)wj[w;f;t;(q;(sum;`size))]
sym time price size
-----------------------
ibm 10:01:01 100 1268
ibm 10:01:04 101 1997
ibm 10:01:08 105 1448
q)wj[w2;f;t;(q;(sum;`size))]
sym time price size
-----------------------
ibm 10:01:01 100 1268
ibm 10:01:04 101 2905
ibm 10:01:08 105 4353
Finally - where it might get complicated.....it depends on what "negative" time means in your table. If you're at 00:00 (midnight) and you subtract 10 minutes, are you trying to access data from 23:50 the day before? Or does 00:00 represent the starting time (row zero) of your table? If you're trying to access 23:50 from the day before then you will have problems because 23:50 is NOT in between your negative start time and your positive end time, e.g:
q)23:50 within(-00:58:59;10:01:02)
0b
Again this all depends on how your data looks and what you're trying to do

Getting started with LibSVM for a particular case

I have read quite a lot about LibSVM library, but I would like to ask you for some advices in my particular case. The problem is that I have some 3D medical images (DCE-MRI) of a stomach. My goal is to perform a segmentation of a kidney, and find its three parts. Therefore, I need to train a classifier - I'm going to use SVM and neural network
Feature vectors:
What is available is the pixel (voxel) brightness value (I guess the value range is [0; 511]). In total, there are 71 frames, each taken every second. So the crucial feature of every voxel is how the voxel brightness/intensity is changing during the examination time. In my case, every part of a kidney has a different chart (see an example below), so the way how the voxels brightness is changing over the time will be used by the classifier.
Training sets:
Every training set is a vector of intensity value of one voxel (74 numbers). An example is presented below:
[22 29 21 7 19 12 23 25 33 28 25 5 21 18 27 21 11 11 26 12 12 31 15 15 12 29 17 34 30 11 12 24 35 28 27 26 29 22 15 23 24 14 14 37 241 313 350 349 382 402 333 344 332 366 339 383 383 379 394 398 402 357 346 379 365 376 366 365 360 363 376 383 389 385]
Summary and question to you:
I have many training sets consisting of 74 values from the range [0; 511]. I have 3 groups of voxels, which have a characteristic feature - the brightness is changing in the similar way. What I want to obtain is a classificator, which after getting one voxel vector with 74 numbers, will assess if the voxel belongs to one of these 3 groups, or to none of them.
Question: how to start with LibSVM, any advices? From what I know now is that I should transform input values to be from the range [0; 1] or [-1; 1]. I have many training sets prepared belonging to one of these 3 groups. I will be grateful for any advice, as I'm a newbie and I just need some tips just to start.
You can train and use you model like this:
model=svmtrain(train_label,train_feature,'-c 1 -g 0.07 -h 0');
% the parameters can be modified
[label, accuracy, probablity]=svmpredict(test_label,test_feaure,model);
train_label must be a vector,if there are more than two kinds of input(0/1),it will be an nSVM automatically. If you have 3 classes, you can label them using {1,2,3}.Its length is equal to the number of samples.
The feature is not restricted. It can be what ever you want.
However, you'd better preprocess them to make the results better. For example, you can change range[0:511] to range[0:1] or minus the mean of the feature.
Notice that the testset data should be preprocessed in the same way.
Hope this will help you!

Writing (and using) principal component analysis in matlab

I (hope to) obtain a matrix with data on different characteristics on rat calls (in the ultrasound). Variables include starting frequency, ending frequency, duration etc etc. The observations will include all the rat calls in my audio recording.
I want to use PCA to analyze my data, hopefully decorrelating any principal components that are not important to the structure of these calls and how they work, allowing me to group the calls up.
My problem is that while I have a basic understanding of how PCA works, I don't have an understanding of the finer points including how to implement this in Matlab.
I know you should standardise my data. All methods I have seen involve means adjusting by subtracting the mean. However some others also divide by the standard deviation or divide the transpose of the means adjusted data by the square root of N-1 (N being the number of variables).
I know with the standardised data, you can then find the covariance matrix, and extract the eigen values and vectors such as with using eig(cov(...)). some others use svd(...) instead. I still don't understand what this is and why it is important
I know there are different ways to implement PCA, but I don't like how I get different results for all of them.
There is even a pca(...) command also.
While reconstructing the data, some people multiply the means adjust data with the principal component data, others do the same but with the transpose of the principal component data
I just want to be able to analyse my data by plotting graphs of the principal components, and of the data (with the most insignificant principal components removed). I want to know about the variances of these eigen vectors and how much they represent the total variance of the data. I want to be able to fully exploit all the information PCA can allow me to get out
can anyone help?
=========================================================
This code seems to work based on pg 20 of http://people.maths.ox.ac.uk/richardsonm/SignalProcPCA.pdf
X = [105 103 103 66; 245 227 242 267; 685 803 750 586;...
147 160 122 93; 193 235 184 209; 156 175 147 139;...
720 874 566 1033; 253 265 171 143; 488 570 418 355;...
198 203 220 187; 360 365 337 334; 1102 1137 957 674;...
1472 1582 1462 1494; 57 73 53 47; 1374 1256 1572 1506;...
375 475 458 135; 54 64 62 41];
[M,N] = size(X);
mn = mean(X,2);
data = X - repmat(mn,1,N);
Y = data' / sqrt(N-1);
[~,S,PC] = svd(Y);
S = diag(S);
V = S .* S;
signals = PC' * data;
%plotting single PC1 on its own
figure;
plot(signals(1,:),zeros(1,size(signals,2)),'b.','markersize',15)
xlabel('PC1')
title('plotting single PC1 on its own')
%plotting PC1 against PC2
figure;
plot(signals(1,:),signals(2,:),'b.','markersize',15)
xlabel('PC1'),ylabel('PC2')
title('plotting PC1 against PC2')
figure;
plot(PC(:,1),PC(:,2),'m.','markersize',15)
xlabel('effect(PC1)'),ylabel('effect(PC2)')
but where is the standard deviation? how is the result different to
B=zscore(X);
[PC, D] = eig(cov(B));
D = diag(D);
cumsum(flipud(D)) / sum(D)
PC*B %(note how this says PC whereas above it says PC')
If the principal components are represented as columns, then I can remove the most insignificant eigen vectors by finding the smallest eigenvalue and setting its corresponding eigen vector column to a column of zeros.
How can either of these methods above be applied by using the pca(...) command and achieve THE SAME result? can anyone help explain this to me (and ideally show me how all of these can achieve the same results)?

irregular time series data interpolation

i'm a newbie in Matlab. after using a specific application, i get a file which contains a data acceleration recorded every 160ms.
16 25 50 32 234 199 6
16 25 50 192 240 196 3
16 25 50 352 236 199 8
16 25 50 512 238 198 7
16 25 50 671 242 195 11
16 25 50 832 237 198 9
as we saw here that the interval value vary between +/- 160ms, its not fixed .
the 4 first column designed a 'data time series' and the rest designed a data acceleration.
here sample rate is not constant. so my goal is how can i get a data acceleration every 160ms.
i was thinking to resample data acceleration by interpolation.
first, i convert my data to seconds
s=data(:,3)+data(:,4)/1000; % convert to seconds+fractions
dt=diff(datenum(2013,1,1,data(:,1),data(:,2),s))*86400;
t= cumsum(diff(datenum(2014,06,09,data(:,1),data(:,2),s))*86400);
sample = interp1(t,data(:,5:end),[0:160:t(end)]);
is that correct?
thanks in advance
I'm not sure if this is what you're doing already with all that diff/cumsum stuff by I would think make t start at 0:
t = datenum(2013,1,1,data(:,1),data(:,2),s)*(24*60*60);
t = t-t(1);
sample = interp1(t,data(:,5:end), 0:0.16:t(end));
The idea here is that we know we want to sample every 0.16 seconds but only relative to the starting time. So if we reset the starting time to be 0, then we can just use 0:0.16:(end time - start time) as our sampling vector. The easiest way to make the start time 0 is to simply subtract the start time from the whole time vector, hence t = t - t(1). This also has the bonus effect of make t(end) equal the end time minus the start time.

Removing some unwanted elements from the vector

I have a vector that consists of some numbers in the following way:
A = [153 244 253 353 453 530 653 ...]
The pattern is that there is always 153,253,353,...,2353 (these represent time i.e. 1:53am,...11:53pm) for a day. In between these *53 numbers there are some numbers that I don't wish to keep them. For example between 353 and 453, a 433 appears which needs to be removed from the vector. So the final result I wish to get is vector
A = [153 253 353 ...2353]
(of course in the original vector I have, this pattern for one day is repeated for a whole year).
Any thought on how to do this?
I would really appreciate any answer.
An alternative (and possibly faster) answer to Oleg is to use the modulus operator:
A((mod(A,100))==53)
Keep only the 53'' of each hour:
idx = ismember(A,53:100:2353);
A(idx)