Getting started with LibSVM for a particular case - neural-network

I have read quite a lot about the LibSVM library, but I would like to ask you for some advice on my particular case. The problem is that I have some 3D medical images (DCE-MRI) of a stomach. My goal is to perform a segmentation of a kidney and find its three parts. Therefore, I need to train a classifier - I'm going to use an SVM and a neural network.
Feature vectors:
What is available is the pixel (voxel) brightness value (I guess the value range is [0; 511]). In total, there are 71 frames, taken one second apart. So the crucial feature of every voxel is how its brightness/intensity changes during the examination time. In my case, every part of the kidney has a different curve (see an example below), so the way the voxel brightness changes over time is what the classifier will use.
Training sets:
Every training sample is a vector of the intensity values of one voxel (74 numbers). An example is presented below:
[22 29 21 7 19 12 23 25 33 28 25 5 21 18 27 21 11 11 26 12 12 31 15 15 12 29 17 34 30 11 12 24 35 28 27 26 29 22 15 23 24 14 14 37 241 313 350 349 382 402 333 344 332 366 339 383 383 379 394 398 402 357 346 379 365 376 366 365 360 363 376 383 389 385]
Summary and question to you:
I have many training samples, each consisting of 74 values from the range [0; 511]. I have 3 groups of voxels, each with a characteristic feature - the brightness changes in a similar way within a group. What I want to obtain is a classifier which, given one voxel vector of 74 numbers, will decide whether the voxel belongs to one of these 3 groups or to none of them.
Question: how do I start with LibSVM - any advice? What I know so far is that I should scale the input values to the range [0; 1] or [-1; 1]. I have many training samples prepared, each belonging to one of these 3 groups. I will be grateful for any advice, as I'm a newbie and I just need some tips to get started.

You can train and use your model like this:
model = svmtrain(train_label, train_feature, '-c 1 -g 0.07 -h 0');
% the parameters can be modified
[label, accuracy, probability] = svmpredict(test_label, test_feature, model);
train_label must be a vector; if there are more than two classes (not just 0/1), a multi-class SVM is trained automatically. If you have 3 classes, you can label them with {1, 2, 3}. Its length must equal the number of samples.
The features are not restricted - they can be whatever you want.
However, you should preprocess them to get better results. For example, you can scale the range [0, 511] to [0, 1], or subtract the mean of the feature.
Note that the test set must be preprocessed in the same way as the training set.
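A minimal scaling sketch, assuming the features are stored in matrices called train_feature and test_feature (one voxel, i.e. 74 values, per row; those names are just placeholders matching the code above):
lo = 0;  hi = 511;                                 % known intensity range (or use min/max of the training data)
train_feature = (train_feature - lo) / (hi - lo);  % scaled to [0, 1]
test_feature  = (test_feature  - lo) / (hi - lo);  % apply the SAME transform to the test set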
Hope this will help you!

Related

Stopping criteria for fminsearch in Matlab

I am using fminsearch to fit parameters for a system of DEs to observed data. I am not expecting to get a great fit.
fminsearch pretty quickly finds what appears to be an acceptable min for the objective function, but then does not stop. It's running for a really long time, and I cannot figure out why.
I am using the options
options = optimset('Display','iter','TolFun',1e-4,'TolX',1e-4,'MaxFunEvals',1000);
which I understood to mean that once the value of the objective function drops below 1e-4 the fit would be considered sufficient, or alternatively, once the parameters could no longer be improved, whatever was best at that point would be returned.
The output is
 Iteration   Func-count     min f(x)      Procedure
     0            1       8.13911e+10
     1            8       7.2565e+10      initial simplex
     2            9       7.2565e+10      reflect
     3           10       7.2565e+10      reflect
     4           11       7.2565e+10      reflect
     5           12       7.2565e+10      reflect
     6           13       7.2565e+10      reflect
     7           15       6.85149e+10     expand
     8           16       6.85149e+10     reflect
     9           17       6.85149e+10     reflect
    10           19       6.20681e+10     expand
    11           20       6.20681e+10     reflect
    12           22       5.55199e+10     expand
    13           23       5.55199e+10     reflect
    14           25       4.86494e+10     expand
    15           26       4.86494e+10     reflect
    16           27       4.86494e+10     reflect
    17           29       3.65616e+10     expand
    18           30       3.65616e+10     reflect
    19           31       3.65616e+10     reflect
    20           33       2.82946e+10     expand
    21           34       2.82946e+10     reflect
    22           36       2.02985e+10     expand
    23           37       2.02985e+10     reflect
    24           39       1.20011e+10     expand
    25           40       1.20011e+10     reflect
    26           41       1.20011e+10     reflect
    27           43       5.61651e+09     expand
    28           44       5.61651e+09     reflect
    29           45       5.61651e+09     reflect
    30           47       2.1041e+09      expand
    31           48       2.1041e+09      reflect
    32           49       2.1041e+09      reflect
    33           51       5.15751e+08     expand
    34           52       5.15751e+08     reflect
    35           53       5.15751e+08     reflect
    36           55       7.99868e-05     expand
    37           56       7.99868e-05     reflect
    38           58       7.99835e-05     reflect
    39           59       7.99835e-05     reflect
I have previously let this run for a lot longer and it stayed stuck at the same min f(x) for at least the next 30 printouts.
How do I set the options correctly so that it stops once it finds a solution with an acceptable value of the objective function?
MATLAB requires that both TolX and TolFun be satisfied before terminating ("Unlike other solvers, fminsearch stops when it satisfies both TolFun and TolX." See: https://www.mathworks.com/help/matlab/ref/fminsearch.html). You should check what the "x" value (your solution) is doing. I suspect it is changing by more than your tolerance at each step (i.e. the value of x changes by more than TolX between iterations, but f(x) does not change by more than TolFun).
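If you really do want to stop as soon as f(x) drops below an acceptable value, one option is an output function that tells fminsearch to stop. This is only a sketch: the threshold, the objective function and the starting point are placeholders.
fTol = 1e-4;                                                  % acceptance threshold (placeholder value)
stopFun = @(x, optimValues, state) optimValues.fval < fTol;   % returning true stops the solver
options = optimset('Display','iter', 'TolFun',1e-4, 'TolX',1e-4, ...
                   'MaxFunEvals',1000, 'OutputFcn', stopFun);
% [p, fval] = fminsearch(@yourObjective, p0, options);        % yourObjective and p0 are placeholders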

CPU and Memory Friendly Solution to Merge Large Matrix

For the following typical case:
n = 1000000;
r = randi(n,n,2);
(assume there are 0.05% common numbers between all rows; n could be even tens of millions)
I am looking for a CPU- and memory-efficient solution to merge rows based on any common items (here, integer numbers). A list of sample codes in Python is available here, and a quick attempt to translate one of them into Matlab can be found here.
In my attempts they take ages (minutes to hours), so I am looking for a faster solution.
For the above example, the typical output should look like this (a cell array):
{
[1 90 34 67 ... 9]
[35 89]
[45000 23 828 130 8999 45326 ... 11]
...
}
Note also that I have tried to compile this as a MEX function but failed, because MATLAB Coder does not support cell arrays.
Edit: A tiny demonstration example
%---------------------------------------
clc
n = 100;
r = randi(n,n,2); % random integers in [1,n], size(n,2)
%---------------------------------------
>> r
r =
82 17 % (1) 82 17
91 13 % (2) 91 13
13 32 % (3) 91 13 32 merged with (2), common 13
82 53 % (4) 82 17 53 merged with (1), common 82
64 17 % (5) 82 17 53 64 merged with (4), common 17
...
94 45
13 31 % (77) 91 13 32 31 merged with (3), common 13
57 51
47 52
2 13 % (80) 91 13 32 31 2 merged with (77), common 13
34 80
%---------------------------------------
c = merge(r); % cpu and memory friendly solution is searched for.
%---------------------------------------
c =
[82 17 53 64]
[91 13 32 31 2]
...
You need an index.
In Python, use a dict. In MATLAB - I'd not use MATLAB, because open-source is the future, and MATLAB is dying out.
But Python is quite slow. You can likely get a 10x speedup by using e.g. Cython to translate and optimize the code in C. Avoid Python data types such as a list of ints, because they are very memory intensive. numpy has memory-efficient arrays of integers.
If you get a new pair (a,b) you can use this dictionary to find existing items to merge. Then update the dict after the merge.
Actually for integers, you should use an array instead of a dict.
The trickiest part is handling the case when both a and b already exist but belong to two different large groups. There are some neat optimizations possible here if that isn't fast enough yet.
It's not clustering, but connected components.
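In MATLAB, one way to get those connected components directly is via the graph/conncomp functions (a sketch, assuming a release that has them; it builds an undirected graph on the integers 1..n with one edge per row of r):
n = 1000000;
r = randi(n, n, 2);
A = sparse(r(:,1), r(:,2), 1, n, n);   % duplicate pairs simply add up
G = graph(A | A.');                    % undirected graph on the integers 1..n
bins = conncomp(G);                    % connected-component label of every integer
used = unique(r(:));                   % the integers that actually appear in r
grp  = bins(used);                     % their component labels
c    = accumarray(grp(:), used(:), [], @(x) {x.'});   % collect the members of each component
c    = c(~cellfun(@isempty, c));       % drop labels with no occurring members
Each cell of c is then one merged group of numbers, matching the expected output format above.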

How to display all x-labels on 'bar' plot?

I have the following data that I wish to plot in a bar graph in MatLab:
publications = [15 12 35 12 19 14 21 15 7 16 40 28 6 13 16 6 7 22 23 16 45];
bar(publications,0.4)
set(gca,'XTickLabel',{'G1','G2','G3','G4','G5','G6','G7','G8','G9','G10',...
'G11','G12','G14','G16','G17','G18','G19','G20','G21','G22','G23'})
However, when I execute this, I get the following plot:
Obviously the x-label is incorrect here as the first bar should have the x-label 'G1', the second should have 'G2', etc, until we get to the last bar which is supposed to have 'G23'.
If anyone knows how I can fix this, I would really, really appreciate it!
Add the following line:
set(gca,'XTick',1:numel(publications))
before you set the labels.
Now it depends on how big your resulting plot is, because the labels are a little packed.
You may adjust the font size, the label orientation, or the gaps between the bars.
The publication names are probably a little longer, so a 90° rotation is best; you may find this answer or this link helpful.
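For example, a small sketch using the XTickLabelRotation axes property (available in R2014b and newer; the tick labels themselves are set as above):
set(gca, 'XTick', 1:numel(publications), 'XTickLabelRotation', 90)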
Another suggestion would be to use barh and rotate the figure after printing:
publications = [15 12 35 12 19 14 21 15 7 16 40 28 6 13 16 6 7 22 23 16 45];
bh = barh(publications,0.4);
set(gca,'XAxisLocation','top')
set(gca,'YTick',1:numel(publications))
set(gca,'YTickLabel',{'G1','G2','G3','G4','G5','G6','G7','G8','G9','G10',...
'G11','G12','G14','G16','G17','G18','G19','G20','G21','G22','G23'})

irregular time series data interpolation

I'm a newbie in Matlab. After using a specific application, I get a file which contains acceleration data recorded roughly every 160 ms.
16 25 50 32 234 199 6
16 25 50 192 240 196 3
16 25 50 352 236 199 8
16 25 50 512 238 198 7
16 25 50 671 242 195 11
16 25 50 832 237 198 9
As you can see, the interval varies around 160 ms; it is not fixed.
The first 4 columns form the timestamp and the remaining columns are the acceleration data.
The sample rate is not constant, so my goal is to get acceleration data at exactly every 160 ms.
I was thinking of resampling the acceleration data by interpolation.
First, I convert my timestamps to seconds:
s=data(:,3)+data(:,4)/1000; % convert to seconds+fractions
dt=diff(datenum(2013,1,1,data(:,1),data(:,2),s))*86400;
t= cumsum(diff(datenum(2014,06,09,data(:,1),data(:,2),s))*86400);
sample = interp1(t,data(:,5:end),[0:160:t(end)]);
Is that correct?
Thanks in advance.
I'm not sure if this is what you're already doing with all that diff/cumsum stuff, but I would make t start at 0:
t = datenum(2013,1,1,data(:,1),data(:,2),s)*(24*60*60);
t = t-t(1);
sample = interp1(t,data(:,5:end), 0:0.16:t(end));
The idea here is that we know we want to sample every 0.16 seconds, but only relative to the starting time. So if we reset the starting time to be 0, then we can just use 0:0.16:(end time - start time) as our sampling vector. The easiest way to make the start time 0 is to simply subtract the start time from the whole time vector, hence t = t - t(1). This also has the bonus effect of making t(end) equal to the end time minus the start time.
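Putting the pieces together (a sketch, assuming the four time columns are hour, minute, second and millisecond, as in your own conversion):
s = data(:,3) + data(:,4)/1000;                           % seconds + milliseconds
t = datenum(2013,1,1, data(:,1), data(:,2), s) * 86400;   % absolute time in seconds
t = t - t(1);                                             % make the time vector start at 0
tq = 0:0.16:t(end);                                       % query points every 160 ms
sample = interp1(t, data(:,5:end), tq);                   % linearly interpolate each acceleration column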

Rearrange distribution function Matlab

I have the following data representing values over a 12 month period:
1. 0
2. 253
3. 168
4. 323
5. 556
6. 470
7. 225
8. 445
9. 98
10. 114
11. 381
12. 187
How can I smooth this line forward?
What I need is that, going through the list sequentially, any value above the mean (268) is distributed among the remaining months in such a way that it produces as smooth a line as possible. I need to go through the months from Jan to Dec in order. Looking forward, I want to sweep any excess (peaks) into the months still to come so that the distribution is as even as possible (the troughs are filled first). So the issue is, at each point, to determine what the "excess" for that month is, and secondly how to distribute it among the months still to come.
I have used
p = find(Diff>0);
n = find(Diff<=0);
POS = Diff(p,1);
NEG = Diff(n,1)
to see where shortfalls/excesses against the mean exist, but I'm unsure how to construct code that will redistribute forward by allocating to the "troughs" of the distribution first. An analogy is that these numbers represent harvest quantities and I must give out the harvest to the population. How do I redistribute the incoming harvest over the year so that I minimise over- and under-supply? I obviously cannot give out anything I haven't received in a particular month, unless I have stored some harvest from previous months.
E.g. I start in Jan; I see that I cannot give anything to the months Feb to Dec, so the value for Jan is 0. In Feb I have 253 - do I adjust 253 downwards or give it all out? If so, by how much, and where do I redistribute the excess I trim among Mar to Dec? And so on and so forth. How do I do this to give as smooth (even) a distribution as possible?
For any month, the new value assigned to that month cannot exceed the current value. The sum over the 12 months must be the same before and after smoothing. As the first position, January will always be 0.
Simple version: it just loops through and, if the next month is lower than the current month, passes value forward to equalise them.
for n = 1:11
    if y(n) > y(n+1)
        y(n:n+1) = (y(n)+y(n+1))/2;   % split the pair evenly between the two months
    end
end
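A slightly more literal sketch of the carry-forward idea described in the question, under one interpretation (an assumption on my part): each month hands out at most the remaining total divided by the number of remaining months, and stores the rest for later, so the total is preserved and January stays 0.
y = [0 253 168 323 556 470 225 445 98 114 381 187];
out = zeros(size(y));
carry = 0;                                 % harvest stored from earlier months
for k = 1:numel(y)
    avail  = y(k) + carry;                 % what can be handed out this month
    target = (sum(y(k:end)) + carry) / (numel(y) - k + 1);   % level that would flatten the rest
    out(k) = min(avail, target);           % never hand out more than is on hand
    carry  = avail - out(k);               % store the excess for the months to come
end
% sum(out) equals sum(y); for this data the result levels off at 311 from April onwards.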
It's not very clear to me what you're asking... It sounds a bit like a roundabout way of asking how to fit a straight line to data. If that is the case, see below. Otherwise, please clarify a bit more what you want: provide a toy example with input data and the expected output data.
y = [ 0 253 168 323 556 470 225 445 98 114 381 187 ].';
x = (0:numel(y)-1).';
A = [ones(size(x)) x];
plot(...
x, y, 'b.',...
x, A*(A\y), 'r')
xlabel('Month'), ylabel('Data')
legend('original data', 'fit')
I don't get exactly what you want either; maybe something simple like this?
year= [0 253 168 323 556 470 225 445 98 114 381 187];
m= mean(year);
total_before = sum(year)
linear_year = linspace(0,m*2,12);
total_after = sum(linear_year)
This gives you a line; the sum stays the same and the line is perfectly smooth ...