Java non-negative multiple linear regression library

I am working on a Java project and have to compute a multiple linear regression, but I want the estimated parameters to be non-negative. Is there an existing library with a commercially friendly license that can do this? I've been looking for non-negative least squares (NNLS) libraries, without success.

Well, I could not find any pure Java library, so I built it myself from the algorithm in [1], which is also described in [2] and [3]. Here is the algorithm:
P and R are the passive and active sets, t() is the transpose, and A has m columns (one per unknown).
The problem is to solve Ax = b in the least-squares sense, i.e. minimize ||Ax - b||, under the condition x >= 0.
P = {}
R = {1, 2, ..., m}
x = 0
w = t(A)*(b - A*x)
while R is not empty and max{w_i | i in R} > 0 do:
    j = argmax{w_i | i in R}
    P = P U {j}
    R = R \ {j}
    s[P] = invert[t(A[P])*A[P]]*t(A[P])*b
    s[R] = 0
    while min{s_i | i in P} <= 0 do:
        a = min{x_i/(x_i - s_i) | i in P and s_i <= 0}
        x = x + a*(s - x)
        move every i in P with x_i = 0 from P to R
        s[P] = invert[t(A[P])*A[P]]*t(A[P])*b
        s[R] = 0
    x = s
    w = t(A)*(b - A*x)
return x
(In floating point, the comparisons with 0 are done against a small tolerance.)
For the remaining definitions, I strongly advise reading papers [2] and [3], which are available online (links below).
[1] Lawson, C. L., & Hanson, R. J. (1974). Solving Least Squares Problems (Vol. 161). Englewood Cliffs, NJ: Prentice-Hall.
[2] Bro, R., & De Jong, S. (1997). A fast non-negativity-constrained least squares algorithm. Journal of Chemometrics, 11(5), 393–401. http://www.researchgate.net/publication/230554373_A_fast_non-negativity-constrained_least_squares_algorithm/file/79e41501a40da0224e.pdf
[3] Chen, D., & Plemmons, R. J. (2009). Nonnegativity constraints in numerical analysis. In Symposium on the Birth of Numerical Analysis (pp. 109–140). http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.157.9203&rep=rep1&type=pdf
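For readers who want to try the Lawson-Hanson active-set algorithm above, here is a minimal plain-Java sketch of the same loop. It is not the author's code: the class name `Nnls`, the tolerance `1e-10` and the `30 * m` iteration cap are assumptions. The inner subproblem is solved through the normal equations (as in the pseudocode) with Gaussian elimination, so it assumes every `A[P]` has full column rank; a QR-based solve would be more robust for ill-conditioned data.

```java
/**
 * Minimal Lawson-Hanson NNLS sketch: minimise ||A x - b||_2 subject to x >= 0.
 * Assumes a dense row-major A (rows x m) and that every passive sub-matrix A[P]
 * has full column rank, since the subproblem uses the normal equations.
 */
final class Nnls {

    static double[] solve(double[][] a, double[] b) {
        return solve(a, b, 1e-10, 30 * a[0].length);
    }

    static double[] solve(double[][] a, double[] b, double tol, int maxIter) {
        int n = a[0].length;
        double[] x = new double[n];
        boolean[] passive = new boolean[n];          // true: index in P, false: index in R
        double[] w = gradient(a, b, x);              // w = t(A) (b - A x)
        for (int iter = 0; iter < maxIter; iter++) {
            int j = -1;                              // j = argmax { w_i : i in R, w_i > tol }
            for (int i = 0; i < n; i++) {
                if (!passive[i] && w[i] > tol && (j < 0 || w[i] > w[j])) j = i;
            }
            if (j < 0) break;                        // no positive w_i left in R: optimal
            passive[j] = true;
            double[] s = solvePassive(a, b, passive);
            while (true) {                           // inner loop while some s_i <= 0 on P
                int k = -1;
                double alpha = Double.POSITIVE_INFINITY;
                for (int i = 0; i < n; i++) {
                    if (!passive[i] || s[i] > 0) continue;
                    double denom = x[i] - s[i];      // >= 0 because x_i >= 0 >= s_i
                    double ratio = denom > 0 ? x[i] / denom : 0.0;
                    if (ratio < alpha) { alpha = ratio; k = i; }
                }
                if (k < 0) break;                    // all s_i > 0 on P
                for (int i = 0; i < n; i++) x[i] += alpha * (s[i] - x[i]);
                x[k] = 0.0;                          // the blocking variable lands on zero
                passive[k] = false;
                for (int i = 0; i < n; i++) {        // move any other zeros from P to R
                    if (passive[i] && x[i] <= tol) { passive[i] = false; x[i] = 0.0; }
                }
                s = solvePassive(a, b, passive);
            }
            x = s;
            w = gradient(a, b, x);
        }
        return x;
    }

    /** w = t(A) (b - A x); package-private so callers can check the KKT conditions. */
    static double[] gradient(double[][] a, double[] b, double[] x) {
        double[] w = new double[x.length];
        for (int r = 0; r < a.length; r++) {
            double res = b[r];
            for (int c = 0; c < x.length; c++) res -= a[r][c] * x[c];
            for (int c = 0; c < x.length; c++) w[c] += a[r][c] * res;
        }
        return w;
    }

    /** Unconstrained least squares on the columns in P; entries in R stay 0. */
    private static double[] solvePassive(double[][] a, double[] b, boolean[] passive) {
        int n = passive.length, k = 0;
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) if (passive[i]) idx[k++] = i;
        double[][] g = new double[k][k + 1];         // augmented [t(A_P) A_P | t(A_P) b]
        for (int r = 0; r < a.length; r++) {
            for (int p = 0; p < k; p++) {
                double ap = a[r][idx[p]];
                for (int q = 0; q < k; q++) g[p][q] += ap * a[r][idx[q]];
                g[p][k] += ap * b[r];
            }
        }
        for (int col = 0; col < k; col++) {          // Gaussian elimination, partial pivoting
            int piv = col;
            for (int r = col + 1; r < k; r++) if (Math.abs(g[r][col]) > Math.abs(g[piv][col])) piv = r;
            double[] tmp = g[col]; g[col] = g[piv]; g[piv] = tmp;
            for (int r = col + 1; r < k; r++) {
                double f = g[r][col] / g[col][col];
                for (int c = col; c <= k; c++) g[r][c] -= f * g[col][c];
            }
        }
        double[] s = new double[n];                  // back substitution
        for (int p = k - 1; p >= 0; p--) {
            double sum = g[p][k];
            for (int q = p + 1; q < k; q++) sum -= g[p][q] * s[idx[q]];
            s[idx[p]] = sum / g[p][p];
        }
        return s;
    }
}
```

For example, `Nnls.solve(new double[][] {{1, 0}, {1, 1}, {1, 2}}, new double[] {3, 2, 1})` returns `[2.0, 0.0]`: the unconstrained fit would have intercept 3 and slope -1, so the slope is held at 0 and the intercept becomes the mean of b.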

You can use Apache Commons Math and turn your constraints into an additional cost (penalty) term in the objective function. See section 14.4 here: http://commons.apache.org/proper/commons-math/userguide/leastsquares.html
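To illustrate the penalty idea without tying it to a particular Commons Math optimizer, here is a plain-Java sketch. It is not the Commons Math API: the class name, the penalty weight `mu` and the iteration count are assumptions. It adds `0.5*mu*min(0, x_i)^2` to the least-squares cost and minimizes the sum by gradient descent with step `1/(||A||_F^2 + mu)`. Unlike NNLS, this only approximately enforces x >= 0.

```java
/**
 * Hypothetical penalty sketch: minimise 0.5*||A x - b||^2 + 0.5*mu*sum_i min(0, x_i)^2
 * by plain gradient descent. Larger mu pushes negative coefficients closer to zero,
 * but never exactly to zero.
 */
final class PenaltyLeastSquares {

    static double[] solve(double[][] a, double[] b, double mu, int iterations) {
        int m = a.length, n = a[0].length;
        double lipschitz = mu;                         // ||A||_F^2 + mu bounds the Hessian norm
        for (double[] row : a) for (double v : row) lipschitz += v * v;
        double step = 1.0 / lipschitz;
        double[] x = new double[n];
        for (int it = 0; it < iterations; it++) {
            double[] grad = new double[n];
            for (int r = 0; r < m; r++) {
                double res = -b[r];
                for (int c = 0; c < n; c++) res += a[r][c] * x[c];    // (A x - b)_r
                for (int c = 0; c < n; c++) grad[c] += a[r][c] * res; // t(A) (A x - b)
            }
            for (int c = 0; c < n; c++) {
                grad[c] += mu * Math.min(0.0, x[c]);   // penalty gradient, 0 when x_c >= 0
                x[c] -= step * grad[c];
            }
        }
        return x;
    }
}
```

On A = {{1, 0}, {1, 1}, {1, 2}}, b = {3, 2, 1} with mu = 1e4 and 500,000 iterations, the slope converges to -2/(2 + mu), about -2e-4, rather than exactly 0. That is the usual trade-off of a penalty formulation compared with an exact NNLS solver.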

Have you tried Weka? It is written in Java and released under the GNU General Public License. It is mainly a GUI tool for experiments, but you can also use it as a library, and it includes linear regression implementations.

As George Foreman pointed out, you can use Apache Commons Math.
In particular, the class OLSMultipleLinearRegression provides methods for ordinary least-squares multiple regression analysis (note that it does not constrain the coefficients to be non-negative).
Here is some code showing how to load the sample data:
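import org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression;

public class OlsExample {
    public static void main(String[] args) {
        OLSMultipleLinearRegression ols = new OLSMultipleLinearRegression();
        int numberOfObservations = 25;
        int numberOfIndependentVariables = 3;
        // Each observation is stored as y followed by its 3 regressors, so the
        // array needs 25 * (3 + 1) = 100 entries; fill it with real data first.
        double[] data = new double[numberOfObservations * (numberOfIndependentVariables + 1)];
        try {
            ols.newSampleData(data, numberOfObservations, numberOfIndependentVariables);
        } catch (IllegalArgumentException e) {
            e.printStackTrace();
            return;
        }
    }
}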
And here is the whole GitHub project, from which you can download a working example of multiple regression in Java: https://github.com/tekrar/MultRegressionInJava

Related

Has anyone used Dijkstra's algorithm in OPL?

I have a model for a mining problem, and I am extending it to use the shortest path inside an open-pit mine for hauling ore and waste. For this, I was thinking of Dijkstra's algorithm, but I could not find any example of its use in OPL. Has anyone done this before? Can you share some ideas, please?
If you need to write Dijkstra's algorithm itself, then Daniel is right and you'd rather use the scripting part. If you need a shortest path within an existing OPL model, you could use the following shortest-path example:
.mod
tuple edge
{
key int o;
key int d;
int weight;
}
{edge} edges=...;
{int} nodes={i.o | i in edges} union {i.d | i in edges};
int st=1; // start
int en=8; // end
dvar int obj; // distance
dvar boolean x[edges]; // do we use that edge ?
minimize obj;
subject to
{
obj==sum(e in edges) x[e]*e.weight;
forall(i in nodes)
sum(e in edges:e.o==i) x[e]
-sum(e in edges:e.d==i) x[e]
==
((i==st)?1:((i==en)?(-1):0));
}
{edge} shortestPath={e | e in edges : x[e]==1};
execute
{
writeln(shortestPath);
}
.dat
edges=
{
<1,2,9>,
<1,3,9>,
<1,4,8>,
<1,10,18>,
<2,3,3>,
<2,6,6>,
<3,4,9>,
<3,5,2>,
<3,6,2>,
<4,5,8>,
<4,7,7>,
<4,9,9>,
<4,10,10>,
<5,6,2>,
<5,7,9>,
<6,7,9>,
<7,8,4>,
<7,9,5>,
<8,9,1>,
<8,10,4>,
<9,10,3>,
};
which gives
// solution (optimal) with objective 19
{<1 4 8> <4 7 7> <7 8 4>}
If you have a problem that can be solved using Dijkstra's algorithm then it seems a bit of overkill to use OPL or CPLEX to solve it. You could code up the algorithm in any programming language and use it from there. I guess that is why you don't find any examples.
If you still want to implement it in OPL, then use a scripting (execute) block or a main block. The scripting code you can provide there is a superset of JavaScript, so you can implement Dijkstra's algorithm in JavaScript and put it there.
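Following the suggestion to code the algorithm in a general-purpose language, here is a minimal Java sketch of textbook Dijkstra (binary heap, non-negative weights) on the same directed edge list as the .dat file above. The class and method names are hypothetical, and feeding the result back into an OPL model is not shown.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Textbook Dijkstra on a directed graph given as {origin, destination, weight} triples. */
final class ShortestPath {

    /** Returns the nodes of a shortest path from start to end, or an empty list if none. */
    static List<Integer> dijkstra(int[][] edges, int start, int end) {
        Map<Integer, List<int[]>> out = new HashMap<>();
        for (int[] e : edges) out.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e);
        Map<Integer, Integer> dist = new HashMap<>();   // best known distance from start
        Map<Integer, Integer> prev = new HashMap<>();   // predecessor on that best path
        PriorityQueue<int[]> queue = new PriorityQueue<>(Comparator.comparingInt((int[] q) -> q[1]));
        dist.put(start, 0);
        queue.add(new int[] {start, 0});
        while (!queue.isEmpty()) {
            int[] cur = queue.poll();
            int u = cur[0], d = cur[1];
            if (d > dist.get(u)) continue;              // stale entry, u already settled
            if (u == end) break;                        // end settled: its distance is final
            for (int[] e : out.getOrDefault(u, List.of())) {
                int nd = d + e[2];
                if (nd < dist.getOrDefault(e[1], Integer.MAX_VALUE)) {
                    dist.put(e[1], nd);
                    prev.put(e[1], u);
                    queue.add(new int[] {e[1], nd});
                }
            }
        }
        LinkedList<Integer> path = new LinkedList<>();
        if (!dist.containsKey(end)) return path;
        for (Integer v = end; v != null; v = prev.get(v)) path.addFirst(v);
        return path;
    }
}
```

With the edges from the .dat file, `ShortestPath.dijkstra(edges, 1, 8)` returns `[1, 4, 7, 8]`, total weight 8 + 7 + 4 = 19, which matches the OPL solution above.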

How to forecast electricity consumption using MATLAB's command "forecast"?

I have electricity consumption data for a region during the year 2017, so I have two row vectors: one with the months and the other with the consumption. I want to use the forecast command to forecast the consumption for the first month of 2018, but I don't know how to do this, even after reading the examples on MATLAB's help page.
Example:
data = {1166974.25000000, 1132479.36000000, 1137173.86000000, 1145853.58000000, 1118875.72000000, 1071456.85000000 ,1047171.87000000, 1071179.65000000 ,1077986.32000000 ,1112111.10000000, 1149668.47000000 ,1161649.19000000, 1175576.25000000 ,1126753.31000000 ,1204843.11000000 ,1183946.03000000, 1153080.36000000, 1120182.07000000, 1104726.03000000 ,1108110.02000000 ,1137729.28000000 ,1189699.45000000, 1252975.55000000, 1218118.20000000 ,1259580 ,1208193 ,1194430, 1244458, 1218867, 1205705 ,1177362, 1185584, 1164758, 1226991 ,1286044 ,1305312, 1360681.70000000 ,1332020 ,1306497.90000000 ,1299819.10000000 ,1316167.70000000 ,1246959.40000000 ,1256700.20000000 ,1266490.60000000, 1275642.90000000, 1358839.80000000, 1361440.10000000, 1398059.40000000};
data = [data{:}];
sys = ar(data,4)
K = 49;
p = forecast(sys,data,K);
plot(data,'b',p,'r'), legend('measured','forecasted')
Why does this not work?
I hope you found a solution to your problem. If you have not, maybe I can be of assistance.
MathWorks' documentation of the function notes that the PastData input (named data in your code) can be either an iddata object or an N-by-Ny matrix of doubles. Your implementation uses a matrix, so I decided to try out the code with an iddata object.
rawdat = [1166974.25000000, 1132479.36000000, 1137173.86000000, 1145853.58000000, 1118875.72000000, 1071456.85000000 ,1047171.87000000, 1071179.65000000 ,1077986.32000000 ,1112111.10000000, 1149668.47000000 ,1161649.19000000, 1175576.25000000 ,1126753.31000000 ,1204843.11000000 ,1183946.03000000, 1153080.36000000, 1120182.07000000, 1104726.03000000 ,1108110.02000000 ,1137729.28000000 ,1189699.45000000, 1252975.55000000, 1218118.20000000 ,1259580 ,1208193 ,1194430, 1244458, 1218867, 1205705 ,1177362, 1185584, 1164758, 1226991 ,1286044 ,1305312, 1360681.70000000 ,1332020 ,1306497.90000000 ,1299819.10000000 ,1316167.70000000 ,1246959.40000000 ,1256700.20000000 ,1266490.60000000, 1275642.90000000, 1358839.80000000, 1361440.10000000, 1398059.40000000];
data = iddata(rawdat',[]);
sys = ar(data,4);
K = 49;
p = forecast(sys,data,K);
plot(data,'b',p,'r'), legend('measured','forecasted')
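Notice that I also changed the initial data's variable name and type, and that rawdat' transposes the row vector into a column, since each row is interpreted as one time sample.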
The above code leads to the following figure.
Please update us. Thanks.
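As background on what forecast does with an AR(4) model, here is a hypothetical plain-Java sketch, not MATLAB's implementation. It fits the AR coefficients by ordinary least squares without a constant term (not necessarily the estimation method MATLAB's ar uses by default) and then forecasts K steps by feeding each prediction back into the recursion. The class and method names are illustrative.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of an AR(p) forecast: fit
 * y_t = a_1 y_{t-1} + ... + a_p y_{t-p} by ordinary least squares (no constant),
 * then run the recursion forward, feeding each prediction back in.
 */
final class ArForecast {

    static double[] fit(double[] y, int p) {
        double[][] g = new double[p][p + 1];         // augmented normal equations
        for (int t = p; t < y.length; t++) {
            for (int i = 0; i < p; i++) {
                for (int j = 0; j < p; j++) g[i][j] += y[t - 1 - i] * y[t - 1 - j];
                g[i][p] += y[t - 1 - i] * y[t];
            }
        }
        for (int c = 0; c < p; c++) {                // Gaussian elimination, partial pivoting
            int piv = c;
            for (int r = c + 1; r < p; r++) if (Math.abs(g[r][c]) > Math.abs(g[piv][c])) piv = r;
            double[] tmp = g[c]; g[c] = g[piv]; g[piv] = tmp;
            for (int r = c + 1; r < p; r++) {
                double f = g[r][c] / g[c][c];
                for (int k = c; k <= p; k++) g[r][k] -= f * g[c][k];
            }
        }
        double[] a = new double[p];
        for (int i = p - 1; i >= 0; i--) {           // back substitution
            double s = g[i][p];
            for (int j = i + 1; j < p; j++) s -= g[i][j] * a[j];
            a[i] = s / g[i][i];
        }
        return a;
    }

    /** Forecasts k steps past the end of y with the fitted coefficients a. */
    static double[] forecast(double[] y, double[] a, int k) {
        int n = y.length;
        double[] ext = Arrays.copyOf(y, n + k);      // padded with zeros for the forecasts
        for (int t = n; t < n + k; t++) {
            for (int i = 0; i < a.length; i++) ext[t] += a[i] * ext[t - 1 - i];
        }
        return Arrays.copyOfRange(ext, n, n + k);
    }
}
```

With the consumption series in a double[] and p = 4, `forecast(y, fit(y, 4), 49)` mirrors the shape of the MATLAB call above (order 4, horizon K = 49), although the coefficients will generally differ from MATLAB's.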

AR terms in SUR models - Matlab

I am trying to estimate a SUR model of the form
y_{1,t} = \alpha_1 +\beta_1 x_{1,t} + \beta_2 x_{2,t} + \beta_3 y_{1,t-1} +\epsilon_{1,t}
y_{2,t} = \alpha_2 +\beta_4 x_{1,t} + \beta_5 x_{2,t} + \beta_6 y_{2,t-1} +\epsilon_{2,t}
Define mY = [y_1 y_2] and mX = [x_1 x_2].
For this purpose I am doing
iT = size(mY,1); iN = size(mY,2);
mXsur = kron(mX, eye(iN));
mXsurCell = mat2cell(mXsur, iN*ones(iT,1));
iR = size(mXsur,2);
Mdl = vgxset('n', iN, 'nAR',1, 'nX',iR,'Constant',true);
[SurOutput, SurSDerror, ~,SURcov] = vgxvarx(Mdl, mY, mXsurCell);
The issue is that 'nAR', 1 seems to add one lag of both y variables to each equation, whereas I only want to add one lag per equation (its own). Is there a quick way to do this?
(Of course, I can include the lagged terms manually in the mX matrix, but my question is whether this can be done more quickly via vgxset. Based on my reading of the help file I think not, but I still want to double-check.) Thanks

How to select the last column of numbers from a table created by FoldList in Mathematica

I am new to Mathematica and I am having difficulties with one thing. I have a Table that generates 10,000 lists of 13 numbers (a starting number plus 12 further numbers). I need to create a histogram of all 10,000 13th numbers. I hope this is clear; it is quite tricky to explain.
This is the table:
F = Table[(Xi = RandomVariate[NormalDistribution[], 12];
Mu = -0.00644131;
Sigma = 0.0562005;
t = 1/12; s = 0.6416;
FoldList[(#1*Exp[(Mu - Sigma^2/2)*t + Sigma*Sqrt[t]*#2]) &, s,
Xi]), {SeedRandom[2]; 10000}]
For the histogram, I would like a table that collects all the 13th numbers; then it would be easy to create a histogram. Maybe with Select? Or maybe you know other ways to solve this.
You can access different parts of a list using Part or (depending on what parts you need) some of the more specialised commands, such as First, Rest, Most and (the one you need) Last. As noted in comments, Histogram[Last /@ F] or Histogram[F[[All, -1]]] will work fine.
Although it wasn't part of your question, I would like to note some things you could do for your specific problem that will speed it up enormously. You are defining Mu, Sigma etc. 10,000 times, because they are inside the Table command. You are also recalculating (Mu - Sigma^2/2)*t and Sigma*Sqrt[t] 120,000 times, even though they are constants, because they are inside the FoldList inside the Table.
On my machine:
F = Table[(Xi = RandomVariate[NormalDistribution[], 12];
Mu = -0.00644131;
Sigma = 0.0562005;
t = 1/12; s = 0.6416;
FoldList[(#1*Exp[(Mu - Sigma^2/2)*t + Sigma*Sqrt[t]*#2]) &, s,
Xi]), {SeedRandom[2]; 10000}]; // Timing
{4.19049, Null}
This alternative is ten times faster:
F = Module[{Xi, beta, gamma}, With[{Mu = -0.00644131, Sigma = 0.0562005,
t = 1/12, s = 0.6416},
beta = (Mu - Sigma^2/2)*t; (* constant drift term *)
gamma = Sigma*Sqrt[t]; (* constant factor multiplying each normal draw *)
Table[(Xi = RandomVariate[NormalDistribution[], 12];
FoldList[(#1*Exp[beta + gamma*#2]) &, s, Xi]), {SeedRandom[2];
10000}] ]]; // Timing
{0.403365, Null}
I use With for the local constants and Module for the things that are either redefined within the Table (Xi) or are calculations based on the local constants (beta). This question on the Mathematica StackExchange will help explain when to use Module versus Block versus With. (I encourage you to explore the Mathematica StackExchange further, as this is where most of the Mathematica experts are hanging out now.)
For your specific code, the use of Part isn't really required. Instead of using FoldList, just use Fold. It only retains the final number in the folding, which is identical to the last number in the output of FoldList. So you could try:
FF = Module[{Xi, beta, gamma}, With[{Mu = -0.00644131, Sigma = 0.0562005,
t = 1/12, s = 0.6416},
beta = (Mu - Sigma^2/2)*t; (* constant drift term *)
gamma = Sigma*Sqrt[t]; (* constant factor multiplying each normal draw *)
Table[(Xi = RandomVariate[NormalDistribution[], 12];
Fold[(#1*Exp[beta + gamma*#2]) &, s, Xi]), {SeedRandom[2];
10000}] ]];
Histogram[FF]
Calculating FF in this way is even a little faster than the previous version. On my system, Timing reports 0.377 seconds, but such a difference from 0.4 seconds is hardly worth worrying about.
Because you are setting the seed with SeedRandom, it is easy to verify that all three code examples produce the same results.
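The Fold versus FoldList point carries over to other languages. The hypothetical Java sketch below (class and method names are illustrative) simulates the same model, s -> s*exp((Mu - Sigma^2/2)*t + Sigma*Sqrt[t]*z), with the constants hoisted out of the loop: `path` plays the role of FoldList and `last` the role of Fold. Java's Random produces different draws than Mathematica's SeedRandom[2], so the numbers will not match the Mathematica output.

```java
import java.util.Random;

/**
 * Hypothetical Java analogue of the simulation: a price path is a left fold of
 * s -> s * exp(drift + vol * z) over 12 standard normal draws z.
 */
final class GbmFold {
    static final double MU = -0.00644131, SIGMA = 0.0562005, T = 1.0 / 12, S0 = 0.6416;
    static final double DRIFT = (MU - SIGMA * SIGMA / 2) * T;   // computed once, not per step
    static final double VOL = SIGMA * Math.sqrt(T);             // computed once, not per step

    /** FoldList analogue: all 13 values, starting with S0. */
    static double[] path(double[] z) {
        double[] out = new double[z.length + 1];
        out[0] = S0;
        for (int k = 0; k < z.length; k++) out[k + 1] = out[k] * Math.exp(DRIFT + VOL * z[k]);
        return out;
    }

    /** Fold analogue: keeps only the running value, i.e. the last element of path(z). */
    static double last(double[] z) {
        double s = S0;
        for (double zk : z) s = s * Math.exp(DRIFT + VOL * zk);
        return s;
    }

    /** One final value per simulated path: the data the histogram is built from. */
    static double[] finals(int paths, long seed) {
        Random rnd = new Random(seed);
        double[] res = new double[paths];
        for (int p = 0; p < paths; p++) {
            double[] z = new double[12];
            for (int k = 0; k < z.length; k++) z[k] = rnd.nextGaussian();
            res[p] = last(z);
        }
        return res;
    }
}
```

`finals(10000, 2)` never materialises the 13-element paths, which is the same saving Fold gives over FoldList.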
Making my comment an answer:
Histogram[Last /@ F]

How can I swap the values of two variables without a third one in Objective-C?

Hey guys, I'd like your suggestions on how to swap the values of two variables without a third one in Objective-C.
Is there any way to do this? Please let me know.
It can be done in any language. Let x and y be the two variables we want to swap:
{
// let's say x, y are 1, 2
x = x + y; // 1 + 2 = 3
y = x - y; // 3 - 2 = 1
x = x - y; // 3 - 1 = 2
}
You can use these equations in any language to achieve this.
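In Java the same three statements work on int values even when x + y overflows, because Java int arithmetic wraps around modulo 2^32 (in C, and therefore Objective-C, signed overflow is undefined behaviour, which is one of the pitfalls of this trick). Since Java passes primitives by value, the minimal sketch below swaps two slots of an array; the class name is hypothetical, and the i == j guard is needed because swapping a slot with itself this way would zero it.

```java
/** Arithmetic swap of two array slots without a temporary variable. */
final class ArithmeticSwap {
    static void swap(int[] v, int i, int j) {
        if (i == j) return;          // same slot: the trick would zero it
        v[i] = v[i] + v[j];          // may overflow; Java int arithmetic wraps (mod 2^32)
        v[j] = v[i] - v[j];          // now holds the original v[i]
        v[i] = v[i] - v[j];          // now holds the original v[j]
    }
}
```

For instance, swapping {Integer.MAX_VALUE, 1} gives {1, Integer.MAX_VALUE}, even though the first addition overflows.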
Do you mean exchange the value of two variables, as in the XOR swap algorithm? Unless you're trying to answer a pointless interview question, programming in assembly language, or competing in the IOCCC, don't bother. A good optimizing compiler will probably handle the standard tmp = a; a = b; b = tmp; better than whatever trick you might come up with.
If you are doing one of those things (or are just curious), see the Wikipedia article for more info.
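For comparison, here is a minimal Java sketch of the XOR swap next to the ordinary temporary-variable swap recommended above (class and method names are hypothetical). The XOR version needs the same aliasing guard, because XOR-ing a slot with itself yields 0.

```java
/** XOR swap next to the plain temporary-variable swap. */
final class XorSwap {
    static void xorSwap(int[] v, int i, int j) {
        if (i == j) return;          // aliasing: v[i] ^= v[i] would zero the slot
        v[i] ^= v[j];                // v[i] = a ^ b
        v[j] ^= v[i];                // v[j] = b ^ (a ^ b) = a
        v[i] ^= v[j];                // v[i] = (a ^ b) ^ a = b
    }

    static void tempSwap(int[] v, int i, int j) {
        int tmp = v[i];              // the straightforward version, also safe when i == j
        v[i] = v[j];
        v[j] = tmp;
    }
}
```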
As far as numbers are concerned, you can swap them without a third variable in any language, whether it's Java, Objective-C, or C/C++.
For more info, see:
Potential Problem in "Swapping values of two variables without using a third variable"
Since this is explicitly for iPhone, you can use the ARM instruction SWP, but it's almost inconceivable why you'd want to. The compiler is much, much better at this kind of optimization. If you just want to avoid the temporary variable in your code, write an inline function to handle it; the compiler will optimize it away if that can be done more efficiently.
// Concatenate both strings, then split the result at the old boundary.
NSString *first = @"bharath";
NSString *second = @"raj";
first = [NSString stringWithFormat:@"%@%@", first, second];   // "bharathraj"
NSRange needleRange = NSMakeRange(0, first.length - second.length);
second = [first substringWithRange:needleRange];                // "bharath"
first = [first substringFromIndex:second.length];               // "raj"
NSLog(@"first---> %@, Second---> %@", first, second);