How to do a simple time-series forecast? - scala

I have univariate time-series data: just a TimeStamp and a Value. I want to extrapolate (forecast) this Value for the next day/month/year. I know there are methods such as Box-Jenkins (ARIMA).
Spark has Linear Regression and I tried it, but I did not get satisfactory results. Has anybody tried a simple time-series forecast in Spark? Can you share your implementation approach?
PS: I checked the user mailing list for this issue; almost all the questions about it are unanswered there.

Yes, I have already applied ARIMA in Spark to a univariate time series.
public static void main(String args[])
{
System.setProperty("hadoop.home.dir", "C:/winutils");
SparkSession spark = SparkSession
.builder().master("local")
.appName("Spark-TS Example")
.config("spark.sql.warehouse.dir", "file:///C:/Users/abc/Downloads/Spark/sparkdemo/spark-warehouse/")
.getOrCreate();
Dataset<String> lines = spark.read().textFile("C:/Users/abc/Downloads/thunderbird/Time series/trainingvector_arima.csv");
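// MapFunction is org.apache.spark.api.java.function.MapFunction; the cast picks the Java overload of map()
Dataset<Double> doubleDataset = lines.map((MapFunction<String, Double>) line -> Double.parseDouble(line),
Encoders.DOUBLE());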
List<Double> doubleList = doubleDataset.collectAsList();
//scala.collection.immutable.List<Object> scalaList = new
Double[] doubleArray = new Double[doubleList.size()];
doubleArray = doubleList.toArray(doubleArray);
double[] values = new double[doubleArray.length];
for(int i = 0; i< doubleArray.length; i++)
{
values[i] = doubleArray[i];
}
Vector tsvector = Vectors.dense(values);
System.out.println("Ts vector:" + tsvector.toString());
//ARIMAModel arimamodel = ARIMA.fitModel(1, 0, 1, tsvector, true, "css-bobyqa", null);
ARIMAModel arimamodel = ARIMA.autoFit(tsvector, 1, 1, 1);
Vector forcst = arimamodel.forecast(tsvector, 10);
System.out.println("forecast of next 10 observations: " + forcst);
}
This code works for me. Pass the series you want to forecast as the input data.
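The manual copy from List<Double> to double[] in the code above can probably be shortened. A minimal sketch, assuming Java 8 streams are available; the class and method names here are hypothetical and not part of spark-ts:

```java
import java.util.Arrays;
import java.util.List;

final class SeriesValues {
    // Unbox a List<Double> into the primitive array that Vectors.dense(double[]) accepts.
    static double[] toPrimitive(List<Double> values) {
        return values.stream().mapToDouble(Double::doubleValue).toArray();
    }

    public static void main(String[] args) {
        // Made-up sample values standing in for the collected dataset.
        double[] values = toPrimitive(Arrays.asList(1.5, 2.0, 3.25));
        System.out.println(Arrays.toString(values));
    }
}
```

The result could then be passed to Vectors.dense(values) in place of the loop.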

Related

Spark mapPartitionsToPair execution time

In my current project we are using Spark as the computation engine for one of our workflows.
The workflow is as follows:
We have a product catalog served from several pincodes. A user logged in from any particular pincode should see the lowest available cost across all serving pincodes.
The least cost is calculated as follows:
product price + dist(pincode1, pincode2)
where pincode2 is the user's pincode and pincode1 is the source pincode. Apply the above formula for all source pincodes and pick the lowest result.
My core Spark logic looks like this:
pincodes.javaRDD().cartesian(pincodePrices.javaRDD()).mapPartitionsToPair(new PairFlatMapFunction<Iterator<Tuple2<Row,Row>>, Row, Row>() {
@Override
public Iterator<Tuple2<Row, Row>> call(Iterator<Tuple2<Row, Row>> t)
throws Exception {
MongoClient mongoclient = MongoClients.create("mongodb://localhost");
MongoDatabase database = mongoclient.getDatabase("catalogue");
MongoCollection<Document>pincodeCollection = database.getCollection("pincodedistances");
List<Tuple2<Row,Row>> list =new LinkedList<>();
while (t.hasNext()) {
Tuple2<Row, Row>tuple2 = t.next();
Row pinRow = tuple2._1;
Integer srcPincode = pinRow.getAs("pincode");
Row pricesRow = tuple2._2;
Row pricesRow1 = (Row)pricesRow.getAs("leastPrice");
Integer buyingPrice = pricesRow1.getAs("buyingPrice");
Integer quantity = pricesRow1.getAs("quantity");
Integer destPincode = pricesRow1.getAs("pincodeNum");
if(buyingPrice!=null && quantity>0) {
BasicDBObject dbObject = new BasicDBObject();
dbObject.append("sourcePincode", srcPincode);
dbObject.append("destPincode", destPincode);
//System.out.println(srcPincode+","+destPincode);
Number distance;
if(srcPincode.intValue()==destPincode.intValue()) {
distance = 0;
}else {
Document document = pincodeCollection.find(dbObject).first();
distance = document.get("distance", Number.class);
}
double margin = 0.02;
Long finalPrice = Math.round(buyingPrice+(margin*buyingPrice)+distance.doubleValue());
//Row finalPriceRow = RowFactory.create(finalPrice,quantity);
StructType structType = new StructType();
structType = structType.add("finalPrice", DataTypes.LongType, false);
structType = structType.add("quantity", DataTypes.LongType, false);
Object values[] = {finalPrice,quantity};
Row finalPriceRow = new GenericRowWithSchema(values, structType);
list.add(new Tuple2<Row, Row>(pinRow, finalPriceRow));
}
}
mongoclient.close();
return list.iterator();
}
}).reduceByKey((priceRow1,priceRow2)->{
Long finalPrice1 = priceRow1.getAs("finalPrice");
Long finalPrice2 = priceRow2.getAs("finalPrice");
if(finalPrice1.longValue()<finalPrice2.longValue())return priceRow1;
return priceRow2;
}).collect().forEach(tuple2->{
// Business logic to push computed price to mongodb
});
I get the correct answer, but mapPartitionsToPair takes a while (~22 seconds for just 12k records).
After browsing the internet I found claims that mapPartitions performs better than mapPartitionsToPair, but I am not sure how to emit (key, value) pairs from mapPartitions and then sort them.
Any alternative to the above transformations, or any better approach, would be highly appreciated.
Spark cluster: Standalone (1 executor, 6 cores)
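One direction that might be worth measuring, offered as an unverified sketch rather than a confirmed fix: the code above issues one MongoDB query per row pair, and that lookup may account for more of the time than the choice between mapPartitions and mapPartitionsToPair. If the distance table fits in memory, it could be loaded once into a map (in Spark, possibly shared with executors through a broadcast variable) and consulted locally. The plain-Java sketch below shows only the lookup and the price arithmetic; the class name, the key packing, and the sample pincodes are assumptions, not taken from the question.

```java
import java.util.HashMap;
import java.util.Map;

final class LeastPriceSketch {
    // Pack a (source, destination) pincode pair into one map key.
    static long pairKey(int srcPincode, int destPincode) {
        return ((long) srcPincode << 32) | (destPincode & 0xFFFFFFFFL);
    }

    // Look up a distance in a map loaded once up front, instead of querying per row.
    static double distance(Map<Long, Double> distances, int srcPincode, int destPincode) {
        if (srcPincode == destPincode) {
            return 0.0;
        }
        Double d = distances.get(pairKey(srcPincode, destPincode));
        if (d == null) {
            throw new IllegalArgumentException("no distance for " + srcPincode + "," + destPincode);
        }
        return d;
    }

    // Same arithmetic as the question: price plus a 2% margin plus distance, rounded.
    static long finalPrice(int buyingPrice, double distance) {
        double margin = 0.02;
        return Math.round(buyingPrice + margin * buyingPrice + distance);
    }

    public static void main(String[] args) {
        Map<Long, Double> distances = new HashMap<>();
        distances.put(pairKey(560001, 560002), 10.0); // made-up sample data
        double d = distance(distances, 560001, 560002);
        System.out.println(finalPrice(100, d));
    }
}
```

Whether this helps depends on the size of the distance collection and on where the 22 seconds are actually spent, so timing the Mongo calls separately first would be a reasonable check.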

Transform training data into array in Neural network

I am building a time prediction model in deeplearning4j for text processing. It takes the number of words, sentences, and characters as input features and produces a time as output. While mapping the input data to the output, I am having difficulty transforming these values and telling the network which output value corresponds to which input values.
Also, should I reduce the dimensionality to just x1 and y, instead of x1-x4?
training-data.csv has the columns below, with 100 values.
x1,x2,x3,x4 (inputs), y (output)
I tried using a SequenceRecordReader and iterator, which can capture variable-length inputs.
Below is my code:
public static void main(String[] args) throws Exception
{
// Initializing parameters
final Logger log = LoggerFactory.getLogger(MainExpert.class);
final int seed =123;
final int numInput = 4;
final int numOutput = 1;
final int numHidden = 20;
final double learningRate = 0.015;
final int batchSize =30;
final int nEpochs =30;
//final int inputFeatures =4;
//Constructing Training data
final File baseFolder =new File("/home/aj/my/samples/corpus");
final File testFolder = new File("/home/aj/my/samples/corpus/train_data_0.csv");
SequenceRecordReader trainReader = new CSVSequenceRecordReader(0,",");
trainReader.initialize(new NumberedFileInputSplit(baseFolder.getAbsolutePath() + "/train_data_%d.csv",0,0));
DataSetIterator trainIterator = new SequenceRecordReaderDataSetIterator(trainReader,batchSize,-1,4,true);
SequenceRecordReader testReader = new CSVSequenceRecordReader(0,",");
testReader.initialize(new NumberedFileInputSplit(baseFolder.getAbsolutePath() + "/test_data_%d.csv",0,0));
DataSetIterator testIterator = new SequenceRecordReaderDataSetIterator(testReader,batchSize,-1,4,true);
DataSet trainData = trainIterator.next();
System.out.println(trainData);
DataSet testData = testIterator.next();
NormalizerMinMaxScaler normalizer = new NormalizerMinMaxScaler(0, 1);
normalizer.fitLabel(true);
normalizer.fit(trainData);
normalizer.transform(trainData);
normalizer.transform(testData);
//Configuring Network
log.info("Building Model");
MultiLayerConfiguration config = new NeuralNetConfiguration.Builder()
.seed(seed)
.iterations(1)
.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
.learningRate(learningRate)
.updater(Updater.NESTEROVS).momentum(0.9)
.list()
.layer(0, new DenseLayer.Builder()
.nIn(numInput)
.nOut(numHidden)
.weightInit(WeightInit.XAVIER)
.activation(Activation.RELU)
.build())
.layer(1, new DenseLayer.Builder()
.nIn(numHidden)
.nOut(numHidden)
.weightInit(WeightInit.XAVIER)
.activation(Activation.RELU)
.build())
.layer(2, new OutputLayer.Builder(LossFunction.MSE)
.nIn(numHidden)
.nOut(numOutput)
.weightInit(WeightInit.XAVIER)
.activation(Activation.IDENTITY)
.build())
.pretrain(false).backprop(true).build();
//Initializing network
log.info("Initializing model");
MultiLayerNetwork model = new MultiLayerNetwork(config);
model.init();
model.setListeners(new ScoreIterationListener(1));
log.info("Training Model");
for(int i=0;i<nEpochs;i++)
{
model.fit(trainData);
}
//Evaluation
RegressionEvaluation reval=new RegressionEvaluation(1);
while(testIterator.hasNext())
{
INDArray feat =testData.getFeatureMatrix();
INDArray labels =testData.getLabels();
INDArray prediction =model.output(feat);
reval.eval(labels, prediction);
}
System.out.println(reval.stats());
}
}
My data has four input values and one output value.
But I get an exception:
org.deeplearning4j.exception.DL4JInvalidInputException: Input that is not a matrix; expected matrix (rank 2), got rank 3 array with shape [1, 4, 107]
We have an end-to-end CSV classifier example here: https://github.com/deeplearning4j/dl4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/recurrent/seqClassification/UCISequenceClassificationExample.java
An RNN can handle multivariate input. In fact, I encourage it. Having only one input feature doesn't do much for you.
I don't see the need to reduce it down to x1 and y.
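To make the shape in the exception concrete: [1, 4, 107] reads as one sequence with 4 features and 107 time steps, whereas dense layers expect a rank-2 matrix with one row per example. The plain-Java sketch below only illustrates that difference with ordinary arrays; it uses no DL4J calls, and the class and method names are hypothetical. In DL4J itself the usual options would be recurrent layers, as suggested above, or a non-sequence record reader, which is an assumption here rather than something stated in the answer.

```java
final class SequenceShape {
    // Turn one [features][timeSteps] sequence into [timeSteps][features] rows,
    // i.e. one row per example, which is the layout a dense layer works with.
    static double[][] toRows(double[][] sequence) {
        int features = sequence.length;
        int steps = features == 0 ? 0 : sequence[0].length;
        double[][] rows = new double[steps][features];
        for (int f = 0; f < features; f++) {
            for (int t = 0; t < steps; t++) {
                rows[t][f] = sequence[f][t];
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        double[][] sequence = new double[4][107]; // same sizes as in the exception
        double[][] rows = toRows(sequence);
        System.out.println(rows.length + " x " + rows[0].length);
    }
}
```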

EPPlus chart (pie, bar chart) from selected Excel cells (B2, B36, B38, etc.)

I have a problem similar to the one in the link below.
EPPlus chart from list of single excel cells. How?
I tried the code, but the first value shows up twice in the chart. For example:
With this code, opening the chart in Excel and going to Select Data -> Horizontal (Category) Axis Labels shows 100, 100, 300, 600. What is the reason for this? The first data point is written to the chart twice, and I have not found a solution to the problem.
I think you just discovered a bug with EPPlus. Shame on me for not noticing that with that post you reference. It seems that when using the Excel union range selector (the cell names separated by commas) the iterator for the ExcelRange class returns a double reference to the first cell, in this case B2.
A workaround would be to use the other overload of Series.Add, which takes two string ranges. Here is a unit test that shows the problem and the workaround:
[TestMethod]
public void Chart_From_Cell_Union_Selector_Bug_Test()
{
var existingFile = new FileInfo(@"c:\temp\Chart_From_Cell_Union_Selector_Bug_Test.xlsx");
if (existingFile.Exists)
existingFile.Delete();
using (var pck = new ExcelPackage(existingFile))
{
var myWorkSheet = pck.Workbook.Worksheets.Add("Content");
var ExcelWorksheet = pck.Workbook.Worksheets.Add("Chart");
//Some data
myWorkSheet.Cells["A1"].Value = "A";
myWorkSheet.Cells["A2"].Value = 100; myWorkSheet.Cells["A3"].Value = 400; myWorkSheet.Cells["A4"].Value = 200; myWorkSheet.Cells["A5"].Value = 300; myWorkSheet.Cells["A6"].Value = 600; myWorkSheet.Cells["A7"].Value = 500;
myWorkSheet.Cells["B1"].Value = "B";
myWorkSheet.Cells["B2"].Value = 300; myWorkSheet.Cells["B3"].Value = 200; myWorkSheet.Cells["B4"].Value = 1000; myWorkSheet.Cells["B5"].Value = 600; myWorkSheet.Cells["B6"].Value = 500; myWorkSheet.Cells["B7"].Value = 200;
//Pie chart shows with EXTRA B2 entry due to problem with ExcelRange Enumerator
ExcelRange values = myWorkSheet.Cells["B2,B4,B6"]; //when the iterator is evaluated it will return the first cell twice: "B2,B2,B4,B6"
ExcelRange xvalues = myWorkSheet.Cells["A2,A4,A6"]; //when the iterator is evaluated it will return the first cell twice: "A2,A2,A4,A6"
var chartBug = ExcelWorksheet.Drawings.AddChart("Chart BAD", eChartType.Pie);
chartBug.Series.Add(values, xvalues);
chartBug.Title.Text = "Using ExcelRange";
//Pie chart shows correctly when using string addresses and avoiding ExcelRange
var chartGood = ExcelWorksheet.Drawings.AddChart("Chart GOOD", eChartType.Pie);
chartGood.SetPosition(10, 0, 0, 0);
chartGood.Series.Add("Content!B2,Content!B4,Content!B6", "Content!A2,Content!A4,Content!A6");
chartGood.Title.Text = "Using String References";
pck.Save();
}
}
Here is the output:
I will post it as an issue on their codeplex page to see if they can get it fixed for the next release.
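The workaround relies on sheet-qualified address strings such as "Content!B2,Content!B4,Content!B6". When the cell list is not fixed, such a string can be built from the cell names. A small sketch of that string construction only, with a hypothetical helper name; it assumes a sheet name without spaces or special characters, which would otherwise need quoting:

```java
import java.util.Arrays;
import java.util.stream.Collectors;

final class UnionAddress {
    // Prefix each cell with the sheet name and join with commas.
    static String qualify(String sheet, String... cells) {
        return Arrays.stream(cells)
                .map(cell -> sheet + "!" + cell)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(qualify("Content", "B2", "B4", "B6"));
    }
}
```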

Unable to add data in archive table in Entity Framework

I wrote code to update my table (SecurityQuestionAnswer) with new security password questions and move the old questions to another table (SecurityQuestionAnswersArchives). The total number of security questions is 3. I am able to update the current table, but when I add the same rows to the history table, it shows weird data: only two records are added instead of 3, and the data is also duplicated. My code is as follows:
if (oldQuestions.Any())
{
var oldquestionstoarchivelist = new List<SecurityQuestionAnswersArchives>();
var oldquestionstoarchive =new SecurityQuestionAnswersArchives();
for (int i = 0; i < 3; i++)
{
oldquestionstoarchive.Id = oldQuestions[i].Id;
oldquestionstoarchive.SecurityQuestionId = oldQuestions[i].SecurityQuestionId;
oldquestionstoarchive.Answer = oldQuestions[i].Answer;
oldquestionstoarchive.UpdateDate = oldQuestions[i].UpdateDate;
oldquestionstoarchive.IpAddress = oldQuestions[i].IpAddress;
oldquestionstoarchive.SecurityQuestion = oldQuestions[i].SecurityQuestion;
oldquestionstoarchive.User = oldQuestions[i].User;
oldquestionstoarchivelist.Add(oldquestionstoarchive);
}
user.SecurityQuestionAnswersArchives = oldquestionstoarchivelist;
//await Store.UpdateAsync(user);
_dbContext.ArchiveSecurityQuestionAnswers.AddRange(oldquestionstoarchivelist);
_dbContext.SecurityQuestionAnswers.RemoveRange(oldQuestions);
await _dbContext.SaveChangesAsync();
oldquestionstoarchivelist.Clear();
}
UPDATE 1
The loop looks fine; it iterates three times (0, 1, 2), which is expected. The first issue was with the AddRange function: I was passing a list, but it takes an IEnumerable input. I rectified it using the following code.
IEnumerable<SecurityQuestionAnswersArchives> finalArchiveses = oldquestionstoarchivelist;
_dbContext.ArchiveSecurityQuestionAnswers.AddRange(finalArchiveses);
The other issue is the duplicate data; I am unable to figure out where it comes from. Please help me find it.
Your help is much appreciated!
Got it! Just sharing in case anybody has the same issue.
The problem was initialization in the wrong place. I moved
var oldquestionstoarchive =new SecurityQuestionAnswersArchives();
inside the for loop, so each iteration now creates a new object that holds its own values.
var oldquestionstoarchivelist = new List<SecurityQuestionAnswersArchives>();
for (int i = 0; i < 3; i++)
{
var oldquestionstoarchive = new SecurityQuestionAnswersArchives();
oldquestionstoarchive.SecurityQuestionId = oldQuestions[i].SecurityQuestionId;
oldquestionstoarchive.Answer = oldQuestions[i].Answer;
oldquestionstoarchive.UpdateDate = oldQuestions[i].UpdateDate;
oldquestionstoarchive.IpAddress = oldQuestions[i].IpAddress;
oldquestionstoarchive.SecurityQuestion = oldQuestions[i].SecurityQuestion;
oldquestionstoarchive.User = oldQuestions[i].User;
oldquestionstoarchivelist.Add(oldquestionstoarchive);
}
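The underlying cause is reference aliasing: adding the same object to a list three times stores three references to one object, so every entry shows whatever was assigned last. The same effect can be reproduced outside Entity Framework. A minimal Java analogue with hypothetical class names, not the original entities:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ArchiveAliasing {
    static final class Archive {
        String answer;
    }

    // Buggy pattern: one object created before the loop and reused on every pass.
    static List<Archive> copyReusingOneObject(List<String> answers) {
        List<Archive> out = new ArrayList<>();
        Archive shared = new Archive();
        for (String a : answers) {
            shared.answer = a;
            out.add(shared); // adds another reference to the same object
        }
        return out;
    }

    // Fixed pattern: a new object per iteration, so each entry keeps its own value.
    static List<Archive> copyWithNewObjectPerRow(List<String> answers) {
        List<Archive> out = new ArrayList<>();
        for (String a : answers) {
            Archive fresh = new Archive();
            fresh.answer = a;
            out.add(fresh);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> answers = Arrays.asList("a", "b", "c");
        System.out.println(copyReusingOneObject(answers).get(0).answer);
        System.out.println(copyWithNewObjectPerRow(answers).get(0).answer);
    }
}
```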

Devexpress xtraChart How to assign x and y values?

I have values in the data source, and there is no problem with the data source. I have to assign X and Y values to the chart. The chart throws an error saying there is no column named "TotalInboundArrivals".
ChartControl chart = new ChartControl();
chart.Location = new Point(38, 301);
chart.Size = new Size(789, 168);
Series series = new Series("Series1", ViewType.Bar);
chart.Series.Add(series);
series.DataSource = ds;
series.ArgumentScaleType = ScaleType.Numerical;
series.ArgumentDataMember = "TotalInboundArrivals"; //throws error here
series.ValueScaleType = ScaleType.Numerical;
series.ValueDataMembers.AddRange(new string[] { "StartTime" }); //throws error here
((SideBySideBarSeriesView)series.View).ColorEach = true;
((XYDiagram)chart.Diagram).AxisY.Visible = true;
chart.Legend.Visible = true;
chart.Visible = true;
chart.Dock = DockStyle.Fill;
xtraTabPage1.Controls.Add(chart);
Where is my problem? Any suggestions?
Have you gone through the Series.DataSource property? You are making the mistake of assigning a DataSet as the DataSource of the series. Think about it: how could it search for columns in a DataSet? Try assigning ds.Tables["TableName"] as the data source instead.
Creating the DataSource table
private DataTable CreateChartData(int rowCount) {
// Create an empty table.
DataTable table = new DataTable("Table1");
// Add two columns to the table.
table.Columns.Add("Argument", typeof(Int32));
table.Columns.Add("Value", typeof(Int32));
// Add data rows to the table.
Random rnd = new Random();
DataRow row = null;
for (int i = 0; i < rowCount; i++) {
row = table.NewRow();
row["Argument"] = i;
row["Value"] = rnd.Next(100);
table.Rows.Add(row);
}
return table;
}
Specifying the Series properties that correspond to the data source
Series series = new Series("Series1", ViewType.Bar);
chart.Series.Add(series);
// Generate a data table and bind the series to it.
series.DataSource = CreateChartData(50);
// Specify data members to bind the series.
series.ArgumentScaleType = ScaleType.Numerical;
series.ArgumentDataMember = "Argument";
series.ValueScaleType = ScaleType.Numerical;
series.ValueDataMembers.AddRange(new string[] { "Value" });
Check the Examples and go through the Creating Charts -> Providing Data section to better understand it.
Reference
Hope this helps.