Apache Spark ALS - how is it solving the least squares problem?

The source code for the Apache Spark ALS can be found here.
I am wondering where the Least Squares solving is going on in this source code? I can't find it for the life of me.
When following a tutorial/walkthrough on Collaborative Filtering, it shows that to perform ALS on some ratings you call ALS.train(ratings, rank, numIterations, lambda). Looking at the source code, the train function calls the run function, which returns a MatrixFactorizationModel with the predicted ratings in it.
Additionally, the API for ALS (found here) says there is a method called solveLeastSquares but it isn't in the source code found in the first link. I would like to learn how the least squares problem is being solved so I can adjust it as necessary.

From the documentation:
(Breaking change) In ALS, the extraneous method solveLeastSquares has been removed. The DeveloperApi method analyzeBlocks was also removed.
However, you can switch the branch to 1.1, per the docs you referenced, and you will see the solveLeastSquares method there.
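For orientation, the subproblem that one ALS half-step solves for a single user can be sketched as follows. This is a minimal illustration in plain Python, not Spark's code: with the item factors Y held fixed, the user's factor vector x minimizes the sum of (r_i - y_i . x)^2 over the rated items plus lambda * |x|^2, which leads to the normal equations (Y^T Y + lambda * I) x = Y^T r. The function names solve_linear_system and solve_user_factor are made up for this example, and details such as how Spark scales lambda or which linear solver it uses are not reproduced here.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations act on a and b together.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - s) / m[row][row]
    return x


def solve_user_factor(item_factors, ratings, lam):
    """Regularized least squares for one user (hypothetical helper).

    item_factors: one factor vector per item the user rated (rows of Y)
    ratings:      the user's ratings for those items (r)
    lam:          regularization weight (lambda)
    """
    rank = len(item_factors[0])
    # gram = Y^T Y + lam * I
    gram = [[sum(y[i] * y[j] for y in item_factors) + (lam if i == j else 0.0)
             for j in range(rank)]
            for i in range(rank)]
    # rhs = Y^T r
    rhs = [sum(y[i] * r for y, r in zip(item_factors, ratings))
           for i in range(rank)]
    return solve_linear_system(gram, rhs)
```

A full ALS iteration would call something like solve_user_factor once per user with the item factors fixed, then swap roles and solve once per item with the user factors fixed.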

Related

How do I make use of ILocation source and target in custom routing?

This is my sample network and my idea for defining a new route instead of just the shortest path (the path I want to follow is via the pink arrows).
What am I missing here to make my predefined function work?
Fixing the basics
As explained here, you can't instantiate a Java List, because it is an interface. You can, however, instantiate any class that implements List, for example an ArrayList.
With this in mind your code will look like this:
List<Path> myPath = new ArrayList<Path>();
myPath.add(path14);
myPath.add(path8);
myPath.add(path);
myPath.add(path1);
myPath.add(path4);
myPath.add(path13);
return myPath;
So much for the basics.
Where to go from here
To get it to consider your actual source and destination for the route planning, define both as input parameters of type ILocation in the properties of the function.
Now comes the really tricky part: writing your own or importing a routing algorithm that can give you that list of paths automatically based on criteria that you define. This is however a topic too broad for this question. The basic steps will be:
Create a graph that represents your AnyLogic path network
Solve the graph routing problem with a solving algorithm (e.g. Dijkstra's algorithm), using the graph, the start point and the end point
Convert the solution you get from the solver back into an ArrayList that you can work with in AnyLogic
You can do these steps on your own, e.g. by implementing Dijkstra's algorithm yourself, or you can import into AnyLogic one of the available graph-solving Java packages like JUNG or GraphHopper. In this article I explain step by step how to do so with JUNG.
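The three steps above can be sketched independently of AnyLogic. The example below is written in Python rather than Java and works on a made-up network: the node names, path names and lengths are hypothetical, and how AnyLogic Path objects map to graph edges is an assumption, not something taken from the AnyLogic API. It builds an adjacency list, runs Dijkstra's algorithm, and returns the ordered list of path names, which plays the role of the ArrayList returned by the function.

```python
import heapq


def build_graph(edges):
    """Turn (path_name, node_a, node_b, length) tuples into an adjacency dict."""
    graph = {}
    for name, a, b, length in edges:
        # Paths are treated as bidirectional here (an assumption).
        graph.setdefault(a, []).append((b, length, name))
        graph.setdefault(b, []).append((a, length, name))
    return graph


def shortest_route(graph, source, target):
    """Return the path names along the shortest route, or None if unreachable."""
    dist = {source: 0.0}
    prev = {}  # node -> (previous node, name of the path used to reach it)
    queue = [(0.0, source)]
    visited = set()
    while queue:
        d, node = heapq.heappop(queue)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for nxt, length, name in graph.get(node, []):
            nd = d + length
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = (node, name)
                heapq.heappush(queue, (nd, nxt))
    if target not in dist:
        return None
    # Walk back from the target and collect the path names in travel order.
    route = []
    node = target
    while node != source:
        node, name = prev[node]
        route.append(name)
    route.reverse()
    return route


# Hypothetical network: two ways from A to D.
edges = [
    ("pathAB", "A", "B", 5.0),
    ("pathBD", "B", "D", 5.0),
    ("pathAC", "A", "C", 2.0),
    ("pathCD", "C", "D", 3.0),
]
route = shortest_route(build_graph(edges), "A", "D")
```

To steer the route along specific paths instead of the geometrically shortest one, one option is to replace the lengths with custom weights before solving, so that the preferred paths become the cheapest.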

How do I find selfor in Octave?

I need to implement an unsupervised neural network using Octave. For that, I need to use the "selforgmap" function. How do I find that function in Octave, or which packages include it?
When I use "selforgmap", I get an error like this.
selforgmap
error: 'selforgmap' undefined near line 1 column 1
help selforgmap
error: help: 'selforgmap' not found
As of now there does not appear to be any implementation of selforgmap in Octave or any of its packages. The current neural net package, nnet, can be found at Octave Forge, and the Function Reference link will show you everything currently included.
The link Andy commented with above, to a current reworking of the nnet package, also does not currently include selforgmap, but this could obviously change. The included function files can be seen inside the inst folder.
If MATLAB's selforgmap is not an option for you, you will either need to code your own implementation or switch to another programming language. A quick search does reveal a Python implementation of selforgmap that may serve your purpose.
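To give an idea of what coding your own implementation involves, here is a minimal sketch of the classic online self-organizing map update, in Python. It is not a port of MATLAB's selforgmap and makes several simplifying assumptions: the map is a one-dimensional chain of units rather than a 2-D grid, the learning rate and neighborhood width shrink linearly, and the function names and parameter defaults are invented for this example.

```python
import math


def best_matching_unit(weights, sample):
    """Index of the unit whose weight vector is closest to the sample."""
    return min(
        range(len(weights)),
        key=lambda i: sum((w - s) ** 2 for w, s in zip(weights[i], sample)),
    )


def train_som(samples, weights, epochs=20, lr0=0.5, sigma0=1.0):
    """Train a 1-D chain of units; returns a new list of weight vectors."""
    weights = [list(w) for w in weights]  # work on a copy
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        lr = lr0 * frac                   # learning rate shrinks over time
        sigma = max(sigma0 * frac, 0.1)   # so does the neighborhood width
        for sample in samples:
            bmu = best_matching_unit(weights, sample)
            for i, w in enumerate(weights):
                # Units close to the winner on the chain are pulled more strongly.
                h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (sample[k] - w[k])
    return weights


# Toy data: two points, and three units initialized along the diagonal.
samples = [[0.0, 0.0], [1.0, 1.0]]
initial = [[0.25, 0.25], [0.5, 0.5], [0.75, 0.75]]
trained = train_som(samples, initial)
```

After training, the two ends of the chain have moved toward the two input points, so each input maps to a different unit. The same update rule carries over to Octave almost line by line.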

how to pass data in Zeppelin to D3.js for Spark visualization

The graph options with Zeppelin are pretty basic. So I am looking for an example of how to do something simple, like a bar chart, with d3.js. From what I can tell, that would be the best graphing library to use to create stunning graphs.
Anyway, my question is how to pass data to the JavaScript code. With regular Zeppelin charts you write Scala or other code and then save the result in a DataFrame. Then on the next line you use the %sql option to write a SQL command, and buttons appear that let you graph the data.
But what I have found on the internet gives no indication that data created in the Scala code section can be passed to the Angular section where you put the d3.js code.
Some examples I found are like this one where all the html and Javascript is put in one giant print statement in the scala code https://rawkintrevo.org/2016/09/20/gelly-on-apache-flink/
And then there is an example like this one Using d3.js with Apache Zeppelin where the Zeppelin line is all JavaScript, but the data is just a locally created array.
So I need (1) an example and (2) some understanding of how RDDs and DataFrames can be passed into the JavaScript code, which of course is on a different line than the Scala code. How do you bring objects in the Scala section of the notebook into scope for the JavaScript section?
You can refer to the Zeppelin docs for a good getting-started guide to creating custom visualizations. Also, you might want to check out the code of some of the built-in visualizations.
Regarding how data from DataFrames are passed to js, I'm pretty sure z.show or %sql triggers dataFrame.take(${zeppelin.spark.maxResult}) which collects the RDD[T] as a Seq[T] object to the driver whose elements are then used to render graphs.
Alternatively, if you have a JavaScript graph defined in another paragraph, you can also use z.angularBind("values", rdd.take(maxResult)) to send the data to the Angular view. There's a really nice answer here on the subject which might help.
Hope you find this helpful.

Is 'Digit Classification Using HOG Features' matlab example only available in the 2014 version?

I use the R2013a version of matlab. I tried to follow the path,
syntheticDir = fullfile(toolboxdir('vision'), 'visiondemos','digits','synthetic');
handwrittenDir = fullfile(toolboxdir('vision'), 'visiondemos','digits','handwritten');
but there were no files named digits.
Also upon running,
trainingSet = imageSet(syntheticDir, 'recursive');
testSet = imageSet(handwrittenDir, 'recursive');
I got the following error: Undefined function 'imageSet' for input arguments of type 'char'.
I'm trying to attempt this example, http://www.mathworks.in/help/vision/examples/digit-classification-using-hog-features.html
Usually I would consult the release notes of the respective toolboxes to find out when a new function was introduced.
The digit classification example mentioned uses imageSet, a new feature in R2014b, as well as the extractHOGFeatures function introduced in R2013b. It also uses fitcecoc from the Statistics toolbox, which is a new function in R2014b.
It would be nice if the documentation provided this information in an easier way...
This example was added in R2013b. Generally, if you have an older version of MATLAB, you should use the documentation that came with it. The documentation on the web is for the current release, and so it will naturally contain new examples and functions not present in the older versions. However, if you click "Other Releases", you can see the archive of the documentation for the previous releases. This way you can easily check when a particular function or example was added.

training a new model using pascal kit

need some help on this.
Currently I am doing a project on computer vision that requires me to train a new model to detect a certain object.
In this case, I am using the system provided by P. Felzenszwalb, D. McAllester, D. Ramanan and their team: Discriminatively trained deformable part models, which is implemented in MATLAB.
Project webpage: http://www.cs.uchicago.edu/~pff/latent/.
However, I have no idea how to direct the system to use my dataset (a collection of images and annotations), which is different from the PASCAL datasets, so as to train a new model.
By directing, I meant a line of code that allows me to change the dataset the system reads from, for training a model.
E.g.
% directory for caching models, intermediate data, and results
cachedir = ['/var/tmp/rbg/YOURPATH/' VOCyear '/'];
I tried looking at their Readme and documentation guides but they do not make any mention. Do correct me if I am wrong.
Let me know if I have not made my problem clear enough.
I tried looking at some files such as global.m but no go.
Your help is much appreciated and thanks in advance!
You can try reading pascal.m in the DPM package (voc-release5); it contains similar code that works on the VOC2007/2010 datasets.
There are plenty of parts that need to be adapted to achieve this. For example the voc_config has to be adapted in order to read from your files.
The same goes for the pascal_train.m function. Depending on the images and the way you parse them, adapting this function may take quite some time.
Other functions to consider:
imreadx
pascal_test
pascaleval