I am new to the functional programming paradigm and to Scala. I am trying to solve a problem using Scala. I have a text file containing graph edges in the following format:
3, 5
4, 6
7, 8
where 3,5 represents an edge from 3 to 5 in the graph
I am using the type Map[Vertex, List[Vertex]] to represent the graph. My current approach is to read the file line by line using foreach and process each line, which I think is not a functional way to do it. Any help with this is appreciated.
I will leave the file reading to you, as there are many ways to do it depending on your particular application. Here is one source you might find useful for it.
Assuming you've managed to read the file into an Array[(Int, Int)], i.e., an array of tuples, in your example Array((3,5), (4,6), (7,8)), we can turn it into the adjacency map you're looking for as follows:
arr.groupBy(_._1).mapValues(edges => edges.map(_._2))
Explanation:
We first group the tuples by their first element (._1). This produces a Map[Int, Array[(Int, Int)]], mapping each vertex to all of its edges.
Next, we transform each array so that it no longer contains the full edge information (u,v) but only the neighbour vertex v corresponding to that edge.
And we're done!
NB: This is assuming your graph is directed. If you want to turn it into an undirected graph, you can do this simply by adding (v,u) for every (u,v).
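Putting it together, here is a minimal self-contained sketch, assuming the edges live in a file called edges.txt (file name assumed) with one "u, v" pair per line; the .toList and .toMap calls are only there to match the Map[Vertex, List[Vertex]] type from the question:

import scala.io.Source

// Parse each "u, v" line into a tuple of Ints.
val edges: Array[(Int, Int)] = Source.fromFile("edges.txt")
  .getLines()
  .map { line =>
    val Array(u, v) = line.split(",").map(_.trim.toInt)
    (u, v)
  }
  .toArray

// Directed adjacency map: each vertex maps to the list of its neighbours.
val adjacency: Map[Int, List[Int]] =
  edges.groupBy(_._1).mapValues(_.map(_._2).toList).toMap

// Undirected variant: add the reversed edge (v,u) for every (u,v).
val undirected: Map[Int, List[Int]] =
  (edges ++ edges.map(_.swap)).groupBy(_._1).mapValues(_.map(_._2).toList).toMap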
I am totally new to ASP. I am learning clingo and I have a problem with variables. I am working on graphs and paths in graphs, so I used a tuple such as g((1,2,3)). What I want is to add a new node to the path so that the tuple sequence holds. For instance, the code below will give me (0, (1,2,3)), but what I want is (0,1,2,3).
Thanks in advance.
g((1,2,3)).
g((0,X)):-g(X).
Naive fix:
g((0,X,Y,Z)) :- g((X,Y,Z)).
However, I sense that you want to store the path in the tuple as if it were a list. Bad news: unlike Prolog, clingo isn't meant to handle lists as terms of atoms (like your example does). Lists are handled by indexing the elements; for example, the list [a,b,c] would be stored in predicates like p(1,a). p(2,b). p(3,c). Why? Because of grounding: you aim for a small ground program to reduce the complexity of the solving process.
To put it in numbers: assume you are searching for a path that includes all n nodes. There are n! such potential paths, so for n=10 that is 3628800 potential paths, introducing 3628800 predicates for a comparatively small graph. Numbering the nodes as mentioned leads to only n*n potential predicates to represent the path; for n=10 that is just 100, a huge gain compared to 3628800.
To get an impression of what you are searching for, run the following example, derived from the Potassco website:
% generating the path: for every position T, exactly one node
{ path(T,X) : node(X) } = 1 :- T=1..6.
% one node isn't allowed on two different positions
:- path(T1,X), path(T2,X), T1!=T2.
% there has to be an edge between two adjacent positions
:- path(T,X), path(T+1,Y), not edge(X,Y).
#show path/2.
% Nodes
node(1..6).
% (Directed) Edges
edge(1,(2;3;4)). edge(2,(4;5;6)). edge(3,(1;4;5)).
edge(4,(1;2)). edge(5,(3;4;6)). edge(6,(2;3;5)).
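To reproduce the output below, save the program as, say, path.lp (file name assumed) and ask clingo to enumerate all answer sets with clingo path.lp 0 (the trailing 0 requests all models).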
Output:
Answer: 1
path(1,1) path(2,3) path(3,4) path(4,2) path(5,5) path(6,6)
Answer: 2
path(1,1) path(2,3) path(3,5) path(4,4) path(5,2) path(6,6)
Answer: 3
path(1,6) path(2,2) path(3,5) path(4,3) path(5,4) path(6,1)
Answer: 4
path(1,1) path(2,4) path(3,2) path(4,5) path(5,6) path(6,3)
Answer: 5
...
I have a multiclass classification problem I'm looking to solve with logistic regression. I know this can also be tackled by decision trees and random forests, but I wish to stick specifically with "LogisticRegressionWithLBFGS".
I have all the data tidying done. My data is nice and tidy in a DataFrame with: a label field (String), a feature vector (a vector of features/numbers), and a third column, "LabelIndex" (numbers representing the class).
When I do a train/test split on the DataFrame and try to fit it to LogisticRegressionWithLBFGS:
val model = new LogisticRegressionWithLBFGS()
  .setNumClasses(10)
  .setIntercept(true)
  .setValidateData(true)
  .run("trainingData")
It doesn't like the "run" part.
The example I am working from loads a data file via:
val data = MLUtils.loadLibSVMFile(Spark.sparkContext, "data/mnist.bz2")
(I'm trying to copy the example and slot in my own data, but mine is in a different format, looks different, etc.)
I was doing a bit of reading, and I came across the fact that I need to convert my DataFrame to an RDD[LabeledPoint] by mapping over it. I'm having problems finding good info on how to do this. How do I simply convert a DataFrame with the 3 fields described above, "Label" (String), "Features" (feature vector), and "IndexedLabel" (Double), into an RDD[LabeledPoint]?
Got it working:
Can't convert Dataframe to Labeled Point
This link showed me how to make the conversion successfully.
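For anyone landing here, a minimal sketch of the conversion, assuming the column names "LabelIndex" and "Features" from the question and an existing tidied DataFrame df (the Spark setup and the trainingData split below are assumed). The key point is that run expects an RDD[LabeledPoint], not a string:

import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame

// Map each row to a LabeledPoint. If the "Features" column holds the newer
// ml.linalg.Vector, convert each value with mllib.linalg.Vectors.fromML first.
def toLabeledPoints(df: DataFrame): RDD[LabeledPoint] =
  df.rdd.map { row =>
    LabeledPoint(
      row.getAs[Double]("LabelIndex"),
      row.getAs[Vector]("Features")
    )
  }

// Then pass the RDD (not a string) to run:
// val model = new LogisticRegressionWithLBFGS()
//   .setNumClasses(10)
//   .setIntercept(true)
//   .run(toLabeledPoints(trainingData))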
If I have an Array[Array[Double]] in Scala, is there an idiomatic way to map over the second axis?
For instance, consider the following matrix:
val M: Array[Array[Double]] = Array(Array(1d, 2d), Array(3d, 4d), Array(5d, 6d))
To normalize the rows I simply run:
M.map(x => x.map(_ / x.sum))
However, to normalize the columns, it seems I must execute:
M.transpose.map(x => x.map(_ / x.sum)).transpose
This is workable, but it becomes extremely tedious with more than two indices. In general, if I want to map over the last axis of a deeply nested Array, i.e., Array[Array[...Array[Double]...]], then I need to bubble the last axis to the front via map and transpose, map over it, and then bubble it back. A 3-dimensional case is sketched below.
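For illustration, here is the same pattern one level deeper on a made-up 3-dimensional array: normalizing along the middle axis already requires a transpose nested inside a map.

// A 2 x 3 x 2 example with made-up values.
val T: Array[Array[Array[Double]]] = Array(
  Array(Array(1d, 2d), Array(3d, 4d), Array(5d, 6d)),
  Array(Array(7d, 8d), Array(9d, 10d), Array(11d, 12d))
)

// Normalize along the middle axis: bubble it forward with transpose,
// map over it, then transpose back, all inside an outer map.
val byMiddleAxis = T.map(_.transpose.map(x => x.map(_ / x.sum)).transpose)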
I have two DStreams. Let A:DStream[X] and B:DStream[Y].
I want to get the Cartesian product of them, in other words, a new C: DStream[(X, Y)] containing all the pairs of X and Y values.
I know there is a cartesian function for RDDs. I was only able to find this similar question but it's in Java and so does not answer my question.
The Scala equivalent of the linked question's answer (ignoring Time v3, which isn't used there) is
A.transformWith(B, (rddA: RDD[X], rddB: RDD[Y]) => rddA.cartesian(rddB))
or shorter
A.transformWith(B, (_: RDD[X]).cartesian(_: RDD[Y]))
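For completeness, a self-contained generic version, sketched under the assumption that a StreamingContext and the two DStreams already exist (the ClassTag bounds are required by transformWith):

import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

import scala.reflect.ClassTag

// Pair every element of one micro-batch of `a` with every element of the
// corresponding micro-batch of `b`.
def cartesian[X: ClassTag, Y: ClassTag](a: DStream[X], b: DStream[Y]): DStream[(X, Y)] =
  a.transformWith(b, (rddA: RDD[X], rddB: RDD[Y]) => rddA.cartesian(rddB))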
I am trying to update an MST by adding a new vertex to it. For this, I have been following "Updating Spanning Tree" by Chin and Houck: http://www.computingscience.nl/docs/vakken/al/WerkC/UpdatingSpanningTrees.pdf
A step in the paper requires me to find the largest edge in the path(s) between two given vertices. My idea is to find all the possible paths between the vertices and then find the largest edge among those paths. I have been trying to implement this in MATLAB; however, so far, I have been unsuccessful. Any lead, or a clear algorithm to find all paths between two vertices, or even just the largest edge on the path between two given nodes/vertices, would be really welcome.
For reference, I would like to put forward an example. If the graph has the following edges 1-2, 1-3, 2-4, and 3-4, the paths between 4 and 4 are:
1) 4-2-1-3-4
2) 4-3-1-2-4
Thank you
The algorithm works by lowering the t value to exclude large edges from the new MST. When the algorithm completes, t will be the lowest edge that remains to be inserted to complete the MST.
The m value represents the largest edge on a path from r to z, local to each run of INSERT. m is lowered at each iteration of the loop if possible, thereby removing the previous m edge as a possible candidate for t.
It's not easy to explain in words; I recommend doing a run of the algorithm on paper until the steps are clear.
I made a quick attempt to sketch the steps here: http://jacob.midtgaard-olesen.dk/?p=140
But basically, the algorithm adds edges from the old MST unless it finds a smaller edge to add between the new node z and another node in the old MST. In the example, the edge (A,B) is not in the new tree, since a better connection to B was found by the algorithm.
Note that when selecting h and k, if t and (w,r) have equal edge values, I believe you should choose (w,r).
Finally, you should probably go through the proof following the algorithm to understand why it works. (I didn't read it all :) )
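To make "the largest edge on the path between two vertices" concrete: in a tree there is exactly one path between any two vertices, so a plain depth-first search is enough. Here is a small sketch (in Scala rather than MATLAB, with a made-up adjacency representation; an illustration, not the paper's INSERT procedure):

// Weighted tree as an adjacency map: vertex -> List of (neighbour, weight).
def largestEdgeOnPath(
    adj: Map[Int, List[(Int, Double)]],
    from: Int,
    to: Int
): Option[Double] = {
  // Walk down the unique tree path, carrying the largest weight seen so far.
  // Uses -1 as a "no parent" sentinel, so vertex ids are assumed non-negative.
  def dfs(v: Int, parent: Int, maxSoFar: Double): Option[Double] =
    if (v == to) Some(maxSoFar)
    else
      adj.getOrElse(v, Nil).iterator
        .filter { case (w, _) => w != parent }
        .map { case (w, weight) => dfs(w, v, math.max(maxSoFar, weight)) }
        .collectFirst { case Some(m) => m }

  if (from == to) None // the empty path has no edges
  else dfs(from, -1, Double.NegativeInfinity)
}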