Create key/value-array pairs in Scala/Spark [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
How can I create key/value-array pairs in Scala? By this I mean that in place of a single value I need an array.
val newRdd1 = rdd1.flatMap(x=>x.split(" "))
.map({case (key, Array(String)) => Array(String) })

You can do this with map(); it works the same way on a plain Scala collection as on a Spark RDD.
For example, suppose you have a list of strings:
val sRec = List("key1,a1,a2,a3", "key2,b1,b2,b3", "key3,c1,c2,c3")
Assuming the key is at position 0, you can split each string and convert it to a key/value pair whose value is an array of strings:
sRec.map { x =>
  val parts = x.split(",") // split each record once and reuse the parts
  (parts(0), Array(parts(1), parts(2), parts(3)))
}.foreach(println)
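Output (an Array's toString shows only its JVM type and hash code, so the hex values differ between runs):
(key1,[Ljava.lang.String;@7a81197d)
(key2,[Ljava.lang.String;@5ca881b5)
(key3,[Ljava.lang.String;@24d46ca6)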
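To see the array contents instead of the JVM type and hash, the values can be rendered with mkString. A minimal sketch, assuming the key is the first comma-separated field; the KeyValueArrays object and its method names are hypothetical:

```scala
object KeyValueArrays {
  // Splits "key,v1,v2,..." into (key, Array(v1, v2, ...)), assuming the key is
  // the first comma-separated field; all remaining fields become the array.
  def toKeyValueArray(line: String): (String, Array[String]) = {
    val parts = line.split(",")
    (parts.head, parts.tail)
  }

  // Arrays don't override toString, so render the values explicitly.
  def render(kv: (String, Array[String])): String = {
    val (key, values) = kv
    "(" + key + "," + values.mkString("[", ",", "]") + ")"
  }
}
```

For the sample list, sRec.map(KeyValueArrays.toKeyValueArray).map(KeyValueArrays.render) yields strings such as (key1,[a1,a2,a3]). Because toKeyValueArray is an ordinary String => (String, Array[String]) function, the same call should also work on an RDD, e.g. sc.textFile(path).map(KeyValueArrays.toKeyValueArray), where path is a placeholder.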
To read a particular array element for each key (here the first one):
sRec.map { x =>
  val parts = x.split(",")
  (parts(0), Array(parts(1), parts(2), parts(3)))
}.map { case (key, values) => (key, values(0)) }.foreach(println)
Output:
(key1,a1)
(key2,b1)
(key3,c1)
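If the goal is to look up one key's element rather than print every key, the pairs can be turned into a Map. A minimal sketch, assuming each key appears only once (later duplicates would overwrite earlier ones); LookupByKey and elementFor are hypothetical names:

```scala
object LookupByKey {
  // Builds key -> values from "key,v1,v2,..." lines and returns the element at
  // `index` for `key`, or None if the key or the index does not exist.
  def elementFor(lines: List[String], key: String, index: Int): Option[String] = {
    val byKey: Map[String, Array[String]] =
      lines.map { line =>
        val parts = line.split(",")
        (parts.head, parts.tail)
      }.toMap
    byKey.get(key).collect {
      case values if values.indices.contains(index) => values(index)
    }
  }
}
```

With the sample list, elementFor(sRec, "key2", 0) gives Some("b1"). On a Spark pair RDD, PairRDDFunctions.lookup(key) returns all values stored under a key without building a local map.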

Related

How to Check if a Variable Exists in Scala [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 months ago.
I want to check whether a variable is already defined (exists) in Scala. Let's say a function called checkVar does this:
var x = 10
checkVar(x) -> returns boolean True
checkVar(y) -> returns boolean False
I am asking this question because I want to create a mechanism to define a variable if it doesn't exist.
Variable names only exist at compile time, so you can't create or delete variables dynamically at runtime. Both x and y must be defined in the source, or the compiler will reject the code.
What you can do is use Option to indicate whether a variable has a value or not:
def checkVar(v: Option[Int]) = v.nonEmpty
var x: Option[Int] = Some(10) // annotate the type so x can later be reassigned to None
checkVar(x) // true
val y = None
checkVar(y) // false
x = None
checkVar(x) // false
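Since the goal is to define a variable only if it doesn't exist yet, a minimal sketch of that "define if absent" step under the Option encoding above uses orElse. For names chosen at runtime, a mutable.Map can stand in for variables; that variant is an alternative not taken from the answer. DefineIfAbsent and its methods are hypothetical names:

```scala
import scala.collection.mutable

object DefineIfAbsent {
  // Option approach: "defined" means Some(_). orElse only evaluates `value`
  // when v is None, so an existing value is never overwritten.
  def defineIfAbsent(v: Option[Int], value: => Int): Option[Int] =
    v.orElse(Some(value))

  // Name-based variant (assumption): runtime "variables" live in a map keyed by
  // name. getOrElseUpdate inserts `value` only if the name is missing and
  // returns whatever is stored under the name afterwards.
  val vars: mutable.Map[String, Int] = mutable.Map.empty
  def defineByName(name: String, value: => Int): Int =
    vars.getOrElseUpdate(name, value)
  def exists(name: String): Boolean = vars.contains(name)
}
```

Calling defineIfAbsent(x, 10) on an x that is None gives Some(10); calling it again with 99 keeps Some(10).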

How to compute the mean-square in Matlab? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I want to compute the mean square of the differences between each element of l and k:
l=[0.02817088, -0.74100320, -0.54062120, -0.24612808, 0.06945337, -0.58415690, -0.51238549,
-0.07862326, -0.42417337, -0.33482340, -0.21339753, -0.03890844, -0.59325371, 0.28154593,
-0.32133359,-0.13534792, 0.14060645, 0.32204972, 0.44438052, -0.21750973,-0.59107599,
-0.60809913]'
k= -0.2224834
sum(l-k)^2/22
I am not sure whether sum(l-k)^2/22 squares each (l(j)-k), for j = 1, 2, ..., 22, before summing. It returns:
ans = 2.4223e-14
What you probably need is:
>> mean((l-k).^2)
ans = 0.10945
Data (in MATLAB you need ... for line continuation when the values of l span several lines):
l=[0.02817088, -0.74100320, -0.54062120, -0.24612808, 0.06945337, -0.58415690, -0.51238549, ...
-0.07862326, -0.42417337, -0.33482340, -0.21339753, -0.03890844, -0.59325371, 0.28154593, ...
-0.32133359,-0.13534792, 0.14060645, 0.32204972, 0.44438052, -0.21750973,-0.59107599, ...
-0.60809913]'
k= -0.2224834
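The two expressions differ in where the squaring happens: in sum(l-k)^2/22, ^ is applied to the scalar returned by sum, so the sum is squared; in mean((l-k).^2), .^ squares each difference before averaging. Written out, the first expression below is what sum(l-k)^2/22 computes and the second is what mean((l-k).^2) computes. Here k equals the mean of l rounded to seven decimal places, so the inner sum of the first expression is essentially zero:

```latex
\frac{1}{22}\Bigl(\sum_{j=1}^{22}(l_j-k)\Bigr)^{2}
\quad\text{vs.}\quad
\frac{1}{22}\sum_{j=1}^{22}(l_j-k)^{2},
\qquad
\sum_{j=1}^{22}(l_j-k) = \sum_{j=1}^{22} l_j - 22k \approx 7.3\times10^{-7}
```

Squaring 7.3e-7 and dividing by 22 gives about 2.4e-14, which is the tiny result from the question; the second expression is the mean squared deviation from k, 0.10945.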

How do I Print an Array of names as pairs [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
let players = ["Greg", "Jenn", "Steve", "Anthony", "Krista", "Marti", "Erin", "Brandon",].shuffled()
I want to loop over the shuffled array and print all of its elements as pairs. For example, if the shuffled order were the one above, it would print:
Greg, Jenn
Steve, Anthony
Krista, Marti
Erin, Brandon
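You could use this (it prints one pair per line):
let pairs = stride(from: 1, to: players.count, by: 2).map { (players[$0 - 1], players[$0]) }
// stride yields indices 1, 3, 5, ...; with an odd count the last player is left unpaired
for (first, second) in pairs {
    print("\(first), \(second)")
}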

Scala: sort comparing with adjacent elements [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
Assuming I have the following Scala classes:
import scala.concurrent.duration.Duration
case class Human(id: String, task: Task)
case class Task(id: String, time: Duration)
And I have a List[(Human, Task)] with the following elements:
("H2", Task("T3", 5 minute))
("H3", Task("T1", 10 minute))
("H1", Task("T1", 10 minute))
("H1", Task("T2", 5 minute))
Now I want to check, in a functional way, whether adjacent elements have the same duration and, if so, order them by human id.
In this case, the final list would have the elements sorted like so:
("H2", Task("T3", 5 minute))
("H1", Task("T1", 10 minute))
("H3", Task("T1", 10 minute))
("H1", Task("T2", 5 minute))
I tried to use sortBy, but the way I'm doing it, the final list ends up fully ordered by human id, ignoring the times.
Does anyone have an idea how I can do this?
Your question is a bit confused. You say you have a List of (Human,Task) tuples, but then you describe a collection of (String,Task) tuples.
Here's a way to sort a List[Human] according to the rules you've described.
def sortHumans(hs: List[Human]): List[Human] =
  if (hs.isEmpty) Nil
  else {
    val target = hs.head.task.time
    // sort the leading run of equal durations by id, then recurse on the rest
    hs.takeWhile(_.task.time == target).sortBy(_.id) ++
      sortHumans(hs.dropWhile(_.task.time == target))
  }
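The recursive call in sortHumans is not in tail position (its result is an argument to ++), so a list with very many runs could overflow the stack. A minimal self-contained sketch of the same run-by-run sorting with List.span and an accumulator, assuming the question's classes are case classes with a scala.concurrent.duration.Duration field; the SortRuns object is a hypothetical name:

```scala
import scala.annotation.tailrec
import scala.concurrent.duration._

// Assumed shapes, mirroring the question (declared as case classes here).
case class Task(id: String, time: Duration)
case class Human(id: String, task: Task)

object SortRuns {
  // Sorts each run of adjacent humans whose tasks have equal durations by
  // human id; the runs themselves keep their original order.
  def sortRuns(hs: List[Human]): List[Human] = {
    @tailrec
    def loop(rest: List[Human], acc: List[Human]): List[Human] = rest match {
      case Nil => acc.reverse
      case first :: _ =>
        // span splits off the longest prefix sharing first's duration
        val (run, remaining) = rest.span(_.task.time == first.task.time)
        // acc is kept in reverse order, so prepend the sorted run reversed
        loop(remaining, run.sortBy(_.id).reverse ::: acc)
    }
    loop(hs, Nil)
  }
}
```

With the question's four elements (durations 5, 10, 10, 5 minutes), the human ids come out as H2, H1, H3, H1.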

How do I create a Graph in GraphX with this [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I am struggling to understand how to create the following in GraphX in Apache Spark. I am given an HDFS file with lots of data in the form:
node: ConnectingNode1, ConnectingNode2..
For example:
123214: 521345, 235213, 657323
I need to somehow store this data in an EdgeRDD so that I can create my graph in GraphX, but I have no idea how to go about this.
After you read your HDFS source and have your data in an RDD, you can try something like the following:
import org.apache.spark.rdd.RDD
import org.apache.spark.graphx.Edge
// Sample data
val rdd = sc.parallelize(Seq("1: 1, 2, 3", "2: 2, 3"))
val edges: RDD[Edge[Int]] = rdd.flatMap { row =>
  // split around ":"
  val splitted = row.split(":").map(_.trim)
  // the value to the left of ":" is the source vertex:
  val srcVertex = splitted(0).toLong
  // for the values to the right of ":", we split around "," to get the other vertices
  val otherVertices = splitted(1).split(",").map(_.trim)
  // for each vertex to the right of ":", we create an Edge object connecting them to the srcVertex:
  otherVertices.map(v => Edge(srcVertex, v.toLong, 1))
}
Additionally, if all your vertices can share a constant default attribute, you can create the graph straight from the edges with Graph.fromEdges, so you don't need to build a vertex RDD:
import org.apache.spark.graphx.Graph
val g = Graph.fromEdges(edges, defaultValue = 1)
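The flatMap above throws on a node with no neighbours (for "42:", split(":") drops the trailing empty field, so splitted(1) is out of bounds) and on blank lines. A minimal sketch that pulls the per-line parsing into a plain function, so it can be tested without a cluster and skips such lines; EdgeParsing and parseLine are hypothetical names:

```scala
object EdgeParsing {
  // Parses one "node: n1, n2, ..." line into (src, dst) vertex-id pairs.
  // Blank lines and nodes with no neighbours (e.g. "42:") yield no pairs;
  // non-numeric ids still throw NumberFormatException from toLong.
  def parseLine(row: String): Seq[(Long, Long)] =
    row.split(":", 2) match { // limit 2 keeps an empty right-hand side
      case Array(src, rest) if src.trim.nonEmpty =>
        val srcId = src.trim.toLong
        rest.split(",").map(_.trim).filter(_.nonEmpty)
          .map(dst => (srcId, dst.toLong)).toSeq
      case _ => Seq.empty
    }
}
```

Inside Spark it could be plugged in roughly as rdd.flatMap(row => EdgeParsing.parseLine(row).map { case (src, dst) => Edge(src, dst, 1) }), producing the same RDD[Edge[Int]] for well-formed lines.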