Why can I write something like this without compilation errors:
wordCount foreach(x => println("Word: " + x._1 + ", count: " + x._2)) // wordCount - is Map
i.e. I declared the x variable.
But I can't use the magic _ symbol in this case:
wordCount foreach(println("Word: " + _._1 + ", count: " + _._2)) // wordCount - is Map
You should check this answer about placeholder syntax.
Two underscores mean two consecutive parameters: something like _ + _ is the placeholder equivalent of (x, y) => x + y.
In the first example you just have a regular Tuple, which has accessors for its first (._1) and second (._2) elements.
This means you can't use placeholder syntax when you want to reference a single value more than once.
Every underscore is positional. So your code is desugared to
wordCount foreach((x, y) => println("Word: " + x._1 + ", count: " + y._2))
Thanks to this, List(...).reduce(_ + _) is possible.
Moreover, since the expansion is made relative to the closest enclosing parentheses, it will actually look like:
wordCount foreach(println((x, y) => "Word: " + x._1 + ", count: " + y._2))
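For completeness, here are two forms that do compile for the Map case (a small sketch; wordCount here is just a stand-in Map[String, Int]):

val wordCount: Map[String, Int] = Map("foo" -> 2, "bar" -> 1)

// explicit parameter, as in the question
wordCount.foreach(x => println("Word: " + x._1 + ", count: " + x._2))

// pattern-matching anonymous function, which also lets you name the tuple parts
wordCount.foreach { case (word, count) => println("Word: " + word + ", count: " + count) }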
Related
I am quite new to Scala and functional programming.
I wrote the simple code below, which manipulates the string by splitting it into parts.
When the 4th comma-delimited part is empty, I concatenate only three columns; otherwise I concatenate all the columns, including the value, as in the code below.
But I don't think this is quite proper functional programming, because I used an if statement to check whether the input contains the value or not.
How can I change it to more Scala-like code?
str = "aa,bb,1668268540040,34.0::aa,bb,1668268540040"
val parts = str.split("::")
for (case <- parts) {
val ret = case.map(c => if (c.value.isEmpty) {
c.columnFamily + "," + c.qualifier + "," + c.ts
} else {
c.columnFamily + "," + c.qualifier + "," + c.ts + "," + c.value
})
}
str = "aa,bb,1668268540040,34.0::aa,bb,166826434343"
val parts = str.split("::")
for (part <- parts) {
val elem = part.split(",", 4)
if (elem.length == 4) {
val Array(f, q, t, v) = elem
state.put(f + ":" + q, (v, t.toLong))
} else {
val Array(f, q, t) = elem
state.put(f + ":" + q, ("", t.toLong))
}
}
@LeviRamsey's comment actually tells you everything, but just to make your code more "Scala-ish": avoid mutable data structures in the first place (which is what you're doing with state, which I think is a mutable Map object) and use immutable data structures instead. Your if-else part is actually fine in FP, but in Scala you can pattern match on a list rather than checking lengths manually and working with Arrays. Something like this:
parts.foldLeft(Map.empty[String, (String, Long)]) {
case (state, part) =>
part.split(",", 4).toList match {
case f :: q :: t :: v :: Nil =>
state.updated(f + ":" + q, (v, t.toLong))
case f :: q :: t :: Nil =>
state.updated(f + ":" + q, ("", t.toLong))
case _ => state // or whatever you want to do when the split yields neither 4 nor 3 elements
}
}
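A quick sanity check (a sketch, wrapping the fold above in a helper and running it on the sample string from the question):

def toState(parts: Array[String]): Map[String, (String, Long)] =
  parts.foldLeft(Map.empty[String, (String, Long)]) {
    case (state, part) =>
      part.split(",", 4).toList match {
        case f :: q :: t :: v :: Nil => state.updated(f + ":" + q, (v, t.toLong))
        case f :: q :: t :: Nil      => state.updated(f + ":" + q, ("", t.toLong))
        case _                       => state
      }
  }

println(toState("aa,bb,1668268540040,34.0::aa,bb,166826434343".split("::")))
// prints Map(aa:bb -> (,166826434343)) -- the second part overwrites the first, because both produce the key "aa:bb"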
While doing a hands-on code practice I am getting "cannot resolve symbol x" from IntelliJ.
The lines with the error:
println(lst.reduceLeft((x,y) => {println(x + " , "+ y) x +y}))
println(lst.reduceRight((x,y) => {println(x + " , "+ y) x -y}))
I have tried the suggestions from IntelliJ to debug it, but it's not working.
IntelliJ
Build #IC-221.5591.52, built on May 10, 2022
Scala version
scala-sdk-2.11.12
Reference
http://www.codebind.com/scala/scala-reduce-fold-scan-leftright/?unapproved=192475&moderation-hash=8cdabb0f7834cbe19792b863eb952538#comment-192475
//In Scala, reduce, fold and scan are most commonly used with collections in the form of reduceLeft, reduceRight, foldLeft, foldRight, scanLeft or scanRight.
// In general, all functions apply a binary operator to each element of a collection.
// The result of each step is passed on to the next step.
package pack {
}
object obj2 {
println
println("=====started=======")
println
//val is a constant (an unchangeable binding); a var is a variable that holds a value / address to a value in memory
val lst = List(1, 2, 3, 5, 7, 10, 13)
val lst2 = List("A", "B", "C")
def main(args: Array[String]) {
println(lst.reduceLeft(_ + _))
println(lst2.reduceLeft(_ + _))
println(lst.reduceLeft((x,y) => {println(x + " , "+ y) x +y}))
println(lst.reduceLeft(_ - _))
println(lst.reduceRight(_ - _))
println(lst.reduceRight((x,y) => {println(x + " , "+ y) x -y}))
println(lst.foldLeft(100)(_ + _))
println(lst2.foldLeft("z")(_ + _))
println(lst.scanLeft(100)(_ + _))
println(lst2.scanLeft("z")(_ + _))
}
}
println(lst.reduceLeft((x,y) => { println(x + " , " + y) x + y }))
The code inside the { } is not a valid expression. It looks like you are expecting Scala to work out that there are two expressions here, but it can't. You need to put in an explicit ; to fix it:
println(lst.reduceLeft((x,y) => { println(x + " , " + y); x + y }))
println(lst.reduceLeft((x,y) => {println(x + " , "+ y) x +y})) is not valid. If you want several instructions in a anonymous function you need either to separate them with ; or make it multiline:
println(lst.reduceLeft((x, y) => { println(x + " , " + y); x + y }))
or
println(lst.reduceLeft((x, y) => {
println(x + " , " + y)
x + y
}))
Also, it is better to use string interpolation instead of concatenation:
println(lst.reduceLeft((x, y) => {
println(s"$x , $y")
x + y
}))
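Once it compiles, the inner println makes the left-to-right accumulation visible; with the lst from the question it prints:

val lst = List(1, 2, 3, 5, 7, 10, 13)
println(lst.reduceLeft((x, y) => { println(x + " , " + y); x + y }))
// 1 , 2
// 3 , 3
// 6 , 5
// 11 , 7
// 18 , 10
// 28 , 13
// 41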
I want to see some output using println, but when running on AWS that may not work. How can I save the content of the println output as a file on AWS using saveAsTextFile?
The original println code is as follows:
println("\n[ First output is ]")
output1.foreach(a => println("(" + a +"," + titles(a - 1) + ")"));
println("\n[ Second output us ]")
output2.foreach(a => println("(" + a +"," + titles(a - 1) + ")"));
output1 and output2 are both lists made up of numbers; titles is also a list.
Thanks.
Well, if both are Lists, you can convert them into RDDs using SparkContext's parallelize method.
val rdd1 = sc.parallelize(List("[ First output is ]") ++ output1.map(a => "(" + a + "," + titles(a - 1) + ")"))
val rdd2 = sc.parallelize(List("[ Second output is ]") ++ output2.map(a => "(" + a + "," + titles(a - 1) + ")"))
After this you can use saveAsTextFile to write to your desired S3 path.
rdd1.saveAsTextFile("s3://yourAccessKey:yourSecretKey#/out1.txt")
rdd2.saveAsTextFile("s3://yourAccessKey:yourSecretKey#/out2.txt")
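One caveat worth knowing (not specific to S3): saveAsTextFile treats the path as a directory and writes one part-NNNNN file per partition under it, so "out1.txt" will actually be a folder. If you really want a single part file, you can coalesce to one partition first (the bucket and path below are just placeholders):

rdd1.coalesce(1).saveAsTextFile("s3://your-bucket/output/out1")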
I recommend reading this blog post; it might help you understand some important things about S3 and Apache Spark: Writing s3 data with Apache Spark
I have a list of nodes (Strings) that I want to convert into something like the following:
create X ({name:"A"}),({name:"B"}),({name:"B"}),({name:"C"}),({name:"D"}),({name:"F"})
Using a fold I get everything with an extra "," at the end. I can remove that using a substring on the final String, but I was wondering if there is a better/more functional way of doing this in Scala?
val nodes = List("A", "B", "B", "C", "D", "F")
val str = nodes.map( x => "({name:\"" + x + "\"}),").foldLeft("create X ")( (acc, curr) => acc + curr )
println(str)
//create X ({name:"A"}),({name:"B"}),({name:"B"}),({name:"C"}),({name:"D"}),({name:"F"}),
Solution 1
You could use the mkString function, which won't append the separator at the end.
In this case you first map each element to the corresponding String and then use mkString to put the ',' in between.
Since the "create X " part is static, you can just prepend it to the result.
val str = "create X " + nodes.map("({name:\"" + _ + "\"})").mkString(",")
Solution 2
Another way to see this: Since you append exactly one ',' too much, you could just remove it.
val str = nodes.foldLeft("create X ")((acc, x) => acc + "({name:\"" + x + "\"}),").init
init just takes all elements from a collection, except the last.
(A string is seen as a collection of chars here)
So when nodes contains elements, init removes the trailing ','. When nodes is empty, you only get "create X " and init removes the trailing whitespace instead, which might not be needed anyway.
Solutions 1 and 2 are therefore not equivalent when nodes is empty: Solution 1 keeps the trailing whitespace.
Joining a bunch of things, splicing something "in between" each of the things, isn't a map-shaped problem. So adding the comma in the map call doesn't really "fit".
I generally do this sort of thing by inserting the comma before each item during the fold; the fold can test whether the accumulator is "empty" and not insert a comma.
For this particular case (string joining) it's so common that there's already a library function for it: mkString.
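A minimal sketch of that fold-based approach, using the nodes list from the question (the "is the accumulator empty?" test here is simply a comparison against the static prefix):

val nodes = List("A", "B", "B", "C", "D", "F")
val prefix = "create X "
val str = nodes.foldLeft(prefix) { (acc, x) =>
  val sep = if (acc == prefix) "" else "," // only insert a comma once something has been appended
  acc + sep + "({name:\"" + x + "\"})"
}
// str == create X ({name:"A"}),({name:"B"}),({name:"B"}),({name:"C"}),({name:"D"}),({name:"F"})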
Move "," from map(which applies to all) to fold/reduce
val str = "create X " + nodes.map( x => "({name:\"" + x + "\"})").reduceLeftOption( _ +","+ _ ).getOrElse("")
I wish to create a Map keyed by name containing the count of things with that name. I have a list of the things, each with a name, and the list may contain more than one item with the same name. Coded like this, I get the error "type mismatch; found: String, required: (String, Int)":
//variation 0, produces error
(Map[String, Int]() /: entries)((r, c) => { r + (c.name, if (r.contains(c.name)) (c.name) + 1 else 1) })
This confuses me, as I thought (a, b) was a Tuple2 and therefore suitable for use with Map's + method. Either of the following variations works as expected:
//variation 1, works
(Map[String, Int]() /: entries)((r, c) => { r + Tuple2(c.name, if (r.contains(c.name)) (c.name) + 1 else 1) })
//variation 2, works
(Map[String, Int]() /: entries)((r, c) => {
  val e = (c.name, if (r.contains(c.name)) (c.name) + 1 else 1)
  r + e
})
I'm unclear on why there is a problem with my first version; can anyone advise? I am using Scala-IDE 2.0.0 beta 2 to edit the source; the error is from the Eclipse Problems window.
When passing a single tuple argument to a method used with operator notation, like your + method, you should use double parentheses:
(Map[String, Int]() /: entries)((r, c) => { r + ((c.name, r.get(c.name).map(_ + 1).getOrElse(1) )) })
I've also changed the computation of the Int, which looks funny in your example…
Because + is also used to concatenate stuff with strings. In this case, the parentheses are not being taken to mean a tuple, but to mean parameters.
Scala has used + for other stuff, which resulted in all sorts of problems, just like the one you mention.
Replace + with updated, or use -> instead of ,.
r + (c.name, if (r.contains(c.name)) (c.name) + 1 else 1)
is parsed as
r.+(c.name, if (r.contains(c.name)) (c.name) + 1 else 1)
So the compiler looks for a + method with 2 arguments on Map and doesn't find it. The form I prefer over double parentheses (as Jean-Philippe Pellet suggests) is
r + (c.name -> (if (r.contains(c.name)) (c.name) + 1 else 1))
UPDATE:
if Pellet is correct, it's better to write
r + (c.name -> (r.getOrElse(c.name, 0) + 1))
(and of course James Iry's solution expresses the same intent even better).
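Putting it together, a minimal runnable sketch of the count-by-name fold (Entry is a hypothetical stand-in for the asker's element type, and foldLeft is the modern spelling of /:):

case class Entry(name: String) // hypothetical element type, just so the sketch compiles

val entries = List(Entry("a"), Entry("b"), Entry("a"))
val counts = entries.foldLeft(Map[String, Int]()) { (r, c) =>
  r + (c.name -> (r.getOrElse(c.name, 0) + 1))
}
// counts == Map(a -> 2, b -> 1)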