How do I "join" an iterable of strings by another string in Scala?
val theStrings = Array("a","b","c")
val joined = ???
println(joined)
I want this code to output a,b,c (join the elements by ",").
How about mkString?
theStrings.mkString(",")
A variant exists in which you can specify a prefix and suffix too.
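To illustrate the prefix/suffix variant, here is a small sketch using the same three-element array. The object name `MkStringVariants` is only for illustration; `mkString(sep)` and `mkString(start, sep, end)` are standard collection methods.

```scala
object MkStringVariants {
  val theStrings = Array("a", "b", "c")

  // mkString(sep): separator only
  val plain: String = theStrings.mkString(",")

  // mkString(start, sep, end): prefix, separator and suffix
  val bracketed: String = theStrings.mkString("[", ",", "]")

  def main(args: Array[String]): Unit = {
    println(plain)     // a,b,c
    println(bracketed) // [a,b,c]
  }
}
```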
See here for an implementation using foldLeft, which is much more verbose, but perhaps worth looking at for education's sake.
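For a rough idea of what a foldLeft-based join looks like, here is a minimal sketch (not the linked implementation). `joinWith` is a hypothetical helper name. Each fold step builds a new intermediate String, which is one more reason to just use mkString in practice.

```scala
object FoldLeftJoin {
  // Hypothetical helper: joins items with sep using foldLeft.
  // The first element seeds the accumulator so no leading separator appears.
  def joinWith(items: Seq[String], sep: String): String =
    if (items.isEmpty) ""
    else items.tail.foldLeft(items.head)((acc, s) => acc + sep + s)

  def main(args: Array[String]): Unit =
    println(joinWith(Array("a", "b", "c").toSeq, ",")) // a,b,c
}
```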
I am a newbie in Scala and I need to sort a very large list of 40000 integers.
The operation is performed many times, so performance is very important.
What is the best method for sorting?
You can sort the list with List.sortWith() by providing a suitable function literal. For example, the following code sorts the list alphabetically by the lowercased first character of each element and prints the result:
val initial = List("doodle", "Cons", "bible", "Army")
val sorted = initial.sortWith((s: String, t: String) => s.charAt(0).toLower < t.charAt(0).toLower)
println(sorted)
With Scala's type inference, a much shorter version is the following:
val initial = List("doodle", "Cons", "bible", "Army")
val sorted = initial.sortWith((s, t) => s.charAt(0).toLower < t.charAt(0).toLower)
println(sorted)
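The same ordering can also be expressed with sortBy, which takes a function extracting the sort key instead of a two-argument comparison. A sketch using the same input (the object name is illustrative):

```scala
object SortByFirstChar {
  val initial = List("doodle", "Cons", "bible", "Army")

  // sortBy extracts a key per element and sorts using the key's implicit Ordering (Char here)
  val sorted: List[String] = initial.sortBy(_.charAt(0).toLower)

  def main(args: Array[String]): Unit =
    println(sorted) // List(Army, bible, Cons, doodle)
}
```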
For integers there is List.sorted; just use this:
val list = List(4, 3, 2, 1)
val sortedList = list.sorted
println(sortedList)
Just check the docs.
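List has several methods for sorting. myList.sorted works for element types that already have a defined ordering (an implicit Ordering, as for Int, String and others). myList.sortWith and myList.sortBy take a function that defines the order: sortWith takes a comparison function, while sortBy takes a function that extracts the key to sort by.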
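A short sketch of the three methods on a List[Int]. The input list and object name are illustrative only. Note that Scala's sorts are stable, which is why equal keys keep their original relative order in the sortBy example.

```scala
object IntSorting {
  // Illustrative input; any List[Int] behaves the same way
  val myList = List(4, 1, 3, 2)

  // sorted: uses the implicit Ordering[Int]
  val ascending: List[Int] = myList.sorted

  // sorted with an explicit, reversed Ordering
  val descending: List[Int] = myList.sorted(Ordering[Int].reverse)

  // sortWith: pass a "comes before" comparison directly
  val descendingWith: List[Int] = myList.sortWith(_ > _)

  // sortBy: sort by a derived key (0 for evens, 1 for odds); the sort is stable,
  // so 4 stays before 2 and 1 stays before 3
  val evensFirst: List[Int] = myList.sortBy(x => x % 2)

  def main(args: Array[String]): Unit = {
    println(ascending)      // List(1, 2, 3, 4)
    println(descending)     // List(4, 3, 2, 1)
    println(descendingWith) // List(4, 3, 2, 1)
    println(evensFirst)     // List(4, 2, 1, 3)
  }
}
```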
Also, the first Google result for "scala List sort": http://alvinalexander.com/scala/how-sort-scala-sequences-seq-list-array-buffer-vector-ordering-ordered
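You can use (1 to 400000).toList.sorted. (Note that List(1 to 400000) would be a List containing a single Range element, which has no Ordering, so calling .sorted on it does not compile.)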
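A sketch contrasting the two ways of building the list, plus one possible alternative when the sort runs many times: copying into a primitive Array[Int] and sorting it in place with java.util.Arrays.sort. Whether the array route is actually faster for this workload is an assumption to verify with your own benchmark; no timings are claimed here. The seed, sizes and object name are illustrative.

```scala
import scala.util.Random

object LargeIntSort {
  // List(1 to 5) is a List with ONE element (the Range itself);
  // (1 to 5).toList is a List[Int] with five elements.
  val wrapped: List[Range] = List(1 to 5)
  val unwrapped: List[Int] = (1 to 5).toList

  // Shuffled input of 40000 Ints; fixed seed so runs are repeatable
  val input: List[Int] = new Random(42).shuffle((1 to 40000).toList)

  // Option 1: the standard List.sorted
  val viaSorted: List[Int] = input.sorted

  // Option 2 (assumption: worth measuring for repeated sorts):
  // copy to a primitive array and sort it in place
  def viaArray(xs: List[Int]): List[Int] = {
    val arr = xs.toArray
    java.util.Arrays.sort(arr)
    arr.toList
  }

  def main(args: Array[String]): Unit = {
    println(wrapped.length)   // 1
    println(unwrapped.length) // 5
    println(viaArray(input) == viaSorted)
  }
}
```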
I am trying to parse and concatenate two columns at the same time using the following expression:
import org.apache.spark.rdd.RDD

val part: RDD[String] = sc.textFile("hdfs://xxx:8020/user/sample_head.csv")
  .map { line =>
    val row = line.split(',')
    (row(1), row(2)).toString
  }
which returns something like:
Array((AAA,111), (BBB,222), (CCC,333))
But how could I directly get:
Array(AAA, 111, BBB, 222, CCC, 333)
Your toString() on a tuple really doesn't make much sense to me. Can you explain why you want to create strings from tuples and then split them again later?
If you are willing to map each row into a list of elements instead of a stringified tuple of elements, you could rewrite
(row(1), row(2)).toString
to
List(row(1), row(2))
and simply flatten the resulting list:
val list = List("0,aaa,111", "1,bbb,222", "2,ccc,333")
val tuples = list.map{ line =>
val row = line split ','
List(row(1), row(2))}
val flattenedTuples = tuples.flatten
println(flattenedTuples) // prints List(aaa, 111, bbb, 222, ccc, 333)
Note that what you are trying to achieve involves flattening, so it can be done with flatMap but not with map alone. You need to either use flatMap directly, or do map followed by flatten as I showed you. Spark's RDD does support flatMap, but as far as I know it has no flatten, so on an RDD flatMap is the way to go. Also, as you can see, I used a List as a more idiomatic Scala data structure, but it's easily convertible to Array and vice versa.
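Here is the flatMap version of the example above on a plain Scala List, doing map and flatten in one step. The same function body should carry over to RDD.flatMap on the result of sc.textFile, but that Spark variant is not run here.

```scala
object FlatMapColumns {
  val list = List("0,aaa,111", "1,bbb,222", "2,ccc,333")

  // Each line yields a List of two fields; flatMap concatenates those Lists
  val flattened: List[String] = list.flatMap { line =>
    val row = line.split(',')
    List(row(1), row(2))
  }

  def main(args: Array[String]): Unit =
    println(flattened) // List(aaa, 111, bbb, 222, ccc, 333)
}
```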
I have an immutable Scala Set[String] containing a few strings, say {"a", "b", "c"}. I want to join them into a String that looks like "\"a\",\"b\",\"c\"". I know I can make a var resultStr and use a for-loop to get the result. But since Scala encourages using an immutable val over var, and also has so many operations defined on sets, I was wondering if there was a more elegant way to achieve the result.
If you want to make a String from a Scala collection, you can simply use mkString(sep). E.g.,
Set("a", "b", "c").mkString(",")
You can use map to surround them with quotes and then mkString to join them with a comma as the separator.
s.map(x => s""""$x"""").mkString(",")
or
s.map(x => "\"" + x + "\"").mkString(",")
(The triple quote is Scala's way to avoid having to escape anything inside the string, and the s prefix together with the dollar sign is string interpolation.)
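A runnable check that both approaches produce the requested string, plus a third option that puts the quotes into mkString's start/separator/end arguments instead of mapping. A Seq is used here so the element order is predictable; for a Set, iteration order is not guaranteed in general. One difference to be aware of: on an empty collection the mkString-only version yields two quote characters, while the map versions yield an empty string.

```scala
object QuoteAndJoin {
  // Seq instead of Set so the output order is deterministic
  val s = Seq("a", "b", "c")

  // Triple-quoted interpolated string: the inner quotes need no escaping
  val viaInterpolation: String = s.map(x => s""""$x"""").mkString(",")

  // Plain concatenation with escaped quotes
  val viaConcatenation: String = s.map(x => "\"" + x + "\"").mkString(",")

  // No map: opening quote as start, "," wrapped in quotes as separator, closing quote as end
  val viaMkString: String = s.mkString("\"", "\",\"", "\"")

  def main(args: Array[String]): Unit =
    println(viaInterpolation) // "a","b","c"
}
```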
This code converts a List of CSV Strings into tuples of Doubles, with the first field of each CSV string removed:
val points = List(("A1,2,10"), ("A2,2,5"), ("A3,8,4"), ("A4,5,8"), ("A5,7,5"), ("A6,6,4"), ("A7,1,2"), ("A8,4,9"))
points.map (m => (m.split(",")(1).toDouble , m.split(",")(2).toDouble))
//> res0: List[(Double, Double)] = List((2.0,10.0), (2.0,5.0), (8.0,4.0), (5.0,8.0), (7.0,5.0), (6.0,4.0), (1.0,2.0), (4.0,9.0))
Can this be rewritten using fold or map so that the number of elements per CSV line is not hardcoded? Currently this is only correct when each String contains 3 CSV elements, and I'm unsure how to rewrite it for N elements, such as "A1,2,10,4,5".
Update: here is a possible solution:
points.map (m => (m.split(",").tail).map(m2 => m2.toDouble))
Can this be achieved using a single traversal instead of two?
scala> val points = List(("A1,2,10"), ("A2,2,5,6,7,8,9"))
points: List[String] = List(A1,2,10, A2,2,5,6,7,8,9)
scala> points.map(_.split(",").tail.map(_.toDouble))
res0: List[Array[Double]] = List(Array(2.0, 10.0), Array(2.0, 5.0, 6.0, 7.0, 8.0, 9.0))
EDIT
Pretty much what you proposed. As to whether it is doable without a nested .map, that's doubtful: your CSV represents a matrix, and matrices are usually manipulated using nested for loops (or nested .map calls).
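To make the "nested loops" view concrete, here is the same parsing written as a nested for-comprehension: the outer generator walks the rows, the inner one walks the fields of a row. It behaves like the nested .map version; the toList call is added only so the result compares easily. Object name and sample data are illustrative.

```scala
object NestedForComprehension {
  val points = List("A1,2,10", "A2,2,5,6,7,8,9")

  // Outer loop over rows, inner loop over the fields after the label
  val parsed: List[List[Double]] =
    for (line <- points) yield {
      for (field <- line.split(",").toList.tail) yield field.toDouble
    }

  def main(args: Array[String]): Unit =
    println(parsed) // List(List(2.0, 10.0), List(2.0, 5.0, 6.0, 7.0, 8.0, 9.0))
}
```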
Tuples are not the right choice here, as tuples are generally more useful if you know the number of elements in the tuple in advance.
You could use arrays though and take advantage of the fact that you can treat arrays as collections:
points.map(_.split(',').drop(1).map(_.toDouble))
.split(',') splits at the comma separator
.drop(1) removes the first element
.map(_.toDouble) converts the strings to floating point numbers
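One practical difference between .drop(1) and .tail: drop(1) returns an empty array when there is nothing to drop, whereas tail throws on an empty array. A line such as "," splits into an empty array (trailing empty strings are discarded), so drop(1) is the more forgiving choice for messy input. A small sketch, with illustrative names and data:

```scala
object DropVsTail {
  val points = List("A1,2,10", "A2,2,5")

  // Same pipeline as above
  val parsed: List[Array[Double]] = points.map(_.split(',').drop(1).map(_.toDouble))

  // "," splits into an empty array, because trailing empty strings are removed
  val emptyRow: Array[String] = ",".split(',')

  // drop(1) on an empty array is simply empty; emptyRow.tail would throw
  val safe: Array[String] = emptyRow.drop(1)

  def main(args: Array[String]): Unit =
    println(parsed.map(_.toList)) // List(List(2.0, 10.0), List(2.0, 5.0))
}
```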
Update: This is equivalent to your proposed solution.
This has one iteration over the outer list:
points.map(_.split(",").tail.map(_.toDouble))
I saw the following answer: Scala split string to tuple, but in that question the OP is asking to convert a string to a List. I would like to take a string, split it by some character, and convert it to a tuple so the parts can be saved as vals:
val (a,b,c) = "A.B.C".split(".").<toTupleMagic>
Is this possible? This would be a conversion from an Array[String] to a Tuple3 of (String, String, String).
It is unnecessary:
val Array(a, b, c) = "A.B.C".split('.')
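Note that I changed the argument to split from a String to a Char: if you pass a String, it is treated as a regex pattern, and . matches any character. Every resulting piece is then empty, and since split discards trailing empty strings, you get an empty array back (so the Array(a, b, c) pattern would fail with a MatchError).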
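A quick demonstration of the regex pitfall and two ways around it: escaping the dot in the regex, or passing a Char. The object name is illustrative.

```scala
object SplitRegexVsChar {
  // String argument is a regex; "." matches every character, so every piece is empty
  // and trailing empty strings are dropped, leaving an empty array
  val byRegexDot: Array[String] = "A.B.C".split(".")

  // Escaping the dot makes it a literal in the regex
  val byEscapedRegex: Array[String] = "A.B.C".split("\\.")

  // Char argument: split on the literal character
  val byChar: Array[String] = "A.B.C".split('.')

  def main(args: Array[String]): Unit = {
    println(byRegexDot.length)        // 0
    println(byChar.mkString(" | "))   // A | B | C
  }
}
```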
If you truly want to convert it to a tuple, you can use Shapeless.
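For a fixed arity like three, plain pattern matching can also build the tuple without any library. This is a sketch using standard Scala only, not Shapeless; `toTuple3` and `toTuple3Option` are hypothetical helper names. The first throws a MatchError if there are not exactly three parts; the second returns None instead.

```scala
object SplitToTuple {
  // Throws MatchError unless the string splits into exactly three parts
  def toTuple3(s: String): (String, String, String) =
    s.split('.') match {
      case Array(a, b, c) => (a, b, c)
    }

  // Safer variant: None when the shape is wrong
  def toTuple3Option(s: String): Option[(String, String, String)] =
    s.split('.') match {
      case Array(a, b, c) => Some((a, b, c))
      case _              => None
    }

  // Destructure the tuple into vals, as the question asked
  val (a, b, c) = toTuple3("A.B.C")

  def main(args: Array[String]): Unit =
    println((a, b, c)) // (A,B,C)
}
```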