Converting from Array[String] to Seq[String] in Scala - scala

In the following Scala code I attempt to convert from a String that contains elements separated by "|" to a sequence Seq[String]. However the result is a WrappedArray of characters. How to make this work?
val array = "t1|t2".split("|")
println(array.toSeq)
results in:
WrappedArray(t, 1, |, t, 2)
What I need is:
Seq(t1,t2)

The code below works: it splits by the pipe character ('|') instead of the pipe string ("|").
split("|") calls the overloaded definition that takes a regex string, in which the pipe is a metacharacter (alternation). That is what produces the incorrect result shown in the question.
scala> "t1|t2".split('|').toSeq
res10: Seq[String] = WrappedArray(t1, t2)
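A minimal sketch of the alternatives, using only the standard library. The value names are illustrative and not from the original answer; `Pattern.quote` is the standard JDK helper that turns any string into a literal-matching regex.

```scala
import java.util.regex.Pattern

val input = "t1|t2"

// Char overload: the separator is taken literally.
val byChar: Seq[String] = input.split('|').toSeq

// String overload: the argument is a regex, so the pipe must be escaped.
val byEscapedRegex: Seq[String] = input.split("\\|").toSeq

// Pattern.quote escapes the whole separator string for you.
val byQuoted: Seq[String] = input.split(Pattern.quote("|")).toSeq

// toList copies the result into an immutable List instead of an array-backed Seq.
val asList: List[String] = input.split('|').toList
```

All four values hold the two elements t1 and t2; which one to prefer mostly depends on whether the separator is a single character or an arbitrary string.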

Related

replace multiple occurrence of duplicate string in Scala with empty

I have a string as
something,'' something,nothing_something,op nothing_something,'' cat,cat
I want to achieve my output as
'' something,op nothing_something,cat
Is there any way to achieve it?
If I understand your requirement correctly, here's one approach with the following steps:
Split the input string by "," and create a list of indexed-CSVs and convert it to a Map
Generate 2-combinations of the indexed-CSVs
Check each of the indexed-CSV pairs and capture the index of any CSV which is contained within the other CSV
Since the CSVs corresponding to the captured indexes are contained within some other CSV, removing these indexes will result in remaining indexes we would like to keep
Use the remaining indexes to look up CSVs from the CSV Map and concatenate them back to a string
Here is sample code applying to a string with slightly more general comma-separated values:
val str = "cats,a cat,cat,there is a cat,my cat,cats,cat"
val csvIdxList = (Stream from 1).zip(str.split(",")).toList
val csvMap = csvIdxList.toMap
val csvPairs = csvIdxList.combinations(2).toList
val csvContainedIdx = csvPairs.collect{
  case List(x, y) if x._2.contains(y._2) => y._1
  case List(x, y) if y._2.contains(x._2) => x._1
}.distinct
// csvContainedIdx: List[Int] = List(3, 6, 7, 2)
val csvToKeepIdx = (1 to csvIdxList.size) diff csvContainedIdx
// csvToKeepIdx: scala.collection.immutable.IndexedSeq[Int] = Vector(1, 4, 5)
val strDeduped = csvToKeepIdx.map( csvMap.getOrElse(_, "") ).mkString(",")
// strDeduped: String = cats,there is a cat,my cat
Applying the above to your sample string something,'' something,nothing_something,op nothing_something would yield the expected result:
strDeduped: String = '' something,op nothing_something
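The same idea can be packaged as a single function. This is only a sketch under assumptions, not the answer's original code: the name `dropContained` is hypothetical, and it compares positions directly instead of building a Map. It is intended to mirror the pairwise rule above, where a value is dropped if another value strictly contains it, or if an identical value appears earlier.

```scala
// Keeps the comma-separated values that are not contained in another value.
// For exact duplicates, only the first occurrence can survive.
def dropContained(csv: String): String = {
  val items = csv.split(",").toVector.zipWithIndex
  items.collect {
    case (item, i) if !items.exists { case (other, j) =>
          i != j && other.contains(item) && (other != item || j < i)
        } => item
  }.mkString(",")
}

val dedupedCats = dropContained("cats,a cat,cat,there is a cat,my cat,cats,cat")
val dedupedSample = dropContained("something,'' something,nothing_something,op nothing_something")
```

Like the original, this does a quadratic number of `contains` checks, which is fine for short strings but worth keeping in mind for long ones.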
First split the given String on commas to get an Array of values, then keep only the values that contain a space using filter, and join them back together with mkString:
s.split(",").filter(_.contains(' ')).mkString(",")
In Scala REPL:
scala> val s = "something,'' something,nothing_something,op nothing_something"
s: String = something,'' something,nothing_something,op nothing_something
scala> s.split(",").filter(_.contains(' ')).mkString(",")
res27: String = '' something,op nothing_something
Following Leo C's comment, I also tested it with another String:
scala> val s = "something,'' something anything anything anything anything,nothing_something,op op op nothing_something"
s: String = something,'' something anything anything anything anything,nothing_something,op op op nothing_something
scala> s.split(",").filter(_.contains(' ')).mkString(",")
res43: String = '' something anything anything anything anything,op op op nothing_something

Not able to understand a Scala code snippet with `mkString` on `Array[Any]`

I am a beginner exploring Scala. The following is a Scala function.
def printArray[K](array:Array[K]) = array.mkString("Array(" , ", " , ")")
val array2 = Array("a", 2, true)
printArray(array2)
The output is
Array(a, 2, true)
My doubts
Here we have given the array type as K. What does K mean? Does it mean all types?
How is the function 'mkString' able to give the output as Array(a, 2, true)?
Basically I don't understand the concatenation part.
Appreciate your help.
The mkString method called as
arr.mkString(prefix, separator, suffix)
will invoke toString on all array elements, prepend the prefix, then concatenate all strings separating them by the separator, and finally append the suffix.
The type parameter K in printArray[K] is never used in the body, so it could be replaced by an existential type. It's just a method with a bad name and a confusing signature.
When you store any primitive data types (like Int) together with types that extend AnyRef (like String) into the same array, the least upper bound is inferred to be Any, so in
printArray(array2)
the K is set to Any, and the mkString works as described above, gluing together
Array(    prefix
a         "a".toString
,         separator
2         2.toString
,         separator
true      true.toString
)         suffix
yielding the string Array(a, 2, true).
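A small sketch of what the description above amounts to. The names `viaMkString`, `byHand` and `formatArray` are illustrative only; the hand-rolled version is just meant to show the gluing, not how the library actually implements it.

```scala
val array2: Array[Any] = Array("a", 2, true)

// The three-argument mkString: prefix, separator, suffix.
val viaMkString: String = array2.mkString("Array(", ", ", ")")

// Hand-rolled equivalent: toString each element, join with the separator,
// then wrap the result in the prefix and suffix.
val byHand: String =
  "Array(" + array2.map(_.toString).reduce(_ + ", " + _) + ")"

// K is never used in the body, so a wildcard type works just as well.
def formatArray(array: Array[_]): String = array.mkString("Array(", ", ", ")")
```

Both `viaMkString` and `byHand` produce the same string, and `formatArray(array2)` shows that the type parameter is not needed for this method.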
K is not a type here; it is a type parameter. For more intuition, have a look at the other question, "Type parameter in scala".
In this specific example K is inferred to be Any, the most specific type that all three values "a", 2 and true conform to.
val array2: Array[Any] = Array("a", 2, true)
The mkString function joins all items of a collection into a single string. It adds a separator between items and the given strings at the beginning and end. See the documentation for mkString.
If you look at your array2 definition in the REPL, you will see that array2 is of type Array[Any], where Any is the parent type of all the other types in Scala.
scala> val array2 = Array("a", 2, true)
//array2: Array[Any] = Array(a, 2, true)
So when you call the function def printArray[K](array:Array[K]) = array.mkString("Array(" , ", " , ")"), K is treated as Any, and mkString returns a string that starts with Array(, ends with ), and has all the values separated by a comma and a space. Its definition in the collections library looks like this:
def mkString(start: String, sep: String, end: String): String =
addString(new StringBuilder(), start, sep, end).toString

Split function difference between char and string arguments

I try the following code in scala REPL:
"ASD-ASD.KZ".split('.')
res7: Array[String] = Array(ASD-ASD, KZ)
"ASD-ASD.KZ".split(".")
res8: Array[String] = Array()
Why do these function calls give different results?
There's a big difference in the function use.
The split function is overloaded, and this is the implementation from the source code of Scala:
/** For every line in this string:
 *  Strip a leading prefix consisting of blanks or control characters
 *  followed by | from the line.
 */
def stripMargin: String = stripMargin('|')
private def escape(ch: Char): String = "\\Q" + ch + "\\E"
@throws(classOf[java.util.regex.PatternSyntaxException])
def split(separator: Char): Array[String] = toString.split(escape(separator))
@throws(classOf[java.util.regex.PatternSyntaxException])
def split(separators: Array[Char]): Array[String] = {
  val re = separators.foldLeft("[")(_+escape(_)) + "]"
  toString.split(re)
}
So when you're calling split() with a char, you ask to split by that specific char:
scala> "ASD-ASD.KZ".split('.')
res0: Array[String] = Array(ASD-ASD, KZ)
When you call split() with a String, the argument is treated as a regex. An unescaped . matches any character, so every character becomes a separator and nothing is left. To get the same result with double quotes, you need to escape the dot:
scala> "ASD-ASD.KZ".split("\\.")
res2: Array[String] = Array(ASD-ASD, KZ)
Where:
First \ - escapes the second backslash inside the Scala string literal, so the regex engine receives \.
Second \ - escapes the dot, which is a regex metacharacter matching any character, so that it is treated as a literal character
. - the character to split the string by
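The following sketch puts the variants side by side. The value names are illustrative; the two-argument `split(regex, limit)` and `Pattern.quote` are standard JDK methods, used here only to make the behaviour visible.

```scala
import java.util.regex.Pattern

val dotted = "ASD-ASD.KZ"

// "." as a regex matches every character, so all pieces are empty strings;
// split drops trailing empty strings, which leaves an empty array.
val unescaped: Array[String] = dotted.split(".")

// A negative limit keeps the empty pieces, which shows what really happened.
val keptEmpties: Array[String] = dotted.split(".", -1)

// Escaped dot, quoted literal and the Char overload all split on a literal dot.
val escaped: Array[String] = dotted.split("\\.")
val quoted: Array[String] = dotted.split(Pattern.quote("."))
val byCharDot: Array[String] = dotted.split('.')
```

The last three values all contain ASD-ASD and KZ, while `unescaped` is empty and `keptEmpties` contains only empty strings.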

Why does my println in rdd prints the string of elements?

When I try to print the contents of my RDD, it prints something like displayed below, how can I print the content?
Thanks!
scala> lines
res15: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[3] at filter at <console>:23
scala> lines.take(5).foreach(println)
[Ljava.lang.String;@6d3db5d1
[Ljava.lang.String;@6e6be45e
[Ljava.lang.String;@6d5e0ff4
[Ljava.lang.String;@3a699444
[Ljava.lang.String;@69851a51
This is because println uses the toString implementation of the object it is given. A Scala Array is a plain JVM array, whose toString prints the type and hash code rather than the elements. If you convert it to a List, the output is more readable thanks to List's toString implementation:
scala> println(Array("foo"))
[Ljava.lang.String;HASH
scala> println(Array("foo").toList)
List(foo)
Depending on how you want to print them out you can change your line that prints the elements to:
scala> lines.take(5).foreach(indvArray => indvArray.foreach(println))
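Another option is to join each inner array into one line with mkString. The sketch below uses a plain in-memory array as a stand-in for the result of lines.take(5), so it runs without Spark; `sample`, `rendered` and `printRows` are hypothetical names, and the comma separator is an arbitrary choice.

```scala
// Stand-in for lines.take(5), which returns an Array[Array[String]].
val sample: Array[Array[String]] = Array(Array("a", "b"), Array("c", "d"))

// One string per row, with the fields joined by a comma.
val rendered: Array[String] = sample.map(_.mkString(","))

// Prints each row on its own line instead of the default array toString.
def printRows(rows: Array[Array[String]]): Unit =
  rows.map(_.mkString(",")).foreach(println)
```

With the RDD from the question, the equivalent call would be printRows(lines.take(5)).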

scala - one line convert string split to vals

I saw the following answer: Scala split string to tuple, but in that question the OP is asking to convert a string to a List. I would like to take a string, split it by some character, and convert it to a tuple so the parts can be saved as vals:
val (a,b,c) = "A.B.C".split(".").<toTupleMagic>
Is this possible? This would be a conversion from an Array[String] to a Tuple3 of (String,String,String)
It is unnecessary:
val Array(a, b, c) = "A.B.C".split('.')
Note that I changed the argument to split from a String to a Char: if you pass a String, it is treated as a regex pattern, and . matches any character, so every character acts as a separator and you get an empty array back.
If you truly want to convert it to tuple, you can use Shapeless.
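If a real tuple is wanted without an extra library, a pattern match on the array is one way to do it. This is a sketch, not part of the original answer: `toTuple3` is a hypothetical helper, and it returns an Option because the extractor pattern val Array(a, b, c) = ... throws a MatchError when the split does not yield exactly three parts.

```scala
// Returns Some((a, b, c)) only when the split yields exactly three parts.
def toTuple3(s: String, sep: Char): Option[(String, String, String)] =
  s.split(sep) match {
    case Array(a, b, c) => Some((a, b, c))
    case _              => None
  }

val parsed = toTuple3("A.B.C", '.')
val tooShort = toTuple3("A.B", '.')
```

Here `parsed` holds the three parts as a tuple and `tooShort` is None, so the caller decides how to handle malformed input.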