scala> Seq("abc", null).mkString(" ")
res0: String = abc null
But I want to get just "abc".
Is there a Scala way to skip nulls?
scala> val seq = Seq("abc", null, "def")
seq: Seq[String] = List(abc, null, def)
scala> seq.flatMap(Option[String]).mkString(" ")
res0: String = abc def
There's always Seq("abc", null).filter(_ != null).mkString(" ")
Combination of Rex's answer and Eric's first comment:
Seq("abc", null).map(Option(_)).collect{case Some(x) => x}.mkString(" ")
The first map wraps the values resulting in Seq[Option[String]]. collect then essentially does a filter and map, discarding the None values and leaving only the unwrapped Some values.
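The sketch below runs the Option-based approaches side by side so their results can be compared. It also adds a third variant, collect with a guard, which is not from the answers above but uses only the standard library; the name withNulls is just an illustrative stand-in.

```scala
val withNulls: Seq[String] = Seq("abc", null, "def")

// Option(x) is None when x is null, so flatMap drops the nulls
val viaOption: String = withNulls.flatMap(Option(_)).mkString(" ")

// map to Option, then collect only the Some values (the combined answer)
val viaSome: String = withNulls.map(Option(_)).collect { case Some(x) => x }.mkString(" ")

// collect with a guard filters and unwraps in a single pass, without Option
val viaGuard: String = withNulls.collect { case s if s != null => s }.mkString(" ")
```

All three keep the non-null elements in their original order; the guard version avoids allocating an Option per element.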
In a function def a(l: List[(Int, String)]): List[(Int, String)] = ??? I want to split each String into its words, in lower case. Commas and other punctuation should be ignored, so I guess I need replaceAll("[^A-Za-z]+", " ").toLowerCase() somewhere? Each word should keep the Int of the sentence it came from.
Here is an example of how it should work:
val example = List((11, "That is great!"), (12, "Wow, impossible!"))
print(a(example))
Expected result:
List((11, "that"),(11, "is"),(11, "great"),(12, "wow"),(12, "impossible"))
You can use flatMap for that:
val example = List((11, "That is great!"), (12, "Wow, impossible!"))
example.flatMap { case (int, str) =>
  str
    .replaceAll("[^A-Za-z]+", " ")
    .toLowerCase()
    .split(' ')
    .map((int, _))
}
Yields:
res0: List[(Int, String)] = List((11,that), (11,is), (11,great), (12,wow), (12,impossible))
This is strictly equivalent to Yuval's answer, but probably more approachable when you are starting with Scala:
for {
  (int, str) <- example
  word <- str.replaceAll("[^A-Za-z]+", " ").toLowerCase().split(' ')
} yield (int, word)
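Wrapped in the function signature from the question, the idea might look like the sketch below. It is a variation rather than either answer verbatim: it lower-cases first and splits directly on runs of non-letters, then drops empty tokens, which split produces when a sentence starts with punctuation. Like the question, it assumes ASCII letters only.

```scala
def a(l: List[(Int, String)]): List[(Int, String)] =
  l.flatMap { case (n, sentence) =>
    sentence
      .toLowerCase
      .split("[^a-z]+")        // split on every run of non-letters (ASCII only)
      .filter(_.nonEmpty)      // a leading non-letter yields an empty first token
      .toList
      .map(word => (n, word))  // each word keeps the Int of its sentence
  }

val example = List((11, "That is great!"), (12, "Wow, impossible!"))
```

Calling a(example) should give the list shown in the question.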
Say I have an Iterable[(Int, String)]. How do I get an array of just the "values"? That is, how do I convert an Iterable[(Int, String)] into an Array[String]? The "keys" and "values" do not have to be unique, which is why I put them in quotation marks.
iterable.map(_._2).toArray
_._2 takes the second element of the tuple passed in as the anonymous parameter (_), whose name we don't care about.
Simply:
val iterable: Iterable[(Int, String)] = Iterable((1, "a"), (2, "b"))
val values = iterable.toArray.map(_._2)
Simply map over the iterable and extract the second element (tuple._2):
scala> val iterable: Iterable[(Int, String)] = Iterable((100, "Bring me the horizon"), (200, "Porcupine Tree"))
iterable: Iterable[(Int, String)] = List((100,Bring me the horizon), (200,Porcupine Tree))
scala> iterable.map(tuple => tuple._2).toArray
res3: Array[String] = Array(Bring me the horizon, Porcupine Tree)
In addition to the map already suggested, you might want to build the array directly while mapping from tuple to string, instead of converting afterwards, as this can save an iteration. This only works in Scala 2.12 and earlier, because breakOut was removed in Scala 2.13:
import scala.collection
val values: Array[String] = iterable.map(_._2)(collection.breakOut)
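In Scala 2.13, collection.breakOut no longer exists. A comparable way to skip the intermediate collection, sketched here as one option rather than an official replacement, is to map over an iterator, which is lazy, and call toArray at the end.

```scala
val iterable: Iterable[(Int, String)] = Iterable((1, "a"), (2, "b"))

// iterator.map is lazy: no intermediate Iterable[String] is built before toArray
val values: Array[String] = iterable.iterator.map(_._2).toArray
```

iterable.view.map(_._2).toArray would work the same way.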
I am trying to append the filename to each record in the file. I thought it would be easy if the RDD held arrays.
Any help with converting the RDD type or solving this problem would be much appreciated!
With the (String, String) type:
scala> myRDD.first()(1)
<console>:24: error: (String, String) does not take parameters
       myRDD.first()(1)
With Array[String]:
scala> myRDD.first()(1)
res1: String = abcdefgh
My function:
import scala.util.matching.Regex

def appendKeyToValue(x: Array[Array[String]]): Unit = {
  for (i <- 0 until x.length) {
    val key = x(i)(0)
    val pattern = new Regex("\\.")
    val key2 = pattern.replaceAllIn(key, "|") // replace every "." in the key with "|"
    val tempvalue = x(i)(1)
    val finalval = tempvalue.split("\n")
    for (ab <- 0 until finalval.length) {
      val result = key2 + "|" + finalval(ab)
      // result is built here but not yet stored or returned
    }
  }
}
If you have an RDD[(String, String)], you can access the first field of the first tuple by calling
val firstTupleField: String = myRDD.first()._1
If you want to convert an RDD[(String, String)] into an RDD[Array[String]], you can do the following:
val arrayRDD: RDD[Array[String]] = myRDD.map(x => Array(x._1, x._2))
You may also use a pattern-matching anonymous function (a case block) to destructure the tuples:
val arrayRDD: RDD[Array[String]] = myRDD.map { case (a,b) => Array(a, b) }
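For the original goal (prefixing every line of a file with the file name), one option is to flatMap over the pairs directly instead of converting them to arrays. The sketch below assumes, without confirmation from the question, that myRDD comes from SparkContext.wholeTextFiles, which yields (filePath, fileContent) pairs. To keep it testable without Spark it runs on a plain Seq, and the helper name appendKey is hypothetical.

```scala
// Hypothetical helper: for each (fileName, content) pair, split the content
// into lines and prefix each line with the file name, dots replaced by "|"
// as in the question's code.
def appendKey(pairs: Seq[(String, String)]): Seq[String] =
  pairs.flatMap { case (fileName, content) =>
    val key = fileName.replace(".", "|") // literal replacement, no regex needed
    content.split("\n").toSeq.map(record => key + "|" + record)
  }

val sample = Seq(("a.txt", "r1\nr2"), ("b.csv", "x"))
```

With Spark, the same case block could be passed to myRDD.flatMap to get an RDD[String] with one prefixed record per line.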
I am trying to get the first 2 values of a comma-separated string in Scala. For example:
a,b,this is a test
How do I store the values a and b in 2 separate variables?
KISS solution, to keep it easy and clean:
1. Use split to separate the fields, then use take, which is defined on all sequences, to get as many elements as you need:
scala> val res = "a,b,this is a test" split ',' take 2
res: Array[String] = Array(a, b)
2. Use pattern matching to set the variables:
scala> val Array(x,y) = res
x: String = a
y: String = b
Another solution, using a sequence pattern match in Scala:
Welcome to Scala version 2.11.2 (OpenJDK 64-Bit Server VM, Java 1.7.0_65).
Type in expressions to have them evaluated.
Type :help for more information.
scala> val str = "a,b,this is a test"
str: String = a,b,this is a test
scala> val Array(x, y, _*) = str.split(",")
x: String = a
y: String = b
scala> println(s"x = $x, y = $y")
x = a, y = b
Are you looking for the split method?
"a,b,this is a test".split(',')
res0: Array[String] = Array(a, b, this is a test)
If you want only the first two values, you'll need to do something like:
val splitted = "a,b,this is a test".split(',')
val (first, second) = (splitted(0), splitted(1))
There are also some regex-based options:
scala> val s = "a,b,this is a test"
s: String = a,b,this is a test
scala> val r = "[^,]+".r
r: scala.util.matching.Regex = [^,]+
scala> r findAllIn s
res0: scala.util.matching.Regex.MatchIterator = non-empty iterator
scala> .toList
res1: List[String] = List(a, b, this is a test)
scala> .take(2)
res2: List[String] = List(a, b)
scala> val a :: b :: _ = res2
a: String = a
b: String = b
but this fails with a MatchError when there are fewer than two matches:
scala> val a :: b :: _ = (r findAllIn "a" take 2).toList
scala.MatchError: List(a) (of class scala.collection.immutable.$colon$colon)
... 33 elided
Or, if you're not sure there is a second item, make the second group optional:
scala> val r2 = "([^,]+)(?:,([^,]*))?".r.unanchored
r2: scala.util.matching.UnanchoredRegex = ([^,]+)(?:,([^,]*))?
scala> val (a,b) = "a" match { case r2(x,y) => (x, Option(y)) }
a: String = a
b: Option[String] = None
scala> val (a,b) = s match { case r2(x,y) => (x, Option(y)) }
a: String = a
b: Option[String] = Some(b)
This is a bit nicer if records are long strings, because the unanchored match stops after the second field instead of splitting the whole record.
Footnote: the Option cases look nicer with a regex interpolator.
If your string is short, you may as well just use String.split and take the first two elements.
val myString = "a,b,this is a test"
val splitString = myString.split(',') // Scala adds a split-by-character method in addition to Java's split-by-regex
val a = splitString(0)
val b = splitString(1)
Another solution would be to use a regex to extract the first two elements. I think it's quite elegant.
val myString = "a,b,this is a test"
val regex = """([^,]*),([^,]*),.*""".r // each parenthesized group is extracted; [^,]* keeps a group from swallowing later commas
val regex(a, b) = myString // a="a", b="b"
Of course, you can tweak the regex to only allow non-empty tokens (or anything else you might need to validate):
val regex = """([^,]+),([^,]+),.+""".r
Note that in my examples I assumed the string always has at least two tokens. In the first example, you can check the length of the array if needed. The second one will throw a MatchError if the regex doesn't match the string (for instance, whenever it contains fewer than two commas).
I had originally proposed the following solution. I will leave it because it works and doesn't use any class formally marked as deprecated, but the Javadoc for StringTokenizer mentions that it is a legacy class and should no longer be used.
import java.util.StringTokenizer
val myString = "a,b,this is a test"
val st = new StringTokenizer(myString, ",") // tokenize myString using "," as the delimiter
val a = st.nextToken()
val b = st.nextToken()
// You could keep calling st.nextToken(), as long as st.hasMoreTokens is true
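If the number of fields isn't guaranteed, a variant that returns an Option avoids the MatchError and index errors mentioned above. In this sketch (the name firstTwo is hypothetical), split is given a limit of 3 so that the remainder, which may itself contain commas, is kept in one piece.

```scala
// Returns the first two comma-separated fields, or None if there are fewer than two.
def firstTwo(s: String): Option[(String, String)] =
  s.split(",", 3) match {           // at most 3 pieces: a, b, and the rest
    case Array(a, b, _*) => Some((a, b))
    case _               => None
  }
```

Usage: val Some((x, y)) = firstTwo("a,b,this is a test") binds x and y, while firstTwo("a") is None instead of throwing.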
This is driving me crazy: I can't figure out why this gives me an error.
Here is an example of my code:
var seqOfObjects: Seq[Map[String, String]] = Seq[Map[String, String]]()
for (item <- somelist) {
  seqOfObjects += Map(
    "objectid" -> item(0).toString,
    "category" -> item(1),
    "name" -> item(2),
    "url" -> item(3),
    "owneremail" -> item(4),
    "number" -> item(5).toString)
}
This gives me an error saying:
Type mismatch, expected: String, actual: Map[String, String]
But a Map[String, String] is exactly what I want to append into my Seq[Map[String, String]].
Why is it saying that my variable seqOfObjects expects a String?
Anyone have a clue?
Thanks
a += b means a = a.+(b). See this answer.
There is no method + in Seq, so you can't use +=.
scala> Seq[Int]() + 1
<console>:8: error: type mismatch;
found : Int(1)
required: String
Seq[Int]() + 1
^
The required: String comes from string concatenation, a behavior inherited from Java:
scala> List(1, 2, 3) + "str"
res0: String = List(1, 2, 3)str
Actually, the + method here comes from the StringAdd wrapper; see the implicit method Predef.any2stringadd.
You could use :+= or +:= instead of +=.
The default implementation of Seq is List, so you should prefer +: and +:= (prepend, constant time on a List) over :+ and :+= (append, linear time). See "Performance Characteristics" in the Scala collections documentation.
You could also use List instead of Seq. List has a :: method, so you can use ::=:
var listOfInts = List[Int]()
listOfInts ::= 1
You can rewrite your code without mutable variables using map (written here as a for-comprehension, which desugars to map):
val seqOfObjects =
for(item <- somelist) // somelist.reverse to reverse order
yield Map(...)
To reverse the order of the elements, you could use the reverse method.
Short foldLeft example:
somelist.foldLeft(Seq[Map[String, String]]()) { (acc, item) => Map(/* map from item */) +: acc } // prepends, so the result is in reverse order
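A self-contained sketch of two of the fixes: :+= on the var, and the immutable map version. The contents of somelist and the helper toRecord are hypothetical stand-ins for the question's data. Starting the var from a Vector is a choice made here so that :+= keeps the original order while each append stays cheap, unlike appending to a List.

```scala
// Hypothetical input: each item is one row of six string fields.
val somelist: List[Vector[String]] = List(
  Vector("1", "books", "Scala", "http://example.com/1", "a@example.com", "10"),
  Vector("2", "music", "Porcupine Tree", "http://example.com/2", "b@example.com", "20")
)

val keys = Seq("objectid", "category", "name", "url", "owneremail", "number")

// Hypothetical helper: pair each key with the field at the same position.
def toRecord(item: Seq[String]): Map[String, String] = keys.zip(item).toMap

// Fix 1: keep the var, but append with :+= (seqOfObjects = seqOfObjects :+ ...).
var seqOfObjects: Seq[Map[String, String]] = Vector()
for (item <- somelist) seqOfObjects :+= toRecord(item)

// Fix 2: no var at all; map preserves the order of somelist.
val seqOfObjects2: Seq[Map[String, String]] = somelist.map(toRecord)
```

Both versions produce equal sequences, since Seq equality compares elements regardless of the concrete collection type.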