I am trying to append the filename to each record in the file. I thought that if the RDD held Arrays it would be easy to do.
Any help with converting the RDD type or solving this problem would be much appreciated!
With the (String, String) type:
scala> myRDD.first()(1)
<console>:24: error: (String, String) does not take parameters
myRDD.first()(1)
With the Array[String] type:
scala> myRDD.first()(1)
res1: String = abcdefgh
My function:
import scala.util.matching.Regex

def appendKeyToValue(x: Array[Array[String]]): Unit = {
  for (i <- 0 to (x.length - 1)) {
    val key = x(i)(0)
    val pattern = new Regex("\\.")
    // replace every dot in the filename with "|"
    val key2 = pattern.replaceAllIn(key, "|")
    val tempvalue = x(i)(1)
    val finalval = tempvalue.split("\n")
    for (ab <- 0 to (finalval.length - 1)) {
      // prefix each record with the modified filename
      val result = key2 + "|" + finalval(ab)
    }
  }
}
If you have an RDD[(String, String)], you can access the first field of the first tuple by calling
val firstTupleField: String = myRDD.first()._1
If you want to convert an RDD[(String, String)] into an RDD[Array[String]], you can do the following
val arrayRDD: RDD[Array[String]] = myRDD.map(x => Array(x._1, x._2))
You can also use a pattern-matching anonymous function to destructure the tuples:
val arrayRDD: RDD[Array[String]] = myRDD.map { case (a,b) => Array(a, b) }
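To tie this back to the original goal of prefixing every record with its filename, here is a minimal sketch. It uses a plain Seq[(String, String)] as a stand-in for the RDD so that it runs without Spark; the sample file names and contents are invented, and the helper name appendFileName is hypothetical. Assuming the real data is an RDD of (filename, content) pairs, the same function could be passed to flatMap on the RDD.

```scala
// Stand-in for an RDD[(String, String)] of (filename, file content) pairs.
// The sample data is invented for illustration.
val files: Seq[(String, String)] = Seq(
  ("a.txt", "rec1\nrec2"),
  ("b.txt", "rec3")
)

// Hypothetical helper: turn one (filename, content) pair into one
// "key|record" string per line, replacing dots in the filename with "|".
def appendFileName(pair: (String, String)): Seq[String] = {
  val (name, content) = pair
  val key = name.replace(".", "|")
  content.split("\n").toSeq.map(line => key + "|" + line)
}

// flatMap yields one output element per record rather than one per file.
val records: Seq[String] = files.flatMap(appendFileName)
```

Working on the tuple directly avoids the conversion to Array[String] altogether.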
Related
I have a DStream of tuples of Date and String. I want to convert the String part of each tuple to a HashSet[String], but when I try to map it inside another map it does not work. Here is what I have tried so far; any pointers would help.
val enTextTupleStream: DStream[(Date, String)]
def extractStrings(data: String): HashSet[String]= HashSet(data)
val ddstream = enTextTupleStream.map {
t =>
val e1 = t._1
val e2 = t._2.map(extractStrings)
}
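In the snippet above, t._2 is a String, so calling .map on it iterates over its characters, and the block ends with a val definition, so the function passed to map returns Unit. A possible fix is to call extractStrings on the whole String and return a new tuple. The sketch below uses a Seq[(Date, String)] as a stand-in for the DStream so that it runs without Spark Streaming; the sample values are invented, and it assumes the immutable HashSet is the one intended.

```scala
import java.util.Date
import scala.collection.immutable.HashSet

def extractStrings(data: String): HashSet[String] = HashSet(data)

// Stand-in for DStream[(Date, String)]; the sample values are invented.
val sample: Seq[(Date, String)] =
  Seq((new Date(0L), "hello"), (new Date(1000L), "world"))

// Keep the date and apply extractStrings to the whole String.
// The tuple is the last expression, so it is what map returns.
val converted: Seq[(Date, HashSet[String])] =
  sample.map { case (date, text) => (date, extractStrings(text)) }
```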
I am trying some basic logic in Scala. I tried the code below, but it throws an error.
scala> val data = ("HI",List("HELLO","ARE"))
data: (String, List[String]) = (HI,List(HELLO, ARE))
scala> data.flatmap( elem => elem)
<console>:22: error: value flatmap is not a member of (String, List[String])
data.flatmap( elem => elem)
Expected Output :
(HI,HELLO,ARE)
Could someone help me fix this issue?
You are trying to flatMap over a tuple, which won't work. The following will work:
val data = List(List("HI"),List("HELLO","ARE"))
val a = data.flatMap(x => x)
This is trivial in Scala:
val data = ("HI",List("HELLO","ARE"))
println( data._1 :: data._2 )
What exact data structure are you working with?
If you are clear about your data structure:
type rec = (String, List[String])
val data : rec = ("HI",List("HELLO","ARE"))
val f = ( v: (String, List[String]) ) => v._1 :: v._2
f(data)
A couple of observations:
Currently there is no flatten method for tuples (unless you use shapeless).
flatMap cannot be directly applied to a list of elements which are a mix of elements and collections.
In your case, you can make element "HI" part of a List:
val data = List(List("HI"), List("HELLO","ARE"))
data.flatMap(identity)
Or, you can define a function to handle your mixed element types accordingly:
val data = List("HI", List("HELLO","ARE"))
def flatten(l: List[Any]): List[Any] = l.flatMap{
case x: List[_] => flatten(x)
case x => List(x)
}
flatten(data)
You are trying to call flatMap on a Tuple2, which is not available in the current API.
If you don't want to change your input, you can extract the first value from the Tuple2 and then extract the elements of the second value, as below
val data = ("HI",List("HELLO","ARE"))
val output = (data._1, data._2(0), data._2(1))
println(output)
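Indexing with data._2(0) and data._2(1) throws an IndexOutOfBoundsException when the list has fewer than two elements. As a hedged alternative, pattern matching can make that assumption explicit; the sketch below returns None when the list does not have exactly two elements.

```scala
val data = ("HI", List("HELLO", "ARE"))

// Match only when the list has exactly two elements; otherwise None.
val output: Option[(String, String, String)] = data match {
  case (head, List(a, b)) => Some((head, a, b))
  case _                  => None
}
```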
If that's what you want:
val data = ("HI",List("HELLO,","ARE").mkString(""))
println(data)
>>(HI,HELLO,ARE)
In the code below, encoded is a JSON string. The JSON.parseFull() function returns an object of the form Some(Map(...)). I am using .get to extract the Map, but I am unable to index it because the compiler sees it as type Any. Is there any way to tell the compiler that it is, in fact, a Map?
val parsed = JSON.parseFull(encoded)
val mapped = parsed.get
You can use collect with pattern matching to match on the type:
scala> val parsed: Option[Any] = Some(Map("1" -> List("1")))
parsed: Option[Any] = Some(Map(1 -> List(1)))
scala> val mapped = parsed.collect{
case map: Map[String, Any] => map
}
mapped: Option[Map[String,Any]] = Some(Map(1 -> List(1)))
You can do something like the following in the case of a List value to get values from the List:
scala> mapped.get.map{ case(k, List(item1)) => item1}
res0: scala.collection.immutable.Iterable[Any] = List(1)
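One caveat: because of type erasure, a pattern such as case map: Map[String, Any] can only check at runtime that the value is a Map, not what its key and value types are (the compiler reports this as an unchecked warning). A minimal sketch of a more defensive variant follows; the sample Map is invented and stands in for the result of JSON.parseFull.

```scala
// Invented sample standing in for the result of JSON.parseFull.
val parsed: Option[Any] = Some(Map("resourceType" -> "Bundle", "total" -> 2.0))

// Match on Map[_, _] (all that can be checked at runtime), then keep
// only the entries whose key really is a String.
val mapped: Option[Map[String, Any]] = parsed.collect {
  case m: Map[_, _] => m.collect { case (k: String, v) => k -> (v: Any) }
}

// Look up a key without calling .get on the Option.
val resourceType: Option[Any] = mapped.flatMap(_.get("resourceType"))
```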
I was able to use a combination of the get function and pattern matching similar to what was posted in Tanjin's response to get the desired result.
object ReadFHIR {
def fatal(msg: String) = throw new Exception(msg)
def main (args: Array[String]): Unit = {
val fc = new FhirContext()
val client = fc.newRestfulGenericClient("http://test.fhir.org/r2")
val bundle = client.search().forResource("Observation")
.prettyPrint()
.execute()
val jsonParser = fc.newJsonParser()
val encoded = jsonParser.encodeBundleToString(bundle)
val parsed = JSON.parseFull(encoded)
val mapped: Map[String, Any] = parsed.get match{
case map: Map[String, Any] => map
}
println(mapped("resourceType"))
}
}
I guess I can't grasp the map method.
I am trying to read a file:
val messagesMap = XML.loadFile(messageXMLFile).map(parseMessageXML)
where the method parseMessageXML is defined as:
def parseMessageXML(xml : scala.xml.Node) = {
val nodes = xml \\ "add"
nodes.map({
node =>
val obj = new AdMessage(node)
println("adding an AdMessage " + obj.toString)
(obj.MessageId -> obj)
}).toMap
}
Can anybody please explain why I end up with a Seq[Map[String, AdMessage]] and not just a Map[String, AdMessage]?
map transforms each element of your Seq into another element.
For instance:
scala> Seq("One", "Two", "Three").map(_.length())
res0: Seq[Int] = List(3, 3, 5)
Each String is mapped to an Int by the length function. Therefore the original type is Seq[String] and the final type is Seq[Int].
In your case, parseMessageXML transforms a Node into a Map[String, AdMessage], so the original type is Seq[Node] and the final type is Seq[Map[String, AdMessage]].
So, assuming you just want to transform the content of the file into a Map[String, AdMessage], call the method directly instead of mapping:
val messagesMap = parseMessageXML(XML.loadFile(messageXMLFile))
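The difference can be reproduced without scala.xml. In the sketch below, the hypothetical function parse plays the role of parseMessageXML (one whole document in, one Map out), and a Seq[String] stands in for the XML document.

```scala
// Hypothetical stand-in for parseMessageXML: one document yields a whole Map.
def parse(doc: Seq[String]): Map[String, Int] =
  doc.map(s => s -> s.length).toMap

val doc = Seq("One", "Two", "Three")

// Mapping over a collection of documents wraps each result: Seq[Map[...]].
val wrapped: Seq[Map[String, Int]] = Seq(doc).map(parse)

// Calling the function directly gives the Map itself.
val direct: Map[String, Int] = parse(doc)
```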
Say I have been given a list of futures, each one linked to a key, such as:
val seq: Seq[(Key, Future[Value])]
And my goal is to produce a list of key value tuples once all futures have completed:
val complete: Seq[(Key, Value)]
I am wondering if this can be achieved using a sequence call. For example I know I can do the following:
val complete = Future.sequence(seq.map(_._2)).onComplete {
case Success(s) => s
case Failure(NonFatal(e)) => Seq()
}
But this only gives me a sequence of Value objects, and I lose the pairing between Key and Value. The problem is that Future.sequence expects a sequence of Futures.
How could I augment this to maintain the key/value pairing in my complete sequence?
Thanks
Des
How about transforming your Seq[(Key, Future[Value])] to Seq[Future[(Key, Value)]] first?
val seq: Seq[(Key, Future[Value])] = // however your implementation is
val futurePair: Seq[Future[(Key, Value)]] = for {
(key, value) <- seq
} yield value.map(v => (key, v))
Now you can use sequence to get Future[Seq[(Key, Value)]].
val complete: Future[Seq[(Key, Value)]] = Future.sequence(futurePair)
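Here is a self-contained sketch of the same idea that can be pasted into a REPL. String and Int stand in for Key and Value, the sample data is invented, and the Await call with its 5-second timeout is only there to show the result.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Sample input; String and Int stand in for Key and Value.
val pending: Seq[(String, Future[Int])] =
  Seq(("one", Future(1)), ("two", Future(2)))

// Move each key inside its future, then turn Seq[Future[...]]
// into Future[Seq[...]].
val paired: Future[Seq[(String, Int)]] =
  Future.sequence(pending.map { case (k, f) => f.map(v => (k, v)) })

// Blocking only to show the result; 5 seconds is an arbitrary timeout.
val finished: Seq[(String, Int)] = Await.result(paired, 5.seconds)
```

Note that if any of the futures fails, the combined future fails as well; a recover on paired could supply a fallback such as an empty Seq.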
Just a different expression of the other answer, using unzip and zip.
scala> val vs = Seq(("one",Future(1)),("two",Future(2)))
vs: Seq[(String, scala.concurrent.Future[Int])] = List((one,scala.concurrent.impl.Promise$DefaultPromise@4e38d975), (two,scala.concurrent.impl.Promise$DefaultPromise@35f8a9d3))
scala> val (ks, fs) = vs.unzip
ks: Seq[String] = List(one, two)
fs: Seq[scala.concurrent.Future[Int]] = List(scala.concurrent.impl.Promise$DefaultPromise@4e38d975, scala.concurrent.impl.Promise$DefaultPromise@35f8a9d3)
scala> val done = (Future sequence fs) map (ks zip _)
done: scala.concurrent.Future[Seq[(String, Int)]] = scala.concurrent.impl.Promise$DefaultPromise@56913163
scala> done.value
res0: Option[scala.util.Try[Seq[(String, Int)]]] = Some(Success(List((one,1), (two,2))))
or maybe save on zippage:
scala> val done = (Future sequence fs) map ((ks, _).zipped)
done: scala.concurrent.Future[scala.runtime.Tuple2Zipped[String,Seq[String],Int,Seq[Int]]] = scala.concurrent.impl.Promise$DefaultPromise@766a52f5
scala> done.value.get.get.toList
res1: List[(String, Int)] = List((one,1), (two,2))