Display words whose length is more than 8 - Scala

I am trying to get, from the list below, the words that contain an r and whose length is more than 8, converted to uppercase.
val names=List("sachinramesh","rahuldravid","viratkohli","mayank")
I tried the code below, but it does not give anything; it throws an error:
names.map(s =>s.toUpperCase.contains("r").size(8)
Can someone tell me how to resolve this issue?
Regards,
Kumar

If you are doing a combination of filter and map, think about using the collect method, which does both in one call. This is how to do what is described in the question:
names.collect {
  case s if s.lengthCompare(8) > 0 && s.contains('r') =>
    s.toUpperCase
}
collect works like filter because it only returns values that match a case statement. It works like map because you can make changes to the matching values before returning them.
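A minimal sketch, runnable as a Scala script, comparing collect with the equivalent filter/map chain on the list from the question. It also checks the condition before uppercasing: the attempt in the question uppercases first and then looks for a lowercase "r", which can never match. The helper name `wanted` is hypothetical.

```scala
val names = List("sachinramesh", "rahuldravid", "viratkohli", "mayank")

// Hypothetical helper: the condition from the question (contains 'r', longer than 8 chars).
// It must run on the original lowercase string; after toUpperCase there is no 'r' left.
def wanted(s: String): Boolean = s.length > 8 && s.contains('r')

// collect: one pass, the guarded case both selects and transforms
val viaCollect: List[String] = names.collect { case s if wanted(s) => s.toUpperCase }

// filter + map: same result, but builds an intermediate list
val viaFilterMap: List[String] = names.filter(wanted).map(_.toUpperCase)

println(viaCollect) // List(SACHINRAMESH, RAHULDRAVID, VIRATKOHLI)
```

Note that "viratkohli" also qualifies: it contains an r and has 10 characters.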

You can try this:
names.filter(str => str.contains('r') && str.length > 8) // str contains an `r` and length > 8
.map(_.toUpperCase) // map the result to uppercase

The names.filter(...).map(...) approach solves the problem, but it iterates through the list twice and builds an intermediate list. For a solution that goes through the list only once, consider @Tim's suggestion of collect, or a lazy Iterator approach like this:
names
.iterator
.filter(_.size > 8)
.filter(_.contains('r'))
.map(_.toUpperCase)
.toList
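To see that the iterator version really makes a single pass, a sketch can record each stage in a buffer (the `trace` buffer exists purely for illustration). Every element goes through all stages before the next element is read, and no intermediate list is built between the stages.

```scala
import scala.collection.mutable.ListBuffer

val names = List("sachinramesh", "rahuldravid", "viratkohli", "mayank")
val trace = ListBuffer.empty[String] // records which stage saw which element, in order

val result = names.iterator
  .filter { s => trace += s"size:$s"; s.size > 8 }
  .filter { s => trace += s"r:$s"; s.contains('r') }
  .map { s => trace += s"upper:$s"; s.toUpperCase }
  .toList

// Stages interleave per element: size, r, upper for the first name, then the next name
trace.foreach(println)
```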

You can also try this:
val result = for (x <- names if x.contains('r') && x.length > 8) yield x.toUpperCase
result.foreach(println)
cheers
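For reference, the compiler rewrites a for-comprehension with a guard into withFilter and map calls, so this is the filter/map approach in different syntax. A sketch of the approximate desugaring:

```scala
val names = List("sachinramesh", "rahuldravid", "viratkohli", "mayank")

// The for-comprehension from the answer
val viaFor = for (x <- names if x.contains('r') && x.length > 8) yield x.toUpperCase

// Approximately what the compiler produces: the guard becomes withFilter, yield becomes map
val desugared = names
  .withFilter(x => x.contains('r') && x.length > 8)
  .map(x => x.toUpperCase)

println(viaFor == desugared) // true
```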

Related

scala nested for/yield generator to extract substring

I am new to Scala, so please be gentle. My problem at the moment is a syntax error.
(My ultimate goal is to print each group of 3 characters from every string in the list; for now I am merely printing the first 3 characters of every string.)
def do_stuff():Unit = {
val s = List[String]("abc", "fds", "654444654")
for {
i <- s.indices
r <- 0 to s(i).length by 3
println(s(i).substring(0,3))
} yield {s(i)}
}
do_stuff()
I am getting this error. It is syntax related, but I don't understand it:
Error:(12, 18) ')' expected but '.' found.
println(s(i).substring(0,3))
That code doesn't compile because a for-comprehension cannot contain a bare statement such as println: every line must be a generator (x <- ...), a guard (if ...), or a value definition (x = ...). In this case, a dummy value definition solves your problem:
_ = println(s(i).substring(0,3))
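A runnable sketch of the fixed comprehension aimed at the stated goal of printing every group of 3 characters. It keeps the `_ = println(...)` binding from the answer but, as assumptions about the intent, iterates over the strings directly instead of over indices, uses `substring(r, ...)` instead of `substring(0,3)`, and uses `until` instead of `to`. With `to`, a 3-character string would also yield the start index 3, where no group is left.

```scala
val s = List("abc", "fds", "654444654")

val chunks: List[String] = for {
  str <- s
  r <- 0 until str.length by 3 // `until` stops before the string's length
  chunk = str.substring(r, math.min(r + 3, str.length)) // the last group may be shorter than 3
  _ = println(chunk) // dummy value definition: a bare println is not allowed here
} yield chunk
```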
EDIT
If you want the combinations of 3 elements of every String, you can use the combinations method from the collections library:
List("abc", "fds", "654444654").flatMap(_.combinations(3).toList)
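Note that combinations(3) returns selections of 3 characters taken from any positions, not consecutive groups. If the goal is the consecutive groups described in the question, the standard collection methods grouped(3) (non-overlapping chunks) or sliding(3) (overlapping windows) may be closer. A small comparison, swapping in those methods:

```scala
val strings = List("abc", "fds", "654444654")

// Selections of 3 characters from any positions: "abcd" gives abc, abd, acd and bcd
val combos = "abcd".combinations(3).toList

// Consecutive, non-overlapping groups of 3 (the last group can be shorter)
val groups = strings.flatMap(_.grouped(3).toList)

// Overlapping windows of 3 consecutive characters
val windows = "abcde".sliding(3).toList

println(groups) // List(abc, fds, 654, 444, 654)
```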

PySpark RDD Filter with "not in" for multiple values

I have an RDD that looks like this:
myRDD:
[[u'16/12/2006', u'17:24:00'],
[u'16/12/2006', u'?'],
[u'16/12/2006', u'']]
I want to exclude the records with '?' or '' in it.
The following code works for filtering out one value at a time, but is there a way to filter out items with '?' and '' in one go, to get back the following:
[u'16/12/2006', u'17:24:00']
The code below works for only one value at a time; how can I extend it to multiple values?
myRDD.filter(lambda x: '?' not in x)
I want help writing something like:
myRDD.filter(lambda x: '?' not in x && '' not in x)
Python has no && operator; combine the two conditions with and instead:
myRDD.filter(lambda x: ('?' not in x) and ('' not in x))
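The same idea scales to any number of values by testing each row against a set of excluded values (in PySpark, Python's built-in any() expresses the same predicate). A sketch in Scala, assuming a plain List standing in for the RDD; Spark's Scala RDD.filter takes the same kind of predicate. The name `excluded` is hypothetical.

```scala
// Plain List[List[String]] standing in for the RDD from the question
val rows = List(
  List("16/12/2006", "17:24:00"),
  List("16/12/2006", "?"),
  List("16/12/2006", "")
)

// Values that disqualify a row; add more here instead of chaining conditions
val excluded = Set("?", "")

// Keep a row only if none of its fields is an excluded value
val kept = rows.filter(row => !row.exists(excluded.contains))
```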

Remove first element in RDD without using filter function

I have built an RDD from a file, where each element in the RDD is a section of the file separated by a delimiter.
val inputRDD1:RDD[(String,Long)] = myUtilities.paragraphFile(spark,path1)
.coalesce(100*spark.defaultParallelism)
.zipWithIndex() // RDD[(String, Long)]
.filter(f => f._2!=0)
I do the last operation above (filter) to remove the element at index 0.
Is there a better way to remove the first element rather than to check each element for the index value as done above?
Thanks!
One possibility is to use RDD.mapPartitionsWithIndex and drop the first element from the iterator of partition 0:
val inputRDD = myUtilities
.paragraphFile(spark,path1)
.coalesce(100*spark.defaultParallelism)
.mapPartitionsWithIndex(
(index, it) => if (index == 0) it.drop(1) else it,
preservesPartitioning = true
)
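This way, you only ever advance a single item on the first partition's iterator, while all the others remain untouched. Is this more efficient? Probably: zipWithIndex has to trigger an extra Spark job when the RDD has more than one partition, and the filter then checks every element. Either way, I'd benchmark both versions to see which one performs better.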
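The rule "drop one element from partition 0 only" can be checked without a cluster by modelling partitions as a Seq of Seqs (a simplification for illustration, not the Spark API; `dropFirstElement` is a hypothetical helper). It also exposes an edge case where the two versions differ: if partition 0 happens to be empty, this version drops nothing, whereas zipWithIndex would assign index 0 to the first element of the next non-empty partition and the filter would remove that one.

```scala
// Partitions modelled as a Seq of Seqs; position 0 plays the role of partition index 0
def dropFirstElement[A](partitions: Seq[Seq[A]]): Seq[Seq[A]] =
  partitions.zipWithIndex.map { case (part, index) =>
    // Same rule as the answer: advance the iterator by one in partition 0 only
    if (index == 0) part.iterator.drop(1).toList else part
  }

val parts = Seq(Seq("header", "p1"), Seq("p2", "p3"))
println(dropFirstElement(parts))
```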

Instantiate a map with default

So, I have a list of characters and I have to create a list that says how many there are of each kind:
List(a,a,a,a,b,b) => List((a,4), (b,2))
I wanted to create a map that would map each character to 1 at first:
( (a->1), (a->1), (a->1), (a->1), (b->1), (b->1))
and then use groupBy and toList to return the final list. The problem is that creating a map seems to require a PhD in this language.
I tried
val m = xs.foldLeft(Map[Char,Int]()){c => c->1}
Which doesn't work.
xs map (x=> x-> 1) toMap
Which compiles but I can't do anything with this map afterwards.
and xs.toMap(x,1)
Which doesn't work either.
Could somebody tell me how I should proceed please?
You can use groupBy to do the grouping and then find the count of each group:
list.groupBy(identity).mapValues(_.length)
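A sketch that returns the List of pairs the question asks for and works across Scala versions. In Scala 2.13, mapValues on a Map is deprecated and returns a lazy MapView, so mapping over the entries directly avoids that. Since the title asks about a map with a default, the asker's foldLeft attempt is also shown fixed: the function needs two parameters (accumulator and element), and withDefaultValue(0) makes missing keys read as 0.

```scala
val xs = List('a', 'a', 'a', 'a', 'b', 'b')

// Group equal characters, then replace each group by its size
val counts: List[(Char, Int)] =
  xs.groupBy(identity)
    .map { case (c, group) => (c, group.length) }
    .toList
    .sortBy(_._1) // Map iteration order is unspecified, so sort for a stable result

// foldLeft over a map whose missing keys default to 0
val viaFold: Map[Char, Int] =
  xs.foldLeft(Map.empty[Char, Int].withDefaultValue(0)) { (m, c) => m.updated(c, m(c) + 1) }

println(counts) // List((a,4), (b,2))

// Scala 2.13+ only: xs.groupMapReduce(identity)(_ => 1)(_ + _) counts in a single step
```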

Scala count/access preceding item in a list from inside the list?

I'm looking for a way to access the preceding item in a list. The goal is to count the elements of a nested list and then put that nested list's length in the next list item, preferably inline.
Code example:
List(List[items], this.preceding.size)
To output:
List(List(item1,item2,item3), 3)
Thanks for your help!
This sort of thing is, in general, what folds and scans are good at. I'm not sure exactly what form you want, but here is something you can work off of:
val xs = List("salmon","cod","halibut")
xs.scanLeft((0,"")){ (prev, item) => (prev._2.length, item) }.tail
// List((0,salmon), (6,cod), (3,halibut))
You can substitute other lists for the strings.
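Following the suggestion to substitute lists for the strings, a sketch with a hypothetical nested input that pairs each inner list with the size of the list before it (0 for the first). A zip against the sizes shifted by one position gives the same pairs without scanLeft.

```scala
// Hypothetical nested input: each inner list stands in for one item
val nested = List(List("a", "b", "c"), List("d"), List("e", "f"))

// scanLeft threads the previous inner list through, so its size is available at each step
val withPrevSize: List[(Int, List[String])] =
  nested.scanLeft((0, List.empty[String])) { (prev, item) => (prev._2.size, item) }.tail

// Alternative: prepend 0 to the sizes and zip; zip stops at the shorter list
val viaZip: List[(Int, List[String])] = (0 :: nested.map(_.size)).zip(nested)

println(withPrevSize) // List((0,List(a, b, c)), (3,List(d)), (1,List(e, f)))
```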