Scala filter by extension

I have this function below for filtering a list of files. I was wondering how I could filter so it only returns files that end in .png or .txt?
import java.io.File

def getListOfFiles(directoryName: String): Array[String] = {
  (new File(directoryName)).listFiles.filter(_.isFile).map(_.getAbsolutePath)
}
Thanks for the help, guys.

Just add a condition to filter:
(new File(directoryName)).listFiles
  .filter(f => f.isFile && (f.getName.endsWith(".png") || f.getName.endsWith(".txt")))
  .map(_.getAbsolutePath)
Or use listFiles(FileFilter) instead of plain listFiles, but it's less convenient (unless you are on a Scala version with single abstract method conversion, which was experimental before 2.12).
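For reference, a minimal sketch of that variant (the name listPngAndTxt is made up here; the explicit anonymous FileFilter avoids relying on SAM conversion):

import java.io.{File, FileFilter}

def listPngAndTxt(directoryName: String): Array[String] =
  new File(directoryName)
    .listFiles(new FileFilter {
      def accept(f: File): Boolean =
        f.isFile && (f.getName.endsWith(".png") || f.getName.endsWith(".txt"))
    })                            // note: listFiles returns null if the directory does not exist
    .map(_.getAbsolutePath)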

Just like you would filter ordinary strings:
val filenames = List("batman.png", "shakespeare.txt", "superman.mov")
filenames.filter(name => name.endsWith(".png") || name.endsWith(".txt"))
// res1: List[String] = List(batman.png, shakespeare.txt)

Alternative approach, a bit less verbose:
import scala.reflect.io.Directory
Directory(directoryName).walkFilter(_.extension == "png")
It returns an Iterator[Path], whose elements can be mapped to strings and collected with .toArray.
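A sketch extending this idea to both extensions and to Array[String] (the name pngAndTxt is made up; scala.reflect.io is an internal, unsupported API, so treat this as illustrative only):

import scala.reflect.io.Directory

def pngAndTxt(directoryName: String): Array[String] =
  Directory(directoryName)
    .files                                              // plain (non-recursive) listing of regular files
    .filter(f => f.extension == "png" || f.extension == "txt")
    .map(_.path)                                        // Path => String
    .toArray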

Related

spark filter with higher order function

How can I get a higher order function in Scala to properly accept a spark filter predicate?
I.e.
val df = Seq(1,2,3,4).toDF("value")
df.filter(col("value")> 2).show
df.filter(col("value")< 2).show
works just fine. But when I try to refactor it into a function which accepts a filter predicate (note: it has the same signature as the > operator), the compiler no longer knows what to pass as the left/right operands of the predicate.
def myFilter = (predicate: Any => Column)(df: DataFrame) = {
  df.filter(col("value") predicate 2).show // WARN this does not compile
}
df.transform(myFilter(>)).show
How can this be made to work?
Combining the various comments gives this as a possible solution:
def myFilter(predicate: (Column, Int) => Column)(df: DataFrame) = {
  df.filter(predicate(col("value"), 2))
}
df.transform(myFilter(_ > _)).show
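Spelled out with imports as a self-contained sketch (the local SparkSession setup here is just for illustration):

import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").appName("filter-demo").getOrCreate()
import spark.implicits._

def myFilter(predicate: (Column, Int) => Column)(df: DataFrame): DataFrame =
  df.filter(predicate(col("value"), 2))

val df = Seq(1, 2, 3, 4).toDF("value")
df.transform(myFilter(_ > _)).show()   // keeps 3 and 4
df.transform(myFilter(_ < _)).show()   // keeps 1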

Any better, more idiomatic way to convert SQL ResultSet to a Scala List or other collection type?

I'm using the following naive code to convert a ResultSet to a Scala List:
val rs = pstmt.executeQuery()
var nids = List[String]()
while (rs.next()) {
nids = nids :+ rs.getString(1)
}
rs.close()
Is there a better approach, something more idiomatic to Scala, that doesn't require using a mutable object?
Why don't you try this:
new Iterator[String] {
  def hasNext = resultSet.next()
  def next() = resultSet.getString(1)
}.toStream
Taken from this answer here
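Adapted to the question's rs, a sketch that materializes everything into a List (note that hasNext here advances the cursor, so consume the iterator exactly once):

val nids: List[String] =
  new Iterator[String] {
    def hasNext = rs.next()      // not idempotent: each call advances the cursor
    def next() = rs.getString(1)
  }.toList
rs.close()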
I have a similar problem and my solution is:
Iterator.from(0).takeWhile(_ => rs.next()).map(_ => rs.getString(1)).toList
Hope that will help.
Using an ORM-style library like Slick or Quill, as mentioned in the comments, is considered a better approach.
If you want to use plain Scala code for processing the ResultSet, you can use tail recursion.
import java.sql.ResultSet

@scala.annotation.tailrec
def getResult(resultSet: ResultSet, list: List[String] = Nil): List[String] = {
  if (resultSet.next()) {
    val value = resultSet.getString(1) // JDBC columns are 1-based
    getResult(resultSet, value :: list)
  } else {
    list
  }
}
This method returns the list of values in the first column (note that :: prepends, so the list comes back in reverse row order). The process is purely immutable, so you don't need to worry, and because the method is tail recursive, the Scala compiler will optimize it into a loop.
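A hypothetical usage, reusing the question's pstmt:

val rs = pstmt.executeQuery()
val nids: List[String] =
  try getResult(rs).reverse   // reverse to restore row order, since values are prepended
  finally rs.close()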
Thanks

Using map on an Option[Seq[T]]

I have an Option[Seq[T]] which, naturally enough, may contain a Seq[T] or may indeed be None.
I have been warned away from using .get, but how can I use map to return either the populated Seq[T] or an empty List() if the Option was None?
I have managed to do it using pattern matching, and was wondering if there is a way to use map to achieve the same goal. Thanks!
val maybeProducts: Option[Seq[Product]] = {....}
val products: Seq[Product] = {
  maybeProducts match {
    case Some(ps) => ps
    case None => List()
  }
}
You can use getOrElse:
maybeProducts.getOrElse(List())
val products: Seq[Product] = maybeProducts.getOrElse(List())
For the record, another solution is to convert the Option to a List and then flatten it:
maybeProducts.toList.flatten
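A quick check of both approaches on made-up values:

val some: Option[Seq[Int]] = Some(Seq(1, 2, 3))
val none: Option[Seq[Int]] = None

some.getOrElse(List())   // List(1, 2, 3)
none.getOrElse(List())   // List()
some.toList.flatten      // List(1, 2, 3)
none.toList.flatten      // List()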

Repeating function call until we'll get non-empty Option result in Scala

A very newbie question in Scala: how do I "repeat a function until it returns something that meets my criteria" in Scala?
Given that I have a function that I'd like to call until it returns a result, for example, defined like this:
def tryToGetResult: Option[MysteriousResult]
I've come up with this solution, but I really feel that it is ugly:
var res: Option[MysteriousResult] = None
do {
  res = tryToGetResult
} while (res.isEmpty)
doSomethingWith(res.get)
or, equivalently ugly:
var res: Option[MysteriousResult] = None
while (res.isEmpty) {
  res = tryToGetResult
}
doSomethingWith(res.get)
I really feel like there is a solution without var and without so much hassle around manual checking whether Option is empty or not.
For comparison, the Java alternative seems to be much cleaner here:
MysteriousResult tryToGetResult(); // returns null if no result yet
MysteriousResult res;
while ((res = tryToGetResult()) == null);
doSomethingWith(res);
To add insult to injury, if we don't need to doSomethingWith(res) and just need to return it from this function, Scala vs Java looks like this:
Scala
def getResult: MysteriousResult = {
  var res: Option[MysteriousResult] = None
  do {
    res = tryToGetResult
  } while (res.isEmpty)
  res.get
}
Java
MysteriousResult getResult() {
  while (true) {
    MysteriousResult res = tryToGetResult();
    if (res != null) return res;
  }
}
You can use Stream's continually method to do precisely this:
val res = Stream.continually(tryToGetResult).flatMap(_.toStream).head
Or (possibly more clearly):
val res = Stream.continually(tryToGetResult).dropWhile(!_.isDefined).head
One advantage of this approach over explicit recursion (besides the concision) is that it's much easier to tinker with. Say for example that we decided that we only wanted to try to get the result a thousand times. If a value turns up before then, we want it wrapped in a Some, and if not we want a None. We just add a few characters to our code above:
Stream.continually(tryToGetResult).take(1000).flatMap(_.toStream).headOption
And we have what we want. (Note that the Stream is lazy, so even though the take(1000) is there, if a value turns up after three calls to tryToGetResult, it will only be called three times.)
Performing side effects like this makes me die a little inside, but how about this?
scala> import scala.annotation.tailrec
import scala.annotation.tailrec
scala> @tailrec
| def lookupUntilDefined[A](f: => Option[A]): A = f match {
| case Some(a) => a
| case None => lookupUntilDefined(f)
| }
lookupUntilDefined: [A](f: => Option[A])A
Then call it like this
scala> def tryToGetResult(): Option[Int] = Some(10)
tryToGetResult: ()Option[Int]
scala> lookupUntilDefined(tryToGetResult())
res0: Int = 10
You may want to give lookupUntilDefined an additional parameter so it can stop eventually in case f is never defined.
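A sketch of that suggestion, with a made-up maxAttempts parameter and an Option result so the caller can tell the attempts ran out:

import scala.annotation.tailrec

@tailrec
def lookupUntilDefined[A](f: => Option[A], maxAttempts: Int): Option[A] =
  if (maxAttempts <= 0) None
  else f match {
    case Some(a) => Some(a)
    case None => lookupUntilDefined(f, maxAttempts - 1)
  }

lookupUntilDefined(tryToGetResult(), maxAttempts = 1000)   // Some(10)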

Scala Parallel Collections- How to return early?

I have a list of possible input values
val inputValues = List(1,2,3,4,5)
I have a function that takes a really long time to compute and gives me a result
def reallyLongFunction( input: Int ) : Option[String] = { ..... }
Using scala parallel collections, I can easily do
inputValues.par.map( reallyLongFunction( _ ) )
to get all the results in parallel. The problem is, I don't really want all the results; I only want the FIRST result. As soon as one of my inputs is a success, I want my output and want to move on with my life. This did a lot of extra work.
So how do I get the best of both worlds? I want to
Get the first result that returns something from my long function
Stop all my other threads from useless work.
Edit -
I solved it like a dumb Java programmer by having
@volatile var done = false;
which is set and checked inside my reallyLongFunction. This works, but does not feel very Scala. Would like a better way to do this....
(Updated: no, it doesn't work, doesn't do the map)
Would it work to do something like:
inputValues.par.find({ v => reallyLongFunction(v); true })
The implementation uses this:
protected[this] class Find[U >: T](pred: T => Boolean, protected[this] val pit: IterableSplitter[T]) extends Accessor[Option[U], Find[U]] {
  @volatile var result: Option[U] = None
  def leaf(prev: Option[Option[U]]) = { if (!pit.isAborted) result = pit.find(pred); if (result != None) pit.abort }
  protected[this] def newSubtask(p: IterableSplitter[T]) = new Find(pred, p)
  override def merge(that: Find[U]) = if (this.result == None) result = that.result
}
which looks pretty similar in spirit to your @volatile, except you don't have to look at it ;-)
I interpreted your question in the same way as huynhjl, but if you just want to search and discard the Nones, you could do something like this to avoid repeating the computation when a suitable outcome is found:
class Computation[A, B](value: A, function: A => B) {
  lazy val result = function(value)
}

def f(x: Int) = { // your function here
  Thread.sleep(100 - x)
  if (x > 5) Some(x * 10)
  else None
}

val list = List.range(1, 20) map (i => new Computation(i, f))
val found = list.par find (_.result.isDefined)
// found is Option[Computation[Int, Option[Int]]]
val result = found map (_.result.get)
// result is Option[Int]
However find for parallel collections seems to do a lot of unnecessary work (see this question), so this might not work well, with current versions of Scala at least.
Volatile flags are used in the parallel collections (take a look at the source for find, exists, and forall), so I think your idea is a good one. It's actually better if you can include the flag in the function itself. It kills referential transparency on your function (i.e. for certain inputs your function now sometimes returns None rather than Some), but since you're discarding the stopped computations, this shouldn't matter.
If you're willing to use a non-core library, I think Futures would be a good match for this task. For instance:
Akka's Futures include Futures.firstCompletedOf
Twitter's Futures include Future.select
...both of which appear to enable the functionality you're looking for.
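For what it's worth, here is one possible shape of that with the standard library's Futures (the name firstDefined is made up; like the parallel-collection approach, it cannot cancel the computations that lose the race):

import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

def firstDefined[A, B](inputs: Seq[A])(f: A => Option[B]): Option[B] = {
  val p = Promise[Option[B]]()
  val attempts = inputs.map { a =>
    Future(f(a)).map { r => if (r.isDefined) p.trySuccess(r); r }
  }
  // If every attempt comes back None, complete with None once they have all finished.
  Future.sequence(attempts).foreach(_ => p.trySuccess(None))
  Await.result(p.future, Duration.Inf)
}

// e.g. firstDefined(inputValues)(reallyLongFunction)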