Can anyone 1-line this Scala code (preferably with FP?) - scala

Sup y'all. The below feels like a tragic waste of Scala. Can anyone save this code?
val tokenSplit = token.split(":")(1)
val candidateEntityName = tokenSplit.substring(0,tokenSplit.length-1)
if(!candidateEntityName.equals(entityName)) removeEnd = true

Here it is (You don't need to use equals):
val removeEnd = token.split(":")(1).init != entityName

Should be something like this (untested):
val removeEnd = !(token.split(":")(1).dropRight(1).equals(entityName))
or:
val removeEnd = !(token.split(":").last.dropRight(1).equals(entityName))

A different solution, using a regex to match on the input. It also handles the case where the data isn't as expected (you could of course extend the regex to suit your needs).
val removeEnd = """<(\w+):(\w+)>""".r.findFirstMatchIn(token).map(!_.group(2).equals(entityName)).getOrElse(throw new Exception(s"Can't parse input: $token"))
If you want to default to false:
val removeEnd = """<(\w+):(\w+)>""".r.findFirstMatchIn(token).exists(!_.group(2).equals(entityName))
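For anyone comparing the answers above, here is a minimal sketch of how they behave. It assumes tokens look like `<type:name>` (the shape the regex answer implies); `TokenDemo`, its method names, and the sample tokens are made up for illustration.

```scala
object TokenDemo {
  // Same pattern as in the regex answer above.
  private val TokenPattern = """<(\w+):(\w+)>""".r

  // Split-based one-liner: takes the part after ':' and drops the trailing '>'.
  // Throws ArrayIndexOutOfBoundsException if the token contains no ':'.
  def removeEndViaSplit(token: String, entityName: String): Boolean =
    token.split(":")(1).init != entityName

  // Regex-based variant: returns false when the token does not match at all.
  def removeEndViaRegex(token: String, entityName: String): Boolean =
    TokenPattern.findFirstMatchIn(token).exists(_.group(2) != entityName)
}
```

So `TokenDemo.removeEndViaSplit("<person:bob>", "alice")` is true, while a malformed token such as "garbage" throws in the split version and quietly yields false in the regex version. Which of those you want depends on how much you trust the input.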

Related

Cleanest way to check if a string value is a negative number?

We could make a method:
def isNegNumber(s : String) : Boolean =
  (s.head == '-') && s.substring(1).forall(_.isDigit)
Is there a cleaner way to do this?
You can use Try and Option to do this in a safer way. This prevents an error if the string is not a number at all.
import scala.util.Try
val s = "-10"
val t = Try(s.toDouble).toOption
val result = t.fold(false)(_ < 0)
Or even better, based on Luis' comment, starting with Scala 2.13 (simpler and more efficient):
val t = s.toDoubleOption
val result = t.fold(false)(_ < 0)
Use a regex pattern.
"-10" matches "-\\d+" //true
It can easily be adjusted to account for whitespace.
"\t- 7\n" matches raw"\s*-\s*\d+\s*" //true
Another possible solution could be:
def isNegativeNumber(s: String) = s.toDoubleOption.exists(_ < 0)
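The answers above differ on edge cases, so here is a small sketch that puts the regex check and the parse-based check side by side. `NegativeCheckDemo` and `isNegativeInteger` are names invented for this example, and `toDoubleOption` requires Scala 2.13 or later.

```scala
object NegativeCheckDemo {
  // Regex check: a minus sign followed by one or more digits, nothing else.
  // Unlike s.head, this does not throw on an empty string.
  def isNegativeInteger(s: String): Boolean = s.matches("-\\d+")

  // Parse-based check: accepts anything that parses as a Double (Scala 2.13+).
  def isNegativeNumber(s: String): Boolean = s.toDoubleOption.exists(_ < 0)
}
```

Note that the method from the question throws on "" (because of `s.head`) and returns true for a lone "-" (because `forall` on an empty string is true). Both checks here return false for those inputs. They disagree on "-1.5": the regex rejects it, the parse-based check accepts it.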

Scio Apache Beam - How to properly separate a pipeline code?

I have a pipeline with a set of PTransforms and my method is getting very long.
I'd like to write my DoFns and my composite transforms in a separate package and use them in my main method. With Python it's pretty straightforward; how can I achieve that with Scio? I don't see any example of doing that. :(
withFixedWindows(
  FIXED_WINDOW_DURATION,
  options = WindowOptions(
    trigger = groupedWithinTrigger,
    timestampCombiner = TimestampCombiner.END_OF_WINDOW,
    accumulationMode = AccumulationMode.ACCUMULATING_FIRED_PANES,
    allowedLateness = Duration.ZERO
  )
)
.sumByKey
// How to write this in another file and use it here?
.transform("Format Output") {
  _
    .withWindow[IntervalWindow]
    .withTimestamp
}
If I understand your question correctly, you want to bundle your map, groupBy, ... transformations in a separate package, and use them in your main pipeline.
One way would be to use applyTransform, but then you would end up using PTransforms, which are not scala-friendly.
You can simply write a function that receives an SCollection and returns the transformed one, like:
def myTransform(input: SCollection[InputType]): SCollection[OutputType] = ???
But if you intend to write your own Source/Sink, take a look at the ScioIO class
You can use the map function to map your elements.
Instead of passing a lambda, you can pass a method reference from another class, for example:
.map(MyClass.MyFunction)
I think one way to solve this could be to define an object in another package and then create a method in that object that would have the logic required for your transformation. For example:
def main(cmdlineArgs: Array[String]): Unit = {
  val (sc, args) = ContextAndArgs(cmdlineArgs)
  val defaultTopic = "tweets"
  val input = args.getOrElse("inputTopic", defaultTopic)
  val output = args("outputTopic")
  val inputStream: SCollection[Tweet] = sc.withName("read from pub sub").pubsubTopic(input)
    .withName("map to tweet class").map(x => {parse(x).extract[Tweet]})
  inputStream
    .flatMap(sentiment.predict) // object sentiment with method predict
}

object sentiment {
  def predict(tweet: Tweet): Option[List[TweetSentiment]] = {
    val data = tweet.text
    val emptyCase = Some("")
    Some(data) match {
      case `emptyCase` => None
      case Some(v) => Some(entitySentimentFile(data)) // entitySentimentFile is another method, not defined here
    }
  }
}
Please also see the Scio examples for an example of this.

How to get Set of keys or values from Set of Map.Entry in scala?

I have a set of Map.Entry like Set<Map.Entry<String, ConfigValue>> in Scala. Now I want to get the Set of either keys (String) or values (ConfigValue). Please suggest an easy solution for this problem.
Thanks
You can use .map to transform your Set[Map.Entry[String, ConfigValue]] to Set[String] and/or Set[ConfigValue]. Note, however, that you might want to convert to a List first to avoid collapsing duplicates.
So if you have
val map: Set[Map[K, V]] = ???
val keys = map.flatMap(_.keySet) // gives you Set[K]
val values = map.flatMap(_.values) // gives you Set[V]
In both cases duplicates will be removed.
You could create a couple of functions that describe that computation, like:
import java.util.{Map => JavaMap}
val getKeys: Set[JavaMap.Entry[String, ConfigValue]] => Set[String] = _.map(_.getKey)
val getValues: Set[JavaMap.Entry[String, ConfigValue]] => Set[ConfigValue] = _.map(_.getValue)
Then when you need to extract one or the other you can call them like so:
val setOfKeyMap: Set[JavaMap.Entry[String, ConfigValue]] = ???
...
val setOfKeys: Set[String] = getKeys(setOfKeyMap)
val setOfValues: Set[ConfigValue] = getValues(setOfKeyMap)
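To make the point about duplicates concrete, here is a minimal runnable sketch. It uses String values in place of ConfigValue so that it needs nothing beyond the standard library; `EntrySetDemo`, its method names, and the sample entries are invented for illustration.

```scala
import java.util.{Map => JavaMap}
import java.util.AbstractMap.SimpleEntry

object EntrySetDemo {
  // Keys of a set of Java map entries.
  def keysOf[K, V](entries: Set[JavaMap.Entry[K, V]]): Set[K] =
    entries.map(_.getKey)

  // Values as a Set: equal values collapse into one element.
  def valuesOf[K, V](entries: Set[JavaMap.Entry[K, V]]): Set[V] =
    entries.map(_.getValue)

  // Values as a List: converting first keeps one value per entry.
  def valueListOf[K, V](entries: Set[JavaMap.Entry[K, V]]): List[V] =
    entries.toList.map(_.getValue)

  // Two entries with different keys but the same value.
  val sample: Set[JavaMap.Entry[String, String]] =
    Set[JavaMap.Entry[String, String]](
      new SimpleEntry[String, String]("a", "x"),
      new SimpleEntry[String, String]("b", "x")
    )
}
```

With the sample data, `valuesOf` yields a single "x" while `valueListOf` yields two, which is the duplicate-collapsing behaviour mentioned above.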

Scala filter by extension

I have the function below to filter a list of files. I was wondering how I could filter so that it only returns files that end in .png or .txt?
def getListOfFiles(directoryName: String): Array[String] = {
  return (new File(directoryName)).listFiles.filter(_.isFile).map(_.getAbsolutePath)
}
Thanks for the help, guys.
Just add a condition to filter:
(new File(directoryName)).listFiles.
  filter { f => f.isFile && (f.getName.endsWith(".png") || f.getName.endsWith(".txt")) }.
  map(_.getAbsolutePath)
or use listFiles(FileFilter) instead of plain listFiles, but it's less convenient (unless you pass a Scala lambda via SAM conversion, which was experimental before Scala 2.12)
Just like you would filter ordinary strings:
val filenames = List("batman.png", "shakespeare.txt", "superman.mov")
filenames.filter(name => name.endsWith(".png") || name.endsWith(".txt"))
// res1: List[String] = List(batman.png, shakespeare.txt)
Alternative approach, a bit less verbose:
import scala.reflect.io.Directory
Directory(directoryName).walkFilter(_.extension=="png")
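It returns an Iterator[Path], which can be converted to an Array[String] with .map(_.toString).toArray. As written it only keeps .png files, so extend the predicate if you also want .txt.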
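If the list of extensions is likely to grow, one option is to pass them in as a Set. This is only a sketch: `FileFilterDemo`, `hasExtension`, and `listFilesWithExtensions` are hypothetical names. The case-insensitive comparison and the handling of a missing directory are my additions, not something the question asked for.

```scala
import java.io.File

object FileFilterDemo {
  // True if the text after the last '.' is one of the given extensions.
  // Extensions are given without the dot and in lowercase, e.g. Set("png", "txt").
  def hasExtension(name: String, extensions: Set[String]): Boolean = {
    val dot = name.lastIndexOf('.')
    dot >= 0 && extensions.contains(name.substring(dot + 1).toLowerCase)
  }

  // Absolute paths of the regular files in the directory with a matching extension.
  // listFiles returns null when the path is not a directory, hence the Option wrapper.
  def listFilesWithExtensions(directoryName: String, extensions: Set[String]): Array[String] =
    Option(new File(directoryName).listFiles)
      .getOrElse(Array.empty[File])
      .filter(f => f.isFile && hasExtension(f.getName, extensions))
      .map(_.getAbsolutePath)
}
```

Usage would be something like `FileFilterDemo.listFilesWithExtensions(directoryName, Set("png", "txt"))`.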

What is a more Scala way of writing this code?

I want to strip off the word "America/" from the start of each item in the list, and the code below does just that, but I feel like it can be done in a significantly better way.
var tz = java.util.TimeZone.getAvailableIDs
for(i <- 0 until tz.length) {
  if(tz(i).startsWith("America/")) {
    tz(i) = tz(i).replaceFirst("America/", "")
  }
}
Simple and straightforward:
val tz = java.util.TimeZone.getAvailableIDs.map(_.replaceFirst("^America/", ""))
Very similar to @Noah's answer, but using a for-yield iteration (so that you can add other filters without extra parentheses).
import java.util.TimeZone
val tz = for(t <- TimeZone.getAvailableIDs) yield t.replaceFirst("^America/", "")
I would use a regex for it:
val pattern = "^America/".r
tz = tz.map(pattern.replaceFirstIn(_, ""))
I wonder whether it is an efficient way.
Map is preferred to for loops in functional programming, so instead of changing the list in place with a for loop, passing the data around by mapping is more pure and (IMO) prettier.
This can work:
val tzs = java.util.TimeZone.getAvailableIDs map { tz =>
  if(tz.startsWith("America/")) tz.replaceFirst("America/","")
  else tz
}
If you only want the American time zones, you could do this:
val americanZones = {
  val pattern = "^America/(.*)".r
  ( java.util.TimeZone.getAvailableIDs
    flatMap pattern.findFirstMatchIn
    map (_ group 1) )
}
Added an 'if' to the for/yield:
val zones = java.util.TimeZone.getAvailableIDs
val formatted_zones = for(i <- 0 until zones.length if zones(i).startsWith("America/")) yield {
  zones(i).replaceFirst("America/", "")
}
No regex:
val tz = java.util.TimeZone.getAvailableIDs.map(_ stripPrefix "America/")
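To sum up the two flavours in this thread (keep every zone and strip the prefix, versus keep only the American zones), here is a small sketch without regexes. `TimeZoneDemo` and its method names are made up. Using `collect` is just one more alternative to the flatMap and for/yield versions above, not a claim that it is the best one.

```scala
object TimeZoneDemo {
  private val Prefix = "America/"

  // Keeps every ID; stripPrefix leaves IDs without the prefix unchanged.
  def stripAll(ids: Seq[String]): Seq[String] =
    ids.map(_.stripPrefix(Prefix))

  // Keeps only IDs that start with the prefix, with the prefix removed.
  // collect does the filter and the map in one pass.
  def onlyAmerican(ids: Seq[String]): Seq[String] =
    ids.collect { case id if id.startsWith(Prefix) => id.stripPrefix(Prefix) }
}
```

With the real data you would call it as `TimeZoneDemo.stripAll(java.util.TimeZone.getAvailableIDs.toSeq)`.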