Scala capture method of field without whole instance

I had a piece of code that looks like this:
val foo = df.map(parser.parse) // def parse(str: String): ParsedData = { ... }
However, I found out that this passes a lambda that captures this. I guess Scala treats the code above as:
val foo = df.map(s => /* this. */parser.parse(s))
whereas my intention was to pass a lightweight capture. In Java, I'd write parser::parse, but that isn't available in Scala.
This does the trick:
val tmp = parser // split val read from capture; must be in method body
val foo = df.map(tmp.parse)
but it makes the code a little more unpleasant. There is an answer by @tiago-henrique-engel whose approach could be used like this:
val tmp = parser.parse _
val foo = df.map(tmp)
but it still requires a temporary val.
So, is there a way to pass lightweight capture to map without storing it first into some val to keep the code clean?
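For reference, a minimal sketch of the inline eta-expansion form (worth verifying on your Scala version): in Scala 2, eta expansion of parser.parse _ evaluates the prefix parser first, so the resulting function value should capture only the evaluated receiver, not this:

val foo = df.map(parser.parse _) // parser is evaluated once; only its value is captured by the function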


Calling function library scala

I'm looking to call the ATR function from this Scala wrapper for ta-lib, but I can't figure out how to use the wrapper correctly.
package io.github.patceev.talib

import com.tictactec.ta.lib.{Core, MInteger, RetCode}

import scala.concurrent.Future

object Volatility {

  def ATR(
    highs: Vector[Double],
    lows: Vector[Double],
    closes: Vector[Double],
    period: Int = 14
  )(implicit core: Core): Future[Vector[Double]] = {
    val arrSize = highs.length - period + 1

    if (arrSize < 0) {
      Future.successful(Vector.empty[Double])
    } else {
      val begin = new MInteger()
      val length = new MInteger()
      val result = Array.ofDim[Double](arrSize)

      core.atr(
        0, highs.length - 1, highs.toArray, lows.toArray, closes.toArray,
        period, begin, length, result
      ) match {
        case RetCode.Success =>
          Future.successful(result.toVector)
        case error =>
          Future.failed(new Exception(error.toString))
      }
    }
  }
}
Would someone be able to explain how to use this function and print the result to the console?
Many thanks in advance.
Regarding syntax, Scala is one of many languages where you call functions and methods passing arguments in parentheses (mostly, but let's keep it simple for now):
def myFunction(a: Int): Int = a + 1
myFunction(1) // myFunction is called and returns 2
On top of this, Scala allows you to specify multiple parameter lists, as in the following example:
def myCurriedFunction(a: Int)(b: Int): Int = a + b
myCurriedFunction(2)(3) // myCurriedFunction returns 5
You can also partially apply myCurriedFunction, but again, let's keep it simple for the time being. The main idea is that you can have multiple lists of arguments passed to a function.
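To give a quick taste of partial application (a small sketch, not part of the original answer):

val addTwo: Int => Int = myCurriedFunction(2) _ // fixes a = 2, leaving b to be supplied later
addTwo(3) // returns 5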
Built on top of this, Scala allows you to define a list of implicit parameters, which the compiler will automatically retrieve for you based on some scoping rules. Implicit parameters are used, for example, by Futures:
// this defines how and where callbacks are run;
// the compiler will automatically "inject" it for you where needed
import scala.concurrent.{ExecutionContext, Future}
implicit val ec: ExecutionContext = ExecutionContext.global
Future(4).map(_ + 1) // this will eventually result in a Future(5)
Note that both Future and map have a second parameter list that allows you to specify an implicit execution context. By having one in scope, the compiler will "inject" it for you at the call site, without your having to write it explicitly. You could still have done it explicitly, and the result would have been
Future(4)(ec).map(_ + 1)(ec)
That said, I don't know the specifics of the library you are using, but the idea is that you have to instantiate a value of type Core and either bind it to an implicit val or pass it explicitly.
The resulting code will be something like the following
val highs: Vector[Double] = ???
val lows: Vector[Double] = ???
val closes: Vector[Double] = ???
implicit val core: Core = ??? // instantiate core
val resultsFuture = Volatility.ATR(highs, lows, closes) // core is passed implicitly
for (results <- resultsFuture; result <- results) {
  println(result)
}
Note that depending on your situation you may have to also use an implicit ExecutionContext to run this code (because you are extracting the Vector[Double] from a Future). Choosing the right execution context is another kind of issue but to play around you may want to use the global execution context.
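For example, a common way to get one for experimentation (this is the global context just mentioned):

import scala.concurrent.ExecutionContext.Implicits.global // implicit ExecutionContext for map/foreach/for-comprehensions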
Extra
Regarding some of the points I've left open, here are some pointers that hopefully will turn out to be useful:
Operators
Multiple Parameter Lists (Currying)
Implicit Parameters
Scala Futures

Why is it possible to write currying?

I have the following code:
import cats.Show
import cats.instances.string._     // for Show[String]
import cats.syntax.contravariant._ // for contramap

object ContraCats {
  val showString = Show[String]

  def main(args: Array[String]): Unit = {
    val m = showString.contramap[Symbol](_.name).show('dave)
    val a = showString.contramap[Symbol](_.name)('dave)
  }
}
As you can see, it is possible to write it as a curried version as well as a method call. Why is this possible?
The contramap returns a Show instance.
Show has both the show and the apply methods.
The apply method is special in Scala, since these two are equivalent:
someValue.apply(someArg)
someValue(someArg)
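As a quick illustration with a hypothetical class (not from cats):

class Greeter {
  def apply(name: String): String = s"Hello, $name"
}

val greet = new Greeter
greet("dave")       // sugar for...
greet.apply("dave") // ...the explicit call; both return "Hello, dave"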
So in your example what's happening is that you're calling the apply method on the Show instance returned by contramap, i.e.
val m = showString.contramap[Symbol](_.name).show('dave)
val a = showString.contramap[Symbol](_.name).apply('dave)
Update
While the explanation above would make sense, I realized that a cats Show instance doesn't have an apply method, so your code shouldn't compile (I tried in a REPL and it doesn't)

Infinite loop when replacing concrete value by parameter name

I have the following two objects (in Scala, using Spark):
1. The main object
object Omain {
  def main(args: Array[String]) {
    odbscan
  }
}
2. The object odbscan
object odbscan {
  val conf = new SparkConf().setAppName("Clustering").setMaster("local")
  conf.set("spark.driver.maxResultSize", "3g")
  val sc = new SparkContext(conf)

  val param_user_minimal_rating_count = 2

  /*** Connection ***/
  val sqlcontext = new org.apache.spark.sql.SQLContext(sc)
  val sql = "SELECT id, data FROM user_profile"
  val options = connectMysql.getOptionsMap(sql)
  val uSQL = sqlcontext.load("jdbc", options)

  val users = uSQL.rdd.map { x =>
    val v = x.toString().substring(1, x.toString().size - 1).split(",")
    var ap: Map[Int, Double] = Map()
    if (v.size > 1)
      ap = v(1).split(";").map { y => (y.split(":")(0).toInt, y.split(":")(1).toDouble) }.toMap
    (v(0).toInt, ap)
  }.filter(_._2.size >= param_user_minimal_rating_count)

  println(users.collect().mkString("\n"))
}
When I execute this code I obtain an infinite loop, until I change:
filter(_._2.size >= param_user_minimal_rating_count)
to
filter(_._2.size >= 1)
or any other numeric literal; in that case the code works and my result is displayed.
What I think is happening here is that Spark serializes functions to send them over the wire. Because your function (the one you're passing to map) calls the accessor param_user_minimal_rating_count of object odbscan, the entire odbscan object needs to get serialized and sent along with it. Deserializing and then using that deserialized object causes the code in its body to get executed again, which causes an infinite loop of serializing --> sending --> deserializing --> executing --> serializing --> ...
Probably the easiest thing to do here is to change that val to final val param_user_minimal_rating_count = 2, so the compiler will inline the value. But note that this is only a solution for literal constants. For more information, see constant value definitions and constant expressions.
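A sketch of that change (same object as in the question):

object odbscan {
  // constant value definition: the compiler inlines the literal 2 at use sites,
  // so the closure no longer needs a reference to the odbscan object
  final val param_user_minimal_rating_count = 2
  // ... rest as in the question
}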
Another, better solution is to refactor your code so that no instance variables are used in lambda expressions. Referencing vals defined in an object or class will get the whole object serialized, so try to refer only to vals that are local to a method. Most importantly, don't execute your business logic from within a constructor or the body of an object or class.
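A minimal sketch of that refactor, keeping the name from the question but stubbing the data source with parallelize instead of the JDBC load:

import org.apache.spark.{SparkConf, SparkContext}

object odbscan {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Clustering").setMaster("local")
    val sc = new SparkContext(conf)

    // a local val: the closure captures just this value, not the enclosing object
    val param_user_minimal_rating_count = 2

    val users = sc.parallelize(Seq((1, Map(1 -> 2.0, 2 -> 3.0)), (2, Map.empty[Int, Double])))
    println(users.filter(_._2.size >= param_user_minimal_rating_count).collect().mkString("\n"))

    sc.stop()
  }
}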
Your problem is somewhere else.
The only difference between the two snippets is the definition of val Eps = 5 outside of the map, which does not change the control flow of your code at all.
Please post more context so we can help.

Strongly typed access to CSV in Scala?

I would like to access CSV files in Scala in a strongly typed manner. For example, as I read each line of the CSV, it is automatically parsed and represented as a tuple with the appropriate types. I could specify the types beforehand in some sort of schema that is passed to the parser. Are there any libraries that exist for doing this? If not, how could I go about implementing this functionality on my own?
product-collections appears to be a good fit for your requirements:
scala> val data = CsvParser[String,Int,Double].parseFile("sample.csv")
data: com.github.marklister.collections.immutable.CollSeq3[String,Int,Double] =
CollSeq((Jan,10,22.33),
        (Feb,20,44.2),
        (Mar,25,55.1))
product-collections uses opencsv under the hood.
A CollSeq3 is an IndexedSeq[Product3[T1,T2,T3]] and also a Product3[Seq[T1],Seq[T2],Seq[T3]] with a little sugar. I am the author of product-collections.
Here's a link to the io page of the scaladoc
Product3 is essentially a tuple of arity 3.
If your content has double quotes used to enclose other double quotes, commas, and newlines, I would definitely use a library like opencsv that deals properly with special characters. Typically you end up with Iterator[Array[String]]. Then you use Iterator.map or collect to transform each Array[String] into your tuples, dealing with type conversion errors there. If you need to process the input without loading it all into memory, you keep working with the iterator; otherwise you can convert it to a Vector or List and close the input stream.
So it may look like this:
import java.io.FileReader
import scala.collection.JavaConverters._
import com.opencsv.CSVReader // au.com.bytecode.opencsv in older opencsv versions

val reader = new CSVReader(new FileReader(filename))
val iter = reader.iterator().asScala // view the Java iterator as a Scala one
val typed = iter collect {
  case Array(double, int, string) => (double.toDouble, int.toInt, string)
}
// do more work with typed
// close reader in a finally block
Depending on how you need to deal with errors, you can return Left for errors and Right for success tuples to separate the errors from the correct rows. Also, I sometimes wrap all of this using scala-arm for closing resources. So my data may be wrapped into the resource.ManagedResource monad so that I can use input coming from multiple files.
Finally, although you want to work with tuples, I have found that it is usually clearer to have a case class appropriate for the problem, and then write a method that creates that case class object from an Array[String].
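A minimal sketch combining both suggestions (the Reading class and parseReading helper are made-up names): a case class plus a parse method returning Either, so malformed rows become Lefts instead of exceptions:

import scala.util.{Failure, Success, Try}

case class Reading(city: String, count: Int, value: Double)

// hypothetical helper: turn one CSV row into either an error message or a Reading
def parseReading(fields: Array[String]): Either[String, Reading] =
  Try(Reading(fields(0), fields(1).toInt, fields(2).toDouble)) match {
    case Success(r) => Right(r)
    case Failure(e) => Left(s"bad row [${fields.mkString(",")}]: ${e.getMessage}")
  }

val parsed   = Seq(Array("Oslo", "3", "1.5"), Array("Bergen", "x", "2.0")).map(parseReading)
val errors   = parsed.collect { case Left(msg) => msg } // the "Bergen" row fails on toInt
val readings = parsed.collect { case Right(r) => r }    // List(Reading(Oslo,3,1.5))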
You can use kantan.csv, which is designed with precisely that purpose in mind.
Imagine you have the following input:
1,Foo,2.0
2,Bar,false
Using kantan.csv, you could write the following code to parse it:
import java.io.File
import kantan.csv.ops._

new File("path/to/csv").asUnsafeCsvRows[(Int, String, Either[Float, Boolean])](',', false)
And you'd get an iterator where each entry is of type (Int, String, Either[Float, Boolean]). Note the bit where the last column in your CSV can be of more than one type, but this is conveniently handled with Either.
This is all done in an entirely type safe way, no reflection involved, validated at compile time.
Depending on how far down the rabbit hole you're willing to go, there's also a shapeless module for automated case class and sum type derivation, as well as support for scalaz and cats types and type classes.
Full disclosure: I'm the author of kantan.csv.
I've created a strongly-typed CSV helper for Scala, called object-csv. It is not a fully fledged framework, but it can be adjusted easily. With it you can do this:
val peopleFromCSV = readCSV[Person](fileName)
Where Person is case class, defined like this:
case class Person(name: String, age: Int, salary: Double, isNice: Boolean = false)
Read more about it in GitHub, or in my blog post about it.
Edit: as pointed out in a comment, kantan.csv (see other answer) is probably the best as of the time I made this edit (2020-09-03).
This is made more complicated than it ought to be because of the nontrivial quoting rules for CSV. You probably should start with an existing CSV parser, e.g. OpenCSV or one of the projects called scala-csv. (There are at least three.)
Then you end up with some sort of collection of collections of strings. If you don't need to read massive CSV files quickly, you can just try to parse each line into each of your types and take the first one that doesn't throw an exception. For example,
import scala.util._

case class Person(first: String, last: String, age: Int) {}

object Person {
  def fromCSV(xs: Seq[String]) = Try(xs match {
    case s0 +: s1 +: s2 +: more => new Person(s0, s1, s2.toInt)
  })
}
If you do need to parse them fairly quickly and you don't know what might be there, you should probably use some sort of matching (e.g. regexes) on the individual items. Either way, if there's any chance of error you probably want to use Try or Option or some such to package errors.
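For example, a brief usage sketch of the fromCSV above (the sample rows are made up), keeping only the rows that parse:

val rows = Seq(Seq("Ada", "Lovelace", "36"), Seq("too", "short"))
val people = rows.map(Person.fromCSV).collect { case Success(p) => p }
// people == List(Person(Ada,Lovelace,36)); the short row fails the match inside Try and is dropped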
I built my own solution to strongly type the final product, more than the reading stage itself, which (as pointed out) might be better handled as stage one with something like Apache CSV; stage two could be what I've done here. Here's the code; you are welcome to it. The idea is to parameterize CsvReader[T] with a type T; upon construction, you must also supply the reader with a factory object of type CsvFactory[T]. The idea here is that the class itself (or, in my example, a helper object) decides the construction details and thus decouples them from the actual reading. You could use implicit objects to pass the helper around, but I haven't done that here. The only downside is that each row of the CSV must be of the same class type, but you could expand this concept as needed.
import scala.io.Source

/**
 * @param factory : builds a T from one row of fields
 * @param fname : file name
 * @param delim : "\t" , etc
 */
class CsvReader[T](factory: CsvFactory[T], fname: String, delim: String) {

  private val f = Source.fromFile(fname)
  private var lines = f.getLines // iterator
  private var fileClosed = false

  if (lines.hasNext) lines = lines.dropWhile(_.trim.isEmpty) // skip white space

  def hasNext = (if (fileClosed) false else lines.hasNext)

  lines = lines.drop(1) // drop header, assumed to exist

  /**
   * also closes the file
   * @return the line
   */
  def nextRow(): String = { // public version
    val ans = lines.next
    if (ans.isEmpty) throw new Exception("Error in CSV, reading past end " + fname)
    if (lines.hasNext) lines = lines.dropWhile(_.trim.isEmpty) else close()
    ans
  }

  // def nextObj[T](factory: CsvFactory[T]): T = past version
  def nextObj(): T = { // public version
    val s = nextRow()
    val a = s.split(delim)
    factory makeObj a
  }

  def allObj(): Seq[T] = {
    val ans = scala.collection.mutable.Buffer[T]()
    while (hasNext) ans += nextObj()
    ans.toList
  }

  def close() = {
    f.close()
    fileClosed = true
  }
} // class
Next, the example helper factory and an example main:
trait CsvFactory[T] { // handles all serial controls (in and out)
  def makeObj(a: Seq[String]): T   // for reading
  def makeRow(obj: T): Seq[String] // the factory basically just passes this duty
  def header: Seq[String]          // must define headers for writing
}

/**
 * Each class implements this as needed, so the object can be serialized by the writer
 */
case class TestRecord(val name: String, val addr: String, val zip: Int) {
  def toRow(): Seq[String] = List(name, addr, zip.toString) // handle conversion to CSV
}

object TestFactory extends CsvFactory[TestRecord] {
  def makeObj(a: Seq[String]): TestRecord = new TestRecord(a(0), a(1), a(2).toDouble.toInt)
  def header = List("name", "addr", "zip")
  def makeRow(o: TestRecord): Seq[String] = {
    o.toRow.map(_.toUpperCase())
  }
}
object CsvSerial {
  def main(args: Array[String]): Unit = {
    val whereami = System.getProperty("user.dir")
    println("Begin CSV test in " + whereami)

    val reader = new CsvReader(TestFactory, "TestCsv.txt", "\t")
    val all = reader.allObj() // read the CSV file into memory
    println(all)
    reader.close

    val writer = new CsvWriter(TestFactory, "TestOut.txt", "\t")
    for (x <- all) writer.printObj(x)
    writer.close
  } // main
}
Example CSV (tab separated; you might need to repair the whitespace if you copy from an editor):
Name	Addr	Zip
"Sanders, Dante R."	4823 Nibh Av.	60797.00
"Decker, Caryn G."	994-2552 Ac Rd.	70755.00
"Wilkerson, Jolene Z."	3613 Ultrices. St.	62168.00
"Gonzales, Elizabeth W."	"P.O. Box 409, 2319 Cursus. Rd."	72909.00
"Rodriguez, Abbot O."	Ap #541-9695 Fusce Street	23495.00
"Larson, Martin L."	113-3963 Cras Av.	36008.00
"Cannon, Zia U."	549-2083 Libero Avenue	91524.00
"Cook, Amena B."	Ap #668-5982 Massa Ave	69205.00
And finally the writer (notice the factory methods require this as well, with makeRow):
import java.io._

class CsvWriter[T](factory: CsvFactory[T], fname: String, delim: String, append: Boolean = false) {

  private val out = new PrintWriter(new BufferedWriter(new FileWriter(fname, append)))

  if (!append) out.println(factory.header mkString delim)

  def flush() = out.flush()

  def println(s: String) = out.println(s)

  def printObj(obj: T) = println(factory makeRow (obj) mkString (delim))

  def printAll(objects: Seq[T]) = objects.foreach(printObj(_))

  def close() = out.close
}
If you know the number and types of the fields, maybe like this:
case class Friend(id: Int, name: String) // 1, Fred

val friends = scala.io.Source.fromFile("friends.csv").getLines.map { line =>
  val fields = line.split(',')
  Friend(fields(0).toInt, fields(1))
}

Scala collection of elements accessible by name

I have some Scala code roughly analogous to this:
object Foo {
  val thingA = ...
  val thingB = ...
  val thingC = ...
  val thingD = ...
  val thingE = ...

  val thingsOfAKind = List(thingA, thingC, thingE)
  val thingsOfADifferentKind = List(thingB, thingD)

  val allThings = thingsOfAKind ::: thingsOfADifferentKind
}
Is there some nicer way of declaring a bunch of things and being able to access them both individually by name and collectively?
The issue I have with the code above is that the real version has almost 30 different things, and there's no way to actually ensure that each new thing I add is also added to an appropriate list (or that allThings doesn't end up with duplicates, although that's relatively easy to fix).
The various things are able to be treated in aggregate by almost all of the code base, but there are a few places and a couple of things where the individual identity matters.
I thought about just using a Map, but then the compiler loses the ability to check that the individual things being looked up actually exist (and I have to either wrap code to handle a failed lookup around every attempt, or ignore the problem and effectively risk null-pointer exceptions).
I could make the kind each thing belongs to an observable property of the things, then I would at least have a single list of all things and could get lists of each of the kinds with filter, but the core issue remains that I would ideally like to be able to declare that a thing exists, has a name (identifier), and is part of a collection.
What I effectively want is something like a compile-time Map. Is there a good way of achieving something like this in Scala?
How about this type of pattern?
class Things[A] {
  var all: List[A] = Nil
  def ->:(x: A): A = { all = x :: all; x }
}

object Test {
  val things1 = new Things[String]
  val things2 = new Things[String]

  val thingA = "A" ->: things1
  val thingB = "B" ->: things2
  val thingC = "C" ->: things1
  val thingD = ("D" ->: things1) ->: things2
}
You could also add a little sugar, making Things automatically convertible to List,
object Things {
  implicit def thingsToList[A](things: Things[A]): List[A] = things.all
}
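A quick usage sketch, assuming the definitions above are in scope (the import isn't even strictly needed, since a conversion in the companion object of Things is already in implicit scope):

import Things.thingsToList

val asList: List[String] = Test.things1 // converted via thingsToList
println(Test.things1.mkString(", "))    // List methods become available directly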
I can't think of a way to do this without the var that has equally nice syntax.