I'm trying to read in a CSV and store it in a matrix-like array-of-arrays object. A roadblock I am hitting is that the strings are read in with their surrounding quotes; that is to say, the string "price" is not just the word price, but """"price"""" in Scala. Consequently, I want to remove those surrounding quotes. I also want to make sure that any numeric values are coerced to Double/Int, as they are read in as strings.
What I have now:
val rawObs = io.Source.fromFile(file).getLines().map(_.split(",")).toArray
// An example element of the array is:
//scala> rawObs(2)
//res93: Array[String] = Array("3", 0, "2013-02-27", 1, 52, 52, 1, "1", "kg")
// Here, I make a function to remove surrounding quotes and return a string, or, if there are
// no surrounding quotes, return a Double.
def fixRawObs(x: String) = {
// if it was actually meant to be a string:
if(x.startsWith(""""""")){
// delete any " quotes
var y = x.replaceAll(""""""", "")
} else { // this means that x needs to be coerced to Int or Double
var y = x.toDouble
}
y // return y
}
// but this won't compile; it fails with <console>:14: error: not found: value y
// If it did compile, I'd want to do something like this:
rawObs.map(_.map(fixRawObs(_)))
// (although, is there a better way?)
So, basically, my first question is how to fix my fixRawObs function, and secondarily, is this even an okay way to do this or is there some nicer way to accomplish what I want? What I'm doing feels kind of hackish.
I'm super super new to Scala so it would be greatly appreciated if answers didn't assume much knowledge. Thank you!
There are a few problems with your code:
You are trying to store Strings and Doubles in an Array. Since the closest common supertype of String and Double is Any, you will have an Array[Any]. With an Array[Any], you will need to cast the values inside to Strings or Doubles whenever you want to use them, and this is not desirable.
Your function fixRawObs() does not compile because it tries to return a variable that is out of scope. "y" is declared inside curly braces, which makes it inaccessible outside of them. "y" is not even necessary, because an if/else in Scala is an expression that returns a value, just like a function does. You could do this:
def fixRawObs(x: String) = {
if(x.startsWith(""""""")) x.replaceAll(""""""", "")
else x.toDouble
}
The return type of this function is "Any", though, so you would still have to cast the return values to the proper types manually. Again, this is not a good approach.
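In practice, "casting" an Any usually means pattern matching on its runtime type. A minimal sketch (the helper describe is hypothetical, for illustration only) of what every consumer of fixRawObs's results would have to do:

```scala
// fixRawObs as above: a String for quoted fields, otherwise a Double.
// The compiler infers a common supertype of String and Double (Any in Scala 2),
// not a precise type.
def fixRawObs(x: String) =
  if (x.startsWith("\"")) x.replaceAll("\"", "")
  else x.toDouble

// Hypothetical consumer: before using a value, we must recover its real type.
def describe(v: Any): String = v match {
  case s: String => s"string of length ${s.length}"
  case d: Double => s"number, doubled = ${d * 2}"
  case other     => s"unexpected value: $other"
}

val fixed: List[Any] = List("\"kg\"", "52").map(fixRawObs)
fixed.foreach(v => println(describe(v)))
```

Every place that reads such a value needs a match like this, which is why a typed structure is preferable.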
I recommend creating a class, so that you can have a custom data structure that references your values using their proper types.
case class Row(
col1: String, col2: Double, col3: String, col4: Double,
col5: Double, col6: Double, col7: Double, col8: String, col9: String
)
It would be best to rename the fields with appropriate, descriptive names.
You can then create your row objects like this:
def stripQuotes(s: String): String = {
  if(s.length >= 2 && s.startsWith("\"") && s.endsWith("\"")) s.drop(1).dropRight(1)
  else s
}
val csv = io.Source.fromFile(file)
val rows = (for {
line <- csv.getLines()
s = line.split(",")
if(s.size == 9)
} yield {
new Row(
stripQuotes(s(0)),
s(1).toDouble,
stripQuotes(s(2)),
s(3).toDouble,
s(4).toDouble,
s(5).toDouble,
s(6).toDouble,
stripQuotes(s(7)),
stripQuotes(s(8))
)
}).toArray
csv.close()
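To try the Row approach without a file, here is a self-contained sketch that reads from an in-memory string via Source.fromString; the sample data and the parseRows name are assumptions for illustration. It strips quotes with drop(1).dropRight(1), since String has no dropLeft method.

```scala
import scala.io.Source

// Same Row shape as above; the generic field names are placeholders.
case class Row(
  col1: String, col2: Double, col3: String, col4: Double,
  col5: Double, col6: Double, col7: Double, col8: String, col9: String
)

// Remove one pair of surrounding double quotes, if present.
def stripQuotes(s: String): String =
  if (s.length >= 2 && s.startsWith("\"") && s.endsWith("\"")) s.drop(1).dropRight(1)
  else s

// Parse every line that has exactly 9 comma-separated fields; skip the rest.
def parseRows(source: Source): Array[Row] =
  (for {
    line <- source.getLines()
    s = line.split(",")
    if s.length == 9
  } yield Row(
    stripQuotes(s(0)), s(1).toDouble, stripQuotes(s(2)), s(3).toDouble,
    s(4).toDouble, s(5).toDouble, s(6).toDouble, stripQuotes(s(7)), stripQuotes(s(8))
  )).toArray

// Assumption for illustration: an in-memory CSV instead of a real file.
val sample = Source.fromString("\"3\",0,\"2013-02-27\",1,52,52,1,\"1\",\"kg\"\nbad,line")
val parsed = try parseRows(sample) finally sample.close()
```

The malformed second line is silently dropped by the size guard; in real code you may prefer to report it.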
You probably want to use a library that parses CSV files instead of trying to handle all the edge cases yourself. There are many options for Scala and Java.
If you're practicing Scala, here is why it won't compile. The issue is that you're trying to return y, which is defined inside the branches of your if/else and isn't available outside of them.
In Scala, the value of the last expression in a function body is its return value. So making your if/else the last expression in the function, and returning the replaced/parsed value directly from each branch, will do what you want.
def fixRawObs(x: String) = {
x.startsWith("\"") match {
case true =>
x.replaceAll("\"", "")
case false =>
x.toDouble
}
}
Note that the function's return type will be Any, the supertype of all types in Scala. This is because you are returning a String in one branch and a Double in the other.
Knowing the specific format of your data (e.g. is a given field always a double or is it always a string), you can rewrite it to be more precise and support the actual types.
This simple library, product-collections, does something similar to what you are trying to do with your array of arrays. It also has an intuitive CSV reader.
Update: the library now handles case classes directly:
scala> val csv = Array("3", 0, "2013-02-27", 1, 52, 52, 1, "1", "kg").mkString(",")
csv: String = 3,0,2013-02-27,1,52,52,1,1,kg
scala> case class Row(
| col1: String, col2: Double, col3: String, col4: Double,
| col5: Double, col6: Double, col7: Double, col8: String, col9: String
| )
defined class Row
scala> CsvParser(Row).parse(new java.io.StringReader(csv))
res33: Seq[Row] = List(Row(3,0.0,2013-02-27,1.0,52.0,52.0,1.0,1,kg))
Related
My task is to read registrations from a file formatted like this:
Keri,345246,2
Ingar,488058,2
Almeta,422016,1
and insert them into a List of (String, Int, Int) tuples.
So far I wrote this:
The problem is that I don't understand why I can't cast value2 and value3 to Int, even though they should be Strings because they come from an Array of Strings. Could someone tell me what my mistake is? I am relatively new to Scala.
What is the point of using Scala if you are going to write Java code?
This is how you would properly read a file as a List of case classes.
import scala.io.Source
import scala.util.Using
// Use proper names for the fields.
final case class Registration(field1: String, field2: Int, field3: Int)
// You may change the error handling logic.
def readRegistrationsFromFile(fileName: String): List[Registration] =
Using(Source.fromFile(fileName)) { source =>
source.getLines().map(line => line.split(',').toList).flatMap {
case field1Raw :: field2Raw :: field3Raw :: Nil =>
for {
field2 <- field2Raw.toIntOption
field3 <- field3Raw.toIntOption
} yield Registration(field1 = field1Raw.trim, field2, field3)
case _ =>
None
}.toList
}.getOrElse(default = List.empty)
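The question asked for a List of (String, Int, Int) tuples rather than a case class. A variant of the same pattern-matching idea, sketched under the assumption that the lines are already in memory (parseRegistrations is a hypothetical name), returns tuples and can be tried without a file:

```scala
// Hypothetical helper: parses already-read lines, so it can be tried without a file.
// Malformed lines are dropped, as in the answer above. Requires Scala 2.13+.
def parseRegistrations(lines: Iterator[String]): List[(String, Int, Int)] =
  lines.map(_.split(',').toList).flatMap {
    case name :: id :: count :: Nil =>
      for {
        i <- id.trim.toIntOption
        c <- count.trim.toIntOption
      } yield (name.trim, i, c)
    case _ => None
  }.toList

val input = "Keri,345246,2\nIngar,488058,2\nAlmeta,422016,1\nbroken line"
val registrations = parseRegistrations(input.linesIterator)
```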
(feel free to ask any question you may have about this code)
In Scala, in order to convert a String to an Int you need an explicit conversion.
If you are sure the string can be parsed into an integer, this can be done like this:
val value2 = values(1).toInt
If you cannot trust the input (and you probably should not), you can use .toIntOption, which gives you an Option[Int]: Some(n) if the value was converted successfully, or None if the string did not represent an integer.
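A small sketch of both options, using the first sample line from the question (the variable names are illustrative):

```scala
val values: Array[String] = "Keri,345246,2".split(",")

// toInt throws a NumberFormatException if the text is not an integer.
val value2: Int = values(1).toInt

// toIntOption (Scala 2.13+) returns Some(n) on success and None otherwise.
val parsedOk: Option[Int]  = values(2).toIntOption
val parsedBad: Option[Int] = values(0).toIntOption
```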
The previous answers are correct. I would add a few more points.
saveContent is declared as a val. This means it cannot be reassigned (given another value). You can use the Scala REPL (command-line tool) to check:
scala> val saveContent = Nil
val saveContent: collection.immutable.Nil.type = List()
scala> saveContent = 3
^
error: reassignment to val
Instead, you could use a var, although it would be more idiomatic to follow an overall pattern like the one in Luis Miguel's answer, with pattern matching and a for-comprehension.
You can use the Scala REPL to check the types of variables, too. Splitting a String always produces Strings, never Ints or other types.
scala> val values = "a,2,3".split(",")
val values: Array[String] = Array(a, 2, 3)
scala> values(2)
val res3: String = 3
This is why an explicit conversion like the one in Gael's answer is necessary.
Array element access is done with parentheses, not square brackets, in Scala. See above, and http://scalatutorials.com/tour/interactive_tour_of_scala_lists for more details.
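Putting these points together, here is a sketch that contrasts reassigning a var with building the List directly. The sample lines come from the question; the name saveContent mirrors the one discussed above, but the question's own code is not shown here, so treat this as an illustration rather than a fix of that code.

```scala
val lines = List("Keri,345246,2", "Ingar,488058,2", "Almeta,422016,1")

// With a var: the reference is reassigned to a new (immutable) List each time.
var saveContent = List.empty[(String, Int, Int)]
for (line <- lines) {
  val values = line.split(",")  // Array[String]; index with values(i), not values[i]
  saveContent = saveContent :+ ((values(0), values(1).toInt, values(2).toInt))
}

// More idiomatic: build the List in one expression, without reassignment.
val directList = lines.map { line =>
  val values = line.split(",")
  (values(0), values(1).toInt, values(2).toInt)
}
```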
The code below seems to work fine. But my question is: why is the function defined to take a String as input, yet at the bottom it seems to accept integers?
case class TempData(year: Int, month: String, Prec: Double, Maxtemp: Int, Meantemp: Int, Mintemp: Int)
def parseLine(line:String):TempData = {
val p = line.split(",")
TempData(p(1).toInt, p(3).toString, p(5).toDouble, p(6).toInt, p(8).toInt, p(10).toInt)
}
//> parseLine: (line: String)Tests.TempData
parseLine("Verizon,2017,Alpha,October,gentlemen,10.3,5,Dallas,67,schools,42")
//> res0: Tests.TempData = TempData(2017,October,10.3,5,67,42)
The function parseLine(line: String) takes a single String argument, and you passed one correctly. The confusion is that you are treating the numbers in the string as integers (Int), which is incorrect. The numbers in the string are just part of a value of type String, because they are enclosed in the quotes. That is also why you need toInt to convert them from String to Int.
Your function takes one single String as an argument:
def parseLine(line: String): TempData
And one single String is exactly what is passed to it:
scala> "Verizon,2017,Alpha,October,gentlemen,10.3,5,Dallas,67,schools,42"
res0: String = Verizon,2017,Alpha,October,gentlemen,10.3,5,Dallas,67,schools,42
The numbers are just plain text and are a part of that String, there are no integers involved. It's the body of your function that splits the String and extracts Ints from it.
edit: regarding the title of your post: all of that has nothing to do with your TempData case class. It seems that you are confusing the call of the case class constructor with the call to parseLine.
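To make the types visible, here is a sketch that inspects the intermediate values (the names input, fields and yearText are illustrative): after split, every element is still a String, and only toInt turns "2017" into an Int.

```scala
case class TempData(year: Int, month: String, Prec: Double, Maxtemp: Int, Meantemp: Int, Mintemp: Int)

def parseLine(line: String): TempData = {
  val p: Array[String] = line.split(",")  // every element is still a String
  TempData(p(1).toInt, p(3), p(5).toDouble, p(6).toInt, p(8).toInt, p(10).toInt)
}

val input: String = "Verizon,2017,Alpha,October,gentlemen,10.3,5,Dallas,67,schools,42"
val fields = input.split(",")
val yearText: String = fields(1)  // the text "2017", not a number
val result = parseLine(input)     // toInt/toDouble do the conversion inside
```

Note that split does not trim whitespace: with an input such as "Verizon, 2017, ...", p(1) would be " 2017", and " 2017".toInt throws a NumberFormatException, so map(_.trim) over the fields first if your data contains spaces.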
Scala already supports this:
val (a : Int, b: String, c: Double) = {
val someInt : Int = 42 // ... complex computations ...
val someString : String = "42" // ... complex computations ...
val someDouble : Double = 42.0D // ... complex computations ...
(someInt, someString, someDouble)
}
I was surprised when I couldn't do this:
class Foo(a : Int, b: String, c: Double) {
def this(tuple: (Int, String, Double)) = {
this {
val someInt : Int = 42 // ... complex computations ...
val someString : String = "42" // ... complex computations ...
val someDouble : Double = 42.0D // ... complex computations ...
(someInt, someString, someDouble)
}
}
}
The compiler yields this:
Error:(3, 6) called constructor's definition must precede calling
constructor's definition
this {
^
It seems natural to me to think of the arguments to a constructor as a tuple, but perhaps that's just me. I realize that I can accomplish a similar thing using the companion object, but that places the logic outside of the class itself.
Do you know a way to place the "complex computations" in a non-primary constructor without using the companion object?
What you do with
val (a : Int, b: String, c: Double) = someComputation
is to compute a value on the right side of the equals sign and assign it to the name(s) on the left side of the equals sign.
What you do in the second example is
class Foo(a : Int, b: String, c: Double) {
you create a class with a primary constructor taking three separate parameters. Whether or not you like to think of it as a tuple, it is simply a parameter list with three single parameters.
Then with
def this(tuple: (Int, String, Double)) =
you try to define another constructor taking one parameter named "tuple" of type 3-tuple.
BUT
When you write something like this
def someName( aParam: TheType ) = { computation }
you declare a named function/method and its parameter list, then to the right of the equal sign you define its implementation.
The equals sign marks that the computation produces a result, and that this result is the result of the whole function/method.
So you may have to think in the other direction: the parameters flowing in from the left of the equals sign lead to a result on the right of the equals sign, where this "lead to" is defined by the computation.
One more thing: in Scala 2 you can also leave off the equals sign (so-called procedure syntax, deprecated since Scala 2.13 and dropped in Scala 3):
def someMethod { someSideEffect }
This defines a method that returns nothing (its result type is Unit).
At least until Scala 2.11, a constructor is of the last sort: It has no equal sign, as it does not return a value. Much more, as a side effect, it initialises the object under construction.
That is, space for the object is allocated, and this space is initialised by executing the constructor's implementation with the given parameters.
So, coming back to your second example, with
def this(tuple: (Int, String, Double))
what you do is, declaring a constructor (an implementation to initialise an object under construction) taking one parameter named "tuple".
Now, in the code block following the declaration, you should define how to use the given tuple to initialise the object.
But what you do instead is
this {
val someInt : Int = 42 // ... complex computations ...
val someString : String = "42" // ... complex computations ...
val someDouble : Double = 42.0D // ... complex computations ...
(someInt, someString, someDouble)
}
i.e., you call the exact same constructor with a value computed inside its own implementation.
So, when this constructor is called with a tuple, you define that it should ignore this tuple and instead call that same constructor with another, computed tuple; but by calling it, this newly created tuple will be ignored as well, just to call that constructor with yet another computed tuple, and so on ...
This is confusing to the Scala programmer as well as to the compiler: it no longer knows which computation should happen when (or why).
You have declared something like a chicken-and-egg problem to the compiler.
So, tl;dr: NO.
The first example assigns values computed on the right side of '=' to the name(s) on the left side.
The second example defines a recursive constructor implementation, which is simply not possible in Scala.
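For contrast, an auxiliary constructor that actually uses the tuple it receives, by delegating to the primary constructor instead of to itself, compiles fine. This is a minimal sketch; the val modifiers on the parameters are added only so the fields can be inspected:

```scala
class Foo(val a: Int, val b: String, val c: Double) {
  // An auxiliary constructor must start by calling another constructor.
  // Here it calls the primary one with the tuple's elements, not itself.
  def this(tuple: (Int, String, Double)) = this(tuple._1, tuple._2, tuple._3)
}

val fromArgs  = new Foo(42, "42", 42.0)    // primary constructor
val fromTuple = new Foo((42, "42", 42.0))  // auxiliary constructor
```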
For example my case class is
case class Test(id: String, myValues: List[Item])
case class Item(id: Long, order: Long)
and I get a String value like
val checkValue: String = "id"
I want to sort the items of a Test by the field named in that string, and I want it to look like
val test = Test("0", List(Item(0, 14), Item(1, 34)))
val sortedItems = test.myValues.map(_.*checkValue*).sorted
It's about getting a field of a class by name, like someInstanceOfClass.checkValue
Scala is a compiled, statically typed language, so you can't just use strings as field names. The easiest way to solve your problem is to map the string value to the field:
scala> def get(item: Item, str: String) = str match {
| case "id" => item.id
| case "order" => item.order
| }
get: (item: Item, str: String)Long
scala> test.myValues.map(get(_, checkValue)).sorted
res0: List[Long] = List(0, 1)
scala> test.myValues.map(get(_, "order")).sorted
res1: List[Long] = List(14, 34)
Of course there are more ways to solve the problem. You could use Reflection to read the name of the variable at runtime. In case you already know at compile time the name of the variable you want to read, you could also use macros to generate the code that is doing what you want. But these are both very specialized solutions, I would go with the runtime matching as shown above.
You may wish to rethink how you're going about this. What good does the string "id" actually do you? If you just need the capability to pull out a particular bit of data, why not use a function?
val f: Item => Long = _.id
Do you not want to have to type the function type over and over again? That's fine too; you can use a method to request the compiler's help filling in the type arguments:
def pick[A](x: Item => A) = x
val f = pick(_.id)
Now you can use f anywhere you would have used "id". (You can even name it id instead of f if that will help, or something that reminds you that it's actually a function that gets an id, not an id itself, like idF or getId.)
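Here is a sketch of that idea applied to the question's classes (the sample data is illustrative). The selector functions replace the string "id", and the compiler checks that the field actually exists:

```scala
case class Item(id: Long, order: Long)
case class Test(id: String, myValues: List[Item])

// Helper from the answer: lets the compiler infer the result type A.
def pick[A](x: Item => A) = x

val getId    = pick(_.id)     // Item => Long
val getOrder = pick(_.order)  // Item => Long

val test = Test("0", List(Item(1, 14), Item(0, 34)))

// Use the selector wherever the string "id" would have been used.
val sortedIds = test.myValues.map(getId).sorted  // List(0, 1)
val itemsById = test.myValues.sortBy(getId)      // whole items, ordered by id
val orders    = test.myValues.map(getOrder)
```

A typo such as pick(_.idd) fails at compile time, whereas a misspelled string would only fail at runtime.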
I would like a function to take a 7-tuple, but the compiler won't let me, giving the message shown below. I failed to find a proper way to do it. Is it even possible without explicitly typing all the type parameters, like Tuple7[String,String...,String]? And is it even a good idea to use Scala like this?
def store(record:Tuple7): Unit = {
}
Error:(25, 20) class Tuple7 takes type parameters
def store(record: Tuple7): Unit = {
^
As stated by Luis, you have to specify the type for every position in the tuple.
I'd like to add some approaches to express the same behaviour in different ways:
Tuple Syntax
For that, you can choose between two equivalent syntaxes:
Tuple3[String, Int, Double]
(String, Int, Double)
Approach using Case Classes for better readability
Long tuples are hard to handle, especially when types are repeated. Scala offers a different approach for this: instead of a Tuple7 you can use a case class with seven fields. The gain is that you can attach descriptive names to each field, and the type of each position makes more sense once a name is attached to it.
It also reduces the chance of putting values in the wrong positions:
(String, Int, String, Int)
// vs
case class Person(name: String, age: Int, taxNumber: String, numberOfChildren: Int)
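A minimal sketch of that difference, with a hypothetical class name Person and illustrative values:

```scala
// Hypothetical class name; the fields follow the comparison above.
case class Person(name: String, age: Int, taxNumber: String, numberOfChildren: Int)

// Positional: nothing stops you from swapping the two String fields.
val asTuple: (String, Int, String, Int) = ("Alice", 24, "", 5)

// Named: each value is labelled at the call site.
val asPerson = Person(name = "Alice", age = 24, taxNumber = "", numberOfChildren = 5)

val ageFromTuple: Int  = asTuple._2    // you must remember what _2 means
val ageFromPerson: Int = asPerson.age  // self-describing

// Case classes also provide copy with named updates.
val older = asPerson.copy(age = asPerson.age + 1)
```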
using Seq with pattern matching
If your intention is to work with a sequence of data, a Seq combined with pattern matching could also be a nice fit:
List("name", 24, "", 5 ) match {
case (name: String) :: (age: Int) :: _ :: _ :: Nil => doSomething(name, age)
}
This only works well in a fairly limited scope. Normally you would lose a lot of type information, as the List is of type List[Any].
You could do the following:
def store(record: (String, String, String, String, String, String, String)):Unit = {
}
which is the equivalent of:
def store(record: Tuple7[String, String, String, String, String, String, String]):Unit = {
}
You can read more about it in Programming in Scala, 2nd Edition, chapter "Next Steps in Scala", sub-chapter "Step 9. use Tuples".
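If the goal is mainly to avoid repeating the seven element types, a type alias is one option: you still spell the types out, but only once. A sketch, where the alias name Record is hypothetical:

```scala
// The alias is just another name for the same Tuple7 type.
type Record = (String, String, String, String, String, String, String)

def store(record: Record): Unit = {
  // Elements are accessed positionally: record._1 ... record._7.
  println(s"storing ${record._1}")
}

val r: Record = ("a", "b", "c", "d", "e", "f", "g")
val sameType: Tuple7[String, String, String, String, String, String, String] = r
store(r)
```

For seven fields with distinct meanings, the case class approach from the other answer usually reads better than an alias.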