IntelliJ-Scala format list elements same line when using :: - scala

When building a nested object that takes a list as its input, using :: to build the list makes the auto-format misleading:
val sentimentExpectedSchema = StructType(
  StructField("metadata", StructType(
    StructField("type", StringType, nullable = false) ::
      StructField("job_run_time", LongType, nullable = false) ::
      StructField("version", LongType, nullable = false) :: Nil
  )) :: Nil
)
Someone new to the code will get the first impression that job_run_time is nested under type, which is not true.
I would expect the following formatting rule to be used instead:
val sentimentExpectedSchema = StructType(
  StructField("metadata", StructType(
    StructField("type", StringType, nullable = false) ::
    StructField("job_run_time", LongType, nullable = false) ::
    StructField("version", LongType, nullable = false) :: Nil
  )) :: Nil
)
In this way, it is pretty much obvious that job_run_time and type are at the same nesting level.
Furthermore, if one builds the Array/List "normally", it will be properly formatted:
val sentimentExpectedSchema = StructType(Array(
  StructField("metadata", StructType(Array(
    StructField("type", StringType, nullable = false),
    StructField("job_run_time", LongType, nullable = false),
    StructField("version", LongType, nullable = false)
  )))
))
Is there a way to customize my IntelliJ configuration to properly format lists built using :: as it formats with Array()/List()?

I doubt that you will be able to always force a new line on :: Nil and always have all of first :: second :: third :: Nil aligned to the same column - :: is just an infix method; aside from the fact that methods ending with : take their operands in reverse order in infix notation, there is nothing special about it from the language's point of view. It will always be indented the same way as if it were x.call(y).call(z), with the first element having one less indent than the following calls in the chain.
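To see why, it helps to look at how the compiler desugars a :: chain (a minimal sketch; the values are only illustrative):
// methods whose name ends with ':' bind to the right, so
//   1 :: 2 :: 3 :: Nil
// is just a chain of method calls on the right-hand operand:
val desugared = Nil.::(3).::(2).::(1)   // List(1, 2, 3)
// from a formatter's point of view this is an ordinary call chain, no different from x.call(y).call(z)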
However, you can experiment with adding scalafmt to your project and try out various newlines.* config options to see which one is unambiguous enough for you.
An example config could look like:
// put into .scalafmt.conf in the root of your project
version = 2.6.3
align = most
trailingCommas = preserve
maxColumn = 120
newlines.afterInfix = many // or: keep
newlines.afterInfixBreakOnNested = true
Try out some options and see which one works best for you. Using the formatting I have in my current project I got:
val sentimentExpectedSchema = StructType(
  StructField(
    "metadata",
    StructType(
      StructField("type", StringType, nullable = false) ::
      StructField("job_run_time", LongType, nullable = false) ::
      StructField("version", LongType, nullable = false) :: Nil
    )
  ) :: Nil
)
which should be enough to avoid the false impression that job_run_time is nested under type - at least for me and other people used to this style: the chain of :: is nested deeper than the StructType which contains it, and each line is short enough that you can see it ends with ::. If a line were too long to see this ::, it would be broken down further, with more indentation for the nested elements.
(As a bonus, this config will be automatically picked up by IntelliJ, so everyone participating in your project will use the same settings. You can also enforce this formatting in CI, or during compilation via an sbt plugin).
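If you go the sbt route, a minimal setup might look like this (a sketch; sbt-scalafmt is assumed and the plugin version is only illustrative):
// project/plugins.sbt
addSbtPlugin("org.scalameta" % "sbt-scalafmt" % "2.4.6")

// then, for example in CI:
//   sbt scalafmtCheckAll   // fails the build if any source file is not formatted
//   sbt scalafmtAll        // rewrites all sources according to .scalafmt.conf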
If that still doesn't meet your needs, just embrace the fact that :: chains are method chains and will never be formatted the same way as varargs, whether you use scalafmt or IntelliJ, so just use List(x, y, z) instead. It would actually feel wrong if they were, since a :: chain is basically a multi-line expression producing a single value: it should be obvious at a glance where it starts (the less indented line) and where it ends (the last, more indented line). If "job_run_time" were indeed nested under "type", then "type" would have some closing parenthesis aligned with its StructField AND THEN one more level of indentation would imply nesting. But there isn't one, so the lines read as a sequence of method calls within a single expression.
TBH I've never seen a :: b :: c :: Nil used extensively to construct values in any real prod codebase, since List(a, b, c) is always shorter. Only sometimes did it make sense to use :: instead of +: as a prepend operation (val newList = newHead :: oldList). Obviously, it also makes sense in pattern matching (case a :: b :: c :: Nil =>).
The only real use case I can think of where constructing values with :: is actually needed is building an HList with Shapeless ("x" :: 2 :: 'c' :: HNil), because you cannot build a heterogeneous list type with varargs.
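A minimal sketch of that HList case (assuming shapeless 2.x on the classpath), where each element keeps its own static type:
import shapeless._

val h: String :: Int :: Char :: HNil = "x" :: 2 :: 'c' :: HNil
// no vararg constructor could produce this type, which is why the :: chain is genuinely needed here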
TL;DR - just use List or get used to what standard formatters do.

Related

Why does the `is not a member of` error occur when creating a list in Scala using the :: operator?

I am learning Scala and I've noticed that the following line of code doesn't work:
val worldFreq = ("India", 1) :: ("US", 2) :: ("Berlin", 10)
This results in the error:
error: value :: is not a member of (String, Int)
val worldFreq = ("India", 1) :: ("US", 2) :: ("Berlin", 10)
However this line of code works perfectly
val worldFreq = ("India", 1) :: ("US", 2) :: ("Berlin", 10) :: Nil
worldFreq: List[(String, Int)] = List((India,1), (US,2), (Berlin,10))
Can someone help me understand the error message, and why it works with Nil?
It happens because :: is a right-associative operator.
So, when you type (1, 2) :: Nil, it is transformed to Nil.::((1, 2)). And obviously there is no :: method on tuples, so you can't write (1, 2) :: (3, 4).
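Expanding that for the line from the question, this is roughly what the compiler sees (a sketch; only the version without Nil fails):
// with Nil at the end, every :: is a call on a List, so it compiles:
val ok = Nil.::(("Berlin", 10)).::(("US", 2)).::(("India", 1))
// ok: List[(String, Int)] = List((India,1), (US,2), (Berlin,10))

// without Nil, the rightmost :: would have to be a method on the tuple ("Berlin", 10):
// ("Berlin", 10).::(("US", 2))   // does not compile: tuples have no :: method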
You can read more here: Scala's '::' operator, how does it work?

Second Element of a List

From the book Programming in Scala I got the following line of code:
val second: List[Int] => Int = { case x :: y :: _ => y }
//warning: match may not be exhaustive.
It states that this function will return the second element of a list of integers if the list is not empty or Nil. Still, this part is a bit awkward to me:
case x :: y :: _
How does this exactly work? Does this match any list with at least 2 elements and then return the second? If so, can somebody still explain the syntax? I understood that :: is invoked on the right operand, so it could be written as
(_.::(y)).::(x)
Still, I then don't get why this would return 2:
val second: List[Int] => Int = { case x :: y :: _ => y }
var x = List(1,2)
second(x) //returns 2
In the REPL, you can type:
scala> val list = "a" :: "b" :: Nil
list: List[String] = List(a, b)
which is to be read from right to left, and means: take the end of a List (Nil), prepend the String "b", and to this List ("b" :: Nil) prepend the String "a", giving "a" :: ("b" :: Nil); but you don't need the parens, so it can be written "a" :: "b" :: Nil.
In pattern matching you will more often see:
... list match {
  case Nil     => // ...
  case x :: xs => // ...
}
to distinguish between an empty list and a nonempty one, where xs is the rest of the list, which might itself be Nil: if the whole list is ("b" :: Nil), for example, then x = "b" and xs = Nil.
But if list = "a" :: "b" :: Nil, then x = "a" and xs = ("b" :: Nil).
In your example, the deconstruction goes just one step further, and instead of a name like xs, the wildcard _ is used, indicating that the name is probably not needed and doesn't play a role.
The value second is of a function type: it takes a List[Int] and returns an Int.
If the list has a first element (x), a second element (y), and whatever comes next (we don't care about it), we simply return the element y (which is the second element of the list).
In any other case, the function is not defined. You can check that:
scala> val second: PartialFunction[List[Int], Int] = {
| case x :: y :: _ => y
| }
second: PartialFunction[List[Int],Int] = <function1>
scala> second.isDefinedAt(List(1,2,3))
res18: Boolean = true
scala> second.isDefinedAt(List(1,2))
res19: Boolean = true
scala> second.isDefinedAt(List(0))
res20: Boolean = false
First of all. When you think about pattern matching you should think about matching a structure.
The first part of the case statement describes a structure. This structure may describe one or more things (variables) which are useful to deriving your result.
In your example, you are interested in deriving the second element of a list. A shorthand to build a list in Scala is the :: method (also called cons). :: can also be used to describe a structure in a case statement. At this point, you shouldn't think about evaluation of the :: method in the first part of the case - maybe that's why you are thinking about the evaluation _.::(y).::(x). The :: cons operator helps us describe the structure of the list in terms of its elements: the first element (x), the second element (y), and the rest of it (the _ wildcard). We are interested in a structure that is a list with at least 2 elements, where the rest can be anything - Nil to indicate the end of the list, or more elements - hence the wildcard.
The second part of the case statement uses the second element to derive the result (y).
More on List and Consing
List in Scala is similar to a LinkedList. You know the first element, called head, and the start of the rest of the list. When traversing the linked list you stop when the rest of the list is Nil. The :: cons operator helps us visualise the structure of the linked list, although the Scala compiler would actually be calling the :: methods, evaluating from right to left as you described: _.::(y).::(x).
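A small sketch of that view, using the fact that :: is itself the cons-cell case class of List:
val l = 1 :: 2 :: Nil   // the same cons cells as List(1, 2)

// because :: is a case class, the list can be taken apart exactly the way it was built:
l match {
  case head :: tail => (head, tail)   // (1, List(2)); equivalently: case ::(head, tail) => ...
  case Nil          => (0, Nil)
}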
As an aside, you might have already noticed that the Scala compiler complains that your match isn't exhaustive. This means that this second method would not work for lists of every size, because there isn't any case statement to handle a list with zero or one element. Also, as mentioned in the comments on previous answers, if you aren't interested in the first element you can describe it with a wildcard _ as well:
case _ :: y :: _ => y
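If you want the match to be exhaustive, one option is to return an Option instead (a sketch; secondOpt is just an illustrative name):
val secondOpt: List[Int] => Option[Int] = {
  case _ :: y :: _ => Some(y)   // at least two elements: return the second
  case _           => None      // empty or single-element list
}

secondOpt(List(1, 2, 3))   // Some(2)
secondOpt(List(0))         // None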
I hope this helped.
If you look at the structure of a list in Scala, it is head :: tail: the first element is treated as the head and all remaining ones as the tail (Nil is the last element of the tail). Whenever you write x :: y :: _, x will match the head of the list, the remainder will be the tail, and y will in turn match the head of that next list (the tail of the first list).
eg:
val l = List(1,2,3,4,5)
you can see this list in different ways:
1::2::3::4::5::Nil
1::List(2,3,4,5)
1::2::List(3,4,5)
and so on
So try matching the pattern. In your question, y will give the second element.
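For example, matching on l binds the first two elements and the rest (rest is just an illustrative name):
List(1, 2, 3, 4, 5) match {
  case x :: y :: rest => y    // x = 1, y = 2, rest = List(3, 4, 5); returns 2
  case _              => -1   // fewer than two elements
}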

Shapeless record: update a field with different type

Is it possible to update an HList record field with a value of a different type? Given a list:
val l1 = 'field1 ->> 1 :: 'field2 ->> 2 :: HNil
updating field2 with a different type would not update but add a new field:
l1 + ('field2 ->> "2")
//1 :: 2 :: "2" :: HNil
Is it possible to disable this behaviour?
You can do it by importing record ops and then using updateWith:
import shapeless.record._
l1.updateWith('field2)(_ => "2")
The function in the second parameter list of updateWith is, roughly speaking, of type A => B, where A is the original type "pointed to" by 'field2 and B is the type you want to transform it to. So, since the original value for 'field2 was 2, you could have done the transformation this way as well:
l1.updateWith('field2)(_.toString)
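For reference, a self-contained sketch (assuming shapeless 2.x; syntax.singleton._ provides ->> and record._ provides updateWith):
import shapeless._
import shapeless.syntax.singleton._
import shapeless.record._

val l1 = ('field1 ->> 1) :: ('field2 ->> 2) :: HNil

// updateWith changes both the value and the type of 'field2 (Int => String here)
// without adding a new field:
val l2 = l1.updateWith('field2)(_.toString)
// l2: ... = 1 :: "2" :: HNil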

Class cast exception when describing a data frame

I have a small dataset in csv format with two columns of integers over which I am computing summary statistics. There should be no missing or bad data:
import org.apache.spark.sql.types._
import org.apache.spark.sql._
val raw = sc.textFile("skill_aggregate.csv")
val struct = StructType(StructField("personid", IntegerType, false)
  :: StructField("numSkills", IntegerType, false) :: Nil)
val rows = raw.map(_.split(",")).map(x => Row(x(0), x(1)))
val df = sqlContext.createDataFrame(rows, struct)
df.describe().show()
The last line gives me:
java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
which of course implies some bad data. The weird bit is that I can "collect" the entire data set without issue which implies each row correctly conforms to the IntegerType described in the schema. Also odd is that I can't find any NA values when I open the dataset up in R.
Why don't you use the databricks-csv reader (https://github.com/databricks/spark-csv)? It is easier and safer for creating DataFrames from a csv file, and it allows you to define a schema for your fields (and avoid cast problems).
The code to achieve it is very simple:
myDataFrame = sqlContext.load(source="com.databricks.spark.csv", header="true", path = myFilePath)
Greetings,
JG
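Since the question is in Scala, a rough equivalent using the spark-csv reader could look like this (a sketch: it assumes the spark-csv package is on the classpath and reuses the struct schema defined in the question):
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "false")   // the file in the question appears to have no header row
  .schema(struct)              // the StructType defined in the question
  .load("skill_aggregate.csv")

df.describe().show()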
I found the error. It was necessary to add toInt to each row entry:
val rows = raw.map(_.split(",")).map(x => Row(x(0).toInt, x(1).toInt))

Pre- and Append to a List

Using Scala, I try to concatenate multiple elements into a list as follows:
val min = func1()
val max = func1()
val interpolated : List[Float] = func2()
val res : List[Float] = (min.toFloat) :: interpolated :: (max.toFloat) :: Nil
This syntax does not work because of a type mismatch error. How could I pre- and append elements to a list (in a very elegant way, i.e., without using list buffers, etc.)?
Btw, I also tried
val res : List[Float] = (min.toFloat) :: interpolated :: List(max.toFloat)
but got a type mismatch error (List[Any] vs List[Float])
Peter Neyens' solution works fine.
Personally, I prefer this one:
(min.toFloat +: interpolated) :+ max.toFloat
+: and :+ are defined in Seq, so they work not only for List but for Vector too.
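A quick sketch of that point (values are illustrative); note the parentheses, because +: and :+ have the same precedence but different associativity and cannot be mixed without them:
val interpolated = List(2.0f, 3.0f)
val asList: List[Float] = (1.0f +: interpolated) :+ 4.0f    // List(1.0, 2.0, 3.0, 4.0)

val v = Vector(2.0f, 3.0f)
val asVector: Vector[Float] = (1.0f +: v) :+ 4.0f           // Vector(1.0, 2.0, 3.0, 4.0)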
You can prepend min.toFloat to the interpolated list with ::, but you can't use :: to attach the interpolated list to the list with the maximum you have created (max.toFloat :: Nil) - :: would add it as a single element; you need ::: to concatenate these two lists:
(min.toFloat) :: interpolated ::: ((max.toFloat) :: Nil)