Scala Slick Bulk Insert with Array

I'm splitting a long string into an array of strings, then I want to insert all of them into the database. I could easily loop through the array and insert them one by one, but that seems very inefficient. Then I found there is an insertAll() method. However, the insertAll() method is defined like this:
def insertAll(values: U*)
This only accepts multiple U, but not an Array or List.
/* Insert a new Tag */
def insert(insertTags: Array[String])(implicit s: Session) {
  var insertSeq: List[Tag] = List()
  for (tag <- insertTags) {
    insertSeq :+= new Tag(None, tag)
  }
  Tag.insertAll(insertSeq)
}
*Tag is the table Object
This is the preliminary code I have written. It doesn't compile because insertAll() doesn't take a Seq. I hope there is a way to do this... so it won't generate one SQL insert statement per array element.

When a function expects repeated parameters such as U* and you want to pass a sequence of U instead, it must be marked as a sequence argument, which is done with : _*, as in
Tag.insertAll(insertSeq: _*)
This is described in the specification in section 6.6. It disambiguates situations such as:
def f(x: Any*)
f(Seq(1, "a"))
There, f could be called with a single argument, Seq(1, "a"), or with two, 1 and "a". It will be the former; the latter is done with f(Seq(1, "a"): _*). Python has a similar syntax.
Regarding the questions in your comment :
It works with Seq, collections that conform to Seq, and values that can be implicitly converted to Seq (which includes Arrays). That covers a lot of collections, but not all of them. For instance, Sets are not allowed (but they have a toSeq method, so it is still easy to call with a Set).
It is not a method; it is more like a type ascription. It simply tells the compiler that this argument is the full expected Seq in itself, and not the only item in a sequence argument.
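To make this concrete, here is a minimal, self-contained sketch of the pattern, with a hypothetical Tag case class and a plain varargs method standing in for Slick's insertAll, so it runs without a database:

```scala
// Hypothetical stand-ins for the question's types: a Tag row and a
// varargs method shaped like Slick's insertAll(values: U*).
case class Tag(id: Option[Int], name: String)

def insertAll(values: Tag*): Int = values.size // pretend this inserts rows

val insertTags = Array("scala", "slick", "database")
val insertSeq: Seq[Tag] = insertTags.map(tag => Tag(None, tag)).toSeq

// Passing insertSeq directly would not compile; the `: _*` ascription
// expands the Seq into the repeated parameters the method expects.
val inserted = insertAll(insertSeq: _*)
// inserted == 3
```

In real Slick code the same ascription applies unchanged: Tag.insertAll(insertSeq: _*).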

Related

In Slick 3, how does one SQL-Compile an insert using a mapped case class?

To SQL-compile a query, you need to compile a function which takes, for each query parameter arg: Type, a lifted parameter of type Rep[Type].
I have a case class JobRecord and a TableQuery jobRecords.
So to insert a JobRecord case-class instance, I need to be able to say something like:
val qMapToId = (jobRecords returning jobRecords.map(_.id))
def ucCreate(jobRecord: Rep[JobRecord]) = qMapToId += jobRecord
val cCreate = Compiled(ucCreate _)
But of course this doesn't compile, because += doesn't take a Rep, and I'm not sure Rep[JobRecord] is valid either.
I've tried many things, not worth showing, including mixing in the Monomorphic Case Classes guidance. I probably got a step away from a solution a few times. A pointer to a working example would be great!
You don't have to do anything, val qMapToId = (jobRecords returning jobRecords.map(_.id)) generates the statement once, at compile time (i.e. on container startup).
Compiled replaces Parameters in the new API, and comes into play for selects, updates, and (I believe) deletes where you are binding placeholders for generating a prepared statement. For insert statements there's nothing to bind, you already have an instance for +=.
You can use TableQuery[] as follows.
// define the TableQuery of JobRecord
case class JobRecordRow(...)
class JobRecord(tag: Tag) extends Table[JobRecordRow](tag, "JOB_TABLE_NAME") {
}
// define the compiled query
val insert = Compiled(TableQuery[JobRecord].filter(_ => true: Rep[Boolean]))
val stmt = (insert += JobRecordRow(...))
db.run(stmt)
The Compiled query seems a little bit tricky. However, when I tried Compiled(TableQuery[JobRecord]) as suggested in other articles, it did not work. By adding filter(), I could build the insert query.
Updated on 2019-07-21
Instead of filter(), map(identity) can be used.
TableQuery[JobRecord].map(identity)

What does the : mean after a value in Scala?

I just started learning Scala, and I've noticed that : is used in many places. Most of the time, the : usage makes sense, e.g. after parameter names or method declarations. The following usage confuses me, however:
val a = Seq[String]("a", "b")
a :+ "c"
or
def myMethod(varargs: String*) {
  // ...
}
val a = Seq[String]("a", "b", "c")
myMethod(a:_*)
What exactly is the : doing in these cases? Why can't I call a._* directly?
The two usages that you are asking about are two completely different cases.
a :+ "c"
The : doesn't mean anything by itself here; it's part of a method named :+, which appends an element to a Seq.
myMethod(a:_*)
Here, you have a method myMethod which takes a variable number of arguments. You want to use the Seq to fill the arguments; the : _* indicates that you want to do that (rather than pass the Seq itself as the first argument of the method).
Note that : has a special meaning if a method name ends with : (not if it begins with : as in your first example). In that case, the method will be right-associative; it means that the method will be called on the thing on the right, with the thing on the left as the argument, rather than the reverse.
It is telling the compiler that you want your sequence to be taken apart and supplied as separate parameters.
def myMethod(varargs: String*) = {
  // varargs is a Seq
  val x: Seq[String] = varargs
}
// Calling it is different
myMethod("a", "b")
// To call it using a Seq, you need to signal the compiler
myMethod(mySeqOfString: _*)
// They chose `:` because this would be valid too:
myMethod(myString: String)
In the other example you gave, a :+ "c", the : has a different meaning. It's simply part of the method name. They could have named it append, but they chose :+. The reason is that a : at the end of a method name has a special meaning: bind to the right. This allows for "c" +: a. So for the sake of consistency they probably chose :+ for append and +: for prepend.
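A short sketch illustrating both points: `:` as part of an ordinary method name (:+), and the bind-to-the-right rule for names ending in `:` (+: and ::):

```scala
val xs = Seq(2, 3)

// :+ does not end in `:`, so it is an ordinary left-associative call.
val appended = xs :+ 4          // same as xs.:+(4)

// +: ends in `:`, so it binds to the right: the method is called on
// the right-hand operand.
val prepended = 1 +: xs         // same as xs.+:(1)

// The classic example is cons on List, which is also right-associative.
val ys = 1 :: 2 :: Nil          // parsed as 1 :: (2 :: Nil)

// appended == Seq(2, 3, 4); prepended == Seq(1, 2, 3); ys == List(1, 2)
```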
:+ is a method on Seq[A] which appends the given item to the sequence.
In the second example, myMethod(a : _*) the a : _* is used to pass a sequence as the variable argument list to the myMethod function.
The significance of the colon in Scala varies according to context.
As you note, it separates an identifier in a declaration from its type.
It can also appear as part of an operator involving a sequence.
Since an operator in Scala is really just a method, its use here is arbitrary, but you can rely on the convention that the colon is on the side of a binary operator that points to the sequence.
So your example
a :+ "c"
Appends the string "c" to a sequence of strings, while
"c" +: a
Prepends the string to the sequence. (Or, more precisely in the case of an immutable sequence, returns a new sequence resulting from the prepend/append operation. See documentation)
Your final example
myMethod(a: _*)
(Which I had to look up; thanks for teaching! See the documentation) tells the compiler to expand the sequence as varargs (rather than passing the sequence as a single argument).

Is Queue.foreach properly ordered?

Will the foreach method of a Scala immutable Queue always be processed in the order one expects for a queue or is there a method that guarantees the order? Or do I have to use a loop + dequeue?
scala.collection.immutable.Queue is a scala.collection.Seq. See the Seq documentation:
Sequences are special cases of iterable collections of class Iterable. Unlike iterables, sequences always have a defined order of elements.
So yes, you'll get the same elements order with foreach and with loop + dequeue.
If you don't trust the documentation, you can take a look at the implementation:
Queue#foreach is inherited from IterableLike and implemented like this:
def foreach[U](f: A => U): Unit = iterator.foreach(f)
Queue#iterator is implemented like this:
override def iterator: Iterator[A] = (out ::: in.reverse).iterator
And Queue#dequeue returns the first element of out, if any, or the last element of in. So you'll get the same elements order.
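A quick, self-contained check of this, comparing foreach order against repeated dequeue calls (drain is a helper written here just for the comparison):

```scala
import scala.collection.immutable.Queue
import scala.collection.mutable.ListBuffer

val q = Queue(1, 2, 3).enqueue(4)

// Collect elements in the order foreach visits them.
val viaForeach = ListBuffer.empty[Int]
q.foreach(viaForeach += _)

// Drain the queue with repeated dequeue calls for comparison.
def drain(queue: Queue[Int], acc: List[Int] = Nil): List[Int] =
  if (queue.isEmpty) acc.reverse
  else {
    val (head, rest) = queue.dequeue
    drain(rest, head :: acc)
  }

// Both traversals produce the same FIFO order: List(1, 2, 3, 4)
```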

Ending a for-comprehension loop when a check on one of the items returns false

I am a bit new to Scala, so apologies if this is something a bit trivial.
I have a list of items which I want to iterate through. I want to execute a check on each of the items, and if just one of them fails I want the whole function to return false. So you can see this as an AND condition. I want it to be evaluated lazily, i.e. return false the moment I encounter the first false.
I am used to the for-yield syntax, which filters items generated through some generator (a list of items, a sequence, etc.). In my case, however, I just want to break out and return false without executing the rest of the loop. In plain Java one would just do a return false; within the loop.
In an inefficient way (i.e. not stopping when I encounter the first failing item), I could do it like this:
(for {
  item <- items
  if !satisfiesCondition(item)
} yield item).isEmpty
Which is essentially saying that if no items make it through the filter all of them satisfy the condition. But this seems a bit convoluted and inefficient (consider you have 1 million items and the first one already did not satisfy the condition).
What is the best and most elegant way to do this in Scala?
Stopping early at the first false for a condition is done using forall in Scala. (A related question)
Your solution rewritten:
items.forall(satisfiesCondition)
To demonstrate short-circuiting:
List(1,2,3,4,5,6).forall { x => println(x); x < 3 }
1
2
3
res1: Boolean = false
The opposite of forall is exists which stops as soon as a condition is met:
List(1,2,3,4,5,6).exists{ x => println(x); x > 3 }
1
2
3
4
res2: Boolean = true
Scala's for comprehensions are not general iterations. That means they cannot produce every possible result that one can produce out of an iteration, as, for example, the very thing you want to do.
There are three things that a Scala for comprehension can do, when you are returning a value (that is, using yield). In the most basic case, it can do this:
Given an object of type M[A], and a function A => B (that is, which returns an object of type B when given an object of type A), return an object of type M[B];
For example, given a sequence of characters, Seq[Char], get the UTF-16 integer for each character:
val codes = for (char <- "A String") yield char.toInt
The expression char.toInt converts a Char into an Int, so the String -- which is implicitly converted into a Seq[Char] in Scala --, becomes a Seq[Int] (actually, an IndexedSeq[Int], through some Scala collection magic).
The second thing it can do is this:
Given objects of type M[A], M[B], M[C], etc, and a function of A, B, C, etc into D, return an object of type M[D];
You can think of this as a generalization of the previous transformation, though not everything that could support the previous transformation can necessarily support this transformation. For example, we could produce coordinates for all coordinates of a battleship game like this:
val coords = for {
  column <- 'A' to 'L'
  row <- 1 to 10
} yield s"$column$row"
In this case, we have objects of the types Seq[Char] and Seq[Int], and a function (Char, Int) => String, so we get back a Seq[String].
The third, and final, thing a for comprehension can do is this:
Given an object of type M[A], such that the type M[T] has a zero value for any type T, a function A => B, and a condition A => Boolean, return either the zero or an object of type M[B], depending on the condition;
This one is harder to understand, though it may look simple at first. Let's look at something that looks simple first, say, finding all vowels in a sequence of characters:
def vowels(s: String) = for {
  letter <- s
  if Set('a', 'e', 'i', 'o', 'u') contains letter.toLower
} yield letter.toLower
val aStringVowels = vowels("A String")
It looks simple: we have a condition, we have a function Char => Char, and we get a result, and there doesn't seem to be any need for a "zero" of any kind. In this case, the zero would be the empty sequence, but it hardly seems worth mentioning it.
To explain it better, I'll switch from Seq to Option. An Option[A] has two sub-types: Some[A] and None. The zero, evidently, is the None. It is used when you need to represent the possible absence of a value, or the value itself.
Now, let's say we have a web server where users who are logged in and are administrators get extra javascript on their web pages for administration tasks (like wordpress does). First, we need to get the user, if there's a user logged in, let's say this is done by this method:
def getUser(req: HttpRequest): Option[User]
If the user is not logged in, we get None, otherwise we get Some(user), where user is the data structure with information about the user that made the request. We can then model that operation like this:
def adminJs(req: HttpRequest): Option[String] = for {
  user <- getUser(req)
  if user.isAdmin
} yield adminScriptForUser(user)
Here it is easier to see the point of the zero. When the condition is false, adminScriptForUser(user) cannot be executed, so the for comprehension needs something to return instead, and that something is the "zero": None.
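The same example can be made runnable with hypothetical stand-ins for the web types (here the HttpRequest is reduced to a plain Option[User], since only getUser's result matters for the sketch):

```scala
case class User(name: String, isAdmin: Boolean)

// Hypothetical stand-ins for the web API in the example.
def getUser(loggedIn: Option[User]): Option[User] = loggedIn
def adminScriptForUser(user: User): String = s"admin.js for ${user.name}"

def adminJs(maybeUser: Option[User]): Option[String] = for {
  user <- getUser(maybeUser)
  if user.isAdmin
} yield adminScriptForUser(user)

val admin   = adminJs(Some(User("alice", isAdmin = true)))  // Some("admin.js for alice")
val regular = adminJs(Some(User("bob", isAdmin = false)))   // None: the guard fails, so the zero
val anon    = adminJs(None)                                 // None: no user at all
```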
In technical terms, Scala's for comprehensions provide syntactic sugar for operations on monads, with an extra operation for monads with zero (see list comprehensions in the same article).
What you actually want to accomplish is called a catamorphism, usually represented as a fold method, which can be thought of as a function of M[A] => B. You can write it with fold, foldLeft or foldRight in a sequence, but none of them would actually short-circuit the iteration.
Short-circuiting arises naturally out of non-strict evaluation, which is the default in Haskell, in which most of these papers are written. Scala, as most other languages, is by default strict.
There are three solutions to your problem:
Use the special methods forall or exists, which target your precise use case, though they don't solve the generic problem;
Use a non-strict collection; there's Scala's Stream, but it has problems that prevent its effective use. The Scalaz library can help you there;
Use an early return, which is how the Scala library solves this problem in the general case (in specific cases, it uses better optimizations).
As an example of the third option, you could write this:
def hasEven(xs: List[Int]): Boolean = {
  for (x <- xs) if (x % 2 == 0) return true
  false
}
Note as well that this is called a "for loop", not a "for comprehension", because it doesn't return a value (well, it returns Unit), since it doesn't have the yield keyword.
You can read more about real generic iteration in the article The Essence of The Iterator Pattern, which is a Scala experiment with the concepts described in the paper by the same name.
forall is definitely the best choice for the specific scenario but for illustration here's good old recursion:
import scala.annotation.tailrec

@tailrec def hasEven(xs: List[Int]): Boolean = xs match {
  case head :: _ if head % 2 == 0 => true
  case Nil => false
  case _ :: tail => hasEven(tail)
}
I tend to use recursion a lot for loops with short-circuit use cases that don't involve collections.
UPDATE:
DO NOT USE THE CODE IN MY ANSWER BELOW!
Shortly after I posted the answer below (after misinterpreting the original poster's question), I have discovered a way superior generic answer (to the listing of requirements below) here: https://stackoverflow.com/a/60177908/501113
It appears you have several requirements:
Iterate through a (possibly large) list of items doing some (possibly expensive) work
The work done to an item could return an error
At the first item that returns an error, short circuit the iteration, throw away the work already done, and return the item's error
A for comprehension isn't designed for this (as is detailed in the other answers).
And I was unable to find another Scala collections pre-built iterator that provided the requirements above.
While the code below is based on a contrived example (transforming a String of digits into a BigInt), it is the general pattern I prefer to use; i.e. process a collection and transform it into something else.
def getDigits(shouldOnlyBeDigits: String): Either[IllegalArgumentException, BigInt] = {
  @scala.annotation.tailrec
  def recursive(
      charactersRemaining: String = shouldOnlyBeDigits,
      accumulator: List[Int] = Nil
  ): Either[IllegalArgumentException, List[Int]] =
    if (charactersRemaining.isEmpty)
      Right(accumulator) // All work completed without error
    else {
      val item = charactersRemaining.head
      val isSuccess = item.isDigit // Work the item
      if (isSuccess)
        // This item's work completed without error, so keep iterating
        recursive(charactersRemaining.tail, (item - 48) :: accumulator)
      else
        // This item hit an error, so short-circuit
        Left(new IllegalArgumentException(s"item [$item] is not a digit"))
    }
  recursive().map(digits => BigInt(digits.reverse.mkString))
}
When it is called as getDigits("1234") in a REPL (or Scala Worksheet), it returns:
val res0: Either[IllegalArgumentException,BigInt] = Right(1234)
And when called as getDigits("12A34") in a REPL (or Scala Worksheet), it returns:
val res1: Either[IllegalArgumentException,BigInt] = Left(java.lang.IllegalArgumentException: item [A] is not a digit)
You can play with this in Scastie here:
https://scastie.scala-lang.org/7ddVynRITIOqUflQybfXUA

Stream as constructor arg sometimes fully evaluated during early class initialization

Streams can be used as class constructor arguments:
scala> (0 to 10).toStream.map(i => { println("bla" + i); -i })
bla0
res0: scala.collection.immutable.Stream[Int] = Stream(0, ?)
scala> class B(val a:Seq[Int]){println(a.tail.head)}
defined class B
scala> new B(res0)
bla1
-1
res1: B = B#fdb84e
So, the Stream does not get fully evaluated even though it is handed in as a Seq argument, and even though it is partly evaluated. Works as expected.
I have a class like this:
class HazelSimpleResultSet[T](col: Seq[T], comparator: Comparator[T]) extends HGRandomAccessResult[T] with CountMe
{
  val foo: Int = -1 // col of type Stream[T] already fully evaluated here.
  def count = col.size
  ....
}
where HGRandomAccessResult and CountMe are simple interfaces.
In most cases I want to use Streams as the col constructor argument, to avoid costly operations. In the debugger I can follow that it works in some cases, since the value displayed for col remains Stream(xy, ?) with "tlVal = null", even after initialization of HazelSimpleResultSet.
Furthermore, for testing, I include println in the blocks that construct the Streams like this:
keyvalues.foldLeft(Stream.empty[KeyType]){ case (a, b) => ({ println("evaluating "+ b); unpack[KeyType](b)}) #:: a}
in order to follow in the console exactly when the Stream is evaluated.
So, in some cases it works, but in some cases the Stream gets fully evaluated during the very first moments of initialization of HazelSimpleResultSet. I cannot see any relevant difference in the Streams handed in; I'm just sure they are unevaluated Streams until that moment.
"Stepping into" with the debugger, I can see that the Stream gets evaluated at the line of the class definition itself, before even reaching the class body, i.e. before initialization of any field.
EDIT:
I can define the class in a (suboptimal) way such that no field at all is referencing to the Stream, and still I get that behaviour.
The CountMe interface defines a count method, which calls col.size, which would then evaluate the whole Stream. I tried to define count in terms of a lazy val size, but that didn't make a difference.
I'm a bit at a loss why it doesn't work in some cases. Anybody has any hints about hidden caveats of Streams?
EDIT:
An important note: the Stream object wraps some serious state that it needs for evaluation, i.e. a reference to a NoSQL database (Hazelcast).
Question: what are the caveats here? Is there something in particular I must take care of when my Stream carries stateful references necessary for evaluation?
If you create Stream like this:
Stream ({println("eval 1"); 1}, {println("eval 2"); 2})
then you are actually calling Stream.apply which is implemented like this:
/** A stream consisting of given elements */
override def apply[A](xs: A*): Stream[A] = xs.toStream
which means that what actually happens is:
1. All elements are evaluated!
2. A Seq containing these elements is created.
3. A Stream is created out of this Seq.
So as you can see, if you create your Stream this way, all its elements are evaluated eagerly. This is not how you create a lazily-evaluated Stream. What you want to do is probably use the #:: and #::: operators, which evaluate their operands lazily. Look up the docs for their usage.
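To make the difference observable, here is a small sketch that records which elements actually get evaluated (note: Stream is deprecated since Scala 2.13 in favor of LazyList; in a Stream cons the head is strict, while each tail is by-name):

```scala
import scala.collection.mutable.ListBuffer

val evaluated = ListBuffer.empty[Int]
def elem(i: Int): Int = { evaluated += i; i }

// Stream.apply receives its arguments through a plain varargs Seq,
// so every element is evaluated before the Stream even exists.
val eager = Stream(elem(1), elem(2), elem(3))
val afterEager = evaluated.toList        // List(1, 2, 3)

evaluated.clear()

// With #:: each tail is by-name: only the head of the outermost cons
// is evaluated at construction time.
val lzy = elem(4) #:: elem(5) #:: Stream.empty[Int]
val afterConstruction = evaluated.toList // List(4)

val second = lzy.tail.head               // forcing the tail runs elem(5)
val afterForce = evaluated.toList        // List(4, 5)
```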