Reading about compound types in Programming Scala, 2nd Edition, and I'm left with more questions than answers.
When you declare an instance that combines several types, you get a compound type:
trait T1
trait T2
class C
val c = new C with T1 with T2 // c's type: C with T1 with T2
In this case, the type of c is C with T1 with T2. This is an alternative to declaring a type that extends C and mixes in T1 and T2. Note that c is considered a subtype of all three types:
val t1: T1 = c
val t2: T2 = c
val c2: C = c
The question that comes to mind is: why the alternative? If you add something to a language, it is supposed to add some value; otherwise it is useless. Hence, what's the added value of compound types, and how do they compare to mixins, i.e. extends ... with ...?
Mixins and compound types are different notions:
https://docs.scala-lang.org/tour/mixin-class-composition.html
vs.
https://docs.scala-lang.org/tour/compound-types.html
Mixins are traits
trait T1
trait T2
class C
class D extends C with T1 with T2
val c = new D
A particular case of that is when an anonymous class is used instead of D:
trait T1
trait T2
class C
val c = new C with T1 with T2 // (*)
Compound types are types
type T = Int with String with A with B with C
The type of c in (*) is a compound type.
The notion of mixins is from the world of classes, inheritance, OOP etc. The notion of compound types is from the world of types, subtyping, type systems, type theory etc.
The authors of "Programming in Scala" mean that there is an alternative:
either to introduce D
(then D extends the two mixins T1 and T2, and the type of c is D)
or not
(to use an anonymous class instead of D; the type of c is then a compound type).
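As a small illustration of what a compound type buys you on its own: it can appear anywhere a type is expected, without a named subclass like D ever being declared. The sketch below is mine (the method and member names are made up for the example), not from the book:
trait T1 { def hello = "hello" }
trait T2 { def world = "world" }
class C

object CompoundDemo {
  // A compound type in a parameter position: the argument must be a C, a T1
  // and a T2 at the same time -- no named class combining them is required.
  def greet(x: C with T1 with T2): String = s"${x.hello} ${x.world}"

  // A compound type can also be given a name via a type alias.
  type CT = C with T1 with T2

  def main(args: Array[String]): Unit = {
    val c: CT = new C with T1 with T2 // anonymous class; its type is the compound type
    println(greet(c))                 // prints "hello world"
  }
}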
For two PostgreSQL tables, a and b, with non-null columns,
struct MyStruct {
    id: i32,
    name: String,
}
sqlx::query_as!(MyStruct, r"
SELECT a.id, b.name
FROM a
INNER JOIN b on a.id = b.a
")
results in errors like the following
mismatched types
expected type i32
found enum std::option::Option<i32> rustcE0308
macros.rs(552, 9): Actual error occurred here
macros.rs(552, 9): Error originated from macro call here
I can force sqlx to mark the columns as non-null via the foo as "foo!" syntax, but I'd rather not.
sqlx::query_as!(MyStruct, r#"
SELECT a.id as "id!", b.name as "name!"
FROM a
INNER JOIN b on a.id = b.a
"#)
In the sqlx documentation it says
In most cases, the database engine can tell us whether or not a column
may be NULL, and the query!() macro adjusts the field types of the
returned struct accordingly.
For Postgres, this only works for columns which come directly from
actual tables, as the implementation will need to query the table
metadata to find if a given column has a NOT NULL constraint. Columns
that do not have a NOT NULL constraint or are the result of an
expression are assumed to be nullable and so Option is used instead
of T.
Does a JOIN count as an expression that would prevent the database engine from inferring non-nullability? Am I missing something about sqlx that would enable it to correctly infer that those columns cannot be null?
I'm trying to use Spark's PrefixSpan algorithm but it is comically difficult to get the data in the right shape to feed to the algo. It feels like a Monty Python skit where the API is actively working to confuse the programmer.
My data is a list of rows, each of which contains a list of text items.
a b c c c d
b c d e
a b
...
I have made this data available two ways, an sql table in Hive (where each row has an array of items) and text files where each line contains the items above.
The official example creates a Seq of Array(Array).
If I use sql, I get the following type back:
org.apache.spark.sql.DataFrame = [seq: array<string>]
If I read in text, I get this type:
org.apache.spark.sql.Dataset[Array[String]] = [value: array<string>]
Here is an example of an error I get (if I feed it data from sql):
error: overloaded method value run with alternatives:
[Item, Itemset <: Iterable[Item], Sequence <: Iterable[Itemset]](data: org.apache.spark.api.java.JavaRDD[Sequence])org.apache.spark.mllib.fpm.PrefixSpanModel[Item] <and>
[Item](data: org.apache.spark.rdd.RDD[Array[Array[Item]]])(implicit evidence$1: scala.reflect.ClassTag[Item])org.apache.spark.mllib.fpm.PrefixSpanModel[Item]
cannot be applied to (org.apache.spark.sql.DataFrame)
new PrefixSpan().setMinSupport(0.5).setMaxPatternLength(5).run( sql("select seq from sequences limit 1000") )
^
Here is an example if I feed it text files:
error: overloaded method value run with alternatives:
[Item, Itemset <: Iterable[Item], Sequence <: Iterable[Itemset]](data: org.apache.spark.api.java.JavaRDD[Sequence])org.apache.spark.mllib.fpm.PrefixSpanModel[Item] <and>
[Item](data: org.apache.spark.rdd.RDD[Array[Array[Item]]])(implicit evidence$1: scala.reflect.ClassTag[Item])org.apache.spark.mllib.fpm.PrefixSpanModel[Item]
cannot be applied to (org.apache.spark.sql.Dataset[Array[String]])
new PrefixSpan().setMinSupport(0.5).setMaxPatternLength(5).run(textfiles.map( x => x.split("\u0002")).limit(3))
^
I've tried to mold the data by using casting and other unnecessarily complicated logic.
This can't be so hard. Given a list of items (of the very reasonable format described above), how the heck do I feed it to PrefixSpan?
Edit: I'm on Spark 2.2.1.
Resolved:
A column in the table I was querying had collections in each cell. This was causing the returned result to be inside a WrappedArray. I changed my query so the result column only contained a string (by concat_ws). This made it MUCH easier to deal with the type error.
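For reference, here is a sketch of one way to get data into the shape the RDD-based PrefixSpan.run expects, namely RDD[Array[Array[String]]]. The variable names (textfiles, spark) and the choice to treat every item as its own single-element itemset are my assumptions, not from the original post:
import org.apache.spark.mllib.fpm.PrefixSpan

// Assuming `textfiles` is a Dataset[String] with one space-separated sequence per
// line, e.g. "a b c c c d". Each item is wrapped in its own Array so that every
// item becomes a single-element itemset.
val fromText = textfiles.rdd
  .map(line => line.split("\\s+").map(item => Array(item)))

// Assuming the Hive table exposes `seq` as array<string>; getSeq unwraps the
// WrappedArray, and the same per-item wrapping applies.
val fromSql = spark.sql("select seq from sequences").rdd
  .map(row => row.getSeq[String](0).toArray.map(item => Array(item)))

val model = new PrefixSpan()
  .setMinSupport(0.5)
  .setMaxPatternLength(5)
  .run(fromText)

model.freqSequences.collect().foreach { fs =>
  println(fs.sequence.map(_.mkString("[", ",", "]")).mkString(" ") + ", freq=" + fs.freq)
}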
I have a custom MappedColumnType for Java 8 LocalDateTime, defined like this:
implicit val localDTtoDate = MappedColumnType.base[LocalDateTime, Timestamp] (
l => Timestamp.valueOf(l),
d => d.toLocalDateTime
)
Columns of that type are used in table mappings in this way:
def timestamp = column[LocalDateTime]("ts")
Everything looks good, but I'm not able to sort on that column in different directions, because it lacks .asc and .desc (and, actually, is not a ColumnOrdered type). How can I add sorting functionality for that type?
You can use sortBy with .desc and .asc. But ensure the mapping implicit val is in scope of the query where you are using .desc and .asc; if not, you will get a compilation error.
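A minimal sketch of what that looks like, assuming Slick 3 (the H2 profile and the events table below are my assumptions, only there to make the example self-contained):
import java.sql.Timestamp
import java.time.LocalDateTime
import slick.jdbc.H2Profile.api._ // substitute your actual profile

// The mapping must be visible both where the table is defined and where
// .asc / .desc are used on the column.
implicit val localDTtoDate: BaseColumnType[LocalDateTime] =
  MappedColumnType.base[LocalDateTime, Timestamp](
    l => Timestamp.valueOf(l),
    d => d.toLocalDateTime
  )

class Events(tag: Tag) extends Table[(Long, LocalDateTime)](tag, "events") {
  def id        = column[Long]("id", O.PrimaryKey)
  def timestamp = column[LocalDateTime]("ts")
  def *         = (id, timestamp)
}
val events = TableQuery[Events]

// With the implicit in scope, the column can be lifted to ColumnOrdered,
// so .desc and .asc compile.
val newestFirst = events.sortBy(_.timestamp.desc).result
val oldestFirst = events.sortBy(_.timestamp.asc).result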
From the Slick documentation, it's clear how to make a single left join between two tables:
val q = for {
(t, v) <- titles joinLeft volumes on (_.uid === _.titleUid)
} yield (t, v)
Query q will, as expected, have attributes: _1 of type Titles and _2 of type Rep[Option[Volumes]] to cover for non-existing volumes.
Further cascading is problematic:
val q = for {
((t, v), c) <- titles
joinLeft volumes on (_.uid === _.titleUid)
joinLeft chapters on (_._2.uid === _.volumeUid)
} yield /* etc. */
This won't work, because _._2.uid === _.volumeUid is invalid: _._2 is a Rep[Option[Volumes]], and uid does not exist on it.
According to various sources on the net, this shouldn't be an issue, but then again, sources tend to target different Slick versions, and 3.0 is still rather new. Does anyone have a clue about this issue?
To clarify, the idea is to use two left joins to extract data from three cascading 1:n:n tables.
Equivalent SQL would be:
SELECT *
FROM titles
LEFT JOIN volumes
ON titles.uid = volumes.title_uid
LEFT JOIN chapters
ON volumes.uid = chapters.volume_uid
Your second left join is no longer operating on a TableQuery[Titles], but instead on what is effectively a Query[(Titles, Option[Volumes])] (ignoring the result and collection type parameters). When you join the resulting query on your TableQuery[Chapters] you can access the second entry in the tuple using the _2 field (since it's an Option you'll need to map to access the uid field):
val q = for {
((t, v), c) <- titles
joinLeft volumes on (_.uid === _.titleUid)
joinLeft chapters on (_._2.map(_.uid) === _.volumeUid)
} yield /* etc. */
Avoiding TupleN
If the _N field syntax is unclear, you can also use Slick's capacity for user-defined record types to map your rows alternatively:
// The `Table` variant of the joined row representation
case class TitlesAndVolumesRow(title: Titles, volumes: Volumes)
// The DTO variant of the joined row representation
case class TitleAndVolumeRow(title: Title, volumes: Volume)
implicit object TitleAndVolumeShape
extends CaseClassShape(TitlesAndVolumesRow.tupled, TitleAndVolumeRow.tupled)
Let's imagine I have a table called Foo with a primary key FooID and an integer non-unique column Bar. For some reason, in a SQL query I have to join table Foo with itself multiple times, like this:
SELECT * FROM Foo f1 INNER JOIN Foo f2 ON f2.Bar = f1.Bar INNER JOIN Foo f3 ON f3.Bar = f1.Bar...
I have to achieve this via LINQ to Entities.
Doing
ObjectContext.Foos.Join(ObjectContext.Foos, a => a.Bar, b => b.Bar, (a, b) => new {a, b})
gives me a LEFT OUTER JOIN in the resulting query, and I need inner joins; this is very critical.
Of course, I might succeed if, in the edmx, I added as many associations of Foo with itself as necessary and then used them in my code; Entity Framework would substitute a correct inner join for each of the associations. The problem is that at design time I don't know how many joins I will need. OK, one workaround is to add as many of them as reasonable...
But, if nothing else, from a theoretical point of view, is it at all possible to create inner joins via EF without explicitly defining the associations?
In LINQ to SQL there was a (somewhat bizarre) way to do this via GroupJoin, like this:
ObjectContext.Foos.GroupJoin(ObjectContext.Foos, a => a.Bar, b => b.Bar, (a, b) => new {a, b}).SelectMany(o => o.b.DefaultIfEmpty(), (o, b) => new {o.a, b})
I've just tried it in EF; the trick does not work there. It still generates outer joins for me.
Any ideas?
In LINQ to Entities, below is one way to do an inner join on multiple instances of the same table:
using (ObjectContext ctx = new ObjectContext())
{
    var result = from f1 in ctx.Foo
                 join f2 in ctx.Foo on f1.bar equals f2.bar
                 join f3 in ctx.Foo on f1.bar equals f3.bar
                 select ....;
}