I'm trying to get two tables joined with slick 3.3.2 and play 2.7.x, but i'm having a hard time understanding why my codes doesn't do what i want it to.
I have two tables: Foo and Bar, both with string that i need to join on a string column, lets call it fooBar
val innerJoin = for {
(f, b) <- Foo join Bar on (_.fooBar === _.fooBar)
} yield (f, b)
db.run(innerJoin.result)
Docs say this is the way to do it: http://scala-slick.org/doc/3.3.2/queries.html#applicative-joins
But the query slick generated when debugging, doesn't actually use a join, it simply selects the properties from the two tables, like so: (simplified) select * from Foo, Bar where (x2.fooBar = x3.fooBar) clause
What is going on here?
Slick has generated a join there, but it's in a form known as an "implicit join" (in SQL).
It's a syntax difference, and you can check with your database documentation to see if the query optimiser will treat them the same.
As a rule I would not worry about the surface SQL text Slick generates, unless there's a performance issues which you can identify by profiling the query plan in your database.
Related
Im currently using jOOQ to build my SQL (with code generation via the mvn plugin).
Executing the created query is not done by jOOQ though (Using vert.X SqlClient for that).
Lets say I want to select all columns of two tables which share some identical column names. E.g. UserAccount(id,name,...) and Product(id,name,...). When executing the following code
val userTable = USER_ACCOUNT.`as`("u")
val productTable = PRODUCT.`as`("p")
create().select().from(userTable).join(productTable).on(userTable.ID.eq(productTable.AUTHOR_ID))
the build method query.getSQL(ParamType.NAMED) returns me a query like
SELECT "u"."id", "u"."name", ..., "p"."id", "p"."name", ... FROM ...
The problem here is, the resultset will contain the column id and name twice without the prefix "u." or "p.", so I can't map/parse it correctly.
Is there a way how I can say to jOOQ to alias these columns like the following without any further manual efforts ?
SELECT "u"."id" AS "u.id", "u"."name" AS "u.name", ..., "p"."id" AS "p.id", "p"."name" AS "p.name" ...
Im using the holy Postgres Database :)
EDIT: Current approach would be sth like
val productFields = productTable.fields().map { it.`as`(name("p.${it.name}")) }
val userFields = userTable.fields().map { it.`as`(name("p.${it.name}")) }
create().select(productFields,userFields,...)...
This feels really hacky though
How to correctly dereference tables from records
You should always use the column references that you passed to the query to dereference values from records in your result. If you didn't pass column references explicitly, then the ones from your generated table via Table.fields() are used.
In your code, that would correspond to:
userTable.NAME
productTable.NAME
So, in a resulting record, do this:
val rec = ...
rec[userTable.NAME]
rec[productTable.NAME]
Using Record.into(Table)
Since you seem to be projecting all the columns (do you really need all of them?) to the generated POJO classes, you can still do this intermediary step if you want:
val rec = ...
val userAccount: UserAccount = rec.into(userTable).into(UserAccount::class.java)
val product: Product = rec.into(productTable).into(Product::class.java)
Because the generated table has all the necessary meta data, it can decide which columns belong to it, and which ones don't. The POJO doesn't have this meta information, which is why it can't disambiguate the duplicate column names.
Using nested records
You can always use nested records directly in SQL as well in order to produce one of these 2 types:
Record2<Record[N], Record[N]> (e.g. using DSL.row(table.fields()))
Record2<UserAccountRecord, ProductRecord> (e.g using DSL.row(table.fields()).mapping(...), or starting from jOOQ 3.17 directly using a Table<R> as a SelectField<R>)
The second jOOQ 3.17 solution would look like this:
// Using an implicit join here, for convenience
create().select(productTable.userAccount(), productTable)
.from(productTable)
.fetch();
The above is using implicit joins, for additional convenience
Auto aliasing all columns
There are a ton of flavours that users could like to have when "auto-aliasing" columns in SQL. Any solution offered by jOOQ would be no better than the one you've already found, so if you still want to auto-alias all columns, then just do what you did.
But usually, the desire to auto-alias is a derived feature request from a misunderstanding of what's the best approch to do something in jOOQ (see above options), so ideally, you don't follow down the auto-aliasing road.
This is a very weird problem
In short
var q = (some query).Count();
Gives my a number and
var q = (some query).ToList().Count();
Gives me entirely different number...
with mentioning that (some query) has two includes (joins)
is there a sane explanation for that???
EDIT: here is my query
var q = db.membership_renewals.Include(i => i.member).Include(i => i.sport).Where(w => w.isDeleted == false).Count();
this gives me a wrong number
and this:
var q = db.membership_renewals.Include(i => i.member).Include(i => i.sport).Where(w => w.isDeleted == false).ToList().Count();
Gives me accurate number..
EDIT 2
Wher I wrote my query as linq query it worked perfectly...
var q1 = (from d in db.membership_renewals where d.isDeleted == false join m in db.members on d.mr_memberId equals m.m_id join s in db.sports on d.mr_sportId equals s.s_id select d.mr_id).Count();
I think the problem that entity framework doesn't execute the joins in the original query but forced to execute them in (ToList())...
I Finally figured out what's going on...
The database tables are not linked together in the database (there are no relationship or constraints defined in the database itself) so the code doesn't execute the (inner join) part.
However my classes on the other hand are well written so when I perform (ToList()) it automatically ignores the unbound rows...
And when I wrote the linq query defining the relation ship keys (primary and foreign) it worked alright because now the database understands my relation between tables...
Thanks everyone you've been great....
My guess is IQueryable gives a smaller number cause not all the objects are loaded, kind of like a stream in Java, but IQueryable.toList().count() forces the Iqueryable to load all the data and it is traversed by the list constructor and stored in the list so IQueryable.toList().Count() is the accurate answer. This is based on 5 minutes of search on MSDN.
The idea is the underlying datastore of the IQueryable is a database iterator so it executes differently every time because it executes the query again on the database, so if you call it twice against the same table, and the data has changed you get different results. This is called delayed execution. But when you say IQueryable.ToList() you force the iterator to do the whole iteration once and dump the results in a list which is constant
I spent quite some time to code multiple SQL queries that were formerly used to fetch the data for various R scripts. This is how it worked
sqlContent = readSQLFile("file1.sql")
sqlContent = setSQLVariables(sqlContent, variables)
results = executeSQL(sqlContent)
The clue is, that for some queries a result from a prior query is required - why creating VIEWs in the database itself does not solve this problem. With Spark 2.0 I already figured out a way to do just that through
// create a dataframe using a jdbc connection to the database
val tableDf = spark.read.jdbc(...)
var tempTableName = "TEMP_TABLE" + java.util.UUID.randomUUID.toString.replace("-", "").toUpperCase
var sqlQuery = Source.fromURL(getClass.getResource("/sql/" + sqlFileName)).mkString
sqlQuery = setSQLVariables(sqlQuery, sqlVariables)
sqlQuery = sqlQuery.replace("OLD_TABLE_NAME",tempTableName)
tableDf.createOrReplaceTempView(tempTableName)
var data = spark.sql(sqlQuery)
But this is in my humble opinion very fiddly. Also, more complex queries, e.g. queries that incooporate subquery factoring currently don't work. Is there a more robust way like re-implementing the SQL code into Spark.SQL code using filter($""), .select($""), etc.
The overall goal is to get multiple org.apache.spark.sql.DataFrames, each representing the results of one former SQL query (which always a few JOINs, WITHs, etc.). So n queries leading to n DataFrames.
Is there a better option than the provided two?
Setup: Hadoop v.2.7.3, Spark 2.0.0, Intelli J IDEA 2016.2, Scala 2.11.8, Testcluster on Win7 Workstation
It's not especially clear what your requirement is, but I think you're saying you have queries something like:
SELECT * FROM people LEFT OUTER JOIN places ON ...
SELECT * FROM (SELECT * FROM people LEFT OUTER JOIN places ON ...) WHERE age>20
and you would want to declare and execute this efficiently as
SELECT * FROM people LEFT OUTER JOIN places ON ...
SELECT * FROM <cachedresult> WHERE age>20
To achieve that I would enhance the input file so each sql statement has an associated table name into which the result will be stored.
e.g.
PEOPLEPLACES\tSELECT * FROM people LEFT OUTER JOIN places ON ...
ADULTS=SELECT * FROM PEOPLEPLACES WHERE age>18
Then execute in a loop like
parseSqlFile().foreach({case (name, query) => {
val data: DataFrame = execute(query)
data.createOrReplaceTempView(name)
}
Make sure you declare the queries in order so all required tables have been created. Other do a little more parsing and sort by dependencies.
In an RDMS I'd call these tables Materialised Views. i.e. a transform on other data, like a view, but with the result cached for later reuse.
Is there some scala relational database framework (anorm, squeryl, etc...) using postgres-like aggregators to produce lists after a group-by, or at least simulating its use?
I would expect two levels of implementation:
a "standard" one, where at least any SQL grouping with array_agg is translated to a List of the type which is being aggregated,
and a "scala ORM powered" one where some type of join is allowed so that if the aggregation is a foreign key to other table, a List of elements of the other table is produced. Of course this last thing is beyond the reach of SQL, but if I am using a more powerful language, I do not mind some steroids.
I find specially intriguing that the documentation of slick, which is based precisely in allowing scala group-by notation, seems to negate explicitly the output of lists as a result of the group-by.
EDIT: use case
You have the typical many-to-many table of, say, products and suppliers, pairs (p_id, s_id). You want to produce a list of suppliers for each product. So the postgresql query should be
SELECT p_id, array_agg(s_id) from t1 group by p_id
One could expect some idiomatic way to to this in slick, but I do not see how. Furthermore, if we go to some ORM, then we could also consider the join with the tables products and suppliers, on p_id and s_id respectively, and get as answer a zip (product, (supplier1, supplier2, supplierN)) containing the objects and not only the ids
I am also not sure if I understand you question correct, could you elaborate?
In slick you currently can not use postgres "array_agg" or "string_agg" as a method on type Query. If you want to use this specific function then you need to use custom sql. But: I added an issue some time ago (https://github.com/slick/slick/issues/923, you should follow this discussion) and we have a prototype from cvogt ready for this.
I needed to use "string_agg" in the past and added a patch for it (see https://github.com/mobiworx/slick/commit/486c39a7ed90c9ccac356dfdb0e5dc5c24e32d63), so maybe this is helpful to you. Look at "AggregateTest" to learn more about it.
Another possibility is to encapsulate the usage of "array_agg" in a database view and just use this view with slick. This way you do not need "array_agg" directly in slick.
You can use slick-pg.
It supports array_agg and other aggregate functions.
Your question is intriguing, care to elaborate a little on how it might ideally look? When you group by you often have an additional column, such as count(*) over and above the standard columns from your case class, so what would the type of your List be?
Most of my (anorm) methods either return a singleton (perhaps Option) or a List of that class's type. For each case class, I have an sqlFields variable (e.g. m.id, m.name, m.forManufacturer) and a single parser variable that I reference as either .as(modelParser.singleOpt) or .as(modelParser *). For foreign keys, a lazy val at the case class level (or def if it needs to be) is pretty useful. E.g. if I had Model and Manufacturer entities, with a foreign key forManufacturer on Model, then I might define a lazy val manufacturer : Manufacturer = ... in the case class of the model, so that at any time I can refer to model.manufacturer. I can define joins as their own methods, either in this way, or as methods in the companion object.
Not 100% sure I am answering your question, but thought this was a bit long for a comment.
Edit: If your driver supported parsing of postgresql arrays, you could map them directly to a class like ProductSuppliers(id:Int, suppliers:List[Int]) (or even List[Supplier]?) In anorm that's about as idiomatic as one could get, I think? For databases that don't support it, it seems to me similar to an order by version, i.e. select p1, s1 from t1 order by p1, which you could groupBy p1 and similarly map to ProductSuppliers.
I'm using QueryDSL with JPA.
I want to query some properties of an entity, it's like this:
QPost post = QPost.post;
JPAQuery q = new JPAQuery(em);
List<Object[]> rows = q.from(post).where(...).list(post.id, post.name);
It works fine.
If i want to query a relation property, e.g. comments of a post:
List<Set<Comment>> rows = q.from(post).where(...).list(post.comments);
It's also fine.
But when I want to query relation and simple properties together, e.g.
List<Object[]> rows = q.from(post).where(...).list(post.id, post.name, post.comments);
Then something went wrong, generiting a bad SQL syntax.
Then I realized that it's not possible to query them together in one SQL statement.
Is it possible that QueryDSL would somehow deal with relations and generate additional queries (just like what hibernate does with lazy relations), and load the results in?
Or should I just query twice, and then merge both result lists?
P.S. what i actually want is each post with its comments' ids. So a function to concat each post's comment ids is better, is this kind of expressin possible?
q.list(post.id, post.name, post.comments.all().id.join())
and generate a subquery sql like (select group_concat(c.id) from comments as c inner join post where c.id = post.id)
Querydsl JPA is restricted to the expressivity of JPQL, so what you are asking for is not possible with Querydsl JPA. You can though try to express it with Querydsl SQL. It should be possible. Also as you don't project entities, but literals and collections it might work just fine.
Alternatively you can load the Posts with only the Comment ids loaded and then project the id, name and comment ids to something else. This should work when accessors are annotated.
The simplest thing would be to query for Posts and use fetchJoin for comments, but I'm assuming that's too slow for you use case.
I think you ought to simply project required properties of posts and comments and group the results by hand (if required). E.g.
QPost post=...;
QComment comment=..;
List<Tuple> rows = q.from(post)
// Or leftJoin if you want also posts without comments
.innerJoin(comment).on(comment.postId.eq(post.id))
.orderBy(post.id) // Could be used to optimize grouping
.list(new QTuple(post.id, post.name, comment.id));
Map<Long, PostWithComments> results=...;
for (Tuple row : rows) {
PostWithComments res = results.get(row.get(post.id));
if (res == null) {
res = new PostWithComments(row.get(post.id), row.get(post.name));
results.put(res.getPostId(), res);
}
res.addCommentId(row.get(comment.id));
}
NOTE: You cannot use limit nor offset with this kind of queries.
As an alternative, it might be possible to tune your mappings so that 1) Comments are always lazy proxies so that (with property access) Comment.getId() is possible without initializing the actual object and 2) using batch fetch* on Post.comments to optimize collection fetching. This way you could just query for Posts and then access id's of their comments with little performance hit. In most cases you shouldn't even need those lazy proxies unless your Comment is very fat. That kind of code would certainly look nicer without low level row handling and you could also use limit and offset in your queries. Just keep an eye on your query log to make sure everything works as intended.
*) Batch fetching isn't directly supported by JPA, but Hibernate supports it through mapping and Eclipselink through query hints.
Maybe some day Querydsl will support this kind of results grouping post processing out-of-box...