Additional conditions in JOIN - perl

I have tables for articles and users; both have a many-to-many mapping through a third table, reads.
What I am trying to do here is get all unread articles for a particular user (user_id not present in table reads).
My query returns all articles with the read ones marked, which is fine, as I can filter them out (the user_id field contains the id of the user in question).
I have an SQL query like this:
SELECT articles.id, reads.user_id
FROM articles
LEFT JOIN
reads
ON articles.id = reads.article_id AND reads.user_id = 9
ORDER BY articles.last_update DESC LIMIT 5;
Which yields the following:
 articles.id | reads.user_id
-------------+---------------
    57125839 |             9
    57065456 |
    56945065 |
    56945066 |
    56763090 |
(5 rows)
This is fine. This is what I want.
I'd like to get the same result in Catalyst using my article model, but I cannot find any option to add conditions to a JOIN clause.
Do you know any way to add AND X = Y to a DBIx::Class JOIN?
I know this can be done with a custom result source and a virtual view, but I have some other queries that could benefit from it, and I'd like to avoid creating a virtual view for each of them.
Thanks,
Canto

I don't even know what Catalyst is but I can hack the SQL query:
select articles.id, reads.user_id
from
    articles
    left join
    (
        select *
        from reads
        where user_id = 9
    ) reads on articles.id = reads.article_id
order by articles.last_update desc
limit 5;
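To see how the two join forms relate, here is a minimal sketch using Python's sqlite3 module. The table contents are made up for illustration, and only the columns the queries touch are created. It compares the condition in the ON clause, the derived-table form above, and the same condition moved into WHERE.

```python
import sqlite3

# Hypothetical sample data: user 9 has read article 1, user 7 has read article 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (id INTEGER PRIMARY KEY, last_update INTEGER);
    CREATE TABLE reads (article_id INTEGER, user_id INTEGER);
    INSERT INTO articles VALUES (1, 30), (2, 20), (3, 10);
    INSERT INTO reads VALUES (1, 9), (2, 7);
""")

# Extra condition inside the ON clause: every article is kept.
on_condition = """
    SELECT articles.id, reads.user_id
    FROM articles
    LEFT JOIN reads
      ON articles.id = reads.article_id AND reads.user_id = 9
    ORDER BY articles.last_update DESC
"""
# Derived table filtered first, then joined: same rows as above.
derived_table = """
    SELECT articles.id, reads.user_id
    FROM articles
    LEFT JOIN (SELECT * FROM reads WHERE user_id = 9) reads
      ON articles.id = reads.article_id
    ORDER BY articles.last_update DESC
"""
# Condition in WHERE instead: unmatched articles are filtered out again.
where_condition = """
    SELECT articles.id, reads.user_id
    FROM articles
    LEFT JOIN reads ON articles.id = reads.article_id
    WHERE reads.user_id = 9
    ORDER BY articles.last_update DESC
"""

rows_on = conn.execute(on_condition).fetchall()
rows_derived = conn.execute(derived_table).fetchall()
rows_where = conn.execute(where_condition).fetchall()
print(rows_on, rows_derived, rows_where, sep="\n")
```

With this data the first two queries return every article, with user_id filled in only for the one user 9 has read, while the WHERE variant keeps just that single row. That is why the condition has to live in the JOIN.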

I found a solution.
It's not straightforward, but it's better than a virtual view.
http://search.cpan.org/dist/DBIx-Class/lib/DBIx/Class/Relationship/Base.pm#condition
The link above describes how to use conditions in a JOIN clause.
However, my case needs a variable in those conditions, which is not available in the model by default.
So, by bending the model concept a bit and introducing a variable into it, we get the following.
In the model file:
our $USER_ID;
__PACKAGE__->has_many(
    pindols => "My::MyDB::Result::Read",
    sub {
        my $args = shift;
        die "no user_id specified!" unless $USER_ID;
        return ({
            "$args->{self_alias}.id" => { -ident => "$args->{foreign_alias}.article_id" },
            "$args->{foreign_alias}.user_id" => { -ident => $USER_ID },
        });
    }
);
In the controller:
$My::MyDB::Result::Article::USER_ID = $c->user->id;
$articles = $channel->search(
    { "pindols.user_id" => undef },
    {
        page => int($page),
        rows => 20,
        order_by => 'last_update DESC',
        prefetch => "pindols"
    }
);
This fetches all unread articles and yields the following SQL:
SELECT me.id, me.url, me.title, me.content, me.last_update, me.author, me.thumbnail, pindols.article_id, pindols.user_id
FROM (
  SELECT me.id, me.url, me.title, me.content, me.last_update, me.author, me.thumbnail
  FROM articles me
  LEFT JOIN reads pindols ON ( me.id = pindols.article_id AND pindols.user_id = 9 )
  WHERE ( pindols.user_id IS NULL )
  GROUP BY me.id, me.url, me.title, me.content, me.last_update, me.author, me.thumbnail
  ORDER BY last_update DESC
  LIMIT ?
) me
LEFT JOIN reads pindols ON ( me.id = pindols.article_id AND pindols.user_id = 9 )
WHERE ( pindols.user_id IS NULL )
ORDER BY last_update DESC: '20'
Of course you can skip the paging but I had it in my code so I included it here.
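The core of that generated SQL is an anti-join: LEFT JOIN with the user condition in ON, then keep only rows where the joined side is NULL. Here is a small sketch of just that part in Python with sqlite3. The sample rows and the unread_ids helper are hypothetical, and the user id is passed as a bind parameter rather than inlined.

```python
import sqlite3

# Hypothetical sample data: user 9 has read article 1, user 7 has read article 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (id INTEGER PRIMARY KEY, last_update INTEGER);
    CREATE TABLE reads (article_id INTEGER, user_id INTEGER);
    INSERT INTO articles VALUES (1, 30), (2, 20), (3, 10);
    INSERT INTO reads VALUES (1, 9), (2, 7);
""")

UNREAD_SQL = """
    SELECT me.id
    FROM articles me
    LEFT JOIN reads pindols
      ON me.id = pindols.article_id AND pindols.user_id = ?
    WHERE pindols.user_id IS NULL
    ORDER BY me.last_update DESC
    LIMIT ?
"""

def unread_ids(user_id, limit=20):
    """Return ids of articles that have no reads row for this user."""
    return [row[0] for row in conn.execute(UNREAD_SQL, (user_id, limit))]

print(unread_ids(9))  # articles user 9 has not read
print(unread_ids(7))  # articles user 7 has not read
```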
Special thanks go to deg from #dbix-class on irc.perl.org and https://blog.afoolishmanifesto.com/posts/dbix-class-parameterized-relationships/.
Thanks,
Canto

Related

DBIx::Class - get all relationship that was used as a condition using prefetch?

Here are three tables: product, model, and product_model that maps products and models in N:M relationship.
product            product_model           model
id  name           product_id  model_id    id  name
-------------      --------------------    -----------
p1  Product 1      p1          m1          m1  Model 1
p2  Product 2      p2          m1          m2  Model 2
...                p2          m2
What I want to do: find all products that support Model 2 (e.g. Product 2). Then, for each product, show the list of model_ids that the product supports (Product 2 => [ m1, m2 ]).
This was my first try. It needed N more queries to look up the model_ids for each product.
# 1 query for searching products
my @products = $schema->resultset('Product')->search(
    { 'product_models.model_id' => 'm2' },
    { 'join' => 'product_models' },
);
# N queries for searching product_models for each product
foreach my $product ( @products ) {
    my @model_ids = map { $_->model_id } $product->product_models;
    # @model_ids = ( 'm1', 'm2' ) for p2
}
I looked for a way to get the result using only one query. Replacing join with prefetch didn't work.
my @products = $schema->resultset('Product')->search(
    { 'product_models.model_id' => 'm2' },
    { 'prefetch' => 'product_models' }, # here
);
# no additional queries, but...
foreach my $product ( @products ) {
    my @model_ids = map { $_->model_id } $product->product_models;
    # now, @model_ids contains only ( 'm2' )
}
Next, I tried "prefetch same table twice":
my @products = $schema->resultset('Product')->search(
    { 'product_models.model_id' => 'm2' },
    { 'prefetch' => [ 'product_models', 'product_models' ] },
);
foreach my $product ( @products ) {
    my @model_ids = map { $_->model_id } $product->product_models;
}
It seemed that I succeeded. Only one query was executed and I got all model IDs from it.
However, I'm not sure this is the right way. Is this a correct approach?
For example, if I use join instead of prefetch, Product 2 appears in the loop twice. I understand that, because the joined table looks like this:
id  name       p_m.p_id  p_m.m_id  p_m_2.p_id  p_m_2.m_id
p2  Product 2  p2        m2        p2          m1
p2  Product 2  p2        m2        p2          m2          -- Product 2, one more time
Why does Product 2 appear only once when I use prefetch?
The resulting queries are almost the same, except for the SELECT fields:
SELECT "me"."id", "me"."name",
"product_models"."product_id", "product_models"."model_id", -- only in prefetch
"product_models_2"."product_id", "product_models_2"."model_id" --
FROM "product" "me"
LEFT JOIN "product_model" "product_models"
ON "product_models"."product_id" = "me"."id"
LEFT JOIN "product_model" "product_models_2"
ON "product_models_2"."product_id" = "me"."id"
WHERE "product_models"."model_id" = 'm2'
If you have the correct relationships in your schema, this is possible with a single query. But it's tricky. Let's assume your database looks like this:
CREATE TABLE product
(`id` VARCHAR(2) PRIMARY KEY, `name` VARCHAR(9))
;
INSERT INTO product
(`id`, `name`) VALUES
('p1', 'Product 1'),
('p2', 'Product 2')
;
CREATE TABLE product_model (
`product_id` VARCHAR(2),
`model_id` VARCHAR(2),
PRIMARY KEY (product_id, model_id),
FOREIGN KEY(product_id) REFERENCES product(id),
FOREIGN KEY(model_id) REFERENCES model(id)
)
;
INSERT INTO product_model
(`product_id`, `model_id`) VALUES
('p1', 'm1'),
('p2', 'm1'),
('p2', 'm2')
;
CREATE TABLE model
(`id` VARCHAR(2) PRIMARY KEY, `name` VARCHAR(7))
;
INSERT INTO model
(`id`, `name`) VALUES
('m1', 'Model 1'),
('m2', 'Model 2')
;
This is essentially your DB from the question. I added primary keys and foreign keys. You probably have those anyway.
We can now create a schema from that. I've written a simple program that uses DBIx::Class::Schema::Loader to do that. It creates an SQLite database on the fly. (If no-one has put this on CPAN, I will).
The SQL from above will go in the __DATA__ section.
use strict;
use warnings;
use DBIx::Class::Schema::Loader qw/ make_schema_at /;
# create db
unlink 'foo.db';
open my $fh, '|-', 'sqlite3 foo.db' or die $!;
print $fh do { local $/; <DATA> };
close $fh;
$ENV{SCHEMA_LOADER_BACKCOMPAT} = 1;
# create schema
my $dsn = 'dbi:SQLite:foo.db';
make_schema_at(
'DB',
{
# debug => 1,
},
[ $dsn, 'sqlite', '', ],
);
$ENV{DBIC_TRACE} = 1;
# connect schema
my $schema = DB->connect($dsn);
# query goes here
__DATA__
# SQL from above
Now that we have that, we can concentrate on the query. At first this will look scary, but I'll try to explain.
my $rs = $schema->resultset('Product')->search(
    { 'product_models.model_id' => 'm2' },
    {
        'prefetch' => {
            product_models => {
                product_id => {
                    product_models => 'model_id'
                }
            }
        }
    },
);
while ( my $product = $rs->next ) {
    foreach my $product_model ( $product->product_models->all ) {
        my @models;
        foreach my $supported_model ( $product_model->product_id->product_models->all ) {
            push @models, $supported_model->model_id->id;
        }
        printf "%s: %s\n", $product->id, join ', ', @models;
    }
}
Prefetch means: join on this relation, and keep the data around for later. So to get all models for your product, we have to write
#          1                 2
{ prefetch => { product_models => 'product_id' } }
Here product_models is the N:M table, and product_id is the name of the relation from it back to the Product table. Arrow 1 (the first =>) is the first join, from Product to ProductModel. Arrow 2 goes from ProductModel back to every product that has the model m2. See the drawing of the ER model for an illustration.
Now we want to have all the ProductModels that this Product has. That's arrow 3.
#          1                 2               3
{ prefetch => { product_models => { product_id => 'product_models' } } }
And finally, to get the Models for that N:M relation, we have to use the model_id relationship with arrow 4.
{
    'prefetch' => {                           # 1
        product_models => {                   # 2
            product_id => {                   # 3
                product_models => 'model_id'  # 4
            }
        }
    }
},
Looking at the ER model drawing should make that clear. Remember that each of those joins is a LEFT OUTER join by default, so it will always fetch all the rows, without losing anything. DBIC just takes care of that for you.
Now to access all of that, we need to iterate. DBIC gives us some tools to do that.
while ( my $product = $rs->next ) {
    # 1
    foreach my $product_model ( $product->product_models->all ) {
        my @models;
        # 2 and 3
        foreach my $supported_model ( $product_model->product_id->product_models->all ) {
            # 4
            push @models, $supported_model->model_id->id;
        }
        printf "%s: %s\n", $product->id, join ', ', @models;
    }
}
First we grab all the ProductModel entries (1). For each of those, we take the Product (2). There is always exactly one Product per ProductModel row, because seen from this side the relation is N:1, so we can access it directly. This Product in turn has a ProductModel relation. That's 3. Because this is the N side, we need to take all of them and iterate. We then push the id of every Model (4) into our list of models for this product. After that, it's just printing.
Here's another way to look at it:
We could eliminate that last model_id in the prefetch, but then we'd have to use get_column('model_id') to get the ID. It would save us a join.
Now if we turn on DBIC_TRACE=1, we get this SQL statement:
SELECT me.id, me.name, product_models.product_id, product_models.model_id, product_id.id, product_id.name, product_models_2.product_id, product_models_2.model_id, model_id.id, model_id.name
FROM product me
LEFT JOIN product_model product_models ON product_models.product_id = me.id
LEFT JOIN product product_id ON product_id.id = product_models.product_id
LEFT JOIN product_model product_models_2 ON product_models_2.product_id = product_id.id
LEFT JOIN model model_id ON model_id.id = product_models_2.model_id
WHERE (product_models.model_id = 'm2')
ORDER BY me.id
If we run this against our DB, we have these rows:
p2|Product 2|p2|m2|p2|Product 2|p2|m1|m1|Model 1
p2|Product 2|p2|m2|p2|Product 2|p2|m2|m2|Model 2
Of course that's pretty useless if we do it manually, but DBIC's magic really helps us, because all the weird joining and combining is completely abstracted away, and we only need one single query to get all the data.
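To make the collapsing step concrete, here is a rough sketch in Python with sqlite3. It is not DBIC's actual implementation, only an illustration of the idea: run the same kind of self-joining query (trimmed to two output columns, with the final join to model dropped) and fold the repeated product rows into one entry per product.

```python
import sqlite3

# Same sample data as the schema in this answer, with generic column types.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE model (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE product_model (
        product_id TEXT,
        model_id TEXT,
        PRIMARY KEY (product_id, model_id)
    );
    INSERT INTO product VALUES ('p1', 'Product 1'), ('p2', 'Product 2');
    INSERT INTO model VALUES ('m1', 'Model 1'), ('m2', 'Model 2');
    INSERT INTO product_model VALUES ('p1', 'm1'), ('p2', 'm1'), ('p2', 'm2');
""")

# The first join filters on m2; the second pass over product_model
# brings back every model of the products that survived the filter.
rows = conn.execute("""
    SELECT me.id, product_models_2.model_id
    FROM product me
    LEFT JOIN product_model product_models
      ON product_models.product_id = me.id
    LEFT JOIN product product_id
      ON product_id.id = product_models.product_id
    LEFT JOIN product_model product_models_2
      ON product_models_2.product_id = product_id.id
    WHERE product_models.model_id = 'm2'
    ORDER BY me.id, product_models_2.model_id
""").fetchall()

# Collapse the flat rows: one key per product, a list of its model ids.
models_by_product = {}
for product_id, model_id in rows:
    models_by_product.setdefault(product_id, []).append(model_id)

print(models_by_product)
```

The database returns one row per supported model, so Product 2 shows up twice in rows; grouping by the product's primary key is what turns those rows back into a single product with a list of models.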

Laravel 5, Derived table in join clause?

I have this query:
SELECT * FROM blog
LEFT JOIN (
SELECT blog_id, AVG(value) as blog_rating FROM blog_ratings
GROUP BY (blog_id)
) T ON T.blog_id = blog.id;
I do not know how to write this with Eloquent.
For Example:
Blog::select("*")->leftJoin( /* Here goes derived table */ )->get()
How do I accomplish this?
I'd personally just use the fluent query builder. Try this and see how it works out:
DB::table('blog')
    ->select('*')
    ->leftJoin(DB::raw('(SELECT blog_id, AVG(value) as blog_rating FROM blog_ratings
        GROUP BY (blog_id)
        ) as T'), function ($join) {
        $join->on('T.blog_id', '=', 'blog.id');
    })
    ->get();
You can always swap ->get() for ->toSql() to dump out the query and adjust if you see any mistakes.
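When comparing the dumped query against the intended one, it can help to know what the target SQL returns on a tiny data set. A minimal sketch in Python with sqlite3 follows. The table contents and the title column are invented for illustration, and the SELECT list is narrowed to two columns.

```python
import sqlite3

# Hypothetical data: blog 1 has two ratings, blog 2 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blog (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE blog_ratings (blog_id INTEGER, value REAL);
    INSERT INTO blog VALUES (1, 'first'), (2, 'second');
    INSERT INTO blog_ratings VALUES (1, 4), (1, 5);
""")

# LEFT JOIN against a derived table that pre-aggregates the ratings.
rows = conn.execute("""
    SELECT blog.id, T.blog_rating
    FROM blog
    LEFT JOIN (
        SELECT blog_id, AVG(value) AS blog_rating
        FROM blog_ratings
        GROUP BY blog_id
    ) T ON T.blog_id = blog.id
    ORDER BY blog.id
""").fetchall()

print(rows)
```

A blog without ratings still appears, with NULL (None in Python) as its blog_rating.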

RYO blog engine - showing tags for several posts

I am writing yet another blog engine for practice, using SQLite and Perl Dancer framework.
The tables go like this:
CREATE TABLE posts (
p_id INTEGER PRIMARY KEY,
p_url VARCHAR(255),
p_title VARCHAR(255),
p_text TEXT,
p_date DATETIME DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE tags (
t_id INTEGER PRIMARY KEY,
t_tag VARCHAR(255),
t_url VARCHAR(255)
);
CREATE TABLE tags_posts_junction (
tp_tag INTEGER NOT NULL,
tp_post INTEGER NOT NULL,
FOREIGN KEY(tp_tag) REFERENCES tags(t_id),
FOREIGN KEY(tp_post) REFERENCES posts(p_id)
);
All the big guys like Wordpress (or stackoverflow) can show tags right on the main page, after each question, and I'd like to implement that too. The question is how.
So far the posts are stored in the database, and when I need to render a page showing the latest 20 posts, I pass a hash reference (fetchall_hashref from DBI) to the template. So how do I add tags there? Of course I can do something like
my $dbh = database->prepare('SELECT * FROM posts ORDER BY p_date DESC
    LIMIT 20 OFFSET 0');
$dbh->execute;
my $posts = $dbh->fetchall_hashref('p_date');
foreach my $key (keys %$posts) {
    # one extra query per post to fetch its tags
    my $dbh = database->prepare('SELECT * FROM tags WHERE t_id IN (
        SELECT tp_tag FROM tags_posts_junction WHERE tp_post = ?)');
    $dbh->execute($posts->{$key}->{p_id});
    my $tags = $dbh->fetchall_hashref('t_id');
    $posts->{$key}->{tags} = $tags;
}
But that's ugly and that's 20 more queries per page, isn't it too much? I think there should be a better way.
So the question is how do I get tags for 20 posts the least redundant way?
I think you could combine your first / outer query before
my $posts = $dbh->fetchall_hashref('p_date');
with your inner query and then you will be hitting the database once instead of 20 times.
You could also simplify your code by use of DBIx::Simple - https://metacpan.org/module/DBIx::Simple.
Putting this together would give something like:
my $sql = 'SELECT t.*, p.*
    FROM tags t
    JOIN tags_posts_junction tpj ON t.t_id = tpj.tp_tag
    JOIN posts p ON p.p_id = tpj.tp_post
    WHERE tpj.tp_post IN (
        SELECT p_id FROM posts ORDER BY p_date DESC
        LIMIT 20 OFFSET 0
    )';
my $db = DBIx::Simple->connect($dbh);
my $posts = $db->query($sql)->hashes;
Collect all the p_ids into an array and construct your query using IN instead of =, something like this, presuming @pids is your array:
my $dbh = database->prepare('SELECT * FROM tags WHERE t_id IN (
    SELECT tp_tag FROM tags_posts_junction WHERE tp_post IN (' .
    join(', ', ('?') x @pids) . ') )');
$dbh->execute(@pids);
Though you should really look to JOINs to replace your sub-queries.
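As one possible shape for the JOIN-based version, here is a sketch in Python with sqlite3. The sample rows are invented, the tables are cut down to the columns used, and p_date is an integer for brevity. A single query returns one row per (post, tag) pair, and a short loop groups the tags under each post.

```python
import sqlite3

# Hypothetical data: post 2 has two tags, post 1 has one, post 3 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (p_id INTEGER PRIMARY KEY, p_title TEXT, p_date INTEGER);
    CREATE TABLE tags (t_id INTEGER PRIMARY KEY, t_tag TEXT);
    CREATE TABLE tags_posts_junction (
        tp_tag INTEGER NOT NULL,
        tp_post INTEGER NOT NULL
    );
    INSERT INTO posts VALUES (1, 'old', 100), (2, 'new', 200), (3, 'untagged', 150);
    INSERT INTO tags VALUES (1, 'perl'), (2, 'sqlite');
    INSERT INTO tags_posts_junction VALUES (1, 1), (1, 2), (2, 2);
""")

# One query: latest 20 posts, LEFT JOINed to their tags so that
# posts without any tag are still returned (with a NULL tag).
rows = conn.execute("""
    SELECT p.p_id, p.p_title, t.t_tag
    FROM (SELECT * FROM posts ORDER BY p_date DESC LIMIT 20 OFFSET 0) p
    LEFT JOIN tags_posts_junction tpj ON tpj.tp_post = p.p_id
    LEFT JOIN tags t ON t.t_id = tpj.tp_tag
    ORDER BY p.p_date DESC, t.t_tag
""").fetchall()

# Group the flat rows into one entry per post, newest first.
posts = {}
for p_id, title, tag in rows:
    post = posts.setdefault(p_id, {"title": title, "tags": []})
    if tag is not None:
        post["tags"].append(tag)

print(posts)
```

The LEFT JOINs are a deliberate choice here: with plain JOINs, a post that has no tags would drop out of the page entirely.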

Zend_Db_Select: LEFT JOIN on a subselect

I have a query, that does a LEFT JOIN on a subselect. This query is run in a high load environment and performs within the set requirements. The query (highly simplified) looks like:
SELECT
table_A.pKey
, table_A.uKey
, table_A.aaa
, table_B.bbb
, alias_C.ccc
, alias_C.ddd
FROM table_A
INNER JOIN table_B ON table_A.pKey = table_B.pKey
LEFT JOIN (
SELECT
table_X.pKey
, table_X.ccc
, table_Y.ddd
FROM table_X
INNER JOIN table_Y ON table_X.pKey = table_Y.pKey
) AS alias_C ON table_A.uKey = alias_C.pKey;
(for various reasons, it is not possible to rewrite the subselect as a (direct) LEFT JOIN).
Now, I cannot get the LEFT JOIN on subselect to work with Zend_Db_Select. I've tried everything I could come up with, but it does not work.
So my question is:
Is it not possible to do a query as described above with Zend_Db_Select?
What syntax do I need to get it to work within Zend Framework?
I think that it should work like this:
$subselect = $db->select()->from(array('x' => 'table_X'), array('x.pKey', 'x.ccc', 'y.ddd'), 'dbname')
    ->join(array('y' => 'table_Y'), 'x.pkey = y.pkey', array(), 'dbname');
$select = $db->select()->from(array('a' => 'table_A'), array(/*needed columns*/), 'dbname')
    ->join(array('b' => 'table_B'), 'a.pkey = b.pkey', array(), 'dbname')
    ->joinLeft(array('c' => new Zend_Db_Expr('('.$subselect.')')), 'c.pkey = a.ukey', array());
I haven't tried it but I believe it'll work.
...
->joinLeft(array('c' => new Zend_Db_Expr('(' . $subselect->assemble() . ')')), 'c.pkey = a.ukey', array())

How to perform Linq to Entites Left Outer Join

I have read plenty of blog posts and have yet to find a clear and simple example of how to perform a LEFT OUTER JOIN between two tables. The Wikipedia article on joins, Join (SQL), provides this simple model:
CREATE TABLE `employee` (
`LastName` varchar(25),
`DepartmentID` int(4),
UNIQUE KEY `LastName` (`LastName`)
);
CREATE TABLE `department` (
`DepartmentID` int(4),
`DepartmentName` varchar(25),
UNIQUE KEY `DepartmentID` (`DepartmentID`)
);
Assume we have an employee container, ObjectSet<Employee> EmployeeSet, and a department container, ObjectSet<Department> DepartmentSet. How would you perform the following query using Linq?
SELECT LastName, DepartmentName
FROM employee e
LEFT JOIN department d
ON e.DepartmentID = d.DepartmentID
I would write this, which is far simpler than join and does exactly the same thing:
var q = from e in db.EmployeeSet
select new
{
LastName = e.LastName,
DepartmentName = e.Department.DepartmentName
};
You need to use the DefaultIfEmpty method:
var query =
from e in db.EmployeeSet
join d in db.DepartmentSet on e.DepartmentID equals d.DepartmentID into temp
from d in temp.DefaultIfEmpty()
select new { Employee = e, Department = d };
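For readers less familiar with the join ... into / DefaultIfEmpty pattern, here is a language-neutral sketch of the same logic in plain Python. The sample rows and the left_outer_join helper are made up for illustration. The idea is to collect the matching departments for each employee and, if there are none, substitute a single None so the employee still produces a row.

```python
# Hypothetical sample data: (LastName, DepartmentID) and DepartmentID -> name.
employees = [("Rafferty", 31), ("Jones", 33), ("Williams", None)]
departments = {31: "Sales", 33: "Engineering"}

def left_outer_join(employees, departments):
    """Pair every employee with its department name, or None if unmatched."""
    result = []
    for last_name, department_id in employees:
        # "join ... into temp": all departments matching this employee.
        matches = [
            name
            for dep_id, name in departments.items()
            if dep_id == department_id
        ]
        # "DefaultIfEmpty()": an empty match list becomes one None entry.
        for department_name in matches or [None]:
            result.append((last_name, department_name))
    return result

joined = left_outer_join(employees, departments)
print(joined)
```

Without the fallback to [None], an employee with no department would be skipped, which is inner-join behaviour.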