zf many to many relationship, how to find values stored in intermediary table without a second lookup - zend-framework

I have a many-to-many relationship between two tables, which is represented with an intermediary table. I am using the ZF1 table relationships methodology to model the database in my application, and this works fine. One thing I am struggling with is pulling data from the intermediary table when performing a many-to-many lookup. For example:
productsTable
product_id,
product_name
customerTable
customer_id,
customer_name
salesTable
customer_id,
product_id,
date_of_sale
In this case the sales table is the intermediary table, and the many-to-many relationship is between customers and products. I add the referenceMap for products and customers to the sales table model, and I declare "sales" as a dependent table in both the product table model and the customer table model.
I can then successfully use the following code to get all the products for a given customer (or vice-versa).
$productTable = new productsTable();
$product = $productTable->find(1)->current();
$customers = $product->findManyToManyRowset('customerTable','salesTable');
But the rowset returned does not include the "date_of_sale" value. Is there a way of including the values from the intermediary table without doing a separate database lookup? I can't see anything in the ZF docs.
Any help would be cool.
I hope to eventually replace Zend_Db_Table with a data mapper implementation, as it seems highly inefficient in the number of DB queries it executes. These could be hugely reduced with slightly more complex SQL join queries rather than multiple simple selects. For now, though, I'm stuck with this.
Thanks.

You can use JOIN queries in your code to do this in a single call. From what I understand, you want to build this query:
SELECT p.*, c.*, s.date_of_sale FROM sales AS s
INNER JOIN products AS p ON p.product_id = s.product_id
INNER JOIN customers AS c ON c.customer_id = s.customer_id
WHERE p.product_id = '1';
To achieve this, you can refer to Complex SQL query into Zend to see how I translate it to a Zend_Db Query or just use something like
$select = $this->select()->setIntegrityCheck(false);
$select->from(array('s'=>'sales'), array('date_of_sale'));
$select->join(array('p'=>'products'), 'p.product_id = s.product_id', '*');
$select->join(array('c'=>'customers'), 'c.customer_id = s.customer_id', '*');
$select->where('p.product_id = ?', $productId);
Passing the value as the second argument lets Zend_Db quote it for you, which is safer than concatenating $productId into the condition string.
Hope this helps !
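For anyone who wants to try the join outside of Zend first, here is a minimal sketch in Python using the standard-library sqlite3 module. The in-memory database, the sample rows, and the helper names build_demo_db and customers_for_product are assumptions made up for illustration; only the table and column names come from the query above.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
        CREATE TABLE sales (
            customer_id INTEGER REFERENCES customers(customer_id),
            product_id INTEGER REFERENCES products(product_id),
            date_of_sale TEXT
        );
        INSERT INTO products VALUES (1, 'Widget'), (2, 'Gadget');
        INSERT INTO customers VALUES (10, 'Alice'), (11, 'Bob');
        INSERT INTO sales VALUES
            (10, 1, '2012-01-05'),
            (11, 1, '2012-02-09'),
            (10, 2, '2012-03-01');
        """
    )
    return conn


def customers_for_product(conn, product_id):
    """Return (customer_name, date_of_sale) pairs for one product in a single query."""
    sql = """
        SELECT c.customer_name, s.date_of_sale
        FROM sales AS s
        INNER JOIN products AS p ON p.product_id = s.product_id
        INNER JOIN customers AS c ON c.customer_id = s.customer_id
        WHERE p.product_id = ?
        ORDER BY s.date_of_sale
    """
    # The ? placeholder binds the value instead of concatenating it into the SQL.
    return conn.execute(sql, (product_id,)).fetchall()


conn = build_demo_db()
rows = customers_for_product(conn, 1)
print(rows)
```

The column from the intermediary table arrives in the same result row as the customer, so no second lookup is needed.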

Related

SQL Natural Join

Okay. So the question that I got asked by the teacher was this:
(5 marks) Construct a SQL query on the dvdrental database that uses a natural join of two or more tables and an additional where condition. (E.g. find the titles of films rented by a particular customer.) Note the hints on the course news page if your query returns nothing.
Here is the layout of the database I'm working with:
http://www.postgresqltutorial.com/wp-content/uploads/2013/05/PostgreSQL-Sample-Database.png
The hint to us was this:
PostgreSQL hint:
If a natural join doesn't produce any results in the dvdrental DB, it is because many tables have the last_update timestamp field, and thus the natural join tries to join on that field as well as the intended field.
e.g.
select *
from film natural join inventory;
does not work because of this - it produces an empty table (no results).
Instead, use
select *
from film, inventory
where film.film_id = inventory.film_id;
This is what I did:
select *
from film, customer
where film.film_id = customer.customer_id;
The problem is I cannot get a particular customer.
I tried adding customer_id = 2; but it returns an error.
Really need help!
Well, it seems you are trying to join two tables that have no direct relation to each other. That is your issue:
where film.film_id = customer.customer_id
To find which films were rented by which customer, you would have to join the customer table with rental, then with inventory, and finally with film.
The task description states
Construct a SQL query on the dvdrental database that uses a natural join of two or more tables and an additional where condition.
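As a rough illustration of both points (the empty natural join from the hint, and the customer, rental, inventory, film chain), here is a sketch in Python with the standard-library sqlite3 module. The four tables are cut-down, hypothetical versions of the dvdrental ones with made-up rows, and SQLite stands in for PostgreSQL, so treat it as a demonstration of the idea rather than the coursework answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT, last_update TEXT);
    CREATE TABLE inventory (inventory_id INTEGER PRIMARY KEY, film_id INTEGER,
                            last_update TEXT);
    CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, inventory_id INTEGER,
                         customer_id INTEGER, last_update TEXT);
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, first_name TEXT,
                           last_update TEXT);

    INSERT INTO film VALUES (1, 'Film A', '2013-05-26'), (2, 'Film B', '2013-05-26');
    INSERT INTO inventory VALUES (100, 1, '2006-02-15'), (101, 2, '2006-02-15');
    INSERT INTO rental VALUES (1000, 100, 2, '2006-02-16'), (1001, 101, 3, '2006-02-16');
    INSERT INTO customer VALUES (2, 'Pat', '2013-05-27'), (3, 'Sam', '2013-05-27');
    """
)

# NATURAL JOIN matches on every shared column name: film_id AND last_update.
natural_rows = conn.execute("SELECT * FROM film NATURAL JOIN inventory").fetchall()

# Explicit join conditions follow the real relationships and ignore last_update.
titles = conn.execute(
    """
    SELECT film.title
    FROM customer
    JOIN rental ON rental.customer_id = customer.customer_id
    JOIN inventory ON inventory.inventory_id = rental.inventory_id
    JOIN film ON film.film_id = inventory.film_id
    WHERE customer.customer_id = ?
    """,
    (2,),
).fetchall()

print(natural_rows)
print(titles)
```

In this toy data the natural join comes back empty because the last_update values differ between film and inventory, while the explicit chain finds the film rented by customer 2.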

Querying a table for rows with no foreign keys pointing to them

Let's say we have two tables in our database.
Table 1:
Employer:
id
name
Table 2:
Employee:
id
name
employer_id
My goal is to find a relatively efficient query to find all employers with no employees. The only way I could think to do this involves a subquery, but I suspect there should be a way to do this using a JOIN.
Here's what I originally had:
SELECT * FROM employer
WHERE employer.id NOT IN (
SELECT employer_id FROM employee
);
As I couldn't find any answers online for this, my solution is in the answers below (admittedly with a good chunk of help from a coworker). I'm using postgres 9.3 and sqlalchemy 0.8.
SELECT * FROM employer
LEFT OUTER JOIN employee ON employee.employer_id = employer.id
WHERE employee.id IS NULL;
For those not familiar with outer joins (I wasn't before this): without the WHERE clause, this query returns all employer/employee pairs and also all employers without employees. The latter are paired with "dummy" employee rows in which every value is NULL. Consequently, keeping only the rows whose employee id is NULL leaves you with just the employers without employees.
(If anyone can give a better-worded explanation, by all means please do.)
In sqlalchemy:
session.query(Employer).outerjoin(
    Employee, Employer.id == Employee.employer_id
).filter(Employee.id.is_(None))
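To see the NULL-padded rows for yourself, here is a small sketch using Python's standard-library sqlite3 module rather than postgres or sqlalchemy. The employer and employee rows are invented for the demonstration; the table layout follows the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT,
        employer_id INTEGER REFERENCES employer(id)
    );
    INSERT INTO employer VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    INSERT INTO employee VALUES (1, 'Ann', 1), (2, 'Ben', 1), (3, 'Cy', 3);
    """
)

# Every employer appears at least once; unmatched ones carry NULL employee columns.
all_pairs = conn.execute(
    """
    SELECT employer.name, employee.name
    FROM employer
    LEFT OUTER JOIN employee ON employee.employer_id = employer.id
    ORDER BY employer.id, employee.id
    """
).fetchall()

# Keeping only the NULL-padded rows leaves the employers with no employees.
without_staff = conn.execute(
    """
    SELECT employer.name
    FROM employer
    LEFT OUTER JOIN employee ON employee.employer_id = employer.id
    WHERE employee.id IS NULL
    """
).fetchall()

# The subquery form from the question, for comparison.
not_in = conn.execute(
    "SELECT name FROM employer WHERE id NOT IN (SELECT employer_id FROM employee)"
).fetchall()

print(all_pairs)
print(without_staff, not_in)
```

On this sample data both forms pick out the same employer, and the unfiltered join shows where the NULL row comes from.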

Optimizing conditional query in sybase

I would like to get help on optimizing my query below.
I want to have a view that exposes employee data across years. This view draws on the employee table, which has employeeID, Year and employee demographics as columns. I also have a table called testemployees, which has EmployeeID and Year. This is a subset of the employee table, which might or might not have data. What I am trying to accomplish is:
If there is data in the testEmployees table, my view should fetch employee details only for employees in testEmployees; if there is no data in testEmployees, my view should return all the data in the employee table.
My employee table is very large, and even though the query below works, it takes a long time to fetch this data. Any pointers on how I can improve this query will be greatly appreciated.
Create view dbo.employees(Year, EmployeeID)
as
select * from
employee e, testemployees te
where e.Year = case when((select count(1) from testemployees)>0) then te.Year else e.Year end
and e.employeeID = case when((select count(1) from testemployees)> 0) then te.employeeID else e.employeeID end
Let me know your thoughts on how to optimize the query. Would using any other kind of joins help?
Thanks in advance.
It's been a while since I worked with Sybase, and I am not too sure of their syntax support for SQL-99 queries. However, assuming Sybase has at least some support for SQL-99 join syntax, the following should work:
Create view dbo.employees(Year, EmployeeID)
as
select e.employeeId, e.Year, e.attribute1, e.attribute2, ... e.attributeN,
ifnull(te.employeeId, e.employeeId, te.employeeId) as testEmployeeId,
ifnull(te.employeeId, e.Year, te.Year) as testYear,
ifnull(te.employeeId, e.attribute1, te.attribute1) as testAttribute1,
ifnull(te.employeeId, e.attribute2, te.attribute2) as testAttribute2,
....
ifnull(te.employeeId, e.attributeN, te.attributeN) as testAttributeN
from employee e
left join testemployees te on (e.employeeId = te.employeeId and e.Year = te.Year)
Be aware that this is somewhat different from the semantics that you defined. I am guessing that this is actually a bit closer to what you want in any case.
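For comparison, the semantics originally asked for ("only the test employees if there are any, otherwise everyone") can also be written with EXISTS instead of a join. This is an alternative formulation, not the query above, and the sketch below only checks the logic against SQLite through Python's standard-library sqlite3 module; it has not been run on Sybase, and the sample rows and the VIEW_QUERY name are made up.

```python
import sqlite3

# Hypothetical body for the view: keep a row when testemployees is empty,
# or when the row has a matching (employeeID, Year) entry in testemployees.
VIEW_QUERY = """
    SELECT e.employeeID, e.Year
    FROM employee e
    WHERE NOT EXISTS (SELECT 1 FROM testemployees)
       OR EXISTS (
            SELECT 1 FROM testemployees te
            WHERE te.employeeID = e.employeeID AND te.Year = e.Year
       )
    ORDER BY e.employeeID, e.Year
"""

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (employeeID INTEGER, Year INTEGER);
    CREATE TABLE testemployees (employeeID INTEGER, Year INTEGER);
    INSERT INTO employee VALUES (1, 2010), (1, 2011), (2, 2010);
    """
)

# testemployees is empty, so every employee row comes back.
all_rows = conn.execute(VIEW_QUERY).fetchall()

# Once testemployees has data, only the matching rows come back.
conn.execute("INSERT INTO testemployees VALUES (1, 2011)")
filtered_rows = conn.execute(VIEW_QUERY).fetchall()

print(all_rows)
print(filtered_rows)
```

Whether this performs better on a very large employee table would depend on the indexes on (employeeID, Year) and on the optimizer, so it is worth checking the query plan before adopting it.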

Matching items with multiple foreign keys in RavenDB

I asked this question previously regarding SQL Server: Complicated SQL Query--finding items matching multiple different foreign keys
Basically, I need to be able to find products that match multiple criteria. I have a scenario where I need to find products that match each of multiple categories and are found in multiple invoices.
The solution was a rather complex set of unions, which amounts to counting the number of times a product matched the criteria and filtering for items whose count matched the count of criteria.
; with data (ID, Count) as (
select pc.ProductID, count(*) from ProductCategories pc (nolock)
inner join @categoryIDs /*table valued param*/ c on c.ID = pc.CategoryID
group by pc.ProductID
union all
select ip.ProductID, count(*) from InvoiceProducts ip (nolock)
inner join @invoiceIDs i on i.ID = ip.InvoiceID
group by ip.ProductID
)
select d.ID from data d
group by d.ID
having sum(d.Count) = @matchcount
But now, I am considering a NoSQL provider. So my question is, how would I create an index function to match this kind of query in RavenDB (or some other NoSQL project)?
A mental shift is required to properly set this up with RavenDB (or any other document DB). The problem is with the hacks we all used to make when working with structured data against an SQL server.
Therefore, the question here is how your data is modeled. To be more exact - how are you going to use it most often; based on that there are certain guidelines on which entities to define and how to link them together.
For a simple Product object, with String[] of categories, you can query the DB like this:
// Query on nested collections - will return any product with category "C#"
var products = from p in session.Query<Product>()
where p.Categories.Any(cat => cat == "C#")
select p;
You can add as many Where clauses as you want. An index will be automatically created for you - but it is recommended to use static indexes when you've settled on a Model.
More on this topic:
http://ayende.com/blog/4801/leaving-the-relational-mindset-ravendbs-trees
https://github.com/ravendb/docs
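To make the document-oriented idea concrete without depending on RavenDB, here is a plain-Python sketch. It is not RavenDB API: the product documents, the invoice_ids field, and the matches_all helper are all hypothetical, and the filter runs in memory. It only illustrates that once each product document embeds its categories and invoices, "match every criterion" becomes a simple subset test instead of a count over unions.

```python
# Hypothetical product documents that embed their categories and invoice ids.
products = [
    {"id": "products/1", "categories": ["C#", "Books"], "invoice_ids": [7, 8]},
    {"id": "products/2", "categories": ["C#"], "invoice_ids": [7]},
    {"id": "products/3", "categories": ["Books"], "invoice_ids": [7, 8]},
]


def matches_all(product, categories, invoice_ids):
    """True when the product has every requested category and every invoice."""
    return (set(categories) <= set(product["categories"])
            and set(invoice_ids) <= set(product["invoice_ids"]))


matching_ids = [p["id"] for p in products if matches_all(p, ["C#", "Books"], [7, 8])]
print(matching_ids)
```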

PostgreSQL slow COUNT() - is trigger the only solution?

I have a table with posts, which are categorized by:
type
tag
language
All of those "categories" are stored in a separate table (posts_types) and linked to posts via a join table (posts_types_assignment).
COUNTing in PostgreSQL is really slow (I have more than 500k records in that table), and I need to get the number of posts categorized by any combination of type/tag/language.
If I solved it with triggers, they would be full of multi-level loops, which doesn't look nice and would be hard to maintain.
Is there any other way to efficiently get the current number of posts categorized by any type/tag/language?
Let me get this straight.
You have a table posts. You have a table posts_types. The two have a many to many join on posts_types_assignment. And you have some query like this that is slow:
SELECT count(*)
FROM posts p
JOIN posts_types_assignment pta1
ON p.id = pta1.post_id
JOIN posts_types pt1
ON pt1.id = pta1.post_type_id
AND pt1.type = 'language'
AND pt1.name = 'English'
JOIN posts_types_assignment pta2
ON p.id = pta2.post_id
JOIN posts_types pt2
ON pt2.id = pta2.post_type_id
AND pt2.type = 'tag'
AND pt2.name = 'awesome'
And you would like to know why it is painfully slow.
My first note is that PostgreSQL would have to do a lot less work if you had the identifiers in the posts table rather than in the joins. But that is a moot issue, the decision has been made.
My more useful note is that I believe that PostgreSQL has a similar query optimizer to Oracle. In which case to limit the combinatorial explosion of possible query plans that it has to consider, it only considers plans that start with some table, and then repeatedly joins on one more data set at a time. However no such query plan will work here. You can start with pt1, get 1 record, then go to pta1, get a bunch of records, join p, wind up with the same number of records, then join pta2, and now you get a huge number of records, then join to pt2, get just a few records. Joining to pta2 is the slow step, because the database has no idea which records you want, and therefore has to create a temporary result set for every combination of a post and a piece of metadata (type, language or tag) on it.
If this is indeed your problem, then the right plan looks like this. Join pt1 to pta1, put an index on it. Join pt2 to pta2, then join to the result of the first query, then join to p. Then count. This means that we don't get huge result sets.
If this is the case, there is no way to tell the query optimizer that, just this once, you want it to come up with a new type of execution plan. But there is a way to force it:
CREATE TEMPORARY TABLE t1
AS
SELECT pta.*
FROM posts_types pt
JOIN posts_types_assignment pta
ON pt.id = pta.post_type_id
WHERE pt.type = 'language'
AND pt.name = 'English';
CREATE INDEX idx1 ON t1 (post_id);
CREATE TEMPORARY TABLE t2
AS
SELECT pta.*
FROM posts_types pt
JOIN posts_types_assignment pta
ON pt.id = pta.post_type_id
JOIN t1
ON t1.post_id = pta.post_id
WHERE pt.type = 'tag'
AND pt.name = 'awesome';
SELECT COUNT(*)
FROM posts p
JOIN t2
ON p.id = t2.post_id;
Barring random typos, etc, this is likely to perform somewhat better. If it doesn't, double check the indexes on your tables.
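To check that the staged approach counts the same posts as the single big join, here is a small sketch using Python's standard-library sqlite3 module. SQLite stands in for PostgreSQL, so it says nothing about speed; the three posts and their assignments are made up, and the language/tag pair is the 'English'/'awesome' example from the query at the top.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE posts (id INTEGER PRIMARY KEY);
    CREATE TABLE posts_types (id INTEGER PRIMARY KEY, type TEXT, name TEXT);
    CREATE TABLE posts_types_assignment (post_id INTEGER, post_type_id INTEGER);

    INSERT INTO posts VALUES (1), (2), (3);
    INSERT INTO posts_types VALUES
        (1, 'language', 'English'), (2, 'tag', 'awesome'), (3, 'tag', 'dull');
    -- Post 1 is English and awesome, post 2 is English and dull, post 3 is awesome only.
    INSERT INTO posts_types_assignment VALUES (1, 1), (1, 2), (2, 1), (2, 3), (3, 2);

    -- Step 1: assignments for the language, indexed by post_id.
    CREATE TEMPORARY TABLE t1 AS
        SELECT pta.*
        FROM posts_types pt
        JOIN posts_types_assignment pta ON pt.id = pta.post_type_id
        WHERE pt.type = 'language' AND pt.name = 'English';
    CREATE INDEX idx1 ON t1 (post_id);

    -- Step 2: of those posts, the ones that also carry the tag.
    CREATE TEMPORARY TABLE t2 AS
        SELECT pta.*
        FROM posts_types pt
        JOIN posts_types_assignment pta ON pt.id = pta.post_type_id
        JOIN t1 ON t1.post_id = pta.post_id
        WHERE pt.type = 'tag' AND pt.name = 'awesome';
    """
)

staged_count = conn.execute(
    "SELECT COUNT(*) FROM posts p JOIN t2 ON p.id = t2.post_id"
).fetchone()[0]

# The original single-statement query, for comparison.
direct_count = conn.execute(
    """
    SELECT COUNT(*)
    FROM posts p
    JOIN posts_types_assignment pta1 ON p.id = pta1.post_id
    JOIN posts_types pt1 ON pt1.id = pta1.post_type_id
        AND pt1.type = 'language' AND pt1.name = 'English'
    JOIN posts_types_assignment pta2 ON p.id = pta2.post_id
    JOIN posts_types pt2 ON pt2.id = pta2.post_type_id
        AND pt2.type = 'tag' AND pt2.name = 'awesome'
    """
).fetchone()[0]

print(staged_count, direct_count)
```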
As btilly notes, and if he has correctly guessed the schema, the table design does not help - it seems (at first sight, at least) that, for example, to have three tables posts_tag(post_id,tag) post_lang(post_id,lang) post_type(post_id,type) would be more natural and much more efficient.
Apart from that (or in addition to it), one could use a table or materialized view that summarizes all the possible counts, with columns (lang,type,tag,nposts). Of course, computing this in full would be VERY slow, but (apart from the first time) it can be done either in full "in the background" at some interval (if the data does not vary much and you don't require exact counts), or eagerly with triggers.
See for example here
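A minimal sketch of such a summary table, again using Python's standard-library sqlite3 module as a stand-in for PostgreSQL. It assumes the three-table design suggested above (posts_tag, post_lang, post_type); the post_counts table name, the nposts_for helper, and the sample rows are hypothetical. It only covers full (lang,type,tag) combinations and would have to be rebuilt or maintained by triggers as posts change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE post_lang (post_id INTEGER, lang TEXT);
    CREATE TABLE post_type (post_id INTEGER, type TEXT);
    CREATE TABLE posts_tag (post_id INTEGER, tag TEXT);

    INSERT INTO post_lang VALUES (1, 'en'), (2, 'en'), (3, 'de');
    INSERT INTO post_type VALUES (1, 'article'), (2, 'article'), (3, 'article');
    INSERT INTO posts_tag VALUES (1, 'awesome'), (2, 'awesome'), (2, 'dull'), (3, 'awesome');

    -- Precompute one row per (lang, type, tag) combination that actually occurs.
    CREATE TABLE post_counts AS
        SELECT l.lang, ty.type, t.tag, COUNT(DISTINCT l.post_id) AS nposts
        FROM post_lang l
        JOIN post_type ty ON ty.post_id = l.post_id
        JOIN posts_tag t ON t.post_id = l.post_id
        GROUP BY l.lang, ty.type, t.tag;
    """
)


def nposts_for(lang, post_type, tag):
    """Read a precomputed count; combinations that never occur count as 0."""
    row = conn.execute(
        "SELECT nposts FROM post_counts WHERE lang = ? AND type = ? AND tag = ?",
        (lang, post_type, tag),
    ).fetchone()
    return row[0] if row else 0


print(nposts_for("en", "article", "awesome"))
```

Each lookup then reads a single small row instead of counting over the joins, at the cost of the counts being only as fresh as the last rebuild.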