Minimizing the number of queries in Zend - zend-framework

First, I want to know which of these two approaches is better:
select table1.* from table1 inner join table2 on table1.id = table2.table1_id;
and then extracting the rows, or
select * from table2;
and then, in a foreach loop over those rows,
select * from table1 where table1.id = table2.table1_id
Please also give a plausible reason.
I am using Zend, and I believe the first one is faster and better than the second (I don't know why; it's just a preconception).
Also, in the Zend DB profiler I get this every time:
query :DESCRIBE menu_items time: 0.00084590911865234
Is there a way to minimize it?
Finally, please tell me how to join two tables in Zend using Zend components.
Regards,

Using an inner join is faster than a PHP loop because of the query round trips: the first approach issues just one query, while the second issues one query for table2 plus one more for every row it returns. The database is built to retrieve and combine data, so letting it do the join is much faster than joining tables manually in PHP through separate queries.
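To make the difference in query count concrete, here is a minimal sketch in Python with an in-memory SQLite database standing in for the real server. The table contents are invented for illustration, and the statement counter relies on sqlite3's trace callback; it shows how many statements each approach sends, not how a production database would time them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, table1_id INTEGER);
    -- Sample rows are made up for this sketch.
    INSERT INTO table1 (id, name) VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table2 (table1_id) VALUES (1), (1), (3);
""")

statements = []  # every SQL statement the connection runs from here on
conn.set_trace_callback(statements.append)


def with_join():
    """One query: the database matches the rows itself."""
    return conn.execute(
        "SELECT table1.* FROM table1 "
        "INNER JOIN table2 ON table1.id = table2.table1_id "
        "ORDER BY table2.id"
    ).fetchall()


def with_loop():
    """One query for table2, then one more query per returned row."""
    rows = []
    table2_rows = conn.execute(
        "SELECT table1_id FROM table2 ORDER BY id"
    ).fetchall()
    for (table1_id,) in table2_rows:
        rows += conn.execute(
            "SELECT * FROM table1 WHERE id = ?", (table1_id,)
        ).fetchall()
    return rows


statements.clear()
join_rows = with_join()
join_queries = len(statements)

statements.clear()
loop_rows = with_loop()
loop_queries = len(statements)

print(join_queries, loop_queries)
```

Both functions return the same rows; the loop version simply pays for one extra statement per table2 row, and against a networked database each of those is a separate round trip.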
To join with Zend, you need something like this (assuming you're inside a Zend_Db_Table class):
$select = $this->select()->setIntegrityCheck(false);
$select->from(array('t1' => 'table1'))
       ->join(array('t2' => 'table2'), 't2.table1_id = t1.id', '*')
       ->where('t1.deleted = ?', 0)
       ->group('t1.id')
       ->order('t1.date DESC')
       ->limit(4);
$result = $this->fetchAll($select);
To prevent the DESCRIBE queries, you can hardcode the table structure or cache the table metadata (for example with Zend_Db_Table_Abstract::setDefaultMetadataCache()). Check here:
http://framework.zend.com/manual/en/performance.database.html

Related

Inner join versus doing a where in clause

Say I have about 10-30 rows returned from my query:
select * from users where location=10;
Is there any difference between the following:
select *
from users u
inner join location l on u.location = l.id
where location=10;
versus:
select * from users where location=10; # say this returns 1,2,3,4,5
select * from location where id IN (1,2,3,4,5)
Basically I want to know if there are any performance differences between doing an inner join versus doing a WHERE IN clause.
Is there a difference between issuing one query and issuing two queries? Well, I certainly hope so. The SQL engine is doing work, and it does twice as much work (from a certain perspective) for two queries.
In general, parsing a single query is going to be faster than parsing one query, returning an intermediate result set, and then feeding it back to another query. There is overhead in query compilation and in passing data back and forth.
For this query:
select *
from users u inner join
location l
on u.location = l.id
where u.location = 10;
You want an index on users(location) and location(id).
I do want to point something else out. The queries are not equivalent. The real comparison query is:
select l.*
from location l
where l.id = 10;
You are using the same column for the where and the on. Hence, this would be the most efficient version and you want an index on location(id).
One way to compare the performance of different queries is to use PostgreSQL's EXPLAIN command, for instance:
EXPLAIN select *
from users u inner join
location l
on u.location = l.id
where u.location = 10;
Which will tell you how the database will get the data. Watch for things like sequential scans, which indicate you could benefit from an index. It also gives estimates on the cost of each operation and how many rows it might be operating on. Some operations can yield a lot more rows than you would expect, which the database then reduces to the set it returns to you.
You can also use EXPLAIN ANALYZE [query], which will actually run the query and give you timing information.
See the PostgreSQL documentation for more information.
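The same habit of reading the plan before and after adding an index can be tried locally. The sketch below uses SQLite's EXPLAIN QUERY PLAN from Python as a stand-in; its output format differs from PostgreSQL's EXPLAIN, and the index name idx_users_location is an assumption made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE location (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, location INTEGER);
""")

QUERY = (
    "SELECT * FROM users u "
    "INNER JOIN location l ON u.location = l.id "
    "WHERE u.location = 10"
)


def plan(sql):
    """Return the readable plan steps SQLite reports for sql."""
    # Each row has four columns; the last one is the human-readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_without_index = plan(QUERY)

# Hypothetical index name; the indexed column matches the advice above.
conn.execute("CREATE INDEX idx_users_location ON users (location)")
plan_with_index = plan(QUERY)

for step in plan_without_index:
    print("before:", step)
for step in plan_with_index:
    print("after: ", step)
```

The exact wording of the plan steps depends on the SQLite version, so the useful signal is whether the users table is read through the new index once it exists.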

T-SQL different JOIN approaches, same results, which one would you prefer?

These are three approaches to writing a join. I would like to hear about the performance of these three queries.
Thank you
SELECT * FROM
tableA A LEFT JOIN tableB B
INNER JOIN tableC C
ON C.ColumnC = B.ColumnB
ON B.ColumnB = A.ColumnB
WHERE ColumnX = 'XY'
Versus
SELECT * FROM
tableA A LEFT JOIN tableB B
ON B.ColumnB = A.ColumnB
INNER JOIN tableC C
ON C.ColumnC = B.ColumnB
WHERE ColumnX = 'XY'
Versus Common Table Expression
WITH T...
It does not matter.
SQL Server has a cost-based optimizer (as opposed to a rule-based optimizer). That means that the engine is able to figure out that both of your first two options are identical. Run your estimated and actual execution plans and you will see that this is the case.
The only reason you would choose one option over the other is for readability's sake. I go with your second option, because it's a lot easier to read when there are a great many joins involved. ON clauses in reverse order become quite difficult to track.
In my experience, any of the above could be quicker depending on your tables.
As you're setting up joins, you want to start with the most restrictive one possible (without negatively affecting your end result, obviously). The same logic applies to the WHERE clause, for the same reason. By starting with the most restrictive condition, you limit the number of rows that are joined, then evaluated by the WHERE clause, and then returned or manipulated in the SELECT clause. For my answers below regarding the three specific scenarios, I'm assuming a sufficiently complicated query that does more than just combine data from multiple tables (i.e., queries answering specific questions).
If Table A is huge and Tables B & C are smaller and more directly related to the data you're trying to isolate, then the first option would likely be fastest.
If Table B or C are huge and Table A is more related to your desired data, the second option would likely be fastest.
As far as option 3 goes, I love CTEs but I try to only use them when I need to do so. Using a CTE will speed up your overall query if the data joined, manipulated, and returned by the CTE is only related to the rest of the query in a limited fashion. Including tables that are only partially related to your end result in your primary string of joins is going to needlessly slow down your query. If you can parse out that data into a CTE, it can run quickly by itself and then be incorporated back into the main query at the end.

zf many to many relationship, how to find values stored in intermediary table without a second lookup

I have a many-to-many relationship between two tables, represented by an intermediary table. I am using the ZF1 table relationships methodology to model the database in my application, and this works fine. One thing I am struggling with is pulling data from the intermediary table when performing a many-to-many lookup. For example:
productsTable
    product_id,
    product_name
customerTable
    customer_id,
    customer_name
salesTable
    customer_id,
    product_id,
    date_of_sale
In this case the sales table is the intermediary table, and the many-to-many relationship is between customers and products. I added the referenceMap for products and customers to the sales table model, and declared "sales" as a dependent table in both the product table model and the customer table model.
I can then successfully use the following code to get all the products for a given customer (or vice-versa).
$productTable = new productsTable();
$product = $productTable->find(1)->current();
$customers = $product->findManyToManyRowset('customerTable','salesTable');
But the rowset returned does not include the "date_of_sale" value. Is there a way of including the values from the intermediary table without doing a separate database lookup? I can't see anything in the ZF docs.
Any help would be cool.
I hope to eventually replace Zend_Db_Table with a data mapper implementation, as it seems highly inefficient in terms of the number of DB queries it executes; these could be hugely reduced by using slightly more complex SQL join queries rather than multiple simple selects. For now, though, I'm stuck with this.
Thanks.
You can use JOIN queries in your code to do it in one call. From what I understand, you want to build this query:
SELECT p.*, c.*, s.date_of_sale FROM sales AS s
INNER JOIN products AS p ON p.product_id = s.product_id
INNER JOIN customers AS c ON c.customer_id = s.customer_id
WHERE p.product_id = '1';
To achieve this, you can refer to Complex SQL query into Zend to see how I translate it to a Zend_Db Query or just use something like
$select = $this->select()->setIntegrityCheck(false);
$select->from(array('s'=>'sales'), array('date_of_sale'));
$select->join(array('p'=>'products'), 'p.product_id = s.product_id', '*');
$select->join(array('c'=>'customers'), 'c.customer_id = s.customer_id', '*');
$select->where('p.product_id = ?', $productId);
Passing the value as the second argument to where() quotes it for you, which is safer than concatenating $productId into the condition string.
Hope this helps !
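For a runnable picture of what that single query returns, here is a small sketch in Python against an in-memory SQLite database. The table names follow the SQL in this answer, the sample rows and the function name customers_for_product are invented, and the product id is passed as a bound parameter rather than concatenated into the string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE sales (customer_id INTEGER, product_id INTEGER, date_of_sale TEXT);
    -- Sample rows are made up for this sketch.
    INSERT INTO products VALUES (1, 'widget'), (2, 'gadget');
    INSERT INTO customers VALUES (10, 'alice'), (11, 'bob');
    INSERT INTO sales VALUES
        (10, 1, '2012-01-05'), (11, 1, '2012-02-09'), (10, 2, '2012-03-01');
""")


def customers_for_product(product_id):
    """Customers who bought product_id, with the sale date, in one query."""
    return conn.execute(
        "SELECT c.customer_name, s.date_of_sale "
        "FROM sales AS s "
        "INNER JOIN products AS p ON p.product_id = s.product_id "
        "INNER JOIN customers AS c ON c.customer_id = s.customer_id "
        "WHERE p.product_id = ? "  # bound parameter, never string concatenation
        "ORDER BY s.date_of_sale",
        (product_id,),
    ).fetchall()


rows = customers_for_product(1)
print(rows)
```

Each returned row carries the customer data together with date_of_sale from the intermediary table, so no second lookup is needed.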

What is the syntax for performing a parameterised delete using joins in SSIS 2008?

I'm trying to use an OLE DB Command to perform a delete using data from each row of my input file. The actual query works fine when running manually in sql server (given tableB.otherID is compared to an int), but I'm having issues parameterising it.
delete tableA from tableA
where tableA.ID = ?
The above query runs, and allows me to assign one of my input columns to tableA.ID. This is what I would expect.
Trying
delete tableA from tableA
INNER JOIN tableB ON tableB.ID = tableA.ID
where tableB.OtherID = ?
Throws up an error however ("The multi-part identifier tableB.OtherID could not be bound"). Hardcoding a value in place of the '?' stops this error from appearing.
It seems like this would be the correct syntax, is there anything wrong with the above?
This seems to be a bug or limitation in SSIS; I've found myself unable to perform similar parameterised update statements using a join.
The solution I ended up using was to create a temporary stored procedure containing the delete statement I wanted and pass the parameter to it directly.
I think the TSQL syntax you want is:
DELETE FROM tableA
FROM tableA INNER JOIN tableB ON tableB.ID = tableA.ID
WHERE tableB.OtherID = ?
DELETE FROM tableA
FROM tableA INNER JOIN tableB ON tableB.ID = tableA.ID
WHERE tableB.OtherID = #OrderId
Where #OrderID should be your variable in SSIS.
Depending on how many rows you need to delete, using an Execute SQL task becomes slow quite quickly.
If that happens, the solution that worked for me is to put the keys of the rows that need to be deleted into a staging table; once they're all in there, issue a single statement that deletes all those rows, then purge the staging table. It's much quicker that way, and an added benefit is that you don't have to use the quirky ? syntax. I never liked that; it's much too easy to mix things up when the SQL becomes a little more complicated.
Regards Gert-Jan
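The staging-table pattern can be sketched outside SSIS as follows, using Python and an in-memory SQLite database. The staging table name, the sample rows, and the helper delete_in_bulk are assumptions for illustration; SQLite has no DELETE ... FROM ... JOIN form, so the set-based delete is written with an IN subquery, whereas in T-SQL the join form shown earlier would be used against the staging table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (ID INTEGER PRIMARY KEY, payload TEXT);
    CREATE TABLE tableB (ID INTEGER PRIMARY KEY, OtherID INTEGER);
    CREATE TABLE staging (OtherID INTEGER);  -- hypothetical staging table
    -- Sample rows are made up for this sketch.
    INSERT INTO tableA VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO tableB VALUES (1, 100), (2, 200), (3, 300), (4, 400);
""")


def delete_in_bulk(other_ids):
    """Stage the keys, delete all matching tableA rows at once, then purge."""
    with conn:  # run the three steps in one transaction
        conn.executemany(
            "INSERT INTO staging (OtherID) VALUES (?)",
            [(other_id,) for other_id in other_ids],
        )
        deleted = conn.execute(
            "DELETE FROM tableA WHERE ID IN ("
            "  SELECT b.ID FROM tableB AS b"
            "  INNER JOIN staging AS s ON s.OtherID = b.OtherID)"
        ).rowcount
        conn.execute("DELETE FROM staging")
    return deleted


deleted = delete_in_bulk([100, 300])
remaining = [row[0] for row in conn.execute("SELECT ID FROM tableA ORDER BY ID")]
staged_left = conn.execute("SELECT COUNT(*) FROM staging").fetchone()[0]
print(deleted, remaining, staged_left)
```

The per-row work is reduced to cheap inserts into the staging table, and the expensive join-based delete runs once for the whole batch.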

PostgreSQL slow COUNT() - is trigger the only solution?

I have a table with posts, which are categorized by:
type
tag
language
All of those "categories" are stored in separate tables (posts_types) and connected via linking tables (posts_types_assignment).
COUNTing in PostgreSQL is really slow (I have more than 500k records in that table), and I need to get the number of posts categorized by any combination of type/tag/language.
If I solved it with triggers, it would be full of multi-level loops, which doesn't look nice and is hard to maintain.
Is there any other way to efficiently get the current number of posts for any type/tag/language combination?
Let me get this straight.
You have a table posts. You have a table posts_types. The two have a many to many join on posts_types_assignment. And you have some query like this that is slow:
SELECT count(*)
FROM posts p
  JOIN posts_types_assignment pta1
    ON p.id = pta1.post_id
  JOIN posts_types pt1
    ON pt1.id = pta1.post_type_id
    AND pt1.type = 'language'
    AND pt1.name = 'English'
  JOIN posts_types_assignment pta2
    ON p.id = pta2.post_id
  JOIN posts_types pt2
    ON pt2.id = pta2.post_type_id
    AND pt2.type = 'tag'
    AND pt2.name = 'awesome'
And you would like to know why it is painfully slow.
My first note is that PostgreSQL would have to do a lot less work if you had the identifiers in the posts table rather than in the joins. But that is a moot issue, the decision has been made.
My more useful note is that I believe that PostgreSQL has a similar query optimizer to Oracle. In which case to limit the combinatorial explosion of possible query plans that it has to consider, it only considers plans that start with some table, and then repeatedly joins on one more data set at a time. However no such query plan will work here. You can start with pt1, get 1 record, then go to pta1, get a bunch of records, join p, wind up with the same number of records, then join pta2, and now you get a huge number of records, then join to pt2, get just a few records. Joining to pta2 is the slow step, because the database has no idea which records you want, and therefore has to create a temporary result set for every combination of a post and a piece of metadata (type, language or tag) on it.
If this is indeed your problem, then the right plan looks like this. Join pt1 to pta1, put an index on it. Join pt2 to pta2, then join to the result of the first query, then join to p. Then count. This means that we don't get huge result sets.
If this is the case, there is no way to tell the query optimizer that, just this once, you want it to come up with a new type of execution plan. But there is a way to force it.
-- Step 1: assignments of posts to the language 'English'
CREATE TEMPORARY TABLE t1
AS
SELECT pta.*
FROM posts_types pt
  JOIN posts_types_assignment pta
    ON pt.id = pta.post_type_id
WHERE pt.type = 'language'
  AND pt.name = 'English';
CREATE INDEX idx1 ON t1 (post_id);
-- Step 2: of those posts, the ones also tagged 'awesome'
CREATE TEMPORARY TABLE t2
AS
SELECT pta.*
FROM posts_types pt
  JOIN posts_types_assignment pta
    ON pt.id = pta.post_type_id
  JOIN t1
    ON t1.post_id = pta.post_id
WHERE pt.type = 'tag'
  AND pt.name = 'awesome';
-- Step 3: count the posts that survived both filters
SELECT COUNT(*)
FROM posts p
  JOIN t2
    ON p.id = t2.post_id;
Barring random typos, etc, this is likely to perform somewhat better. If it doesn't, double check the indexes on your tables.
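As a sanity check that staging the work this way gives the same answer as the single large join, here is a small sketch in Python against an in-memory SQLite database. The schema is the one guessed in this answer, the sample rows are invented, and the temporary tables are created empty and filled with INSERT ... SELECT so the filter values can be passed as bound parameters. It checks correctness only; it says nothing about PostgreSQL's timings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY);
    CREATE TABLE posts_types (id INTEGER PRIMARY KEY, type TEXT, name TEXT);
    CREATE TABLE posts_types_assignment (post_id INTEGER, post_type_id INTEGER);
    -- Sample rows are made up for this sketch.
    INSERT INTO posts VALUES (1), (2), (3), (4);
    INSERT INTO posts_types VALUES
        (1, 'language', 'English'), (2, 'language', 'German'),
        (3, 'tag', 'awesome'), (4, 'tag', 'boring');
    INSERT INTO posts_types_assignment VALUES
        (1, 1), (1, 3),   -- post 1: English, awesome
        (2, 1), (2, 4),   -- post 2: English, boring
        (3, 2), (3, 3),   -- post 3: German, awesome
        (4, 1), (4, 3);   -- post 4: English, awesome
""")


def count_direct(language, tag):
    """The single five-way join from the slow query."""
    return conn.execute("""
        SELECT COUNT(*)
        FROM posts p
        JOIN posts_types_assignment pta1 ON p.id = pta1.post_id
        JOIN posts_types pt1 ON pt1.id = pta1.post_type_id
             AND pt1.type = 'language' AND pt1.name = ?
        JOIN posts_types_assignment pta2 ON p.id = pta2.post_id
        JOIN posts_types pt2 ON pt2.id = pta2.post_type_id
             AND pt2.type = 'tag' AND pt2.name = ?
    """, (language, tag)).fetchone()[0]


def count_staged(language, tag):
    """Narrow by language into t1, then by tag into t2, then count."""
    conn.executescript("""
        DROP TABLE IF EXISTS temp.t1;
        DROP TABLE IF EXISTS temp.t2;
        CREATE TEMPORARY TABLE t1 (post_id INTEGER);
        CREATE TEMPORARY TABLE t2 (post_id INTEGER);
        CREATE INDEX temp.idx1 ON t1 (post_id);
    """)
    conn.execute("""
        INSERT INTO t1
        SELECT pta.post_id
        FROM posts_types pt
        JOIN posts_types_assignment pta ON pt.id = pta.post_type_id
        WHERE pt.type = 'language' AND pt.name = ?
    """, (language,))
    conn.execute("""
        INSERT INTO t2
        SELECT pta.post_id
        FROM posts_types pt
        JOIN posts_types_assignment pta ON pt.id = pta.post_type_id
        JOIN t1 ON t1.post_id = pta.post_id
        WHERE pt.type = 'tag' AND pt.name = ?
    """, (tag,))
    return conn.execute(
        "SELECT COUNT(*) FROM posts p JOIN t2 ON p.id = t2.post_id"
    ).fetchone()[0]


print(count_direct("English", "awesome"), count_staged("English", "awesome"))
```

Comparing the two functions on a handful of combinations is a quick way to catch a copy-paste slip in the staged version before measuring it on real data.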
As btilly notes, and if he has correctly guessed the schema, the table design does not help - it seems (at first sight, at least) that, for example, to have three tables posts_tag(post_id,tag) post_lang(post_id,lang) post_type(post_id,type) would be more natural and much more efficient.
Apart from that (or in addition to that), one could think of a table or materialized view that summarizes all the possible counts, with columns (lang,type,tag,nposts). Of course, computing this in full would be VERY slow, but (apart from the first time) it can be done either in full "in the background" at some interval (if the data does not vary much and you don't require exact counts), or eagerly with triggers.
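A minimal sketch of the eager, trigger-maintained variant is shown below in Python with an in-memory SQLite database. It assumes a deliberately simplified schema in which each post has exactly one lang, type and tag stored directly on the posts row; the table post_counts, the trigger names and the helper count_for are hypothetical, and PostgreSQL triggers would be written as trigger functions rather than in this SQLite syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, hypothetical schema: one lang/type/tag per post.
    CREATE TABLE posts (id INTEGER PRIMARY KEY, lang TEXT, type TEXT, tag TEXT);
    CREATE TABLE post_counts (
        lang TEXT, type TEXT, tag TEXT, nposts INTEGER NOT NULL,
        PRIMARY KEY (lang, type, tag)
    );
    CREATE TRIGGER posts_after_insert AFTER INSERT ON posts BEGIN
        INSERT OR IGNORE INTO post_counts
            VALUES (NEW.lang, NEW.type, NEW.tag, 0);
        UPDATE post_counts SET nposts = nposts + 1
        WHERE lang = NEW.lang AND type = NEW.type AND tag = NEW.tag;
    END;
    CREATE TRIGGER posts_after_delete AFTER DELETE ON posts BEGIN
        UPDATE post_counts SET nposts = nposts - 1
        WHERE lang = OLD.lang AND type = OLD.type AND tag = OLD.tag;
    END;
""")


def count_for(lang=None, type_=None, tag=None):
    """Sum the precomputed counts; a None filter matches every value."""
    return conn.execute(
        "SELECT COALESCE(SUM(nposts), 0) FROM post_counts "
        "WHERE (? IS NULL OR lang = ?) "
        "AND (? IS NULL OR type = ?) "
        "AND (? IS NULL OR tag = ?)",
        (lang, lang, type_, type_, tag, tag),
    ).fetchone()[0]


# Sample rows are made up for this sketch.
conn.executemany(
    "INSERT INTO posts (lang, type, tag) VALUES (?, ?, ?)",
    [("en", "article", "awesome"), ("en", "article", "awesome"),
     ("en", "video", "boring"), ("de", "article", "awesome")],
)
conn.execute("DELETE FROM posts WHERE id = 1")

live_en = conn.execute(
    "SELECT COUNT(*) FROM posts WHERE lang = 'en'"
).fetchone()[0]
print(count_for(lang="en"), live_en)
```

Counting then reads at most one summary row per combination instead of scanning posts, at the price of a little extra work on every insert and delete.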
See for example here