Any method faster than count(*) for counting mysql rows - mysqli

We know that COUNT(*) is faster than mysql_num_rows, but is that enough to make this query fast?
The firma table has 575,000 rows.
The columns onay, bireysel, bastarih, and uyeliktur each have a single-column INDEX on the firma table.
My simple query is:
$sql = "SELECT COUNT('x') as num FROM firma where 1 and firma.onay=1 and firma.bireysel=0";
$res = mysqli_query($i_link, $sql);
$row = mysqli_fetch_row($res);
echo $db_count = $row[0];
This query takes nearly 5 or 6 seconds to run.
Test link: http://celikhane.com/firmalar/
How can I make this query run in under 1 second?
NOTE: If I change the query to
SELECT COUNT('x') as num FROM firma where firma.onay=1
it runs in 0.6 seconds, which is much faster, but I need the other conditions as well.

Try count('x') instead of count(*).
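Since COUNT('x') and COUNT(*) are essentially equivalent in MySQL, swapping the argument is unlikely to help. If both conditions are always applied together, a composite index covering both columns is usually the bigger win. A minimal sketch, assuming InnoDB; the index name idx_onay_bireysel is made up:

-- One index that covers both predicates, so MySQL can answer the
-- count from the index alone instead of intersecting two
-- single-column indexes:
ALTER TABLE firma ADD INDEX idx_onay_bireysel (onay, bireysel);

-- EXPLAIN should then show "Using index" for:
SELECT COUNT(*) AS num FROM firma WHERE onay = 1 AND bireysel = 0;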

Related

How to make a self-referential window function

I have a table like this:
amount  type  app  owe
1       a     10   10
2       a     8    -2
3       a     20   12
4       i     30   10
5       a     40   10
owe is computed as:
owe = (type == 'a') ? app - sum(owe over all rows with smaller amount)
                    : max(app - sum(owe over all rows with smaller amount), 0)
For example, row 4 (type 'i') gets owe = max(30 - (10 - 2 + 12), 0) = 10.
So I'd need a window function over the very column it is computing. There are window frames like ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING, but the frame has to run over a column that already exists, not the column I'm defining. Is there a way to reference the same column the window function is computing?
I tried an alias:
case
    when type = 'a'
        then app - sum(owe) over (ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
    else
        greatest(0, app - sum(owe) over (ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING))
end as owe
But since owe doesn't exist yet at the point I reference it, I get:
owe doesn't exist.
Is there some other way?
You cannot do that with window functions. Your only chance using SQL is a recursive CTE:
WITH RECURSIVE tab_owe AS (
   (SELECT amount, type, app,
           CASE WHEN type = 'a'
                THEN app
                ELSE GREATEST(app, 0)
           END AS owe,
           CASE WHEN type = 'a'
                THEN app
                ELSE GREATEST(app, 0)
           END AS owe_sum
    FROM tab
    ORDER BY amount
    LIMIT 1)
   UNION ALL
   SELECT t.amount, t.type, t.app,
          CASE WHEN t.type = 'a'
               THEN t.app - tw.owe_sum
               ELSE GREATEST(t.app - tw.owe_sum, 0)
          END AS owe,
          tw.owe_sum
          + CASE WHEN t.type = 'a'
                 THEN t.app - tw.owe_sum
                 ELSE GREATEST(t.app - tw.owe_sum, 0)
            END AS owe_sum
   FROM tab_owe AS tw
   CROSS JOIN LATERAL (SELECT amount, type, app
                       FROM tab
                       WHERE amount > tw.amount
                       ORDER BY amount
                       LIMIT 1) AS t
)
SELECT amount, type, app, owe
FROM tab_owe;
(untested; the extra owe_sum column carries the running total, since PostgreSQL doesn't allow aggregating over the recursive reference)
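Walking the sample data through by hand, this should return owe = 10, -2, 12, 10, 10, matching the table in the question.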
This would be much easier to write in procedural code, so consider using a table function.
This is what I came up with. Of course, I'm not a real programmer, so I'm sure there's a smarter way:
insert into mort (amount, "type", app)
values
    (1, 'a', 10),
    (2, 'a', 8),
    (3, 'a', 20),
    (4, 'i', 30),
    (5, 'a', 40);

CREATE OR REPLACE FUNCTION mort_v ()
RETURNS TABLE (
    zamount int,
    ztype   text,
    zapp    int,
    zowe    double precision
) AS $$
DECLARE
    var_r   record;
    charlie double precision;
    sam     double precision;
BEGIN
    charlie := 0;
    FOR var_r IN (SELECT amount, "type", app
                  FROM mort
                  ORDER BY 1)
    LOOP
        zamount := var_r.amount;
        ztype   := var_r.type;
        zapp    := var_r.app;
        sam     := var_r.app - charlie;
        IF ztype = 'a' THEN
            zowe := sam;
        ELSE
            zowe := greatest(sam, 0);
        END IF;
        charlie := charlie + zowe;
        RETURN NEXT;
    END LOOP;
END; $$
LANGUAGE plpgsql;

select * from mort_v();
So with my limited skills you'll notice I had to add a 'z' in front of the columns that already exist in the table so I can return them again. If your table has 30 columns, you'd normally have to do this 30 times. But I asked a real engineer, and he mentioned that if you just return the primary key with the calculated column, you can join it back to the original table; that's smarter than what I have. If there's an even better solution, that would be great. This does serve as a nice reference for how to do something like a cursor in Postgres, and how to work with variables that don't need an '@' in front like in MS SQL Server.
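A minimal sketch of that join-back idea, assuming amount serves as the key of mort (the function above already returns it as zamount):

-- Return only the key plus the calculated column from the function,
-- then join back to pick up the remaining columns:
SELECT m.*, v.zowe AS owe
FROM mort AS m
JOIN mort_v() AS v ON v.zamount = m.amount
ORDER BY m.amount;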

Optimizing Postgres query with timestamp filter

I have a query:
SELECT DISTINCT ON (analytics_staging_v2s.event_type, sent_email_v2s.recipient, sent_email_v2s.sent)
       sent_email_v2s.id, sent_email_v2s.user_id, analytics_staging_v2s.event_type,
       sent_email_v2s.campaign_id, sent_email_v2s.recipient, sent_email_v2s.sent,
       sent_email_v2s.stage, sent_email_v2s.sequence_id, people.role, people.company,
       people.first_name, people.last_name, sequences.name AS sequence_name
FROM "sent_email_v2s"
LEFT JOIN analytics_staging_v2s ON sent_email_v2s.id = analytics_staging_v2s.sent_email_v2_id
JOIN people ON sent_email_v2s.person_id = people.id
JOIN sequences on sent_email_v2s.sequence_id = sequences.id
JOIN users ON sent_email_v2s.user_id = users.id
WHERE "sent_email_v2s"."status" = 1
AND "people"."person_type" = 0
AND (sent_email_v2s.sequence_id = 1888) AND (sent_email_v2s.sent >= '2016-03-18')
AND "users"."team_id" = 1
When I run EXPLAIN ANALYZE on it, I get:
Then, if I change that by just removing the (sent_email_v2s.sent >= '2016-03-18') condition:
SELECT DISTINCT ON (analytics_staging_v2s.event_type, sent_email_v2s.recipient, sent_email_v2s.sent)
       sent_email_v2s.id, sent_email_v2s.user_id, analytics_staging_v2s.event_type,
       sent_email_v2s.campaign_id, sent_email_v2s.recipient, sent_email_v2s.sent,
       sent_email_v2s.stage, sent_email_v2s.sequence_id, people.role, people.company,
       people.first_name, people.last_name, sequences.name AS sequence_name
FROM "sent_email_v2s"
LEFT JOIN analytics_staging_v2s ON sent_email_v2s.id = analytics_staging_v2s.sent_email_v2_id
JOIN people ON sent_email_v2s.person_id = people.id
JOIN sequences on sent_email_v2s.sequence_id = sequences.id
JOIN users ON sent_email_v2s.user_id = users.id
WHERE "sent_email_v2s"."status" = 1
AND "people"."person_type" = 0
AND (sent_email_v2s.sequence_id = 1888) AND "users"."team_id" = 1
When I run EXPLAIN ANALYZE on this query, the results are:
EDIT:
The results above, from today, are about what I expected. When I ran this last night, however, including the timestamp filter made the query about 100x slower (0.5s -> 59s), and the EXPLAIN ANALYZE from last night attributed all of the extra time to the first unique/sort operation in the query plan above.
Could there be some kind of caching issue here? I am worried that something else might be going on (transiently) that makes this query take 100x longer, since it happened at least once.
Any thoughts are appreciated!
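One thing worth checking (an assumption on my part, since the EXPLAIN ANALYZE output isn't shown above): whether sent_email_v2s has a multicolumn index matching both the equality filter and the range filter, so the planner can satisfy them in a single index scan instead of sorting a large intermediate result. A sketch; the index name is made up:

-- Equality column first, then the range column:
CREATE INDEX idx_sent_email_v2s_sequence_sent
    ON sent_email_v2s (sequence_id, sent);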

Native Query (JPA) takes long with date comparison

Has anyone got any idea how I could optimize this query so that it runs faster? Right now it takes up to 30 seconds to retrieve around 3k "containers", and that's way too long; it's foreseen that it will eventually have to retrieve around 1 million records.
Query query = em().createNativeQuery(
        "SELECT * FROM CONTAINER WHERE TO_CHAR(CREATION_DATE, 'YYYY-MM-DD') >= TO_CHAR(:from, 'YYYY-MM-DD') " +
        "AND TO_CHAR(CREATION_DATE, 'YYYY-MM-DD') <= TO_CHAR(:to, 'YYYY-MM-DD')", Container.class);
query.setParameter("from", from);
query.setParameter("to", to);
return query.getResultList();
JPA 2.0, Oracle DB
EDIT: I've got an index on the CREATION_DATE column:
CREATE INDEX IDX_CONTAINER_CREATION_DATE
ON CONTAINER (CREATION_DATE);
It's not a named query because the TO_CHAR function doesn't seem to be supported by JPA 2.0, and I've read that the query should run faster when there's an index.
My explain plan (still doing full table scan for some reason instead of using the index):
---------------------------------------
| Id | Operation | Name |
---------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS FULL| CONTAINER |
---------------------------------------
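That full scan is expected: wrapping CREATION_DATE in TO_CHAR() applies the predicate to a function result, so the plain index on the column can't be used. A minimal sketch of a sargable version of the native query, assuming :from and :to are bound as DATE values:

-- Compare the indexed column directly; TRUNC() runs on the bind
-- variables only, so IDX_CONTAINER_CREATION_DATE stays usable.
SELECT *
FROM CONTAINER
WHERE CREATION_DATE >= TRUNC(:from)
  AND CREATION_DATE <  TRUNC(:to) + 1  -- + 1 day keeps the whole :to day, like the TO_CHAR version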
One fix I don't like:
I've done the following:
TypedQuery<Container> query = em().createQuery(
"SELECT NEW Container(c.barcode, c.createdBy, c.creationDate, c.owner, c.sequence, c.containerSizeBarcode, c.a, c.b, c.c) " +
"FROM Container c where c.creationDate >= :from AND c.creationDate <= :to", Container.class);
and I've added an absurdly long constructor to Container, which fixes the loading times. But this is really ugly and I don't want to keep it, to be honest. Does anyone have any other suggestions?

Update rows returned by a complex SQL query with data from query result

I have a multi-table join and want to update a table based on the result of that join. The join table produces both the scope of the update (only those rows whose effort.id appears in the result should be updated) and the data for the update (a new column should be set to the value of a calculated column).
I've made progress but can't quite make it work. Here's my statement:
UPDATE
efforts
SET
dropped_int = jt.split
FROM
(
SELECT
ef.id,
s.id split,
s.kind,
s.distance_from_start,
s.sub_order,
max(s.distance_from_start + s.sub_order)
OVER (PARTITION BY ef.id) AS max_dist
FROM
split_times st
LEFT JOIN splits s ON s.id = st.split_id
LEFT JOIN efforts ef ON ef.id = st.effort_id
) jt
WHERE
((jt.distance_from_start + jt.sub_order) = max_dist)
AND
kind <> 1;
The SELECT produces the correct join table:
id   split  kind  dfs     sub  max_dist  dropped  dropped_int
403  33     2     152404  1    152405    TRUE     33
404  33     2     152404  1    152405    TRUE     33
405  31     2     143392  1    143393    TRUE     33
406  31     2     143392  1    143393    TRUE     33
407  29     2     132127  1    132128    TRUE     33
408  29     2     132127  1    132128    TRUE     33
409  29     2     132127  1    132128    TRUE     33
and does indeed update the efforts table, but there are two problems: first, it updates all efforts, not just those produced by the query, and second, it sets dropped_int to the split value of the first row in the query result, whereas I need each effort set to its associated split value.
If this were non-SQL, it might look something like:
jt_rows.each do |jt_row|
efforts[jt_row].dropped_int = jt[jt_row].split
end
But I don't know how to do that in SQL. It seems like this should be a fairly common problem, but after a couple of hours of searching I'm coming up short.
How should I modify my statement to produce the described result? If it matters, this is Postgres 9.5. Thanks in advance for any suggestions.
EDIT:
I did not get a workable answer but ended up solving this with a mixture of SQL and native code (Ruby/Rails):
dropped_splits = SplitTime.joins(:split).joins(:effort)
.select('DISTINCT ON (efforts.id) split_times.effort_id, split_times.split_id')
.where(efforts: {dropped: true})
.order('efforts.id, splits.distance_from_start DESC, splits.sub_order DESC')
update_hash = Hash[dropped_splits.map { |x| [x.effort_id, {dropped_split_id: x.split_id, updated_at: Time.now}] }]
Effort.update(update_hash.keys, update_hash.values)
Use a condition in the WHERE clause that relates the efforts table to the subquery:
efforts.id = jt.id
that is:
WHERE
((jt.distance_from_start + jt.sub_order) = max_dist)
AND
kind <> 1
AND
efforts.id = jt.id
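Putting that together, the corrected statement (with the same subquery as in the question, abbreviated here) looks like:
UPDATE
efforts
SET
dropped_int = jt.split
FROM
( ... the SELECT from the question ... ) jt
WHERE
((jt.distance_from_start + jt.sub_order) = jt.max_dist)
AND
jt.kind <> 1
AND
efforts.id = jt.id;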

Count previous occurrences of a value split by date ranges

Here's a simple query we do for ad hoc requests from our Marketing department on the leads we received in the last 90 days.
SELECT ID
,FIRST_NAME
,LAST_NAME
,ADDRESS_1
,ADDRESS_2
,CITY
,STATE
,ZIP
,HOME_PHONE
,MOBILE_PHONE
,EMAIL_ADDRESS
,ROW_ADDED_DTM
FROM WEB_LEADS
WHERE ROW_ADDED_DTM BETWEEN @START AND @END
They are asking for more derived columns to be added that show the number of previous occurrences of ADDRESS_1 where the EMAIL_ADDRESS matches, but they want this for several different date ranges.
So the derived columns would look like this:
,COUNT_ADDRESS_1_LAST_1_DAYS
,COUNT_ADDRESS_1_LAST_7_DAYS
,COUNT_ADDRESS_1_LAST_14_DAYS
etc.
I've manually filled these derived columns using update statements when there were just a few. The above query is really just a sample of a much larger query with many more columns; the actual request has blossomed into 6 date ranges for 13 columns. I'm asking if there's a better way than using 78 additional update statements.
I think you will have a hard time writing a query that includes all 78 of these metrics per e-mail address without hard-coding the different choices. However, you can generate such a pivot query with dynamic SQL, which will save you some keystrokes and will adjust dynamically as you add more columns to the table.
The result you want to end up with will look something like this (but of course you won't want to type it):
;WITH y AS
(
    SELECT
        EMAIL_ADDRESS,
        /* aggregation portion */
        [ADDRESS_1] = COUNT(DISTINCT [ADDRESS_1]),
        [ADDRESS_2] = COUNT(DISTINCT [ADDRESS_2]),
        ... other columns ...
        /* end agg portion */
    FROM dbo.WEB_LEADS AS wl
    WHERE ROW_ADDED_DTM >= /* one of 6 past dates */
    GROUP BY wl.EMAIL_ADDRESS
)
SELECT EMAIL_ADDRESS,
    /* pivot portion */
    COUNT_ADDRESS_1_LAST_1_DAYS = *count address 1 from 1 day ago*,
    COUNT_ADDRESS_1_LAST_7_DAYS = *count address 1 from 7 days ago*,
    ... other date ranges ...
    COUNT_ADDRESS_2_LAST_1_DAYS = *count address 2 from 1 day ago*,
    COUNT_ADDRESS_2_LAST_7_DAYS = *count address 2 from 7 days ago*,
    ... other date ranges ...
    ... repeat for 11 more columns ...
    /* end pivot portion */
FROM y
GROUP BY EMAIL_ADDRESS
ORDER BY EMAIL_ADDRESS;
This is a little involved, and it should all be run as one script, but I'm going to break it up into chunks to intersperse comments on how the above portions are populated without typing them. (And before long @Bluefeet will probably come along with a much better PIVOT alternative.) I'll enclose my interspersed comments in /* */ so that you can still copy the bulk of this answer into Management Studio and run it with the comments intact.
The code and comments to copy follow:
/*
First, let's build a table of dates that can be used both to derive labels for pivoting and to assist with aggregation. I've added the three ranges you've mentioned and guessed at a fourth, but hopefully it is clear how to add more:
*/
DECLARE @d DATE = SYSDATETIME();

CREATE TABLE #L(label NVARCHAR(15), d DATE);

INSERT #L(label, d) VALUES
    (N'LAST_1_DAYS',  DATEADD(DAY, -1, @d)),
    (N'LAST_7_DAYS',  DATEADD(DAY, -8, @d)),
    (N'LAST_14_DAYS', DATEADD(DAY, -15, @d)),
    (N'LAST_MONTH',   DATEADD(MONTH, -1, @d));
/*
Next, let's build the portions of the query that are repeated per column name. First, the aggregation portion is just in the format col = COUNT(DISTINCT col). We're going to go to the catalog views to dynamically derive the list of column names (except ID, EMAIL_ADDRESS and ROW_ADDED_DTM) and stuff them into a #temp table for re-use.
*/
SELECT name INTO #N FROM sys.columns
WHERE [object_id] = OBJECT_ID(N'dbo.WEB_LEADS')
AND name NOT IN (N'ID', N'EMAIL_ADDRESS', N'ROW_ADDED_DTM');

DECLARE @agg NVARCHAR(MAX) = N'', @piv NVARCHAR(MAX) = N'';

SELECT @agg += ',
    ' + QUOTENAME(name) + ' = COUNT(DISTINCT ' + QUOTENAME(name) + ')'
FROM #N;

PRINT @agg;
/*
Next we'll build the "pivot" portion (even though I am angling for the poor man's pivot - a bunch of CASE expressions). For each column name we need a conditional against each range, so we can accomplish this by cross joining the list of column names against our labels table. (And we'll use this exact technique again in the query later to make the /* one of 6 past dates */ portion work.)
*/
SELECT @piv += ',
    COUNT_' + n.name + '_' + l.label
    + ' = MAX(CASE WHEN label = N''' + l.label
    + ''' THEN ' + QUOTENAME(n.name) + ' END)'
FROM #N AS n CROSS JOIN #L AS l;

PRINT @piv;
/*
Now, with those two portions populated as we'd like them, we can build a dynamic SQL statement that fills out the rest:
*/
DECLARE @sql NVARCHAR(MAX) = N';WITH y AS
(
    SELECT EMAIL_ADDRESS, l.label' + @agg + '
    FROM dbo.WEB_LEADS AS wl
    CROSS JOIN #L AS l
    WHERE wl.ROW_ADDED_DTM >= l.d
    GROUP BY wl.EMAIL_ADDRESS, l.label
)
SELECT EMAIL_ADDRESS' + @piv + '
FROM y
GROUP BY EMAIL_ADDRESS
ORDER BY EMAIL_ADDRESS;';

PRINT @sql;
EXEC sp_executesql @sql;
GO
DROP TABLE #N, #L;
/*
Now again, this is a pretty complex piece of code, and perhaps it can be made easier with PIVOT. But I think even @Bluefeet would write a version of PIVOT that uses dynamic SQL, because there is just way too much to hard-code here IMHO.
*/