How to "loop" through dates in PostgreSQL

Say I have a query with a nested query inside of a where condition.
SELECT COUNT(id)
FROM table
WHERE create_date = date_trunc('month', current_timestamp)
and id NOT IN (
SELECT DISTINCT id
FROM some_table
WHERE date_trunc('month', create_date) = date_trunc('month', current_timestamp)
)
This query gets the metric for this month. However, what if I want it for all months?
I tried the following query, but it either never finishes or takes a very long time:
SELECT date_trunc('month', t.create_date), COUNT(id)
FROM table t
WHERE id NOT IN (
SELECT DISTINCT id
FROM some_table tt
WHERE date_trunc('month', tt.create_date)= date_trunc('month', t.create_date)
)
GROUP BY date_trunc('month', t.create_date)
I would like to run this query from the Postgres CLI (from the command line).
Any guidance on making this query more efficient or more logical would be appreciated!
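One possible way to get the per-month figure in a single statement is to write the exclusion as a correlated NOT EXISTS and group by the month. The sketch below is only an illustration under several assumptions: it runs against an in-memory SQLite database from Python rather than PostgreSQL, it uses strftime('%Y-%m', ...) as a stand-in for date_trunc('month', ...), and the table name main_table (the question's "table") and all sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; main_table stands in for the question's "table".
conn.executescript("""
    CREATE TABLE main_table (id INTEGER, create_date TEXT);
    CREATE TABLE some_table (id INTEGER, create_date TEXT);
    INSERT INTO main_table VALUES
        (1, '2023-01-05'), (2, '2023-01-20'), (3, '2023-02-02'), (4, '2023-02-15');
    INSERT INTO some_table VALUES
        (1, '2023-01-09'), (3, '2023-01-30'), (4, '2023-02-28');
""")

# Count, per month, the ids that have no row with the same id in some_table
# within that same month.
monthly_counts = conn.execute("""
    SELECT strftime('%Y-%m', t.create_date) AS month, COUNT(t.id) AS n
    FROM main_table AS t
    WHERE NOT EXISTS (
        SELECT 1
        FROM some_table AS tt
        WHERE tt.id = t.id
          AND strftime('%Y-%m', tt.create_date) = strftime('%Y-%m', t.create_date)
    )
    GROUP BY month
    ORDER BY month
""").fetchall()

print(monthly_counts)
```

In PostgreSQL the same shape would use date_trunc('month', ...) in place of strftime; whether it is faster than the NOT IN version depends on the data and indexes, so it is worth checking with EXPLAIN.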

Related

Calculate difference between the row counts of tables in two schemas in PostgreSQL

I have two tables with the same name in two different schemas (old and new dump). I would like to know the difference between the two integrations.
I have two queries that give the old and new counts:
select count(*) as count_old from(
SELECT
distinct id
FROM
schema1.compound)q1
select count(*) as count_new from(
SELECT
distinct id
FROM
schema2.compound)q2
I would like to have the following output:
table_name count_old count_new diff
compound 4740 4735 5
Any help is appreciated. Thanks in advance.
with counts as (
select
(select count(distinct id) from schema1.compound) as count_old,
(select count(distinct id) from schema2.compound) as count_new
)
select
'compound' as table_name,
count_old,
count_new,
count_old - count_new as diff
from counts;
I think you could do something like this:
SELECT 'compound' AS table_name, count_old, count_new, (count_old - count_new) AS diff FROM (
SELECT
(SELECT count(*) FROM (SELECT DISTINCT id FROM schema1.compound) AS q1) AS count_old,
(SELECT count(*) FROM (SELECT DISTINCT id FROM schema2.compound) AS q2) AS count_new
) AS counts;
It was probably answered already, but it is a subquery/nested query.
You can directly compute the COUNT on distinct values if you use the DISTINCT keyword inside your aggregation function. Then you can join the queries extracting your two needed values, and use them inside your query to get the output table.
WITH cte AS (
SELECT new.cnt AS count_new,
old.cnt AS count_old
FROM (SELECT COUNT(DISTINCT id) AS cnt FROM schema1.compound) AS old
INNER JOIN (SELECT COUNT(DISTINCT id) AS cnt FROM schema2.compound) AS new
ON 1 = 1
)
SELECT 'compound' AS table_name,
count_new,
count_old,
count_old - count_new AS diff
FROM cte
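To try the COUNT(DISTINCT ...) approach without access to the real dumps, here is a minimal sketch. It assumes SQLite as a stand-in for PostgreSQL, attaches two in-memory databases named schema1 and schema2 so that the qualified table names resolve, and uses invented ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attach two extra in-memory databases so schema1.compound and
# schema2.compound can be referenced as in the question (an assumption
# made for this demo; PostgreSQL schemas work differently).
conn.execute("ATTACH DATABASE ':memory:' AS schema1")
conn.execute("ATTACH DATABASE ':memory:' AS schema2")
conn.execute("CREATE TABLE schema1.compound (id INTEGER)")
conn.execute("CREATE TABLE schema2.compound (id INTEGER)")
conn.executemany("INSERT INTO schema1.compound VALUES (?)",
                 [(1,), (2,), (2,), (3,), (4,)])
conn.executemany("INSERT INTO schema2.compound VALUES (?)",
                 [(1,), (2,), (3,), (3,)])

# Both distinct counts are computed as scalar subqueries in one CTE row,
# then the difference is derived from them.
row = conn.execute("""
    WITH counts AS (
        SELECT
            (SELECT COUNT(DISTINCT id) FROM schema1.compound) AS count_old,
            (SELECT COUNT(DISTINCT id) FROM schema2.compound) AS count_new
    )
    SELECT 'compound' AS table_name, count_old, count_new,
           count_old - count_new AS diff
    FROM counts
""").fetchone()

print(row)
```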

Postgresql select date range between two tables

I have two tables that have date fields in them. I want to select data from table 1 where the date is +/- 1 day from any date in table 2.
Try something like this:
select * from table1,table2
where table1.date BETWEEN (table2.date - '1 day'::interval)
AND (table2.date + '1 day'::interval)
and ...
If only +/- 1 day, you could use a workaround like this:
select col1, col2, ...
from table1
where date_col in (select distinct date_col
from table2
union all
select distinct (date_col - '1 day'::interval)
from table2
union all
select distinct (date_col + '1 day'::interval)
from table2
);
This performs quite well because the subquery is evaluated only once and its result is cached for the comparison.
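A third variant, shown here only as a sketch, puts the range test inside EXISTS so that a table1 row is returned once even when it falls near several table2 dates (the plain join above would repeat it). The example assumes SQLite run from Python, dates stored as ISO-8601 text, SQLite's date(..., '+1 day') modifier in place of PostgreSQL interval arithmetic, and made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data with ISO-8601 date strings.
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, date_col TEXT);
    CREATE TABLE table2 (date_col TEXT);
    INSERT INTO table1 VALUES
        (1, '2024-03-01'), (2, '2024-03-04'), (3, '2024-03-10'), (4, '2024-03-11');
    INSERT INTO table2 VALUES ('2024-03-02'), ('2024-03-10');
""")

# Keep table1 rows whose date is within one day of any table2 date.
matches = conn.execute("""
    SELECT t1.id, t1.date_col
    FROM table1 AS t1
    WHERE EXISTS (
        SELECT 1
        FROM table2 AS t2
        WHERE t1.date_col BETWEEN date(t2.date_col, '-1 day')
                              AND date(t2.date_col, '+1 day')
    )
    ORDER BY t1.id
""").fetchall()

print(matches)
```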

Subsetting based on combinations from an inner query

I'm using postgres on Redshift. I have a query which goes like this:
SELECT EXTRACT(year from created_at) AS CustomYear,
client_ip,
member_id,
COUNT(*) AS Views
FROM ads.fbs_page_view_staging
WHERE member_id = 2
GROUP BY CustomYear,
client_ip,
member_id
HAVING COUNT(*) = 1
ORDER BY CustomYear
Here, I'm selecting a combination of client_ip and member_id where Views is 1. I would now like to take these combinations of client_ip and member_id and subset the entire table ads.fbs_page_view_staging having only such combinations.
If there was only one column I wanted to subset on, say client_ip, I could've written the following query and got the results:
SELECT EXTRACT(year FROM created_at) AS CustomYear,
COUNT(*)
FROM ads.fbs_page_view_staging
WHERE member_id = 2
AND client_ip IN (SELECT client_ip
FROM ((SELECT EXTRACT(year from created_at) AS CustomYear,
client_ip,
member_id,
COUNT(*)
FROM ads.fbs_page_view_staging
WHERE member_id = 2
GROUP BY CustomYear,
client_ip,
member_id
HAVING COUNT(*) = 1
ORDER BY CustomYear)))
GROUP BY customyear
ORDER BY customyear
Notice that in the outer query, I am subsetting based on client_ip. But how do I subset the table on a combination of columns?
Any help would be much appreciated.
Instead of subquerying, try joining directly to the results of your query. That way you can specify multiple criteria.
Here is (draft) SQL to select IP/member pairs that match the rows found by your sub-query (i.e. for some year in the past, there was only one view for that IP & member.)
SELECT distinct client_ip, member_id
FROM ads.fbs_page_view_staging Staging
INNER JOIN (SELECT EXTRACT(year from created_at) AS CustomYear,
client_ip,
member_id,
COUNT(*) AS Views
FROM ads.fbs_page_view_staging
WHERE member_id = 2
GROUP BY CustomYear,
client_ip,
member_id
HAVING COUNT(*) = 1) SingularViews
ON SingularViews.client_ip=Staging.client_ip
AND SingularViews.member_id=Staging.member_id
ORDER BY Staging.client_ip, Staging.member_id
I'm not certain I've captured the intent of your query correctly, but if not hopefully you can adapt the technique.
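The join technique can be checked on a small data set. The sketch below makes several assumptions: it uses SQLite from Python instead of Redshift, strftime('%Y', ...) instead of EXTRACT(year FROM ...), a hypothetical table page_views in place of ads.fbs_page_view_staging, and invented rows. The DISTINCT on the pair list is there so that a pair qualifying in more than one year does not multiply the joined rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# page_views is a hypothetical stand-in for ads.fbs_page_view_staging.
conn.executescript("""
    CREATE TABLE page_views (created_at TEXT, client_ip TEXT, member_id INTEGER);
    INSERT INTO page_views VALUES
        ('2014-02-01', '10.0.0.1', 2),
        ('2015-06-01', '10.0.0.1', 2),
        ('2015-07-01', '10.0.0.1', 2),
        ('2015-03-01', '10.0.0.2', 2),
        ('2015-04-01', '10.0.0.2', 2),
        ('2015-05-01', '10.0.0.3', 2);
""")

# Inner query: (client_ip, member_id) pairs with exactly one view in some year.
# Outer query: all rows of the table for those pairs, counted per year.
views_per_year = conn.execute("""
    SELECT strftime('%Y', s.created_at) AS custom_year, COUNT(*) AS views
    FROM page_views AS s
    JOIN (
        SELECT DISTINCT client_ip, member_id
        FROM (
            SELECT client_ip, member_id
            FROM page_views
            WHERE member_id = 2
            GROUP BY strftime('%Y', created_at), client_ip, member_id
            HAVING COUNT(*) = 1
        ) AS per_year
    ) AS single_views
      ON single_views.client_ip = s.client_ip
     AND single_views.member_id = s.member_id
    GROUP BY custom_year
    ORDER BY custom_year
""").fetchall()

print(views_per_year)
```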

Update same table by its maximum date

I have a table in which I have to find the maximum date for each unique EmpId and testId.
Below is the input table and expected output.
I tried a correlated subquery, but that didn't work.
Is there a quick way to update the table with the max date?
You can use a common-table-expression and the OVER clause with PARTITION BY:
WITH CTE AS
(
SELECT EmpId, [Hall Id], testId, Date, [Max date],
MaxDate = MAX(Date) OVER (PARTITION BY EmpId, testId)
FROM dbo.TableName
)
UPDATE CTE SET [Max date] = MaxDate
If you want to see what will happen first, replace the UPDATE statement with SELECT * FROM CTE.
You can use a CTE to select all maximum dates and join this with your original data like this:
WITH MaxDates AS (
SELECT empid
, testid
, MAX(Date) AS MaxDate
FROM table
GROUP BY empid
, testid
)
SELECT table.*
, MaxDate
FROM table
INNER JOIN MaxDates ON table.empid = MaxDates.empid AND table.testid = MaxDates.testid
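For comparison, here is a rough sketch of the update written with a correlated subquery rather than the window function or join shown above. It is not SQL Server code: it assumes SQLite run from Python, and the table name results, the column names emp_id, test_id, test_date, max_date, and the rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows; max_date starts out empty.
conn.executescript("""
    CREATE TABLE results (emp_id INTEGER, test_id INTEGER,
                          test_date TEXT, max_date TEXT);
    INSERT INTO results (emp_id, test_id, test_date) VALUES
        (1, 10, '2020-01-01'),
        (1, 10, '2020-03-01'),
        (1, 20, '2020-02-01'),
        (2, 10, '2020-05-01'),
        (2, 10, '2020-04-01');
""")

# For every row, look up the latest test_date sharing its emp_id and test_id.
conn.execute("""
    UPDATE results
    SET max_date = (
        SELECT MAX(r2.test_date)
        FROM results AS r2
        WHERE r2.emp_id = results.emp_id
          AND r2.test_id = results.test_id
    )
""")

updated = conn.execute("""
    SELECT emp_id, test_id, test_date, max_date
    FROM results
    ORDER BY emp_id, test_id, test_date
""").fetchall()

for r in updated:
    print(r)
```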

SQL Server SUM() for DISTINCT records

I have a field called "Users", and I want to run SUM() on that field that returns the sum of all DISTINCT records. I thought that this would work:
SELECT SUM(DISTINCT table_name.users)
FROM table_name
But it's not selecting DISTINCT records, it's just running as if I had run SUM(table_name.users).
What would I have to do to add only the distinct records from this field?
Use count()
SELECT count(DISTINCT table_name.users)
FROM table_name
This code seems to indicate sum(distinct ) and sum() return different values.
with t as (
select 1 as a
union all
select '1'
union all
select '2'
union all
select '4'
)
select sum(distinct a) as DistinctSum, sum(a) as allSum, count(distinct a) as distinctCount, count(a) as allCount from t
Do you actually have non-distinct values?
select count(1), users
from table_name
group by users
having count(1) > 1
If not, the sums will be identical.
You can see for yourself that distinct works with the following example. Here I create a subquery with duplicate values, then I do a sum distinct on those values.
select DistinctSum=sum(distinct x), RegularSum=Sum(x)
from
(
select x=1
union All
select 1
union All
select 2
union All
select 2
) x
You can see that the distinct sum column returns 3 and the regular sum returns 6 in this example.
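The same check can be reproduced outside SQL Server. The sketch below assumes SQLite run from Python (where SUM(DISTINCT ...) is also supported) and loads the four values from the example into a table named table_name with a users column, as in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (users INTEGER)")
# Same values as the example above: 1, 1, 2, 2.
conn.executemany("INSERT INTO table_name VALUES (?)", [(1,), (1,), (2,), (2,)])

# SUM(DISTINCT ...) adds each distinct value once; SUM(...) adds every row.
distinct_sum, regular_sum = conn.execute(
    "SELECT SUM(DISTINCT users), SUM(users) FROM table_name"
).fetchone()

# Equivalent formulation: de-duplicate in a derived table, then sum.
subquery_sum = conn.execute(
    "SELECT SUM(users) FROM (SELECT DISTINCT users FROM table_name) AS d"
).fetchone()[0]

print(distinct_sum, regular_sum, subquery_sum)
```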
You can use a sub-query:
select sum(users)
from (select distinct users from table_name) as t;
SUM(DISTINCTROW table_name.something)
It worked for me (innodb).
Description - "DISTINCTROW omits data based on entire duplicate records, not just duplicate fields." http://office.microsoft.com/en-001/access-help/all-distinct-distinctrow-top-predicates-HA001231351.aspx
;WITH cte
as
(
SELECT table_name.users , rn = ROW_NUMBER() OVER (PARTITION BY users ORDER BY users)
FROM table_name
)
SELECT SUM(users)
FROM cte
WHERE rn = 1
TEST
DECLARE @table_name TABLE (Users INT);
INSERT INTO @table_name VALUES (1),(1),(1),(3),(3),(5),(5);
;WITH cte
as
(
SELECT users , rn = ROW_NUMBER() OVER (PARTITION BY users ORDER BY users)
FROM @table_name
)
SELECT SUM(users) DisSum
FROM cte
WHERE rn = 1
Result
DisSum
9
If circumstances make it difficult to weave a "distinct" into the sum clause, it will usually be possible to add an extra "where" clause to the entire query - something like:
select sum(t.ColToSum)
from SomeTable t
where (select count(*) from SomeTable t1 where t1.ColToSum = t.ColToSum and t1.ID < t.ID) = 0
This may be a duplicate of "Trying to sum distinct values SQL".
As per Declan_K's answer:
Get the distinct list first...
SELECT SUM(SQ.COST)
FROM
(SELECT DISTINCT [Tracking #] as TRACK,[Ship Cost] as COST FROM YourTable) SQ