Add date for each ID in PostgreSQL

I have a table "Users" that looks like this:
id name
1 Johny
2 Michael
3 Jony
I want to add a new column called date, with these values:
date
2021-01-01
2021-02-01
but I want every date repeated for each id:
id name date
1 Johny 2021-01-01
1 Johny 2021-02-01
2 Michael 2021-01-01
2 Michael 2021-02-01
3 Jony 2021-01-01
3 Jony 2021-02-01
How can I do this?

It looks like you want a cross join between the existing users table and either another table or a derived (pseudo) table.
Here's how to do it with a derived table (aliased as d):
select u.id, u.name, d.date
from
users u
cross join
(
select TO_DATE('2021-01-01', 'YYYY-MM-DD') as date
union
select TO_DATE('2021-02-01', 'YYYY-MM-DD')
) as d;
This produces every combination of rows from the two tables (#users x #dates), i.e. their Cartesian product.
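As a quick way to check the row count, here is a minimal sketch that runs the same cross join in an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for PostgreSQL here (an assumption for illustration), so the dates are plain text literals rather than TO_DATE calls.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER, name TEXT);
    INSERT INTO users VALUES (1, 'Johny'), (2, 'Michael'), (3, 'Jony');
""")

# Cross join every user with a two-row derived table of dates.
rows = conn.execute("""
    SELECT u.id, u.name, d.date
    FROM users u
    CROSS JOIN (
        SELECT '2021-01-01' AS date
        UNION ALL
        SELECT '2021-02-01'
    ) AS d
    ORDER BY u.id, d.date
""").fetchall()

for row in rows:
    print(row)
```

With 3 users and 2 dates the result has 3 x 2 = 6 rows, matching the table in the question.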

Related

How to Shorten Execution Time for A View

I have 3 tables, a user table, an admin table, and a cust table. Both admin and cust tables are foreign keyed to the user_account table. Basically, every user has a user record, and the type of user they are is determined by if they have a record in the admin or the cust table.
user         admin                    cust
user_id      user_id  | admin_id      user_id  | cust_id
---------    ---------|----------     ---------|---------
1            1        | a             2        | dd
2            4        | b             3        | ff
3
4
Then I have a login_history table that records the user_id and login timestamp every time a user logs into the app
login_history
user_id | login_on
---------|---------------------
1 | 2022-01-01 13:22:43
1 | 2022-01-02 16:16:27
3 | 2022-01-05 21:17:52
2 | 2022-01-11 11:12:26
3 | 2022-01-12 03:34:47
I would like to create a view that lists the first day of each week of the year, counting weeks from January 1st, together with the number of distinct admin users and the number of distinct cust users that logged in during that week. The resulting view should contain the following 53 records, one per week.
login_counts_view
week_start_date | admin_count | cust_count
-----------------|-------------|------------
2022-01-01 | 1 | 1
2022-01-08 | 0 | 2
2022-01-15 | 0 | 0
.
.
.
2022-12-31 | 0 | 0
Note that the first week (2022-01-01) only has 1 count for admin_count even though the admin with user_id 1 logged in twice that week.
Below is the current query I have for the view. However, the tables are pretty large and it takes over 10 seconds to retrieve all records from the view, mainly because of the left joined date comparisons.
CREATE VIEW login_counts_view AS
SELECT
week_start_dates.week_start_date::text AS week_start_date,
count(distinct a.user_id) AS admin_count,
count(distinct c.user_id) AS cust_count
FROM (
SELECT
to_char(i::date, 'YYYY-MM-DD') AS week_start_date
FROM
generate_series(date_trunc('year', NOW()), to_char(NOW(), 'YYYY-12-31')::date, '1 week') i
) week_start_dates
LEFT JOIN login_history l ON l.login_on::date BETWEEN week_start_dates.week_start_date::date AND (week_start_dates.week_start_date::date + INTERVAL '6 day')::date
LEFT JOIN admin a ON a.user_id = l.user_id
LEFT JOIN cust c ON c.user_id = l.user_id
GROUP BY week_start_date;
Does anyone have any tips as to how to make this query execute more efficiently?
Idea
Compute the pseudo-week of each login date: partition the year into 7-day slices and number them consecutively. The pseudo-week of a given date would be the ordinal number of the slice it falls into.
Then join on the integer pseudo-week numbers instead of comparing date ranges.
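The arithmetic behind the pseudo-week can be sketched in a few lines of Python. This is only an illustration of the idea, not part of the view; the helper name pseudo_week and the fixed year start of 2022-01-01 are assumptions chosen to match the sample data in the question.

```python
from datetime import datetime

SECONDS_PER_WEEK = 7 * 24 * 3600

def pseudo_week(login_on: datetime, year_start: datetime) -> int:
    """Zero-based index of the 7-day slice, counted from year_start, containing login_on."""
    delta_seconds = int((login_on - year_start).total_seconds())
    # Python's // floors, so a login before year_start gets a negative index.
    # SQL integer division truncates toward zero instead.
    return delta_seconds // SECONDS_PER_WEEK

year_start = datetime(2022, 1, 1)  # assumed base, matching the sample data
logins = [
    (1, datetime(2022, 1, 1, 13, 22, 43)),
    (1, datetime(2022, 1, 2, 16, 16, 27)),
    (3, datetime(2022, 1, 5, 21, 17, 52)),
    (2, datetime(2022, 1, 11, 11, 12, 26)),
    (3, datetime(2022, 1, 12, 3, 34, 47)),
]

weeks = [(user_id, pseudo_week(ts, year_start)) for user_id, ts in logins]
print(weeks)
```

The first three logins fall into slice 0 (the week starting 2022-01-01) and the last two into slice 1 (the week starting 2022-01-08), which agrees with the expected counts in the question.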
Implementation
A view to implement this follows:
CREATE VIEW login_counts_view_fast AS
WITH RECURSIVE Numbers(i) AS ( SELECT 0 UNION ALL SELECT i + 1 FROM Numbers WHERE i < 52 )
SELECT CAST ( date_trunc('year', NOW()) AS DATE) + 7 * n.i week_start_date
, count(distinct lw.admin_id) admin_count
, count(distinct lw.cust_id) cust_count
FROM (
SELECT i FROM Numbers
) n
LEFT JOIN (
SELECT admin_id
, cust_id
, base
, pit
, pit-base delta
, (pit-base) / (3600 * 24 * 7) week
FROM (
SELECT a.user_id admin_id
, c.user_id cust_id
, CAST ( EXTRACT ( EPOCH FROM l.login_on ) AS INTEGER ) pit
, CAST ( EXTRACT ( EPOCH FROM date_trunc('year', NOW()) ) AS INTEGER ) base
FROM login_history l
LEFT JOIN admin a ON a.user_id = l.user_id
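LEFT JOIN cust c ON c.user_id = l.user_id
-- Skip logins from earlier years: integer division truncates toward zero,
-- so logins from the last days of December would otherwise land in week 0.
WHERE l.login_on >= date_trunc('year', NOW())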
) le
) lw
ON lw.week = n.i
GROUP BY n.i
;
Some remarks:
The epoch values are the number of seconds elapsed since a fixed reference point (1970-01-01 00:00:00 UTC).
The CASTs convert the result of EXTRACT to an integer and the timestamp to a date, as the PostgreSQL date functions and operators require, and they force integer arithmetic in the week computation.
The recursive subquery generates the consecutive integers 0 to 52. It could possibly be replaced by a generate_series(0, 52) call (untested).
Evaluation
The query plan indicates savings of 50-70% in execution time.

How to Outer Join a Calendar table to view dates with 0 records

I have a table of customer orders and a calendar table with dates from Jan 2022 spanning 10 years. I want the number of customers for each of the last 28 days, including days with 0 customers recorded. So I need to outer join the calendar table to the order records, but I can't get the outer join to work correctly.
Here's what I tried:
SELECT order_date as 'date', COUNT(orderstatus) as 'customers'
FROM orders
RIGHT OUTER JOIN calendar ON
calendar.date = orders.order_date
WHERE sellerid = 11
I'm getting:
date customers
2022-01-02 9
I wanted to see:
date customers
2022-01-01 0
2022-01-02 9
2022-01-03 0
.
.
.
You would not get the results that you posted in your question unless you group by date, so I guess you missed that part of your code.
You need a WHERE clause to filter the calendar's rows for the last 28 days and you must move the condition sellerid = 11 to the ON clause:
SELECT c.date,
COUNT(o.order_date) customers
FROM calendar c LEFT JOIN orders o
ON o.sellerid = 11 AND o.order_date = c.date
WHERE c.date BETWEEN CURRENT_DATE - INTERVAL 28 DAY AND CURRENT_DATE
GROUP BY c.date;
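To see why the seller filter belongs in the ON clause, here is a small sketch that compares both placements in an in-memory SQLite database via Python's sqlite3 module. SQLite and the three-day sample data are assumptions for illustration only, and the 28-day window is left out to keep the comparison focused on the filter.

```python
import sqlite3

# In-memory SQLite database with made-up sample rows (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calendar (date TEXT);
    CREATE TABLE orders (order_date TEXT, sellerid INTEGER);
    INSERT INTO calendar VALUES ('2022-01-01'), ('2022-01-02'), ('2022-01-03');
    INSERT INTO orders VALUES ('2022-01-02', 11), ('2022-01-02', 11), ('2022-01-02', 12);
""")

# Filter in WHERE: calendar days without orders have o.sellerid = NULL,
# so they are discarded and the outer join behaves like an inner join.
filter_in_where = conn.execute("""
    SELECT c.date, COUNT(o.order_date)
    FROM calendar c LEFT JOIN orders o ON o.order_date = c.date
    WHERE o.sellerid = 11
    GROUP BY c.date ORDER BY c.date
""").fetchall()

# Filter in ON: unmatched calendar days are kept and counted as 0.
filter_in_on = conn.execute("""
    SELECT c.date, COUNT(o.order_date)
    FROM calendar c LEFT JOIN orders o
      ON o.sellerid = 11 AND o.order_date = c.date
    GROUP BY c.date ORDER BY c.date
""").fetchall()

print(filter_in_where)
print(filter_in_on)
```

The first query returns only the day that has orders for seller 11, while the second returns all three calendar days, with 0 for the days without orders.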

Getting attendance of an employee with a date series in a particular range in Postgres

I have an attendance table with employee_id, date and punch-in time.
Emp_Id PunchTime
101 10/10/2016 07:15
101 10/10/2016 12:20
101 10/10/2016 12:50
101 10/10/2016 16:31
102 10/10/2016 07:15
The table only has rows for working days. I want the attendance list of an employee for every date in a given period, including the day name. The result should look like this:
date | day |employee_id | Intime | outtime |
2016-10-09 | sunday | 101 | | |
2016-10-10 | monday | 101 | 2016-10-10 7:15AM |2016-10-10 4:31 PM |
You can generate a list of dates and then do an outer join on them:
The following displays all days in October:
select d.date, a.emp_id,
min(punchtime) as intime,
max(punchtime) as outtime
from generate_series(date '2016-10-01', date '2016-11-01' - 1, interval '1' day) as d (date)
left join attendance a on d.date = a.punchtime::date
group by d.date, a.emp_id
order by d.date, a.emp_id;
Since you want the first and last timestamp of each day, a simple GROUP BY query with min() and max() is enough.
This will, however, not repeat the emp_id for days that have no attendance rows.
Something like the following will generate a list of the range of dates (starting and ending with whatever range is found in your punchtime table), with employees and intime, outtime for each. Check the SQL fiddle here:
http://sqlfiddle.com/#!15/d93bd/1
WITH RECURSIVE minmax AS
(
SELECT MIN(CAST(time AS DATE)) AS min, MAX(CAST(time as DATE)) AS max
FROM emp_time
),
dates AS
(
-- Seed the recursion with a single row: the earliest date.
SELECT m.min as datepart
FROM minmax m
UNION ALL
SELECT d.datepart + 1 FROM dates d, minmax mm
WHERE d.datepart + 1 <= mm.max
)
SELECT d.datepart as date, e.emp, MIN(e.time) as intime, MAX(e.time) as outtime FROM dates d
LEFT JOIN emp_time e ON d.datepart = CAST(e.time as DATE)
GROUP BY d.datepart, e.emp
ORDER BY d.datepart;

Checking missing hours for every id in a table

I have a table with a column of ids (id_code) and a transaction time (time). I need to find, for each id, the hours between two dates in which no transaction took place. Say I need to check the missing hours for id 1 and id 2 in the table below between 2014-06-13 12:00:00 and 2014-06-13 14:59:59: the desired result is that id 1 has no transactions in the hour starting 2014-06-13 13:00:00 and id 2 has none in the hour starting 2014-06-13 14:00:00.
id_code | time
1 | 2014-06-13 12:23:12
2 | 2014-06-13 12:27:23
1 | 2014-06-13 12:56:21
2 | 2014-06-13 13:34:12
1 | 2014-06-13 14:23:56
I am using PostgreSQL 9.3
SQL Fiddle
select c.id, d.time
from
(
select distinct id
from t
) c
cross join
generate_series (
(select date_trunc('hour', min(t.time)) from t),
(select date_trunc('hour', max(t.time)) from t),
interval '1 hour'
) d(time)
left join
(
select id, date_trunc('hour', t.time) as time
from t
group by id, 2
) t on t.time = d.time and c.id = t.id
where t.time is null
order by c.id, d.time
The generate_series will build a set of all possible hours. The cross join will make that a matrix of all possible ids of all possible hours. Then the t.time is null condition will filter those id x hours that do not exist.
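The same "all ids x all hours, minus the pairs that exist" logic can be sketched in plain Python with the sample rows from the question. The helper name truncate_to_hour is hypothetical; it plays the role of date_trunc('hour', ...).

```python
from datetime import datetime, timedelta
from itertools import product

# Sample rows from the question: (id_code, time).
transactions = [
    (1, datetime(2014, 6, 13, 12, 23, 12)),
    (2, datetime(2014, 6, 13, 12, 27, 23)),
    (1, datetime(2014, 6, 13, 12, 56, 21)),
    (2, datetime(2014, 6, 13, 13, 34, 12)),
    (1, datetime(2014, 6, 13, 14, 23, 56)),
]

def truncate_to_hour(ts: datetime) -> datetime:
    """Stand-in for date_trunc('hour', ts)."""
    return ts.replace(minute=0, second=0, microsecond=0)

# (id, hour) pairs that actually occur.
seen = {(id_code, truncate_to_hour(ts)) for id_code, ts in transactions}
ids = sorted({id_code for id_code, _ in seen})

# Every hour between the first and last observed hour (the generate_series part).
first = min(hour for _, hour in seen)
last = max(hour for _, hour in seen)
hours = []
current = first
while current <= last:
    hours.append(current)
    current += timedelta(hours=1)

# Cross join ids with hours and keep the pairs that never occur.
missing = [(i, h) for i, h in product(ids, hours) if (i, h) not in seen]
for id_code, hour in missing:
    print(id_code, hour)
```

For the sample data this reports 13:00 for id 1 and 14:00 for id 2, the result the question asks for.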
SELECT DISTINCT id, h FROM t, generate_series('2014-06-13 12:00:00'::timestamp, '2014-06-13 14:59:59'::timestamp, '1 hour') h
EXCEPT
SELECT id, date_trunc('hour', time) FROM t
Thanks to Clodoaldo Neto for providing a useful SQL Fiddle page for testing!

Compare interval date by row

I am trying to group dates that fall within a 1 year interval for a given identifier, labeling the earliest and the latest date of each group. If no other dates fall within a 1 year interval of a date, it is recorded as both the first and the last date. For example, the original data is:
id | date
____________
a | 1/1/2000
a | 1/2/2001
a | 1/6/2000
b | 1/3/2001
b | 1/3/2000
b | 1/3/1999
c | 1/1/2000
c | 1/1/2002
c | 1/1/2003
And the output I want is:
id | first_date | last_date
___________________________
a | 1/1/2000 | 1/2/2001
b | 1/3/1999 | 1/3/2001
c | 1/1/2000 | 1/1/2000
c | 1/1/2002 | 1/1/2003
I have been trying to figure this out the whole day without success. I can do it for ids with only 2 rows, but not for more. Any help would be great.
SELECT id
, min(min_date) AS min_date
, max(max_date) AS max_date
, sum(row_ct) AS row_ct
FROM (
SELECT id, year, min_date, max_date, row_ct
, year - row_number() OVER (PARTITION BY id ORDER BY year) AS grp
FROM (
SELECT id
, extract(year FROM the_date)::int AS year
, min(the_date) AS min_date
, max(the_date) AS max_date
, count(*) AS row_ct
FROM tbl
GROUP BY id, year
) sub1
) sub2
GROUP BY id, grp
ORDER BY id, grp;
1) Group all rows per (id, year), in subquery sub1. Record min and max of the date. I added a count of rows (row_ct) for demonstration.
2) Subtract the row_number() from the year in the second subquery sub2. Thus, all rows in succession end up in the same group (grp). A gap in the years starts a new group.
3) In the final SELECT, group a second time, this time by (id, grp), and record min, max and row count again. Voilà. This produces the result you are looking for.
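The three steps can be traced in plain Python on the sample data from the question. This is a sketch of the same "year minus row number" trick, not a replacement for the query; it assumes the dates are written month/day/year, and like the query it works on calendar years rather than exact 365-day distances.

```python
from datetime import date
from itertools import groupby

# Sample rows from the question, read as month/day/year (assumption).
rows = [
    ("a", date(2000, 1, 1)), ("a", date(2001, 1, 2)), ("a", date(2000, 1, 6)),
    ("b", date(2001, 1, 3)), ("b", date(2000, 1, 3)), ("b", date(1999, 1, 3)),
    ("c", date(2000, 1, 1)), ("c", date(2002, 1, 1)), ("c", date(2003, 1, 1)),
]

# Step 1: min and max date per (id, year).
per_year = {}
for id_, d in rows:
    lo, hi = per_year.get((id_, d.year), (d, d))
    per_year[(id_, d.year)] = (min(lo, d), max(hi, d))

# Steps 2 and 3: within each id, year - row_number is constant for
# consecutive years, so it identifies each unbroken run of years.
islands = []
for id_, keys in groupby(sorted(per_year), key=lambda k: k[0]):
    numbered = [(year - n, year) for n, (_, year) in enumerate(keys, start=1)]
    for _, run in groupby(numbered, key=lambda t: t[0]):
        years = [year for _, year in run]
        first_date = per_year[(id_, years[0])][0]
        last_date = per_year[(id_, years[-1])][1]
        islands.append((id_, first_date, last_date))

for island in islands:
    print(island)
```

For id c the year 2000 forms a group of its own, while 2002 and 2003 are consecutive and end up in one group, as in the desired output.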
Related answers:
Return array of years as year ranges
Group by repeating attribute
select id, min ([date]) first_date, max([date]) last_date
from <yourTbl> group by id
Use this:
SELECT id,
min(date) AS first_date,
max(date) AS last_date
FROM mytable
GROUP BY 1
ORDER BY 1