PostgreSQL: Looping through a date_trunc generated group

I've got some records in my database that have a "createdAt" timestamp.
What I'm trying to get out of PostgreSQL is those records grouped by "createdAt".
So far I've got this query:
SELECT date_trunc('day', "createdAt") FROM goal GROUP BY 1
Which gives me:
+-----------------+
| date_trunc      |
+-----------------+
| Sep 20 00:00:00 |
+-----------------+
These are the days on which the records were created.
My question is: is there any way to generate something like this:
| Sep 20 00:00:00 |

| id | name        | gender | state | age |
|----|-------------|--------|-------|-----|
| 1  | John Kenedy | male   | NY    | 32  |

| Sep 24 00:00:00 |

| id | name        | gender | state | age |
|----|-------------|--------|-------|-----|
| 1  | John Kenedy | male   | NY    | 32  |
| 2  | John De     | male   | NY    | 32  |
That is, can I group by date_trunc and select all the columns of those rows?
Thanks a lot!

Please try SELECT date_trunc('day', "createdAt"), name, gender, state, age FROM goal GROUP BY 1, 2, 3, 4, 5 (every selected column must appear in the GROUP BY, otherwise Postgres raises an error). It will not produce the structure you expect, but it will "group by date_trunc and select all the columns".
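If you need each day's rows kept together in a single result, another option is to aggregate them. A minimal sketch, assuming the goal table has exactly the columns shown above:
-- json_agg collapses each day's rows into one JSON array, which is the
-- closest SQL equivalent of "a date header followed by its rows"
SELECT date_trunc('day', "createdAt") AS day,
       json_agg(json_build_object('id', id, 'name', name, 'gender', gender,
                                  'state', state, 'age', age) ORDER BY id) AS records
FROM goal
GROUP BY 1
ORDER BY 1;
Alternatively, simply ORDER BY date_trunc('day', "createdAt"), id and let the application start a new group whenever the day changes.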

Related

Create a pivot table for Month over Month variation

I have these records returned from a query
+---------+--------------+---------+----------+
| Country | other fields | sales   | date     |
+---------+--------------+---------+----------+
| US      | 1            | $100.00 | 01/01/21 |
| CA      | 1            | $100.00 | 01/01/21 |
| UK      | 1            | $100.00 | 01/01/21 |
| FR      | 1            | $100.00 | 01/01/21 |
| US      | 1            | $200.00 | 01/02/21 |
| CA      | 1            | $200.00 | 01/02/21 |
| UK      | 1            | $200.00 | 01/02/21 |
| FR      | 1            | $200.00 | 01/02/21 |
+---------+--------------+---------+----------+
And I want to show the sales variation from one month to the previous, like this:
+---------+----------+----------+------+
| Country | 01/02/21 | 01/01/21 | Var% |
+---------+----------+----------+------+
| US      | $200.00  | $100.00  | 100% |
| CA      | $200.00  | $100.00  | 100% |
| FR      | $200.00  | $100.00  | 100% |
+---------+----------+----------+------+
How could this be done with a Postgres query?
If you are always comparing only two months:
select country
     , sum(sales) filter (where date = '01/01/21') as month1
     , sum(sales) filter (where date = '01/02/21') as month2
     , ((sum(sales) filter (where date = '01/02/21')
         / sum(sales) filter (where date = '01/01/21')) - 1) * 100 as var
from tablename
where date in ('01/01/21', '01/02/21')
group by country
You can also look at crosstab from the tablefunc extension, which does basically the same as the query above.
CREATE EXTENSION IF NOT EXISTS tablefunc;
select *
     , (("01/02/21" / "01/01/21") - 1) * 100 as var
from (
  -- crosstab requires the source rows to be ordered by the row key
  select * from crosstab('select country, date, sales from tablename order by 1, 2')
  as ct(country varchar(2), "01/01/21" money, "01/02/21" money)
) t
For more info about crosstab, see the tablefunc documentation.
But if you want to show dates in rows instead of columns, you can easily generalize it for all dates:
select *
     , ((sales / LAG(sales, 1, 1) over (partition by country order by date)) - 1) * 100 as var
from tablename

How to get non-aggregated measures?

I calculate my metrics with SQL and publish the resulting table to Tableau Server. Afterward, I use this data source to create charts and dashboards.
For one analysis, I have already calculated the measures per day with SQL. When I use the resulting table in Tableau, it aggregates these measures with SUM by default. However, I don't want the SUM or AVG of the averages, or the SUM of the percentiles.
What I want is the result I get when I don't select the date dimension and don't GROUP BY date in SQL, as shown below.
Here is the query:
SELECT
-- date,
COUNT(DISTINCT id) AS count_of_id,
AVG(timediff_in_sec) AS avg_timediff,
PERCENTILE_CONT(0.25) WITHIN GROUP(ORDER BY timediff_in_sec) AS percentile_25,
PERCENTILE_CONT(0.50) WITHIN GROUP(ORDER BY timediff_in_sec) AS percentile_50
FROM
(
--subquery
) AS t1
-- GROUP BY date
Here are the first 10 rows of the resulting table:
+------------+--------------+-------------+---------------+---------------+
| date | avg_timediff | count_of_id | percentile_25 | percentile_50 |
+------------+--------------+-------------+---------------+---------------+
| 10/06/2020 | 61,65186364 | 22 | 8,5765 | 13,3015 |
| 11/06/2020 | 127,2913333 | 3 | 15,6045 | 17,494 |
| 12/06/2020 | 306,0348214 | 28 | 12,2565 | 17,629 |
| 13/06/2020 | 13,2664 | 5 | 11,944 | 13,862 |
| 14/06/2020 | 16,728 | 7 | 14,021 | 17,187 |
| 15/06/2020 | 398,6424595 | 37 | 11,893 | 19,271 |
| 16/06/2020 | 293,6925152 | 33 | 12,527 | 17,134 |
| 17/06/2020 | 155,6554286 | 21 | 13,452 | 16,715 |
| 18/06/2020 | 383,8101429 | 7 | 266,048 | 493,722 |
+------------+--------------+-------------+---------------+---------------+
How can I achieve the desired output above?
Drag them all into the dimensions list; then they will be static dimensions. For your use case, you could also just drag the Date field to Rows. Aggregating one value, which is what you have for each date, returns the same value whatever the aggregation type.
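If you would rather have the database return the overall row as well, GROUPING SETS can do it in one pass. A sketch reusing the query above: the (date) set produces the per-day rows, and the empty set () produces a single overall row with date set to NULL, which Tableau can use as-is.
SELECT
date,
COUNT(DISTINCT id) AS count_of_id,
AVG(timediff_in_sec) AS avg_timediff,
PERCENTILE_CONT(0.25) WITHIN GROUP(ORDER BY timediff_in_sec) AS percentile_25,
PERCENTILE_CONT(0.50) WITHIN GROUP(ORDER BY timediff_in_sec) AS percentile_50
FROM
(
--subquery
) AS t1
GROUP BY GROUPING SETS ((date), ())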

SUM OVER PARTITION ON Date range

I'm trying to do a cumulative sum over specific periods of time for every row in Postgres, for example:
+------------+-------+----------+
| Date       | Value | Employee |
+------------+-------+----------+
| 25-01-1990 | 34    | Aaron    |
| 15-02-1990 | 4     | Aaron    |
| 02-03-1990 | 3     | Aaron    |
| 22-05-1990 | 7     | Aaron    |
+------------+-------+----------+
Expected result, taking a range of 60 days:
+------------+-------+----------+
| Date       | Value | Employee |
+------------+-------+----------+
| 25-01-1990 | 34    | Aaron    |
| 15-02-1990 | 38    | Aaron    |
| 02-03-1990 | 41    | Aaron    |
| 01-05-1990 | 10    | Aaron    |
+------------+-------+----------+
I tried the following, but the results are not correct:
WITH tab AS (SELECT * FROM table_with_values)
SELECT tab.Date, SUM(tab.Value)
FILTER (WHERE tab.Date <= tab.Date AND tab.Date >= tab.Date - INTERVAL '60 DAY')
OVER (PARTITION BY tab.Employee ORDER BY tab.Date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS values_cumulative, tab.Employee
FROM tab
Try this:
SELECT date, employee, sum(bvalue)
FROM (
SELECT a.*, b.date as bdate, b.value as bvalue
FROM testtable a
LEFT JOIN testtable b ON
a.employee = b.employee AND
b.date <= a.date AND
b.date >= a.date - integer '60') c
GROUP BY employee, date
ORDER BY date ASC;
date | employee | sum
------------+----------+-----
1990-01-25 | Aaron | 34
1990-02-15 | Aaron | 38
1990-03-02 | Aaron | 41
1990-05-01 | Aaron | 10
(4 rows)
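On PostgreSQL 11 or later, a window frame with an interval offset expresses the same rolling sum without the self-join. A sketch, assuming the same testtable as above:
-- RANGE keeps, per employee, every row whose date falls within the
-- 60 days up to and including the current row's date
SELECT date, employee,
       SUM(value) OVER (PARTITION BY employee ORDER BY date
                        RANGE BETWEEN INTERVAL '60 days' PRECEDING AND CURRENT ROW)
       AS values_cumulative
FROM testtable
ORDER BY date;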

PostgreSQL Crosstab generate_series of weeks for columns

From a table of "time entries" I'm trying to create a report of weekly totals for each user.
Sample of the table:
+-----+---------+-------------------------+--------------+
| id | user_id | start_time | hours_worked |
+-----+---------+-------------------------+--------------+
| 997 | 6 | 2018-01-01 03:05:00 UTC | 1.0 |
| 996 | 6 | 2017-12-01 05:05:00 UTC | 1.0 |
| 998 | 6 | 2017-12-01 05:05:00 UTC | 1.5 |
| 999 | 20 | 2017-11-15 19:00:00 UTC | 1.0 |
| 995 | 6 | 2017-11-11 20:47:42 UTC | 0.04 |
+-----+---------+-------------------------+--------------+
Right now I can run the following and basically get what I need
SELECT COALESCE(SUM(time_entries.hours_worked), 0) AS total,
       time_entries.user_id,
       week::date
-- Using generate_series here to account for weeks with no time entries
-- when doing the join
FROM generate_series(DATE_TRUNC('week', '2017-11-01 00:00:00'::date),
                     DATE_TRUNC('week', '2017-12-31 23:59:59.999999'::date),
                     INTERVAL '7 day') AS week
LEFT JOIN time_entries
  ON DATE_TRUNC('week', time_entries.start_time) = week
GROUP BY week, time_entries.user_id
ORDER BY week
This will return
+-------+---------+------------+
| total | user_id | week |
+-------+---------+------------+
| 14.08 | 5 | 2017-10-30 |
| 21.92 | 6 | 2017-10-30 |
| 10.92 | 7 | 2017-10-30 |
| 14.26 | 8 | 2017-10-30 |
| 14.78 | 10 | 2017-10-30 |
| 14.08 | 13 | 2017-10-30 |
| 15.83 | 15 | 2017-10-30 |
| 8.75 | 5 | 2017-11-06 |
| 10.53 | 6 | 2017-11-06 |
| 13.73 | 7 | 2017-11-06 |
| 14.26 | 8 | 2017-11-06 |
| 19.45 | 10 | 2017-11-06 |
| 15.95 | 13 | 2017-11-06 |
| 14.16 | 15 | 2017-11-06 |
| 1.00 | 20 | 2017-11-13 |
| 0 | | 2017-11-20 |
| 2.50 | 6 | 2017-11-27 |
| 0 | | 2017-12-04 |
| 0 | | 2017-12-11 |
| 0 | | 2017-12-18 |
| 0 | | 2017-12-25 |
+-------+---------+------------+
However, this is difficult to parse, particularly when there's no data for a week. What I would like is a pivot or crosstab table where the weeks are the columns and the rows are the users, and that includes the nulls from each side (for instance, when a user had no entries in a week, or a week had no entries from any user).
Something like this
+---------+---------------+--------------+--------------+
| user_id | 2017-10-30 | 2017-11-06 | 2017-11-13 |
+---------+---------------+--------------+--------------+
| 6 | 4.0 | 1.0 | 0 |
| 7 | 4.0 | 1.0 | 0 |
| 8 | 4.0 | 0 | 0 |
| 9 | 0 | 1.0 | 0 |
| 10 | 4.0 | 0.04 | 0 |
+---------+---------------+--------------+--------------+
I've been looking around online and it seems that "dynamically" generating a list of columns for crosstab is difficult. I'd rather not hard-code them, which seems weird to do anyway for dates. Or use something like this case with week number.
Should I look for another solution besides crosstab? If I could get the series of weeks for each user, including all the nulls, I think that would be good enough. It just seems that right now my join strategy isn't returning that.
Personally I would use a Date Dimension table and use that table as the basis for the query. I find it far easier to use tabular data for these types of calculations, as it leads to SQL that's easier to read and maintain. There's a great article on creating a Date Dimension table in PostgreSQL at https://medium.com/@duffn/creating-a-date-dimension-table-in-postgresql-af3f8e2941ac, though you could get away with a much simpler version of this table.
Ultimately, you would use the Date table as the base for the SELECT cols FROM table section and then join against it, probably using Common Table Expressions, to create the calculations.
I'll write up a fuller solution if you would like; a minimal sketch of the idea follows.
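Here is that sketch, using a generate_series CTE to stand in for the (much richer) date dimension table; it assumes only the time_entries table shown above:
-- CROSS JOIN guarantees one row per user/week combination, so weeks
-- without entries come back as 0 instead of disappearing
WITH weeks AS (
  SELECT week::date AS week
  FROM generate_series(DATE_TRUNC('week', DATE '2017-11-01'),
                       DATE_TRUNC('week', DATE '2017-12-31'),
                       INTERVAL '7 day') AS week
), users AS (
  SELECT DISTINCT user_id FROM time_entries
)
SELECT u.user_id, w.week, COALESCE(SUM(te.hours_worked), 0) AS total
FROM users u
CROSS JOIN weeks w
LEFT JOIN time_entries te
  ON te.user_id = u.user_id
 AND DATE_TRUNC('week', te.start_time)::date = w.week
GROUP BY u.user_id, w.week
ORDER BY u.user_id, w.week;
With a complete series of weeks per user, pivoting becomes a presentation problem: either feed the result to crosstab with a now-predictable column list, or pivot it client-side.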

Crosstab function and Dates PostgreSQL

I need to create a crosstab table from a query where dates are turned into column names. The number of order-date columns can increase or decrease depending on the dates passed to the query. The order date is stored in Unix (epoch) format and is converted to a normal date.
The query is the following:
Select cd.cust_id
     , od.order_id
     , od.order_size
     , (TIMESTAMP 'epoch' + od.order_date * INTERVAL '1 second')::Date As order_date
From consumer_details cd
   , consumer_order od
Where cd.cust_id = od.cust_id
And od.order_date Between 1469212200 And 1469212600
Order By od.order_id, od.order_date
The table is as follows:
 cust_id   | order_id | order_size | order_date
-----------+----------+------------+------------
 210721008 | 0437756  | 4323       | 2016-07-22
 210721008 | 0437756  | 4586       | 2016-09-24
 210721019 | 10749881 | 0          | 2016-07-28
 210721019 | 10749881 | 0          | 2016-07-28
 210721033 | 13639    | 2286145    | 2016-09-06
 210721033 | 13639    | 2300040    | 2016-10-03
The result should be:
 cust_id   | order_id | 2016-07-22 | 2016-09-24 | 2016-07-28 | 2016-09-06 | 2016-10-03
-----------+----------+------------+------------+------------+------------+------------
 210721008 | 0437756  | 4323       | 4586       |            |            |
 210721019 | 10749881 |            |            | 0          |            |
 210721033 | 13639    |            |            |            | 2286145    | 2300040
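Note that crosstab() requires the output columns to be declared up front, so a truly dynamic set of date columns needs either dynamically generated SQL (for example, building the column list in PL/pgSQL) or a different output shape. A sketch of the latter, assuming the tables above; to_timestamp() is the built-in shorthand for the TIMESTAMP 'epoch' arithmetic:
-- folds each order's dates into one JSON object, so the "columns"
-- can vary per run without being declared in advance
Select cd.cust_id
     , od.order_id
     , json_object_agg(to_timestamp(od.order_date)::Date, od.order_size) As size_by_date
From consumer_details cd
Join consumer_order od On cd.cust_id = od.cust_id
Where od.order_date Between 1469212200 And 1469212600
Group By cd.cust_id, od.order_id
Order By od.order_id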