I have a table (in PostgreSQL 9.3) with the number of people in cities, grouped by age, like this:
city | year | sex | age_0 | age_1 | age_2 | ... | age_115
---------------------------------------------------------
city1| 2014 | M | 12313 | 23414 | 52345 | ... | 0
city1| 2014 | F | 34562 | 23456 | 53456 | ... | 6
city2| 2014 | M | 3 | 2 | 2 | ... | 99
I'd like to break the columns down to rows, ending up with rows like this:
city | year | sex | age | amount | age_group
--------------------------------------------
city1| 2014 | M | 0 | 12313 | 0-6
city1| 2014 | M | 1 | 23414 | 0-6
city1| 2014 | M | 2 | 52345 | 0-6
city1| 2014 | M | ... | ... | ...
city1| 2014 | M | 115 | 0 | 7-115
and so on. I know I could do it with several (a lot of) queries and UNIONs, but I was wondering whether there is a more elegant (less copy-and-paste) way of writing such a query?
Use arrays and unnest:
select city,
year,
sex,
unnest(array[age_0 , age_1 , age_2 , ..., age_115]) as amount,
unnest(array[ 0 , 1 , 2 , ... , 115]) as age
from mytable
On large datasets this might be slow.
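For reference, a minimal sketch assuming PostgreSQL 9.4 or later (9.3, as in the question, lacks multi-array unnest in the FROM clause): unnest can zip both arrays in parallel so the values and ages stay aligned by position, and a CASE expression derives the age_group column using the 0-6 / 7-115 cutoffs shown in the desired output:
select city, year, sex, u.age, u.amount,
       case when u.age <= 6 then '0-6' else '7-115' end as age_group
from mytable,
     unnest(array[age_0, age_1, age_2 /* ... */, age_115],
            array[0, 1, 2 /* ... */, 115]) as u(amount, age);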
A quick look shows that many similar questions have already been asked; one of them has a good guide to dynamically generating the query you need, so there is less pasting for you: link
Idea for generating the query:
SELECT 'SELECT city, year, sex, unnest(ARRAY['
       || string_agg(quote_ident(attname), ',' ORDER BY attnum)
       || ']) AS amount FROM mytable' AS sql
FROM pg_attribute
WHERE attrelid = 'mytable'::regclass
  AND attname ~ '^age_'
  AND attnum > 0
  AND NOT attisdropped
GROUP BY attrelid;
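On psql 9.6 or later you could execute the generated statement directly by ending the query above with \gexec instead of a semicolon; on 9.3, as in the question, copy the generated SQL out and run it yourself. The ORDER BY attnum inside string_agg keeps the columns in table order, so the amounts line up with a parallel age array if you add one.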
Related
I want to create a function that can create a table in which some of the columns are derived from two other tables.
input table1:
This is a static table for each loan. Each loan has only one row with information related to that loan. For example, original unpaid balance, original interest rate...
| id | loan_age | ori_upb | ori_rate | ltv |
| --- | -------- | ------- | -------- | --- |
| 1 | 360 | 1500 | 4.5 | 0.6 |
| 2 | 360 | 2000 | 3.8 | 0.5 |
input table2:
This is a dynamic table for each loan. Each loan has several rows showing the loan's performance in each month. For example, current unpaid balance, current interest rate, delinquency status...
| id | month | cur_upb | cur_rate | status |
| -- | ----- | ------- | -------- | ------ |
| 1 | 01 | 1400 | 4.5 | 0 |
| 1 | 02 | 1300 | 4.5 | 0 |
| 1 | 03 | 1200 | 4.5 | 1 |
| 2 | 01 | 2000 | 3.8 | 0 |
| 2 | 02 | 1900 | 3.8 | 0 |
| 2 | 03 | 1900 | 3.8 | 1 |
| 2 | 04 | 1900 | 3.8 | 2 |
output table:
The output table contains information from table1 and table2. Payoffupb is the last record of cur_upb in table2. This table is built for model development.
| id | loan_age | ori_upb | ori_rate | ltv | payoffmonth| payoffupb | payoffrate |lastStatus | modification |
| ---| -------- | ------- | -------- | --- | ---------- | --------- | ---------- |---------- | ------------ |
| 1 | 360 | 1500 | 4.5 | 0.6 | 03 | 1200 | 4.5 | 1 | null |
| 2 | 360 | 2000 | 3.8 | 0.5 | 04 | 1900 | 3.8 | 2 | null |
Most columns in the output table can be taken directly or derived from columns in the two input tables; the columns that cannot be derived are left blank.
My main question is how to write a function to take two tables as inputs and output another table?
I already wrote the feature transformation part for data files in 2018, but I need to do the same thing again for data files in some other years. That's why I want to create a function to make things easier.
As you want to insert the latest entry of table2 against each entry of table1, try this:
insert into table3 (id, loan_age, ori_upb, ori_rate, ltv,
                    payoffmonth, payoffupb, payoffrate, lastStatus)
select distinct on (t1.id)
       t1.id, t1.loan_age, t1.ori_upb, t1.ori_rate, t1.ltv,
       t2.month, t2.cur_upb, t2.cur_rate, t2.status
from table1 t1
inner join table2 t2 on t1.id = t2.id
order by t1.id, t2.month desc
DEMO1
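DISTINCT ON (t1.id) keeps only the first row for each id in the sort order, and because rows are ordered by t2.month desc within each id, that first row is the latest month, i.e. the payoff record.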
EDIT for your updated question: a function to do the above, assuming the table1, table2, and table3 structures will always be identical.
create or replace function insert_values(table1 varchar, table2 varchar, table3 varchar)
returns int as $$
declare
count_ int;
begin
execute format('insert into %I (id, loan_age, ori_upb, ori_rate, ltv, payoffmonth, payoffupb, payoffrate, lastStatus )
select distinct on (t1.id) t1.id, t1.loan_age, t1.ori_upb,
t1.ori_rate,t1.ltv,t2.month,t2.cur_upb, t2.cur_rate, t2.status
from %I t1 inner join %I t2 on t1.id=t2.id order by t1.id , t2.month desc',table3,table1,table2);
GET DIAGNOSTICS count_ = ROW_COUNT;
return count_;
end;
$$
language plpgsql;
Call the above function as below; it will return the number of inserted rows:
select * from insert_values('table1','table2','table3');
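Note that format() with %I quotes each argument as an SQL identifier, so the table names passed to the function are embedded safely in the dynamic statement.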
DEMO2
I have three tables.
TABLE_1:
T2_ID | ver     | date                           | boolean
---------------------------------------------------------
1 | X-20-50 | 2019-01-01 16:20:51.722336+00 | TRUE
2 | X-50-30 | 2019-02-26 16:20:51.722336+00 | TRUE
3 | X-20-32 | 2019-03-20 16:20:51.722336+00 | FALSE
1 | X-20-50 | 2019-01-09 16:20:51.722336+00 | FALSE
2 | X-20-50 | 2019-12-02 16:20:51.722336+00 | TRUE
3 | X-20-50 | 2019-01-24 16:20:51.722336+00 | TRUE
TABLE_2:
id | type | scheduler
--------------------------------------------------
1 | ABC | w1,w2,w3,w4,w5,w6,w7,w8,w9,w10,w11,w12
2 | PQR | w5,w9
3 | TRC | w1,w4,w8
TABLE_3
start_date_of_ver | end_date_of_ver | ver_name
-----------------------------------------------------------
2019-01-01 00:00:00+00 | 2019-04-01 00:00:00+00 | X-20-50
2019-02-25 00:00:00+00 | 2019-05-26 00:00:00+00 | X-50-30
2019-03-15 00:00:00+00 | 2019-06-06 00:00:00+00 | X-20-32
Table 4 should fulfill the conditions below.
It takes a version name (ver_name) as input.
From this ver_name it takes the version's start date and end date (from table_3); if the version period is 3 months, it creates a 12-week table with the id (type) as the first column and creates the twelve weekly entries according to the scheduler column of table_2.
Information in table_4 is updated as and when table_1 has entries for that particular week which are TRUE.
Note: table_1 entries are generated on a daily basis.
Desired table: it takes only ver_name as input and calculates the table below.
When table_1 doesn't have any entries, table_4 should look like this:
Table_4: X-20-50
id_of_table_2 | week_1 | week_2 | week_3 | week_4 | week_5 | week_6 | week_7 | week_8 | week_9 | week_10 | week_11 | week_12 |
------------------------------------------------------------------------------------------------------------------------------
ABC | w1 | w2 | w3 | w4 | w5 | w6 | w7 | w8 | w9 | w10 | w11 | w12 |
PQR | | | | | w5 | | | | w9 | | | |
TRC | w1 | | | w4 | | | | w8 | | | | |
When table_1 has entries, table_4 should look like this:
X-20-50
id_of_table_2 | week_1 | week_2 | week_3 | week_4 | week_5 | week_6 | week_7 | week_8 | week_9 | week_10 | week_11 | week_12 |
------------------------------------------------------------------------------------------------------------------------------
ABC | Done | Done | w3 | w4 | w5 | w6 | w7 | w8 | w9 | w10 | w11 | w12 |
PQR | | | | | w5 | | | | w9 | | | |
TRC | Done | | | w4 | | | | w8 | | | | |
You can create a function which takes the starting date of a week as input.
Example:
create function a(start_date date)
RETURNS json
LANGUAGE plpgsql
COST 100
VOLATILE
AS $BODY$
DECLARE
    outputjson json;
BEGIN
    -- json_agg takes a single argument, so aggregate whole rows with to_json;
    -- format() with %L embeds the dates as properly quoted literals
    EXECUTE format('select json_agg(to_json(t)) from table_name t
                    where t.date >= %L and t.date < %L::date + 7',
                   start_date, start_date)
    INTO outputjson;
    RETURN outputjson;
END;
$BODY$;
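A call would then look like the following (table_name and its date column come from the sketch above, not from an actual schema):
select a('2019-01-07');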
Hope this will help.
Your requirement needs a little refinement. You specify retrieving weekly data yet fail to define your week. On what day does it begin? Are all weeks 7 days long? What happens when Dec 31 falls on a Tuesday: is Friday Jan 3 in the same week (see the current year's calendar)? Then there is the issue of the user input and what it represents. Is it the desired start date, with the week being that date and the next 6 days, or is it any date within the weekly period?
The following assumes the ISO 8601 definition (google it, there is lots of material): every week begins on Monday and all weeks are 7 days long. (Thus the week containing 31-Dec-2019 also includes 3-Jan-2020.) The routine extracts the ISO year and ISO week from the user-entered date.
--setup
create table weekly_something( c1 text, c2 text, date1 timestamptz, someem boolean);
insert into weekly_something( c1, c2, date1, someem )
values ('ABC','AB-20-50','2019-11-25 16:20:51.722336+00',TRUE)
, ('PQR','AB-50-30','2019-11-26 16:20:51.722336+00',TRUE)
, ('TRC','CD-20-32','2019-11-27 16:20:51.722336+00',FALSE)
, ('ABC','AB-20-50','2019-12-02 16:20:51.722336+00',FALSE)
, ('ABC','AB-20-50','2019-12-02 16:20:51.722336+00',TRUE)
, ('JFF','yy-45-89','2019-12-31 16:20:51.722336+00',TRUE)
, ('JFF','yy-89-30','2020-01-03 16:20:51.722336+00',TRUE) ;
-- JFF Just For Fun
-- SQL Function
create function week_of(week_date date)
returns setof weekly_something
language sql stable strict
as $$
select *
from weekly_something
where (extract('isoyear' from week_date), extract('week' from week_date)) =
(extract('isoyear' from date1), extract('week' from date1));
$$;
-- test
select * from week_of('2019-11-26');
select * from week_of('2019-12-30');
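The first call should return the three late-November rows (ISO week 2019-W48); the second should return both JFF rows, because 31-Dec-2019 and 3-Jan-2020 fall in the same ISO week, 2020-W01.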
I'm trying to do a cumulative sum over specific periods of time for every row in Postgres, for example:
| Date       | Value | Employee |
|------------|-------|----------|
| 25-01-1990 | 34    | Aaron    |
| 15-02-1990 | 4     | Aaron    |
| 02-03-1990 | 3     | Aaron    |
| 22-05-1990 | 7     | Aaron    |
Expected result, taking a range of 60 days:
| Date       | Value | Employee |
|------------|-------|----------|
| 25-01-1990 | 34    | Aaron    |
| 15-02-1990 | 38    | Aaron    |
| 02-03-1990 | 41    | Aaron    |
| 01-05-1990 | 10    | Aaron    |
I tried with the following but the results are not correct:
WITH tab AS (SELECT * FROM table_with_values)
SELECT tab.Date, SUM(tab.Value)
FILTER (WHERE tab.Date <= tab.Date AND tab.Date >= tab.Date - INTERVAL '60 DAY')
OVER (PARTITION BY tab.Employee ORDER BY tab.Date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS values_cumulative, tab.Employee
FROM tab
Try this:
SELECT date, employee, sum(bvalue)
FROM (
SELECT a.*, b.date as bdate, b.value as bvalue
FROM testtable a
LEFT JOIN testtable b ON
a.employee = b.employee AND
b.date <= a.date AND
b.date >= a.date - integer '60') c
GROUP BY employee, date
ORDER BY date ASC;
date | employee | sum
------------+----------+-----
1990-01-25 | Aaron | 34
1990-02-15 | Aaron | 38
1990-03-02 | Aaron | 41
1990-05-01 | Aaron | 10
(4 rows)
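For what it's worth, the original attempt fails for two reasons: the FILTER condition compares each row's date with itself, so it is always true, and a ROWS frame counts rows rather than days. The self-join above applies the 60-day window to each row explicitly instead.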
Supposing I have a set of transactions (purchases) with dates for a set of customers, I want to calculate a rolling x day sum of purchase amount and number of purchases by customer in that same window. I've gotten it to work using a window function, but I have to fill in for dates where the customer did not make any purchases. In so doing, I'm using a Cartesian product. Is there a more efficient approach so that it's more scalable as the number of customers – and time window – increases?
Edit: As noted in the comments, I'm on PostgreSQL v9.3.
Here's sample data (note that some customers may have 0, 1, or multiple purchases on a given date):
| id | cust_id | txn_date | amount |
|----|---------|------------|--------|
| 1 | 123 | 2017-08-17 | 10 |
| 2 | 123 | 2017-08-17 | 5 |
| 3 | 123 | 2017-08-18 | 5 |
| 4 | 123 | 2017-08-20 | 50 |
| 5 | 123 | 2017-08-21 | 100 |
| 6 | 456 | 2017-08-01 | 5 |
| 7 | 456 | 2017-08-01 | 5 |
| 8 | 456 | 2017-08-01 | 5 |
| 9 | 456 | 2017-08-30 | 5 |
| 10 | 456 | 2017-08-01 | 1000 |
| 11 | 789 | 2017-08-15 | 1000 |
| 12 | 789 | 2017-08-30 | 1000 |
And here's the desired output:
| cust_id | txn_date | sum_dly_txns | tot_txns_7d | cnt_txns_7d |
|---------|------------|--------------|-------------|-------------|
| 123 | 2017-08-17 | 15 | 15 | 2 |
| 123 | 2017-08-18 | 5 | 20 | 3 |
| 123 | 2017-08-20 | 50 | 70 | 4 |
| 123 | 2017-08-21 | 100 | 170 | 5 |
| 456 | 2017-08-01 | 1015 | 1015 | 4 |
| 456 | 2017-08-30 | 5 | 5 | 1 |
| 789 | 2017-08-15 | 1000 | 1000 | 1 |
| 789 | 2017-08-30 | 1000 | 1000 | 1 |
Here's SQL that produces the totals as desired:
SELECT *
FROM (
-- One row per day per user
WITH daily_txns AS (
SELECT
t.cust_id
,t.txn_date AS txn_date
,SUM(t.amount) AS sum_dly_txns
,COUNT(t.id) AS cnt_dly_txns
FROM transactions t
GROUP BY t.cust_id, txn_date
),
-- Every possible transaction date for every user
dummydates AS (
SELECT txn_date, uids.cust_id
FROM (
SELECT generate_series(
timestamp '2017-08-01'
,timestamp '2017-08-30'
,interval '1 day')::date
) d(txn_date)
CROSS JOIN (SELECT DISTINCT cust_id FROM daily_txns) uids
),
txns_dummied AS (
SELECT
d.cust_id
,d.txn_date
,COALESCE(sum_dly_txns,0) AS sum_dly_txns
,COALESCE(cnt_dly_txns,0) AS cnt_dly_txns
FROM dummydates d
LEFT JOIN daily_txns dx
ON d.txn_date = dx.txn_date
AND d.cust_id = dx.cust_id
ORDER BY d.txn_date, d.cust_id
)
SELECT
cust_id
,txn_date
,sum_dly_txns
,SUM(COALESCE(sum_dly_txns,0)) OVER w AS tot_txns_7d
,SUM(cnt_dly_txns) OVER w AS cnt_txns_7d
FROM txns_dummied
WINDOW w AS (
PARTITION BY cust_id
ORDER BY txn_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW -- 7d moving window
)
ORDER BY cust_id, txn_date
) xfers
WHERE sum_dly_txns > 0 -- Omit dates with no transactions
;
SQL Fiddle
Instead of ROWS BETWEEN 6 PRECEDING AND CURRENT ROW, did you want to write RANGE '6 days' PRECEDING?
This must be what you are looking for:
SELECT DISTINCT
cust_id
,txn_date
,SUM(amount) OVER (PARTITION BY cust_id, txn_date) sum_dly_txns
,SUM(amount) OVER (PARTITION BY cust_id ORDER BY txn_date RANGE '6 days' PRECEDING)
,COUNT(*) OVER (PARTITION BY cust_id ORDER BY txn_date RANGE '6 days' PRECEDING)
from transactions
ORDER BY cust_id, txn_date
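(RANGE frames with an offset such as '6 days' PRECEDING require PostgreSQL 11 or later; older versions do not accept an offset in a RANGE frame.)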
Edit: Since you are using an old version (I tested the above on my PostgreSQL 11), the query above will not work for you, so you will need old-fashioned SQL (that is, without window functions).
It is a bit less efficient but does a fair job.
WITH daily_txns AS (
    SELECT
        t.cust_id
        ,t.txn_date AS txn_date
        ,SUM(t.amount) AS sum_dly_txns
        ,COUNT(t.id) AS cnt_dly_txns
    FROM transactions t
    GROUP BY t.cust_id, txn_date
)
SELECT t1.cust_id, t1.txn_date, t1.sum_dly_txns,
       SUM(t2.sum_dly_txns) AS tot_txns_7d, SUM(t2.cnt_dly_txns) AS cnt_txns_7d
FROM daily_txns t1
-- txn_date - 6 .. txn_date spans exactly 7 days including the current day
JOIN daily_txns t2 ON t1.cust_id = t2.cust_id
                  AND t2.txn_date BETWEEN t1.txn_date - 6 AND t1.txn_date
GROUP BY t1.cust_id, t1.txn_date, t1.sum_dly_txns
ORDER BY t1.cust_id, t1.txn_date
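An index on (cust_id, txn_date) should keep this self-join reasonable as the data grows, since each row joins only to its own customer's rows within a single week.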
I've got some records in my database that have a 'createdAt' timestamp.
What I'm trying to get out of PostgreSQL is those records grouped by 'createdAt'.
So far I've got this query:
SELECT date_trunc('day', "updatedAt") FROM goal GROUP BY 1
Which gives me:
   date_trunc
-----------------
 Sep 20 00:00:00
These are the days on which the records were created.
My question is: Is there any way to generate something like:
| Sep 20 00:00:00 |
| id | name | gender | state | age |
|----|-------------|--------|-------|-----|
| 1 | John Kenedy | male | NY | 32 |
| |
| Sep 24 00:00:00 |
| |
| id | name | gender | state | age |
|----|-------------|--------|-------|-----|
| 1 | John Kenedy | male | NY | 32 |
| 2 | John De | male | NY | 32 |
That means group by date_trunc and select all the columns of those rows?
Thanks a lot!
Please try SELECT date_trunc('day', "updatedAt"), name, gender, state, age FROM goal GROUP BY 1, 2, 3, 4, 5. It will not produce the structure you expect, but it will "group by date_trunc and select all the columns".
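If you actually want one row per day with the matching records nested inside it, here is a minimal sketch (assuming PostgreSQL 9.3+ for json_agg, and the goal table from the question) that packs each day's rows into a JSON array:
SELECT date_trunc('day', "updatedAt") AS day,
       json_agg(goal) AS rows_for_day  -- each element is a full row as a JSON object
FROM goal
GROUP BY 1
ORDER BY 1;
Your application can then unpack rows_for_day to render the per-day sections shown above.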