Postgres lead window with a series index

Postgres lead window with a series index - postgresql

So I'm working with a table of instruments and their historic closing prices and I'm trying to match a day with an arbitrary number of days in the future's closing price. To do this I'm trying to use the lead function with a generated series's index.
select subq.*, lead(subq.close, x) over w, lead(subq.date, x) over w, x from
select instrument_id, close, timestamp::date as date from
instrument_data_daily order by instrument_id, date) as subq,
generate_series(2, 5) as x
window w as (PARTITION BY instrument_id ORDER BY date);
However this provides what I consider to be incorrect results:
instrument_id | close | date | lead | lead | x
---------------|-------------|------------|-------------|------------|---
801 | 34499.96 | 2014-12-04 | 34499.96 | 2014-12-04 | 2
801 | 34499.96 | 2014-12-04 | 31599.99 | 2014-12-05 | 3
801 | 34499.96 | 2014-12-04 | 31599.99 | 2014-12-05 | 4
801 | 34499.96 | 2014-12-04 | 28599.99 | 2014-12-08 | 5
801 | 31599.99 | 2014-12-05 | 31599.99 | 2014-12-05 | 2
801 | 31599.99 | 2014-12-05 | 28599.99 | 2014-12-08 | 3
801 | 31599.99 | 2014-12-05 | 28599.99 | 2014-12-08 | 4
801 | 31599.99 | 2014-12-05 | 25800.04 | 2014-12-09 | 5
801 | 28599.99 | 2014-12-08 | 28599.99 | 2014-12-08 | 2
801 | 28599.99 | 2014-12-08 | 25800.04 | 2014-12-09 | 3
801 | 28599.99 | 2014-12-08 | 25800.04 | 2014-12-09 | 4
801 | 28599.99 | 2014-12-08 | 23399.95 | 2014-12-10 | 5
801 | 25800.04 | 2014-12-09 | 25800.04 | 2014-12-09 | 2
801 | 25800.04 | 2014-12-09 | 23399.95 | 2014-12-10 | 3
801 | 25800.04 | 2014-12-09 | 23399.95 | 2014-12-10 | 4
801 | 25800.04 | 2014-12-09 | 21499.98 | 2014-12-11 | 5
Note that the lead dates for indexes 3 and 4 are the same.
The underlying query generates a table with no duplicates:
select instrument_id, close, timestamp::date as date from
instrument_data_daily order by instrument_id, date;
Provides the following results:
instrument_id | close | date
---------------+-------------+------------
801 | 34499.96 | 2014-12-04
801 | 31599.99 | 2014-12-05
801 | 28599.99 | 2014-12-08
801 | 25800.04 | 2014-12-09
801 | 23399.95 | 2014-12-10
801 | 21499.98 | 2014-12-11
801 | 23100.00 | 2014-12-12
801 | 23300.04 | 2014-12-15
So we can see that the underlying data doesn't contain any problems with duplicates, and that the generated series index, x, is where I'm expecting it to be. Any ideas as to why the window would pull the wrong index?
(Data set is truncated to fit the example, but it's about a quarter of a million rows deep, making joins expensive).
edit: (Adding in expected results for clarity)
However the expected results are that each date is paired with the correct amount of lead indexed based on x, as follows: (Note this is manually created, query has not yet been updated)
instrument_id | close | date | lead | lead | x
---------------|-------------|------------|-------------|------------|---
801 | 34499.96 | 2014-12-04 | 31599.99 | 2014-12-05 | 2
801 | 34499.96 | 2014-12-04 | 28599.99 | 2014-12-08 | 3
801 | 34499.96 | 2014-12-04 | 25800.04 | 2014-12-09 | 4
801 | 34499.96 | 2014-12-04 | 23399.95 | 2014-12-10 | 5
801 | 31599.99 | 2014-12-05 | 31599.99 | 2014-12-08 | 2
801 | 31599.99 | 2014-12-05 | 25800.04 | 2014-12-09 | 3
801 | 31599.99 | 2014-12-05 | 23399.95 | 2014-12-10 | 4
801 | 31599.99 | 2014-12-05 | 21499.98 | 2014-12-11 | 5
801 | 28599.99 | 2014-12-08 | 25800.04 | 2014-12-09 | 2
801 | 28599.99 | 2014-12-08 | 23399.95 | 2014-12-10 | 3
801 | 28599.99 | 2014-12-08 | 21499.98 | 2014-12-11 | 4
801 | 28599.99 | 2014-12-08 | 23100.00 | 2014-12-12 | 5
801 | 25800.04 | 2014-12-09 | 23399.95 | 2014-12-10 | 2
801 | 25800.04 | 2014-12-09 | 21499.98 | 2014-12-11 | 3
801 | 25800.04 | 2014-12-09 | 23100.00 | 2014-12-12 | 4
801 | 25800.04 | 2014-12-09 | 23300.04 | 2014-12-15 | 5
We should actually update the generated series to be from 1 to x, where x is maximal lookahead desired. However the same overlapping lead results occur for any nontrivial series results.

Related

How to group MVA field for faceted in sphinx

I have an index where some data's has duplicate, all fields are similar except for latitude,longitude and id (field id is not realy ID, just generated row_number() OVER () AS id).
it's example:
mysql> select id,vacancy_id,prof_area_ids,latitude,longitude from jobVacancy;
+------+------------+---------------+----------+-----------+
| id | vacancy_id | prof_area_ids | latitude | longitude |
+------+------------+---------------+----------+-----------+
| 1 | 917 | 11,199,202 | 0.973178 | 0.743566 |
| 2 | 916 | 17,283,288 | 0.973178 | 0.743566 |
| 3 | 915 | 17,288 | 0.973178 | 0.743566 |
| 4 | 914 | 30,482 | 0.973178 | 0.743566 |
| 5 | 919 | 15,243 | 0.825153 | 0.692837 |
| 6 | 919 | 15,243 | 0.825162 | 0.692828 |
| 7 | 918 | 8,154 | 0.825153 | 0.692837 |
| 8 | 918 | 8,154 | 0.825162 | 0.692828 |
| 9 | 920 | 17,283,288 | 0.958914 | 1.282161 |
| 10 | 920 | 17,283,288 | 0.958915 | 1.282215 |
| 11 | 924 | 12,208 | 0.97333 | 0.658246 |
| 12 | 924 | 12,208 | 0.973336 | 0.658237 |
| 13 | 923 | 21,365 | 0.97333 | 0.658246 |
| 14 | 923 | 21,365 | 0.973336 | 0.658237 |
| 15 | 922 | 20,359 | 0.97333 | 0.658246 |
| 16 | 922 | 20,359 | 0.973336 | 0.658237 |
| 17 | 921 | 19,346 | 0.97333 | 0.658246 |
| 18 | 921 | 19,346 | 0.973336 | 0.658237 |
| 19 | 926 | 12,17,208,292 | 0.88396 | 2.389868 |
| 20 | 925 | 12,208 | 0.88396 | 2.389868 |
+------+------------+---------------+----------+-----------+
20 rows in set (0.00 sec)
Now I want to group data by vacancy_id
mysql> select id,vacancy_id,prof_area_ids,latitude,longitude from jobVacancy group by vacancy_id;
+------+------------+---------------+----------+-----------+
| id | vacancy_id | prof_area_ids | latitude | longitude |
+------+------------+---------------+----------+-----------+
| 1 | 917 | 11,199,202 | 0.973178 | 0.743566 |
| 2 | 916 | 17,283,288 | 0.973178 | 0.743566 |
| 3 | 915 | 17,288 | 0.973178 | 0.743566 |
| 4 | 914 | 30,482 | 0.973178 | 0.743566 |
| 5 | 919 | 15,243 | 0.825153 | 0.692837 |
| 7 | 918 | 8,154 | 0.825153 | 0.692837 |
| 9 | 920 | 17,283,288 | 0.958914 | 1.282161 |
| 11 | 924 | 12,208 | 0.97333 | 0.658246 |
| 13 | 923 | 21,365 | 0.97333 | 0.658246 |
| 15 | 922 | 20,359 | 0.97333 | 0.658246 |
| 17 | 921 | 19,346 | 0.97333 | 0.658246 |
| 19 | 926 | 12,17,208,292 | 0.88396 | 2.389868 |
| 20 | 925 | 12,208 | 0.88396 | 2.389868 |
| 21 | 961 | 4,105 | 0.959217 | 1.280721 |
| 23 | 960 | 8,155 | 0.959217 | 1.280721 |
| 25 | 959 | 12,208 | 0.959217 | 1.280721 |
| 27 | 928 | 1,60 | 0.963734 | 1.070297 |
| 29 | 927 | 32,513 | 0.963734 | 1.070297 |
| 31 | 929 | 6,140 | 0.786553 | 0.678649 |
| 33 | 932 | 1,40,46 | 0.824627 | 0.694182 |
+------+------------+---------------+----------+-----------+
20 rows in set (0.00 sec)
Result is awesome! But problem begins when I want to get all grouped data with faceted
mysql> select id,vacancy_id,prof_area_ids,latitude,longitude from jobVacancy where prof_area_ids=199 group by vacancy_id facet prof_area_ids;
+------+------------+-----------------+----------+-----------+
| id | vacancy_id | prof_area_ids | latitude | longitude |
+------+------------+-----------------+----------+-----------+
| 1 | 917 | 11,199,202 | 0.973178 | 0.743566 |
| 191 | 1004 | 11,196,199 | 0.925335 | 2.768874 |
| 313 | 1072 | 1,11,60,197,199 | 0.963968 | 1.070624 |
| 318 | 1136 | 11,196,199 | 0.96071 | 1.448998 |
| 374 | 1097 | 11,199 | 0.785255 | 0.678504 |
+------+------------+-----------------+----------+-----------+
5 rows in set (0.00 sec)
+---------------+----------+
| prof_area_ids | count(*) |
+---------------+----------+
| 202 | 1 |
| 199 | 12 |
| 11 | 12 |
| 196 | 5 |
| 197 | 3 |
| 60 | 3 |
| 1 | 3 |
+---------------+----------+
7 rows in set (0.02 sec)
Faceted result is incorrect. Because in fact data's count where prof_area_ids=199 must be 5 and not 12. So how I can group field for faceted?
Additionaly
I fount here http://sphinxsearch.com/blog/2013/06/21/faceted-search-with-sphinx/ but just written "If you have a MVA facet, you need to use the GROUPBY() function which returns the actual value on which the grouping was made." and without examle.
mysql> select id,vacancy_id,prof_area_ids,latitude,longitude,GROUPBY() as selected,COUNT(*) from jobVacancy where prof_area_ids=199 group by vacancy_id facet prof_area_ids;
+------+------------+-----------------+----------+-----------+----------+----------+
| id | vacancy_id | prof_area_ids | latitude | longitude | selected | count(*) |
+------+------------+-----------------+----------+-----------+----------+----------+
| 1 | 917 | 11,199,202 | 0.973178 | 0.743566 | 917 | 1 |
| 191 | 1004 | 11,196,199 | 0.925335 | 2.768874 | 1004 | 2 |
| 313 | 1072 | 1,11,60,197,199 | 0.963968 | 1.070624 | 1072 | 3 |
| 318 | 1136 | 11,196,199 | 0.96071 | 1.448998 | 1136 | 3 |
| 374 | 1097 | 11,199 | 0.785255 | 0.678504 | 1097 | 3 |
+------+------------+-----------------+----------+-----------+----------+----------+
5 rows in set (0.00 sec)
+---------------+----------+
| prof_area_ids | count(*) |
+---------------+----------+
| 202 | 1 |
| 199 | 12 |
| 11 | 12 |
| 196 | 5 |
| 197 | 3 |
| 60 | 3 |
| 1 | 3 |
+---------------+----------+
7 rows in set (0.02 sec)
Also faceted result is wrong.

Seems, wanting effectively COUNT(DISTINCT vacancy_id) on the FACET rather than the default COUNT(*), but alas it turns out
... FACET prof_area_ids,COUNT(DISTINCT vacancy_id) AS vacancies BY prof_area_ids
doesnt work. The bit before BY only supports attributes, not custom functions.
... will just have to write it out the long way, with full queries...
select id,vacancy_id,prof_area_ids,latitude,longitude from jobVacancy
where prof_area_ids=199 group by vacancy_id;
SELECT GROUPBY() AS prof_area_id, COUNT(DISTINCT vacancy_id) FROM jobVacancy
WHERE prof_area_ids=199 GROUP BY prof_area_id;
Same results, just slightly more verbose. ie rather than using FACET shorthand, write it
out in full, as multiple seperate queries.

Faceted result is incorrect. Because in fact data's count where prof_area_ids=199 must be 5 and not 12. So how I can group field for faceted?
It looks like you misunderstand how FACET works. It seems to me, that you think it takes as a base the main query's result, but it actually just does another grouping. E.g. here:
mysql> select g, t from idx_mva where t = 11 group by g facet t;
+------+----------+
| g | t |
+------+----------+
| 1 | 11,12 |
| 2 | 11,13,15 |
| 3 | 9,11 |
| 5 | 11,12,15 |
+------+----------+
4 rows in set (0.00 sec)
+------+----------+
| t | count(*) |
+------+----------+
| 12 | 2 |
| 11 | 6 |
| 15 | 4 |
| 13 | 1 |
| 9 | 1 |
| 3 | 1 |
+------+----------+
6 rows in set (0.00 sec)
for t=11 you can see that as in your case it's found 3 times in the 1st query's result, but the count for that is 6 in the FACET's query result. This is because it actually occurs 6 times in the index:
mysql> select * from idx_mva where t = 11;
+------+------+----------+
| id | g | t |
+------+------+----------+
| 2 | 1 | 11,12 |
| 3 | 1 | 11,15 |
| 3 | 2 | 11,13,15 |
| 6 | 3 | 9,11 |
| 8 | 5 | 11,12,15 |
| 11 | 2 | 3,11,15 |
+------+------+----------+
6 rows in set, 1 warning (0.00 sec)
and it happens 3 times in the 1st case only because the t's value is returned only once for each of the groups. You can use group_concat() to see more values from the same group:
mysql> select g, group_concat(to_string(t)) from idx_mva where t = 11 group by g facet t;
+------+----------------------------+
| g | group_concat(to_string(t)) |
+------+----------------------------+
| 1 | 11,12,11,15 |
| 2 | 11,13,15,3,11,15 |
| 3 | 9,11 |
| 5 | 11,12,15 |
+------+----------------------------+
4 rows in set (0.00 sec)
+------+----------+
| t | count(*) |
+------+----------+
| 12 | 2 |
| 11 | 6 |
| 15 | 4 |
| 13 | 1 |
| 9 | 1 |
| 3 | 1 |
+------+----------+
6 rows in set (0.00 sec)
If you want to learn more about faceting here's an interactive course about that - https://play.manticoresearch.com/faceting/

faceted search for field with multiple value

I have a table where field with multiple value by comma:
+------+---------------+
| id | education_ids |
+------+---------------+
| 3 | 7,5 |
| 4 | 7,3 |
| 5 | 1,5 |
| 8 | 3 |
| 9 | 5,7 |
| 11 | 9 |
...
+------+---------------+
when I trying use faceted search:
select id,education_ids from jobResume facet education_ids;
I'm getting this response:
+---------------+----------+
| education_ids | count(*) |
+---------------+----------+
| 7,5 | 3558 |
| 7,3 | 3655 |
| 1,5 | 3686 |
| 3 | 31909 |
| 5,7 | 3490 |
| 9 | 31743 |
| 9,6 | 3535 |
| 8,2 | 3547 |
| 6,2,7 | 291 |
| 7,8,1 | 291 |
| 1,2 | 3637 |
| 7 | 31986 |
| 5,9,7 | 408 |
| 1,1,5 | 365 |
| 5 | 31768 |
| 3,8,3,7 | 32 |
| 3,7,6 | 431 |
| 2 | 31617 |
| 5,5 | 3614 |
| 9,9,2,2 | 6 |
+---------------+----------+
but that's not what I wanted to see. I would like to where each value had its own count, for example like here:
+---------------+----------+
| education_ids | count(*) |
+---------------+----------+
| 10 | 961 |
| 11 | 1653 |
| 12 | 1998 |
| 13 | 2090 |
| 14 | 1058 |
| 15 | 347 |
...
+---------------+----------+
can I get such a result with sphinx?

Make sure you use an MVA, not a string attribute:
index rt
{
type = rt
rt_field = f
rt_attr_multi = education_ids
path = rt
}
snikolaev#dev:$ mysql -P9306 -h0
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 3.2.2 62ea5ff#191220 release
Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> insert into rt(education_ids) values((7,5)), ((7,3)), ((7,1)), ((5,1)), ((5,3));
Query OK, 5 rows affected (0.00 sec)
mysql> select * from rt facet education_ids;
+---------------------+---------------+
| id | education_ids |
+---------------------+---------------+
| 2810610458032078849 | 5,7 |
| 2810610458032078850 | 3,7 |
| 2810610458032078851 | 1,7 |
| 2810610458032078852 | 1,5 |
| 2810610458032078853 | 3,5 |
+---------------------+---------------+
5 rows in set (0.00 sec)
+---------------+----------+
| education_ids | count(*) |
+---------------+----------+
| 7 | 3 |
| 5 | 3 |
| 3 | 2 |
| 1 | 2 |
+---------------+----------+
4 rows in set (0.00 sec)
BTW here's an interactive course about faceting in Sphinx / Manticore in case you want to learn more about that - https://play.manticoresearch.com/faceting/

Summarize the cost by groups in org table

Suppose such a spreadsheet in org table
|------------+-------+------------+--------+--------+------------|
| Date | Items | Unit Price | Amount | Amount | Categories |
|------------+-------+------------+--------+--------+------------|
| 2019/09/17 | A | 2.64 | 1 | 2.64 | materials |
| | B | 52.67 | 2 | 105.34 | diagnosis |
| | C | 3.08 | 1 | 3.08 | materials |
| | D | 3.85 | 2 | 7.7 | materials |
| | E | 33.66 | 2 | 67.32 | materials |
| | F | 40 | 1 | 40 | treatments |
| | G | 16.5 | 1 | 16.5 | materials |
| | H | 4 | 3 | 12 | treatments |
| | I | 40 | 1 | 40 | bed |
| x | M | 6 | 13 | 78 | treatments |
|------------+-------+------------+--------+--------+------------|
#+TBLFM: $5=$3*$4
I want to sum up the material fees.
Is it possible to calculate it by grouping like vsum(where Categories == materials)?

One way to do this with an elisp expression will be:
|------------+-------+------------+--------+--------+------------|
| Date | Items | Unit Price | Amount | Amount | Categories |
|------------+-------+------------+--------+--------+------------|
| 2019/09/17 | A | 2.64 | 1 | 2.64 | materials |
| | B | 52.67 | 2 | 105.34 | diagnosis |
| | C | 3.08 | 1 | 3.08 | materials |
| | D | 3.85 | 2 | 7.7 | materials |
| | E | 33.66 | 2 | 67.32 | materials |
| | F | 40 | 1 | 40 | treatments |
| | G | 16.5 | 1 | 16.5 | materials |
| | H | 4 | 3 | 12 | treatments |
| | I | 40 | 1 | 40 | bed |
| x | M | 6 | 13 | 78 | treatments |
|------------+-------+------------+--------+--------+------------|
| TOTAL: | | | | 97.24 | |
|------------+-------+------------+--------+--------+------------|
#+TBLFM: $5=$3*$4
#+TBLFM: #12$5='(apply #'+ (cl-mapcar (lambda (num category) (if (eq category 'materials) num 0)) '(#II$5..#III$5) '(#II$6..#III$6)));L
cl-mapcar applies + to cell #12$5 by comparing the list which is column 6 to symbol'materials.
This solution and a `calc solution in emacsSE

I have a query that groups usage by user by day how would I add a running total to this query?

I have the following query:
SELECT
usersq1.id AS user_id, name, completed_at,
COUNT(usersq1.id) AS trips,
SUM(cost_amount_cents) AS daily_cost_amount_cents
FROM usersq1
LEFT OUTER JOIN tripsq1
ON usersq1.id = user_id
GROUP by usersq1.id, name, completed_at
ORDER by user_id, name, completed_at;
Which returns the following:
user_id | name | completed_at | trips | daily_cost_amount_cents
---------+---------------------+--------------+-------+-------------------------
1001 | Makeda Mosser | 2017-06-01 | 2 | 125
1001 | Makeda Mosser | 2017-06-02 | 1 | 125
1001 | Makeda Mosser | 2017-06-03 | 2 | 350
1001 | Makeda Mosser | 2017-06-04 | 2 | 200
1001 | Makeda Mosser | 2017-06-06 | 1 | 100
1001 | Makeda Mosser | 2017-06-07 | 1 | 125
1001 | Makeda Mosser | 2017-06-08 | 1 | 150
1002 | Libbie Luby | 2017-06-02 | 2 | 125
1002 | Libbie Luby | 2017-06-09 | 1 | 175
1003 | Linn Loughran | 2017-06-03 | 1 | 75
1004 | Natacha Ned | 2017-06-04 | 1 | 100
1005 | Lorrine Lunt | 2017-06-05 | 1 | 125
1006 | Tami Tineo | 2017-10-06 | 1 | 150
1007 | Delisa Deen | 2017-10-07 | 1 | 175
1008 | Mimi Miltenberger | 2017-10-08 | 1 | 200
1009 | Seth Sneller | 2017-10-09 | 1 | 25
1010 | Rickie Rossi | 2017-10-10 | 1 | 50
1011 | Jenise Jeanbaptiste | 2017-06-01 | 1 | 200
1011 | Jenise Jeanbaptiste | 2017-07-01 | 1 | 75
1012 | Genia Glatz | 2017-06-02 | 1 | 25
1012 | Genia Glatz | 2017-07-02 | 1 | 50
1013 | Onita Oddo | 2017-06-03 | 1 | 50
1014 | Dario Dreyer | 2017-06-04 | 1 | 75
1014 | Dario Dreyer | 2017-06-24 | 5 | 750
1015 | Toby Trent | | 1 |
I would like to produce another cumulative sum column which keeps a running total of daily_cost_amount_cents per user. The expected outlook I would like is something like this:
+---------+---------------------+------------+-------+-------------------------+-----------+
| user_id | name | created_at | trips | daily_cost_amount_cents | cum_cents |
+---------+---------------------+------------+-------+-------------------------+-----------+
| 1001 | Makeda Mosser | 6/1/17 | 2 | 125 | 125 |
| 1001 | Makeda Mosser | 6/2/17 | 1 | 125 | 250 |
| 1001 | Makeda Mosser | 6/3/17 | 2 | 350 | 600 |
| 1001 | Makeda Mosser | 6/4/17 | 2 | 200 | 800 |
| 1001 | Makeda Mosser | 6/6/17 | 1 | 100 | 900 |
| 1001 | Makeda Mosser | 6/7/17 | 1 | 125 | 1025 |
| 1001 | Makeda Mosser | 6/8/17 | 1 | 150 | 1175 |
| 1002 | Libbie Luby | 6/2/17 | 2 | 125 | 125 |
| 1002 | Libbie Luby | 6/9/17 | 1 | 175 | 300 |
| 1003 | Linn Loughran | 6/3/17 | 1 | 75 | 75 |
| 1004 | Natacha Ned | 6/4/17 | 1 | 100 | 100 |
| 1005 | Lorrine Lunt | 6/5/17 | 1 | 125 | 125 |
| 1006 | Tami Tineo | 10/6/17 | 1 | 150 | 150 |
| 1007 | Delisa Deen | 10/7/17 | 1 | 175 | 175 |
| 1008 | Mimi Miltenberger | 10/8/17 | 1 | 200 | 200 |
| 1009 | Seth Sneller | 10/9/17 | 1 | 25 | 25 |
| 1010 | Rickie Rossi | 10/10/17 | 1 | 50 | 50 |
| 1011 | Jenise Jeanbaptiste | 6/1/17 | 1 | 200 | 200 |
| 1011 | Jenise Jeanbaptiste | 7/1/17 | 1 | 75 | 275 |
| 1012 | Genia Glatz | 6/2/17 | 1 | 25 | 25 |
| 1012 | Genia Glatz | 7/2/17 | 1 | 50 | 75 |
| 1013 | Onita Oddo | 6/3/17 | 1 | 50 | 50 |
| 1014 | Dario Dreyer | 6/4/17 | 1 | 75 | 75 |
| 1014 | Dario Dreyer | 6/24/17 | 5 | 750 | 750 |
| 1015 | Toby Trent | | 0 | | |
+---------+---------------------+------------+-------+-------------------------+-----------+
I am pretty sure that I need to use a window function to do this but can't seem to do it while preserving the grouping by user_id and created_by

The problem is that in the presence of a GROUP BY clause, the window functions iterate over each group rather than multiple grouped rows. Put your query into a WITH clause and you can easily do the windowing you want:
WITH t AS (
SELECT usersq1.id AS user_id,
name,
completed_at,
COUNT(completed_at) AS trips, -- To correctly handle 0 trips
SUM(cost_amount_cents) AS daily_cost_amount_cents
FROM usersq1
LEFT OUTER JOIN tripsq1 ON usersq1.id = user_id
GROUP BY usersq1.id, name, completed_at
ORDER BY user_id, name, completed_at
) SELECT user_id,
name,
completed_at AS created_at,
trips,
daily_cost_amount_cents,
SUM(daily_cost_amount_cents) OVER (PARTITION BY user_id
ORDER BY user_id, completed_at)
FROM t;

PostgreSQL aggregate function for each row across multiple unknown number of columns

I looked through similar questions like this one, but they seem to have a definite number of columns. I would like to input a table that I do not know the number of columns.
Question:
How to calculate aggregate functions (e.g. avg() or sum() ) for each row across several columns if number of columns is not known in advance?
I have put the input table panel_stats_rnd csv and a DLL to create it here.
I would like to calculate for each row the rnd_avg_parcelcount as average of all columns c_1_avg_parcelcount, c_2_avg_parcelcount, ... where I can have input tables with any number (say 100) columns of _avg_parcelcount. And for columns rnd_sum_parcelcount I would like to calculate sum() of all columns that start with c_ and end with _sum_parcelcount.
The table looks like this:
SELECT * FROM panel_stats_rnd;
gid | d | dist_from | dist_to | distlabel | rnd_avg_parcelcount | rnd_sum_parcelcount | rnd_avg_callcount | rnd_sum_callcount | rnd_avg_perccalled | called_avg_parcelcount | called_sum_parcelcount | called_avg_callcount | called_sum_callcount | called_avg_perccalled | c_1_avg_parcelcount | c_1_sum_parcelcount | c_1_avg_callcount | c_1_sum_callcount | c_1_avg_perccalled | c_2_avg_parcelcount | c_2_sum_parcelcount | c_2_avg_callcount | c_2_sum_callcount | c_2_avg_perccalled
-----+----+-----------+---------+-----------+---------------------+---------------------+-------------------+-------------------+--------------------+------------------------+------------------------+----------------------+----------------------+-----------------------+---------------------+---------------------+-------------------+-------------------+----------------------+---------------------+---------------------+-------------------+-------------------+----------------------
1 | 0 | 0 | 100 | 0-100 | | | | | | 119045 | 119045 | 119045 | 23 | 0.000193204250493511 | 119045 | 119045 | 119045 | 16 | 0.000134402956865051 | 119045 | 119045 | 119045 | 16 | 0.000134402956865051
2 | 1 | 100 | 200 | 100-200 | | | | | | 163140 | 163140 | 163140 | 22 | 0.000134853500061297 | 163140 | 163140 | 163140 | 17 | 0.000104204977320093 | 163140 | 163140 | 163140 | 18 | 0.000110334681868334
3 | 2 | 200 | 300 | 200-300 | | | | | | 135934 | 135934 | 135934 | 10 | 7.3565112481057e-05 | 135934 | 135934 | 135934 | 18 | 0.000132417202465903 | 135934 | 135934 | 135934 | 15 | 0.000110347668721585
4 | 3 | 300 | 400 | 300-400 | | | | | | 116874 | 116874 | 116874 | 13 | 0.000111230898232284 | 116874 | 116874 | 116874 | 11 | 9.41184523503944e-05 | 116874 | 116874 | 116874 | 18 | 0.000154012012937009
5 | 4 | 400 | 500 | 400-500 | | | | | | 93216 | 93216 | 93216 | 12 | 0.000128733264675592 | 93216 | 93216 | 93216 | 10 | 0.000107277720562993 | 93216 | 93216 | 93216 | 12 | 0.000128733264675592
6 | 5 | 500 | 600 | 500-600 | | | | | | 69992 | 69992 | 69992 | 7 | 0.0001000114298777 | 69992 | 69992 | 69992 | 10 | 0.000142873471253858 | 69992 | 69992 | 69992 | 7 | 0.0001000114298777
7 | 6 | 600 | 700 | 600-700 | | | | | | 50816 | 50816 | 50816 | 10 | 0.000196788413098237 | 50816 | 50816 | 50816 | 6 | 0.000118073047858942 | 50816 | 50816 | 50816 | 0 | 0
8 | 7 | 700 | 800 | 700-800 | | | | | | 34814 | 34814 | 34814 | 0 | 0 | 34814 | 34814 | 34814 | 6 | 0.000172344459125639 | 34814 | 34814 | 34814 | 4 | 0.000114896306083759
9 | 8 | 800 | 900 | 800-900 | | | | | | 23023 | 23023 | 23023 | 1 | 4.34348260435217e-05 | 23023 | 23023 | 23023 | 4 | 0.000173739304174087 | 23023 | 23023 | 23023 | 1 | 4.34348260435217e-05
10 | 9 | 900 | 1000 | 900-1000 | | | | | | 14215 | 14215 | 14215 | 1 | 7.03482237073514e-05 | 14215 | 14215 | 14215 | 1 | 7.03482237073514e-05 | 14215 | 14215 | 14215 | 5 | 0.000351741118536757
11 | 10 | 1000 | 5000 | 1000-5000 | | | | | | 23527 | 23527 | 23527 | 0 | 0 | 23527 | 23527 | 23527 | 0 | 0 | 23527 | 23527 | 23527 | 3 | 0.000127513070089684
(11 rows)
I tried the following for 2 columns (works but I'd rather not write it 5 times for 100 columns, besides the number of columns has to be a parameter):
SELECT d,c_1_avg_parcelcount,c_2_avg_parcelcount,
(SELECT avg(c) FROM (VALUES (c_1_avg_parcelcount) , (c_2_avg_parcelcount) ) T (c)) AS Avg_,
(SELECT sum(c) FROM (VALUES (c_1_avg_parcelcount) , (c_2_avg_parcelcount) ) T (c)) AS sum_
FROM panel_stats_rnd;
I also tried the following but doesn't work.
WITH cols AS (
select value(column_name) from information_schema.columns
where table_name = 'panel_stats_rnd'
AND column_name SIMILAR TO 'c_%avg_parcelcount'
AND column_name != 'called_avg_parcelcount'
)
SELECT *, (SELECT avg(Col) FROM cols V(Col) ) AS col_average
FROM panel_stats_rnd;
I am almost there but something is missing...

select
*,
(select avg(v::numeric)
from json_each_text(row_to_json(panel_stats_rnd.*)) as j(k,v)
where k like 'c\_%\_avg\_parcelcount') as rnd_avg_parcelcount,
(select sum(v::numeric)
from json_each_text(row_to_json(panel_stats_rnd.*)) as j(k,v)
where k like 'c\_%\_sum\_parcelcount') as rnd_sum_parcelcount
from
panel_stats_rnd;
Look at the documentation about functions involved.
There are escapes for underlying characters (\_) because for like operator it is meaning any single character, for example select 'a' like '_'; is true.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Postgres lead window with a series index - postgresql

Related

How to group MVA field for faceted in sphinx

faceted search for field with multiple value

Summarize the cost by groups in org table

I have a query that groups usage by user by day how would I add a running total to this query?

PostgreSQL aggregate function for each row across multiple unknown number of columns

Categories

Resources