Trying to scale this down so the answer is simple. I can probably extrapolate the answers here to apply to a bigger data set.
Given the following table:
+------+-----+
| name | age |
+------+-----+
| a | 5 |
| b | 7 |
| c | 8 |
| d | 8 |
| e | 10 |
+------+-----+
I want to make a table that shows the count of people where their age is equal to or greater than x. For instance, the table about would produce:
+--------------+-------+
| at least age | count |
+--------------+-------+
| 5 | 5 |
| 6 | 4 |
| 7 | 4 |
| 8 | 3 |
| 9 | 1 |
| 10 | 1 |
+--------------+-------+
Is there a single query that can accomplish this task? Obviously, it is easy to write a simple function for it, but I'm hoping to be able to do this quickly with one query.
Thanks!
Yes, what you're looking for is a window function.
with cte_age_count as (
select age,
count(*) c_star
from people
group by age)
select age,
sum(c_star) over (order by age
range between unbounded preceding
and current row)
from cte_age_count
Not syntax checked ... let me know if it works!
Related
I have one table where I dump all records from different sources (x, y, z) like below
+----+------+--------+
| id | source |
+----+--------+
| 1 | x |
| 2 | y |
| 3 | x |
| 4 | x |
| 5 | y |
| 6 | z |
| 7 | z |
| 8 | x |
| 9 | z |
| 10 | z |
+----+--------+
Then I have one mapping table where I map values between sources based on my usecase like below
+----+-----------+
| id | mapped_id |
+----+-----------+
| 1 | 2 |
| 1 | 9 |
| 3 | 7 |
| 4 | 10 |
| 5 | 1 |
+----+-----------+
I want merged results where I can see only unique results like
+-----+------------+
| id | mapped_ids |
+-----+------------+
| 1 | 2,9,5 |
| 3 | 7 |
| 4 | 10 |
| 6 | null |
| 8 | null |
+-----+------------+
I am trying different options but could not figure this out, is there way I can write joins to do this. I have to use the mapping table where associations are stored and identify unique records along with records which are not mapped anywhere.
My understanding is, you want to see all dump_table IDs that do not appear in the mapping_id column and then aggregate the mapped_ids for those that are left:
select d1.id,
array_agg(m1.mapped_id order by m1.mapped_id) filter (where m1.mapped_id is not null) as mapped_ids
from dump_table d1
left join mapping_table m1 using (id)
where not exists (select *
from mapping_table m2
where m2.mapped_id = d1.id)
group by d1.id;
Online example: https://rextester.com/JQZ17650
Try something like this:
SELECT id, name, ARRAY_AGG(mapped_id) AS mapped_ids
FROM table1 AS t1
LEFT JOIN table2 AS t2 USING (id)
GROUP BY id, name
I need help with updating table from another table in Postgres Db.
Long story short we ended up with corrupted data in db, and now I need to update one table with values from another.
I have table with this data table wfc:
| step_id | command_id | commands_order |
|---------|------------|----------------|
| 1 | 1 | 0 |
| 1 | 2 | 1 |
| 1 | 3 | 2 |
| 1 | 4 | 3 |
| 1 | 1 | 0 |
| 2 | 2 | 0 |
| 2 | 3 | 1 |
| 2 | 3 | 1 |
| 2 | 4 | 3 |
and I want to update values in command_order column from another table, so I can have result like this:
| step_id | command_id | commands_order|
|---------|------------|---------------|
| 1 | 1 | 0 |
| 1 | 2 | 1 |
| 1 | 3 | 2 |
| 1 | 4 | 3 |
| 1 | 1 | 4 |
| 2 | 2 | 0 |
| 2 | 3 | 1 |
| 2 | 3 | 2 |
| 2 | 4 | 3 |
It was looking like easy task, but problem is to update rows for same command_id, it is writing same value in commands_order
SQL that I tried is:
UPDATE wfc
SET commands_order = CAST(sq.input_step_id as INTEGER)
FROM (
SELECT wfp.step_id, wfp.input_command_id, wfp.input_step_id
from wfp
order by wfp.step_id, wfp.input_step_id
) AS sq
WHERE (wfc.step_id=sq.step_id AND wfc.command_id=CAST(sq.input_command_id as INTEGER));
SQL Fiddle http://sqlfiddle.com/#!15/4efff4/4
I am pretty stuck with this, please help.
Thanks in advance.
Assuming you are trying to number the rows in the order in which they were created, and as long as you understand that ctid will chnage on update and with VACCUUM FULL, you can do the following:
select step_id, command_id, rank - 1 as command_order
from (select step_id, command_id, ctid as wfc_ctid, rank() over
(partition by step_id order by ctid)
from wfc) as wfc_ordered;
This will give you the wfc table with the ordering that you want. If you do update the original table, the ctids will change, so it's probably safer to create a copy of the table with the above query.
I have the following table that I have loaded in Tableau (It has only one column CreatedOnDate)
+-----------------+
| CreatedOnDate |
+-----------------+
| 1/1/2016 |
| 1/2/2016 |
| 1/3/2016 |
| 1/4/2016 |
| 1/5/2016 |
| 1/6/2016 |
| 1/7/2016 |
| 1/8/2016 |
| 1/9/2016 |
| 1/10/2016 |
| 1/11/2016 |
| 1/12/2016 |
| 1/13/2016 |
| 1/14/2016 |
+-----------------+
I want to be able to find the maximum date in the table, compare it with every date in the table and get the difference in days. For the above table, the maximum date in table is 1/14/2016. Every date is compared to 1/14/2016 to find the difference.
Expected Output
+-----------------+------------+
| CreatedOnDate | Difference |
+-----------------+------------+
| 1/1/2016 | 13 |
| 1/2/2016 | 12 |
| 1/3/2016 | 11 |
| 1/4/2016 | 10 |
| 1/5/2016 | 9 |
| 1/6/2016 | 8 |
| 1/7/2016 | 7 |
| 1/8/2016 | 6 |
| 1/9/2016 | 5 |
| 1/10/2016 | 4 |
| 1/11/2016 | 3 |
| 1/12/2016 | 2 |
| 1/13/2016 | 1 |
| 1/14/2016 | 0 |
+-----------------+------------+
My goal is to create this Difference calculated field. I am struggling to find a way to do this using DATEDIFF.
And help would be appreciated!!
woodhead92, this approach would work, but means you have to use table calculations. Much more flexible approach (available since v8) is Level of Details expressions:
First, define a MAX date for the whole dataset with this calculated field called MaxDate LOD:
{FIXED : MAX(CreatedOnDate) }
This will always calculate the maximum date on table (will overwrite filters as well, if you need to reflect them, make sure you add them to context.
Then you can use pretty much the same calculated field, but no need for ATTR or Table Calculations:
DATEDIFF('day', [CreatedOnDate], [MaxDate LOD])
Hope this helps!
I need to set a sequence inside T-SQL when in the first column I have sequence marker (which is repeating) and use other column for ordering.
It is hard to explain so I try with example.
This is what I need:
|------------|-------------|----------------|
| Group Col | Order Col | Desired Result |
|------------|-------------|----------------|
| D | 1 | NULL |
| A | 2 | 1 |
| C | 3 | 1 |
| E | 4 | 1 |
| A | 5 | 2 |
| B | 6 | 2 |
| C | 7 | 2 |
| A | 8 | 3 |
| F | 9 | 3 |
| T | 10 | 3 |
| A | 11 | 4 |
| Y | 12 | 4 |
|------------|-------------|----------------|
So my marker is A (each time I met A I must start new group inside my result). All rows before first A must be set to NULL.
I know that I can achieve that with loop but it would be slow solution and I need to update a lot of rows (may be sometimes several thousand).
Is there a way to achive this without loop?
You can use window version of COUNT to get the desired result:
SELECT [Group Col], [Order Col],
COUNT(CASE WHEN [Group Col] = 'A' THEN 1 END)
OVER
(ORDER BY [Order Col]) AS [Desired Result]
FROM mytable
If you need all rows before first A set to NULL then use SUM instead of COUNT.
Demo here
I am trying to transform an SQL query into Postgres, and i am struggling to do the below,
Example :
RESULTS FROM FIRST QUERY FOR MY NEXT QUERY : select sourcename, type, count(type), sum(size) from table;
sourcename | Type | count | sum
------------+-----------------------------------------------------+-------+----------
A | TYPE1 | 21 | 10485378
B | TYPE1 | 12 | 5241177
C | TYPE1 | 12 | 5242254
D | TYPE1 | 12 | 5570560
A | TYPE2 | 11 | 5239645
B | TYPE2 | 12 | 5570560
C | TYPE2 | 12 | 5241862
D | TYPE2 | 11 | 5570560
OUTPUT I NEED:
sourcename | Type | count | sum
------------+-----------------------------------------------------+-------+----------
A | TYPE1 | 21 | 10485378
| TYPE2 | 12 | 5241177
TOTAL | | 33 | sum(values above)
B | TYPE1 | 12 | 5242254
| TYPE2 | 12 | 5570560
TOTAL | | 24 | sum(values above)
C | TYPE1 | 11 | 5239645
| TYPE2 | 12 | 5570560
TOTAL | | 23 | sum(values above)
D | TYPE1 | 12 | 5241862
| TYPE2 | 11 | 5570560
TOTAL | | 23 | sum(values above)
NOTE: I have used WITH AS to get the total but displaying them between each Source name i.e. A,B,C,D is something i am not able to do, i am also not able to suppress A from getting displayed twice in the results, like in oracle its break on sourcename, but is there a postgres equivalent?. also compute sum on column is in oracle , i could not find a postgres equivalent. I have finally found a way out using both shell and psql to get the below desired results, but i need to know if there is a better way to do this without using shell in between. Any help is much appreciated. I am using psql 9.1.3
I am new to this forum, so if you see my table results not aligned for viewing let me know, i will try to set it right.
(Using PostgreSQL 9.1)