Count Non-NULL Elements Across Multiple Columns for Each Row - postgresql
Back Story
I have an employee table where each employee has one or more work role(s) assigned to them, and for each assigned role it shows the year that it was assigned. I have a query that builds a matrix table, with one row per employee, like so:
name             | role1 | role2 | role3 | role4 | role5
-----------------+-------+-------+-------+-------+-------
Bunny, Bugs      |       |       |       |       |  2022
Coyote, Wiley    |  2018 |       |       |  2018 |
Pig, Porky       |       |       |  2017 |       |
Mouse, Mickey    |       |       |       |  2020 |
Panther, Pink    |  2019 |       |       |       |
Cheese, Chuckey  |       |       |  2021 |  2017 |
Duck, Donald     |  2021 |       |       |  2021 |
Devil, Taz       |       |  2019 |       |       |
Brown, Charlie   |       |       |  2021 |       |  2019
Flintstone, Fred |  2019 |       |  2011 |  2016 |
Summary Columns
I need to have some additional columns that compute the following for each employee:
Number of roles assigned to the employee (COUNT?)
The year of the oldest role assigned (LEAST)
The year of the most recent role assigned (GREATEST)
Desired Results
This table illustrates the desired results of my analysis:
name             | role1 | role2 | role3 | role4 | role5 | role_count | role_oldest | role_newest
-----------------+-------+-------+-------+-------+-------+------------+-------------+------------
Bunny, Bugs      |       |       |       |       |  2022 |          1 |        2022 |        2022
Coyote, Wiley    |  2018 |       |       |  2015 |       |          2 |        2015 |        2018
Pig, Porky       |       |       |  2017 |       |       |          1 |        2017 |        2017
Mouse, Mickey    |       |       |       |  2020 |       |          1 |        2020 |        2020
Panther, Pink    |  2019 |       |       |       |       |          1 |        2019 |        2019
Cheese, Chuckey  |       |       |  2021 |  2017 |       |          2 |        2017 |        2021
Duck, Donald     |  2021 |       |       |  2021 |       |          2 |        2021 |        2021
Devil, Taz       |       |  2019 |       |       |       |          1 |        2019 |        2019
Brown, Charlie   |       |       |  2021 |       |  2019 |          2 |        2019 |        2021
Flintstone, Fred |  2019 |       |  2011 |  2016 |       |          3 |        2016 |        2019
My Mixed Results
For the role_oldest and role_newest columns, I was able to use the following statements:
UPDATE employee_roles
SET role_oldest = LEAST(role1, role2, role3, role4, role5);
UPDATE employee_roles
SET role_newest = GREATEST(role1, role2, role3, role4, role5);
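These two statements work because in PostgreSQL, GREATEST() and LEAST() ignore NULL arguments and return NULL only when every argument is NULL. A quick illustration, using the values from the Flintstone row:

```sql
-- NULL arguments are skipped, so this returns 2011 rather than NULL
SELECT LEAST(2019, NULL, 2011, 2016, NULL);
```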
However, in iterative attempts at computing role_count, I unsuccessfully tried various combinations of the COUNT, ARRAY, UNNEST, and STRING_TO_ARRAY functions, like so:
UPDATE employee_roles
SET role_count = COUNT(role1, role2, role3, role4, role5);
UPDATE employee_roles
SET role_count = COUNT(UNNEST(role1, role2, role3, role4, role5));
UPDATE employee_roles
SET role_count = COUNT(UNNEST(ARRAY(role1, role2, role3, role4, role5)));
UPDATE employee_roles
SET role_count = COUNT(UNNEST(ARRAY(STRING_TO_ARRAY(role1, role2, role3, role4, role5))));
Climbing Out of the Rabbit Hole
It is apparent that I am going the wrong way in this exercise, so I am pulling back out and seeking your help. I am sure there is an elegant solution for this, right under my nose, and I hope someone can help me find it.
Similar Questions with Complicated Answers
I have found numerous similar posts here on SO, and every one of them includes a convoluted solution requiring lots of PL/pgSQL gymnastics. I find it hard to believe that there is not a simple function that does this.
Code for Creating Sample Table
The following code blocks will help quickly create the sample table for this exercise:
create table employee_roles
(
name text,
role1 integer default null,
role2 integer default null,
role3 integer default null,
role4 integer default null,
role5 integer default null,
role6 integer default null,
role_count integer default null,
role_oldest integer default null,
role_newest integer default null
);
insert into employee_roles (name,role1,role2,role3,role4,role5)
values ('Bunny, Bugs',null,null,null,null,2022);
insert into employee_roles (name,role1,role2,role3,role4,role5)
values ('Coyote, Wiley',2018,null,null,2018,null);
insert into employee_roles (name,role1,role2,role3,role4,role5)
values ('Pig, Porky',null,null,2017,null,null);
insert into employee_roles (name,role1,role2,role3,role4,role5)
values ('Mouse, Mickey',null,null,null,2020,null);
insert into employee_roles (name,role1,role2,role3,role4,role5)
values ('Panther, Pink',2019,null,null,null,null);
insert into employee_roles (name,role1,role2,role3,role4,role5)
values ('Cheese, Chuckey',null,null,2021,2017,null);
insert into employee_roles (name,role1,role2,role3,role4,role5)
values ('Duck, Donald',2021,null,null,2021,null);
insert into employee_roles (name,role1,role2,role3,role4,role5)
values ('Devil, Taz',null,2019,null,null,null);
insert into employee_roles (name,role1,role2,role3,role4,role5)
values ('Brown, Charlie',null,null,2021,null,2019);
insert into employee_roles (name,role1,role2,role3,role4,role5)
values ('Flintstone, Fred',2019,null,2011,2016,null);
I'm not sure which version you're using (can you update?) -- in v. 14, it doesn't look like GREATEST() and LEAST() behave in the way you described.
You can try the following to get the COUNT output you're looking for:
edb=# select name, array_length(array_remove(array[role1,role2,role3,role4,role5],null),1) from employee_roles ;
name | array_length
------------------+--------------
Bunny, Bugs | 1
Coyote, Wiley | 2
Pig, Porky | 1
Mouse, Mickey | 1
Panther, Pink | 1
Cheese, Chuckey | 2
Duck, Donald | 2
Devil, Taz | 1
Brown, Charlie | 2
Flintstone, Fred | 3
(10 rows)
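If you would rather have a single built-in function, PostgreSQL 9.6 and later also ship num_nonnulls(), which counts its non-NULL arguments directly. As a sketch, all three summary columns could then be filled in one UPDATE against the employee_roles table from the question:

```sql
-- num_nonnulls() returns the number of non-NULL arguments (PostgreSQL 9.6+)
UPDATE employee_roles
SET role_count  = num_nonnulls(role1, role2, role3, role4, role5),
    role_oldest = LEAST(role1, role2, role3, role4, role5),
    role_newest = GREATEST(role1, role2, role3, role4, role5);
```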
Related
missing years - replace with empty row
I am trying to retrieve some data and it is as below:

id | year | value
---+------+------
 1 | 2015 |  200
 1 | 2016 | 3000
 1 | 2018 |  500
 2 | 2010 |  455
 2 | 2015 |  678
 2 | 2020 |  100

As you can see, some years are missing. I would like to add the rows with missing years, with null for the value column, and I want to do it per specific ids. Any ideas?
You can combine GENERATE_SERIES() with a left join to expand the missing years. For example:

select x.id, x.y, t.value
from (select id, generate_series(min(year), max(year)) as y
      from t
      group by id) x
left join t on t.id = x.id and t.year = x.y

Result:

id | y    | value
---+------+------
 1 | 2015 |  200
 1 | 2016 | 3000
 1 | 2017 | null
 1 | 2018 |  500
 2 | 2010 |  455
 2 | 2011 | null
 2 | 2012 | null
 2 | 2013 | null
 2 | 2014 | null
 2 | 2015 |  678
 2 | 2016 | null
 2 | 2017 | null
 2 | 2018 | null
 2 | 2019 | null
 2 | 2020 |  100
Postgres Crosstab on double columns with unknown value
So I have a table like this in my Postgres v10 DB:

CREATE TABLE t1(id integer primary key, ref integer, v_id integer,
                total numeric, year varchar,
                total_lastyear numeric, lastyear varchar);
INSERT INTO t1 VALUES
(1, 2077, 15, 10000,  2020, 9000,   2019),
(2, 2000, 13, 190000, 2020, 189000, 2019),
(3, 2065, 11, 10000,  2020, 10000,  2019),
(4, 1999, 14, 2300,   2020, 9000,   2019);

select * from t1 =

id | ref  | v_id | total  | year | total_lastyear | lastyear
---+------+------+--------+------+----------------+---------
 1 | 2077 |   15 |  10000 | 2020 |           9000 | 2019
 2 | 2000 |   13 | 190000 | 2020 |         189000 | 2019
 3 | 2065 |   11 |  10000 | 2020 |          10000 | 2019
 4 | 1999 |   14 |   2300 | 2020 |           9000 | 2019

Now I want to pivot this table so that I have 2020 and 2019 as columns, with the total amounts as values. My problems:

I don't know how to pivot two columns in the same query. Is that even possible, or do you have to make two steps?

The years 2020 and 2019 are dynamic and can change from one day to another. The year inside the column is the same on every row, so basically I need to save the years inside lastyear and year in some variable and pass it to the crosstab query.

This far I made it myself, but I only managed to pivot one year, and the 2019 and 2020 years are hardcoded. Demo
You can pivot one at a time with WITH.

WITH xd1 AS (
  SELECT * FROM crosstab('SELECT ref,v_id,year,total FROM t1 ORDER BY 1,3',
                         'SELECT DISTINCT year FROM t1 ORDER BY 1')
           AS ct1(ref int, v_id int, "2020" int)
), xd2 AS (
  SELECT * FROM crosstab('SELECT ref,v_id,lastyear,total_lastyear FROM t1 ORDER BY 1,3',
                         'SELECT DISTINCT lastyear FROM t1 ORDER BY 1')
           AS ct2(ref int, v_id int, "2019" int)
)
SELECT xd1.ref, xd1.v_id, xd1."2020", xxx."2019"
FROM xd1
LEFT JOIN xd2 AS xxx ON xxx.ref = xd1.ref AND xxx.v_id = xd1.v_id;

This doesn't prevent year and lastyear from colliding. You still have to know which years the query will return, as you have to define the record as it is returned by crosstab. You could wrap it in an EXECUTE format() to make it more dynamic and deal with some stringology. This issue was mentioned here.
PostgreSQL: Fill values for null rows based on rows for which we do have values
I have the following table:

   | country | year | rank
---+---------+------+-----
 1 | Austria | 2019 |  1
 2 | Austria | 2018 |  NA
 3 | Austria | 2017 |  NA
 4 | Austria | 2016 |  NA
 5 | Spain   | 2019 |  2
 6 | Spain   | 2018 |  NA
 7 | Spain   | 2017 |  NA
 8 | Spain   | 2016 |  NA
 9 | Belgium | 2019 |  3
10 | Belgium | 2018 |  NA
11 | Belgium | 2017 |  NA
12 | Belgium | 2016 |  NA

I want to fill in the NA values for 2018, 2017 and 2016 for each country with the value for 2019 (which we have). I want the output table to look like this:

   | country | year | rank
---+---------+------+-----
 1 | Austria | 2019 |  1
 2 | Austria | 2018 |  1
 3 | Austria | 2017 |  1
 4 | Austria | 2016 |  1
 5 | Spain   | 2019 |  2
 6 | Spain   | 2018 |  2
 7 | Spain   | 2017 |  2
 8 | Spain   | 2016 |  2
 9 | Belgium | 2019 |  3
10 | Belgium | 2018 |  3
11 | Belgium | 2017 |  3
12 | Belgium | 2016 |  3

I do not know where to get started with this question. I typically work with R but am now working on a platform which uses PostgreSQL. I could do this in R, but thought it would be worthwhile to figure out how it is done with Postgres. Any help with this would be greatly appreciated. Thank you.
Using an update join to find the non-NULL rank value for each country:

UPDATE yourTable AS t1
SET "rank" = t2.max_rank
FROM (
    SELECT country, MAX("rank") AS max_rank
    FROM yourTable
    GROUP BY country
) t2
WHERE t2.country = t1.country;
-- AND year IN (2016, 2017, 2018)

Add the commented-out portion of the WHERE clause if you really only want to target certain years (your example seems to imply that you want to backfill all missing data). If you just want to view your data in the format of the output, then use MAX as an analytic function:

SELECT country, year, MAX("rank") OVER (PARTITION BY country) AS "rank"
FROM yourTable
ORDER BY country, year DESC;
If you just want the output then try this:

with cte as (
    select distinct on (country) *
    from test
    order by country, year desc
)
select t1.id, t1.country, t1.year, t2.rank
from test t1
left join cte t2 on t1.country = t2.country

If you want to update your table then try this:

with cte as (
    select distinct on (country) *
    from test
    order by country, year desc
)
update test
set rank = cte.rank
from cte
where test.country = cte.country

DEMO
convert values from two columns into one new column
I have two columns: year and month:

Year | Month
-----+------
2017 | 01
2017 | 02
2018 | 12
2019 | 06
2020 | 07

With

select to_date(concat(Year, Month), 'YYYYMM') csv_date FROM my_table;

I can get just one column with date datatype. How can I add this column in my table, to get this:

Year | Month | csv_date
-----+-------+-----------
2017 | 01    | 2017-01-00
2017 | 02    | 2017-02-00
2018 | 12    | 2018-12-00
2019 | 06    | 2019-06-00
2020 | 07    | 2020-07-00
You cannot have a column defined as date that contains 00 for the day. That would be an invalid date, and Postgres will not allow it. The suggested method of concatenating the two works if the year and month are defined as a string-type column, but the result will have '01' for the day. If those columns are defined as numeric, then you can use the make_date function.

with my_table(tyr, tmo, nyr, nmo) as (
    values ('2020', '04', 2020, 04)
)
select to_date(concat(tyr, tmo), 'YYYYMM') txt_date
     , make_date(nyr, nmo, 01) num_date
from my_table;

With that said, use the to_char function for a date column to get just year and month, and (if you must) add the '-00':

with my_table (adate) as (
    values (date '2020-04-01')
)
select adate, to_char(adate, 'yyyy-mm') || '-00' as yyyymm
from my_table;

If you are on v12 and want to add the column, you can add it as a generated column. This has the advantage that it cannot be updated independently but will automatically update when the source column(s) get updated. See fiddle for a complete example:

alter table my_table
add column cvs_date date generated always as (make_date(yr, mo, 01)) stored;
Using a PostgreSQL query. If you want to add a new column, then:

alter table my_table add column csv_date date;
update my_table set csv_date = to_date(concat(Year, Month), 'YYYYMM');

If you want only select output, then:

select year, month, to_date(concat(Year, Month), 'YYYYMM') csv_date
FROM my_table;
Find max value in a group in FileMaker
How to select only max values in a group in the following set:

id | productid | price | year
---+-----------+-------+-----
 1 |        11 |  0,10 | 2015
 2 |        11 |  0,12 | 2016
 3 |        11 |  0,11 | 2017
 4 |        22 |  0,08 | 2016
 5 |        33 |  0,02 | 2016
 6 |        33 |  0,01 | 2017

Expected result for each productid and max year would be:

id | productid | price | year
---+-----------+-------+-----
 3 |        11 |  0,11 | 2017
 4 |        22 |  0,08 | 2016
 6 |        33 |  0,01 | 2017
This works for me:

ExecuteSQL (
    "SELECT t.id, t.productid, t.price, t.\"year\"
     FROM test t
     WHERE \"year\" = (SELECT MAX(\"year\") FROM test tt
                       WHERE t.productid = tt.productid)"
    ; " " ; "" )

Adapted from this answer: https://stackoverflow.com/a/21310671/832407
A simple SQL query will give you the last year for every product record:

ExecuteSQL ( "SELECT productid, MAX ( \"year\" ) FROM myTable GROUP BY productid"; ""; "" )

Getting the price for that year is going to be trickier, as FileMaker SQL does not fully support subqueries or temp tables.