I'd like to obtain the first occurrence of a non-null value per category.
If a category has only null values, its result should be NULL.
For a table like this:
Category  Value
1         NULL
1         1922
2         23
2         99
3         NULL
3         NULL
the result should be
Category  Value
1         1922
2         23
3         NULL
How can this be achieved in PostgreSQL?
Unfortunately, the two features that would make this trivial are not implemented in PostgreSQL:
IGNORE NULLS in FIRST_VALUE / LAST_VALUE
the FILTER clause in non-aggregate window functions
However, you can hack the desired result using GROUP BY and array_agg, which does support the FILTER clause, and then pick the first element with the square-bracket syntax (recall that PostgreSQL array indexing starts at 1).
Also, I would advise that you provide an explicit ordering for the aggregation step; otherwise the value that ends up as the first element depends on the query plan and the physical layout of the underlying table (see the ordered variant after the output below).
WITH vals (category, val) AS (
  VALUES
    (1, NULL),
    (1, 1922),
    (2, 23),
    (2, 99),
    (3, NULL),
    (3, NULL)
)
SELECT
  category
, (ARRAY_AGG(val) FILTER (WHERE val IS NOT NULL))[1]
FROM vals
GROUP BY 1;
produces the following output:
category | array_agg
----------+-----------
1 | 1922
3 |
2 | 23
(3 rows)
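As noted above, to make the "first" element deterministic you can put an ORDER BY inside the aggregate. A sketch, assuming the real table also carries some ordering column (the column ord here is hypothetical, not part of the sample data):

SELECT
  category
, (ARRAY_AGG(val ORDER BY ord) FILTER (WHERE val IS NOT NULL))[1]
FROM vals
GROUP BY 1;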
Can anyone tell me which command is used to concatenate the data of three columns into one column in a PostgreSQL database?
E.g. if the columns are
begin | month | year
------+-------+------
12    | 1     | 1988
13    | 3     |
14    |       | 2000
      | 5     | 2012
output:
Result
12-1-1988
13-3-null
14-null-2000
null-5-2012
Actually, I have already concatenated two columns, but the result only displays the rows in which every column is non-null; I also want to display the rows where only some of the columns are non-null.
If you simply used the || operator, you'd get NULL for the complete string whenever any element is NULL (plain concat() would instead silently drop the NULL elements).
You could use the function concat_ws(), which skips NULL values. But you are expecting them to be shown as 'null'.
So you need to cast the real NULL values into the non-null text 'null'. This can be done with the COALESCE() function, which takes several arguments and returns the first non-null one. But here a problem occurs: the string 'null' is of a different type (text) than the columns (int). So you have to equalize the types, e.g. by casting the int values to text first. Finally, your query could look like this:
SELECT concat_ws('-',
       COALESCE(begin::text, 'null'),
       COALESCE(month::text, 'null'),
       COALESCE(year::text, 'null')
)
FROM mytable;
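For comparison, a quick sketch of how the pieces behave, with literal values standing in for the columns of the row 13 | 3 | NULL:

SELECT 13::text || '-' || 3::text || '-' || NULL::text;  -- NULL: || propagates NULL
SELECT concat_ws('-', 13, 3, NULL);                      -- '13-3': concat_ws skips the NULL
SELECT concat_ws('-',
       COALESCE(13::text, 'null'),
       COALESCE(3::text, 'null'),
       COALESCE(NULL::text, 'null'));                    -- '13-3-null': the desired output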
Say I have a table called list, where there are items like these (the ids are random uuids):
id rank text
--- ----- -----
x 0 Hello
x 1 World
x 2 Foo
x 3 Bar
x 4 Baz
I want to maintain the property that the rank column always runs from 0 to n-1 (n being the number of rows). If a client asks to insert an item with rank = 3, then the server should push the current ranks 3 and 4 up to 4 and 5, respectively:
id rank text
--- ----- -----
x 0 Hello
x 1 World
x 2 Foo
x 3 New Item!
x 4 Bar
x 5 Baz
My current strategy is to have a dedicated insertion function add_item(item) that scans through the table, filters out the items with a rank equal to or greater than that of the item being inserted, and increments those ranks by one. However, I think this approach will run into all sorts of problems, like race conditions.
Is there a more standard practice or a more robust approach?
Note: The rank column is completely independent of the rest of the columns, and insertion is not the only operation I need to support. Think of it as the back end of a sortable to-do list, where the user can add/delete/reorder items on the fly.
Doing verbatim what you suggest might be difficult, or not possible at all, but I can suggest a workaround: maintain a new column ts which stores the time each record was inserted. Then insert the current time along with the rest of the record, i.e.
id rank text ts
--- ----- ----- --------------------
x 0 Hello 2017-12-01 12:34:23
x 1 World 2017-12-03 04:20:01
x 2 Foo ...
x 3 New Item! 2017-12-12 11:26:32
x 3 Bar 2017-12-10 14:05:43
x 4 Baz ...
Now we can easily generate the ordering you want via a query:
SELECT id, rank, text,
       ROW_NUMBER() OVER (ORDER BY rank, ts DESC) - 1 AS new_rank
FROM yourTable;
This generates ranks 0 to 5 for the above sample table (ROW_NUMBER() itself counts from 1, hence the - 1). The basic idea is to keep using the existing rank column, but to let the timestamp break the tie whenever the same rank appears more than once: among equal ranks, the newer row wins.
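Under this scheme, inserting at a position is just a plain INSERT carrying the desired rank and the current time (column names taken from the sample above):

INSERT INTO yourTable (id, rank, text, ts)
VALUES ('x', 3, 'New Item!', now());

The query above then sorts 'New Item!' ahead of the older row that also has rank 3, because of ts DESC.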
You can wrap it up in a function if you think it's worth it:
t=# with u as (
update r set rank = rank + 1 where rank >= 3
)
insert into r values('x',3,'New val!')
;
INSERT 0 1
the result:
t=# select * from r;
id | rank | text
----+------+----------
x | 0 | Hello
x | 1 | World
x | 2 | Foo
x | 3 | New val!
x | 4 | Bar
x | 5 | Baz
(6 rows)
Also worth mentioning: you might run into a concurrency ("race condition") problem on highly loaded systems; the code above is just a sample.
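A minimal sketch of such a wrapper, reusing the table r from the demo above (the function name and signature are assumptions):

CREATE FUNCTION add_item(_id text, _rank int, _text text)
RETURNS void AS
$$
  WITH u AS (
    UPDATE r SET rank = rank + 1 WHERE rank >= _rank
  )
  INSERT INTO r VALUES (_id, _rank, _text);
$$ LANGUAGE sql;

-- usage:
-- SELECT add_item('x', 3, 'New val!');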
You can have a “computed rank”, which is a double precision value, and a “displayed rank”, which is an integer computed with the row_number window function on output.
When a row is inserted that should rank between two rows, compute the new rank as the arithmetic mean of the two ranks.
The advantage is that you don't have to update existing rows.
The down side is that you have to calculate the displayed ranks before you can insert a new row so that you know where to insert it.
This solution (like all the others) is subject to race conditions.
To deal with these, you can either use table locks or serializable transactions.
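A short sketch of the fractional-rank idea, with assumed table and column names: the persisted rank is a double, and the integer rank is derived on read:

-- insert between the rows whose displayed ranks are 2 and 3: mean = 2.5
INSERT INTO list (id, computed_rank, text)
VALUES ('x', (2.0 + 3.0) / 2, 'New Item!');

-- displayed ranks 0..n-1, computed on output
SELECT id, text,
       ROW_NUMBER() OVER (ORDER BY computed_rank) - 1 AS displayed_rank
FROM list;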
The only way to prevent a race condition would be to lock the table
https://www.postgresql.org/docs/current/sql-lock.html
Of course this would slow you down if there are lots of updates and inserts.
If you can somehow limit the scope of your updates, then you can do a SELECT ... FOR UPDATE on that scope. For example, if the records have a parent_id, you can do a SELECT ... FOR UPDATE on the parent record first; any other insert that does the same SELECT ... FOR UPDATE will have to wait until your transaction is done.
https://www.postgresql.org/docs/current/explicit-locking.html
Read the section on advisory locks to see if you can use those in your application. They are not enforced by the system so you'll need to be careful of how you write your application.
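For example, a hedged sketch of serializing the rank shuffle with a transaction-scoped advisory lock (the key 42 is an arbitrary constant the application agrees on):

BEGIN;
SELECT pg_advisory_xact_lock(42);  -- every rank-shuffling writer takes the same key
UPDATE list SET rank = rank + 1 WHERE rank >= 3;
INSERT INTO list (id, rank, text) VALUES ('x', 3, 'New Item!');
COMMIT;  -- the advisory lock is released automatically at commit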
id  datetime             new_column           datetime_rankx
1   12.01.2015 18:10:10  12.01.2015 18:10:10  1
2   03.12.2014 14:44:57  03.12.2014 14:44:57  1
2   21.11.2015 11:11:11  03.12.2014 14:44:57  2
3   01.01.2011 12:12:12  01.01.2011 12:12:12  1
3   02.02.2012 13:13:13  01.01.2011 12:12:12  2
3   03.03.2013 14:14:14  01.01.2011 12:12:12  3
I want to make a new column that holds the minimum datetime value per group of rows sharing the same id.
How could I do this in Power BI Desktop using a DAX expression?
Use this expression:
NewColumn =
CALCULATE(
    MIN(Table[datetime]),
    FILTER(Table, Table[id] = EARLIER(Table[id]))
)
In Power BI, applied to a table with your data, it produces the new_column values shown in the sample above.
UPDATE: Explanation and EARLIER function usage.
Basically, the EARLIER function gives you access to values from a different row context.
When you use the CALCULATE function, it creates a row context over the whole table; conceptually, it iterates over every table row. The same happens with the FILTER function: it iterates over the whole table and evaluates every row against the filter condition.
So far we have two row contexts: the one created by CALCULATE and the one created by FILTER. Note that FILTER uses EARLIER to get access to CALCULATE's row context. That said, for every row in the outer (CALCULATE's) row context, FILTER returns the set of rows matching the current id of the outer context.
If you have a programming background, this may make sense as a nested loop. I hope this Python pseudocode conveys the main idea:
outer_context = ['row1', 'row2', 'row3', 'row4']
inner_context = ['row1', 'row2', 'row3', 'row4']
for outer_row in outer_context:
    for inner_row in inner_context:
        if inner_row == outer_row:  # this line is what FILTER and EARLIER do
            # calculate the min datetime using the filtered rows
            ...
UPDATE 2: Adding a ranking column.
To get the desired rank you can use this expression:
RankColumn =
RANKX(
    CALCULATETABLE(Table, ALLEXCEPT(Table, Table[id])),
    Table[datetime],
    Table[datetime],
    1
)
This produces the rank values shown as datetime_rankx in the sample above.
Let me know if this helps.
I have the following tables:
business
id   catid   subcatid
---------------------
10   {1}     {10,20}
20   {2}     {30,40}
30   {3}     {50,60,70}
cat_subcat
catid   shortname   parent_id   bid
-----------------------------------
1       A                       10
2       B                       20
3       c                       30
10      x           1           10
20      y           1           10
30      z           2           20
40      w           2           20
Both tables have a relationship via id. The problem I am getting is outlined below; here is my current query:
SELECT ARRAY[category_id]::int[] from cat_subcat
where parentcategoryid IS not NULL and shortname ilike ('x,y');
I want to get the category_id for an entered shortname, but my query is not giving the proper output: if I pass one shortname it retrieves the category_id, but if I pass more than one it displays nothing. Please tell me how to get the category_id when more than one shortname is passed.
To actually use pattern matching with ILIKE, you cannot use a simple IN expression. Instead, you need ILIKE ANY (...) or ILIKE ALL (...), depending on whether you want the tests ORed or ANDed.
Also, your ARRAY constructor will be applied to individual rows, which seems rather pointless. I assume you want this instead (educated guess):
SELECT array_agg(catid) AS cats
FROM cat_subcat
WHERE parent_id IS NOT NULL
AND shortname ILIKE ANY ('{x,y}');
Well, as long as you don't use wildcards (%, _) for your pattern, you can translate this to:
AND lower(shortname) IN ('x','y');
But that would be rather pointless, since Postgres internally converts this to:
AND lower(shortname) = ANY ('{x,y}');
.. before evaluating.
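The distinction matters as soon as real patterns are involved; a small illustration with made-up patterns:

SELECT catid, shortname
FROM   cat_subcat
WHERE  shortname ILIKE ANY ('{x%,y%}');  -- any shortname starting with x or y, case-insensitively

With IN there would be no way to express the % wildcard.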
I'm trying to find a way to use Perl to further process PostgreSQL output. (If there's a better way to do this in PostgreSQL itself, please let me know.) I basically need to take certain columns (Realtime, Value) and concatenate their values into one row per group while keeping ID and CAT.
First time posting, so please let me know if I missed anything.
Input:
ID   CAT   Realtime   Value
A    1     time1      55
A    1     time2      57
B    1     time3      75
C    2     time4      60
C    3     time5      66
C    3     time6      67
Output:
ID   CAT   Time          Values
A    1     time1,time2   55,57
B    1     time3         75
C    2     time4         60
C    3     time5,time6   66,67
You could do this most simply in Postgres like so (using array columns):
CREATE TEMP TABLE output AS
SELECT id, cat,
       ARRAY_AGG(realtime) AS time,
       ARRAY_AGG(value) AS "values"  -- quoted so the column can be referenced later: VALUES is a reserved word
FROM input
GROUP BY id, cat;
Then select whatever you want out of the output table.
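For instance, a sketch of turning the arrays back into the comma-separated strings of the desired output:

SELECT id, cat,
       array_to_string(time, ',')     AS time,
       array_to_string("values", ',') AS "values"
FROM output;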
SELECT id
     , cat
     , string_agg(realtime, ',') AS realtimes
     , string_agg(value::text, ',') AS values
FROM input
GROUP BY 1, 2
ORDER BY 1, 2;
string_agg() requires PostgreSQL 9.0 or later and concatenates all values into a delimiter-separated string, while array_agg() (available since 8.4) creates an array out of the input values. Note the value::text cast: string_agg() expects text input.
About 1, 2 - I quote the manual on the SELECT command:
GROUP BY clause
expression can be an input column name, or the name or ordinal number
of an output column (SELECT list item), or ...
ORDER BY clause
Each expression can be the name or ordinal number of an output column
(SELECT list item), or
Emphasis mine. So that's just notational convenience. Especially handy with complex expressions in the SELECT list.
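For illustration, the ordinals above simply stand for the first and second SELECT list items, i.e. GROUP BY id, cat and ORDER BY id, cat. The convenience shows with computed expressions, as in this sketch against a hypothetical events(ts) table:

SELECT date_trunc('day', ts) AS day, count(*)
FROM events
GROUP BY 1    -- stands for date_trunc('day', ts); no need to repeat the expression
ORDER BY 1;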