2 Level pivot using Postgresql - postgresql

I have a table whose schema along with data (table_name : raw_data) appears to be this :
name | category | clear_date |
A | GOOD | 2020-05-30 |
A | GOOD | 2020-05-30 |
A | GOOD | 2020-05-30 |
A | GOOD | 2020-05-30 |
A | BAD | 2020-05-30 |
A | BAD | 2020-05-30 |
Now if I perform a "groupby" operation using the following statement :
SELECT name, category, date(clear_date), count(clear_date)
FROM raw_data
GROUP BY name, category, date(clear_date)
ORDER BY name
I get the following answer :
name | caetgory | date | count |
A | GOOD |2020-05-30 | 4 |
A | BAD |2020-05-30 | 1 |
A | BAD |2020-05-31 | 1 |
IN order to produce the pivot in following format :
name | category | 2020-05-30 | 2020-05-31 |
A | GOOD | 4 | NULL |
A | BAD | 1 | 1 |
I am using the following query :
select * from crosstab (
'select name, category, date(clear_date), count(clear_date) from raw_data group by name, category, date(clear_date) order by 1,2,3',
'select distinct date(clear_date) from raw_data order by 1'
)
as newtable (
node_name varchar, alarm_name varchar, "2020-05-30" integer, "2020-05-31" integer
)
ORDER BY name
But I am getting results as follows :
name | category | 2020-05-30 | 2020-05-31 |
A | BAD | 4 | 1 |
Can anyone please try to suggest how can i achieve the result mentioned above. It appears crosstab removes the duplicate entry of A automatically.

Not sure if this is possible using crosstab because you have a missing records in some dates. Here is an example how to get expected result but not sure is what you need. Anyway hope this helps.
SELECT r1.*, r2.counter AS "2020-05-30", r3.counter AS "2020-05-31"
FROM (
SELECT DISTINCT name, category
FROM raw_data
) AS r1
LEFT JOIN (
SELECT name, category, count(*) AS counter
FROM raw_data
WHERE clear_date = '2020-05-30'
GROUP BY name, category
) AS r2 ON (r2.category = r1.category AND r2.name = r1.name)
LEFT JOIN (
SELECT name, category, count(*) AS counter
FROM raw_data
WHERE clear_date = '2020-05-31'
GROUP BY name, category
) AS r3 ON (r3.category = r1.category AND r3.name = r1.name)
ORDER BY r1.category DESC;

Related

Postgresql - How to use string_to_array on another column value

How can I use string_to_array or split_part on another column value.
I want do something like select * from tenants where id IN (select string_to_array(select ancestry from tenants where id = 39,'/'));
-[ RECORD 1 ]-------------+----------------------
id | 1
domain |
subdomain |
name | My Company
login_text |
logo_file_name |
logo_content_type |
logo_file_size |
logo_updated_at |
login_logo_file_name |
login_logo_content_type |
login_logo_file_size |
login_logo_updated_at |
ancestry |
divisible | t
description | Tenant for My Company
use_config_for_attributes | t
default_miq_group_id | 1
source_type |
source_id |
-[ RECORD 3 ]-------------+----------------------
id | 35
domain |
subdomain |
name | Tenant_2
login_text |
logo_file_name |
logo_content_type |
logo_file_size |
logo_updated_at |
login_logo_file_name |
login_logo_content_type |
login_logo_file_size |
login_logo_updated_at |
ancestry | 1
divisible | t
description | Tenant_2
use_config_for_attributes | f
default_miq_group_id | 36
source_type |
source_id |
-[ RECORD 7 ]-------------+----------------------
id | 39
domain |
subdomain |
name | Child_Teanant_202
login_text |
logo_file_name |
logo_content_type |
logo_file_size |
logo_updated_at |
login_logo_file_name |
login_logo_content_type |
login_logo_file_size |
login_logo_updated_at |
ancestry | 1/35
divisible | t
description | Child_Teanant_202
use_config_for_attributes | f
default_miq_group_id | 52
source_type |
source_id |
Use regex to enforce word boundaries:
select *
from tenants
where (select ancestry from tenants where id = 39)
~ ('\y' || id || '\y')
See live demo.
Without the word boundaries an id of 1 would match an ancestry of 123.
Note Postgres's unusual regex for word boundary \y, which elsewhere is \b.
There are two ways to solve this.
One is to simply unnest the elements of ancestry
select *
from tenants
where id in (select a.id::int
from tenants t2
cross join unnest(string_to_array(t2.ancestry, '/')) as a(id)
where t2.id = 39);
Converting the string to an array in order to be able to use the = ANY() operator is a bit tricky, because you need two levels of parentheses plus a type cast to an integer array to make that work:
select *
from tenants
where id = any ((select string_to_array(t2.ancestry, '/')
from tenants t2
where t2.id = 39)::int[]);
Online example

Postgresql use more than one row as expression in sub query

As the title says, I need to create a query where I SELECT all items from one table and use those items as expressions in another query. Suppose I have the main table that looks like this:
main_table
-------------------------------------
id | name | location | //more columns
---|------|----------|---------------
1 | me | pluto | //
2 | them | mercury | //
3 | we | jupiter | //
And the sub query table looks like this:
some_table
---------------
id | item
---|-----------
1 | sub-col-1
2 | sub-col-2
3 | sub-col-3
where each item in some_table has a price which is in an amount_table like so:
amount_table
--------------
1 | 1000
2 | 2000
3 | 3000
So that the query returns results like this:
name | location | sub-col-1 | sub-col-2 | sub-col-3 |
----------------------------------------------------|
me | pluto | 1000 | | |
them | mercury | | 2000 | |
we | jupiter | | | 3000 |
My query currently looks like this
SELECT name, location, (SELECT item FROM some_table)
FROM main_table
INNER JOIN amount_table WHERE //match the id's
But I'm running into the error more than one row returned by a subquery used as an expression
How can I formulate this query to return the desired results?
you should decide on expected result.
to get one-tp-many relation:
SELECT name, location, some_table.item
FROM main_table
JOIN some_table on true -- or id if they match
INNER JOIN amount_table --WHERE match the id's
to get one-to-one with all rows:
SELECT name, location, (SELECT array_agg(item) FROM some_table)
FROM main_table
INNER JOIN amount_table --WHERE //match the id's

postgres select latest record per entity except on field value

I have a log table that looks something like this:
---------------------------------------------
| id | company | type | date_created |notes|
---------------------------------------------
| 1 | co1 | | 2016-06-30 | ... |
| 2 | co2 | ERROR | 2016-06-30 | ... |
| 3 | co1 | | 2016-06-29 | ... |
| 4 | co2 | | 2016-06-29 | ... |
I have the following which selects the latest record per entity:
SELECT *
FROM import_data_log a
JOIN (SELECT company, max(date_created) date_created
FROM import_data_log
GROUP BY company) b
ON a.company = b.company AND a.date_created = b.date_created
which gives the result:
| 1 | co1 | | 2016-06-30 | ... |
| 2 | co2 | ERROR | 2016-06-30 | ... |
I need to add a condition that does not select the entry with type = ERROR and get the next highest date for that company, so it would give:
| 1 | co1 | | 2016-06-30 | ... |
| 4 | co2 | | 2016-06-29 | ... |
Any ideas? It's probably something simple but for the life of me I can't seem to get it to work.
UPDATE / FIX:
Ok so after a lot of hair pulling, for anyone running into this issue, apparently Postgres doesn't compare null fields with anything, so it completely ignores all rows with type = null.
The way I fixed it is this, there is probably a nicer solution out there but for now this works:
SELECT *
FROM import_data_log a
JOIN (SELECT company, max(date_created) date_created
FROM import_data_log
WHERE (type <> 'ERROR' OR type is null)
GROUP BY company) b
ON a.company = b.company AND a.date_created = b.date_created
Use the below Query
SELECT id,company,type,max(date_created),notes
FROM import_data_log
WHERE type != 'ERROR'
GROUP BY company
select distinct on (company) *
from import_data_log
where type is distinct from 'ERROR'
order by company, date_created desc
Check distinct on and is [not] distinct from:
Ordinary comparison operators yield null (signifying "unknown"), not true or false, when either input is null. For example, 7 = NULL yields null, as does 7 <> NULL. When this behavior is not suitable, use the IS [ NOT ] DISTINCT FROM constructs

Updating multiple rows with a certain value from the same table

So, I have the next table:
time | name | ID |
12:00:00| access | 1 |
12:05:00| select | null |
12:10:00| update | null |
12:15:00| insert | null |
12:20:00| out | null |
12:30:00| access | 2 |
12:35:00| select | null |
The table is bigger (aprox 1-1,5 mil rows) and there will be ID equal to 2,3,4 etc and rows between.
The following should be the result:
time | name | ID |
12:00:00| access | 1 |
12:05:00| select | 1 |
12:10:00| update | 1 |
12:15:00| insert | 1 |
12:20:00| out | 1 |
12:30:00| access | 2 |
12:35:00| select | 2 |
What is the most simple method to update the rows without making the log full? Like, one ID at a time.
You can do it with a sub query:
UPDATE YourTable t
SET t.ID = (SELECT TOP 1 s.ID
FROM YourTable s
WHERE s.time < t.time AND s.name = 'access'
ORDER BY s.time DESC)
WHERE t.name <> 'access'
Index on (ID,time,name) will help.
You can do it using CTE as below:
;WITH myCTE
AS ( SELECT time
, name
, ROW_NUMBER() OVER ( PARTITION BY name ORDER BY time ) AS [rank]
, ID
FROM YourTable
)
UPDATE myCTE
SET myCTE.ID = myCTE.rank
SELECT *
FROM YourTable ORDER BY ID

PostgreSQL: Combine Count and DISTINCT ON

Given this table
| id | name | created_at |
| 1 | test | 2015-02-24 11:13:28.605968 |
| 2 | other | 2015-02-24 13:04:56.968004 |
| 3 | test | 2015-02-24 11:14:24.670765 |
| 4 | test | 2015-02-24 11:15:05.293904 |
And this query which returns only the rows id 2 and id 4.
SELECT DISTINCT ON (documents.name) documents.*
FROM "documents"
ORDER BY documents.name, documents.created_at DESC
How can i return the number of rows affected? Something like
SELECT COUNT(DISTINCT ON (documents.name) documents.*) FROM "documents"
You can use an outer query:
SELECT COUNT(1)
FROM (
SELECT DISTINCT ON (name) *
FROM documents
ORDER BY name, created_at DESC
) alias