How to index subgroups in a table in Postgres?

Supposing I have a table like this:
select country_id, city_id, person_id from mytable
country_id,city_id,person_id
123,45,100334
123,45,3460456
123,45,943875
123,121,4362
123,121,124747
146,87,3457320
146,89,3495879
146,89,34703924
I want to index the subgroups of country_id and city_id to get a result like this:
select country_id, city_id, person_id, ???, ??? from mytable
country_id,city_id,person_id,country_num,city_num
123,45,100334,1,1
123,45,3460456,1,1
123,45,943875,1,1
123,121,4362,1,2
123,121,124747,1,2
146,87,3457320,2,1
146,89,3495879,2,2
146,89,34703924,2,2
In other words, I want to number the countries sequentially with integers starting from 1, and number the cities the same way within each country separately. Is there an elegant way to do it in Postgres?

Use the dense_rank() window function:
SELECT
*,
dense_rank() OVER (ORDER BY country_id),
dense_rank() OVER (PARTITION BY country_id ORDER BY city_id)
FROM
mytable
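To check the dense_rank() answer against the sample data from the question, here is a minimal sketch. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for Postgres; both support these window functions. The aliases country_num and city_num come from the desired output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (country_id INT, city_id INT, person_id INT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (123, 45, 100334), (123, 45, 3460456), (123, 45, 943875),
        (123, 121, 4362), (123, 121, 124747),
        (146, 87, 3457320), (146, 89, 3495879), (146, 89, 34703924),
    ],
)

rows = conn.execute(
    """
    SELECT country_id, city_id, person_id,
           dense_rank() OVER (ORDER BY country_id) AS country_num,
           dense_rank() OVER (PARTITION BY country_id ORDER BY city_id) AS city_num
    FROM mytable
    ORDER BY country_id, city_id, person_id
    """
).fetchall()

# Collapse to one entry per (country, city) pair to inspect the numbering.
numbering = sorted({(r[0], r[1], r[3], r[4]) for r in rows})
for entry in numbering:
    print(entry)
```

The first dense_rank() has no PARTITION BY, so it numbers countries across the whole table; the second restarts at 1 for each country_id.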

Related

How to get MIN of a column based of another column without sub query?

I want to get the id of the row with the earliest insertion date, because min(id) is not always the id of the row with min(date_inserted). I can do it with a subquery, but I need to do it without one, as it is part of a bigger SQL statement.
This is what I got so far:
create table tbl(
id int ,
userid int,
dt timestamp);
SELECT id
FROM (
-- if I could do min(id order by dt) that would solve the issue
select id, row_number() over(partition by userid order by dt) row_number
from tbl
where userid = :userid
) withs
WHERE row_number=1

PostgreSQL - How to count when Distinct On

How do I get the count of rows for each user_id in a query like this:
select distinct on (user_id) *
from some_table
The count should match what this query returns:
select user_id, count(*)
from some_table
group by user_id
Try this:
SELECT DISTINCT ON (a.user_id)
a.*
FROM
(
SELECT user_id
, count(*) OVER(PARTITION BY user_id)
FROM some_table
) a
If you want to be able to use SELECT * in order to get a "sample row", depending on how large your table is you may be able to use a correlated subquery to get the count of rows for that particular user id:
select distinct on (user_id) *
, (select count (1)
from some_table st2
where st2.user_id = some_table.user_id) as user_row_count
from some_table
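As a rough check that a windowed count(*) gives the same per-user numbers as GROUP BY, here is a small sketch. It runs on SQLite through Python's sqlite3 module, which has no DISTINCT ON, so plain SELECT DISTINCT is used as a stand-in. The payload column and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# payload is a hypothetical extra column; the question only names user_id.
conn.execute("CREATE TABLE some_table (user_id INT, payload TEXT)")
conn.executemany(
    "INSERT INTO some_table VALUES (?, ?)",
    [(1, "a"), (1, "b"), (1, "c"), (2, "d"), (3, "e"), (3, "f")],
)

# The window count is attached to every row, then DISTINCT collapses duplicates.
windowed = conn.execute(
    """
    SELECT DISTINCT user_id,
           count(*) OVER (PARTITION BY user_id) AS user_row_count
    FROM some_table
    ORDER BY user_id
    """
).fetchall()

grouped = conn.execute(
    "SELECT user_id, count(*) FROM some_table GROUP BY user_id ORDER BY user_id"
).fetchall()

print(windowed)
print(grouped)
```

Unlike GROUP BY, the windowed form keeps the individual rows available, which is what makes it combinable with DISTINCT ON in Postgres.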

T-SQL - Pivot/Crosstab - variable number of values

I have a simple data set that looks like this:
Name Code
A A-One
A A-Two
B B-One
C C-One
C C-Two
C C-Three
I want to output it so it looks like this:
Name Code1 Code2 Code3 Code4 Code...n ...
A A-One A-Two
B B-One
C C-One C-Two C-Three
For each of the 'Name' values, there can be an undetermined number of 'Code' values.
I have been looking at various examples of Pivot SQL [including simple Pivot sql and sql using the XML function?] but I have not been able to figure this out - or to understand if it is even possible.
I would appreciate any help or pointers.
Thanks!
Try it like this:
DECLARE @tbl TABLE([Name] VARCHAR(100),Code VARCHAR(100));
INSERT INTO @tbl VALUES
('A','A-One')
,('A','A-Two')
,('B','B-One')
,('C','C-One')
,('C','C-Two')
,('C','C-Three');
SELECT p.*
FROM
(
SELECT *
,CONCAT('Code',ROW_NUMBER() OVER(PARTITION BY [Name] ORDER BY Code)) AS ColumnName
FROM @tbl
)t
PIVOT
(
MAX(Code) FOR ColumnName IN (Code1,Code2,Code3,Code4,Code5 /*add as many as you need*/)
)p;
This line
,CONCAT('Code',ROW_NUMBER() OVER(PARTITION BY [Name] ORDER BY Code)) AS ColumnName
will use a partitioned ROW_NUMBER in order to create numbered column names per code. The rest is simple PIVOT...
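To see what the partitioned ROW_NUMBER produces before the PIVOT is applied, here is a minimal sketch. It assumes SQLite via Python's sqlite3 module instead of SQL Server, so the || operator replaces CONCAT, and the table name tbl is only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (Name TEXT, Code TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [("A", "A-One"), ("A", "A-Two"), ("B", "B-One"),
     ("C", "C-One"), ("C", "C-Two"), ("C", "C-Three")],
)

# Number the codes within each Name and turn the number into a column label.
labelled = conn.execute(
    """
    SELECT Name, Code,
           'Code' || row_number() OVER (PARTITION BY Name ORDER BY Code) AS ColumnName
    FROM tbl
    ORDER BY Name, ColumnName
    """
).fetchall()

for row in labelled:
    print(row)
```

Note that ORDER BY Code sorts alphabetically, so with this data C-Three is labelled Code2 and C-Two is labelled Code3. If the original insertion order matters, an explicit ordering column would be needed.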
UPDATE: A dynamic approach to reflect the maximum number of codes per group
CREATE TABLE TblTest([Name] VARCHAR(100),Code VARCHAR(100));
INSERT INTO TblTest VALUES
('A','A-One')
,('A','A-Two')
,('B','B-One')
,('C','C-One')
,('C','C-Two')
,('C','C-Three');
DECLARE @cols VARCHAR(MAX);
WITH GetMaxCount(mc) AS(SELECT TOP 1 COUNT([Code]) FROM TblTest GROUP BY [Name] ORDER BY COUNT([Code]) DESC)
SELECT @cols=STUFF(
(
SELECT CONCAT(',Code',Nmbr)
FROM
(SELECT TOP((SELECT mc FROM GetMaxCount)) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values) t(Nmbr)
FOR XML PATH('')
),1,1,'');
DECLARE @sql VARCHAR(MAX)=
'SELECT p.*
FROM
(
SELECT *
,CONCAT(''Code'',ROW_NUMBER() OVER(PARTITION BY [Name] ORDER BY Code)) AS ColumnName
FROM TblTest
)t
PIVOT
(
MAX(Code) FOR ColumnName IN (' + @cols + ')
)p;';
EXEC(@sql);
GO
DROP TABLE TblTest;
As you can see, the only part that changes to reflect the actual number of columns is the list in PIVOT's IN() clause.
You can create a string, which looks like Code1,Code2,Code3,...CodeN and build the statement dynamically. This can be triggered with EXEC().
I'd prefer the first approach. Dynamically created SQL is very powerful, but can be a pain in the neck too...
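The idea of building the Code1,Code2,...CodeN list and then the statement around it can be sketched outside T-SQL as well. The example below is only an illustration under stated assumptions: it uses SQLite via Python's sqlite3 module, and because SQLite has no PIVOT, conditional aggregation (max(CASE ...)) is swapped in for it. The variable names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TblTest (Name TEXT, Code TEXT)")
conn.executemany(
    "INSERT INTO TblTest VALUES (?, ?)",
    [("A", "A-One"), ("A", "A-Two"), ("B", "B-One"),
     ("C", "C-One"), ("C", "C-Two"), ("C", "C-Three")],
)

# Largest number of codes in any Name group.
max_count = conn.execute(
    "SELECT max(cnt) FROM (SELECT count(Code) AS cnt FROM TblTest GROUP BY Name)"
).fetchone()[0]

# The generated names come from integers only, never from user input.
col_names = [f"Code{i}" for i in range(1, max_count + 1)]
cols = ",".join(col_names)

# Stand-in for PIVOT: one max(CASE ...) expression per generated column.
select_list = ", ".join(
    f"max(CASE WHEN ColumnName = '{name}' THEN Code END) AS {name}"
    for name in col_names
)
sql = f"""
SELECT Name, {select_list}
FROM (
    SELECT Name, Code,
           'Code' || row_number() OVER (PARTITION BY Name ORDER BY Code) AS ColumnName
    FROM TblTest
) AS t
GROUP BY Name
ORDER BY Name
"""
pivoted = conn.execute(sql).fetchall()

print(cols)
for row in pivoted:
    print(row)
```

Only the generated column list depends on the data; the rest of the statement is fixed text, which mirrors the structure of the EXEC() version.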

Get second last row for every record in postgresql query

I have a table, say table_inventory. On table_inventory I put a trigger that inserts a new row into the audit_inventory table on every update of stock.
The table columns look like this:
table_inventory
|sr_id|inventory_id|p_name|stock|
audit_inventory
|insert_time|sr_id|inventory_id|p_name|stock|
My problem is that each inventory_id in table_inventory has multiple entries in audit_inventory, because the trigger inserts a timestamped row on every stock update. I want to select the second-last stock value for every inventory_id of table_inventory. I wrote some CTEs to do that, but I cannot get the result for every inventory_id.
WITH CTE as
(select inventory_id,stock from table_inventory),
cte_1 as(
SELECT
stock,
row_number() over (order by insert_time desc) rn
FROM audit_inventory where inventoryid in (select inventory_id from cte)
),cte_2 as(
SELECT stock
FROM CTE
WHERE rn = 2)
select * from cte,cte_1;
The above query returns the second-last value for a single inventory_id, but I do not understand how to write a query that gets the second-last row value for every inventory_id of table_inventory.
Thanks for your precious time.
Try doing this. I guess this is what you want:
WITH CTE as
( SELECT
stock,
inventory_id,
row_number() over (PARTITION BY inventory_id order by insert_time desc) rn
FROM audit_inventory
)
SELECT
CTE.stock,
ti.inventory_id,
ti.stock
FROM
table_inventory ti
inner join CTE on CTE.inventory_id=ti.inventory_id
WHERE
CTE.rn=2
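Here is a small sketch of the same query run against made-up data, assuming SQLite via Python's sqlite3 module in place of Postgres. The sample rows, the text timestamps and the output aliases current_stock and previous_stock are all illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_inventory (sr_id INT, inventory_id INT, p_name TEXT, stock INT)"
)
conn.execute(
    "CREATE TABLE audit_inventory "
    "(insert_time TEXT, sr_id INT, inventory_id INT, p_name TEXT, stock INT)"
)
conn.executemany(
    "INSERT INTO table_inventory VALUES (?, ?, ?, ?)",
    [(1, 10, "bolt", 7), (2, 20, "nut", 40), (3, 30, "washer", 5)],
)
conn.executemany(
    "INSERT INTO audit_inventory VALUES (?, ?, ?, ?, ?)",
    [
        ("2024-01-01", 1, 10, "bolt", 9),
        ("2024-01-02", 1, 10, "bolt", 8),
        ("2024-01-03", 1, 10, "bolt", 7),
        ("2024-01-01", 2, 20, "nut", 50),
        ("2024-01-02", 2, 20, "nut", 40),
        ("2024-01-01", 3, 30, "washer", 5),  # only one audit row
    ],
)

second_last = conn.execute(
    """
    WITH cte AS (
        SELECT stock, inventory_id,
               row_number() OVER (
                   PARTITION BY inventory_id ORDER BY insert_time DESC
               ) AS rn
        FROM audit_inventory
    )
    SELECT ti.inventory_id,
           ti.stock  AS current_stock,
           cte.stock AS previous_stock
    FROM table_inventory ti
    INNER JOIN cte ON cte.inventory_id = ti.inventory_id
    WHERE cte.rn = 2
    ORDER BY ti.inventory_id
    """
).fetchall()

print(second_last)
```

PARTITION BY inventory_id is what restarts the numbering for each item. An inventory_id with fewer than two audit rows (30 here) has no rn = 2 row and is dropped by the inner join; a LEFT JOIN with the rn = 2 condition moved into the ON clause would keep it with a NULL.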

SQL Server SUM() for DISTINCT records

I have a field called "Users", and I want to run SUM() on that field that returns the sum of all DISTINCT records. I thought that this would work:
SELECT SUM(DISTINCT table_name.users)
FROM table_name
But it's not selecting DISTINCT records, it's just running as if I had run SUM(table_name.users).
What would I have to do to add only the distinct records from this field?
Use count()
SELECT count(DISTINCT table_name.users)
FROM table_name
This code seems to indicate sum(distinct ) and sum() return different values.
with t as (
select 1 as a
union all
select '1'
union all
select '2'
union all
select '4'
)
select sum(distinct a) as DistinctSum, sum(a) as allSum, count(distinct a) as distinctCount, count(a) as allCount from t
Do you actually have non-distinct values?
select count(1), users
from table_name
group by users
having count(1) > 1
If not, the sums will be identical.
You can see for yourself that distinct works with the following example. Here I create a subquery with duplicate values, then I do a sum distinct on those values.
select DistinctSum=sum(distinct x), RegularSum=Sum(x)
from
(
select x=1
union All
select 1
union All
select 2
union All
select 2
) x
You can see that the distinct sum column returns 3 and the regular sum returns 6 in this example.
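The same check can be reproduced outside SQL Server. This is a minimal sketch assuming SQLite via Python's sqlite3 module, which also accepts DISTINCT inside sum(); the table name vals is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vals (x INT)")
conn.executemany("INSERT INTO vals VALUES (?)", [(1,), (1,), (2,), (2,)])

# sum(DISTINCT x) adds each distinct value once; sum(x) adds every row.
distinct_sum, regular_sum = conn.execute(
    "SELECT sum(DISTINCT x), sum(x) FROM vals"
).fetchone()

print(distinct_sum, regular_sum)
```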
You can use a sub-query:
select sum(users)
from (select distinct users from table_name) t;
SUM(DISTINCTROW table_name.something)
It worked for me (innodb).
Description - "DISTINCTROW omits data based on entire duplicate records, not just duplicate fields." http://office.microsoft.com/en-001/access-help/all-distinct-distinctrow-top-predicates-HA001231351.aspx
;WITH cte
as
(
SELECT table_name.users , rn = ROW_NUMBER() OVER (PARTITION BY users ORDER BY users)
FROM table_name
)
SELECT SUM(users)
FROM cte
WHERE rn = 1
TEST
DECLARE @table_name Table (Users INT );
INSERT INTO @table_name Values (1),(1),(1),(3),(3),(5),(5);
;WITH cte
as
(
SELECT users , rn = ROW_NUMBER() OVER (PARTITION BY users ORDER BY users)
FROM @table_name
)
SELECT SUM(users) DisSum
FROM cte
WHERE rn = 1
Result
DisSum
9
If circumstances make it difficult to weave a "distinct" into the sum clause, it will usually be possible to add an extra "where" clause to the entire query - something like:
select sum(t.ColToSum)
from SomeTable t
where (select count(*) from SomeTable t1 where t1.ColToSum = t.ColToSum and t1.ID < t.ID) = 0
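To illustrate how the extra WHERE clause keeps only the lowest-ID row for each value, here is a small sketch. It assumes SQLite via Python's sqlite3 module and invented sample rows; the table and column names follow the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SomeTable (ID INT, ColToSum INT)")
conn.executemany(
    "INSERT INTO SomeTable VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1), (4, 3), (5, 3), (6, 5), (7, 5)],
)

# A row survives only if no earlier row (smaller ID) carries the same value.
filtered_sum = conn.execute(
    """
    SELECT sum(t.ColToSum)
    FROM SomeTable t
    WHERE (SELECT count(*)
           FROM SomeTable t1
           WHERE t1.ColToSum = t.ColToSum AND t1.ID < t.ID) = 0
    """
).fetchone()[0]

# Reference value computed with sum(DISTINCT ...).
distinct_sum = conn.execute("SELECT sum(DISTINCT ColToSum) FROM SomeTable").fetchone()[0]

print(filtered_sum, distinct_sum)
```

The correlated subquery runs once per row, so on large tables this form may be noticeably slower than sum(DISTINCT ...) or the ROW_NUMBER approach.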
This may be a duplicate of
Trying to sum distinct values SQL
As per Declan_K's answer:
Get the distinct list first...
SELECT SUM(SQ.COST)
FROM
(SELECT DISTINCT [Tracking #] as TRACK,[Ship Cost] as COST FROM YourTable) SQ