Loop over results from a query to use in another query - postgresql

I have a query which returns a specific data set
select user_id, county from users
Primary key is on user_id, county
Let's say this returns the following:
1 "HARRIS NORTH"
1 "HANOVER"
3 "MARICOPA"
4 "ADAMS"
5 "CUMBERLAND"
Next, I want to run a different query for each record obtained from the query above.
COPY( WITH myconstants (_id, _county) as (
values (1, 'HARRIS NORTH')
)
SELECT d.* FROM data d, myconstants
where user_id = _id and county = _county)
TO '/tmp/filename.csv' (format CSV);
How can I do this in a loop for all the records from my 1st query using postgres only?
Pseudocode of what I want to achieve:
for (a_id, a_county) in (select user_id, county from users):
COPY( WITH myconstants (_id, _county) as (
values (a_id, a_county))
SELECT d.* FROM data d, myconstants
where user_id = _id and county = _county)
TO '/tmp/filename.csv' (format CSV);

There is no need to loop in SQL:
COPY (SELECT d.*
FROM data d
JOIN users
ON d.user_id = users.user_id AND d.county = users.county)
TO '/tmp/filename.csv' (format CSV);
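If the goal really is a separate CSV file per user (as the pseudocode hints at), a PL/pgSQL DO block with dynamic SQL can loop over the first query. A sketch, assuming server-side file access (COPY ... TO a file needs superuser or pg_write_server_files membership) and illustrative file names:

```sql
-- Sketch only: loops over users and writes one CSV per (user_id, county).
DO $$
DECLARE
    rec RECORD;
BEGIN
    FOR rec IN SELECT user_id, county FROM users LOOP
        EXECUTE format(
            'COPY (SELECT d.* FROM data d
                   WHERE d.user_id = %s AND d.county = %L)
             TO %L (FORMAT CSV)',
            rec.user_id,
            rec.county,
            format('/tmp/file_%s_%s.csv',
                   rec.user_id, replace(rec.county, ' ', '_')));
    END LOOP;
END $$;
```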


Postgresql - select query with aggregated decisions column as json

I have a table which contains these columns:
id - bigint
decisions - varchar(80)
type - varchar(258)
I want to write a select query that returns something like this (id, decision values with counts as JSON, type):
id decisions type
1 {"firstDecisionsValue":countOfThisValue, "secondDecisionsValue": countOfThisValue} entryType
I heard that I can play with json_agg, but it does not allow a COUNT inside, so I tried json_agg with this query:
SELECT ac.id,
json_agg(ac.decisions),
ac.type
FROM myTable ac
GROUP BY ac.id, ac.type;
but it ends with this (for the entry with id 1 there are two occurrences of firstDecisionsValue and one occurrence of secondDecisionsValue):
id decisions type
1 {"firstDecisionsValue", "firstDecisionsValue", "secondDecisionsValue"} entryType
Minimal reproducible example:
CREATE TABLE myTable
(
id bigint,
decisions varchar(80),
type varchar(258)
);
INSERT INTO myTable
VALUES (1, 'firstDecisionsValue', 'myType');
INSERT INTO myTable
VALUES (1, 'firstDecisionsValue', 'myType');
INSERT INTO myTable
VALUES (1, 'secondDecisionsValue', 'myType');
Can you provide any tips on how to get the expected result?
1, {"firstDecisionsValue":2, "secondDecisionsValue":1}, entryType
You can try this:
SELECT a.id, jsonb_object_agg(a.decisions, a.count), a.type
FROM
( SELECT id, type, decisions, count(*) AS count
FROM myTable
GROUP BY id, type, decisions
) AS a
GROUP BY a.id, a.type
see the result in dbfiddle.
First, calculate the count for each (id, type, decisions) group; after that, use jsonb_object_agg to build the JSON.
Demo
with data as (
select
ac.id,
ac.type,
ac.decisions,
count(*)
from
myTable ac
group by
ac.id,
ac.type,
ac.decisions
)
select
d.id,
d.type,
json_object_agg(d.decisions, d.count)
from
data d
group by
d.id,
d.type

How can I SUM distinct records in a Postgres database where there are duplicate records?

Imagine a table that looks like this:
The SQL to get this data was just SELECT *
The first column is "row_id", the second is "id" (the order ID), and the third is "total" (the revenue).
I'm not sure why there are duplicate rows in the database, but when I do a SUM(total) it includes the second entry, even though the order ID is the same. This makes my numbers larger than if I select distinct id, total, export to Excel, and sum the values manually.
So my question is: how can I SUM over just the distinct order IDs so that I get the same revenue as if I exported every distinct order ID row to Excel?
Thanks in advance!
Easy - just divide by the count:
select id, sum(total) / count(id)
from orders
group by id
See live demo.
Also handles any level of duplication, eg triplicates etc.
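For example, with the duplicated rows described in the question (an inline VALUES list stands in for the orders table here):

```sql
-- Each order's rows are exact duplicates, so sum(total) / count(id)
-- collapses each group back to the single-order total.
SELECT id, SUM(total) / COUNT(id) AS total
FROM (VALUES (6395,  1509, 112.00),
             (22986, 1509, 112.00),
             (1393,  3284, 40.37),
             (24360, 3284, 40.37)) AS orders(row_id, id, total)
GROUP BY id;
-- one row per order id, with the undoubled total
```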
You can try something like this (with your example):
Table
create table test (
row_id int,
id int,
total decimal(15,2)
);
insert into test values
(6395, 1509, 112), (22986, 1509, 112),
(1393, 3284, 40.37), (24360, 3284, 40.37);
Query
with distinct_records as (
select distinct id, total from test
)
select a.id, b.actual_total, array_agg(a.row_id) as row_ids
from test a
inner join (select id, sum(total) as actual_total from distinct_records group by id) b
on a.id = b.id
group by a.id, b.actual_total
Result
| id | actual_total | row_ids |
|------|--------------|------------|
| 1509 | 112 | 6395,22986 |
| 3284 | 40.37 | 1393,24360 |
Explanation
We do not know why the same order and total appear more than once with different row_id values. So, in a common table expression (CTE) introduced by the WITH clause, we select the distinct id and total.
Below the CTE, we use this distinct data to do the totaling: we join the original table against the aggregation over the distinct values, then aggregate the row_ids into an array so the information looks cleaner.
SQLFiddle example
http://sqlfiddle.com/#!15/72639/3
Create custom aggregate:
CREATE OR REPLACE FUNCTION sum_func (
double precision, pg_catalog.anyelement, double precision
)
RETURNS double precision AS
$body$
SELECT case when $3 is not null then COALESCE($1, 0) + $3 else $1 end
$body$
LANGUAGE 'sql';
CREATE AGGREGATE dist_sum (
pg_catalog."any",
double precision)
(
SFUNC = sum_func,
STYPE = float8
);
Then calculate the distinct sum like this:
select dist_sum(distinct id, total)
from orders
SQLFiddle
You can use DISTINCT in your aggregate functions:
SELECT id, SUM(DISTINCT total) FROM orders GROUP BY id
Documentation here: https://www.postgresql.org/docs/9.6/static/sql-expressions.html#SYNTAX-AGGREGATES
If we can trust that the total for one order is the same on every duplicated row, we can eliminate the duplicates in a sub-query by selecting the MAX of the PK id column. An example:
CREATE TABLE test2 (id int, order_id int, total int);
insert into test2 values (1,1,50);
insert into test2 values (2,1,50);
insert into test2 values (5,1,50);
insert into test2 values (3,2,100);
insert into test2 values (4,2,100);
select order_id, sum(total)
from test2 t
join (
select max(id) as id
from test2
group by order_id) as sq
on t.id = sq.id
group by order_id
sql fiddle
In difficult cases:
select
id,
(
SELECT SUM(value::int4)
FROM jsonb_each_text(jsonb_object_agg(row_id, total))
) as total
from orders
group by id
I would suggest just using a sub-query:
SELECT "a"."id", SUM("a"."total")
FROM (SELECT DISTINCT ON ("id") * FROM "Database"."Schema"."Table") AS "a"
GROUP BY "a"."id"
The above will give you the total for each id.
Use the following if you want the grand total with every duplicate removed:
SELECT SUM("a"."total")
FROM (SELECT DISTINCT ON ("id") * FROM "Database"."Schema"."Table") AS "a"
Using subselect (http://sqlfiddle.com/#!7/cef1c/51):
select sum(total) from (
select distinct id, total
from orders
) as t
Using CTE (http://sqlfiddle.com/#!7/cef1c/53):
with distinct_records as (
select distinct id, total from orders
)
select sum(total) from distinct_records;

Postgresql recursive CTE results ordering

I'm working on a query to pull data out of a hierarchy
e.g.
CREATE table org (
id INT PRIMARY KEY,
name TEXT NOT NULL,
parent_id INT);
INSERT INTO org (id, name) VALUES (0, 'top');
INSERT INTO org (id, name, parent_id) VALUES (1, 'middle1', 0);
INSERT INTO org (id, name, parent_id) VALUES (2, 'middle2', 0);
INSERT INTO org (id, name, parent_id) VALUES (3, 'bottom3', 1);
WITH RECURSIVE parent_org (id, parent_id, name) AS (
SELECT id, parent_id, name
FROM org
WHERE id = 3
UNION ALL
SELECT o.id, o.parent_id, o.name
FROM org o, parent_org po
WHERE po.parent_id = o.id)
SELECT id, parent_id, name
FROM parent_org;
It works as expected.
3 1 "bottom3"
1 0 "middle1"
0 "top"
It's also returning the data in the order that I expect, and it makes sense to me that it would do this because of the way that the results would be discovered.
The question is, can I count on the order being like this?
Yes, there is a defined order. In the Postgres WITH doc, they give the following example:
WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
SELECT g.id, g.link, g.data, 1,
ARRAY[ROW(g.f1, g.f2)],
false
FROM graph g
UNION ALL
SELECT g.id, g.link, g.data, sg.depth + 1,
path || ROW(g.f1, g.f2),
ROW(g.f1, g.f2) = ANY(path)
FROM graph g, search_graph sg
WHERE g.id = sg.link AND NOT cycle
)
SELECT * FROM search_graph;
About which they say in a Tip box (formatting mine):
The recursive query evaluation algorithm produces its output in
breadth-first search order. You can display the results in depth-first
search order by making the outer query ORDER BY a "path" column
constructed in this way.
You do appear to be getting breadth-first output in your case above based on the INSERT statements, so I would say you could, if you wanted, modify your outer SELECT to order it in another fashion.
I believe the analog for depth-first in your case would probably be this:
WITH RECURSIVE parent_org (id, parent_id, name) AS (
SELECT id, parent_id, name
FROM org
WHERE id = 3
UNION ALL
SELECT o.id, o.parent_id, o.name
FROM org o, parent_org po
WHERE po.parent_id = o.id)
SELECT id, parent_id, name
FROM parent_org
ORDER BY id;
Running it through in my head, I would expect that to yield this:
0 "top"
1 0 "middle1"
3 1 "bottom3"
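Per the docs Tip, the robust way to pin down the order is an explicit "path" column in the outer ORDER BY, rather than relying on evaluation order. Applied to the org example above, a sketch:

```sql
-- Accumulate the visited ids into an array and order by it explicitly.
WITH RECURSIVE parent_org (id, parent_id, name, path) AS (
    SELECT id, parent_id, name, ARRAY[id]
    FROM org
    WHERE id = 3
  UNION ALL
    SELECT o.id, o.parent_id, o.name, po.path || o.id
    FROM org o, parent_org po
    WHERE po.parent_id = o.id)
SELECT id, parent_id, name
FROM parent_org
ORDER BY path;
```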

Ranking a record based on sort order of multiple related records in T-SQL?

I have two tables, plus a matching table. For argument's sake, let's call them Recipes and Ingredients. Each Recipe should have at least one Ingredient, but may have many. Each Ingredient can be used in many Recipes.
Recipes Ingredients Match
=============== =============== ===============
ID int ID int RecipeID int
Name varchar Name varchar IngredientID int
Sample data:
Recipes Ingredients Match (shown as CDL but stored as above)
=============== =============== ===============
Soup Chicken Soup: Chicken, Tomatoes
Pizza Tomatoes Pizza: Cheese, Chicken, Tomatoes
Chicken Sandwich Cheese C. Sandwich: Bread, Chicken, Tomatoes
Turkey Sandwich Bread T. Sandwich: Bread, Cheese, Tomatoes, Turkey
Turkey
Here's the problem: I need to sort the Recipes based on the name(s) of their Ingredients. Given the above sample data, I would need this sort order for recipes:
Turkey Sandwich (First ingredient bread, then cheese)
Chicken Sandwich (First ingredient bread, then chicken)
Pizza (First ingredient cheese)
Soup (First ingredient chicken)
Ranking the recipes by the first ingredient is straightforward:
WITH recipesranked AS (
SELECT Recipes.ID, Recipes.Name, Recipes.Description,
ROW_NUMBER() OVER (ORDER BY Ingredients.Name) AS SortOrder
FROM
Recipes
LEFT JOIN Match ON Match.RecipeID = Recipes.ID
LEFT JOIN Ingredients ON Ingredients.ID = Match.IngredientID
)
SELECT ID, Name, Description, MIN(SortOrder)
FROM recipesranked
GROUP BY ID, Name, Description;
Beyond that, I'm stuck. In my example above, this almost works, but leaves the two sandwiches in an ambiguous order.
I have a feeling that the MIN(SortOrder) should be replaced by something else, maybe a correlated subquery looking for the non-existence of another record in the same CTE, but haven't figured out the details.
Any ideas?
(It is possible for a Recipe to have no Ingredients. I don't care where those come out in the order, but at the end would be ideal. Not my main concern at this point.)
I'm using SQL Server 2008 R2.
Update: I added an SQL Fiddle for this and updated the example here to match:
http://sqlfiddle.com/#!3/38258/2
Update: I have a sneaking suspicion that if there is a solution, it involves a cross-join to compare every combination of Recipe/Ingredient against every other, then filtering that somehow.
I think this will give you what you want (based on your supplied Fiddle)
-- Show recipes ranked by all their ingredients alphabetically
WITH recipesranked AS (
SELECT Recipes.ID, Recipes.Name, SortedIngredients.SortOrder
FROM
Recipes
LEFT JOIN Match ON Match.RecipeID = Recipes.ID
LEFT JOIN
(
SELECT ID, Name, POWER(2.0, ROW_NUMBER() OVER (ORDER BY Name Desc)) As SortOrder
FROM Ingredients) AS SortedIngredients
ON SortedIngredients.ID = Match.IngredientID
)
SELECT ID, Name, SUM(SortOrder)
FROM recipesranked
GROUP BY ID, Name
-- Sort by sum of the ingredients. Since the first ingredient for both kinds
-- of sandwiches is Bread, this gives both of them the same sort order, but
-- we need Turkey Sandwiches to come out first between them because Cheese
-- is its #2 sorted ingredient, but Chicken is the #2 ingredient for
-- Chicken sandwiches.
ORDER BY SUM(SortOrder) DESC;
It just uses POWER to ensure that the most significant ingredients get weighted first.
This will work for any number of recipes and up to 120 ingredients (in total)
It will not work if recipes contain duplicate ingredients, though you could filter those out if they could occur.
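To see the weighting at work, here is an illustrative query over the sample ingredient names (the inline VALUES list is a stand-in for the Ingredients table):

```sql
-- The alphabetically first ingredient gets the largest power of two,
-- so it dominates each recipe's SUM; later ingredients break ties.
SELECT Name, POWER(2.0, ROW_NUMBER() OVER (ORDER BY Name DESC)) AS Weight
FROM (VALUES ('Bread'), ('Cheese'), ('Chicken'), ('Tomatoes'), ('Turkey')) AS i(Name);
-- Weights: Bread 32, Cheese 16, Chicken 8, Tomatoes 4, Turkey 2.
-- Turkey Sandwich sums to 32+16+4+2 = 54, Chicken Sandwich to 32+8+4 = 44,
-- so ORDER BY SUM(SortOrder) DESC puts the Turkey Sandwich first.
```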
Binary Flag version:
;with IngredientFlag( IngredientId, Flag )
as
(
select
i.id Ingredient
, POWER( 2, row_number() over ( order by i.Name desc ) - 1 )
from
Ingredients i
)
, RecipeRank( RecipeId, Rank )
as
(
select
m.RecipeID
, row_number() /* or rank() */ over ( order by SUM( flag.Flag ) desc )
from
Match m
inner join IngredientFlag flag
on m.IngredientID = flag.IngredientId
group by
m.RecipeID
)
select
RecipeId
, Name
, Rank
from
RecipeRank rr
inner join Recipes r
on rr.RecipeId = r.id
Str Concat version:
-- order the ingredients per recipe
;with RecipeIngredientOrdinal( RecipeId, IngredientId, Name, Ordinal )
as
(
select
m.RecipeID
, m.IngredientID
, i.Name
, Row_Number() over ( partition by m.RecipeId order by i.Name ) Ordinal
from
Match m
inner join Ingredients i
on m.IngredientID = i.id
)
-- get ingredient count per recipe
, RecipeIngredientCount( RecipeId, IngredientCount )
as
(
select
m.RecipeID
, count(1)
from
Match m
group by
m.RecipeID
)
-- recursively build concatenated ingredient list per recipe
-- (note this will return incomplete lists which is why I include
-- 'generational' in the name)
, GenerationalConcatenatedIngredientList( RecipeId, Ingredients, IngredientCount )
as
(
select
rio.RecipeID
, cast( rio.Name as varchar(max) )
, rio.Ordinal
from
RecipeIngredientOrdinal rio
where
rio.Ordinal = 1
union all
select
rio.RecipeID
, cil.Ingredients + rio.Name
, rio.Ordinal
from
RecipeIngredientOrdinal rio
inner join GenerationalConcatenatedIngredientList cil
on rio.RecipeID = cil.RecipeId and rio.Ordinal = cil.IngredientCount + 1
)
-- return row_number or rank ordered by the concatenated ingredients list
-- (don't need to return Ingredients but shown for demonstrative purposes)
, RecipeRankByIngredients( RecipeId, Rank, Ingredients )
as
(
select
cil.RecipeId
, row_number() over ( order by cil.Ingredients ) -- or rank()
, cil.Ingredients
from
GenerationalConcatenatedIngredientList cil
inner join RecipeIngredientCount ric
on cil.RecipeId = ric.RecipeId
-- don't forget to filter for only the completed ingredient lists
-- and ignore all intermediate values
and cil.IngredientCount = ric.IngredientCount
)
select * from RecipeRankByIngredients
This should get you what you need:
WITH recipesranked AS (
SELECT Recipes.ID, Recipes.Name, ROW_NUMBER() OVER (ORDER BY Ingredients.Name) AS SortOrder,
Rank () OVER (partition by Recipes.Name ORDER BY Ingredients.Name) as RankOrder
FROM
Recipes
LEFT JOIN Match ON Match.RecipeID = Recipes.ID
LEFT JOIN Ingredients ON Ingredients.ID = Match.IngredientID
)
SELECT ID, Name,SortOrder, RankOrder
FROM recipesranked
Where RankOrder = 1
ORDER BY SortOrder;
The only alternative way I can think of to do it is to use dynamic SQL to generate a pivot.
This doesn't have the limitation on the number of ingredients that my alternative has, but doesn't exactly feel elegant!
DECLARE @MaxIngredients INT
SELECT @MaxIngredients = MAX(IngredientCount)
FROM
(
SELECT COUNT(*) AS IngredientCount
FROM Match
GROUP BY RecipeID
) A
DECLARE @COLUMNS nvarchar(max)
SELECT @COLUMNS = N'[1]'
DECLARE @COLUMN INT
SELECT @COLUMN = 2
WHILE (@COLUMN <= @MaxIngredients)
BEGIN
SELECT @COLUMNS = @COLUMNS + N',[' + CAST(@COLUMN AS varchar(19)) + N']', @COLUMN = @COLUMN + 1
END
DECLARE @SQL nvarchar(max)
SELECT @SQL =
N'WITH recipesranked as(
SELECT *
FROM
(
SELECT M.RecipeID,
ROW_NUMBER() OVER (PARTITION BY M.RecipeID ORDER BY I.SortOrder) AS IngredientIndex,
I.SortOrder
FROM Match M
LEFT
JOIN
(
SELECT *, ROW_NUMBER() OVER (ORDER BY Name) As SortOrder
FROM Ingredients
) I
ON I.ID = M.IngredientID
) AS SourceTable
PIVOT
(
MIN(SortOrder) --min here is just for the syntax, there will only be one value
FOR IngredientIndex IN (' + @COLUMNS + N')
) AS PivotTable)
SELECT R.Name
FROM RecipesRanked RR
JOIN Recipes R
ON RR.RecipeID = R.ID
ORDER BY ' + @COLUMNS
EXEC SP_EXECUTESQL @SQL
Create a function and use that.
CREATE FUNCTION GetIngredients(@RecipeName varchar(200))
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @Ingredients VARCHAR(MAX)
SET @Ingredients=NULL
SELECT TOP 9999999
@Ingredients = COALESCE(@Ingredients + ', ', '') + Ingredients.Name
FROM Recipes
LEFT JOIN Match ON Match.RecipeID = Recipes.ID
LEFT JOIN Ingredients ON Ingredients.ID = Match.IngredientID
WHERE Recipes.Name=@RecipeName
ORDER BY Ingredients.Name ASC
return @Ingredients
END
GO
SELECT
Recipes.Name AS RecipeName, dbo.GetIngredients(Recipes.Name) [Ingredients]
FROM Recipes
ORDER BY [Ingredients]

Postgresql Update Based on count, min and group by

Thank you for taking the time to look at my question.
I've seen similar questions, but not the same depth. Please help!
I would like to update a column in all rows of a table (which holds user_id and date_created) that have the lowest date_created for their user_id.
The following select gives me all the rows I would like to update:
select user_id, min(date_created) from mytable s1 where
(select count(1) from mytable s2 where
s1.user_id = s2.user_id group by s2.user_id)
> 1 group by user_id order by user_id;
I would have expected this update to work:
update mytable set join_status = 1 where date_created =
(select min(date_created) from mytable s1 where
(select count(1) from simplepay_payment s2 where
s1.user_id = s2.user_id group by s2.user_id)
> 1 group by user_id);
But it gave the following error:
ERROR: more than one row returned by a subquery used as an expression
I've tried a few different solutions, but nothing seems to help.
Does anyone have any ideas for me?
Thanks again.
Change your SQL to:
update mytable set join_status = 1 where date_created IN
(select min(date_created) from mytable s1 where
(select count(1) from simplepay_payment s2 where
s1.user_id = s2.user_id group by s2.user_id)
> 1 group by user_id);
Read more on row comparison in the docs.
EDIT:
In the subquery you're performing GROUP BY user_id. This means that you will receive many rows, based on the number of unique user_id values in your simplepay_payment table.
To make your query work as expected, you should join on two columns: user_id and date_created. As you've mentioned, you already have the query that gives you the correct results, so you can use it like this:
WITH desired AS (
SELECT user_id, min(date_created) AS mindt
FROM mytable s1 where
(SELECT count(1) FROM mytable s2
WHERE s1.user_id = s2.user_id GROUP BY s2.user_id) > 1
GROUP BY user_id)
UPDATE mytable m SET join_status = 1 FROM desired d
WHERE d.user_id = m.user_id AND d.mindt = m.date_created;
I've wrapped your query in a Common Table Expression and used it in the UPDATE statement. You can add RETURNING m.* at the end of the query to see the rows that were updated and their new values.
You can test this query on SQL Fiddle.
EDIT2:
Common Table Expressions (WITH-queries) are not available before version 9.1 for UPDATE statements. You can simply move the CTE subquery into the update, like this:
UPDATE mytable m SET join_status = 1 FROM (
SELECT user_id, min(date_created) AS mindt
FROM mytable s1 where
(SELECT count(1) FROM mytable s2
WHERE s1.user_id = s2.user_id GROUP BY s2.user_id) > 1
GROUP BY user_id) d
WHERE d.user_id = m.user_id AND d.mindt = m.date_created;