Postgres id to name mapping in an array while creating CSV file

I have a table with an id to group name mapping:
1. GroupA
2. GroupB
3. GroupC
...
15. GroupO
And I have a user table with a userId to group ID mapping; the group IDs are stored as an array in the user table:
User1 {1,5,7}
User2 {2,5,9}
User3 {3,5,11,15}
...
I want to combine the two tables in such a way that I can retrieve the userID to groupName mapping in a CSV file.
For example: User1 {GroupA, GroupE, GroupG}
Essentially, each group ID should be replaced by its group name while creating the CSV file.

Setup:
create table mapping(id int, group_name text);
insert into mapping
select i, format('Group%s', chr(i + 64))
from generate_series(1, 15) i;
create table users (user_name text, user_ids int[]);
insert into users values
('User1', '{1,5,7}'),
('User2', '{2,5,9}'),
('User3', '{3,5,11,15}');
Step by step, to understand how the final query is built:
Use unnest() to list each single user_id in its own row:
select user_name, unnest(user_ids) user_id
from users
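With the sample data from the setup, this intermediate step returns one row per user/group pair:
 user_name | user_id
-----------+---------
 User1     |       1
 User1     |       5
 User1     |       7
 User2     |       2
 User2     |       5
 User2     |       9
 User3     |       3
 User3     |       5
 User3     |      11
 User3     |      15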
Replace user_id with group_name by joining to mapping:
select user_name, group_name
from (
    select user_name, unnest(user_ids) id
    from users
) u
join mapping m on m.id = u.id
Aggregate group_name into an array per user_name:
select user_name, array_agg(group_name)
from (
    select user_name, group_name
    from (
        select user_name, unnest(user_ids) id
        from users
    ) u
    join mapping m on m.id = u.id
) m
group by 1
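With the sample data, the aggregated result is (element order inside the arrays is not guaranteed without an ORDER BY inside array_agg):
 user_name | array_agg
-----------+-------------------------------
 User1     | {GroupA,GroupE,GroupG}
 User2     | {GroupB,GroupE,GroupI}
 User3     | {GroupC,GroupE,GroupK,GroupO}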
Use the last query in a COPY command:
copy (
    select user_name, array_agg(group_name)
    from (
        select user_name, group_name
        from (
            select user_name, unnest(user_ids) id
            from users
        ) u
        join mapping m on m.id = u.id
    ) m
    group by 1
)
to 'c:/data/example.txt' (format csv)
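Note that COPY ... TO writes the file on the database server and needs the corresponding file-access privileges there. If you only have client access, psql's \copy meta-command accepts the same query and writes the file on the client instead; the whole command has to be given on a single line:
\copy (select user_name, array_agg(group_name) from (select user_name, group_name from (select user_name, unnest(user_ids) id from users) u join mapping m on m.id = u.id) m group by 1) to 'example.csv' (format csv)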

Say you have two tables in this form:
Table groups
 Column    | Type
-----------+---------
 groupname | text
 groupid   | integer
Table users
 Column   | Type
----------+------------
 username | text
 groupids | integer[]   <-- group ids as inserted in table groups
You can query the users replacing the group id with group names with this code:
WITH users_subquery AS (SELECT username, unnest(groupids) AS groupid FROM users)
SELECT username, array_agg(groupname) AS groups
FROM users_subquery JOIN groups ON users_subquery.groupid = groups.groupid
GROUP BY username
If you need the groups as a string (useful for the CSV export), surround the query with an array_to_string call:
SELECT username, array_to_string(groups, ',') FROM
(
    WITH users_subquery AS (SELECT username, unnest(groupids) AS groupid FROM users)
    SELECT username, array_agg(groupname) AS groups
    FROM users_subquery JOIN groups ON users_subquery.groupid = groups.groupid
    GROUP BY username
) AS foo;
Result:
 username | groups
----------+---------------
 user1    | group1,group2
 user2    | group2,group3
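To produce the CSV file from here, that same query can be wrapped in a COPY command; the output path is only an example and must be writable by the server:
copy (
    SELECT username, array_to_string(groups, ',') FROM
    (
        WITH users_subquery AS (SELECT username, unnest(groupids) AS groupid FROM users)
        SELECT username, array_agg(groupname) AS groups
        FROM users_subquery JOIN groups ON users_subquery.groupid = groups.groupid
        GROUP BY username
    ) AS foo
) to '/tmp/users_groups.csv' (format csv);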

Related

Loop over results from a query to use in another query

I have a query which returns a specific data set
select user_id, county from users
Primary key is on user_id, county
Let's say this returns the following:
1 "HARRIS NORTH"
1 "HANOVER"
3 "MARICOPA"
4 "ADAMS"
5 "CUMBERLAND"
Next, I want to run a different query for each record obtained from the above query.
COPY( WITH myconstants (_id, _county) as (
        values (1, 'HARRIS NORTH')
    )
    SELECT d.* FROM data d, myconstants
    where user_id = _id and county = _county)
TO '/tmp/filename.csv' (format CSV);
How can I do this in a loop for all the records from my 1st query using postgres only?
Pseudocode of what I want to achieve:
for (a_id, a_county) in (select user_id, county from users):
    COPY( WITH myconstants (_id, _county) as (
            values (a_id, a_county))
        SELECT d.* FROM data d, myconstants
        where user_id = _id and county = _county)
    TO '/tmp/filename.csv' (format CSV);
There is no need to loop in SQL:
COPY (SELECT d.*
      FROM data d
      JOIN users
          ON d.user_id = users.user_id AND d.county = users.county)
TO '/tmp/filename.csv' (format CSV);
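If you really do need one output file per (user_id, county) pair rather than a single file, a server-side loop is possible in a DO block. This is only a sketch; the file naming scheme is an assumption, and the COPY statement has to be built dynamically because COPY cannot reference PL/pgSQL variables directly:
do $$
declare
    rec record;
begin
    for rec in select user_id, county from users loop
        -- build and run one COPY per row; %L quotes literals safely
        execute format(
            'COPY (SELECT d.* FROM data d WHERE d.user_id = %s AND d.county = %L) TO %L (format CSV)',
            rec.user_id,
            rec.county,
            '/tmp/filename_' || rec.user_id || '_' || rec.county || '.csv'
        );
    end loop;
end
$$;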

Join two postgresql queries

I have the following query
SELECT role_uuid FROM users WHERE email = 'email#domain.com'
I also have a roles table with the following fields:
uuid
name
created_at
I'm hoping to have one query that lets me select the role by email and get the name and created_at fields from the roles table.
I've tried things like this but I can't quite figure it out.
SELECT *
FROM ( SELECT * FROM users WHERE email = 'email#domain.com') AS A
JOIN ( SELECT * FROM roles WHERE uuid = A.role_uuid) AS B
WHERE A.role_uuid = B.uuid
You JOIN the two tables, which gives you a result with all the fields from both source tables. Then you use WHERE to filter and SELECT to specify the fields that should be returned.
SELECT r.name, r.created_at
FROM users u JOIN roles r ON (u.role_uuid = r.uuid)
WHERE u.email = 'email#domain.com'
If you run into naming conflicts because fields from both tables share the same name, you can use AS to define field names for the output columns:
SELECT r.name AS rolename, u.name AS username, r.created_at
FROM users u JOIN roles r ON (u.role_uuid = r.uuid)
WHERE u.email = 'email#domain.com'
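If a user might have no role assigned (role_uuid is NULL), a LEFT JOIN keeps that user in the result with NULLs in the role columns. A sketch against the same tables:
SELECT u.name AS username, r.name AS rolename, r.created_at
FROM users u LEFT JOIN roles r ON (u.role_uuid = r.uuid)
WHERE u.email = 'email#domain.com'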

UPDATE statement using two arrays at the same index in WHERE clause

I am trying to update a table, entities, which has a column, contacts, that is an array of ids from another table, contacts. The contacts table has the columns first_name and last_name, and I have an array of first names, firstNames, and an array of last names, lastNames, to pass in.
How would you update the contacts column in the entities table with one query that properly gets all of the contacts with first name firstNames[0] AND last name lastNames[0], and all of the contacts with first name firstNames[1] AND last name lastNames[1], and [...] all of the contacts with first name firstNames[n] AND last name lastNames[n]?
My initial thought was something like UPDATE entities SET contacts = (SELECT id FROM contacts WHERE first_name = ANY(firstNames) AND last_name = ANY(lastNames)).
The problem with this arises when the contacts table is like this:
first_name | last_name
-----------+-----------
Bob        | Jones
Bob        | Miller
David      | Miller
If I wanted to set the contacts column to the Ids for Bob Jones and David Miller, but NOT Bob Miller, and I passed in ['Bob', 'David'] for firstNames and ['Jones', 'Miller'] for lastNames in the above query, Bob Miller would also get added to the contacts column.
Maybe you are looking for something like this:
WITH x AS (
    SELECT 'Bob'::text AS firstName, 'Jones'::text AS lastName
    UNION SELECT 'David', 'Miller'
    UNION SELECT 'Bob', 'Miller'
)
SELECT *
FROM x
WHERE (firstName, lastName) = ANY (ARRAY[
    ('Bob'::text, 'Jones'::text),
    ('David'::text, 'Miller'::text)
]);
Yet another way:
WITH x AS (
    SELECT 'Bob'::text AS firstName, 'Jones'::text AS lastName
    UNION SELECT 'David', 'Miller'
    UNION SELECT 'Bob', 'Miller'
)
SELECT *
FROM x
WHERE EXISTS (
    SELECT 1
    FROM (SELECT ARRAY[
              ['Bob', 'Jones'],
              ['David', 'Miller']]::text[][] AS n
         ) AS n
    JOIN LATERAL generate_series(1, array_upper(n, 1)) AS i ON true
    WHERE firstName = n[i][1]
      AND lastName = n[i][2]
);
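Neither snippet performs the UPDATE itself. One way to apply the first idea to the original question is to pair the two arrays positionally with the multi-argument form of unnest() (available since PostgreSQL 9.4). This is only a sketch: the literal arrays stand in for the firstNames/lastNames parameters, and the WHERE clause targeting the right entities row is left open.
UPDATE entities
SET contacts = (
    SELECT array_agg(c.id)
    FROM contacts c
    -- unnest(a, b) walks both arrays in parallel, one row per index
    JOIN unnest(ARRAY['Bob','David']::text[],
                ARRAY['Jones','Miller']::text[]) AS p(first_name, last_name)
        ON c.first_name = p.first_name
       AND c.last_name = p.last_name
);
-- add a WHERE clause to update only the intended entities row(s)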

How can I SUM distinct records in a Postgres database where there are duplicate records?

Imagine a table where the first column is "row_id", the second is "id" (the order ID), and the third is "total" (the revenue). The SQL to get this data was just SELECT *.
I'm not sure why there are duplicate rows in the database, but when I do a SUM(total), it includes the second entry even though the order ID is the same, which makes my numbers larger than if I select distinct(id), total, export to Excel, and then sum the values manually.
So my question is: how can I SUM over just the distinct order IDs so that I get the same revenue as if I exported every distinct order ID row to Excel?
Thanks in advance!
Easy - just divide by the count:
select id, sum(total) / count(id)
from orders
group by id
This also handles any level of duplication, e.g. triplicates.
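If the goal is the overall revenue rather than one row per order, the same idea can be wrapped in an outer sum (a sketch against the same assumed orders table):
select sum(order_total)
from (
    select id, sum(total) / count(id) as order_total
    from orders
    group by id
) t;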
You can try something like this (with your example):
Table
create table test (
    row_id int,
    id int,
    total decimal(15,2)
);
insert into test values
    (6395, 1509, 112), (22986, 1509, 112),
    (1393, 3284, 40.37), (24360, 3284, 40.37);
Query
with distinct_records as (
    select distinct id, total from test
)
select a.id, b.actual_total, array_agg(a.row_id) as row_ids
from test a
inner join (select id, sum(total) as actual_total from distinct_records group by id) b
    on a.id = b.id
group by a.id, b.actual_total
Result
|  id  | actual_total | row_ids    |
|------|--------------|------------|
| 1509 | 112          | 6395,22986 |
| 3284 | 40.37        | 1393,24360 |
Explanation
We do not know the reason why orders and totals appear more than once with different row_id values. So, using a common table expression (CTE) introduced with the WITH clause, we get the distinct id and total.
Below the CTE, we use this distinct data for the totaling: we join the id in the original table to the aggregation over the distinct values, then comma-separate the row_ids so the information looks cleaner.
SQLFiddle example
http://sqlfiddle.com/#!15/72639/3
Create a custom aggregate:
CREATE OR REPLACE FUNCTION sum_func (
    double precision, pg_catalog.anyelement, double precision
)
RETURNS double precision AS
$body$
    SELECT case when $3 is not null then COALESCE($1, 0) + $3 else $1 end
$body$
LANGUAGE sql;

CREATE AGGREGATE dist_sum (
    pg_catalog."any",
    double precision
)
(
    SFUNC = sum_func,
    STYPE = float8
);
Then calculate the distinct sum like this:
select dist_sum(distinct id, total)
from orders
You can use DISTINCT in your aggregate functions:
SELECT id, SUM(DISTINCT total) FROM orders GROUP BY id
Documentation here: https://www.postgresql.org/docs/9.6/static/sql-expressions.html#SYNTAX-AGGREGATES
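One caveat worth knowing: SUM(DISTINCT) collapses all equal values within the group, including rows that are legitimately equal rather than duplicates. A minimal illustration (the VALUES list is made up for this example):
-- rows 1 and 2 are genuinely different line items of order 7,
-- yet SUM(DISTINCT total) returns 50 here, not 100
SELECT id, SUM(DISTINCT total)
FROM (VALUES (1, 7, 50), (2, 7, 50)) AS v(row_id, id, total)
GROUP BY id;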
If we can trust that the total for one order is always the same across its duplicate rows, we can eliminate the duplicates in a sub-query by selecting the MAX of the PK id column. An example:
CREATE TABLE test2 (id int, order_id int, total int);
insert into test2 values (1,1,50);
insert into test2 values (2,1,50);
insert into test2 values (5,1,50);
insert into test2 values (3,2,100);
insert into test2 values (4,2,100);
select order_id, sum(total)
from test2 t
join (
    select max(id) as id
    from test2
    group by order_id) as sq
    on t.id = sq.id
group by order_id
In difficult cases:
select
    id,
    (
        SELECT SUM(value::int4)
        FROM jsonb_each_text(jsonb_object_agg(row_id, total))
    ) as total
from orders
group by id
I would suggest just using a sub-query:
SELECT "a"."id", SUM("a"."total")
FROM (SELECT DISTINCT ON ("id") * FROM "Database"."Schema"."Table") AS "a"
GROUP BY "a"."id"
The above will give you the total of each id.
Use the following if you want the full total with duplicates removed:
SELECT SUM("a"."total")
FROM (SELECT DISTINCT ON ("id") * FROM "Database"."Schema"."Table") AS "a"
Using subselect (http://sqlfiddle.com/#!7/cef1c/51):
select sum(total) from (
    select distinct id, total
    from orders
) t
Using CTE (http://sqlfiddle.com/#!7/cef1c/53):
with distinct_records as (
    select distinct id, total from orders
)
select sum(total) from distinct_records;

Fetch matching results from the integer array satisfying the condition which is given as text

I have an array of integers stored in a field of the user table. This array represents the groups the user belongs to. A user can belong to any number of groups.
i.e.,
Table: user
 user_id | user_name | user_groups
---------+-----------+-------------
       1 | harry     | {1,2,3}
       2 | John      | {4,5,6}
Table: Groups
 group_id | group_name
----------+------------
        1 | Arts
        2 | Science
        3 | Security
        4 | Sports
(Pardon, it should have been a 1-N relationship.) I need to execute a query like the following:
SELECT * from user where user_groups = ANY(x);
where x will be one of the text values Arts, Science, Security, Sports.
So when x = 'Arts', the row for harry should be returned. The database I'm using is PostgreSQL 8.4.
You can use the @> contains operator:
SELECT *
FROM Users
WHERE user_groups @> (SELECT ARRAY[group_id]
                      FROM Groups
                      WHERE group_name = 'Arts')
EDIT:
Is there any way by which I could display user_groups like {Arts,Science,Security} instead of {1,2,3}?
You could use a correlated subquery:
SELECT user_id, user_name, (SELECT array_agg(g.group_name)
                            FROM Groups g
                            WHERE ARRAY[g.group_id] <@ u.user_groups) AS user_groups
FROM Users u
WHERE user_groups @> (SELECT ARRAY[group_id]
                      FROM Groups
                      WHERE group_name = 'Arts')
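A join-based alternative that avoids running a correlated subquery per row, sketched against the same tables (= ANY, array_agg and bool_or are all available in 8.4):
SELECT u.user_id, u.user_name, array_agg(g.group_name) AS user_groups
FROM Users u
JOIN Groups g ON g.group_id = ANY (u.user_groups)
GROUP BY u.user_id, u.user_name
HAVING bool_or(g.group_name = 'Arts');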