I want to get unique rows on joining tables - postgresql

I am running the following query while trying to join 3 tables :
select
a.project_id, a.acc_name, a.project_name, a.iot,a.acc_id, a.active,
b.app_fte, b.contact_person, c.cost_call_date
from
Account a, Application b, account_version c
where
a.acc_id in (Select acc_id from account where acc_name='GGG') and
EXTRACT(MONTH FROM c.cost_call_date) = 3;
Sample data from the tables are as follows :
Account :
acc_id acc_name iot acc_contact project_id project_name ilc_code license_no active
2 GGG NA YYY 7777 HHH TTR 766 false
Application :
app_id app_name app_fte contact_person acc_id
1 sfsf 4 sdsdff 2
Account_version :
line_id acc_id version_no chargable_fte cost_call_date is_approved
9 2 7 4 2018-03-20
Here acc_id is the primary key for the Account table and the foreign key for the Application and Account_version tables. When I am running the above query I am getting 30 rows I have also tried using the distinct keyword but still I get 10 rows. Please help me in getting unique rows.

Try something like this
SELECT DISTINCT a.project_id, a.acc_name, a.project_name, a.iot,a.acc_id, a.active, b.app_fte, b.contact_person, c.cost_call_date
FROM Account a
INNER JOIN Application b
USING (acc_id)
INNER JOIN account_version c
USING (acc_id)
WHERE a.acc_name = 'GGG'
AND EXTRACT(MONTH FROM c.cost_call_date) = 3
For reference as to why your query was giving you more rows than expected, try running this:
SELECT a.*, b.*
FROM generate_series(1, 10) a, generate_series(1, 10) b
What you are doing by selecting from multiple tables as you did is a cross join. What you should actually be doing is an inner join to get the rows you want, and a DISTINCT if required to get only distinct rows from the results.

Related

How to include and exclude ids in once query postgresql

I use PostgreSQL 13.3
I'm trying to think how I can make include/exclude in query at the same time
I have include_system_ids [1,5] and exclude_system_ids [3]
There's one big table - records
system_records table
record
system_id
1
1
1
5
1
3
2
1
2
5
If a record contains an exclusive identifier, then it should not be included in the final selection. I had some several tries, but I didn't get a necessary result
Awaiting result: record with id 2
Fact result: 1, 2
My variants
select r.id from records r
left join (select record_id from system_records
where system_id in (1,5)
) include_ids on r.id = include_ids
left join (select record_id from system_records
where system_id not in (3)
) exclude_ids on r.id = exclude_ids.id
Honestly, I don't understand how I can do it((
Is there anyone who can help me
Maybe this query could be a solution (result here)
with x as (select record,string_agg(system_id::varchar,',') as sys_id from records group by record)
select records.*
from records,x
where records.record = x.record
and x.sys_id = '1,5'

Query users on filter applied to a one-to-many relationship table postgresql

We currently have a users table with a one-to-many relationship on a table called steps. Each user can have either four steps or seven steps. The steps table schema is as follows:
id | user_id | order | status
-----------------------------
# | # |1-7/1-4| 0 or 1
I am trying to query all of the users who have a status of 1 on all of their steps. So if they have either 4 or 7 steps, they must all have a status of 1.
I tried a join with a check on step 4 (since a step cannot be complete without the previous one being complete as well) but this has issues if someone with 7 steps completed step 4 but not 7.
select u.first_name, u.last_name, u.email, date(s.updated_at) as completed_date
from users u
join steps s on u.id = s.user_id
where s.order = 4 and s.status = 1;
The bool_and aggregate function should help you to identify the users with all their steps at status = 1 whatever the number of steps.
Then the array_agg aggregate function can help to find the updated_at date associated to the last step for each user by ordering the dates according to order DESC and selecting the first value in the resulting array [1] :
SELECT u.first_name, u.last_name, u.email
, s.completed_date
FROM users u
INNER JOIN
( SELECT user_id
, (array_agg(updated_at ORDER BY order DESC))[1] :: date as completed_date
FROM steps
GROUP BY user_id
HAVING bool_and(status :: boolean) -- filter the users with all their steps status = 1
) AS s
ON u.id = s.user_id

How to simplify a join of 2 tables in HIVE and count values

I have two tables in HIVE, "orders" and "customers". I want to get top n user names of users who placed most orders (in status "CLOSED"). Orders table has key order_customer_id, column order_status and customers has key customer_id and name consists of 2 columns customer_fname and customer_lname.
ORDERS
order_customer_id, order_status
1,CLOSED
2,CLOSED
3,INPROGRESS
1,INPROGRESS
1,CLOSED
2,CLOSED
CUSTOMERS
customer_id, customer_fname, customer_lname
1,Mickey, Mouse
2,Henry, Ford
3,John, Doe
I tried this code:
select c.customer_id, count(o.order_customer_id) as COUNT, concat(c.customer_fname," ",c.customer_lname) as FULLNAME from customers c join orders o on c.customer_id=o.order_customer_id where o.order_status='CLOSED' group by c.customer_id,FULLNAME order by COUNT desc limit 10;
this does not work - returns error.
I was able to get the result by first creating a 3rd table:
create table id_sum as select o.order_customer_id,count(o.order_id) as COUNT from orders o join customers c on c.customer_id=o.order_customer_id where order_status='CLOSED' group by o.order_customer_id;
1833 6
5493 5
1363 5
1687 5
569 4
1764 4
1345 4
Then I joined the tables:
select s.*,concat(c.customer_fname," " ,c.customer_lname) from id_sum s join customers c on s.order_customer_id = c.customer_id order by count desc limit 20;
This resulted in desired output:
customer_id, order_count, full_name
1833 6 Ronald Smith
5493 5 Mary Cochran
1363 5 Kathy Rios
1687 5 Jerry Ellis
569 4 Mary Frye
1764 4 Megan Davila
1345 4 Adam Wilson
Is there a way how to write it in one command or more effectively?
The subquery with alias sq creates a relation with two columns order_count and customer_id calculating for each customer_id the total number of orders. This is then joined with the CUSTOMERS table. The result is sorted descending and limited to (the top) 10 rows.
SELECT c.customer_id, sq.order_count, concat(c.customer_fname," " ,c.customer_lname) as full_name
FROM CUSTOMERS c JOIN (
SELECT COUNT(*) as order_count, order_customer_id FROM ORDERS
WHERE order_status = 'CLOSED'
GROUP BY order_customer_id
) sq on c.customer_id = sq.order_customer_id
ORDER BY sq.order_count desc LIMIT 10
;
The idea is to use a subquery instead of a third table.

Subsetting records that contain multiple values in one column

In my postgres table, I have two columns of interest: id and name - my goal is to only keep records where id has more than one value in name. In other words, would like to keep all records of ids that have multiple values and where at least one of those values is B
UPDATE: I have tried adding WHERE EXISTS to the queries below but this does not work
The sample data would look like this:
> test
id name
1 1 A
2 2 A
3 3 A
4 4 A
5 5 A
6 6 A
7 7 A
8 2 B
9 1 B
10 2 B
and the output would look like this:
> output
id name
1 1 A
2 2 A
8 2 B
9 1 B
10 2 B
How would one write a query to select only these kinds records?
Based on your description you would seem to want:
select id, name
from (select t.*, min(name) over (partition by id) as min_name,
max(name) over (partition by id) as max_name
from t
) t
where min_name < max_name;
This can be done using EXISTS:
select id, name
from test t1
where exists (select *
from test t2
where t1.id = t2.id
and t1.name <> t2.name) -- this will select those with multiple names for the id
and exists (select *
from test t3
where t1.id = t3.id
and t3.name = 'B') -- this will select those with at least one b for that id
Those records where for their id more than one name shines up, right?
This could be formulated in "SQL" as follows:
select * from table t1
where id in (
select id
from table t2
group by id
having count(name) > 1)

Using recursive CTE with Ecto

How would I go about using the result of a recursive CTE in a query I plan to run with Ecto? For example let's say I have a table, nodes, structured as so:
-- nodes table example --
id parent_id
1 NULL
2 1
3 1
4 1
5 2
6 2
7 3
8 5
and I also have another table nodes_users structured as so:
-- nodes_users table example --
node_id user_id
1 1
2 2
3 3
5 4
Now, I want to grab all the users with a node at or above a specific node, for the sake of an example let's choose the node w/ the id 8.
I could use the following recursive postgresql query to do so:
WITH RECURSIVE nodes_tree AS (
SELECT *
FROM nodes
WHERE nodes.id = 8
UNION ALL
SELECT n.*
FROM nodes n
INNER JOIN nodes_tree nt ON nt.parent_id = n.id
)
SELECT u.* FROM users u
INNER JOIN users_nodes un ON un.user_id = u.id
INNER JOIN nodes_tree nt ON nt.id = un.node_id
This should return users.* for the users w/ id of 1, 2, and 4.
I'm not sure how I could run this same query using ecto, ideally in a manner that would return a chainable output. I understand that I can insert raw SQL into my query using the fragment macro, but I'm not exactly sure where that would go for this use or if that would even be the most appropriate route to take.
Help and/or suggestions would be appreciated!
I was able to accomplish this using a fragment. Here's an example of the code I used. I'll probably move this method to a stored procedure.
Repo.all(MyProj.User,
from u in MyProj.User,
join: un in MyProj.UserNode, on: u.id == un.user_id,
join: nt in fragment("""
(
WITH RECURSIVE node_tree AS (
SELECT *
FROM nodes
WHERE nodes.id = ?
UNION ALL
SELECT n.*
FROM nodes n
INNER JOIN node_tree nt ON nt.parent_id == n.id
)
) SELECT * FROM node_tree
""", ^node_id), on: un.node_id == nt.id
)