fetching a table with another tables "conditions" - postgresql

I have two tables like this:
Table Name: users
emx | userid
---------------
1 | 1
2 | 2
and another table called bodies
id | emx | text
--------------------------
1 | 1 | Hello
2 | 2 | How are you?
As you can see, bodies table has emx which is id numbers of users table. Now, when i want to fetch message that contains Hello i just search it on bodies and get the emx numbers and after that i fetch users table with these emx numbers. So, i am doing 2 sql queries to find it.
So, all i want to do is make this happen in 1 SQL query.
I tried some queries which is not correct and also i tried JOIN too. No luck yet. I just want to fetch users table with message contains 'Hello' in bodies table.
Note: I am using PostgreSQL 9.1.3.
Any idea / help is appreciated.

Read docs on how to join tables.
Try this:
SELECT u.emx, u.userid, b.id, b.text
FROM bodies b
JOIN users u USING (emx)
WHERE b.text ~ 'Hello';

This is how I'd do the join. I've left out the exact containment test.
SELECT users.userid
FROM users JOIN bodies ON (users.emx = bodies.emx)
WHERE ⌜true if bodies.text contains ?⌟

Related

PostgreSQL how do I COUNT with a condition?

Can someone please assist with a query I am working on for school using a sample database from PostgreSQL tutorial? Here is my query in PostgreSQL that gets me the raw data that I can export to excel and then put in a pivot table to get the needed counts. The goal is to make a query that counts so I don't have to do the manual extraction to excel and subsequent pivot table:
SELECT
i.film_id,
r.rental_id
FROM
rental as r
INNER JOIN inventory as i ON i.inventory_id = r.inventory_id
ORDER BY film_id, rental_id
;
From the database this gives me a list of films (by film_id) showing each time the film was rented (by rental_id). That query works fine if just exporting to excel. Since we don't want to do that manual process what I need is to add into my query how to count how many times a given film (by film_id) was rented. The results should be something like this (just showing the first five here, the query need not do that):
film_id | COUNT of rental_id
1 | 23
2 | 7
3 | 12
4 | 23
5 | 12
Database setup instructions can be found here: LINK
I have tried using COUNTIF and CASE (following other posts here) and I can't get either to work, please help.
Did you try this?:
SELECT
i.film_id,
COUNT(1)
FROM
rental as r
INNER JOIN inventory as i ON i.inventory_id = r.inventory_id
GROUP BY i.film_id
ORDER BY film_id;
If there can be >1 rental_id in your data you may want to use COUNT(DISTINCT r.rental_id)

Why does using DISTINCT ON () at different points in a query return different (unintuitive) results?

I’m querying from a table that has repeated uuids, and I want to remove duplicates. I also want to exclude some irrelevant data which requires joining on another table. I can remove duplicates and then exclude irrelevant data, or I can switch the order and exclude then remove duplicates. Intuitively, I feel like if anything, removing duplicates then joining should produce more rows than joining and then removing duplicates, but that is the opposite of what I’m seeing. What am I missing here?
In this one, I remove duplicates in the first subquery and filter in the second, and I get 500k rows:
with tbl1 as (
select distinct on (uuid) uuid, foreign_key
from original_data
where date > some_date
),
tbl2 as (
select uuid
from tbl1
left join other_data
on tbl1.foreign_key = other_data.id
where other_data.category <> something
)
select * from tbl2
If I filter then remove duplicates, I get 550k rows:
with tbl1 as (
select uuid, foreign_key
from original_data
where date > some_date
),
tbl2 as (
select uuid
from tbl1
left join other_data
on tbl1.foreign_key = other_data.id
where other_data.category <> something
),
tbl3 as (
select distinct on (uuid) uuid
from tbl2
)
select * from tbl3
Is there an explanation here?
Does original_data.foreign_key have a foreign key constraint referencing other_data.id allowing for foreign_keys that don't link to any id in other_data?
Isn't other_data.category or original_data.foreign_key column missing a NOT NULL constraint?
In either of these cases postgres would filter out all records with
a missing link (foreign_key=null)
a broken link (foregin_key doesn't match any id in other_data)
linking to an other_data record with a category set o null
in both of your approaches - regardless of whether they're a duplicate or not - as other_data.category <> something evaluates to null for them which does not satisfy the WHERE clause. That, combined with missing ORDER BY causing DISTINCT ON to drop different duplicates randomly each time, could result in dropping the duplicates that then get filtered out in tbl2 in the first approach, but not in the second.
Example:
pgsql122=# select * from original_data;
uuid | foreign_key | comment
------+-------------+---------------------------------------------------
1 | 1 | correct, non-duplicate record with a correct link
3 | 2 | duplicate record with a broken link
3 | 1 | duplicate record with a correct link
4 | null | duplicate record with a missing link
4 | 1 | duplicate record with a correct link
5 | 3 | duplicate record with a correct link, but a null category behind it
5 | 1 | duplicate record with a correct link
6 | null | correct, non-duplicate record with a missing link
7 | 2 | correct, non-duplicate record with a broken link
8 | 3 | correct, non-duplicate record with a correct link, but a null category behind it
pgsql122=# select * from other_data;
id | category
----+----------
1 | a
3 | null
Both of your approaches keep uuid 1 and eliminate uuid 6, 7 and 8 even though they're unique.
Your first approach randomly keeps between 0 and 3 out of the 3 pairs of duplicates (uuid 3, 4 and 5), depending on which one in each pair gets discarded by DISTINCT ON.
Your second approach always keeps one record for each uuid 3, 4 and 5. Each clone with missing link, a broken link or a link with a null category behind it is already gone by the time you discard duplicates.
As #a_horse_with_no_name suggested, ORDER BY should make DISTINCT ON consistent and predictable but only as long as records vary on the columns used for ordering. It also won't help if you have other issues, like the one I suggest.

Does PostgreSQL have a way of creating metadata about the data in a particular table?

I'm dealing with a lot of unique data that has the same type of columns, but each group of rows have different attributes about them and I'm trying to see if PostgreSQL has a way of storing metadata about groups of rows in a database or if I would be better off adding custom columns to my current list of columns to track these different attributes. Microsoft Excel for instance has a way you can merge multiple columns into a super-column to group multiple columns into one, but I don't know how this would translate over to a PostgreSQL database. Thoughts anyone?
Right, can't upload files. Hope this turns out well.
Section 1 | Section 2 | Section 3
=================================
Num1|Num2 | Num1|Num2 | Num1|Num2
=================================
132 | 163 | 334 | 1345| 343 | 433
......
......
......
have a "super group" of columns (In SQL in general, not just postgreSQL), the easiest approach is to use multiple tables.
Example:
Person table can have columns of
person_ID, first_name, last_name
employee table can have columns of
person_id, department, manager_person_id, salary
customer table can have columns of
person_id, addr, city, state, zip
That way, you can join them together to do whatever you like..
Example:
select *
from person p
left outer join student s on s.person_id=p.person_id
left outer join employee e on e.person_id=p.person_id
Or any variation, while separating the data into different types and PERHAPS save a little disk space in the process (example if most "people" are "customers", they don't need a bunch of employee data floating around or have nullable columns)
That's how I normally handle this type of situation, but without a practical example, it's hard to say what's best in your scenario.

Psql pivot table with related data

I'm trying to create a pivot table from related data. I am planning on using this data to better understand which tags are correlated to high frequency users.
I have a users table, a tag_users and a tags table.
EDIT
The non-pivoted query is
SELECT users.id, tags.title
FROM users
INNER JOIN tag_users ON tag_users.user_id = users.id
INNER JOIN tags ON user_tags.tag_id = tags.id
I would like to create an output with the following format:
user_id | tag_1_name | tag_2_name | ...tag_{n}_name
1 1 0 1
I have ~ 196 tags and ~ 40,000 users and downloading tags a a comma separated field and then doing a lookup on excel breaks my computer.
I am aware of crosstab but I can't find any examples where it uses related data.
What do you suggest as a suitable solution?

PostgreSQL: Aggregate into array and join

Kind of new to pgsql, what i'm looking for is a bit of common need, I think, but I still wasn't able to find a solution.
I have 3 tables:
users
user_id|username|password
-------|--------|--------
1 | guest | blabla
2 | admin | blabla
roles
role_id|name | descr
-------|-----|--------
1 |role1|role one
2 |role2|role two
user_roles
user_id|role_id
-------|-------
2 | 1
2 | 2
I want to display a table of user along with all its' roles, my guess (feel free to correct me), the way to do that would be:
Group user roles into array:
select user_id,array_agg(role_id) from user_roles group by user_id where user_id = 2;
Somehow join the array of array_agg(role_id) into roles to select the role names.
As you can see, I'm a bit confused on how to do #2. Is this the best way to do what I want? Is there something wrong with how I've build my db tables?
Thanks!
To achieve your query, you would first do the join between the tables and at last aggregate on the name column.
Note that it may be prettier using string_agg(r.name,',')
select ur.user_id,array_agg(r.name)
from user_roles ur,
roles r
where ur.role_id = r.role_id
and ur.user_id = 2
group by ur.user_id;