Psql pivot table with related data - postgresql

I'm trying to create a pivot table from related data. I am planning on using this data to better understand which tags are correlated to high frequency users.
I have a users table, a tag_users and a tags table.
EDIT
The non-pivoted query is
SELECT users.id, tags.title
FROM users
INNER JOIN tag_users ON tag_users.user_id = users.id
INNER JOIN tags ON tag_users.tag_id = tags.id
I would like to create an output with the following format:
user_id | tag_1_name | tag_2_name | ... | tag_{n}_name
1       | 1          | 0          | ... | 1
I have ~196 tags and ~40,000 users, and downloading the tags as a comma-separated field and then doing a lookup in Excel breaks my computer.
I am aware of crosstab but I can't find any examples where it uses related data.
What do you suggest as a suitable solution?
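One way to get the requested layout without crosstab is conditional aggregation over the same joins. The following is only a sketch under assumptions: 'tag_1_name' and 'tag_2_name' are placeholder tag titles, and with ~196 tags the column list would have to be written out (or generated from the tags table) in full. LEFT JOIN is used so that users without any tags still appear, with 0 in every column.

```sql
-- Sketch: one row per user, one 0/1 column per tag.
-- 'tag_1_name' and 'tag_2_name' are placeholders, not real tag titles.
SELECT
    users.id AS user_id,
    MAX(CASE WHEN tags.title = 'tag_1_name' THEN 1 ELSE 0 END) AS tag_1_name,
    MAX(CASE WHEN tags.title = 'tag_2_name' THEN 1 ELSE 0 END) AS tag_2_name
    -- add one MAX(CASE ...) line per remaining tag
FROM users
LEFT JOIN tag_users ON tag_users.user_id = users.id
LEFT JOIN tags ON tags.id = tag_users.tag_id
GROUP BY users.id
ORDER BY users.id;
```

The crosstab function from the tablefunc extension is the other usual route, but it also needs the output columns listed explicitly.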

Related

PostgreSQL how do I COUNT with a condition?

Can someone please assist with a query I am working on for school using a sample database from PostgreSQL tutorial? Here is my query in PostgreSQL that gets me the raw data that I can export to excel and then put in a pivot table to get the needed counts. The goal is to make a query that counts so I don't have to do the manual extraction to excel and subsequent pivot table:
SELECT
i.film_id,
r.rental_id
FROM
rental as r
INNER JOIN inventory as i ON i.inventory_id = r.inventory_id
ORDER BY film_id, rental_id
;
From the database this gives me a list of films (by film_id) showing each time the film was rented (by rental_id). That query works fine if just exporting to excel. Since we don't want to do that manual process what I need is to add into my query how to count how many times a given film (by film_id) was rented. The results should be something like this (just showing the first five here, the query need not do that):
film_id | COUNT of rental_id
1 | 23
2 | 7
3 | 12
4 | 23
5 | 12
Database setup instructions can be found here: LINK
I have tried using COUNTIF and CASE (following other posts here) and I can't get either to work, please help.
Did you try this?:
SELECT
i.film_id,
COUNT(1)
FROM
rental as r
INNER JOIN inventory as i ON i.inventory_id = r.inventory_id
GROUP BY i.film_id
ORDER BY film_id;
If the same rental_id can appear more than once in the joined result, you may want to use COUNT(DISTINCT r.rental_id) instead.
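For reference, the COUNT(DISTINCT ...) variant could look like the sketch below. It uses the same tables and join as the query above; the rental_count alias is just an illustrative name.

```sql
-- Count each rental at most once per film, even if the join repeats it.
SELECT
    i.film_id,
    COUNT(DISTINCT r.rental_id) AS rental_count  -- alias is illustrative
FROM rental AS r
INNER JOIN inventory AS i ON i.inventory_id = r.inventory_id
GROUP BY i.film_id
ORDER BY i.film_id;
```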

Does PostgreSQL have a way of creating metadata about the data in a particular table?

I'm dealing with a lot of unique data that has the same type of columns, but each group of rows have different attributes about them and I'm trying to see if PostgreSQL has a way of storing metadata about groups of rows in a database or if I would be better off adding custom columns to my current list of columns to track these different attributes. Microsoft Excel for instance has a way you can merge multiple columns into a super-column to group multiple columns into one, but I don't know how this would translate over to a PostgreSQL database. Thoughts anyone?
Right, can't upload files. Hope this turns out well.
Section 1 | Section 2 | Section 3
=================================
Num1|Num2 | Num1|Num2 | Num1|Num2
=================================
132 | 163 | 334 | 1345| 343 | 433
......
......
......
If you want to have a "super group" of columns (in SQL in general, not just PostgreSQL), the easiest approach is to use multiple tables.
Example:
Person table can have columns of
person_ID, first_name, last_name
employee table can have columns of
person_id, department, manager_person_id, salary
customer table can have columns of
person_id, addr, city, state, zip
That way, you can join them together however you like.
Example:
select *
from person p
left outer join customer c on c.person_id=p.person_id
left outer join employee e on e.person_id=p.person_id
Or any variation. This separates the data into different types and PERHAPS saves a little disk space in the process (for example, if most "people" are "customers", they don't need a bunch of employee data or nullable employee columns floating around).
That's how I normally handle this type of situation, but without a practical example, it's hard to say what's best in your scenario.
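As a rough sketch, the three tables described above could be declared like this. The column names come from the lists above; the data types and the foreign keys are assumptions added for illustration.

```sql
-- Column types are assumed; adjust them to the real data.
CREATE TABLE person (
    person_id  integer PRIMARY KEY,
    first_name text,
    last_name  text
);

-- One optional row per person who is an employee.
CREATE TABLE employee (
    person_id         integer PRIMARY KEY REFERENCES person (person_id),
    department        text,
    manager_person_id integer REFERENCES person (person_id),
    salary            numeric
);

-- One optional row per person who is a customer.
CREATE TABLE customer (
    person_id integer PRIMARY KEY REFERENCES person (person_id),
    addr      text,
    city      text,
    state     text,
    zip       text
);
```

Each "section" of columns lives in its own table and is pulled in with a join only when it is needed.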

SQL: find overlapping cell values across the same column of many tables

I'm working with PostgreSQL and I'm a bit of a newbie to SQL in general.
I'm attempting to write a query that checks whether values overlap between a master list and multiple tables. The values in question are usernames, and the multiple tables (30 in all) each represent event data for a different game.
Each game has its own table with identical column headings. 30 tables that have identical columns something like this...:
table name: game1...game30
USERNAME EVENT_TIMESTAMP OTHER_FIELDS
2592761928AF756E45891527ED49A7A9 2016-02-01 02:38:05 ...
79460FE440ADB429F542D2F08A763D50 2016-02-01 02:38:35 ...
3945B26DD9F6FD2D49574856ECF9FA7D 2016-02-01 02:44:12 ...
A597AE2CF6E15497EE7AC2A02CEEB32E 2016-02-01 02:46:57 ...
65DE308FC39980CCD37DBDE8A432F221 2016-02-01 02:46:57 ...
...
I have a list of specified user_ids that I've used to create a "key table". I'm attempting to write a query that will tell me whether or not any of the users in my key table show up in the games' event data.
My key table is only two columns and looks something like this:
table name: username_key
EMAIL HASHED_EMAIL
asd0#asd.com 79460FE440ADB429F542D2F08A763D50
asd1#asd.com 0C450FAC330D69A315604CDE61C7A65E
asd2#asd.com F2D7714CBA1048A940231087549F1D95
bob#asd.com FE793A075E0633441B5EE5535FAAEDD2
asd7#asd.com 47FAFD07C174B81BADD28AD9BE64E26B
...
(Note: the usernames in both the game tables and the key table are hashed emails, hence the column name "HASHED_EMAIL")
My query currently looks like this:
create temp table players as select ky.hashed_email from username_key as ky
inner join game1 g1 on ky.hashed_email = g1.username
inner join game2 g2 on ky.hashed_email = g2.username
inner join game3 g3 on ky.hashed_email = g3.username
inner join game4 g4 on ky.hashed_email = g4.username
...
inner join game30 g30 on ky.hashed_email = g30.username
When I try to run this query it hangs for hours and eventually times out.
I'm hoping to return a list of users that show up in one or more of the game events tables, or return an empty list (which would tell me that no one in my key table list has played the games).
Am I on the right track with my query?
Is there a faster/more efficient way to accomplish this task than the way I'm doing it?
How would you, a postgresql expert, solve this problem (finding specific occurrences of usernames across many different tables)?
If you only care whether a user appears in any of the tables, not in several of them, you have the following alternatives:
IN with UNION:
SELECT * FROM username_key WHERE hashed_email IN (
SELECT username FROM game1
UNION SELECT username FROM game2
UNION SELECT username FROM game3
...
)
IN with OR:
SELECT * FROM username_key WHERE hashed_email IN (SELECT username FROM game1)
OR hashed_email IN (SELECT username FROM game2)
OR hashed_email IN (SELECT username FROM game3)
...
EXISTS:
SELECT * FROM username_key WHERE EXISTS (SELECT 1 FROM game1 WHERE username=hashed_email)
OR EXISTS (SELECT 1 FROM game2 WHERE username=hashed_email)
OR EXISTS (SELECT 1 FROM game3 WHERE username=hashed_email)
...
There are probably quite a few other alternatives. You should use EXPLAIN or EXPLAIN ANALYZE to find out which is the most efficient, though I wouldn't be surprised if all three resulted in a substantially similar query plan.
Note that having an appropriate index on username in each of the game* tables would of course help a lot.
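A minimal sketch of that index advice, plus a way to inspect the resulting plan, might look like this. The index names are made up for illustration, and only the first two game tables are shown; the same pattern would repeat for the rest.

```sql
-- Hypothetical index names; one index per game table.
CREATE INDEX game1_username_idx ON game1 (username);
CREATE INDEX game2_username_idx ON game2 (username);
-- repeat for the remaining game tables

-- Run the candidate query under EXPLAIN ANALYZE to see the actual plan and timing.
EXPLAIN ANALYZE
SELECT ky.hashed_email
FROM username_key AS ky
WHERE EXISTS (SELECT 1 FROM game1 AS g WHERE g.username = ky.hashed_email)
   OR EXISTS (SELECT 1 FROM game2 AS g WHERE g.username = ky.hashed_email);
```

Note that EXPLAIN ANALYZE actually executes the query, so it is worth trying on a couple of tables first before extending it to all 30.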

Alternative when IN clause is inputed A LOT of values (postgreSQL)

I'm using the IN clause to retrieve places that contains certain tags. For that I simply use
select .. FROM table WHERE tags IN (...)
For now the number of tags I provide in the IN clause is around 500, but soon the number of tags will probably jump to well over 5,000 (maybe even more).
I would guess there is some kind of limitation on both the size of the query and the number of values in the IN clause (bonus question, out of curiosity: what is this limit?).
So my question is: what is a good alternative query that would be future-proof even if I end up matching against, let's say, 10,000 tags?
PS: I have looked around and seen people mention "temporary tables". I have never used those. How would they be used in my case? Will I need to create a temp table every time I make a query?
Thanks,
Francesco
One option is to join this to a values clause
with params (tag) as (
values ('tag1'), ('tag2'), ('tag3')
)
select t.*
from the_table t
join params p on p.tag = t.tag;
You could create a table using:
tablename
id | tags
----+----------
1 | tag1
2 | tag2
3 | tag3
And then do:
select .. FROM table WHERE tags IN (SELECT tags FROM tablename)
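On the temporary table question, a sketch of how that could look is below. The names wanted_tags, the_table and the tag column are assumptions for illustration. A temporary table only lives for the current session, so it would need to be created and filled again in each new session, but it can be reused by any number of queries within that session.

```sql
-- wanted_tags is a hypothetical name; it is dropped automatically when the session ends.
CREATE TEMP TABLE wanted_tags (tag text PRIMARY KEY);

-- Load the tags to match (could be thousands of rows).
INSERT INTO wanted_tags (tag)
VALUES ('tag1'), ('tag2'), ('tag3');

-- Join against the temp table instead of a long IN list.
SELECT t.*
FROM the_table AS t
JOIN wanted_tags AS w ON w.tag = t.tag;
```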

fetching a table with another tables "conditions"

I have two tables like this:
Table Name: users
emx | userid
---------------
1 | 1
2 | 2
and another table called bodies
id | emx | text
--------------------------
1 | 1 | Hello
2 | 2 | How are you?
As you can see, the bodies table has an emx column, which holds the emx numbers from the users table. Right now, when I want to fetch a message that contains Hello, I search bodies, get the emx numbers, and then query the users table with those emx numbers. So I am running 2 SQL queries to find it.
What I want is to do this in 1 SQL query.
I tried some queries that were not correct, and I tried JOIN too. No luck yet. I just want to fetch the users rows whose message in the bodies table contains 'Hello'.
Note: I am using PostgreSQL 9.1.3.
Any idea / help is appreciated.
Read docs on how to join tables.
Try this:
SELECT u.emx, u.userid, b.id, b.text
FROM bodies b
JOIN users u USING (emx)
WHERE b.text ~ 'Hello';
This is how I'd do the join. I've left out the exact containment test.
SELECT users.userid
FROM users JOIN bodies ON (users.emx = bodies.emx)
WHERE ⌜true if bodies.text contains ?⌟
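One possible way to fill in the containment test is a LIKE pattern, sketched below under the assumption that a plain, case-sensitive substring match on 'Hello' is what is wanted. PostgreSQL's ILIKE would make it case-insensitive.

```sql
-- Users who have at least one body whose text contains 'Hello'.
SELECT DISTINCT users.userid
FROM users
JOIN bodies ON users.emx = bodies.emx
WHERE bodies.text LIKE '%Hello%';  -- use ILIKE for a case-insensitive match
```

DISTINCT is there only so that a user with several matching messages is listed once; drop it if one row per matching message is preferred.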