PostgreSQL how do I COUNT with a condition? - postgresql

Can someone please assist with a query I am working on for school using a sample database from PostgreSQL tutorial? Here is my query in PostgreSQL that gets me the raw data that I can export to excel and then put in a pivot table to get the needed counts. The goal is to make a query that counts so I don't have to do the manual extraction to excel and subsequent pivot table:
SELECT
i.film_id,
r.rental_id
FROM
rental as r
INNER JOIN inventory as i ON i.inventory_id = r.inventory_id
ORDER BY film_id, rental_id
;
From the database this gives me a list of films (by film_id) showing each time the film was rented (by rental_id). That query works fine if just exporting to excel. Since we don't want to do that manual process what I need is to add into my query how to count how many times a given film (by film_id) was rented. The results should be something like this (just showing the first five here, the query need not do that):
film_id | COUNT of rental_id
1 | 23
2 | 7
3 | 12
4 | 23
5 | 12
Database setup instructions can be found here: LINK
I have tried using COUNTIF and CASE (following other posts here) and I can't get either to work, please help.

Did you try this?:
SELECT
i.film_id,
COUNT(1)
FROM
rental as r
INNER JOIN inventory as i ON i.inventory_id = r.inventory_id
GROUP BY i.film_id
ORDER BY film_id;
If there can be >1 rental_id in your data you may want to use COUNT(DISTINCT r.rental_id)

Related

PostgreSQL how to GROUP BY single field from returned table

So I have complicated query, to simplify let it be like
SELECT
t.*,
SUM(a.hours) AS spent_hours
FROM (
SELECT
person.id,
person.name,
person.age,
SUM(contacts.id) AS contact_count
FROM
person
JOIN contacts ON contacts.person_id = person.id
) AS t
JOIN activities AS a ON a.person_id = t.id
GROUP BY t.id
Such query works fine in MySQL, but Postgres needs to know that GROUP BY field is unique, and despite it actually is, in this case I need to GROUP BY all returned fields from returned t table.
I can do that, but I don't believe that will work efficiently with big data.
I can't JOIN with activities directly in first query, as person can have several contacts which will lead query counting hours of activity several time for every joined contact.
Is there a Postgres way to make this query work? Maybe force to treat Postgres t.id as unique or some other solution that will make same in Postgres way?
This query will not work on both database system, there is an aggregate function in the inner query but you are not grouping it(unless you use window functions). Of course there is a special case for MySQL, you can use it with disabling "sql_mode=only_full_group_by". So, MySQL allows this usage because of it' s database engine parameter, but you cannot do that in PostgreSQL.
I knew MySQL allowed indeterminate grouping, but I honestly never knew how it implemented it... it always seemed imprecise to me, conceptually.
So depending on what that means (I'm too lazy to look it up), you might need one of two possible solutions, or maybe a third.
If you intent is to see all rows (perform the aggregate function but not consolidate/group rows), then you want a windowing function, invoked by partition by. Here is a really dumbed down version in your query:
.
SELECT
t.*,
SUM (a.hours) over (partition by t.id) AS spent_hours
FROM t
JOIN activities AS a ON a.person_id = t.id
This means you want all records in table t, not one record per t.id. But each row will also contain a sum of the hours for all values that value of id.
For example the sum column would look like this:
Name Hours Sum Hours
----- ----- ---------
Smith 20 120
Jones 30 30
Smith 100 120
Whereas a group by would have had Smith once and could not have displayed the hours column in detail.
If you really did only want one row per t.id, then Postgres will require you to tell it how to determine which row. In the example above for Smith, do you want to see the 20 or the 100?
There is another possibility, but I think I'll let you reply first. My gut tells me option 1 is what you're after and you want the analytic function.

Loop a result set and feed two tables

I have a select query that returns a huge result set (500k records). But for this example let's say it has only two records:
SELECT * FROM INVENTORY I
INNER JOIN PARTS P
ON I.partcode = P.partcode
ORDER BY I.partcode
The result will look more or less like this:
pk partcode genericname partname stock
1 001 mouse logitech 10
2 002 keyboard genius 8
I have to loop the result above and feed two tables (product and variant).
I first have to insert two of the columns into 'product' table, like this:
INSERT INTO PRODUCT
(p_code,product_name) values (partcode,genericname)
pk p_code product_name
5 001 mouse
6 001 keyboard
Then I have to grab the pk that was automatically generated into the table above (say ppk) and then insert it together with the other two columns into the 'variant' table, like this:
INSERT INTO VARIANT
(product_pk,variant_name,in_stock) values (ppk,partname,stock)
pk product_pk variant_name in_stock
10 5 logitech 10
11 6 genius 8
At the end I should have the product and the variant tables with 2 records each.
I could write a VB code to do that but I think that it can de done in pure SQL, and I just am not sure the best approach.
Someone could give me some help with this?
Thank you!
You could use a SQL cursor to loop through and insert a row at a time into PRODUCT and then use SCOPE_IDENTITY() to get the newly assigned identity value to insert a corresponding row into VARIANT, but best practice is to avoid cursors if there's another way. (There usually is, but not always.)
If the partcode/genericname combination will uniquely identify 1 record in PRODUCT, you could do this:
INSERT INTO PRODUCT (p_code,product_name)
SELECT partcode, genenricname
FROM INVENTORY I INNER JOIN PARTS P ON I.partcode = P.partcode
(I would eliminate the ORDER BY from your query unless you care about the order the identity values are assigned.)
Then, run this:
INSERT INTO VARIANT
(product_pk,variant_name,in_stock)
SELECT pr.ppk, i.partname, i.stock
FROM inventory i INNER JOIN parts p ON i.partcode = p.partcode
INNER JOIN product pr on i.partcode = pr.p_code and i.genericname = pr.product_name
You may have to clean up the aliases between i and p in the 2nd query. I can't tell which table (inventory or parts) the variant_name and in_stock fields are coming from so I just used i.
Again - this assumes that partcode/genericname combination is unique in the PRODUCT table.

How to sort table alphabetically by name initial?

I have a table contains columns 'employeename' and 'id', how can I sort the 'employeename' column following alphabetical order of the names initial?
Say the table is like this now:
employeename rid eid
Dave 1 1
Ben 4 2
Chloe 6 6
I tried the command ORDER BY, it shows what I want but when I query the data again by SELECT, the showed table data is the same as original, indicting ORDER BY does not modify the data, is this correct?
SELECT *
FROM employee
ORDER BY employeename ASC;
I expect the table data to be modified (sorted by names alphabetical order) like this:
employeename rid eid
Ben 4 2
Chloe 6 6
Dave 1 1
the showed table data is the same as original, indicting ORDER BY does not modify the data, is this correct?
Yes, this is correct. A SELECT statement does not change the data in a table. Only UPDATE, DELETE, INSERT or TRUNCATE statements will change the data.
However, your question shows a misconception on how a relational database works.
Rows in a table (of a relational database) are not sorted in any way. You can picture them as balls in a basket.
If you want to display data in a specific sort order, the only (really: the only) way to do that is to use an ORDER BY in your SELECT statement. There is no alternative to that.
Postgres allows to define a VIEW that includes an ORDER BY which might be an acceptable workaround for you:
CREATE VIEW sorted_employee;
AS
SELECT *
FROM employee
ORDER BY employeename ASC;
Then you can simply use
select *
from sorted_employees;
But be aware of the drawbacks. If you run select * from sorted_employees order by id then the data will be sorted twice. Postgres is not smart enough to remove the (useless) order by from the view's definition.
Some related questions:
Default row order in SELECT query - SQL Server 2008 vs SQL 2012
What is the default SQL result sort order with 'select *'?
Is PostgreSQL order fully guaranteed if sorting on a non-unique attribute?
Why do results from a SQL query not come back in the order I expect?

Psql pivot table with related data

I'm trying to create a pivot table from related data. I am planning on using this data to better understand which tags are correlated to high frequency users.
I have a users table, a tag_users and a tags table.
EDIT
The non-pivoted query is
SELECT users.id, tags.title
FROM users
INNER JOIN tag_users ON tag_users.user_id = users.id
INNER JOIN tags ON user_tags.tag_id = tags.id
I would like to create an output with the following format:
user_id | tag_1_name | tag_2_name | ...tag_{n}_name
1 1 0 1
I have ~ 196 tags and ~ 40,000 users and downloading tags a a comma separated field and then doing a lookup on excel breaks my computer.
I am aware of crosstab but I can't find any examples where it uses related data.
What do you suggest as a suitable solution?

fetch data from and to date to get all matching results

Hello everyone I have to get data from and to date, I tried using between clause which fails to retrieve data what I need. Here is what I need.
I have table called hall_info which has following structure
hall_info
id | hall_name |address |contact_no
1 | abc | India |XXXX-XXXX-XX
2 | xyz | India |XXXX-XXXX-XX
Now I have one more table which is events, that contains data about when and which hall is booked on what date, the structure is as follows.
id |hall_info_id |event_date(booked_date)| event_name
1 | 2 | 2015-10-25 | Marriage
2 | 1 | 2015-10-28 | Marriage
3 | 2 | 2015-10-26 | Marriage
So what I need now is I wanna show hall_names that are not booked on selected dates, suppose if user chooses from 2015-10-23 to 2015-10-30 so I wanna list all halls that are not booked on selected dates. In above case both the halls of hall_info_id 1 and 2 ids booked in given range but still I wanna show them because they are free on 23,24,27 and on 29 date.
In second case suppose if user chooses date from 2015-10-25 and 2015-10-26 then only hall_info_id 2 is booked on both the dates 25 and 26 so in this case i wanna show only hall_info_id 1 as hall_info_id 2 is booked.
I tried using inner query and between clause but I am not getting required result to simply i have given only selected fields I have more tables to join so i cant paste my query please help with this. Thanks in advance for all who are trying.
Some changes in Yasen Zhelev's code:
SELECT * FROM hall_info
WHERE id not IN (
SELECT hall_info_id FROM events
WHERE event_date >= '2015-10-23' AND event_date <= '2015-10-30'
GROUP BY hall_info_id
HAVING COUNT(DISTINCT event_date) > DATE_PART('day', '2015-10-30'::timestamp - '2015-10-23'::timestamp))
I have not tried it but how about checking if the number of bookings per hall is less than the actual days in the selected period.
SELECT * FROM hall_info WHERE id NOT IN
(SELECT hall_info_id FROM events
WHERE event_date >= '2015-10-23' AND event_date <= '2015-10-30'
GROUP BY hall_info_id
HAVING COUNT(id) < DATEDIFF(day, '2015-10-30', '2015-10-23')
);
That will only work if you have one booking per day per hall.
To get the "available dates" for the hall returned, your query needs a row source of all possible dates. For example, if you had a calendar table populated with possible date values, e.g.
CREATE TABLE cal (dt DATE NOT NULL PRIMARY KEY) Engine=InnoDB
;
INSERT INTO cal (dt) VALUES ('2015-10-23')
,('2015-10-24'),('2015-10-25'),('2015-10-26'),('2015-10-27')
,('2015-10-28'),('2015-10-29'),('2015-10-30'),('2015-10-31')
;
The you could use a query that performs a cross join between the calendar table and hall_info... to get every hall on every date... and an anti-join pattern to eliminate rows that are already booked.
The anti-join pattern is an outer join with a restriction in the WHERE clause to eliminate matching rows.
For example:
SELECT cal.dt, h.id, h.hall_name, h.address
FROM cal cal
CROSS
JOIN hall_info h
LEFT
JOIN events e
ON e.hall_id = h.id
AND e.event_date = cal.dt
WHERE e.id IS NULL
AND cal.dt >= '2015-10-23'
AND cal.dt <= '2015-10-30'
The cross join between cal and hall_info gets all halls for all dates (restricted in the WHERE clause to a specified range of dates.)
The outer join to events find matching rows in the events table (matching on hall_id and event_date. The trick is the predicate (condition) in the WHERE clause e.id IS NULL. That throws out any rows that had a match, leaving only rows that don't have a match.
This type of problem is similar to other "sparse data" problems. e.g. How do you return a zero total for sales by a given store on a given date, when there are no rows with that store and date...
In your case, the query needs a source of rows with available date values. That doesn't necessarily have to be a table named calendar. (Other databases give us the ability to dynamically generate a row source; someday, MySQL may have similar features.)
If you want the row source to be dynamic in MySQL, then one approach would be to create a temporary table, and populate it with the dates, run the query referencing the temporary table, and then dropping the temporary table.
Another approach is to use an inline view to return the rows...
SELECT cal.dt, h.id, h.hall_name, h.address
FROM (
SELECT '2015-10-23'+INTERVAL 0 DAY AS dt
UNION ALL SELECT '2015-10-24'
UNION ALL SELECT '2015-10-25'
UNION ALL SELECT '2015-10-26'
UNION ALL SELECT '2015-10-27'
UNION ALL SELECT '2015-10-28'
UNION ALL SELECT '2015-10-29'
UNION ALL SELECT '2015-10-30'
) cal
CROSS
JOIN hall_info h
LEFT
JOIN events e
ON e.hall_id = h.id
AND e.event_date = c.dt
WHERE e.id IS NULL
FOLLOWUP: When this question was originally posted, it was tagged with mysql. The SQL in the examples above is for MySQL.
In terms of writing a query to return the specified results, the general issue is still the same in PostgreSQL. The general problem is "sparse data".
The SQL query needs a row source for the "missing" date values, but the specification doesn't provide any source for those date values.
The answer above discusses several possible row sources in MySQL: 1) a table, 2) a temporary table, 3) an inline view.
The answer also mentions that some databases (not MySQL) provide other mechanisms that can be used as a row source.
For example, PostgreSQL provides a nifty generate_series function (Reference: http://www.postgresql.org/docs/9.1/static/functions-srf.html.
It should be possible to use the generate_series function as a row source, to supply a set of rows containing the date values needed by the query to produced the specified result.
This answer demonstrates the approach to solving the "sparse data" problem.
If the specification is to return just the list of halls, and not the dates they are available, the queries above can be easily modified to remove the date expression from the SELECT list, and add a GROUP BY clause to collapse the rows into a distinct list of halls.