Reshaping data in a postgres query? - postgresql

I have two tables in CartoDB, one of community district polygons, and one of sites that are in those community districts.
I know the district (borocd) of each site, so I can get a list of counts of sites of each type with:
SELECT borocd, type, count(*) FROM sites GROUP BY borocd, type
But I'm having a hard time wrapping my head around how I'd update my "districts" table with columns for count of type1 and count of type2 in a single query. I wound up doing this:
UPDATE districts
SET type1_sites = (
SELECT count(*) FROM sites
WHERE type='type1' AND districts.borocd = sites.borocd
GROUP BY borocd
)
And repeating that for type 2. But could I have done that more cleanly?

UPDATE districts
SET type1_sites = (
SELECT count(*) FROM sites WHERE type='type1' AND districts.borocd = sites.borocd
),
type2_sites = (
SELECT count(*) FROM sites WHERE type='type2' AND districts.borocd = sites.borocd
);
Assuming you have separate columns for type1 and type2.
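For anyone who wants to try the single UPDATE without a CartoDB account, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows and the in-memory database are assumptions made up for illustration; only the table and column names come from the question.

```python
import sqlite3

# In-memory database with made-up sample data (assumption, not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE districts (borocd INTEGER PRIMARY KEY, type1_sites INTEGER, type2_sites INTEGER);
CREATE TABLE sites (id INTEGER PRIMARY KEY, borocd INTEGER, type TEXT);
INSERT INTO districts (borocd) VALUES (101), (102), (103);
INSERT INTO sites (borocd, type) VALUES
  (101, 'type1'), (101, 'type1'), (101, 'type2'),
  (102, 'type2');
""")

# One UPDATE sets both columns; each correlated subquery counts one site type.
conn.execute("""
UPDATE districts
SET type1_sites = (SELECT count(*) FROM sites
                   WHERE type = 'type1' AND sites.borocd = districts.borocd),
    type2_sites = (SELECT count(*) FROM sites
                   WHERE type = 'type2' AND sites.borocd = districts.borocd)
""")

rows = conn.execute(
    "SELECT borocd, type1_sites, type2_sites FROM districts ORDER BY borocd"
).fetchall()
print(rows)
```

Because the subqueries have no GROUP BY, a district with no matching sites gets a count of 0 rather than NULL.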

Related

Use postgresql query results to form another query

I am trying to select from one table using the select result from another table. I can run this in two queries but would like to optimize it into just one.
First query.. Select ids where matching other id
select id from lookuptable where paid = '547'
This results in something like this
6316352
6316353
6318409
6318410
6320468
6320469
6320470
6322526
6322527
6324586
6324587
6326648
I would like to then use this result to make another selection. I can do it manually like below. Note that there could be many rows with these values, so I've been using an IN clause:
select * from "othertable" where id in (6316352,6316353,6318409,6318410,6320468,6320469,6320470,6322526,6322527,6324586,6324587,6326648);
select
ot.*
from
"othertable" as ot
join
lookuptable as lt
on
ot.id = lt.id
where
lt.paid = '547'
The IN operator supports not just value lists but also subqueries, so you can literally write
select * from "othertable" where id in (select id from lookuptable where paid = '547');
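To compare the two answers side by side, here is a small sketch using Python's sqlite3 module. The payload column and the sample rows are invented for illustration; the table names and the paid = '547' filter come from the question.

```python
import sqlite3

# Made-up sample data (assumption): two lookup ids with paid = '547', one without.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lookuptable (id INTEGER PRIMARY KEY, paid TEXT);
CREATE TABLE othertable (id INTEGER, payload TEXT);
INSERT INTO lookuptable VALUES (6316352, '547'), (6316353, '547'), (6318409, '999');
INSERT INTO othertable VALUES
  (6316352, 'a'), (6316352, 'b'), (6316353, 'c'), (6318409, 'd');
""")

# Variant 1: IN with a subquery.
in_rows = conn.execute("""
SELECT * FROM othertable
WHERE id IN (SELECT id FROM lookuptable WHERE paid = '547')
ORDER BY id, payload
""").fetchall()

# Variant 2: JOIN against the lookup table.
join_rows = conn.execute("""
SELECT ot.* FROM othertable AS ot
JOIN lookuptable AS lt ON ot.id = lt.id
WHERE lt.paid = '547'
ORDER BY ot.id, ot.payload
""").fetchall()

print(in_rows)
print(join_rows)
```

The two results match here because lookuptable.id is unique in the sample. If the lookup table could contain the same id more than once, the JOIN would repeat rows while IN would not.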

Postgresql query with double has_many relationships

I have this complex data relationship.
POSTGRESQL FIDDLE: https://www.db-fiddle.com/f/vm2z8qLuddzcHEgyaMnCbc/3
"Item Group" has many "items" through "item_ads" table.
So an Item Group has many part_numbers.
reports table contains the number of clicks for each day for each adgroupid.
Each adgroupid has_many part_numbers. (table: product_ads)
Now, I want to SUM all reports.clicks for each item_groups.id, using the part_number to link the tables.
After this, I have to consider only reports.adgroupid which are included in the part_numbers of item_group. So if "Item group" has three part_number (A, B, C) can be considered all adgroupid that contains A,B, or C but nothing more. If adgroupid contains part_number D it cannot be considered for clicks sum.
Expected results
I have to have a table with lots of item_group_ids.
I am looking for the PostgreSQL query to achieve this table.
First, let's build the query up in parts. It sounds like you already know how to get from item_group and adgroup to part_number, just not how to join them. For part 1 of your question, I've added a query that removes duplicates by putting them into a CTE:
WITH unique_part_numbers AS (
SELECT DISTINCT item_groups.id AS item_group_id,
part_number
FROM item_groups
JOIN item_ads ON item_group_id = item_groups.id
JOIN items ON items.id = item_ads.item_id
)
SELECT unique_part_numbers.item_group_id, SUM(clicks)
FROM unique_part_numbers
JOIN product_ads ON product_ads.part_number = unique_part_numbers.part_number
JOIN reports ON product_ads.adgroupid = reports.adgroupid
GROUP BY item_group_id
About the second part: it's not possible to do it exactly as you want, because you can have multiple adgroups per item_group, so I added adgroupid as an extra column. I build an array of part_numbers for the adgroup and check, using the @> (array contains) operator, that all parts belonging to the adgroupid also belong to the unique_part_numbers.item_group_id.
WITH unique_part_numbers AS (
SELECT DISTINCT item_groups.id AS item_group_id,
part_number
FROM item_groups
JOIN item_ads ON item_group_id = item_groups.id
JOIN items ON items.id = item_ads.item_id
)
SELECT unique_part_numbers.item_group_id,
product_ads.adgroupid,
array_agg(unique_part_numbers.part_number),
SUM(clicks)
FROM unique_part_numbers
JOIN product_ads ON product_ads.part_number = unique_part_numbers.part_number
JOIN reports ON product_ads.adgroupid = reports.adgroupid
GROUP BY item_group_id, product_ads.adgroupid
HAVING array_agg(product_ads.part_number) @> (
SELECT ARRAY_AGG(other_product_ads.part_number)
FROM product_ads AS other_product_ads
WHERE other_product_ads.adgroupid = product_ads.adgroupid
)
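The containment rule from the question (an ad group counts only if every one of its part numbers belongs to the item group) can be sketched in plain Python with sets. This is not a translation of the query above, just an illustration of the subset test; all rows below are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows standing in for the tables in the question.
group_parts = {1: {"A", "B", "C"}}                           # item_group_id -> part_numbers
product_ads = [(10, "A"), (10, "B"), (20, "A"), (20, "D")]   # (adgroupid, part_number)
reports = [(10, 5), (10, 7), (20, 100)]                      # (adgroupid, clicks)

# Collect the set of part numbers advertised by each ad group.
adgroup_parts = defaultdict(set)
for adgroupid, part_number in product_ads:
    adgroup_parts[adgroupid].add(part_number)

# Total clicks per ad group, each report row counted once.
clicks_per_adgroup = defaultdict(int)
for adgroupid, clicks in reports:
    clicks_per_adgroup[adgroupid] += clicks

result = {}
for item_group_id, parts in group_parts.items():
    for adgroupid, ad_parts in adgroup_parts.items():
        # Subset test: keep the ad group only if all of its parts are in the item group.
        if ad_parts <= parts:
            result[(item_group_id, adgroupid)] = clicks_per_adgroup[adgroupid]

print(result)
```

Ad group 20 is dropped because it also advertises part D. Note that this sketch counts each report row once per ad group; it is worth checking on real data whether a join-based query repeats report rows when several parts match.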

how to filter data array_agg from postgresql

I have a view in a Postgres database built from this query:
SELECT order_product.order_id,
array_agg(order_product.product_id) AS itemset
FROM order_product
GROUP BY order_product.order_id
ORDER BY order_product.order_id;
and this is what the structure looks like:
The question is: how can I filter the data in itemset so that only rows with more than one value are shown? (Example: don't show {8}; only show rows containing two or more values, like {8,10}.)
Use the HAVING clause:
SELECT op.order_id,
array_agg(op.product_id) AS itemset
FROM order_product op
GROUP BY op.order_id
HAVING count(*) > 1 --<< here
ORDER BY op.order_id;
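A quick way to see the HAVING filter in action without a Postgres server is Python's sqlite3 module. SQLite has no array_agg, so this sketch selects count(*) instead of the array; the filter itself is the same. The sample rows are made up.

```python
import sqlite3

# Made-up sample data (assumption): order 1 has one product, orders 2 and 3 have several.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_product (order_id INTEGER, product_id INTEGER);
INSERT INTO order_product VALUES (1, 8), (2, 8), (2, 10), (3, 5), (3, 6), (3, 7);
""")

# HAVING is evaluated after grouping, so it can filter on the aggregate.
rows = conn.execute("""
SELECT op.order_id, count(*) AS n_items
FROM order_product AS op
GROUP BY op.order_id
HAVING count(*) > 1
ORDER BY op.order_id
""").fetchall()

print(rows)
```

Order 1 disappears from the result because its group contains a single row.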

SQL Select all nodes in a three level tree that match a criterion

I have a PostgreSQL model (generated in the context of Django) that looks something like this:
CREATE TABLE org (
id INTEGER NOT NULL,
parent_id INTEGER,
name CHARACTER VARYING(24),
org_type CHARACTER VARYING(8),
country CHARACTER VARYING(2)
);
CREATE TABLE rate (
id INTEGER NOT NULL,
org_id INTEGER NOT NULL,
rate DOUBLE PRECISION NOT NULL,
currency CHARACTER VARYING(3)
);
where org_type is one of "group", "company" and "branch". Every branch has a company, and only companies belong to a group. Given an arbitrary company or branch, and a country, I need to find all of the rates whose org_id is that of a company and branch that belongs to the same group and that are in the specified country. So, in the following diagram, for either company 123 (in Canada) or branch 124 (in Toronto), a search for rates with country = "US" would find rates belonging to the companies or branches in the box labeled "Selected":
I'm trying something like the following for companies, where $1 is a country code and $2 is an org ID:
SELECT rate.org_id, rate.rate, rate.currency
FROM rate, org
WHERE (
org.country = $1 AND
rate.org_id=org.id AND
org.parent_id = $2
) OR (
...
and then I'm stuck, trying to figure out how to ask for the branches that belong to one of the companies that I just found. I'd really prefer one big WHERE clause that brings in all of the rates by any relevant organization, so that I don't have to hammer the DB with a whole bunch of queries.
Edit
Based on lau's answer, I've tried an example (SQL fiddle), but it's only returning rates for the organization that I'm starting with.
You can:
1. Apply the criteria on org just like you did
2. Browse all the way up/down using your initial org(s) as the starting point
3. Combine the 2 sets with a UNION
4. JOIN with rates (below I have done a LEFT OUTER JOIN to make it clear what exactly is included in the recursion).
Example:
WITH RECURSIVE SelectedOrg AS (
SELECT * FROM org WHERE id = 4
),
BrowseOrg AS (
SELECT 1 AS Direction, * FROM SelectedOrg
UNION ALL
SELECT -1, * FROM SelectedOrg
UNION ALL
SELECT Direction, org.* FROM org JOIN BrowseOrg ON (direction = 1 and org.parent_id = BrowseOrg.id) OR (direction = -1 and org.id = BrowseOrg.parent_id)
)
SELECT DISTINCT rate.id, BrowseOrg.id, rate.rate FROM BrowseOrg LEFT OUTER JOIN rate ON org_id = BrowseOrg.id
This will be flexible enough to handle cases where you do not know what level of org you have selected (group, company or branch).
In the future, this also should be able to cope with a deeper hierarchy (if you ever add levels to it).
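A different way to cover the whole group, sketched here and not taken from the answer above, is to walk up from the starting org to its root and then walk back down from that root. The example runs on SQLite through Python's sqlite3 module; the org and rate rows, the CTE names (ancestors, members) and the helper rates_in_group are all hypothetical.

```python
import sqlite3

# Made-up hierarchy (assumption): group 1 owns a Canadian and a US company,
# each with one branch; group 6 is an unrelated group with a US company.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE org (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT, org_type TEXT, country TEXT);
CREATE TABLE rate (id INTEGER PRIMARY KEY, org_id INTEGER NOT NULL, rate REAL NOT NULL, currency TEXT);
INSERT INTO org VALUES
  (1, NULL, 'G',        'group',   NULL),
  (2, 1,    'C-CA',     'company', 'CA'),
  (3, 2,    'B-CA',     'branch',  'CA'),
  (4, 1,    'C-US',     'company', 'US'),
  (5, 4,    'B-US',     'branch',  'US'),
  (6, NULL, 'G2',       'group',   NULL),
  (7, 6,    'Other-US', 'company', 'US');
INSERT INTO rate VALUES
  (10, 2, 1.0, 'CAD'), (11, 4, 2.0, 'USD'), (12, 5, 3.0, 'USD'), (13, 7, 4.0, 'USD');
""")

QUERY = """
WITH RECURSIVE ancestors AS (
    -- Start at the given org and climb to the top of its tree.
    SELECT id, parent_id FROM org WHERE id = :org_id
    UNION ALL
    SELECT org.id, org.parent_id FROM org JOIN ancestors ON org.id = ancestors.parent_id
),
members AS (
    -- Start at the root found above and descend to every org below it.
    SELECT id FROM ancestors WHERE parent_id IS NULL
    UNION ALL
    SELECT org.id FROM org JOIN members ON org.parent_id = members.id
)
SELECT rate.org_id, rate.rate, rate.currency
FROM members
JOIN org ON org.id = members.id
JOIN rate ON rate.org_id = org.id
WHERE org.country = :country
ORDER BY rate.org_id
"""

def rates_in_group(org_id, country):
    """Return rates for all orgs in the same group as org_id that are in country."""
    return conn.execute(QUERY, {"org_id": org_id, "country": country}).fetchall()

print(rates_in_group(3, "US"))
```

Starting from the Canadian branch (3) or its company (2) gives the same US rates, and the unrelated group's company (7) is never reached. The sketch assumes every tree has exactly one root with a NULL parent_id.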
You can do it in two queries using a subquery.
Let company_org_ids = SELECT id FROM org WHERE country = $1 AND parent_id = $2
SELECT DISTINCT rate.* FROM rate WHERE org_id IN ($3) OR org_id IN (SELECT id FROM org WHERE parent_id IN ($3) AND country = $1)
Here $3 is company_org_ids, the result of the first query.
This can also be done in a single query by replacing the $3 variable with the first query.

Select most reviewed courses starting from courses having at least 2 reviews

I'm using Flask-SQLAlchemy with PostgreSQL. I have the following two models:
class Course(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    course_name = db.Column(db.String(120))
    course_description = db.Column(db.Text)
    course_reviews = db.relationship('Review', backref='course', lazy='dynamic')

class Review(db.Model):
    __table_args__ = (db.UniqueConstraint('course_id', 'user_id'), {})
    id = db.Column(db.Integer, primary_key=True)
    review_date = db.Column(db.DateTime)  # default=db.func.now()
    review_comment = db.Column(db.Text)
    rating = db.Column(db.SmallInteger)
    course_id = db.Column(db.Integer, db.ForeignKey('course.id'))
    user_id = db.Column(db.Integer, db.ForeignKey('user.id'))
I want to select the most reviewed courses, starting with those that have at least two reviews. The following SQLAlchemy query worked fine with SQLite:
most_rated_courses = db.session.query(models.Review, func.count(models.Review.course_id)).group_by(models.Review.course_id).\
having(func.count(models.Review.course_id) > 1).\
order_by(func.count(models.Review.course_id).desc()).all()
But when I switched to PostgreSQL in production it gives me the following error:
ProgrammingError: (ProgrammingError) column "review.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT review.id AS review_id, review.review_date AS review_...
^
'SELECT review.id AS review_id, review.review_date AS review_review_date, review.review_comment AS review_review_comment, review.rating AS review_rating, review.course_id AS review_course_id, review.user_id AS review_user_id, count(review.course_id) AS count_1 \nFROM review GROUP BY review.course_id \nHAVING count(review.course_id) > %(count_2)s ORDER BY count(review.course_id) DESC' {'count_2': 1}
I tried to fix the query by adding models.Review in the GROUP BY clause but it did not work:
most_rated_courses = db.session.query(models.Review, func.count(models.Review.course_id)).group_by(models.Review.course_id).\
having(func.count(models.Review.course_id) > 1).\
order_by(func.count(models.Review.course_id).desc()).all()
Can anyone please help me with this issue? Thanks a lot.
SQLite and MySQL both have the behavior that they allow a query that has aggregates (like count()) without applying GROUP BY to all other columns - which in terms of standard SQL is invalid, because if more than one row is present in that aggregated group, it has to pick the first one it sees for return, which is essentially random.
So your query for Review basically returns to you the first "Review" row for each distinct course id - like for course id 3, if you had seven "Review" rows, it's just choosing an essentially random "Review" row within the group of "course_id=3". I gather the answer you really want, "Course", is available here because you can take that semi-randomly selected Review object and just call ".course" on it, giving you the correct Course, but this is a backwards way to go.
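The permissive behavior described above is easy to reproduce with Python's built-in sqlite3 module. The review rows below are made up, and the table is trimmed to the columns that matter here.

```python
import sqlite3

# Made-up sample data (assumption): three reviews for course 3, one for course 4.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE review (id INTEGER PRIMARY KEY, course_id INTEGER, user_id INTEGER);
INSERT INTO review (course_id, user_id) VALUES (3, 1), (3, 2), (3, 3), (4, 1);
""")

# review.id is neither grouped nor aggregated; SQLite accepts this anyway
# and returns the id of some row from each group.
loose = conn.execute(
    "SELECT id, course_id, count(*) FROM review GROUP BY course_id ORDER BY course_id"
).fetchall()

# Selecting only grouped or aggregated columns is valid on PostgreSQL as well.
strict = conn.execute(
    "SELECT course_id, count(*) FROM review GROUP BY course_id ORDER BY course_id"
).fetchall()

print(loose)
print(strict)
```

Which review id shows up for course 3 in the first result is not something to rely on, which is exactly why PostgreSQL refuses the query.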
But once you get on a proper database like PostgreSQL you need to use correct SQL. The data you need from the "review" table is just the course_id and the count, nothing else, so query just for that (for now, assume we don't need to display the counts; that comes in a minute):
most_rated_course_ids = session.query(
Review.course_id,
).\
group_by(Review.course_id).\
having(func.count(Review.course_id) > 1).\
order_by(func.count(Review.course_id).desc()).\
all()
but that's not your Course object - you want to take that list of ids and apply it to the course table. We first need to keep our list of course ids as a SQL construct, instead of loading the data - that is, turn it into a derived table by converting the query into a subquery (change the word .all() to .subquery()):
most_rated_course_id_subquery = session.query(
Review.course_id,
).\
group_by(Review.course_id).\
having(func.count(Review.course_id) > 1).\
order_by(func.count(Review.course_id).desc()).\
subquery()
one simple way to link that to Course is to use an IN:
courses = session.query(Course).filter(
Course.id.in_(most_rated_course_id_subquery)).all()
but that's essentially going to throw away the "ORDER BY" you're looking for and also doesn't give us any nice way of actually reporting on those counts along with the course results. We need to have that count along with our Course so that we can report it and also order by it. For this we use a JOIN from the "course" table to our derived table. SQLAlchemy is smart enough to know to join on the "course_id" foreign key if we just call join():
courses = session.query(Course).join(most_rated_course_id_subquery).all()
then to get at the count, we need to add that to the columns returned by our subquery along with a label so we can refer to it:
most_rated_course_id_subquery = session.query(
Review.course_id,
func.count(Review.course_id).label("count")
).\
group_by(Review.course_id).\
having(func.count(Review.course_id) > 1).\
subquery()
courses = session.query(
Course, most_rated_course_id_subquery.c.count
).join(
most_rated_course_id_subquery
).order_by(
most_rated_course_id_subquery.c.count.desc()
).all()
A great article I like to point out to people about GROUP BY and this kind of query is SQL GROUP BY techniques which points out the common need for the "select from A join to (subquery of B with aggregate/GROUP BY)" pattern.
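For reference, the "select from A join to (subquery of B with aggregate/GROUP BY)" pattern looks roughly like this in plain SQL. This is a hand-written sketch run through Python's sqlite3 module, not the SQL that SQLAlchemy generates; the counts alias, the review_count label and the sample rows are assumptions.

```python
import sqlite3

# Made-up sample data (assumption): Chemistry has 3 reviews, Algebra 2, Biology 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course (id INTEGER PRIMARY KEY, course_name TEXT);
CREATE TABLE review (id INTEGER PRIMARY KEY,
                     course_id INTEGER REFERENCES course(id),
                     user_id INTEGER);
INSERT INTO course VALUES (1, 'Algebra'), (2, 'Biology'), (3, 'Chemistry');
INSERT INTO review (course_id, user_id) VALUES
  (1, 1), (1, 2),
  (2, 1),
  (3, 1), (3, 2), (3, 3);
""")

# The derived table does the GROUP BY / HAVING on review alone;
# the outer query joins it to course and orders by the count.
rows = conn.execute("""
SELECT course.id, course.course_name, counts.review_count
FROM course
JOIN (
    SELECT course_id, count(course_id) AS review_count
    FROM review
    GROUP BY course_id
    HAVING count(course_id) > 1
) AS counts ON counts.course_id = course.id
ORDER BY counts.review_count DESC
""").fetchall()

print(rows)
```

Every selected column is either grouped or aggregated inside the derived table, so the same statement is also valid on PostgreSQL.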