How can I compare 2 tables in PostgreSQL?

I have a table named hotel with 2 columns, hotel_name and hotel_price:
hotel_name | hotel_price
-----------|------------
hotel1     | 5
hotel2     | 20
hotel3     | 100
hotel4     | 50
and another table named city with the columns city_name and average_prices:
city_name | average_prices
----------|---------------
paris     | 20
london    | 30
rome      | 75
madrid    | 100
I want to find which hotels have a price that is more expensive than the average prices in the cities. For example, I want to end up with something like this:
hotel_name | city_name
-----------|----------
hotel3     | paris    -- hotel3 is more expensive than the average price in paris
hotel3     | london   -- hotel3 is more expensive than the average price in london, etc.
hotel3     | rome
hotel4     | paris
hotel4     | london
(I found the hotels that are more expensive than the average prices of the cities.)
Any help would be valuable, thank you.

A simple join is all that is needed. Typically tables are joined on a defined relationship (PK/FK), but there is nothing requiring that. See fiddle.
select h.hotel_name, c.city_name
from hotel h
join city c
  on h.hotel_price > c.average_prices;
However, while you can get the desired results, it's pretty meaningless. You cannot tell whether a particular hotel is even in a given city.
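If the hotel table also carried each hotel's city (say, a hypothetical city_name column, which the posted schema does not have), the join becomes meaningful; a sketch under that assumption:
select h.hotel_name, c.city_name
from hotel h
join city c
  on c.city_name = h.city_name            -- hypothetical link column
where h.hotel_price > c.average_prices;   -- only hotels above their own city's average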

Related

Calculate Average of Price per Item per Month over a Few Years in PostgreSQL

I have this table inside my PostgreSQL database:
item_code | date | price
==============================
aaaaaa.1 |2019/12/08 | 3.04
bbbbbb.2 |2019/12/08 | 19.48
261893.c |2019/12/08 | 7.15
aaaaaa.1 |2019/12/17 | 4.15
bbbbbb.2 |2019/12/17 | 20
xxxxxx.5 |2019/03/12 | 3
xxxxxx.5 |2019/03/18 | 4.5
How can I calculate the average per item, per month, over the year, so that I get a result like this:
item_code | month | price
==============================
aaaaaa.1 | 2019/12 | 3.59
bbbbbb.2 | 2019/12 | 19.74
261893.c | 2019/12 | 7.15
xxxxxx.5 | 2019/03 | 3.75
I have tried to look at and apply many alternatives, but I am still not getting it. I would really appreciate your help because I am new to PostgreSQL.
I don't see how the question relates to a moving average. It seems you just want group by:
select item_code, date_trunc('month', date) as date_month, avg(price) as price
from mytable
group by item_code, date_month
This gives date_month as a timestamp truncated to the first day of the month, which I find more useful than the format you suggested. But if you do want that format:
to_char(date, 'YYYY/MM') as date_month
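For completeness, a full version with the formatted month (same mytable as above; round() assumes price is numeric, and the sample output looks truncated rather than rounded, so the last digit may differ):
select item_code
     , to_char(date, 'YYYY/MM') as date_month
     , round(avg(price), 2) as price
from mytable
group by item_code, date_month
order by item_code, date_month;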

Cognos force 0 on group by

I've got a requirement to build a list report to show volume by 3 grouped-by columns. The issue I'm having is that if nothing happened on specific days for the specific grouped columns, I can't force it to show 0.
What I'm currently getting is something like:
ABC | AA | 01/11/2017 | 1
ABC | AA | 03/11/2017 | 2
ABC | AA | 05/11/2017 | 1
What I need is:
ABC | AA | 01/11/2017 | 1
ABC | AA | 02/11/2017 | 0
ABC | AA | 03/11/2017 | 2
ABC | AA | 04/11/2017 | 0
ABC | AA | 05/11/2017 | 1
I've tried going down the route of unioning a "dummy" query with no query filters; however, there are days where nothing has happened at all for those first 2 columns, so it doesn't always populate.
Hope that makes sense; any help would be greatly appreciated!
To anyone who wanted an answer: I figured it out. Query 1 is for just the dates; there will always be some form of event happening daily, so it will always give the complete date range.
Query 2 is for the other 2 grouped-by columns.
Create a data item in each with 1 as the result (it would work with anything, as long as they are the same).
Left join Query 1 to Query 2 on this new data item.
This gives the full combination of all 3 columns needed. The resulting "Query 3" can then be left joined again to get the measures. The final query (depending on aggregation) may need the measure data item wrapped in a COALESCE/ISNULL to produce a 0 on days when nothing happened.
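Cognos generates SQL behind the scenes; for reference, the same date-spine idea in plain SQL (PostgreSQL syntax, with an illustrative events table whose names are made up):
WITH all_dates AS (             -- Query 1: every date in the range
   SELECT d::date AS event_date
   FROM   generate_series(date '2017-11-01', date '2017-11-05', interval '1 day') AS g(d)
), all_groups AS (              -- Query 2: the 2 grouped-by columns
   SELECT DISTINCT col1, col2
   FROM   events
)
SELECT g.col1, g.col2, d.event_date
     , count(e.event_id) AS volume   -- counts 0 on days with no events
FROM   all_groups g
CROSS  JOIN all_dates d              -- full combination of all 3 columns
LEFT   JOIN events e ON  e.col1 = g.col1
                     AND e.col2 = g.col2
                     AND e.event_date = d.event_date
GROUP  BY g.col1, g.col2, d.event_date
ORDER  BY g.col1, g.col2, d.event_date;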

How to find out the keywords in two Hadoop tables with Spark?

I have two tables in HDFS. One table (Table-1) has some keywords, as you can see below. The other table (Table-2) has a text column. Every row in Table-2 could contain more than one keyword from Table-1. I need to find all the keywords from Table-1 that match the text column in Table-2, and output the keyword list for every row in Table-2.
Example :
Table-1:
ID | Name | Age | City | Gender
---------------------------------
111 | Micheal | 19 | NY | male
222 | George | 23 | CA | male
333 | Linda | 22 | LA | female
Table-2:
Text_Description
------------------------------------------------------------------------
1-Linda and my cousin left the house.
2-Michael who is 19 year old, and George are going to rock concert in CA.
3-Shopping card is ready at the NY for male persons.
Output:
1- Linda
2- Micheal, 19, George, CA
3- NY, male
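There is no accepted answer shown here, but the matching can be sketched in Spark SQL: unpivot Table-1 into one keyword per row, then join to Table-2 on a substring match and collect the hits per row (table1 and table2 are assumed registered table names):
WITH keywords AS (
   -- unpivot every column of Table-1 into a single keyword list
   SELECT explode(array(Name, CAST(Age AS STRING), City, Gender)) AS kw
   FROM   table1
)
SELECT t.Text_Description
     , concat_ws(', ', collect_set(k.kw)) AS matched_keywords
FROM   table2 t
JOIN   keywords k
  ON   instr(t.Text_Description, k.kw) > 0   -- plain substring match
GROUP  BY t.Text_Description;
Note that a plain substring match is exact and case-sensitive, so it would not bridge the Micheal/Michael spelling difference in the example; that would need tokenization or fuzzy matching (e.g. levenshtein).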

Aggregate count and share by group

With a table t1 like below, I need to get the count for each make and the share of each make.
+--------+
| make |
+--------+
| toyota |
| audi |
| bmw |
| bmw |
| audi |
+--------+
With the query below I can get the car_cnt per make:
select make
     , count(*) as car_cnt
from t1
group by make;
How do I get the share (%) for each make?
Using COUNT as an analytic function, we can make a single pass over your table and compute the count and market share for each make.
select distinct
make,
count(*) over (partition by make) as car_cnt,
100.0 * count(*) over (partition by make) / count(*) over () as car_pct
from t1
Output:
make   | car_cnt | car_pct
-------|---------|--------
audi   |       2 |      40
bmw    |       2 |      40
toyota |       1 |      20
Demo here: Rextester
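An equivalent form with a plain GROUP BY, if you prefer to avoid DISTINCT (a sketch against the same t1; the window function runs over the grouped counts):
select make
     , count(*) as car_cnt
     , 100.0 * count(*) / sum(count(*)) over () as car_pct
from t1
group by make;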

How to get back aggregate values across 2 dimensions using Python Cubes?

Situation
Using Python 3, Django 1.9, Cubes 1.1, and Postgres 9.5.
These are my data tables, in text form:
Store table
------------------------------
| id | code | address |
|-----|------|---------------|
| 1 | S1 | Kings Row |
| 2 | S2 | Queens Street |
| 3 | S3 | Jacks Place |
| 4 | S4 | Diamonds Alley|
| 5 | S5 | Hearts Road |
------------------------------
Product table
------------------------------
| id | code | name |
|-----|------|---------------|
| 1 | P1 | Saucer 12 |
| 2 | P2 | Plate 15 |
| 3 | P3 | Saucer 13 |
| 4 | P4 | Saucer 14 |
| 5 | P5 | Plate 16 |
| and many more .... |
|1000 |P1000 | Bowl 25 |
|----------------------------|
Sales table
----------------------------------------
| id | product_id | store_id | amount |
|-----|------------|----------|--------|
| 1 | 1 | 1 |7.05 |
| 2 | 1 | 2 |9.00 |
| 3 | 2 | 3 |1.00 |
| 4 | 2 | 3 |1.00 |
| 5 | 2 | 5 |1.00 |
| and many more .... |
| 1000| 20 | 4 |1.00 |
|--------------------------------------|
The relationships are:
Sales belongs to Store
Sales belongs to Product
Store has many Sales
Product has many Sales
What I want to achieve
I want to use Cubes to display the data, paginated, in the following manner:
Given the stores S1-S3:
-------------------------
| product | S1 | S2 | S3 |
|---------|----|----|----|
|Saucer 12|7.05|9 | 0 |
|Plate 15 |0 |0 | 2 |
| and many more .... |
|------------------------|
Note the following:
Even though there were no records in sales for Saucer 12 under Store S3, I displayed 0 instead of null or none.
I want to be able to sort by store, say in descending order for S3.
The cells indicate the SUM total of that particular product spent in that particular store.
I also want to have pagination.
What I tried
This is the configuration I used:
"cubes": [
{
"name": "sales",
"dimensions": ["product", "store"],
"joins": [
{"master":"product_id", "detail":"product.id"},
{"master":"store_id", "detail":"store.id"}
]
}
],
"dimensions": [
{ "name": "product", "attributes": ["code", "name"] },
{ "name": "store", "attributes": ["code", "address"] }
]
This is the code I used:
result = browser.aggregate(
    drilldown=['Store', 'Product'],
    order=[("Product.name", "asc"), ("Store.name", "desc"), ("total_products_sale", "desc")])
I didn't get what I want.
I got it like this:
----------------------------------------------
| product_id | store_id | total_products_sale |
|------------|----------|---------------------|
| 1 | 1 | 7.05 |
| 1 | 2 | 9 |
| 2 | 3 | 2.00 |
| and many more .... |
|---------------------------------------------|
which is the whole table with no pagination, and products not sold in a given store do not show up as zero.
My question
How do I get what I want?
Do I need to create another data table that aggregates everything by store and product before I use cubes to run the query?
Update
I have read more. I realised that what I want is called dicing as I needed to go across 2 dimensions. See: https://en.wikipedia.org/wiki/OLAP_cube#Operations
Cross-posted at Cubes GitHub issues to get more attention.
This is a pure SQL solution using crosstab() from the additional tablefunc module to pivot the aggregated data. It typically performs better than any client-side alternative. If you are not familiar with crosstab(), read this first:
PostgreSQL Crosstab Query
And this about the "extra" column in the crosstab() output:
Pivot on Multiple Columns using Tablefunc
SELECT product_id, product
     , COALESCE(s1, 0) AS s1  -- 1. ... displayed 0 instead of null
     , COALESCE(s2, 0) AS s2
     , COALESCE(s3, 0) AS s3
     , COALESCE(s4, 0) AS s4
     , COALESCE(s5, 0) AS s5
FROM   crosstab(
   'SELECT s.product_id, p.name, s.store_id, s.sum_amount
    FROM   product p
    JOIN  (
       SELECT product_id, store_id
            , sum(amount) AS sum_amount  -- 3. SUM total of product spent in store
       FROM   sales
       GROUP  BY product_id, store_id
       ) s ON p.id = s.product_id
    ORDER  BY s.product_id, s.store_id'
 , 'VALUES (1),(2),(3),(4),(5)'  -- desired store_id's
   ) AS ct (product_id int, product text  -- "extra" column
          , s1 numeric, s2 numeric, s3 numeric, s4 numeric, s5 numeric)
ORDER  BY s3 DESC;  -- 2. ... descending order for S3
Produces your desired result exactly (plus product_id).
To include products that have never been sold replace [INNER] JOIN with LEFT [OUTER] JOIN.
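A sketch of that variant: only the source query changes, and it must select p.id from the product side so never-sold products keep their row (store_id and sum_amount come back NULL, which the outer COALESCE turns into 0):
SELECT p.id, p.name, s.store_id, s.sum_amount
FROM   product p
LEFT   JOIN (
   SELECT product_id, store_id
        , sum(amount) AS sum_amount
   FROM   sales
   GROUP  BY product_id, store_id
   ) s ON p.id = s.product_id
ORDER  BY p.id, s.store_id;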
SQL Fiddle with base query.
The tablefunc module is not installed on sqlfiddle.
Major points
Read the basic explanation in the reference answer for crosstab().
I am including product_id because product.name is hardly unique. This might otherwise lead to sneaky errors conflating two different products.
You don't need the store table in the query if referential integrity is guaranteed.
ORDER BY s3 DESC works, because s3 references the output column where NULL values have been replaced with COALESCE. Else we would need DESC NULLS LAST to sort NULL values last:
PostgreSQL sort by datetime asc, null first?
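A tiny self-contained illustration of the difference (NULL would otherwise sort first in descending order):
SELECT *
FROM  (VALUES (1), (NULL), (3)) v(s3)
ORDER  BY s3 DESC NULLS LAST;   -- yields 3, 1, NULL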
For building crosstab() queries dynamically consider:
Dynamic alternative to pivot with CASE and GROUP BY
I also want to have pagination.
That last item is fuzzy. Simple pagination can be had with LIMIT and OFFSET:
Displaying data in grid view page by page
I would consider a MATERIALIZED VIEW to materialize results before pagination. If you have a stable page size I would add page numbers to the MV for easy and fast results.
To optimize performance for big result sets, consider:
SQL syntax term for 'WHERE (col1, col2) < (val1, val2)'
Optimize query with OFFSET on large table
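For illustration, a minimal pagination sketch, assuming the crosstab result above was materialized into a hypothetical MV named mv_sales_pivot (page size 20, third page):
SELECT product_id, product, s1, s2, s3, s4, s5
FROM   mv_sales_pivot            -- hypothetical MV storing the pivoted result
ORDER  BY s3 DESC, product_id    -- a deterministic order is essential for stable pages
LIMIT  20                        -- page size
OFFSET 40;                       -- skip the first two pages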