Query join result appears to be incorrect

Query join result appears to be incorrect - postgresql

I have no idea what's going on here. Maybe I've been staring at this code for too long.
The query I have is as follows:
CREATE VIEW v_sku_best_before AS
SELECT
sw.sku_id,
sw.sku_warehouse_id "A",
sbb.sku_warehouse_id "B",
sbb.best_before,
sbb.quantity
FROM SKU_WAREHOUSE sw
LEFT OUTER JOIN SKU_BEST_BEFORE sbb
ON sbb.sku_warehouse_id = sw.warehouse_id
ORDER BY sbb.best_before
I can post the table definitions if that helps, but I'm not sure it will. Suffice to say that SKU_WAREHOUSE.sku_warehouse_id is an identity column, and SKU_BEST_BEFORE.sku_warehouse_id is a child that uses that identity as a foreign key.
Here's the result when I run the query:
+--------+-----+----+-------------+----------+
| sku_id | A | B | best_before | quantity |
+--------+-----+----+-------------+----------+
| 20251 | 643 | 11 | <<null>> | 140 |
+--------+-----+----+-------------+----------+
(1 row)
The join specifies that the sku_warehouse_id columns have to be equal, but when I pull the ID from each table (labelled as A and B) they're different.
What am I doing wrong?

Perhaps just sw.sku_warehouse_id instead of sw.warehouse_id?

Related

How to query parent child in PostgreSQL?

I have the following table structure :
place_id | parent_place_id | name
---------|-----------------|------------
1 | 2 | child
---------|-----------------|------------
2 | 3 | dad
---------|-----------------|------------
3 | | grandfather
......
I am trying to write a query so that my output data is as follows :
id_Grandfather | name_Grandfather | id_Dad | name_Dad | id_Child | name_child
----------------|------------------|--------|----------|----------|-----------
3 | grandfather | 2 | dad | 1 | child
I have tried many ways but not getting the expected result. Can anyone help me to solve it? Thank !

There is a way to do it with double join. But does it make any sense is totally different question.
SELECT
gf.place_id as id_Grandfather,
gf.name as name_Grandfather,
d.place_id as id_Dad,
d.name as name_Dad,
c.place_id as id_Child,
c.name as name_Child
FROM
your_table c
LEFT JOIN your_table d ON c.parent_place_id = d.place_id
LEFT JOIN your_table gf ON d.parent_lace_id = gf.place_id
-- Add this if you want to have only lines which has Dad and Grandfather fields populated
WHERE d.place_id IS NOT NULL
;

How to get back aggregate values across 2 dimensions using Python Cubes?

Situation
Using Python 3, Django 1.9, Cubes 1.1, and Postgres 9.5.
These are my datatables in pictorial form:
The same in text format:
Store table
------------------------------
| id | code | address |
|-----|------|---------------|
| 1 | S1 | Kings Row |
| 2 | S2 | Queens Street |
| 3 | S3 | Jacks Place |
| 4 | S4 | Diamonds Alley|
| 5 | S5 | Hearts Road |
------------------------------
Product table
------------------------------
| id | code | name |
|-----|------|---------------|
| 1 | P1 | Saucer 12 |
| 2 | P2 | Plate 15 |
| 3 | P3 | Saucer 13 |
| 4 | P4 | Saucer 14 |
| 5 | P5 | Plate 16 |
| and many more .... |
|1000 |P1000 | Bowl 25 |
|----------------------------|
Sales table
----------------------------------------
| id | product_id | store_id | amount |
|-----|------------|----------|--------|
| 1 | 1 | 1 |7.05 |
| 2 | 1 | 2 |9.00 |
| 3 | 2 | 3 |1.00 |
| 4 | 2 | 3 |1.00 |
| 5 | 2 | 5 |1.00 |
| and many more .... |
| 1000| 20 | 4 |1.00 |
|--------------------------------------|
The relationships are:
Sales belongs to Store
Sales belongs to Product
Store has many Sales
Product has many Sales
What I want to achieve
I want to use cubes to be able to do a display by pagination in the following manner:
Given the stores S1-S3:
-------------------------
| product | S1 | S2 | S3 |
|---------|----|----|----|
|Saucer 12|7.05|9 | 0 |
|Plate 15 |0 |0 | 2 |
| and many more .... |
|------------------------|
Note the following:
Even though there were no records in sales for Saucer 12 under Store S3, I displayed 0 instead of null or none.
I want to be able to do sort by store, say descending order for, S3.
The cells indicate the SUM total of that particular product spent in that particular store.
I also want to have pagination.
What I tried
This is the configuration I used:
"cubes": [
{
"name": "sales",
"dimensions": ["product", "store"],
"joins": [
{"master":"product_id", "detail":"product.id"},
{"master":"store_id", "detail":"store.id"}
]
}
],
"dimensions": [
{ "name": "product", "attributes": ["code", "name"] },
{ "name": "store", "attributes": ["code", "address"] }
]
This is the code I used:
result = browser.aggregate(drilldown=['Store','Product'],
order=[("Product.name","asc"), ("Store.name","desc"), ("total_products_sale", "desc")])
I didn't get what I want.
I got it like this:
----------------------------------------------
| product_id | store_id | total_products_sale |
|------------|----------|---------------------|
| 1 | 1 | 7.05 |
| 1 | 2 | 9 |
| 2 | 3 | 2.00 |
| and many more .... |
|---------------------------------------------|
which is the whole table with no pagination and if the products not sold in that store it won't show up as zero.
My question
How do I get what I want?
Do I need to create another data table that aggregates everything by store and product before I use cubes to run the query?
Update
I have read more. I realised that what I want is called dicing as I needed to go across 2 dimensions. See: https://en.wikipedia.org/wiki/OLAP_cube#Operations
Cross-posted at Cubes GitHub issues to get more attention.

This is a pure SQL solution using crosstab() from the additional tablefunc module to pivot the aggregated data. It typically performs better than any client-side alternative. If you are not familiar with crosstab(), read this first:
PostgreSQL Crosstab Query
And this about the "extra" column in the crosstab() output:
Pivot on Multiple Columns using Tablefunc
SELECT product_id, product
, COALESCE(s1, 0) AS s1 -- 1. ... displayed 0 instead of null
, COALESCE(s2, 0) AS s2
, COALESCE(s3, 0) AS s3
, COALESCE(s4, 0) AS s4
, COALESCE(s5, 0) AS s5
FROM crosstab(
'SELECT s.product_id, p.name, s.store_id, s.sum_amount
FROM product p
JOIN (
SELECT product_id, store_id
, sum(amount) AS sum_amount -- 3. SUM total of product spent in store
FROM sales
GROUP BY product_id, store_id
) s ON p.id = s.product_id
ORDER BY s.product_id, s.store_id;'
, 'VALUES (1),(2),(3),(4),(5)' -- desired store_id's
) AS ct (product_id int, product text -- "extra" column
, s1 numeric, s2 numeric, s3 numeric, s4 numeric, s5 numeric)
ORDER BY s3 DESC; -- 2. ... descending order for S3
Produces your desired result exactly (plus product_id).
To include products that have never been sold replace [INNER] JOIN with LEFT [OUTER] JOIN.
SQL Fiddle with base query.
The tablefunc module is not installed on sqlfiddle.
Major points
Read the basic explanation in the reference answer for crosstab().
I am including with product_id because product.name is hardly unique. This might otherwise lead to sneaky errors conflating two different products.
You don't need the store table in the query if referential integrity is guaranteed.
ORDER BY s3 DESC works, because s3 references the output column where NULL values have been replaced with COALESCE. Else we would need DESC NULLS LAST to sort NULL values last:
PostgreSQL sort by datetime asc, null first?
For building crosstab() queries dynamically consider:
Dynamic alternative to pivot with CASE and GROUP BY
I also want to have pagination.
That last item is fuzzy. Simple pagination can be had with LIMIT and OFFSET:
Displaying data in grid view page by page
I would consider a MATERIALIZED VIEW to materialize results before pagination. If you have a stable page size I would add page numbers to the MV for easy and fast results.
To optimize performance for big result sets, consider:
SQL syntax term for 'WHERE (col1, col2) < (val1, val2)'
Optimize query with OFFSET on large table

Query to combine two tables into one based on timestamp

I have three tables in Postgres. They are all about a single event (an occurrence, not "sports event"). Each table is about a specific item during the event.
table_header columns
gid, start_timestamp, end_timestamp, location, positions
table_item1 columns
gid, side, visibility, item1_timestamp
table_item2 columns
gid, position_id, name, item2_timestamp
I've tried the following query:
SELECT h.gid, h.location, h.start_timestamp, h.end_timestamp, i1.side,
i1.visibility, i2.position_id, i2.name, i2.item2_timestamp AS timestamp
FROM tablet_header AS h
LEFT OUTER JOIN table_item1 i1 on (i1.gid = h.gid)
LEFT OUTER JOIN table_item2 i2 on (i2.gid = i1.gid AND
i1.item1_timestamp = i2.item2_timestamp)
WHERE h.start_timestamp BETWEEN '2016-03-24 12:00:00'::timestamp AND now()::timestamp
The problem is that I'm losing some data from rows when item1_timestamp and item2_timestamp do not match.
So if I have in table_item1 and table_item2:
gid | item1_timestamp | side gid | item2_timestamp | name
---------------------------- -----------------------------------
1 | 17:00:00 | left 1 | 17:00:00 | charlie
1 | 17:00:05 | right 1 | 17:00:03 | frank
1 | 17:00:10 | left 1 | 17:00:06 | dee
I would want the final output to be:
gid | timestamp | side | name
-----------------------------
1 | 17:00:00 | left | charlie
1 | 17:00:03 | | frank
1 | 17:00:05 | right |
1 | 17:00:06 | | dee
1 | 17:00:10 | left |
based purely on the timestamp (and gid). Naturally I would have the header info in there too, but that's trivial.
I tried playing around with the query I posted used different JOINs and UNIONs, but I cannot seem to get it right. The one I posted gives the best results I could manage, but it's incomplete.
Side note: every minute or so there will be a new "event". So the gid will be unique to each event and the query needs to ensure that each dataset is paired with data from the same gid. Which is the reason for my i1.gid = h.gid lines. Data between different events should not be compared.

select t1.gid, t1.timestamp, t1.side, t2.name
from t1
left join t2 on t2.timestamp=t1.timestamp and t2.gid=t1.gid
union
select t1.gid, t1.timestamp, t1.side, t2.name
from t2
left join t1 on t2.timestamp=t1.timestamp and t2.gid=t1.gid

joining with a DISTINCT ON on an ordered subquery in sqlalchemy

Here is (an extremely simplified version of) my problem.
I'm using Postgresql as the backend and trying to build a sqlalchemy query
from another query.
Table setup
Here are the tables with some random data for the example.
You can assume that each table was declared in sqlalchemy declaratively, with
the name of the mappers being respectively Item and ItemVersion.
At the end of the question you can find a link where I put the code for
everything in this question, including the table definitions.
Some items.
item
+----+
| id |
+----+
| 1 |
| 2 |
| 3 |
+----+
A table containing versions of each item. Each has at least one.
item_version
+----+---------+---------+-----------+
| id | item_id | version | text |
+----+---------+---------+-----------+
| 1 | 1 | 0 | item_1_v0 |
| 2 | 1 | 1 | item_1_v1 |
| 3 | 2 | 0 | item_2_v0 |
| 4 | 3 | 0 | item_3_v0 |
+----+---------+---------+-----------+
The query
Now, for a given sqlalchemy query over Item, I want a function that returns
another query, but this time over (Item, ItemVersion), where the Items are
the same as in the original query (and in the same order!), and where the
ItemVersion are the corresponding latest versions for each Item.
Here is an example in SQL, which is pretty straightforward:
First a random query over the item table
SELECT item.id as item_id
FROM item
WHERE item.id != 2
ORDER BY item.id DESC
which corresponds to
+---------+
| item_id |
+---------+
| 3 |
| 1 |
+---------+
Then from that query, if I want to join the right versions, I can do
SELECT sq2.item_id AS item_id,
sq2.item_version_id AS item_version_id,
sq2.item_version_text AS item_version_text
FROM (
SELECT DISTINCT ON (sq.item_id)
sq.item_id AS item_id,
iv.id AS item_version_id,
iv.text AS item_version_text
FROM (
SELECT item.id AS item_id
FROM item
WHERE id != 2
ORDER BY id DESC) AS sq
JOIN item_version AS iv
ON iv.item_id = sq.item_id
ORDER BY sq.item_id, iv.version DESC) AS sq2
ORDER BY sq2.item_id DESC
Note that it has to be wrapped in a subquery a second time because the
DISTINCT ON discards the ordering.
Now the challenge is to write a function that does that in sqlalchemy.
Here is what I have so far.
First the initial sqlalchemy query over the items:
session.query(Item).filter(Item.id != 2).order_by(desc(Item.id))
Then I'm able to build my second query but without the original ordering. In
other words I don't know how to do the second subquery wrapping that I did in
SQL to get back the ordering that was discarded by the DISTINCT ON.
def join_version(session, query):
sq = aliased(Item, query.subquery('sq'))
sq2 = session.query(sq, ItemVersion) \
.distinct(sq.id) \
.join(ItemVersion) \
.order_by(sq.id, desc(ItemVersion.version))
return sq2
I think this SO question could be part of the answer but I'm not quite
sure how.
The code to run everything in this question (database creation, population and
a failing unit test with what I have so far) can be found here. Normally
if you can fix the join_version function, it should make the test pass!

Ok so I found a way. It's a bit of a hack but still only queries the database twice so I guess I will survive! Basically I'm querying the database for the Items first, and then I do another query for the ItemVersions, filtering on item_id, and then reordering with a trick I found here (this is also relevant).
Here is the code:
def join_version(session, query):
items = query.all()
item_ids = [i.id for i in items]
items_v_sq = session.query(ItemVersion) \
.distinct(ItemVersion.item_id) \
.filter(ItemVersion.item_id.in_(item_ids)) \
.order_by(ItemVersion.item_id, desc(ItemVersion.version)) \
.subquery('sq')
sq = aliased(ItemVersion, items_v_sq)
items_v = session.query(sq) \
.order_by('idx(array{}, sq.item_id)'.format(item_ids))
return zip(items, items_v)

Replacing a comma seperate value in table with another in select query (postgres)

I have two tables, table A has ID column whose values are comma separated, each of those ID value has a representation in table B.
Table A
+-----------------+
| Name | ID |
+------------------
| A1 | 1,2,3|
| A2 | 2 |
| A3 | 3,2 |
+------------------
Table B
+-------------------+
| ID | Value |
+-------------------+
| 1 | Apple |
| 2 | Orange |
| 3 | Mango |
+-------------------+
I was wondering if there is an efficient way to do a select where the result would as below,
Name, Value
A1 Apple, Orange, Mango
A2 Orange
A3 Mango, Orange
Any suggestions would be welcome. Thanks.

You need to first "normalize" table_a into a new table using the following:
select name, regexp_split_to_table(id, ',') id
from table_a;
The result of this can be joined to table_b and the result of the join then needs to be grouped in order to get the comma separated list of the names:
select a.name, string_agg(b.value, ',')
from (
select name, regexp_split_to_table(id, ',') id
from table_a
) a
JOIN table_b b on b.id = a.id
group by a.name;
SQLFiddle: http://sqlfiddle.com/#!12/77fdf/1

There are two regex related functions that can be useful:
http://www.postgresql.org/docs/current/static/functions-string.html
regexp_split_to_table()
regexp_split_to_array()
Code below is untested, but you'd use something like it to match A and B:
select name, value
from A
join B on B.id = ANY(regexp_split_to_array(A.id, E'\\s*,\\s*', 'g')::int[]))
You can then use array_agg(value), grouping by name, and format using array_to_string().
Two notes, though:
It won't be as efficient as normalizing things.
The formatting itself ought to be done further down, in your views.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Query join result appears to be incorrect - postgresql

Perhaps just sw.sku_warehouse_id instead of sw.warehouse_id?

Related

How to query parent child in PostgreSQL?

How to get back aggregate values across 2 dimensions using Python Cubes?

Query to combine two tables into one based on timestamp

joining with a DISTINCT ON on an ordered subquery in sqlalchemy

Replacing a comma seperate value in table with another in select query (postgres)

Categories

Resources