Postresql: How do I UPDATE null values in column with values from another table? - postgresql

Using PostgreSQL 9.2.4 ... I have three tables:
sales:
inventoryid, batchno, quantity
00001, A40-0007, 1
00002, NULL, 1
00003, C20-0039, 2
parts:
partid, name
8, A40
5, B30
3, C20
inventory:
inventoryid, partid, batcho
00001, 8, A40-0007
00002, 5, NULL
00003, 3, C20-0039
How do I UPDATE all of the NULL values in sales.batchno with the matching part name? In the end, I want the sales table to look like this:
sales:
inventoryid, batchno, quantity
00001, A40-0007, 1
00002, B30, 1
00003, C20-0039, 2

You can use update ... from:
update sales s
set batchno = p.name
from inventory i
inner join parts p on p.partid = i.partid
where i.inventoryid = s.inventoryid and s.batchno is null
Note that your design has duplicated information, so it does not look optimal.

Related

Sum of one column grouped by 2nd column with groups made based on 3rd column

Data
So my data looks like:
product user_id value id
pizza 1 50 1
burger 1 30 2
pizza 2 50 3
fries 1 10 4
pizza 3 50 5
burger 1 30 6
burger 2 30 7
Problem Statement
And I wanted to compute Lifetime values of customers of each product as a metric to know which product is doing great in terms of user retention.
Desired Output
My desired output is:
product
value_by_customers_of_these_products
total_customers
ltv
pizza
250
3
250/3 = 83.33
burger
200
2
200/2 = 100
fries
120
1
120/1 = 120
Columns Description:
value_by_customers_of_these_products : Total value generated by
customers of each product including orders which do not contain the
product
total_customers : Simple COUNT(DISTINCT user_id) GROUP BY product
Current Workaround
Currently I am doing this:
SELECT "pizza" AS product, SUM(value) value_by_customers_of_these_products, COUNT(DISTINCT user_id) users FROM orders WHERE user_id in (SELECT user_id FROM orders WHERE product = "pizza")
UNION ALL
SELECT "burger" AS product, SUM(value) value_by_customers_of_these_products, COUNT(DISTINCT user_id) users FROM orders WHERE user_id in (SELECT user_id FROM orders WHERE product = "burger")
UNION ALL
SELECT "fries" AS product, SUM(value) value_by_customers_of_these_products, COUNT(DISTINCT user_id) users FROM orders WHERE user_id in (SELECT user_id FROM orders WHERE product = "fries")
I have a python script obtaining DISTINCT product names from my table and then repeating the query string for each product and updating query from time to time. This is really a pain as I have to do every time a new product is launched and sky-rocketing length of query is another issue. How can I achieve this via built-in BigQuery functions or minimal headache?
Code to generate Sample Data
WITH orders as (SELECT "pizza" AS product,
1 AS user_id,
50 AS value, 1 AS id,
UNION ALL SELECT "burger", 1, 30,2
UNION ALL SELECT "pizza", 2, 50,3
UNION ALL SELECT "fries", 1, 10,4
UNION ALL SELECT "pizza", 3, 50,5
UNION ALL SELECT "burger", 1, 30, 6
UNION ALL SELECT "burger", 3, 30, 7)
Use below
with user_value as (
select user_id, sum(value) values
from `project.dataset.table`
group by user_id
), product_user as (
select distinct product, user_id
from `project.dataset.table`
)
select product,
sum(values) as value_by_customers_of_these_products,
count(user_id) as total_customers,
round(sum(values) / count(user_id), 2) as ltv
from product_user
join user_value
using(user_id)
group by product
if applied to sample data in your question - output is

Convert jsonb in PostgreSQL to rows without cycle

ffI have a json array stored in my postgres database. The first table "Orders" looks like this:
order_id, basket_items_id
1, {1,2}
2, {3}
3, {1,2,3,1}
Second table "Items" looks like this:
item_id, price
1,5
2,3
3,20
Already tried to load data with multiple sql and select of different jsonb record, but this is not a silver bullet.
SELECT
sum(price)
FROM orders
INNER JOIN items on
orders.basket_items_id = items.item_id
WHERE order_id = 3;
Want to get this as output:
order_id, basket_items_id, price
1, 1, 5
1, 2, 3
2, 3, 20
3, 1, 5
3, 2, 3
3, 3, 20
3, 1, 5
or this:
order_id, sum(price)
1, 8
2, 20
3, 33
demo:db<>fiddle
SELECT
o.order_id,
elems.value::int as basket_items_id,
i.price
FROM
orders o, jsonb_array_elements_text(basket_items_id) as elems
LEFT JOIN items i
ON i.item_id = elems.value::int
ORDER BY 1,2,3
jsonb_array_elements_text expands the jsonb array into one row each element. With this you are able to join against your second table directly
Since the expanded array gives you text elements you have to cast them into integers using ::int
Of course you can GROUP and SUM aggregate this as well:
SELECT
o.order_id,
SUM(i.price)
FROM
orders o, jsonb_array_elements_text(basket_items_id) as elems
LEFT JOIN items i
ON i.item_id = elems.value::int
GROUP BY o.order_id
ORDER BY 1
Is your orders.basket_items_id column of type jsonb or int[]?
If the type is jsonb you can use json_array_elements_text to expand the column:
SELECT
o.order_id,
o.basket_item_id,
items.price
FROM
(
SELECT
order_id,
jsonb_array_elements_text(basket_items_id)::int basket_item_id
FROM
orders
) o
JOIN
items ON o.basket_item_id = items.item_id
ORDER BY
1, 2, 3;
See this DB-Fiddle.
If the type is int[] (array of integers), you can run a similar query with the unnest function:
SELECT
o.order_id,
o.basket_item_id,
items.price
FROM
(
SELECT
order_id,
unnest(basket_items_id) basket_item_id
FROM
orders
) o
JOIN
items ON o.basket_item_id = items.item_id
ORDER BY
1, 2, 3;
See this DB-fiddle

Limit query by count distinct column values

I have a table with people, something like this:
ID PersonId SomeAttribute
1 1 yellow
2 1 red
3 2 yellow
4 3 green
5 3 black
6 3 purple
7 4 white
Previously I was returning all of Persons to API as seperate objects. So if user set limit to 3, I was just setting query maxResults in hibernate to 3 and returning:
{"PersonID": 1, "attr":"yellow"}
{"PersonID": 1, "attr":"red"}
{"PersonID": 2, "attr":"yellow"}
and if someone specify limit to 3 and page 2(setMaxResult(3), setFirstResult(6) it would be:
{"PersonID": 3, "attr":"green"}
{"PersonID": 3, "attr":"black"}
{"PersonID": 3, "attr":"purple"}
But now I want to select people and combine then into one json object to look like this:
{
"PersonID":3,
"attrs": [
{"attr":"green"},
{"attr":"black"},
{"attr":"purple"}
]
}
And here is the problem. Is there any possibility in postgresql or hibernate to set limit not by number of rows but to number of distinct people ids, because if user specifies limit to 4 I should return person1, 2, 3 and 4, but in my current limiting mechanism I will return person1 with 2 attributes, person2 and person3 with only one attribute. Same problem with pagination, now I can return half of a person3 array attrs on one page and another half on next page.
You can use row_number to simulate LIMIT:
-- Test data
CREATE TABLE person AS
WITH tmp ("ID", "PersonId", "SomeAttribute") AS (
VALUES
(1, 1, 'yellow'::TEXT),
(2, 1, 'red'),
(3, 2, 'yellow'),
(4, 3, 'green'),
(5, 3, 'black'),
(6, 3, 'purple'),
(7, 4, 'white')
)
SELECT * FROM tmp;
-- Returning as a normal column (limit by someAttribute size)
SELECT * FROM (
select
"PersonId",
"SomeAttribute",
row_number() OVER(PARTITION BY "PersonId" ORDER BY "PersonId") AS rownum
from
person) as tmp
WHERE rownum <= 3;
-- Returning as a normal column (overall limit)
SELECT * FROM (
select
"PersonId",
"SomeAttribute",
row_number() OVER(ORDER BY "PersonId") AS rownum
from
person) as tmp
WHERE rownum <= 4;
-- Returning as a JSON column (limit by someAttribute size)
SELECT "PersonId", json_object_agg('color', "SomeAttribute") AS attributes FROM (
select
"PersonId",
"SomeAttribute",
row_number() OVER(PARTITION BY "PersonId" ORDER BY "PersonId") AS rownum
from
person) as tmp
WHERE rownum <= 3 GROUP BY "PersonId";
-- Returning as a JSON column (limit by person)
SELECT "PersonId", json_object_agg('color', "SomeAttribute") AS attributes FROM (
select
"PersonId",
"SomeAttribute"
from
person) as tmp
GROUP BY "PersonId"
LIMIT 4;
In this case, of course, you must use a native query, but this is a small trade-off IMHO.
More info here and here.
I'm assuming you have another Person table. With JPA, you should do the query on Person table(one side), not on the PersonColor(many side).Then the limit will be applied on number of rows of Person then
If you don't have the Person table and can't modify the DB, what you can do is use SQL and Group By PersonId, and concatenate colors
select PersonId, array_agg(Color) FROM my_table group by PersonId limit 2
SQL Fiddle
Thank you guys. After I realize that it could not be done with one query I just do sth like
temp_query = select distinct x.person_id from (my_original_query) x
with user specific page/per_page
and then:
my_original_query += " AND person_id in (temp_query_results)

PostgreSQL, Custom aggregate

Is here way to get function like custom aggregate when MAX and SUM is not enough to get result?
Here is my table:
DROP TABLE IF EXISTS temp1;
CREATE TABLE temp1(mydate text, code int, price decimal);
INSERT INTO temp1 (mydate, code, price) VALUES
('01.01.2014 14:32:11', 1, 9.75),
( '', 1, 9.99),
( '', 2, 40.13),
('01.01.2014 09:12:04', 2, 40.59),
( '', 3, 18.10),
('01.01.2014 04:13:59', 3, 18.20),
( '', 4, 10.59),
('01.01.2014 15:44:32', 4, 10.48),
( '', 5, 8.19),
( '', 5, 8.24),
( '', 6, 11.11),
('04.01.2014 10:22:35', 6, 11.09),
('01.01.2014 11:48:15', 6, 11.07),
('01.01.2014 22:18:33', 7, 22.58),
('03.01.2014 13:15:40', 7, 21.99),
( '', 7, 22.60);
Here is query for getting result:
SELECT code,
ROUND(AVG(price), 2),
MAX(price)
FROM temp1
GROUP BY code
ORDER BY code;
In short:
I have to get LAST price by date (written as text) for every grouped code if date exists otherwise (if date isn't written) price should be 0.
In column LAST is wanted result and result of AVG and MAX for illustration:
CODE LAST AVG MAX
------------------------------
1 9.75 9.87 9.99
2 40.59 40.36 40.59
3 18.20 18.15 18.20
4 10.48 10.54 10.59
5 0.00 8.22 8.24
6 11.09 11.09 11.11
7 21.99 22.39 22.60
How would I get wanted result?
How that query would look like?
EDITED
I simply have to try 'IMSoP's advices to update and use custom aggregate functions first/last.
SELECT code,
CASE WHEN MAX(mydate)<>'' THEN
(SELECT last(price ORDER BY TO_TIMESTAMP(mydate, 'DD.MM.YYYY HH24:MI:SS')))
ELSE
0
END AS "LAST",
ROUND(AVG(price), 2) AS "AVG",
MAX(price) AS "MAX"
FROM temp1
GROUP BY code
ORDER BY code;
With this simple query I get same results as with Mike's complex query.
And more, those one better consumes double (same) entries in mydate column, and is faster.
Is this possible? It look's similar to 'SELECT * FROM magic()' :)
You said in comments that one code can have two rows with the same date. So this is sane data.
01.01.2014 1 3.50
01.01.2014 1 17.25
01.01.2014 1 99.34
There's no deterministic way to tell which of those rows is the "last" one, even if you sort by code and "date". (In the relational model--a model based on mathematical sets--the order of columns is irrelevant, and the order of rows is irrelevant.) The query optimizer is free to return rows is the way it thinks best, so this query
select *
from temp1
order by mydate, code
might return this on one run,
01.01.2014 1 3.50
01.01.2014 1 17.25
01.01.2014 1 99.34
and this on another.
01.01.2014 1 3.50
01.01.2014 1 99.34
01.01.2014 1 17.25
Unless you store some value that makes the meaning of last obvious, what you're trying to do isn't possible. When people need to make last obvious, they usually use a timestamp.
After your changes, this query seems to return what you're looking for.
with distinct_codes as (
select distinct code
from temp1
),
corrected_table as (
select
case when mydate <> '' then TO_TIMESTAMP(mydate, 'DD.MM.YYYY HH24:MI:SS')
else null
end as mydate,
code,
price
from temp1
),
max_dates as (
select code, max(mydate) max_date
from corrected_table
group by code
)
select c1.mydate, d1.code, coalesce(c1.price, 0)
from corrected_table c1
inner join max_dates m1
on m1.code = c1.code
and m1.max_date = c1.mydate
right join distinct_codes d1
on d1.code = c1.code
order by code;

t-sql outer join across three tables

I have three tables:
CREATE TABLE person
(id int,
name char(50))
CREATE TABLE eventtype
(id int,
description char(50))
CREATE TABLE event
(person_id int,
eventtype_id int,
duration int)
What I want is a single query which gives me a list of the total duration of each eventtype for each person, including all zero entries. E.g. if there are 10 people and 15 different eventtypes, there should be 150 rows returned, irrespective of the contents of the event table.
I can get an outer join to work between two tables (e.g. durations for all eventtypes), but not with a second outer join.
Thanks!
You'll have to add a CROSS APPLY to the mix to get the non-existing relations.
SELECT q.name, q.description, SUM(q.Duration)
FROM (
SELECT p.Name, et.description, Duration = 0
FROM person p
CROSS APPLY eventtype et
UNION ALL
SELECT p.Name, et.description, e.duration
FROM person p
INNER JOIN event e ON e.person_id = p.id
INNER JOIN eventtype et ON et.id = e.eventtypeid
) q
GROUP BY
q.Name, q.description
You can cross join person and eventtype, and then just join the result to the event table:
SELECT
p.Name,
et.Description,
COALESCE(e.duration,0)
FROM
person p
cross join
eventtype et
left join
event e
on
p.id = e.person_id and
et.id = e.eventtype_id
A cross join is one where, for each row in the left table, it's joined to every row in the right table.
If you want a row for every combination of person and eventtype, that suggets a CROSS JOIN. To get the duration we need to join to event, but this needs to be an OUTER join since there might not always be a row. Your use of "total" suggests there there could be more than one event for a given combination of person and event, so we'll need a SUM in there as well.
Sample data:
insert person values ( 1, 'Joe' )
insert person values ( 2, 'Bob' )
insert person values ( 3, 'Tim' )
insert eventtype values ( 1, 'Cake' )
insert eventtype values ( 2, 'Pie' )
insert eventtype values ( 3, 'Beer' )
insert event values ( 1, 1, 10 )
insert event values ( 1, 2, 10 )
insert event values ( 1, 2, 5 )
insert event values ( 2, 1, 10 )
insert event values ( 2, 2, 7 )
insert event values ( 3, 2, 8 )
insert event values ( 3, 3, 16 )
insert event values ( 1, 1, 10 )
The query:
SELECT
PET.person_id
, PET.person_name
, PET.eventtype_id
, PET.eventtype_description
, ISNULL(SUM(E.duration), 0) total_duration
FROM
(
SELECT
P.id person_id
, P.name person_name
, ET.id eventtype_id
, ET.description eventtype_description
FROM
person P
CROSS JOIN eventtype ET
) PET
LEFT JOIN event E ON PET.person_id = E.person_id
AND PET.eventtype_id = E.eventtype_id
GROUP BY
PET.person_id
, PET.person_name
, PET.eventtype_id
, PET.eventtype_description
Output:
person_id person_name eventtype_id eventtype_description total_duration
----------- ----------- ------------ --------------------- --------------
1 Joe 1 Cake 20
1 Joe 2 Pie 15
1 Joe 3 Beer 0
2 Bob 1 Cake 10
2 Bob 2 Pie 7
2 Bob 3 Beer 0
3 Tim 1 Cake 0
3 Tim 2 Pie 8
3 Tim 3 Beer 16
Warning: Null value is eliminated by an aggregate or other SET operation.
(9 row(s) affected)