Take the following example:
Operation Signal Value Balance
1 + 100.00 100.00
2 + 50.00 150.00
3 + 10.00 160.00
Now I have two concurrent transactions that take the last operation, add the value and then save (insert). The values are never updated.
Transaction 1
Takes the last operation (3), locks it "for update", and inserts:
Operation Signal Value Balance
1 + 100.00 100.00
2 + 50.00 150.00
3 + 10.00 160.00
4 + 10.00 170.00
Transaction 2
Takes the last operation and locks it "for update". Since the first transaction already holds a "for update" lock, it waits for that lock to be released. In this example, transaction 2 began before transaction 1 ended.
The actual row returned by PostgreSQL is #3 (after the first lock is released). So it ends up like this:
Operation Signal Value Balance
1 + 100.00 100.00
2 + 50.00 150.00
3 + 10.00 160.00
4 + 10.00 170.00
5 + 10.00 170.00
So the balance ends up at 170.00. The desired result is 180.00.
This is with the READ COMMITTED isolation level.
In SERIALIZABLE mode, transaction 2 throws an error about concurrent access.
I have tried the following:
Selected the latest row with "for update";
Selected the row with this_is_the_last = true "for update", changed it to false, and then inserted a new row with this_is_the_last = true. PostgreSQL ends up returning a row whose this_is_the_last is false by then (in transaction 2, after transaction 1 releases the "for update" lock).
Is there a way to make a row level lock and make transaction 2 wait for transaction 1 in a way that transaction 2 will not select the same "latest" as transaction 1?
This kind of “lost update” need never happen in PostgreSQL, even if you use READ COMMITTED isolation.
Define operation as the primary key constraint on the table.
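For example (a minimal sketch, assuming the table and column names used in the query below):

ALTER TABLE transaction ADD PRIMARY KEY (operation);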
Each of your transactions will execute:
INSERT INTO transaction (operation, signal, value, balance)
SELECT operation + 1,
'+',
10,
balance + 10
FROM transaction
ORDER BY operation DESC
LIMIT 1;
Then if there is a concurrent transaction, one of the two will receive a primary key violation, since operation is unique. That transaction simply retries the operation until it succeeds.
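A minimal retry sketch in PL/pgSQL (the names come from the query above; the loop shape is an assumption, and you could just as well retry from client code). Under READ COMMITTED, each retried statement gets a fresh snapshot, so the re-read sees the row the winning transaction committed:

DO $$
BEGIN
    LOOP
        BEGIN
            -- re-read the latest row and insert the next one
            INSERT INTO transaction (operation, signal, value, balance)
            SELECT operation + 1, '+', 10, balance + 10
            FROM transaction
            ORDER BY operation DESC
            LIMIT 1;
            EXIT;  -- insert succeeded
        EXCEPTION WHEN unique_violation THEN
            -- a concurrent transaction took this operation number; retry
            NULL;
        END;
    END LOOP;
END $$;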
I have something similar to the following table, which is a randomly ordered list of thousands of transactions with a Customer_ID and an order_cost for each transaction.
Customer_ID  order_cost
1            $503
53           $7
4            $80
13           $76
6            $270
78           $2
8            $45
910          $89
10           $3
1130         $43
etc...       etc...
I want to group the transactions by Customer_ID, aggregate the cost of all the orders into a "spending" column, and then create a new "decile" column that assigns a number 1-10 to each customer, such that when the "spending" for all customers in a decile is added up, each decile contains 10% of all the spending.
The resulting table would look something like the table below, where each ascending decile contains fewer customers, but the total sum of "spending" for all the records in each decile group is the same for deciles 1-10. (The numbers in this sample don't actually add up; it just illustrates the concept.)
Customer_ID  spending  Decile
45           $500      1
3            $700      1
349          $800      1
23           $1,000    1
64           $2,000    1
718          $2,100    1
3452         $2,300    1
1276         $2,600    2
10           $3,000    2
34           $4,000    2
etc...       etc...    etc...
So far I have grouped by Customer_ID, aggregated order_cost into a spending column, ordered the customers ascending by spending, and partitioned them into 5000 groups. From there I manually found the cutoff value for each .when statement so that deciles 1-10 each contain the right customers, i.e. each decile gets 10% of the sum of the entire spending column. It's pretty time-consuming to find the right bucket configuration by trial and error.
I'm trying to find a way to automate this process so I don't have to find the right bucketing ratio for each decile by trial and error.
This is my code so far:
import pyspark.sql.functions as F
import pyspark.sql.window as W  # provides W.Window used below

deciles = (table
    .groupBy('Customer_ID')
    .agg(F.sum('order_cost').alias('spending')).alias('a')
    .withColumn('rank', F.ntile(5000).over(W.Window.partitionBy()
        .orderBy(F.asc('spending'))))
    .withColumn('rank', F.when(F.col('rank') <= 4628, F.lit(1))
        .when(F.col('rank') <= 4850, F.lit(2))
        .when(F.col('rank') <= 4925, F.lit(3))
        .when(F.col('rank') <= 4965, F.lit(4))
        .when(F.col('rank') <= 4980, F.lit(5))
        .when(F.col('rank') <= 4987, F.lit(6))
        .when(F.col('rank') <= 4993, F.lit(7))
        .when(F.col('rank') <= 4997, F.lit(8))
        .when(F.col('rank') <= 4999, F.lit(9))
        .when(F.col('rank') <= 5000, F.lit(10))
        .otherwise(F.lit(0)))
)
end_table = (table.alias('a').join(deciles.alias('b'), ['Customer_ID'], 'left')
.selectExpr('a.*', 'b.rank')
)
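One way to avoid hand-tuned thresholds is to bucket by cumulative share of spending instead of by customer count. A sketch in Spark SQL (runnable via spark.sql; it assumes the grouped data is registered as a view named customer_spending with columns Customer_ID and spending, and the 10% splits are only as exact as whole customers allow):

SELECT Customer_ID,
       spending,
       -- running total of spending divided by the grand total, scaled to 1-10
       CEIL(10 * SUM(spending) OVER (ORDER BY spending, Customer_ID)
               / SUM(spending) OVER ()) AS decile
FROM customer_spending

Each customer's decile is driven by the running total of spending rather than their rank, so every bucket ends as close to 10% of total spending as possible without splitting a customer.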
I have an orders table with datetimes for when an order was placed and when it was completed:
orderid  userid  price  status     createdat         doneat
1        128     100    completed  2/16/21 18:40:45  2/21/21 07:59:46
2        128     150    completed  2/21/21 05:27:29  2/23/21 11:58:23
3        128     100    completed  9/3/21 08:38:14   9/10/21 14:24:35
4        5       100    completed  5/28/22 23:28:07  6/26/22 06:10:35
5        5       100    canceled   7/8/22 22:28:57   8/10/22 06:55:17
6        5       100    completed  7/25/22 13:46:38  8/10/22 06:57:20
7        5       5      completed  8/7/22 18:07:07   8/12/22 06:56:23
I would like to have a new column that is the cumulative total (sum price) per user when the order was created:
orderid  userid  price  status     createdat         doneat            cumulative total when placed (per user)
1        128     100    completed  2/16/21 18:40:45  2/21/21 07:59:46  0
2        128     150    completed  2/21/21 05:27:29  2/23/21 11:58:23  0
3        128     100    completed  9/3/21 08:38:14   9/10/21 14:24:35  250
4        5       100    completed  5/28/22 23:28:07  6/26/22 06:10:35  0
5        5       100    canceled   7/8/22 22:28:57   8/10/22 06:55:17  100
6        5       100    completed  7/25/22 13:46:38  8/10/22 06:57:20  100
7        5       5      completed  8/7/22 18:07:07   8/12/22 06:56:23  100
The logic is: for each row, sum the price of the same user's orders that were completed before the current row's createdat date. For orderid = 2, although it's the user's 2nd order, no orders were completed before its createdat datetime of 2/21/21 05:27:29, so the cumulative total when placed is 0.
The same goes for orderid in [5,6,7]. For those orders and that userid, the only order completed before their createdat dates is order 4, so their cumulative total when placed is 100.
In PowerBI the logic is like this:
SUMX (
filter(
orders,
earlier orders.userid = orders.userid && orders.doneat < orders.createdat && order.status = 'completed'),
orders.price)
Would anyone have any hints on how to achieve this in PostgreSQL?
I tried something like this and it didn't work.
select (case when o.doneat < o.createdat over (partition by o.userid, o.status order by o.createdat)
then sum(o.price) over (partition by o.userid, o.status ORDER BY o.doneat asc ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
end) as cumulativetotal_whenplaced
from order o
Thank you
You can duplicate each row into:
an "original" (which we'll decorate with a flag keep = true), that has an accounting value val = 0 (so far), and a time t = createdat;
a "duplicate" (keep = false), that has the price to account for (if status is 'completed') as val and a time t = doneat.
Then it's just a matter of accounting for the right bits:
select orderid, userid, price, status, createdat, doneat, cumtot
from (
select *, sum(val) over (partition by userid order by t, keep desc) as cumtot
from (
select *, createdat as t, 0 as val, true as keep from foo
union all
select *, doneat as t,
case when status = 'completed' then price else 0 end as val,
false as keep
from foo
) as a
) as a
where keep
order by orderid;
Example: DB Fiddle.
Note for Redshift: the window expression above needs to be replaced by:
...
select *, sum(val) over (
partition by userid order by t, keep desc
rows unbounded preceding) as cumtot
...
Result for your data:
orderid  userid  price  status     createdat                 doneat                    cumtot
1        128     100    completed  2021-02-16T18:40:45.000Z  2021-02-21T07:59:46.000Z  0
2        128     150    completed  2021-02-21T05:27:29.000Z  2021-02-23T11:58:23.000Z  0
3        128     100    completed  2021-09-03T08:38:14.000Z  2021-09-10T14:24:35.000Z  250
4        5       100    completed  2022-05-28T23:28:07.000Z  2022-06-26T06:10:35.000Z  0
5        5       100    canceled   2022-07-08T22:28:57.000Z  2022-08-10T06:55:17.000Z  100
6        5       100    completed  2022-07-25T13:46:38.000Z  2022-08-10T06:57:20.000Z  100
7        5       5      completed  2022-08-07T18:07:07.000Z  2022-08-12T06:56:23.000Z  100
Note: this type of accounting across time is actually robust to many corner cases (various orders overlapping, some starting and finishing while others are still in process, etc.). It is the basis for a fast interval compaction algorithm that I should describe someday on SO.
Bonus: try to figure out why the partitioning window is ordered by t (fairly obvious) and also by keep desc (less obvious).
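For comparison, the SUMX logic from the question also translates almost verbatim into a correlated subquery (a simpler sketch, though O(n²) on large tables; the table name orders is an assumption):

select o.*,
       (select coalesce(sum(p.price), 0)
        from orders p
        where p.userid = o.userid
          and p.status = 'completed'
          and p.doneat < o.createdat) as cumtot
from orders o
order by orderid;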
I have a couple of tables, say, Job and Transactions.
Job:
id Name
4 Clay
6 Glow
7 Circle
9 Jam
Transactions
Id Job_id Person Marks
2 6 Amy 0
3 3 Keith 30
5 3 Glass 10
7 9 Know 60
11 6 Play 81
13 6 Play 100
How do I write the query below, which should return three columns: Job_id (id of the job), Job_name (name of the job), and Level, which is one of three possible strings: "Hard", "Easy", "Medium"?
Job_id  Job_name  Level
4       Clay      Hard
6       Glow      Easy
9       Jam       Medium
Level is calculated from the average score on Transactions:
-- If the average score for a job is lower than or equal to 20, its level is "Hard".
-- If the average score for a job is higher than 20 but lower than or equal to 60, its level is "Medium".
-- If the average score for a job is higher than 60, its level is "Easy".
I'm not sure if I should use a subQuery for this, or if there's an easier way.
Thanks!
select j.id, j.name,
       if(avg(t.marks) > 60, 'Easy',
          if(avg(t.marks) <= 60 and avg(t.marks) > 20, 'Medium', 'Hard')) as level
from job j
left join Transactions t on j.id = t.Job_id
where t.id is not null
group by Job_id
I just wrote the query by looking at your data; it would be better if you provided the schema with data in SQLFiddle or similar. You can try the query above.
For SQLite, use the following:
select j.id AS Job_id,
j.name AS Job_name,
CASE
WHEN avg(t.marks)>60 THEN 'EASY'
WHEN avg(t.marks)<=60 and avg(t.marks)>20 THEN 'MEDIUM'
WHEN avg(t.marks)<=20 THEN 'HARD'
END Level
from job j
left join Transactions t
on j.id = t.Job_id
where t.id is not null
group by Job_id;
I have a hard time figuring this one out, and I apologize if the answer is already out there; I have searched all of Stack Overflow.
I have an order system where orders are stored by table ID, and I keep adding rows to my database as people order things. My problem is getting the data out on one row, in this example per table ID, so that everything ordered shows on one line only.
I have 3 tables, Food, Drink, Dessert, all with a foreign key in my OrderTable.
id fk_tableid fk_drinkId fk_foodId fk_dessId amount
1 5 2 0 0 2
2 5 0 1 0 1
3 5 0 2 0 1
4 5 0 0 2 2
11 8 1 0 0 2
21 1 1 0 0 5
22 1 0 1 0 9
23 1 0 0 1 2
With a normal select with left joins, I can get the data out on multiple rows, like this, where I fetch the rows with tableId 5 and also show the name of each ordered item:
id fk_tableId fk_drinkId fk_foodId fk_dessId amount foodName drinkName dessertName
1 5 2 0 0 2 NULL Sodavand NULL
2 5 0 1 0 1 Lasagne NULL NULL
3 5 0 2 0 1 Pizza NULL NULL
4 5 0 0 2 2 NULL NULL Softice
I also tried GROUP_CONCAT, which put data on one line, but it seems to put everything on one line, not just grouped by tableId.
How I want it to be is something like this (the "2x" notation, e.g. 2xSodavand, is just how I want it to look on the site; maybe I need to use 1xLasagne twice instead, since 1x Beer ten times would just look messy):
fk_tableId fk_drinkId fk_foodId fk_dessId foodName drinkName dessertName fulllPrice
5 2,2 1,2 2,2 Pizza,Lasagne 2xSodavand 2xSoftice 195
I am aware my question might be poorly formatted, but I also have a hard time knowing exactly what to google and search for here. I have tried for two days, and even asked my teacher, who could not do it; he tried something with CASE and sum(), but it did not work out either.
Any help will be much appreciated!
UPDATE:
Added SQL Query:
SELECT
    menukort_ordre.id,
    fk_tableId,
    fk_drinkId,
    fk_foodId,
    fk_dessId,
    amount,
    menukort_food.name AS foodName,
    menukort_drink.name AS drinkName,
    menukort_dessert.name AS dessertName
FROM menukort_ordre
LEFT JOIN menukort_drink
    ON (menukort_ordre.fk_drinkId = menukort_drink.id)
LEFT JOIN menukort_food
    ON (menukort_ordre.fk_foodId = menukort_food.id)
LEFT JOIN menukort_dessert
    ON (menukort_ordre.fk_dessId = menukort_dessert.id)
WHERE fk_tableid = fk_tableid
With GROUP_CONCAT I tried this instead, which put it on one row, but due to my WHERE clause I get all the data on a single row.
GROUP_CONCAT(menukort_food.name ) AS foodName,
GROUP_CONCAT(menukort_drink.name) AS drinkName,
GROUP_CONCAT(menukort_dessert.name) AS dessertName
First off, I changed your database design, since there is no need for three tables like that unless you really want the data separated that way. I understand wanting to separate data, but there are times to do so and times not to; my own database project has me breaking everything up, too. The solution below is based on my design, which is as follows.
Category
Code or ID (PK)
Category
This table is just a lookup table to make sure drink, food, and dessert are spelled consistently. Frankly, you don't need it unless you need that information to be specific and correct.
Next is the table that stores the drinks, desserts, and food:
Items
ID serial
Category
Name
Price
And finally the order table that keeps track of the orders:
Order
BillID
TableNum
ItemNum (fk)
ID (PK)
This way you can keep track of which table the food goes to, and of each check or bill. Your design was fine if you wanted to find out how much each table made in a day, but I'm assuming that, like an actual restaurant, you would want to know it per bill. With this you can have multiple orders of a Coke or whatever at the same table on the same bill (a rough DDL sketch follows below).
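A rough DDL sketch of that design (MySQL-style syntax; all names and types are assumptions, and the table is called Orders as in the queries further below, since ORDER is a reserved word; the optional Category lookup table is omitted):

CREATE TABLE Items (
    ID       INT AUTO_INCREMENT PRIMARY KEY,
    Category VARCHAR(20),   -- e.g. 'food', 'drink', 'dessert'
    Name     VARCHAR(50),
    Price    DECIMAL(8,2)
);

CREATE TABLE Orders (
    ID       INT AUTO_INCREMENT PRIMARY KEY,
    BillID   INT,
    TableNum INT,
    ItemNum  INT,
    FOREIGN KEY (ItemNum) REFERENCES Items(ID)
);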
Now on to the solution.
This doesn't have the count, but you could work it in if you really need it. Frankly, I think a count is pointless unless you ungroup the results and have something like this:
tableNum BillNum ItemNum ItemName
1 1 1 Coke
1 1 2 Steak
1 1 3 Pasta
1 1 1 Coke
Then you could end up with something like this (a sketch of the query for this follows the table):
tableNum BillNum ItemNum ItemName TimesBy
1 1 1 Coke 2
1 1 2 Steak 1
1 1 3 Pasta 1
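If you do want that TimesBy column, a grouped count over the ungrouped per-item rows would produce it; a sketch, reusing the table2 temp table defined in the code below:

select BillID, tablenum, ItemNum, name as ItemName, count(*) as TimesBy
from table2
group by BillID, tablenum, ItemNum, name;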
The SQL code below will give you what you need, I believe. I'm using my version of the database, and I think you should too, because it is easier and there is no point in having three separate tables, one per item type.
CREATE TEMPORARY TABLE IF NOT EXISTS table2 AS (
select BillID, tablenum,ItemNum,Items.name,Items.price
from Orders, Items
where Orders.ItemNum=Items.id
);
create TEMPORARY TABLE IF NOT EXISTS table3 AS (
select SUM(price) as total, BillID
from table2
group by BillID
);
select table3.BillID, TableNum, GROUP_CONCAT(ItemNum order by ItemNum ASC) as ItemNum, GROUP_CONCAT(name order by name ASC) as Item, GROUP_CONCAT(price order by name asc) as ItemPrice, total
from table2, table3
where table2.BillID=table3.BillID
group by BillID;
DROP TABLE IF EXISTS table2;
DROP TABLE IF EXISTS table3;
A few other options would be to handle this in application code (e.g. PHP) or in a stored procedure.
If you need an explanation, just ask. Also, I'm curious: is this for homework or a project? I just want to know why you're doing this.
Hope it helps.
I have a table with a column called "Priority". No two records should have the same priority value.
If I add a new entry with the same priority value as an existing record, it should increment the priority of the records that follow, but only where the previous row's increment causes a duplicate priority.
For example:
We want to insert a record with a priority of 2.
(BEFORE)
priority
1
2
3
5
(AFTER)
Priority
1
2
3
4
5
Another example:
Insert a record with a priority of 2
(BEFORE)
priority
1
2
3
5
7
(AFTER)
Priority
1
2
3
4
5
7
I am doing this with the following code, and it works as long as there are no gaps in the sequence:
UPDATE MyTable SET Priority = Priority + 1
WHERE LocationId = #LocationId AND Priority >= #priorityToInsert
The problem is that this update statement increments ALL priority values. Example #2 above fails
because the record with priority 7 gets incremented to 8 when it shouldn't.
Please help!
You could use something like this:
UPDATE MyTable t1
SET t1.Priority = t1.Priority + 1
WHERE t1.LocationId = #LocationId
AND t1.Priority >= #priorityToInsert
AND NOT EXISTS (
SELECT *
FROM MyTable t2
WHERE t2.LocationId = #LocationId
AND t2.Priority = t1.Priority - 1
)
The problem is that if you also had an 8 in your case, it WOULD advance the 8 to 9 because it sees the 7, even though the 7 wouldn't advance.
Because it's recursive like this, you would probably have to use a recursive or iterative technique.
I would think about a different design or perhaps something with triggers.
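One iterative approach, sketched as a recursive CTE (PostgreSQL syntax, keeping the question's #placeholder style; it assumes any unique constraint on Priority is deferrable or absent during the shift, since the rows are renumbered in place):

WITH RECURSIVE run AS (
    -- start at the insert point...
    SELECT Priority
    FROM MyTable
    WHERE LocationId = #LocationId AND Priority = #priorityToInsert
    UNION ALL
    -- ...and extend the run only while priorities are contiguous
    SELECT t.Priority
    FROM MyTable t
    JOIN run r ON t.LocationId = #LocationId AND t.Priority = r.Priority + 1
)
UPDATE MyTable
SET Priority = Priority + 1
WHERE LocationId = #LocationId
  AND Priority IN (SELECT Priority FROM run);

The CTE walks the contiguous block of priorities starting at the insert point and stops at the first gap, so only that block is shifted; rows after the gap (the 7 in example #2) are left alone.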