MySQL 5.6 SELECT UNION Query - select

The SELECT UNION query below functions as needed.
We want to display different columns from the same table and the same row as separate rows.
I know there has to be a better / cleaner way.
Please advise.
SELECT task1 AS Job FROM prevent WHERE task1 != "" AND eq = ? AND id = ? AND pmTask LIKE ?
UNION
SELECT task2 AS Job FROM prevent WHERE task2 != "" AND eq = ? AND id = ? AND pmTask LIKE ?
UNION
SELECT task3 AS Job FROM prevent WHERE task3 != "" AND eq = ? AND id = ? AND pmTask LIKE ?
Here is a snapshot of the db entry displayed below.

Your query is actually the best way of doing it.
In MySQL 8.x you could use a CTE to simplify the code, but since you are using 5.6, I don't see a better option.
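For illustration only, a rough sketch of what the 8.x CTE version could look like, so the filter and its parameters are written just once (untested against your schema):
WITH filtered AS (
SELECT task1, task2, task3
FROM prevent
WHERE eq = ? AND id = ? AND pmTask LIKE ?
)
SELECT task1 AS Job FROM filtered WHERE task1 != ''
UNION
SELECT task2 FROM filtered WHERE task2 != ''
UNION
SELECT task3 FROM filtered WHERE task3 != '';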
To be brutally honest, your database model is not that great. It supports only a fixed/maximum number of tasks per job, since the tasks are included as columns of the table rather than as a 1:n relationship with another table. This design will generate a lot more work every time you need to store and retrieve data. Unfortunately, unless you change your model, there's little you can do to improve those queries.
EDIT AS OF Jan 29, 2020:
From your description, jobs and tasks seem to be two different entities. If that's the case, instead of using a single table to store both of them it's advisable to use two.
For example:
create table prevent (
id int primary key not null,
eq int,
eqName varchar(50),
timeFrame int,
pmTask varchar(50)
);
create table task (
prevent_id int not null,
task_number int not null,
name varchar(100),
primary key (prevent_id, task_number),
constraint fk_task_job1 foreign key (prevent_id) references prevent (id)
);
insert into prevent (id, eq, eqName, timeFrame, pmTask)
values (910, 910, 'Boiler', 90, 'Boiler Quarterly Check');
insert into task (prevent_id, task_number, name) values (910, 1, 'job work tasks');
insert into task (prevent_id, task_number, name) values (910, 2, 'check oil levels');
insert into task (prevent_id, task_number, name) values (910, 3, 'belt tension');
Then your query would be:
select
t.task_number,
t.name
from prevent j
join task t on t.prevent_id = j.id
where t.name <> ''
and j.id = ?
and j.pmTask like ?
order by t.task_number

Related

Can PostgreSQL ON CONFLICT combine with JSON objects?

I wanted to perform a conditional insert in PostgreSQL. Something like:
INSERT INTO {TABLE_NAME} (user_id, data) values ('{user_id}', '{data}')
WHERE not exists(select 1 from files where user_id='{user_id}' and data->'userType'='Type1')
Unfortunately, INSERT and WHERE do not cooperate in PostgreSQL. What could be a suitable syntax for my query? I was considering ON CONFLICT, but couldn't find the syntax for using it with a JSON object (data in the example).
Is it possible?
Rewrite the VALUES part to a SELECT, then you can use a WHERE condition:
INSERT INTO { TABLE_NAME } ( user_id, data )
SELECT
user_id,
data
FROM
( VALUES ( '{user_id}', '{data}' ) ) sub ( user_id, data )
WHERE
NOT EXISTS (
SELECT 1
FROM files
WHERE user_id = '{user_id}'
AND data -> 'userType' = 'Type1'
);
But there is NO guarantee that the WHERE condition works! Another transaction that has not been committed yet is invisible to this query. This could lead to data quality issues.
You can use INSERT ... SELECT ... WHERE ....
INSERT INTO elbat
(user_id,
data)
SELECT 'abc',
'xyz'
WHERE NOT EXISTS (SELECT *
FROM files
WHERE user_id = 'abc'
AND data->>'userType' = 'Type1')
And it looks like you're creating the query in a host language. Don't use string concatenation or interpolation to get the values into it. That's error prone and makes your application vulnerable to SQL injection attacks. Look up how to use parameterized queries in your host language. Most likely, parameters cannot be used for the table name; you need some other method of either whitelisting the names or properly quoting them.
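To see the same parameter-binding idea expressed purely in SQL, here is a rough sketch using a server-side prepared statement (assuming the target table is files, user_id is bigint and data is jsonb; adjust to your schema):
PREPARE insert_if_absent (bigint, jsonb) AS
INSERT INTO files (user_id, data)
SELECT $1, $2
WHERE NOT EXISTS (SELECT *
FROM files
WHERE user_id = $1
AND data->>'userType' = 'Type1');
EXECUTE insert_if_absent (42, '{"userType": "Type2"}');
A client library's parameterized query does this binding for you; the point is simply that the values never become part of the SQL text.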

Is it possible to look at the output of the previous row of a PostgreSQL query?

This is the question: Is it possible to look at the outputs, what has been selected, from the previous row of a running SQL query in Postgres?
I know that lag exists to look at the inputs, the "from" of the query. I also know that a CTE, subquery or lateral join can solve most issues of this kind. But I think the problem I'm facing genuinely requires a peek at the output of the previous row. Why? Because the output of the current row depends on a constant from a lookup table, and the value used to look up that constant is an aggregate of all the previous rows. And if that lookup returns the wrong constant, all subsequent rows will be increasingly off from the expected value.
The whole rest of this text is a simplified example based on the problem I'm facing. It should be possible to feed it into PostgreSQL 12 and above and play around. I'm terribly sorry that it is as complicated as it is, but I think it is the simplest I can make it while still retaining the core issue: a lookup in a lookup table based on an aggregate over all previous rows, plus the fact that the "inventory" being tracked is modeled as a series of transactions of two discrete types.
The database itself exists to keep track of multiple fish farms, or cages full of fish. Fish can be transferred between these farms, and the farms are fed roughly daily. Why not just carry the aggregate as a field in the table? Because it should be possible to switch out the lookup table after the season is over, to adjust it to better match reality.
-- A listing of all groups of fish ever grown.
create table farms (
id bigserial primary key,
start timestamp not null,
stop timestamp
);
insert into farms
(id, start)
values (
1, '2021-02-01T13:37'
);
-- A transfer of fish from one farm to another.
-- If the source is null the fish is transferred from another fishery outside our system.
-- If the destination is null the fish is being slaughtered, removed from the system.
create table transfers (
source bigint references farms(id),
destination bigint references farms(id),
timestamp timestamp not null default current_timestamp,
total_weight_g bigint not null constraint positive_nonzero_total_weight_g check (total_weight_g > 0),
average_weight_g bigint not null constraint positive_nonzero_average_weight_g check (average_weight_g > 0),
number_fish bigint generated always as (total_weight_g / average_weight_g) stored
);
insert into transfers
(source, destination, timestamp, total_weight_g, average_weight_g)
values
(null, 1, '2021-02-01T16:38', 5, 5),
(null, 1, '2021-02-15T16:38', 500, 500);
-- Transactions of fish feed into a farm.
create table feedings (
id bigserial primary key,
growth_table bigint not null,
farm bigint not null references farms(id),
amount_g bigint not null constraint positive_nonzero_amunt_g check (amount_g > 0),
timestamp timestamp not null
);
insert into feedings
(farm, growth_table, amount_g, timestamp)
values
(1, 1, 1, '2021-02-02T13:37'),
(1, 1, 1, '2021-02-03T13:37'),
(1, 1, 1, '2021-02-04T13:37'),
(1, 1, 1, '2021-02-05T13:37'),
(1, 1, 1, '2021-02-06T13:37'),
(1, 1, 1, '2021-02-07T13:37');
create view combined_feed_and_transfer_history as
with transfer_history as (
select timestamp, destination as farm, total_weight_g, average_weight_g, number_fish
from transfers as deposits
where deposits.destination = 1 -- TODO: This view only works for one farm, fix that.
union all
select timestamp, source as farm, -total_weight_g, -average_weight_g, -number_fish
from transfers as withdrawals
where withdrawals.source = 1
)
select timestamp, farm, total_weight_g, number_fish, average_weight_g, null as growth_table
from transfer_history
union all
select timestamp, farm, amount_g, 0 as number_fish, 0 as average_weight_g, growth_table
from feedings
order by timestamp;
-- Conversion tables from feed to gained weight.
create table growth_coefficients (
growth_table bigserial not null,
average_weight_g bigint not null constraint positive_nonzero_weight check (average_weight_g > 0),
feed_conversion_rate double precision not null constraint positive_foderkonverteringsfaktor check (feed_conversion_rate >= 0),
primary key(growth_table, average_weight_g)
);
insert into growth_coefficients
(average_weight_g, feed_conversion_rate, growth_table)
values
(5.00,0.10,1),
(10.00,10.00,1),
(20.00,1.30,1),
(50.00,1.31,1),
(100.00,1.32,1),
(300.00,1.36,1),
(600.00,1.42,1),
(1000.00,1.50,1),
(1500.00,1.60,1),
(2000.00,1.70,1),
(2500.00,1.80,1),
(3000.00,1.90,1),
(4000.00,2.10,1),
(5000.00,2.30,1);
-- My current solution is a bad one. It uses a CTE that sums over all events but does not account
-- for the feed conversion rate. That means that the average weight used to look up the feed
-- conversion rate will diverge more and more from reality the further into the season time goes.
-- This is why it is important to look at the output, the average weight, of the previous row.
-- We start by summing up all the transfer and feed events to get a rough average_weight_g.
with estimate as (
select
timestamp,
farm,
total_weight_g as transaction_size_g,
growth_table,
sum(total_weight_g) over (order by timestamp) as sum_weight_g,
sum(number_fish) over (order by timestamp) as sum_number_fish,
sum(total_weight_g) over (order by timestamp) / sum(number_fish) over (order by timestamp) as average_weight_g
from
combined_feed_and_transfer_history
)
select
timestamp,
sum_number_fish,
transaction_size_g as trans_g,
sum_weight_g,
closest_lookup_table_weight.average_weight_g as lookup_g,
converted_weight_g as conv_g,
sum(converted_weight_g) over (order by timestamp) as sum_conv_g,
sum(converted_weight_g) over (order by timestamp) / sum_number_fish as sum_average_g
from
estimate
join lateral ( -- We then use this estimated_average_weight to look up the closest constant in the growth coefficient table.
(select gc.average_weight_g - estimate.average_weight_g as diff, gc.average_weight_g from growth_coefficients gc where gc.average_weight_g >= estimate.average_weight_g order by gc.average_weight_g asc limit 1)
union all
(select estimate.average_weight_g - gc.average_weight_g as diff, gc.average_weight_g from growth_coefficients gc where gc.average_weight_g <= estimate.average_weight_g order by gc.average_weight_g desc limit 1)
order by diff
limit 1
) as closest_lookup_table_weight
on true
join lateral ( -- If the historical event is a feeding we need to lookup the feed conversion rate.
select case when growth_table is null then 1
else (select feed_conversion_rate
from growth_coefficients gc
where gc.growth_table = estimate.growth_table
and gc.average_weight_g = closest_lookup_table_weight.average_weight_g)
end as feed_conversion_rate
) as growth_coefficient
on true
join lateral (
select feed_conversion_rate * transaction_size_g as converted_weight_g
) as converted_weight_g
on true;
At the very bottom is my current "solution". With the above example data the sum_conv_g should end up being 5.6, but because the aggregate used for the lookup does not account for the conversion rate, sum_conv_g ends up as 45.2 instead.
One idea I had was whether there is perhaps something like query-local variables one could use to carry the sum_average_g between rows. There's always the escape hatch of just querying the transactions out into my general-purpose programming language, Clojure, and solving it there, but it would be neat if it could be solved entirely within the database.
You have to formulate a recursive subquery. I posted a simplified version of this question over at the DBA SE and got the answer there. The answer to that question can be found here and can be expanded to this more complicated question, though I would wager that no one will ever have the interest to do that.
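For readers landing here, the core of that pattern is a recursive CTE in which each step can read the previous step's computed output. A minimal sketch against the feedings table above (a plain running total, not the full conversion-rate logic) could look like this:
with recursive events as (
select row_number() over (order by timestamp) as rn, timestamp, amount_g
from feedings
),
running as (
select rn, timestamp, amount_g, amount_g::numeric as running_total
from events
where rn = 1
union all
select e.rn, e.timestamp, e.amount_g, r.running_total + e.amount_g
from running r
join events e on e.rn = r.rn + 1
)
select * from running order by rn;
In the real solution the recursive member would also perform the growth_coefficients lookup using the previous row's running values, which is exactly the "look at the previous row's output" step a window function cannot express.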

Is it possible to bulk update specific values in postgresql efficiently?

I have created a pipeline which needs to update a high number of rows in Postgres, where each row should be updated differently.
After some research I found that this could be done using Postgres' UPDATE ... FROM ... syntax (https://www.postgresql.org/docs/current/sql-update.html), and I came up with the following query that works perfectly fine:
update grades
set course_id = data_table.course_id,
student_id = data_table.student_id,
grade = data_table.grade
from
(select unnest(array[1,2]) as id, unnest(array['Math', 'Math']) as course_id, unnest(array[1000, 1001]) as student_id, unnest(array[95, 100]) as grade) as data_table
where grades.id = data_table.id;
There's also another way to do it with WITH syntax like this:
update grades
set course_id = data_table.course_id,
student_id = data_table.student_id,
grade = data_table.grade
from
(WITH vals (id, course_id, student_id, grade) as (VALUES (1, 'Math', 1000, 95), (2, 'Math', 1001, 100)) SELECT * from vals) as data_table
where grades.id = data_table.id;
My problem is that in some rows I want to update a field and in others I don't. When I don't want to update a field, I just want to keep the value that is currently in the table. In that case, I would potentially want to do something like:
update grades g
set course_id = data_table.course_id,
student_id = data_table.student_id,
grade = data_table.grade
from
(select unnest(array[1,2]) as id, unnest(array[g.course_id, 'Math2']) as course_id, unnest(array[1000, 1001]) as student_id, unnest(array[95, g.grade]) as grade) as data_table
where grades.id = data_table.id;
However this is not possible and I get back the error HINT: There is an entry for table "g", but it cannot be referenced from this part of the query.
Also postgresql documentation specifies about it in the From description:
Note that the target table must not appear in the from_list,
unless you intend a self-join (in which case it must appear with an alias in the from_list).
Does anyone know if there's a way to perform such a bulk update?
I've tried to use JOINs in the inner query but with no luck.
Choose a value that cannot be a valid value, e.g. '-1' for the course name and -1 for the grade, and use that for your generated values, then use a CASE in the UPDATE to decide whether to keep the current value or not:
update grades g
set course_id = case when data_table.course_id = '-1' then g.course_id else data_table.course_id end,
student_id = data_table.student_id,
grade = case when data_table.grade = -1 then g.grade else data_table.grade end
from (
select
unnest(array[1,2]) as id,
unnest(array['-1', 'Math2']) as course_id, -- use '-1' instead of g.course_id
unnest(array[1000, 1001]) as student_id,
unnest(array[95, -1]) as grade -- use -1 instead of g.grade
) as data_table
where g.id = data_table.id;
Pick whatever values you like for the impossible value.
If nulls were not allowed in those columns, it would be more straightforward and less code: use null as the impossible value and coalesce() it with the current value in the update expression.
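For completeness, a hedged sketch of that coalesce() variant, assuming null never occurs as real data in those columns so it can act as the "keep the current value" marker (student_id omitted for brevity):
update grades g
set course_id = coalesce(data_table.course_id, g.course_id),
grade = coalesce(data_table.grade, g.grade)
from (
select
unnest(array[1, 2]) as id,
unnest(array[null, 'Math2']) as course_id, -- null means "leave unchanged"
unnest(array[95, null]) as grade
) as data_table
where g.id = data_table.id;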

I'm trying to insert tuples into table A (from table B) if the primary key of the table B tuple doesn't exist in table A

Here is what I have so far:
INSERT INTO Tenants (LeaseStartDate, LeaseExpirationDate, Rent, LeaseTenantSSN, RentOverdue)
SELECT CURRENT_DATE, NULL, NewRentPayments.Rent, NewRentPayments.LeaseTenantSSN, FALSE from NewRentPayments
WHERE NOT EXISTS (SELECT * FROM Tenants, NewRentPayments WHERE NewRentPayments.HouseID = Tenants.HouseID AND
NewRentPayments.ApartmentNumber = Tenants.ApartmentNumber)
So, HouseID and ApartmentNumber together make up the primary key. If there is a tuple in table B (NewRentPayments) that doesn't exist in table A (Tenants) based on the primary key, then it needs to be inserted into Tenants.
The problem is, when I run my query, it doesn't insert anything (I know for a fact there should be 1 tuple inserted). I'm at a loss, because it looks like it should work.
Thanks.
Your subquery was not correlated; it was just a non-correlated join query.
As per the description of your problem, you don't need that join.
Try this:
insert into Tenants (LeaseStartDate, LeaseExpirationDate, Rent, LeaseTenantSSN, RentOverdue)
select current_date, null, p.Rent, p.LeaseTenantSSN, FALSE
from NewRentPayments p
where not exists (
select *
from Tenants t
where p.HouseID = t.HouseID
and p.ApartmentNumber = t.ApartmentNumber
)

Can Entity Framework assign wrong Identity column value in case of high concurrency additions

We have an auto-increment Identity column Id as part of our user object. For a campaign we just did for a client we had up to 600 signups per minute. This is the code block doing the addition:
using (var ctx = new {{ProjectName}}_Entities())
{
int userId = ctx.Users.Where(u => u.Email.Equals(request.Email)).Select(u => u.Id).SingleOrDefault();
if (userId == 0)
{
var user = new User() { /* Initializing user properties here */ };
ctx.Users.Add(user);
ctx.SaveChanges();
userId = user.Id;
}
...
}
Then we use the userId to insert data into another table. What happened during high load is that there were multiple rows with the same userId even though there shouldn't be. It seems like the above code returned the same Identity (int) number for multiple inserts.
I read through a few blog/forum posts saying that there might be an issue with SCOPE_IDENTITY(), which Entity Framework uses to return the auto-increment value after an insert.
They say a possible workaround would be writing an insert procedure for User with INSERT ... OUTPUT INSERTED.Id, which I'm familiar with.
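For example, something along these lines (illustrative only, with just the Email column shown; the real User table has more columns):
CREATE PROCEDURE [dbo].[p_CreateUser]
@Email nvarchar(256)
AS
BEGIN
INSERT INTO Users (Email)
OUTPUT INSERTED.Id -- returns the generated identity value as a result set
VALUES (@Email);
END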
Anybody else experienced this issue? Any suggestion on how this should be handled with Entity Framework?
UPDATE 1:
After further analyzing the data I'm almost 100% positive this is the problem. The Identity column skipped auto-increment values 48 times in total (e.g. 2727, 2728 missing, 2729, ...), and we have exactly 48 duplicates in the other table.
It seems like EF returned a random Identity value for each row it wasn't able to insert for some reason.
Anybody have any idea what could possible be going on here?
UPDATE 2:
Possibly important info I didn't mention is that this happened on Azure Website with Azure SQL. We had 4 instances running at the time it happened.
UPDATE 3:
Stored Proc:
CREATE PROCEDURE [dbo].[p_ClaimCoupon]
@CampaignId int,
@UserId int,
@Flow tinyint
AS
DECLARE @myCoupons TABLE
(
[Id] BIGINT NOT NULL,
[Code] CHAR(11) NOT NULL,
[ExpiresAt] DATETIME NOT NULL,
[ClaimedBefore] BIT NOT NULL
)
INSERT INTO @myCoupons
SELECT TOP(1) c.Id, c.Code, c.ExpiresAt, 1
FROM Coupons c
WHERE c.CampaignId = @CampaignId AND c.UserId = @UserId
DECLARE @couponCount int = (SELECT COUNT(*) FROM @myCoupons)
IF @couponCount > 0
BEGIN
SELECT *
FROM @myCoupons
END
ELSE
BEGIN
UPDATE TOP(1) Coupons
SET UserId = @UserId, IsClaimed = 1, ClaimedAt = GETUTCDATE(), Flow = @Flow
OUTPUT DELETED.Id, DELETED.Code, DELETED.ExpiresAt, CAST(0 AS BIT) as [ClaimedBefore]
WHERE CampaignId = @CampaignId AND IsClaimed = 0
END
RETURN 0
Called like this from the same EF context:
var coupon = ctx.Database.SqlQuery<CouponViewModel>(
"EXEC p_ClaimCoupon #CampaignId, #UserId, #Flow",
new SqlParameter("CampaignId", {{CampaignId}}),
new SqlParameter("UserId", {{userId}}),
new SqlParameter("Flow", {{Flow}})).FirstOrDefault();
No, that's not possible. For one, that would be an egregious bug in EF; you are not the first one to put this kind of insert load on it. Also, SCOPE_IDENTITY is explicitly safe and is the recommended practice.
These statements apply to the case where you are using a SQL Server IDENTITY column as the ID.
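For reference, the SCOPE_IDENTITY pattern being referred to looks roughly like this (illustrative; assuming a Users table with an identity Id and an Email column, with @Email supplied as a parameter):
INSERT INTO Users (Email) VALUES (@Email);
SELECT SCOPE_IDENTITY() AS Id; -- only returns identity values generated in the current scope, so another session's insert cannot leak in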
I admit I don't know how Azure SQL Database synchronizes the generation of unique, sequential IDs, but intuitively it must be costly, especially at your rates.
If non-sequential IDs are an option, you might want to consider generating UUIDs at the application level. I know this doesn't answer your direct question, but it would improve performance (unverified) and bypass your problem.
Update: Scratch that. Azure SQL Database isn't distributed; it's simply replicated from a single primary node. So there is no real performance gain to expect from alternatives to IDENTITY keys, and supposedly the number of instances is not significant to your problem.
I think your problem may be here:
UPDATE TOP(1) Coupons
SET UserId = #UserId, IsClaimed = 1, ClaimedAt = GETUTCDATE(), Flow = #Flow
OUTPUT DELETED.Id, DELETED.Code, DELETED.ExpiresAt, CAST(0 AS BIT) as [ClaimedBefore]
WHERE CampaignId = #CampaignId AND IsClaimed = 0
This will update the UserId of the first record it finds in the campaign that hasn't been claimed. It doesn't look robust to me in the event that inserting a user failed. Are you sure that is correct?