Double inserting records in Flask-SQLAlchemy connected with PostgreSQL?

Very rarely, I run into a problem where a record I inserted into the table Tbl_CUSTOMER is duplicated, each copy getting its own auto-generated ID from Postgres.
I have no firm idea of the cause, but I suspected it might be related to Postgres vacuum running at the same time. To confirm that, I ran a vacuum while inserting records, but the problem did not occur, so I could not reproduce the issue to find the root cause and fix it.
models.py
class Tbl_CUSTOMER(db.Model):
    ID = db.Column(db.Numeric(25, 9), primary_key=True, autoincrement=True)
    PotentialCustomer = db.Column(db.String(12))
    FirstNameEn = db.Column(db.String(35))
    LastNameEn = db.Column(db.String(35))
    FirstNameKh = db.Column(db.String(35))
    LastNameKh = db.Column(db.String(35))
    Salutation = db.Column(db.String(4))
    Gender = db.Column(db.String(6))
    DateOfBirth = db.Column(db.String(10))
    CountryOfBirth = db.Column(db.String(2))
    Nationality = db.Column(db.String(2))
    ProvinceOfBirth = db.Column(db.String(3))
views.py
dataInsert = Tbl_CUSTOMER(
    PotentialCustomer = request.form['PotentialCustomer'],
    FirstNameEn = request.form['FirstNameEn'],
    LastNameEn = request.form['LastNameEn'],
    FirstNameKh = request.form['FirstNameKh'],
    LastNameKh = request.form['LastNameKh'],
    Salutation = request.form['Salutation'],
    Gender = request.form['Gender'],
    DateOfBirth = request.form['DateOfBirth'],
    CountryOfBirth = request.form['CountryOfBirth'],
    Nationality = request.form['Nationality'],
    ProvinceOfBirth = request.form['ProvinceOfBirth']
)
db.session.add(dataInsert)
db.session.commit()
This problem does not happen frequently. So, what is the problem, and how can I fix it to prevent it from happening in the future? Thanks.

If you create a unique key (or replace your primary key) with a hash value computed from all the values of your row, that may help you see when this problem is happening. Using this hash column you will be able to decide what should happen when your system gets the same value (the same hash). One option, for example, is simply to ignore the new row and keep the old one. Another is to overwrite it, etc.
The chance of getting the same hash value from different rows is so small that I would not even consider it. See this thread https://crypto.stackexchange.com/questions/1170/best-way-to-reduce-chance-of-hash-collisions-multiple-hashes-or-larger-hash if you want more details about that.
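A minimal sketch of that idea, assuming the same Flask-SQLAlchemy setup as in the question (the RowHash column, the compute_row_hash helper, and the CUSTOMER_FIELDS list are illustrative names, not part of the original code):

import hashlib
from sqlalchemy.exc import IntegrityError

# Added to the Tbl_CUSTOMER model above: a hash of all business fields.
# The unique index makes Postgres reject a second, identical insert
# instead of silently storing a duplicate under a new auto ID.
RowHash = db.Column(db.String(64), unique=True, index=True)

CUSTOMER_FIELDS = ['PotentialCustomer', 'FirstNameEn', 'LastNameEn', 'FirstNameKh',
                   'LastNameKh', 'Salutation', 'Gender', 'DateOfBirth',
                   'CountryOfBirth', 'Nationality', 'ProvinceOfBirth']

def compute_row_hash(form):
    # Hash the submitted values in a fixed field order (hypothetical helper).
    payload = '|'.join(form.get(f, '') for f in CUSTOMER_FIELDS)
    return hashlib.sha256(payload.encode('utf-8')).hexdigest()

# In the view, let the unique index decide what happens on a duplicate:
values = {f: request.form[f] for f in CUSTOMER_FIELDS}
dataInsert = Tbl_CUSTOMER(RowHash=compute_row_hash(request.form), **values)
db.session.add(dataInsert)
try:
    db.session.commit()
except IntegrityError:
    db.session.rollback()  # same hash already stored: keep the existing row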

Related

Is there a way to expand dynamically tables found in multiple columns using Power Query?

I have used List.Accumulate() to merge multiple tables. This is the output I've got in this simple example:
Now, I need a solution to expand all of these with a formula, because in the real world I need to merge multiple tables that keep increasing in number (think Eurostat tables, for instance), and modifying the code manually wastes a lot of time in these situations.
I have been trying to solve it, but the complexity of the syntax quickly becomes the major limitation. For instance, if I add a new step that nests Table.ExpandTableColumns() inside another List.Accumulate(), I need to pass the column name of an inner table as text. Fine, but to actually drill down I first need to pass the current column name in [] on each iteration - for instance, Column1 - and it triggers an error if I store the column names in a list, because those are quoted strings. I also experimented with Table.TransformColumns(), but that didn't work either.
Does anyone know how to solve this problem whatever the approach?
See https://blog.crossjoin.co.uk/2014/05/21/expanding-all-columns-in-a-table-in-power-query/
which boils down to this function (note that the recursive call at the end assumes the query itself is named ExpandAll):
let
    Source = (TableToExpand as table, optional ColumnNumber as number) =>
    //https://blog.crossjoin.co.uk/2014/05/21/expanding-all-columns-in-a-table-in-power-query/
    let
        ActualColumnNumber = if (ColumnNumber = null) then 0 else ColumnNumber,
        ColumnName = Table.ColumnNames(TableToExpand){ActualColumnNumber},
        ColumnContents = Table.Column(TableToExpand, ColumnName),
        ColumnsToExpand = List.Distinct(List.Combine(List.Transform(ColumnContents, each if _ is table then Table.ColumnNames(_) else {}))),
        NewColumnNames = List.Transform(ColumnsToExpand, each ColumnName & "." & _),
        CanExpandCurrentColumn = List.Count(ColumnsToExpand) > 0,
        ExpandedTable = if CanExpandCurrentColumn then Table.ExpandTableColumn(TableToExpand, ColumnName, ColumnsToExpand, NewColumnNames) else TableToExpand,
        NextColumnNumber = if CanExpandCurrentColumn then ActualColumnNumber else ActualColumnNumber + 1,
        OutputTable = if NextColumnNumber > (Table.ColumnCount(ExpandedTable) - 1) then ExpandedTable else ExpandAll(ExpandedTable, NextColumnNumber)
    in
        OutputTable
in
    Source
Alternatively, unpivot all the table columns to get one column, then expand that value column:
ColumnsToExpand = List.Distinct(List.Combine(List.Transform(Table.Column(#"PriorStepNameHere", "ValueColumnNameHere"), each if _ is table then Table.ColumnNames(_) else {}))),
#"Expanded ColumnNameHere" = Table.ExpandTableColumn(#"PriorStepNameHere", "ValueColumnNameHere", ColumnsToExpand, ColumnsToExpand),

Querying for overlapping time ranges in SQLAlchemy and Postgres

I'm using Flask-SQLAlchemy to describe a Postgres database. Three related tables look like this (in part):
from sqlalchemy.dialects.postgresql import TSTZRANGE

class Shift(Base):
    __tablename__ = "shifts"
    id = db.Column(db.Integer, primary_key=True)
    hours = db.Column(TSTZRANGE, nullable=False)

class Volunteer(Base):
    __tablename__ = "volunteers"
    id = db.Column(db.Integer(), primary_key=True)
    shifts = db.relationship(
        "Shift",
        secondary="shift_assignments",
        backref=db.backref("volunteers", lazy="dynamic"),
    )

class ShiftAssignment(Base):
    __tablename__ = "shift_assignments"
    __table_args__ = (db.UniqueConstraint('shift_id', 'volunteer_id', name='_shift_vol_uc'),)
    id = db.Column(db.Integer, primary_key=True)
    shift_id = db.Column("shift_id", db.Integer(), db.ForeignKey("shifts.id"))
    volunteer_id = db.Column(
        "volunteer_id", db.Integer(), db.ForeignKey("volunteers.id")
    )
Now, I'm assigning a Volunteer to a new Shift and want to make sure that the volunteer isn't already committed to a different Shift at the same time.
I have tried this in a Volunteer instance method, but it's not working:
new_shift = db.session.get(Shift, new_shift_id)
if new_shift not in self.shifts:
    for shift in self.shifts:
        overlap = db.session.scalar(shift.hours.overlaps(new_shift.hours))
This results in the following exception:
'DateTimeTZRange' object has no attribute 'overlaps'
It seems like I should probably not even be doing this by iterating over the list anyway, but should be directly querying the DB to do the date-overlap math. So I guess I need to join the volunteers and shifts and then filter to find out if any shifts overlap with the target shift. But I can't figure out how to do that, and examples of overlaps and its RangeOperators friends are really thin on the ground.
Would appreciate a hand here.
It was much easier than I was making it. Again, this is in a Volunteer instance method.
new_shift = db.session.get(Shift, new_shift_id)
overlapping_shift = (
    db.session.query(Shift, ShiftAssignment)
    .join(ShiftAssignment)
    .filter(ShiftAssignment.volunteer_id == self.id)
    .filter(Shift.hours.overlaps(new_shift.hours))
    .first()
)
if overlapping_shift:
    print("overlap found")
Note that the query returns a (Shift, ShiftAssignment) tuple. We join the two appropriate tables and then filter twice, leaving any overlapping shifts assigned to the current volunteer.
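Wrapped up as the Volunteer instance method the question mentions, this might look roughly like the following (a sketch; the method name has_overlapping_shift is illustrative, not from the original post):

def has_overlapping_shift(self, new_shift_id):
    # Returns True if this volunteer already has a shift whose TSTZRANGE
    # overlaps the hours of the shift we are about to assign.
    new_shift = db.session.get(Shift, new_shift_id)
    overlapping = (
        db.session.query(Shift)
        .join(ShiftAssignment)
        .filter(ShiftAssignment.volunteer_id == self.id)
        .filter(Shift.hours.overlaps(new_shift.hours))
        .first()
    )
    return overlapping is not None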

Multiple conditions in postgresql_where?

postgresql_where is useful to get around the (in my opinion wrong, but apparently defined that way by the SQL standard) way in which Postgres handles uniqueness, where NULL values are always considered distinct from one another. A typical example is shown below, where no item can have identical name+purpose+batch_id values (and None/NULL is treated as one unique value thanks to the second index).
class Item(StoredObject, Base):
    batch_id = Column(Integer, ForeignKey('batch.id'))
    group_id = Column(Integer, ForeignKey('group.id'))
    name = Column(Text, nullable=False)
    purpose = Column(Text, nullable=False, default="")

    __table_args__ = (
        Index('idx_batch_has_value',
              'group_id', 'name', 'purpose', 'batch_id',
              unique=True,
              postgresql_where=(batch_id.isnot(None))),
        Index('idx_batch_has_no_value',
              'group_id', 'name', 'purpose',
              unique=True,
              postgresql_where=(batch_id.is_(None))),
    )
However, I want the same behaviour across two ids (batch_id and group_id), that is to say that no item can have identical name+purpose+batch_id+group_id values (None/Null is considered one unique value in both batch_id and group_id).
I can work around this by creating a 'default' batch/group object with a fixed ID (say 0). This means I'd have to ensure that the batch/group object exists, cannot be deleted, and that its id doesn't get re-appropriated for another 'real' batch/group object (not to mention I'd have to remember to reduce all counts by one when using/writing functions that count how many batches/groups I have).
Doable, and I'm about to do it now, but there must be a better way! Is there something like:
postgresql_where = (batch_id.isnot(None) AND group_id.isnot(None))
That would solve the problem where, in my opinion, it is meant to be solved, in the DB rather than in my model and/or initialization code.
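For what it's worth, postgresql_where accepts an arbitrary SQL expression, so several conditions can be combined with sqlalchemy.and_ (or the & operator). A minimal sketch against the Item model above (the index name is illustrative, and the remaining NULL combinations would need their own partial indexes in the same style):

from sqlalchemy import and_

__table_args__ = (
    Index('idx_batch_and_group_have_values',
          'group_id', 'name', 'purpose', 'batch_id',
          unique=True,
          # two conditions combined into a single partial-index predicate
          postgresql_where=and_(batch_id.isnot(None), group_id.isnot(None))),
)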

SQL update statements updates wrong fields

I have the following code in Postgres
select op.url from identity.legal_entity le
join identity.profile op on le.legal_entity_id = op.legal_entity_id
where op.global_id = '8wyvr9wkd7kpg1n0q4klhkc4g'
which returns 1 row.
Then I try to update the url field with the following:
update identity.profile
set url = 'htpp:sam'
where identity.profile.url in (
    select op.url from identity.legal_entity le
    join identity.profile op on le.legal_entity_id = op.legal_entity_id
    where global_id = '8wyvr9wkd7kpg1n0q4klhkc4g'
);
But the above ends up updating more than 1 row, actually all of the rows of the identity table.
I would assume that since the first Postgres statement returns one row, at most one row could be updated, but I am getting the wrong effect where all of the rows are being updated. Why? Please help a newbie fix the above update statement.
Your UPDATE changes every row in identity.profile whose url equals the single url returned by the subquery, and evidently that url value is shared by many rows (it is clearly not unique). Instead of using profile.url to identify the row you want to update, use the primary key. That is what it is there for.
So if the primary key column is called id, the statement could be modified to:
UPDATE identity.profile
SET ...
WHERE identity.profile.id IN (SELECT op.id FROM ...);
But you can do this much more simply in PostgreSQL with
UPDATE identity.profile op
SET url = 'htpp:sam'
FROM identity.legal_entity le
WHERE le.legal_entity_id = op.legal_entity_id
AND le.global_id = '8wyvr9wkd7kpg1n0q4klhkc4g';

Can Entity Framework assign wrong Identity column value in case of high concurrency additions

We have an auto-increment Identity column Id as part of my user object. For a campaign we just did for a client we had up to 600 signups per minute. This is the code block doing the addition:
using (var ctx = new {{ProjectName}}_Entities())
{
    int userId = ctx.Users.Where(u => u.Email.Equals(request.Email)).Select(u => u.Id).SingleOrDefault();
    if (userId == 0)
    {
        var user = new User() { /* Initializing user properties here */ };
        ctx.Users.Add(user);
        ctx.SaveChanges();
        userId = user.Id;
    }
    ...
}
Then we use the userId to insert data into another table. What happened during high load is that there were multiple rows with the same userId even though there shouldn't have been. It seems like the above code returned the same Identity (int) value for multiple inserts.
I read through a few blog/forum posts saying that there might be an issue with SCOPE_IDENTITY() which Entity Framework uses to return the auto-increment value after insert.
They say a possible workaround would be writing an insert procedure for User with INSERT ... OUTPUT INSERTED.Id, which I'm familiar with.
Anybody else experienced this issue? Any suggestion on how this should be handled with Entity Framework?
UPDATE 1:
After further analyzing the data I'm almost 100% positive this is the problem. The Identity column skipped auto-increment values 48 times in total - 2727, (2728 missing), 2729, ... - and we have exactly 48 duplicates in the other table.
It seems like EF returned a random Identity value for each row it wasn't able to insert for some reason.
Anybody have any idea what could possibly be going on here?
UPDATE 2:
Possibly important info I didn't mention is that this happened on an Azure Website with Azure SQL. We had 4 instances running at the time it happened.
UPDATE 3:
Stored Proc:
CREATE PROCEDURE [dbo].[p_ClaimCoupon]
    @CampaignId int,
    @UserId int,
    @Flow tinyint
AS
    DECLARE @myCoupons TABLE
    (
        [Id] BIGINT NOT NULL,
        [Code] CHAR(11) NOT NULL,
        [ExpiresAt] DATETIME NOT NULL,
        [ClaimedBefore] BIT NOT NULL
    )

    INSERT INTO @myCoupons
    SELECT TOP(1) c.Id, c.Code, c.ExpiresAt, 1
    FROM Coupons c
    WHERE c.CampaignId = @CampaignId AND c.UserId = @UserId

    DECLARE @couponCount int = (SELECT COUNT(*) FROM @myCoupons)

    IF @couponCount > 0
    BEGIN
        SELECT *
        FROM @myCoupons
    END
    ELSE
    BEGIN
        UPDATE TOP(1) Coupons
        SET UserId = @UserId, IsClaimed = 1, ClaimedAt = GETUTCDATE(), Flow = @Flow
        OUTPUT DELETED.Id, DELETED.Code, DELETED.ExpiresAt, CAST(0 AS BIT) as [ClaimedBefore]
        WHERE CampaignId = @CampaignId AND IsClaimed = 0
    END

    RETURN 0
Called like this from the same EF context:
var coupon = ctx.Database.SqlQuery<CouponViewModel>(
    "EXEC p_ClaimCoupon @CampaignId, @UserId, @Flow",
    new SqlParameter("CampaignId", {{CampaignId}}),
    new SqlParameter("UserId", {{userId}}),
    new SqlParameter("Flow", {{Flow}})).FirstOrDefault();
No, that's not possible. For one, that would be an egregious bug in EF. You are not the first one to put 600 inserts/second on it. Also, SCOPE_IDENTITY is explicitly safe and is the recommended practice.
These statements go for the case that you are using a SQL Server IDENTITY column as ID.
I admit I don't know how Azure SQL Database synchronizes the generation of unique, sequential IDs, but intuitively it must be costly, especially at your rates.
If non-sequential IDs are an option, you might want to consider generating UUIDs at the application level. I know this doesn't answer your direct question, but it would improve performance (unverified) and bypass your problem.
Update: Scratch that, Azure SQL Database isn't distributed, it's simply replicated from a single primary node. So, no real performance gain to expect from alternatives to IDENTITY keys, and supposedly the number of instances is not significant to your problem.
I think your problem may be here:
UPDATE TOP(1) Coupons
SET UserId = @UserId, IsClaimed = 1, ClaimedAt = GETUTCDATE(), Flow = @Flow
OUTPUT DELETED.Id, DELETED.Code, DELETED.ExpiresAt, CAST(0 AS BIT) as [ClaimedBefore]
WHERE CampaignId = @CampaignId AND IsClaimed = 0
This will update the UserId of the first record it finds in the campaign that hasn't been claimed. It doesn't look robust to me in the event that inserting a user failed. Are you sure that is correct?