How to delete multiple identical rows with Entity Framework

How to delete multiple identical rows with Entity Framework - entity-framework

Some of the records in my table are duplicates. When attempting to delete them, I get this error:
Database operation expected to affect 1 row(s) but actually affected 2 row(s). Data may have been modified or deleted since entities were loaded.
The query being called is a simple "DELETE FROM [] WHERE []". I've tried using the Remove method, as well as setting EntityState.Deleted. In both cases it expects the delete call to only hit one row, but it hits two.
How do I make it delete any number of rows fitting the parameters I pass to it?

Apparently EF doesn't include any kind of batch delete, but there's an extension that does: http://entityframework-plus.net/
Can be found on NuGet as Z.EntityFramework.Plus
With that you can chain Where and Delete:
context.Table.Where(x => x.field == parameter).Delete();
This will work regardless of the number of rows and even if they are duplicates.

Related

Power Query - Appending two tables but the other table might be empty depending on the situation - throws an error in that case

I am working on a solution that involves merging two queries in Power Query to retrieve a single data table back to Excel. The first query is always populated but the other query comes from an ERP and might be empty (empty table) from time to time.
Appending the two queries involves making the header names the same in the two queries before the appending takes place. As the second query sometimes results in an empty table, the error arises in the steps when Power Query is modifying the header names in the second table (it cannot modify the header names as there are no headers).
"Error message: Expression.Error: The column 'PartMtl_Company' of the table wasn't found.
Details: PartMtl_Company" where the PartMtl_Company is the leftmost column in my table.
I am kind of thinking that I would need to evaluate whether the second table is empty and skip the renaming steps if that is the case. I assume merging the populated first table with an empty table would cause no problem and would only result in the first table. I have tried to look around for a suitable M-code but have not come across such.

I'm thinking you might be able to use Table.RowCount to solve this. Something along the lines of:
= if Table.RowCount(Table2) > 0 then...
You would modify the headers only if there is data in the second table. Same goes for the appending of the tables: you would only append if there is data in the second table, since you won't have renamed any headers otherwise.

Thank you Marc! That did the trick.
In the end, I wrote some in the lines of
= if Table.RowCount(Table2) > 0 then... (code that works on a non-empty table) ...else Table2
, which returns the empty table if it is empty to begin with. Appending the second table into the first table did not throw an error but returned only the first table like planned.

Postgresql - Return column subset from cursor

I have a legacy stored procedure returning a number (row count) a cursor with many columns; I need to retrieve a subset of the selected columns. I can think of three ways of doing it:
Invoke the existing procedure from the outside, and map columns to my own data structures trimming unneeded columns;
Write a new stored procedure, mostly identical to the existing one but returning different columns;
Write a new stored procedure, invoking the old one internally and filtering columns (the referenced entities and thus the number of rows are exactly the same as the existing procedure).
Number 2 is obviously a no-go.
Number 1 is viable. As far as I know, there is little difference in the computing cost between retrieving one or more columns, in that the engine has to read full rows regardless, before filtering unrequired columns; I do have a feeling it would be heavier on the runtime invoking the procedure from the outside, as objects representing unneeded columns would exist on returning from the DB call.
I would be interested in implementing Number 3, but I would prefer to maintain the same return type as the existing function (count + refcursor) for conformity.
I think I could transfer all the rows in the cursor returned by the existing function into a temporary table as described e.g. in this question, and use it as a source for the output cursor but:
I am not sure of how the output cursor would behave with a temporary table created with a drop-on-commit clause (would the results exist reliably after the procedure has terminated? Would the temporary table be dropped as expected?);
I read that temporary tables are expensive to use, and it feels like overkill for what in the end is a filtering of columns on the same rows from a pre-computed result.
Is there a way to query the existing cursor so that it may be used as a source for the output cursor, while filtering columns?

Postgres 'if not exists' fails because the sequence exists

I have several counters in an application I am building, as am trying to get them to be dynamically created by the application as required.
For a simplistic example, if someone types a word into a script it should return the number of times that word has been entered previously. Here is an example of sql that may be executed if they typed the word example.
CREATE SEQUENCE IF NOT EXISTS example START WITH 1;
SELECT nextval('example')
This would return 1 the first time it ran, 2 the second time, etc.
The problem is when 2 people click the button at the same time.
First, please note that a lot more is happening in my application than just these statements, so the chances of them overlapping is much more significant than it would be if this was all that was happening.
1> BEGIN;
2> BEGIN;
1> CREATE SEQUENCE IF NOT EXISTS example START WITH 1;
2> CREATE SEQUENCE IF NOT EXISTS example START WITH 1; -- is blocked by previous statement
1> SELECT nextval('example') -- returns 1 to user.
1> COMMIT; -- unblocks second connection
2> ERROR: duplicate key value violates unique constraint
"pg_type_typname_nsp_index"
DETAIL: Key (typname, typnamespace)=(example, 109649) already exists.
I was under the impression that by using "IF NOT EXISTS", the statement should just be a no-op if it does exist, but it seems to have this race condition where that is not the case. I say race condition because if these two are not executed at the same time, it works as one would expect.
I have noticed that IF NOT EXISTS is fairly new to postgres, so maybe they haven't worked out all of the kinks yet?
EDIT:
The main reason we were considering doing things this way was to avoid excess locking. The thought being that if two people were to increment at the same time, using a sequence would mean that neither user should have to wait for the other (except, as in this example, for the initial creation of that sequence)

Sequences are part of the database schema. If you find yourself modifying the schema dynamically based on the data stored in the database, you are probably doing something wrong. This is especially true for sequences, which have special properties e.g. regarding their behavior with respect to transactions. Specifically, if you increment a sequence (with the help of nextval) in the middle of a transaction and then you rollback that transaction, the value of the sequence will not be rolled back. So most likely, this kind of behavior is something that you don't want with your data. In your example, imagine that a user tries to add word. This results in the corresponding sequence being incremented. Now imagine that the transaction does not complete for reason (e.g. maybe the computer crashes) and it gets rolled back. You would end up with the word not being added to the database but with the sequence being incremented.
For the particular example that you mentioned, there is an easy solution; create an ordinary table to store all the "sequences". Something like that would do it:
CREATE TABLE word_frequency (
word text NOT NULL UNIQUE,
frequency integer NOT NULL
);
Now I understand that this is just an example, but if this approach doesn't work for your actual use case, let us know and we can adjust it to your needs.
Edit: Here's how you the above solution works. If a new word is added, run the following query ("UPSERT" syntax in postgres 9.5+ only):
INSERT INTO word_frequency(word,frequency)
VALUES ('foo',1)
ON CONFLICT (word)
DO UPDATE
SET frequency = word_frequency.frequency + excluded.frequency
RETURNING frequency;
This query will insert a new word in word_frequency with frequency 1, or if the word exists already it will increment the existing frequency by 1. Now what happens if two transaction try to do that at the same time? Consider the following scenario:
client 1 client 2
-------- --------
BEGIN
BEGIN
UPSERT ('foo',1)
UPSERT ('foo',1) <====
COMMIT
COMMIT
What will happen is that as soon as client 2 tries increment the frequency for foo (marked with the arrow above), that operation will block because the row was modified by a different transaction. When client 1 commits, client 2 will get unblocked and continue without any errors. This is exactly how we wanted it to work. Also note, that postgresql will use row-level locking to implement this behavior, so other insertions will not be blocked.

EDIT: The main reason we were considering doing things this way was to
avoid excess locking. The thought being that if two people were to
increment at the same time, using a sequence would mean that neither
user should have to wait for the other (except, as in this example,
for the initial creation of that sequence)
It sounds like you're optimizing for a problem that likely does not exist. Sure, if you have 100,000 simultaneous users that are only inserting rows (since a sequence will only be used then normally) there is the possibility of some contention with the sequence but realistically there will be other bottle necks long before the sequence gets in the way.
I'd advise you to first prove that the sequence is an issue. With a proper database design (which dynamic DDL is not) the sequence will not be the bottle neck.
As a reference, DDL is not transaction safe in most databases.

how to avoid cross-mapping in SQL query while using 2 lists with IN opeartor?

I have a table HistoryRecords which has two columns recordName and timeStamp. I have to delete the records based on both of these. I had used the following query to delete 2 records <'abc', '2010/10/20 19:39:20.0'> and <'def', '2010/10/25 17:43:3.0'> :
DELETE FROM HistoryRecords WHERE (recordName IN (N'abc',N'def') AND timeStamp IN (N'2010/10/20 19:39:20.0',N'2010/10/25 17:43:3.0'))
The problem is that the above query leads to deletion of other records also like <'abc', '2010/10/25 17:43:3.0'> because the list of recordNames contains 'abc' and list of timestamp contains '2010/10/25 17:43:3.0'.
Please let me know any approach that will prevent this extra unintended deletion.

You should seprate cases. Dont combine
delete from historyRecords
where (recName='abc' and timeStamp='2010...19:39')
OR
(recName='def' and timeStamp='2010...17:43')

Does DataReader.NextResult retrieves the result is always the same order

I have a SELECT query that yields multiple results and do not have any ORDER BY clause.
If I execute this query multiple times and then iterate through results using DataReader.NextResult(), would I be guaranteed to get the results in the same order?
For e.g. if I execute the following query that return 199 rows:
SELECT * FROM products WHERE productid < 200
would I always get the first result with productid = 1 and so on?
As far as I have observed it always return the results in same order, but I cannot find any documentation for this behavior.
======================================
As per my research:
Check out this blog Conor vs. SQL. I actually wanted to ask if the query-result changes even if the data in table remains the same (i.e no update or delete). But it seems like in case of large table, when SQL server employees parallelism, the order can be different

First of all, to iterate the rows in a DataReader, you should call Read, not NextResult.
Calling NextResult will move to the next result set if your query has multiple SELECT statements.
To answer your question, you must not rely on this.
A query without an ORDER BY clause will return rows in SQL Server's default iteration order.
For small tables, this will usually be the order in which the rows were added, but this is not guaranteed and is liable to change at any time. For example, if the table is indexed or partitioned, the order will be different.

No, DataReader will return the results in the order they come back from SQL. If you don't specify an ORDER BY clause, that will be the order that they exist in the table.

It is possible, perhaps even likely that they will always return in the same order, but this isn't guaranteed. The order is determined by the queryplan (at least in SQL Server) on the database server. If something changes that queryplan, the order could change. You should always use ORDER BY if the order of results is in anyway important to your processing of the data.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

How to delete multiple identical rows with Entity Framework - entity-framework

Related

Power Query - Appending two tables but the other table might be empty depending on the situation - throws an error in that case

Postgresql - Return column subset from cursor

Postgres 'if not exists' fails because the sequence exists

how to avoid cross-mapping in SQL query while using 2 lists with IN opeartor?

Does DataReader.NextResult retrieves the result is always the same order

Categories

Resources