I've created an entity data model for the following database tables in SQL Server Compact (SqlCe):
CREATE TABLE [test_vulnerabilities] (
[id] INTEGER PRIMARY KEY,
[description] NTEXT NOT NULL DEFAULT ''
);
CREATE TABLE [test_software_vulnerabilities]
(
[id] INTEGER PRIMARY KEY IDENTITY,
[vulnerability_id] INTEGER NOT NULL
REFERENCES [test_vulnerabilities]([id]),
[details] NTEXT NOT NULL DEFAULT ''
);
Entities (created by adding an entity model based on the existing database):
entity Vulnerability in set Vulnerabilities
Id int
Description string
Software ICollection<SoftwareVulnerability> - navigation property
entity SoftwareVulnerability in set SoftwareVulnerabilities
Id int
Details string
VulnerabilityId int
Vulnerability Vulnerability - navigation property
and executing the following query:
var query = (from v in entities.Vulnerabilities.Include("Software")
where v.Id == id && v.Software.Count > 0
select v);
it is very slow, because the generated SQL joins test_vulnerabilities with test_software_vulnerabilities using a LEFT OUTER JOIN.
Is there any way to simply say that I want only vulnerabilities with a non-empty software collection, so that an INNER JOIN is used?
Thanks!
No. You don't have control over the joins EF generates. You can try to reverse the query:
var query = (from s in entities.SoftwareVulnerabilities.Include("Vulnerability")
where s.VulnerabilityId == id
select s);
You will get all software vulnerabilities for your single expected vulnerability, with the vulnerability included. If the relation from SoftwareVulnerability to Vulnerability is correctly configured as mandatory, it should use an INNER JOIN.
I think this may be slow because you are using Count. I would just try .Any() here instead, as it will probably be much faster.
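A rough sketch of that suggestion, reusing the entity names from the question (untested against the asker's model):
var query = (from v in entities.Vulnerabilities.Include("Software")
             where v.Id == id && v.Software.Any() // typically translates to an EXISTS predicate instead of COUNT(*)
             select v);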
Suppose I have the following table structure:
create table PEOPLE (
ID integer not null primary key,
NAME varchar(100) not null
);
create table CHILDREN (
ID integer not null primary key,
PARENT_ID_1 integer not null references PEOPLE (id),
PARENT_ID_2 integer not null references PEOPLE (id)
);
and that I want to generate a list of the names of each person who is a parent [1]. In Slick I can write something like:
for {
parent <- people
child <- children if {
parent.id === child.parent_id_1 ||
parent.id === child.parent_id_2
}
} yield {
parent.name
}
and this generates the expected SQL:
select p.name
from people p, children c
where p.id = c.parent_id_1 or p.id = c.parent_id_2
However, this is not optimal: the OR part of the expression can cause horrendously slow performance in some DBMSes, which end up doing full-table scans to join on p.id even though there's an index there (see for example this bug report for H2). The general problem is that the query planner can't know if it's faster to execute each side of the OR separately and join the results back together, or simply do a full table scan [2].
I'd like to generate SQL that looks something like this, which then can use the (primary key) index as expected:
select p.name
from people p, children c
where p.id in (c.parent_id_1, c.parent_id_2)
My question is: how can I do this in Slick? The existing methods don't seem to offer a way:
ColumnExtensionMethods.in takes a Query as a parameter, but I don't have a Query; I have a Rep[Long] for each of my ID columns
ColumnExtensionMethods.inSet is for binding existing (known) literal arrays, not for joining to sets of columns
What I'd like to be able to write is something like this:
for {
parent <- people
child <- children
if parent.id in (child.parent_id_1, child.parent_id_2)
} yield {
parent.name
}
but that's not possible right now.
[1] My actual design is a little more complex than this, but it boils down to the same problem.
[2] Some DBMSes do have this optimisation for simple cases, e.g. OR-expansion in Oracle.
It turns out this isn't currently possible (as of Slick 3.2.3), so I've raised an issue on GitHub and submitted a pull request to add this functionality.
Here is what I have so far:
INSERT INTO Tenants (LeaseStartDate, LeaseExpirationDate, Rent, LeaseTenantSSN, RentOverdue)
SELECT CURRENT_DATE, NULL, NewRentPayments.Rent, NewRentPayments.LeaseTenantSSN, FALSE from NewRentPayments
WHERE NOT EXISTS (SELECT * FROM Tenants, NewRentPayments WHERE NewRentPayments.HouseID = Tenants.HouseID AND
NewRentPayments.ApartmentNumber = Tenants.ApartmentNumber)
So, HouseID and ApartmentNumber together make up the primary key. If there is a tuple in table B (NewRentPayments) that doesn't exist in table A (Tenants) based on the primary key, then it needs to be inserted into Tenants.
The problem is, when I run my query, it doesn't insert anything (I know for a fact there should be 1 tuple inserted). I'm at a loss, because it looks like it should work.
Thanks.
Your subquery is not correlated: it joins Tenants and NewRentPayments independently of the outer query, so if any pair of rows matches anywhere, NOT EXISTS is false for every outer row. Per the description of your problem, you don't need that join at all.
Try this:
insert into Tenants (LeaseStartDate, LeaseExpirationDate, Rent, LeaseTenantSSN, RentOverdue)
select current_date, null, p.Rent, p.LeaseTenantSSN, FALSE
from NewRentPayments p
where not exists (
select *
from Tenants t
where p.HouseID = t.HouseID
and p.ApartmentNumber = t.ApartmentNumber
)
I am currently using EFCore 1.1 (preview release) with SQL Server.
I am doing what I thought was a simple OUTER JOIN between an Order and OrderItem table.
var orders = from order in ctx.Order
join orderItem in ctx.OrderItem
on order.OrderId equals orderItem.OrderId into tmp
from oi in tmp.DefaultIfEmpty()
select new
{
order.OrderDt,
Sku = (oi == null) ? null : oi.Sku,
Qty = (oi == null) ? (int?) null : oi.Qty
};
The actual data returned is correct (I know earlier versions had issues with OUTER JOINs not working at all). However, the SQL is horrible and includes every column in Order and OrderItem, which is problematic considering one of them is a large XML blob.
SELECT [order].[OrderId], [order].[OrderStatusTypeId],
[order].[OrderSummary], [order].[OrderTotal], [order].[OrderTypeId],
[order].[ParentFSPId], [order].[ParentOrderId],
[order].[PayPalECToken], [order].[PaymentFailureTypeId] ....
...[orderItem].[OrderId], [orderItem].[OrderItemType], [orderItem].[Qty],
[orderItem].[SKU] FROM [Order] AS [order] LEFT JOIN [OrderItem] AS
[orderItem] ON [order].[OrderId] = [orderItem].[OrderId] ORDER BY
[order].[OrderId]
(There are many more columns not shown here.)
On the other hand, if I make it an INNER JOIN, the SQL is as expected, with only the columns in my select clause:
SELECT [order].[OrderDt], [orderItem].[SKU], [orderItem].[Qty] FROM
[Order] AS [order] INNER JOIN [OrderItem] AS [orderItem] ON
[order].[OrderId] = [orderItem].[OrderId]
I tried reverting to EF Core 1.0.1, but got some horrible NuGet package errors and gave up on that.
It's not clear whether this is an actual regression or an incomplete feature in EF Core, but I couldn't find any further information about it elsewhere.
Edit: EF Core 2.1 has addressed a lot of issues with grouping and also the N+1 type issues where a separate query was made for every child entity. Very impressed with the performance, in fact.
3/14/18: 2.1 Preview 1 of EF Core isn't recommended because the GROUP BY SQL has some issues when using OrderBy(), but this is fixed in the nightly builds and in Preview 2.
The following applies to EF Core 1.1.0 (release).
Although one shouldn't need to do such things, I tried several alternative query syntaxes (using a navigation property instead of the manual join, joining subqueries containing anonymous type projections, using let / an intermediate Select, using Concat / Union to emulate the left join, alternative left join syntax, etc.). The result was either the same SQL as in the post, and/or more than one query being executed, and/or invalid SQL, and/or strange runtime exceptions like IndexOutOfRange, InvalidArgument, etc.
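For instance, the navigation-property variant I mean (OrderItems here is a hypothetical collection property on Order, not something from the post) would read like this, and it fared no better:
var orders = from o in ctx.Order
             from oi in o.OrderItems.DefaultIfEmpty() // hypothetical navigation property
             select new
             {
                 o.OrderDt,
                 Sku = oi.Sku,
                 Qty = (int?)oi.Qty
             };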
What I can say based on these tests is that most likely the problem is related to bugs (regression or incomplete implementation, does it really matter?) in the GroupJoin translation. See for instance #7003: Wrong SQL generated for query with group join on a subquery that is not present in the final projection, or #6647: Left Join (GroupJoin) always materializes elements, resulting in unnecessary data pulling, etc.
Until it gets fixed (when?), as a (far from perfect) workaround I can suggest using the alternative left outer join syntax (from a in A from b in B.Where(b => b.Key == a.Key).DefaultIfEmpty()):
var orders = from o in ctx.Order
from oi in ctx.OrderItem.Where(oi => oi.OrderId == o.OrderId).DefaultIfEmpty()
select new
{
OrderDt = o.OrderDt,
Sku = oi.Sku,
Qty = (int?)oi.Qty
};
which produces the following SQL:
SELECT [o].[OrderDt], [t1].[Sku], [t1].[Qty]
FROM [Order] AS [o]
CROSS APPLY (
SELECT [t0].*
FROM (
SELECT NULL AS [empty]
) AS [empty0]
LEFT JOIN (
SELECT [oi0].*
FROM [OrderItem] AS [oi0]
WHERE [oi0].[OrderId] = [o].[OrderId]
) AS [t0] ON 1 = 1
) AS [t1]
As you can see, the projection is fine, but instead of a LEFT JOIN it uses a strange CROSS APPLY, which might introduce another performance issue.
Also note that you have to cast value types (and do nothing for strings) when accessing the right side of the join, as shown above. If you use null checks as in the original query, you'll get an ArgumentNullException at runtime (yet another bug).
Using "into" will create a temporary identifier to store the results.
Reference : MDSN: into (C# Reference)
So removing the "into tmp from oi in tmp.DefaultIfEmpty()" will result in the clean sql with the three columns.
var orders = from order in ctx.Order
join orderItem in ctx.OrderItem
on order.OrderId equals orderItem.OrderId
select new
{
order.OrderDt,
orderItem.Sku,
orderItem.Qty
};
update Claim
set first_name = random_name(7),
Last_name = random_name(6),
t2.first_name=random_name(7),
t2.last_name=random_name(6)
from Claim t1
inner join tbl_ecpremit t2
on t1.first_name = t2.first_name
I am getting the error below:
column "t2" of relation "claim" does not exist
You can do this with a so-called data-modifying CTE:
WITH c AS (
UPDATE claim SET first_name = random_name(7), last_name = random_name(6)
WHERE <put your condition here>
RETURNING *
)
UPDATE tbl_ecpremit SET last_name = c.last_name
FROM c
WHERE first_name = c.first_name;
This assumes that random_name() is a function you define; it is not part of PostgreSQL AFAIK.
The nifty trick here is that the UPDATE inside the WITH query returns the updated records of the first table via its RETURNING clause. You can then use those records in the main UPDATE statement to write exactly the same data to the second table.
This is all very precarious though, because you are both linking on and modifying the "first_name" column with some random value. In real life this will only work well if you have some more logic regarding the names and conditions.
We have an auto-increment identity column Id as part of our User object. For a campaign we just did for a client, we had up to 600 signups per minute. This is the code block doing the insert:
using (var ctx = new {{ProjectName}}_Entities())
{
int userId = ctx.Users.Where(u => u.Email.Equals(request.Email)).Select(u => u.Id).SingleOrDefault();
if (userId == 0)
{
var user = new User() { /* Initializing user properties here */ };
ctx.Users.Add(user);
ctx.SaveChanges();
userId = user.Id;
}
...
}
Then we use the userId to insert data into another table. What happened during high load is that there were multiple rows with the same userId even though there shouldn't have been. It seems like the above code returned the same identity (int) value for multiple inserts.
I read through a few blog/forum posts saying that there might be an issue with SCOPE_IDENTITY() which Entity Framework uses to return the auto-increment value after insert.
They say a possible workaround would be to write an insert stored procedure for User using INSERT ... OUTPUT INSERTED.Id, which I'm familiar with.
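For illustration, such a procedure could be called through EF the same way we call p_ClaimCoupon below; p_InsertUser and its parameter here are hypothetical, not our actual code:
// Hypothetical sketch: assumes a stored procedure p_InsertUser that does
// INSERT ... OUTPUT INSERTED.Id and returns the new id as its result set.
int userId = ctx.Database.SqlQuery<int>(
    "EXEC p_InsertUser @Email",
    new SqlParameter("Email", request.Email)).Single();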
Anybody else experienced this issue? Any suggestion on how this should be handled with Entity Framework?
UPDATE 1:
After further analyzing the data, I'm almost 100% positive this is the problem. The identity column skipped auto-increment values 48 times in total (2727, 2728 missing, 2729, ...), and we have exactly 48 duplicates in the other table.
It seems like EF returned a random identity value for each row it wasn't able to insert for some reason.
Anybody have any idea what could possible be going on here?
UPDATE 2:
Possibly important info I didn't mention: this happened on an Azure Website with Azure SQL. We had 4 instances running at the time.
UPDATE 3:
Stored Proc:
CREATE PROCEDURE [dbo].[p_ClaimCoupon]
@CampaignId int,
@UserId int,
@Flow tinyint
AS
DECLARE @myCoupons TABLE
(
[Id] BIGINT NOT NULL,
[Code] CHAR(11) NOT NULL,
[ExpiresAt] DATETIME NOT NULL,
[ClaimedBefore] BIT NOT NULL
)
INSERT INTO @myCoupons
SELECT TOP(1) c.Id, c.Code, c.ExpiresAt, 1
FROM Coupons c
WHERE c.CampaignId = @CampaignId AND c.UserId = @UserId
DECLARE @couponCount int = (SELECT COUNT(*) FROM @myCoupons)
IF @couponCount > 0
BEGIN
SELECT *
FROM @myCoupons
END
ELSE
BEGIN
UPDATE TOP(1) Coupons
SET UserId = @UserId, IsClaimed = 1, ClaimedAt = GETUTCDATE(), Flow = @Flow
OUTPUT DELETED.Id, DELETED.Code, DELETED.ExpiresAt, CAST(0 AS BIT) as [ClaimedBefore]
WHERE CampaignId = @CampaignId AND IsClaimed = 0
END
RETURN 0
Called like this from the same EF context:
var coupon = ctx.Database.SqlQuery<CouponViewModel>(
"EXEC p_ClaimCoupon #CampaignId, #UserId, #Flow",
new SqlParameter("CampaignId", {{CampaignId}}),
new SqlParameter("UserId", {{userId}}),
new SqlParameter("Flow", {{Flow}})).FirstOrDefault();
No, that's not possible. For one, that would be an egregious bug in EF; you are not the first one to put 600 inserts/second through it. Also, SCOPE_IDENTITY() is explicitly scope-safe and is the recommended practice.
These statements apply to the case where you are using a SQL Server IDENTITY column as the ID.
I admit I don't know how Azure SQL Database synchronizes the generation of unique, sequential IDs, but intuitively it must be costly, especially at your rates.
If non-sequential IDs are an option, you might want to consider generating UUIDs at the application level. I know this doesn't answer your direct question, but it would improve performance (unverified) and bypass your problem.
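A minimal sketch of that idea, under the assumption that the User key could be changed to a Guid (which is not what the model in the question uses, and the key would need to be configured with DatabaseGeneratedOption.None):
// Sketch assuming User.Id is a client-generated Guid key, so no identity
// value ever needs to be read back from the database after SaveChanges().
var user = new User
{
    Id = Guid.NewGuid()
    /* initialize other user properties here */
};
ctx.Users.Add(user);
ctx.SaveChanges();
Guid userId = user.Id; // already known before the insert happens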
Update: Scratch that; Azure SQL Database isn't distributed, it's simply replicated from a single primary node. So there is no real performance gain to expect from alternatives to IDENTITY keys, and presumably the number of instances is not significant to your problem.
I think your problem may be here:
UPDATE TOP(1) Coupons
SET UserId = @UserId, IsClaimed = 1, ClaimedAt = GETUTCDATE(), Flow = @Flow
OUTPUT DELETED.Id, DELETED.Code, DELETED.ExpiresAt, CAST(0 AS BIT) as [ClaimedBefore]
WHERE CampaignId = @CampaignId AND IsClaimed = 0
This will update the UserId of the first unclaimed record it finds in the campaign. It doesn't look robust to me in the event that inserting a user fails. Are you sure that is correct?