Continuing from my earlier question where I described my schema (repeated here for your convenience):
Parties ( PartyId, ClientId, AddressId, DateTime )
Tents ( PartyId, TentDetails... )
Clowns ( PartyId, AddressId, ClownDetails... )
SecurityAgentAssignment ( PartyId, AgentId, fromTime, untilTime )
Addresses ( AddressId, Street, City, State )
...and there are about 10 other tables of a similar design, all in a many-to-one relationship with Parties.
My ASP.NET MVC web application has a Summary page that displays every detail about a party. I'm using EF 1.0, so I have to do my own eager-loading. Here's the logic I'm using:
Party dbParty = GetParty(partyId);
dbParty.Tents.EnsureLoaded();
dbParty.Clowns.EnsureLoaded();
foreach(Clown clown in dbParty.Clowns) clown.Address.EnsureLoaded();
dbParty.Security.EnsureLoaded();
foreach(SecurityAgentAssignment assignment in dbParty.Security) assignment.Agent.EnsureLoaded();
// and the 10 other relationships too
The code above takes about 3 seconds to run. Given this isn't eager-loading, but lazy-loading, surely it should just fire off about 15 simple SELECT queries and be done?
I don't have SQL Server Profiler installed, and I don't know how to see the SQL that gets generated when you use .Load instead of an IQueryable.
I use these extension methods as helpers:
private static readonly R.FieldInfo _entityReferenceContext = typeof(RelatedEnd).GetField("_context", R.BindingFlags.Instance | R.BindingFlags.NonPublic );
private static readonly R.PropertyInfo _relatedEndOwner = typeof(RelatedEnd).GetProperty("Owner", R.BindingFlags.Instance | R.BindingFlags.NonPublic );
private static Boolean IsAttached(this RelatedEnd relatedEnd) {
    Object context = _entityReferenceContext.GetValue( relatedEnd );
    return context != null;
}

public static TEntity EnsureLoaded<TEntity>(this EntityReference<TEntity> eref) where TEntity : class, IEntityWithRelationships {
    // EntityReference<TEntity> derives from RelatedEnd.
    RelatedEnd erefAsRelatedEnd = (RelatedEnd)eref;
    erefAsRelatedEnd.EnsureLoaded();
    return eref.Value;
}

public static void EnsureLoaded(this RelatedEnd end) {
    IEntityWithRelationships owner = (IEntityWithRelationships)_relatedEndOwner.GetValue( end, null );
    EntityObject ownerEntity = owner as EntityObject;
    if( ownerEntity != null ) {
        if( ownerEntity.EntityState == EntityState.Added || ownerEntity.EntityState == EntityState.Detached ) return; // calling .Load() on an Added or Detached entity throws an exception.
    }
    if( end.IsAttached() && !end.IsLoaded ) end.Load();
}
This question is moot - I was connecting to my database server over a VPN connection. I thought it would be okay, as the ping time was only about 50 ms between my computer and the database server, but I forgot how chatty SQL Server is. Considering that it fires off about 60 queries one after another, it makes sense that 60 * 50 ms == 3000 ms.
When I retried my application on the same LAN as the server (ping time < 1 ms), the whole load operation executed in under 30 ms with no need for any further optimization. Problem solved.
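(For reference, if you do want to see the SQL behind a .Load() call: CreateSourceQuery() returns the ObjectQuery that the relationship executes, and ToTraceString() gives its SQL. A rough sketch against the entities above, not part of my original code:)
// Sketch: inspect the SQL that loading the Clowns collection would issue (EF v1 ObjectQuery API).
ObjectQuery<Clown> clownsQuery = dbParty.Clowns.CreateSourceQuery();
System.Diagnostics.Debug.WriteLine(clownsQuery.ToTraceString());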
I have a console program that moves data between two different servers (DatabaseA and DatabaseB).
DatabaseB is a Postgres server.
The program calls a lot of stored procedures and other raw queries.
I use ExecuteSqlRaw a lot.
I also use NpgsqlBulk.EfCore.
The program uses the same context instance for DatabaseB for the entire run.
Somehow I get locks on some of my tables on DatabaseB that never get released.
This always happens on my table mytable_fromdatabase_import.
The code that runs against it is the following:
protected override void AddIdsNew()
{
    var toAdd = IdsNotInDatabaseB();
    var newObjectsToAdd = GetByIds(toAdd).Select(Converter.ConvertAToB);

    DatabaseBContext.Database.ExecuteSqlRaw("truncate mytable_fromdatabase_import; ");

    var uploader = new NpgsqlBulkUploader(DatabaseBContext);
    uploader.Insert(newObjectsToAdd); // inserts data into mytable_fromdatabase_import

    DatabaseBContext.Database.ExecuteSqlRaw("call insert_myTable_from_importTable();");
}
After I run it, the whole table is not accessible anymore, and when I query the locks on the server I can see there is a process holding them.
How can I make sure this process always closes and releases its locks on the tables?
I thought EF Core would do that automatically.
-----------Edit-----------
I just wanted to add that this is not a temporary problem during the run of the console program. When I run this code and it finishes, my table is still locked and nothing can access it. My understanding was that the EF Core context would release everything after it is disposed (whether due to an error or because it finished).
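(For reference, here is a minimal sketch of scoping the same statements in one explicit EF Core transaction, so that any lock taken by the truncate is released at Commit or rolled back on Dispose. This is not the code the program currently runs, just an illustration:)
using (var transaction = DatabaseBContext.Database.BeginTransaction())
{
    // In Postgres, TRUNCATE takes an ACCESS EXCLUSIVE lock that is held until the
    // surrounding transaction ends, so the whole step is kept inside one transaction.
    DatabaseBContext.Database.ExecuteSqlRaw("truncate mytable_fromdatabase_import;");

    var uploader = new NpgsqlBulkUploader(DatabaseBContext);
    uploader.Insert(newObjectsToAdd);

    DatabaseBContext.Database.ExecuteSqlRaw("call insert_myTable_from_importTable();");

    // Commit releases the locks; if an exception is thrown first, Dispose rolls the transaction back.
    transaction.Commit();
}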
The problem had nothing to do with EF Core but with a wrongly configured backup script. The program is now running with no changes to it, and it works fine.
For a concrete task you need the right tools. You probably take locks when retrieving the Ids and also when trying not to load already-imported records. These steps are slow!
I would suggest using linq2db (disclaimer: I'm a co-author of this library).
Create two projects with models from different databases:
Source.Model.csproj - install linq2db.SqlServer
Destination.Model.csproj - install linq2db.PostgreSQL
Follow the instructions in the T4 templates for generating a model from each of the two databases. It is easy, and you can ask questions on linq2db's GitHub site.
I'll post the helper class which I used for transferring tables in a previous project. It additionally uses the CodeJam library for mapping, but in your project you can certainly use AutoMapper instead.
public class DataImporter
{
    private readonly DataConnection _source;
    private readonly DataConnection _destination;

    public DataImporter(DataConnection source, DataConnection destination)
    {
        _source = source;
        _destination = destination;
    }

    private long ImportDataPrepared<TSource, TDest>(IOrderedQueryable<TSource> source, Expression<Func<TSource, TDest>> projection) where TDest : class
    {
        var destination = _destination.GetTable<TDest>();
        var tableName = destination.TableName;

        var sourceCount = source.Count();
        if (sourceCount == 0)
            return 0;

        var currentCount = destination.Count();
        if (currentCount > sourceCount)
            throw new Exception($"'{tableName}' what happened here?.");
        if (currentCount >= sourceCount)
            return 0;

        IQueryable<TSource> sourceQuery = source;
        if (currentCount > 0)
            sourceQuery = sourceQuery.Skip(currentCount);

        var projected = sourceQuery.Select(projection);

        var copied =
            _destination.BulkCopy(
                new BulkCopyOptions
                {
                    BulkCopyType = BulkCopyType.MultipleRows,
                    RowsCopiedCallback = (obj) => RowsCopiedCallback(obj, currentCount, sourceCount, tableName)
                }, projected);

        return copied.RowsCopied;
    }

    private void RowsCopiedCallback(BulkCopyRowsCopied obj, int currentRows, int totalRows, string tableName)
    {
        var percent = (currentRows + obj.RowsCopied) / (double)totalRows * 100;
        Console.WriteLine($"Copied {percent:N2}% \tto {tableName}");
    }

    public class ImporterHelper<TSource>
    {
        private readonly DataImporter _improrter;
        private readonly IOrderedQueryable<TSource> _sourceQuery;

        public ImporterHelper(DataImporter improrter, IOrderedQueryable<TSource> sourceQuery)
        {
            _improrter = improrter;
            _sourceQuery = sourceQuery;
        }

        public long To<TDest>() where TDest : class
        {
            var mapperBuilder = new MapperBuilder<TSource, TDest>();
            return _improrter.ImportDataPrepared(_sourceQuery, mapperBuilder.GetMapper().GetMapperExpressionEx());
        }

        public long To<TDest>(Expression<Func<TSource, TDest>> projection) where TDest : class
        {
            return _improrter.ImportDataPrepared(_sourceQuery, projection);
        }
    }

    public ImporterHelper<TSource> ImprortData<TSource>(IOrderedQueryable<TSource> source)
    {
        return new ImporterHelper<TSource>(this, source);
    }
}
Now begin transferring. Note that I have used OrderBy/ThenBy to specify the Id order so that already-transferred records are not imported again - importantly, the order fields should be a unique key combination. This makes the sample reentrant, so it can simply be re-run when the connection is lost.
var sourceBuilder = new LinqToDbConnectionOptionsBuilder();
sourceBuilder.UseSqlServer(SourceConnectionString);

var destinationBuilder = new LinqToDbConnectionOptionsBuilder();
destinationBuilder.UsePostgreSQL(DestinationConnectionString);

using (var source = new DataConnection(sourceBuilder.Build()))
using (var destination = new DataConnection(destinationBuilder.Build()))
{
    var dataImporter = new DataImporter(source, destination);

    dataImporter.ImprortData(source.GetTable<Source.Model.FirstTable>()
            .OrderBy(e => e.Id1)
            .ThenBy(e => e.Id2))
        .To<Dest.Model.FirstTable>();

    dataImporter.ImprortData(source.GetTable<Source.Model.SecondTable>().OrderBy(e => e.Id))
        .To<Dest.Model.SecondTable>();
}
The boring OrderBy part could of course be generated automatically, but that would blow up an already long answer.
Also, play with BulkCopyOptions. The native Npgsql COPY may fail, in which case the multi-row variant should be used.
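For example (a sketch; the BulkCopyType values are from linq2db's enum):
var options = new BulkCopyOptions
{
    // ProviderSpecific uses Npgsql's binary COPY; if that fails in your environment,
    // MultipleRows falls back to generated multi-row INSERT statements.
    BulkCopyType = BulkCopyType.MultipleRows
};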
I have a LINQ query
var age = new int[] { 1, 2, 3 };
dbContext.TA.Where(x => age.Contains(x.age)).ToList();
Point #11 of an online article (https://medium.com/swlh/entity-framework-common-performance-mistakes-cdb8861cf0e7) mentions that this is not a good practice, as it creates many execution plans on the SQL server.
In this case, how should the LINQ be revised so that I can do the same thing but minimize the number of execution plans generated?
(Note that I have no intention of converting it into a stored procedure and passing and joining with a UDT, as again that requires too much effort.)
That article offers some good things to keep in mind when writing expressions for EF. That example is a general guideline to keep in mind, not a hard "never do this" kind of rule. It is a warning about writing queries that accept an arbitrary list of values, and to avoid this when possible, as such queries will be on the more expensive side.
In your example with something like "Ages", a hard-coded list of values does not cause a problem, because every execution uses the same list (until the app is re-compiled with a new list, or you have code that changes the list for some reason). An example where it can be perfectly valid to use this is something like statuses, where you have a status enum. If there are only a small number of valid statuses that a record can have, then declaring a common array of valid statuses to use in a Contains clause is fine:
public void DeleteEnquiry(int enquiryId)
{
    var allowedStatuses = new[] { Statuses.Pending, Statuses.InProgress, Statuses.UnderReview };
    var enquiry = context.Enquiries
        .Where(x => x.EnquiryId == enquiryId && allowedStatuses.Contains(x.Status))
        .SingleOrDefault();

    try
    {
        if (enquiry != null)
        {
            enquiry.IsActive = false;
            context.SaveChanges();
        }
        else
        {
            // Enquiry not found or invalid status.
        }
    }
    catch (Exception ex) { /* handle exception */ }
}
The statuses in the list aren't going to change, so the execution plan is static for that context.
The problem is where you accept something like a parameter with criteria that include a list for a Contains clause.
It is highly unlikely that a user would want to load data for an arbitrary set of ages such as "2, 4, and 6"; more likely they would want something like ">= 2", "<= 6", or a range such as "2 to 6". So rather than creating a method that accepts a list of acceptable ages:
public IEnumerable<Children> GetByAges(int[] ages)
{
    return _dbContext.Children.Where(x => ages.Contains(x.Age)).ToList();
}
You would probably be better served by range parameters:
private IEnumerable<Children> GetByAgeRange(int? minAge = null, int? maxAge = null)
{
    var query = _dbContext.Children.AsQueryable();
    if (minAge.HasValue)
        query = query.Where(x => x.Age >= minAge.Value);
    if (maxAge.HasValue)
        query = query.Where(x => x.Age <= maxAge.Value);
    return query.ToList();
}

private IEnumerable<Children> GetByAge(int age)
{
    return _dbContext.Children.Where(x => x.Age == age).ToList();
}
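As a usage sketch (hypothetical calls against the methods above), the range bounds become ordinary parameters, so repeated calls with different values reuse the same small set of cached plans:
var youngChildren = GetByAgeRange(minAge: 2, maxAge: 6); // Age >= @p0 AND Age <= @p1
var sixYearOlds = GetByAge(6);                           // Age = @p0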
I have at least 100,000 rows in a Job_Details table, and I'm using Entity Framework to map the data.
This is the code:
public GetJobsResponse GetImportJobs()
{
    GetJobsResponse getJobResponse = new GetJobsResponse();
    List<JobBO> lstJobs = new List<JobBO>();

    using (NSEXIM_V2Entities dbContext = new NSEXIM_V2Entities())
    {
        var lstJob = dbContext.Job_Details.ToList();
        foreach (var dbJob in lstJob.Where(ie => ie.IMP_EXP == "I" && ie.Job_No != null))
        {
            JobBO job = MapBEJobforSearchObj(dbJob);
            lstJobs.Add(job);
        }
    }

    getJobResponse.Jobs = lstJobs;
    return getJobResponse;
}
I found that this line takes about 2-3 minutes to execute:
var lstJob = dbContext.Job_Details.ToList();
How can I solve this issue?
To outline the performance issues with your example (see the inline comments):
public GetJobsResponse GetImportJobs()
{
    GetJobsResponse getJobResponse = new GetJobsResponse();
    List<JobBO> lstJobs = new List<JobBO>();

    using (NSEXIM_V2Entities dbContext = new NSEXIM_V2Entities())
    {
        // Loads *ALL* entities into memory. This effectively takes all fields for all rows across from the database to your app server. (Even though you don't want it all.)
        var lstJob = dbContext.Job_Details.ToList();

        // Filters from the data in memory.
        foreach (var dbJob in lstJob.Where(ie => ie.IMP_EXP == "I" && ie.Job_No != null))
        {
            // Maps the entity to a DTO and adds it to the return collection.
            JobBO job = MapBEJobforSearchObj(dbJob);
            lstJobs.Add(job);
        }
    }

    // Returns the DTOs.
    getJobResponse.Jobs = lstJobs;
    return getJobResponse;
}
First, pass your filter to EF as a Where clause so it is executed on the DB server, rather than loading all entities into memory:
public GetJobsResponse GetImportJobs()
{
    GetJobsResponse getJobResponse = new GetJobsResponse();

    using (NSEXIM_V2Entities dbContext = new NSEXIM_V2Entities())
    {
        // Passes the Where expression to the DB server to be executed. Note: no .ToList() yet, so this stays an IQueryable.
        var jobs = dbContext.Job_Details.Where(ie => ie.IMP_EXP == "I" && ie.Job_No != null);
Next, use Select to load your DTOs. Typically these won't contain as much data as the main entity, and as long as you're working with IQueryable you can load related data as needed. Again, this will be sent to the DB server, so you cannot use functions like "MapBEJobforSearchObj" here, because the DB server does not know that function. You can Select into a simple DTO object, or into an anonymous type to pass to a dynamic mapper.
        var dtos = jobs.Select(ie => new JobBO
        {
            JobId = ie.JobId,
            // ... populate remaining DTO fields here.
        }).ToList();

        getJobResponse.Jobs = dtos;
        return getJobResponse;
    }
}
Moving the .ToList() to the end will materialize the data into your JobBO DTOs/ViewModels, pulling just enough data from the server to populate the desired rows with the desired fields.
In cases where you may have a large amount of data, you should also consider supporting server-side pagination, where you pass a page number and page size and then use .Skip() + .Take() to load a single page of entries at a time.
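A rough sketch of that approach (the page parameters and the ordering column are illustrative, not taken from the original code):
public List<JobBO> GetImportJobsPage(int pageNumber, int pageSize)
{
    using (NSEXIM_V2Entities dbContext = new NSEXIM_V2Entities())
    {
        return dbContext.Job_Details
            .Where(ie => ie.IMP_EXP == "I" && ie.Job_No != null)
            .OrderBy(ie => ie.Job_No) // a stable order is required before Skip/Take
            .Skip((pageNumber - 1) * pageSize)
            .Take(pageSize)
            .Select(ie => new JobBO
            {
                JobId = ie.JobId
                // ... populate remaining DTO fields here.
            })
            .ToList();
    }
}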
I know this has been covered to some extent in other threads, but even after adapting it to my needs, I'm still having issues. As many of you know, documentation is virtually non-existent. Therefore, any help is greatly appreciated.
I'm trying to create a generic method that simply generates a DB script from a DLL containing EF entities. I pass it a DLL path and the name of a connection string defined in App.config. The first time through, the script is generated fine; I run it in SQL Server Management Studio and it creates all the tables, etc. However, when I run it the second time (the DB now exists), it fails on the ScriptUpdate method with the following error:
Failed to set database initializer of type 'Disabled' for DbContext type 'AdminovateLibrary.Repository.EntityDbContext, Adminovate.AdminovateLibrary-Project' specified in the application configuration. See inner exception for details.
Inner exception is: "Could not load file or assembly 'MyLibrary' or one of its dependencies. The system cannot find the file specified.":"MyLibrary".
I assume that if the DLL contains exactly the same code as in the first run, the method should produce an empty script. However, even when I run it with a modified DLL, it gives me the same error.
My code is as follows:
public string GenerateUpdateSchemaScript( string sourceDllFilePath, string targetConnectionName ) {
    var dbMigrationsConfiguration = CreateConfiguration( sourceDllFilePath, targetConnectionName );
    var dbMigrator = new DbMigrator( dbMigrationsConfiguration );
    Database.SetInitializer( new CreateDatabaseIfNotExists<DbContext>() );
    var scriptor = new MigratorScriptingDecorator( dbMigrator );
    var script = scriptor.ScriptUpdate( null, null );
    return RemoveCreateMigrationHistoryTable( dbMigrationsConfiguration, script );
}

private static DbMigrationsConfiguration CreateConfiguration( string sourceDllFilePath, string targetConnectionName ) {
    var assembly = Assembly.LoadFrom( sourceDllFilePath );
    var configType = assembly.GetTypes().Single( type => typeof( DbMigrationsConfiguration ).IsAssignableFrom( type ) );
    var configuration = ( DbMigrationsConfiguration )assembly.CreateInstance( configType.FullName );
    if( configuration != null ) {
        configuration.ContextType = assembly.GetTypes().Single( type => type.BaseType == typeof( DbContext ) );
        configuration.MigrationsAssembly = assembly;
        configuration.TargetDatabase = new DbConnectionInfo( targetConnectionName );
        configuration.AutomaticMigrationsEnabled = true;
        configuration.AutomaticMigrationDataLossAllowed = true;
    }
    return configuration;
}
I've been using JPA to insert entities into a database, but I've run up against a problem where I need to do an insert and get the primary key of the record last inserted.
Using PostgreSQL I would use an INSERT RETURNING statement which would return the record id, but with an entity manager doing all this, the only way I know is to use SELECT CURRVAL.
So the problem becomes: I have several data sources sending data into a message-driven bean (usually 10-100 messages at once from each source) via OpenMQ, and inside this MDB I persist the data to PostgreSQL via the entity manager. At this point I think there will be a "race condition"-like effect: with so many inserts, I won't necessarily get the last record id using SELECT CURRVAL.
My MDB persists three entity beans via an entity manager, as shown below.
Any help on how to do this better is much appreciated.
public void onMessage(Message msg) {
    Integer agPK = 0;
    Integer scanPK = 0;
    Integer lookPK = 0;
    Iterator iter = null;
    List<Ag> agKeys = null;
    List<Scan> scanKeys = null;

    try {
        iag = (IAgBean) (new InitialContext()).lookup(
                "java:comp/env/ejb/AgBean");
        TextMessage tmsg = (TextMessage) msg;

        // insert this into the table only if it doesn't exist yet
        Ag ag = new Ag(msg.getStringProperty("name"));
        agKeys = (List) (iag.getPKs(ag));
        iter = agKeys.iterator();
        if (iter.hasNext()) {
            agPK = ((Ag) iter.next()).getId();
        }
        else {
            // no PK found, so it's not in the database; insert it
            iag.addAg(ag);
            agKeys = (List) (iag.getPKs(ag));
            iter = agKeys.iterator();
            if (iter.hasNext()) {
                agPK = ((Ag) iter.next()).getId();
            }
        }

        // insert this into the table always
        iscan = (IScanBean) (new InitialContext()).lookup(
                "java:comp/env/ejb/ScanBean");
        Scan scan = new Scan();
        scan.setName(msg.getStringProperty("name"));
        scan.setCode(msg.getIntProperty("code"));
        iscan.addScan(scan);
        scanKeys = (List) iscan.getPKs(scan);
        iter = scanKeys.iterator();
        if (iter.hasNext()) {
            scanPK = ((Scan) iter.next()).getId();
        }

        // insert the two primary keys from above into this table
        ilook = (ILookBean) (new InitialContext()).lookup(
                "java:comp/env/ejb/LookBean");
        Look look = new Look();
        if (agPK.intValue() != 0 && scanPK.intValue() != 0) {
            look.setAgId(agPK);
            look.setScanId(scanPK);
            ilook.addLook(look);
        }
        // ...
The JPA spec requires that after persist, the entity be populated with a valid ID if an ID generation strategy is being used. You don't have to do anything.