I am using Entity Framework as a layer over my SQL Server 2008 database. EF lives in my web service, and the web service is invoked by a Silverlight client.
I am seeing a serious performance issue in the time a query takes to execute through EF. The slowness does not recur on subsequent calls.
A little googling revealed that it is caused by the cost, incurred once per app domain, of constructing the in-memory model of the database objects. I found this Microsoft link explaining pre-generation of views as a performance improvement. Even after implementing the steps, performance actually degraded instead of improving. I am curious whether anyone has tried this approach successfully, and whether there are other avenues for improving performance.
I am using .NET 3.5.
A couple of areas to look at for EF performance:
Do as much of the processing as possible before calling things like ToList(). ToList() brings everything in the set into memory. By default, EF keeps building up the expression tree and only executes it when you actually need the data in memory, so the filtering and sorting you compose beforehand run as SQL against the database; once the data is in memory, all further processing happens there. When working with large data sets, you definitely want as much of the heavy lifting done by the database as possible.
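A minimal sketch of that difference (the MyEntities context, Orders set, and cutoff variable are hypothetical names, not from the original post):

    using (var context = new MyEntities())
    {
        // Deferred execution: Where/OrderBy/Take stay in the expression tree and are
        // translated to SQL, so the database does the filtering and returns only 50 rows.
        var recent = context.Orders
            .Where(o => o.CreatedOn > cutoff)
            .OrderByDescending(o => o.CreatedOn)
            .Take(50)
            .ToList();   // the query executes here

        // Calling ToList() first pulls the whole table into memory and filters client-side.
        var slow = context.Orders
            .ToList()
            .Where(o => o.CreatedOn > cutoff)
            .Take(50)
            .ToList();
    }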
EF 1 only has the option to pull the entire row back. Therefore if you have a column that is a large string or binary blob, it is going to be pulled down and into memory whether you need it or not. You can create a projection that doesn't include this column, but then you don't get the benefits of having it be an entity.
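For example, a hedged sketch of such a projection (the Documents entity and its columns are made-up names, and context is the same hypothetical context as above); note that the result is a set of anonymous-type rows rather than tracked entities:

    // The large Content column is never part of the SELECT.
    var titles = context.Documents
        .Select(d => new { d.Id, d.Title, d.ModifiedOn })
        .ToList();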
You can look at the SQL generated by EF using the suggestion in this post:
How do I view the SQL generated by the Entity Framework?
The same laws of physics apply for EF queries as they do for ordinary SQL. Check your database tables and make sure that you have indexes on primary and foreign keys, that your database is properly normalized, and so forth. If performance is degrading after Microsoft's suggestions, then that's my guess as to the problem area.
Are you hosting the webservice in IIS? Is it running on the same site as the Silverlight App? What about the database itself? Is it running on a dedicated machine? Are there other apps hitting it? The first call to a dormant database is painful (I've had situations where it would actually time out in my environment.)
There are a number of factors to take into consideration here. But it comes down to more than just EF's overhead.
Edit: I didn't fully qualify this, but the process of opening the first connection to SQL Server is slow regardless of your data access solution.
Use SQL Profiler to check how many queries are executed to retrieve your data. If it's a large number, use the Include() method of ObjectQuery to retrieve the child objects along with the parent in one query.
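A minimal sketch, assuming a hypothetical model with Customers that expose an Orders navigation property (in the .NET 3.5 ObjectContext API the navigation path is passed as a string):

    // Without Include(), touching customer.Orders later can trigger one extra query per customer.
    var customers = context.Customers
        .Include("Orders")               // eager-loads the child rows in the same query
        .Where(c => c.IsActive)
        .ToList();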
I am working on a project which uses GraphQL and PostgreSQL, where we want to select rows from the database with a timestamp after a certain date. It is currently selecting all the data from the database and then filtering it on the server:
.filter(({time}) => moment(time).isAfter(startTime))
However I would have thought it would be best to do this filtering in the database query as the full dataset is never used.
Is there any benefit to doing it on the server rather than in the database query?
Barring some unusual edge case -- such as other parts of your backend code really do need all the data for some reason -- it would definitely be more efficient to filter everything on the Postgres side via the SQL that is being used to fetch the data in the first place.
This is true for several reasons:
Assuming the table is properly indexed, the filtering will be able to occur much faster within the database.
The unneeded data will not need to be serialized and sent over the wire to the backend, only to then be discarded by the backend's own filtering.
The memory footprint should be reduced on both the Postgres and server end due to needing to process only a portion of the results.
I've not worked with GraphQL myself, but from doing a bit of poking around through its docs, it appears GraphQL often uses other mechanisms in different layers (outside of the database) to try to improve performance.
It would be worth seeing what the actual SQL is that your GraphQL query is generating (that may be possible via a function in GraphQL; it could also be done by enabling certain log settings on the Postgres server and correlating the log output to the query). That may lead to further optimization possibilities if you want to keep things purely GraphQL.
Jumping down to a raw query seems like it would be a good possibility though. Certainly that is something that is often done with ORMs like Django and ActiveRecord.
I have taken over an existing MVC website which uses Entity Framework and Hangfire, is hosted on Azure, and uses an Azure SQL database. Every so often the website times out.
I'm new to the Azure portal, Entity Framework, and Hangfire.
If I increase the DTUs, it clears the timeout issues.
I'm looking for ways to diagnose why the website times out. I have added error logging using ELMAH and checked Hangfire, but this doesn't give me any further information.
Is there anything in the Azure portal that can help?
If it "times out" and if "increasing DTU resolves timeouts" and these observations are true (I think it's on you to really convince yourself this is absolutely true, don't make this assumption lightly) then the usual and obvious candidate is "a slow sql query". Entity Framework is often used with linq to create sql queries without writing sql. These queries are often fine for very simple tasks, such as someData.Where(x=>x.Id == 1).First(), however, if linq is used to join tables, or create complex associations, the generated sql can become monstrously bad, from a performance perspective. You can add logging to write out the sql generated by linq, or you can try to trace the database to see what sql is running on it. If tracing is out of the question, there are still meta queries you can use to view things like cached query plans and SQL Server can give you estimated costs and cached execution counts.
You can still hang yourself without using LINQ. You can still use stored procedures with EF. Way too many developers are still naive about SQL performance; you need to comb over your back end, learn the schema and the stored procedures, and inspect the SQL contents of everything. Check for any database triggers (easy to miss). Red flags are subqueries, too many joins, queries returning too many results, lots of string manipulation in a query, joining tables on strings, or XML/JSON-based SQL work.
Be aware that "slow sql queries" will become slower when load is high. And when slow sql queries build up, they only take more time to resolve. This can also cause debilitating table locking, depending on the nature of the query.
But queries can be performant and still cause locking, i.e. one table is being written to often and it's blocking other writes or reads from that table. This is a little harder to diagnose, but you can figure it out by carefully inspecting logs of database calls and how long they take to execute. There are also SQL queries you can run on the database to diagnose long-running queries, or to see which tables are locked at a given point in time.
Finally, check for any back-end webjobs for your application. If timeouts occur on recurring days or at recurring times, then somebody's batch SQL could be blocking your production database from being read.
But this is all speculation. I think you need to do more research to determine what is actually causing the site to become unresponsive. If you can log response times for common queries, you can rule out SQL-based latency as being the culprit or not and work from there. There's nothing inherently "amiss" about any of the technologies you specified.
If queries are performant but still causing issues, a long-term solution is to add something like a message queue and batch your SQL work intelligently, or just make the database work asynchronous so it doesn't block the UI.
You should correlate any logged timeouts with Azure's monitoring. Azure can give you CPU/RAM/page visits and such on the dashboard.
SQL Azure is a bit of a different beast. It doesn't have the on-demand performance of a dedicated DB unless you're prepared to throw serious $$ at it. And even then ...
EF, when coded against well, can perform quite well. When used poorly it can be a dog, and those problems are compounded on a platform like SQL Azure.
The first thing is to check that your EF contexts are set up to use an execution strategy suited to Azure: https://learn.microsoft.com/en-us/ef/ef6/fundamentals/connection-resiliency/retry-logic
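With EF6 that usually means a DbConfiguration class along these lines (EF picks it up automatically when it lives in the same assembly as your context); the retry count and delay here are illustrative, not recommendations:

    using System;
    using System.Data.Entity;
    using System.Data.Entity.SqlServer;

    public class AzureDbConfiguration : DbConfiguration
    {
        public AzureDbConfiguration()
        {
            // Retry transient SQL Azure failures with an exponential back-off.
            SetExecutionStrategy("System.Data.SqlClient",
                () => new SqlAzureExecutionStrategy(maxRetryCount: 4, maxDelay: TimeSpan.FromSeconds(30)));
        }
    }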
The next thing would be to see what kinds of SQL tracing you can run on Azure. Tracing is essential to see what EF is doing behind the scenes. I'm not familiar with the tools available for Azure; in my case my Azure experience was running SQL Server on VMs, because SQL Azure was too immature, not HIPAA compliant at the time, and expensive for the DTU estimates we were able to get. Worst case, can you restore a database backup into a SQL Server instance and point a copy of your application environment temporarily at that to run through common usage scenarios? Using a SQL trace you can pick up exactly when and how often EF is executing queries, and what queries it is executing.
Things to look at:
How many queries are running? If you are loading a set of records and expect one query, are there a whole heap of queries getting sent? This would indicate lazy-load calls being triggered.
What queries are being run? Is it selecting a lot more fields than are being displayed? This is potentially a case where entire entities are being loaded when a .Select() could be used to reduce the amount of data. Perhaps it's even a case where entire sets of entities are being loaded that aren't relevant to what is displayed or done, such as someone using .ToList() prior to just doing a .Count() or .Any(), or doing a .FirstOrDefault() just to do a != null check (see the sketch after this list).
Is the database properly indexed? Copy some of the heavier queries into SQL Manager and execute them with an execution plan. Are there indexing suggestions?
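A short sketch of the "pulling too much" patterns above and their fixes (the Orders set, its properties, and the id variable are hypothetical):

    // Loading everything just to count or check existence:
    bool hasOrders = context.Orders.ToList().Count > 0;                  // pulls every row into memory
    bool hasOrdersBetter = context.Orders.Any(o => o.CustomerId == id);  // translates to EXISTS

    // Loading whole entities when only a few columns are displayed:
    var rows = context.Orders
        .Where(o => o.CustomerId == id)
        .Select(o => new { o.Id, o.OrderDate, o.Total })                 // only these columns are queried
        .ToList();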
The common sins of developing with EF and other ORMs boil down to "pulling too much, too often." It's surprising how many clients I've worked with have development teams that have not used a profiler to inspect their ORM use efficiency. (and I'm talking 0% so far.)
I have a small project with inherited C# code, specifically Entity Framework Core. This is hosted in Azure and recently I saw a very interesting feature that I would like to try out: "Automatic Tuning" for the database.
I have a couple of questions regarding this:
Would it conflict with my Entity Framework, as the database objects were originally created from code? My understanding is that it shouldn't, but I would like to be sure.
Is it worth it or anyone had any trouble with it?
Thanks!
Automatic Tuning does not conflict in any way with Entity Framework (EF). It just creates the indexes needed by the queries your application runs. It also drops duplicated and unneeded indexes (though existing unique indexes are not dropped) and chooses the best query plan created by SQL Server. None of this is related to EF.
One thing you need to consider is that Azure SQL Database needs to monitor query activities at least for a day in order to identify some recommendations.
Another thing to take into consideration is that Automatic Tuning does not update statistics and does not defragment indexes.
I am starting with the Entity Framework. It sounds great. But I am wondering if I should watch out for some weakness somewhere. Any experience there?
You probably need to start prefixing these questions with the version you are talking about. A good amount of the annoyances have been fixed in the upcoming version in .NET 4.0.
Here is what I would say after working with the first version for about 6 months, using a decent-sized DB in SQL 2008 (40+ tables, several tables with close to 1M rows, and a decent amount of traffic):
Lack of foreign key properties. Meaning if I want to know or work with just the ID of a related table, I have to load the actual entity. (Fixed in the next version.)
Utter lack of an easy outer join like LINQ to SQL has when using DefaultIfEmpty (see the sketch after this list). Fixed in the next version.
Generated SQL is less than optimal. This seems to be fixed in the next version as well.
Very difficult to abstract from your code for testability and for use in multi-tiered environments, but it can be done. This can also be classified as the POCO problem, which has also been resolved.
There are more, but these are my top ones.
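For reference, the left outer join pattern referred to above, roughly as it looks in LINQ to SQL (and in EF from v4 onwards); Customers and Orders are hypothetical sets:

    var query =
        from c in context.Customers
        join o in context.Orders on c.Id equals o.CustomerId into customerOrders
        from o in customerOrders.DefaultIfEmpty()   // o is null when the customer has no orders
        select new { c.Name, OrderId = o == null ? (int?)null : o.Id };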
Overall I would use it again, but if you are starting from scratch please save yourself some pain and wait for the latest version or start using the beta if you can.
You might find the walkthroughs for Entity Framework 4.0 useful. All of the new features discussed are annoying omissions from the currently released version for someone.
I found the new TDD/testability features and T4 code generation features especially interesting.
About EF1:
Generated SQL is horrible. It multiplies joins and is 10x bigger than it needs to be. I had a simple query, but it had a lot of joins, and merely generating this query in EF (not executing it) was significantly slowing down my application. No, I couldn't use a precompiled query. I used a view to cope with it. SQL Profiler was helpful.
Primary keys in views are not recognized properly. You have to change the .edmx file by hand when you import a view or do a schema refresh.
You can design entities from the database graphically and update the model from the database, but it doesn't always work well, especially when you change field types or foreign keys.
You can't update one table in the model; you always have to update the whole model from the database.
You can't define a base class for your entities; it is already defined (EntityObject). You can use interfaces, because the classes are defined as partial.
No POCO support; entity classes are tightly coupled to the framework.
You can set a foreign key via EntityReference.EntityKey, but when you have an EntityCollection, prepare for a round trip to the database. Or am I missing something?
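For the single-reference case, the no-round-trip version looks roughly like this (the order and customerId variables, entity set name, and key name are placeholders):

    // EF1: point an Order at a Customer by key alone, without loading the Customer entity.
    order.CustomerReference.EntityKey =
        new EntityKey("MyEntities.Customers", "CustomerId", customerId);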
I am finding the POCO objects and model-first design in the EF4 beta very sexy.
I have done some research into "the best way to insert huge amounts of data into the DB with C#", and a lot of people just suggested using SqlBulkCopy. After I tried it out, it really amazed me. Undoubtedly, SqlBulkCopy is very, very fast. It seems that SqlBulkCopy is a perfect way to insert data (especially huge amounts of data). But why don't we use it all the time? Is there any drawback to using SqlBulkCopy?
SqlBulkCopy exists for Oracle v11 as well, but it's provided by the Oracle .NET assemblies you get when you install the Oracle client. The bulk copy class is essentially implemented provider by provider, by the provider of the target database engine.
One HUGE drawback, though: there is absolutely no error reporting. If, for example, you've updated data in a DataSet and are flushing it back to the DB with an adapter, and there's a key violation (or any other failure), the culprit DataRows will have .HasErrors set to true, and you can add that to your exception message when it's raised.
With SqlBulkCopy, you just get the type of the error and that's it. Good luck debugging it.
Two reasons I can think of:
As far as I know, it's only available for Microsoft SQL Server
In a lot of normal workloads, you don't do bulk inserts, but occasional inserts intermixed with selects and updates. Microsoft themselves state that a normal insert is more efficient for that, on the SqlBulkCopy MSDN page.
Note that if you want a SqlBulkCopy to be equivalent to a normal insert, at the very least you'll have to pass it the SqlBulkCopyOptions.CheckConstraints parameter.
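A hedged sketch of a typical call (the connection string, table name, batch size, and dataTable variable are placeholders; it uses System.Data.SqlClient), including the CheckConstraints option mentioned above:

    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();

        using (var bulk = new SqlBulkCopy(connection, SqlBulkCopyOptions.CheckConstraints, null))
        {
            bulk.DestinationTableName = "dbo.TargetTable";
            bulk.BatchSize = 5000;            // commit in batches rather than one huge operation
            bulk.WriteToServer(dataTable);    // DataTable columns must line up with the target table
        }
    }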