Azure Database, EF, timeout issues - entity-framework

I have taken over an existing MVC website which uses Entity Framework and Hangfire, is hosted on Azure, and uses an Azure database. Every so often the website times out.
I'm new to the Azure portal, Entity Framework, and Hangfire.
If I increase the DTUs, the timeout issues clear up.
I'm looking for ways to diagnose why the website times out. I have added error logging using ELMAH and checked Hangfire, but this doesn't give me any further information.
Is there anything in the Azure portal that can help?

If it "times out" and if "increasing DTU resolves timeouts" and these observations are true (I think it's on you to really convince yourself this is absolutely true, don't make this assumption lightly) then the usual and obvious candidate is "a slow sql query". Entity Framework is often used with linq to create sql queries without writing sql. These queries are often fine for very simple tasks, such as someData.Where(x=>x.Id == 1).First(), however, if linq is used to join tables, or create complex associations, the generated sql can become monstrously bad, from a performance perspective. You can add logging to write out the sql generated by linq, or you can try to trace the database to see what sql is running on it. If tracing is out of the question, there are still meta queries you can use to view things like cached query plans and SQL Server can give you estimated costs and cached execution counts.
You can still hang yourself without using LINQ. You can still use stored procedures with EF. Way too many developers are still naive about SQL performance; you need to comb over your back end, learn the schema and the stored procedures, and inspect the SQL contents of everything. Check for any database triggers (easy to miss). Red flags are subqueries, too many joins, queries that return too many rows, lots of string manipulation in a query, joining tables on strings, or XML/JSON-based SQL work.
Be aware that slow SQL queries become slower when load is high, and when slow queries pile up, each one takes even longer to resolve. Depending on the nature of the query, this can also cause debilitating table locking.
But queries can be performant and still cause locking, i.e. one table is written to often and it blocks other writes or reads on that table. This is a little harder to diagnose, but you can figure it out by carefully inspecting logs of database calls and how long they take to execute. There are also SQL queries you can run on the database to diagnose long-running queries, or to see which tables are locked at a given point in time.
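For example, a query against the sys.dm_exec_requests DMV (which Azure SQL exposes) shows what is running right now and what, if anything, is blocking it. A minimal sketch, with the wrapper class and connection string as placeholders:

    using System;
    using System.Data.SqlClient;

    static class LongRunningQueryCheck
    {
        // The DMV query is standard for SQL Server / Azure SQL; only the wrapper is made up.
        public static void Dump(string connectionString)
        {
            const string sql = @"
                SELECT r.session_id, r.blocking_session_id, r.wait_type,
                       r.total_elapsed_time, t.text AS sql_text
                FROM sys.dm_exec_requests r
                CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
                WHERE r.session_id <> @@SPID
                ORDER BY r.total_elapsed_time DESC;";

            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(sql, conn))
            {
                conn.Open();
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        Console.WriteLine("spid {0} (blocked by {1}, waiting on {2}): {3}",
                            reader["session_id"], reader["blocking_session_id"],
                            reader["wait_type"], reader["sql_text"]);
                    }
                }
            }
        }
    }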
Finally, check for any back-end webjobs for your application. If timeouts occur on recurring days or at recurring times, then somebody's batch SQL could be blocking your production database from being read.
But this is all speculation. I think you need to do more research to determine what is actually causing the site to become unresponsive. If you can log response times for common queries, you can rule SQL-based latency in or out as the culprit and work from there. There's nothing inherently amiss about any of the technologies you mentioned.
If queries are performant but still causing issues, a long-term solution is to add something like a message queue and batch your SQL work intelligently, or just make the database work asynchronous so it doesn't block the UI.
You should correlate any logged timeouts with Azure's monitoring. Azure can give you CPU/RAM/page visits and such on the dashboard.

SQL Azure is a bit of a different beast. It doesn't have the on-demand performance of a dedicated DB unless you're prepared to throw serious $$ at it. And even then ...
EF, when used well, can perform quite well. When used poorly it can be a dog, and those problems are compounded on a platform like SQL Azure.
The first thing is to check that your EF contexts are set up to use an execution strategy suited to Azure: https://learn.microsoft.com/en-us/ef/ef6/fundamentals/connection-resiliency/retry-logic
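Going by that article, wiring up the Azure-specific retry strategy in EF6 is a one-class change; a minimal sketch (the class name is illustrative):

    using System.Data.Entity;
    using System.Data.Entity.SqlServer;

    // EF6 picks up a DbConfiguration subclass automatically as long as it lives
    // in the same assembly as the context.
    public class AzureDbConfiguration : DbConfiguration
    {
        public AzureDbConfiguration()
        {
            // Transient Azure SQL failures are retried with an exponential back-off.
            SetExecutionStrategy("System.Data.SqlClient", () => new SqlAzureExecutionStrategy());
        }
    }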
The next thing would be to see what kind of SQL tracing you can run on Azure. Tracing is essential to see what EF is doing behind the scenes. I'm not familiar with the tools available for Azure; in my case my Azure experience was running SQL Server on VMs, because SQL Azure was too immature, not HIPAA compliant at the time, and expensive for the DTU estimates we were able to get. Worst case, can you restore a database backup into a SQL Server instance and temporarily point a copy of your application environment at it to run through common usage scenarios? Using a SQL trace you can pick up exactly when and how often EF is executing queries, and what queries it is executing.
Things to look at:
How many queries are running? If you are loading a set of records and expect one query, is a whole heap of queries getting sent instead? That would indicate lazy-load calls being triggered.
What queries are being run? Are they selecting a lot more fields than are being displayed? This is potentially a case where entire entities are being loaded when a .Select() could be used to reduce the amount of data. It may even be a case where entire sets of entities are being loaded that aren't relevant to what is displayed or done, such as someone calling .ToList() just before a .Count() or .Any(), or doing a .FirstOrDefault() just to do a != null check (see the sketch after this list).
Is the database properly indexed? Copy some of the heavier queries into SQL Server Management Studio and execute them with an execution plan. Are there indexing suggestions?
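To illustrate the second point, a minimal sketch; the context, entity, and property names are made up:

    // Anti-pattern: materialises every column of every matching row just to count.
    var hasOrders = context.Orders
        .Where(o => o.CustomerId == customerId)
        .ToList()
        .Count > 0;

    // The same check expressed so EF can translate it into cheap SQL (an EXISTS):
    var hasOrdersCheap = context.Orders.Any(o => o.CustomerId == customerId);

    // A projection that selects only the displayed columns instead of whole entities:
    var summaries = context.Orders
        .Where(o => o.CustomerId == customerId)
        .Select(o => new { o.Id, o.OrderDate, o.Total })
        .ToList();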
The common sins of developing with EF and other ORMs boil down to "pulling too much, too often." It's surprising how many of the clients I've worked with have development teams that have never used a profiler to inspect how efficiently they use their ORM. (And I'm talking 0% so far.)

Should I filter data in PostgreSQL or server backend?

I am working on a project which uses GraphQL and PostgreSQL, where we want to select data from the database with a value after a certain date. It is currently selecting all the data from the database and then filtering it on the server:
.filter(({time}) => moment(time).isAfter(startTime))
However, I would have thought it best to do this filtering in the database query, as the full dataset is never used.
Is there any benefit to doing it on the server rather than in the database query?
Barring some unusual edge case -- such as other parts of your backend code genuinely needing all the data for some reason -- it would definitely be more efficient to filter everything on the Postgres side, via the SQL that is being used to fetch the data in the first place.
This is true for several reasons:
Assuming the table is properly indexed, the filtering will be able to occur much faster within the database.
The unneeded data will not need to be serialized and sent over the wire to the backend, only to then be discarded by the backend's own filtering.
The memory footprint should be reduced on both the Postgres and server end due to needing to process only a portion of the results.
I've not worked with GraphQL myself, but from a bit of poking around its docs, it appears GraphQL often uses other mechanisms in different layers (outside of the database) to try to improve performance.
It would be worth seeing what the actual SQL is that your GraphQL query is generating (that may be possible via a function in GraphQL; it could also be done by enabling certain log settings on the Postgres server and correlating the log output to the query). That may lead to further optimization possibilities if you want to keep things purely GraphQL.
Dropping down to a raw query seems like a good possibility, though. Certainly that is something that is often done with ORMs like Django's and ActiveRecord.

Why does Azure Database perform better with transactions

We decided to use a micro-ORM against an Azure database. As our business only needs "inserts" and "selects", we decided to suppress all code-managed SqlTransactions (there are no concurrency issues on the data).
Then we noticed that our instance of Azure Database responded very slowly. The "rpc completed" event occurred with delays hundreds of times longer than the time needed to run a simple SQL statement.
Next, we benchmarked our code with EF6 and saw that the server responded very quickly. As EF6 uses a built-in transaction, we decided to restore the SqlTransaction (ReadCommitted) in the micro-ORM, and we noticed everything was fine.
Does Azure Database require an explicit SqlTransaction (managed by code)? How does the SqlTransaction influence Azure Database performance? Why was it implemented that way?
EDIT: I am going to post some more precise information about the way we collected the traces. Our Azure event logs seem to be expressed sometimes in nanoseconds and sometimes in milliseconds, which seems odd.
If I understand what you are asking correctly, batching multiple SQL statements into one transaction will give you better results on any DBMS. Committing after every insert/update/delete has a huge overhead on a DBMS that is not designed for it (like MyISAM on MySQL).
It can even cause bad flushes to disk and thrashing if you do too much of it. I once had a programmer committing thousands of entries to one of my databases every minute, each in its own transaction, and it brought the server to a halt.
InnoDB, one of the two most popular storage engines for MySQL, can only commit 20-30 transactions a second (or maybe it was 2-3... it's been a long time), as each one is flushed to disk at the end for ACID compliance.
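To make the batching point concrete, here is a minimal ADO.NET sketch; the table, column, and method names are made up, and the point is the single explicit transaction (and single commit/flush) around the whole batch rather than an implicit commit per INSERT:

    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;

    static class BatchInsertSketch
    {
        // dbo.Measurements and its Reading column are placeholders for the sketch.
        public static void InsertReadings(string connectionString, IEnumerable<double> readings)
        {
            using (var conn = new SqlConnection(connectionString))
            {
                conn.Open();
                // One explicit transaction around the whole batch.
                using (var tx = conn.BeginTransaction(IsolationLevel.ReadCommitted))
                {
                    foreach (var reading in readings)
                    {
                        using (var cmd = new SqlCommand(
                            "INSERT INTO dbo.Measurements (Reading) VALUES (@reading)", conn, tx))
                        {
                            cmd.Parameters.AddWithValue("@reading", reading);
                            cmd.ExecuteNonQuery();
                        }
                    }
                    tx.Commit();   // one commit for the whole batch
                }
            }
        }
    }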

Is it mandatory to run Database Designer for every schema in HP Vertica?

I have constantly been hitting a resource pool allocation error after creating several tables in a new schema.
After running the Database Designer in Vertica for that schema with all its tables, the queries run fine.
Kindly help me understand the concept.
The Database Designer is optional; you don't have to use it at all. Using it helps you optimize your physical layout, and if you're having trouble with resource-pool allocation it sounds like you might benefit from that.
From the documentation:
The HP Vertica Database Designer:
Analyzes your logical schema, sample data, and, optionally, your sample queries.
Creates a physical schema design (a set of projections) that can be deployed automatically or manually.
Can be used by anyone without specialized database knowledge.
Can be run and rerun any time for additional optimization without stopping the database.
Uses strategies to provide optimal query performance and data compression.
You can run DBD for just a particular query (optimizes whatever's needed to support that query) or for your entire database. It uses sample queries that you provide, so if your usage patterns change over time it can help to rerun it.

entity framework performance

I am using Entity Framework to layer over my SQL Server 2008 database. EF is present in my web service, and the web service is invoked by a Silverlight client.
I am seeing a serious performance issue in terms of the time a query takes to execute in EF. This doesn't happen on consecutive calls.
A little bit of googling revealed that the cost comes from constructing the in-memory model of the database objects once per app domain. I found this Microsoft link explaining pre-generation of views for performance improvement. Even after implementing the steps, the performance actually degraded instead of improving. I am curious whether anyone has tried this approach successfully, and whether there are any other avenues for improving performance.
I am using .NET 3.5.
A couple of areas to look at for EF performance:
Do as much of the processing as possible before calling things like ToList(). ToList() will bring everything in the set into memory. By default, EF will keep building up the expression tree and only actually execute it when you need the data in memory. That first query will be against the database, but afterwards the processing will be in memory. When working with large data sets, you definitely want as much of the heavy lifting done by the database as possible (see the sketch after these two points).
EF 1 only has the option to pull the entire row back. Therefore, if you have a column that is a large string or binary blob, it is going to be pulled down into memory whether you need it or not. You can create a projection that doesn't include this column, but then you don't get the benefits of having it be an entity.
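A small sketch of the first point; the context, entity, and property names are placeholders:

    // The Where and Select stay in the expression tree, so the filtering and the
    // column trimming both happen in SQL, and the query runs once, at ToList():
    var recent = context.Orders
        .Where(o => o.CreatedOn >= cutoff)
        .Select(o => new { o.Id, o.Total })
        .ToList();

    // By contrast, this pulls the whole table into memory first and filters there:
    var slow = context.Orders
        .ToList()
        .Where(o => o.CreatedOn >= cutoff)
        .ToList();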
You can look at the sql generated by EF using the suggestion in this post
How do I view the SQL generated by the Entity Framework?
The same laws of physics apply for EF queries as they do for ordinary SQL. Check your database tables and make sure that you have indexes on primary and foreign keys, that your database is properly normalized, and so forth. If performance is degrading after Microsoft's suggestions, then that's my guess as to the problem area.
Are you hosting the webservice in IIS? Is it running on the same site as the Silverlight App? What about the database itself? Is it running on a dedicated machine? Are there other apps hitting it? The first call to a dormant database is painful (I've had situations where it would actually time out in my environment.)
There are a number of factors to take into consideration here. But it comes down to more than just EF's overhead.
Edit: I didn't fully qualify this, but the process of opening the first connection to SQL Server is slow regardless of your data access solution.
Use SQL Profiler to check how many queries are executed to retrieve your data. If it's a large number, use the Include() method of ObjectQuery to retrieve child objects with the parent in one query.
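For example (entity and context names are illustrative; with EF 1 on .NET 3.5, Include() takes the navigation property path as a string):

    // One JOINed query brings back the orders and their customers together,
    // instead of a separate query per parent row.
    var ordersWithCustomers = context.Orders
        .Include("Customer")
        .Where(o => o.Total > 100)
        .ToList();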

What's the drawback of SqlBulkCopy

I have done some research on "the best way to insert huge amounts of data into the DB with C#", and a lot of people suggested using SqlBulkCopy. After I tried it out, it really amazed me. Undoubtedly, SqlBulkCopy is very, very fast. It seems that SqlBulkCopy is a perfect way to insert data (especially huge amounts of data). But why don't we use it all the time? Is there any drawback to using SqlBulkCopy?
A SqlBulkCopy equivalent does exist for Oracle 11g as well, but it's provided by the Oracle .NET assemblies you get when you install the Oracle client. The bulk copy class is basically implemented separately by the provider of each target database engine.
One HUGE drawback, though -- there is absolutely no error reporting. If, for example, you've updated data in a DataSet and are flushing it back to the DB with an adapter, and there's a key violation (or any other failure), the culprit DataRows will have .HasErrors set to true, and you can add that to your exception message when it's raised.
With SqlBulkCopy, you just get the type of the error and that's it. Good luck debugging it.
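For contrast, a rough sketch of the adapter path described above, where the failing rows can be inspected individually; the table and column names are made up:

    using System;
    using System.Data;
    using System.Data.SqlClient;

    static class AdapterErrorSketch
    {
        public static void UpdateWithRowErrors(string connectionString, DataTable table)
        {
            using (var conn = new SqlConnection(connectionString))
            using (var adapter = new SqlDataAdapter("SELECT Id, Name FROM dbo.Customers", conn))
            using (var builder = new SqlCommandBuilder(adapter))   // generates INSERT/UPDATE/DELETE
            {
                adapter.ContinueUpdateOnError = true;   // keep going past the failing row
                adapter.Update(table);

                foreach (DataRow row in table.Rows)
                {
                    if (row.HasErrors)
                    {
                        // Unlike SqlBulkCopy, you can see exactly which row failed and why.
                        Console.WriteLine("Row {0} failed: {1}", row["Id"], row.RowError);
                    }
                }
            }
        }
    }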
Two reasons I can think of:
As far as I know, it's only available for Microsoft SQL Server
In a lot of normal workloads you don't do bulk inserts, but occasional inserts intermixed with selects and updates. Microsoft themselves state on the SqlBulkCopy MSDN page that a normal insert is more efficient for that.
Note that if you want a SqlBulkCopy to be equivalent to a normal insert, at the very least you'll have to pass it the SqlBulkCopyOptions.CheckConstraints option.
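A minimal sketch of that, assuming a destination table and a pre-filled DataTable; the options shown are real, and they bring SqlBulkCopy closer to the behaviour of ordinary INSERTs at some cost in speed:

    using System.Data;
    using System.Data.SqlClient;

    static class BulkLoadSketch
    {
        // dbo.Orders and the shape of the DataTable are assumptions for the sketch.
        public static void Load(string connectionString, DataTable rows)
        {
            using (var bulk = new SqlBulkCopy(connectionString,
                SqlBulkCopyOptions.CheckConstraints | SqlBulkCopyOptions.FireTriggers))
            {
                bulk.DestinationTableName = "dbo.Orders";
                bulk.BatchSize = 5000;   // commit in chunks rather than one giant batch
                bulk.WriteToServer(rows);
            }
        }
    }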