Azure SQL / EF performance issues - entity-framework

Recently I have run into some pretty weird performance issues with SQL Azure / Web Apps / Entity Framework.
It would appear that occasionally calls to the database (both read and write queries) would hang for anywhere between a few seconds and a few minutes (!). This happens even on a select query on a table with 4 columns containing 5 rows of data.
The issue appears to happen at random and is not reproducible. Upgrading the DB to a higher performance tier seems to have no effect. Both the web app and the SQL Azure database are in the same region.
The DB performance graph generally flatlines at around 0.5% of resource utilization, with an occasional spike to around 5% - so the issue certainly does not lie with resource constraints.
I have no clue how to start investigating the issue given its intermittent nature. I would greatly appreciate any feedback.
Could it have something to do with the way Entity Framework handles DB connections, specifically to SQL Azure? Testing on a local SQL Express instance has never shown anything similar.

After battling performance issues with Entity Framework, we finally switched over to Dapper and saw a huge increase in performance. They have some benchmarks on their GitHub page showing the speed difference.
https://github.com/StackExchange/dapper-dot-net
Also, I am unsure what version of EF you are using, but if it is EF Core, its performance is currently worse than that of earlier EF versions. Another performance comparison can be found here: https://www.exceptionnotfound.net/dapper-vs-entity-framework-vs-ado-net-performance-benchmarking/.
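For a sense of the difference, here is roughly what a query looks like in Dapper - a sketch only, where the Product type, the Products table, and the connection string are made-up placeholders:

    using System.Collections.Generic;
    using System.Data.SqlClient;
    using System.Linq;
    using Dapper;

    public class Product
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }

    public static class ProductQueries
    {
        public static List<Product> ByPattern(string connectionString, string pattern)
        {
            using (var conn = new SqlConnection(connectionString))
            {
                // Dapper opens the connection if needed, runs the parameterized
                // query, and maps rows straight onto the POCO - no change
                // tracking, which is where most of the speedup over EF comes from.
                return conn.Query<Product>(
                    "SELECT Id, Name FROM Products WHERE Name LIKE @pattern",
                    new { pattern }).ToList();
            }
        }
    }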

OK, so it looks like I have found a solution for my performance issues - and it was as simple as enabling Multiple Active Result Sets (https://msdn.microsoft.com/en-us/library/h32h3abf(v=vs.110).aspx) for the connection.
There must have been some change in the EF framework from around 2 years ago, as the issue only arose after I upgraded to 6.1.3. I'm not sure what the original version was, but it was whichever one was current 2 years ago.
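For reference, the fix boils down to adding MultipleActiveResultSets=True to the connection string. A minimal sketch in code - the server, database, credentials, and MyDbContext are placeholders; MyDbContext is a hypothetical EF6 context that passes the string through to the base DbContext(string) constructor:

    using System.Data.SqlClient;

    // Build a SQL Azure connection string with MARS enabled.
    var builder = new SqlConnectionStringBuilder
    {
        DataSource = "tcp:myserver.database.windows.net,1433", // placeholder
        InitialCatalog = "mydb",                               // placeholder
        UserID = "user",
        Password = "...",
        Encrypt = true,
        MultipleActiveResultSets = true // the setting that fixed the hangs
    };

    using (var db = new MyDbContext(builder.ConnectionString))
    {
        // queries run here on a connection with MARS enabled
    }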
I hope this helps someone else. It has caused me a lot of grief and cost me a large potential project, which fell through.

Related

How to synchronize deployments (especially of database object changes) on multiple environments

I have this challenge. I am the DevOps engineer and a software engineer on a team where, months back, the developers moved from a central Oracle DB to having the DB on a CentOS VM on their individual laptops. The move away from a central DB was to reduce dependency on the DBAs and to eliminate issues that stemmed from inconsistent data.
The plan for sharing and keeping the database synchronized across the team was that each person would share change scripts with everyone. The problem is that we use Skype for communication (we just set up Slack but have yet to start using it fully), and although people sometimes post the text of DB change scripts, they can be missed by some. The other problem is that some developers simply don't post their changes. Further, new releases are deployed to Production without being deployed to the Test and Demo environments.
This has posed a serious challenge for us, especially for me, since I recently became responsible for ensuring that our Demo deployments are in sync with the Production deployments.
Most of the synchronization issues come down to the database being out of sync due to missing change scripts or missing DB objects. Oracle is our DB of choice.
A typical deployment to the Demo environment is a very painful process: we test the application and, as issues occur due to missing DB table columns, functions, or stored procs, we have to hunt down the missing DB objects, apply them to the DB, and continue until all issues are resolved.
How can I solve this problem to ensure smooth, painless and less time-consuming deployments? Can migrating our applications to Docker help with the DB synchronization issues and the associated lack of discipline of the developers? What process can we put into place to improve in this area?
Thank you very much in advance for your help.
Have a look at http://www.dbmaestro.com
I strongly recommend joining the live demo session.
DBmaestro TeamWork can help you merge changes from multiple DBs into a single shared DB and safely move changes from one environment to another.
Danny
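For a sense of what such tools automate under the hood: the core is an ordered set of change scripts plus a table recording which ones have been applied. Below is a minimal sketch against any ADO.NET provider - the schema_migrations table and all names are hypothetical, and note that Oracle DDL commits implicitly, so scripts should be written to be safe to re-run if one fails partway:

    using System;
    using System.Collections.Generic;
    using System.Data;
    using System.IO;
    using System.Linq;

    public static class MigrationRunner
    {
        // Applies every *.sql script in scriptsDir exactly once, in name
        // order, recording each applied script in schema_migrations.
        // The connection is assumed to be open already.
        public static void Apply(IDbConnection conn, string scriptsDir)
        {
            var applied = new HashSet<string>();
            using (var cmd = conn.CreateCommand())
            {
                cmd.CommandText = "SELECT script_name FROM schema_migrations";
                using (var reader = cmd.ExecuteReader())
                    while (reader.Read())
                        applied.Add(reader.GetString(0));
            }

            foreach (var path in Directory.GetFiles(scriptsDir, "*.sql").OrderBy(p => p))
            {
                var name = Path.GetFileName(path);
                if (applied.Contains(name))
                    continue; // already ran on this environment

                using (var cmd = conn.CreateCommand())
                {
                    // Run the change script itself.
                    cmd.CommandText = File.ReadAllText(path);
                    cmd.ExecuteNonQuery();

                    // Record that it has been applied.
                    cmd.CommandText =
                        "INSERT INTO schema_migrations (script_name) VALUES (:name)";
                    var p = cmd.CreateParameter();
                    p.ParameterName = "name";
                    p.Value = name;
                    cmd.Parameters.Add(p);
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }

With something like this in place (or a proper tool doing the same job), "did everyone run the script?" stops being a Skype problem: every environment answers it from its own schema_migrations table.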

What are the concerns of using Crate.io as a primary datastore?

If I am correct, Crate (crate.io) is backed by Elasticsearch (Lucene). Weren't there a few articles a month ago saying that ES lost some writes under heavy load? Are there any other concerns?
You are right, Crate is backed by Elasticsearch. We think the people at Elasticsearch are doing a great job of improving data consistency. A good read is http://www.elasticsearch.org/blog/resiliency-elasticsearch/, which gives a pretty good overview of their efforts towards reliability. We at Crate are confident that this storage engine is safe to use as a primary store. We also see that issues in this area are being actively worked on by the Lucene and Elasticsearch communities.
I am currently evaluating Crate.io as a primary datastore for work. As the above answer is vague and unspecific, maybe it's time for an update on this question. There is a December 2016 keynote presentation on YouTube from the Jepsen author Kyle Kingsbury, who investigated Crate.io in light of some resiliency problems in Elasticsearch. The first 8 minutes are introduction; the Crate.io part runs from 23:50 to 31:10.
For those of you who don't want to watch the full video, here is a short summary.
First, the test setup. They set up databases and a random pattern of clients issuing random queries. They also deliberately introduced failures, such as network partitions. Second, the results. According to Kingsbury, there are two issues with ES resiliency, and both of them carry over to Crate.io. Let's get to the details...
Dirty reads
The first one - ES #20031 - is that ES may produce dirty reads, divergence, and lost updates when network partitions occur. As of now - December 2017 - this issue is still open. In my opinion, the same problems can occur if a node becomes unresponsive under extremely heavy load, for example during extensive querying, reindexing, or garbage collection.
Lost updates
According to Kingsbury, there is another problem ("Can promote stale binaries") with ES that causes updates to be lost completely when network partitioning occurs. It has been tagged as #20384, and there is a fix of sorts, which Kingsbury summarizes as "partial". So ES may still lose data on writes.
What does ES say?
On the official ES resiliency page, only one of the two problems - #20384 - is mentioned. It is marked as solved in the version 5.0 release notes, although the official site says there is only a partial fix.
What does Crate.io say?
In the Crate.io documentation on resiliency, there is a list of known problems with Crate.io resiliency. The ES bug #20384 is described as partially fixed and still an open problem. The ES bug #20031 is not mentioned. However, there is a paragraph about an issue with network partitions which Crate.io marks as fixed - so the official page is somewhat inconclusive here.
Conclusion
Kingsbury concluded in December 2016 that Crate.io should not be used as the primary data store. It could of course be used as a replica of your primary data, to benefit from the time-series features that Crate.io offers. He also suggests that for machine data, where 5% data loss is not a severe problem, Crate.io is a viable option as a primary store.
It is my impression that some bugs Kingsbury reported may have been fixed but not all.

EventStore vs. MongoDb [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
I would like to know what advantages there are to using EventStore (http://geteventstore.com) over implementing event sourcing yourself on MongoDb.
The reason I ask, is that our company has a number of people that work with MongoDb daily. They don't work with Event Sourcing though. While they are not completely in the dark about the subject, they aren't about to start implementing it anywhere either.
I am about to start a project, that is perfectly suited for Event Sourcing. There are about 16 very well defined events, and about 7 well defined projections. I say "about" because I know there will be demand for more projections and events once they see the product in use.
The approach is going to be API first, with a REST Api that other parts of our organisation are going to consume.
While I have read a lot about Event Sourcing the way Greg Young defines it, I have never actually implemented an Event Sourcing solution.
This is a greenfield project. There are no technology restrictions, since we are going to expose everything as a REST interface. So if anyone has working experience with EventStore or Event Sourcing on MongoDb, please enlighten me.
Also, an almost totally unrelated question about Event Sourcing:
Do you ever query the event store directly? Or would you always create new projections and replay events to populate those projections?
Disclaimer: I am Greg Young (if you can't read my name :))
I am going to answer this question even though I believe it will likely get deleted anyway. This question alone is a bit odd to me, but the answers are fairly bizarre. I won't take the time to answer each reply individually but will instead put all of my comments in this reply.
1) There is a comment that we only run on a custom version of Mono, which is a detail, but... this is not the case (and has not been for over a year). We were waiting for critical patches we made to Mono (for example, threadpool.c) to hit their master. This has happened.
2) EventStore is 3-clause BSD licensed. Not sure how you could claim we are not Open Source. We also have a company behind it and provide commercial support.
3) Someone mentioned us going to version 3 in September. Version 1 was released 2 years ago. Version 2 added clustering (obviously with some breaking changes vs single node). Version 3 is adding a ton of stuff, including the ability to have competing consumers. Very little has changed in the actual client protocol over this time (especially for those using the HTTP API).
What is really disturbing for me in the recommendations, however, is that they don't seem to understand what they are comparing. It would be roughly the equivalent of me asking "Which should I use, neo4j or leveldb?". You could build yourself a graph database on top of leveldb, but it would be quite a bit of work.
Mongo in this case would be a storage engine for the event store, which the OP would have to write him/herself. Writing a production-quality event store on top of a storage engine is a non-trivial exercise if you want even the most basic operations.
I wrote this in response to the mailing list equivalent of this question:
How will you do the following with Mongo?
Write and read events to/from streams with ordering, optimistic concurrency, etc. (see the sketch after this list)
Then:
Your projections don't want to read from streams in the same way they were written; projections are normally interested in event types and want all events of type T, regardless of the stream they were written to, in the proper order.
You probably also want, for instance, the ability to switch live from pushed event notifications to handling pulled information (e.g. polling), etc.
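For illustration, the first item looks roughly like this with the EventStore .NET client - a sketch against the classic EventStore.ClientAPI, where the stream name, event type, and payload are made up and exact signatures vary by client version:

    using System;
    using System.Text;
    using System.Threading.Tasks;
    using EventStore.ClientAPI;

    class AppendExample
    {
        static async Task Main()
        {
            var conn = EventStoreConnection.Create(
                new Uri("tcp://admin:changeit@localhost:1113"));
            await conn.ConnectAsync();

            var payload = Encoding.UTF8.GetBytes("{\"orderId\":42,\"amount\":99.5}");
            var evt = new EventData(Guid.NewGuid(), "OrderPlaced",
                                    true, payload, null);

            // Optimistic concurrency: this append fails with a
            // WrongExpectedVersionException if anyone else has written to
            // the stream since we last saw version 41.
            await conn.AppendToStreamAsync("order-42", 41, evt);
        }
    }

Getting that expected-version check (plus ordering and idempotent retries) right on top of Mongo is exactly the non-trivial part.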
It would make more sense if Kafka, Datomic, and Event Store were being compared.
Seeing as the other replies don't talk about the tooling or benefits in EventStore and only refer to the benefits of MongoDB, I'll chime in. But note that my experience is limited.
I'll start with the cons...
There are a lot of check-ins, which can force you to decide which version you are going to actively support yourself. While the team has been solidifying their releases, the fact that they arrived at version 3 less than 18 months after the initial release is an indicator that you will have to move the version you support to a more recent one from time to time (which can also impact the platform you choose to deploy to).
It's not going to work easily on every platform (especially if you're trying to move to a cloud environment or a Docker-based LXC container). Some of this is due to the community surrounding other DBs such as Mongo. The team does seem to have been working their butts off on read/write performance while maintaining cross-platform stability. But as time presses on, I've found that you don't want to deviate too far from a bare-metal OS installation, which in this day and age is not attractive.
Uses a special version of Mono. Finding support for older versions of Mono only serves to make the process more of a root canal.
To get the most performance out of EventStore, you really need to think about your architecture. EventStore outputs to flat files, and event data can grow pretty quickly. What's the failure rate of the disks you are persisting your data to? How are things compressed? Archived? You have a lot of control, and the control is geared towards storing your data as events. However, while I'm sure Greg Young himself could quote me to my grave the features that optimize and protect your disks in the long term, I'll more than likely find a mature Mongo community that has had experience running into similar cases.
And the Pros...
RESTful - it's AtomPub. Is your stream not specific enough? Create another and do HTTP GETs to your heart's content. Concerned about routing? Do an HTTP forward. Concerned about security? Put an HTTP proxy in front. Simple! (See the sketch after this list.)
You have a nice suite of tools and a UI for testing out and building your projections as your events start to generate new data (e.g. use the Chrome browser as a way to debug your projections... yes, they're written in JavaScript).
Read performance - since the application outputs to flat files, you get kernel-level caching and can expose events via HTTP at the drop of a hat. Also, indexes across your streams let you query projections against larger data sets (though I really get the feeling index performance will creep up on you over time).
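To make the AtomPub point concrete, reading a stream over the HTTP API is just a GET with the right Accept header - a sketch, with the default HTTP port and media type as documented for EventStore around 3.x, and a made-up stream name:

    using System;
    using System.Net.Http;
    using System.Threading.Tasks;

    class ReadStreamExample
    {
        static async Task Main()
        {
            using (var client = new HttpClient())
            {
                // EventStore serves each stream as an Atom feed.
                client.DefaultRequestHeaders.Add(
                    "Accept", "application/vnd.eventstore.atom+json");

                // "order-42" is a hypothetical stream name; 2113 is the
                // default HTTP port. The feed contains links for paging
                // backwards through older events.
                var feed = await client.GetStringAsync(
                    "http://localhost:2113/streams/order-42");
                Console.WriteLine(feed);
            }
        }
    }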
I personally would not use this for a core, mission-critical, or growing application! However, if you have a side case for keeping your evented environment interesting, then I'd give it a go! I personally have to stick with Mongo for now.

node.js JSON/REST framework for a single-page application

I'm developing an application that consists of a 'fat' JavaScript client backed by a JSON/REST server for data access. The data is stored in MongoDB, but the system should be flexible enough to switch to a relational backend if required.
I started out with Pintura as the server-side framework but recently ran into some problems (specifically with perstore/filestore). I noticed that one problem was even reported (including a fix) over a month ago, but there has been no reply to it and the issue is still present.
There seems to be relatively low activity in this project so I was wondering if many people are actually using it, and how stable the framework is.
Does anybody here have experience with this framework or know of an alternative framework that has similar capabilities?
I agree the project and its website/blog do not seem to be active, although the perstore repository does have recent activity. I'd contact the author there, since your problem seems more related to that.
A search for REST on http://search.npmjs.org/ will show quite a few results, although I cannot recommend any from experience.

How do I properly test my database performance with high load demand?

I have found a lot of topics about stress-testing web applications.
My goals are different: I want to test only the database (Sybase SQL Anywhere 9).
What I need:
Some tool to diagnose all SQL statements and find bottlenecks; I wish I could easily get a macro-level view of the entire system.
Best practices for designing and building good SQL queries.
The system issues are:
20GB database size.
2-5 requests per second
Thousands of SQL statements spread through the code (a mess that can only be solved by rewriting the system).
The quickest way would actually be to upgrade your SQL Anywhere to v10 or (better) v11, as the latest releases include a complete performance diagnostic toolset. See the documentation here for more details.
Several open source tools are listed here:
http://www.opensourcetesting.org/performance.php
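If you end up rolling your own load test, the core is just firing representative queries concurrently and recording latency percentiles. A minimal sketch over ODBC - the DSN and query are placeholders, and SQL Anywhere 9 is assumed reachable through its ODBC driver:

    using System;
    using System.Collections.Concurrent;
    using System.Data.Odbc;
    using System.Diagnostics;
    using System.Linq;
    using System.Threading.Tasks;

    class DbLoadTest
    {
        static void Main()
        {
            var latencies = new ConcurrentBag<double>();

            // 4 concurrent workers, 50 queries each, to approximate the
            // 2-5 requests/second profile compressed in time.
            Parallel.For(0, 4, _ =>
            {
                using (var conn = new OdbcConnection("DSN=MyAnywhere9")) // placeholder DSN
                {
                    conn.Open();
                    for (int i = 0; i < 50; i++)
                    {
                        var sw = Stopwatch.StartNew();
                        using (var cmd = new OdbcCommand(
                            "SELECT COUNT(*) FROM orders", conn)) // representative query
                        {
                            cmd.ExecuteScalar();
                        }
                        latencies.Add(sw.Elapsed.TotalMilliseconds);
                    }
                }
            });

            var sorted = latencies.OrderBy(x => x).ToArray();
            Console.WriteLine("p50={0:F1} ms  p95={1:F1} ms",
                sorted[sorted.Length / 2],
                sorted[(int)(sorted.Length * 0.95)]);
        }
    }

Replace the hard-coded query with a sample of the actual statements pulled from your code, and run it against a production-sized (20GB) copy of the database so the optimizer behaves realistically.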