We are using EF 5.0 very successfully for our real-time server as well as from our internal websites. Now I need to create a utility that parses the data history to build a new table, working against a copy of the production database used for data mining.

Given that EF is transaction based, is there a good way to create a very large table, where the table may have more than 1M rows? My current thinking is no, and that the way to do this is to read the data with EF but write a CSV file that is then bulk loaded, which I already do successfully in some other situations. I'm not necessarily looking for the most efficient way, but I cannot imagine that EF or SQL would do well adding more than 1M records in a single transaction. I know I could batch them in 1,000-record chunks, but that is not especially appealing.

EF is said to be MSFT's principal data access technology going forward, so they need to support this sort of scenario as part of that plan. Any ideas and insight appreciated. Thanks.
EF is not geared for bulk operations (see Efficient way to do bulk insert/update with Entity Framework).
Instead of using a CSV to bulk-load, perhaps you want to look into the SQL Server Bulk Copy API.
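For illustration, here is a minimal sketch using SqlBulkCopy, which is how ADO.NET exposes the Bulk Copy API. The table and column names are hypothetical; the idea is to read and transform the history with EF (or a plain data reader) and stream the result in batches instead of pushing 1M+ inserts through the context:

    using System;
    using System.Data;
    using System.Data.SqlClient;

    // Hypothetical staging table 'dbo.ParsedHistory' with three columns.
    var table = new DataTable();
    table.Columns.Add("HistoryId", typeof(long));
    table.Columns.Add("Value", typeof(decimal));
    table.Columns.Add("RecordedAt", typeof(DateTime));

    // ... fill 'table' from your EF query / parsing logic ...

    using (var bulk = new SqlBulkCopy("your-connection-string"))
    {
        bulk.DestinationTableName = "dbo.ParsedHistory";
        bulk.BatchSize = 10000;       // commits in chunks rather than one giant transaction
        bulk.BulkCopyTimeout = 0;     // no timeout for long-running loads
        bulk.WriteToServer(table);
    }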
I have a small project with inherited C# code, specifically Entity Framework Core. This is hosted in Azure and recently I saw a very interesting feature that I would like to try out: "Automatic Tuning" for the database.
I have a couple of questions regarding this:
Would it conflict with my Entity Framework, as the database objects were originally created from code? My understanding is that it shouldn't, but I would like to be sure.
Is it worth it or anyone had any trouble with it?
Thanks!
Automatic Tuning does not conflict in any way with Entity Framework (EF). It just creates indexes needed by the queries your application runs, drops duplicate and unneeded indexes (existing unique indexes are not dropped), and chooses the best query plan created by SQL Server. None of this is related to EF.
One thing you need to consider is that Azure SQL Database needs to monitor query activity for at least a day in order to identify some recommendations.
Another thing to take into consideration is that Automatic Tuning does not update statistics and does not defragment indexes.
I'm committed to the route of using SQLite without Core Data.
I need to speed up a function which performs some database transactions after querying the database. I've created a dictionary for the rows with all the values I'll need.
I need to do this to avoid the database locking.
At the moment I'm calling my add record to database function, which opens and closes the database each time.
Obviously this is where the process is slow.
I was thinking that it's common for apps to ship with a database setup script, so it must be possible to run a batch of queries.
So I'm thinking if I can build up a string with all my queries I could just execute that.
But I'm not 100% this is the best approach or how to execute batch queries.
Can anyone advise me how to proceed?
For starters, check out these links:
how-do-i-improve-the-performance-of-sqlite
ios-coredata-batch-insert (Yes I know that you said no core data - but it is worth a read)
fast-bulk-inserts-into-sqlite (Looks similar in content to the first link)
I was about to do the same - use plain SQLite instead of Core Data - but changed my mind later. In the process I found this link useful: Improve INSERT-per-second performance of SQLite?. Beyond the obvious (transactions, prepared statements, ...) it uses some SQLite-specific performance tweaks.
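To make the advice concrete, here is a minimal sketch of the pattern those answers describe: open the database once, wrap the inserts in a single transaction, and reuse one prepared statement. It is written in C# with Microsoft.Data.Sqlite purely for illustration; on iOS the same shape applies with the sqlite3 C API (BEGIN TRANSACTION, sqlite3_prepare_v2, sqlite3_step, sqlite3_reset, COMMIT). Table and column names are hypothetical:

    using Microsoft.Data.Sqlite;

    // Sample data standing in for your in-memory dictionary of rows.
    var rows = new[] { (Name: "sensor-a", Value: 1.5), (Name: "sensor-b", Value: 2.25) };

    using (var conn = new SqliteConnection("Data Source=app.db"))
    {
        conn.Open();                                  // open once, not per record

        var create = conn.CreateCommand();
        create.CommandText = "CREATE TABLE IF NOT EXISTS readings (name TEXT, value REAL)";
        create.ExecuteNonQuery();

        using (var tx = conn.BeginTransaction())
        {
            var cmd = conn.CreateCommand();
            cmd.Transaction = tx;
            cmd.CommandText = "INSERT INTO readings (name, value) VALUES ($name, $value)";
            var pName = cmd.CreateParameter();  pName.ParameterName = "$name";  cmd.Parameters.Add(pName);
            var pValue = cmd.CreateParameter(); pValue.ParameterName = "$value"; cmd.Parameters.Add(pValue);

            foreach (var row in rows)
            {
                pName.Value = row.Name;
                pValue.Value = row.Value;
                cmd.ExecuteNonQuery();                // statement is prepared once and reused
            }
            tx.Commit();                              // one commit for the whole batch
        }
    }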
I am using Self-Tracking Entities with Entity Framework 4. I have two databases with the exact same schema. However, the tables in one database will be added to/edited (I mean the data will be added/edited, not the actual table definitions), and at certain points of the day I will need to synchronize all the changes between this database and the other database.
I can create a separate context for both of them. But if I read a large graph from one database, how can I update the other database with the graph? Is there an easy way?
My database model is large and complex and fully relational. So it would be a big job to go through every single entity and do a read from the other database to see if it exists or not, update/insert it if need be, and then carry this on through the full object graph!
Any ideas?
This is not a use case for EF. With EF you would have to do exactly what you've described. Self-tracking entities track changes to their own object instances - they know nothing about changes made to their database over time, and they will not know anything about the state of your second database either.
Look at SQL Server's native features (including mirroring, transaction log shipping and SSIS) and the MS Sync Framework. Depending on your detailed requirements, these tools may suit you better.
We're considering using SSIS to maintain a PostgreSql data warehouse. I've used it before between SQL Servers with no problems, but am having a lot of difficulty getting it to play nicely with Postgres. I’m using the evaluation version of the OLEDB PGNP data provider (http://www.postgresql.org/about/news.1004).
I wanted to start with something simple like UPSERT on the fact table (10k-15k rows are updated/inserted daily), but this is proving very difficult (not to mention I’ll want to use surrogate keys in the future).
I've attempted (Link) and (http://consultingblogs.emc.com/jamiethomson/archive/2006/09/12/SSIS_3A00_-Checking-if-a-row-exists-and-if-it-does_2C00_-has-it-changed.aspx), which are effectively the same (except I don't really understand the UNION ALL at the end when I'm trying to upsert). But I run into the same problem with parameters when doing the update with an OLE DB Command, which I tried to overcome using (http://technet.microsoft.com/en-us/library/ms141773.aspx), but that just doesn't seem to work; I get a validation error:
The external columns for complent.... are out of sync with the datasource columns... external column “Param_2” needs to be removed from the external columns.
(this error is repeated for the first two parameters as well – never came across this using the sql connection as it supports named parameters)
Has anyone come across this?
AND:
The fact that this simple task is apparently so difficult to do in SSIS suggests I'm using the wrong tool for the job - is there a better (and still flexible) way of doing this? Or would another ETL package be better for use between two Postgres databases? Other options include anything listed at (http://en.wikipedia.org/wiki/Extract,_transform,_load#Open-source_ETL_frameworks). I could just go and write a load of SQL to do this for me, but I wanted a neat and easily maintainable solution.
I have used the Slowly Changing Dimension wizard for this with good success. It may give you what you are looking for:
http://msdn.microsoft.com/en-us/library/ms141715.aspx
The "external columns are out of sync" error: SSIS is case sensitive - I ran into this issue multiple times, and it made me want to pull my hair out.
This simple task is going to take some work either way. SSIS is by no means an enterprise class ETL product yet, but it does give you some quick and easy functionality, and is sufficient for most ETL work. I guess it is also about your level of comfort with it as well.
SCD is way too slow for what I want. I need to use set-based SQL.
It turned out that a lot of my problems were with bugs in the provider.
I opened a forum topic (http://www.pgoledb.com/forum/viewtopic.php?f=4&t=49) and had a useful discussion with the moderator/support/developer person.
Also, Postgres doesn't let you do cross-database queries, so I solved the problem this way:
Data flow from the production DB into a temp table in the archive DB
Run a set-based query between the temp table and the archive table
Truncate the temp table
Note that the "temp" table is not actually a temp table, but a copy of the archive table's schema used to temporarily store the data.
Took a while, but I got there in the end.
Regarding "SSIS is by no means an enterprise class ETL product yet" - what enterprise ETL solution would you suggest?
I am using Entity Framework to layer on my SQL Server 2008 database. The EF is present in my web service and the webservice is invoked by a Silverlight client.
I am seeing a serious performance issue in terms of the time taken by a query to execute in EF. This doesn't happen on consecutive calls.
A little bit of googling revealed that this is caused by the per-app-domain cost of constructing the in-memory model of the database objects. I found this Microsoft link explaining pre-generation of views for performance improvement. Even after implementing the steps, the performance actually degraded instead of improving. I am curious whether anyone has tried this approach successfully, and whether there are any other avenues for improving performance.
I am using .NET 3.5.
A couple of areas to look at for EF performance:
Do as much of the processing as possible before calling things like ToList(). ToList() will bring the entire set into memory. By default, EF keeps building up the expression tree and only executes it when you actually need the data in memory. That first query goes to the database, but afterwards the processing is in memory. When working with large data, you definitely want as much of the heavy lifting done by the database as possible (see the sketch after this list).
EF 1 only has the option to pull the entire row back. Therefore if you have a column that is a large string or binary blob, it is going to be pulled down and into memory whether you need it or not. You can create a projection that doesn't include this column, but then you don't get the benefits of having it be an entity.
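As a rough illustration of both points - 'context' is assumed to be your existing ObjectContext/DbContext, and the 'Documents' entity with its large 'Body' column is hypothetical:

    // Filtering before enumeration keeps the WHERE clause on the server;
    // ToList() is what actually executes the query and materializes rows.
    var cutoff = DateTime.UtcNow.AddDays(-7);
    var recent = context.Documents
        .Where(d => d.CreatedOn > cutoff)          // translated to SQL, not run in memory
        .OrderByDescending(d => d.CreatedOn)
        .Take(50)
        .ToList();                                 // the single database round trip happens here

    // A projection that skips the large Body column: it returns plain objects,
    // not tracked entities, but avoids pulling the blob across the wire.
    var headers = context.Documents
        .Where(d => d.CreatedOn > cutoff)
        .Select(d => new { d.Id, d.Title, d.CreatedOn })
        .ToList();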
You can look at the SQL generated by EF using the suggestion in this post:
How do I view the SQL generated by the Entity Framework?
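Since this question targets .NET 3.5 / EF 1, the quickest built-in option is ObjectQuery<T>.ToTraceString(). A minimal sketch, assuming a generated ObjectContext with a hypothetical Customers set:

    // A LINQ to Entities query is an ObjectQuery<T> under the covers, so it can be
    // cast to get at ToTraceString(), which returns the store command EF will send
    // without executing it.
    var query = context.Customers.Where(c => c.Country == "UK");
    var objectQuery = (System.Data.Objects.ObjectQuery<Customer>)query;
    Console.WriteLine(objectQuery.ToTraceString());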
The same laws of physics apply for EF queries as they do for ordinary SQL. Check your database tables and make sure that you have indexes on primary and foreign keys, that your database is properly normalized, and so forth. If performance is degrading after Microsoft's suggestions, then that's my guess as to the problem area.
Are you hosting the webservice in IIS? Is it running on the same site as the Silverlight App? What about the database itself? Is it running on a dedicated machine? Are there other apps hitting it? The first call to a dormant database is painful (I've had situations where it would actually time out in my environment.)
There are a number of factors to take into consideration here. But it comes down to more than just EF's overhead.
Edit: I didn't fully qualify this, but the process of opening the first connection to SQL Server is slow regardless of your data access solution.
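A common mitigation - just a sketch, assuming you can hook application start-up (e.g. Application_Start in Global.asax) and that MyEntities and Customers are stand-ins for your generated context and an entity set - is to pay that cost eagerly before the first real request:

    // Warm up on start-up: opens the physical connection (and the pool) and forces
    // EF to build its in-memory metadata, so the first user-facing call isn't the
    // one that pays the price.
    using (var context = new MyEntities())
    {
        context.Customers.FirstOrDefault();   // any cheap query will do
    }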
Use SQL Profiler to check how many queries are executed to retrieve your data. If it's a large number, use the Include() method of ObjectQuery to retrieve child objects along with the parent in one query.
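For example (a sketch with hypothetical Customer/Orders entities), eager loading collapses the child lookups into a single joined query:

    // Without Include(), each customer's Orders would need a separate Load() call
    // (EF 1 has no automatic lazy loading), i.e. one extra round trip per parent.
    var customers = context.Customers
        .Include("Orders")            // eager-load the Orders navigation property
        .Where(c => c.IsActive)
        .ToList();                    // one joined query instead of N+1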