Data Type for storing a document in SQL Server 2008 via Entity Framework

I'm trying to store a document in SQL Server 2008 using the Entity Framework.
I believe I have the code for doing this completed. The problem I'm now facing is which Data Type to use in SQL Server and in my entity model.
'Image' was my first choice, but this causes an "Invalid mapping" error when I update the model. I see that there's no equivalent of 'Image' (going by the Type drop-down in the entity's properties).
So then I tried 'varbinary(MAX)', and I see that this maps to 'binary' in the entity model. However, when I run the code I'm told that the data would be truncated, so the operation stopped. Upon investigation I see that the SQL Server 'binary' data type holds at most 8000 bytes - which is why I chose 'varbinary(MAX)' in the first place - so the entity model seems to be reducing/mapping 'varbinary(MAX)' to 'binary'.
Is this right?
If so, what should my Data Types be (in both SQL Server 2008 and in my entity model) please? Any suggestions?

Change the data type in your model to byte[] and it will work fine. If you need more explanation, please leave a comment.
EDIT:
I had tried this before in LINQ to SQL, and this time I tried it in EF. In the conceptual model of your Foo.edmx file the type is Binary (you can see this by opening the file with the XML Editor, via the Open With context menu in Visual Studio, or with any other text editor such as Notepad), but in the generated file named Foo.designer.cs the data type is Byte[].
And there is no limit of the kind you mentioned above.
I tried it with 10,000 bytes and it was inserted successfully without truncating my array. Regarding benchmarks on saving documents in the database versus the file system: I read an article which said that in SQL Server 7 the file system had better performance for retrieving stored data, but that later versions of SQL Server overtook the file system, and it recommended saving documents in SQL Server.
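As a rough sketch of the mapping being described, assuming a hypothetical Document entity backed by a table with a varbinary(MAX) column (the entity, property, and context names below are illustrative, not from the question):

    using System.IO;

    // Illustrative POCO-style entity; with a database-first EDMX the designer
    // generates an equivalent class in which varbinary(MAX) surfaces as Byte[].
    public class Document
    {
        public int Id { get; set; }
        public string FileName { get; set; }

        // Stored as varbinary(MAX) in SQL Server; shown as "Binary" in the
        // conceptual model and generated as byte[] in Foo.designer.cs.
        public byte[] Content { get; set; }
    }

    public static class DocumentSaver
    {
        // "MyEntities" stands in for your generated object context type.
        public static void Save(MyEntities context, string path)
        {
            var document = new Document
            {
                FileName = Path.GetFileName(path),
                Content = File.ReadAllBytes(path) // the whole file as a byte array
            };
            context.Documents.AddObject(document); // ObjectSet.AddObject (EF4); in EF1 use the generated AddToDocuments(...)
            context.SaveChanges();
        }
    }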
IMHO, when documents are not too large I prefer to store them in the DB (NoSQL DBs have great performance here, as far as I know), for a few reasons:
First: integrity of my data.
Second: better performance (if a folder holds a large number of files, reading and writing in that folder gradually slows down more and more, unless you organize the files across several folders, preferably in a tree-like structure).
Third: security policies that you can apply to the documents through your application more easily (you can do this with the file-system approach too, but I think it's easier here).
Fourth: you can benefit from the facilities your DBMS provides for querying and manipulating those files.
And much more ... :-)

Ideally you should not store documents in the database, instead store the path to the document in the database, which then points to the physical document on the web server itself (or some other storage, CDN, etc).
However, if you must store it in SQL Server (and seeing as you're on SQL Server 2008), you should be using FILESTREAM.
It is supported by EF4 (I believe), and it maps to binary.
As I said though, I'm not sure how well this will perform, so run some benchmarks - and if it's not performing well, try using the regular ADO.NET/FileStream API.
I still think you should put it on the file system, not in the database (my opinion, of course).
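For reference, a minimal sketch of the ADO.NET route mentioned above, using the SqlFileStream class; it assumes a hypothetical Documents table with a uniqueidentifier RowId column and a FILESTREAM column named Content (those names, and the connection string, are placeholders):

    using System;
    using System.Data;
    using System.Data.SqlClient;
    using System.Data.SqlTypes;
    using System.IO;

    public static class FileStreamWriter
    {
        // Writes bytes into a FILESTREAM column. The row must already exist
        // and hold a non-NULL value (e.g. 0x) in the Content column.
        public static void SaveDocument(string connectionString, Guid rowId, byte[] data)
        {
            using (var connection = new SqlConnection(connectionString))
            {
                connection.Open();
                using (var transaction = connection.BeginTransaction())
                {
                    // Ask SQL Server for the file path and transaction context of the row.
                    var command = new SqlCommand(
                        "SELECT Content.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT() " +
                        "FROM Documents WHERE RowId = @id", connection, transaction);
                    command.Parameters.AddWithValue("@id", rowId);

                    string path;
                    byte[] txContext;
                    using (var reader = command.ExecuteReader())
                    {
                        reader.Read();
                        path = reader.GetString(0);
                        txContext = (byte[])reader[1];
                    }

                    // Stream the bytes straight into FILESTREAM storage.
                    using (var stream = new SqlFileStream(path, txContext, FileAccess.Write))
                    {
                        stream.Write(data, 0, data.Length);
                    }

                    transaction.Commit();
                }
            }
        }
    }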

Related

Transferring data from Lotus Notes to DB2 using Agent

Before you say anything: I searched a lot but didn't find out how to do this.
I have a database in .NSF format for use in Lotus Notes. I need to write an Agent (I know how to do that) so that data from that database is automatically transferred to a DB2 database.
So before I create the DB2 tables, how do I know which structure I need to use? How do I check exactly how the data in that .NSF file is stored?
Thanks
Notes documents are unstructured; there's no guarantee that any two documents in a database have the same structure. You will need to decide what data you want to transfer to a relational table, then check each document to see if it contains the corresponding fields (items). You didn't mention what language you're planning to use for your agent; in Java you would use NotesDocument.getItems() to enumerate all items in a document.
As mustaccio also said, since Notes/Domino is a NoSQL database, you don't have a schema.
You should talk to the developer of the application and get an understanding of what data is located where.
You could of course use the Design Synopsis function in Domino Designer to export the actual design, but documents can potentially contain data that doesn't show up in the design.
If you want to export the documents as XML, I have a tool I wrote available here: http://www.texasswede.com/home.nsf/Page/Notes%20XML%20Exporter
You can export all the documents and then look at the XML to see what data you have.

Entity Framework with large number of tables

Our database has about 500 tables we'd like to use in our EF model. Of those I'd be happy to start with 50 or fewer just to get our feet wet after working in plain ADO.net for years.
The problem is that our SQL Server database contains many thousands of other tables that have been created over the years, many of them dynamically generated. Believe it or not:
select count(*) from INFORMATION_SCHEMA.TABLES
73261
So that's a lot of tables. I have found that pretty much every tool I've tried to design, build or template EF models or entities either hangs or does not return a list of tables. Even SQL Server Object Explorer in VS2012 won't list the tables and instead shows the Tables folder with a little "x" over the icon. So I can't even select a subset of tables.
What options do I have for using EF? Is there a template where I can explicitly define the tables that I want to use entities for? Even with 50 tables, I don't want to hand code each one in an empty EDMX.
Using a Database / Code First approach and avoiding connecting Visual Studio to the database at all (i.e. don't create an edmx, or connect with server explorer) would allow you to do this easily. It does not give you any of the Model First advantages, but I think it sounds like your project would be better served with a Database / Code First approach anyway as:
You have an existing Model, and are not looking to push changes from your EDMX to the DB
You are looking to implement this on a subset of your database
This link has a good summation (Code-first vs Model/Database-first) with the caveat that in your case a Database/Code First approach does not have you pushing changes from code to the database, so the last two bullets under Code First apply less, and yours is a Database/Code First hybrid.
With 70k tables I think that any GUI is going to be tricky. When I am saying Database / Code First, I am trying to convey that you are not using the code to create / define and update your Database. Someone may be able to answer this more succinctly / accurately?
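As a minimal sketch of what this looks like (assuming EF with DbContext/Code First against the existing database; the Customer/Order entities, table names, and connection string name are placeholders): only the tables you explicitly map are ever part of the model, so EF never needs to enumerate the other ~73,000.

    using System.Data.Entity;

    public class Customer
    {
        public int CustomerId { get; set; }
        public string Name { get; set; }
    }

    public class Order
    {
        public int OrderId { get; set; }
        public int CustomerId { get; set; }
        public decimal Total { get; set; }
    }

    public class SubsetContext : DbContext
    {
        public SubsetContext() : base("name=MyConnectionString")
        {
            // Never let Code First try to create or alter the existing database.
            Database.SetInitializer<SubsetContext>(null);
        }

        // Only the entities declared here are included in the model.
        public DbSet<Customer> Customers { get; set; }
        public DbSet<Order> Orders { get; set; }

        protected override void OnModelCreating(DbModelBuilder modelBuilder)
        {
            // Point each entity at its existing table explicitly.
            modelBuilder.Entity<Customer>().ToTable("Customers", "dbo");
            modelBuilder.Entity<Order>().ToTable("Orders", "dbo");
        }
    }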
I know this is an old question, but for those who land here from a Google search: the only tool I have found that actually works with thousands of tables is The Sharp Factory.
It is an ORM and pretty simple to use. So if you are looking for an ORM that can work with a large number of tables and does not require you to write POCOs, mappings, or SQL, then this is the tool.
You can find it here: The Sharp Factory

Using a SQL Server 2008 ADO.NET Entity Data Model with a PostgreSQL connection string

I'm using Visual Studio 2008, SQL Server 2008, asp.net mvc 2.
Now I have to change my database to PostgreSQL. I tried many approaches, but ended up doing something different.
I created the data model based on the SQL Server 2008 database.
I have an identically structured database in PostgreSQL.
So I just changed the connection string of the entity data model and I got the following error
The 'System.Data.SqlClient' provider from the specified SSDL
artifact(s) does not match the expected 'Npgsql' provider from the
connection string.
How do I solve this error?
Please also let me know whether this is the correct way to go about it.
Well, the physical layer of your model (the "SSDL" file in your metadata) will of course contain the physical aspects of your database model - including the database provider that the model is based on. You cannot simply change the database connection string and be done with it.
I see two options:
My preferred solution: just re-create the EF model based on your Postgres database - that's the cleanest way to go. If you're doing it right, the model is contained in a single assembly in your project anyway, so you could more or less swap in a new assembly to go against Postgres instead of SQL Server.
A hack, in my opinion: you could have the metadata files (the *.ssdl, *.msl, *.csdl) for your model written out to disk, and then manually edit the SSDL file to switch to the Postgres provider. I have no idea whether that will even work or what side effects it might have! Do it at your own risk, and do it on a backup copy of your project first!

Entity Framework performance

I am using Entity Framework as a layer over my SQL Server 2008 database. EF sits in my web service, and the web service is invoked by a Silverlight client.
I am seeing a serious performance issue in terms of the time a query takes to execute in EF. This doesn't happen on subsequent calls.
A little googling revealed that it's caused by each app domain having to construct the in-memory model of the database objects. I found this Microsoft link explaining pre-generation of views for performance improvement. Even after implementing the steps, the performance actually degraded instead of improving. I am curious whether anyone has tried this approach successfully, and whether there are any other avenues for improving performance.
I am using .NET 3.5.
A couple of areas to look at for EF performance:
Do as much of the processing as possible before calling things like ToList(). ToList() will bring everything in the set into memory. By default, EF keeps building up the expression tree and only actually processes it when you need the data in memory. That first query will be against the database, but afterwards the processing will be in memory. When working with large data sets, you definitely want as much of the heavy lifting done by the database as possible.
EF 1 only has the option to pull the entire row back. Therefore if you have a column that is a large string or binary blob, it is going to be pulled down and into memory whether you need it or not. You can create a projection that doesn't include this column, but then you don't get the benefits of having it be an entity.
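A rough illustration of both points, assuming a hypothetical context with a Documents set that carries a large binary column (all names here are made up for the example):

    using System;
    using System.Linq;

    public static class QueryExamples
    {
        // "MyEntities", "Documents" and the property names are placeholders.
        public static void Run(MyEntities context, DateTime cutoff)
        {
            // The Where/OrderBy below are translated to SQL on the server;
            // only ToList() executes the query and materializes rows in memory.
            var recent = context.Documents
                .Where(d => d.CreatedOn > cutoff)
                .OrderByDescending(d => d.CreatedOn)
                .ToList();

            // Project away the large binary column so it is never pulled over
            // the wire; the result is an anonymous type, not a tracked entity.
            var headers = context.Documents
                .Select(d => new { d.Id, d.FileName, d.CreatedOn })
                .ToList();
        }
    }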
You can look at the SQL generated by EF using the suggestion in this post:
How do I view the SQL generated by the Entity Framework?
The same laws of physics apply for EF queries as they do for ordinary SQL. Check your database tables and make sure that you have indexes on primary and foreign keys, that your database is properly normalized, and so forth. If performance is degrading after Microsoft's suggestions, then that's my guess as to the problem area.
Are you hosting the webservice in IIS? Is it running on the same site as the Silverlight App? What about the database itself? Is it running on a dedicated machine? Are there other apps hitting it? The first call to a dormant database is painful (I've had situations where it would actually time out in my environment.)
There are a number of factors to take into consideration here. But it comes down to more than just EF's overhead.
Edit: I didn't fully qualify this, but the process of opening the first connection to SQL Server is slow regardless of your data access solution.
Use SQL Profiler to check how many queries are executed to retrieve your data. If it's a large number, use the Include() method of ObjectQuery to retrieve child objects along with their parents in one query.
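For example (assuming hypothetical Customer/Order entities; in EF 1 on .NET 3.5, Include() takes the navigation property name as a string):

    // Without Include, each customer's Orders collection has to be loaded
    // separately (e.g. with Load()), which means one extra query per customer.
    // With Include, parents and children come back in a single joined query.
    var customers = context.Customers
        .Include("Orders")   // eager-load the Orders navigation property
        .Where(c => c.IsActive)
        .ToList();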

What applications do you use for data entry and retrieval via ODBC?

What apps or tools do you use for data entry into your database? I'm trying to improve our existing (cumbersome) process, which uses a PHP web-based system for entering data one ... item ... at ... a ... time.
My current solution is to use a spreadsheet. It works well with text and numbers that are human readable, but not with the foreign keys that are used to join to other tables' rows.
Imagine that I want a row of data to include what city someone lives in. The column holding this is id_city, which is keyed to the "city" table which has two columns: id (serial) and name (text).
I envision being able to extend the spreadsheet's capabilities to include dropdown menus for every row of the id_city column, allowing the user to select a city (displaying the city names as text) while actually storing the chosen city id. This way, the spreadsheet would:
(1) show a great deal of data on each screen and
(2) could be exported as a csv file and thrown to our existing scripts that manually insert rows into the database.
I have been playing around with MS Excel and Access, as well as OpenOffice's suite, but have not found something that gives me the functionality I mention above.
Other items on my wish-list:
(1) dynamically fetch the name of cities that can be selected by the user.
(2) allow the user to push the data directly into the backend (not via external files/scripts).
(3) If any of the columns of the rows of data gets changed in the backend, the user could refresh the data on the screen to reflect any recent changes.
Do you know how I could improve the process of data entry? What tools do you use? I use PostgreSQL for the backend and have access to MS Office, OpenOffice, as well as web based solutions. I would love a solution that is flexible, powerful, and doesn't require much time to develop or deploy (I know, dream on...)
I know that pgAdmin3 has similar functionality, but from what I have seen, it is more of an administrative tool rather than something for users to use.
As j_random_hacker noted, I've used MS Access for years (since Access 97) to connect to an ODBC Data Source.
You can do this via linking to external tables: (in Access 2010:)
New -> Blank Database
External Data -> ODBC Database -> Link to Data Source
Machine Data Source -> New -> System Data Source -> Select Driver (Oracle, or whatever) -> Finish
Enter a new name for your DSN, then all of the connection parameters, then click OK
Select newly created DSN, hit ok.
You can do so much once Access sees your external table as a linked table, including sorting, filtering, etc. There's one caveat: as far as I can tell, ALL operations happen on the client side unless you're using a pass-through query. That's fine if you're looking at a table with 3000 records. With 2,000,000 records, that hurts. To be clear, all data in the table comes down to the workstation, for all tables being joined, and the join happens client-side, NOT server-side.
There are usually standalone tools for basic database management - e.g., for Oracle and MySQL a free tool called SQL Developer suffices for basic database data entry.
For more complex types (especially involving clobs) I can usually knock an application together in Java+SWT in a day if we already have the model and DAOs available on the Java side. Yeah, you have to put some effort in, but if it will be used regularly in the future then it is probably worth it.
In your case (well, the case where you have bulk imports of data) knocking up some Perl that reads from the CSV and does the city id lookup would be trivial to implement. Maybe a waste for a one-off thing? Depends on the amount of data to import.
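For what it's worth, here is a sketch of that kind of import written in C# with Npgsql instead of Perl. The person table, the people.csv layout, and the connection string are assumptions for illustration; the city(id, name) table and the id_city column come from the question.

    using System;
    using System.IO;
    using Npgsql;

    class CsvImport
    {
        static void Main()
        {
            // Assumed CSV layout: person_name,city_name (no header row).
            var connectionString = "Host=localhost;Database=mydb;Username=me;Password=secret";

            using (var connection = new NpgsqlConnection(connectionString))
            {
                connection.Open();

                foreach (var line in File.ReadLines("people.csv"))
                {
                    var fields = line.Split(',');
                    var personName = fields[0].Trim();
                    var cityName = fields[1].Trim();

                    // Look up the city id for the human-readable city name.
                    int cityId;
                    using (var lookup = new NpgsqlCommand(
                        "SELECT id FROM city WHERE name = @name", connection))
                    {
                        lookup.Parameters.AddWithValue("name", cityName);
                        var result = lookup.ExecuteScalar();
                        if (result == null)
                            throw new InvalidOperationException("Unknown city: " + cityName);
                        cityId = Convert.ToInt32(result);
                    }

                    // Insert the row with the resolved foreign key.
                    using (var insert = new NpgsqlCommand(
                        "INSERT INTO person (name, id_city) VALUES (@name, @city)", connection))
                    {
                        insert.Parameters.AddWithValue("name", personName);
                        insert.Parameters.AddWithValue("city", cityId);
                        insert.ExecuteNonQuery();
                    }
                }
            }
        }
    }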
I would be surprised if MS Access can't do what you're looking for -- this is basically the exact use case for it. Namely, quickly throwing together a nice UI for a simple CRUD DB application that a spreadsheet doesn't quite stretch to.
This is an answer, technically, but not a recommendation:
I've used Excel and SSIS for importing simple data entry files into MS SQL, but it's not adequate - there's very little ability to control the data, and SSIS is so very touchy, especially when working with Excel.
MS Access does not work well with some non-Microsoft databases. There is an open-source equivalent called Apache OpenOffice Base you may want to try.