Is there a resource-efficient way to load a file into a database? - ado.net

I am trying to understand how to correctly load a large file into a database. I understand how to get the file from the database and stream it back without using too many resources by using a DataReader to read into a buffer and then writing the buffer to the OutputStream.
When it comes to storing the file, all of the examples I could find read the entire file into a byte array and then supply it to a data parameter.
Is there a way to store the file in a database without having to read the entire file into memory first?
I am using ASP.NET and SQL Server.

If you can use .NET 4.5, there is new support for streaming. Also, see Using ADO's new Async methods, which gives some complementary examples.
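For illustration, a minimal sketch of the .NET 4.5 streaming insert, assuming a table Files(Name nvarchar, Content varbinary(MAX)); the table, column, and variable names here are assumptions:

```csharp
using System.Data;
using System.Data.SqlClient;
using System.IO;

static void SaveFile(string connectionString, string path)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(
        "INSERT INTO Files (Name, Content) VALUES (@name, @content)", conn))
    using (var file = File.OpenRead(path))
    {
        cmd.Parameters.AddWithValue("@name", Path.GetFileName(path));
        // Passing a Stream as the parameter value lets SqlClient read it
        // in chunks, so the whole file never sits in memory at once.
        // Size = -1 maps the parameter to varbinary(MAX).
        cmd.Parameters.Add("@content", SqlDbType.VarBinary, -1).Value = file;
        conn.Open();
        cmd.ExecuteNonQuery();
    }
}
```

This is the upload-side counterpart of the buffered DataReader technique you already use for downloads.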

Related

Seeding data into MongoDB database

I'm creating a MERN project and want a file to seed data into my database. With SQL I've done this by creating a seed file with a .db extension, which I would then run as a script in my terminal. I am wondering how this is done for MongoDB and what file extension I should use: is this just a JSON file? I am also wondering what the proper way of doing this is. Looking online I see so many different ways that people do things, so I'm just trying to figure out what the standard is.
Create each collection in a separate JSON or CSV file, and use mongoimport.
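For example (the database, collection, and file names here are hypothetical), a collection exported as a JSON array can be loaded with:

```
mongoimport --db myapp --collection users --file seed/users.json --jsonArray
```

A plain .json (or .csv) file is the usual choice; no special extension is needed.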

How to improve the speed of reading and writing a lot of small files?

My job is to improve the speed of reading a lot of small files (~1 KB) from disk and writing them into our database.
The database's source code is available to me, and I can change all of it, from the client to the server.
The architecture is a simple master-slave distributed database built on HDFS, similar to HBase. Small files from disk can be inserted into the database, combined into bigger blocks automatically, and then written to HDFS (big files can also be split into smaller blocks by the database and then written to HDFS).
One way to change the client is to increase the number of threads, but I don't have any other ideas. Suggestions on how to approach the performance analysis would also help.
One way to process such small files is to convert them into a SequenceFile and store that in HDFS, then use it as the input to a MapReduce job that puts the data into HBase or a similar database.
This uses AWS as an example, but it could be any storage/queue setup:
If the files can live on shared storage such as S3, you could add one queue entry for each file and then just keep throwing servers at the queue to add the files to the DB. At that point the bottleneck becomes the DB instead of the client.
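A minimal sketch of the producer side of that idea, using the AWS SDK for .NET (bucket and queue names are hypothetical, and pagination of the listing is omitted):

```csharp
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Model;
using Amazon.SQS;
using Amazon.SQS.Model;

static async Task EnqueueFilesAsync(string bucket, string queueUrl)
{
    var s3 = new AmazonS3Client();
    var sqs = new AmazonSQSClient();

    // One queue message per file; worker servers consume the queue,
    // fetch each object from S3, and insert it into the database.
    var listing = await s3.ListObjectsV2Async(new ListObjectsV2Request
    {
        BucketName = bucket
    });
    foreach (var obj in listing.S3Objects)
    {
        await sqs.SendMessageAsync(new SendMessageRequest
        {
            QueueUrl = queueUrl,
            MessageBody = obj.Key
        });
    }
}
```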

ETL tool or ad-hoc solutions?

I'm designing a data warehouse system. There are two origin data sources: files (hexadecimal format, known record structure) and a PostgreSQL database.
The ETL phase has to read the content of the two sources (files and DB), combine/integrate/clean them, and then load the data into the DW.
For this purpose, is a tool (for example Talend) better, or an ad-hoc solution (writing custom routines in a programming language)?
I would suggest you use a bulk loader to get your flat file into the DB. This lets you customize the loading rules and then process/cleanse the resulting data set using regular SQL (no other custom code to write).
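Since the source DB here is PostgreSQL, one hedged sketch of the bulk-load step uses Npgsql's COPY support (table, column, and variable names are assumptions, and each input line is treated as one single-column row):

```csharp
using System.IO;
using Npgsql;

static void BulkLoad(string connString, string path)
{
    using (var conn = new NpgsqlConnection(connString))
    {
        conn.Open();
        // COPY streams rows into a staging table far faster than
        // row-by-row INSERTs.
        using (var writer = conn.BeginTextImport(
            "COPY staging_records (raw_line) FROM STDIN"))
        {
            foreach (var line in File.ReadLines(path))
                writer.WriteLine(line);
        }
        // Cleansing/integration can then be done with regular SQL
        // against staging_records.
    }
}
```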

Is saving data in SQLite as BLOB, a good programming practice?

Currently I am working on a project in which I need to parse complex XML that contains multilevel details (names and paths for PDFs, PNGs, etc.) at each node.
I need to store all the data locally on the iPhone/iPad.
Should I create classes for each of those details and make appropriate tables in SQLite, or store the data as a BLOB and retrieve all of it every time?
Any suggestions or thoughts are most welcome.
EDIT:
I am storing the files in the Documents directory and the path in the SQLite database. The question is whether to create well-defined database tables or to store the data in BLOB form.
Pros and cons of both approaches would be much appreciated. Thanks.
In my opinion you should simply use a BLOB: when your app starts up, load your XML into an object; all changes are then made to that object, so you save the time of rewriting it back to disk. On exiting the application, save everything back to disk.
Using a BLOB is not a good approach. Store all PDFs and images in the Documents directory, and store only the path in the DB.

Data Type for storing document in SQL Server 2008 via Entity Framework

I'm trying to store a document in SQL Server 2008 using the Entity Framework.
I believe I have the code for doing this completed. The problem I'm now facing is which Data Type to use in SQL Server and in my entity model.
'Image' was my first choice, but this causes an "Invalid mapping" error when I update the model. I see that there's no equivalent of 'Image' (going by the Type drop-down in the entity's properties).
So then I tried 'varbinary(MAX)' and I see that this maps to 'binary' in the entity model. However, when I run the code it tells me that the data would be truncated so it stopped. Upon investigation I see that the SQL Server Data Type 'binary' is 8000 bytes long - which is why I chose 'varbinary(MAX)' - so the entity model seems to be reducing/mapping 'varbinary(MAX)' to 'binary'.
Is this right?
If so, what should my Data Types be (in both SQL Server 2008 and in my entity model) please? Any suggestions?
Change the data type in your model to byte[] and it will be all right. If you need more explanation, please leave a comment.
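For illustration, a minimal sketch of such an entity (class and property names are hypothetical):

```csharp
// A byte[] property surfaces as Binary in the EDM and maps to
// varbinary(MAX) in SQL Server, rather than fixed-length binary(8000).
public class Document
{
    public int Id { get; set; }
    public string Name { get; set; }
    public byte[] Content { get; set; }
}
```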
EDIT:
I had tried this before in LINQ to SQL, and this time I tried it in EF. In the conceptual model of your Foo.edmx file the type is Binary (you can see this by opening the file via Visual Studio's "Open With" context menu and choosing XML Editor, or any other text editor like Notepad), but in the generated Foo.designer.cs file the data type is byte[].
And the limit you mentioned above does not apply.
I tried it with 10,000 bytes and it was inserted successfully without truncating my array. On benchmarking documents saved in the database versus the file system: I read an article which said that in SQL Server 7 the file system had better performance for retrieving stored data, but that later versions of SQL Server overtook the file system, and it suggested saving documents in SQL Server.
IMHO, on saving documents: if they are not too large, I prefer to store them in the DB (NoSQL DBs have great performance here, as far as I know), for several reasons:
First: integrity of my data.
Second: better performance (if a folder holds a large number of files, reading and writing in it slows down gradually unless you organize them into more than one folder, preferably in a tree of folders).
Third: security policies that you can apply through your application more easily (you can do this with the file-system approach too, but I think it's easier here).
Fourth: you can benefit from the facilities your DBMS provides for querying, manipulating, and ... those files.
And much more ... :-)
Ideally you should not store documents in the database; instead, store the path to the document in the database, pointing to the physical document on the web server itself (or some other storage, a CDN, etc.).
However, if you must store it in SQL Server (and seeing as you're on SQL 2008),
you should be using FILESTREAM.
It is supported by EF4 (I believe), and it maps to binary.
As I said though, I'm not sure how well this will perform; run some benchmarks, and if it's not performing well enough, try using the regular ADO.NET/FileStream API.
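For reference, a minimal sketch of that ADO.NET pattern using SqlFileStream (this assumes a table created with a FILESTREAM varbinary(MAX) column plus the required ROWGUIDCOL with a default; all names are hypothetical):

```csharp
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.IO;

static void SaveWithFilestream(string connectionString, int id, string sourceFile)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        using (var tx = conn.BeginTransaction())
        {
            var cmd = new SqlCommand(
                @"INSERT INTO Documents (Id, Content) VALUES (@id, 0x);
                  SELECT Content.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT()
                  FROM Documents WHERE Id = @id", conn, tx);
            cmd.Parameters.AddWithValue("@id", id);

            string path;
            byte[] txContext;
            using (var reader = cmd.ExecuteReader())
            {
                reader.Read();
                path = reader.GetString(0);
                txContext = (byte[])reader[1];
            }

            // SqlFileStream exposes the BLOB as an ordinary stream, so the
            // file is copied across in chunks rather than as one byte[].
            using (var dest = new SqlFileStream(path, txContext, FileAccess.Write))
            using (var src = File.OpenRead(sourceFile))
            {
                src.CopyTo(dest);
            }
            tx.Commit();
        }
    }
}
```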
I still think you should put it on the file system, not in the database (my opinion, of course).