DataSet capacities - ADO.NET

Is there any limit on the number of rows in a DataSet? Basically I need to generate Excel files with data extracted from SQL Server and add formatting. I see two approaches: either pull the entire data set (around 450,000 rows) and loop through it in .NET code, or loop through the roughly 160 records, pass each one as an input to a proc, get the relevant data, generate the file, and move on to the next of the 160. Which is the best way? Is there any other way this can be handled?
If I take 450,000 records at a time, will my application crash?
Thanks,
Rohit

You should not try to read 450,000 rows into your application at one time. You should instead use a DataReader or another cursor-like method and look at the data one row at a time. Otherwise, even if your application does run, it will be extremely slow and use up all of the computer's resources.

Basically I need to generate Excel files with data extracted from SQL Server and add formatting
A DataSet is generally not ideal for this. A process that loads a DataSet, loops over it, and then discards it means that the memory for the first row processed is not released until the last row has been processed.
You should use a DataReader instead. It discards each row once it has been processed, on the subsequent call to Read.
Is there any limit on the number of rows in a DataSet
At the very least, since the DataRowCollection.Count property is an int, it is limited to 2,147,483,647 rows; there may well be other constraints that make the practical limit smaller.
From your comments, here is an outline of how I might construct the loop:
using (connection)
{
    SqlCommand command = new SqlCommand(
        @"SELECT Company, Dept, EmpName
          FROM Table
          ORDER BY Company, Dept, EmpName", connection);
    connection.Open();
    SqlDataReader reader = command.ExecuteReader();
    string CurrentCompany = "";
    string CurrentDept = "";
    string LastCompany = "";
    string LastDept = "";
    SomeExcelObject xl = null;
    if (reader.HasRows)
    {
        while (reader.Read())
        {
            CurrentCompany = reader["Company"].ToString();
            CurrentDept = reader["Dept"].ToString();
            // Start a new Excel document whenever the Company/Dept grouping changes
            if (CurrentCompany != LastCompany || CurrentDept != LastDept)
            {
                xl = CreateNewExcelDocument(CurrentCompany, CurrentDept);
            }
            LastCompany = CurrentCompany;
            LastDept = CurrentDept;
            AddNewEmpName(xl, reader["EmpName"].ToString());
        }
    }
    reader.Close();
}

Related

Read a Database with a loop

My problem is as follows: I have a database in which each row represents a delivery that my agent has to make. The first column contains the name of the delivery, and the other columns the names of the nodes to be reached ('pad_1', 'pad_2', 'pad_3', ...).
In the main I have a string vector that contains the names of all the columns in the database.
I also have an event that, at the scheduled delivery time, calls a function in which I would like to scroll through the columns of the row related to that delivery, so as to fill a vector of nodes with the contents of the columns that are not 0.
I wanted to automate the query by scrolling through the vector of strings with this code:
int[] Pad = new int[n_pc];
for (int i = 0; i < n_pc; i++) {
    Pad[i] = selectFrom(farmacia_clinica)
                 .where(farmacia_clinica.fascia.eq(FASCIA_A))
                 .uniqueResult(farmacia_clinica.PAD_FarmaciaClinica[i]);
}
The error is in the last call, uniqueResult(farmacia_clinica.PAD_FarmaciaClinica[i]).
How can I resolve this?
Thank you.
This doesn't work with QueryDSL because a column reference must be a StringPath, not a String, so you cannot index into the columns with your String array. Instead, you can build a plain SQL query from the String array.
The code will then be:
int[] Pad = new int[n_pc];
for (int i = 0; i < n_pc; i++) {
    Pad[i] = selectUniqueValue(int.class,
        "SELECT " + PAD_FarmaciaClinica[i] +
        " FROM farmacia_clinica WHERE fascia = ?;",
        "FASCIA_A");
}

Getting the total number of records in PagedList

The datagrid that I use on the client is based on SQL row number; it also requires a total number of pages for its paging. I also use the PagedList on the server.
SQL Profiler shows that the PagedList makes 2 db calls - the first to get the total number of records and the second to get the current page. The thing is that I can't find a way to extract that total number of records from the PagedList. Therefore, currently I have to make an extra call to get that total, which means 3 calls in total for each request, 2 of which are absolutely identical. I understand that I probably won't be able to get rid of the call that fetches the total, but I hate to make it twice. Here is an extract from my code; I'd really appreciate any help with this:
var t = from c in myDb.MyTypes.Filter<MyType>(filterXml) select c;
response.Total = t.Count(); // my first call to get the total

double d = uiRowNumber / uiRecordsPerPage;
int page = (int)Math.Ceiling(d) + 1;

var q = from c in myDb.MyTypes.Filter<MyType>(filterXml).OrderBy(someOrderString)
        select new ReturnType
        {
            Something = c.Something
        };
response.Items = q.ToPagedList(page, uiRecordsPerPage);
PagedList has a .TotalItemCount property which reflects the total number of records in the set (not the number in a particular page). Thus response.Items.TotalItemCount should do the trick.
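For example, keeping the names from the question, the paging code could drop the explicit Count() and read the total from the paged result instead (a minimal sketch, assuming response.Total is only needed for the grid's paging):

var q = from c in myDb.MyTypes.Filter<MyType>(filterXml).OrderBy(someOrderString)
        select new ReturnType
        {
            Something = c.Something
        };

// ToPagedList still issues its own COUNT plus the page query (2 calls),
// but the separate t.Count() call is no longer needed.
response.Items = q.ToPagedList(page, uiRecordsPerPage);
response.Total = response.Items.TotalItemCount;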

SqlBulkCopy: dealing with batch size in ADO.NET

I have the following code to insert some data into the database using the SqlBulkCopy class from ADO.NET:
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(DCISParameters.ConnectionString))
{
    bulkCopy.DestinationTableName = "tbzErgoAnalytical";
    bulkCopy.BatchSize = 250;
    bulkCopy.ColumnMappings.Add("Column1", "fldESPA");
    bulkCopy.ColumnMappings.Add("Column2", "fldEP");
    bulkCopy.ColumnMappings.Add("Column13", "fldMISCode");
    bulkCopy.WriteToServer(dbTable);
    bulkCopy.SqlRowsCopied += bulkCopy_SqlRowsCopied;
}
dbTable is a DataTable object that is passed in as a parameter from a method, and it contains 7691 rows that I take from an Excel file. I have set the batch size to 250. The problem is that 7500 (250 * 30) rows are transferred correctly to the database, but then I receive the following error: "Column 'fldMISCode' does not allow DBNull.Value." I am 100% sure that there is no null value in fldMISCode, and I suppose that in the last insert I have only 191 rows left, which is less than the batch size (not sure if my assumption is correct). Any idea how to deal with this error? Thanks in advance...
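One way to double-check that assumption before the copy runs is to scan the source DataTable for DBNull or blank values in the column mapped to fldMISCode ("Column13" above). This is only an illustrative sketch built on the dbTable from the question, not a fix for the error itself:

// Illustrative pre-check (requires System.Data and System.Linq):
// list any dbTable rows whose "Column13" value, mapped to fldMISCode above,
// is DBNull or blank before calling WriteToServer.
var suspectRows = dbTable.Rows
    .Cast<DataRow>()
    .Select((row, index) => new { row, index })
    .Where(x => x.row.IsNull("Column13") ||
                string.IsNullOrWhiteSpace(x.row["Column13"].ToString()))
    .ToList();

foreach (var s in suspectRows)
    Console.WriteLine("Row {0} has no value for Column13 (fldMISCode)", s.index);

As a side note, the SqlRowsCopied handler in the snippet is attached after WriteToServer has already run (and NotifyAfter is not set), so it will not report anything for that copy; subscribing before the call and setting bulkCopy.NotifyAfter makes it easier to see how far the copy gets before it fails.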

MongoDB C# collection.Save vs Insert+Update

From the C# documentation:
The Save method is a combination of Insert and Update. If the Id member of the document has a value, then it is assumed to be an existing document and Save calls Update on the document (setting the Upsert flag just in case it actually is a new document after all).
I'm creating my IDs manually in a base class that all my domain objects inherit from. So all my domain objects have an ID when they are inserted into MongoDB.
Question is, should I use collection.Save and keep my interface simple, or does this actually result in some overhead in the Save call (with the Upsert flag), so that I should therefore use collection.Insert and Update instead?
What I'm thinking is that the Save method first calls Update and then figures out that my new object didn't exist in the first place, and then calls Insert instead. Am I wrong? Has anyone tested this?
Note: I insert bulk data with InsertBatch, so big data chunks won't matter in this case.
Edit, follow-up
I wrote a small test to find out whether calling Update with the Upsert flag has some overhead, so that Insert might be better. It turned out that they run at the same speed. See my test code below. MongoDbServer and IMongoDbServer are my own generic interfaces to isolate the storage facility.
IMongoDbServer server = new MongoDbServer();
Stopwatch sw = new Stopwatch();
long d1 = 0;
long d2 = 0;
for (int w = 0; w <= 100; w++)
{
    sw.Restart();
    for (int i = 0; i <= 10000; i++)
    {
        ProductionArea area = new ProductionArea();
        server.Save(area);
    }
    sw.Stop();
    d1 += sw.ElapsedMilliseconds;

    sw.Restart();
    for (int i = 0; i <= 10000; i++)
    {
        ProductionArea area = new ProductionArea();
        server.Insert(area);
    }
    sw.Stop();
    d2 += sw.ElapsedMilliseconds;
}
long a1 = d1 / 100;
long a2 = d2 / 100;
The Save method is not going to make two trips to the server.
The heuristic is this: if the document being saved does not have a value for the _id field, then a value is generated for it and then Insert is called. If the document being saved has a non-zero value for the _id, then Update is called with the Upsert flag, in which case it is up to the server to decide whether to do an Insert or an Update.
I don't know if an Upsert is more expensive than an Insert. I suspect they are almost the same and what really matters is that either way it is a single network round trip.
If you know it's a new document you might as well call Insert. And calling InsertBatch is way more performant than calling many individual Inserts. So definitely prefer InsertBatch to Save.
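As an illustration only, a batched insert with the legacy 1.x C# driver's MongoCollection API looks roughly like this (the connection string, database name and collection name are placeholders, and ProductionArea stands in for your domain type):

// Sketch against the legacy MongoDB 1.x C# driver (MongoCollection<T>).
var client = new MongoClient("mongodb://localhost");
var collection = client.GetServer()
                       .GetDatabase("test")
                       .GetCollection<ProductionArea>("productionAreas");

var areas = new List<ProductionArea>();
for (int i = 0; i < 10000; i++)
    areas.Add(new ProductionArea());

// One batched call instead of 10,000 individual Insert/Save round trips.
collection.InsertBatch(areas);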

EF4: Object Context consuming too much memory

I have a reporting tool that runs against an MS SQL Server using EF4. The general bulk of this report involves looping over around 5000 rows and then pulling numerous other rows for each one of these.
I pull the initial rows through one data context. The code that pulls the related rows involves using another data context, wrapped in a using statement. It would appear though that the memory consumed by the second data context is never freed and usage shoots up to 1.5GB before an out of memory exception is thrown.
Here is a snippet of the code so you can get the idea:
var outlets = (from o in db.tblOutlets
               where o.OutletType == 3
                     && o.tblCalls.Count() > number
                     && o.BelongsToUser.HasValue
                     && o.tblUser.Active == true
               select new { outlet = o, callcount = o.tblCalls.Count() })
              .OrderByDescending(p => p.callcount);

var outletcount = outlets.Count();
//var outletcount = 0;
//var average = outlets.Average(p => p.callcount);

foreach (var outlet in outlets)
{
    using (relenster_v2Entities db_2 = new relenster_v2Entities())
    {
        // loop over calls and add history
        // check the last time the history table was added to for this call
        var lastEntry = (from h in db_2.tblOutletDistributionHistories
                         where h.OutletID == outlet.outlet.OutletID
                         orderby h.VisitDate descending
                         select h).FirstOrDefault();

        DateTime? beginLooking = null;
        // ... rest of the loop omitted ...
I had hoped that by using a second data context the memory could be released after each iteration. It would appear that it is not (or the GC is not running in time).
With the input from @adrift I altered the code so that the changes are saved after each iteration of the loop, rather than all at the end. It would appear that there is a limit (in my case anyway) of around 150,000 pending writes that the data context can hold before consuming too much memory.
By writing the changes after each iteration it seems to manage memory more effectively; although it still appeared to use just as much, it didn't throw an exception.
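A minimal sketch of that change, reusing the names from the snippet above (AddHistoryEntriesFor is a hypothetical stand-in for the history-building code that was cut from the snippet):

foreach (var outlet in outlets)
{
    // A fresh context per iteration keeps the change tracker small,
    // and saving inside the loop stops pending writes from piling up.
    using (var db_2 = new relenster_v2Entities())
    {
        var lastEntry = (from h in db_2.tblOutletDistributionHistories
                         where h.OutletID == outlet.outlet.OutletID
                         orderby h.VisitDate descending
                         select h).FirstOrDefault();

        // hypothetical helper: add this outlet's new history rows to db_2
        AddHistoryEntriesFor(db_2, outlet.outlet, lastEntry);

        // flush this iteration's changes instead of holding ~150,000 pending writes
        db_2.SaveChanges();
    }
}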