Out of memory while iterating through rowset - zend-framework

I have a "small" table of 60400 rows with zipcode data, 6mb in total. I want to iterate through them all, update a column value, and then save it.
The following is part of my Zipcodes model which extends My_Db_Table that a totalRows function that - you guessed it.. returns the total number of rows in the table (60400 rows)
public function normalizeTable() {
    $this->getAdapter()->setProfiler(false);
    $totalRows = $this->totalRows();
    $rowsPerQuery = 5;
    for ($i = 0; $i < $totalRows; $i = $i + $rowsPerQuery) {
        $select = $this->select()->limit($i, $rowsPerQuery);
        $rowset = $this->fetchAll($select);
        foreach ($rowset as $row) {
            $row->{self::$normalCityColumn} = $row->normalize($row->{self::$cityColumn});
            $row->save();
        }
        unset($rowset);
    }
}
My rowClass contains a normalize function (basically a metaphone wrapper doing some extra magic).
At first I tried a plain old $this->fetchAll(), but got an out-of-memory error (128MB) right away. Then I tried splitting the rowset into chunks; the only difference is that some rows actually get updated, but I still get the out-of-memory error.
Any ideas on how I can accomplish this, or should I fall back to ye olde mysql_query()?

I suggest using the Zend_Db_Statement::fetch() function here, so you only hold one row in memory at a time.
http://files.zend.com/help/Zend-Framework/zend.db.statement.html
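A minimal sketch of what that could look like. The table and column names here (zipcodes, id, city, normal_city) are placeholders, not from the original post, and $db is assumed to be your Zend_Db_Adapter:
$stmt = $db->query('SELECT id, city FROM zipcodes');
while ($row = $stmt->fetch()) {
    // stream one row at a time instead of materializing a whole rowset
    $db->update(
        'zipcodes',
        array('normal_city' => normalize($row['city'])), // your metaphone wrapper
        array('id = ?' => $row['id'])
    );
}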

I suggest rebuilding the select statement so that only the columns that need to be updated are selected: $select->from($table, (array)$normalCityColumn)...
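A hedged sketch of that change, assuming the primary key column is named id (you need the key plus both city columns for save() to work):
$select = $this->select()
    ->from($this->info('name'), array('id', self::$cityColumn, self::$normalCityColumn))
    ->limit($rowsPerQuery, $i);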

Related

Script to update 5500 fields in a telephone type column with random numbers

I need to create a PHP 7 script that generates 5500 random telephone numbers starting with 3, like the example number
"3471239900". The script should overwrite the data already present.
/**
 * generates a random phone number starting with 3
 */
function telefono()
{
    $telefono = '';
    for ($k = 0; $k < 9; $k++) {
        // generate 9 random digits
        $telefono .= rand(0, 9);
    }
    // starts with 3
    return '3' . $telefono;
}
$res = mysqli_query($conn, 'SELECT id_com FROM commesse ORDER BY id_com');
while ($riga = mysqli_fetch_assoc($res)) {
    $id = (int)$riga['id_com'];
    $query = "UPDATE commesse SET cliente_tel='" . telefono() . "' WHERE id_com=" . $id;
    mysqli_query($conn, $query);
}
You don't need code like this to fill a single column in a database table with random numbers.
The following UPDATE statement will populate the cliente_tel column of the commesse table with 10-digit random numbers, all beginning with 3.
UPDATE `commesse`
SET `cliente_tel` = CONCAT("3", ROUND(RAND()*(999999999-100000000)+100000000))
WHERE 1;
Using ROUND() is necessary here since RAND() returns a float between 0 and 1.
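If you want to preview the generated values before touching any data, you can run the same expression in a plain SELECT first, for example:
SELECT CONCAT("3", ROUND(RAND()*(999999999-100000000)+100000000)) AS sample_tel;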
Good to remember: Running any kind of update/insert statement in a loop is always expensive and slow. Try to avoid running SQL queries in a loop as much as possible.

Script is taking 11 - 20 seconds to look up an item in an 18,000 row data set

I have two Google sheets workbooks.
One is the "master" source of lookup data with a key based on manufacturer item #, which could be anything from 1234 to A-01/234-Name_1. This sheet, referenced via SpreadsheetApp.openByUrl, has 18,000 rows and 13 columns. The key column has been converted to plain text and the sheet is sorted by this column.
The second is the "template" where people enter item #s that they need to look up against the master, typically 20 - 1500 items at a time.
The script is in the template. It is very slow and routinely times out after 30 minutes. It was written by someone else and I am new to Apps Script, but I think I've managed to understand what the script is doing and where the bottleneck is occurring.
It does a bunch of stuff, but this is the meat of the lookup:
var numrows = master.getDataRange().getNumRows();
var masterdata = master.getDataRange().getValues();
var itemnumberlist = template.getDataRange().getValues();
var retreiveddata = [];
// iterate through the manf item number list to find all matches in the
// master and return those matches to another sheet
for (i = 1; i < template.getDataRange().getValues().length; i++) {
    for (j = 0; j < numrows; j++) {
        if (masterdata[j][1].toString() === itemnumberlist[i][1].toString()) {
            retreiveddata.push(masterdata[j]);
            anothersheet.appendRow(masterdata[j]);
        }
    }
}
I used Logger.log() to determine that each time through the i loop is taking 11 - 19 seconds, which just seems insane.
I've been doing some google searching and I've tried a couple of different things...
First I tried moving the writing of found data out of the for loop so the script would be doing all of its reading first and then writing in one big chunk, but I couldn't get it exactly right. My two attempts are below.
var mycounter = 0;
for (i = 0; i < template.getDataRange().getValues().length; i++) {
    for (j = 0; j < numrows; j++) {
        if (masterdata[j][0].toString() === itemnumberlist[i][0].toString()) {
            retreiveddata.push(masterdata[j]);
            mycounter = mycounter + 1;
        }
    }
}
// Attempt 1
// var myrange = retreiveddata.length;
// for (k = 0; k < myrange; k++) {
//     anothersheet.appendRow(retreiveddata.pop([k]));
// }
// Attempt 2
var myotherrange = anothersheet.getRange(2, 1, mycounter, 13);
myotherrange.setValues(retreiveddata);
I can't remember for sure, because this was on Friday, but I think both attempts resulted in the script trying to write the entire master file into "anothersheet".
So I temporarily set this aside and decided to try something else. I was trying to recreate the issue in a couple of sample spreadsheets, but I was unable to do so. The same script is getting through my 15,000 row sample "master" file in less than 1 second per lookup. The only thing I can think of is that I used a random number as my key instead of a weird text string.
That led me to think that maybe I could use a hash algorithm on both the master data and the values to be looked up, but this is presenting a whole other set of issues.
I borrowed these functions from another forum post:
function GetMD5Hash(value) {
    var rawHash = Utilities.computeDigest(Utilities.DigestAlgorithm.MD5, value);
    var txtHash = '';
    for (var j = 0; j < rawHash.length; j++) {
        var hashVal = rawHash[j];
        if (hashVal < 0)
            hashVal += 256;
        if (hashVal.toString(16).length == 1)
            txtHash += "0";
        txtHash += hashVal.toString(16);
        Utilities.sleep(100);
    }
    return txtHash;
}
function RangeGetMD5Hash(input) {
    if (input.map) { // Test whether input is an array.
        return input.map(GetMD5Hash); // Recurse over array if so.
    } else {
        return GetMD5Hash(input);
    }
}
It literally took me all day to get the hash value for all 18,000 item #s in my master spreadsheet. Neither GetMD5Hash nor RangeGetMD5Hash will return a value consistently. I can only do a few rows at a time. Sometimes I get "Loading..." indefinitely. Sometimes I get "#Name" with a message about GetMD5Hash being undefined (despite the fact that it worked on the previous row). And sometimes I get "#Error" with a message about an internal error.
This method actually reduces the lookup time of each item to 2 - 3 seconds (much better, but not great). However, I can't get the hash function to consistently work on the input data.
At this point I'm so frustrated and behind on my other work that I thought I'd reach out to the smart people on these forums and hope for some sort of miracle response.
To summarize, I'm looking for suggestions on these three items:
What am I doing wrong in my attempt to move the write out of the for loop?
Is there a way to get my hash value faster or utilize a different method to accomplish the same goal?
What else can I try to help speed up the script?
Any suggestions you can offer would be greatly appreciated!
-Mandy
It sounds like you hit on the right approach with attempting to move the appendRow() call out of the loop. Anytime you are reading or writing to a spreadsheet you can expect the individual call to take 1 to 2 seconds, so this will eat up a lot of time when you get matches. Storing the matches in an array and writing them all at once is the way to go.
Another thing I notice is that your script calls getValues() in the actual for loop condition statement. The condition statement is executed each time on each iteration of the loop, so this is potentially wasting a lot of time even when you don't have matches.
A final tweak that may be helpful depending on your desired behaviour. You can stop the inner for loop after it finds the first match, which, if you only care about the first match or know there will only be one match, will save you a lot of iterations. To do this, put "break" immediately after the retreiveddata.push(masterdata[j]); line.
To fix the getValues issue, Change:
for (i = 1; i < template.getDataRange().getValues().length; i++) {
To:
for (i = 1; i < itemnumberlist.length; i++) {
And here is that fix combined with the appendRow fix, including the break call:
for (i = 1; i < itemnumberlist.length; i++) {
    for (j = 0; j < numrows; j++) {
        if (masterdata[j][0].toString() === itemnumberlist[i][0].toString()) {
            retreiveddata.push(masterdata[j]);
            break; // stop searching after first match, move on to next item
        }
    }
}
// make sure you have data to write before trying to write it
if (retreiveddata.length > 0) {
    var myotherrange = anothersheet.getRange(2, 1, retreiveddata.length, retreiveddata[0].length);
    myotherrange.setValues(retreiveddata);
}
If you are re-using the same sheet for "anothersheet" on each execution, you may also want to call anothersheet.clear() to erase any existing data before you write your fresh results.
I would pass on the hashing approach altogether, comparing strings is comparing strings, so whether they are hashes or actual part numbers I wouldn't expect a significant difference.
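If the nested loop itself is still too slow, one further technique (not from the answer above, just a sketch reusing the same variable names) is to index the master data with a plain object, so each item is looked up in constant time instead of scanning 18,000 rows:
var index = {};
for (var j = 0; j < masterdata.length; j++) {
    var key = masterdata[j][0].toString();
    if (!(key in index)) index[key] = masterdata[j]; // keep the first match, like the break above
}
var retreiveddata = [];
for (var i = 1; i < itemnumberlist.length; i++) {
    var match = index[itemnumberlist[i][0].toString()];
    if (match) retreiveddata.push(match);
}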

Using cell value as reference to sheet in formulas

I have a spreadsheet with three sheets. Two are called 2012 and 2011 and have a bunch of similar data. The last sheet does comparisons between the data.
To be able to choose the year, I'm using a cell (D1) where I can write either 2011 or 2012. The formulas then use the INDIRECT function to include this cell as part of the reference.
INDIRECT(CHAR(39)&$D$1&CHAR(39)&"!F:F")
This is not a pretty solution and makes the formula quite long and complex.
=IFERROR(SUM(FILTER( INDIRECT(CHAR(39)&$D$1&CHAR(39)&"!M:M") ; (INDIRECT(CHAR(39)&$D$1&CHAR(39)&"!B:B")=$A4)+(INDIRECT(CHAR(39)&$D$1&CHAR(39)&"!B:B")=$A5)+(INDIRECT(CHAR(39)&$D$1&CHAR(39)&"!B:B")=$A6)+(INDIRECT(CHAR(39)&$D$1&CHAR(39)&"!B:B")=$A7)+(INDIRECT(CHAR(39)&$D$1&CHAR(39)&"!B:B")=$A8); MONTH(INDIRECT(CHAR(39)&$D$1&CHAR(39)&"!D:D"))=$B$1 ; INDIRECT(CHAR(39)&$D$1&CHAR(39)&"!F:F")=D$3));0)
Is there a better way of doing this?
I've tried creating a separate spreadsheet for the calculations sheet and importing (IMPORTRANGE) the data from the two sheets together onto one sheet with VMERGE (a custom function from the script gallery), but there is quite a lot of data in these two sheets and the import takes a long time. Any changes (like changing the year) also take a long time to recalculate.
Database functions tend to be cleaner when doing this kind of thing.
https://support.google.com/docs/bin/static.py?hl=en&topic=25273&page=table.cs&tab=1368827
Database functions take a while to learn, but they are powerful.
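For instance, a hedged example (assuming the year sheets have a header row, column 13 is the one to sum, and your criteria live in A1:B2):
=DSUM('2012'!A1:M; 13; A1:B2)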
Or
You could put INDIRECT(CHAR(39)&$D$1&CHAR(39)&"!B:B") in a cell on its own.
I think that you have two years of information where the schema is identical (column C has the same type of information on both sheets). Also, I'm assuming that column B tracks the year.
If so, consider holding all of your information on one sheet and use the spreadsheet function QUERY to create views.
For instance, this formula returns all columns A through E from a sheet named "DataSheet" where the value in column B = 2010.
=QUERY(DataSheet!A1:E; "SELECT * WHERE B = 2010";1)
Sometimes there is a really good reason to have the data stored on two sheets. If so, use one of the vMerge functions in the script gallery to assemble a working sheet. Then create views and reports from the working sheet.
function VMerge() {
    var maxw = 0, minw = Number.MAX_VALUE, l = 0;
    var al = arguments.length;
    for (var i = 0; i < al; i++) {
        if (arguments[i].constructor == Array) l = arguments[i][0].length;
        else if (arguments[i].length != 0) l = 1; // literal values count as an array one cell wide; empty cells are ignored
        maxw = l > maxw ? l : maxw;
        minw = l < minw ? l : minw;
    }
    if (maxw == minw) { // when the largest width equals the smallest width, all are equal
        var s = new Array();
        for (var i = 0; i < al; i++) {
            if (arguments[i].constructor == Array) s = s.concat(arguments[i].slice());
            else if (arguments[i].length != 0) s = s.concat([[arguments[i]]]);
        }
        if (s.length == 0) return null; else return s;
    }
    else return "#N/A: All data ranges must be of equal width!";
}
Hope this helps.

Dataset capacities

Is there any limit on the number of rows in a DataSet? Basically I need to generate Excel files with data extracted from SQL Server and add formatting. I have two approaches: either take the entire data set (around 450,000 rows) and loop through it in .NET code, or loop through around 160 records at a time, pass each record as an input to a proc, get the relevant data, generate the file, and move on to the next 160. Which is the best way? Is there any other way this can be handled?
If I take 450,000 records at a time, will my application crash?
Thanks,
Rohit
You should not try to read 450,000 rows into your application at one time. You should instead use a DataReader or other cursor-like method and look at the data a row at a time. Otherwise, even if your application does run, it will be extremely slow and use up all of the computer's resources.
Basically I need to generate excel files with data extracted from SQL server and add formatting
A DataSet is generally not ideal for this. A process that loads a DataSet, loops over it, and then discards it means that the memory from the first row processed won't be released until the last row is processed.
You should use a DataReader instead. This discards each row once it's processed, through a subsequent call to Read.
Is there any limit of rows for a dataset
At the very least, since the DataRowCollection.Count property is an int, it's limited to 2,147,483,647 rows; however, there may be some other constraint that makes it smaller.
From your comments, this is an outline of how I might construct the loop:
using (connection)
{
    SqlCommand command = new SqlCommand(
        @"SELECT Company, Dept, EmpName
          FROM Table
          ORDER BY Company, Dept, EmpName", connection);
    connection.Open();
    SqlDataReader reader = command.ExecuteReader();
    string CurrentCompany = "";
    string CurrentDept = "";
    string LastCompany = "";
    string LastDept = "";
    SomeExcelObject xl = null;
    if (reader.HasRows)
    {
        while (reader.Read())
        {
            CurrentCompany = reader["Company"].ToString();
            CurrentDept = reader["Dept"].ToString();
            // start a new workbook whenever the company/dept grouping changes
            if (CurrentCompany != LastCompany || CurrentDept != LastDept)
            {
                xl = CreateNewExcelDocument(CurrentCompany, CurrentDept);
            }
            LastCompany = CurrentCompany;
            LastDept = CurrentDept;
            AddNewEmpName(xl, reader["EmpName"].ToString());
        }
    }
    reader.Close();
}

How can I reverse the row order for an ADO.NET DataTable?

I have a DataTable that I have obtained by deserializing a JSON message. I do not know ahead of time what the column names will be so I cannot use DataView.Sort on a specific column. I would simply like to reverse the order of the rows. Here is what I tried:
var reversedTable = new DataTable();
for (var row = originalTable.Rows.Count - 1; row >= 0; row--)
    reversedTable.Rows.Add(originalTable.Rows[row]);
but this throws "System.ArgumentException: This row already belongs to another table." How can I accomplish this seemingly simple task? Thanks in advance,
Frank
ANSWER:
var reversed = original.Clone();
for (var row = original.Rows.Count - 1; row >= 0; row--)
    reversed.ImportRow(original.Rows[row]);
I've seen that error message before.
You're not allowed to directly add a row from one table to another. First you have to clone the table (Clone() copies the schema but not the data), then you can go through and call ImportRow() for each of the rows.
Check out geekzilla for a decent example.
I hope that helps!