My report processes millions of records. When the number of rows gets too high, I get this error:
The number of rows or columns is too big. Try limiting the number of unique group values.
Details: The number of rows or columns exceeds its limit, 65535.
How can I work around (or increase) this limit?
This error is pretty straightforward. 65535 is 0xFFFF in hexadecimal, the largest value that fits in 16 bits, so once you hit that limit there are no more vacancies and the hotel is closed. Solutions include:
Reduce the number of rows displayed by grouping, for example in your crosstab.
Reduce the amount of incoming data to your report with Record Selection (parameters can drive the selection).
Perform the dependent calculations in a custom SQL statement, generated as a temporary table in your report. You can then pass the results into your report as fields, rather than having to print millions of lines.
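As a rough illustration of the pre-aggregation idea, here is a minimal Python sketch using the standard-library sqlite3 module. The sales table, its columns, and the sample values are hypothetical and only stand in for whatever your report reads; the point is that the database returns one row per group instead of one row per record.

```python
import sqlite3

# Hypothetical in-memory table standing in for the report's data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("north", 5.0), ("south", 7.5), ("south", 2.5), ("east", 1.0)],
)

# Aggregate in SQL so only one row per group reaches the report.
summary = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) "
    "FROM sales GROUP BY region ORDER BY region"
).fetchall()

for region, row_count, total in summary:
    print(region, row_count, total)
```

The report would then bind to the aggregated result rather than to the raw rows, which keeps the row count well below the limit as long as the number of groups is small.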
I want to use an Apex Batch class to insert 10,000 records into an object called A, and then use an after insert trigger to set the weight field of all 10,000 records to the largest weight among them (for example, to 100 if the largest weight is 100).
But right now, with a batch size of 500, the largest weight among each chunk of 500 records is applied only to those 500 records.
The next 500 records then get the largest weight found among themselves.
For example, if the weight field for the largest number of the first 500 data is 50,
Weight field value for data 1-50: 50
If the weight field for the largest number of the following 500 data is 100,
Weight field value for data 51-100: 100
What I want is this: if there are 10,000 records, the weight should be the largest value among all 10,000 records,
and the weight field of every record should be updated to that value.
How shall I do it?
Here's the code for the trigger I wrote.
trigger myObjectTrigger on myObject_status__c (after insert) {
    // Re-query the inserted records, because Trigger.new is read-only in an after trigger.
    List<myObject_status__c> objectStatusList = [SELECT Id, Weight FROM myObject_status__c WHERE Id IN :Trigger.newMap.keySet() ORDER BY Weight DESC];
    // Largest weight currently stored in the object.
    Decimal maxWeight = [SELECT Id, Weight FROM myObject_status__c ORDER BY Weight DESC LIMIT 1].Weight;
    for (Integer i = 0; i < objectStatusList.size(); i++) {
        objectStatusList[i].Weight = maxWeight;
    }
    update objectStatusList;
}
A trigger does not know whether the batch is still running. A trigger works on a scope of at most 200 records at a time and normally sees only those. There are ways around it (a static variable, perhaps?), but even then it would be limited to the batch's chunk size, that is, whatever came into a single execute(). So if you're running in chunks of 500, not even a static variable in the trigger would help you.
A couple of ideas:
How exactly do you know it'll be 10K? Are you inserting them based on another record? Are you using the "Iterator" variant of batch? Could you "prescan" the records you're about to insert, figure out the max weight, then apply it as you insert, eliminating the need for the update?
If it's never going to be bigger than 10K (and there are no side effects, no DMLs running on update), you could combine Database.Stateful and the finish() method. Keep updating the max value as you go through the execute() calls, then in finish() update the records one last time. That's cutting it really close, though.
Can you "daisy chain"? Submit another batch from this batch's finish(), passing the same records and the max you figured out.
Can you stamp the records inserted in the same batch with the same value, for example by putting the batch job's id into a hidden field? Then have another batch (daisy chained?) that looks for them, finds the max in the given range, and applies it to any records that share the batch job id but don't have the value applied yet.
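Set the weight in the finish method of your batch class; it runs once all batches have finished. Track the max weight in an instance variable of the class and have the class implement Database.Stateful, because static variables are reset between execute() calls.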
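The "track the max across chunks, then apply it once at the end" idea can be sketched outside Salesforce. The following Python simulation is not Apex and makes no claim about governor limits; MaxWeightBatch, run_batch, and the dictionary used as a record store are hypothetical names chosen for illustration. It only shows why state has to survive across execute() calls and why the final write belongs in finish().

```python
from typing import Dict, List, Optional


class MaxWeightBatch:
    """Mimics a stateful batch job: instance state survives across execute() calls."""

    def __init__(self) -> None:
        self.max_weight: Optional[int] = None
        self.inserted_ids: List[int] = []

    def execute(self, chunk: List[dict], store: Dict[int, int]) -> None:
        # Insert the chunk and keep a running maximum across all chunks seen so far.
        for rec in chunk:
            store[rec["id"]] = rec["weight"]
            self.inserted_ids.append(rec["id"])
            if self.max_weight is None or rec["weight"] > self.max_weight:
                self.max_weight = rec["weight"]

    def finish(self, store: Dict[int, int]) -> None:
        # Runs once after every chunk is processed: apply the global maximum.
        for rec_id in self.inserted_ids:
            store[rec_id] = self.max_weight


def run_batch(records: List[dict], chunk_size: int) -> Dict[int, int]:
    store: Dict[int, int] = {}  # hypothetical stand-in for the database table
    job = MaxWeightBatch()
    for start in range(0, len(records), chunk_size):
        job.execute(records[start:start + chunk_size], store)
    job.finish(store)
    return store


# Made-up sample data: 1,000 records processed in chunks of 500.
records = [{"id": i, "weight": (i * 37) % 101} for i in range(1, 1001)]
result = run_batch(records, chunk_size=500)
print(len(result), set(result.values()))
```

After finish() every record carries the same weight, the largest one seen in any chunk, which is the behaviour a per-chunk trigger cannot provide on its own.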
I have a table whose columns are location and credit. The location column contains strings, mainly location_name and npl_of_location_name. The credit column contains integers, mainly credit_of_location_name and credit_npl_of_location_name. I need to create a column that calculates ((odd rows of credit - even rows of credit) * 0.1). How do I do this?
When you specify "odd rows" and "even rows", are you referring to row numbers? Because, unless your query sorts the data, you have no control over row order; the database server returns rows however they are physically stored.
Once you are sure that your rows are properly sorted, you can use a technique such as Mod(@INROWNUM,2) = 1 to identify "odd" rows; a result of zero means "even". This works best if the Transformer is executing in sequential mode; if it is executed in parallel mode, you need to use a partitioning algorithm that ensures that the odd and even rows for a particular location are on the same node.
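To make the pairing logic concrete, here is a small Python sketch (not DataStage code). It assumes the rows are already sorted so that each location's row is immediately followed by its npl_of_ row; the function name odd_minus_even and the sample locations and credit values are made up for illustration.

```python
from typing import List, Tuple


def odd_minus_even(rows: List[Tuple[str, int]], factor: float = 0.1) -> List[Tuple[str, float]]:
    """Pair row 1 with row 2, row 3 with row 4, and so on (1-based numbering)."""
    if len(rows) % 2 != 0:
        raise ValueError("expected an even number of rows (one pair per location)")
    results = []
    for idx in range(0, len(rows), 2):
        odd_name, odd_credit = rows[idx]        # 1-based odd row
        _even_name, even_credit = rows[idx + 1]  # 1-based even row
        results.append((odd_name, (odd_credit - even_credit) * factor))
    return results


# Hypothetical, already-sorted input: each location followed by its npl_of_ row.
rows = [
    ("jakarta", 1000),
    ("npl_of_jakarta", 200),
    ("bandung", 500),
    ("npl_of_bandung", 100),
]

for location, value in odd_minus_even(rows):
    print(location, round(value, 2))
```

If the sort order cannot be guaranteed, the pairing is wrong no matter how the arithmetic is done, which is why the ordering question comes first.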
We have had discussions with our development staff over the use of VARCHAR columns, as they define every varchar field as varchar(255), varchar(500), and so on, which is much larger than the maximum length the field actually needs.
Does a varchar's declared length have any effect on performance in Db2? We have found a recommendation to use char instead of varchar for columns of 30 bytes or less, so our concern is about varchar fields that are longer than 30 bytes.
Allowing excessive column length is not a good idea. If you allow, let’s say, a FirstName column to have maximum length 500, you may find quite a long irrelevant story there eventually, because why not if it’s allowed :)
As for the performance implications:
The only problem may arise if Extended row size is turned on for the database (otherwise you simply can't create a table that is too "wide") and the total length of a row exceeds the tablespace page size. In that case some varchar column value is stored outside the data page, and more I/O is needed to access such a row in the future. You should keep this behavior in mind. The probability of such events is higher when varchar column lengths are uncontrolled.
This can also cause a performance hit with ORGANIZE BY COLUMN tables. There is a limit on the total declared width that can be processed within the Columnar Data Engine; if this limit is breached in a query plan, the remainder of the query is processed in the Row Data Engine.
I'm looking at a Postgres system with tables containing tens or hundreds of millions of rows, being fed at a rate of a few rows per second.
I need to do some processing on the rows of these tables, so I plan to run some simple select queries: select * with a where clause based on a range (each row contains a timestamp, and that's what I'll use for ranges). It may be a "closed range", with a start and an end that I know are contained in the table and into which no new data will fall, or an "open range", i.e. one of the range boundaries might not be "in the table yet", so rows being fed into the table might fall in that range.
Since the response will itself contain millions of rows, and the processing per row can take some time (tens of ms), I know I'll need to use a cursor and fetch, say, a few thousand rows at a time. My question is:
If I run an "open range" query, will I only get the result as it was when I started the query, or will new rows inserted into the table that fall in the range while I run my fetches show up?
(I tend to think that no I won't see new rows, but I'd like a confirmation...)
It should not happen under any isolation level:
https://www.postgresql.org/docs/current/static/transaction-iso.html
but Postgres ensures it only in Serializable isolation.
Well, I think that when you run a query, you create a new transaction, and it will not see data from any other transaction that commits after the query has started.
So, basically, "you only get the result as it was when you started the query".
I have a table, say 'T', in kdb that has over 6 billion rows. When I tried to execute a query like this
select from T where i < 10
it throws a wsfull exception. Is there any way I can execute queries like this on a table that holds such a large amount of data?
10#T
The expression as you wrote it first makes a bitmap containing all of the elements where i (rownumber) < 10, which is as tall as one of your columns. It then does where (which just contains til 10) and then gets them from each row. You can save the last step with:
T[til 10]
but 10#T is shorter.
Assuming you have a partitioned table here, it is normally beneficial to have the partitioning column (date, int etc.) as the first item in the where clause of your query - otherwise as mentioned previously you are reading a six billion item list into memory, which will result in a 'wsfull signal for any machine with less than the requisite amount of RAM.
Bear in mind that row index starts at 0 for each partition, and is not reflective of position in the overall table. The query that you gave as an example in your question would return the first ten rows of each partition of table T in your database.
In order to do this without reaching your memory limit, you can try running the following (if your database is date-partitioned):
raze{10#select from T where date=x}each date
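The difference between "the first n rows of each partition" and "the first n rows overall" can be shown with a small Python sketch. It is not q code; the partitions dictionary, its date keys, and the row labels are invented stand-ins for a date-partitioned table. The second function reads lazily, so it never materialises the full table.

```python
from itertools import chain, islice
from typing import Dict, List

# Hypothetical date-partitioned table: each partition has its own row index from 0.
partitions: Dict[str, List[str]] = {
    "2024.01.01": ["a0", "a1", "a2"],
    "2024.01.02": ["b0", "b1", "b2", "b3"],
}


def head_per_partition(parts: Dict[str, List[str]], n: int) -> List[str]:
    """First n rows of every partition, concatenated (one slice per date)."""
    return [row for date in sorted(parts) for row in parts[date][:n]]


def head_overall(parts: Dict[str, List[str]], n: int) -> List[str]:
    """First n rows of the whole table, read lazily partition by partition."""
    lazy_rows = chain.from_iterable(parts[date] for date in sorted(parts))
    return list(islice(lazy_rows, n))


print(head_per_partition(partitions, 2))
print(head_overall(partitions, 2))
```

The per-partition version corresponds to taking a slice within each date and joining the results; the overall version stops reading as soon as it has n rows.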