How do I make the trigger run after all the data is inserted by the batch class? - triggers

I want to use an Apex Batch class to insert 10,000 records into an object called A, and then use an after insert trigger to update the Weight field of all 10,000 records to the largest Weight value among them (e.g. to 100 if the largest Weight is 100).
But right now, with a batch size of 500, the largest Weight value within each chunk of 500 records is applied only to those 500 records, and the largest value of the next 500 records is applied only to that next chunk.
For example, if the largest Weight among the first 500 records is 50,
the Weight of records 1-500 becomes 50;
if the largest Weight among the next 500 records is 100,
the Weight of records 501-1000 becomes 100.
What I want is for the largest Weight across all 10,000 records
to be applied to all 10,000 records.
How can I do this?
Here's the code for the trigger I wrote.
trigger myObjectTrigger on myObject_status__c (after insert) {
    // Records from this trigger invocation, heaviest first
    List<myObject_status__c> objectStatusList = [SELECT Id, Weight FROM myObject_status__c
                                                 WHERE Id IN :Trigger.newMap.keySet()
                                                 ORDER BY Weight DESC];
    // Largest Weight currently anywhere on the object
    Decimal maxWeight = [SELECT Weight FROM myObject_status__c ORDER BY Weight DESC LIMIT 1].Weight;
    for (Integer i = 0; i < objectStatusList.size(); i++) {
        objectStatusList[i].Weight = maxWeight;
    }
    update objectStatusList;
}

A trigger will not know whether the batch is still going on. A trigger works on a scope of at most 200 records at a time and normally sees only those. There are ways around that (create some static variable?), but even then it would be limited to whatever the batch's size is, i.e. what came into a single execute(), because each execute() runs as its own transaction. So if you're running in chunks of 500, not even a static variable in the trigger would help you.
A couple of ideas:
How exactly do you know it'll be 10K? Are you inserting them based on another record? Are you using the "Iterator" variant of batch? Could you "prescan" the records you're about to insert, figure out the max weight, then apply it as you insert, eliminating the need for an update afterwards?
If it's never going to be bigger than 10K (and there are no side effects, no DMLs running on update), you could combine Database.Stateful and the finish() method: keep updating the max value as you go through the execute() calls, then in finish() update all the records one last time (see the sketch after the next answer). With 10,000 records that's exactly the per-transaction DML row limit, so it's cutting it real close, though.
can you "daisy chain". Submit another batch from this batch's finish. Passing same records and the max you figured out.
Can you stamp the records inserted in the same batch run with the same value, for example by putting the batch job's Id into a hidden field? Then have another batch (daisy-chained?) that looks for them, finds the max within that set, and applies it to any records that share the batch job Id but don't have the value applied yet.

Set the weight in the finish() method of your batch class; it runs once, after all the batches have finished. Track the maximum weight as you go in a variable on the class (make it an instance variable and implement Database.Stateful so it survives between execute() calls; a static would reset with each transaction).
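A minimal sketch of the Database.Stateful + finish() approach, assuming the object and field names from the question; the class name and the source query in start() are placeholders, and the record-building logic is elided:
global class WeightLevelingBatch implements Database.Batchable<sObject>, Database.Stateful {
    // Instance state survives between execute() calls thanks to Database.Stateful;
    // a static variable would reset, because each execute() is its own transaction.
    global Decimal maxWeight = 0;

    global Database.QueryLocator start(Database.BatchableContext bc) {
        // Placeholder: query whatever source records drive your inserts
        return Database.getQueryLocator([SELECT Id FROM Account]);
    }

    global void execute(Database.BatchableContext bc, List<sObject> scope) {
        List<myObject_status__c> toInsert = new List<myObject_status__c>();
        // ... build the myObject_status__c records from scope here ...
        insert toInsert;
        // Keep the running maximum up to date across the whole run
        for (myObject_status__c rec : toInsert) {
            if (rec.Weight != null && rec.Weight > maxWeight) {
                maxWeight = rec.Weight;
            }
        }
    }

    global void finish(Database.BatchableContext bc) {
        // Runs once, after the last execute(): stamp every record with the final max.
        // Note: 10,000 rows is exactly the per-transaction DML row limit.
        List<myObject_status__c> toUpdate =
            [SELECT Id FROM myObject_status__c WHERE Weight != :maxWeight];
        for (myObject_status__c rec : toUpdate) {
            rec.Weight = maxWeight;
        }
        update toUpdate;
    }
}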

Related

T-SQL set a rotating Flag (True/False) in records using Stored Proc

I can do this using multiple commands in C# for the app I'm creating, but prefer a stored proc to eliminate issues with latency/locks, etc. (hopefully):
I have a table of 10 extensions (important fields):
SortOrder, Extension, IsUsed
First record will be set to IsUsed = true
When calling the stored proc, I need the IsUsed of the NEXT record in sort order to be set to true, and the current record that is true to be set to false. When I hit the last record, rotate back to the first record.
Use case: I need to rotate through a bank of usable numbers. Multiple people use the app, so a number cannot be reused within the last 4 minutes (a bank of 10 will suffice, but we can extend it if necessary). When a user requests a number, they get the next available one. I can build the table however needed, so any and all options to achieve the use case are welcome.
I need the flag set to true on the 1st record when the stored proc is called. All other records should be false.
I have seen this, which is of interest, but it doesn't quite answer my question:
Get "next" row from SQL Server database and flag it in single transaction
If all that you're using this for is to return a number to identify a session, I'd suggest scrapping the whole table idea and letting SQL Server do the work for you.
You can create a SEQUENCE object that will cycle and return the next value for you, without needing to write any code or maintain any tables.
CREATE SEQUENCE dbo.Extension
    AS integer
    START WITH 5
    INCREMENT BY 5
    MINVALUE 5
    MAXVALUE 50
    CYCLE;  -- wraps back to MINVALUE after reaching MAXVALUE
This will return the number 5 the first time it's called, up to the number 50 on call number 10, and then start back over. You can adjust the numbers to produce more or less whatever values you'd like, though.
Get the next value like this:
SELECT NEXT VALUE FOR dbo.Extension;
And when/if you need to extend the range:
ALTER SEQUENCE dbo.Extension
MAXVALUE 100;
Play around with the idea on the Rextester demo.
Edit: In light of the comments above and below, I'd still stick with a SEQUENCE, I think.
Every time your code calls the table for an extension, use a query along the lines of this:
SELECT
Extension
FROM
ExtTable
WHERE
SortOrder = NEXT VALUE FOR dbo.Extension;
Functionally, this should do what you're after, again with no code to write or maintain.
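If you'd still prefer a stored procedure for the app to call, a minimal sketch along these lines should work (the procedure name is made up; ExtTable and its columns are from the question, and SortOrder is assumed to hold the same values the sequence produces). Capturing the sequence value in a variable first keeps the query itself simple:
CREATE PROCEDURE dbo.GetNextExtension
AS
BEGIN
    SET NOCOUNT ON;

    -- Advance the cycling sequence and remember where it landed
    DECLARE @slot integer = NEXT VALUE FOR dbo.Extension;

    -- Hand back the extension sitting in that slot
    SELECT Extension
    FROM ExtTable
    WHERE SortOrder = @slot;
END;
The app then just runs EXEC dbo.GetNextExtension; concurrency is handled by the sequence itself, since each caller receives a distinct next value.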

How to check if the stream of rows has ended

Is there a way for me to know if the stream of rows has ended? That is, whether the job is on the last row?
What I'm trying to do is, for every 10 rows, do something. My problem is the last rows: for example, with 115 rows, the last 5 won't trigger the action, but I need them to.
There is no built-in functionality in Talend which tells you if you're on the last row. You can work around this using one of the following:
Get the row count beforehand. For instance, if you have a file, you can use tFileRowCount to count the number of rows; then, when you process the file, you keep a variable holding your current row number, so you can tell when you've reached the last row. If your data comes from a database, you could either issue a query that returns the total number of rows beforehand, or modify your main query to return the total number of rows in an additional column (using ranking functions) and use that.
Do some processing after the subjob has ended. There may be situations where you need special processing for the last row. You can achieve this by getting the last row processed by the previous subjob (which you have already saved, for instance by putting a tSetGlobalVar after your target; when your subjob is done, that variable contains the last written value).
Edit
For your use case, what you could do is first store the result of the API call in memory using tHashOutput, then read it back with a tHashInput in order to process it; by then you'll know how many rows you retrieved, via tHashOutput's global variable tHashOutput_X_NB_LINE.
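A minimal sketch of that check, in a tJavaRow inside the processing subjob (the component name tHashOutput_1 and the "currentRow" key are assumptions; adjust them to your job):
// Total rows captured by the earlier subjob; NB_LINE is set once
// tHashOutput_1 has finished, which is before this subjob starts
Integer total = (Integer) globalMap.get("tHashOutput_1_NB_LINE");

// Maintain a running row counter of our own in globalMap
Integer current = (Integer) globalMap.get("currentRow");
current = (current == null) ? 1 : current + 1;
globalMap.put("currentRow", current);

// Fire every 10 rows, and also on the last row, so a trailing
// group (e.g. rows 111-115 of 115) is still processed
if (current % 10 == 0 || current.equals(total)) {
    // ... do your every-10-rows processing here
}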

long running queries and new data

I'm looking at a Postgres system with tables containing tens or hundreds of millions of rows, being fed at a rate of a few rows per second.
I need to do some processing on the rows of these tables, so I plan to run some simple SELECT queries: SELECT * with a WHERE clause based on a range (each row contains a timestamp; that's what I'll use for the ranges). It may be a "closed range", with a start and an end that I know are contained in the table, where I know no new data will fall into the range; or an "open range", i.e. one of the range boundaries might not be "in the table yet", so rows being fed into the table might still fall into that range.
Since the response will itself contain millions of rows, and the processing per row can take some time (tens of ms), I'm fully aware I'll use a cursor and fetch, say, a few thousand rows at a time. My question is:
If I run an "open range" query, will I only get the result as it was when I started the query, or will new rows that are inserted into the table and fall within the range show up while I run my fetches?
(I tend to think that no, I won't see new rows, but I'd like confirmation...)
updated
It should not happen under any isolation level:
https://www.postgresql.org/docs/current/static/transaction-iso.html
but Postgres guarantees it only in the Serializable isolation level.
Well, I think that when you run a query you create a new transaction, and it will not see data from any other transaction until it commits.
So basically, "you only get the result as it was when you started the query".
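To illustrate with a sketch (table and column names are placeholders): a non-holdable cursor's results are determined by the snapshot taken when the cursor is opened, so rows committed by other sessions afterwards won't show up in later fetches, even under the default Read Committed level:
BEGIN;

-- open range: no upper bound, new rows could still "fall into" it
DECLARE big_range CURSOR FOR
    SELECT * FROM readings
    WHERE ts >= '2024-01-01 00:00:00'
    ORDER BY ts;

-- repeat until it returns no rows; do the per-row processing between fetches
FETCH FORWARD 1000 FROM big_range;

-- rows inserted and committed by other sessions after the cursor
-- was opened will NOT appear in any of these fetches

CLOSE big_range;
COMMIT;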

Tableau Future and Current References

Tough problem I am working on here.
I have a table of CustomerIDs and CallDates. I want to measure whether there is a 'repeat call' within a certain period of time (up to 30 days).
I plan on creating a parameter called RepeatTime which is a range from 0 - 30 days, so the user can slide a scale to see the number/percentage of total repeats.
In Excel, I have this working. I sort CustomerID in order and then sort CallDate from earliest to latest. I then have formulas like:
=IF(AND(CurrentCustomerID = FutureCustomerID, FutureCallDate - CurrentCallDate <= RepeatTime), 1,0)
CurrentCustomerID = the current row, and the FutureCustomerID = the following row (so it is saying if the customer ID is the same).
FutureCallDate = the following row and the CurrentCallDate = the current row. It is subtracting the future call time from the first call time to measure the time in between.
The goal is to be able to see, dynamically, how many customers called in for a specific reason within, say, 4 hours or 1 day or 5 days, and so on, all the way up to 30 days (30 days is our actual metric, but it is good to see which calls are repeats within a shorter time frame so we can investigate).
I had a similar problem; see here for the detailed version: Array calculation in Tableau, maxif routine.
Your case is basically the same as mine, so you could apply that solution, but I find the one I'm about to give easier to understand. I would do the following:
1) Create a calculated field called RepeatTime:
DATEDIFF('day', LOOKUP(MAX(CallDates), -1), MAX(CallDates))
This will calculate how many days have passed between the previous call and the current one (the previous value, from LOOKUP(..., -1), goes first so the result is positive). You can wrap it in IFNULL so you don't get Null for the first entry.
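For instance (a sketch; the 0 just means "no previous call in the partition"):
IFNULL(DATEDIFF('day', LOOKUP(MAX(CallDates), -1), MAX(CallDates)), 0)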
2) Drag CustomersID, CallDates and RepeatTime to the worksheet (they can be on the Marks tab; they don't need to be on Rows or Columns).
3) Configure the table calculation of RepeatTime: Compute using Advanced..., partitioning by CustomersID, addressing by CallDates.
Also sort by the field CallDates, Maximum, Ascending.
This will guarantee the table calculation works properly.
4) Now you have a base you can use for whatever you need. You can either export it to csv or mdb and connect to that.
The best approach, actually, is to have this RepeatTime field calculated outside Tableau, in your database, so it's already there when you connect. But this is a way to have Tableau do the calculation for you.
Unfortunately there's no way to do this directly against your database from inside Tableau.

Batch update table with 2 million rows

Hi all, I've got an interesting task: updating a single column in a table that has roughly 2 million rows. I've tried doing this using MVC Entity Framework, but I'm encountering "Out of memory" exceptions, and I'm wondering if there's another way.
The interesting part is that it's not just a simple update. The procedure needs to read the TelephoneNumber column already in the table, which could be 014812001 for example. Then it needs to calculate a score for this number based on digit counts greater than 1. For example, using the above number, this would score 6, as we have 3 x 1's and 3 x 0's, giving a total of 6.
Once this score has been calculated, it needs to be inserted into a column in the current row being processed, so in our case the row with TelephoneNumber = 014812001.
Is this possible using T-SQL, or is it better to carry on with my Entity Framework approach?
For such a bulk update, I would always recommend doing the work on the server itself: there's really no point in dragging down 2 million rows, updating a single column, and then pushing them all back to the server again.
Based on your description, it should be fairly simple to create a little T-SQL user-defined function that calculates this score. Once you have that, you can issue a single T-SQL statement:
UPDATE dbo.YourTable
SET Score = dbo.fnCalculateScore(TelephoneNumber)
WHERE .... (whatever condition you might have) .....
That should be faster than your Entity Framework approach by several orders of magnitude.
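A minimal sketch of such a function, assuming the scoring rule is "sum the counts of every digit that occurs more than once" (so 014812001 scores three 0's + three 1's = 6); the name matches the statement above, but treat the body as illustrative:
CREATE FUNCTION dbo.fnCalculateScore (@TelephoneNumber varchar(20))
RETURNS int
AS
BEGIN
    DECLARE @digit int = 0, @count int, @score int = 0;

    WHILE @digit <= 9
    BEGIN
        -- Occurrences of this digit = how much shorter the string
        -- gets when every copy of the digit is stripped out
        SET @count = LEN(@TelephoneNumber)
                   - LEN(REPLACE(@TelephoneNumber, CAST(@digit AS char(1)), ''));

        -- Digits appearing more than once contribute their full count
        IF @count > 1
            SET @score = @score + @count;

        SET @digit = @digit + 1;
    END;

    RETURN @score;
END;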