Count of records in a StreamSets stage

I use StreamSets to ingest records from Oracle into Elasticsearch. I want to record in a MapR-DB destination the count of records that I process at each step of my Oracle query. How can I get the number of records at a certain StreamSets stage?

You can insert one of the scripting processors (for example the JavaScript Evaluator) to generate an event record, and connect the MapR-DB destination to the script processor's event lane:
// Create an event record of type "count" (version 1)
var eventRecord = sdcFunctions.createEvent('count', 1);
// records.length is the number of records in the current batch
eventRecord.value = {recordCount: records.length};
// Send the event record down the event lane
sdcFunctions.toEvent(eventRecord);
Note that records holds the current batch, so this emits one count event per batch rather than a running total.
Sample Pipeline

Related

How to create error based on Lookup value in Data Factory?

I have an Azure Data Factory pipeline.
After processing data, I would like to validate against an Azure SQL database to catch exceptions that were not caught by Data Factory. There are situations where no new rows were created because of errors in the system.
So I would create a Lookup activity with a SELECT COUNT statement to check whether a specific ID exists or not.
A value of 0 would mean that the required row was not created and an error should be raised in Data Factory.
How can I raise an error for Data Factory monitoring if the lookup value is 0?
You can run a Lookup activity against the SQL sink with a query that counts its rows, then use an If Condition activity to compare the result with 0 and branch to whatever action is needed (with "First row only" enabled on the Lookup, the count comes back under output.firstRow).
Query: SELECT COUNT(*) AS cnt FROM [dbo].[myStudents]
If Condition expression: @equals(activity('Lookup SQL').output.firstRow.cnt, 0)

How to get row count in file using Azure Lookup Activity

I am reading a data file and a RecordCount file that holds the count of records in the data file. I use a Lookup activity to get the count from the data file and compare it with the count in the RecordCount file. This approach works well when the count is less than 5000. When the data file has more than 5000 records, the Lookup considers only the first 5000, and my pipeline aborts because of the count mismatch.
e.g.:
Data file count: 7500
RecordCount file: 7500
Although the counts are equal, the Lookup will consider only 5000 records and report a mismatch.
How can I achieve this?
Add a Data Flow to your pipeline before the Lookup, with Source = ADLS Gen2 and Sink = ADLS Gen2. Add a Surrogate Key transformation and call the new column "mycounter". Add an Aggregate transformation with a new column "rowcount" using the formula max(mycounter). In the Sink, output just the "rowcount" column. You'll now have a new dataset that contains just the row count of any file, and you can consume it with a single-row Lookup activity directly after the Data Flow.

Pipeline not picking up all rows/records from source

The StreamSets pipeline is not picking up all of the records/rows from the source. Am I missing something obvious?
For example:
Source (Informix): 39,136 rows
StreamSets:
Input = 38,926 rows
Output = 38,928 rows

System.LimitException: Too many query rows: 50001 error in trigger

I have a SOQL query in an Apex trigger that fetches all the records of a test object.
The SOQL query fetches more than 50,000 records, so whenever I update records I hit this governor-limit error.
Please let me know how to solve this error.
List<test__c> ocrInformation = new List<test__c>();
Map<String, String> Opporgcode = new Map<String, String>();
// Facing the error here: this query retrieves every test__c record
ocrInformation = [SELECT Id, Team__c, Org__c FROM test__c];
for (test__c oct : ocrInformation) {
    Opporgcode.put(oct.Org__c, oct.Team__c);
}
It's a standard Salesforce governor limit: the total number of records retrieved by SOQL queries in a single transaction is capped at 50,000.
Do you really need to select all test__c records? If possible, reduce the amount of retrieved data with a WHERE or LIMIT clause. If not, you can try Batch Apex, where the 50k limit applies per batch execution.

Spring Batch: How to skip the current step based on a precondition

I have a Spring Batch step whose reader query is complex and joins several tables.
The job runs every day, looking for records that were added to table A, based on the last-updated date.
In the scenario where no records were added, the query still takes a long time to return results. I would like to check whether any records were added to table A, and only then run the full query.
Example: select count(recordID) from table A where last_update_date >
If count > 0, proceed with the step (reader, writer, etc.) joining the other tables.
If count = 0, skip the reader and writer, set the step status to COMPLETED, and proceed with the next step of the job.
Is this possible in Spring Batch? If yes, how can it be done?
Use a StoredProcedureItemReader.
Or a JobExecutionDecider that performs the fast count query and then routes the flow either to the processing step or to job termination.
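The decider route can be sketched as plain Java so it stands alone. This is a minimal illustration, not a full job: the class name `NewRecordsDecider` and the status names "PROCESS"/"SKIP" are my own choices, and in a real job the decision would live inside Spring Batch's `JobExecutionDecider#decide(JobExecution, StepExecution)` and return a `FlowExecutionStatus`, as noted in the comments.

```java
// Sketch of the decider logic: run a cheap COUNT query first, and only
// route the flow to the expensive join step when new records exist.
// In a real Spring Batch job this method body would sit inside
// JobExecutionDecider#decide(...) and return
// new FlowExecutionStatus("PROCESS") or new FlowExecutionStatus("SKIP").
public class NewRecordsDecider {

    // Returns the flow-status name used by the job's flow definition:
    // "PROCESS" when table A has new rows, "SKIP" otherwise.
    public static String decide(long newRecordCount) {
        return newRecordCount > 0 ? "PROCESS" : "SKIP";
    }

    public static void main(String[] args) {
        // Stand-in for the result of:
        //   select count(recordID) from table A where last_update_date > ...
        long count = 0; // pretend the count query found no new rows
        System.out.println(decide(count)); // -> SKIP: end the step COMPLETED
        System.out.println(decide(7500));  // -> PROCESS: run reader/writer
    }
}
```

In the job's Java config you would then map these statuses in the flow definition, roughly `.start(decider).on("PROCESS").to(processingStep)` and `.from(decider).on("SKIP").end()` using Spring Batch's flow builder, so the "SKIP" branch terminates the flow without ever running the expensive reader.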