How to create conditional stages in a DataStage project? - datastage

I have a requirement like this: if my ODBC Connector stage returns results, go to the Transformer stage; if it doesn't return any results, I plan to notify the customer by email. Could anyone please let me know how I can achieve this in one job?

You can use a job sequence and define triggers according to the result.

Related

Get the name of a pipeline and its activities

I am building a pipeline in ADF and I must save the name of the pipeline and the activities that are being used to a database. How can I save this information in the database?
You would get a better answer if you could be more specific about when/where you want to do that, i.e. the usage scenario. Without that, my best-guess answer is that you can use PowerShell to obtain that information.
Specifically, you can use the cmdlet Get-AzDataFactoryV2Pipeline, as specified here: https://learn.microsoft.com/en-us/powershell/module/az.datafactory/get-azdatafactoryv2pipeline?view=azps-5.8.0
You can use a Python script to parse these details and then load them into the database; this can all be done using Azure DevOps pipelines.
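If you would rather skip parsing cmdlet output, the same details are exposed by the Data Factory REST API (the Pipelines - Get endpoint on management.azure.com). Below is a rough, untested Scala sketch of that approach: the resource names are placeholders, the bearer token is assumed to be obtained elsewhere, and the JSON parsing and database insert are only hinted at because they depend on your target database.

import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object DumpPipelineDefinition {
  def main(args: Array[String]): Unit = {
    // Placeholders: substitute your own resource names and a valid management-plane bearer token.
    val subscriptionId = "<subscriptionId>"
    val resourceGroup  = "<resourceGroupName>"
    val factoryName    = "<factoryName>"
    val pipelineName   = "<pipelineName>"
    val token          = sys.env.getOrElse("AZURE_MGMT_TOKEN", "<bearer token>")

    val url = s"https://management.azure.com/subscriptions/$subscriptionId" +
      s"/resourceGroups/$resourceGroup/providers/Microsoft.DataFactory" +
      s"/factories/$factoryName/pipelines/$pipelineName?api-version=2018-06-01"

    val request = HttpRequest.newBuilder()
      .uri(URI.create(url))
      .header("Authorization", s"Bearer $token")
      .GET()
      .build()

    val response = HttpClient.newHttpClient()
      .send(request, HttpResponse.BodyHandlers.ofString())

    // The body is the pipeline definition as JSON: the pipeline name is at the top level
    // and the activities (name, type, and so on) sit under properties.activities.
    // Parse it with your JSON library of choice and write the rows to your database.
    println(response.body())
  }
}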

Azure Logic Apps error: The response is not in a JSON format

I'm trying to execute a simple step from Azure Logic Apps to get the pipeline run statistics; the pipeline in question calls Logic Apps from a Web activity.
However, I'm receiving the error "The response is not in a JSON format" and I don't understand what exactly the step expects as input here.
Could you please assist in resolving the above?
You should not use HTTP requests to pass in your run ID, because the run ID changes every time you run the pipeline.
You should use the Create a pipeline run action first; then you can pass the run ID from the output of that operation to the Get a pipeline run action.
You can refer to this question.
It seems that in your case some file identifier logic needs to be added:
You need to take the output body of the JSON file in the next block.

How to fail Azure Data Factory pipeline based on IF Task

I have a pipeline built in Azure Data Factory. It has:
a "Lookup" task with an SQL query that returns a column [CountRecs]. This column holds a value of 0 or more.
an "If" task to check the returned value. I want to fail the pipeline when the value of [CountRecs] > 0.
Is this possible?
You could probably achieve this by having a Web activity inside the True branch of your If Condition ([CountRecs] > 0). The Web activity should call the REST API below to cancel the pipeline run, using the pipeline run ID (you can get this value with the dynamic expression @pipeline().RunId).
Sample dynamic expression for the condition: @greater(activity('LookupTableRecordCount').output.firstRow.COUNTRECS, 0)
REST API to Cancel the Pipeline Run: POST https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.DataFactory/factories/{factoryName}/pipelineruns/{runId}/cancel?api-version=2018-06-01
MS doc related to the REST API: ADF Pipeline Runs - Cancel
One other possible way is to put an invalid URL in your Web activity, which will fail the Web activity; that in turn will fail the If Condition activity, which in turn will cause your pipeline to fail.
There is an existing feature request for this requirement in the ADF user voice forum, suggested by other ADF users. I would recommend that you up-vote and/or comment on this feedback, which will help increase the priority of implementing the feature.
ADF User voice feedback related to this requirement: https://feedback.azure.com/forums/270578-data-factory/suggestions/38143873-a-new-activity-for-cancelling-the-pipeline-executi
Hope this helps.
As a sort of hack solution, you can create a "Set variable" activity that incurs a division by zero if a certain condition is met. I don't like it, but it works.
@string(
    div(
        1,
        if(
            greater(int(variables('date_diff')), 100),
            0,
            1
        )
    )
)

How to find which activity called another activity in my ADF Pipeline

I have created a pipeline (LogPipeline) that logs other pipelines' status to a database. The idea is that every pipeline will call the LogPipeline at the start and at the end by passing pipeline name and pipeline ID along with other parameters like started/ended/failed.
The last parameter is "Reason" where I want to capture the error message of why a pipeline may have failed.
However, in a given pipeline there are multiple activities that could fail. So I want to direct any and all failed activities to my Execute Pipeline activity and pass the error message.
But on the Execute Pipeline activity, when filling out the parameters, I can only reference an activity by its name, e.g. Reason = @activity('Caller Activity').error.message.
However, since multiple activities are calling this Execute Pipeline, is there a way to say
Reason = @activity(activityThatCalledExecutePipeline).error.message?
If my understanding is correct, there are multiple activities calling the LogPipeline and you want to get the failed activities' names so that you know them inside LogPipeline. To my knowledge, your requirement is not supported in ADF.
I'm not sure why you have to construct such a complex scenario when you just want to log the specific failed activities and error messages, which is a common requirement. There are many monitoring options supported by ADF; please follow the links below:
1.https://learn.microsoft.com/en-us/azure/data-factory/monitor-using-azure-monitor#alerts
2.https://learn.microsoft.com/en-us/azure/data-factory/monitor-programmatically
I would suggest getting an idea of Alerts and Monitoring in the ADF portal, where you can set the Target Criteria for an alert.

Apache Beam Saving to BigQuery using Scio and explicitly specifying TriggeringFrequency

I'm using Spotify Scio to create a Scala Dataflow pipeline which is triggered by a Pub/Sub message. It reads from our private DB and then inserts information into BigQuery.
The problem is:
I need to delete the previous data
For this, I need to use write disposition WRITE_TRUNCATE
But, the job is automatically registered as streaming and thus I get the following error: WriteDisposition.WRITE_TRUNCATE is not supported for an unbounded PCollection
So I need to manually change the pipeline to be a Batch pipeline, specifying a triggering frequency.
So until now I had the following pipeline:
sc
  .customInput("Job Trigger", inputIO)
  .map(handleUserInformationRetrieval(dbOperationTimeout, projectName))
  .flatten
  .withGlobalWindow(options = windowOptions(windowingOutputTriggerDuration))
  .groupBy(_.ssoId)
  .map { case (ssoId, userDataCollection) => Schemas.toTableRow(ssoId, userDataCollection) }
  .filter(_.isSuccess)
  .map(_.get)
  .saveAsBigQuery(tableName, getSchema, WRITE_TRUNCATE, CREATE_NEVER)
I can't seem to find a way to specify a triggering frequency when I use the Scio API (saveAsBigQuery).
It's only present in the native Beam API:
BigQueryIO
  .write()
  .withTriggeringFrequency(Duration.standardDays(1)) // This is what I'm after
  .to(bqTableName)
  .withSchema(getSchema)
  .withCreateDisposition(CREATE_NEVER)
  .withWriteDisposition(WRITE_TRUNCATE)
If I use the BigQueryIO I'll have to use sc.pipeline.apply instead of my current pipeline.
Is there a way to somehow integrate the BigQueryIO to my current pipeline or somehow specify withTriggeringFrequency on the current pipeline?
Scio currently doesn't support specifying the method used for loading data into BigQuery, so STREAMING_INSERTS is automatically used for unbounded collections, which obviously can't support truncation. Therefore, you need to fall back to Beam's BigQueryIO, specifying a triggering frequency (withTriggeringFrequency(...)) and a method (withMethod(Method.FILE_LOADS)).
To integrate it in your Scio pipeline, you can simply use saveAsCustomOutput.
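Something along these lines should work: an untested sketch, where tableName and getSchema are the values from your existing pipeline, the one-day frequency and the single file shard are just illustrative, and FILE_LOADS on an unbounded input needs either a shard count or auto-sharding.

import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.{CreateDisposition, Method, WriteDisposition}
import org.joda.time.Duration

// ...same pipeline as above, up to and including .map(_.get), then:
  .saveAsCustomOutput(
    "Save to BigQuery", // transform name shown in the Dataflow UI
    BigQueryIO
      .writeTableRows() // the collection holds the TableRows produced by Schemas.toTableRow
      .to(tableName)
      .withSchema(getSchema)
      .withCreateDisposition(CreateDisposition.CREATE_NEVER)
      .withWriteDisposition(WriteDisposition.WRITE_TRUNCATE)
      .withMethod(Method.FILE_LOADS) // batch file loads instead of streaming inserts
      .withTriggeringFrequency(Duration.standardDays(1))
      .withNumFileShards(1) // needed when combining FILE_LOADS with a triggering frequency
  )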
An example can also be found here: https://spotify.github.io/scio/io/Type-Safe-BigQuery#using-type-safe-bigquery-directly-with-beams-io-library