I am trying to execute an unload command on Redshift via Data Pipeline. The script looks something like:
unload ($$
    SELECT *, count(*)
    FROM (SELECT APP_ID, CAST(record_date AS DATE)
          WHERE len(APP_ID) > 0
          AND CAST(record_date AS DATE) = $1)
    GROUP BY APP_ID
$$)
to 's3://test/unload/'
iam_role 'arn:aws:iam::xxxxxxxxxxx:role/Test'
delimiter ',' addquotes;
The pipeline looks something like this:
{
    "objects": [
        {
            "role": "DataPipelineDefaultRole",
            "subject": "SuccessNotification",
            "name": "SNS",
            "id": "ActionId_xxxxx",
            "message": "SUCCESS: #{format(minusDays(node.@scheduledStartTime,1),'MM-dd-YYYY')}",
            "type": "SnsAlarm",
            "topicArn": "arn:aws:sns:us-west-2:xxxxxxxxxx:notification"
        },
        {
            "connectionString": "connection-url",
            "password": "password",
            "name": "Test",
            "id": "DatabaseId_xxxxx",
            "type": "RedshiftDatabase",
            "username": "username"
        },
        {
            "subnetId": "subnet-xxxxxx",
            "resourceRole": "DataPipelineDefaultResourceRole",
            "role": "DataPipelineDefaultRole",
            "name": "EC2",
            "id": "ResourceId_xxxxx",
            "type": "Ec2Resource"
        },
        {
            "failureAndRerunMode": "CASCADE",
            "resourceRole": "DataPipelineDefaultResourceRole",
            "role": "DataPipelineDefaultRole",
            "pipelineLogUri": "s3://test/logs/",
            "scheduleType": "ONDEMAND",
            "name": "Default",
            "id": "Default"
        },
        {
            "database": {
                "ref": "DatabaseId_xxxxxx"
            },
            "scriptUri": "s3://test/script.sql",
            "name": "SqlActivity",
            "scriptArgument": "#{format(minusDays(node.@scheduledStartTime,1),'MM-dd-YYYY')}",
            "id": "SqlActivityId_xxxxx",
            "runsOn": {
                "ref": "ResourceId_xxxx"
            },
            "type": "SqlActivity",
            "onSuccess": {
                "ref": "ActionId_xxxxx"
            }
        }
    ],
    "parameters": []
}
However, I keep getting the error: "The column index is out of range: 1, number of columns: 0."
I just can't get it to work. I have tried using ?, $1, and I even tried putting the expression #{format(minusDays(node.@scheduledStartTime,1),'MM-dd-YYYY')} directly in the script. None of them work.
I have looked at the answers to "Amazon Data Pipeline: How to use a script argument in a SqlActivity?" but none of them are helpful.
Does anyone have any idea how to use a script argument in a SQL script in AWS Data Pipeline?
I have the following output from a web activity:
{
"value": [
{
"id": "/subscriptions/xy_csv",
"name": "xy_csv",
"type": "Microsoft.code",
"etag": "6200",
"properties": {
"folder": {
"name": "samplecodes"
},
"content": {
"query": "select * from table 1",
"metadata": {
"language": "sql"
},
"currentConnection": {
"databaseName": "demo",
"poolName": "Built-in"
},
"resultLimit": 5000
},
"type": "SqlQuery"
}
},
{
"id": "/subscriptions/ab_csv",
"name": "ab_csv",
"type": "Microsoft.code",
"etag": "6200",
"properties": {
"folder": {
"name": "livecode"
},
"content": {
"query": "select * from table 2",
"metadata": {
"language": "sql"
},
"currentConnection": {
"databaseName": "demo",
"poolName": "Built-in"
},
"resultLimit": 5000
},
"type": "SqlQuery"
}
}
]
}
I would like to create a filter activity after the web activity to filter for the items that are saved under the folder name "livecode".
In the filter activity's Items field I have:
@activity('Web1').output.value
In the Condition field I have:
@startswith(item().properties.folder.name,'livecode')
The web activity succeeds, but the filter activity fails with this error:
{
"errorCode": "InvalidTemplate",
"message": "The execution of template action 'FilterFilter1' failed: The evaluation of 'query' action 'where' expression '#startswith(item().properties.folder.name,'sql')' failed: 'The expression 'startswith(item().properties.folder.name,'sql')' cannot be evaluated because property 'folder' doesn't exist, available properties are 'content, type'.",
"failureType": "UserError",
"target": "Filter1",
"details": ""
}
It feels like I am going wrong in how I have written the condition's dynamic content to navigate to properties.folder.name, but I am not sure what is missing in my condition. Can anyone help? Thanks, much appreciated.
The error occurs because the web activity output's properties object might not contain the folder key for every item.
I took the following JSON and got the same error:
{
"value":[
{
"id":"/subscriptions/xy_csv",
"name":"xy_csv",
"type":"Microsoft.code",
"etag":"6200",
"properties":{
"content":{
"query":"select * from table 1",
"metadata":{
"language":"sql"
},
"currentConnection":{
"databaseName":"demo",
"poolName":"Built-in"
},
"resultLimit":5000
},
"type":"SqlQuery"
}
},
{
"id":"/subscriptions/ab_csv",
"name":"ab_csv",
"type":"Microsoft.code",
"etag":"6200",
"properties":{
"folder":{
"name":"livecode"
},
"content":{
"query":"select * from table 2",
"metadata":{
"language":"sql"
},
"currentConnection":{
"databaseName":"demo",
"poolName":"Built-in"
},
"resultLimit":5000
},
"type":"SqlQuery"
}
}
]
}
So, you have to modify the filter condition to check whether each item's properties object contains a folder key, using the following dynamic content. (I took your web activity output as a parameter value and extracted the folder key from the properties object.)
@startswith(if(contains(item().properties,'folder'),item().properties.folder.name,''),'livecode')
When I debug the pipeline, we get the desired result: only the item whose folder name is "livecode" passes through the filter.
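For reference, here is a minimal sketch of how the full Filter activity definition might look in the pipeline JSON, assuming the web activity is named Web1 and the filter Filter1 (names are illustrative):
{
    "name": "Filter1",
    "type": "Filter",
    "dependsOn": [
        {
            "activity": "Web1",
            "dependencyConditions": ["Succeeded"]
        }
    ],
    "typeProperties": {
        "items": {
            "value": "@activity('Web1').output.value",
            "type": "Expression"
        },
        "condition": {
            "value": "@startswith(if(contains(item().properties,'folder'),item().properties.folder.name,''),'livecode')",
            "type": "Expression"
        }
    }
}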
As a general question, is it possible to do a substring operation within Step Functions?
I receive the following event:
{
"input": {
"version": "0",
"id": "d9c5fec0-d08d-6abd-4ea5-0107fbbce47d",
"detail-type": "EBS Multi-Volume Snapshots Completion Status",
"source": "aws.ec2",
"account": "12345678",
"time": "2021-11-12T12:08:16Z",
"region": "us-east-1",
"resources": [
"arn:aws:ec2::us-east-1:snapshot/snap-0a98c2a42ee266123"
]
}
}
but I need the snapshot ID as input to DescribeInstances, so I need to extract snap-0a98c2a42ee266123 from arn:aws:ec2::us-east-1:snapshot/snap-0a98c2a42ee266123.
Is there any simple way to do this within Step Functions? That is to say, without having to pass it to a Lambda function or something equally convoluted?
This has recently become possible with the addition of new intrinsic functions. ArrayGetItem gets an item by its index. StringSplit splits a string at a delimiter. Use a Pass state to extract the snapshot name from the resource ARN:
{
"StartAt": "ExtractSnapshotName",
"States": {
"ExtractSnapshotName": {
"Type": "Pass",
"Parameters": {
"input.$": "$.input",
"snapshotName.$": "States.ArrayGetItem(States.StringSplit(States.ArrayGetItem($.input.resources,0), '/'),1)"
},
"ResultPath": "$",
"End": true
}
}
}
output:
{
"input": {
"version": "0",
"id": "d9c5fec0-d08d-6abd-4ea5-0107fbbce47d",
"detail-type": "EBS Multi-Volume Snapshots Completion Status",
"source": "aws.ec2",
"account": "12345678",
"time": "2021-11-12T12:08:16Z",
"region": "us-east-1",
"resources": [
"arn:aws:ec2::us-east-1:snapshot/snap-0a98c2a42ee266123"
]
},
"snapshotName": "snap-0a98c2a42ee266123"
}
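To unpack the nested intrinsics: States.StringSplit splits the ARN at each '/', which for the event above yields a two-element array, and States.ArrayGetItem(..., 1) then selects the element at index 1:
[
    "arn:aws:ec2::us-east-1:snapshot",
    "snap-0a98c2a42ee266123"
]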
Hi all, I'm trying to put together a script to create multiple partitions within a tabular data model. I can create one partition at a time, but multiple partitions error with the following message:
Unrecognized JSON property: partitions. Check path 'create.partitions'
I'm using the following (anonymised) generated script.
{
"create": {
"parentObject": {
"database": "MY_TABULAR",
"table": "MY_TABLE"
},
"partitions": [{
"name": "MY_TABLE 12 2018-09",
"source": {
"query": "SELECT * FROM [Fact].[MY_TABLE] WHERE PlanKey = 12 AND dateKey BETWEEN 20180901 AND 20180930",
"dataSource": "MY_DW"
}
},
{
"name": "MY_TABLE 12 2018-10",
"source": {
"query": "SELECT * FROM [Fact].[MY_TABLE] WHERE PlanKey = 12 AND dateKey BETWEEN 20181001 AND 20181031",
"dataSource": "MY_DW"
}
},
{
"name": "MY_TABLE 12 2018-11",
"source": {
"query": "SELECT * FROM [Fact].[MY_TABLE] WHERE PlanKey = 12 AND dateKey BETWEEN 20181101 AND 20181130",
"dataSource": "MY_DW"
}
}]
}
}
As far as I can tell from looking at the references, this is correct, but SSMS doesn't appear to like it.
You can do this by using the Sequence command to execute multiple CreateOrReplace commands that will create the partitions. The Sequence command does have an optional maxParallelism property; however, only refresh operations run in parallel (per MSDN). The example below details this further.
{
"sequence":
{
"operations": [
{
"createOrReplace": {
"object": {
"database": "YourTabularDatabase",
"table": "YourTable",
"partition": "Partition 1"
},
"partition": {
"name": "Partition 1",
"dataView": "full",
"source": {
"query": "SELECT * FROM [dbo].[SourceTable] where DateKey < 20180901",
"dataSource": "YourDataSource"
}
}
}
},
{
"createOrReplace": {
"object": {
"database": "YourTabularDatabase",
"table": "YourTable",
"partition": "Partition 2"
},
"partition": {
"name": "Partition 2,
"source": {
"query": "SELECT * FROM [dbo].[SourceTable] where DateKey >= 20180901",
"dataSource": "YourDataSource"
}
}
}
}]
}
}
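As a side note, here is a sketch of where the optional maxParallelism property would sit. Since only refresh operations honor it, the example uses a refresh command (database and table names are placeholders):
{
    "sequence": {
        "maxParallelism": 2,
        "operations": [
            {
                "refresh": {
                    "type": "full",
                    "objects": [
                        {
                            "database": "YourTabularDatabase",
                            "table": "YourTable"
                        }
                    ]
                }
            }
        ]
    }
}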
I am using Swagger to document my Jersey-based REST API. (The Swagger UI I am using was downloaded in June 2014; I don't know if this issue has been fixed in later versions, but I made a lot of customizations to its code, so I don't have the option of downloading the latest version without investing a lot of time customizing it again.)
Until now, all my transfer objects have had one level of properties (no embedded POJOs). But now that I have added some REST paths that return more complex objects (two levels deep), I have found that Swagger UI does not expand the JSON model schema when there are embedded objects.
Here is the important part of the swagger doc:
...
{
"path": "/user/combo",
"operations": [{
"method": "POST",
"summary": "Inserts a combo (user, address)",
"notes": "Will insert a new user and a address definition in a single step",
"type": "UserAndAddressWithIdSwaggerDto",
"nickname": "insertCombo",
"consumes": ["application/json"],
"parameters": [{
"name": "body",
"description": "New user and address combo",
"required": true,
"type": "UserAndAddressWithIdSwaggerDto",
"paramType": "body",
"allowMultiple": false
}],
"responseMessages": [{
"code": 200,
"message": "OK",
"responseModel": "UserAndAddressWithIdSwaggerDto"
}]
}]
}
...
"models": {
"UserAndAddressWithIdSwaggerDto": {
"id": "UserAndAddressWithIdSwaggerDto",
"description": "",
"required": ["user",
"address"],
"properties": {
"user": {
"$ref": "UserDto",
"description": "User"
},
"address": {
"$ref": "AddressDto",
"description": "Address"
}
}
},
"UserDto": {
"id": "UserDto",
"properties": {
"userId": {
"type": "integer",
"format": "int64"
},
"name": {
"type": "string"
},...
},
"AddressDto": {
"id": "AddressDto",
"properties": {
"addressId": {
"type": "integer",
"format": "int64"
},
"street": {
"type": "string"
},...
}
}
...
The embedded objects are User and Address; their models are being created correctly, as shown in the JSON response.
But when opening Swagger UI, I can only see:
{
"user": "UserDto",
"address": "AddressDto"
}
But I should see something like:
{
"user": {
"userId": "integer",
"name": "string",...
},
"address": {
"addressId": "integer",
"street": "string",...
}
}
Something may be wrong in the code that expands the internal properties; the JavaScript console doesn't show any errors, so I assume this is a bug.
I found the solution: there is a line of code that needs to be modified to make it work properly.
In the swagger.js file there is a getSampleValue function with a conditional checking for undefined:
SwaggerModelProperty.prototype.getSampleValue = function(modelsToIgnore) {
var result;
if ((this.refModel != null) && (modelsToIgnore[this.refModel.name] === 'undefined'))
...
I updated the equality check to (removing quotes):
modelsToIgnore[this.refModel.name] === undefined
After that, Swagger UI is able to show the embedded models.
I have a working REST request that returns a large results collection (trimmed here).
The original URL is:
http://intranet.domain.com/_api/SP.UserProfiles.PeopleManager/GetPropertiesFor(accountName=@v)?@v='domain\kens'&$select=AccountName,DisplayName,Email,Title,UserProfileProperties
The response is:
{
"d": {
"__metadata": {
"id": "stuff",
"uri": "morestuff",
"type": "SP.UserProfiles.PersonProperties"
},
"AccountName": "domain\\KenS",
"DisplayName": "Ken Sanchez",
"Email": "KenS#domain.com",
"Title": "Research Assistant",
"UserProfileProperties": {
"results": [
{
"__metadata": {
"type": "SP.KeyValue"
},
"Key": "UserProfile_GUID",
"Value": "1c419284-604e-41a8-906f-ac34fd4068ab",
"ValueType": "Edm.String"
},
{
"__metadata": {
"type": "SP.KeyValue"
},
"Key": "SID",
"Value": "S-1-5-21-2740942301-4273591597-3258045437-1132",
"ValueType": "Edm.String"
},
{
"__metadata": {
"type": "SP.KeyValue"
},
"Key": "ADGuid",
"Value": "",
"ValueType": "Edm.String"
},
{
"__metadata": {
"type": "SP.KeyValue"
},
"Key": "AccountName",
"Value": "domain\\KenS",
"ValueType": "Edm.String"
}...
Is it possible to change the REST request with a $filter that only returns the key/value pairs from the results collection where Key = 'SID' or Key equals a few other values?
I only need about three values from the results collection, selected by name.
In OData, you can't filter an inner feed.
Instead you could try to query the entity set that UserProfileProperties comes from and expand the associated SP.UserProfiles.PersonProperties entity.
The syntax will need to be adjusted for your scenario, but I'm thinking something along these lines:
service.svc/UserProfileProperties?$filter=Key eq 'SID' and RelatedPersonProperties/AccountName eq 'domain\kens'&$expand=RelatedPersonProperties
That assumes you have a top-level entity set of UserProfileProperties and each is tied back to a single SP.UserProfiles.PersonProperties entity via a navigation property called (in my example) RelatedPersonProperties.
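If you need several keys, the same pattern should extend with additional or clauses, again assuming the hypothetical UserProfileProperties entity set and RelatedPersonProperties navigation property:
service.svc/UserProfileProperties?$filter=(Key eq 'SID' or Key eq 'AccountName' or Key eq 'UserProfile_GUID') and RelatedPersonProperties/AccountName eq 'domain\kens'&$expand=RelatedPersonProperties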