We are processing data (records) through a Kinesis stream that feeds Kinesis Firehose, which then writes the data to a file in our S3 bucket.
Currently, however, all the records end up on the same line in the output file; we want each record separated so that it sits on its own line.
We want something like:
Store1, 100, Broccoli
Store1, 101, Avocado
Store1, 102, Apple
Instead, it currently looks like:
Store1, 100, BroccoliStore1, 101, AvocadoStore1, 102, Apple
Here is our CloudFormation template:
Resources:
  MyBucket:
    Type: AWS::S3::Bucket
  MyStream:
    Type: AWS::Kinesis::Stream
    Properties:
      Name: my-stream
      RetentionPeriodHours: 24
      ShardCount: 5
  MyFirehose:
    Type: AWS::KinesisFirehose::DeliveryStream
    Properties:
      DeliveryStreamName: my-firehose
      DeliveryStreamType: KinesisStreamAsSource
      KinesisStreamSourceConfiguration:
        KinesisStreamARN:
          Fn::Sub: "${MyStream.Arn}"
        RoleARN:
          Fn::Sub: "${MyRole.Arn}"
      S3DestinationConfiguration:
        BufferingHints:
          IntervalInSeconds: 60
          SizeInMBs: 50
        CompressionFormat: UNCOMPRESSED
        Prefix: concessions/
        BucketARN:
          Fn::Sub: "${MyBucket.Arn}"
        RoleARN:
          Fn::Sub: "${MyRole.Arn}"
How can we add line separators so that the records show up on their own lines?
Whoever is feeding your Kinesis stream should append '\n' to the end of each record; Firehose concatenates records as-is and does not add a delimiter itself.
See Java example below:
// firehoseClient is an AmazonKinesisFirehose client from the AWS SDK for Java
PutRecordRequest putRecordRequest = new PutRecordRequest();
putRecordRequest.setDeliveryStreamName("incoming-stream");
String data = "some data" + "\n"; // add \n as a record separator
Record record = new Record();
record.setData(ByteBuffer.wrap(data.getBytes(StandardCharsets.UTF_8)));
putRecordRequest.setRecord(record);
firehoseClient.putRecord(putRecordRequest);
See source.
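If the producer side is Python, a minimal boto3 sketch of the same idea (writing to the my-stream Kinesis stream from the template above; the payload and partition key are just placeholders) would be:

import boto3

# Append '\n' to every record before it enters the pipeline; Firehose then
# writes one record per line to S3. The stream name matches the template above.
kinesis = boto3.client("kinesis")

data = "Store1, 100, Broccoli" + "\n"  # '\n' acts as the record separator
kinesis.put_record(
    StreamName="my-stream",
    Data=data.encode("utf-8"),
    PartitionKey="Store1",  # placeholder partition key
)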
I've got a YAML file that contains information from a Power BI file. We need to add some of the metadata, like tags (we're talking about tens of dozens, if not hundreds).
What I thought is to extract this into Excel, write the tags there, and then inject them back into the YAML.
Is there any way to do that automatically? (A rough sketch of the extract step I'm imagining follows the sample below.)
Here's a sample:
processTime: 08-06-22
report:
  reportId: 34lkn34l5k
  reportVersion: '1'
  reportDeveloper: dev1
  reportDomain: ''
  linkedDataset: dataset
  filters:
    - filterId: Filtera1
      table: DIM_E
      column: CLIENT
      type: Categorical
      metadata:
        label: Company
        tags: []
    - filterId: Filter2
      table: DIM_E
      column: DEPARTMENT
      type: Categorical
      metadata:
        label: Department
        tags: []
  pages:
    - pageId: ReportSection
      displayName: test
      visuals:
        - visualId: 9dc8
          visualType: donutChart
          measures: Headcount
          columns:
            - EMPLO
          title: Headcount
          metadata:
            label: ''
            tags: []
          domain: diversity
          hierarchy: []
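For illustration, this is roughly the extract half of what I'm imagining (a rough sketch only; the file names, CSV layout, and key paths are assumptions based on the sample above):

import csv
from ruamel.yaml import YAML

yaml = YAML()

# Extract: dump each filter's id, label and tags to a CSV that can be edited in Excel.
with open("report.yaml") as fp:                 # hypothetical input file
    data = yaml.load(fp)

with open("tags.csv", "w", newline="") as out:  # hypothetical output file
    writer = csv.writer(out)
    writer.writerow(["filterId", "label", "tags"])
    for flt in data["report"]["filters"]:
        md = flt["metadata"]
        writer.writerow([flt["filterId"], md["label"], ";".join(md["tags"])])

The inject step would then read the edited CSV back, set md["tags"] = row["tags"].split(";") for the matching filterId, and dump the data again with yaml.dump().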
I've been using ruamel.yaml to edit my YAML files and dump them back. I need help understanding how to keep the same structure the original file has, because all I do is load it, edit it, and write it again.
For example, this is the original file:
ElasticLoadBalancingV2Listener:
    Type: "AWS::ElasticLoadBalancingV2::Listener"
    Properties:
        LoadBalancerArn: !Ref ElasticLoadBalancingV2LoadBalancer
        Port: !FindInMap [NLBPorts, Port1, Port]
        Protocol: "TCP"
        DefaultActions:
          -
            Order: 1
            TargetGroupArn: !Ref ElasticLoadBalancingV2TargetGroup1
            Type: "forward"
The new file doesn't look the same:
ElasticLoadBalancingV2Listener:
  Type: "AWS::ElasticLoadBalancingV2::Listener"
  Properties:
    LoadBalancerArn: !Ref ElasticLoadBalancingV2LoadBalancer
    Port: !FindInMap [NLBPorts, Port1, Port]
    Protocol: "TCP"
    DefaultActions:
    - Order: 1
      TargetGroupArn: !Ref ElasticLoadBalancingV2TargetGroup1
      Type: "forward"
The biggest issue is that I've tried all sorts of tricks that ruamel.yaml has to fix this, but each time a different part of the YAML breaks.
This is my function:
def editEndpointServiceTemplate(endpoint_service_template_path):
    yaml = YAML()
    yaml.preserve_quotes = True
    # yaml.compact(seq_seq=False, seq_map=False)
    # yaml.indent(mapping=4, sequence=3, offset=0)
    # Load yaml file
    with open(endpoint_service_template_path) as fp:
        data = yaml.load(fp)
    # Edit the yaml
    data['Description'] = "CloudFormation"
    # Write new yaml file
    with open(endpoint_service_template_path, 'w') as fp:
        yaml.dump(data, fp)
As you can see with the commented commands, I tinkered around with the settings but couldn't find the sweet spot.
In this case it is obvious that your function did not produce the output you present
(different indent, missing word "CloudFormation"), but in general you should take care
that in questions these match, and that your program is complete, so that results can
more easily be reproduced.
ruamel.yaml does not have built-in functions for all kinds of seldom-seen formats, but yours
is relatively close to the output you get with .indent(mapping=4, sequence=4, offset=2) plus a
transform that checks the output line by line.
As it is unlikely that within string scalars you have a sequence indicator followed by three spaces ("-   ")
that additionally happens to be wrapped so it is the first non-space on a line, you'd better use
.indent(mapping=4, sequence=4, offset=0) and transform that:
import sys
import ruamel.yaml

yaml_str = """\
ElasticLoadBalancingV2Listener:
    Type: "AWS::ElasticLoadBalancingV2::Listener"
    Properties:
        LoadBalancerArn: !Ref ElasticLoadBalancingV2LoadBalancer
        Port: !FindInMap [NLBPorts, Port1, Port]
        Protocol: "TCP"
        DefaultActions:
          -
            Order: 1
            TargetGroupArn: !Ref ElasticLoadBalancingV2TargetGroup1
            Type: "forward"
"""

yaml = ruamel.yaml.YAML()
yaml.indent(mapping=4, sequence=4, offset=0)
yaml.preserve_quotes = True
data = yaml.load(yaml_str)
data['Description'] = "CloudFormation"

def break_seq(s):
    # put the dash of each block sequence entry on its own line
    result = []
    PAT = '-   '
    for line in s.splitlines():
        ls_line = line.lstrip()
        if ls_line.startswith(PAT):
            line = line.replace(PAT, '-\n' + ' ' * (line.index(PAT) + 4), 1)
        result.append(line)
    return '\n'.join(result)

yaml.dump(data, sys.stdout, transform=break_seq)
which gives:
ElasticLoadBalancingV2Listener:
    Type: "AWS::ElasticLoadBalancingV2::Listener"
    Properties:
        LoadBalancerArn: !Ref ElasticLoadBalancingV2LoadBalancer
        Port: !FindInMap [NLBPorts, Port1, Port]
        Protocol: "TCP"
        DefaultActions:
        -
            Order: 1
            TargetGroupArn: !Ref ElasticLoadBalancingV2TargetGroup1
            Type: "forward"
Description: CloudFormation
The above can also be done by "hacking" the routines that serialize sequences, but
it is often easier to just transform the output, although that is not as efficient in
time/space usage.
Unless the program consuming this output uses an incomplete YAML parser, the
actual data structure loaded will not change; it is just less readable for people
unused to such uncommon formatting.
I read the docs on how to Run an Amazon ECS Task When a File is Uploaded to an Amazon S3 Bucket. However, this document stops short of explaining how to get the bucket/key values from the triggering event from within the Fargate task code itself. How can that be done?
I'm not sure if you still need an answer for this one, but I did something similar to what Steven1978 mentioned, only using CloudFormation.
The config you're looking for is the InputTransformer. Check this example of a YAML CloudFormation template for an Event Rule:
rEventRuleForFileUpload:
  Type: AWS::Events::Rule
  Properties:
    Description: "EventRule"
    State: "ENABLED"
    EventPattern:
      source:
        - "aws.s3"
      detail-type:
        - 'AWS API Call via CloudTrail'
      detail:
        eventSource:
          - s3.amazonaws.com
        eventName:
          - "PutObject"
          - "CompleteMultipartUpload"
        requestParameters:
          bucketName: "{YOUR_BUCKET_NAME}"
    Targets:
      - Id: '{YOUR_ECS_CLUSTER_ID}'
        Arn: !Sub "arn:aws:ecs:${AWS::Region}:${AWS::AccountId}:cluster/${NAME_OF_YOUR_CLUSTER_RESOURCE}"
        RoleArn: !GetAtt {YOUR_ROLE}.Arn
        EcsParameters:
          TaskCount: 1
          TaskDefinitionArn: !Ref {YOUR_TASK_DEFINITION}
          LaunchType: FARGATE
          {... WHATEVER CONFIG YOU MIGHT HAVE...}
        InputTransformer:
          InputPathsMap:
            s3_bucket: "$.detail.requestParameters.bucketName"
            s3_key: "$.detail.requestParameters.key"
          InputTemplate: '{ "containerOverrides": [ { "name": "{THE_NAME_OF_YOUR_CONTAINER_DEFINITION}", "environment": [ { "name": "EVENT_BUCKET", "value": <s3_bucket> }, { "name": "EVENT_OBJECT_KEY", "value": <s3_key> }] } ] }'
With this approach, you'll be able to get the s3 bucket name (EVENT_BUCKET) and the s3 object key (EVENT_OBJECT_KEY) as environment variables inside your container.
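Inside the task, the values then just come from the environment. A minimal sketch of what the Fargate task code might do with them (assuming the variable names from the InputTemplate above; the S3 read is only illustrative):

import os
import boto3

# Set via the InputTransformer's containerOverrides in the event rule above.
bucket = os.environ["EVENT_BUCKET"]
key = os.environ["EVENT_OBJECT_KEY"]

# Example only: fetch the object that triggered the task.
s3 = boto3.client("s3")
obj = s3.get_object(Bucket=bucket, Key=key)
print(f"Triggered by s3://{bucket}/{key} ({obj['ContentLength']} bytes)")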
The info isn't very clear, indeed, but here are some sources I used to finally get it working:
Container Override:
https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_ContainerOverride.html
InputTransformer:
https://docs.aws.amazon.com/eventbridge/latest/APIReference/API_InputTransformer.html#API_InputTransformer_Contents
I am having issues deploying a Lambda with a handler in a nested directory using SAM.
I perform the following steps:
package:
sam package --template template.yaml --output-template-file packaged.yaml --s3-bucket
Creates a packaged.yaml that I use in the next step.
deploy:
aws cloudformation deploy --template-file /Users/localuser/Do/learn-sam/dynamo-stream-lambda/packaged.yaml --stack-name barkingstack
ERROR
Failed to create the changeset: Waiter ChangeSetCreateComplete failed: Waiter encountered a terminal failure state Status: FAILED. Reason: Transform AWS::Serverless-2016-10-31 failed with: Invalid Serverless Application Specification document. Number of errors found: 1. Resource with id [PublishNewBark] is invalid. Missing required property 'Handler'.
CloudFormation/SAM Template
AWSTemplateFormatVersion: '2010-09-09'
Transform: 'AWS::Serverless-2016-10-31'
Globals:
  Function:
    Runtime: nodejs8.10
    Timeout: 300
Resources:
  PublishNewBark:
    Type: AWS::Serverless::Function
    FunctionName: publishNewBark
    CodeUri: .
    Handler: src/index.handler
    Role: "<ROLE_ARN>"
    Description: Reads from the DynamoDB Stream and publishes to an SNS topic
    Events:
      - ReceiveBark:
          Type: DynamoDB
          Stream: !GetAtt BarkTable.StreamArn
          StartingPosition: TRIM_HORIZON
          BatchSize: 1
  BarkTable:
    Type: AWS::DynamoDB::Table
    Properties:
      TableName: BarkTable
      KeySchema:
        - KeyType: HASH
          AttributeName: id
      AttributeDefinitions:
        - AttributeName: id
          AttributeType: S
      StreamSpecification:
        StreamViewType: NEW_AND_OLD_IMAGES
      ProvisionedThroughput:
        WriteCapacityUnits: 5
        ReadCapacityUnits: 5
  WooferTopic:
    Type: AWS::SNS::Topic
    Properties:
      DisplayName: wooferTopic
      TopicName: wooferTopic
      Subscription:
        - Endpoint: <my_email>
          Protocol: email
DIRECTORY STRUCTURE
root_directory/
events/ (for sample events)
policies/ (for IAM Role to be created for the lambda using CLI)
src/index.js
package.json
node_modules
template.yaml
HANDLER CODE
async function handler (event, context) {
  console.log(JSON.stringify(event, null, 2))
  return {}
}

module.exports = {handler}
I believe you have to put everything except the resource type under "Properties".
Your function declaration should be:
PublishNewBark:
  Type: AWS::Serverless::Function
  Properties:
    FunctionName: publishNewBark
    CodeUri: .
    Handler: src/index.handler
    Role: "<ROLE_ARN>"
    Description: Reads from the DynamoDB Stream and publishes to an SNS topic
    Events:
      - ReceiveBark:
          Type: DynamoDB
          Stream: !GetAtt BarkTable.StreamArn
          StartingPosition: TRIM_HORIZON
          BatchSize: 1
I'm attempting to export a DynamoDB StreamArn from a stack created in CloudFormation, then reference the export using !ImportValue in serverless.yml.
But I'm getting this error message:
unknown tag !<!ImportValue> in "/codebuild/output/src/serverless.yml"
The CloudFormation template and serverless.yml are defined below. Any help is appreciated.
StackA.yml
AWSTemplateFormatVersion: 2010-09-09
Description: Resources for the registration site
Resources:
  ClientTable:
    Type: AWS::DynamoDB::Table
    DeletionPolicy: Retain
    Properties:
      TableName: client
      AttributeDefinitions:
        - AttributeName: id
          AttributeType: S
      KeySchema:
        - AttributeName: id
          KeyType: HASH
      ProvisionedThroughput:
        ReadCapacityUnits: 2
        WriteCapacityUnits: 2
      StreamSpecification:
        StreamViewType: NEW_AND_OLD_IMAGES
Outputs:
  ClientTableStreamArn:
    Description: The ARN for My ClientTable Stream
    Value: !GetAtt ClientTable.StreamArn
    Export:
      Name: my-client-table-stream-arn
serverless.yml
service: my-service
frameworkVersion: ">=1.1.0 <2.0.0"
provider:
  name: aws
  runtime: nodejs6.10
  iamRoleStatements:
    - Effect: Allow
      Action:
        - dynamodb:DescribeStream
        - dynamodb:GetRecords
        - dynamodb:GetShardIterator
        - dynamodb:ListStreams
        - dynamodb:GetItem
        - dynamodb:PutItem
      Resource: arn:aws:dynamodb:*:*:table/client
functions:
  foo:
    handler: foo.main
    events:
      - stream:
          type: dynamodb
          arn: !ImportValue my-client-table-stream-arn
          batchSize: 1
Solved by using ${cf:stackName.outputKey}, the Serverless Framework variable syntax that reads another CloudFormation stack's output value directly (for the stack above that would be something like ${cf:StackA.ClientTableStreamArn}).
I struggled with this as well, and what did the trick for me was:
functions:
  foo:
    handler: foo.main
    events:
      - stream:
          type: dynamodb
          arn:
            !ImportValue my-client-table-stream-arn
          batchSize: 1
Note that the intrinsic function !ImportValue is on a new line and indented; otherwise the whole event is ignored when cloudformation-template-update-stack.json is generated.
It appears that you're using the !ImportValue shorthand for CloudFormation YAML. My understanding is that when CloudFormation parses the YAML, !ImportValue is an alias for Fn::ImportValue. According to the Serverless Function documentation, they should support the Fn::ImportValue form of imports.
Based on the documentation for Fn::ImportValue, you should be able to reference your export like:
- stream:
    type: dynamodb
    arn: {"Fn::ImportValue": "my-client-table-stream-arn"}
    batchSize: 1
Hope that helps solve your issue.
I couldn't find it clearly documented anywhere, but what seemed to resolve the issue for me is this: the variables which need to be exposed/exported in Outputs must have an "Export" property with a "Name" sub-property.
In serverless.ts
resources: {
  Resources: resources["Resources"],
  Outputs: {
    // For eventbus
    EventBusName: {
      Export: {
        Name: "${self:service}-${self:provider.stage}-UNIQUE_EVENTBUS_NAME",
      },
      Value: {
        Ref: "UNIQUE_EVENTBUS_NAME",
      },
    },
    // For something like sqs, or anything else, it would be the same
    IDVerifyQueueARN: {
      Export: {
        Name: "${self:service}-${self:provider.stage}-UNIQUE_SQS_NAME",
      },
      Value: { "Fn::GetAtt": ["UNIQUE_SQS_NAME", "Arn"] },
    },
  },
}
Once this is deployed, you can check whether the exports are present by running the following in the terminal (using your associated AWS credentials):
aws cloudformation list-exports
Then there should be an entry with a Name property in the returned list:
{
    "ExportingStackId": "***",
    "Name": "${self:service}-${self:provider.stage}-UNIQUE_EVENTBUS_NAME", <-- same as given above (but will be populated with your service and stage)
    "Value": "***"
}
And then, if the above is successful, you can reference it with "Fn::ImportValue" like so:
"Resource": {
"Fn::ImportValue": "${self:service}-${self:provider.stage}-UNIQUE_EVENTBUS_NAME", <-- same as given above (but will be populated with your service and stage)
}