Passing Cloud Storage custom metadata into Cloud Storage Notification - google-cloud-storage

We have a Python script that copies/creates files in a GCS bucket.
# let me know if my setting of the custom-metadata is correct
blob.metadata = {"file_capture_time": some_timestamp_var}
blob.upload_from_filename(...)  # or upload_from_string / upload_from_file
We want to configure the bucket so that it generates Cloud Storage notifications whenever an object is created. We also want the custom metadata above to be passed along with the Pub/Sub message to the topic, and to use it as an ordering key on the subscription side. How can we do this?

The recommended way to receive notifications when an event occurs on the intended GCS bucket is to create a Cloud Pub/Sub topic for new objects and to configure your GCS bucket to publish messages to that topic when new objects are created.
First, make sure you've activated the Cloud Pub/Sub API, then use a gsutil command similar to the one below:
gsutil notification create -f json -e OBJECT_FINALIZE gs://example-bucket
The -e flag specifies that you're only interested in OBJECT_FINALIZE messages (objects being created).
The -f flag specifies that you want the payload of the messages to be the object metadata for the JSON API.
The -m flag specifies a key:value attribute that is appended to the set of attributes sent to Cloud Pub/Sub for all events associated with this notification config. You may specify this flag multiple times to set multiple attributes.
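If you prefer to do the same thing from Python, here is a minimal sketch using the google-cloud-storage client, including -m style custom attributes; the bucket name, topic name, and attribute values are placeholders, not part of the original answer:
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("example-bucket")
notification = bucket.notification(
    topic_name="example-topic",                   # Pub/Sub topic to publish to
    event_types=["OBJECT_FINALIZE"],              # only object-creation events
    payload_format="JSON_API_V1",                 # payload is the object's JSON metadata
    custom_attributes={"origin": "my-uploader"},  # equivalent of -m origin:my-uploader
)
notification.create()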
The full Firebase example explains parsing the filename and other info from the event's context/data.
Here is a good example with a similar context.

Related

KMS KeyPolicy for CloudTrail read/write and EventBridge read?

I have the following resources in a CDK project:
from aws_cdk import (
    aws_cloudtrail as cloudtrail,
    aws_events as events,
    aws_events_targets as targets,
    aws_kms as kms
)
# Create a Customer-Managed Key (CMK) for encrypting the CloudTrail logs
mykey = kms.Key(self, "key", alias="somekey")
# Create a CloudTrail Trail, an S3 bucket, and a CloudWatch Log Group
trail = cloudtrail.Trail(self, "myct", send_to_cloud_watch_logs=True,
                         management_events=cloudtrail.ReadWriteType.WRITE_ONLY)
# Create an EventBridge Rule to do something when certain events get matched in the CloudWatch Log Group
rule = events.Rule(self, "rule", event_pattern=events.EventPattern(
    # the contents of the EventPattern don't matter for this example
), targets=[
    # the contents of the targets don't matter either
])
The problem is, if I pass my key to the trail with the encryption_key=mykey parameter, CloudTrail complains that it can't use the key.
I've tried many different KMS policies, but other than making it wide open to the entire world, I can't figure out how to enable my CloudTrail Trail to read/write using the key (it has to put data into the S3 bucket), and allow CloudWatch and EventBridge to decrypt the encrypted data in the S3 bucket.
The documentation on this is very poor, and depending on which source I look at, they use different syntax and don't explain why they do things. Like, here's just one example from a CFT:
Condition:
  StringLike:
    'kms:EncryptionContext:aws:cloudtrail:arn': !Sub 'arn:aws:cloudtrail:*:${AWS::AccountId}:trail/*'
OK, but what if I need to connect up EventBridge and CloudWatch Logs, too? No example, no mention of it, as if this use case doesn't exist.
If I omit the encryption key, everything works fine - but I do need the data encrypted at rest in S3, since it's capturing sensitive operations in my master payer account.
Is there any shorthand for this in CDK, or is there an example in CFT (or even outside of IaC tools entirely) of the proper key policy to use in this scenario?
I tried variations on mykey.grant_decrypt(trail.log_group), mykey.grant_encrypt_decrypt(trail), mykey.grant_decrypt(rule), etc. and all of them throw an inscrutable stack trace saying something is undefined, so apparently those methods just don't work.

Using Mirth Connect Destination Mappings for AWS Access Key Id results in Error

We use Vault to store our credentials. I've successfully grabbed the S3 Access Key ID and Secret Access Key using the Vault API, and used channelMap.put to create the mappings ${access_key} and ${secret_key}.
However, when I use these in the S3 File Writer (aws_s3_file_writer) destination, I get the error:
"The AWS Access Key Id you provided does not exist in our records."
I know the Access Key ID is valid; it works if I plug it in directly in the S3 File Writer destination.
I'd appreciate any help on this. Thank you.
UPDATE: I had to convert the results to a string; that fixed it.
You can try promoting the variable to a higher-scoped map: globalChannelMap, globalMap, or configurationMap. I would use the last one, since it can store passwords in a form that is not plain text. You are currently using the channelMap, whose scope applies only to the current message while it travels through the channel.
You can read more about variable maps and their scopes in the Mirth User Guide, section "Variable Maps", page 393. I think that part of the manual is really important to understand.
See my comment, it was a race condition between Vault, Mirth and AWS.

How to propagate PubSub metadata with Apache Beam?

Context: I have a pipeline that listens to Pub/Sub; the messages are published by an Object Change Notification from a Google Cloud Storage bucket. The pipeline processes the file using XmlIO, splitting it; so far so good.
The problem is: in the Pub/Sub message (and in the object stored in Google Cloud Storage) I have some metadata that I would like to merge with the data from the XmlIO to compose the elements that the pipeline will process. How can I achieve this?
You can create a custom window and WindowFn that store the metadata from the Pub/Sub message that you want to use later to enrich the individual records.
Your pipeline will look as follows:
ReadFromPubsub -> Window.into(CopyMetadataToCustomWindowFn) -> ParDo(ExtractFilenameFromPubsubMessage) -> XmlIO -> ParDo(EnrichRecordsWithWindowMetadata) -> Window.into(FixedWindows.of(...))
To start, you'll want to create a subclass of IntervalWindow that stores the metadata that you need. After that, create a subclass of WindowFn where, in #assignWindows(...), you copy the metadata from the Pub/Sub message into the IntervalWindow subclass you created. Apply your new WindowFn using the Window.into(...) transform. Now each of the records that flow through the XmlIO transform will be inside your custom window that contains the metadata.
For the second step, you'll need to extract the relevant filename from the pubsub message to pass to the XmlIO transform as input.
For the third step, you want to extract the custom metadata from the window in a ParDo/DoFn that comes after the XmlIO. The records output by XmlIO will preserve the windowing information that was passed through it (note that not all transforms do this, but almost all do). You can state that your DoFn needs the window to be passed to your @ProcessElement method, for example:
class EnrichRecordsWithWindowMetadata extends DoFn<...> {
  @ProcessElement
  public void processElement(@Element XmlRecord xmlRecord, MyCustomMetadataWindow metadataWindow) {
    ... enrich record with metadata on window ...
  }
}
Finally, it is a good idea to revert to one of the standard WindowFns, such as FixedWindows, since the metadata on the window is no longer relevant.
Alternatively, you can use Pub/Sub notifications from Google Cloud Storage directly instead of introducing OCN in the middle.
Google also suggests using Pub/Sub notifications. When you receive the Pub/Sub notification, you can get the message attributes from it:
# In a Pub/Sub push endpoint (e.g. a Flask or Cloud Function HTTP handler)
data = request.get_json()
object_generation = data['message']['attributes']['objectGeneration']
bucket_name = data['message']['attributes']['bucketId']
object_name = data['message']['attributes']['objectId']
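If you consume the topic with a pull subscription instead, the same attributes are available on the message object; here is a minimal sketch (the project and subscription names are assumptions). With -f json the message body is the object resource, which also carries any custom metadata set on the blob:
import json
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "gcs-notifications")

def callback(message):
    bucket_name = message.attributes["bucketId"]
    object_name = message.attributes["objectId"]
    # With the JSON payload format, the data is the object resource,
    # including any custom metadata set on the blob.
    payload = json.loads(message.data.decode("utf-8"))
    custom_metadata = payload.get("metadata", {})
    message.ack()

future = subscriber.subscribe(subscription_path, callback=callback)
# future.result() would block here to keep receiving messages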

Setting "default" metadata for all+new objects in a GCS bucket?

I run a static website (blog) on Google Cloud Storage.
I need to set a default cache-control metadata header for all existing and future objects.
However, the editing object metadata instructions show the gsutil setmeta -h "cache-control: ..." command, which neither applies to "future" objects in the bucket nor gives me a way to set a bucket-wide policy that is inherited by existing/future objects (since the command is executed per object).
This is surprising to me because there are features like gsutil defacl which let you set a policy for the bucket that is inherited by objects created in the future.
Q: Is there a metadata policy for the entire bucket that would apply to all existing and future objects?
There is no way to set default metadata on GCS objects. You have to set the metadata at write time, or you can update it later (e.g., using gsutil setmeta).
Extracted from this question
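For the "set it at write time" part, here is a minimal sketch with the Python client; the bucket and file names are made up for illustration:
from google.cloud import storage

bucket = storage.Client().bucket("my-blog-bucket")
blob = bucket.blob("posts/index.html")
blob.cache_control = "public, max-age=86400"  # set before uploading
blob.upload_from_filename("index.html")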
According to the documentation, if an object does not have a Cache-Control entry, the default value when serving that object would be public,max-age=3600 if the object is publicly readable.
In case you still want to modify this metadata, you could do that using the JSON API inside a Cloud Function that would be triggered every time a new object is created or an existing one is overwritten.
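A minimal sketch of such a function in Python, assuming a google.storage.object.finalize trigger; the function name and max-age value are made up:
from google.cloud import storage

client = storage.Client()

def set_cache_control(event, context):
    # Background function triggered on object finalize; `event` carries
    # the bucket and object names of the newly written object.
    blob = client.bucket(event["bucket"]).get_blob(event["name"])
    if blob is None:
        return
    blob.cache_control = "public, max-age=86400"
    blob.patch()  # sends only the changed metadata field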

AWS Cloudwatch: How to add the instance name / custom fields to the log?

We currently have multiple cloudwatch log streams per ec2 instance. This is horrible to debug; queries for "ERROR XY" across all instances would involve either digging into each log stream (time consuming) or using aws cli (time consuming queries).
I would prefer to have a log stream combining the log data of all instances of a specific type, let's say all "webserver" instances log their "apache2" log data to one central stream and "php" log data to another central stream.
Obviously, I still want to be able to figure out which log entry stems from which instance - as I would be with central logging via syslogd.
How can I add the custom field "instance id" to the logs in cloudwatch?
The best way to organize logs in CloudWatch Logs is as follows:
The log group represents the log type. For example: webserver/prod.
The log stream represents the instance id (i.e. the source).
For querying, I highly recommend using the Insights feature (I helped build it when I worked at AWS). The log stream name will be available with each log record as a special @logStream field.
You can query across all instances like this:
filter @message like /ERROR XY/
Or inside one instance like this:
filter @message like /ERROR XY/ and @logStream = "instance_id"
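If you'd rather run that query programmatically than in the console, here is a minimal boto3 sketch; the log group name and time window are placeholders:
import time
import boto3

logs = boto3.client("logs")
query = logs.start_query(
    logGroupName="webserver/prod",
    startTime=int(time.time()) - 3600,  # last hour
    endTime=int(time.time()),
    queryString='filter @message like /ERROR XY/ | fields @timestamp, @logStream, @message',
)
while True:
    results = logs.get_query_results(queryId=query["queryId"])
    if results["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(1)
for row in results.get("results", []):
    print(row)  # each row includes @logStream, so you know which instance logged it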