How to store AWS S3 object data in a Postgres DB

I'm working on a Golang application where users will be able to upload files: images & PDFs.
The files will be stored in an AWS S3 bucket, which I've implemented. However, I don't know how to go about retrieving identifiers for the stored items so I can save them in Postgres.
I was thinking of using an item ID, but the AWS SDK for Go does not provide an object ID field:
for _, item := range response.Contents {
	log.Printf("Name : %s\n", *item.Key)
	// log.Printf("ID : %s\n", *item.ID) // there is no such ID field on the object summary
}
What other options are available to retrieve stored object references from AWS S3?

A common approach is to trigger a Lambda function from an S3 bucket event. This way you get the details of the object created in your bucket, and the Lambda function can then persist the object metadata to Postgres.
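For illustration, a minimal sketch of such a Lambda in Go, assuming the github.com/aws/aws-lambda-go packages, the github.com/lib/pq Postgres driver, and a hypothetical uploads table; connection details are placeholders:

package main

import (
	"context"
	"database/sql"
	"log"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
	_ "github.com/lib/pq"
)

var db *sql.DB

func handler(ctx context.Context, e events.S3Event) error {
	for _, rec := range e.Records {
		bucket := rec.S3.Bucket.Name
		key := rec.S3.Object.Key
		size := rec.S3.Object.Size
		// Persist the object reference; table and column names are illustrative.
		if _, err := db.ExecContext(ctx,
			`INSERT INTO uploads (bucket, object_key, size_bytes) VALUES ($1, $2, $3)`,
			bucket, key, size); err != nil {
			return err
		}
		log.Printf("stored reference for s3://%s/%s", bucket, key)
	}
	return nil
}

func main() {
	var err error
	// In a real deployment the DSN would come from the environment or a secret store.
	db, err = sql.Open("postgres", "postgres://user:pass@host:5432/mydb?sslmode=require")
	if err != nil {
		log.Fatal(err)
	}
	lambda.Start(handler)
}

The bucket name and key from the event are usually all you need to store, since together they uniquely identify the object.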
Another option is simply to append the object key you are using in your SDK to the bucket name you're targeting; the result is a full URI that points to the stored object, something like this:
s3://{{BUCKET_NAME}}/{{OBJECT_KEY}}
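For example, since you already know both values when you call PutObject, a small helper like this (hypothetical; it assumes the database/sql and fmt imports and a hypothetical uploads table) is enough to persist the reference with no extra round trip to S3:

// saveObjectRef stores the key and the derived s3:// URI for a just-uploaded object.
func saveObjectRef(db *sql.DB, bucketName, objectKey string) error {
	uri := fmt.Sprintf("s3://%s/%s", bucketName, objectKey) // e.g. s3://my-bucket/uploads/report.pdf
	_, err := db.Exec(`INSERT INTO uploads (object_key, uri) VALUES ($1, $2)`, objectKey, uri)
	return err
}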

Related

Using Mirth Connect Destination Mappings for AWS Access Key Id results in Error

We use Vault to store our credentials. I've successfully grabbed the S3 Access Key ID and Secret Access Key using the Vault API, and used channelMap.put to create the mappings ${access_key} and ${secret_key}.
However, when I use these in the S3 File Writer I get the error:
"The AWS Access Key Id you provided does not exist in our records."
I know the Access Key Id is valid; it works if I plug it directly into the S3 File Writer destination.
I'd appreciate any help on this. thank you.
UPDATE: I had to convert the results to a string; that fixed it.
You can try promoting the variable to a higher-scoped map. You can use globalChannelMap, globalMap, or configurationMap. I would use the last one, since it can store passwords in a non-plain-text mode. You are currently using a channelMap, whose scope only applies to the current message while it travels through the channel.
You can read more about variable maps and their scopes in the Mirth User Guide, section "Variable Maps", page 393. I think that part of the manual is really important to understand.
See my comment; it was a race condition between Vault, Mirth, and AWS.

Setting "default" metadata for all+new objects in a GCS bucket?

I run a static website (blog) on Google Cloud Storage.
I need to set a default Cache-Control metadata header for all existing and future objects.
However, the object metadata editing instructions show the gsutil setmeta -h "Cache-Control: ..." command, which doesn't seem to apply to "future" objects in the bucket, nor does it give me a way to set a bucket-wide policy that existing/future objects inherit (since the command is executed per object).
This is surprising to me because there are features like gsutil defacl which let you set a policy for the bucket that is inherited by objects created in the future.
Q: Is there a metadata policy for the entire bucket that would apply to all existing and future objects?
There is no way to set default metadata on GCS objects. You have to set the metadata at write time, or you can update it later (e.g., using gsutil setmeta).
Extracted from this question
According to the documentation, if an object does not have a Cache-Control entry, the default value when serving that object is public, max-age=3600, provided the object is publicly readable.
If you still want to set this metadata yourself, you can do it through the JSON API inside a Cloud Function that is triggered every time a new object is created or an existing one is overwritten.
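A minimal sketch of such a function in Go, assuming a google.storage.object.finalize trigger and the cloud.google.com/go/storage client; the Cache-Control value is only an example:

// Package gcf holds a background Cloud Function that patches Cache-Control
// on every newly written object.
package gcf

import (
	"context"

	"cloud.google.com/go/storage"
)

// GCSEvent carries the fields of the storage event payload used here.
type GCSEvent struct {
	Bucket string `json:"bucket"`
	Name   string `json:"name"`
}

// SetCacheControl updates the object's Cache-Control metadata.
func SetCacheControl(ctx context.Context, e GCSEvent) error {
	client, err := storage.NewClient(ctx)
	if err != nil {
		return err
	}
	defer client.Close()

	_, err = client.Bucket(e.Bucket).Object(e.Name).Update(ctx, storage.ObjectAttrsToUpdate{
		CacheControl: "public, max-age=31536000",
	})
	return err
}

Updating metadata fires a metadataUpdate event rather than another finalize event, so the function should not retrigger itself.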

How to access the latest uploaded object in a Google Cloud Storage bucket using Python in a TensorFlow model

I am working on a TensorFlow model where I want to make use of the latest uploaded object, in order to get output from that uploaded object. Is there a way to access the latest object uploaded to a Google Cloud Storage bucket using Python?
Below is what I use to grab the most recently updated object.
Instantiate your client
from google.cloud import storage
# first establish your client
storage_client = storage.Client()
Define bucket_name and any additional paths via prefix
# get your blobs
bucket_name = 'your-glorious-bucket-name'
prefix = 'special-directory/within/your/bucket' # optional
Iterate the blobs returned by the client
Storing these as tuple records is quick and efficient.
blobs = [(blob, blob.updated) for blob in storage_client.list_blobs(
bucket_name,
prefix = prefix,
)]
Sort the list on the second tuple value
# sort and grab the latest value, based on the updated key
latest = sorted(blobs, key=lambda tup: tup[1])[-1][0]
string_data = latest.download_as_string()
Metadata key docs and Google Cloud Storage Python client docs.
One-liner
# assumes storage_client as above
# latest is a string formatted response of the blob's data
latest = sorted([(blob, blob.updated) for blob in storage_client.list_blobs(bucket_name, prefix=prefix)], key=lambda tup: tup[1])[-1][0].download_as_string()
There is no direct way to get the latest uploaded object from Google Cloud Storage. However, there is a workaround using the object's metadata.
Every object that is uploaded to Google Cloud Storage has its own metadata. For more information you can visit the Cloud Storage > Object Metadata documentation. One of the metadata entries is "Last updated". This value is a timestamp of the last time the object was updated, which can happen in only 3 situations:
A) The object was uploaded for the first time.
B) The object was uploaded and replaced because it already existed.
C) The object's metadata changed.
If you are not updating the metadata of the object, then you can use this workaround:
Set a variable to a very old datetime (1900-01-01 00:00:00.000000). There is no chance of an object having this as its updated timestamp.
Set a variable to store the latest blob's name and set it to "NONE".
List all the blobs in the bucket (Google Cloud Storage documentation).
For each blob, load the updated metadata and convert it to a datetime object.
If the blob's updated timestamp is greater than the one you already have, update the variable and save the current blob's name.
This process continues until you have searched all the blobs, and only the latest one remains in the variables (a sketch of this loop follows below).
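A sketch of that loop in Go (the Python client is analogous); it assumes the cloud.google.com/go/storage client, the google.golang.org/api/iterator package, and a caller-supplied bucket name, with error handling trimmed:

// latestBlob returns the name of the most recently updated object in the bucket.
func latestBlob(ctx context.Context, client *storage.Client, bucket string) (string, error) {
	latestName := "NONE"
	latestTime := time.Date(1900, 1, 1, 0, 0, 0, 0, time.UTC) // sentinel "very old" time
	it := client.Bucket(bucket).Objects(ctx, nil)
	for {
		attrs, err := it.Next()
		if err == iterator.Done {
			break
		}
		if err != nil {
			return "", err
		}
		if attrs.Updated.After(latestTime) {
			latestTime = attrs.Updated
			latestName = attrs.Name
		}
	}
	return latestName, nil
}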
I have done a little bit of coding myself, and this is my GitHub code example that worked for me. Take the logic and modify it based on your needs. I would also suggest testing it locally before using it in your code.
BUT, in case you do update the blob's metadata manually, there is another workaround:
If you update any of the blob's metadata (see the Viewing and Editing Object Metadata documentation), then the "Last updated" timestamp of that blob also gets updated, so running the above method will NOT give you the last uploaded object but the last modified one, which is different. Therefore you can add a custom metadata entry to your object every time you upload it, and that custom metadata will be the timestamp at the time of upload. No matter what happens to the other metadata later, the custom metadata will always keep the time the object was uploaded. Then use the same method as above, but instead of reading blob.updated read blob.metadata, and apply the same logic to that date (see the sketch after the notes below).
Additional notes:
To use custom metadata you need to use the prefix x-goog-meta-, as stated in the "Editing object metadata" section of the Viewing and Editing Object Metadata documentation.
So the [CUSTOM_METADATA_KEY] should be something like x-goog-meta-uploaded and the [CUSTOM_METADATA_VALUE] should be [CURRENT_TIMESTAMP_DURING_UPLOAD].
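As an illustration, here is a sketch in Go of attaching such a timestamp at upload time (the Python client exposes the same idea through blob.metadata; the key name "uploaded" and the RFC 3339 format are arbitrary choices). It assumes the context, time, and cloud.google.com/go/storage imports:

// uploadWithTimestamp writes data to GCS and records the upload time as custom metadata.
func uploadWithTimestamp(ctx context.Context, client *storage.Client, bucket, name string, data []byte) error {
	w := client.Bucket(bucket).Object(name).NewWriter(ctx)
	w.Metadata = map[string]string{"uploaded": time.Now().UTC().Format(time.RFC3339)}
	if _, err := w.Write(data); err != nil {
		w.Close()
		return err
	}
	return w.Close()
}

When scanning for the newest object you would then read attrs.Metadata["uploaded"] (blob.metadata in Python) instead of the updated field and compare those timestamps. Note that the client libraries set the custom metadata map directly; the x-goog-meta- prefix applies when you send raw request headers.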

Google Cloud Storage: Can we get or search for a file based on metadata?

While uploading an object, I have assigned metadata using x-goog-meta-<keyname>.
Currently, to get a file we have to use Get Object with the key/filename.
I want to know: is it possible to get an object using its metadata?
Is there any way we can directly get/search a file by passing metadata?
Thanks!
No, you cannot.
You can retrieve the metadata via the object name, but you cannot retrieve the object name via the metadata.
If you really needed to, you could create a second bucket containing objects whose names encode the metadata, with data or metadata that refers to the original object name in the first bucket.
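A hypothetical sketch of that idea with the Go client (cloud.google.com/go/storage): write a zero-byte "index" object whose name encodes the metadata value, so that a prefix listing on the second bucket becomes your metadata search:

// indexByMetadata writes a zero-byte index entry in a separate bucket whose
// name encodes the metadata key/value and the original object's name.
func indexByMetadata(ctx context.Context, client *storage.Client, indexBucket, metaKey, metaValue, originalName string) error {
	w := client.Bucket(indexBucket).Object(metaKey + "/" + metaValue + "/" + originalName).NewWriter(ctx)
	return w.Close() // zero bytes written; the object's name is the searchable part
}

Looking up objects with a given metadata value then reduces to listing the index bucket with the prefix metaKey + "/" + metaValue.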

Determine S3 file last modified timestamp

I have a Scala Play 2 app and use the AWS S3 API to read files from S3. I need to determine the last modified timestamp for a file; what's the best way to do that? Is it getObjectMetadata, or perhaps listObjects, or something else? If possible, I would like to determine the timestamps for multiple files in one call. Are there other open-source libraries built on top of the AWS S3 APIs?
The representation of an S3 object in the AWS Java SDK is S3ObjectSummary, which has the method getLastModified. It returns the modified timestamp.
Ideally, just list all of the files using listObjects and then call getObjectSummaries on the returned listing; that gives you the timestamps for many files in a single call.
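The same information is exposed by the other SDKs as well; here is a sketch with the AWS SDK for Go v1 (the bucket name is a placeholder), where a single ListObjectsV2 call returns the last-modified timestamp for up to 1,000 keys:

package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	sess := session.Must(session.NewSession())
	svc := s3.New(sess)

	out, err := svc.ListObjectsV2(&s3.ListObjectsV2Input{
		Bucket: aws.String("my-bucket"), // placeholder bucket name
	})
	if err != nil {
		log.Fatal(err)
	}
	for _, obj := range out.Contents {
		// Key and LastModified are pointers in the v1 SDK.
		fmt.Printf("%s last modified at %s\n", *obj.Key, obj.LastModified)
	}
}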