Copying data from S3 to Redshift - postgresql

I feel like this should be a lot easier than it has been for me.
copy table
from 's3://s3-us-west-1.amazonaws.com/bucketname/filename.csv'
CREDENTIALS 'aws_access_key_id=my-access;aws_secret_access_key=my-secret'
REGION 'us-west-1';
Note that I added the REGION clause after having a problem, but it made no difference.
What confuses me, though, is that the bucket properties show only the https://path/to/the/file.csv. All the documentation I have read calls for the path to start with s3://, so I can only assume I should just change https to s3, as shown in my example.
However I get this error:
"Error : ERROR: S3ServiceException:
The bucket you are attempting to access must be addressed using the specified endpoint.
Please send all future requests to this endpoint.,Status 301,Error PermanentRedirect,Rid"
I am using Navicat for PostgreSQL to connect to Redshift, and I'm running on a Mac.

The S3 path should be 's3://bucketname/filename.csv', i.e. just the bucket name and key, with no endpoint hostname. Try this.
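For reference, the corrected statement would look roughly like this (same placeholder table, bucket, file, and credentials as in the question; the REGION clause is only needed when the bucket's region differs from the cluster's):
copy table
from 's3://bucketname/filename.csv'
CREDENTIALS 'aws_access_key_id=my-access;aws_secret_access_key=my-secret'
REGION 'us-west-1';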

Yes, it should be a lot easier :-)
I have only seen this error when the S3 bucket is not in US Standard. In such cases you need to use an endpoint-based address, e.g. http://s3-eu-west-1.amazonaws.com/mybucket/myfile.txt.
You can find the endpoints for your region on this documentation page:
http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region
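If you are not sure which region a bucket is in (and therefore which endpoint applies), the GetBucketLocation call will tell you. A minimal sketch with boto3 (not part of the original answer; the bucket name is a placeholder):
import boto3
s3 = boto3.client("s3")
# GetBucketLocation returns a None LocationConstraint for US Standard (us-east-1)
# and the region name (e.g. 'us-west-1') otherwise.
location = s3.get_bucket_location(Bucket="bucketname")["LocationConstraint"]
print(location or "us-east-1")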

Related

Using Mirth Connect Destination Mappings for AWS Access Key Id results in Error

We use Vault to store our credentials. I've successfully grabbed the S3 Access Key ID and Secret Access Key using the Vault API, and used channelMap.put to create the mappings ${access_key} and ${secret_key}.
However, when I use these in the S3 File Writer, I get the error:
"The AWS Access Key Id you provided does not exist in our records."
I know the Access Key ID is valid; it works if I plug it directly into the S3 File Writer destination.
I'd appreciate any help on this. Thank you.
UPDATE: I had to convert the results to a string; that fixed it.
You can try putting the variable into a higher-scoped map. You can use globalChannelMap, globalMap, or configurationMap. I would use the last one, since it can store passwords in something other than plain text. You are currently using a channelMap, whose scope applies only to the current message while it travels through the channel.
You can read more about variable maps and their scopes in the Mirth User Guide, Variable Maps section, page 393. I think that part of the manual is really important to understand.
See my comment; it was a race condition between Vault, Mirth, and AWS.

How to avoid Mongo DB NoSQL blind (sleep) injection

While scanning my application for vulnerabilities, I got one high-risk finding:
Blind MongoDB NoSQL Injection
I checked exactly what request the scanning tool sent to the database and found that it had added the line below to a GET request.
{"$where":"sleep(181000);return 1;"}
The scan received a "Time Out" response, which indicates that the injected "sleep" command succeeded.
I need help fixing this vulnerability. Can anyone help me out here? I just want to understand what I need to add to my code to perform this check before querying the database.
Thanks,
Anshu
Similar to SQL injection, or any other type of Code Injection, don't copy untrusted content into a string that will be executed as a MongoDB query.
You apparently have some code in your app that naively accepts user input or some other content and runs it as a MongoDB query.
Sorry, it's hard to give a more specific answer, because you haven't shown that code, or described what you intended it to do.
But generally, in every place where you use external content, you have to imagine how it could be misused if the content doesn't contain the format you assume it does.
You must instead validate the content, so it can only be in the format you intend, or else reject the content if it's not in a valid format.
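As a concrete illustration (a sketch in Python with pymongo, since the original post doesn't show the application code; the database, collection, and field names are hypothetical): build the query from validated, typed values instead of pasting request input into a $where string. If the application never needs server-side JavaScript, you can also disable it in mongod (security.javascriptEnabled: false), which blocks $where entirely.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
users = client.mydb.users  # hypothetical database/collection

def find_user(raw_user_id: str):
    # Validate: accept only the exact format you expect (here, digits only),
    # and reject anything else rather than trying to "clean" it.
    if not raw_user_id.isdigit():
        raise ValueError("invalid user id")
    # Build the filter from a typed value; never interpolate the raw request
    # string into a $where / JavaScript expression, and never pass a dict
    # taken straight from the request body (it could contain $-operators).
    return users.find_one({"user_id": int(raw_user_id)})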

Azure Copy Activity Rest Results Unexpected

I'm attempting to pull data from the Square Connect v1 API using ADF. I'm utilizing a Copy Activity with a REST source. I am successfully pulling back data; however, the results are unexpected.
The endpoint is /v1/{location_id}/payments. I have three parameters, shown below.
I can successfully pull this data via Postman.
The results are stored in a Blob and are as if I did not specify any parameters whatsoever.
Only when I hardcode the parameters into the relative path do I get correct results.
I feel I must be missing a setting somewhere, but which one?
You can try setting the values you want in a setVariable activity, and then have your copyActivity reference those variables. This will tell you whether or not it is an issue with the dynamic content. I have run into some unexpected behavior myself. The benefit of the intermediate setVariable activity is twofold: first, it coerces the datatype, and second, it lets you see what the value is.
My apologies for not using comments. I do not yet have enough points to comment.

Different S3 behavior using different endpoints?

I'm currently writing code to use Amazon's S3 REST API and I notice different behavior where the only difference seems to be the Amazon endpoint URI that I use, e.g., https://s3.amazonaws.com vs. https://s3-us-west-2.amazonaws.com.
Examples of different behavior for the GET Bucket (List Objects) call:
Using one endpoint, it includes the "folder" in the results, e.g.:
/path/subfolder/
/path/subfolder/file1.txt
/path/subfolder/file2.txt
and, using the other endpoint, it does not include the "folder" in the results:
/path/subfolder/file1.txt
/path/subfolder/file2.txt
Using one endpoint, it represents "folders" using a trailing / as shown above and, using the other endpoint, it uses a trailing _$folder$:
/path/subfolder_$folder$
/path/subfolder/file1.txt
/path/subfolder/file2.txt
Why the differences? How can I make it return results in a consistent manner regardless of endpoint?
Note that I get these same odd results even if I use Amazon's own command-line AWS S3 client, so it's not my code.
And the contents of the buckets should be irrelevant anyway.
Your assertion notwithstanding, your issue is exactly about the content of the buckets, and not something S3 is doing -- the S3 API has no concept of folders. None. The S3 console can display folders, but this is for convenience -- the folders are not really there -- or if there are folder-like entities, they're irrelevant and not needed.
In Amazon S3, buckets and objects are the primary resources, where objects are stored in buckets. Amazon S3 has a flat structure with no hierarchy like you would see in a typical file system. However, for the sake of organizational simplicity, the Amazon S3 console supports the folder concept as a means of grouping objects. Amazon S3 does this by using key name prefixes for objects.
http://docs.aws.amazon.com/AmazonS3/latest/UG/FolderOperations.html
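To make the "key name prefix" idea concrete, here is a small sketch with boto3 (the bucket and prefix are placeholders, not from the original answer); listing with a Delimiter is what makes S3 group keys into folder-like CommonPrefixes:
import boto3
s3 = boto3.client("s3")
resp = s3.list_objects_v2(Bucket="bucketname", Prefix="path/", Delimiter="/")
for obj in resp.get("Contents", []):
    print("object:", obj["Key"])      # real objects directly under the prefix
for cp in resp.get("CommonPrefixes", []):
    print("prefix:", cp["Prefix"])    # "folders" are just grouped key prefixes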
So why are you seeing this?
Either you've been using EMR/Hadoop, or some other code written by someone who took a bad example and ran with it, or something that has been doing things differently than it should for quite some time.
Amazon EMR is a web service that uses a managed Hadoop framework to process, distribute, and interact with data in AWS data stores, including Amazon S3. Because S3 uses a key-value pair storage system, the Hadoop file system implements directory support in S3 by creating empty files with the <directoryname>_$folder$ suffix.
https://aws.amazon.com/premiumsupport/knowledge-center/emr-s3-empty-files/
This may have been something the S3 console did many years ago, and apparently (since you don't report seeing them in the console) it still supports displaying such objects as folders in the console... but the S3 console no longer creates them this way, if it ever did.
I've mirrored the bucket "folder" layout exactly
If you create a folder in the console, an empty object with the key "foldername/" is created. This in turn is used to display a folder that you can navigate into, and upload objects with keys beginning with that folder name as a prefix.
The Amazon S3 console treats all objects that have a forward slash "/" character as the last (trailing) character in the key name as a folder
http://docs.aws.amazon.com/AmazonS3/latest/UG/FolderOperations.html
If you just create objects using the API, then "my/object.txt" appears in the console as "object.txt" inside folder "my" even though there is no "my/" object created... so if the objects are created with the API, you'd see neither style of "folder" in the object listing.
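For example (a boto3 sketch with placeholder names, not from the original answer): uploading a key that contains a slash creates only that one object, and the listing shows no separate "my/" entry, even though the console will render a folder named "my":
import boto3
s3 = boto3.client("s3")
s3.put_object(Bucket="bucketname", Key="my/object.txt", Body=b"hello")
keys = [o["Key"] for o in s3.list_objects_v2(Bucket="bucketname")["Contents"]]
print(keys)  # ['my/object.txt']; there is no 'my/' object unless something creates one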
That is probably a bug in the API endpoint which includes the "folder" - S3 internally doesn't actually have a folder structure, but instead is just a set of keys associated with files, where keys (for convenience) can contain slash-separated paths which then show up as "folders" in the web interface. There is the option in the API to specify a prefix, which I believe can be any part of the key up to and including part of the filename.
EMR's S3 client is not the Apache one, so I can't speak accurately about it.
In ASF Hadoop releases (and HDP, CDH):
The older s3n:// client uses $folder$ as its folder delimiter.
The newer s3a:// client uses / as its folder marker, but will handle $folder$ if it's there. At least it used to; I can't see where in the code it does now.
The S3A clients strip out all folder markers when you list things; S3A uses them to simulate empty dirs and deletes all parent markers when you create child file/dir entries.
Whatever you have that processes the GET results should just ignore entries ending with "/" or "_$folder$"; for example:
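A minimal sketch of that filtering, assuming you are post-processing a flat key listing (the sample keys are the ones from the question):
def is_folder_marker(key):
    # Skip both styles of folder marker described above.
    return key.endswith("/") or key.endswith("_$folder$")

keys = ["path/subfolder/", "path/subfolder_$folder$", "path/subfolder/file1.txt"]
print([k for k in keys if not is_folder_marker(k)])  # ['path/subfolder/file1.txt']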
As to why they are different: EMRFS is a different codepath, using DynamoDB to implement consistency. At a guess, it doesn't need to mock empty dirs, as the DDB tables will hold all the directory entries.

Azure Rest Service to create OS disk

I'm trying to add a disk to a Subscription using the Add Disk REST service ( http://msdn.microsoft.com/en-us/library/windowsazure/jj157178.aspx )
I tried pretty much every combination explained but no matter what I do, the disk is listed as a Data Disk.
Trying to use Fiddler to inspect what the Azure PowerShell cmdlets (https://www.windowsazure.com/en-us/manage/downloads/ ) send just results in an error.
According to MS, you should specify HasOperatingSystem, but it isn't supplied when using Microsoft's PowerShell cmdlets. If you do a List Disks ( http://msdn.microsoft.com/en-us/library/windowsazure/jj157176 ) it should return this too, but the only way to distinguish data disks from OS disks is whether "OS" is null or contains "Windows"/"Linux". Given that information, I tried creating the disk with/without OS and/or HasOperatingSystem in all combinations, and no matter what, it always ends up as a data disk.
The Microsoft PowerShell cmdlets allow using both HTTP and HTTPS in the URI, so I tried both of those too.
Does anyone have a WORKING example of the xml to send, to create an OS disk?
<Disk xmlns="http://schemas.microsoft.com/windowsazure">
<HasOperatingSystem>true</HasOperatingSystem>
<Label>d2luZ3VuYXY3MDEtbmF2NzAxLTAtMjAxMjA4MjcxNTA5NTU=</Label>
<MediaLink>http://winguvhd.blob.core.windows.net/nav701/nav701-0-20120827150955_osdisk.vhd</MediaLink>
<Name>wingunav701-nav701-0-20120827150955</Name>
<OS>Windows</OS>
</Disk>
As I mentioned in my comment, there is indeed an issue with the documentation.
Try this as your request payload. This was provided to me by the person from Microsoft who wrote Windows Azure PowerShell Cmdlets:
<Disk xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.microsoft.com/windowsazure">
<OS>Windows</OS>
<Label>mydisk.vhd</Label>
<MediaLink>https://vmdemostorage.blob.core.windows.net/uploads/mydisk.vhd</MediaLink>
<Name>mydisk.vhd</Name>
</Disk>
I just tried using the XML above, and I can see an OS Disk in my subscription.
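For completeness, here is a rough sketch of how that payload could be posted from Python against the classic Service Management Add Disk operation; the subscription ID, certificate paths, and x-ms-version value are placeholders/assumptions rather than something given in the original answer, and the classic API itself has long since been retired:
import requests

payload = """<Disk xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.microsoft.com/windowsazure">
<OS>Windows</OS>
<Label>mydisk.vhd</Label>
<MediaLink>https://vmdemostorage.blob.core.windows.net/uploads/mydisk.vhd</MediaLink>
<Name>mydisk.vhd</Name>
</Disk>"""
resp = requests.post(
    "https://management.core.windows.net/<subscription-id>/services/disks",  # placeholder subscription ID
    data=payload.encode("utf-8"),
    headers={"x-ms-version": "2012-08-01", "Content-Type": "application/xml"},
    cert=("management-cert.pem", "management-key.pem"),  # classic management certificate
)
print(resp.status_code, resp.text)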