Using PowerShell to upload to AWS S3

Hopefully this is a quick fix (most likely user error). I am using PowerShell to upload to AWS S3, and I'm attempting to copy a number of .mp4 files from a folder to an S3 location. I'm able to copy individual files successfully using the command below:
aws s3 cp .\video1.mp4 s3://bucketname/root/source/
But when I try to copy all the files within that directory I get an error:
aws s3 cp F:\folder1\folder2\folder3\folder4\* s3://bucketname/root/source/
The user-provided path F:\folder1\folder2\folder3\folder4\* does not exist.
I've tried multiple variations on the above (no path and just *, *.mp4, .*.mp4, and, coming from a Linux background, wrapping the path in quotation marks), but I can't seem to get it working.
I was using this documentation initially: https://www.tutorialspoint.com/how-to-copy-folder-contents-in-powershell-with-recurse-parameter. I feel the answer is probably very simple, but I couldn't see what I was doing wrong.
Any help would be appreciated.
Thanks.
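For what it's worth, the AWS CLI does not expand wildcards in the source path itself; the documented pattern is to pass the directory together with --recursive and then filter with --exclude and --include. Below is a minimal sketch in Python that only builds the argument list. The helper name build_mp4_upload_command is hypothetical, and the paths are the ones from the question.

```python
def build_mp4_upload_command(source_dir, destination, pattern="*.mp4"):
    """Return the argument list for a filtered, recursive `aws s3 cp`."""
    return [
        "aws", "s3", "cp", source_dir, destination,
        "--recursive",
        "--exclude", "*",      # first exclude everything...
        "--include", pattern,  # ...then re-include only the matching files
    ]


# Paths taken from the question; the helper itself is illustrative only.
command = build_mp4_upload_command(
    "F:\\folder1\\folder2\\folder3\\folder4\\",
    "s3://bucketname/root/source/",
)
print(" ".join(command))
```

Typed directly into PowerShell, the same arguments would need the "*" and "*.mp4" filters quoted. Passing the list to subprocess.run avoids any shell expansion.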

Related

Pull from and Push to S3 using Perl

everyone! I have what I assume to be a simple problem, but I could use a hand digging in. I have a server that preprocesses data before translation. This is done by a series of Perl scripts developed over a decade ago (but they work!). This virtual server is being lifted into AWS. The change this makes for my scripts is that the location they pull from and the location they write to will now be S3 buckets.
The workflow is: copy all files in the source location to the local drive, preprocess the data file by file, and, when complete, move the preprocessed files to a final destination.
process_file ($workingDir, $dirEntry);
final_move;
move("$downloadDir/$dirEntry", "$archiveDir") or die "ERROR: Archive file $downloadDir/$dirEntry -> $archiveDir FAILED $!\n";
unlink("$workingDir/$dirEntry");
So, in this case $dir and $archiveDir are S3 buckets.
Any advice on adapting this is appreciated.
TIA,
VtR
You have a few options.
Use a system like s3fs-fuse to mount your S3 bucket as a local drive. This would presumably require the smallest changes to your existing code.
Use the AWS Command Line Interface to copy your files to your S3 bucket.
Use the Amazon API (through something like Paws) to upload your files to S3.
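As a rough illustration of the second option, the workflow from the question (pull everything down, preprocess locally, push the results to the final location) could be bracketed by two AWS CLI calls. The sketch is in Python rather than Perl, and it only builds the commands instead of running them. The bucket names, local paths, and the helper name plan_s3_workflow are all hypothetical.

```python
def plan_s3_workflow(source_uri, download_dir, processed_dir, final_uri):
    """Return the AWS CLI calls that surround the local preprocessing step."""
    return [
        # 1. Copy every object under the source prefix to the local drive.
        ["aws", "s3", "cp", source_uri, download_dir, "--recursive"],
        # (the existing file-by-file preprocessing runs here, unchanged)
        # 2. Move the preprocessed files to their final S3 destination.
        ["aws", "s3", "mv", processed_dir, final_uri, "--recursive"],
    ]


# Hypothetical locations, for illustration only.
plan = plan_s3_workflow(
    "s3://example-source/incoming/",
    "/tmp/download",
    "/tmp/processed",
    "s3://example-final/outgoing/",
)
for step in plan:
    print(" ".join(step))
```

The existing scripts keep working on local directories under this approach; only the first and last steps change.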

Stream output of aws cli sync command to powershell host

I'm using PowerShell to write a folder synchronization tool to copy files from a local folder up to AWS S3 with the AWS CLI.
The script works, as I can see files show up in S3, but the output of the aws sync command does not appear on screen (normally, when aws sync is run from the command line, it shows each file as it uploads, the current status of all files/count, etc.).
How do I get that to happen inside a PowerShell script?
Here are the things I've tried, none of which worked:
aws s3 sync $local_folder $aws_bucket
$awsio = aws s3 sync $local_folder $aws_bucket
#Out-Host -InputObject $awsio
Write-Output $awsio
Turns out the answer was the first thing I tried which was just the normal command on its own line:
aws s3 sync $local_folder $aws_bucket
I think what happened is when I first tried that, it was doing something in the background before actually starting to run. So if I had waited longer I would have seen output appear on screen as I expected...
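Part of the difference between the attempts is that assigning the command to a variable collects its output, so nothing is displayed until the command has finished. When the output is needed both live and afterwards, the general pattern is to read it line by line and echo each line as it arrives. Here is a minimal sketch of that pattern in Python, not PowerShell. The helper name run_and_stream is hypothetical, and a small Python child process stands in for aws s3 sync.

```python
import subprocess
import sys


def run_and_stream(command):
    """Echo each output line as it arrives and also return the lines."""
    lines = []
    with subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the same stream
        text=True,
    ) as process:
        for line in process.stdout:
            print(line, end="", flush=True)  # show progress immediately
            lines.append(line.rstrip("\n"))  # keep a copy for later use
    # Leaving the `with` block waits for the child, so returncode is set.
    return process.returncode, lines


# Stand-in child process; a real call would pass the aws s3 sync arguments.
demo_command = [sys.executable, "-c", "print('upload: a.txt'); print('upload: b.txt')"]
exit_code, captured = run_and_stream(demo_command)
```

When the output does not need to be kept, running the command on its own line, as in the accepted fix above, remains the simplest choice.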

Data written with gsutil is not visible with gcsfuse

I have installed gcsfuse to support an app requiring a posix-like mount point.
Existing data written with gsutil is not visible, but data written via the browser (Cloud Storage > Storage Browser) is.
According to https://cloud.google.com/storage/docs/gcsfuse -
You can simultaneously read and write to Google Cloud Storage using the Fuse Adapter and tools like gsutil. For example, if you write an object using the Fuse Adapter, it will immediately be available to read with gsutil, or vice versa, without the need to re-mount the bucket or reboot the Compute Engine instance.
Has anyone been successful collaborating with gcsfuse and gsutil?
I feel like I'm missing something.
Thanks!
This is likely because gsutil doesn't create directory placeholder objects, and gcsfuse by default requires them in order for a directory to be visible. To confirm: when you write an object with gsutil in a directory that you can already see (e.g. the root), does it show up?
You can work around this in one of two ways:
Create the directory placeholders for the directories you're missing. The easiest way to do this for a missing object foo/bar/baz is using a gcsfuse mount:
mkdir -p foo/bar
Run gcsfuse with the --implicit-dirs flag. Make sure to read the documentation linked above for caveats, though.
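To see which placeholders a given object would need under the first workaround, the parent prefixes can be derived from the object name. This is a small sketch, assuming placeholders are objects whose names end in a slash (which is how I understand the gcsfuse documentation); the function name directory_placeholders is hypothetical.

```python
def directory_placeholders(object_name):
    """List the directory placeholder names implied by an object name."""
    parts = object_name.split("/")[:-1]  # drop the final path component
    # Build each ancestor prefix in turn, each ending with a slash.
    return ["/".join(parts[: i + 1]) + "/" for i in range(len(parts))]


print(directory_placeholders("foo/bar/baz"))  # ['foo/', 'foo/bar/']
```

An object at the bucket root yields an empty list, which matches the observation that objects written to an already visible directory should show up.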

Batch file uploading to cloud storage

Could anyone cut and paste a working request to upload several files to cloud storage in a batch? I am really struggling to get it working; there are no examples of file uploads and I'm really stuck. I could probably work it out if I had a working starting point. I'm starting to go crazy, so any help would be much appreciated.
You can find an example at [1] and consult this other answer at [2] as reference.
I would suggest using gsutil to copy files, even as an external call from your application (PHP exec() or system()), since this tool is optimised for parallel file transfer (-m option) and recursive folder copy (-R option), making it very simple and efficient.
For more help on the gsutil copy command, run: gsutil help cp
Links:
[1] - https://cloud.google.com/storage/docs/json_api/v1/how-tos/batch#example
[2] - Batch upload requests to Google Cloud Storage using javascript
Regards
Paolo
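The external-call approach suggested above might look like the following in Python (the answer mentions PHP; this is just an equivalent sketch). The helper names and the example paths are hypothetical; -m and -R are the gsutil options named in the answer.

```python
import subprocess


def build_gsutil_upload(local_dir, bucket_uri):
    """Return the argument list for a parallel, recursive gsutil copy."""
    # -m is a top-level gsutil option, so it goes before the cp subcommand.
    return ["gsutil", "-m", "cp", "-R", local_dir, bucket_uri]


def upload(local_dir, bucket_uri):
    """Run the copy; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_gsutil_upload(local_dir, bucket_uri), check=True)


# Hypothetical paths, shown without actually running gsutil.
print(" ".join(build_gsutil_upload("./uploads", "gs://example-bucket/")))
```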

On resume gsutil seems to re-upload files

I'm trying to upload data to Google Cloud Storage from a disk with ~3000 files totalling 1TB. I'm using gsutil cp -R <disk-top-directory> <bucket>. My understanding is that, if gsutil is resumed/restarted, it uses checksums to determine when a file has already been uploaded and skips over it.
It doesn't appear to be doing this: it appears to be resuming the upload from the top and replacing the files all over again. When I run successive gsutil ls -Rl <bucket/disk-top-directory> ten minutes apart and compare them with diff, I see what appears to be the same files with the same sizes but a changed (newer) date. (i.e. consistent with the same file being re-uploaded.)
For example:
< 404104811 2014-04-08T14:13:44Z gs://my-bucket/disk-top-directory/dir1/dir2/dir3/dir4/dir5/file-20.tsv.bz2
---
> 404104811 2014-04-08T14:43:48Z gs://my-bucket/disk-top-directory/dir1/dir2/dir3/dir4/dir5/file-20.tsv.bz2
The machine I'm using to read the disk and transfer files is running Ubuntu 13.10. I installed gsutil using the pip instructions for Debian and Ubuntu.
Am I misunderstanding how gsutil's resumable transfers are supposed to work? If not, any diagnosis and fix to get the correct resume behavior? Thanks in advance!
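The comparison described above (same object, same size, newer date) can be automated instead of being read off a diff. This is a minimal sketch, assuming each listing row has the three-field shape shown in the example, with no spaces in object names. The function names parse_listing and reuploaded are hypothetical.

```python
def parse_listing(text):
    """Map object URL -> (size, timestamp) from `gsutil ls -l` style rows."""
    entries = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep only "<size> <timestamp> <gs://...>" rows; skip anything else.
        if len(fields) == 3 and fields[0].isdigit() and fields[2].startswith("gs://"):
            entries[fields[2]] = (int(fields[0]), fields[1])
    return entries


def reuploaded(before, after):
    """Return objects whose size is unchanged but whose timestamp is newer."""
    old, new = parse_listing(before), parse_listing(after)
    # Same-format UTC timestamps compare correctly as plain strings.
    return sorted(
        url
        for url, (size, stamp) in new.items()
        if url in old and old[url][0] == size and old[url][1] < stamp
    )
```

Feeding it the two listings taken ten minutes apart would return the objects that were rewritten between the runs.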
You need to use the -n (No-clobber) switch to prevent the re-uploading of objects that already exist at the destination.
gsutil cp -Rn <disk-top-directory> <bucket>
From the help (gsutil help cp)
-n No-clobber. When specified, existing files or objects at the
destination will not be overwritten. Any items that are skipped
by this option will be reported as being skipped. This option
will perform an additional HEAD request to check if an item
exists before attempting to upload the data. This will save
retransmitting data, but the additional HTTP requests may make
small object transfers slower and more expensive.
Also according to this, when transferring files over 2MB, gsutil automatically uses a resumable transfer mode.
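Going by the help text quoted above, -n decides purely on whether an item already exists at the destination. That decision can be sketched as follows; this is a simplified model rather than gsutil's actual implementation, and the function name plan_no_clobber and the sample names are hypothetical.

```python
def plan_no_clobber(local_names, existing_objects):
    """Split local names into those to upload and those skipped by -n."""
    existing = set(existing_objects)
    # Anything already present at the destination is left untouched.
    to_upload = [name for name in local_names if name not in existing]
    skipped = [name for name in local_names if name in existing]
    return to_upload, skipped


# Hypothetical file names, for illustration only.
to_upload, skipped = plan_no_clobber(
    ["dir1/file-19.tsv.bz2", "dir1/file-20.tsv.bz2", "dir1/file-21.tsv.bz2"],
    ["dir1/file-19.tsv.bz2", "dir1/file-20.tsv.bz2"],
)
print(to_upload)  # ['dir1/file-21.tsv.bz2']
```

Because the check in this model is existence only, an object that exists but is incomplete or stale would also be skipped.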
If you're open to working with the (still beta) gsutil v4, that version of gsutil has an rsync command. You can get this by running:
gsutil update gs://prerelease/gsutil_4.0beta2pre_minus_m_sugg.tar.gz
Please be sure to read the release notes before switching to this major new release, especially if you're using gsutil v3 in scripts.