Gatling: upload simulation.log to S3 bucket from simulation - scala

I would like to upload the simulation.log file of a scenario to an S3 bucket once my simulation is finished.
I was thinking of adding the upload in the after block of my simulation.
I didn't find any example that does that. I only found scripts external to the simulation taking care of it.
Is there any reason why I shouldn't do it?
If not, how can I get the absolute path of the simulation.log?

Unfortunately, the 'after' block executes before the report is generated.
I would advise writing a script that runs after the test.
Go to the directory with the latest report:
cd target/gatling && cd "$(ls -td -- */ | head -n 1 | cut -d'/' -f1)"
Use the AWS CLI to upload it to S3:
aws s3 cp simulation.log s3://my-bucket/
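Putting the two steps together, a minimal post-run script might look like the sketch below (it assumes the default target/gatling results directory and the same bucket name as above):
#!/usr/bin/env bash
# Sketch: upload the simulation.log of the most recent Gatling run once the test has finished.
set -euo pipefail
cd target/gatling
latest_run="$(ls -td -- */ | head -n 1)"   # most recently modified run directory, with trailing slash
aws s3 cp "${latest_run}simulation.log" s3://my-bucket/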

Related

How to download all bucket files (an issue with the gsutil -m flag)

I am trying to copy all files from a Cloud Storage bucket recursively, and as far as I can tell the problem is with the -m flag.
The command that I am running:
gsutil -m cp -r gs://{{ src_bucket }} {{ bucket_backup }}
I am getting something like this:
CommandException: 1 file/object could not be transferred.
where the number of files/objects differs every time.
After investigating, I tried reducing the number of threads/processes used with the -m option, but this has not helped, so I am looking for some advice. I have 170 MiB of data in the bucket, which is approximately 300k files, and I need to download them as fast as possible.
UPD:
Logs with -L flag
[Errno 2] No such file or directory: '<path>/en_.gstmp' -> '<path>/en'
6 errors like that.
The root of the issue might be that both a directory and a file of the same name exist in the GCS bucket. Try executing the command with the -L flag, so you get additional logs of the execution and can find the file that is causing this error.
I would suggest you delete that file, make sure there is no directory of that name in the bucket, and then upload the file to the bucket again.
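For reference, a run with the manifest log enabled might look like this (the manifest file name is just an example; the placeholders are the same as in the question). The manifest records a per-object result, so the failing objects can be identified there:
gsutil -m cp -L cp_manifest.csv -r gs://{{ src_bucket }} {{ bucket_backup }}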
Also check whether any directories were created with the JAR name; delete them and then proceed with copying the files.
Also check whether the required file already exists at the destination; if so, delete the file at the destination and run the copy again.
There are alternatives to cp; for example, it is possible to transfer files using rsync, as described here.
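A sketch of the rsync alternative, reusing the placeholders from the question (gsutil rsync only transfers files that differ between source and destination):
gsutil -m rsync -r gs://{{ src_bucket }} {{ bucket_backup }}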
You can also check similar threads: thread1, thread2, and thread3.

GSUTIL CP using file size

I am trying to copy files from a directory on my Google Compute Engine instance to a Google Cloud Storage bucket. I have it working; however, there are ~35k files but only ~5k have any data in them.
Is there anyway to only copy files above a certain size?
I've not tried this but...
You should be able to do this using a resumable transfer and setting the threshold to 5k (it defaults to 8 MiB). See: https://cloud.google.com/storage/docs/gsutil/commands/cp#resumable-transfers
It may be advisable to set BOTO_CONFIG specifically for this copy, (a) to be intentional and (b) to remind yourself how it works; a sketch follows below. See: https://cloud.google.com/storage/docs/boto-gsutil
Resumable uploads have the added benefit, of course, of resuming if there are any failures.
I recommend trying this on a small subset first and confirming it works to your satisfaction.
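A sketch of that approach on a Linux Compute Engine instance (the config path, the 5000-byte threshold, and the source/bucket names below are placeholders and assumptions, not values taken from the docs):
# Write a one-off boto config that lowers the resumable threshold,
# then point gsutil at it for this copy only.
cat > /tmp/custom.boto <<'EOF'
[GSUtil]
resumable_threshold = 5000
EOF
BOTO_CONFIG=/tmp/custom.boto gsutil -m cp /path/to/files/* gs://my-bucket/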
While it's not possible to do this with gsutil alone, you can do it by filtering the file names and using the -I flag on the cp command to process them. If you're using a Linux Compute Engine instance, you can do it with the du and awk commands:
du * | awk '{if ($1 > 1000) print $2 }' | gsutil -m cp -I gs://bucket2
The command gets the size of each file in the current directory on your Compute Engine instance with du * and only copies to bucket2 the files whose size exceeds 1000 units of du's block size (du reports blocks, typically 1 KiB each, rather than bytes; GNU du supports -b if you want the threshold in bytes). You can change that value to adjust it to your needs; note that the awk filter breaks on filenames containing spaces.
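If you want the threshold in bytes and need to cope with filenames containing spaces, an alternative sketch uses find's -size test instead of du (bucket name reused from the command above):
find . -maxdepth 1 -type f -size +1000c | gsutil -m cp -I gs://bucket2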

AWS S3, Deleting files from local directory after upload

I have backup files in different directories on one drive. Files in those directories can be quite big, up to 800 GB or so. So I have a batch file with a set of scripts that uploads/syncs the files to S3.
See example below:
aws s3 sync R:\DB_Backups3\System s3://usa-daily/System/ --exclude "*" --include "*/*/Diff/*"
The upload time can vary but so far so good.
My question is: how do I edit the script, or create a new one, so that it checks in the S3 bucket that the files have been uploaded, and ONLY if they have been uploaded deletes them from the local drive; if not, it should leave them on the drive?
(Ideally it would check each file)
I'm not familiar with an aws s3 or AWS CLI command that can do that. Please let me know if I haven't made myself clear or if you need more details.
Any help will be very appreciated.
The best option would be to use aws s3 mv with the --recursive parameter for multiple files.
When passed with the parameter --recursive, the following mv command recursively moves all files under a specified directory to a specified bucket and prefix while excluding some files by using an --exclude parameter. In this example, the directory myDir has the files test1.txt and test2.jpg:
aws s3 mv myDir s3://mybucket/ --recursive --exclude "*.jpg"
Output:
move: myDir/test1.txt to s3://mybucket/test1.txt
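Applied to the sync command from the question, a sketch might look like this (untested; note that mv removes the local copies as it uploads, so try it on a small test directory first):
aws s3 mv R:\DB_Backups3\System s3://usa-daily/System/ --recursive --exclude "*" --include "*/*/Diff/*"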
Hope this helps.
As the answer by @ketan shows, the Amazon aws client cannot do a batch move.
You can use the WinSCP put -delete command instead:
winscp.com /log=S3.log /ini=nul /command ^
"open s3://S3KEY:S3SECRET#s3.amazonaws.com/" ^
"put -delete C:\local\path\* /bucket/" ^
"exit"
You need to URL-encode special characters in the credentials. WinSCP GUI can generate an S3 script template, like the one above, for you.
Alternatively, since WinSCP 5.19, you can use -username and -password switches, which do not need any encoding:
"open s3://s3.amazonaws.com/ -username=S3KEY -password=S3SECRET" ^
(I'm the author of WinSCP)

using wget to overwrite file but use temporary filename until full file is received, then rename

I'm using wget in a cron job to fetch a .jpg file into a web server folder once per minute (with the same filename each time, overwriting). This folder is "live", in that the web server also serves that image from there. However, if someone browses to that page while the image is being fetched, the partially written file is treated as a JPEG with errors, and the browser says so. So what I need to do is, similar to when Firefox is downloading a file, have wget write to a temporary file, either in /var or in the destination folder but with a temporary name, until it has the whole thing, and then rename it in an atomic (or at least negligible-duration) step.
I've read the wget man page and there doesn't seem to be a command line option for this. Have I missed it? Or do I need to do two commands in my cron job, a wget and a move?
There is no way to do this purely with GNU Wget.
wget's job is to download files, and it does that. A simple one-line script can achieve what you're looking for:
$ wget -O myfile.jpg.tmp example.com/myfile.jpg && mv myfile.jpg{.tmp,}
Since mv is atomic (at least on Linux, when the source and destination are on the same filesystem), you get an atomic update of the finished file.
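In a crontab, that might look like the following sketch (the URL and web-server paths are placeholders for your own):
* * * * * wget -q -O /var/www/html/live.jpg.tmp https://example.com/camera.jpg && mv /var/www/html/live.jpg.tmp /var/www/html/live.jpg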
Just wanted to share my solution:
alias wget='func(){ (wget --tries=0 --retry-connrefused --timeout=30 -O download_pkg.tmp "$1" && mv download_pkg.tmp "${1##*/}") || rm download_pkg.tmp; unset -f func; }; func'
This creates a function that receives a URL parameter and downloads the file to a temporary name. If the download succeeds, the file is renamed to the correct filename, which is extracted from parameter $1 with ${1##*/}; if it fails, the temp file is deleted. If the operation is aborted, the temp file will be replaced on the next run. Finally, unset -f removes the function definition once the alias has executed.
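If you prefer a plain shell function (for example in your .bashrc) over the alias, an equivalent sketch would be (the name wget_atomic is just an example):
# Download to a temporary name, rename on success, remove the partial file on failure.
wget_atomic() {
    local url="$1"
    local out="${url##*/}"                  # filename portion of the URL
    if wget --tries=0 --retry-connrefused --timeout=30 -O "${out}.tmp" "$url"; then
        mv "${out}.tmp" "$out"              # atomic rename on the same filesystem
    else
        rm -f "${out}.tmp"
    fi
}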

Google Cloud Storage upload files modified today

I am trying to figure out if I can use the gsutil cp command on the Windows platform to upload files to Google Cloud Storage. I have 6 folders on my local computer that get new PDF documents added to them daily. Each folder contains around 2,500 files. All files are currently on Google Storage in their respective folders. Right now I mainly upload all the new files using the Google Cloud Storage Manager. Is there a way to create a batch file and schedule it to run automatically every night so it grabs only files that have been scanned today and uploads them to Google Storage?
I tried this format:
python c:\gsutil\gsutil cp "E:\PIECE POs\64954.pdf" "gs://dompro/piece pos"
and it uploaded the file perfectly fine.
This command
python c:\gsutil\gsutil cp "E:\PIECE POs\*.pdf" "gs://dompro/piece pos"
will upload all of the files into a bucket. But how do I only grab files that were changed or generated today? Is there a way to do it?
One solution would be to use the -n parameter on the gsutil cp command:
python c:\gsutil\gsutil cp -n "E:\PIECE POs\*" "gs://dompro/piece pos/"
That will skip any objects that already exist on the server. You may also want to look at using gsutil's -m flag and see if that speeds the process up for you:
python c:\gsutil\gsutil -m cp -n "E:\PIECE POs\*" "gs://dompro/piece pos/"
Since you have Python available to you, you could write a small Python script to find the ctime (creation time) or mtime (modification time) of each file in a directory, see if that date is today, and upload the file if so. You can see an example in this question, which could be adapted as follows:
import datetime
import os

local_path_to_storage_bucket = [
    ('<local-path-1>', 'gs://bucket1'),
    ('<local-path-2>', 'gs://bucket2'),
    # ... add more here as needed
]

today = datetime.date.today()

for local_path, storage_bucket in local_path_to_storage_bucket:
    for filename in os.listdir(local_path):
        # Build the full path; os.listdir returns bare names only.
        path = os.path.join(local_path, filename)
        ctime = datetime.date.fromtimestamp(os.path.getctime(path))
        mtime = datetime.date.fromtimestamp(os.path.getmtime(path))
        if today in (ctime, mtime):
            # Using the 'subprocess' library would be better, but this is
            # simpler to illustrate the example.
            os.system('gsutil cp "%s" "%s"' % (path, storage_bucket))
Alternatively, consider using the Google Cloud Storage Python API directly instead of shelling out to gsutil.