Is there any way to remove file extensions when copying files with gsutil?
From local 0001:
0001/a/1.jpg
0001/b/2.png
To bucket 0002:
gs://0002/a/1
gs://0002/b/2
(I can remove the extensions locally but I will be losing the Content-Type when copying to GS)
gsutil doesn't have any mechanism for rewriting the file name in this way. You could write a shell loop that iterates over the files and removes the extensions in the file names being copied.
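For example, a minimal sketch of such a loop (the directory layout and bucket name are taken from the question; adjust the glob to your actual tree):
for f in 0001/*/*; do
  rel="${f#0001/}"                      # path relative to the source dir, e.g. a/1.jpg
  gsutil cp "$f" "gs://0002/${rel%.*}"  # drop the extension: a/1.jpg -> a/1
done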
To preserve the Content-Type, here are a couple of suggestions:
Set it explicitly on the command line, e.g.,
gsutil -h Content-Type:image/jpeg cp 0001/a/1.jpg gs://0002/a/1
Use the use_magicfile configuration (in the .boto config file) to cause the Content-Type to be detected by the "file" command. This only works if you're running on Unix or macOS. In this case you'd still use the shell loop to remove the filename extensions, but you wouldn't have to specify the -h Content-Type arg:
gsutil cp 0001/a/1.jpg gs://0002/a/1
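For reference, the corresponding .boto entry would look something like this (assuming the standard [GSUtil] section):
[GSUtil]
use_magicfile = True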
Mike
I have backup files in different directories on one drive. Files in those directories can be quite big, up to 800 GB or so. So I have a batch file with a set of scripts which uploads/syncs the files to S3.
See example below:
aws s3 sync R:\DB_Backups3\System s3://usa-daily/System/ --exclude "*" --include "*/*/Diff/*"
The upload time can vary but so far so good.
My question is, how do I edit the script or create a new one which checks in the S3 bucket that the files have been uploaded and, ONLY if they have been uploaded, deletes them from the local drive; if not, leaves them on the drive?
(Ideally it would check each file)
I'm not familiar with aws s3 or an aws cli command that can do that. Please let me know if I made myself clear or if you need more details.
Any help will be very appreciated.
Best would be to use mv with the --recursive parameter for multiple files.
When passed with the parameter --recursive, the following mv command recursively moves all files under a specified directory to a specified bucket and prefix while excluding some files by using an --exclude parameter. In this example, the directory myDir has the files test1.txt and test2.jpg:
aws s3 mv myDir s3://mybucket/ --recursive --exclude "*.jpg"
Output:
move: myDir/test1.txt to s3://mybucket/test1.txt
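Adapted to the layout from the question (paths taken from the sync command above; untested), the move variant might look like:
aws s3 mv R:\DB_Backups3\System s3://usa-daily/System/ --recursive --exclude "*" --include "*/*/Diff/*"
Because mv deletes each local file only after that file has been uploaded, this should cover the "only delete if uploaded" requirement.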
Hope this helps.
As the answer by @ketan shows, the Amazon aws client cannot do a batch move.
You can use WinSCP put -delete command instead:
winscp.com /log=S3.log /ini=nul /command ^
"open s3://S3KEY:S3SECRET#s3.amazonaws.com/" ^
"put -delete C:\local\path\* /bucket/" ^
"exit"
You need to URL-encode special characters in the credentials. WinSCP GUI can generate an S3 script template, like the one above, for you.
Alternatively, since WinSCP 5.19, you can use -username and -password switches, which do not need any encoding:
"open s3://s3.amazonaws.com/ -username=S3KEY -password=S3SECRET" ^
(I'm the author of WinSCP)
I have this wget command:
sudo wget --user-agent='some-agent' --referer=http://some-referrer.html -N -r -nH --cut-dirs=x --timeout=xxx --directory-prefix=/directory/for/downloaded/files -i list-of-files-to-download.txt
-N will check if there is actually a newer file to download.
-r will turn the recursive retrieving on.
-nH will disable the generation of host-prefixed directories.
--cut-dirs=X will ignore the first X remote directory components when creating local directories.
--timeout=xxx will, well, timeout :)
--directory-prefix will store files in the desired directory.
This works nicely, no problem.
Now, to the issue:
Let's say my list-of-files-to-download.txt has these kinds of files:
http://website/directory1/picture-same-name.jpg
http://website/directory2/picture-same-name.jpg
http://website/directory3/picture-same-name.jpg
etc...
You can see the problem: on the second download, wget will see we already have a picture-same-name.jpg, so it won't download the second or any of the following ones with the same name. I cannot mirror the directory structure because I need all the downloaded files to be in the same directory. I can't use the -O option because it clashes with -N, and I need that. I've tried to use -nd, but it doesn't seem to work for me.
So, ideally, I need to be able to:
a.- wget from a list of URLs the way I do now, keeping my parameters.
b.- get all the files in the same directory and be able to rename each file.
Does anybody have any solution to this?
Thanks in advance.
I would suggest 2 approaches -
Use the "-nc" or the "--no-clobber" option. From the man page -
-nc
--no-clobber
If a file is downloaded more than once in the same directory, Wget's behavior depends on a few options, including -nc. In certain cases, the local file will be clobbered, or overwritten, upon repeated download. In other cases it will be preserved.
When running Wget without -N, -nc, -r, or -p, downloading the same file in the same directory will result in the original copy of file being preserved and the second copy being named file.1. If that file is downloaded yet again, the third copy will be named file.2, and so on. (This is also the behavior with -nd, even if -r or -p are in effect.) When -nc is specified, this behavior is suppressed, and Wget will refuse to download newer copies of file. Therefore, "no-clobber" is actually a misnomer in this mode---it's not clobbering that's prevented (as the numeric suffixes were already preventing clobbering), but rather the multiple version saving that's prevented.
When running Wget with -r or -p, but without -N, -nd, or -nc, re-downloading a file will result in the new copy simply overwriting the old. Adding -nc will prevent this behavior, instead causing the original version to be preserved and any newer copies on the server to be ignored.
When running Wget with -N, with or without -r or -p, the decision as to whether or not to download a newer copy of a file depends on the local and remote timestamp and size of the file. -nc may not be specified at the same time as -N.
A combination with -O/--output-document is only accepted if the given output file does not exist.
Note that when -nc is specified, files with the suffixes .html or .htm will be loaded from the local disk and parsed as if they had been retrieved from the Web.
As you can see from this man page entry, the behavior might be unpredictable/unexpected. You will need to see if it works for you.
Another approach would be to use a bash script. I am most comfortable using bash on *nix, so forgive the platform dependency. However, the logic is sound, and with a bit of modification you can get it to work on other platforms/scripts as well.
Sample pseudocode bash script -
while read -r i; do
    wget <all your flags except the -i flag> "$i" -O /path/to/custom/directory/filename
done < list-of-files-to-download.txt
You can modify the script to download each file to a temporary file, parse $i to get the filename from the URL, check if the file exists on the disk, and then take a decision to rename the temp file to the name that you want.
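A minimal sketch of that idea, reusing the flags from the question where they still apply (the name-mangling scheme and the skip-if-present check are illustrative assumptions; -N, -r and the directory flags are dropped because each file is fetched individually to an explicit -O name):
#!/bin/bash
outdir=/directory/for/downloaded/files
while read -r url; do
    # build a unique local name from the URL path, e.g. directory1/picture-same-name.jpg -> directory1_picture-same-name.jpg
    name=$(printf '%s\n' "$url" | sed 's|^[a-z]*://[^/]*/||; s|/|_|g')
    [ -e "$outdir/$name" ] && continue   # crude stand-in for -N in this flat layout
    wget --user-agent='some-agent' --referer=http://some-referrer.html \
         --timeout=xxx -O "$outdir/$name" "$url"
done < list-of-files-to-download.txt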
This offers much more control over your downloads.
This should work on my CentOS 6.6 but somehow the file name is not changed. What am I missing here?
rename -f 's/silly//' sillytest.zi
This should rename sillytest.zi to test.zi, but the name is not changed. Of course I can use the mv command, but I want to apply this to many files and patterns.
There are two different rename utilities commonly used on GNU/Linux systems.
util-linux version
On Red Hat-based systems (such as CentOS), rename is a compiled executable provided by the util-linux package. It’s a simple program with very simple usage (from the relevant man page):
rename from to file...
rename will rename the specified files by replacing the first occurrence of from in their name by to.
Newer versions also support a useful -v, --verbose option.
NB: If a file already exists whose name coincides with the new name of the file being renamed, then this rename command will silently (without warning) over-write the pre-existing file.
Example
Fix the extension of HTML files so that all .htm files have a four-letter .html suffix:
rename .htm .html *.htm
Example from question
To rename sillytest.zi to test.zi, replace silly with an empty string:
rename silly '' sillytest.zi
Perl version
On Debian-based systems, rename is a Perl script which is much more capable, as you get the benefit of Perl's rich set of regular expressions.
Its usage is (from its man page):
rename [ -v ] [ -n ] [ -f ] perlexpr [ files ]
rename renames the filenames supplied according to the rule specified as the first argument.
This rename command also includes a -v, --verbose option. Equally useful is its -n, --no-act option, which can be used as a dry run to see which files would be renamed. Also, it won't overwrite pre-existing files unless the -f, --force option is used.
Example
Fix the extension of HTML files:
rename 's/\.htm$/.html/' *.htm
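Example from question
With the Perl version, the command from the question should work as-is; you can preview the result with -n first:
rename -n 's/silly//' sillytest.zi
rename 's/silly//' sillytest.zi
The first line only shows what would be renamed; the second actually renames sillytest.zi to test.zi.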
I want to copy a particular file using Makefile and then make this file executable. How can this be done?
The file I want to copy is a .pl file.
For copying I am using the general cp -rp command. This is done successfully. But now I want to make this file executable using the Makefile.
It's bad practice to use cp and chmod; instead, use the install command.
all:
	install -m 0777 hello ../hello
You can use the -m option with install to set the permission mode, and note that install can also set the owner and group of the file (via -o and -g), not just the permissions.
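Applied to the .pl file from the question (the file name and destination path here are only illustrative), the rule might look like:
all:
	install -m 0755 myscript.pl ../bin/myscript.pl
Here 0755 makes the script executable for everyone while keeping it writable only by the owner.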
You can still use chmod accordingly, but it would be a bad practice:
all:
	cp hello ../hello
	chmod +x ../hello
Update: install vs cp
cp simply copies files with their current permissions; install not only copies, but can also change permissions/ownership via its flags. (This is what your requirement was.)
One significant difference is that cp truncates the destination file and starts copying data from the source into the destination file. install, on the other hand, removes the destination file first.
This is significant because if the destination file is already in use, bad things could happen to whoever is using that file when you cp a new file on top of it. For example, overwriting an executable that is running might fail. Truncating a data file that an existing process is busy reading/writing could cause pretty weird behavior. If you just remove the destination file first, as install does, things continue much as normal: the removed file isn't actually removed until all processes close it. [source]
For more details, check these:
install vs. cp; and mmap
How is install -c different from cp
I have created an AppleScript that mounts a network SMB share, creates folders if they don't exist, then copies files to these new folders.
I am using:
duplicate items of folder <source> to <destination> with replacing
This will copy over and replace all the files. Is there a way to only duplicate newer files?
Should I be using rsync rather than duplicate?
I'd definitely use rsync, possibly with the -a flag (the archive option; it works recursively along with other mirroring options; check the man page for the options that best fit your case).
rsync -a (source) (destination)
Call it from AppleScript using the do shell script command, making sure you pass in POSIX paths.
eg,
set source_path to quoted form of POSIX path of source
set dest_path to quoted form of POSIX path of destination
do shell script "rsync -a " & source_path & " " & dest_path