Copying a file with gsutil while appending a Datetime - google-cloud-storage

I need to copy a gs file named "myfile.csv" to another file "myfile_[datetime].csv" where datetime is the date and time the operation occurred.
I was wondering how to do it with gsutil.

As mentioned in the comment above:
gsutil cannot do that for you. Depending on the operating system, you can use command-line argument expansion/replacement, shell scripts and/or command-line tools to generate the desired filename format.
Posting OP's solution as an answer:
#!/bin/bash
dt=$(date '+%Y-%m-%d-%H-%M-%S')
gsutil cp gs://myfolder/myfile.csv gs://myfolder/myfile2_$dt.csv
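A minimal variant of the same idea that keeps the original base name and inserts the timestamp before the extension (a sketch assuming bash parameter expansion; the path is the one from the question):
src=gs://myfolder/myfile.csv
dt=$(date '+%Y-%m-%d-%H-%M-%S')
gsutil cp "$src" "${src%.csv}_$dt.csv"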

Related

Can we wget with file list and renaming destination files?

I have this wget command:
sudo wget --user-agent='some-agent' --referer=http://some-referrer.html -N -r -nH --cut-dirs=x --timeout=xxx --directory-prefix=/directory/for/downloaded/files -i list-of-files-to-download.txt
-N will check if there is actually a newer file to download.
-r will turn the recursive retrieving on.
-nH will disable the generation of host-prefixed directories.
--cut-dirs=X will avoid the generation of the host's subdirectories.
--timeout=xxx will, well, timeout :)
--directory-prefix will store files in the desired directory.
This works nicely, no problem.
Now, to the issue:
Let's say my files-to-download.txt has these kinds of URLs:
http://website/directory1/picture-same-name.jpg
http://website/directory2/picture-same-name.jpg
http://website/directory3/picture-same-name.jpg
etc...
You can see the problem: on the second download, wget will see that we already have a picture-same-name.jpg, so it won't download the second one or any of the following ones with the same name. I cannot mirror the directory structure because I need all the downloaded files to be in the same directory. I can't use the -O option because it clashes with -N, and I need that. I've tried to use -nd, but it doesn't seem to work for me.
So, ideally, I need to be able to:
a.- wget from a list of URLs the way I do now, keeping my parameters.
b.- get all files in the same directory and be able to rename each file.
Does anybody have any solution to this?
Thanks in advance.
I would suggest 2 approaches -
Use the "-nc" or the "--no-clobber" option. From the man page -
-nc
--no-clobber
If a file is downloaded more than once in the same directory, Wget's behavior depends on a few options, including -nc. In certain cases, the local file will be clobbered, or overwritten, upon repeated download. In other cases it will be preserved.
When running Wget without -N, -nc, -r, or -p, downloading the same file in the same directory will result in the original copy of file being preserved and the second copy being named file.1. If that file is downloaded yet again, the third copy will be named file.2, and so on. (This is also the behavior with -nd, even if -r or -p are in effect.) When -nc is specified, this behavior is suppressed, and Wget will refuse to download newer copies of file. Therefore, "no-clobber" is actually a misnomer in this mode---it's not clobbering that's prevented (as the numeric suffixes were already preventing clobbering), but rather the multiple version saving that's prevented.
When running Wget with -r or -p, but without -N, -nd, or -nc, re-downloading a file will result in the new copy simply overwriting the old. Adding -nc will prevent this behavior, instead causing the original version to be preserved and any newer copies on the server to be ignored.
When running Wget with -N, with or without -r or -p, the decision as to whether or not to download a newer copy of a file depends on the local and remote timestamp and size of the file. -nc may not be specified at the same time as -N.
A combination with -O/--output-document is only accepted if the given output file does not exist.
Note that when -nc is specified, files with the suffixes .html or .htm will be loaded from the local disk and parsed as if they had been retrieved from the Web.
As you can see from this man page entry, the behavior might be unpredictable/unexpected. You will need to see if it works for you.
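For reference, this is roughly what the first approach would look like applied to your original command, with -nc substituted for -N (the man page above notes the two cannot be combined):
sudo wget --user-agent='some-agent' --referer=http://some-referrer.html -nc -r -nH --cut-dirs=x --timeout=xxx --directory-prefix=/directory/for/downloaded/files -i list-of-files-to-download.txt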
Another approach would be to use a bash script. I am most comfortable using bash on *nix, so forgive the platform dependency. However, the logic is sound, and with a bit of modification you can get it to work on other platforms/shells as well.
Sample pseudocode bash script -
for i in $(cat list-of-files-to-download.txt); do
    wget <all your flags except the -i flag> "$i" -O /path/to/custom/directory/filename
done
You can modify the script to download each file to a temporary file, parse $i to get the filename from the URL, check if the file exists on the disk, and then take a decision to rename the temp file to the name that you want.
This offers much more control over your downloads.
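For example, a rough sketch of that idea in bash (the flag placeholder and the target directory are illustrative; the explicit existence check stands in for -N, since -N and -O cannot be combined):
while read -r url; do
    # Build a unique local name from the URL path, e.g.
    # http://website/directory1/picture-same-name.jpg -> directory1_picture-same-name.jpg
    path=${url#*://*/}
    name=${path//\//_}
    if [ ! -e "/path/to/custom/directory/$name" ]; then
        wget <all your flags except -i, -O and -N> "$url" -O "/path/to/custom/directory/$name"
    fi
done < list-of-files-to-download.txt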

Keep original documents' dates with PSFTP

I have downloaded some files with PSFTP from a SQL Server. The problem is that PSFTP changes the dates of creation/update and last modified of the files when downloading them in a local folder. For me it is important to keep the original dates. Is there any command to set/change it? Thanks
This is the script of the batch file:
psftp.exe user@host -i xxx.ppk -b abc.scr
This is the script of the SCR file:
cd /path remote folder
lcd path local folder
mget *.csv
exit
I'm not familiar with PSFTP and after looking at the docs I don't see any option to do this. However, you can use the -p flag of pscp to preserve dates and times.
See docs here.
(note that it's a lowercase p; the uppercase -P is for specifying the port)
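For example, a hedged equivalent of the batch download above using pscp, assuming the same key file and placeholder paths from the question:
pscp -p -i xxx.ppk "user@host:/path remote folder/*.csv" "path local folder"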

Google Cloud Storage upload files modified today

I am trying to figure out if I can use the cp command of gsutil on the Windows platform to upload files to Google Cloud Storage. I have 6 folders on my local computer that get daily new pdf documents added to them. Each folder contains around 2,500 files. All files are currently on google storage in their respective folders. Right now I mainly upload all the new files using Google Cloud Storage Manager. Is there a way to create a batch file and schedule to run it automatically every night so it grabs only files that have been scanned today and uploads it to Google Storage?
I tried this format:
python c:\gsutil\gsutil cp "E:\PIECE POs\64954.pdf" "gs://dompro/piece pos"
and it uploaded the file perfectly fine.
This command
python c:\gsutil\gsutil cp "E:\PIECE POs\*.pdf" "gs://dompro/piece pos"
will upload all of the files into a bucket. But how do I only grab files that were changed or generated today? Is there a way to do it?
One solution would be to use the -n parameter on the gsutil cp command:
python c:\gsutil\gsutil cp -n "E:\PIECE POs\*" "gs://dompro/piece pos/"
That will skip any objects that already exist on the server. You may also want to look at using gsutil's -m flag and see if that speeds the process up for you:
python c:\gsutil\gsutil -m cp -n "E:\PIECE POs\*" "gs://dompro/piece pos/"
Since you have Python available to you, you could write a small Python script to find the ctime (creation time) or mtime (modification time) of each file in a directory, see if that date is today, and upload it if so. You can see an example in this question which could be adapted as follows:
import datetime
import os

local_path_to_storage_bucket = [
    ('<local-path-1>', 'gs://bucket1'),
    ('<local-path-2>', 'gs://bucket2'),
    # ... add more here as needed
]

today = datetime.date.today()

for local_path, storage_bucket in local_path_to_storage_bucket:
    for filename in os.listdir(local_path):
        # Build the full path; os.listdir() returns bare file names.
        full_path = os.path.join(local_path, filename)
        ctime = datetime.date.fromtimestamp(os.path.getctime(full_path))
        mtime = datetime.date.fromtimestamp(os.path.getmtime(full_path))
        if today in (ctime, mtime):
            # Using the 'subprocess' library would be better, but this is
            # simpler to illustrate the example.
            os.system('gsutil cp "%s" "%s"' % (full_path, storage_bucket))
Alternatively, consider using the Google Cloud Storage Python API directly instead of shelling out to gsutil.

Linux zip command - adding date elements to file name

Occasionally I run a backup of my phpBB forum files from the shell command line:
zip -r forum_backup ~/public_html/forum/*
I'd like to add date elements to the file name, so that the zip file created is automatically formed as
forum_backup_05182013.zip
any other similar current date format would also be acceptable
now=$(date +"%m%d%Y")
zip -r forum_backup_$now ~/public_html/forum/
Without defining a variable first you can do it in one line with
zip -r "forum_backup_$(date +"%Y-%m-%d").zip" filelist
As taken from here
The following also works; change the date format as you want:
FORMAT="%Y%m%d"
_DATE=$(date +"$FORMAT")
zip -r "forum_backup_${_DATE}" ~/public_html/forum/*

Remove file extensions with gsutil

Is there any way to remove file extensions when copying files with gsutil?
From local 0001:
0001/a/1.jpg
0001/b/2.png
To bucket 0002:
gs://0002/a/1
gs://0002/b/2
(I can remove the extensions locally but I will be losing the Content-Type when copying to GS)
gsutil doesn't have any mechanism for rewriting the file name in this way. You could write a shell loop that iterates over the files and removes the extensions in the file names being copied.
To preserve the Content-Type here are a couple of suggestions:
Set it explicitly on the command line, e.g.:
gsutil -h Content-Type:image/jpeg cp 0001/a/1.jpg gs://0002/a/1
Use the use_magicfile configuration option (in the .boto config file) to have the Content-Type detected by the "file" command. This only works if you're running on Unix or macOS. In this case you'd still use the shell loop to remove the filename extensions, but you wouldn't have to specify the -h Content-Type argument:
gsutil cp 0001/a/1.jpg gs://0002/a/1
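Putting the two suggestions together, a rough sketch of such a loop (assuming bash, the directory layout from the question, and a Unix-like system where file(1) is available for Content-Type detection):
for f in 0001/*/*.*; do
    dest="${f#0001/}"                  # drop the local "0001/" prefix
    dest="${dest%.*}"                  # strip the file extension
    type=$(file -b --mime-type "$f")   # e.g. image/jpeg
    gsutil -h "Content-Type:$type" cp "$f" "gs://0002/$dest"
done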
Mike