I have created an AppleScript that mounts a network SMB share, creates folders if they don't exist, and then copies files into these new folders.
I am using:
duplicate items of folder <source> to <destination> with replacing
This will copy over and replace all the files. Is there a way to only duplicate newer files?
Should I be using rsync rather than duplicate?
I'd definitely use rsync, probably with the -a flag (the archive option: it works recursively and turns on the other usual mirroring options; check the man page for the options that suit your case best):
rsync -a (source) (destination)
Call it from AppleScript using the do shell script command, making sure you pass POSIX paths, e.g.:
set source_path to quoted form of POSIX path of source
set dest_path to quoted form of POSIX path of destination
do shell script "rsync -a " & source_path & " " & dest_path
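If you specifically want to skip anything that is already newer at the destination, rsync also has -u (--update) on top of -a. A minimal sketch of the resulting shell command, with placeholder paths:
# -a recurses and preserves times/permissions; -u skips files that are
# newer on the destination side; the trailing slash on the source copies
# its contents rather than the folder itself (paths are placeholders)
rsync -au /Users/me/source/ /Volumes/share/destination/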
Some IDEs support a feature usually called a "master filelist": the user provides a simple text file listing all the files of a project, and the IDE only parses the listed files.
Is this possible with a vscode workspace? Note that I am aware of the "Exclude" feature of vscode, but it is not convenient for my use case.
Thanks.
After trying many methods (all in vain), I came up with the following workaround: make symlinks to all files in the master filelist.
Suppose that the files in the filelist (${ABS_INCLUDE}) are given as absolute paths, and suppose they share a root directory (${ABS_ROOT_DIR}; this can always be arranged). Then first create a dedicated root directory (${SYM_ROOT_DIR}) for the vscode workspace, and then create a symlink for each file under the new root directory, e.g.,
mkdir -p "${SYM_ROOT_DIR}"
while IFS= read -r line
do
    OLD_DIR=$(dirname "$line")
    BASENAME=$(basename "$line")
    # map the original directory into the new symlink root
    SYM_DIR=$(echo "${OLD_DIR}" | sed "s#${ABS_ROOT_DIR}#${SYM_ROOT_DIR}#")
    mkdir -p "${SYM_DIR}"
    ln -s "${line}" "${SYM_DIR}/${BASENAME}"
done < "${ABS_INCLUDE}"
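For example, with hypothetical values for the three variables (substitute your own filelist and directories), the loop above would be driven like this, and ${SYM_ROOT_DIR} is then opened as the vscode workspace folder:
# hypothetical paths -- adjust to your own project layout
ABS_INCLUDE=/home/user/project/filelist.txt
ABS_ROOT_DIR=/home/user/project
SYM_ROOT_DIR=/home/user/project-vscode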
I have this wget command:
sudo wget --user-agent='some-agent' --referer=http://some-referrer.html -N -r -nH --cut-dirs=x --timeout=xxx --directory-prefix=/directory/for/downloaded/files -i list-of-files-to-download.txt
-N will check if there is actually a newer file to download.
-r will turn the recursive retrieving on.
-nH will disable the generation of host-prefixed directories.
--cut-dirs=X will skip the creation of X levels of remote subdirectories locally.
--timeout=xxx will, well, timeout :)
--directory-prefix will store files in the desired directory.
This works nicely, no problem.
Now, to the issue:
Let's say my list-of-files-to-download.txt has files like these:
http://website/directory1/picture-same-name.jpg
http://website/directory2/picture-same-name.jpg
http://website/directory3/picture-same-name.jpg
etc...
You can see the problem: on the second download, wget will see we already have a picture-same-name.jpg, so it won't download the second or any of the following ones with the same name. I cannot mirror the directory structure because I need all the downloaded files to be in the same directory. I can't use the -O option because it clashes with -N, and I need that. I've tried to use -nd, but it doesn't seem to work for me.
So, ideally, I need to be able to:
a. wget from a list of URLs the way I do now, keeping my parameters.
b. Get all the files into the same directory and be able to rename each file.
Does anybody have any solution to this?
Thanks in advance.
I would suggest two approaches.
Use the -nc or --no-clobber option. From the man page:
-nc
--no-clobber
If a file is downloaded more than once in the same directory, Wget's behavior depends on a few options, including -nc. In certain cases, the local file will be clobbered, or overwritten, upon repeated download. In other cases it will be preserved.
When running Wget without -N, -nc, -r, or -p, downloading the same file in the same directory will result in the original copy of file being preserved and the second copy being named file.1. If that file is downloaded yet again, the third copy will be named file.2, and so on. (This is also the behavior with -nd, even if -r or -p are in effect.) When -nc is specified, this behavior is suppressed, and Wget will refuse to download newer copies of file. Therefore, "no-clobber" is actually a misnomer in this mode---it's not clobbering that's prevented (as the numeric suffixes were already preventing clobbering), but rather the multiple version saving that's prevented.
When running Wget with -r or -p, but without -N, -nd, or -nc, re-downloading a file will result in the new copy simply overwriting the old. Adding -nc will prevent this behavior, instead causing the original version to be preserved and any newer copies on the server to be ignored.
When running Wget with -N, with or without -r or -p, the decision as to whether or not to download a newer copy of a file depends on the local and remote timestamp and size of the file. -nc may not be specified at the same time as -N.
A combination with -O/--output-document is only accepted if the given output file does not exist.
Note that when -nc is specified, files with the suffixes .html or .htm will be loaded from the local disk and parsed as if they had been retrieved from the Web.
As you can see from this man page entry, the behavior might be unpredictable/unexpected. You will need to see if it works for you.
Another approach would be to use a bash script. I am most comfortable using bash on *nix, so forgive the platform dependency. However, the logic is sound, and with a few modifications you can get it to work on other platforms/scripts as well.
Sample pseudocode bash script -
for i in $(cat list-of-files-to-download.txt); do
    wget <all your flags except the -i flag> "$i" -O /path/to/custom/directory/filename
done
You can modify the script to download each file to a temporary file, parse $i to get the filename from the URL, check if the file exists on the disk, and then take a decision to rename the temp file to the name that you want.
This offers much more control over your downloads.
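A minimal sketch of that idea, reusing the directory prefix and list file from your original command; the "directoryname_filename" renaming scheme is just an example, and since -O and -N cannot be combined it uses a plain content comparison instead of timestamping:
while IFS= read -r url; do
    name=$(basename "$url")              # e.g. picture-same-name.jpg
    dir=$(basename "$(dirname "$url")")  # e.g. directory1
    target="/directory/for/downloaded/files/${dir}_${name}"
    tmp=$(mktemp)
    # add your --user-agent/--referer/--timeout flags here
    wget -q -O "$tmp" "$url"
    # keep the download unless an identical copy is already on disk
    if [ ! -e "$target" ] || ! cmp -s "$tmp" "$target"; then
        mv "$tmp" "$target"
    else
        rm -f "$tmp"
    fi
done < list-of-files-to-download.txt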
I have downloaded some files with PSFTP from a SQL Server. The problem is that PSFTP changes the creation and last-modified dates of the files when downloading them into a local folder. For me it is important to keep the original dates. Is there any command to set/change them? Thanks
This is the script of the batch file
psftp.exe user@host -i xxx.ppk -b abc.scr
This is the script of the SCR file
cd /path remote folder
lcd path local folder
mget *.csv
exit
I'm not familiar with PSFTP and after looking at the docs I don't see any option to do this. However, you can use the -p flag of pscp to preserve dates and times.
See docs here.
(note it's a lowercase -p; the uppercase -P is for specifying the port)
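A sketch of the equivalent pscp download, reusing the key file from your batch script (the remote and local paths are placeholders):
pscp -i xxx.ppk -p "user@host:/path/to/remote/folder/*.csv" /path/to/local/folder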
My company has a local production server from which I want to download files that follow a certain naming convention. However, I would like to exclude certain elements based on a portion of the name. Example:
folder client_1234
file 1234.jpg
file 1234.ai
file 1234.xml
folder client_1234569
When wget is run I want it to bypass all folders and files containing "1234". I have researched and run across '--exclude list', but that appears to be only for directories, and 'reject = rejlist', which appears to be for file extensions. Am I missing something in the manual here?
EDIT:
This should work.
wget has options -A <accept_list> and -R <reject_list>, which from the manual page, appear to allow either suffixes or patterns. These are separate from the -I <include_dirs> and -X <exclude_dirs> options, which, as you note, only deal with directories. Given the example you list, something along the lines of -A "folder client_1234*" -A "file 1234.*" might be what you need, although I'm not entirely sure that's exactly the naming convention you're after...
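For instance, if the intent really is to reject anything whose name contains 1234, a rough sketch (the URL is a placeholder; both -X and -R accept wildcard patterns):
# -X skips matching directories, -R rejects matching file names
wget -r -nH -X "*1234*" -R "*1234*" http://local-production-server/clients/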
Is there any way to remove file extensions when copying files with gsutil?
From local 0001:
0001/a/1.jpg
0001/b/2.png
To bucket 0002:
gs://0002/a/1
gs://0002/b/2
(I can remove the extensions locally, but then I would lose the Content-Type when copying to GS)
gsutil doesn't have any mechanism for rewriting the file name in this way. You could write a shell loop that iterates over the files and removes the extensions in the file names being copied.
To preserve the Content-Type here are a couple of suggestions:
Set it explicitly on the command line, e.g.,
gsutil -h Content-Type:image/jpeg cp 0001/a/1.jpg gs://0002/a/1
Use the use_magicfile configuration (in the .boto config file), to cause the Content-Type to be detected by the "file" command. This only works if you're running on Unix or MacOS. In this case you'd still use the shell script to remove the filename extensions, but you wouldn't have to specify the -h Content-Type arg:
gsutil cp 0001/a/1.jpg gs://0002/a/1
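A minimal sketch of the shell loop mentioned above, assuming the local layout from the question and the gs://0002 destination bucket; it strips the extension from the object name and sets the Content-Type explicitly via the file command (mirroring what the use_magicfile option does):
for f in 0001/*/*; do
  dest="${f#0001/}"                    # e.g. a/1.jpg
  dest="${dest%.*}"                    # e.g. a/1
  ctype=$(file -b --mime-type "$f")    # e.g. image/jpeg
  gsutil -h "Content-Type:${ctype}" cp "$f" "gs://0002/${dest}"
done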
Mike