rsync using both --update and --append - append

I have a problem using rsync. Actually I want to synchronize 2 folders, one on the server (192.168.1.5) and one on my pc, the rsync command that I use is:
rsync -rvzru --exclude=._* xxx#192.168.1.5::Folder /Users/blabla/boh/received
It works fine, if a file has been modified on my server it modifies it also on my pc, skipping all the others. Now I want to introduce the resume feature, so I add --append like this:
rsync -rvzru --exclude=._* --append xxx#192.168.1.5::Folder /Users/blabla/boh/received
I also tried with --partial, but it doesn't work at all, it skips the uncompleted files while updates all the others if there was a modification. If I remove -U from the command the resume works fine, but if I modify a file on the server, I can't see the modification on my PC.
rsync -rvzr --exclude=._* --append xxx#192.168.1.5::Folder /Users/blabla/boh/received
There's a way to let this work both resuming and updating new files? Thanks

Related

What am I screwing up trying to download particular file types with wget?

I am attempting to regularly archive a few file types hosted on a community website where our admin has been MIA for years, in case he dies or just stops paying for the hosting.
I am able to download all of the files I need using wget -r -np -nd -e robots=off -l 0 URL but this leaves me with about 60,000 extra files to waste time both downloading and deleting.
I am really only looking for files with the extensions "tbt" and "zip". When I add in -A tbt,zip to the input, wget then only downloads a single file, "index.html.tmp". It immediately deletes this file because it doesn't match the file type specified, and then the process stops entirely, with wget announcing that it is finished. It does not attempt to download any of the other files that it grabs when the -A flag is not included.
What am I doing wrong? Why does specifying file types in the way that I did cause it to finish after only looking at one file?
Possibly you're hitting the same problem I've hit when trying to do something similar. When using --accept, wget determines whether a links refers to a file or directory based on whether or not it ends with a /.
For example, say I have a directory named files, and a web page that has:
Lots o' files!
If I were to request this with wget -r, then I wget would happily GET /files, see that it was an HTML document containing a bunch of links, and continue to download those links.
However, if I add -A zip to my command line, and run wget with --debug, I see:
appending ‘http://localhost:8080/files’ to urlpos.
[...]
Deciding whether to enqueue "http://localhost:8080/files".
http://localhost:8080/files (files) does not match acc/rej rules.
Decided NOT to load it.
In other words, wget thinks this is a file (no trailing /) and it doesn't match our acceptance criteria, so it gets rejected.
If I modify the remote file so that it looks like...
Lots o' files!
...then wget will follow the link and download files as desired.
I don't think there's a great solution to this problem if you need to use wget. As I mentioned in my comment, there are other tools available that may handle this situation more gracefully.
It's also possible you're experiencing a different issue; the output of adding --debug to your command line clarify things in that case.
I also experienced this issue, on a page where all the download links looked something like this: filedownload.ashx?name=file.mp3. The solution was to match for both the linked file, and the downloaded file. So my wget accept flag looked like this: -A 'ashx,mp3'. I also used the --trust-server-names flag. This catches all the .ashx that are linked in the webpage, then when wget does the second check, all the mp3 files that were downloaded will stay.
As an alternative to --trust-server-names, you may also find the --content-disposition flag helpful. Both flags help rename the file that gets downloaded from filedownload.ashx?name=file.mp3 to just file.mp3.

Prevent downtime using lftp mirror

I'm using lftp to deploy a website via Travis CI. There is a build process before the deployment, for that reason a build directory is present and pushed to the root of the ftp server.
lftp $FTP_URL -e "glob -d mirror build . --reverse --delete-first --parallel=10 && exit"
It works quite well, but I dislike to have a downtime / temporary PHP parse errors because of missing files on my website. What is the best way to work arround that issue?
My first approach was an option to set a temporary directory, but the lftp man page says there is only a options for temporary files. I still tried the option but it didn't help.
My second approach was to use "mirror build temp" to use a temporary folder and then replace the root with it. The problem here is, that I cannot exclude the temp folder while deleting the old files and folders like rm -rf *.
For small changes not involving adding/removing php files set xfer:use-temp-file should be sufficient. Also don't use --remove-first, as it causes lftp to delete obsolete files before uploading.
For larger changes I'd create a separate directory for each version of the site and redirect the web server to the directory using .htaccess mod_rewrite or some other configuration file. This technique will allow atomic switch to the new version (and back if needed). Besides, you will be able to do final pre-production testing of the new version if you redirect to the new version conditionally based on your IP address or using some other rule.
If you don't want to re-upload whole site for each new version and the FTP server supports FXP with itself, then you can copy old version to a new directory using mirror old_directory ftp://user#example.com/new_directory, then update the new directory using mirror -eR local_dir new_directory.
This is a zero downtown pattern - each placeholder should be replaced:
lftp $FTP_URL -e "mirror {SOURCE} {TARGET}-new-{TIMESTAMP} --reverse --delete-first;
mv {TARGET} {TARGET}-old-{TIMESTAMP};
mv {TARGET}-new-{TIMESTAMP} {TARGET};
rm -rf {TARGET}-old-{TIMESTAMP};
exit"

Can we wget with file list and renaming destination files?

I have this wget command:
sudo wget --user-agent='some-agent' --referer=http://some-referrer.html -N -r -nH --cut-dirs=x --timeout=xxx --directory-prefix=/directory/for/downloaded/files -i list-of-files-to-download.txt
-N will check if there is actually a newer file to download.
-r will turn the recursive retrieving on.
-nH will disable the generation of host-prefixed directories.
--cut-dirs=X will avoid the generation of the host's subdirectories.
--timeout=xxx will, well, timeout :)
--directory-prefix will store files in the desired directorty.
This works nice, no problem.
Now, to the issue:
Let's say my files-to-download.txt has these kind of files:
http://website/directory1/picture-same-name.jpg
http://website/directory2/picture-same-name.jpg
http://website/directory3/picture-same-name.jpg
etc...
You can see the problem: on the second download, wget will see we already have a picture-same-name.jpg, so it won't download the second or any of the following ones with the same name. I cannot mirror the directory structure because I need all the downloaded files to be in the same directory. I can't use the -O option because it clashes with --N, and I need that. I've tried to use -nd, but doesn't seem to work for me.
So, ideally, I need to be able to:
a.- wget from a list of url's the way I do now, keeping my parameters.
b.- get all files at the same directory and being able to rename each file.
Does anybody have any solution to this?
Thanks in advance.
I would suggest 2 approaches -
Use the "-nc" or the "--no-clobber" option. From the man page -
-nc
--no-clobber
If a file is downloaded more than once in the same directory, >Wget's behavior depends on a few options, including -nc. In certain >cases, the local file will be
clobbered, or overwritten, upon repeated download. In other >cases it will be preserved.
When running Wget without -N, -nc, -r, or -p, downloading the >same file in the same directory will result in the original copy of file >being preserved and the second copy
being named file.1. If that file is downloaded yet again, the >third copy will be named file.2, and so on. (This is also the behavior >with -nd, even if -r or -p are in
effect.) When -nc is specified, this behavior is suppressed, >and Wget will refuse to download newer copies of file. Therefore, ""no->clobber"" is actually a misnomer in
this mode---it's not clobbering that's prevented (as the >numeric suffixes were already preventing clobbering), but rather the >multiple version saving that's prevented.
When running Wget with -r or -p, but without -N, -nd, or -nc, >re-downloading a file will result in the new copy simply overwriting the >old. Adding -nc will prevent this
behavior, instead causing the original version to be preserved >and any newer copies on the server to be ignored.
When running Wget with -N, with or without -r or -p, the >decision as to whether or not to download a newer copy of a file depends >on the local and remote timestamp and
size of the file. -nc may not be specified at the same time as >-N.
A combination with -O/--output-document is only accepted if the >given output file does not exist.
Note that when -nc is specified, files with the suffixes .html >or .htm will be loaded from the local disk and parsed as if they had been >retrieved from the Web.
As you can see from this man page entry, the behavior might be unpredictable/unexpected. You will need to see if it works for you.
Another approach would be to use a bash script. I am most comfortable using bash on *nix, so forgive the platform dependency. However the logic is sound, and with a bit of modifications, you can get it to work on other platforms/scripts as well.
Sample pseudocode bash script -
for i in `cat list-of-files-to-download.txt`;
do
wget <all your flags except the -i flag> $i -O /path/to/custom/directory/filename ;
done ;
You can modify the script to download each file to a temporary file, parse $i to get the filename from the URL, check if the file exists on the disk, and then take a decision to rename the temp file to the name that you want.
This offers much more control over your downloads.

How to make a file executable using Makefile

I want to copy a particular file using Makefile and then make this file executable. How can this be done?
The file I want to copy is a .pl file.
For copying I am using the general cp -rp command. This is done successfully. But now I want to make this file executable using Makefile
Its a bad practice to use cp and chmod, instead use install command.
all:
install -m 0777 hello ../hello
You can use -m option with install to set the permission mode, and even note that by using the install you will preserve not only the permission but also the owner of the file.
You can still use chmod accordingly but it would be a bad practice
all:
cp hello ../hello
chmod +x ../hello
Update: install vs cp
cp would simply copy files with current permissions, install not only copies, but also can change perms/ownership as arg flags. (This is what your requirement was)
One significant difference is that cp truncates the destination file and starts copying data from the source into the destination file. install, on the other hand, removes the destination file first.
This is significant because if the destination file is already in use, bad things could happen to whomever is using that file in case you cp a new file on top of it. e.g. overwriting an executable that is running might fail. Truncating a data file that an existing process is busy reading/writing to could cause pretty weird behavior. If you just remove the destination file first, as install does, things continue much like normal - the removed file isn't actually removed until all processes close that file.[source]
For more details check these,
install vs. cp; and mmap
How is install -c different from cp

Ctools do not show up in pentaho UI

I am using Pentaho CE 5 on windows. I would like to use CTools but I can't make them show up in the File -> New menu to use them.
Being behind a proxy, I can not use the Marketplace plugin, so I have tried a manual installation.
First, I tried to use the ctools-installer.sh. I have run the following command line in cygwin (wget and unzip are installed):
./ctools-installer.sh -s /cygdrive/d/Users/[user]/Mes\ Programmes/pentaho/biserver-ce/pentaho-solutions/ -w /cygdrive/d/Users/[user]/Mes\ programmes/pentaho/biserver-ce/tomcat/webapps/pentaho/
The script starts, asks me what module I want to install, and begins the downloads.
For each module, I get an output like (set -x added to the script) :
echo -n 'Downloading CDF...' Downloading CDF...+ wget -q --no-check-certificate 'http://ci.analytical-labs.com/job/Webdetails-CDF-5-Release/lastSuccessfulBuild/artifact/bi-platform-v2-plugin/dist/zip/dist.zip'
-O .tmp/cdf/dist.zip SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc syswgetrc = C:\Program Files (x86)\GnuWin32/etc/wgetrc
'[' '!' -z '' ']'
rm -f .tmp/dist/marketplace.xml
unzip -o .tmp/cdf/dist.zip -d .tmp End-of-central-directory signature not found. Either this file is not a zipfile, or it
constitutes one disk of a multi-part archive. In the latter case
the central directory and zipfile comment will be found on the last
disk(s) of this archive. unzip: cannot find zipfile directory in
.tmp/cdf/dist.zip,
and cannot find .tmp/cdf/dist.zip.zip, period.
chmod -R u+rwx .tmp
echo Done Done
Then the script ends. I have seen on this page (pentaho-bi-suite) that it is the normal output. Nevertheless, it seems a bit strange to me and when I start my pentaho server (login: admin/password), I cannot see any new tools in the menus.
After a look to a few other tutorials and the script itself, I have downloaded the .zip snapshots for every tool and unzipped them in the system directory of my pentaho server. Same result.
I would like to make the .sh works, what can I try or adjust ?
Thanks
EDIT 05/06/2014
I checked the dist.zip files dowloaded by the script and they are all empty. It seems that wget cannot fetch the zip files, and therefore the installation fails.
When I try to get any webpage through wget, it fails. I think it is because of the proxy.
Here is my .wgetrc file, located in my user's cygwin home folder:
use_proxy=on
http_proxy=http://[url]:[port]
https_proxy=http://[url]:[port]
proxy_user=[user]
proxy_password=[password]
How could I make this work?
EDIT 10/06/2014
In the end, I have changed my network connection settings to bypass the proxy. It seems that there is an offline mode for the installer, so one can download all needed files on a proxy-free environment and then run the script offline.
I guess this is related with the -r option.
I consider this post solved, since it not a CTools issue anymore.
Difficult to identify the issue in the above procedure..
but you can refer this blog he is key member of pentaho itself..
In the end, I have changed my network connection settings to bypass the proxy. It seems that there is an offline mode for the installer, so one can download all needed files on a proxy-free environment and then run the script offline. I guess this is related with the -r option.
I consider this post solved, since it is not a CTools issue anymore.
You can manually install the components from http://www.webdetails.pt/ctools/ or if you have pentaho 5.1 or above, you add the following parameters to CATALINA_OPTS option (in start-pentaho.bat or start-pentaho.sh):
-Dhttp.proxyHost= -Dhttp.proxyPort= -Dhttp.nonProxyHosts="localhost|127.0.0.1|10...*"
http://docs.treasuredata.com/articles/pentaho-dataintegration#tips-how-can-i-use-pentaho-through-a-proxy