How can I set a temp directory for uncompleted downloads in Wget? - command-line

I'm trying to mirror files on an FTP server.
Those files can be very large, so downloads might be interrupted.
I'd like to keep the original files while partial files download to a temporary folder, and once a download completes, overwrite the older local version.
Can I do this? How?
Is there another easy-to-use (command-line) tool that I can use?

First, download the files to a temp directory. Use -c so you can resume.
After the download has completed, use copy, rename or rsync to copy the files to the final place.
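For example, a rough sketch of those two steps (the URL and directories are placeholders, adjust them to your setup):
wget -c -P /tmp/wget-partial "ftp://ftp.example.com/pub/file.iso" \
  && mv /tmp/wget-partial/file.iso /var/data/mirror/
-P tells wget which directory to download into and -c resumes a partial file if one is already there; the mv only runs if wget exits successfully.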
Note: Consider using rsync for the whole process, because it was designed for exactly this use case and it puts much less strain on the server and the network. Most site admins are happy if you ask them for rsync access just for this reason.

Looking at the wget manual I can't see this functionality; however, you could write a bash script to do what you want, which would essentially run an individual wget for each file and then move it into place with a normal mv command.
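A rough sketch of such a script, assuming you have a plain-text list of URLs (the file names and paths here are made up for illustration):
#!/bin/bash
# Download each file into a temp dir, then move it into place only on success.
TMP=/tmp/mirror-partial
DEST=/var/data/mirror
mkdir -p "$TMP" "$DEST"
while read -r url; do
    file=$(basename "$url")
    if wget -c -P "$TMP" "$url"; then
        mv "$TMP/$file" "$DEST/$file"
    fi
done < filelist.txt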
Alternatively, have a look at rsync; according to its manual there is a parameter that sets a temp dir:
-T --temp-dir=DIR create temporary files in directory DIR
I am not 100% sure whether this is where it puts the files during a download, as I haven't had a chance to try it out.
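If it does work that way, usage would look roughly like this (server, module and paths are placeholders; --partial tells rsync to keep interrupted transfers so they can be resumed):
rsync -av --partial --temp-dir=/tmp/rsync-partial \
    rsync://ftp.example.com/module/ /var/data/mirror/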

Related

How to use rsync to synchronise a folder that is constantly updating with images/video files on a server

I have used rsync to move files across before; however, I want to know the best solution for moving files from a directory that has new files added to it regularly, and keeping this in sync on the other remote server. What is the best approach to do this?
You could use cron to update your folder regularly, for example as in this question: Using crontab to execute script every minute and another every 24 hours
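For example, a crontab entry along these lines would push the folder to the remote server every minute (host and paths are placeholders):
* * * * * rsync -a /path/to/local/folder/ user@remote.example.com:/path/to/remote/folder/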

How to delete all GitLab files above 100 MB

I am trying to import a GitLab project into GitHub and have had it fail because it cannot import files above 100 MB (this is a GitHub rule). I deleted the first file that caused a problem, but then upon restarting another file was too large. Is there any way to automate deleting all files above this threshold? Alternatively, is there any way to list the size of every file so I know which ones to delete manually?
Did you stop tracking the file in git (i.e. with git rm --cached yourlargefile) in addition to deleting it from your working directory? If not, the file is still being tracked, so it will still cause the import into GitHub to fail.
To answer your second question, calling ls -l from a terminal will list all the files in the working directory along with their file size in bytes, ls -lh will do the same, but with more human-readable file sizes (e.g. KB or MB, as applicable). If you need to scan your whole project, ls -lhR will recursively list all files in the entire directory tree.
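If you only care about the files above GitHub's 100 MB limit rather than a full listing, something like this find command (run from the root of the working copy, skipping the .git directory) should do it:
find . -path ./.git -prune -o -type f -size +100M -exec ls -lh {} \;
The ls -lh in the -exec prints each match with a human-readable size.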

Automatically copying new files to another folder (CentOS 6.3)?

Is there a command I can enter via SSH (CentOS 6.3) that will monitor a certain directory and, if any new files/folders are created in it, copy those files to another folder?
I have looked at various sync programs, but rather than mirror the folder I need to keep a copy of any new files/folders even if they are later deleted from the original directory.
I am hoping the cp command can be used somehow, but I couldn't work out how to do it myself.
Thanks for any help, and please let me know if you need further information to help or there is a better way to achieve my needs.
Rsync would do it; it doesn't have to delete files from the destination when they are deleted from the source.
Do you need it to run periodically or monitor constantly for changes? If the latter you might want to look into something using inotify or FAM.
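If you do need constant monitoring, here is a minimal sketch using inotifywait from the inotify-tools package (directory paths are placeholders):
inotifywait -m -r -e create -e moved_to --format '%w%f' /path/to/watched/dir |
while read -r newfile; do
    cp -r "$newfile" /path/to/archive/
done
This copies anything newly created in (or moved into) the watched directory over to the archive folder; since nothing ever deletes from the archive, the copies remain even if the originals are later removed.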

How to force a directory to stay in exact sync with subversion server

I have a directory structure containing a bunch of config files for an application. The structure is maintained in Subversion, and a few systems have that directory structure checked out. Developers make changes to the structure in the repository, and a script on the servers just runs an "svn update" periodically.
However, sometimes we have people who will inadvertently remove a .svn directory under one of the directories, or stick a file in that doesn't belong. I do what I can to cut off the hands of the procedural unfaithful, but I'd still prefer for my update script to be able to gracefully (well, automatically) handle these changes.
So, what I need is a way to delete files which are not in subversion, and a way to go ahead and stomp on a local directory which is in the way of something in the repository. So, warnings like
Fetching external item into '/path/to/a/dir'
svn: warning: '/path/to/a/dir' is not a working copy
and
Fetching external item into '/path/to/another/dir'
svn: warning: Failed to add directory '/path/to/another/dir': an unversioned directory of the same name already exists
should be automatically resolved.
I'm concerned that I'll have to either parse the svn status output in a script, or use the svn C API and write my own "cleanup" program to make this work (and yes, it has to work this way; rsync / tar+scp, and whatever else aren't options for a variety of reasons). But if anyone has a solution (or partial solution) which takes care of the issue, I'd appreciate hearing about it. :)
How about
rm -rf $project
svn checkout svn+ssh://server/usr/local/svn/repos/$project
I wrote a perl script to first run svn cleanup to handle any locks, and then parse the --xml output of svn status, removing anything which has a bad status (except for externals, which are a little more complicated).
Then I found this:
http://svn.apache.org/repos/asf/subversion/trunk/contrib/client-side/svn-clean
Even though this doesn't do everything I want, I'll probably discard the bulk of my code and just enhance this a little. My XML parsing is not as pretty as it could be, and I'm sure this is somewhat faster than launching a system command (which matters on a very large repository and a command which is run every five minutes).
I ultimately found that script in the answer to this question - Automatically remove Subversion unversioned files - hidden among all the suggestions to use Tortoise SVN.
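For anyone who only needs the rough idea rather than the full script, the status-parsing approach boils down to something like this (a simplified sketch: it assumes the path starts at column 9 of plain svn status output, and it doesn't treat externals or unusual filenames as carefully as the script above):
svn cleanup
svn revert -R .
svn status --no-ignore | grep '^[?I]' | cut -c9- | xargs -d '\n' -r rm -rf
svn update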

Jenkins: FTP / SSH deployment, including deletion and moving of files

I was wondering how to get my web-projects deployed using ftp and/or ssh.
We currently have a self-made deployment system which is able to handle this, but I want to switch to Jenkins.
I know there are publishing plugins and they work well when it comes to uploading build artifacts. But they can't delete or move files.
Do you have any hints, tips, or ideas regarding my problem?
The Publish Over SSH plugin enables you to send commands over ssh to the remote server. This works very well; we also do some moving and deleting of files before deploying the new version, and have had no problems whatsoever with this approach.
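Purely as an illustration (the paths are made up), the command step we hand to the plugin looks roughly like this:
rm -rf /var/www/app/previous &&
mv /var/www/app/current /var/www/app/previous &&
mv /var/www/app/incoming /var/www/app/current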
The easiest way to handle deleting and moving items is to delete everything on the server before you deploy a new release using one of the 'Publish over' extensions. I'd say that really is the only way to know the deployed version is the one you want. If you want more versioning-system-style behaviour, you either need to use a versioning system, or perhaps rsync, which will cover part of it.
If your demands are very specific you could develop your own convention to mark deletions and have them be performed by a separate script (like you would for database changes using Liquibase or something like that).
By the way: I would recommend not automatically updating your live sites after every build using the 'publish over ...' extension. In cases where we really do want a live site updated automatically, we rely on the Promoted Builds Plugin to keep it nearly fully automated while adding a little safety.
I came up with a simple solution to remove deleted files and upload changes to a remote FTP server as a build action in Jenkins, using a simple lftp mirror script (see the lftp manual page).
In short, you create a config file named ~/.netrc in your Jenkins user's home directory and populate it with your FTP credentials:
machine ftp.remote-host.com
login mySuperSweetUsername
password mySuperSweetPassword
Create an lftp script deploy.lftp and drop it in the root of your git repo:
set ftp:list-options -a
set cmd:fail-exit true
open ftp.remote-host.com
mirror --reverse --verbose --delete --exclude .git/ --exclude deploy.lftp --ignore-time --recursion=always
Then add an "Exec Shell" build action to execute lftp on the script.
lftp -f deploy.lftp
The lftp script will
mirror: copy all changed files
reverse: push local files to the remote host. A regular mirror pulls from the remote host to local.
verbose: dump all the notes about what files were copied where to the build log
delete: remove remote files no longer present in the git repo
exclude: don't publish .git directory or the deploy.lftp script.
ignore-time: won't decide what to publish based on file timestamps. Without this, in my case, all files got published, since a fresh clone of the git repo updated the file timestamps. It still works quite well, though: even files modified by adding a single space were identified as different and uploaded.
recursion: will analyze every file rather than depending on folders to determine if any files in them were possibly modified. This isn't technically necessary since we're ignoring time stamps but I have it in here anyway.
I wrote an article explaining how I keep FTP in sync with Git for a WordPress site I could only access via FTP. The article explains how to sync from FTP to Git, and then how to use Jenkins to build and deploy back to FTP. This approach isn't perfect, but it works: it only uploads changed files, and it deletes files from the host that have been removed from the git repo (and vice versa).