How to make wget *not* overwrite/ignore files? - wget

I have a list of 400 websites from which I'm downloading PDFs. 100 of these websites share the same PDF name: templates.pdf
When running wget, it either ignores or overwrites the PDFs named templates.pdf. I spent two hours searching for an option that would create a new templates2.pdf, but I couldn't find anything.

The default behavior of wget is to append the .1, .2 suffixes when a file with the same name is downloaded multiple times into the same target directory. This appears to be what you are asking for. (The poorly named -nc option causes subsequent downloads of files with the same name to be ignored, which you don't want.)
If the default behavior is not what you want, the -O option looks promising, as it allows you to choose the output file name. Here is a brief article explaining its use.
Of course, if you go the -O route, you'd need to ensure the output file does not exist, and do the suffix incrementing on your own.
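A minimal sketch of that manual approach, assuming the URLs live one per line in a file (urls.txt is a made-up name, not from the question):

#!/bin/sh
# Sketch: read URLs from a list; on a name collision, append 2, 3, ... before .pdf
while read -r url; do
    name=$(basename "$url" .pdf)
    out="$name.pdf"
    n=2
    while [ -e "$out" ]; do
        out="${name}${n}.pdf"
        n=$((n + 1))
    done
    wget -O "$out" "$url"
done < urls.txt

The second templates.pdf then lands as templates2.pdf, the third as templates3.pdf, and so on.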

Related

How do I configure mplayer to use a default edl file name?

I want to configure mplayer to look for an edl when playing a video. Specifically, I want it to use "show.edl" when playing "show.mp4", assuming both are in the same directory. Very similar to how it looks for subtitles.
I can add a default edl in the config file by adding the following:
edl=default.edl
And this will look for the file "default.edl" IN THE CURRENT DIRECTORY, rather than in the directory where the media file is. Nor is it named after the media file, so even if it did look in the right place, I'd have one single edl file shared by every media file in that directory.
Not really what I wanted.
So, is there a way, in the "~/.mplayer/config" file, to specify the edl relative to the input file name?
Mplayer's config file format doesn't seem to support any sort of replacement syntax. So there's no way to do this?
MPlayer does not have a native method to specify strings in the config file relative to the input file name. So there's no native way to deal with this.
There are a variety of approaches you could use to get around that. Writing a wrapper around mplayer to parse out the input file and add an "-edl" parameter is fairly general, but will fail on playlists and, I'm sure, lots of other edge cases. The most general solution would of course be to add the functionality to mplayer's config parser (m_parse.c, iirc).
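A minimal sketch of such a wrapper, assuming a single file argument (exactly the case where it works; playlists and multiple files are where it fails):

#!/bin/sh
# Sketch: play one file, adding -edl if a matching .edl sits next to it
f="$1"
edl="${f%.*}.edl"
if [ -f "$edl" ]; then
    exec mplayer -edl "$edl" "$f"
else
    exec mplayer "$f"
fi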
The simplest, though, is to (ab)use media-specific configuration files.
pros:
Doesn't require recompiling mplayer!
Well-defined and limited failure modes. I.e., the ways it fails and when it fails are easily understood, and there aren't "oops, didn't expect that" behaviors hidden anywhere.
Construction and updating of the edl files is easily automated.
cons:
Fails if you move the media around, as the config files need the full path to the edl file to function correctly.
Requires you have a ".conf" file as well as an EDL file, which adds clutter to the file system.
Malicious config files in the media directory may be a security issue. (Though if you're allowing general upload of media files, you probably have bigger problems. mplayer is not at all a security-hardened codebase, nor generally are the codecs it uses.)
To make this work:
Add "use-filedir-conf=yes" to "/etc/mplayer.conf" or "~/.mplayer/config". This is required, as looking in the media directory for config files is turned off by default,
For each file "clip.mp4" which has an edl "clip.edl" in the same directory, create a file "clip.mp4.conf" which contains the single line "edl=/path/to/clip.edl". The complete path is required.
Enjoy!
Automatic creation and updating of the media-specific .conf files is left as an exercise for the student.
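For the impatient student, a minimal sketch, assuming the media lives under a single hypothetical directory:

# Sketch: write a clip.mp4.conf pointing at clip.edl, for every clip that has one
# (use an absolute path so the conf files carry the full path the edl= line needs)
for f in /path/to/media/*.mp4; do
    edl="${f%.mp4}.edl"
    [ -f "$edl" ] && printf 'edl=%s\n' "$edl" > "$f.conf"
done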

How to rename partly the downloaded file using wget?

I'd like to download many files (about 10000) from an ftp server. The names of the files are too long; I'd like to save them with only the date in the name. For example, ABCDE201604120000-abcde.nc should be saved as 20160412.nc.
Is it possible?
I am not sure whether wget provides similar functionality; with curl, however, you can take advantage of the relatively rich syntax it provides for specifying the URL of interest. For example:
curl \
"https://ftp5.gwdg.de/pub/misc/openstreetmap/SOTMEU2014/[53-54].{mp3,mp4}" \
-o "file_#1.#2"
will download the files 53.mp3, 53.mp4, 54.mp3, 54.mp4. The output file is specified as file_#1.#2 - here, #1 is replaced by curl with the value of the sequence [53-54] corresponding to the file being downloaded. Similarly, #2 is replaced with either mp3 or mp4. Thus, e.g., 53.mp3 will be saved as file_53.mp3.
ewcz's answer works fine if you can enumerate the file names as shown in the post. However, if the filenames are difficult to enumerate, for example, because the integers are sparsely populated, this solution would result in a lot of 404 Not Found requests.
If this is the case, then it is probably better to download all the files recursively, as you have shown, and rename them afterwards. If the file names follow a fixed pattern, you can select the substring from the original name and use it as the new name. In the given example, the new file names start at position 5 and are 8 characters long. The following bash command renames all *.nc files in the current directory.
for f in *.nc; do mv "$f" "${f:5:8}.nc" ; done
If the filenames do not follow a fixed pattern and might vary in length, you can use more complex pattern substitution with sed; see this SO post for an example.
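For instance, a hedged sketch, assuming the date is the first 8-digit run after a leading alphabetic prefix, as in the example name above:

# Sketch: keep only the 8-digit date from names like ABCDE201604120000-abcde.nc
for f in *.nc; do
    mv "$f" "$(printf '%s' "$f" | sed -E 's/^[A-Za-z]+([0-9]{8}).*/\1.nc/')"
done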

How to only include some directories in ctags?

There is an --exclude option, but that is for excluding directories/files. I work on a big project and want to include only the directories that have source code, not the build stuff.
How to do that? What should I include in my .ctags file?
I use:
find FILES | ctags -L -
where FILES is the appropriate arguments to make find return only the files I want to index.
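For example (src, lib, and the extensions are placeholders for whatever your project actually contains):

# Sketch: index only the source directories, skipping the build tree entirely
find src lib -type f \( -name '*.c' -o -name '*.h' \) | ctags -L -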
Exuberant Ctags (5.8) is now old and unmaintained, though. It still works for me, so I've not switched; but the last time I checked "Universal Ctags" appeared to be the way forwards, so I would suggest starting there:
https://ctags.io
https://github.com/universal-ctags/ctags
n.b. I experienced a curious bug with Exuberant Ctags 5.8 whereby find . resulted in some corrupted tag entries, but find * did not; so you might want to use the latter if using this approach. I didn't need to index any dot files at the root level, so I'm not sure offhand what happens for .* -- I don't think I tried it. Absolute paths were also fine, but then the TAGS file isn't portable. Potentially not an issue in the newer fork.

grabbing all .nc files from URL to get data using matlab

I'd like to get all .nc files from a URL and read the data using matlab. However, the filenames are always very long and vary between files.
For instance, I have
url = 'http://sourcename/filename.nc'
The sourcename is always the same, but the filename is very long and varies, so I would like to just use * to grab whatever .nc file is at the url,
e.g.
url = 'http://sourcename/*.nc'
but this does not work, and I am guessing I need the exact name - so I am not sure what to do here?
On the other hand, it would also be useful to get the name of each file and record it, but I'm not sure how to do that either.
Thanks a lot in advance!!
HTTP does not implement a filesystem abstraction. This means that each of those URLs you request could be handled in a completely different way. In many cases there is also no way to get a list of allowable URLs from a parent (a directory listing, in other words).
It may be the case for you that http://sourcename/ actually returns an index document containing a list of the files. In that case, first fetch that document. Then you'll have to parse the contents to extract the list of files. Then you can loop over those files, form new URLs for each one, and fetch them in sequence.
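If the index is plain HTML with direct href links (an assumption; every server is different), something along these lines could build that list outside of Matlab:

$ wget -qO- http://sourcename/ \
    | grep -oE 'href="[^"]*\.nc"' \
    | sed 's/^href="//; s/"$//' > url-file.txt

Note that relative links would need the base URL prefixed before the list is usable.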
If you have a list of the file names in a text file, you can use the wget utility to process the file and fetch all the listed files. This file would be formatted as follows:
http://url.com/file1.nc
http://url.com/file2.nc
(etc)
You would then invoke wget as follows:
$ wget -i url-file.txt
Alternatively, you may be able to use wget to fetch the files recursively, if they are all located in the same directory on the web server, e.g.:
$ wget -r -l1 http://url.com/directory
The -r flag says to recurse, the -l1 flag says to go no deeper than 1 level when recursing.
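If the directory holds more than just the .nc files you want, wget's accept-list option -A can filter the recursion:

$ wget -r -l1 -A '*.nc' http://url.com/directory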
This solution is external to Matlab, but once you have all of the files downloaded, you can work with them all locally.
wget is a fairly standard utility available on linux systems. It is also available for OSX and Windows. The wget homepage is here: https://www.gnu.org/software/wget/

Concatenate content of TAGS files from different directories

I'm referring to TAGS file generated by ctags or etags in order to have some code navigation in Emacs with M-..
The typical project looks like this:
Large standard library (more than 100 files, but rarely updated).
Project-specific library (updated on a daily basis).
I would like the project to be able to use two (or maybe more) TAGS files, but regenerate only a portion of them: only the ones used inside the particular project. How would I approach this problem?
etags --help:
-i FILE, --include=FILE
Include a note in tag file indicating that, when searching for
a tag, one should also consult the tags file FILE after
checking the current file.
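So one approach is to give each library its own TAGS file and have the project's file --include the stable one; only the project file then needs daily regeneration. A sketch with hypothetical paths and file lists:

# Sketch: build the big library's TAGS once, the project's TAGS often
etags -o stdlib/TAGS stdlib/*.c stdlib/*.h
etags -o project/TAGS --include=../stdlib/TAGS project/*.c project/*.h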