updating data from a different URL using wget

What's the best way of updating data files from a website that has moved to a new domain and changed its folder structure?
The old URL, for example, is http://folder.old-domain.com while the new URL is http://new-domain.com/directory1/directory2. My data is stored locally in the ~/Data_Backup/folder.old-domain.com folder.
Data was originally downloaded using:
$ wget -S -t 0 -c --mirror -w 2 -k http://folder.old-domain.com
I was thinking of using mv to rename the old folder to follow the new URL pattern, but is there a better way of doing this?
Will this work? I'm not particular about the directory structure; what's important is to update the contents of the target folder (and its sub-folders).
$ wget -S -t 0 -c -m -w 2 -k -N -np -P ~/Data_Backup/folder.old-domain.com http://new-domain.com/directory/directory
Thanks in advance.

Got it!
I need to add the following options:
-nH --cut-dirs=2
and now it works.
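Putting it together, the full command would be something like this (a sketch; the directory names are just the placeholders from the question, and --cut-dirs=2 assumes exactly two directory levels need stripping):
$ wget -S -t 0 -c -m -w 2 -k -N -np -nH --cut-dirs=2 -P ~/Data_Backup/folder.old-domain.com http://new-domain.com/directory1/directory2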

Related

Follow only certain links with Wget but download every host from those links

So, let's say I want to mirror a site with Wget. I want wget to follow and download all the links from http://www.example.com/example/ or http://example.example.com/. How can I do this? I tried this command, but it doesn't seem to work the way I want it to.
wget -r --mirror -I '/example' -H -D'example.example.com' 'https://www.example.com/'
You want to start with 'https://www.example.com/' and save files from 'http://www.example.com/example/' and 'http://example.example.com/'?
Then leave out -H. -I is ambiguous here: does it apply to both domains or just the first? And by the way, -r is already implied by --mirror.
Check out --accept-regex and --reject-regex for finer-grained control, e.g. --accept-regex="(www.example.com/example/|example.example.com/)".
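If both hosts really are wanted, one way to combine that with host spanning might be (a sketch; listing both domains with -D keeps the spanning confined to them, and the regex is only illustrative):
wget --mirror -H -D www.example.com,example.example.com --accept-regex="(www.example.com/example/|example.example.com/)" 'https://www.example.com/'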

How do I wget a page from archive.org without the directory?

I'm trying to download a webpage from archive.org (i.e. http://wayback.archive.org/web/20110410223952id_/http://www.goldalert.com/gold-price-hovers-at-1460-as-ecb-hikes-rates-2/ ) with wget. I want to download it to 00001/index.html. How would I go about doing this?
I tried wget -p -k http://wayback.archive.org/web/20110410223952id_/http://www.goldalert.com/gold-price-hovers-at-1460-as-ecb-hikes-rates-2/ -O 00001/index.html but that didn't work. I then cd'd into the directory and removed the 00001 from the -O flag. That didn't work either. I then just removed the -O flag. That worked, but I got the whole archive.org directory tree (i.e. a wayback.archive.org directory, then a web directory, and so on) and the filename wasn't changed. :(
What do I do?
Sorry for the obviously noob question.
wget http://wayback.archive.org/web/20110410223952id_/http://www.goldalert.com/gold-price-hovers-at-1460-as-ecb-hikes-rates-2/ -O 00001/index.html
Solved my own question. So simple.
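One likely reason the earlier -O attempt failed (an assumption, but a common gotcha): wget -O won't create the 00001 directory for you, so create it first:
mkdir -p 00001
wget http://wayback.archive.org/web/20110410223952id_/http://www.goldalert.com/gold-price-hovers-at-1460-as-ecb-hikes-rates-2/ -O 00001/index.html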

wget corrupted .zip file error

I am using wget to try and download two .zip files (SWVF_1_44.zip and SWVF_44_88.zip) from this site: http://www2.sos.state.oh.us/pls/voter/f?p=111:1:0::NO:RP:P1_TYPE:STATE
When I run:
wget -r -l1 -H -t1 -nd -N -np -A.zip -erobots=off "http://www2.sos.state.oh.us/pls/voter/f?p=111:1:0::NO:RP:P1_TYPE:STATE/SWVF_1_44.zip"
I get a downloaded zip file that has a screwed up name (f#p=111%3A1%3A0%3A%3ANO%3ARP%3AP1_TYPE%3ASTATE%2FSWVF_1_44) and it cannot be opened.
Any thoughts on where my code is wrong?
There's nothing "wrong" with your code. Wget is simply assuming you want to save the file with the same name that appears in the URL. Use the -O option to specify an output file:
wget blahblahblah -O useablefilename.zip
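Applied to the URL from the question, a direct, non-recursive fetch would look something like this (a sketch; it assumes the zip really is served at that address rather than only through the recursive crawl):
wget "http://www2.sos.state.oh.us/pls/voter/f?p=111:1:0::NO:RP:P1_TYPE:STATE/SWVF_1_44.zip" -O SWVF_1_44.zip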

how to use wget on a site with many folders and subfolders

I'm trying to download this site with this command:
wget -r -l1 -H -t1 -nd -N -np -A.mp3 -erobots=off tenshi.spb.ru/anime-ost/
But I only get the index and the first folder, not the subfolders. Can anyone help?
I use this command to download sites including their subfolders:
wget --mirror -p --convert-links -P . [site address]
A little explanation:
--mirror is a shortcut for -N -r -l inf --no-remove-listing.
--convert-links makes links in downloaded HTML or CSS point to local files
-p allows you to get all images, etc. needed to display HTML pages
-P specifies the next argument is the directory the files will be saved to
I found the command at:
http://www.thegeekstuff.com/2009/09/the-ultimate-wget-download-guide-with-15-awesome-examples/
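Applied to the site in the question, that would be something like (a sketch):
wget --mirror -p --convert-links -P . http://tenshi.spb.ru/anime-ost/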
You use -l 1 (also known as --level=1), which limits recursion to one level. Set it higher to download more pages. By the way, I like long options like --level because it's easier to see what you're doing without going back to the man pages.
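For example, the original command with the recursion depth raised (a sketch; -H without a -D list lets the crawl wander to other hosts, so you may want to drop it, and --level may need to go higher if the mp3s are nested deeper):
wget -r --level=2 -t1 -nd -N -np -A.mp3 -e robots=off http://tenshi.spb.ru/anime-ost/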

How do I move/copy a symlink to a different folder as a symlink under Solaris?

It's an odd behaviour I've seen only on Solaris: when I try to copy a symbolic link with "cp -R -P" to some other folder under a different name, it copies the entire directory/file the link points to.
For example:
link -> dir
cp -R -P link folder/new_link
I believe the "-d" argument is what you need.
As per the cp man page:
-d same as --no-dereference --preserve=links
Example:
cp -d -R -P link folder/new_link
I was using "cp -d" and that worked for me.
The cp man page seems to say that you want to use an '-H' to preserve symlinks within the source directory.
You might consider copying via tar, like tar -cf - srcdir | (cd somedir; tar -xf -)
Try using cpio (with the -p (pass) option) or the old tar in a pipe trick.
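A minimal sketch of the cpio pass-through approach (assuming you just want the link itself, preserved as a symlink, to end up under folder/; rename it afterwards with mv if you need a different name):
echo link | cpio -pdm folder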