I am relatively new to the scene and not that experienced with Wget (I'm currently using VisualWGet, but I also have the command-line wget). I'm trying to download many (182,218 to be precise) images. I can get as far as downloading the first directory and all of the images in it, but after that it downloads only one image from each directory.
I am making sure to use a recursive download, but it seems like it does not want to enter the other directories after it exits the first one.
Here's the process:
Downloads everything in directory 0
Backtracks to the parent directory
Downloads the first image in directory 1
Downloads the first image in directory 2
etc
The directory I'm trying to download from is http://66.11.126.173/images/, and each directory doesn't seem to be a link; it shows up as an image that doesn't link to another directory.
The images are listed in directories like this:
http://66.11.126.173/images/0/
http://66.11.126.173/images/1/
http://66.11.126.173/images/2/
etc
Each directory has 31 variations of the same image and there are 5,878 directories in total. I start my downloads in images/0/, because otherwise it will want to download the index.html file for /images/.
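Roughly the kind of command I have been trying from the command line (the flags are from memory, since I mostly drive this through VisualWGet, so the exact invocation may differ):

# recursive image grab, started in images/0/ so it doesn't just fetch the index.html for /images/
wget -r --accept jpg,jpeg,png,gif http://66.11.126.173/images/0/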
Any help will be greatly appreciated.
I'm currently mirroring www.typingstudy.com with:
wget --mirror --page-requisites --convert-link --no-clobber --no-parent --domains typingstudy.com https://www.typingstudy.com/
wget creates the directories that hold the site's HTML files only at the end of the scrape, so when it tries to download those HTML files before the directories they belong in have been created, it reports errors in the PowerShell output.
Sometimes it downloads only one file into a directory (for example, "part") and then refuses to see that directory while downloading the other ~10 files that belong in it, saying the directory does not exist.
Can someone help me understand what's wrong with my command? Or is it a bug in wget? (Probably not.)
Thanks in advance.
When I start the download process again, everything is neat: wget downloads the other ~10 HTML files into the directories (such as "part") that were created in the previous download session. So the problem is that I need to start the download twice, at least in the case of this site.
I totally do not understand why this is happening.
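For now, the only workaround I have found is literally running the same mirror command twice in a row, so the second pass fills in the directories that only get created at the end of the first pass (I drop --no-clobber here, since --mirror already decides what to re-fetch via timestamping):

# first pass creates the directories, second pass fetches the files that could not be placed the first time
wget --mirror --page-requisites --convert-links --no-parent --domains typingstudy.com https://www.typingstudy.com/
wget --mirror --page-requisites --convert-links --no-parent --domains typingstudy.com https://www.typingstudy.com/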
So, I'm making an updater for my game using wget. The latest version of the game (with all its files) is on my server. The thing is that I want wget to download only the files that are different from a directory on the server into the www folder in the root of the game files (this has to be recursive, since not all of the game's files are stored directly in that folder). So, when the command is run, it should check whether the file's hash (if possible, otherwise its size) on the server matches the one in the game's files. If it doesn't, it should download the file from the server and replace the one in the game's directory. This way, the player won't have to re-download all of the game's files.
Here's the command I used for testing:
wget.exe -N -r ftp://10.42.0.1/gamupd/arpg -O ./www
The server is running on my local network, so I use that IP address.
The thing is that it doesn't save the contents of /gamupd/arpg into the www folder; instead it seems to recreate the directory tree of arpg.
Maybe the timestamping flag will satisfy you. From wget -help:
-N, --timestamping           don't re-retrieve files unless newer than local
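On the other part of the question, getting the files into ./www: -O isn't meant to be combined with -r (it concatenates everything it downloads into a single output file). A rough sketch of an alternative, assuming you want the /gamupd/arpg prefix stripped from the saved paths:

# -nH skips the 10.42.0.1/ host directory, --cut-dirs=2 strips the gamupd/arpg components,
# -P puts everything under ./www, and -N only re-fetches files that are newer on the server
wget -N -r -nH --cut-dirs=2 -P ./www ftp://10.42.0.1/gamupd/arpg/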
I'm trying to automate uploading and downloading from an FTP site using cURL inside MATLAB, but I'm having difficulties. Essentially I want one computer continuously uploading new files to an FTP site, yet since there is a disk quota on the FTP, I want another computer continuously downloading and removing those same files from it.
Easy enough, but my problem arises from wanting to make sure that I don't download a file that is still being uploaded, thereby resulting in an incomplete file.
First off, is there a way in cURL to make it so that the file wouldn't be available for download from the ftp site until the entire file has been uploaded?
One way around this is that I could upload files to one directory, and once they are finished uploading, then I could transfer them to a "Finished" directory on the ftp site. Then the download program would only look for files inside that "Finished" directory. However, I don't know how to transfer files within an ftp site using cURL.
Is it possible to transfer files between directories on an ftp site using cURL without having to download the file first?
And if anyone else has better ideas on how to perform this task, I'd love to hear em!
Thanks!
You can upload each file using a special name and then rename it when done, and have the download client only download files with that special "upload completed" name style.
Or you move them between directories just as you say (which is essentially a rename as well, just changing the directory too).
With command-line curl, you can perform "raw" commands after the upload with the -Q option, and you can even find a tiny example in the curl FAQ: http://curl.haxx.se/docs/faq.html#Can_I_use_curl_to_delete_rename
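A sketch of what that rename-on-completion can look like (the host, paths, and credentials here are made up; the leading dash on -Q tells curl to send the command after the transfer):

# upload under a temporary name, then rename it into a "Finished" directory once the transfer has completed
curl --user name:password -T result.dat \
     -Q "-RNFR incoming/result.dat.part" \
     -Q "-RNTO Finished/result.dat" \
     ftp://ftp.example.com/incoming/result.dat.part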
How do I download a site for offline viewing, excluding a specific folder?
For example, I want to download the site but skip the http://site.com/forum/ subdirectory.
wget --help
might lead you to
-nH, --no-host-directories don't create host directories.
I'd try that first, but I'm not sure whether it will do what you want.
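Something along these lines, possibly combined with wget's directory-exclusion flag to actually skip the forum part (that second flag is my own guess, not something covered by the help snippet above):

# don't create a site.com/ host directory, and skip everything under /forum/
wget --mirror --no-host-directories --exclude-directories=/forum http://site.com/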
I am using CGI.pm version 3.10 for file uploads in Perl. I have a Perl script which uploads the file, and one of my applications keeps track of different revisions of the uploaded document with a check-in/check-out facility.
Steps to reproduce:
Check out (download) a file using my application (which is web based and uses Apache).
Log out of the current user session.
Log in again with the same credentials and then check in (upload) a new file.
Output:
Upload successful
Perl upload script shows the correct uploaded data
New revision of the file created
The output is correct and expected except for the one case below, which is the issue.
Issue:
The content of the newly uploaded file is the same as the content of the last uploaded revision in the DB.
I am using a temp folder for copying the new content, and if I print the new content in the upload script it is correct. I have no limit on the CGI upload size. It seems to fail somewhere in the CGI environment, possibly because of the version I am using. I am not using taint mode.
Can anybody help me understand what the possible reason might be?
Sounds like you're getting the old file name stuck in the file upload field. Not sure if that can happen for filefield but this is a feature for other field types.
Try adding the -nosticky pragma, e.g., use CGI qw(-nosticky :all);. Another pragma to try is -private_tempfiles, which should prevent the user from "eavesdropping" even on their own uploads.
Of course, it could be that you need to localize (my) some variable or add -force to the filefield.
I found the issue. The destination path of the copied file was not correct. One of my application's events maps the path of the copied file to a different directory, and that path is stored in the user session. This happens only when I run that event just before starting the upload script, which is why it was hard to catch. Since the upload script is designed to pick up the newly copied file from the same path, it always ended up uploading the same file to the DB as another revision, while the newly copied file was sitting at the new path.
Solved by mapping the correct path before the upload.
Thanks