I am trying to use wget for Windows (on Windows 7) to find and download a file that I don't know the full name of (I have a partial name, and I know the form of the unknown part of the name). I am using an input file with a list of the possible file names, and I want to abort wget when the file is found (the rest of the possibilities will give 404 errors). How can I cause wget to abort automatically when that one file is found?
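One way to get this stop-at-first-hit behaviour is to drive the requests from a small script instead of wget's -i list. The following is a minimal sketch in Python, not a wget feature. The helper names first_existing and http_exists are hypothetical, and it assumes the server accepts HEAD requests and answers 404 for the wrong names.

```python
import urllib.error
import urllib.request


def http_exists(url):
    """Return True if a HEAD request for url succeeds (assumes the server supports HEAD)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=30):
            return True
    except urllib.error.HTTPError:
        # 404 and other HTTP error statuses: treat as "not this name".
        return False


def first_existing(urls, exists=http_exists):
    """Return the first URL for which exists(url) is true, or None if none match."""
    for url in urls:
        if exists(url):
            return url  # stop here; the remaining candidates are never requested
    return None
```

The candidate list can be read from the same input file (one URL per line), and the URL that is returned can then be handed to wget on its own.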
I cannot download some files from a server at my work because the file names contain reserved characters. (This is outside the company's control; the files were badly named by the clients who upload the attachments.) For some reason wget reports a 404 error even though the files exist on the server.
This is the command line that starts the download (list.txt contains the URLs of the files on the server, for example https://example.com/files/122301/8+.pdf):
wget.exe -x -i "C:\clon\list.txt" -P "C:\clon\destino" -nv -o "C:\clon\log.txt"
I do not know what the wget parameters do, apart from the source and destination paths and the log file, but some files contain '}' or '+' in their names, and I think that is why the missing files are not downloaded (93% of all the files have downloaded).
Examples of files including these characters:
/FC04-6198}+.pdf
/8+.pdf
/PT05+2236.pdf
I tried adding the parameters "--content-disposition" and "--restrict-file-names", but nothing changed.
I am looking for a way to handle the reserved characters so that these files can be downloaded.
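One thing that may be worth trying is percent-encoding the reserved characters in list.txt before handing it to wget, since some servers decode a literal "+" in a URL as a space. Whether that is the cause of these 404 errors is an assumption, not something the log confirms. A minimal Python sketch follows; encode_path is a hypothetical helper, and placing all three example names under /files/122301/ is only for illustration.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_path(url):
    """Percent-encode every reserved character in the path part of url."""
    parts = urlsplit(url)
    # safe="/" keeps the path separators; "+" becomes %2B and "}" becomes %7D.
    return urlunsplit(parts._replace(path=quote(parts.path, safe="/")))


# The directory used here is an assumption taken from the single example URL.
for name in ("FC04-6198}+.pdf", "8+.pdf", "PT05+2236.pdf"):
    print(encode_path("https://example.com/files/122301/" + name))
```

This should only be applied to URLs that are not already encoded, because an existing "%" would be encoded a second time.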
I am attempting to regularly archive a few file types hosted on a community website where our admin has been MIA for years, in case he dies or just stops paying for the hosting.
I am able to download all of the files I need using wget -r -np -nd -e robots=off -l 0 URL but this leaves me with about 60,000 extra files to waste time both downloading and deleting.
I am really only looking for files with the extensions "tbt" and "zip". When I add in -A tbt,zip to the input, wget then only downloads a single file, "index.html.tmp". It immediately deletes this file because it doesn't match the file type specified, and then the process stops entirely, with wget announcing that it is finished. It does not attempt to download any of the other files that it grabs when the -A flag is not included.
What am I doing wrong? Why does specifying file types in the way that I did cause it to finish after only looking at one file?
Possibly you're hitting the same problem I've hit when trying to do something similar. When using --accept, wget determines whether a link refers to a file or a directory based on whether or not it ends with a /.
For example, say I have a directory named files, and a web page that has:
<a href="/files">Lots o' files!</a>
If I were to request this with wget -r, then wget would happily GET /files, see that it was an HTML document containing a bunch of links, and continue to download those links.
However, if I add -A zip to my command line, and run wget with --debug, I see:
appending ‘http://localhost:8080/files’ to urlpos.
[...]
Deciding whether to enqueue "http://localhost:8080/files".
http://localhost:8080/files (files) does not match acc/rej rules.
Decided NOT to load it.
In other words, wget thinks this is a file (no trailing /) and it doesn't match our acceptance criteria, so it gets rejected.
If I modify the remote file so that it looks like...
<a href="/files/">Lots o' files!</a>
...then wget will follow the link and download files as desired.
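The decision seen in the debug log can be pictured with a toy model. This is a simplified sketch of the behaviour described above, not wget's actual code; would_enqueue is a hypothetical name, and the model ignores other rules wget applies (for example, its handling of .html files).

```python
from urllib.parse import urlsplit


def would_enqueue(url, accept_suffixes):
    """Toy model of the accept check: directories pass, files must match a suffix."""
    path = urlsplit(url).path
    if path == "" or path.endswith("/"):
        return True  # trailing slash: treated as a directory and followed
    name = path.rsplit("/", 1)[-1]
    # No trailing slash: treated as a file name and compared with the -A list.
    return any(name.endswith(suffix) for suffix in accept_suffixes)


print(would_enqueue("http://localhost:8080/files", ["zip"]))
print(would_enqueue("http://localhost:8080/files/", ["zip"]))
```

With -A zip, the first URL is rejected because "files" does not end in "zip", while the second is accepted because it ends with a slash.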
I don't think there's a great solution to this problem if you need to use wget. As I mentioned in my comment, there are other tools available that may handle this situation more gracefully.
It's also possible you're experiencing a different issue; the output from adding --debug to your command line should clarify things in that case.
I also experienced this issue, on a page where all the download links looked something like this: filedownload.ashx?name=file.mp3. The solution was to match both the linked file and the downloaded file, so my wget accept flag looked like this: -A 'ashx,mp3'. I also used the --trust-server-names flag. This catches all the .ashx links in the webpage; then, when wget does its second check, all the mp3 files that were downloaded are kept.
As an alternative to --trust-server-names, you may also find the --content-disposition flag helpful. Both flags help rename the file that gets downloaded from filedownload.ashx?name=file.mp3 to just file.mp3.
I am using Pentaho CE 5 on Windows. I would like to use CTools, but I can't make them show up in the File -> New menu.
Being behind a proxy, I can not use the Marketplace plugin, so I have tried a manual installation.
First, I tried to use the ctools-installer.sh. I have run the following command line in cygwin (wget and unzip are installed):
./ctools-installer.sh -s /cygdrive/d/Users/[user]/Mes\ Programmes/pentaho/biserver-ce/pentaho-solutions/ -w /cygdrive/d/Users/[user]/Mes\ programmes/pentaho/biserver-ce/tomcat/webapps/pentaho/
The script starts, asks me what module I want to install, and begins the downloads.
For each module, I get an output like (set -x added to the script) :
echo -n 'Downloading CDF...'
Downloading CDF...+ wget -q --no-check-certificate 'http://ci.analytical-labs.com/job/Webdetails-CDF-5-Release/lastSuccessfulBuild/artifact/bi-platform-v2-plugin/dist/zip/dist.zip' -O .tmp/cdf/dist.zip
SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc
syswgetrc = C:\Program Files (x86)\GnuWin32/etc/wgetrc
'[' '!' -z '' ']'
rm -f .tmp/dist/marketplace.xml
unzip -o .tmp/cdf/dist.zip -d .tmp
End-of-central-directory signature not found. Either this file is not a zipfile, or it
constitutes one disk of a multi-part archive. In the latter case
the central directory and zipfile comment will be found on the last
disk(s) of this archive.
unzip: cannot find zipfile directory in .tmp/cdf/dist.zip,
and cannot find .tmp/cdf/dist.zip.zip, period.
chmod -R u+rwx .tmp
echo Done
Done
Then the script ends. I have seen on this page (pentaho-bi-suite) that it is the normal output. Nevertheless, it seems a bit strange to me and when I start my pentaho server (login: admin/password), I cannot see any new tools in the menus.
After looking at a few other tutorials and the script itself, I downloaded the .zip snapshots for every tool and unzipped them in the system directory of my Pentaho server. Same result.
I would like to make the .sh script work. What can I try or adjust?
Thanks
EDIT 05/06/2014
I checked the dist.zip files downloaded by the script and they are all empty. It seems that wget cannot fetch the zip files, and therefore the installation fails.
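A check like this can be automated, so that an empty or truncated download is caught before unzip runs. This is a minimal Python sketch and not part of ctools-installer.sh; looks_like_zip is a hypothetical helper name.

```python
import os
import zipfile


def looks_like_zip(path):
    """Return True only if path is a non-empty file with a readable zip directory."""
    return os.path.getsize(path) > 0 and zipfile.is_zipfile(path)


if __name__ == "__main__":
    # Path taken from the installer trace; adjust it to the module being checked.
    target = ".tmp/cdf/dist.zip"
    if os.path.exists(target):
        print(target, "valid zip:", looks_like_zip(target))
```

An empty dist.zip returns False here, which matches the "End-of-central-directory signature not found" message from unzip.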
When I try to get any webpage through wget, it fails. I think it is because of the proxy.
Here is my .wgetrc file, located in my user's cygwin home folder:
use_proxy=on
http_proxy=http://[url]:[port]
https_proxy=http://[url]:[port]
proxy_user=[user]
proxy_password=[password]
How could I make this work?
EDIT 10/06/2014
In the end, I have changed my network connection settings to bypass the proxy. It seems that there is an offline mode for the installer, so one can download all needed files on a proxy-free environment and then run the script offline.
I guess this is related with the -r option.
I consider this post solved, since it is not a CTools issue anymore.
It is difficult to identify the issue from the procedure above, but you can refer to this blog; its author is a key member of Pentaho itself.
You can manually install the components from http://www.webdetails.pt/ctools/ or, if you have Pentaho 5.1 or above, you can add the following parameters to the CATALINA_OPTS option (in start-pentaho.bat or start-pentaho.sh):
-Dhttp.proxyHost= -Dhttp.proxyPort= -Dhttp.nonProxyHosts="localhost|127.0.0.1|10...*"
http://docs.treasuredata.com/articles/pentaho-dataintegration#tips-how-can-i-use-pentaho-through-a-proxy
Setup:
Windows 7 Enterprise.
Matlab 7.10.0 (R2010a).
mcc compiler: Microsoft Visual C++ 2008 Express.
What's happening:
My project runs fine when running it through Matlab, but when trying to run the .exe through the command prompt after using mcc to compile, the command prompt generates an error.
The mcc command I issue is:
mcc -m -v STARTUP1.m -o EXE_REDUC
The error I receive in the command prompt is:
??? Error using ==> textscan
Invalid file identifier. Use fopen to generate a valid file identifier.
I have a file called LoadXLS.m that loads and reads a .csv file using:
fid = fopen(file,'r');
temp_data = textscan(fid,...args...);
And then I process temp_data.
The csv file I'm trying to load is called spec.csv. It is located two directories down from where I have STARTUP1.m stored. The location of STARTUP1.m is also the place that the mcc generated files are stored to. I have used the pathtool to "Add with subfolders" this location, but am aware that those locations are not transferred to mbuild when compiling.
What I've Tried:
I have added print statements to print the value of fid to make sure it is valid. When I run it in Matlab, it has a valid value; however, when I run it in the command prompt, it always returns an invalid value of -1.
I have removed all addpath() calls. I have also tried adding the STARTUP1.m directory to the mcc CTF archive using:
mcc -m -v -a 'C:\Users\...path...\STARTUP1.m_location' STARTUP1.m -o EXE_REDUC;
However when I do this, I get a different error when running in the command prompt:
Cannot open CTF archive file
'C:\...path...\AppData\Local\Temp\mathworks_tmp_7532_28296'
or
'C:\...path...\AppData\Local\Temp\mathworks_tmp_7532_28296.zip'
??? Undefined function or variable 'matlabrc'.
To fix this, I've tried adding the pragma
%#function matlabrc
to the top of STARTUP1.m to try and enforce its inclusion, but had no success.
I also copied the spec.csv file to a new directory in the ctfroot and changed
fid = fopen(...)
to:
[tempFile, message] = fopen(fullfile(ctfroot, 'Added Config Files', ad.spec_file));
The message is:
message is: No such file or directory
Objective:
Rearranging file locations is a sufficient workaround while the executable only runs on my computer; however, the idea is to make this standalone and distribute it to multiple people on many different computers. I would like to be able to have a top folder with a startup file and, within this folder, as many subfolders as the package requires. The startup file should be able to access all subfolders and the files within them as necessary.
I read something about the executable actually running from a "secret location" on the machine here: http://matlab.wikia.com/wiki/FAQ
I would just like to be able to group one entire folder tree with all its files into a package containing the executable and be able to run it anywhere.
More info:
When I put the spec.csv file in the same directory as STARTUP1.m, it finds it fine using mcc without the -a 'path' option and using the following in the LoadXLS.m file:
[tempFile, message] = fopen(ad.spec_file,'r');
This project contains GUIs, generates PDFs, generates plots, and also creates a zip directory.
Thank you in advance.
In SQL I'm using xp_cmdShell to run FTP commands. I have no problem getting the list of files or copying files to the local server, but I want to compare copied file size to the original to make sure the get has been successful.
Any ideas on how to compare file sizes?
From a command prompt you can use the DOS File Compare command (fc). In your case you probably want to do a binary compare (there is no file size compare). A binary compare should work in your case.
Most DOS commands return a code that lets you know the status.
http://www.computerhope.com/fchlp.htm
EDIT
Sorry, I reread your question and realized you want to compare it against a file on the FTP server. I think this is a moot point, since if FTP reports a successful file transfer there is no reason to compare (unless your source of comparison is not the FTP site). Does that make sense?
What you could do is use the FTP ls command.
ftp> ls <filename>
where ftp> is the FTP prompt and not part of the command. This command gives you the file size in bytes. Then you need to use a DOS command to get the size of the local file. Here is a StackOverflow question (and answer) about that.
Windows command for file size only?
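If a scripting language is available on the server, the same comparison can be done in one place. This is a minimal sketch in Python using the standard ftplib module, not something xp_cmdshell provides by itself. The helper names are hypothetical, and it assumes the FTP server supports the SIZE command.

```python
import os
from ftplib import FTP


def remote_size(ftp, name):
    """Ask the server for the size of name in bytes; None if it does not say."""
    ftp.voidcmd("TYPE I")  # many servers only answer SIZE in binary mode
    return ftp.size(name)


def sizes_match(local_path, expected_bytes):
    """Compare the local file's size in bytes with the size reported remotely."""
    return expected_bytes is not None and os.path.getsize(local_path) == expected_bytes


def download_verified(host, user, password, name, local_path):
    """Return True if the local copy has the same byte count as the remote file."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        return sizes_match(local_path, remote_size(ftp, name))
```

Equal sizes do not prove the contents are identical, but a mismatch is a reliable sign that the get was incomplete.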