From the wget man page
§ 2.4 Logging and Input File Options
‘-i file’
‘--input-file=file’
Read URLs from a local or external file. If ‘-’ is specified as file, URLs
are read from the standard input. (Use ‘./-’ to read from a file literally
named ‘-’.)
If this function is used, no URLs need be present on the command line. If
there are URLs both on the command line and in an input file, those on the
command line will be the first ones to be retrieved. If ‘--force-html’ is
not specified, then file should consist of a series of URLs, one per line.
I tried doing
wget -i - www.google.com
It downloaded a file, index.html, but then it hangs. Even after I press
"Enter" several times, it still hangs. Why?
Because you have not closed your 'file', wget keeps waiting for you to type more URLs into stdin. To terminate the input, press Ctrl+D (the EOF terminator).
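If you just want to feed URLs to wget non-interactively, you can pipe them in so stdin is closed automatically once the list ends, for example:
echo "www.google.com" | wget -i -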
I'm currently trying to use exiftool in the Windows command prompt to read metadata from multiple files, then output it to a single text file.
The exact command I last tried looked like this:
exiftool.exe -FileName -GPSPosition -CreateDate -d "%m:%d:%Y %H:%M:%S" -c "%d° %d' %.2f"\" -charset UTF-8 -ext jpg -w _Coordinate_Date.txt S:\Nick\Test\
When I run this, I get 7 individual text files, each containing the content for one corresponding file. However, I simply want to output all of it to one single text file. Any help is greatly appreciated.
The -w (textout) option can only be used to write multiple files. It is not meant to be used to output to a single file. As per the docs on -w:
It is not possible to specify a simple filename as an argument -- creating a single output file from multiple source files is typically done by shell redirection
Which is what you're doing with the >> ./output.txt part of your command. The -w _Coordinate_Date.txt isn't doing anything, and I would expect it to throw an Invalid TAG name: "w _Coordinate_Date.txt" error if quoted together like that, since the whole string gets treated as a single argument. The -w option takes two parts: the -w itself and either an extension or a format string.
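For example, dropping -w entirely and relying on shell redirection alone should collect everything in one file; a sketch reusing the question's tags and source directory, with output.txt as a placeholder name:
exiftool.exe -FileName -GPSPosition -CreateDate -d "%m:%d:%Y %H:%M:%S" -c "%d° %d' %.2f"\" -charset UTF-8 -ext jpg S:\Nick\Test\ > output.txt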
I actually figured it out: if you wrap the entire -w _Coordinate_Date.txt option in quotation marks and append it to a file, you can throw all of the output into one text file.
i.e. "-w _Coordinate_Date.txt >> ./output.txt"
I am looking to download all quality_variant_[accession_name].txt files from the Salk Arabidopsis 1001 Genomes site using wget in Bash shell.
Main page with list of accessions: http://signal.salk.edu/atg1001/download.php
Each accession links to a page (e.g., http://signal.salk.edu/atg1001/data/Salk/accession.php?id=Aa_0 where Aa_0 is the accession ID) containing three more links: unsequenced_[accession], quality_variant_[accession], and quality_variant_filtered_[accession]
I am only interested in the quality_variant_[accession] link (not the quality_variant_filtered_[accession] link), which takes you to a .txt file with sequence data (e.g., http://signal.salk.edu/atg1001/data/Salk/quality_variant_Aa_0.txt)
Running the command below eventually outputs the files of interest (though they are not downloaded, because of the --spider argument), demonstrating that wget can move through the page's hyperlinks to the files I want.
wget --spider --recursive "http://signal.salk.edu/atg1001/download.php"
I have not let the command run long enough to determine whether the files of interest are downloaded, but the command below does begin to download the site recursively.
# Arguments in brackets do not impact the performance of the command
wget -r [-e robots=off] [-m] [-np] [-nd] "http://signal.salk.edu/atg1001/download.php"
However, whenever I try to apply filters to pull out the .txt files of interest, whether with --accept-regex, --accept, or many other variants, I cannot get past the initial .php file.
# This and variants thereof do not work
wget -r -A "quality_variant_*.txt" "http://signal.salk.edu/atg1001/download.php"
# Returns:
# Saving to: ‘signal.salk.edu/atg1001/download.php.tmp’
# Removing signal.salk.edu/atg1001/download.php.tmp since it should be rejected.
I could make a list of the accession names and loop through those names modifying the URL in the wget command, but I was hoping for a dynamic one-liner that could extract all files of interest even if accession IDs are added over time.
Thank you!
Note: the data files of interest are contained in the directory http://signal.salk.edu/atg1001/data/Salk/, which is also home to a .php or static HTML page that is displayed when that URL is visited. This URL cannot be used in the wget command because, although the data files of interest are contained here server side, the HTML page contains no reference to these files but rather links to a different set of .txt files that I don't want.
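For what it's worth, here is an untested sketch of the loop idea mentioned above, made dynamic by scraping the accession IDs from the download page first; it assumes the links on download.php contain accession.php?id=<ID>, as in the example URL above:
# Untested sketch: scrape accession IDs, then fetch each quality_variant file
wget -qO- "http://signal.salk.edu/atg1001/download.php" \
  | grep -o 'accession\.php?id=[^"&]*' \
  | sed 's/.*id=//' \
  | sort -u \
  | while read -r id; do
      wget "http://signal.salk.edu/atg1001/data/Salk/quality_variant_${id}.txt"
    done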
I am trying to use wget for Windows (on Windows 7) to find and download a file that I don't know the full name of (I have a partial name, and I know the form of the unknown part of the name). I am using an input file with a list of the possible file names, and I want to abort wget when the file is found (the rest of the possibilities will give 404 errors). How can I cause wget to abort automatically when that one file is found?
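One possible approach (a sketch, not from the thread) is to loop over the candidate names yourself in a batch file and stop at the first successful download; names.txt and the base URL below are placeholders for your own list and site:
@echo off
rem Sketch only: names.txt (placeholder) lists one candidate file name per line.
rem wget exits with status 0 on success, so && fires only after a successful download.
for /F "usebackq delims=" %%N in ("names.txt") do (
    wget "http://example.com/path/%%N" && goto :done
)
:done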
I am attempting to regularly archive a few file types hosted on a community website where our admin has been MIA for years, in case he dies or just stops paying for the hosting.
I am able to download all of the files I need using wget -r -np -nd -e robots=off -l 0 URL but this leaves me with about 60,000 extra files to waste time both downloading and deleting.
I am really only looking for files with the extensions "tbt" and "zip". When I add in -A tbt,zip to the input, wget then only downloads a single file, "index.html.tmp". It immediately deletes this file because it doesn't match the file type specified, and then the process stops entirely, with wget announcing that it is finished. It does not attempt to download any of the other files that it grabs when the -A flag is not included.
What am I doing wrong? Why does specifying file types in the way that I did cause it to finish after only looking at one file?
Possibly you're hitting the same problem I've hit when trying to do something similar. When using --accept, wget determines whether a link refers to a file or a directory based on whether or not it ends with a /.
For example, say I have a directory named files, and a web page that has:
<a href="/files">Lots o' files!</a>
If I were to request this with wget -r, then wget would happily GET /files, see that it was an HTML document containing a bunch of links, and continue to download those links.
However, if I add -A zip to my command line, and run wget with --debug, I see:
appending ‘http://localhost:8080/files’ to urlpos.
[...]
Deciding whether to enqueue "http://localhost:8080/files".
http://localhost:8080/files (files) does not match acc/rej rules.
Decided NOT to load it.
In other words, wget thinks this is a file (no trailing /) and it doesn't match our acceptance criteria, so it gets rejected.
If I modify the remote file so that it looks like...
<a href="/files/">Lots o' files!</a>
...then wget will follow the link and download files as desired.
I don't think there's a great solution to this problem if you need to use wget. As I mentioned in my comment, there are other tools available that may handle this situation more gracefully.
It's also possible you're experiencing a different issue; the output of adding --debug to your command line would clarify things in that case.
I also experienced this issue, on a page where all the download links looked something like this: filedownload.ashx?name=file.mp3. The solution was to match both the linked file and the downloaded file, so my wget accept flag looked like this: -A 'ashx,mp3'. I also used the --trust-server-names flag. This catches all the .ashx pages that are linked in the webpage; then, when wget does the second check, all the mp3 files that were downloaded will stay.
As an alternative to --trust-server-names, you may also find the --content-disposition flag helpful. Both flags help rename the file that gets downloaded from filedownload.ashx?name=file.mp3 to just file.mp3.
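Putting that workaround together, an invocation along these lines (the URL is just a placeholder) lets wget follow the linked .ashx pages while keeping only the files that end up named .mp3:
wget -r -np -e robots=off --trust-server-names -A 'ashx,mp3' "http://example.com/downloads/"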
In the command prompt, how do I export all the content of the screen to a text file (basically a copy command, just not by using right-clicking and the clipboard)?
This command works, but it only captures the commands you executed, not their output:
doskey /HISTORY > history.txt
If you want to append to a file instead of constantly making a new one / deleting the old one's content, use double > marks. A single > mark will overwrite all of the file's content.
Overwrite file
MyCommand.exe>file.txt
^This will open file.txt if it already exists and overwrite the data, or create a new file and fill it with your output
Append file from its end-point
MyCommand.exe>>file.txt
^This will append file.txt from its current end of file if it already exists, or create a new file and fill it with your output.
Update #1 (advanced):
My batch-fu has improved over time, so here are some minor updates.
If you want to differentiate between error output and normal output for a program that correctly uses the standard streams, STDOUT/STDERR, you can do so with minor changes to the syntax. I'll just use > (overwrite) in these examples, but they work perfectly fine with >> (append) as well.
The 1 before the >> or > is the handle for STDOUT, and a 2 before the redirection symbols is for STDERR. If you actually need to output the number one or two immediately before a redirection symbol, this mechanism can lead to strange, unintuitive errors if you don't know about it; that's especially relevant when outputting a single result number into a file.
Now that you know you have more than one stream available, this is a good time to show the benefits of outputting to nul. Outputting to nul works the same way conceptually as outputting to a file, except that you don't see the content anywhere: instead of going to a file or to your console, it goes into the void.
STDERR to file and suppress STDOUT
MyCommand.exe 1>nul 2>errors.txt
STDERR to file to only log errors. Will keep STDOUT in console
MyCommand.exe 2>errors.txt
STDOUT to file and suppress STDERR
MyCommand.exe 1>file.txt 2>nul
STDOUT only to file. Will keep STDERR in console
MyCommand.exe 1>file.txt
STDOUT to one file and STDERR to another file
MyCommand.exe 1>stdout.txt 2>errors.txt
The only caveat here is that a 0-byte file is created for a stream that never gets used. Basically, if no errors occurred, you might end up with a 0-byte errors.txt file.
Update #2
I started noticing weird behavior when writing console apps that wrote directly to STDERR, and realized that if I wanted my error output to go to the same file when using basic piping, I either had to combine streams 1 and 2 or just use STDOUT. The problem was that I didn't know the correct way to combine streams, which is this:
%command% > outputfile 2>&1
Therefore, if you want all STDOUT and STDERR piped into the same stream, make sure to use that like so:
MyCommand.exe > file.txt 2>&1
The redirection actually defaults to 1> or 1>> even if you don't explicitly put a number in front of it, and the 2>&1 merges STDERR into the same stream as STDOUT.
Update #3 (simple)
Null for Everything
If you want to completely suppress STDOUT and STDERR, you can do it this way. As a warning, not all programs write their text through STDOUT and STDERR, but this will work for the vast majority of use cases.
STD* to null
MyCommand.exe>nul 2>&1
Copying a CMD or Powershell session's command output
If all you want is the command output from a CMD or PowerShell session that you just finished (or any other shell, for that matter), you can usually just select that console window, press Ctrl+A to select all the content, then Ctrl+C to copy it. Then you can do whatever you like with the copied content while it's in your clipboard.
In cmd, type:
Command | clip
Then open a .txt file and paste. That's it. Done.
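For example, to put the output of ipconfig /all on the clipboard:
ipconfig /all | clip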
If you are looking to capture each command separately:
To export the output of a command to a text file, simply follow this syntax:
C:\> [syntax] >file.txt
The above command writes the result of [syntax] to file.txt, where file.txt is created in the current folder you are in.
For example,
C:\Result> dir >file.txt
To copy the whole session, try this:
Copy & paste a command session as follows:
1.) At the end of your session, click the upper left corner to display the menu.
Then select Edit -> Select All.
2.) Again, click the upper left corner to display the menu.
Then select Edit -> Copy.
3.) Open your favorite text editor and use Ctrl+V or your normal
paste operation to paste in the text.
If your batch file is not interactive and you don't need to see it run, then this should work.
@echo off
call file.bat >textfile.txt 2>&1
Otherwise, use a tee filter. There are many, some not NT-compatible. SFK, the Swiss File Knife, has a tee feature and is still being developed. Maybe that will work for you.
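If PowerShell is an option, its built-in Tee-Object cmdlet can serve as such a filter; a minimal sketch, reusing file.bat and textfile.txt from above:
.\file.bat | Tee-Object -FilePath textfile.txt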
How about this:
<command> > <filename.txt> & <filename.txt>
Example:
ipconfig /all > network.txt & network.txt
This will give the results in Notepad instead of the command prompt.
From a command prompt run as Administrator, the example below prints a list of the services running on your PC:
net start > c:\netstart.txt
You should then find the text file you just exported, listing all the running services, at the root of your C:\ drive.
If you want to capture ALL the output, not just stdout but also any printf statements made by the program, any warnings, infos, etc., you have to add 2>&1 at the end of the command line.
In your case, the command will be
Program.exe > file.txt 2>&1