How to see the search found using curl o wget? - wget

I need to see the searches found using curl or wget , when it find results with '301' status code.
This is my variable using curl.
website=$(curl -s --head -w %{http_code} https://launchpad.net/~[a-z]/+archive/pipelight -o /dev/null | sed 's#404##g')
echo $website
301
The above works, but only display if the site exists with '301' status code.
I want
echo $website
https://launchpad.net/~mqchael/+archive/pipelight

You can add the "effective URL" to your output. Change %{http_code} to "%{http_code} %{url_effective} ".
From there, it's just a matter of fussing with the regular expression. Change the sed string to 's#404 [^ ]* ##g'. That will eliminate (assuming you don't already know) not just the 404s, but will also eat the URL that follows it.
So:
curl -s --head -w "%{http_code} %{url_effective} " https://launchpad.net/~[a-z]/+archive/pipelight -o /dev/null | sed 's#404 [^ ]* ##g''s#404 [^ ]* ##g'
will give you:
301 https://launchpad.net/~j/+archive/pipelight
You may want to replace the HTTP codes with new-lines, after that.

Related

How to remove word between the slashes?

I need regexp which would remove tinrab in all directories
grep -rl "tinrab" /home/miki/go/src/github.com/meower
/home/miki/go/src/github.com/meower/pusher-service/main.go
/home/miki/go/src/github.com/meower/Dockerfile
/home/miki/go/src/github.com/meower/event/nats.go
/home/miki/go/src/github.com/meower/event/event.go
/home/miki/go/src/github.com/meower/meow-service/handlers.go
/home/miki/go/src/github.com/meower/meow-service/main.go
/home/miki/go/src/github.com/meower/query-service/handlers.go
/home/miki/go/src/github.com/meower/query-service/main.go
/home/miki/go/src/github.com/meower/search/elastic.go
/home/miki/go/src/github.com/meower/search/repository.go
/home/miki/go/src/github.com/meower/Gopkg.lock
/home/miki/go/src/github.com/meower/Gopkg.toml
/home/miki/go/src/github.com/meower/db/repository.go
/home/miki/go/src/github.com/meower/db/postgres.go
/home/miki/go/src/github.com/meower/.git/logs/HEAD
/home/miki/go/src/github.com/meower/.git/logs/refs/heads/master
/home/miki/go/src/github.com/meower/.git/logs/refs/remotes/origin/HEAD
/home/miki/go/src/github.com/meower/.git/config
For example in Dockerfile I have this line
WORKDIR /go/src/github.com/tinrab/meower
My goal is to have line without tinrab
WORKDIR /go/src/github.com/meower
This regexp
replaceAll("/tinrab*/ *", " ")
should somehow be implemented.
Any ideas?
If I try
grep -rl "tinrab" /home/miki/go/src/github.com/meower | xargs replaceAll("\/tinrab\/", "")
I got
bash: syntax error near unexpected token `('
You said you wanted a regexp /tinrab*/ *, but based on the examples you gave I guess /tinrab will do what you want..
grep -rl "tinrab" /home/miki/go/src/github.com/meower | \
xargs -r sed -i 's!/tinrab!!g'
The general format of the sed expression I used is: s!<from>!<to>!g which works the same as the replaceAll("<from>", "<to>") pseudocode you considered.

Wget: Filenames without the query string

I want to download a list of webpages from a file. How can I stop Wget appending the query strings on to the saved files?
wget http://www.example.com/index.html?querystring
I need this to be downloaded as index.html, not index.html?querystring
There is the -O option:
wget -O file.html http://www.example.com/index.html?querystring
so you can alter a little bit your script to pass to the -O argument the right file name.
I've finally resigned to using the -O and just wrapped it in a bash function to make it easier. I put this in my ~/.bashrc file:
wget-rmq ()
{
[ -z "$1" ] && echo 'error: wget-rmq requires a URL to retrieve as the first arg'
local output_filename="$(echo $1 | sed 's/?.*//g' | sed 's|https.*/||g')"
wget -O "${output_filename}" "${1}"
}
Then when I want to download a file:
wget-rmq http://www.example.com/index.html?querystring
The replacement regex is fairly simple. If any ?s appear in the URL before the query string begins then it will break. In practice that hasn't happened though since URL encoding requires ? to be in URLs as %3F, but I wanted to note the possibility.

Change multiple files

The following command is correctly changing the contents of 2 files.
sed -i 's/abc/xyz/g' xaa1 xab1
But what I need to do is to change several such files dynamically and I do not know the file names. I want to write a command that will read all the files from current directory starting with xa* and sed should change the file contents.
I'm surprised nobody has mentioned the -exec argument to find, which is intended for this type of use-case, although it will start a process for each matching file name:
find . -type f -name 'xa*' -exec sed -i 's/asd/dsg/g' {} \;
Alternatively, one could use xargs, which will invoke fewer processes:
find . -type f -name 'xa*' | xargs sed -i 's/asd/dsg/g'
Or more simply use the + exec variant instead of ; in find to allow find to provide more than one file per subprocess call:
find . -type f -name 'xa*' -exec sed -i 's/asd/dsg/g' {} +
Better yet:
for i in xa*; do
sed -i 's/asd/dfg/g' $i
done
because nobody knows how many files are there, and it's easy to break command line limits.
Here's what happens when there are too many files:
# grep -c aaa *
-bash: /bin/grep: Argument list too long
# for i in *; do grep -c aaa $i; done
0
... (output skipped)
#
You could use grep and sed together. This allows you to search subdirectories recursively.
Linux: grep -r -l <old> * | xargs sed -i 's/<old>/<new>/g'
OS X: grep -r -l <old> * | xargs sed -i '' 's/<old>/<new>/g'
For grep:
-r recursively searches subdirectories
-l prints file names that contain matches
For sed:
-i extension (Note: An argument needs to be provided on OS X)
Those commands won't work in the default sed that comes with Mac OS X.
From man 1 sed:
-i extension
Edit files in-place, saving backups with the specified
extension. If a zero-length extension is given, no backup
will be saved. It is not recommended to give a zero-length
extension when in-place editing files, as you risk corruption
or partial content in situations where disk space is exhausted, etc.
Tried
sed -i '.bak' 's/old/new/g' logfile*
and
for i in logfile*; do sed -i '.bak' 's/old/new/g' $i; done
Both work fine.
#PaulR posted this as a comment, but people should view it as an answer (and this answer works best for my needs):
sed -i 's/abc/xyz/g' xa*
This will work for a moderate amount of files, probably on the order of tens, but probably not on the order of millions.
Another more versatile way is to use find:
sed -i 's/asd/dsg/g' $(find . -type f -name 'xa*')
I'm using find for similar task. It is quite simple: you have to pass it as an argument for sed like this:
sed -i 's/EXPRESSION/REPLACEMENT/g' `find -name "FILE.REGEX"`
This way you don't have to write complex loops, and it is simple to see, which files you are going to change, just run find before you run sed.
u can make
'xxxx' text u search and will replace it with 'yyyy'
grep -Rn '**xxxx**' /path | awk -F: '{print $1}' | xargs sed -i 's/**xxxx**/**yyyy**/'
There's some good answers above. I thought I'd throw in one more that is succinct and parallelizable, using GNU parallel, which I often prefer to xargs:
parallel sed -i 's/abc/xyz/g' {} ::: xa*
Combine this with the -j N option to run N jobs in parallel.
If you are able to run a script, here is what I did for a similar situation:
Using a dictionary/hashMap (associative array) and variables for the sed command, we can loop through the array to replace several strings. Including a wildcard in the name_pattern will allow to replace in-place in files with a pattern (this could be something like name_pattern='File*.txt' ) in a specific directory (source_dir).
All the changes are written in the logfile in the destin_dir
#!/bin/bash
source_dir=source_path
destin_dir=destin_path
logfile='sedOutput.txt'
name_pattern='File.txt'
echo "--Begin $(date)--" | tee -a $destin_dir/$logfile
echo "Source_DIR=$source_dir destin_DIR=$destin_dir "
declare -A pairs=(
['WHAT1']='FOR1'
['OTHER_string_to replace']='string replaced'
)
for i in "${!pairs[#]}"; do
j=${pairs[$i]}
echo "[$i]=$j"
replace_what=$i
replace_for=$j
echo " "
echo "Replace: $replace_what for: $replace_for"
find $source_dir -name $name_pattern | xargs sed -i "s/$replace_what/$replace_for/g"
find $source_dir -name $name_pattern | xargs -I{} grep -n "$replace_for" {} /dev/null | tee -a $destin_dir/$logfile
done
echo " "
echo "----End $(date)---" | tee -a $destin_dir/$logfile
First, the pairs array is declared, each pair is a replacement string, then WHAT1 will be replaced for FOR1 and OTHER_string_to replace will be replaced for string replaced in the file File.txt. In the loop the array is read, the first member of the pair is retrieved as replace_what=$i and the second as replace_for=$j. The find command searches in the directory the filename (that may contain a wildcard) and the sed -i command replaces in the same file(s) what was previously defined. Finally I added a grep redirected to the logfile to log the changes made in the file(s).
This worked for me in GNU Bash 4.3 sed 4.2.2 and based upon VasyaNovikov's answer for Loop over tuples in bash.
The Silver Searcher Solution
I'm adding another option for those people who don't know about the amazing tool called The Silver Searcher (command line tool is ag).
Note: You can use grep and other tools to do the same thing here, but The Silver Searcher is fantastic :)
TLDR
ag -l 'abc' | xargs sed -i 's/abc/xyz/g'
Install The Silver Searcher
sudo apt install silversearcher-ag # Debian / Ubuntu
sudo pacman -S the_silver_searcher # Arch / EndeavourOS
sudo yum install epel-release the_silver_searcher # RHEL / CentOS
Demo Files
Paste the following into your terminal to create some demonstration files:
mkdir /tmp/food
cd /tmp/food
content="Everybody loves to abc this food!"
echo "$content" > ./milk
echo "$content" > ./bread
mkdir ./fastfood
echo "$content" > ./fastfood/pizza
echo "$content" > ./fastfood/burger
mkdir ./fruit
echo "$content" > ./fruit/apple
echo "$content" > ./fruit/apricot
Using 'ag'
The following ag command will recursively find all the files that contain the string 'abc'. It ignores the .git directory, .gitignore files, and other ignore files:
$ ag 'abc'
milk
1:Everybody loves to abc this food!
bread
1:Everybody loves to abc this food!
fastfood/burger
1:Everybody loves to abc this food!
fastfood/pizza
1:Everybody loves to abc this food!
fruit/apple
1:Everybody loves to abc this food!
fruit/apricot
1:Everybody loves to abc this food!
To just list the files that contain the string 'abc', use the -l switch:
$ ag -l 'abc'
bread
fastfood/burger
fastfood/pizza
fruit/apricot
milk
fruit/apple
Changing Multiple Files
Finally, using xargs and sed, we can replace the 'abc' string with another string:
ag -l 'abc' | xargs sed -i 's/abc/eat/g'
In the above command, ag is listing all the files that contain the string 'abc'. The xargs command is splitting the file names and piping them individually into the sed command.

Regex with wget?

I'm using wget to download some useful website:
wget -k -m -r -q -t 1 http://www.web.com/
but I want replace some bad words with my own choice (like Yahoo pipes regex)
If you want to regexp out words from within the page you are fetching with wget, you should pipe the output through sed.
For example:
wget -k -m -r -q -t 1 -O - http://www.web.com/ | sed 's/cat/dog/g' > output.html
Use the -O - flag to write the output to stdout, and the -q flag to make wget run in quiet mode.
Haven't got a shell atm to check my syntax but that should set you on the right path!
You can use sed -i.
find www.web.com -type f -exec sed -i 's/word1\|word2\|word3//ig' {} +
word1, word2, word3, etc. are the words to delete.

How do I push `sed` matches to the shell call in the replacement pattern?

I need to replace several URLs in a text file with some content dependent on the URL itself. Let's say for simplicity it's the first line of the document at the URL.
What I'm trying is this:
sed "s/^URL=\(.*\)/TITLE=$(curl -s \1 | head -n 1)/" file.txt
This doesn't work, since \1 is not set. However, the shell is getting called. Can I somehow push the sed match variables to that subprocess?
The accept answer is just plain wrong. Proof:
Make an executable script foo.sh:
#! /bin/bash
echo $* 1>&2
Now run it:
$ echo foo | sed -e "s/\\(foo\\)/$(./foo.sh \\1)/"
\1
$
The $(...) is expanded before sed is run.
So you are trying to call an external command from inside the replacement pattern of a sed substitution. I dont' think it can be done, the $... inside a pattern just allows you to use an already existent (constant) shell variable.
I'd go with Perl, see the /e option in the search-replace operator (s/.../.../e).
UPDATE: I was wrong, sed plays nicely with the shell, and it allows you do to that. But, then, the backlash in \1 should be escaped. Try instead:
sed "s/^URL=\(.*\)/TITLE=$(curl -s \\1 | head -n 1)/" file.txt
Try this:
sed "s/^URL=\(.*\)/\1/" file.txt | while read url; do sed "s#URL=\($url\)#TITLE=$(curl -s $url | head -n 1)#" file.txt; done
If there are duplicate URLs in the original file, then there will be n^2 of them in the output. The # as a delimiter depends on the URLs not including that character.
Late reply, but making sure people don't get thrown off by the answers here -- this can be done in gnu sed using the e command. The following, for example, decrements a number at the beginning of a line:
echo "444 foo" | sed "s/\([0-9]*\)\(.*\)/expr \1 - 1 | tr -d '\n'; echo \"\2\";/e"
will produce:
443 foo