Regex with wget? - wget

I'm using wget to download some useful website:
wget -k -m -r -q -t 1 http://www.web.com/
but I want replace some bad words with my own choice (like Yahoo pipes regex)

If you want to regexp out words from within the page you are fetching with wget, you should pipe the output through sed.
For example:
wget -k -m -r -q -t 1 -O - http://www.web.com/ | sed 's/cat/dog/g' > output.html
Use the -O - flag to write the output to stdout, and the -q flag to make wget run in quiet mode.
Haven't got a shell atm to check my syntax but that should set you on the right path!

You can use sed -i.
find www.web.com -type f -exec sed -i 's/word1\|word2\|word3//ig' {} +
word1, word2, word3, etc. are the words to delete.

Related

How to find and replace a string in multiple files in unix?

I can do the find and replace a string in multiple files with the below command.
find . -name '*.py' | xargs sed -i 's/foo/faa/g'
Can we do the same by using Perl?
Try this command...:
find . -name '*.py' | xargs perl -p -i -e 's/foo/faa/g'
N.B.: If you want to make a backup copy of your files before changing them, provide -i flag with an extension... I.E.: -i.bak...

Unable to get Grep get information in Terminal

I'm unable to get 'get' in terminal using Grep.
This code used to work on Lion but in Maverick the GET doesn't show...
sudo tcpdump -i en1 -n -s 0 -w - | grep -a -o -E "Host\:\ .*|GET\ \/.*"
Any help or suggestions maybe?
Try:
sudo tcpdump -s 0 -A | egrep --color=never -a -o "Host\: .*|GET\ \/.*"
The -w - writes the raw packets whereas the -A decodes to ASCII; handy for web pages (per man)
I found that if grep was outputting color, the Host: lines were output as empty lines.

sed: can't read /home/me/weather: No such file or directory

I have the following extract from a script that fetches weather information from accuweather:
wget -O ./weather_raw $address
if [[ -s ./weather_raw ]]; then
egrep 'Currently|Forecast<\/title>|_31x31.gif' ./weather_raw > ./weather
sed -i '/AccuWeather\|Currently in/d' ./weather
sed -i -e 's/^[ \t]*//g' -e 's/<title>\|<\/title>\|<description>\|<\/description>//g' ./weather
sed -i -e 's/<img src="/\n/g' ./weather
sed -i '/^$/d' ./weather
sed -i -e 's/_31x31.*$//g' -e 's/^.*\/icons\///g' ./weather
sed -i -e '1s/.$//' -e '3s/.$//' -e '6s/.$//' ./weather
for (( i=2; i<=8; i+=3 ))
do
im=$(sed -n ${i}p ./weather)
sed -i $i"s/^.*$/$(test_image $im)/" ./weather
done
fi
The command that triggers the code above is in a conkyrc file and its ~/.conkyblue/accu_weather/rss/acc_rss. When I run the conkyrc script from the prompt, I get an error
sed: can't read /home/me/weather: No such file or directory
And indeed when I check, the "weather" file is not created. However if run the command ~/.conkyblue/accu_weather/rss/acc_rss directly from the prompt, it works as expected and create and puts content into the /home/me/weather file.
I don't know anything about the sed command although I'm trying to learn it as a result of this bother.
What could be the problem with the code. I don't think its a permission issue since the folder its writing into is my home folder and I of-course own it.
Thanks
It should have been created by egrep.
When you run your script, the weather directory will be created in the pwd of the script process.
Check and see why egrep does not create the file, or in which directory it created it.

Change multiple files

The following command is correctly changing the contents of 2 files.
sed -i 's/abc/xyz/g' xaa1 xab1
But what I need to do is to change several such files dynamically and I do not know the file names. I want to write a command that will read all the files from current directory starting with xa* and sed should change the file contents.
I'm surprised nobody has mentioned the -exec argument to find, which is intended for this type of use-case, although it will start a process for each matching file name:
find . -type f -name 'xa*' -exec sed -i 's/asd/dsg/g' {} \;
Alternatively, one could use xargs, which will invoke fewer processes:
find . -type f -name 'xa*' | xargs sed -i 's/asd/dsg/g'
Or more simply use the + exec variant instead of ; in find to allow find to provide more than one file per subprocess call:
find . -type f -name 'xa*' -exec sed -i 's/asd/dsg/g' {} +
Better yet:
for i in xa*; do
sed -i 's/asd/dfg/g' $i
done
because nobody knows how many files are there, and it's easy to break command line limits.
Here's what happens when there are too many files:
# grep -c aaa *
-bash: /bin/grep: Argument list too long
# for i in *; do grep -c aaa $i; done
0
... (output skipped)
#
You could use grep and sed together. This allows you to search subdirectories recursively.
Linux: grep -r -l <old> * | xargs sed -i 's/<old>/<new>/g'
OS X: grep -r -l <old> * | xargs sed -i '' 's/<old>/<new>/g'
For grep:
-r recursively searches subdirectories
-l prints file names that contain matches
For sed:
-i extension (Note: An argument needs to be provided on OS X)
Those commands won't work in the default sed that comes with Mac OS X.
From man 1 sed:
-i extension
Edit files in-place, saving backups with the specified
extension. If a zero-length extension is given, no backup
will be saved. It is not recommended to give a zero-length
extension when in-place editing files, as you risk corruption
or partial content in situations where disk space is exhausted, etc.
Tried
sed -i '.bak' 's/old/new/g' logfile*
and
for i in logfile*; do sed -i '.bak' 's/old/new/g' $i; done
Both work fine.
#PaulR posted this as a comment, but people should view it as an answer (and this answer works best for my needs):
sed -i 's/abc/xyz/g' xa*
This will work for a moderate amount of files, probably on the order of tens, but probably not on the order of millions.
Another more versatile way is to use find:
sed -i 's/asd/dsg/g' $(find . -type f -name 'xa*')
I'm using find for similar task. It is quite simple: you have to pass it as an argument for sed like this:
sed -i 's/EXPRESSION/REPLACEMENT/g' `find -name "FILE.REGEX"`
This way you don't have to write complex loops, and it is simple to see, which files you are going to change, just run find before you run sed.
u can make
'xxxx' text u search and will replace it with 'yyyy'
grep -Rn '**xxxx**' /path | awk -F: '{print $1}' | xargs sed -i 's/**xxxx**/**yyyy**/'
There's some good answers above. I thought I'd throw in one more that is succinct and parallelizable, using GNU parallel, which I often prefer to xargs:
parallel sed -i 's/abc/xyz/g' {} ::: xa*
Combine this with the -j N option to run N jobs in parallel.
If you are able to run a script, here is what I did for a similar situation:
Using a dictionary/hashMap (associative array) and variables for the sed command, we can loop through the array to replace several strings. Including a wildcard in the name_pattern will allow to replace in-place in files with a pattern (this could be something like name_pattern='File*.txt' ) in a specific directory (source_dir).
All the changes are written in the logfile in the destin_dir
#!/bin/bash
source_dir=source_path
destin_dir=destin_path
logfile='sedOutput.txt'
name_pattern='File.txt'
echo "--Begin $(date)--" | tee -a $destin_dir/$logfile
echo "Source_DIR=$source_dir destin_DIR=$destin_dir "
declare -A pairs=(
['WHAT1']='FOR1'
['OTHER_string_to replace']='string replaced'
)
for i in "${!pairs[#]}"; do
j=${pairs[$i]}
echo "[$i]=$j"
replace_what=$i
replace_for=$j
echo " "
echo "Replace: $replace_what for: $replace_for"
find $source_dir -name $name_pattern | xargs sed -i "s/$replace_what/$replace_for/g"
find $source_dir -name $name_pattern | xargs -I{} grep -n "$replace_for" {} /dev/null | tee -a $destin_dir/$logfile
done
echo " "
echo "----End $(date)---" | tee -a $destin_dir/$logfile
First, the pairs array is declared, each pair is a replacement string, then WHAT1 will be replaced for FOR1 and OTHER_string_to replace will be replaced for string replaced in the file File.txt. In the loop the array is read, the first member of the pair is retrieved as replace_what=$i and the second as replace_for=$j. The find command searches in the directory the filename (that may contain a wildcard) and the sed -i command replaces in the same file(s) what was previously defined. Finally I added a grep redirected to the logfile to log the changes made in the file(s).
This worked for me in GNU Bash 4.3 sed 4.2.2 and based upon VasyaNovikov's answer for Loop over tuples in bash.
The Silver Searcher Solution
I'm adding another option for those people who don't know about the amazing tool called The Silver Searcher (command line tool is ag).
Note: You can use grep and other tools to do the same thing here, but The Silver Searcher is fantastic :)
TLDR
ag -l 'abc' | xargs sed -i 's/abc/xyz/g'
Install The Silver Searcher
sudo apt install silversearcher-ag # Debian / Ubuntu
sudo pacman -S the_silver_searcher # Arch / EndeavourOS
sudo yum install epel-release the_silver_searcher # RHEL / CentOS
Demo Files
Paste the following into your terminal to create some demonstration files:
mkdir /tmp/food
cd /tmp/food
content="Everybody loves to abc this food!"
echo "$content" > ./milk
echo "$content" > ./bread
mkdir ./fastfood
echo "$content" > ./fastfood/pizza
echo "$content" > ./fastfood/burger
mkdir ./fruit
echo "$content" > ./fruit/apple
echo "$content" > ./fruit/apricot
Using 'ag'
The following ag command will recursively find all the files that contain the string 'abc'. It ignores the .git directory, .gitignore files, and other ignore files:
$ ag 'abc'
milk
1:Everybody loves to abc this food!
bread
1:Everybody loves to abc this food!
fastfood/burger
1:Everybody loves to abc this food!
fastfood/pizza
1:Everybody loves to abc this food!
fruit/apple
1:Everybody loves to abc this food!
fruit/apricot
1:Everybody loves to abc this food!
To just list the files that contain the string 'abc', use the -l switch:
$ ag -l 'abc'
bread
fastfood/burger
fastfood/pizza
fruit/apricot
milk
fruit/apple
Changing Multiple Files
Finally, using xargs and sed, we can replace the 'abc' string with another string:
ag -l 'abc' | xargs sed -i 's/abc/eat/g'
In the above command, ag is listing all the files that contain the string 'abc'. The xargs command is splitting the file names and piping them individually into the sed command.

Sed not working with find command

I'm trying to run this sed script on all the files in a directory:
sed.s:
/<constants>/a\
<const type="profElem" name="mission_description" value="NCEP and NCAR Reanalysis Monthly Means and Other Derived Variables"/>
but whenever I run:
find . -exec sed -f sed.s -i {} \;
I get the error:
sed: -i may not be used with stdin
How do I get this to work?
It appears that your version of sed requires you to pass an extension for backups to the -i option. If you feel pretty confident in your command you could try to give it a zero-length extension like so:
find . -exec sed -f sed.s -i '' {} \;