Combining two sed commands - sed

I have a file r. I want to replace the words File and MINvac.pdb in it with nothing. The commands I used are
sed -i 's/File//g' /home/kanika/standard_minimizer_prosee/r
and
sed -i 's/MINvac.pdb//g' /home/kanika/standard_minimizer_prosee/r
I want to combine both sed commands into one, but I don't know the way. Can anyone help?
The file looks like this:
-6174.27 File10MINvac.pdb
-514.451 File11MINvac.pdb
4065.68 File12MINvac.pdb
-4708.64 File13MINvac.pdb
6674.54 File14MINvac.pdb
8563.58 File15MINvac.pdb

sed is a scripting language. You separate commands with semicolon or newline. Many sed dialects also allow you to pass each command as a separate -e option argument.
sed -i 's/File//g;s/MINvac\.pdb//g' /home/kanika/standard_minimizer_prosee/r
I also added a backslash to properly quote the literal dot before pdb, but in this limited context that is probably unimportant.
For completeness, here is the newline variant. Many newcomers are baffled that the shell allows literal newlines in quoted strings, but it can be convenient.
sed -i 's/File//g
s/MINvac\.pdb//g' /home/kanika/standard_minimizer_prosee/r
Of course, in this limited case, you could also combine everything into one regex:
sed -i 's/\(File\|MINvac\.pdb\)//g' /home/kanika/standard_minimizer_prosee/r
(Some sed dialects will want this without backslashes, and/or offer an option to use extended regular expressions, where they should be omitted. BSD sed, and thus also MacOS sed, demands a mandatory argument to sed -i which can however be empty, like sed -i ''.)

Use the -e flag:
sed -i -e 's/File//g' -e 's/MINvac.pdb//g' /home/kanika/standard_minimizer_prosee/r
Once you get more commands than are convenient to define with -es, it is better to store the commands in a separate file and include it with the -f flag.
In this case, you'd make a file containing:
s/File//g
s/MINvac.pdb//g
Let's call that file 'sedcommands'. You'd then use it with sed like this:
sed -i -f sedcommands /home/kanika/standard_minimizer_prosee/r
With only two commands, it's probably not worthwhile using a separate file of commands, but it is quite convenient if you have a lot of transformations to make.

Related

sed remove line if neither pattern provided don't match

I am trying to create a filter command to reduce the lines from a log file, assume each line contains partition made of date,
/iamthepath01/20200301/file01.txt
/iamthepath02/20200302/file02.txt
....
/iamthepathxx/20210619/filexx.txt
then from thousands of lines I only want to keep the ones with two string in the path
/202106
/202105
and remove any other lines
I have tried following command
sed -i -e '\(/202105\|/202106\)!d' ~/log.txt
above command threw
sed: -e expression #1, char 24: unterminated address regex
You can use
sed -i '/\/20210[56]/!d' ~/log.txt
Or, if you need to use more specific alternatives and further enhance the pattern:
sed -i -E '/\/(202105|202106)/!d' ~/log.txt
Details:
-i - GNU sed option for inline file replacement
-E - option enabling POSIX ERE regex syntax
/\/20210[56]/ - regex that matches /20210 and then either 5 or 6
\/(202105|202106) - the POSIX ERE pattern that matches / and then either 202105 or 202106
!d - removes the lines not matching the pattern.
See the online demo:
#!/bin/bash
s='/iamthepath01/20200301/file01.txt
/iamthepath02/20200302/file02.txt
/iamthepathxx/20210619/filexx.txt'
sed '/\/20210[56]/!d' <<< "$s"
Output:
/iamthepathxx/20210619/filexx.txt
sed is the wrong tool for this. If you want a script that's as fragile as the sed one then use grep as it's the tool that exists solely to do a simple g/re/p (hence the name) like you're doing:
$ grep '/20210[56]' file
/iamthepathxx/20210619/filexx.txt
or if you want a more robust solution that focuses just on the part of the line you want to match and so will avoid false matches, then use awk:
$ awk -F '/' '$3 ~ /^20210[56]/' file
/iamthepathxx/20210619/filexx.txt
This might work for you (GNU sed):
sed -ni '\#/20210[56]#p' file
This uses seds -n grep-like option to turn off implicit printing and -i option to edit the file in place.
Normally sed uses the /.../ to match but other delimiters may be used if the first is escaped e.g. \#...#.
So the above solution will filter the existing file down to lines that contain either /202105 or /202106.
N.B. grep will almost certainly be faster in finding the above lines however the use of the -i option may be the ultimate reason for choosing sed (although the same outcome can be achieved by tacking on the > tmpFile && mv tmpFile file to a grep solution).

(Gnu) sed command to change a matching part of a line

Is there a way in (Gnu) sed to replace all characters in a matching part of a string? For example I might have a list of file paths with several (arbitrary number of) paths in each line, e.g.:
/a/b/c/d/e /f/g/XXX/h/i /j/k/l/m
/n/o/p /q/r/s/t/u /v/x/x/y
/z/XXX/a/b /c/d/e/f
I would like to replace all the slashes in paths containing XXX keping all the others untouched, e.g.:
/a/b/c/d/e #f#g#XXX#h#i /j/k/l/m
/n/o/p /q/r/s/t/u /v/x/x/y
#z#XXX#a#b /c/d/e/f
Unfortunately I cannot come up with a solution. Maybe it's even impossible with sed. But I'm curious if somebody find a way to solve the problem.
We can replace any / preceding XXX with no intervening spaces like this:
# Using extended regex syntax
s!/([^ ]*XXX)!#\1!
It's a very similar substitution for those that follow XXX.
Putting them together in a loop makes this program:
#!/bin/sed -rf
:loop
s!/([^ ]*XXX)!#\1!
s!(XXX[^ ]*)/!\1#!
tloop
Output:
/a/b/c/d/e #f#g#XXX#h#i /j/k/l/m
/n/o/p /q/r/s/t/u /v/x/x/y
#z#XXX#a#b /c/d/e/f
That said, it might be simpler to use a pipeline, to break the file paths into individual lines and then reassemble them after the substitution:
sed -e 's/ *$//;s/ */&\n/g' \
| sed -e '/XXX/y,/,#,' \
| sed -e ':a;/ $/{N;s/\n//;ba}'

What is wrong with this sed expression?

sed 's_((checksum|compressed)=\").*(\")_\1\2_' -i filename
I am using this command to replace the checksum and compressed filed with empty? But it didn't change anything?
for example, I want change this line " checksum="XXXXX" with checksum="", and also replace
compressed="XXXX" with compressed=""
What is wrong with my sed command?
It's because sed uses a funny regex dialect by default: you have to escape capturing brackets.
If you want to use "normal" regex that you're familiar with, use the -r flag (if you're on unix, GNU sed) or the -E flag (Mac OS X BSD sed):
sed -r 's_((checksum|compressed)=\").*(\")_\1\3_' -i filename
Additionally, note that you have three sets of capturing brackets in your sed, and I think you want to change the \1\2 to \1\3. (\1 contains checksum=", \2 contains checksum, and \3 contains ").
(For interest, here's how you would do it without the extended-regexp (-r/-E) flag, note that capturing brackets and the OR | are only considered in the regex sense if they are escaped:
sed 's_\(\(checksum\|compressed\)=\"\).*\(\"\)_\1\3_' -i filename
)
This might work for you:
echo 'checksum="XXXXX" compressed="YYYYYYY"' |
sed 's/\(checksum\|compressed\)="[^"]*"/\1=""/g'
checksum="" compressed=""
In sed (without the -r switch), ()|+?{}'s must have a \ prepended to give them the qualities of grouping. alternation, one or more, zero or one and intervals. .[]* work as metacharacters either way.
Try:
sed 's/\(\(checksum\|compressed\)\)="[^"]*"/\1=""/' -i filename

sed to remove URLs from a file

I am trying to write a sed expression that can remove urls from a file
example
http://samgovephotography.blogspot.com/ updated my blog just a little bit ago. Take a chance to check out my latest work. Hope all is well:)
Meet Former Child Star & Author Melissa Gilbert 6/15/09 at LA's B&N https://hollywoodmomblog.com/?p=2442 Thx to HMB Contributor #kdpartak :)
But I dont get it:
sed 's/[\w \W \s]*http[s]*:\/\/\([\w \W]\)\+[\w \W \s]*/ /g' posFile
FIXED!!!!!
handles almost all cases, even malformed URLs
sed 's/[\w \W \s]*http[s]*[a-zA-Z0-9 : \. \/ ; % " \W]*/ /g' positiveTweets | grep "http" | more
The following removes http:// or https:// and everything up until the next space:
sed -e 's!http\(s\)\{0,1\}://[^[:space:]]*!!g' posFile
updated my blog just a little bit ago. Take a chance to check out my latest work. Hope all is well:)
Meet Former Child Star & Author Melissa Gilbert 6/15/09 at LA's B&N Thx to HMB Contributor #kdpartak :)
Edit:
I should have used:
sed -e 's!http[s]\?://\S*!!g' posFile
"[s]\?" is a far more readable way of writing "an optional s" compared to "\(s\)\{0,1\}"
"\S*" a more readable version of "any non-space characters" than "[^[:space:]]*"
I must have been using the sed that came installed with my Mac at the time I wrote this answer (brew install gnu-sed FTW).
There are better URL regular expressions out there (those that take into account schemes other than HTTP(S), for instance), but this will work for you, given the examples you give. Why complicate things?
The accepted answer provides the approach that I used to remove URLs, etc. from my files. However it left "blank" lines. Here is a solution.
sed -i -e 's/http[s]\?:\/\/\S*//g ; s/www\.\S*//g ; s/ftp:\S*//g' input_file
perl -i -pe 's/^'`echo "\012"`'${2,}//g' input_file
The GNU sed flags, expressions used are:
-i Edit in-place
-e [-e script] --expression=script : basically, add the commands in script
(expression) to the set of commands to be run while processing the input
^ Match start of line
$ Match end of line
? Match one or more of preceding regular expression
{2,} Match 2 or more of preceding regular expression
\S* Any non-space character; alternative to: [^[:space:]]*
However,
sed -i -e 's/http[s]\?:\/\/\S*//g ; s/www\.\S*//g ; s/ftp:\S*//g'
leaves nonprinting character(s), presumably \n (newlines). Standard sed-based approaches to remove "blank" lines, tabs and spaces, e.g.
sed -i 's/^[ \t]*//; s/[ \t]*$//'
do not work, here: if you do not use a "branch label" to process newlines, you cannot replace them using sed (which reads input one line at a time).
The solution is to use the following perl expression:
perl -i -pe 's/^'`echo "\012"`'${2,}//g'
which uses a shell substitution,
'`echo "\012"`'
to replace an octal value
\012
(i.e., a newline, \n), that occurs 2 or more times,
{2,}
(otherwise we would unwrap all lines), with something else; here:
//
i.e., nothing.
[The second reference below provides a wonderful table of these values!]
The perl flags used are:
-p Places a printing loop around your command,
so that it acts on each line of standard input
-i Edit in-place
-e Allows you to provide the program as an argument,
rather than in a file
References:
perl flags: Perl flags -pe, -pi, -p, -w, -d, -i, -t?
ASCII control codes: https://www.cyberciti.biz/faq/unix-linux-sed-ascii-control-codes-nonprintable/
remove URLs: sed to remove URLs from a file
branch labels: How can I replace a newline (\n) using sed?
GNU sed manual: https://www.gnu.org/software/sed/manual/sed.html
quick regex guide: https://www.gnu.org/software/sed/manual/html_node/Regular-Expressions.html
Example:
$ cat url_test_input.txt
Some text ...
https://stackoverflow.com/questions/4283344/sed-to-remove-urls-from-a-file
https://www.google.ca/search?dcr=0&ei=QCsyWtbYF43YjwPpzKyQAQ&q=python+remove++citations&oq=python+remove++citations&gs_l=psy-ab.3...1806.1806.0.2004.1.1.0.0.0.0.61.61.1.1.0....0...1c.1.64.psy-ab..0.0.0....0.-cxpNc6youY
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html
https://bbengfort.github.io/tutorials/2016/05/19/text-classification-nltk-sckit-learn.html
http://datasynce.org/2017/05/sentiment-analysis-on-python-through-textblob/
https://www.google.ca/?q=halifax&gws_rd=cr&dcr=0&ei=j7UyWuGKM47SjwOq-ojgCw
http://www.google.ca/?q=halifax&gws_rd=cr&dcr=0&ei=j7UyWuGKM47SjwOq-ojgCw
www.google.ca/?q=halifax&gws_rd=cr&dcr=0&ei=j7UyWuGKM47SjwOq-ojgCw
ftp://ftp.ncbi.nlm.nih.gov/
ftp://ftp.ncbi.nlm.nih.gov/1000genomes/ftp/alignment_indices/20100804.alignment.index
Some more text.
$ sed -e 's/http[s]\?:\/\/\S*//g ; s/www\.\S*//g ; s/ftp:\S*//g' url_test_input.txt > a
$ cat a
Some text ...
Some more text.
$ perl -i -pe 's/^'`echo "\012"`'${2,}//g' a
Some text ...
Some more text.
$

Single sed command for multiple substitutions?

I use sed to substitute text in files.
I want to give sed a file which contains all the strings to be searched and replaced in a given file.
It goes over .h and .cpp files. In each file it searches for file names which are included in it. If found, it substitutes for example "a.h" with "<a.h>" (without the quotes).
The script is this:
For /F %%y in (all.txt) do
for /F %%x in (allFilesWithH.txt) do
sed -i s/\"%%x\"/"\<"%%x"\>"/ %%y
all.txt - List of files to do the substitution in them
allFilesWithH.txt - All the include names to be searched
I don't want to run sed several times (as the number of files names in input.txt.) but I want to run a single sed command and pass it input.txt as input.
How can I do it?
P.S I run sed from VxWorks Development shell, so it doesn't have all the commands that the Linux version does.
You can eliminate one of the loops so sed only needs to be called once per file. Use
the -f option to specify more than one substitution:
For /F %%y in (all.txt) do
sed -i -f allFilesWithHAsSedScript.sed %%y
allFilesWithHAsSedScript.sed derives from allFilesWithH.txt and would contain:
s/\"file1\"/"\<"file1"\>"/
s/\"file2\"/"\<"file2"\>"/
s/\"file3\"/"\<"file3"\>"/
s/\"file4\"/"\<"file4"\>"/
(In the article Common threads: Sed by example, Part 3 there are many examples of sed scripts with explanations.)
Don't get confuSed (pun intended).
sed itself has no capability to read filenames from a file. I'm not familiar with the VxWorks shell, and I imagine this is something to do with the lack of answers... So here are some things that would work in bash - maybe VxWorks will support one of these things.
sed -i 's/.../...' `cat all.txt`
sed -i 's/.../...' $(cat all.txt)
cat all.txt | xargs sed -i 's/.../...'
And really, it's no big deal to invoke sed several times if it gets the job done:
cat all.txt | while read file; do sed -i 's/.../.../' $file; done
for file in $(cat all.txt); do # or `cat all.txt`
sed -i 's/.../.../' $file
done
What I'd do is change allFilesWithH.txt into a sed command using sed.
(When forced to use sed. I'd actually use Perl instead, it can also do the search for *.h files.)