How to use sed to remove dots between 2 patterns

I've a file with the following text:
<RecordID>02.037.D00221700080.0</RecordID>
2.35
AB
<RecordID>02.037.D00221700080.1</RecordID>
2.45
BB
<RecordID>02.037.D00221700080.2</RecordID>
6.5
CC
I want to remove the dots between <RecordID> and </RecordID> to get this:
<RecordID>02037D002217000800</RecordID>
2.35
AB
<RecordID>02037D002217000801</RecordID>
2.45
BB
<RecordID>02037D002217000802</RecordID>
6.5
CC
I've tried different approaches with sed, all of them without results...
Thanks in advance!

Using sed:
sed '/<RecordID>/s/\.//g' file
<RecordID>02037D002217000800</RecordID>
2.35
AB
<RecordID>02037D002217000801</RecordID>
2.45
BB
<RecordID>02037D002217000802</RecordID>
6.5
CC
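The command above deletes every dot on any line that contains <RecordID>, which is enough for the sample shown. If a record line could also carry dots outside the tags, a stricter variant might look like the sketch below. The function name strip_record_dots and the sample line are made up for illustration, and the sketch assumes a sed that accepts -E (GNU or BSD) and tags without nested markup.

```shell
# Hypothetical helper: remove dots only between <RecordID> and </RecordID>.
strip_record_dots() {
  # Each pass deletes one dot inside the tags; "ta" jumps back to label "a"
  # whenever a substitution was made, so the loop ends once no dot is left.
  sed -E -e ':a' -e 's|(<RecordID>[^<.]*)\.([^<]*</RecordID>)|\1\2|' -e 'ta' "$@"
}

# Made-up sample line with dots both inside and outside the tags.
printf '%s\n' 'v1.2 <RecordID>02.037.D00221700080.0</RecordID> 2.35' | strip_record_dots
```

Lines without a <RecordID> element, such as 2.35, pass through unchanged.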

Use this Perl one-liner:
perl -pe '/RecordID/ and tr/.//d;' in_file
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlrequick: Perl regular expressions quick start

Related

sed to ignore a pattern as well as match a pattern in same line

Input file
perl http://zoidberg.sourceforge.net
zoiduser perl http://zoidberg.sourceforge.net
I need to remove the .html extension only from the URLs below in the file above:
perl
zoiduser
So that the final output should look like:
perl http://zoidberg.sourceforge.net
zoiduser perl http://zoidberg.sourceforge.net
This is what I am doing:
sed '/"http\|"www\|"mailto/ ! s|\(.html\)||g' file
But this skips the whole line as soon as the line contains one of the patterns, whereas I only want to leave alone the URLs that start with "http, "www or "mailto.

You can use
sed -E 's/("(http|www|mailto)[^"]*")|\.html/\1/g' file
Details:
-E - enables POSIX ERE syntax
("(http|www|mailto)[^"]*") - Group 1 (\1): " and then either http, www, or mailto and then zero or more chars other than " and then a "
| - or
\.html - .html string.
The replacement is Group 1 values.
See the online demo:
#!/bin/bash
s='perl http://zoidberg.sourceforge.net
zoiduser perl http://zoidberg.sourceforge.net'
sed -E 's/("(http|www|mailto)[^"]*")|\.html/\1/g' <<< "$s"
Output:
perl http://zoidberg.sourceforge.net
zoiduser perl http://zoidberg.sourceforge.net
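Since the markup of the sample input is not visible above, here is a small sketch of the same command on an assumed input line. The function name strip_local_html, the anchor tags and the example.com URL are all hypothetical and only serve to show which .html suffixes are removed.

```shell
# Hypothetical helper wrapping the sed command from this answer.
strip_local_html() {
  # Quoted URLs starting with http, www or mailto are captured and put back
  # unchanged; any other ".html" matches the second alternative and is dropped.
  sed -E 's/("(http|www|mailto)[^"]*")|\.html/\1/g' "$@"
}

# Assumed sample line, not taken from the original question.
printf '%s\n' '<a href="perl.html">perl</a> <a href="http://example.com/index.html">site</a>' | strip_local_html
```

With this input, only the local link loses its extension; the quoted http URL is matched by the first alternative and written back as is.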

It is not recommended to parse HTML using shell utilities like sed, awk, perl etc. But if you really have to use negation of certain keywords then I would suggest this perl:
perl -pe 's/"(?!www|http|mailto)([^"]+)\.html/"$1/g' f.html
perl http://zoidberg.sourceforge.net
zoiduser perl http://zoidberg.sourceforge.net
(?!www|http|mailto) is negative lookahead to fail the match if these keywords appear just after "

With your shown samples only, please try the following awk code. It checks whether the line contains <a href="(http|mailto|www): and, if so, uses the sub function to replace the first occurrence of .html"> with ">. When both succeed, the line is printed and next skips the remaining rules for that line. The trailing 1 prints every other line unchanged.
awk '/<a href="(http|mailto|www):/ && sub(/\.html">/,"\">"){print;next} 1' Input_file

Deleting lines between two characters using sed

I have multiple datasets in txt format which have a predictable content. I am trying to remove the first set of lines. The first line starts with >*chromosome and I want to delete everything until >*plasmid. I can either tell it to delete everything from > until it encounters it again or delete everything between the first > and the second >. I have been trying something like this:
sed -i.bak '/>/,/^\>*$/{d}' file.txt
This did not work. The original code I found was:
sed -i.bak '/>/,/^\s*$/{d}' file.txt

Use this Perl one-liner:
perl -0777 -pe 's{^>chromosome.*(?=^>plasmid)}{}sm' in.fasta
EXAMPLE:
# Create example input file:
cat > in.fasta <<EOF
>foo
TCGA
>chromosome
ACGT
>plasmid
CGTA
EOF
perl -0777 -pe 's{^>chromosome.*(?=^>plasmid)}{}sm' in.fasta > out.fasta
Output in out.fasta:
>foo
TCGA
>plasmid
CGTA
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-0777 : Slurp files whole.
The regex uses these modifiers:
/m : Let ^ and $ match at the start and end of every line, not just of the whole string.
/s : Allow . to match a newline.
^>chromosome.*(?=^>plasmid) : Regex that matches >chromosome at the beginning of a line, followed by 0 or more characters, and ending right at (but not including) a >plasmid at the beginning of a line. The expression (?=PATTERN) is a zero-length positive lookahead.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes): Quantifiers; Character Classes and other Special Escapes; Assertions; Capture groups
perldoc perlrequick: Perl regular expressions quick start
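Since the question asked about sed, the same result on the example input can also be sketched with a sed address range. This is an alternative technique, not the Perl approach above; the function name drop_chromosome is hypothetical, and the header names >chromosome and >plasmid are taken from the example file rather than from the asker's real data.

```shell
# Hypothetical helper: drop the >chromosome record, keep the >plasmid header.
drop_chromosome() {
  # The range runs from the >chromosome line through the next >plasmid line;
  # inside it, every line except the >plasmid header itself is deleted.
  sed '/^>chromosome/,/^>plasmid/{/^>plasmid/!d;}' "$@"
}

# Same records as in.fasta above, fed through a pipe.
printf '%s\n' '>foo' 'TCGA' '>chromosome' 'ACGT' '>plasmid' 'CGTA' | drop_chromosome
```

Unlike the greedy .* in the Perl regex, the range stops at the first >plasmid line after >chromosome. If no >plasmid line follows, sed deletes through the end of the input.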

Parse file and insert new line after each occurrence

On a Unix system I am trying to add a new line in a file using sed or perl, but I seem to be missing something.
Suppose my file has multiple lines of text, always ending like this {TNG:}}${1:F01.
I am trying to find a way to add a newline after the }$, so that {1 always starts on a new line.
I tried escaping the $ sign like this:
perl -e '$/ = "\${"; while (<>) { s/\$}\{$/}\n{/; print; }'
but it does not work.
Any ideas will be appreciated.

Give this a try:
sed 's/{TNG:}}\$/&\n/' file > newfile
By default sed uses BRE, in which { and } are literal characters, so only the $ needs to be escaped.
kent$ cat f
{TNG:}}${1:F01.
kent$ sed 's/{TNG:}}\$/&\n/' f
{TNG:}}$
{1:F01.
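One caveat: \n in the replacement is understood by GNU sed but is not guaranteed by POSIX, so other sed implementations may not insert a newline. A more portable sketch uses a backslash followed by a real line break in the replacement. The function name split_after_tng is made up for illustration.

```shell
# Hypothetical helper: insert a line break after "{TNG:}}$".
split_after_tng() {
  # In the replacement, "&" is the matched text and the backslash followed by
  # an actual line break stands for a newline in POSIX sed.
  sed 's/{TNG:}}\$/&\
/' "$@"
}

# Single quotes keep the shell from expanding ${1:...}.
printf '%s\n' '{TNG:}}${1:F01.' | split_after_tng
```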

With perl:
$ cat input.txt
line 1 {TNG:}}${1:F01
line 2 {TNG:}}${1:F01
$ perl -pe 's/TNG:\}\}\$\K/\n/' input.txt
line 1 {TNG:}}$
{1:F01
line 2 {TNG:}}$
{1:F01
(Read up on the -p and -n options in perlrun and use them instead of trying to do what they do in a one-liner yourself)

grep regex to perl or awk

I have been using a Linux environment and recently migrated to Solaris. Unfortunately, one of my bash scripts requires grep with the -P switch (PCRE support). As Solaris doesn't support the PCRE option for grep, I am obliged to find another solution to the problem. pcregrep seems to have an obvious loop bug, and the sed -r option is unsupported!
I hope that using perl or nawk will solve the problem on Solaris.
I have not yet used perl in my scripts and am unaware of both its syntax and its flags.
Since it is PCRE, I believe that a Perl scripter can help me out in a matter of minutes. The patterns should match over multiple lines.
Which would be the better solution in terms of efficiency, awk or perl?
Thanks for the replies.

These are some grep to perl conversions you might need:
grep -P PATTERN FILE(s) ---> perl -nle 'print if m/PATTERN/' FILE(s)
grep -Po PATTERN FILE(s) ---> perl -nle 'print $1 while m/(PATTERN)/g' FILE(s)
That's my guess as to what you're looking for, if grep -P is out of the question.

Here's a shorty:
grep -P /regex/ ====> perl -ne 'print if /regex/;'
The -n takes each line of the file as input. Each line is put into a special perl variable called $_ as Perl loops through the whole file.
The -e says the Perl program is on the command line instead of passing it a file.
The Perl print command automatically prints out whatever is in $_ if you don't specify for it to print out anything else.
The if /regex/ matches the regular expression against whatever line of your file is in the $_ variable.

Perl regex to act on a file from the command line

In a file, say xyz.txt, I want to replace every occurrence of a number followed by a dot (for example 1., 2., 10., 11.) with a whitespace.
How do I compose a perl command on the command line to do this on the file, and what regex should be used?
Please Help
Thank You.

This HAS to be a Perl oneliner?
perl -i -pe 's/\d+\./ /g' <fileName>
The Perl command line options: -i is used to specify what happens to the input file. If you don't give it a file extension, the original file is lost and is replaced by the Perl munged output. For example, if I had this:
perl -i.bak -pe 's/\d+\./ /g' <fileName>
The original file would be stored with a .bak suffix and <fileName> itself would contain your output.
The -p means to enclose your Perl program in a print loop that looks SOMEWHAT like this:
while ($_ = <>) {
<Your Perl one liner>
print "$_";
}
This is a somewhat simplified explanation of what's going on. You can see the actual perl loop by running perldoc perlrun from the command line. The main idea is that it lets you act on each line of a file just like sed or awk.
The -e is simply followed by your Perl command.
You can also use file redirection:
perl -pe 's/\d+\./ /g' < xyz.txt > xyz.txt.out

Answer (not tested):
perl -ipe "s/\d+\./ /g" xyz.txt

Neither
perl -ipe "s/\d+\./ /g" xyz.txt
nor
perl -pie
runs on my system, because -i takes whatever follows it as the backup suffix ("pe" or "e"), so the -e switch is never seen.
I use the following order:
perl -i -pe
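A short sketch of the working flag order on a throwaway file, keeping a backup via -i.bak as described in the first answer. The function name replace_numbered_dots and the sample text are made up for illustration.

```shell
# Hypothetical helper: edit the file named in $1 in place, keeping "$1.bak".
replace_numbered_dots() {
  # -i.bak must be its own switch so that -p and -e are still recognised.
  perl -i.bak -pe 's/\d+\./ /g' "$1"
}

# Try it on a temporary file so no real data is touched.
tmp=$(mktemp)
printf '1. wash 2. rinse 10. dry\n' > "$tmp"
replace_numbered_dots "$tmp"
cat "$tmp"          # edited text
cat "$tmp.bak"      # original text
rm -f "$tmp" "$tmp.bak"
```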
