I've a file with the following text:
<RecordID>02.037.D00221700080.0</RecordID>
2.35
AB
<RecordID>02.037.D00221700080.1</RecordID>
2.45
BB
<RecordID>02.037.D00221700080.2</RecordID>
6.5
CC
I wish to remove the dots, between <RecordID> and </RecordID> to get this:
<RecordID>02037D002217000800</RecordID>
2.35
AB
<RecordID>02037D002217000801</RecordID>
2.45
BB
<RecordID>02037D002217000802</RecordID>
6.5
CC
I've tried different approaches with sed, all of them without results...
Thanks in advance!
Using sed:
sed '/<RecordID>/s/\.//g' file
<RecordID>02037D002217000800</RecordID>
2.35
AB
<RecordID>02037D002217000801</RecordID>
2.45
BB
<RecordID>02037D002217000802</RecordID>
6.5
CC
Use this Perl one-liner:
perl -pe '/RecordID/ and tr/.//d;' in_file
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlrequick: Perl regular expressions quick start
Input file
perl http://zoidberg.sourceforge.net
zoiduser perl http://zoidberg.sourceforge.net
I need to remove the .html extension only from the URLs below in the above file:
perl
zoiduser
So that the final output should look like:
perl http://zoidberg.sourceforge.net
zoiduser perl http://zoidberg.sourceforge.net
This is what I am doing:
sed '/"http\|"www\|"mailto/ ! s|\(.html\)||g' file
But it ignores the whole line as soon as the line matches the first pattern, i.e. the one meant to avoid URLs that start with "http, "www, or "mailto.
You can use
sed -E 's/("(http|www|mailto)[^"]*")|\.html/\1/g' file
Details:
-E - enables POSIX ERE syntax
("(http|www|mailto)[^"]*") - Group 1 (\1): " and then either http, www, or mailto and then zero or more chars other than " and then a "
| - or
\.html - .html string.
The replacement is Group 1 values.
See the online demo:
#!/bin/bash
s='perl http://zoidberg.sourceforge.net
zoiduser perl http://zoidberg.sourceforge.net'
sed -E 's/("(http|www|mailto)[^"]*")|\.html/\1/g' <<< "$s"
Output:
perl http://zoidberg.sourceforge.net
zoiduser perl http://zoidberg.sourceforge.net
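To make the alternation trick easier to see, here is a small sketch on a line with explicit anchors. The href values are hypothetical and chosen only for illustration; they are not taken from the question.

```shell
# Hypothetical input line: one relative link and one absolute http link.
s='<a href="perl.html">perl</a> <a href="http://example.com/index.html">site</a>'

# A quoted URL starting with http/www/mailto is captured in group 1 and put
# back unchanged; any other .html matches the second alternative, where
# group 1 is empty, so it is replaced by nothing.
out=$(printf '%s\n' "$s" | sed -E 's/("(http|www|mailto)[^"]*")|\.html/\1/g')
printf '%s\n' "$out"
```

Only the relative link loses its extension; the quoted http URL is consumed whole by the first alternative before \.html can match inside it.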
It is not recommended to parse HTML using shell utilities like sed, awk, perl etc. But if you really have to use negation of certain keywords then I would suggest this perl:
perl -pe 's/"(?!www|http|mailto)([^"]+)\.html/"\1/g' f.html
perl http://zoidberg.sourceforge.net
zoiduser perl http://zoidberg.sourceforge.net
(?!www|http|mailto) is negative lookahead to fail the match if these keywords appear just after "
Based on your shown samples only, please try the following awk code. It checks whether the line contains <a href="(http|mailto|www): and, if so, uses the sub function to replace the first occurrence of .html"> with ">. When the substitution succeeds, the line is printed and next moves on to the next input line. The trailing 1 prints every other line unchanged.
awk '/<a href="(http|mailto|www):/ && sub(/\.html">/,"\">"){print;next} 1' Input_file
I have multiple datasets in txt format which have a predictable content. I am trying to remove the first set of lines. The first line starts with >*chromosome and I want to delete everything until >*plasmid. I can either tell it to delete everything from > until it encounters it again or delete everything between the first > and the second >. I have been trying something like this:
sed -i.bak '/>/,/^\>*$/{d}' file.txt
This did not work. The original code I found was:
sed -i.bak '/>/,/^\s*$/{d}' file.txt
Use this Perl one-liner:
perl -0777 -pe 's{^>chromosome.*(?=^>plasmid)}{}sm' in.fasta
EXAMPLE:
# Create example input file:
cat > in.fasta <<EOF
>foo
TCGA
>chromosome
ACGT
>plasmid
CGTA
EOF
perl -0777 -pe 's{^>chromosome.*(?=^>plasmid)}{}sm' in.fasta > out.fasta
Output in out.fasta:
>foo
TCGA
>plasmid
CGTA
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-0777 : Slurp files whole.
The regex uses these modifiers:
/m : Treat the string as multiple lines, so that ^ and $ also match at the start and end of each line.
/s : Allow . to match a newline.
^>chromosome.*(?=^>plasmid) : Regex that matches >chromosome at the beginning of a line, followed by 0 or more characters, and ending right at (but not including) the match to >plasmid at the beginning of a line. The expression (?=PATTERN) is a zero-length positive lookahead.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes)
perldoc perlre: Perl regular expressions (regexes): Quantifiers; Character Classes and other Special Escapes; Assertions; Capture groups
perldoc perlrequick: Perl regular expressions quick start
On a Unix system I am trying to add a new line in a file using sed or perl but it seems I am missing something.
Supposing my file has multiple lines of texts, always ending like this {TNG:}}${1:F01.
I am trying to find a way to add a new line after the }$, so that {1 always starts on a new line.
I tried escaping the $ sign like this:
perl -e '$/ = "\${"; while (<>) { s/\$}\{$/}\n{/; print; }'
but it does not work.
Any ideas will be appreciated.
give this a try:
sed 's/{TNG:}}\$/&\n/' file > newfile
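By default sed uses BRE, so { and } are literal characters, but the $ must be escaped. Note that \n in the replacement works in GNU sed; other sed implementations may need a backslash followed by a literal newline instead.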
kent$ cat f
{TNG:}}${1:F01.
kent$ sed 's/{TNG:}}\$/&\n/' f
{TNG:}}$
{1:F01.
With perl:
$ cat input.txt
line 1 {TNG:}}${1:F01
line 2 {TNG:}}${1:F01
$ perl -pe 's/TNG:\}\}\$\K/\n/' input.txt
line 1 {TNG:}}$
{1:F01
line 2 {TNG:}}$
{1:F01
(Read up on the -p and -n options in perlrun and use them instead of trying to do what they do in a one-liner yourself)
I have been using a Linux environment and recently migrated to Solaris. Unfortunately, one of my bash scripts requires grep with the -P switch (PCRE support). As Solaris doesn't support the PCRE option for grep, I am obliged to find another solution to the problem. pcregrep seems to have an obvious loop bug, and the sed -r option is unsupported!
I hope that using perl or nawk will solve the problem on Solaris.
I have not yet used perl in my scripts and am unaware of both its syntax and its flags.
Since it is PCRE, I believe that a Perl scripter can help me out in a matter of minutes. The patterns should match over multiple lines.
Which would be the better solution in terms of efficiency, awk or perl?
Thanks for the replies.
These are some grep to perl conversions you might need:
grep -P PATTERN FILE(s) ---> perl -nle 'print if m/PATTERN/' FILE(s)
grep -Po PATTERN FILE(s) ---> perl -nle 'print $1 while m/(PATTERN)/g' FILE(s)
That's my guess as to what you're looking for, if grep -P is out of the question.
Here's a shorty:
grep -P /regex/ ====> perl -ne 'print if /regex/;'
The -n takes each line of the file as input. Each line is put into a special perl variable called $_ as Perl loops through the whole file.
The -e says the Perl program is on the command line instead of passing it a file.
The Perl print command automatically prints out whatever is in $_ if you don't specify for it to print out anything else.
The if /regex/ matches the regular expression against whatever line of your file is in the $_ variable.
In a file, say xyz.txt, I want to replace every occurrence of a number followed by a dot (for example 1., 2., 10., 11., etc.) with a whitespace.
How do I compose a perl command on the command line to do this to the file, and what regex should be used?
Please help.
Thank you.
This HAS to be a Perl oneliner?
perl -i -pe 's/\d+\./ /g' <fileName>
The Perl command line options: -i is used to specify what happens to the input file. If you don't give it a file extension, the original file is lost and is replaced by the Perl munged output. For example, if I had this:
perl -i.bak -pe 's/\d+\./ /g' <fileName>
The original file would be stored with a .bak suffix and <fileName> itself would contain your output.
The -p means to enclose your Perl program in a print loop that looks SOMEWHAT like this:
while ($_ = <>) {
<Your Perl one liner>
print "$_";
}
This is a somewhat simplified explanation of what's going on. You can see the actual perl loop by doing a perldoc perlrun from the command line. The main idea is that it allows you to act on each line of a file just like sed or awk.
The -e simply contains your Perl command.
You can also use file redirection:
perl -pe 's/\d+\./ /g' < xyz.txt > xyz.txt.out
Answer (not tested):
perl -ipe "s/\d+\./ /g" xyz.txt
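Neither
perl -ipe "s/\d+\./ /g" xyz.txt
nor
perl -pie
runs on my system, because -i takes whatever follows it in the same switch cluster as the backup suffix, so the e (or pe) is no longer read as -e.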
I use the following order:
perl -i -pe