Storing regex with '^' and '$' inside constant [closed] - perl

I revised my code and realized I stored the regex inside a constant and then assigned the constant's value to a variable.
I'm trying to store a regular expression inside a constant using the qr// operator. Everything is fine except for '^' and '$'. I need them to match beginning-of-line and end-of-line respectively.
use constant REGEX_LINE => qr/\^(\s*)(.*)\$/;
my $rx = REGEX_LINE;
Printing $rx reveals that it contains some additional stuff:
(?^:^(\s*)(.*)$)
Of course now the regex doesn't match my data

If you expect ^ and $ to match start and end of line,
don't escape them (or else they will match ^ and $), and
use /m (or else they will match the start and end of the string).
use constant REGEX_LINE => qr/^(\s*)(.*)$/m;

Add an escape character (\) before the $ symbol; otherwise it will be treated as part of a variable name.


Find and replace a string in Perl [closed]

I have the following command line:
perl -i -pe 's/_GSV*//g' file.fasta
My goal is to change some sequences that have the following pattern:
GSVIVG01006342001_GSVIVT01006342001
I want to find every part that starts with _GSV and ends with anything (that's why I put the '*') and replace it with nothing.
When I run my command, it only recognizes the _GSV and returns this:
GSVIVG01006342001IVT01006342001
and I want that:
GSVIVG01006342001
Can anybody tell me what's wrong with my command line?
Before the *, include a dot, which matches any character. Without it, V* means "zero or more V", so only _GSV is removed.
perl -i -pe 's/_GSV.*//g' file.fasta
You can also add the $ anchor to make it explicit that the match runs to the end of the line:
perl -i -pe 's/_GSV.*$//g' file.fasta

Perl search term [closed]

I want to write the following statement as a Perl search expression:
Find all occurrences of the word "cat" followed by the word "dog" within 13 characters.
So for example the text "catajdwos dogqwzv" would be a result.
Does anyone know how to do this?
I'd use a regular expression evaluated by the match operator. You'd use the g modifier of the match operator to find all occurrences.
while ($str =~ /...pattern.../g) {
...
}
Refer to your class notes on regular expressions to compose the pattern you need.
The following should work...
$str = "sdfcatsdfdogffdfcatsdfjljlfflkfjflkjfdogsfsd";
@arr = $str =~ /(cat).{0,10}dog/sgi;
print join(',', @arr), "\n";
s to match over newlines, g to extract all matched instances, and i to ignore case.
I'm not sure what you mean by 'within 13', but I've assumed here that as many as 10 characters can separate the 't' in cat from the 'd' in dog.

Remove a string which is not present in parenthesis ()? [closed]

I have a file which contains the data below, and I want to remove everything that is not inside parentheses.
hello (welcome) to chennai (hai)
hello (how) this is for testing (with)
[is] this (bhuvanesh)
I want the output as below
(welcome) (hai)
(how) (with)
(bhuvanesh)
You can use the following sed command:
sed 's/[^(]*\(([^)]\+)\)[^(]*/\1/g' input.txt
Explanation:
I'm using the substitute command. In its basic form it looks like this:
s/SEARCH/REPLACE/g
The g at the end means global: sed should replace all occurrences of SEARCH, not just the first.
The SEARCH pattern looks like this:
[^(]*\(([^)]\+)\)[^(]*
I'll try to explain it step by step...
[^(]*
[] is a character class, the ^ at the beginning means that the characters listed in the class should not match. We are listing only a single character - the opening parenthesis (. The * means this can occur zero or more times. In one sentence, sed is searching for all characters before the first starting parenthesis (.
\(([^)]\+)\)
(...) is a matching group. In basic sed syntax it needs to be escaped: \(...\). The first character in the matching group is the opening parenthesis (. A character class [^)] follows. It matches every character except the closing parenthesis ). The quantifier \+ means there must be at least one character between the parentheses in your input text; if you would like to allow empty content, use * as the quantifier instead. Then come the closing parenthesis ) and the end of the matching group \).
Through the usage of the matching group, the matched content is available via \1 now.
The last part of the search pattern is the same as the first part:
[^(]*
It matches everything until the next opening parenthesis.
The REPLACE pattern is simple. It throws away everything except the content of matching group \1.
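As a sketch, here is the command run against the sample lines from the question. The variable names are mine, and it assumes GNU sed, since \+ is a GNU extension in basic regular expressions.

```shell
# Sample lines copied from the question.
input='hello (welcome) to chennai (hai)
hello (how) this is for testing (with)
[is] this (bhuvanesh)'

# Keep only the parenthesised groups; everything around them is consumed.
result=$(printf '%s\n' "$input" | sed 's/[^(]*\(([^)]\+)\)[^(]*/\1/g')

printf '%s\n' "$result"
# (welcome)(hai)
# (how)(with)
# (bhuvanesh)
```

Note that the groups come out with no space between them, because the text between them is part of what gets replaced. Using '\1 ' as the replacement would add a separator, at the cost of a trailing space on each line.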
This awk would do:
awk -F"[()]" '{for (i=2;i<=NF;i+=2) printf "(%s) ",$i;print ""}' file
(welcome) (hai)
(how) (with)
(bhuvanesh)
Or like this:
awk -F"[()]" '{for (i=2;i<=NF;i+=2) printf "%s ",$i;print ""}' file
welcome hai
how with
bhuvanesh
Try this one.
sed -r 's/\[.*\][^(]*//g ; s/.*(\(.*\)).*(\(.*\))/\1\2/g'

Using sed how to remove last character only in the first line [closed]

How can I use sed to remove the last character from only the first line of a file?
You can for example use this:
sed '1 s/.$//' file
Explanation
1 indicates the line in which we want to perform the action.
Given the syntax s/text/replacement/, we look for any character with . followed by $, which indicates end of line. Hence, we match the last character before the end of the line and replace it with nothing. That is, we remove the last character of the line.
To edit the file you can use -i.bak.
Test
$ cat a
hello this is some text
and this is something else
$ sed '1 s/.$//' a
hello this is some tex
and this is something else
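For the in-place variant, a minimal sketch using a throwaway directory (the paths and variable names are made up for illustration): -i.bak edits the file itself and keeps the original next to it with a .bak suffix.

```shell
# Work in a temporary directory so nothing real is modified.
tmpdir=$(mktemp -d)
file="$tmpdir/a"
printf 'hello this is some text\nand this is something else\n' > "$file"

# Edit in place; the untouched original is saved as "$file.bak".
sed -i.bak '1 s/.$//' "$file"

edited_first=$(head -n 1 "$file")
backup_first=$(head -n 1 "$file.bak")

echo "$edited_first"   # hello this is some tex
echo "$backup_first"   # hello this is some text

rm -rf "$tmpdir"
```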
For fun, let's see how to accomplish this with awk:
awk -v FS= -v OFS= 'NR==1{NF=NF-1}1' file
This sets the input and output field separators (FS, OFS) as empty (same as BEGIN{FS=OFS=""}), so every single character is a field. Based on that, when the record is 1 (in this case, when we are in the 1st line), decrement the number of fields (NF) so that the last character is "lost". Then 1 is a true condition that makes awk perform its default action: {print $0}.

Using concat() function, in specific cases [closed]

I need to concatenate pieces of a regexp pattern. For this pattern I use the C-style escape prefix E.
If I use the concatenation operator ||, it works:
E'a{'||2||'}'
It does not make much sense, but just out of interest: how do I concatenate the same pieces using the concat() function?
The misunderstanding is this: C-style escapes are just another way to input string literals. When you concatenate strings, be it with the || operator or with the concat() function (Postgres 9.1+), the method how individual strings were input is irrelevant.
In addition to that, literals of other types (like the numeric constant 2 in your example) are coerced to text automatically.
On top of that, your example does not exhibit any characters with a special meaning in escape strings (like \).
SELECT E'a{' || 2 || '}';
SELECT concat(E'a{', 2, '}');
So, the E is totally irrelevant in this particular example.
Since you mention regexp patterns: those tend to have \ in them, which have to be escaped with \ in E'' notation:
SELECT E'\\.' || 2 || '\.';
The modern way is not to use escape strings at all if not necessary. That's why Postgres switched to standard_conforming_strings = ON with PostgreSQL 9.1. That is the setting I tested with.