Check if string starts with given string - perl

I am trying to check if message after "5:16:51:209|INFO| " starts with "Marker". If it does, I need to add the string "|ICD" after the timestamp.
The input is: " 05:16:51:209|INFO|Markerprocedure Magnet "
I tried this regex, but it's not working. Please help me get it correct.
if ( $lines[$i] =~ m/(\d{2}:\d{2}:\d{2}:\d{3})|(\w+)|^Marker/)
{
$lines[$i] =~ s/(\d{2}:\d{2}:\d{2}:\d{3})(.*)/$1|ICD$2/ ;
}

I am trying to check if message after "5:16:51:209|INFO| " starts with "Marker"
It seems you are trying to check whether Marker immediately follows 5:16:51:209|INFO|. In that case the ^ is wrong where you have it: ^ asserts the start of the string at that position, which can never be true in the middle of the line. Remove the ^ and Perl will check whether Marker immediately follows.
Also, you need to escape the | characters like this: \| to prevent them from being treated as the alternation operator in the regex. Then you can do the test and the replacement in a single substitution:
if ( $lines[$i] =~ s/(\d{2}:\d{2}:\d{2}:\d{3})(\|\w+\|Marker)/$1|ICD$2/ )
{
# Line contained "Marker" and "|ICD" inserted
}
Example:
$ echo '15:16:51:209|INFO|Marker blah' | perl -ple 's/(\d{2}:\d{2}:\d{2}:\d{3})(\|\w+\|Marker)/$1|ICD$2/'
Output is:
15:16:51:209|ICD|INFO|Marker blah
Edit: @Prix has pointed out in the comments that if the timestamp is meant to appear at the start of the string, then the ^ start-marker should be at the start of the regex to prevent accidental matches in other parts of the string (and for performance):
s/^(\d{2}:\d{2}:\d{2}:\d{3})(\|\w+\|Marker)/$1|ICD$2/
  ↑
Use ^ here to anchor the search to the beginning of the string.
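For reference, here is a minimal sketch of the anchored substitution applied to an array of lines. The second sample line is made up for illustration; it shows that lines whose message does not start with Marker are left untouched.

```perl
use strict;
use warnings;

# Sample lines: the first is from the question, the second is hypothetical.
my @lines = (
    '05:16:51:209|INFO|Markerprocedure Magnet',
    '05:16:52:001|INFO|Other message',
);

for my $line (@lines) {
    # ^ anchors the timestamp to the start of the line; \| is a literal pipe.
    $line =~ s/^(\d{2}:\d{2}:\d{2}:\d{3})(\|\w+\|Marker)/$1|ICD$2/;
    print "$line\n";
}
```

Only the first line gets |ICD inserted after the timestamp.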

Perl: how to format a string containing a tilde character "~"

I have run into an issue where a perl script we use to parse a text file is omitting lines containing the tilde (~) character, and I can't figure out why.
The sample below illustrates what I mean:
#!/usr/bin/perl
use warnings;
formline " testing1\n";
formline " ~testing2\n";
formline " testing3\n";
my $body_text = $^A;
$^A = "";
print $body_text
The output of this example is:
testing1
testing3
The line containing the tilde is dropped entirely from the accumulator. This happens whether there is any text preceding the character or not.
Is there any way to print the line with the tilde treated as a literal part of the string?
~ is special in forms (see perlform) and there's no way to escape it. But you can create a field for it and populate it with a tilde:
formline " \@testing2\n", '~';
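Applied to the script from the question, that workaround might look like the sketch below. The \@ in the double-quoted picture yields a literal @, which formline reads as a one-character field, and the tilde is supplied as that field's value.

```perl
use strict;
use warnings;

formline " testing1\n";
# \@ puts a literal @ into the picture: a one-character field filled with '~'.
formline " \@testing2\n", '~';
formline " testing3\n";

my $body_text = $^A;    # copy the format accumulator
$^A = "";               # and reset it
print $body_text;
```

All three lines should now appear in the output, because the tilde comes from the data rather than from the picture.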
The first argument to formline is the "picture" (template). That picture uses various characters to mean particular things. The ~ means to suppress output if the fields are blank. Since you supply no fields in your call to formline, your fields are blank and output is suppressed.
my @lines = ( '', 'x y z', 'x~y~z' );
foreach $line ( @lines ) { # forms don't use lexicals, so no my on control
write;
}
format STDOUT =
~ ID: @*
$line
.
The output doesn't have a line for the blank field because the ~ in the picture told it to suppress output when $line doesn't have anything:
ID: x y z
ID: x~y~z
Note that tildes coming from the data are just fine; they are like any other character.
Here's probably something closer to what you meant. Create a picture, @* (variable-width multiline text), and supply it with values to fill it:
while( <DATA> ) {
local $^A;
formline '@*', $_;
print $^A, "\n";
}
__DATA__
testing1
~testing2
testing3
The output shows the field with the ~:
testing1
~testing2
testing3
However, the question is odd: the way you appear to be doing things doesn't match what formats are designed for. Perhaps you have some tricky setup where you're trying to take the picture from input data. But if you aren't going to give it any values, what are you really formatting? Consider that you may not actually want formats.

How can I use Perl to calculate the frequency of a variable

PASS AC=0;AF=0.048;
AN=2;
ASP;
BaseQRankSum=0.572;
CAF=[0.9605,.,0.03949];
CLNACC=RCV000111759.1,RCV000034730
I'm new here. I want to know how to match CAF=[0.9605,.,0.03949] using a regular expression. Thank you.
while (<>) {
    if (
        /^CAF=      # start of line, then literal 'CAF='
         \[         # literal '['
         [^\]]+     # 1+ characters different from ']'
         \];        # closing ']', then ';'
        /x
      )
    {
        print;
    }
}
The /x modifier allows for linebreaks and comments in the regex (to improve readability).
Or, as a one liner:
perl -ne 'print if (/^CAF=\[[^\]]+\];/);' <your_file>
This prints the complete lines containing the desired pattern.
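If the goal is to work with the individual values rather than just print the line (an assumption, since the question does not say what should happen after the match), a capture group plus split is one way to pull them out. A minimal sketch, using the sample line from the question:

```perl
use strict;
use warnings;

# Sample line taken from the data in the question.
my $line = 'CAF=[0.9605,.,0.03949];';

my @values;
if ( $line =~ /^CAF=\[([^\]]+)\];/ ) {
    # $1 holds everything between the brackets; split it on commas.
    @values = split /,/, $1;
}
print join( ' ', @values ), "\n";
```

This prints the three comma-separated entries, including the lone "." placeholder, separated by spaces.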
You need to read the documentation for Perl regex. What you are asking doesn't look more complex than a beginner could match having read the docs:
http://perldoc.perl.org/perlre.html

index argument contains . perl

If a string contains . representing any character, index doesn't match on it. What to do so that it takes . as any character?
For ex,
index($str, $substr)
if $substr contains . anywhere, index will always return -1
thanks
carol
That is not possible. The documentation says:
The index function searches for one string within another, but without
the wildcard-like behavior of a full regular-expression pattern match.
...
Keywords you can use for further searching:
perl regular expression wildcard
Update:
If you just want to know whether your string matches, a regular expression could look like this:
my $string = "Hello World!";
if( $string =~ /ll. Worl/ )
{
print "Ahoi! Position: ".($-[0])."\n";
}
Here the . matches a single character.
$-[0] is the offset into the string of the beginning of the entire
match.
-- http://perldoc.perl.org/perlvar.html
If you want a pattern that matches an arbitrary number of arbitrary characters, you could use something like...
...
if( $string =~ /ll.*orl/ )
{
...
See perlvar for further information about Perl's special variables. There you will find the variable @LAST_MATCH_START and an explanation of $-[0]. Several more variables can help you find submatches and gather other interesting information about your matches.
From perldoc -f index, you can see index() doesn't have any regex syntax:
index STR,SUBSTR
The index function searches for one string within another, but without the wildcard-like behavior of a full regular-
expression pattern match. It returns the position of the first occurrence of SUBSTR in STR at or after POSITION. If
POSITION is omitted, starts searching from the beginning of the string. POSITION before the beginning of the string or after
its end is treated as if it were the beginning or the end, respectively. POSITION and the return value are based at 0 (or
whatever you've set the $[ variable to--but don't do that). If the substring is not found, "index" returns one less than the
base, ordinarily "-1"
A simple test:
$ perl -e 'print index("1234567asdfghj.","j.")'
13
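Use a regex and read the start offset of the match from $-[0]:
if ( $str =~ /$substr/ ) {
    $index = $-[0]; # pos() would give the end of the match, and defaults to $_
}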
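To make the difference concrete, here is a small sketch that runs both approaches on the test string used above. The variable names are made up for illustration.

```perl
use strict;
use warnings;

my $str = "1234567asdfghj.";

# index() treats "." literally, so this finds the real "j." near the end.
my $literal_pos = index( $str, "j." );

# In a regex, "." matches any character, so /7./ matches "7a".
# $-[0] is the offset where the whole match starts.
my $regex_pos = $str =~ /7./ ? $-[0] : -1;

print "$literal_pos $regex_pos\n";
```

This prints 13 6: index finds the literal "j." at offset 13, while the regex finds "7" followed by any character at offset 6.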

Quantifier follows nothing in regex

My requirement is to print the files having 'xyz' text in their file names using perl.
I tried below and got the following error
Quantifier follows nothing in regex marked by <-- HERE in m/* <-- HERE xyz.xlsx$/;
use strict;
use warnings;
my @files = qw(file_xyz.xlsx,file.xlsx);
my @my_files = grep { /*xyz.xlsx$/ } @files;
for my $file (@my_files) {
print "The output $file \n";
}
Problem is coming when I add * in grep regular expression.
How can I possibly achieve this?
The * is a meta character, called a quantifier. It means "repeat the previous character or character class zero or more times". In your case, it follows nothing, and is therefore a syntax error. What you probably are trying is to match anything, which is .*: Wildcard, followed by a quantifier. However, this is already the default behaviour of a regex match unless it is anchored. So all you need is:
my @my_files = grep { /xyz/ } @files;
You could keep your end-of-string anchor xlsx$, but since you have a limited list of file names, that hardly seems necessary. Note also that you have used qw() incorrectly: its items are separated by whitespace, not commas:
my @files = qw(file_xyz.xlsx file.xlsx);
However, if you should have a larger set of file names, such as one read from a directory, you can place a wildcard string in the middle:
my @my_files = grep { /xyz.*\.xlsx$/i } @files;
Note the use of the /i modifier to match case insensitively. Also note that you must escape . because it is another meta character.
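Put together, a corrected version of the script might look like the sketch below. The last two file names are hypothetical, added only to show the effect of /i and of the escaped dot.

```perl
use strict;
use warnings;

# qw() items are separated by whitespace; the last two names are made up.
my @files = qw(file_xyz.xlsx file.xlsx FILE_XYZ_2.XLSX notes_xyz.txt);

# "xyz" anywhere in the name, then a literal ".xlsx" at the end, ignoring case.
my @my_files = grep { /xyz.*\.xlsx$/i } @files;

print "The output $_\n" for @my_files;
```

Only file_xyz.xlsx and FILE_XYZ_2.XLSX are printed; file.xlsx lacks "xyz" and notes_xyz.txt has the wrong extension.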

Perl - partial pattern matching in a sequence of letters

I am trying to find a pattern using Perl, but I am only interested in the beginning and the end of the pattern. To be more specific, I have a sequence of letters and I would like to see if the following pattern exists. There are 23 characters, and I'm only interested in the beginning and the end of the sequence.
For example I would like to extract anything that starts with ab and ends with zt. There is always
So it can be
abaaaaaaaaaaaaaaaaaaazt
So that it detects this match
but not
abaaaaaaaaaaaaaaaaaaazz
So far I tried
if ($line =~ /ab[*]zt/) {
print "found pattern ";
}
thanks
* is a quantifier and meta character. Inside a character class bracket [ .. ] it just means a literal asterisk. You are probably thinking of .* which is a wildcard followed by the quantifier.
Matching entire string, e.g. "abaazt".
/^ab.*zt$/
Note the anchors ^ and $, and the wildcard character . followed by the zero or more * quantifier.
Match substrings inside another string, e.g. "a b abaazt c d"
/\bab\S*zt\b/
Using word boundary \b to denote beginning and end instead of anchors. You can also be more specific:
/(?<!\S)ab\S*zt(?!\S)/
Using a double negation to assert that no non-whitespace characters follow or precede the target text.
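The two variants differ when the target is glued to punctuation. A small sketch, using a made-up test string, to show the difference:

```perl
use strict;
use warnings;

# Hypothetical text: one target surrounded by spaces, one glued to hyphens.
my $text = 'a b abaazt c x-abzt-y';

# \b only needs a word/non-word transition, so "-abzt-" still qualifies.
my @word_boundary = $text =~ /\bab\S*zt\b/g;

# The lookarounds demand whitespace (or the string edge) on both sides.
my @whitespace = $text =~ /(?<!\S)ab\S*zt(?!\S)/g;

print "word boundary: @word_boundary\n";
print "whitespace:    @whitespace\n";
```

The \b version finds both abaazt and abzt, while the lookaround version finds only abaazt.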
It is also possible to use the substr function
if (substr($string, 0, 2) eq "ab" and substr($string, -2) eq "zt")
You mention that the string is 23 characters, and if that is a fixed length, you can get even more specific, for example
/^ab.{19}zt$/
Which matches exactly 19 wildcards. The syntax for the {} quantifier is {min,max}; leaving max blank means no upper limit, i.e. {1,} is the same as + and {0,} is the same as *, meaning one or more and zero or more matches, respectively.
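A quick sketch of the fixed-length pattern in action. The candidate strings are built with the x operator so that their lengths are easy to verify; the third one is made up to show a length mismatch.

```perl
use strict;
use warnings;

my @candidates = (
    'ab' . ( 'a' x 19 ) . 'zt',    # 23 characters, correct ends
    'ab' . ( 'a' x 19 ) . 'zz',    # 23 characters, wrong ending
    'ab' . ( 'a' x 5 ) . 'zt',     # correct ends, but only 9 characters
);

# Exactly 19 arbitrary characters between "ab" and "zt".
my @hits = grep { /^ab.{19}zt$/ } @candidates;

printf "%d of %d candidates matched\n", scalar @hits, scalar @candidates;
```

Only the first candidate matches, so this prints "1 of 3 candidates matched".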
A * inside a character class, as in [*], matches only a literal *. If you want to match anything, you need to use .*.
if ($line =~ /^ab.*zt$/) {
print "found pattern ";
}
If you really want to capture the match, wrap the whole pattern in a capture group:
if (my ($string) = $line =~ /^(ab.*zt)$/) {
print "found pattern $string";
}