How can I use Perl to calculate the frequency of a variable?

PASS AC=0;AF=0.048;
AN=2;
ASP;
BaseQRankSum=0.572;
CAF=[0.9605,.,0.03949];
CLNACC=RCV000111759.1,RCV000034730
I'm new here. I want to know how to match CAF=[0.9605,.,0.03949] using a regular expression. Thank you.

while (<>) {
    if (
        /^CAF=   # start of line, then literal 'CAF='
        \[       # literal '['
        [^\]]+   # 1+ characters different from ']'
        \];      # closing ']' followed by ';'
        /x
    )
    {
        print;
    }
}
The /x modifier allows for linebreaks and comments in the regex (to improve readability).
Or, as a one-liner:
perl -ne 'print if (/^CAF=\[[^\]]+\];/);' <your_file>
This prints the complete lines containing the desired pattern.
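Since CAF holds a bracketed, comma-separated list of frequencies, a natural next step is to capture what is inside the brackets and split it. This is only a sketch: it assumes the INFO fields arrive as one semicolon-separated line (as in a VCF INFO column), and the $info string is a hypothetical sample built from the question's data. A "." is kept as-is because it marks a missing value.

```perl
use strict;
use warnings;

# Hypothetical INFO column, assumed to be one semicolon-separated line
my $info = 'AC=0;AF=0.048;AN=2;ASP;BaseQRankSum=0.572;CAF=[0.9605,.,0.03949];CLNACC=RCV000111759.1,RCV000034730';

my @freqs;
# CAF= must start the string or follow a ';'; capture what is inside [...]
if ($info =~ /(?:^|;)CAF=\[([^\]]+)\]/) {
    @freqs = split /,/, $1;    # a '.' marks a missing frequency
}

print join(' | ', @freqs), "\n";
```

Run against that sample, this prints the three entries separated by " | ", with the missing value shown as ".".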

You need to read the documentation for Perl regular expressions. What you are asking for is well within what a beginner could do after reading the docs:
http://perldoc.perl.org/perlre.html

Related

How to insert a colon between word and number

I want to insert a colon between each word and the number(s) that follow it, then add a newline after the numbers.
For example:
"cat 11052000 cow_and_owner_ 01011999 12031981 dog 22032011";
My expected output:
cat:11052000
cow_and_owner_:01011999 12031981
dog:22032011
My attempt:
$Bday=~ /^([a-z]||\_)/:/^([0-9])/
print "\n";
#!/usr/bin/perl
use warnings;
use strict;
my $str = "cat 11052000 cow_and_owner_ 01011999 12031981 dog 22032011";
$str =~ s/\s*([a-z_]+) (\d+(?: \d+)*)/$1:$2\n/g;
print $str;
produces your desired output from your sample input.
Edit: Note the use of the s operator for regular-expression substitution. One of the many problems with your code is that you're not using it (if your intent is to modify the string in place rather than extract pieces of it for further processing).
One more variant:
> cat test_perl.pl
#!/usr/bin/perl
use strict;
use warnings;
my $str = "cat 11052000 cow_and_owner_ 01011999 12031981 dog 22032011";
while ( $str =~ m/([a-z_]+)\s+(\d+(?:\s+\d+)*)/g )
{
    print "$1:$2\n";
}
> test_perl.pl
cat:11052000
cow_and_owner_:01011999 12031981
dog:22032011
>
The original code $Bday=~ /^([a-z]||\_)/:/^([0-9])/ doesn't make much sense. Apart from missing a semicolon and having too many delimiters (matching patterns are of the format /.../ or m/.../ and replacing ones s/.../.../), it could never match anything.
([a-z]||\_) would match:
one lowercase ASCII letter (a through z);
an empty string (the empty alternative between the two |s); or
one underscore (escaping it with a backslash is superfluous).
To get it (or the corresponding subexpression for numbers) to match a sequence of one or more of the characters, you need to follow it with a +.
^([0-9]) would fail to match unless it was at the beginning of the string. There it would match a single digit.
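A quick way to see these points is to run the original pieces and the suggested character class against the same sample words. This is a small illustrative script. The test strings are made up, and only the patterns come from the discussion above.

```perl
use strict;
use warnings;

my $word = 'cow_and_owner_';

# Original subexpression: one letter, OR the empty string, OR one underscore
my ($orig) = $word =~ /^([a-z]||\_)/;

# On '_x' the empty alternative succeeds before \_ is ever tried
my ($under) = '_x' =~ /^([a-z]||\_)/;

# A character class followed by +: one or more letters or underscores
my ($fixed) = $word =~ /^([a-z_]+)/;

# ^([0-9]) can only match a digit at the very start of the string
my $leading_digit = 'cat 11052000' =~ /^([0-9])/ ? 1 : 0;

print "original: '$orig'\n";
print "underscore input: '$under'\n";
print "fixed: '$fixed'\n";
print "leading digit found: $leading_digit\n";
```

The original alternation captures only "c" (and an empty string for "_x"), while [a-z_]+ captures the whole word.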
My solution (taking into account the later comments by the OP about having input such as cat[1] or dog3):
use strict;
use warnings;
my $bday = "cat 11052000 cow_and_owner_ 01011999 12031981 dog 22032011 cat[1] 01012018 dog3 02012018";
# capture groups:
# $1------------------------\ $2-------------\
$bday =~ s/([A-Za-z][A-Za-z0-9_\[\]]*)\h+(\d+(?:\h+\d+)*)(?!\S)\s*/$1:$2\n/g;
print $bday;
will print out:
cat:11052000
cow_and_owner_:01011999 12031981
dog:22032011
cat[1]:01012018
dog3:02012018
Breakdown:
[A-Za-z]: Begin with a letter.
[A-Za-z0-9_\[\]]*: Follow with zero or more letters, numbers, underscores and square brackets.
\h+: Separate with one or more horizontal whitespace.
\d+(?:\h+\d+)*: One sequence of digits (\d+) followed by zero or more sequences of horizontal whitespace and digits.
(?!\S): Can't be followed by non-whitespace.
\s*: Consume following whitespace, including line feeds. This allows the input to be split across multiple lines, as long as a single entry is not spread over multiple lines. To allow that as well, replace all the \h+ with \s+.
The replace pattern will repeat (the /g modifier) sequentially in the source string as long as it matches, placing each heading-date record on its own line and then proceeding with the rest of the string.
Note that if your headers (dog etc.) might contain non-ASCII letters, use \pL or \p{XPosixAlpha} instead of [A-Za-z]:
$bday =~ s/(\pL[\pL0-9_\[\]]*)\h+(\d+(?:\h+\d+)*)(?!\S)\s*/$1:$2\n/g;
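To try the \pL variant with non-ASCII headers, the script's string literals need to be decoded as characters (use utf8) and the output encoded. This is a sketch under that assumption. The headers "café" and "Ölfeld" are invented sample data.

```perl
use strict;
use warnings;
use utf8;                               # string literals below are UTF-8 source text
binmode STDOUT, ':encoding(UTF-8)';     # encode characters when printing

my $bday = "café 11052000 Ölfeld 01011999 12031981 dog3 02012018";

# \pL matches any Unicode letter, not only A-Z and a-z
$bday =~ s/(\pL[\pL0-9_\[\]]*)\h+(\d+(?:\h+\d+)*)(?!\S)\s*/$1:$2\n/g;
print $bday;
```

Each header ends up on its own line followed by a colon and its dates, just as with the ASCII-only pattern.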

Append string in the beginning and the end of a line containing certain string

How can I use Perl to add a string at the beginning and the end of a line that contains a certain string?
So for example, my line contains:
%abc %efd;
and I want to append 123 at the beginning of the line and 456 at the end of the line, so it would look like this:
123 %abc %efd 456
8/30/16 UPDATE--------------------------------
So far I have done something like this:
foreach file (find . -type f)
perl -ne 's/^\%abc\s+(\S*)/**\%abc $1/; print;' $file > tmp; mv tmp $file
end
foreach file (find . -type f)
perl -ne 's/$\%def\;\s+(\S*)/\%def\;**\n $1/; print;' $file > tmp; mv tmp $file
end
So this works pretty well, except when %abc and %def are not on the same line.
For example:
%abc
something something something
%def
this would turn out to be
%abc
something something something
%def;
which is not what I want.
Thank you
In your case, you want to add strings to a line of the file when it matches a certain string, which means match and replace.
First, read each line of your input file.
Second, check whether it matches the string you want to add text to at the beginning and the end.
Then replace the matched string with a new string consisting of the beginning string, the matched string and the end string.
use strict;
use warnings;

my $input_file    = 'your file name here';
my $search_string = '%abc %efd';
my $add_begin     = '123';
my $add_end       = '456';

# Read file
open(my $IN, '<', $input_file) or die "cannot open file $input_file: $!";
# Check each line of file
while (my $row = <$IN>) {
    chomp $row;
    # \Q...\E makes the search string match literally
    $row =~ s/^(\Q$search_string\E)$/$add_begin $1 $add_end/;
    print $row . "\n";
}
close $IN;
Try it with the following input file:
%abc %efd
asdahsd
234234
%abc
%efd
%abc%efd
You will get the expected result:
123 %abc %efd 456
asdahsd
234234
%abc
%efd
%abc%efd
Adapt the code to your requirements, and let me know if there are any issues.
Use the /m modifier so that ^ and $ match at the beginning and end of every line in the string:
s/^\%abc/123 $&/mg;
s/\%def$/$& 456/mg;
Used together, as /ms, they let the "." match any character whatsoever, while still allowing "^" and "$" to match, respectively, just after and just before newlines within the string. (Quoted from perlre.)
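A small sketch of the /m substitutions on a multi-line string (for example, a whole file read into one scalar). The input text is hypothetical. The second copy shows that without /m, ^ only anchors at the start of the whole string.

```perl
use strict;
use warnings;

# Hypothetical multi-line input
my $text = "%abc %def\nsomething else\n%abc x %def\n";

# /m: ^ and $ match at every line start/end inside the string
(my $with_m = $text) =~ s/^%abc/123 $&/mg;
$with_m =~ s/%def$/$& 456/mg;

# Without /m, ^ only matches at the very start of the string
(my $without_m = $text) =~ s/^%abc/123 $&/g;

print $with_m;
print "---\n";
print $without_m;
```

With /m both matching lines are changed; without it only the first line gets the prefix.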
Welcome to Stack Overflow. We strive to help people solve problems in their existing code and learn languages, rather than simply answer one-off questions, the solutions to which can be easily found in 101 tutorials and documentation. The type of question you've posted doesn't leave a lot of room for learning, and doesn't do much to help future learners. It would help us greatly if you could post a more complete example, including what you've tried so far to get it working.
All that being said, there are two main ways to prepend and append to a string in Perl: (1) the concatenation operator (.) and (2) string interpolation.
Concatenation
Use a . to join two strings together. You can chain operations together to compose a longer string.
my $str = '%abc %efd';
$str = '123 ' . $str . ' 456';
say $str; # prints "123 %abc %efd 456" with a trailing newline
Interpolation
Enclose a string in double quotes to instruct Perl to interpolate (i.e. find and evaluate) any Perl-style variables enclosed within the string.
my $str = '%abc %efd';
$str = "123 $str 456";
say $str; # prints "123 %abc %efd 456" with a trailing newline
You'll notice that in both examples we prepended and appended to the existing string. You can also create new variable(s) to hold the result(s) of these operations. Other methods of manipulating and building strings include the printf and sprintf functions, the substr function, the join function, and regular expressions, all of which you will encounter as you continue learning Perl.
As far as looking to see if a string contains a certain substring before performing the operation, you can use the index function or a regular expression:
if (index($str, '%abc %efd') >= 0) { print "found\n" }
# or...
if ($str =~ /%abc %efd/) { print "found\n" }
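Putting the check and the prepend/append together, a minimal sketch might look like this. The @lines array is hypothetical sample input; a real script would read lines from a file instead.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical lines to process
my @lines = ('%abc %efd', 'asdahsd', '%abc');

# $str aliases each element, so @lines is updated in place
for my $str (@lines) {
    # index() returns -1 when the substring is absent
    if (index($str, '%abc %efd') >= 0) {
        $str = "123 $str 456";    # same as '123 ' . $str . ' 456'
    }
    say $str;
}
```

Only the line containing the full substring is changed; the others are printed unchanged.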
Remember to use strict; at the top of your Perl scripts and always (at least while you're learning) declare variables with my. If you're having trouble with the say function, you may need to add the statement use feature 'say'; to the top of your script.
You can find an index of excellent Perl tutorials at learn.perl.org. Good luck and have fun!
UPDATE Here is (I believe) a complete answer to your revised question:
find . -type f -exec perl -i.bak -pe 's/^(%abc)\s+(\S*)\s+(%def;)$/**$1 $2 $3**/' {} +
This will modify the files in place and create backup files with the extension .bak. Keep in mind that the expression \S* will only match non-whitespace characters; if you need to match strings that contain whitespace, you will need to update this expression (something like .*? might be workable for you).

Quantifier follows nothing in regex

My requirement is to use Perl to print the files that have 'xyz' in their file names.
I tried the code below and got the following error:
Quantifier follows nothing in regex marked by <-- HERE in m/* <-- HERE xyz.xlsx$/;
use strict;
use warnings;
my @files = qw(file_xyz.xlsx,file.xlsx);
my @my_files = grep { /*xyz.xlsx$/ } @files;
for my $file (@my_files) {
print "The output $file \n";
}
The problem occurs when I add * to the regular expression in grep.
How can I possibly achieve this?
The * is a metacharacter, called a quantifier. It means "repeat the previous character or character class zero or more times". In your case, it follows nothing, and is therefore a syntax error. What you are probably trying to do is match anything, which is .* (a wildcard followed by a quantifier). However, that is already the default behaviour of a regex match unless it is anchored. So all you need is:
my @my_files = grep { /xyz/ } @files;
You could keep your end-of-string anchor xlsx$, but since you have a limited list of file names, that hardly seems necessary. Note, though, that you have used qw() incorrectly: it is space-separated, not comma-separated:
my @files = qw(file_xyz.xlsx file.xlsx);
However, if you should have a larger set of file names, such as one read from a directory, you can place a wildcard string in the middle:
my @my_files = grep { /xyz.*\.xlsx$/i } @files;
Note the use of the /i modifier to match case insensitively. Also note that you must escape . because it is another meta character.
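A corrected, runnable version of the question's script, combining the fixes above, could look like this. The file names beyond the two in the question are hypothetical, added to show the effect of /i and of the .xlsx anchor.

```perl
use strict;
use warnings;

# qw() splits on whitespace, so no commas between names
my @files = qw(file_xyz.xlsx file.xlsx report_XYZ_2016.XLSX notes_xyz.txt);

# 'xyz', then anything, then a literal '.xlsx' at the end; /i ignores case
my @my_files = grep { /xyz.*\.xlsx$/i } @files;

for my $file (@my_files) {
    print "The output $file\n";
}
```

Only file_xyz.xlsx and report_XYZ_2016.XLSX are printed: file.xlsx lacks "xyz", and notes_xyz.txt has the wrong extension.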

How can i make substitutions of the same word in Perl on the same xml line?

I'm working on an XML document. I need to open it and convert some specific attribute values on the same line to uppercase. If two of them contain the same word, the substitution only happens for one of them, even though I'm using two different if blocks:
This is my XML:
<pageID="1" width="827" height="1169" Sender_Company="société" Sender_Address="société" Sender_Fax="" Category="C2" Language_2="" Document_Object="" Language_1="french" Language_3="" NumPage="1" Script_1="typed">
This is my code:
while (<FILEIN>) {
    if ($_ =~ /pageID="1"/) {
        $haschanged = 1;
        if ($_ !~ /Sender_Address=""/) {
            if ($_ =~ /(Sender_Address="(.*?)")/){
                my $SenderAddress = $2;
                $SenderAddress = uc($SenderAddress);
                $_ =~ s/$1/Sender_Address="$SenderAddress"/;
            }
        }
        if ($_ !~ /Sender_Company=""/) {
            if ($_ =~ /(Sender_Company="(.*?)")/) {
                my $SenderCompany = $2;
                $SenderCompany = uc($SenderCompany);
                $_ =~ s/$1/Sender_Company="$SenderCompany"/;
                #print "$_\n";
            }
        }
    }
}
When I use two different values, e.g. Sender_Company="bla" and Sender_Address="société", the conversion to uppercase works, but when both attributes have the same value, as here (Sender_Company="société" and Sender_Address="société"), it doesn't convert to uppercase.
Does anyone have any ideas? I can't see why it won't transform the same word when I'm using two separate if blocks.
Your understanding of XML is a bit debatable:
That isn't XML. It is an XML fragment at most (Element not closed, tag name can't double as attribute like <pageID="1">, no <?xml ...?> declaration, no root element, …)
Don't parse XML with regexes ;-)
XML doesn't have a concept of “lines”.
Besides that, the code should work fine. Note that you can make your life easier, and your code shorter:
$_ =~ /foo/ is the same as /foo/, $_ !~ /foo/ is the same as !/foo/.
Instead of extracting two captures, and substituting the result in a second regex, you can do it all in just one step:
s{ (?<=Sender_Address=") ([^"]+) (?=") }{ uc $1 }ex
Wait, what? I match one or more non-" characters that are preceded by the string Sender_Address=" and followed by " (look-around assertions). The part in between is captured and substituted with an uppercased version. Because I match at least one character, I don't have to test for the empty attribute case. The /e flag makes the replacement part be evaluated as Perl code, so uc $1 is actually run (you could avoid /e by writing \U$1 in the replacement instead), and /x allows non-matching whitespace in the pattern for better formatting.
You can easily extend this for both attributes you want to uppercase:
# This subsumes your whole logic inside `if (/pageID="1"/)`
$haschanged = 1;
for my $attr (qw/Sender_Address Sender_Company/) {
    s{ (?<=\Q$attr\E=") ([^"]+) (?=") }{ uc $1 }ex;
}
The \Q...\E causes the interpolated stuff to match literally, even if it contains characters that would be regex metacharacters otherwise.
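Here is a self-contained sketch that applies the loop to a shortened copy of the question's line, where both attributes hold the same value. It assumes the text is decoded as characters (here via use utf8 on the literal). On raw UTF-8 bytes, uc would typically leave "é" unchanged.

```perl
use strict;
use warnings;
use utf8;                               # the literal below contains 'é'
binmode STDOUT, ':encoding(UTF-8)';

# Shortened sample line from the question; both attributes share one value
my $line = '<pageID="1" width="827" height="1169" Sender_Company="société" Sender_Address="société" Sender_Fax="" Category="C2">';

if ($line =~ /pageID="1"/) {
    for my $attr (qw/Sender_Address Sender_Company/) {
        # uppercase a non-empty value between attr=" and the closing "
        $line =~ s{ (?<=\Q$attr\E=") ([^"]+) (?=") }{ uc $1 }ex;
    }
}
print "$line\n";
```

Both values come out as SOCIÉTÉ, and the empty Sender_Fax="" is left alone because [^"]+ needs at least one character.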
There are a few remaining bugs:
You fail to uppercase characters that are given as entities.
XML allows single quotes ('...') to be used as attribute value delimiters. You don't handle them.
See the points under Your understanding of XML…
All of these can be solved by using an XML parser, and then transforming the attributes in the DOM.

How do I ignore multiple newlines in perl?

Suppose I have a file with these inputs:
line 1
line 2
line3
My program should only store "line1", "line2" and "line3", not the newlines. How do I achieve that?
My program already removes leading and trailing whitespace, but that doesn't remove the newlines.
I am setting $/ to \n because each input is separated by a \n.
while (<>) {
    chomp;
    next unless /\S/;
    print "$_\n";
}
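Since the question asks to store the lines rather than print them, here is a sketch of the same loop pushing non-blank, trimmed lines into an array. An in-memory filehandle stands in for the real input file, and the sample data (with blank and whitespace-only lines) is hypothetical.

```perl
use strict;
use warnings;

# In-memory stand-in for the input file (hypothetical data with blank lines)
my $data = "line 1\n\n\nline 2\n   \n\nline3\n";
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";

my @lines;
while (my $line = <$fh>) {
    chomp $line;
    next unless $line =~ /\S/;   # skip empty and whitespace-only lines
    $line =~ s/^\s+|\s+$//g;     # trim leading and trailing whitespace
    push @lines, $line;
}
close $fh;

print "$_\n" for @lines;
```

@lines ends up holding just the three data lines, with no empty entries.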
Set
$/ = q(); # that's an empty string, like "" or ''
while (<>) {
    chomp;
    # process the record in $_ here
}
The special value of the defined empty string (paragraph mode) tells the input operator to treat one or more blank lines, i.e. two or more consecutive newlines, as the record terminator, and also makes chomp remove all of the trailing newlines. That way each record always starts with real data. Note that adjacent non-blank lines stay together in a single record.
perl -n is equivalent to wrapping while (<>) { ... } around your script. Assuming that all you need to do is eliminate blank lines, you can do it like this:
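A minimal sketch of paragraph mode on an in-memory filehandle, with hypothetical data where the records are separated by runs of blank lines. local keeps the change to $/ confined to the block.

```perl
use strict;
use warnings;

# Hypothetical input: records separated by runs of blank lines
my $data = "line 1\n\n\nline 2\n\n\n\nline3\n";
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";

my @records;
{
    local $/ = q();          # paragraph mode: blank lines end a record
    while (my $rec = <$fh>) {
        chomp $rec;          # in paragraph mode, chomp removes all trailing newlines
        push @records, $rec;
    }
}
close $fh;

print "[$_]\n" for @records;
```

Each record comes back without any newlines attached. If two data lines were adjacent, they would arrive together as one record.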
#! /usr/bin/perl -n
print unless ( /^$/ );
... On the other hand, if that's all you need to do, you might as well ditch Perl and use
grep -v '^$'
Edit: your post says that you want to store values where lines are not blank... in that case, assuming that you don't have too much work to do in the rest of your script, you might do something like this:
#! /usr/bin/perl -n
our @values;    # package variable: a 'my' here would be reset on every line under -n
push @values, $_ unless ( /^$/ );
END {
# do whatever work you want to do here
}
... but this quickly reaches a point of diminishing returns if you have much code inside the END {} block.