How to match 2 patterns within a single line - perl

I have the following Java code
fun(GUIBundle.getString("Key1"), GUIBundle.getString("Key2"));
I use Perl to parse the source code and check whether "Key1" and "Key2" are found in %gui_bundle.
while (my $line = <FILE>) {
$line_number++;
if ($line =~ /^#/) {
next;
}
chomp $line;
if ($line =~ /GUIBundle\.getString\("([^"]+)"\)/) {
my $key = $1;
if (not $gui_bundle{$key}) {
print "WARNING : key '$key' not found ($name $line_number)\n";
}
}
}
However, the way I have written the code, I can only verify "Key1". How can I verify "Key2" as well?

Add the g modifier and put it into a while loop:
while ($line =~ /GUIBundle\.getString\("([^"]+)"\)/g) { ...

Just use the /g modifier on the regular expression match, in list context:
my @matches = $line =~ /GUIBundle\.getString\("([^"]+)"\)/g;
Given your example line, @matches will contain the strings 'Key1' and 'Key2'.
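A minimal sketch of the list-context approach, assuming a %gui_bundle hash that only knows Key1 (the hash contents and the keys_in_line helper are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical bundle contents, for illustration only.
my %gui_bundle = (Key1 => 'Open file');

# Return every key passed to GUIBundle.getString("...") on one line.
sub keys_in_line {
    my ($line) = @_;
    # In list context, a /g match returns all captures of group 1.
    my @keys = $line =~ /GUIBundle\.getString\("([^"]+)"\)/g;
    return @keys;
}

my $line = 'fun(GUIBundle.getString("Key1"), GUIBundle.getString("Key2"));';
for my $key (keys_in_line($line)) {
    print "WARNING : key '$key' not found\n" unless $gui_bundle{$key};
}
```

With the sample bundle above, only Key2 is reported, because both keys on the line are checked rather than just the first.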

Replace the if construct with a while and use the global-matching modifier, /g:
while (my $line = <FILE>) {
$line_number++;
next if $line =~ /^#/;
chomp $line;
while ($line =~ /GUIBundle\.getString\("([^"]+)"\)/g) { #"
my $key = $1;
warn "key '$key' not found ($name $line_number)" unless $gui_bundle{$key};
}
}

Related

Perl DBI — ignore few columns in output

I've used this code:
while (my $line = <IN>)
{
chomp $line;
if($line =~ /(.*?: )\{(.+)\}/)
{
my $value2 = $2;
my @values2 = split(/,/, $value2);
my $insertKeys;
my $insertValues;
foreach $data(@values2)
{
chomp $data;
my ($key, $value) = split(/:/, $data);
$key =~ s/"//g;
$value =~ s/"/'/g;
$insertKeys .= $key.',';
$insertValues .= $value.',';
}
Input:
"actor_ip":"127.0.0.1" "note":"From Git" "user":"Username for 'https" "user_id":null "actor":"Username for 'https" "actor_id":null "org_id":null "action":"user.failed_login" "created_at":1412256345456789 "data":{"actor_location":{"location":{"lat":null "lon":null}}}
Output:
KEYS: actor_ip,note,user,user_id,actor,actor_id,org_id,action,created_at,data,lon,
VALUES: '127.0.0.1','From Git','Username for 'https',null,'Username for 'https',null,null,'user.failed_login',1412256456789,{'actor_location',null
I want to remove these two keys and their values from the output. Please let me know how to write a regex that excludes the entries below:
"user":"Username for 'https"
"data":{"actor_location":{"location":{"lat":null "lon":null}}}
You simply need to exclude the keys you don't want:
if ($key !~ /^(data|user)$/)
{
$insertKeys .= $key.',';
$insertValues .= $value.',';
}
However, a more flexible design might be to insert key/value pairs into a hash:
my %params;
foreach $data(@values2)
{
chomp $data;
my ($key, $value) = split(/:/, $data);
$key =~ s/"//g;
$value =~ s/"/'/g;
$params{$key} = $value;
}
Then it would be easy to do whatever you want with the parameters later.
Also, you don't show your DBI code, but this code suggests you are manually building the whole insert query string. A safer (and better-designed) approach would be a parameterized query.
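As a rough sketch of what a parameterized insert could look like once the pairs are in a hash: the table name audit_log, the build_insert helper, and the sample values are assumptions, and the values are assumed to be stored without surrounding quotes. The DBI calls are only shown in a comment because no database handle exists here.

```perl
use strict;
use warnings;

# Hypothetical parsed parameters; in the real script these would come
# from the parsing loop.
my %params = (
    actor_ip => '127.0.0.1',
    note     => 'From Git',
    user     => "Username for 'https",
    action   => 'user.failed_login',
);

# Build an INSERT statement with one "?" placeholder per kept column and
# return it together with the bind values in matching order.
sub build_insert {
    my ($table, $params, @skip) = @_;
    my %skip = map { $_ => 1 } @skip;
    my @cols = grep { !$skip{$_} } sort keys %$params;
    my $sql  = sprintf 'INSERT INTO %s (%s) VALUES (%s)',
        $table, join(', ', @cols), join(', ', ('?') x @cols);
    return ($sql, map { $params->{$_} } @cols);
}

# "audit_log" is an assumed table name; user and data are the unwanted keys.
my ($sql, @bind) = build_insert('audit_log', \%params, qw(user data));
print "$sql\n";
# With a connected DBI handle $dbh, this would be run as:
#   $dbh->prepare($sql)->execute(@bind);
```

Because the values travel as bind parameters, a stray quote such as the one in "Username for 'https" no longer needs manual escaping.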

Pattern Matching in perl

I want to parse some information from the file.
Information in the file:
Rita_bike_house_Sha9
Rita_bike_house
I want output like this:
$a = Rita_bike_house and $b = Sha9,
$a = Rita_bike_house and $b = "original"
In order to get that I have used the below code:
$name = @_; # @_ holds all the information from the file shown above.
#For matching pattern Rita_bike_house_Sha9
($a, $b) = $name =~ /\w\d+/;
if ($a ne "" and $b ne "" ) { return ($a,$b) }
# This statement does not work at all, because its first condition
# is not satisfied.
Is there any way I can store "Rita_bike_house" in $a and "Sha9" in $b? I think my regexp is missing something. Can you suggest anything?
Please don't use the variables $a and $b in your code. They are used by sort and will confuse you.
Try:
while( my $line = <DATA> ){
chomp $line;
if( $line =~ m{ \A ( \w+ ) _ ( [^_]* \d [^_]* ) \z }msx ){
my $first = $1;
my $second = $2;
print "\$a = $first and \$b = $second\n";
}else{
print "\$a = $line and \$b = \"original\"\n";
}
}
__DATA__
Rita_bike_house_Sha9
Rita_bike_house
Not very nice, but the following:
use strict;
use warnings;
while(<DATA>) {
chomp;
next if /^\s*$/;
my @parts = split(/_/);
my $b;
$b = pop @parts if $parts[-1] =~ /\d/;
$b //= '"original"';
my $a = join('_', @parts);
print "\$a = $a and \$b = $b,\n";
}
__DATA__
Rita_bike_house_Sha9
Rita_bike_house
prints:
$a = Rita_bike_house and $b = Sha9,
$a = Rita_bike_house and $b = "original",
If you are sure that the pattern which is required will always be similar to 'Sha9' and also it will appear at the end then just do a greedy matching....
open FILE, "filename.txt" or die $!;
my @data = <FILE>;
close(FILE);
#my $line = "Rita_bike_house_Sha9";
foreach my $line (@data)
{
chomp($line);
if ($line =~ m/(.*?)(_([a-zA-Z]+[0-9]+))?$/)
{
$a = $1;
$b = $3 ? $3 : "original";
}
}
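The approaches above can be folded into one small helper. This is only a sketch, assuming the suffix is the last underscore-separated field and always contains a digit; split_name is a made-up name, and it avoids $a and $b as advised earlier.

```perl
use strict;
use warnings;

# Split "base_suffix" where the suffix is the last underscore-separated
# field and contains a digit; otherwise report "original" as the suffix.
sub split_name {
    my ($name) = @_;
    if ($name =~ /\A(.+)_([^_]*\d[^_]*)\z/) {
        return ($1, $2);
    }
    return ($name, 'original');
}

for my $name (qw(Rita_bike_house_Sha9 Rita_bike_house)) {
    my ($base, $suffix) = split_name($name);
    print "$base / $suffix\n";
}
```

Returning a two-element list keeps the caller's code the same for both kinds of input line.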

basic regex and string manipulation for DNA analysis using perl

I am new to perl and would like to do what I think is some basic string manipulation to DNA sequences stored in an rtf file.
Essentially, my file reads (file is in FASTA format):
>LM1
AAGTCTGACGGAGCAACGCCGCGTGTATGAAGAAGGTTTTCGGATCGTAA
AGTACTGTCCGTTAGAGAAGAACAAGGATAAGAGTAACTGCTTGTCCCTT
GACGGTATCTAACCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGG
TAATACGTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGCGC
GCAGGCGGTCTTTTAAGTCTGATGTGAAAGCCCCCGGCTTAACCGGGGAG
GGTCATTGGAAACTGGAAGACTGGAGTGCAGAAGAGGAGAGTGGAATTCC
ACGTGTAGCGGTGAAATGCGTAGATATGTGGAGGAACACCAGTGGCGAAG
GCGACTCTCTGGTCTGTAACTGACGCTGAGGCGCGAAAGCGTGGGGAGCA
AACAGGATTAGATACCCTGGTAGTCCACGCCGT
What I would like to do is read into my file and print the header (header is >LM1) then match the following DNA sequence GTGCCAGCAGCCGC and then print the preceding DNA sequence.
So my output would look like this:
>LM1
AAGTCTGACGGAGCAACGCCGCGTGTATGAAGAAGGTTTTCGGATCGTAA
AGTACTGTCCGTTAGAGAAGAACAAGGATAAGAGTAACTGCTTGTCCCTT
GACGGTATCTAACCAGAAAGCCACGGCTAACTAC
I have written the following program:
#!/usr/bin/perl
use strict; use warnings;
open(FASTA, "<seq_V3_V6_130227.rtf") or die "The file could not be found.\n";
while(<FASTA>) {
chomp($_);
if ($_ =~ m/^>/ ) {
my $header = $_;
print "$header\n";
}
my $dna = <FASTA>;
if ($dna =~ /(.*?)GTGCCAGCAGCCGC/) {
print "$dna";
}
}
close(FASTA);
The problem is that my program reads the file line by line and the output I am receiving is the following:
>LM1
GACGGTATCTAACCAGAAAGCCACGGCTAACTAC
Basically I don't know how to assign the entire DNA sequence to my $dna variable and ultimately don't know how to avoid reading the DNA sequence line by line. Also I am getting this warning:
Use of uninitialized value $dna in pattern match (m//) at stacked.pl line 14, line 1113.
If anyone could give me some help with writing better code or point me in the correct direction it would be much appreciated.
Using the pos function:
use strict;
use warnings;
my $dna = "";
my $seq = "GTGCCAGCAGCCGC";
while (<DATA>) {
if (/^>/) {
print;
} else {
if (/^[AGCT]/) {
$dna .= $_;
}
}
}
if ($dna =~ /$seq/g) {
print substr($dna, 0, pos($dna) - length($seq)), "\n";
}
__DATA__
>LM1
AAGTCTGACGGAGCAACGCCGCGTGTATGAAGAAGGTTTTCGGATCGTAA
AGTACTGTCCGTTAGAGAAGAACAAGGATAAGAGTAACTGCTTGTCCCTT
GACGGTATCTAACCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGG
TAATACGTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGCGC
GCAGGCGGTCTTTTAAGTCTGATGTGAAAGCCCCCGGCTTAACCGGGGAG
GGTCATTGGAAACTGGAAGACTGGAGTGCAGAAGAGGAGAGTGGAATTCC
ACGTGTAGCGGTGAAATGCGTAGATATGTGGAGGAACACCAGTGGCGAAG
GCGACTCTCTGGTCTGTAACTGACGCTGAGGCGCGAAAGCGTGGGGAGCA
AACAGGATTAGATACCCTGGTAGTCCACGCCGT
You can process a file with multiple entries like so:
while (<DATA>) {
if (/^>/) {
if ($dna =~ /$seq/g) {
print substr($dna, 0, pos($dna) - length($seq)), "\n";
}
$dna = ""; # reset for the next entry, whether or not it matched
print;
} elsif (/^[AGCT]/) {
$dna .= $_;
}
}
if ($dna && $dna =~ /$seq/g) {
print substr($dna, 0, pos($dna) - length($seq)), "\n";
}
Your while statement reads until the end of file. That means at every loop iteration, $_ is the next line in <FASTA>. So $dna = <FASTA> isn't doing what you think it is. It is reading more than you probably want it to.
while(<FASTA>) { #Reads a line here
chomp($_);
if ($_ =~ m/^>/ ) {
my $header = $_;
print "$header\n";
}
$dna = <FASTA>; # reads another line here, so every other line is skipped
}
Now you need to read the sequence into $dna. You can update your while loop with an else branch: if it's a header line, print it; otherwise, append it to $dna.
while(<FASTA>) {
chomp($_);
if ($_ =~ m/^>/ ) {
# It is a header line, so print it
my $header = $_;
print "$header\n";
} else {
# if it is not a header line, add to your dna sequence.
$dna .= $_;
}
}
After the loop, you can do your regex.
Note: This solution assumes there is only 1 sequence in the fasta file. If you have more than one, your $dna variable will have all the sequences as one.
Edit: Adding a simple way to handle multiple sequences
my $dna = "";
while(<FASTA>) {
chomp($_);
if ($_ =~ m/^>/ ) {
# Does $dna match the regex?
if ($dna =~ /(.*?)GTGCCAGCAGCCGC/) {
print "$1\n";
}
# Reset the sequence
$dna = "";
# It is a header line, so print it
my $header = $_;
print "$header\n";
} else {
# if it is not a header line, add to your dna sequence.
$dna .= $_;
}
}
# Check the last sequence
if ($dna =~ /(.*?)GTGCCAGCAGCCGC/) {
print "$1\n";
}
I came up with a solution using Bio::SeqIO (and the trunc method from Bio::Seq) from the BioPerl distribution. I also used index to find the subsequence rather than using a regular expression.
This solution does not print out the id (the line beginning with >) if the subsequence was not found or if the subsequence begins at the first position (and thus has no preceding characters).
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
my $in = Bio::SeqIO->new( -file => "fasta_junk.fasta" ,
-format => 'fasta');
my $out = Bio::SeqIO->new( -file => '>test.dat',
-format => 'fasta');
my $lookup = 'GTGCCAGCAGCCGC';
while ( my $seq = $in->next_seq() ) {
my $pos = index $seq->seq, $lookup;
# $pos is -1 if $lookup was not found, and 0 if $lookup
# starts at the first position (so there are no preceding
# characters); only $pos > 0 leaves something to write.
if ($pos > 0) {
my $trunc = $seq->trunc(1,$pos);
$out->write_seq($trunc);
}
}
__END__
*** fasta_junk.fasta
>LM1
AAGTCTGACGGAGCAACGCCGCGTGTATGAAGAAGGTTTTCGGATCGTAA
AGTACTGTCCGTTAGAGAAGAACAAGGATAAGAGTAACTGCTTGTCCCTT
GACGGTATCTAACCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGG
TAATACGTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGCGC
GCAGGCGGTCTTTTAAGTCTGATGTGAAAGCCCCCGGCTTAACCGGGGAG
GGTCATTGGAAACTGGAAGACTGGAGTGCAGAAGAGGAGAGTGGAATTCC
ACGTGTAGCGGTGAAATGCGTAGATATGTGGAGGAACACCAGTGGCGAAG
GCGACTCTCTGGTCTGTAACTGACGCTGAGGCGCGAAAGCGTGGGGAGCA
AACAGGATTAGATACCCTGGTAGTCCACGCCGT
*** contents of test.dat
>LM1
AAGTCTGACGGAGCAACGCCGCGTGTATGAAGAAGGTTTTCGGATCGTAAAGTACTGTCC
GTTAGAGAAGAACAAGGATAAGAGTAACTGCTTGTCCCTTGACGGTATCTAACCAGAAAG
CCACGGCTAACTAC
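For comparison, the same index-based lookup can be sketched in core Perl without BioPerl. The helper name prefixes_before and the shortened sample records are assumptions made for illustration; the sample is read from an in-memory string rather than a real file.

```perl
use strict;
use warnings;

# Return a list of [header, prefix] pairs, one per record whose sequence
# contains $motif preceded by at least one base.
sub prefixes_before {
    my ($fh, $motif) = @_;
    my (@records, $header, $dna);
    my $flush = sub {
        return unless defined $header;
        my $pos = index $dna, $motif;   # -1 if absent, 0 if at the start
        push @records, [$header, substr($dna, 0, $pos)] if $pos > 0;
    };
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>/) {
            $flush->();                 # finish the previous record
            ($header, $dna) = ($line, '');
        } elsif (defined $header) {
            $dna .= $line;              # join wrapped sequence lines
        }
    }
    $flush->();                         # finish the last record
    return @records;
}

# In-memory sample (shortened, made-up data) instead of a real file.
my $fasta = ">rec1\nAAGT\nCTGT\nGCCA\n>rec2\nTTTT\n";
open my $fh, '<', \$fasta or die $!;
print "$_->[0]\n$_->[1]\n" for prefixes_before($fh, 'GTGCC');
```

Because each line is chomped before being appended, a motif that straddles a line break in the file is still found.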
Read the whole file into memory, then look for the regexp:
while(<FASTA>) {
chomp($_);
if ($_ =~ m/^>/ ) {
my $header = $_;
print "$header\n";
} else {
$dna .= $_;
}
}
if ($dna =~ /(.*?)GTGCCAGCAGCCGC/) {
print $1;
}

Display And Pass Command Line Arguments in Perl

I have the following program "Extract.pl", which opens a file, finds the lines containing "warning....", "info...", "disabling..." then counts and prints the value and number of them. It is working ok.
What I want to do is to create command line arguments for each of the 3 matches - warning, disabling and infos and then run either of them from the command prompt.
Here is the code:
#!/usr/bin/perl
use strict;
use warnings;
my %warnings = ();
my %infos = ();
my %disablings = ();
open (my $file, '<', 'Warnings.txt') or die $!;
while (my $line = <$file>) {
if($line =~ /^warning ([a-zA-Z0-9]*):/i) {
++$warnings{$1};
}
if($line =~ /^disabling ([a-zA-Z0-9]*):/i) {
++$disablings{$1};
}
if($line =~ /^info ([a-zA-Z0-9]*):/i) {
++$infos{$1};
}
}
close $file;
foreach my $w (sort {$warnings{$a} <=> $warnings{$b}} keys %warnings) {
print $w . ": " . $warnings{$w} . "\n";
}
foreach my $d (sort {$disablings{$a} <=> $disablings{$b}} keys %disablings) {
print $d . ": " . $disablings{$d} . "\n";
}
foreach my $i (sort {$infos{$a} <=> $infos{$b}} keys %infos) {
print $i . ": " . $infos{$i} . "\n";
}
The builtin special array @ARGV holds all command line arguments to the script, excluding the script file itself (and the interpreter, if called as perl script.pl). In the case of a call like perl script.pl foo bar warnings, @ARGV would contain the values 'foo', 'bar', and 'warnings'. It's a normal array, so you could write something like this (assuming the first argument is one of your options):
my ($warning, $info, $disabling);
if ($ARGV[0] =~ /warning/i) { $warning = 1 }
elsif ($ARGV[0] =~ /info/i) { $info = 1 }
elsif ($ARGV[0] =~ /disabling/i) { $disabling = 1 }
# [...] (opening the file, starting the main loop etc...)
if ( $warning and $line =~ /^warning ([a-zA-Z0-9]*)/i ) {
++$warnings{$1};
}
elsif ( $info and $line =~ /^info ([a-zA-Z0-9]*)/i ) {
++$infos{$1};
}
elsif ( $disabling and $line =~ /^disabling ([a-zA-Z0-9]*)/i ) {
++$disablings{$1};
}
I created flag variables for the three conditions before the main loop that goes through the file, so that $ARGV[0] is not re-tested against a regex on every line of the file.
You could also use the Getopt::Long or Getopt::Std modules. These provide easy and flexible handling of the command line arguments.
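A minimal Getopt::Long sketch, assuming the three categories are exposed as --warning, --info and --disabling (these option names and the parse_flags helper are assumptions, not something the original script defines). The example parses a fixed argument list; a real script would pass @ARGV instead.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse --warning / --info / --disabling flags from an argument list.
# The option names are assumptions chosen to mirror the three categories.
sub parse_flags {
    my @args = @_;
    my %want = (warning => 0, info => 0, disabling => 0);
    GetOptionsFromArray(\@args,
        'warning'   => \$want{warning},
        'info'      => \$want{info},
        'disabling' => \$want{disabling},
    ) or die "Usage: $0 [--warning] [--info] [--disabling]\n";
    return %want;
}

# Fixed example arguments; a real script would call parse_flags(@ARGV).
my %want = parse_flags('--warning');
print "$_: ", ($want{$_} ? 'on' : 'off'), "\n" for sort keys %want;
```

Each flag can then guard the corresponding regex test and the corresponding report loop, and several flags can be combined in a single run.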

perl if line matches regex, ignore line and move onto next line in file

How would you do the following in perl:
for $line (@lines) {
if ($line =~ m/ImportantLineNotToBeChanged/){
#break out of the for loop, move onto the next line of the file being processed
#start the loop again
}
if ($line =~ s/SUMMER/WINTER/g){
print ".";
}
}
Updated to show more code, this is what I'm trying to do:
sub ChangeSeason(){
if (-f and /.log?/) {
$file = $_;
open FILE, $file;
@lines = <FILE>;
close FILE;
for $line (@lines) {
if ($line =~ m/'?Don't touch this line'?/) {
last;
}
if ($line =~ m/'?Or this line'?/){
last;
}
if ($line =~ m/'?Or this line too'?/){
last;
}
if ($line =~ m/'?Or this line as well'?/){
last;
}
if ($line =~ s/(WINTER)/{$1 eq 'winter' ? 'summer' : 'SUMMER'}/gie){
print ".";
}
}
print "\nSeason changed in file $_";
open FILE, ">$file";
print FILE @lines;
close FILE;
}
}
Just use next
for my $line (@lines) {
next if ($line =~ m/ImportantLineNotToBeChanged/);
if ($line =~ s/SUMMER/WINTER/g){
print ".";
}
}
for $line (@lines) {
unless ($line =~ m/ImportantLineNotToBeChanged/) {
if ($line =~ s/SUMMER/WINTER/g){
print ".";
}
}
}
A more concise method is
map { print "." if s/SUMMER/WINTER/g }
grep {!/ImportantLineNotToBeChanged/} @lines;
(I think I got that right.)
Just use the next feature.
for $line (@lines) {
if ($line =~ m/ImportantLineNotToBeChanged/){
#break out of the for loop, move onto the next line of the file being processed
#start the loop again
next;
}
if ($line =~ s/SUMMER/WINTER/g){
print ".";
}
}
Similarly, you can use "last" to finish the loop. For example:
for $line (@lines) {
if ($line =~ m/ImportantLineNotToBeChanged/){
#continue onto the next iteration of the for loop.
#skip everything in the rest of this iteration.
next;
}
if ($line =~ m/NothingImportantAFTERThisLine/){
#break out of the for loop completely.
#continue to code after loop
last;
}
if ($line =~ s/SUMMER/WINTER/g){
print ".";
}
}
#code after loop
Edit: 7pm on 6/13
I took your code and looked at it, rewrote some things and this is what I got:
sub changeSeason2 {
my $file= $_[0];
open (FILE,"<$file");
@lines = <FILE>;
close FILE;
foreach $line (@lines) {
if ($line =~ m/'?Don't touch this line'?/) {
next;
}
if ($line =~ m/'?Or this line'?/){
next;
}
if ($line =~ m/'?Or this line too'?/){
next;
}
if ($line =~ m/\'Or this line as well\'/){
next;
}
if ($line =~ s/(WINTER)/{$1 eq 'winter' ? 'summer' : 'SUMMER'}/gie){
print ".";
}
}
print "\nSeason changed in file $file";
open FILE, ">$file";
print FILE @lines;
close FILE;
}
Hope this helps some.
Unless I'm misunderstanding you, your "break out of the for loop, move onto the next line... [and] start the loop again" is just a complex way of saying "skip the loop body for this iteration".
for $line (@lines) {
unless ($line =~ m/ImportantLineNotToBeChanged/) {
if ($line =~ s/SUMMER/WINTER/g) {
print ".";
}
}
}
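When there are several "don't touch" patterns, the skip logic can be kept in one place with a labeled next. This is a sketch only: change_season and the sample lines are made up for illustration, and the protected patterns are passed in as compiled regexes.

```perl
use strict;
use warnings;

# Replace SUMMER with WINTER in place, leaving protected lines untouched.
# Returns the number of lines that were changed.
sub change_season {
    my ($lines, @protected) = @_;
    my $changed = 0;
    LINE: for my $line (@$lines) {
        for my $pattern (@protected) {
            next LINE if $line =~ $pattern;   # skip this line entirely
        }
        $changed++ if $line =~ s/SUMMER/WINTER/g;
    }
    return $changed;
}

my @lines = ("SUMMER sale\n", "ImportantLineNotToBeChanged SUMMER\n", "SUMMER fun\n");
my $n = change_season(\@lines, qr/ImportantLineNotToBeChanged/);
print "changed $n lines\n", @lines;
```

The label matters here: a plain next inside the inner loop would only move on to the next pattern, not to the next line.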