Need a faster version of substr in Perl, or another suggestion

Does anybody have any ideas on how to speed up this Perl code?
I want to keep the way this loop works: it eats each match and starts matching again at position zero of the string on the next iteration. I need it this way because I'm matching language constructs that span multiple lines.
The problem is that I'm reading a really large file with thousands of `define lines, and I think the call to substr is slowing it down. I'm looking for alternative ways of doing this that are much faster.
#!/bin/perl
$vlog = "out/tb_asc.sv";
open(my $F, "$vlog") || die("cannot open file: $vlog\n");
@lines = <$F>;
for(my $i=0; $i < @lines; $i++) {chomp($lines[$i])};
$bigline = join("\n", @lines);
close($F);
$movingline = $bigline;
sub nextline($) {
@c = split(/\n/, $moving);
$c = $c[0];
chomp($c);
return $c;
}
print ">> << START\n";
$moving = $bigline;
$moving =~ s|//.*$||mg;
$moving =~ s|\s+$||mg;
while(1) {
# Blank line
if ($moving =~ /\A$/m) {
# print ">> blank\n";
$moving = substr $moving, $+[0]+1;
next;
}
# timescale
if ($moving =~ /\A\s*`timescale\s+\w+\s*\/\s*\w+$/m) {
print ">> timescale\n";
$moving = substr $moving, $+[0];
next;
}
if ($moving =~ /\A\s*`define\s+(\w+)\s+(.*)$/m) {
$def = $1;
$val = $2;
print "define: $def $val\n";
$moving = substr $moving, $+[0];
next;
}
if ($moving =~ /\A\s*`ifndef\s+(\w+)$/m) {
$def = $1;
$val = $2;
print "ifndef $def\n";
$moving = substr $moving, $+[0];
next;
}
if ($moving =~ /\A\s*`ifdef\s+(\w+)$/m) {
$def = $1;
$val = $2;
print "ifdef $def\n";
$moving = substr $moving, $+[0];
next;
}
if ($moving =~ /\A\s*`else$/m) {
$def = $1;
$val = $2;
print "else\n";
$moving = substr $moving, $+[0];
next;
}
$c = nextline($moving);
print "\n=> processing:[$c]\n";
die("parse error\n");
}
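One possible direction, sketched here under assumptions rather than taken from the thread: each `$moving = substr $moving, $+[0]` likely copies the remainder of the string, so the total work can grow roughly quadratically with file size. Perl's `\G` anchor combined with the `/gc` modifiers lets every pattern start at the current `pos()` of the string instead, which keeps the "match at the front, then advance" structure without shortening the string. The helper name `scan_directives` and the returned event strings are hypothetical; the patterns are simplified versions of the ones in the question (using `\h` so that whitespace cannot run across a newline).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the question): walks the text once using
# \G and /gc, so the string itself is never copied or shortened.
sub scan_directives {
    my ($text) = @_;
    my @events;
    pos($text) = 0;
    while (pos($text) < length $text) {
        # Newline left over from the previous construct, or a blank line.
        if ($text =~ /\G\n/gc) { next }
        if ($text =~ /\G\h*`timescale\h+\w+\h*\/\h*\w+$/gcm) {
            push @events, 'timescale';
            next;
        }
        if ($text =~ /\G\h*`define\h+(\w+)\h+(.*)$/gcm) {
            push @events, "define $1 $2";
            next;
        }
        if ($text =~ /\G\h*`(ifndef|ifdef)\h+(\w+)$/gcm) {
            push @events, "$1 $2";
            next;
        }
        if ($text =~ /\G\h*`else$/gcm) {
            push @events, 'else';
            next;
        }
        # Nothing matched at the current position: report the rest of the line.
        my $rest = $text =~ /\G(.*)/gc ? $1 : '';
        die "parse error near: [$rest]\n";
    }
    return @events;
}

my $src = join "\n", '`timescale 1ns/1ps', '', '`define WIDTH 8',
                     '`ifndef FOO', '`else', '';
print "$_\n" for scan_directives($src);
```

The `/c` modifier matters here: without it, a failed match would reset `pos()` and the next pattern would start from the beginning of the string again.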

Related

how to display the hash value from my sample data

I'm learning Perl at the moment and would like help with this exercise.
My objective is to display the hash values for PartID 1, 2 and 3.
The sample output currently displays only the lot, wafer, program, version, testnames, testnumbers, hilimit, lolimit and partid values.
sample data
lot=lot123
wafer=1
program=prgtest
version=1
Testnames,T1,T2,T3
Testnumbers,1,2,3
Hilimit,5,6,7
Lolimit,1,2,3
PartID,,,,
1,3,0,5
2,4,3,2
3,5,6,3
This is my code:
#!/usr/bin/perl
use strict;
use Getopt::Long;
my $file = "";
GetOptions ("infile=s" => \$file ) or die("Error in command line arguments\n");
my $lotid = "";
open(DATA, $file) or die "Couldn't open file $file";
while(my $line = <DATA>) {
#print "$line";
if ( $line =~ /^lot=/ ) {
#print "$line \n";
my ($dump, $lotid) = split /=/, $line;
print "$lotid\n";
}
elsif ($line =~ /^program=/ ) {
my ($dump, $progid) = split /=/, $line;
print "$progid \n";
}
elsif ($line =~ /^wafer=/ ) {
my ($dump, $waferid) = split /=/, $line;
print "$waferid \n";
}
elsif ($line =~ /^version=/ ) {
my ($dump, $verid) = split /=/, $line;
print "$verid \n";
}
elsif ($line =~ /^testnames/i) {
my ($dump, @arr) = split /\,/, $line;
foreach my $e (@arr) {
print $e, "\n";
}
}
elsif ($line =~ /^testnumbers/i) {
my ($dump, @arr1) = split /\,/, $line;
foreach my $e1 (@arr1) {
print $e1, "\n";
}
}
elsif ($line =~ /^hilimit/i) {
my ($dump, @arr2) = split /\,/, $line;
foreach my $e2 (@arr2) {
print $e2, "\n";
}
}
elsif ($line =~ /^lolimit/i) {
my ($dump, @arr3) = split /\,/, $line;
foreach my $e3 (@arr3) {
print $e3, "\n";
}
}
}
Please help me extend my code so that it also displays the values for PartID 1, 2 and 3 as a hash.

So I've rewritten your code a little to use a few more modern Perl idioms (along with some comments to explain what I've done). The bit I've added is near the bottom.
#!/usr/bin/perl
use strict;
# Added 'warnings' which you should always use
use warnings;
# Use say() instead of print()
use feature 'say';
use Getopt::Long;
my $file = "";
GetOptions ("infile=s" => \$file)
or die ("Error in command line arguments\n");
# Use a lexical variable for a filehandle.
# Use the (safer) 3-argument version of open().
# Add $! to the error message.
open(my $fh, '<', $file) or die "Couldn't open file $file: $!";
# Read each record into $_ - which makes the following code simpler
while (<$fh>) {
# Match on $_
if ( /^lot=/ ) {
# Use "undef" instead of a $dump variable.
# split() works on $_ by default.
my (undef, $lotid) = split /=/;
# Use say() instead of print() - less punctuation :-)
say $lotid;
}
elsif ( /^program=/ ) {
my (undef, $progid) = split /=/;
say $progid;
}
elsif ( /^wafer=/ ) {
my (undef, $waferid) = split /=/;
say $waferid;
}
elsif ( /^version=/ ) {
my (undef, $verid) = split /=/;
say $verid;
}
elsif ( /^testnames/i) {
my (undef, @arr) = split /\,/;
# Changed all of these similar pieces of code
# to use the same variable names. As they are
# defined in different code blocks, they are
# completely separate variables.
foreach my $e (@arr) {
say $e;
}
}
elsif ( /^testnumbers/i) {
my (undef, @arr) = split /\,/;
foreach my $e (@arr) {
say $e;
}
}
elsif ( /^hilimit/i) {
my (undef, @arr) = split /\,/;
foreach my $e (@arr) {
say $e;
}
}
elsif ( /^lolimit/i) {
my (undef, @arr) = split /\,/;
foreach my $e (@arr) {
say $e;
}
}
# And here's the new bit.
# If we're on the "partid" line, then read the next
# three lines, split each one and print the first
# element from the list returned by split().
elsif ( /^partid/i) {
say +(split /,/, <$fh>)[0] for 1 .. 3;
}
}
Update: By the way, there are no hashes anywhere in this code :-)
Update 2: I've just realised that you only have three different ways to process the data. So you can simplify your code drastically by using slightly more complex regexes.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use Getopt::Long;
my $file = "";
GetOptions ("infile=s" => \$file)
or die ("Error in command line arguments\n");
open(my $fh, '<', $file) or die "Couldn't open file $file: $!";
while (<$fh>) {
# Single value - just print it.
if ( /^(?:lot|program|wafer|version)=/ ) {
my (undef, $value) = split /=/;
say $value;
}
# List of values - split and print.
elsif ( /^(?:testnames|testnumbers|hilimit|lolimit)/i) {
my (undef, @arr) = split /\,/;
foreach my $e (@arr) {
say $e;
}
}
# Extract values from following lines.
elsif ( /^partid/i) {
say +(split /,/, <$fh>)[0] for 1 .. 3;
}
}
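Since the question asks for a hash and the code above never builds one, here is a minimal sketch of how the PartID rows could be stored in a hash of hashes, keyed first by part ID and then by test name. The helper name `parse_parts` is hypothetical, and the sketch assumes the layout of the sample data: a `Testnames` line before the `PartID` line, and one comma-separated row per part after it.

```perl
use strict;
use warnings;

# Hypothetical helper: builds { part_id => { test_name => value } }
# from lines laid out like the sample data in the question.
sub parse_parts {
    my (@lines) = @_;
    my (@names, %part);
    my $in_parts = 0;
    for (@lines) {
        chomp;
        if (/^testnames,/i) {
            # Drop the leading label, keep the test names.
            (undef, @names) = split /,/;
        }
        elsif (/^partid/i) {
            # Every following numeric row is a part record.
            $in_parts = 1;
        }
        elsif ($in_parts && /^\d+,/) {
            my ($id, @values) = split /,/;
            # Hash slice: pair each test name with its value.
            @{ $part{$id} }{@names} = @values;
        }
    }
    return \%part;
}

my $parts = parse_parts(
    'Testnames,T1,T2,T3', 'PartID,,,,', '1,3,0,5', '2,4,3,2', '3,5,6,3',
);
for my $id (sort { $a <=> $b } keys %$parts) {
    print "PartID $id: ",
        join(', ', map { "$_=$parts->{$id}{$_}" } sort keys %{ $parts->{$id} }),
        "\n";
}
```

Keeping the test names as inner keys means a value can be looked up directly, for example `$parts->{2}{T2}`.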

Perl output format

I'm reading a log file and grouping it based on the 'Program' name and in turn its ID.
LOG FILE
------------------------------------------
DEV: COM-1258
Program:Testing
Reviewer:Jackie
Description:New Entries
rev:r145201
------------------------------------------
QA: COM-9696
Program:Testing
Reviewer:Poikla
Description:Some random changes
rev:r112356
------------------------------------------
JIRA: COM-1234
Program:Development
Reviewer:John Wick
Description:Genral fix
rev:r345676
------------------------------------------
JIRA:COM-1234
Program:Development
Reviewer:None
Description:Updating Received
rev:r909276
------------------------------------------
JIRA: COM-6789
Program:Testing
Reviewer:Balise Mat
Description:Audited
rev:r876391
------------------------------------------
JIRA: COM-8585
Program:Testing
Reviewer:Gold frt
Description: yet to be reviewed
rev:r565639
The code I have,
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Terse = 1;
my $file = "log.txt";
open FH, $file or die "Couldn't open file: [$!]\n";
my $data = {};
my $hash = {};
while (<FH>)
{
my $line = $_;
chomp $line;
if ($line =~ m/(-){2,}/)
{
my $program = $hash->{Program} || '';
my $jira = $hash->{JIRA} || $hash->{QA} || $hash->{DEV} || '';
if ($program && $jira)
{
push @{$data->{$program}{$jira}}, $hash;
$hash = {};
}
}
else
{
if ($line =~ m/:/)
{
my ($key, $value) = split /:\s*/, $line;
$hash->{$key} = $value;
}
elsif ($line =~ m#/# && exists $hash->{Files})
{
$hash->{Files} .= "\n$line";
}
}
}
print 'data = ' . Dumper($data);
foreach my $prg (sort keys %{$data})
{
print "============================================================\n";
print " PROGRAM : $prg \n";
print "============================================================\n";
foreach my $jira (sort keys %{$data->{$prg}})
{
print "******************\n";
print "JIRA ID : $jira\n";
print "******************\n";
foreach my $hash (@{$data->{$prg}{$jira}})
{
foreach my $key (keys %{$hash})
{
# print the data except Program and JIRA
next if $key =~ m/(Program|JIRA|DEV|QA)/;
print " $key => $hash->{$key}\n";
}
print "\n";
}
}
}
I need to print the output in the format below and have been unable to do so with my current logic. Any ideas would be really helpful.
PROGRAM: Development
Change IDs:
1.JIRA
a.COM-1234
PROGRAM: Testing
Change IDs:
1.JIRA
a.COM-6789
b.COM-8585
2.QA
a.COM-9696
3.DEV
a.COM-1258

I would write it like this:
use strict;
use warnings 'all';
use List::Util 'uniq';
my $file = 'log.txt';
open my $fh, $file or die "Couldn't open file: [$!]\n";
my @data;
{
my %item;
while ( <$fh> ) {
chomp;
if ( eof or /\-{2,}/ ) {
push @data, { %item } if keys %item;
%item = ();
}
else {
my ( $key, $value ) = split /\s*:\s*/;
next unless $value;
$item{$key} = $value;
$item{jira} = $key if grep { $key eq $_ } qw/ JIRA DEV QA /;
}
}
}
my %data;
{
for my $item ( @data ) {
my ($prog, $jira) = @{$item}{qw/ Program jira /};
push @{ $data{$prog}{$jira} }, $item->{$jira};
}
}
for my $prog ( sort keys %data ) {
printf "PROGRAM: %s\n", $prog;
print "Change IDs:\n";
my $n = 1;
for my $jira ( qw/ JIRA QA DEV / ) {
next unless my $codes = $data{$prog}{$jira};
printf "%d.%s\n", $n++, $jira;
my $l = 'a';
printf " %s.%s\n", $l++, $_ for sort(uniq(@$codes));
}
print "\n";
}
output
PROGRAM: Development
Change IDs:
1.JIRA
a.COM-1234
PROGRAM: Testing
Change IDs:
1.JIRA
a.COM-6789
b.COM-8585
2.QA
a.COM-9696
3.DEV
a.COM-1258

#!/usr/bin/perl -w
use strict;
use warnings;
use Data::Dumper;
my $file = 'test';
my $hash;
my $id_hash = ();
my $line_found = 0;
my $line_count = 1;
my $ID;
my $ID_num;
open (my $FH, '<', "$file") or warn $!;
while (my $line = <$FH> ) {
chomp($line);
if ( $line =~ m/------------------------------------------/){
$line_found = 1;
$line_count++;
next;
}
if ( $line_found ) {
$line =~ m/(.*?):(.*)/;
$ID = $1;
$ID_num = $2;
$line_found = 0;
}
if ( $line =~ m/Program:(.*)/ ) {
my $pro = $1;
push @{$hash->{$pro}->{$ID}}, ($ID_num) ;
}
$line_count++;
}
close $FH;
foreach my $pro (keys %$hash){
# print Dumper($pro);
print "PROGRAM:\t$pro\nChange IDs:\n";
foreach my $ids (keys %{ $hash->{$pro} }){
print "\t1. $ids\n";
foreach my $id (@{ $hash->{$pro}->{$ids} }){
print "\t\ta. $id\n";
}
}
}
OUTPUT
PROGRAM: Testing
Change IDs:
1. QA
a. COM-9696
1. DEV
a. COM-1258
1. JIRA
a. COM-6789
a. COM-8585
PROGRAM: Development
Change IDs:
1. JIRA
a. COM-1234
a. COM-1234
Just change the output to your need!!

stockholm to fasta format - include accession id in every header

Hello, I have multiple sequences in Stockholm format. At the top of every alignment there is an accession ID, for example '#=GF AC PF00406', and '//' marks the end of the alignment. When converting from Stockholm to FASTA format, I need PF00406 in the header of every sequence of that particular alignment. Sometimes there will be multiple Stockholm alignments in one file. I tried to modify the following Perl script, but it gave me bizarre results. Any help will be greatly appreciated.
my $columns = 60;
my $gapped = 0;
my $progname = $0;
$progname =~ s/^.*?([^\/]+)$/$1/;
my $usage = "Usage: $progname [<Stockholm file(s)>]\n";
$usage .= " [-h] print this help message\n";
$usage .= " [-g] write gapped FASTA output\n";
$usage .= " [-s] sort sequences by name\n";
$usage .= " [-c <cols>] number of columns for FASTA output (default is $columns)\n";
# parse cmd-line opts
my @argv;
while (@ARGV) {
my $arg = shift;
if ($arg eq "-h") {
die $usage;
} elsif ($arg eq "-g") {
$gapped = 1;
} elsif ($arg eq "-s"){
$sorted = 1;
} elsif ($arg eq "-c") {
defined ($columns = shift) or die $usage;
} else {
push @argv, $arg;
}
}
@ARGV = @argv;
my %seq;
while (<>) {
next unless /\S/;
next if /^\s*\#/;
if (/^\s*\/\//) { printseq() }
else {
chomp;
my ($name, $seq) = split;
#seq =~ s/[\.\-]//g unless $gapped;
$seq{$name} .= $seq;
}
}
printseq();
sub printseq {
if($sorted){
foreach $key (sort keys %seq){
print ">$key\n";
for (my $i = 0; $i < length $seq{$key}; $i += $columns){
print substr($seq{$key}, $i, $columns), "\n";
}
}
} else{
while (my ($name, $seq) = each %seq) {
print ">$name\n";
for (my $i = 0; $i < length $seq; $i += $columns) {
print substr ($seq, $i, $columns), "\n";
}
}
}
%seq = ();
}

Depending on how much variation there is in the line with the accession ID, you might need to modify the regex, but this works for your example file:
my %seq;
my $aln;
while (<>) {
if ($_ =~ /#=GF AC (\w+)/) {
$aln = $1;
}
elsif ($_ =~ /^\s*\/\/\s*$/){
$aln = '';
}
next unless /\S/;
next if /^\s*\#/;
if (/^\s*\/\//) { printseq() }
else {
chomp;
my ($name, $seq) = split;
$name = $name . ' ' . $aln;
$seq{$name} .= $seq;
}
}
printseq();
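For readers who want to try the idea without the rest of the script, the following is a self-contained sketch of the same approach: remember the accession from the `#=GF AC` line and append it to every header until `//` closes the alignment. The function name `stockholm_to_fasta` is hypothetical, gaps are kept as-is, and the sketch assumes each sequence line has the form `name residues`.

```perl
use strict;
use warnings;

# Hypothetical helper: converts Stockholm text (one or more alignments)
# to FASTA, appending the alignment's accession to each header.
sub stockholm_to_fasta {
    my ($text, $columns) = @_;
    $columns ||= 60;
    my $out = '';
    my $acc = '';
    my (%seq, @order);

    # Emit the sequences collected so far, then reset for the next alignment.
    my $flush = sub {
        for my $name (@order) {
            $out .= ">$name $acc\n";
            for (my $i = 0; $i < length $seq{$name}; $i += $columns) {
                $out .= substr($seq{$name}, $i, $columns) . "\n";
            }
        }
        %seq   = ();
        @order = ();
        $acc   = '';
    };

    for my $line (split /\n/, $text) {
        if ($line =~ /^#=GF\s+AC\s+(\S+)/) { $acc = $1; next }
        next if $line =~ /^\s*#/ or $line !~ /\S/;
        if ($line =~ m{^\s*//}) { $flush->(); next }
        my ($name, $residues) = split ' ', $line;
        next unless defined $residues;
        push @order, $name unless exists $seq{$name};
        $seq{$name} .= $residues;    # interleaved blocks are concatenated
    }
    $flush->();                      # in case the final "//" is missing
    return $out;
}

print stockholm_to_fasta(
    "# STOCKHOLM 1.0\n#=GF AC PF00406\nseq1 AC-GT\nseq2 ACGGT\n//\n"
);
```

Unlike the `each %seq` loop in the original script, this version keeps the sequences in input order by tracking names in `@order`.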

Pattern Matching in perl

I want to parse some information from the file.
Information in the file:
Rita_bike_house_Sha9
Rita_bike_house
I want to have output like this:
$a = Rita_bike_house and $b = Sha9,
$a = Rita_bike_house and $b = "original"
In order to get that I have used the below code:
$name = @_; # This @_ has all the information from the file that I have shown above.
#For matching pattern Rita_bike_house_Sha9
($a, $b) = $name =~ /\w\d+/;
if ($a ne "" and $b ne "" ) { return ($a,$b) }
# This statement does not work at all, as its first condition
# before the end is not satisfied.
Is there any way I can store "Rita_bike_house" in $a and "Sha9" in $b? I think my regexp is missing something. Can you suggest anything?

Please don't use the variables $a and $b in your code. They are used by sort and will confuse you.
Try:
while( my $line = <DATA> ){
chomp $line;
if( $line =~ m{ \A ( \w+ ) _ ( [^_]* \d [^_]* ) \z }msx ){
my $first = $1;
my $second = $2;
print "\$a = $first and \$b = $second\n";
}else{
print "\$a = $line and \$b = \"original\"\n";
}
}
__DATA__
Rita_bike_house_Sha9
Rita_bike_house
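To illustrate the advice about `$a` and `$b` above: `sort` sets these two package variables for every comparison, which is why they are best reserved for sort blocks. A small sketch (the helper name `sort_ci` is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: case-insensitive sort.
# sort assigns the two items being compared to $a and $b.
sub sort_ci {
    my @items = @_;
    return sort { lc($a) cmp lc($b) } @items;
}

print join(' ', sort_ci(qw(pear Apple fig))), "\n";   # prints "Apple fig pear"
```

Because `$a` and `$b` are exempt from `use strict 'vars'`, using them as ordinary variables also means typos and missing declarations involving them go unreported.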

Not very pretty, but the following works:
use strict;
use warnings;
while(<DATA>) {
chomp;
next if /^\s*$/;
my @parts = split(/_/);
# Avoid "my ... if ...": its behaviour is undefined in Perl.
my $b = $parts[-1] =~ /\d/ ? pop @parts : '"original"';
my $a = join('_', @parts);
print "\$a = $a and \$b = $b,\n";
}
__DATA__
Rita_bike_house_Sha9
Rita_bike_house
prints:
$a = Rita_bike_house and $b = Sha9,
$a = Rita_bike_house and $b = "original",

If you are sure that the required pattern will always look like 'Sha9' and will always appear at the end, then you can match it as an optional suffix:
open FILE, "filename.txt" or die $!;
my @data = <FILE>;
close(FILE);
#my $line = "Rita_bike_house_Sha9";
foreach $line (@data)
{
chomp($line);
if ($line =~ m/(.*?)(_([a-zA-Z]+[0-9]+))?$/)
{
$a = $1;
$b = $3 ? $3 : "original";
}
}

Why doesn't my decrypt function work?

This was not written by me; it was written by someone who passed it down to me, and I have lost contact with the author. I have been using this code for a few years and have only just noticed this error. It seems that the letter sequence rkey1 messes up the output.
For example, turkey1 decrypts as tur79y1. This Perl code should output turkey1 and not tur79y1:
$String = "turkey1";
$e = &encode_escaped(&palace_encrypt($String));
$d = &palace_decrypt(&decode_escaped("'\"".$e."\"'"));
print $d."<br>\n";
KEY REMOVED BY OWNER
sub palace_decrypt
{
local $lastchar = 0;
local $rc = 0;
local @bs;
for($i=length($_[0])-1; $i>=0; $i--) {
local $tmp = ord(substr($_[0], $i, 1));
$bs[$i] = $tmp ^ $palace_key[$rc++] ^ $lastchar;
$lastchar = $tmp ^ $palace_key[$rc++];
}
return join("", map { chr($_) } @bs);
}
sub decode_escaped
{
$_[0] =~ m/\"(.*)\"/;
local $str = $1;
$str =~ s/\\\\/\0/g;
$str =~ s/\\"/"/g;
$str =~ s/\\(..)/pack("c",hex($1))/ge;
$str =~ s/\0/\\/g;
return $str;
}
sub palace_encrypt
{
local $lastchar = 0;
local $rc = 0;
local @bs;
for($i=length($_[0])-1; $i>=0; $i--) {
local $b = ord(substr($_[0], $i, 1));
$bs[$i] = $b ^ $palace_key[$rc++] ^ $lastchar;
$lastchar = $bs[$i] ^ $palace_key[$rc++];
}
return join("", map { chr($_) } @bs);
}
sub encode_escaped
{
local $str = $_[0];
$str =~ s/\\/\\\\/g;
$str =~ s/([^A-Za-z0-9\.\\])/sprintf("\\%2.2X", ord($1))/ge;
return $str;
}

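Your problem is that your decode_escaped does not exactly undo what encode_escaped did. It uses \0 as a temporary placeholder for an escaped backslash, so a genuine NUL byte in the encrypted data (encoded as \00) is turned into a backslash (0x5C) by the final substitution. That one wrong byte feeds into two decrypted characters, and 'k' and 'e' XORed with 0x5C are exactly '7' and '9'. Replace it with the following and that should fix your problem.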
sub decode_escaped
{
$_[0] =~ m/\"(.*)\"/;
local @str = split /(\\\\)/, $1;
foreach (@str) {
s/\\"/"/g;
s/\\(..)/chr(hex($1))/ge;
s/\\\\/\\/;
}
return join '', @str;
}
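The difference can be checked with a round trip on a string containing a NUL byte. This is a sketch, not the original code: the sub names `decode_with_placeholder` and `decode_single_pass` are hypothetical, the surrounding quote handling is left out, and the single-pass decoder is an alternative to the split-based fix above that resolves each escape in one left-to-right substitution.

```perl
use strict;
use warnings;

# Same escaping scheme as encode_escaped in the question.
sub encode_escaped {
    my ($str) = @_;
    $str =~ s/\\/\\\\/g;
    $str =~ s/([^A-Za-z0-9.\\])/sprintf("\\%02X", ord($1))/ge;
    return $str;
}

# Decoder in the style of the question: "\0" is a temporary placeholder
# for an escaped backslash.
sub decode_with_placeholder {
    my ($str) = @_;
    $str =~ s/\\\\/\0/g;
    $str =~ s/\\(..)/chr(hex($1))/ge;
    $str =~ s/\0/\\/g;    # also rewrites NUL bytes that came from \00
    return $str;
}

# Alternative sketch: handle "\\" and "\XX" in a single pass, so no
# placeholder byte is needed.
sub decode_single_pass {
    my ($str) = @_;
    $str =~ s/\\(\\|[0-9A-Fa-f]{2})/$1 eq "\\" ? "\\" : chr(hex($1))/ge;
    return $str;
}

my $raw     = "a\0b\\c";
my $encoded = encode_escaped($raw);
print "encoded: $encoded\n";
print "placeholder decoder round-trips: ",
    (decode_with_placeholder($encoded) eq $raw ? "yes" : "no"), "\n";
print "single-pass decoder round-trips: ",
    (decode_single_pass($encoded) eq $raw ? "yes" : "no"), "\n";
```

Any decoder that processes the escapes in one pass avoids the collision, because no byte value has to be reserved as a marker.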
