Using sed to replace multiple parts of a string at once

I have the following line in a file that I need to replace 2 parts of. This is the original line:
'display min(min(sindex,lon=%lon1%,lon=%lon2%),lat=%lat1%,lat=%lat2%)'
I need to replace it with this:
'display amin(actualPrecip,lon=%lon1%,lon=%lon2%,lat=%lat1%,lat=%lat2%)'
I used regexr.com to generate this regex to match the 2 parts, but I'm not sure what to do with it. Basically I need to use sed to do an in-place replacement.
('display min\(min)|(\)\,)
That generated this on regexr.com:
'display min(min(sindex,lon=%lon1%,lon=%lon2%),lat=%lat1%,lat=%lat2%)'
So the first part needs to be replaced with 'display amin( and 2nd match needs to be replaced with just a comma. Is there an easy way to do this using sed?
Cheers, Mike

I ended up doing it in PHP with plain string functions instead of a regular expression; it achieved the same result:
<?php
exec("ls -b *.gs", $result);
foreach($result as $filename) {
echo "filename {$filename} \n";
$data = file_get_contents($filename);
$start = stripos($data, "display min(min(", 1);
$replaced = false;
if ($start !== false) {
$finish = $start + 16;
$data = substr_replace($data, "display amin(", $start, $finish - $start);
$start = stripos($data, "),", $finish+1);
$finish = $start+2;
$data = substr_replace($data, ",", $start, $finish - $start);
$replaced = true;
}
$start = stripos($data, "display max(max(", 1);
if ($start !== false) {
$finish = $start + 16;
$data = substr_replace($data, "display amax(", $start, $finish - $start);
$start = stripos($data, "),", $finish+1);
$finish = $start+2;
$data = substr_replace($data, ",", $start, $finish - $start);
$replaced = true;
}
if ($replaced === true) file_put_contents($filename, $data);
}
?>

This might work for you (GNU sed):
sed 's/min(min(sindex\([^)]*\))/amin(actualPrecip\1/' file
Use pattern matching and a back reference to format as desired. With GNU sed, add the -i option to edit the file in place.
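For readers who want to check the substitution outside sed, here is a minimal Python sketch of the same capture-and-back-reference idea. It is only an illustration: the variable names are made up, and the pattern is a direct translation of the sed expression above into Python's regex syntax (where literal parentheses must be escaped).

```python
import re

line = "'display min(min(sindex,lon=%lon1%,lon=%lon2%),lat=%lat1%,lat=%lat2%)'"

# Capture everything between "min(min(sindex" and the first ")" that follows,
# then re-emit the captured text after the new prefix, dropping that ")".
pattern = re.compile(r"min\(min\(sindex([^)]*)\)")
result = pattern.sub(r"amin(actualPrecip\1", line, count=1)

print(result)
```

Because the inner closing parenthesis is matched but not captured, both changes the question asks for happen in a single substitution.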

Related

How to replace a sequence of numbers separated by "_" in a string with a single number

Say I have a string "mg_delay_1_2_it", which can contain a varying sequence of numbers separated by "_"; for example, it could also be "mg_delay_1_2_10_it" or "mg_delay_1_2_5_25_30_it". I want to replace the number section with a single number to produce several versions. For example:
If the string is mg_delay_1_2_5_25_30_it,
I want to produce mg_delay_1_it, mg_delay_2_it, mg_delay_5_it, mg_delay_25_it and mg_delay_30_it from the original string.
How do I do this efficiently in Perl?
Try this:
use strict;
use warnings;
use Data::Dumper;
my $str = 'mg_delay_1_2_5_25_30_it';
my $start = 'mg_delay';
my $end = 'it';
if (my ($res) = $str =~ /\Q$start\E_((?:\d+_)+)\Q$end\E/) {
my @items = $res =~ /(\d+)/g;
my @versions = map { $start . '_' . $_ . '_' . $end } @items;
print Dumper(\@versions);
}
Output:
$VAR1 = [
'mg_delay_1_it',
'mg_delay_2_it',
'mg_delay_5_it',
'mg_delay_25_it',
'mg_delay_30_it'
];
Alternatively, if $start and $end are not known:
my $str = 'mg_delay_1_2_5_25_30_it';
if (my ($start, $res, $end ) = $str =~ /^((?:(?!_\d).)+)_((?:\d+_)+)(.+)$/) {
my @items = $res =~ /(\d+)/g;
my @versions = map { $start . '_' . $_ . '_' . $end } @items;
print Dumper(\@versions);
}
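The same split-then-rebuild approach can be sketched in Python. This is an illustrative translation, not the answer's code: expand_versions is a hypothetical helper name, and the pattern assumes the numeric run is the first "_digits_" sequence in the string, as in the examples from the question.

```python
import re

def expand_versions(name):
    """Return one string per number found in the numeric run of `name`."""
    # prefix, then "_", then one or more "digits_" groups, then the suffix
    match = re.fullmatch(r"(.*?)_((?:\d+_)+)(.+)", name)
    if match is None:
        return []
    start, numbers, end = match.groups()
    return [f"{start}_{n}_{end}" for n in re.findall(r"\d+", numbers)]

for version in expand_versions("mg_delay_1_2_5_25_30_it"):
    print(version)
```

Returning an empty list for a non-matching string mirrors the Perl version, where the if block is simply skipped.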

Printing the search path taken to find item during BFS

I am trying to solve the doublets puzzle problem using Perl. This is one of my first times using Perl so please excuse the messy code.
I have everything working, I believe, but am having an issue printing the shortest path. Using a queue and BFS I am able to find the target word but not the actual path taken.
Does anyone have any suggestions? I have been told to keep track of the parents of each element but it is not working.
#!/usr/bin/perl
use strict;
my $file = 'test';
#my $file = 'wordlist';
open(my $fh, $file);
my $len = length($ARGV[0]);
my $source = $ARGV[0];
my $target = $ARGV[1];
my @words;
# Creates new array of correct length words
while (my $row = <$fh>) {
chomp $row;
my $rowlen = length($row);
if ($rowlen == $len) {
push @words, $row;
}
}
my %wordHash;
# Creates graph for word variations using dictionary
foreach my $word (@words) {
my $wordArr = [];
for (my $i = 0; $i < $len; $i++) {
my $begin = substr($word, 0, $i);
my $end = substr($word, $i+1, $len);
my $key = "$begin" . "_" . "$end";
my $Arr = [];
my $regex = "$begin" . "[a-z]" . "$end";
foreach my $wordTest (@words) {
if ("$wordTest" =~ $regex && "$wordTest" ne "$word") {
push @$wordArr, "$wordTest";
}
}
}
$wordHash{"$word"} = $wordArr;
}
my @queue;
push(@queue, "$source");
my $next = $source;
my %visited;
my %parents;
my @path;
# Finds path using BFS and Queue
while ("$next" ne "$target") {
print "$next: ";
foreach my $variation (@{$wordHash{$next}}) {
push(@queue, "$variation");
$parents{"$variation"} = $next;
print "$variation | ";
}
print "\n-----------------\n";
$visited{"$next"} = 1;
push(#path, "$next");
$next = shift(@queue);
while ($visited{"$next"} == 1) {
$next = shift(@queue);
}
}
print "FOUND: $next\n\n";
print "Path the BFS took: ";
print "@path\n\n";
print "Value -> Parent: \n";
for my $key (keys %parents) {
print "$key -> $parents{$key}\n";
}
Before you accept a word from the @queue to be $next, you test to ensure that it has not been %visited. By then, though, the damage has been done. The test ensures a visited word won't become the focus again, and hence prevents loops, but the earlier code updated %parents whether the word had been %visited or not.
If a word has been %visited, you not only want to avoid it becoming the $next candidate, you also want to avoid it being considered as a $variation, as that will corrupt %parents. I don't have a word dictionary to test with and you haven't given an example of the failure, but I think you can fix this by moving the %visited guard into the inner loop where variations are considered:
foreach my $variation (@{$wordHash{$next}}) {
next if $visited{ $variation } ;
push(@queue, "$variation");
... etc ...
This will protect the integrity of your %parents hash as well as stop loops. On a small note, you don't need to use double quotes when indexing into a hash; as I've done above, just state the scalar variable. Using quotes just interpolates the value of the variable, which produces the same result.
Your code, IMHO, is excellent for a beginner, BTW.
Update
I've since got a word dictionary, and the problem above does exist, as well as one other. The code does move one letter at a time from the source, but in a near-random direction, not necessarily closer to the target. To correct that, I changed the regex you use to build your graph so that the corresponding letter from the target replaces the generic [a-z]. There are also a couple of minor changes, mostly style related. The updated code looks like this:
use v5.12;
my $file = 'wordlist.txt';
#my $file = 'wordlist';
open(my $fh, $file);
my $len = length($ARGV[0]);
my $source = $ARGV[0];
my $target = $ARGV[1];
chomp $target;
my @target = split('', $target);
my @words;
# Creates new array of correct length words
while (my $row = <$fh>) {
$row =~ s/[\r\n]+$//;
my $rowlen = length($row);
if ($rowlen == $len) {
push @words, $row;
}
}
my %wordHash;
# Creates graph for word variations using dictionary
foreach my $word (@words) {
my $wordArr = [];
for (my $i = 0; $i < $len; $i++) {
my $begin = substr($word, 0, $i);
my $end = substr($word, $i+1, $len);
my $key = "$begin" . "_" . "$end";
my $Arr = [];
# my $re_str = "$begin[a-z]$end";
my $regex = $begin . $target[$i] . $end ;
foreach my $wordTest (@words) {
if ($wordTest =~ / ^ $regex $ /x ) {
next if $wordTest eq $word ;
push @$wordArr, "$wordTest";
}
}
}
$wordHash{"$word"} = $wordArr;
}
my @queue;
push(@queue, "$source");
my $next = $source;
my %visited;
my %parents;
my @path;
# Finds path using BFS and Queue
while ($next ne $target) {
print "$next: ";
$visited{$next} = 1;
foreach my $variation (@{$wordHash{$next}}) {
next if $visited{ $variation } ;
push(@queue, $variation);
$parents{$variation} = $next;
print "$variation | ";
}
print "\n-----------------\n";
push(#path, $next);
while ( $visited{$next} ) {
$next = shift @queue ;
}
}
push @path, $target ;
print "FOUND: $next\n\n";
print "Path the BFS took: @path\n\n";
print "Value -> Parent: \n";
for my $key (keys %parents) {
print "$key -> $parents{$key}\n";
}
and when run it produces:
./words.pl head tail | more
head: heal |
-----------------
heal: teal | heil |
-----------------
teal:
-----------------
heil: hail |
-----------------
hail: tail |
-----------------
FOUND: tail
Path the BFS took: head heal teal heil hail tail
Value -> Parent:
hail -> heil
heil -> heal
teal -> heal
tail -> hail
heal -> head
You could probably remove the printing of the %parents hash: hash entries come out in no particular order, so it doesn't tell you much.
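The "Path the BFS took" line above is the order in which words were visited, not the route from source to target. To recover the route itself, one option is to walk the parent links backwards from the target. The following Python sketch shows that idea under stated assumptions: the function names are hypothetical, the word list is just the set of words that appears in the sample output, and neighbours are found by direct comparison instead of the regex-built graph.

```python
from collections import deque

def neighbours(word, words):
    """Words of the same length that differ from `word` in exactly one letter."""
    return [w for w in words
            if len(w) == len(word)
            and sum(a != b for a, b in zip(w, word)) == 1]

def shortest_path(source, target, words):
    """Breadth-first search that records each word's parent when first seen."""
    parents = {source: None}          # doubles as the visited set
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            # Walk the parent links back to the source, then reverse.
            path = []
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for candidate in neighbours(current, words):
            if candidate not in parents:   # never overwrite an existing parent
                parents[candidate] = current
                queue.append(candidate)
    return None                        # target not reachable

words = ["head", "heal", "teal", "heil", "hail", "tail"]
print(shortest_path("head", "tail", words))
```

Recording the parent only the first time a word is seen is the same guard the answer describes: it keeps the parent links pointing back along the shortest route.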

Start time is always the same in my Perl script

I have made a script to extract the content of log files and to calculate the time difference if the task is complete.
Suppose I have four jobs and each job has three individual tasks. So far I need the start time of each task, and I just print it.
Everything is fine except when I try to initialise things, for convenience, using $j and $l, which serve as the indices of a sort of two-dimensional array.
The problem is in the output, where I get the same "Started at" for each job.
The values of $counter and $l are probably the root cause.
Can anyone help? I tried my best, and I am something of a newbie.
sub getdate {
my $line = $_[0];
($hrs, $min) = split(':', $line, 3);
return $hrs, $min;
}
print FILE "<html><head>\n";
print FILE "<title>CGI Test</title>\n";
print FILE "</head>\n";
print FILE "<body>";
print FILE "<font size=\"5\" color=\"#008080\" face=\"Tahoma\"><b><u><br>";
print FILE "PBI Batch for 22/02/2013";
print FILE "</font></b></u><br><br><br>";
my $i = 0;
my $j = 0;
my $l = 0;
my @sample;
#print FILE "<h4>";
foreach $header (<COLLECTION>) {
chomp($header);
($heading, $filepath) = split(',', $header);
#$two[$i]="<font size=\"3\"color=\"#008000\" face=\"Tahoma\"><b><u><br>";
#$two[$i]="<font size=\"3\" color=".$color." face=\"Tahoma\"><b><u><br>";
$two[$i] .= $heading;
#$two[$i] .= "</font></u></b><br>";
#print FILE "<font size=\"3\" color=\"#008000\" face=\"Tahoma\"><b><u><br>";
# print FILE $two[$i];
#print FILE $heading;
#print FILE "</font></u></b><br>";
#print $filepath."\n";
open(MYFILE1, $filepath) or die 'Could nont openfile';
my $counter;
foreach $list (<MYFILE1>) {
chomp($list);
($file, $path) = split(',', $list);
#print FILE $file;
my #secondstart;
my #secondend;
my $secondcounter = 0;
#print FILE "valueofllllllllllllllllllllllllllll".$l;
foreach $counter ($file) {
print FILE "valueofllllllllllllllllllllllllllll" . $l;
$l++;
$sample[$j][$l] = $counter;
print FILE "secCOUNTER " . $secondcounter;
$secondcounter++;
}
print FILE " space";
open(MYFILE, $path) or die 'ERRROR';
my $count = 0;
foreach $line (<MYFILE>) {
my #endtime;
$flag = 'false';
#$counter++;
$count++;
print FILE $count . "========";
if ($count == 1) {
($hrs, $min) = getdate($line);
$starttime[$j][$l] = ($hrs * 60) + $min;
}
else {
($hrs, $min) = split(':', $line, 3);
if ($line =~ m/End of Procedure/) {
$flag = 'true';
$endtime[$j][$l] = $hrs . $min;
$endtime[$j][$l] = ($hrs * 60) + $min;
}
else {
$endtime[$j][$l] = ($hrs * 60) + $min;
}
}
$duration[$j][$l] = $endtime[$j][$l] - $starttime[$j][$l];
}
# print $flag;
#print FILE $file." : ";
#print FILE "value of ".$j."and".$l;
$startstatus[$j][$l] = "Started at" . $starttime[$j][$l];
$durationstatus[$j][$l] = "&nbspDuration is " . $duration[$j][$l] . "m";
# print FILE "Started at".$starttime;
# print FILE "&nbspDuration is ".$duration."m";
# print FILE "<br>";
close(MYFILE);
}
my $valueofl = $l;
#print FILE "vlaeeofl".$valueofl;
print "valueofllllllllllllllllllllllllllll" . $l;
$l = 0;
if ($flag eq 'true') {
$status = 'Completed';
$color = '#008000';
print FILE "<font size=\"3\" color="
. $color
. " face=\"Tahoma\"><b><u><br>"
. $two[$i]
. "</font></u></b><br>";
print FILE $status . "<br>";
while ($l <= $valueofl) {
#print $j."and".$l;
# print "valueofllllllllllllllllllllllllllll".$l;
print FILE $sample[$j][$l] . "&nbsp&nbsp&nbsp&nbsp";
print FILE $startstatus[$j][$l] . "&nbsp&nbsp&nbsp&nbsp";
print FILE $durationstatus[$j][$l] . "<br>";
$l++;
}
# print FILE $startstatus[$j][0];
# print FILE $durationstatus[$j][0];
}
else {
#print "valueofllllllllllllllllllllllllllll".$l;
#print $j."and".$l;
$status = 'In Progress';
$color = 'blue';
print FILE "<font size=\"3\" color="
. $color
. " face=\"Tahoma\"><b><u><br>"
. $two[$i]
. "</font></u></b><br>"
. $status;
}
$i++;
$j++;
}
print FILE "</body>";
print FILE "</html>";
close(FILE);
close(MYFILE1)
This is a shocking piece of Perl. You must always start your program with use strict and use warnings, and declare all variables with my, as close as possible to their first point of use. That is the most basic form of debugging, and it is only polite to do at least this before asking other people for help.
The problem is likely to lie in your for statement
foreach $counter ($file) { ... }
which will execute the body of the loop just once, with $counter set to the value of $file. I can't imagine what you meant it to do.
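The single-iteration behaviour is easy to confirm in isolation. The Python sketch below (Python purely for illustration; file_field and its value are made-up stand-ins for $file) loops over a one-element list, which is what foreach over a single scalar amounts to.

```python
file_field = "task1"   # hypothetical stand-in for the Perl scalar $file

iterations = []
for counter in [file_field]:     # a list holding exactly one value
    iterations.append(counter)

# The body ran once, with the loop variable bound to that single value.
print(len(iterations), iterations[0])
```

So any counter incremented inside such a loop advances by exactly one per outer iteration, regardless of what the scalar contains.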

Discovering duplicate lines

I've got a file of CSS elements, and I'm trying to check for any duplicates, then output the lines that contain them.
###Test
###ABC
###test
##.hello
##.ABC
##.test
bob.com###Test
~qwerty.com###Test
~more.com##.ABC
###Test and ##.ABC already exist in the list, and I'd like a way to output the lines where they are used in the file; basically, duplication checking (case sensitive). So using the above list, I would generate something like this:
Line 1: ###Test
Line 7: bob.com###Test
Line 8: ~qwerty.com###Test
Line 5: ##.ABC
Line 9: ~more.com##.ABC
Something in bash, or maybe perl?
Thanks :)
Your problem was a nice challenge, so I wrote you a script. Hope you like it. :)
#!/usr/bin/perl
use strict;
use warnings;
sub loadf($);
{
my @file = loadf("style.css");
my @inner = @file;
my $l0 = 0; my $l1 = 0; my $l2 = 0; my $dc = 0; my $tc;
foreach my $line (@file) {
$l1++;
$line =~ s/^\s+//;
$line =~ s/\s+$//;
foreach my $iline (@inner) {
$l2++;
$iline =~ s/^\s+//;
$iline =~ s/\s+$//;
next if ($iline eq $line);
if ($iline =~ /\b$line\b/) {
$dc++;
if ($dc > 0) {
if ($l0 == 0) {
print "Line " . $l1 . ": " . $line . "\n";
$l0++;
}
print "Line " . $l2 . ": " . $iline . "\n";
}
}
}
print "\n" unless($dc == 0);
$dc = 0; $l0 = 0; $l2 = 0;
}
}
sub loadf($) {
my @file = ( );
open(FILE, '<', $_[0]) or die("Couldn't open " . $_[0] . "\n");
@file = <FILE>;
close(FILE);
return @file;
}
__END__
This does exactly what you need. And sorry if it's a bit messy.
This seems to work:
sort -t '#' -k 2 inputfile
It groups them by the part after the # characters:
##.ABC
~more.com##.ABC
###ABC
##.hello
##.test
###test
bob.com###Test
~qwerty.com###Test
###Test
If you only want to see the unique values:
sort -t '#' -k 2 -u inputfile
Result:
##.ABC
###ABC
##.hello
##.test
###test
###Test
This pretty closely duplicates the example output in the question (it relies on some possibly GNU-specific features):
cat -n inputfile |
sed 's/^ *\([0-9][0-9]*\)/Line \1:/' |
sort -t '#' -k 2 |
awk -F '#+' '{if (! seen[$2]) { \
if ( count > 1) printf "%s\n", lines; \
count = 0; \
lines = "" \
}; \
seen[$2] = 1; \
lines = lines "\n" $0; ++count}
END {if (count > 1) print lines}'
Result:
Line 5: ##.ABC
Line 9: ~more.com##.ABC
Line 1: ###Test
Line 7: bob.com###Test
Line 8: ~qwerty.com###Test
I'd recommend using the uniq function from List::MoreUtils, if you can install it:
how-do-i-print-unique-elements-in-perl-array
Here is one way to do it, which is fairly easy to extend to multiple files if need be.
With this file find_dups.pl:
use warnings;
use strict;
my @lines;
while (<>) { # read input lines
s/^\s+//; s/\s+$//; # trim whitespace
push @lines, {data => $_, line => $.} if $_ # store useful data
}
@lines = sort {length $$a{data} <=> length $$b{data}} @lines; # shortest first
while (@lines) {
my ($line, @found) = shift @lines;
my $re = qr/\Q$$line{data}\E$/; # search token
@lines = grep { # extract matches from @lines
not $$_{data} =~ $re && push @found, $_
} @lines;
if (@found) { # write the report
print "line $$_{line}: $$_{data}\n" for $line, @found;
print "\n";
}
print "\n";
}
}
then perl find_dups.pl input.css prints:
line 5: ##.ABC
line 9: ~more.com##.ABC
line 1: ###Test
line 7: bob.com###Test
line 8: ~qwerty.com###Test
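Another way to look at the problem is to group lines by the selector part, taken here to be everything from the first "#" onward, and report only the groups with more than one member. The Python sketch below follows that assumption; find_duplicates is a hypothetical name, and the sample list is the one from the question.

```python
from collections import defaultdict

def find_duplicates(lines):
    """Group (line number, text) pairs by the part starting at the first '#'."""
    groups = defaultdict(list)
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if "#" not in text:
            continue                      # no selector part on this line
        selector = text[text.index("#"):]
        groups[selector].append((number, text))
    # Keep only selectors that occur on more than one line.
    return [entries for entries in groups.values() if len(entries) > 1]

sample = [
    "###Test", "###ABC", "###test", "##.hello", "##.ABC", "##.test",
    "bob.com###Test", "~qwerty.com###Test", "~more.com##.ABC",
]

for entries in find_duplicates(sample):
    for number, text in entries:
        print(f"Line {number}: {text}")
    print()
```

Since the comparison is on the exact selector string, ###Test and ###test stay in separate groups, which matches the case-sensitive requirement.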

Why does perl "hash of lists" do this?

I have a hash of lists that is not getting populated.
I checked that the block at the end that adds to the hash is in fact being called on input. It should either add a singleton list if the key doesn't exist, or else push to the back of the list (referenced under the right key) if it does.
I understand that the GOTO is ugly, but I've commented it out and it has no effect.
The problem is that when printhits is called, nothing is printed, as if there are no values in the hash. I also tried each (%genomehits), no dice.
THANKS!
#!/usr/bin/perl
use strict;
use warnings;
my $len = 11; # resolution of the peaks
#$ARGV[0] is input file
#$ARGV[1] is call number
# optional -s = spread number from call
# optional -o specify output file name
my $usage = "see arguments";
my $input = shift @ARGV or die $usage;
my $call = shift @ARGV or die $usage;
my $therest = join(" ",@ARGV) . " ";
print "the rest".$therest."\n";
my $spread = 1;
my $output = $input . ".out";
if ($therest =~ /-s\s+(\d+)\s/) {$spread = $1;}
if ($therest =~ /-o\s+(.+)\s/) {$output = $1;}
# initialize master hash
my %genomehits = ();
foreach (split ';', $input) {
my $mygenename = "err_naming";
if ($_ =~ /^(.+)-/) {$mygenename = $1;}
open (INPUT, $_);
my @wiggle = <INPUT>;
&singlegene(\%genomehits, \@wiggle, $mygenename);
close (INPUT);
}
&printhits;
#print %genomehits;
sub printhits {
foreach my $key (%genomehits) {
print "key: $key , values: ";
foreach (@{$genomehits{$key}}) {
print $_ . ";";
}
print "\n";
}
}
sub singlegene {
# let %hash be the mapping hash
# let @mygene be the gene to currently process
# let $mygenename be the name of the gene to currently process
my (%hash) = %{$_[0]};
my (@mygene) = @{$_[1]};
my $mygenename = $_[2];
my $chromosome;
my $leftbound = -2;
my $rightbound = -2;
foreach (@mygene) {
#print "Doing line ". $_ . "\n";
if ($_ =~ "track" or $_ =~ "output" or $_ =~ "#") {next;}
if ($_ =~ "Step") {
if ($_ =~ /chrom=(.+)\s/) {$chromosome = $1;}
if ($_ =~ /span=(\d+)/) {$1 == 1 or die ("don't support span not equal to one, see wig spec")};
$leftbound = -2;
$rightbound = -2;
next;
}
my #line = split /\t/, $_;
my $pos = $line[0];
my $val = $line[-1];
# above threshold for a call
if ($val >= $call) {
# start of range
if ($rightbound != ($pos - 1)) {
$leftbound = $pos;
$rightbound = $pos;
}
# middle of range, increment rightbound
else {
$rightbound = $pos;
}
if (\$_ =~ $mygene[-1]) {goto FORTHELASTONE;}
}
# else reinitialize: not a call
else {
FORTHELASTONE:
# typical case, in an ocean of OFFs
if ($rightbound != ($pos-1)) {
$leftbound = $pos;
}
else {
# register the range
my $range = $rightbound - $leftbound;
for ($spread) {
$leftbound -= $len;
$rightbound += $len;
}
#print $range . "\n";
foreach ($leftbound .. $rightbound) {
my $key = "$chromosome:$_";
if (not defined $hash{$key}) {
$hash{$key} = [$mygenename];
}
else { push @{$hash{$key}}, $mygenename; }
}
}
}
}
}
You are passing a reference to %genomehits to the function singlegene, and then copying it into a new hash when you do my (%hash) = %{$_[0]};. You then add values to %hash which goes away at the end of the function.
To fix it, use the reference directly with arrow notation. E.g.
my $hash = $_[0];
...
$hash->{$key} = yadda yadda;
I think it's this line:
my (%hash) = %{$_[0]};
You're passing in a reference, but this statement is making a copy of your hash. All additions you make in singlegene are then lost when you return.
Leave it as a hash reference and it should work.
PS - Data::Dumper is your friend when large data structures are not behaving as expected. I'd sprinkle a few of these in your code...
use Data::Dumper; print Dumper \%genomehits;
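The copy-versus-reference behaviour that both answers describe is not specific to Perl. As a rough analogy, the Python sketch below contrasts a function that copies the dictionary it receives with one that modifies it directly. The function names, the key and the gene name are made up for illustration; dict(hits) plays the role of my (%hash) = %{$_[0]}.

```python
def add_hit_to_copy(hits, key, gene):
    local = dict(hits)                        # shallow copy, like my %hash = %{$ref}
    local.setdefault(key, []).append(gene)    # new key lands in the copy only

def add_hit_in_place(hits, key, gene):
    hits.setdefault(key, []).append(gene)     # modifies the caller's dictionary

genome_hits = {}

add_hit_to_copy(genome_hits, "chr1:100", "geneA")
print(genome_hits)      # still empty: the new key was added to the copy

add_hit_in_place(genome_hits, "chr1:100", "geneA")
print(genome_hits)      # now contains the key
```

This is the same reason printhits prints nothing in the question: every key is added to a temporary copy that is discarded when the function returns.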