How to trim '.' and '|' in a single command using Perl?

I have a string like this: $data = '.|abc|bcd|cde|';
I need the string to look like this: abc|bcd|cde
So I do:
$data =~ s/\|$//; # trim the last '|'
$data =~ s/^\.| +//gm ; # trim '.' at the beginning
$data =~ s/^\|//; # trim '|' at the beginning
But the problem I am facing is that the script takes too long to execute. Is there any way to do the whole operation with a single command?
(I also tried chop($data), but that removes only the last '|'.)
Any suggestions?

$data =~ s/(^[.|]*)|([.|]*$)//g;
That said, I doubt that this will speed up your script significantly.

Another way: $data =~ s/^\.\|(.*)\|/$1/
But as Rene said, your speed bottleneck is probably somewhere else in your script.
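A minimal sketch of the single-substitution approach, assuming the only characters to strip at either end are '.' and '|'. The character class [.|] covers both, so the capturing groups are not needed; the sub name trim_dots_and_pipes is made up for illustration.

```perl
use strict;
use warnings;

# Strip any run of '.' or '|' characters from both ends of the string
# in one substitution. /g lets the alternation match once at the start
# and once at the end.
sub trim_dots_and_pipes {
    my ($s) = @_;
    $s =~ s/^[.|]+|[.|]+$//g;
    return $s;
}

my $data    = '.|abc|bcd|cde|';
my $trimmed = trim_dots_and_pipes($data);
print "$trimmed\n";    # prints abc|bcd|cde
```

If the script is still slow after this, profiling the rest of the code is likely to pay off more than tuning these substitutions.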

Related

How to save the pattern you search for in Perl

if ($_=~m/^[\w](.+)\n/)
{
$seq.= $1;
}
I am using this pattern to recognise a character sequence, but I also want to include the first character (the [\w]) in what I capture.
If you want to include the first character, put it inside the capturing part of the regex:
$seq .= $1 if /^(\w.+)\n/;
The parens determine what ends up in $1, so you want
if ($_=~m/^([\w].+)\n/)
{
$seq.= $1;
}
This simplifies to
$seq .= $1 if /^(\w.+)\n/;
You probably meant .* (0 or more non-linefeed characters) instead of .+ (1 or more non-linefeed characters).
$seq .= $1 if /^(\w.*)\n/;
I'd write that as follows:
chomp;
$seq .= $_ if /^\w/;
This last one is not strictly equivalent:
It doesn't check whether the second character of $_ is a non-linefeed.
It doesn't check whether $_ contains a line feed.
It assumes that any line feed in $_ is the last character of the string.
It modifies $_.
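A small sketch of the capturing version on some hypothetical input lines (a '>' header followed by sequence lines, which is an assumption about the asker's data). Because \w sits inside the parentheses, the first character lands in $1, while the trailing \n stays outside the capture.

```perl
use strict;
use warnings;

# Hypothetical input: one header line and two sequence lines.
my @lines = (">header\n", "ACGT\n", "TTGA\n", "\n");

my $seq = '';
for (@lines) {
    # Only lines starting with a word character match; the whole line
    # minus its newline is captured into $1.
    $seq .= $1 if /^(\w.*)\n/;
}
print "$seq\n";    # prints ACGTTTGA
```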

Extract the numbers at the end of a file name using Perl

I am unable to extract the last digits from the file name and rename the file with those digits placed at the beginning of the name.
For example, suppose my file name is "Gen_TCC_TIF_2110_413010_L3TL_Ae6TL707285_371925.out"
I want to rename the file as "371925_Gen_TCC_TIF_2110_413010_L3TL_Ae6TL707285.out"
my $newFileName ='Gen_TCC_TIF_2110_413010_L3TL_Ae6TL707285_371925.out';
my ($digits) = $newFileName =~ /(\d+)/g;
my $newFileName_2="${digits}_Gen_TCC_TIF_2110_413010_L3TL_Ae6TL707285_371925.out";
try:
$newFileName =~ /(\d+)\.out/;
my $digits = $1;
my $newFileName_2 = $digits . "_Gen_TCC_TIF_2110_413010_L3TL_Ae6TL707285.out";
/(\d+)\.out/ should give you all the digits before .out.
Try this:
$newFileName =~ s/(.*?)_(\d+)\.out/$2_$1\.out/;
Or
$newFileName =~ s/(.*?)_(\d+)(\.\w+)/$2_$1$3/;
You can do it with a single regex:
my $newFileName = 'Gen_TCC_TIF_2110_413010_L3TL_Ae6TL707285_371925.out';
my $newFileName_2 = $newFileName =~ s/(.*)_(\d+)(?=\.out)/$2_$1/r;
# or, for Perl older than 5.14 (no /r modifier), use this instead:
(my $newFileName_2 = $newFileName) =~ s/(.*)_(\d+)(?=\.out)/$2_$1/;
# or, to modify directly the variable $newFileName:
$newFileName =~ s/(.*)_(\d+)(?=\.out)/$2_$1/;
Or if you want to get the digits (because you need them for something else), then you can do:
my ($digits) = $newFileName =~ /_(\d+)\.out/;
You were using the /g modifier, which made your regex match every block of digits, which isn't what you wanted. (Even worse, it returned a list, but you kept only its first element (2110) in the scalar $digits.)
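A self-contained sketch of the whole rename, assuming names always look like <prefix>_<digits>.<ext>; the sub name digits_first is hypothetical. Names that do not match are returned unchanged, and the actual rename only runs if the file exists, so the script is safe to try.

```perl
use strict;
use warnings;

# Move the digits just before the extension to the front of the name.
# ${2} and ${1} are braced so the '_' between them is clearly literal.
sub digits_first {
    my ($name) = @_;
    (my $new = $name) =~ s/^(.*)_(\d+)(\.\w+)$/${2}_${1}$3/;
    return $new;
}

my $old = 'Gen_TCC_TIF_2110_413010_L3TL_Ae6TL707285_371925.out';
my $new = digits_first($old);
print "$new\n";    # prints 371925_Gen_TCC_TIF_2110_413010_L3TL_Ae6TL707285.out

# List context without /g: only the digits anchored before the extension.
my ($digits) = $old =~ /_(\d+)\.\w+$/;

# Rename on disk only when the file is actually present.
if (-e $old) {
    rename $old, $new or die "Cannot rename $old to $new: $!";
}
```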

Need help with a regular expression

We have file names in the following format:
Input: abc_com.an.gx3d_02-20-2014_05-26-38.txt
Output: abc_com.an.gx3d
I am trying to remove the part that starts with the timestamp. I tried the code below, but it's not working:
(my $test = $file) =~ s/^\d{2}\.*//;
Your ^ anchor is forcing your regex to only match at the beginning of the string. You probably want something closer to the following:
(my $test = $file) =~ s/_\d{2}-\d{2}-\d{4}_.*//;
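A stricter variant, as a sketch: it spells out the full _MM-DD-YYYY_HH-MM-SS.ext suffix and anchors it with $, so only a timestamp at the very end of the name is removed. The sub name strip_timestamp is hypothetical, and the MM-DD-YYYY order is inferred from the single example.

```perl
use strict;
use warnings;

# Remove a trailing "_MM-DD-YYYY_HH-MM-SS.ext" suffix; other names
# pass through unchanged.
sub strip_timestamp {
    my ($file) = @_;
    (my $base = $file) =~ s/_\d{2}-\d{2}-\d{4}_\d{2}-\d{2}-\d{2}\.\w+$//;
    return $base;
}

print strip_timestamp('abc_com.an.gx3d_02-20-2014_05-26-38.txt'), "\n";
# prints abc_com.an.gx3d
```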

Parse UTF-8 HTML to CSV ASCII using Perl

First off, I am a little new to this, so the answer may be that it is up to the consumer. However, I have the following code:
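#!/usr/bin/perl
use strict;
use warnings;

open(my $response, '<', 'response.xml') or die "Cannot open response.xml: $!";
my $result = "";
while (<$response>) {
    next unless $. > 1;              # skip the first line
    my $line = $_;
    $line =~ s{<html><body>}{};      # remove the opening tags
    $line =~ s{</body></html>}{};    # remove the closing tags
    $result .= $line;
}
close($response);
print $result;
exit 0;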
But this still outputs \n and \r\n literally. I tried adding the following...
use Encode;
...
$final = decode_utf8($result);
print "$final";
But I still see those characters when I open the file generated by this shell command:
perl parse.pl > "outfile.csv"
For example,
<html><body>test,a\r\ntest2,b<body></html>
stays as test,a\r\ntest2,b in the CSV.
Thanks!
If you want to parse HTML or XML then use an HTML or XML parser. If you want to create a CSV file then use a CSV file module.
This problem has nothing at all to do with the differences between Unicode and ASCII.
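One plausible reading of the symptom, offered as an assumption rather than a diagnosis: the body text contains the literal two-character sequences backslash-r and backslash-n, not real CR/LF bytes, which would explain why decode_utf8 changes nothing. Under that assumption, a minimal sketch that turns the escapes into real newlines with core Perl only (for real HTML and CSV handling, the parser and CSV-module advice above still applies):

```perl
use strict;
use warnings;

# Single quotes keep the backslashes literal, mimicking the assumed input.
my $body = 'test,a\r\ntest2,b';

# Replace escaped "\r\n" (or a lone escaped "\n") with a real newline.
(my $csv = $body) =~ s/\\r\\n|\\n/\n/g;
print $csv, "\n";
```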

Rename a file with Perl

I have a file in a different folder that I want to rename in Perl. Earlier I looked at a solution that showed something like this:
#rename
for (<C:\\backup\\backup.rar>) {
my $file = $_;
my $new = $file . 'backup' . $ts . '.rar';
rename $file, $new or die "Error, can not rename $file as $new: $!";
}
However, backup.rar is in a different folder. I tried putting "C:\backup\backup.rar" in the <> above, but I got the same error:
C:\Program Files\WinRAR>perl backup.pl
String found where operator expected at backup.pl line 35, near "$_ 'backup'"
(Missing operator before 'backup'?)
syntax error at backup.pl line 35, near "$_ 'backup'"
Execution of backup.pl aborted due to compilation errors.
I was using this to get the current time:
# Get time
my @test = POSIX::strftime("%m-%d-%Y--%H-%M-%S\n", localtime);
print @test;
but I couldn't get it to rename the file correctly.
What can I do to fix this? Please note that I am doing this on a Windows box.
Pay attention to the actual error message. Look at the line:
my $new = $_ 'backup'. @test .'.rar';
If you want to interpolate the contents of $_ and the array @test into a string like that, you need to use:
my $new = "${_}backup@test.rar";
but I have a hard time making sense of that.
Now, strftime returns a scalar. Why not use:
my $ts = POSIX::strftime("%m-%d-%Y--%H-%M-%S", localtime);
my $new = sprintf '%s%s%s.rar', $_, backup => $ts;
Incidentally, you might end up making your life much simpler if you put the time stamp first and formatted it as YYYYMMDD-HHMMSS so that there is no confusion about to which date 04-05-2010 refers.
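A runnable sketch of the timestamp-first naming suggested above. To stay self-contained it creates a dummy backup.rar in a temporary directory (via the core File::Temp module) instead of touching C:\backup; point $dir at your real folder to use it. File::Spec->catfile builds the path with the right separator for the platform.

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use File::Temp qw(tempdir);
use File::Spec;

# Sortable timestamp such as 20100405-143000, with no trailing newline.
my $ts = strftime('%Y%m%d-%H%M%S', localtime);

# Demo folder and dummy file; replace with your real backup folder.
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'backup.rar');
open(my $fh, '>', $file) or die "Cannot create $file: $!";
close($fh);

# Timestamp first, then the original base name.
my $new = File::Spec->catfile($dir, "$ts-backup.rar");
rename($file, $new) or die "Cannot rename $file to $new: $!";
print "Renamed to $new\n";
```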
The line
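my $new = $_ 'backup'. @test .'.rar';
probably should read
my $new = $file . 'backup' . $test[0] . '.rar';
(You were missing a concatenation operator, and it is clearer to use the named variable from the line before than to reuse $_ there. Use $test[0] rather than @test, because an array used with the . operator is evaluated in scalar context and yields its element count.)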
I think you missed the string concatenation operator . (the period):
my $new = $_ 'backup'. @test .'.rar';
should be
my $new = $_ . 'backup' . $test[0] . '.rar';
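Concatenating an array and interpolating it behave quite differently in Perl. A small sketch, with a hypothetical timestamp value standing in for the strftime result in @test:

```perl
use strict;
use warnings;

# Stand-in for the strftime result (without the trailing newline).
my @test = ('04-05-2010--14-30-00');

# The . operator puts @test in scalar context: it contributes the
# number of elements (1), not the timestamp.
my $concatenated = 'backup' . @test . '.rar';

# Inside double quotes the array is interpolated (elements joined by spaces).
my $interpolated = "backup@test.rar";

# Taking the first element explicitly also gives the timestamp.
my $indexed = 'backup' . $test[0] . '.rar';

print "$concatenated\n$interpolated\n$indexed\n";
```

This prints backup1.rar for the first form and backup04-05-2010--14-30-00.rar for the other two.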
A slight side issue, but you don't need
for (<C:\\backup\\backup.rar>) {
my $file = $_;
.....
}
The < > construct would be useful if you were expanding a wildcard, but you are not.
Be thoughtful of future readers of this code (you in a year!) and write
my $file = 'C:\backup\backup.rar' ;
Note the single quotes, which don't treat backslashes as escape characters.
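A quick sketch of the quoting difference for a Windows path. In single quotes a backslash is kept literally (except before another backslash or a quote); in double quotes it starts an escape, so "\b" alone would become a backspace character and each backslash has to be doubled.

```perl
use strict;
use warnings;

my $single = 'C:\backup\backup.rar';    # backslashes kept as-is
my $double = "C:\\backup\\backup.rar";  # doubled backslashes in double quotes

print "$single\n";
print $single eq $double ? "same\n" : "different\n";    # prints same
```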