How to truncate the extension for special case in Perl? - perl

I'm working on a script to truncate all the extensions for a file using the regex as below but it seem doesn't works well as this command does remove some data that I want as it will basically removing everything whenever it see a dot.
The regex I use currently:-
/\..*?$/
It would remove some files like
b10_120.00c.current.all --> b10_120
abc_10.77.log.bac.temp.ls --> abc_10
but I'm looking for an output in b10_120.00c and abc_10.77
Aside from that, is there a way to printout the output such as it keep certain extension only? Such as for the above 2 examples, it will displays b10_120.00c.current and abc_10.77.log. Thank you very much.

The following will strip file name extensions off:
s/\.[^.]+$//;
Explanation
\. matches a literal .
[^.]+ matches every character that is not a .
$ till end of string
Update
my ($new_file_name) = ( $file_name =~ m/^( [^.]+ \. [^.]+ )/x );
Explanation
^ anchor at the start of the string
[^.]+ matches every character that is not a .
\. matches a literal .
[^.]+ matches every character that is not a .
Test
#!/usr/bin/env perl
use strict;
use warnings;
use Test::More 'tests' => 2;
my %file_name_map = (
'b10_120.00c.current.all' => 'b10_120.00c',
'abc_10.77.log.bac.temp.ls' => 'abc_10.77',
);
sub new_file_name {
my $file_name = shift;
my ($new_file_name) = ( $file_name =~ m/^( [^.]+ \. [^.]+ )/x );
return $new_file_name;
}
for my $file_name ( keys %file_name_map ) {
is $file_name_map{$file_name}, new_file_name($file_name),
"Got $file_name_map{$file_name}";
}

$file =~ s/(\.[^.]+).*/$1/; # SO requires 30 chars in answer, that is stupid

You should use \. for the dot in the regular expression.
Also please explain in more details how you want to process file name.

Instead of a regex, I would suggest using this package:
http://perldoc.perl.org/File/Basename.html

Related

Perl - Convert integer to text Char(1,2,3,4,5,6)

I am after some help trying to convert the following log I have to plain text.
This is a URL so there maybe %20 = 'space' and other but the main bit I am trying convert is the char(1,2,3,4,5,6) to text.
Below is an example of what I am trying to convert.
select%20char(45,120,49,45,81,45),char(45,120,50,45,81,45),char(45,120,51,45,81,45)
What I have tried so far is the following while trying to added into the char(in here) to convert with the chr($2)
perl -pe "s/(char())/chr($2)/ge"
All this has manage to do is remove the char but now I am trying to convert the number to text and remove the commas and brackets.
I maybe way off with how I am doing as I am fairly new to to perl.
perl -pe "s/word to remove/word to change it to/ge"
"s/(char(what goes in here))/chr($2)/ge"
Output try to achieve is
select -x1-Q-,-x2-Q-,-x3-Q-
Or
select%20-x1-Q-,-x2-Q-,-x3-Q-
Thanks for any help
There's too much to do here for a reasonable one-liner. Also, a script is easier to adjust later
use warnings;
use strict;
use feature 'say';
use URI::Escape 'uri_unescape';
my $string = q{select%20}
. q{char(45,120,49,45,81,45),char(45,120,50,45,81,45),}
. q{char(45,120,51,45,81,45)};
my $new_string = uri_unescape($string); # convert %20 and such
my #parts = $new_string =~ /(.*?)(char.*)/;
$parts[1] = join ',', map { chr( (/([0-9]+)/)[0] ) } split /,/, $parts[1];
$new_string = join '', #parts;
say $new_string;
this prints
select -x1-Q-,-x2-Q-,-x3-Q-
Comments
Module URI::Escape is used to convert percent-encoded characters, per RFC 3986
It is unspecified whether anything can follow the part with char(...)s, and what that might be. If there can be more after last char(...) adjust the splitting into #parts, or clarify
In the part with char(...)s only the numbers are needed, what regex in map uses
If you are going to use regex you should read up on it. See
perlretut, a tutorial
perlrequick, a quick-start introduction
perlre, the full account of syntax
perlreref, a quick reference (its See Also section is useful on its own)
Alright, this is going to be a messy "one-liner". Assuming your text is in a variable called $text.
$text =~ s{char\( ( (?: (?:\d+,)* \d+ )? ) \)}{
my #arr = split /,/, $1;
my $temp = join('', map { chr($_) } #arr);
$temp =~ s/^|$/"/g;
$temp
}xeg;
The regular expression matches char(, followed by a comma-separated list of sequences of digits, followed by ). We capture the digits in capture group $1. In the substitution, we split $1 on the comma (since chr only works on one character, not a whole list of them). Then we map chr over each number and concatenate the result into a string. The next line simply puts quotation marks at the start and end of the string (presumably you want the output quoted) and then returns the new string.
Input:
select%20char(45,120,49,45,81,45),char(45,120,50,45,81,45),char(45,120,51,45,81,45)
Output:
select%20"-x1-Q-","-x2-Q-","-x3-Q-"
If you want to replace the % escape sequences as well, I suggest doing that in a separate line. Trying to integrate both substitutions into one statement is going to get very hairy.
This will do as you ask. It performs the decoding in two stages: first the URI-encoding is decoded using chr hex $1, and then each char() function is translated to the string corresponding to the character equivalents of its decimal parameters
use strict;
use warnings 'all';
use feature 'say';
my $s = 'select%20char(45,120,49,45,81,45),char(45,120,50,45,81,45),char(45,120,51,45,81,45)';
$s =~ s/%(\d+)/ chr hex $1 /eg;
$s =~ s{ char \s* \( ( [^()]+ ) \) }{ join '', map chr, $1 =~ /\d+/g }xge;
say $s;
output
select -x1-Q-,-x2-Q-,-x3-Q-

How to use a variable in a substitution?

I've a text file and I want to match and erase the following text (please note the newline):
[ From:
http://www.website.com ]
The following code works
$text =~ s/\[.*\]//ms;
This other doesn't
my $patt = \[.*\];
$text =~ s/$patt//ms;
Would someone be so kind to explain me why?
Thanks in advance
The second variant works perfectly, if you quote the pattern string and get rid of syntax error:
#!/usr/bin/perl
use strict;
use warnings;
my $text = qq{a[ From:
http://www.website.com ]b};
my $patt = qr/\[.*?\]/s;
$text =~ s/$patt//;
print $text;
Prints:
ab
I added ? quantifier to the regexp to make the replacement ungreedy. And removed m modifier, because you are not using ^ and $ in your regexp, so m is useless.
The only reason your variation isn't working is that you haven't put quotes around your $patt string. As it is it throws a syntax error. This works fine
my $patt = '\[.*\]';
$text =~ s/$patt//ms;
My only comment is that the /m modifier is superfluous as it modifies the behaviour of the $ and ^ anchors, which you aren't using here. Only /s is necessary to make the . match newline characters.

Perl split() Function Not Handling Pipe Character Saved As A Variable

I'm running into a little trouble with Perl's built-in split function. I'm creating a script that edits the first line of a CSV file which uses a pipe for column delimitation. Below is the first line:
KEY|H1|H2|H3
However, when I run the script, here is the output I receive:
Col1|Col2|Col3|Col4|Col5|Col6|Col7|Col8|Col9|Col10|Col11|Col12|Col13|
I have a feeling that Perl doesn't like the fact that I use a variable to actually do the split, and in this case, the variable is a pipe. When I replace the variable with an actual pipe, it works perfectly as intended. How could I go about splitting the line properly when using pipe delimitation, even when passing in a variable? Also, as a silly caveat, I don't have permissions to install an external module from CPAN, so I have to stick with built-in functions and modules.
For context, here is the necessary part of my script:
our $opt_h;
our $opt_f;
our $opt_d;
# Get user input - filename and delimiter
getopts("f:d:h");
if (defined($opt_h)) {
&print_help;
exit 0;
}
if (!defined($opt_f)) {
$opt_f = &promptUser("Enter the Source file, for example /qa/data/testdata/prod.csv");
}
if (!defined($opt_d)) {
$opt_d = "\|";
}
my $delimiter = "\|";
my $temp_file = $opt_f;
my #temp_file = split(/\./, $temp_file);
$temp_file = $temp_file[0]."_add-headers.".$temp_file[1];
open(source_file, "<", $opt_f) or die "Err opening $opt_f: $!";
open(temp_file, ">", $temp_file) or die "Error opening $temp_file: $!";
my $source_header = <source_file>;
my #source_header_columns = split(/${delimiter}/, $source_header);
chomp(#source_header_columns);
for (my $i=1; $i<=scalar(#source_header_columns); $i++) {
print temp_file "Col$i";
print temp_file "$delimiter";
}
print temp_file "\n";
while (my $line = <source_file>) {
print temp_file "$line";
}
close(source_file);
close(temp_file);
The first argument to split is a compiled regular expression or a regular expression pattern. If you want to split on text |. You'll need to pass a pattern that matches |.
quotemeta creates a pattern from a string that matches that string.
my $delimiter = '|';
my $delimiter_pat = quotemeta($delimiter);
split $delimiter_pat
Alternatively, quotemeta can be accessed as \Q..\E inside double-quoted strings and the like.
my $delimiter = '|';
split /\Q$delimiter\E/
The \E can even be omitted if it's at the end.
my $delimiter = '|';
split /\Q$delimiter/
I mentioned that split also accepts a compiled regular expression.
my $delimiter = '|';
my $delimiter_re = qr/\Q$delimiter/;
split $delimiter_re
If you don't mind hardcoding the regular expression, that's the same as
my $delimiter_re = qr/\|/;
split $delimiter_re
First, the | isn't special inside doublequotes. Setting $delimiter to just "|" and then making sure it is quoted later would work or possibly setting $delimiter to "\\|" would be ok by itself.
Second, the | is special inside regex so you want to quote it there. The safest way to do that is ask perl to quote your code for you. Use the \Q...\E construct within the regex to mark out data you want quoted.
my #source_header_columns = split(/\Q${delimiter}\E/, $source_header);
see: http://perldoc.perl.org/perlre.html
It seems as all you want to do is count the fields in the header, and print the header. Might I suggest something a bit simpler than using split?
my $str="KEY|H1|H2|H3";
my $count=0;
$str =~ s/\w+/"Col" . ++$count/eg;
print "$str\n";
Works with most any delimeter (except alphanumeric and underscore), it also saves the number of fields in $count, in case you need it later.
Here's another version. This one uses the character class brackets instead, to specify "any character but this", which is just another way of defining a delimeter. You can specify delimeter from the command-line. You can use your getopts as well, but I just used a simple shift.
my $d = shift || '[^|]';
if ( $d !~ /^\[/ ) {
$d = '[^' . $d . ']';
}
my $str="KEY|H1|H2|H3";
my $count=0;
$str =~ s/$d+/"Col" . ++$count/eg;
print "$str\n";
By using the brackets, you do not need to worry about escaping metacharacters.
#!/usr/bin/perl
use Data::Dumper;
use strict;
my $delimeter="\\|";
my $string="A|B|C|DD|E";
my #arr=split(/$delimeter/,$string);
print Dumper(#arr)."\n";
output:
$VAR1 = 'A';
$VAR2 = 'B';
$VAR3 = 'C';
$VAR4 = 'DD';
$VAR5 = 'E';
seems you need define delimeter as \\|

How can i detect symbols using regular expression in perl?

Please how can i use regular expression to check if word starts or ends with a symbol character, also how to can i process the text within the symbol.
Example:
(text) or te-xt, or tex't. or text?
change it to
(<t>text</t>) or <t>te-xt</t>, or <t>tex't</t>. or <t>text</t>?
help me out?
Thanks
I assume that "word" means alphanumeric characters from your example? If you have a list of permitted characters which constitute a valid word, then this is enough:
my $string = "x1 .text1; 'text2 \"text3;\"";
$string =~ s/([a-zA-Z0-9]+)/<t>$1<\/t>/g;
# Add more to character class [a-zA-Z0-9] if needed
print "$string\n";
# OUTPUT: <t>x1</t> .<t>text1</t>; '<t>text2</t> "<t>text3</t>;"
UPDATE
Based on your example you seem to want to DELETE dashes and apostrophes, if you want to delete them globally (e.g. whether they are inside the word or not), before the first regex, you do
$string =~ s/['-]//g;
I am using DVK's approach here, but with a slight modification. The difference is that her/his code would also put the tags around all words that don't contain/are next to a symbol, which (according to the example given in the question) is not desired.
#!/usr/bin/perl
use strict;
use warnings;
sub modify {
my $input = shift;
my $text_char = 'a-zA-Z0-9\-\''; # characters that are considered text
# if there is no symbol, don't change anything
if ($input =~ /^[a-zA-Z0-9]+$/) {
return $input;
}
else {
$input =~ s/([$text_char]+)/<t>$1<\/t>/g;
return $input;
}
}
my $initial_string = "(text) or te-xt, or tex't. or text?";
my $expected_string = "(<t>text</t>) or <t>te-xt</t>, or <t>tex't</t>. or <t>text</t>?";
# version BEFORE edit 1:
#my #aux;
# take the initial string apart and process it one word at a time
#my #string_list = split/\s+/, $initial_string;
#
#foreach my $string (#string_list) {
# $string = modify($string);
# push #aux, $string;
#}
#
# put the string together again
#my $final_string = join(' ', #aux);
# ************ EDIT 1 version ************
my $final_string = join ' ', map { modify($_) } split/\s+/, $initial_string;
if ($final_string eq $expected_string) {
print "it worked\n";
}
This strikes me as a somewhat long-winded way of doing it, but it seemed quicker than drawing up a more sophisticated regex...
EDIT 1: I have incorporated the changes suggested by DVK (using map instead of foreach). Now the syntax highlighting is looking even worse than before; I hope it doesn't obscure anything...
This takes standard input and processes it to and prints on Standard output.
while (<>) {
s {
( [a-zA-z]+ ) # word
(?= [,.)?] ) # a symbol
}
{<t>$1</t>}gx ;
print ;
}
You might need to change the bit to match the concept of word.
I have use the x modifeid to allow the regexx to be spaced over more than one line.
If the input is in a Perl variable, try
$string =~ s{
( [a-zA-z]+ ) # word
(?= [,.)?] ) # a symbol
}
{<t>$1</t>}gx ;

regular expression is not working

my $pat = '^x.*d$';
my $dir = '/etc/inet.d';
if ( $dir =~ /$pat/xmsg ) {
print "found ";
}
how to make it sucess
Your pattern is looking for strings starting with x (^x) and ending in d (d$). The path you are trying does not match as it doesn't start with x.
You can use YAPE::Regex::Explain to help you understand regular expressions:
use strict;
use warnings;
use YAPE::Regex::Explain;
my $re = qr/^x.*d$/xms;
print YAPE::Regex::Explain->new($re)->explain();
__END__
The regular expression:
(?msx-i:^x.*d$)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?msx-i: group, but do not capture (with ^ and $
matching start and end of line) (with .
matching \n) (disregarding whitespace and
comments) (case-sensitive):
----------------------------------------------------------------------
^ the beginning of a "line"
----------------------------------------------------------------------
x 'x'
----------------------------------------------------------------------
.* any character (0 or more times (matching
the most amount possible))
----------------------------------------------------------------------
d 'd'
----------------------------------------------------------------------
$ before an optional \n, and the end of a
"line"
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
Also, you should not need the g modifier in this case. The documentation has plenty of information about regexes: perlre
There is an 'x' too much :
my $pat = '^.*d$';
my $dir = '/etc/inet.d';
if ( $dir =~ /$pat/xmsg ) {
print "found ";
}
My guess is that you're trying to list all files in /etc/init.d whose name matches the regular expression.
Perl isn't smart enough to figure out that when you name a string variable $dir, assign to it the full pathname of an existing directory, and pattern match against it, you don't intend to match against the pathname,
but against the filenames in that directory.
Some ways to fix this:
perldoc -f glob
perldoc -f readdir
perldoc File::Find
You may just want to use this:
if (glob('/etc/init.d/x*'))
{
warn "found\n";
}