Perl parsing - mixture of chars, tabs and spaces - perl

I have the following types of line in my code:
MMAPI_CLOCK_OUTPUTS = 1, /*clock outputs system*/
MMAPI_SYSTEM_MANAGEMENT = 0, /*sys man system*/
I want to parse them to get:
'MMAPI_CLOCK_OUTPUTS'
'1'
'clock outputs system'
So I tried:
elsif($TheLine =~ /\s*(.*)s*=s*(.*),s*\/*(.*)*\//)
but this doesn't get the last string 'clock outputs system'
What should the parsing code actually be?

You should escape the slashes, stars and the s for spaces. Instead of writing /, * or s in your regex, write \/, \* and \s:
/\s*(.*)\s=\s*(.*),\s\/\*(.*)\*\//

if($TheLine =~ m%^(\S+)\s+=\s+(\d+),\s+/\*(.*)\*/%) {
print "$1 $2 $3\n"
}
This uses % as an alternative delimiter in order to avoid leaning toothpick syndrome when you escape the / characters.
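Since the title mentions tabs as well as spaces, here is a minimal sketch that combines an alternative delimiter with the /x modifier so the pattern can be commented. The helper name parse_enum_line is made up for illustration, and the sketch assumes the value is always an unsigned integer:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the question): returns (name, value, comment)
# for a line such as "NAME = 1, /*comment*/", or an empty list if no match.
sub parse_enum_line {
    my ($line) = @_;
    return $line =~ m{
        ^ \s* (\w+)             # identifier
        \s* = \s* (\d+)         # numeric value; \s also matches tabs
        \s* , \s*
        /\* \s* (.*?) \s* \*/   # comment body without surrounding blanks
    }x;
}

my @fields = parse_enum_line("MMAPI_CLOCK_OUTPUTS\t=  1,\t/*clock outputs system*/");
print join('|', @fields), "\n";   # prints MMAPI_CLOCK_OUTPUTS|1|clock outputs system
```

With m{...} the slashes in /* and */ need no escaping at all; only the stars do.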

Try this regex: /^\s*(.*?)\s*=\s*(\d+),\s*\/\*(.*?)\*\/$/
Here is an example in which you can test it:
#!/usr/bin/perl
use strict;
use warnings;
my $str = "MMAPI_CLOCK_OUTPUTS = 1, /*clock outputs system*/\n
MMAPI_SYSTEM_MANAGEMENT = 0, /*sys man system*/";
while ($str =~ /^\s*(.*?)\s*=\s*(\d+),\s*\/\*(.*?)\*\/$/gm) {
print "$1 $2 $3 \n";
}
# Output:
# MMAPI_CLOCK_OUTPUTS 1 clock outputs system
# MMAPI_SYSTEM_MANAGEMENT 0 sys man system

Related

lowercase everything except content between single quotes - perl

Is there a way in perl to replace all text in input line except ones within single quotes(There could be more than one) using regex, I have achieved this using the code below but would like to see if it can be done with regex and map.
while (<>) {
my $m=0;
for (split(//)) {
if (/'/ and ! $m) {
$m=1;
print;
}
elsif (/'/ and $m) {
$m=0;
print;
}
elsif ($m) {
print;
}
else {
print lc;
}
}
}
**Sample input:**
and (t.TARGET_TYPE='RAC_DATABASE' or (t.TARGET_TYPE='ORACLE_DATABASE' and t.TYPE_QUALIFIER3 != 'racinst'))
**Sample output:**
and (t.target_type='RAC_DATABASE' or (t.target_type='ORACLE_DATABASE' and t.type_qualifier3 != 'racinst'))
You can give this a shot. All one regexp.
$str =~ s/(?:^|'[^']*')\K[^']*/lc($&)/ge;
Or, cleaner and more documented (this is semantically equivalent to the above)
$str =~ s/
(?:
^ | # Match either the start of the string, or
'[^']*' # some text in quotes.
)\K # Then ignore that part,
# because we want to leave it be.
[^']* # Take the text after it, and
# lowercase it.
/lc($&)/gex;
The g flag tells the regexp to run as many times as necessary. e tells it that the substitution portion (lc($&), in our case) is Perl code, not just text. x lets us put those comments in there so that the regexp isn't total gibberish.
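To try the substitution on the sample input from the question, a small self-contained script could look like the sketch below. The wrapper name lc_outside_quotes is an assumption for illustration; \K needs Perl 5.10 or newer, and the sketch assumes the quotes are balanced:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper around the one-line substitution above.
sub lc_outside_quotes {
    my ($str) = @_;
    # After \K, $& holds only the text that follows the quoted part.
    $str =~ s/(?:^|'[^']*')\K[^']*/lc($&)/ge;
    return $str;
}

my $in = "and (t.TARGET_TYPE='RAC_DATABASE' or (t.TARGET_TYPE='ORACLE_DATABASE' and t.TYPE_QUALIFIER3 != 'racinst'))";
print lc_outside_quotes($in), "\n";
```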
Don't you play too hard with regexp for such a simple job?
Why not get the kid 'split' for it today?
#!/usr/bin/perl
while (<>)
{
@F = split "'";
@F = map { $_ % 2 ? $F[$_] : lc $F[$_] } (0..$#F);
print join "'", @F;
}
The above is for understanding. The last two lines are usually combined into:
print join "'", map { $_ % 2 ? $F[$_] : lc $F[$_] } (0..$#F);
Or go further and make it a one-liner (in a bash shell). In concept, it looks like:
perl -pF/'/ -e '$_ = join "'", map { $_ % 2 ? $F[$_] : lc $F[$_] } (0..$#F);' YOUR_FILE
In reality, however, the shell requires some (awkward) escaping:
perl -pF/\'/ -e '$_ = join "'"'"'", map { $_ % 2 ? $F[$_] : lc $F[$_] } (0..$#F);' YOUR_FILE
(A single quote inside a single-quoted shell string has to be written as five characters: '"'"')
If it doesn't help your job, it helps sleep.
One more variant with Perl one-liner. I'm using hex \x27 for single quotes
$ cat sql_str.txt
and (t.TARGET_TYPE='RAC_DATABASE' or (t.TARGET_TYPE='ORACLE_DATABASE' and t.TYPE_QUALIFIER3 != 'racinst'))
$ perl -ne ' { @F=split(/\x27/); for my $val (0..$#F) { $F[$val]=lc($F[$val]) if $val%2==0 } ; print join("\x27",@F) } ' sql_str.txt
and (t.target_type='RAC_DATABASE' or (t.target_type='ORACLE_DATABASE' and t.type_qualifier3 != 'racinst'))
$

Move last character of line to specific column -- sed? awk?

I need to replace all lines ending with specific character (say, &) such that this character should be in certain column (say, 80).
Which tool is best?
I have started thinking about sed:
sed 's/\(.*\)&/\1 <what should be here??> &/'
but cannot understand how to replace with variable number of spaces such that & goes to column 80.
Thanks!
Use the /e switch to s/// that tells Perl to evaluate the replacement portion to compute the result.
#! /usr/bin/env perl
use strict;
use warnings;
while (<>) {
s/^(.*)(&)$/$1 . " " x (79 - length $1) . $2/e;
print;
}
Sample run:
$ echo -e 'foo&\n&\nbar &\nbaz' | ./align-ampersands
foo &
&
bar &
baz
If your input contains TAB characters, you will need to use more sophisticated processing.
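One possible way to handle TABs, sketched under the assumption that tab stops are every 8 columns, is to expand them with the core Text::Tabs module before measuring the line. The helper name align_ampersand and its default column are made up for illustration, and the line is assumed to be chomped already:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::Tabs;   # core module; exports expand() and $tabstop (default 8)

# Hypothetical helper: pad so that a trailing '&' lands in column $col
# (1-based). Tabs are expanded first so length() reflects display width.
sub align_ampersand {
    my ($line, $col) = @_;
    $col //= 80;
    $line = expand($line);
    if ($line =~ /^(.*)&$/) {
        my $body = $1;
        my $pad  = $col - 1 - length $body;
        $pad = 0 if $pad < 0;          # overlong lines are left unpadded
        $line = $body . (' ' x $pad) . '&';
    }
    return $line;
}

print align_ampersand("\tfoo\t&"), "\n";
```

Lines that do not end in & pass through unchanged apart from the tab expansion.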
Not sure if I understand your question correctly but you can try something like (assuming your file is space delimited):
awk '/&$/ {for(i=1;i<=NF;i++) $i=(i==80)?"& "$i:$i}1' yourFile
Awk and Perl will both work. Both have printf and substr:
#! /usr/bin/env perl
use warnings;
use strict;
my $string = "this is some text &";
my $last_char = substr($string, -1, 1);
$string = substr ($string, 0, length ($string ) - 1);
printf qq(%-79.79s%s\n), $string, $last_char;
The substr command is available in both Awk and Perl.
The whole command could be made into a one liner:
printf qq(%-79.79s%s\n), substr ($string, 0, length ($string ) - 1), substr($string, -1, 1);
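If the target column should not be hard-coded, sprintf accepts * as a width taken from the argument list. The sketch below wraps that in a function; the name last_char_to_column is an assumption, the input is assumed to be non-empty, and unlike %-79.79s this version does not truncate long lines:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: move the last character of $line to column $col (1-based).
sub last_char_to_column {
    my ($line, $col) = @_;
    my $last = substr($line, -1, 1);   # the character to move
    my $body = substr($line, 0, -1);   # everything before it
    # %-*s left-justifies $body in a field whose width is the next argument.
    return sprintf '%-*s%s', $col - 1, $body, $last;
}

print last_char_to_column("this is some text &", 80), "\n";
```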
awk '/&$/{$80="&"}1' file

Perl parsing Text File with regular expression

I have a file with the following random structures:
USMS 1362224754632|<REQ MSISDN="00966590832186" CONTRACT="580" SUBSCRIPTION="AAA" FORMAT="ascii" TEXT="L2"
or
USMS 1362224754632|<REQ MSISDN="00966590832186" CONTRACT="580" SUBSCRIPTION="BBB" THRESHOLDID="1" FORMAT="ascii" TEXT="L2"
I am trying to parse it with perl to get the values like the following:
1362224754632;00966590832186;580;AAA;L2
Below is the code:
if($Record =~ /USMS (.*?)|<REQ MSISDN="(.*?)" CONTRACT="(.*?)" SUBSCRIPTION="(.*?)" FORMAT="(.*?)" THRESHOLDID="(.*?)" TEXT="(.*?)"/)
{
print LOGFILE "$1;$2;$3;$4;$5;$6;$7\n";
}
elsif($Record =~ /USMS (.*?)|<REQ MSISDN="(.*?)" CONTRACT="(.*?)" SUBSCRIPTION="(.*?)" FORMAT="(.*?)" TEXT="(.*?)"/)
{
print LOGFILE "$1;$2;$3;$4;$5;$6\n";
}
But I am getting always:
;;;;;
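Pipe (|) is a special character in regular expressions: unescaped, it means alternation, so your pattern already succeeds on its left branch, USMS (.*?), where the lazy group captures an empty string and every other group stays undefined. That is why you only get semicolons. Escape it as \| to match a literal pipe:
if($Record =~ /USMS (.*?)\|<REQ MSISDN="(.*?)" CONTRACT="(.*?)" SUBSCRIPTION="(.*?)" FORMAT="(.*?)" THRESHOLDID="(.*?)" TEXT="(.*?)"/)
and the same for the elsif branch. Note that in your sample data THRESHOLDID comes before FORMAT, so those two attributes need to be swapped in the first pattern for it to match.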
Instead of using a single regex, I would split the data into its separate sections first, then approach them separately.
my($usms_part, $request) = split / \s* \|<REQ \s* /x, $Record;
my($usms_id) = $usms_part =~ /^USMS (\d+)$/;
my %request;
while( $request =~ /(\w+)="(.*?)"/g ) {
$request{$1} = $2;
}
Rather than having to hard code all the possible key/value pairs, and their possible orderings, you can parse them generically in one piece of code.
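Put together, the split-then-scan idea could produce the output format from the question roughly as follows. This is only a sketch: the function name format_record is made up, and the field order (MSISDN, CONTRACT, SUBSCRIPTION, TEXT) is read off the desired output line in the question:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper built on the approach above; returns the joined
# fields, or nothing if the record does not have the expected shape.
sub format_record {
    my ($record) = @_;
    my ($usms_part, $request) = split / \s* \|<REQ \s* /x, $record, 2;
    return unless defined $request;
    my ($usms_id) = $usms_part =~ /^USMS (\d+)$/ or return;

    # Collect every KEY="value" pair, whatever the order.
    my %request;
    while ($request =~ /(\w+)="(.*?)"/g) {
        $request{$1} = $2;
    }

    # Missing attributes become empty fields.
    return join ';', $usms_id,
        map { $request{$_} // '' } qw(MSISDN CONTRACT SUBSCRIPTION TEXT);
}

for my $record (
    'USMS 1362224754632|<REQ MSISDN="00966590832186" CONTRACT="580" SUBSCRIPTION="AAA" FORMAT="ascii" TEXT="L2"',
    'USMS 1362224754632|<REQ MSISDN="00966590832186" CONTRACT="580" SUBSCRIPTION="BBB" THRESHOLDID="1" FORMAT="ascii" TEXT="L2"',
) {
    print format_record($record), "\n";
}
```

Because the attributes are looked up by name, the optional THRESHOLDID attribute no longer needs a pattern of its own.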
Change
(.*?)
to
([a-zA-Z0-9]*)
It looks like all you want is the fields contained in double-quotes.
That looks like this
use strict;
use warnings;
while (<DATA>) {
my @values = /"([^"]+)"/g;
print join(';', @values), "\n";
}
__DATA__
USMS 1362224754632|<REQ MSISDN="00966590832186" CONTRACT="580" SUBSCRIPTION="AAA" FORMAT="ascii" TEXT="L2"
USMS 1362224754632|<REQ MSISDN="00966590832186" CONTRACT="580" SUBSCRIPTION="BBB" THRESHOLDID="1" FORMAT="ascii" TEXT="L2"
output
00966590832186;580;AAA;ascii;L2
00966590832186;580;BBB;1;ascii;L2

How can I detect symbols using a regular expression in Perl?

How can I use a regular expression to check whether a word starts or ends with a symbol character, and how can I process the text within the symbols?
Example:
(text) or te-xt, or tex't. or text?
change it to
(<t>text</t>) or <t>te-xt</t>, or <t>tex't</t>. or <t>text</t>?
help me out?
Thanks
I assume that "word" means alphanumeric characters from your example? If you have a list of permitted characters which constitute a valid word, then this is enough:
my $string = "x1 .text1; 'text2 \"text3;\"";
$string =~ s/([a-zA-Z0-9]+)/<t>$1<\/t>/g;
# Add more to character class [a-zA-Z0-9] if needed
print "$string\n";
# OUTPUT: <t>x1</t> .<t>text1</t>; '<t>text2</t> "<t>text3</t>;"
UPDATE
Based on your example you seem to want to DELETE dashes and apostrophes, if you want to delete them globally (e.g. whether they are inside the word or not), before the first regex, you do
$string =~ s/['-]//g;
I am using DVK's approach here, but with a slight modification. The difference is that her/his code would also put the tags around all words that don't contain/are next to a symbol, which (according to the example given in the question) is not desired.
#!/usr/bin/perl
use strict;
use warnings;
sub modify {
my $input = shift;
my $text_char = 'a-zA-Z0-9\-\''; # characters that are considered text
# if there is no symbol, don't change anything
if ($input =~ /^[a-zA-Z0-9]+$/) {
return $input;
}
else {
$input =~ s/([$text_char]+)/<t>$1<\/t>/g;
return $input;
}
}
my $initial_string = "(text) or te-xt, or tex't. or text?";
my $expected_string = "(<t>text</t>) or <t>te-xt</t>, or <t>tex't</t>. or <t>text</t>?";
# version BEFORE edit 1:
#my @aux;
# take the initial string apart and process it one word at a time
#my @string_list = split/\s+/, $initial_string;
#
#foreach my $string (@string_list) {
# $string = modify($string);
# push @aux, $string;
#}
#
# put the string together again
#my $final_string = join(' ', @aux);
# ************ EDIT 1 version ************
my $final_string = join ' ', map { modify($_) } split/\s+/, $initial_string;
if ($final_string eq $expected_string) {
print "it worked\n";
}
This strikes me as a somewhat long-winded way of doing it, but it seemed quicker than drawing up a more sophisticated regex...
EDIT 1: I have incorporated the changes suggested by DVK (using map instead of foreach). Now the syntax highlighting is looking even worse than before; I hope it doesn't obscure anything...
This reads standard input, processes it, and prints the result to standard output.
while (<>) {
s {
( [a-zA-Z]+ ) # word
(?= [,.)?] ) # a symbol
}
{<t>$1</t>}gx ;
print ;
}
You might need to change the character class to match your concept of a word.
I have used the x modifier to allow the regex to be spread over more than one line.
If the input is in a Perl variable, try
$string =~ s{
( [a-zA-Z]+ ) # word
(?= [,.)?] ) # a symbol
}
{<t>$1</t>}gx ;

split function extension

I am learning the sample code from split function.
Sample code.
#!C:\Perl\bin\perl.exe
use strict;
use warnings;
my $info = "Caine:Michael:Actor:14, Leafy Drive";
my @personal = split(/:/, $info);
# @personal = ("Caine", "Michael", "Actor", "14, Leafy Drive");
If I change it to $info = "Caine Michael Actor /* info data */";
how can I use split(/ /, $info) to get the result below?
# @personal = ("Caine", "Michael", "Actor", "info data");
Thank you.
Alternative approach:
Have you considered using the 3-parameter version of split:
$info = "Caine Michael Actor /* info data */";
@personal = split(' ', $info, 4);
resulting in
@personal = ('Caine', 'Michael', 'Actor', '/* info data */');
You would then have to strip the /* and */ from the last element to get your result.
It really is better to use regex for this:
$info = "Caine Michael Actor /* info data */";
$info =~ /(\w+)\s+(\w+)\s+(\w+).*\/\*(.+)\*\//;
@personal = ($1, $2, $3, $4);
Mainly because your input string has ambiguities related to word separators not easily handled by split.
In case you're wondering how to read the regex:
/
(\w+) # CAPTURE a sequence of one or more word characters into $1
\s+ # MATCH one or more white space
(\w+) # CAPTURE a sequence of one or more word characters into $2
\s+ # MATCH one or more white space
(\w+) # CAPTURE a sequence of one or more word characters into $3
.* # MATCH zero or more of anything
\/\* # MATCH the opening of C-like comment /*
(.+) # CAPTURE a sequence of one or more of anything into $4
\*\/ # MATCH the closing of C-like comment */
/x
Since there isn't an answer yet that handles the general case, here goes:
split isn't your best bet here. Since the delimiter can be both a matched and a non-matched character, it is clearest to invert the problem and describe what you do want to match, which in this case is either a run of non-space characters or the contents of a C-style comment.
use strict;
use warnings;
use feature 'say';
my $info = "Caine Michael Actor /* info data */";
my @personal = grep {defined} $info =~ m! /\* \s* (.+?) \s* \*/ | (\S+) !xg;
say join ', ' => @personal;
That will return a list of words and comment contents in whatever order they appear in the input. The syntax highlighter doesn't highlight the above regex properly; the regex is everything between the ! characters.
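To check the "any order" claim, the same pattern can be wrapped in a function and run on a string with the comment in the middle. This is only a sketch: the name words_and_comments and the second test string are made up, and a comment glued directly to a word (as in foo/*x*/) would be swallowed whole by the \S+ branch:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper around the pattern from the answer above.
sub words_and_comments {
    my ($text) = @_;
    # Each match fills only one of the two groups; grep drops the undef one.
    return grep { defined } $text =~ m! /\* \s* (.+?) \s* \*/ | (\S+) !xg;
}

print join(', ', words_and_comments("Caine Michael Actor /* info data */")), "\n";
print join(', ', words_and_comments("Caine /* first name */ Michael Actor")), "\n";
```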
Cooked something up :). It works only for your example; I cannot generalize it.
use strict;
use warnings;
my $info = "Caine Michael Actor /* info data */";
if($info=~m{/\*\s*(.*?)\s*\*/})
{
my $temp = $1;
$temp=~s{\s+}{##}g;
$info=~s{/\*\s*(.*?)\s*\*/}{$temp};
}
my @personal = split(/ /, $info);
foreach(@personal)
{
s{##}{ }g;
print "$_\n";
}
Output:
C:>perl a.pl
Caine
Michael
Actor
info data