Removing double quotes at the beginning & end of strings in a pipe-delimited line in Perl

I have a line that is pipe delimited:
John |DEMME|"9 Snowy "" Court"|WERRIBEE|""VIC""
I split my line into fields:
@fields = split (/\|/, $_);
What I want is to remove the double quotes at the beginning/end of each field,
but keep the double quotes that are in between.
Expected output:
John |DEMME|9 Snowy "" Court|WERRIBEE|VIC
I also tried this
s/^\"|"$//g;
but it works on the whole line, not on each field, so it only removes the double quotes at the beginning and end of the line.
Another scenario:
John |DEMME| "Shop 6A ""Atlantic on Coolum""|WERRIBEE|VIC
The output should be:
John |DEMME| Shop 6A "Atlantic on Coolum"|WERRIBEE|VIC
I hope you guys can help me with this.
thank you very much

If you're processing the whole line at once, then you could write:
s/(?:^|(?<=\|))"|"(?=$|\|)//g;
to remove " at the start of a line or after a |, or at the end of a line or before a |.
(The (?<=...) notation creates a zero-width positive "lookbehind" assertion, which here checks whether a | precedes the quote; the (?=...) notation creates a zero-width positive "lookahead" assertion, which here checks whether a | or the end of the string follows it.)
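A minimal runnable sketch of the lookaround approach. It assumes one small change that is not in the answer above: "+ instead of ", so that a whole run of quotes at a field boundary (such as the doubled quotes around VIC) is removed in one step. With a single " the last field would come out as "VIC".

```perl
use strict;
use warnings;

my $line = 'John |DEMME|"9 Snowy "" Court"|WERRIBEE|""VIC""';

# Copy the line, then delete every run of quotes that sits at the start of
# the line or right after a |, or at the end of the line or right before a |.
# ("+ is a variation on the answer's single ".)
(my $stripped = $line) =~ s/(?:^|(?<=\|))"+|"+(?=$|\|)//g;

print "$stripped\n";
# prints: John |DEMME|9 Snowy "" Court|WERRIBEE|VIC
```

The interior "" in the address survives because it is surrounded by spaces, so neither the lookbehind nor the lookahead matches there.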

try this one dude
my $line = 'John |DEMME|"9 Snowy "" Court"|WERRIBEE|""VIC""';
my @fields = split (/\|/, $line);
foreach my $item (@fields) {
    $item =~ s/^"+//;   # strip any run of leading quotes
    $item =~ s/"+$//;   # strip any run of trailing quotes
}
print join('|', @fields), "\n";

The quotes could also be removed from each field using a map expression:
@fields = map { s/^"(.*)"$/$1/; $_ } split (/\|/, $_);
split breaks the line apart into a list, while the map applies the substitution to each member of the list. To join them back up:
print join('|', @fields), "\n";
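A per-field variant of the same split/map/join idea, sketched under the assumption that Perl 5.14 or later is available (for the /r flag, which returns the modified copy instead of changing the string in place). Unlike s/^"(.*)"$/$1/, it strips any run of leading or trailing quotes, so ""VIC"" becomes VIC rather than "VIC".

```perl
use strict;
use warnings;

my $line = 'John |DEMME|"9 Snowy "" Court"|WERRIBEE|""VIC""';

# split into fields, strip leading/trailing quote runs from each field
# (s///r returns the modified copy), then glue the fields back together
my @fields = map { s/^"+|"+$//gr } split /\|/, $line;
my $result = join '|', @fields;

print "$result\n";
```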

Related

Add a new line in a variable using perl

I am trying to add a new line in a variable after certain number of words. For example: If we have a variable:
$x = "This is a variable, start a new line here, This is a new line.";
If I print the above variable
print $x;
I should get the below output:
This is a variable,
start a new line here,
This is a new line.
How can I achieve this in Perl from the variable itself?
I do not agree with the formulation "after a certain number of words".
Note that the first target line has 4 words, whereas the remaining 2 have 5 words each.
Actually, you need to replace each comma and the following sequence of spaces (if any) with a comma and \n.
So the intuitive way to do it is:
$x =~ s/,\s*/,\n/g;
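A minimal runnable check of that substitution, applied to the sentence from the desired output:

```perl
use strict;
use warnings;

my $x = "This is a variable, start a new line here, This is a new line.";

# every comma plus any whitespace after it becomes a comma and a newline
$x =~ s/,\s*/,\n/g;

print "$x\n";
```

Because \s* also matches zero spaces, a comma with no space after it (as in "a,b") is broken onto a new line too.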
The simplest way is to split the string on a comma followed by a space and then join the word groups with a comma followed by a newline.
my $x = "This a variable, start a new line here, This is a new line.";
print join(",\n", split /, /, $x) . "\n";
output
This a variable,
start a new line here,
This is a new line.
For the general problem of "how do I reformat this string with line breaks after n columns?", use the Text::Wrap module (as suggested by @ikegami):
use Text::Wrap;
my $x = "The quick brown fox jumped over the lazy dog.";
$Text::Wrap::columns = 15;
# wrap() takes a list of text pieces; here we pass the individual words
my @words = split /\s+/, $x;
# Initial tab and subsequent tab (indent strings) both set to '' (think indent amount)
print wrap('', '', @words) . "\n";
output
The quick
brown fox
jumped over
the lazy dog.
You probably want to use regular expressions. You can do this:
$x =~ s/^(\S+\s+){3}\K/\n/;
Or if this is about the commas and not the spaces:
$x =~ s/^([^,]+,+){2}\s*\K/\n/;
(in this case I also remove any potential space that would be after the comma)
You can also configure how many words or commas you want separately, by putting the count in a variable:
my $nbwords = 7; # add a line after the 7th word
$x =~ s/^(\S+\s+){$nbwords}\K/\n/;
Now, that would keep the last space so you may want to do this:
my $nbwords = 7; # add a line after the 7th word
$nbwords--; # becomes 6 because there is another word after that we match as well
$x =~ s/^(\S+\s+){$nbwords}\S+\K\s+/\n/;
You should probably learn regular expressions, but to explain the above:
\s is any whitespace character (space, tab, newline, etc.)
\S (uppercase) is any character except a whitespace character
+ means one or more of whatever precedes it. So \s+ means one or more consecutive whitespace characters.
{123} means exactly 123 repetitions of whatever precedes it.
{3,80} means 3 to 80 repetitions. So + is equivalent to {1,} (one to unlimited).
\K means that whatever was matched before it will not be replaced, only what is matched after it.
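The {N} quantifier and \K can be wrapped in a small helper. The sub name break_after_words is hypothetical; like the last regex above, it turns the whitespace after the n-th word into a newline, and it leaves the string unchanged when no whitespace follows an n-th word.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the whitespace after the $n-th word with "\n".
# (\S+\s+){N} matches N words plus their trailing whitespace, \S+ the next
# word, and \K keeps all of that out of the replaced text.
sub break_after_words {
    my ($text, $n) = @_;
    my $before = $n - 1;    # words consumed by the repeated group
    $text =~ s/^(\S+\s+){$before}\S+\K\s+/\n/;
    return $text;
}

my $x = "This is a variable, start a new line here, This is a new line.";
print break_after_words($x, 4), "\n";
```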

Unpack and x option in TEMPLATE

I have a row which looks like this (no newlines, this is one line, and I replaced the spaces with _ because otherwise they are trimmed):
46S990BZ6BRIG___1381TRANSOCEAN_LTD______________BCALL_FEB00025000__1000000000000000000000000000000000000000B90002132015000000099999900161100000000000000007500111214111414121714100003000_H8817H100015012200005000000000010000000000000000000000009920202020150213__20_________________________________________________OV__0203P
Using Perl's unpack function returns the following:
unpack("x49 A4", $line); # where $line is the above example line
returns: CALL
unpack("x68 A4", $line);
returns: 0122
unpack("x238 A4", $line);
returns: 2015
Apparently, the column numbers do not match the number given after 'x' in the TEMPLATE: column 238 contains '0000', yet x238 returns '2015', which I find at column 251. The same happens with the other examples.
Please, explain how exactly the numbers given after 'x' in TEMPLATE work.
Thank you
First of all, your data isn't what you said it is. The data you provided produces the desired result.
$ perl -E'
my $line = "46S990BZ6BRIG___1381TRANSOCEAN_LTD______________BCALL_FEB00025000__1000000000000000000000000000000000000000B90002132015000000099999900161100000000000000007500111214111414121714100003000_H8817H100015012200005000000000010000000000000000000000009920202020150213__20_________________________________________________OV__0203P";
$line =~ s/_/ /g;
say unpack("x238 A4", $line);
'
0000
Maybe your actual data contained non-printable characters. Or maybe some of the spaces were actually tabs.
$ perl -E'
$_ = "46S990BZ6BRIG___1381TRANSOCEAN_LTD______________BCALL_FEB00025000__1000000000000000000000000000000000000000B90002132015000000099999900161100000000000000007500111214111414121714100003000_H8817H100015012200005000000000010000000000000000000000009920202020150213__20_________________________________________________OV__0203P";
s/_/ /g;
s/LTD\K\s+/\t\t/; # If the spaces after LTD were tabs
say unpack("x238 A4", $_);
'
2015
If you want to extract the characters based on how your viewer expands tabs, you will need to expand the tabs to spaces in the same fashion before passing the string to unpack.
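A small sketch of both points using made-up strings (not the data from the question): x followed by N skips N characters before the next field, so the number is a 0-based character offset, and a tab counts as a single character. Text::Tabs, a core module whose expand function replaces tabs with spaces using tab stops every $tabstop columns (8 by default), can do the expansion the answer describes.

```perl
use strict;
use warnings;
use Text::Tabs;    # core module; exports expand(), unexpand() and $tabstop (default 8)

# "x3" skips 3 characters; "A4" then takes the next 4 (trailing spaces stripped).
my $plain = unpack("x3 A4", "abcdefgh");

# unpack counts a tab as one character, however wide it looks in a viewer:
# x3 skips "a", "b" and "\t".
my $raw      = "ab\tcdefgh";
my $from_raw = unpack("x3 A4", $raw);

# After expand(), the tab becomes spaces up to column 8, so "cdef" now
# starts at offset 8, matching what a viewer with 8-column tabs shows.
my $expanded = expand($raw);
my $from_exp = unpack("x8 A4", $expanded);

print "$plain\n$from_raw\n$from_exp\n";   # defg, cdef, cdef
```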

How do I find the sum of all numbers in STDIN even if there are non-digit characters?

I have an assignment asking me to enter a sequence of numbers and characters, each separated by a space, and the sequence is ended by entering "q" or "Q" followed by a space. Everything except the numbers should be discarded, and we are to find the sum. So for example, if the input is "1 12 a 2 5 P Q", then we should expect to get "20" as the output.
So far I'm using
$input = <>;
$input =~ tr/0-9//cd;
to get only the numbers, but what I want is to split them up and get the sum. Right now the output would be 11225, and I want to compute "1+12+2+5" to get the sum.
perl -ne '$s=0;($line)=/(.*?)[Qq]/;while($line=~/(\d+)/g) {$s+=$1} print "$s\n"'
Explanation:
Strips the part of each line starting at the first Q or q, then scans the remaining part for runs of digits and adds them together. (A line without any Q or q matches nothing, so its sum is printed as 0.)
First, strip out all characters that aren't numbers or spaces:
$input =~ s/[^0-9\s]//g;
Then, split on whitespace:
@numbers = split(/\s+/, $input);
Then you have a list of numbers that you can add up.
Preserve spaces in your first step:
$input =~ tr/0-9 //cd;
Then split on spaces:
my @numbers = split ' ', $input;
(this is a special form of split that works like split /\s+/ but also discards empty leading fields).
You probably want to start by getting rid of everything after a Q though:
$input =~ s/Q .*//i;
For what it's worth, I wouldn't have jumped to using tr here; I'd have started by splitting on spaces, then processed the fields consisting only of digits until a Q was reached.
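A sketch of that split-first approach. The sub name sum_until_q is hypothetical, and it assumes the terminator is a field consisting of exactly q or Q (as in the assignment) and that only all-digit fields count as numbers.

```perl
use strict;
use warnings;

# Hypothetical helper: add up the all-digit fields that come before the
# first field that is exactly "q" or "Q". Other fields are ignored.
sub sum_until_q {
    my ($input) = @_;
    my $total = 0;
    for my $field (split ' ', $input) {    # split ' ' ignores leading whitespace
        last if $field =~ /\A[qQ]\z/;      # terminator reached
        $total += $field if $field =~ /\A\d+\z/;
    }
    return $total;
}

# In the real program the line would come from STDIN, e.g. my $input = <STDIN>;
print sum_until_q("1 12 a 2 5 P Q"), "\n";   # 20
```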

Perl regex replace first name last name with first name last initial

I want to have the output of $var below to be John D
my $var = "John Doe";
I have tried
$var =~ s/(.+\b.).+\z],'\1.'//g;
Here's a general solution (feel free to swap in '\w' where I used '.', and add a \s where I used \s+)
my $var = "John Doe";
(my $fname, my $linitial) = $var =~ /(.*)\s+(.).*/;
Then you have the values
$fname = 'John';
$linitial = 'D';
and you can do:
print "$fname $linitial";
to get
"John D"
EDIT
After a successful match, each pair of capture parentheses sets a variable ($1 and $2, respectively), which keeps its value until the next successful match, so the whole thing can be shortened a bit as follows:
my $var = "John Doe";
$var =~ /(.*)\s+(.).*/;
print "$1 $2";
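Because $1 and $2 are only reset by a successful match, a failed match would leave stale values from an earlier match in place. A safer pattern is to assign the captures in list context and check whether the match worked. A minimal sketch; first_and_initial is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: "John Doe" -> "John D"; returns undef when there is
# no whitespace-separated last name, instead of reusing stale $1/$2.
sub first_and_initial {
    my ($name) = @_;
    my ($first, $initial) = $name =~ /^(.*)\s+(.)/
        or return undef;    # list assignment is false when nothing matched
    return "$first $initial";
}

for my $name ("John Doe", "Cher") {
    my $short = first_and_initial($name);
    print defined $short ? "$short\n" : "no last name in '$name'\n";
}
```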
To replace the last sequence of non-whitespace characters with just the initial character, you could write this
use strict;
use warnings;
my $var = "John Doe";
$var =~ s/(\S)\S*\s*$/$1/;
print $var;
output
John D
Assuming your string contains ASCII names, this will work:
$var =~ s/([a-zA-Z]+)\s([a-zA-Z]+)/$1." ".substr($2,0,1)/ge;
$var = "John Doe";
s/^(\w+)\s+(\w)\w*/$1 \u$2/ for $var;
A simple regex that solves this problem is the substitution
s/^\w+\s+\K(\w).*/\U$1/s
What does this do?
^ \w+ \s+ matches a word at the beginning of the string, plus whitespace towards the next word
\K is the keep escape. It excludes everything matched so far from the substring that the regex engine considers “matched”, so that part is not replaced. This avoids an extra capture group and acts practically like a look-behind.
(\w) matches and captures one “word” character. This is the leading character of the second word in the string.
.* matches the rest of the string. I do this to overwrite any other names that may come: you stated that Lester del Ray should be transformed to Lester D, not Lester D Ray as a solution with \w* instead of the .* part would have done. The /s modifier is relevant for this, as it enables . to match every character including newlines (who knows what's inside the string?).
The substitution uses the \U escape to uppercase the replacement text, which consists only of the captured initial.
Test:
$ perl -E'$_ = shift; s/^\w+\s+\K(\w).*/\U$1/s; say' "Lester del Ray"
Lester D
$ perl -E'$_ = shift; s/^\w+\s+\K(\w).*/\U$1/s; say' "John Doe"
John D
Something like this might be a little more usable/reusable in the long run.
my $initial = sub { return substr shift, 0, 1; };
Make a "get initial" function.
$var =~ s/(\w+)\s+(\w+)/$1 . ' ' . $initial->($2)/e;
Then replace the second word with its initial, using the /e modifier so that the replacement is executed as Perl code.

What's happening in this Perl foreach loop?

I have this Perl code:
foreach (@tmp_cycledef)
{
chomp;
my ($cycle_code, $close_day, $first_date) = split(/\|/, $_,3);
$cycle_code =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;
$close_day =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;
$first_date =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;
#print "$cycle_code, $close_day, $first_date\n";
$cycledef{$cycle_code} = [ $close_day, split(/-/,$first_date) ];
}
The value of tmp_cycledef comes from output of an SQL query:
select cycle_code,cycle_close_day,to_char(cycle_first_date,'YYYY-MM-DD')
from cycle_definition d
order by cycle_code;
What exactly is happening inside the for loop?
Huh, I'm surprised no one fixed it for you :)
It looks like the person who wrote this was trying to trim leading and trailing whitespace from each field. It's a really odd way to do that, and for some reason he was overly concerned with interior whitespace in each field despite his anchors.
I think that should be the same as trimming the whitespace around the delimiter in the split:
foreach (@tmp_cycledef)
{
s/^\s+//; s/\s+$//; # leading and trailing whitespace on the whole string
my ($cycle_code, $close_day, $first_date) = split(/\s*\|\s*/, $_, 3);
$cycledef{$cycle_code} = [ $close_day, split(/-/,$first_date) ];
}
The key to thinking about split is considering which parts of the string you want to throw away, not just what separates the fields that you want.
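A runnable sketch of the simplified loop, fed with two made-up rows in the cycle_code|close_day|first_date shape (the data is hypothetical, not real query output). It also shows how to read an entry back: each value is a reference to [ close_day, year, month, day ].

```perl
use strict;
use warnings;

# Hypothetical rows shaped like the SQL output: cycle_code|close_day|first_date
my @tmp_cycledef = (" A1 | 15 | 2015-02-13 \n", "B2|28|2014-11-01\n");
my %cycledef;

foreach (@tmp_cycledef) {
    chomp;
    s/^\s+//;    # leading whitespace on the whole row
    s/\s+$//;    # trailing whitespace on the whole row
    # split on the pipe and swallow the whitespace around it
    my ($cycle_code, $close_day, $first_date) = split /\s*\|\s*/, $_, 3;
    $cycledef{$cycle_code} = [ $close_day, split /-/, $first_date ];
}

# Each value is an array reference: [ close_day, YYYY, MM, DD ]
my ($close_day, $year, $month, $day) = @{ $cycledef{A1} };
print "A1 closes on day $close_day; first date $year/$month/$day\n";
```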
The regex part, s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/, strips leading and trailing whitespace.
Each row in @tmp_cycledef is a string formatted as "cycle_code | close_day | first_date".
my ($cycle_code, $close_day, $first_date) = split(/\|/, $_,3);
Split the string into three parts. The following regular expressions strip leading and trailing whitespace.
The last instruction of the loop creates an entry in the hash %cycledef, keyed by $cycle_code. The entry is an array reference formatted using the following scheme:
[ $close_day, YYYY, MM, DD ]
where $first_date = "YYYY-MM-DD".
@tmp_cycledef: The output of the SQL query is stored in this array.
foreach (@tmp_cycledef): For every element in this array.
chomp: remove the trailing \n from every element.
my ($cycle_code, $close_day, $first_date) = split(/\|/, $_,3);
Split the element into 3 parts and assign each part to its variable. The syntax of split is split(/PATTERN/, EXPR, LIMIT); a LIMIT of 3 means at most 3 fields are produced.
$cycle_code =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;
$close_day =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;
$first_date =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;
This regex part strips leading and trailing whitespace from each variable.
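To see that the long regex really is just a trim, here is a small comparison against the more common s/^\s+|\s+$//g idiom on a few sample strings. The samples are made up, so this is a sanity check rather than a proof.

```perl
use strict;
use warnings;

my @samples = ("  15  ", "2015-02-13", "  two  words ", "", "   ");
my @mismatches;

for my $s (@samples) {
    # the regex from the loop: capture everything between the outer whitespace
    (my $original = $s) =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;
    # the usual trim idiom
    (my $simple = $s) =~ s/^\s+|\s+$//g;
    push @mismatches, $s if $original ne $simple;
    printf "%-16s -> '%s' / '%s'\n", "'$s'", $original, $simple;
}

print @mismatches ? "differences found\n" : "same result for every sample\n";
```

Note that interior whitespace ("two  words") is kept by both forms, which is why the original author's (?:\s+\S+)* part adds nothing beyond a plain trim.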
My god, it's been such a long time since I've read Perl... but I'll give it a shot.
You grab a record from @tmp_cycledef, chomp off the newline at the end, and split it up into the three variables; then, like S.Mark said, each substitution regex strips off the leading and trailing whitespace from each of the three variables. Finally, the values get stored in a hash as an array reference, with some debugging code commented out right above it.
hth
Your query gives a set of rows that are stored in the array @tmp_cycledef.
We iterate over each row in the result using foreach (@tmp_cycledef).
The result rows might have a trailing newline character; we get rid of it using chomp.
Next we split the row (which is now in $_) on the pipe and assign the first 3 pieces to $cycle_code, $close_day and $first_date respectively.
The split pieces might have leading and trailing whitespace; the next 3 lines remove the leading and trailing whitespace from the 3 variables.
Finally we make an entry in the hash %cycledef. The key used is $cycle_code, and the value is a reference to an array whose first element is $close_day and whose remaining elements are the pieces obtained by splitting $first_date on the hyphen.