Perl quotes surrounded by only string in array - perl

I need to put single quotes around every array element that is not a number.
I tried the following code, but it does not work. Can anyone help me sort it out?
$data = join ',', map { /'\w+'/ } @$row[0..3];
Input/Output :
Input :
[1,string test, value test, 5]
Output:
(1,'string test', 'value test', 5)

To put '' around elements that contain no digits at all:
my $data = join ',', map { /[0-9]/ ? $_ : "'${_}'" } @$row[0..3];
Here a string such as 'string 10 test' doesn't get quoted, because it contains a digit.
Or, to leave only pure integers unquoted:
my $data = join ',', map { /[^0-9]/ ? "'${_}'" : $_ } @$row[0..3];
This also quotes strings that have a number in them, like the example above.
For non-integer numbers, there is Scalar::Util::looks_like_number:
use Scalar::Util 'looks_like_number';
my $data = join ',', map { looks_like_number($_) ? $_ : "'${_}'" } @$row[0..3];
which of course works for the second case (integers) as well.
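As a minimal sketch of the looks_like_number variant run against sample data: the helper name quote_non_numbers and the extra elements '3.14' and 'string 10 test' are assumptions added for illustration, not part of the question. Embedded single quotes are not escaped here.

```perl
use strict;
use warnings;
use Scalar::Util 'looks_like_number';

# Hypothetical helper: quote every element that does not look like a number.
sub quote_non_numbers {
    my @items = @_;
    return join ',', map { looks_like_number($_) ? $_ : "'$_'" } @items;
}

# Sample row as an array reference, as in the question (extra elements assumed).
my $row = [ 1, 'string test', 'value test', 5, '3.14', 'string 10 test' ];
print '(', quote_non_numbers(@$row), ")\n";
# prints (1,'string test','value test',5,3.14,'string 10 test')
```

A string that merely contains digits is still quoted, while a decimal such as 3.14 is left bare.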

Related

Extracting info from file rows into columns using whatever it works (PERL, SED, AWK)

Maybe I'm too old for perl/awk/sed, too young to stop programming.
Here is the problem I need to solve:
I have info like this in a TXT file:
Name:
Name 1
Phone:
1111111
Email:
some#email1
DoentMatterInfo1:
whatever1
=
Name:
Name 2
Phone:
22222222
DoentMatterInfo2:
whatever2
Email:
some#email2
=
Name:
Name 3
DoentMatterInfo3:
whatever2
Email:
some#email3
=
Please note that the desired info is on the line after each label, that there is a record separator (=) and, very importantly, that some records don't have all the info but may have info that we don't want.
So, the challenge is to extract the desired info, if it exists, into output like:
Name 1 ; 1111111 ; some#email1
Name 2 ; 22222222 ; some#email2
Name 3 ; ; some#email3
Here is what I have tried; it worked a little, but it's still not what I'm looking for.
1. Using PERL
Using Perl I got the fields that matter:
while (<>) {
    if ($_ =~ /Name/) {
        print "=\n". scalar <>;
    }
    if ($_ =~ /Email/) {
        print "; ". scalar <>;
    }
    if ($_ =~ /Phone/) {
        print "; ". scalar <>;
    }
}
Then I got a file like:
Name 1
; 1111111
; some#email1
=
Name 2
; 22222222
; some#email2
=
Name:
Name 3
; some#email3
=
Now with sed I put each record on a single line:
SED
With sed, this command removes the line feeds, so all the info ends up on a single line:
sed ':a;N;$!ba;s/\n//g' input.txt > out1.txt
And this puts back a line feed between records:
sed 's/|=|/\n/g' out1.txt > out2.txt
So I got a file with one record per line:
Name 1 ; 1111111 ; some#email1
Name 2 ; 22222222 ; some#email2
Name 3 ; some#email3
That's still not what I would like to get. I want something better, such as filling a missing phone with a space so that the second column is always the phone column. Do you get it?
As you can see, the point is to find a solution, no matter whether it uses Perl, awk or sed. I'm trying Perl hashes...
Thanks in advance!!
Here is a Perl solution, asked for and attempted
use warnings;
use strict;
use feature 'say';
my @fields = qw(Name Phone Email); # fields to process
my $re_fields = join '|', map { quotemeta } @fields;
my %record;
while (<>) {
    if (/^\s*($re_fields):/) {
        chomp($record{$1} = <>);
    }
    elsif (/^\s*=/) {
        say join ';', map { $record{$_} // '' } @fields;
        %record = ();
    }
}
The input is prepared in the array @fields; this is the only place where those names are spelled out, so if more fields need to be processed, just add them here. A regex pattern for matching any one of these fields is also prepared, in $re_fields.
Then we read line by line all files submitted on the command line, using the <> operator.
The if condition captures an expected keyword if there. In the body we read the next line for its value and store it with the key being the captured keyword (need not know which one).
On a line starting with = the record is printed (correctly with the given sample file). I put nothing for missing fields (no spaces) and no extra spaces around ;. Adjust the output format as desired.
In order to collect records throughout and process further (or just print) later, add them to a suitable data structure instead of printing. What storage to choose depends on what kind of processing is envisioned. The simplest way to go is to add strings for each output record to an array
my (@records, %record);
while (<>) {
    ...
    elsif (/^\s*=/) {
        push @records, join ';', map { $record{$_} // '' } @fields;
        %record = ();
    }
}
Now @records has ready strings for all records, which can be printed simply as
say for @records;
But if more involved processing may be needed then better store in an array copies of %record as hash references, so that individual components can later be manipulated more easily
my (@records, %record);
while (<>) {
    ...
    elsif (/^\s*=/) {
        # Add a key to the hash for any fields that are missing
        $record{$_} //= '' for @fields;
        push @records, { %record };
        %record = ();
    }
}
I add a key for possibly missing fields, so that the hashrefs have all expected keys, and I assign an empty string to it. Another option is to assign undef.
Now you can access individual fields in each record as
foreach my $rec (@records) {
    foreach my $fld (sort keys %$rec) {
        say "$fld -> $rec->{$fld}"
    }
}
or of course just print the whole thing using Data::Dumper or such.
This will work using any awk in any shell on every UNIX box:
$ cat tst.awk
BEGIN { OFS=" ; " }
$0 == "=" {
    print f["Name:"], f["Phone:"], f["Email:"]
    delete f
    lineNr = 0
    next
}
++lineNr % 2 { tag = $0; next }
{ f[tag] = $0 }
.
$ awk -f tst.awk file
Name 1 ; 1111111 ; some#email1
Name 2 ; 22222222 ; some#email2
Name 3 ; ; some#email3
I would do it like this:
$ cat prog.awk
#!/bin/awk -f
BEGIN { OFS = ";" }
/^(Name|Phone|Email):$/ { getline arr[$0] ; next }
/^=$/ { print arr["Name:"], arr["Phone:"], arr["Email:"] ; delete arr }
Explanation:
In the BEGIN block, define the output field separator (semicolon).
For each line in the input file, if the entire line equals Name: or Phone: or Email:, then read the following line and store it in the associative array arr, using the label line as the key. (That is how getline can be used to assign a value to a variable.) Then next skips the remaining rules for that line.
If the line is =, print the three values from the arr associative array, and then clear out the array (delete removes all its elements, so a field missing from the next record prints as an empty string).
* * * *
Make it executable:
chmod +x prog.awk
Use it:
$ ./prog.awk file.txt
Name 1;1111111;some#email1
Name 2;22222222;some#email2
Name 3;;some#email3
Note - a missing value is indicated by two consecutive semicolons (not by a space). Using space as placeholder for NULL is a common bad practice (especially in relational databases, but in flat files too). You can change this to use NULL as placeholder, I am not terribly interested in that bit of the problem.
Input file format is easy to parse: split on =\n into records, split each record on \n into a hash and push the hash into the @result array.
Then just output each element of the @result array, specifying the fields of interest.
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
my @result;
my $data = do { local $/; <DATA> };
my @records = split('=\n?',$data);
push @result, {split "\n", $_} for @records;
say Dumper(\@result);
my @fields = qw/Name: Phone: Email:/;
for my $record (@result) {
    $record->{$_} = $record->{$_} || '' for @fields;
    say join('; ', @$record{@fields});
}
__DATA__
Name:
Name 1
Phone:
1111111
Email:
some#email1
DoentMatterInfo1:
whatever1
=
Name:
Name 2
Phone:
22222222
DoentMatterInfo2:
whatever2
Email:
some#email2
=
Name:
Name 3
DoentMatterInfo3:
whatever2
Email:
some#email3
=
Output
$VAR1 = [
{
'DoentMatterInfo1:' => 'whatever1',
'Name:' => 'Name 1',
'Email:' => 'some#email1',
'Phone:' => '1111111'
},
{
'Phone:' => '22222222',
'Email:' => 'some#email2',
'Name:' => 'Name 2',
'DoentMatterInfo2:' => 'whatever2'
},
{
'DoentMatterInfo3:' => 'whatever2',
'Name:' => 'Name 3',
'Email:' => 'some#email3'
}
];
Name 1; 1111111; some#email1
Name 2; 22222222; some#email2
Name 3; ; some#email3

Use of hash of hashes for extracting data

I am new to Perl and am trying to understand hashes. I've tried using a basic hash and it's working. I am now trying to extract data using a hash of hashes. E.g. I have a text file (input.txt) that contains some random information. How can I extract the required information using a hash of hashes structure?
input.txt
hi how r you this is sample .txt. you can use it for learning hash and hash of hashes. Let say I have cell ("name") and it has value as below
cell ("name"){
pin : A, B;
function: A+B;
value: 0.435;
}
I want to extract cell data in following format.
Output
Cell Pin Value
name A 0.435
I tried this:
while(<$fh>)
{
if(/ cell \(/){$hash{cell} //=$_;}
elsif(/ pin \(/){$hash{pin} //=$_;}
elsif(/ value :/){$hash{value} //=$_;}
}
use Data::Dump;
dd \%hash;
This gives only one entry in the hash. How can I get all the matches available in the input file?
Firstly, you need some way to avoid the text commentary at the start of the file. You could just skip the first line, but then random text that appears elsewhere will mess things up. What would be better is to look for the relevant data but happily ignore any other text no matter where it appears.
Notice that the text commentary contains relevant-looking data: cell ("name") but there is no { on the end of the line. You could use that to distinguish between the commentary and the data but that's perhaps a little too flexible. Probably better to insist on the { as well as whitespace only before the cell declaration.
Once inside a cell, it's reasonable to insist on having no comments. We can then just iteratively read lines and split on the ":" until we reach the }. Combined with some general advice;
Separate regex definition from regex use.
Test your matches before using the capture variables; and
Use 'extended mode' regexes which allow whitespace in the regex
This all gives us;
#!/usr/bin/env perl
use v5.12;
use Data::Dumper qw(Dumper);
my $cell_name_re = qr/ ^ \s* cell \s* \( \s* "(\w+)" \) \s* \{ /x;
my $cell_data_re = qr/ ^ \s* ([^:]+) : (\N+) \n /x;
my $closing_curly_re = qr/ ^ \s* \} /x;
my %data ;
while (<>) {
    next unless /$cell_name_re/ ;
    my $cell_name = $1 ;
    my %cell_hash ;
    while (<>) {
        if ( /$cell_data_re/ ) {
            $cell_hash{ $1 } = $2 ;
        }
        elsif ( /$closing_curly_re/ ) {
            $data{ $cell_name } = \%cell_hash ;
            last ; # exit the inner loop
        }
        else {
            warn "Don't understand line $. - ignoring" ;
        }
    }
}
print Dumper( \%data );
exit 0;
There are two key things here: firstly, %cell_hash is declared inside the outer loop, which ensures we get a new %cell_hash each time through; and secondly, when we insert %cell_hash into the global %data we take a reference to it with \. Running it on the input data above yields:
{
'name' => {
'function' => ' A+B;',
'value' => ' 0.435;',
'pin ' => ' A, B;'
}
};
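The dumped keys and values still carry stray spaces and trailing semicolons ('pin ' and ' 0.435;'). A minimal sketch of one way to tidy them and print the table the question asks for follows. The helper clean_field and the column widths are assumptions, and the whole pin list is printed because the question does not say how to pick a single pin.

```perl
use strict;
use warnings;

# Hypothetical helper: strip surrounding whitespace and one trailing semicolon.
sub clean_field {
    my ($text) = @_;
    $text =~ s/^\s+//;
    $text =~ s/\s*;?\s*$//;
    return $text;
}

# Same shape as the dumped structure above, including the stray spaces.
my %data = (
    name => { 'pin ' => ' A, B;', 'function' => ' A+B;', 'value' => ' 0.435;' },
);

# Rebuild each inner hash with cleaned keys and values.
my %clean;
for my $cell (keys %data) {
    for my $key (keys %{ $data{$cell} }) {
        $clean{$cell}{ clean_field($key) } = clean_field($data{$cell}{$key});
    }
}

# Print one row per cell: name, pin list, value.
printf "%-6s %-6s %s\n", 'Cell', 'Pin', 'Value';
for my $cell (sort keys %clean) {
    printf "%-6s %-6s %s\n", $cell, $clean{$cell}{pin}, $clean{$cell}{value};
}
```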

Perl join adds leading semicolon?

We loop through a database and push each row to an array
while (($carriergw) = $sth->fetchrow_array) {
    if ($rows >= 1) {
        push(@gwlist, $carriergw);
    }
    else {
        push(@gwlist, -1);
    }
}
This yields the array (0 10) for example. When I try to join the elements
with a semicolon between them:
join(';', @gwlist)
the join function adds a leading semicolon (i.e. ;10;0). What we need is just 10;0. How
do I get the list without any leading or trailing separators?
Your array @gwlist has an empty string or undef for its first element. How do you declare it? I think you have written
my @gwlist = undef;
If you write
my @gwlist;
push @gwlist, 10;
push @gwlist, 0;
print join ';', @gwlist;
then you will get 10;0 for output. You need to investigate where that first element came from.
By the way, your while loop is better written as
while (my ($carriergw) = $sth->fetchrow_array) {
    push @gwlist, $rows > 0 ? $carriergw : -1;
}
}
but the test on $rows is almost certainly unnecessary. You don't say where its value comes from, but it looks like you want to push a single -1 if no rows were retrieved from the table. If that is the case then the while loop will never be entered, so not even the -1 will be added to the array.
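If the intent really is a single -1 when no rows come back, one option is to add it after the loop. The sketch below is only an illustration: make_fetcher is a hypothetical stand-in for $sth->fetchrow_array so the example runs without a database, and collect_gateways is an assumed name.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $sth->fetchrow_array: returns one value per call,
# then an empty list once the rows run out.
sub make_fetcher {
    my @rows = @_;
    return sub { return @rows ? (shift @rows) : () };
}

sub collect_gateways {
    my ($fetch) = @_;
    my @gwlist;
    while (my ($carriergw) = $fetch->()) {
        push @gwlist, $carriergw;
    }
    # The loop body never runs for an empty result, so add the fallback here.
    push @gwlist, -1 unless @gwlist;
    return @gwlist;
}

print join(';', collect_gateways(make_fetcher(10, 0))), "\n";   # prints 10;0
print join(';', collect_gateways(make_fetcher())), "\n";        # prints -1
```

The list assignment in the while condition is true as long as a row is returned, even when the value itself is 0.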
There is probably an empty element, an undef or just whitespace as the first element ($gwlist[0]) of @gwlist. To get around it, you may shift the first element off, or use an array slice:
shift @gwlist;
join ';', @gwlist;
Or:
join ';', @gwlist[1..$#gwlist]
my @l = (undef, 0, 10);
print join(";", @l), "\n";
Gives:
;0;10
If you use warnings it also says:
Use of uninitialized value $l[0] in join or string at test.pl line 5
You can prevent this by filtering:
print join(";", grep { defined $_ } @l), "\n";

Convert a string into a hash in Perl using split()

$hashdef = "Mouse=>Jerry, Cat=>Tom, Dog=>Spike";
%hash = split /,|=>/, $hashdef;
print "$_=>$hash{$_}" foreach(keys %hash);
Mouse=>JerryDog=>SpikeCat=>Tom
I am new to Perl. Can anyone explain the regular expression inside the split function? I know that | is used to choose between two alternatives, but I am still confused.
%hash = split /|=>/, $hashdef;
I get the output
S=>pe=>J=>eT=>or=>rm=>,y=>,u=>sM=>og=>D=>oC=>ai=>kt
%hash = split /,/, $hashdef;
Mouse=>Jerry=>Cat=>TomDog=>Spike=>
Please explain the above condition.
split's first argument defines what separates the elements you want.
/,|=>/ matches a comma (,) or an equals sign followed by a greater-than sign (=>). They're just literals here, there's nothing special about them.
/|=>/ matches the zero-length string or an equals sign followed by a greater-than sign, and splitting on a zero-length string just splits a string up into individual characters; therefore, in your hash, M will map to o, u will map to s, etc. They appear jumbled up in your output because hashes don't have a definite ordering.
/,/ just splits on a comma. You're creating a hash that maps Mouse=>Jerry to Cat=>Tom and Dog=>Spike to nothing.
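One detail worth seeing in practice: splitting on a bare comma keeps the space that follows it, so the keys for Cat and Dog start with a space. A small sketch follows; the variant pattern with optional whitespace is an assumption about what is wanted, not something from the question.

```perl
use strict;
use warnings;

my $hashdef = "Mouse=>Jerry, Cat=>Tom, Dog=>Spike";

# Splitting on a bare ',' or '=>' leaves the space after each comma in the key.
my %raw = split /,|=>/, $hashdef;

# Allowing optional whitespace around either separator removes it.
my %hash = split /\s*(?:,|=>)\s*/, $hashdef;

print join(', ', map { "[$_]" } sort keys %raw), "\n";
# prints [ Cat], [ Dog], [Mouse]
print join(', ', map { "[$_]" } sort keys %hash), "\n";
# prints [Cat], [Dog], [Mouse]
```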
$hashdef = "Mouse=>Jerry, Cat=>Tom, Dog=>Spike";
my %hash = eval( "( $hashdef )" );
print $hash{'Mouse'}."\n";
eval executes a string as a Perl expression. This doesn't use split, but I think would be a good way to handle the case outlined in your post of getting a hash from your string, seeing as your string happens to be well formed Perl, so I've added it here.
sub hash2string {
    my $href = $_[0];
    my $hstring = "";
    foreach (keys %{$href}) {
        $hstring .= "$_=>$href->{$_}, ";
    }
    return substr($hstring, 0, -2);
}
sub string2hash {
    my %lhash;
    my @lelements = split(/, /, $_[0]);
    foreach (@lelements) {
        my ($skey,$svalue) = split(/=>/, $_);
        $lhash{$skey} = $svalue;
    }
    return %lhash;
}

Instr Equivalent in perl?

A variable named RestrictedNames holds the list of restricted user names. SplitNames is an array variable that holds the complete set of user names. Now I have to check whether the current name is found in the RestrictedNames variable, as I would with instr.
@SplitNames = ("naag algates","arvind singh","abhay avasti","luv singh","new algates") and now I want to block all the names whose surname is "singh", "algates", etc.
@SplitNames = ("naag algates","arvind singh","abhay avasti","luv singh","new algates");
$RestrictedNames="tiwary singh algates n2 n3 n4 n5 n6";
for(my $i=0;$i<@SplitNames;$i++)
{
    if($RestrictedNames =~ m/^$SplitNames[$i]/ ) # google'd this condition, still fails
    {
        print "$SplitNames[$i] is a restricted person";
    }
}
You should modify this line:
if($RestrictedNames =~ m/^$SplitNames[$i]/ )
to
if($RestrictedNames =~ m/$SplitNames[$i]/ )
^ looks for a match from the beginning.
For more details about perl metacharacters, see here
EDIT:
If you need blocking based on surnames, try this code in the for-loop body.
my @tokens = split(' ', $SplitNames[$i]); # splits name on basis of spaces
my $surname = $tokens[$#tokens]; # takes the last token
if($RestrictedNames =~ m/$surname/ )
{
    print "$SplitNames[$i] is a restricted person\n";
}
Don't try dealing with a string of restricted names, deal with an array.
Then just use the smart match operator (~~ or two tilde characters) to see if a given string is in it.
#!/usr/bin/perl
use v5.12;
use strict;
use warnings;
my $RestrictedNames="n1 n2 n3 n4 n5 n6 n7 n8 n9";
my @restricted_names = split " ", $RestrictedNames;
say "You can't have foo" if 'foo' ~~ @restricted_names;
say "You can't have bar" if 'bar' ~~ @restricted_names;
say "You can't have n1" if 'n1' ~~ @restricted_names;
say "You can't have n1a" if 'n1a' ~~ @restricted_names;
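The smart match operator has been marked experimental since Perl 5.18, so it warns on newer interpreters. As a hedged alternative, the same membership test can be sketched with any from List::Util (available in List::Util 1.33 and later); the loop over three sample names is an assumption for illustration.

```perl
use strict;
use warnings;
use List::Util 'any';

my @restricted_names = split ' ', "n1 n2 n3 n4 n5 n6 n7 n8 n9";

# Exact string comparison against each restricted name.
for my $name (qw(foo n1 n1a)) {
    my $hit = any { $_ eq $name } @restricted_names;
    print "You can't have $name\n" if $hit;
}
# prints: You can't have n1
```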
Try something like below using Hash Slice:
my @users = ( "n10", "n12", "n13", "n4", "n5" );
my @r_users = ( "n1", "n2", "n3", "n4", "n5", "n6", "n7", "n8", "n9" ) ;
my %check;
@check{@r_users} = ();
foreach my $user ( @users ) {
    if ( exists $check{$user} ) {
        print"Restricted User: $user \n";
    }
}
Most idiomatic way would be to create a hash of the restricted names, then split the surname from the name and check if the surname is in the hash.
use strict;
use warnings;
my @SplitNames = ("naag algates","arvind singh","abhay avasti","luv singh","new algates");
my $RestrictedNames = "tiwary singh algates n2 n3 n4 n5 n6";
# Create hash of restricted names
my %restricted;
map { $restricted{$_}++ } split(' ', $RestrictedNames);
# Loop over names and check if surname is in the hash
for my $name (@SplitNames) {
    my $surname = (split(' ', $name))[-1];
    if ( $restricted{$surname} ) {
        print "$name is a restricted person\n";
    }
}
Please note that the split function normally takes a RegEx. However using ' ' with split is a special case. It splits on any length of whitespace, and also ignores any leading whitespace, so it's useful for splitting strings of individual words.
FYI, the equivalent to instr in perl is to use index($string, $substring). If $substring does not occur inside $string it will return -1. Any other value means $string contains $substring. However, when comparing lists it's much less hassle to use a hash like I have shown above... and unlike index, it won't match 'joyce' when you really only meant to match 'joy'.
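A short sketch of index in use, including the partial-match pitfall mentioned above. The helper name contains and the sample strings are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $substring occurs anywhere in $string.
sub contains {
    my ($string, $substring) = @_;
    return index($string, $substring) != -1;
}

my $restricted = "tiwary singh algates";

print index($restricted, 'singh'), "\n";    # prints 7 (zero-based position)
print index($restricted, 'avasti'), "\n";   # prints -1 (not found)

# A substring test also matches inside longer words:
print "partial match\n" if contains($restricted, 'sing');
```

Positions are zero-based, so a match at the very start of the string returns 0, which is why the test compares against -1 rather than checking for a true value.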