Perl advice - Taking in a file and changing contents - perl

I have a request for some advice on how to approach this script I want to write in Perl. Basically I have a file that looks like :
id: 1
Relationship: ""
name: shelby
pet: 1
color:4
There are certain keywords such as pet and color that have numbers after then. I want to be able to take in a file and look for these keywords (there are 5 or 6 of them) and then change the number to the word that number corresponds to. That is to say for the keyword "Pet"---> 0 =dog, 1 = cat, 2=fish. And for the keyword "color" 0 = red, 1=blue,2=purple,3=brown,4=white. The script should find and change these numbers. The goal should be an output file that looks like:
id: 1
Relationship: ""
name: shelby
pet: cat
color:white
I've been struggling with how to do this for a while. I looked up online maybe I could do an array of hashes or something but I'm relatively new to Perl and don't know exactly how to even approach this problem.... Any advice would be much appreciated!
Thanks

If we're talking about a small set of values, you could have a hash of arrays:
%lookups = ( pet => [ "dog", "cat", "fish" ],
color => [ "red", "blue", "purple", "brown", "white" ] );
Then, when you're reading the file, check each keyword against the hash. If it has a key with that keyword in it, replace the value in the line you read with the value from the hash.

Usage: script.pl file.txt > output.txt
use strict;
use warnings;
my %tags = (
"pet" => [ qw(dog cat fish) ],
"color" => [ qw(red blue purple brown white) ],
);
my $rx = join '|', keys %tags;
while (<>) {
s/^\s*($rx):\s*(\d+)/$1: $tags{$1}[$2]/;
print;
}

This should do it
use strict;
my $inputFileName = 'E:\test.txt';
my $outputFileName = 'E:\test2.txt';
my %Colors = ( 1 => 'Red' , 2 => 'Green' , 4 => 'Blue' );
my %Pets = ( 1 => 'Dog' , 2 => 'Cat' );
open( IN , "<" , $inputFileName) or die "$inputFileName could not be opened $!";
open( OUT, ">" , $outputFileName) or die "$outputFileName could not be opened $!";
while(<IN>)
{
my $line = $_;
if (/^(\s*pet\s*:\s*)(\d+)/ )
{
$line = $1. $Pets{$2} . "\n";
}
elsif (/\s*^color\s*:\s*(\d+)/ )
{
$line = $1. $Colors{$2} . "\n";
}
print OUT $line;
}
close(IN);
close(OUT);

Using zigdon's suggestion
#!/usr/bin/perl
use strict;
use warnings;
use 5.014;
my %param = (pet => [qw/dog cat fish/],
color => [qw/ red blue purple brown white/],
);
while (<DATA>) {
if (/^(pet|color):\s*(\d)$/) {
print "$1: $param{ $1 }[$2]\n";
}
else {
print;
}
}
__DATA__
id: 1
Relationship: ""
name: shelby
pet: 1
color:4

If there are not many cases, you can try something like this, to be run with perl -p:
if (/^id/)
{
s/\d+/%h=(1=>"dog",2=>"warf",3=>"ee");$h{$&}/e;
}
if (/^other/)
{
s/\d+/%h=(1=>"other_thing",3=>"etc",4=>"etc2");$h{$&}/e;
}
EDIT:
To automate the tests, you can do something like (also taking the idea of hashes from zigdon):
my #interesting_tags = ("color", "pet");
my $regexp = "(" . join("|" , #interesting_tags) . ")";
my %lookups = ( pet => [ "dog", "cat", "fish" ],
color => [ "red", "blue", "purple", "brown", "white" ] );
while (<>)
{
if (/$regexp/)
{
my $element = $&;
s/\d+/$lookups{$element}[$&]/e;
}
}

Related

Get key details from the value through grep

I am trying to find the key name as output by matching $country_value variable in grep through the hash I have.
#!/usr/bin/perl -w
use strict;
use warnings;
my $country_value = 1;
my $country = {
'IN' => [
1,
5
],
'US' => [
2,
6
],
'UK' => [
3,
7
]
};
my $country_details = grep { $_ eq $country_value } values %{$country};
print $country_details;
print "\n";
As per the hash, I need to get the output as IN because the value of IN is 1 and the $country_value is 1, which is what I am trying to find out.
But, I get the output as 0 instead of IN.
Can someone please help?
In your code, values returns a reference to an array. You need to dereference that to get a list for grep.
use warnings;
use strict;
my $country_value = 1;
my $country = {
'IN' => [
1,
5
],
'US' => [
2,
6
],
'UK' => [
3,
7
]
};
my $country_details;
for my $name (keys %{$country}) {
if (grep { $_ == $country_value } #{ $country->{$name} }) {
$country_details = $name;
last;
}
}
print $country_details, "\n";
Prints:
IN

How to find strings that contain a substring from a given list

I have a string that may or may not contain some specific words.
IF it contain the one of the works I want to print the string in a different color (depending on the word)
So I was thinking to have an array containing the list of the words (e.g. one for red word one for yellow and one for green as the example below:
push(#red_word, [ "error","ERROR","Assertion","assertion","Error","ASSERTION","Errors" ]);
push(#yellow_word, [ "WARNING","Warning","warning","PAUSED","Paused","paused","Warnings" ]);
push(#green_word, [ "ACTIVE","Active","active" ]);
$l is the string i want to check, I tried something like this
foreach my $l (#$lines) {
if ($l =~ #red_word) {
print '<FONT COLOR="FF0000">'.$l.'</FONT><br>';
}
else {
if ($l =~ #yellow_word) {
print '<FONT COLOR="FFFF00">'.$l.'</FONT><br>';
}
else {
if ($l =~ #green_word) {
print '<FONT COLOR="008000">'.$l.'</FONT><br>';
}
else {
print '<FONT COLOR="000000">'.$l.'</FONT><br>';
}
}
}
}
but the result is erratic, some lines are printed in red without any relation to the list red_word.
what am I doing wrong?
This isn't doing what you think it's doing:
push(#red_word, [ "error","ERROR","Assertion","assertion","Error","ASSERTION","Errors" ]);
push(#yellow_word, [ "WARNING","Warning","warning","PAUSED","Paused","paused","Warnings" ]);
push(#green_word, [ "ACTIVE","Active","active" ]);
You're creating a two dimensional data structure a single element array, containing a nested array.
$VAR1 = [
[
'error',
'ERROR',
'Assertion',
'assertion',
'Error',
'ASSERTION',
'Errors'
]
];
That match isn't going to work very well as a result. I'm not actually sure what it'll be doing, but it won't be testing 'if the word is in the list'.
Try instead building a regular expression from your array:
my #red_words = (
"error", "ERROR", "Assertion", "assertion",
"Error", "ASSERTION", "Errors"
);
my $is_red = join( "|", map {quotemeta} #red_words );
$is_red = qr/($is_red)/;
print "Red" if $line =~ m/$is_red/;
Perhaps something like this:
#!/usr/bin/env perl
use strict;
use warnings;
my %colour_map = (
'error' => 'FF0000',
'errors' => 'FF0000',
'assertion' => 'FF0000',
'warning' => 'FFFF00',
'warnings' => 'FFFF00',
'paused' => 'FFFF00',
'active' => '008000',
);
my $search = join( "|", map {quotemeta} keys %colour_map );
$search = qr/\b($search)\b/;
my #lines = (
"line containing assertion",
"a warning",
"green for active",
"A line containing ACTIVE"
);
foreach my $line (#lines) {
if ( my ($word) = $line =~ m/$search/ ) {
print "<FONT COLOR=\"$colour_map{lc($word)}\">$line</FONT><BR/>\n";
}
else {
print "<FONT COLOUR=\"000000\">$line</FONT><BR/>\n";
}
}
(Not entirely sure if there's a way to tranpose a list of matches. I'll have another think).

how to assign lines of variable length to a variable in perl?

I have a file I want to read that has a variable number of ids for each location that looks like this:
loc1 id1 id4 id5 id9
loc2 id2
loc3 id1 id11 id23
I would like to store this as follows locs(loc) = {all ids belonging to that location}
So that later, when I read another file I can do something like
if (grep id, locs(loc)){do something}
I tried to do this using a hash, but this is not working. I tried:
open my $loclist, '<', $ARGV[0];
my %locs;
while (<$loclist>) {
my #loclist_rec = split;
my $loclist_loc = #rlist_rec[0];
$locs{$loclist_loc} = #loclist_rec;
}
but this isnt working.
I new to perl and still trying to understand the different datatypes.
Any ideas? Thanks a lot!
This should do what you want.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
open my $loclist, '<', "test.txt" or die $!;
my %locs;
while (<$loclist>) {
my ($loclist_loc, #loclist_rec) = split;
$locs{$loclist_loc} = \#loclist_rec;
}
print Dumper \%locs;
OUTPUT:
$ perl test.pl
$VAR1 = {
'loc2' => [
'id2'
],
'loc1' => [
'id1',
'id4',
'id5',
'id9'
],
'loc3' => [
'id1',
'id11',
'id23'
]
};
Also a possible choice would be a hash of hashes. When you want to look up an id, you could say if ($locs{$loc}{$id}) {do something}. The data structure would be
$VAR1 = {
'loc2' => {
'id2' => 1
},
'loc1' => {
'id1' => 1,
'id5' => 1,
'id4' => 1,
'id9' => 1
},
'loc3' => {
'id1' => 1,
'id11' => 1,
'id23' => 1
}
};

Perl Hash Comparison

I have two "different" files with the same kind of data i.e.
KEY_gl Start_gl End_gl
1 114029 17
2 284 1624
3 1803 2942
4 3070 3282
5 3295 4422
KEY_gm Start_gm End_gm
1 115000 17
2 284 1624
3 1803 2942
4 3070 3282
5 3295 4422
I have saved these two different files in "hash" . The "Key" column is the key and the start and end are the values for these two different keys.
I have written a code to compare these two hashes and print out the "similar" and "non similar" keys from the files.
The Code is
my %hash_gl = ();
my %hash_gm = ();
open( my $fgl, "/home/gaurav/GMR/new_gl.txt" ) or die "Can't open the file";
while ( my $line_gl = <$fgl> ) {
chomp $line_gl;
my ( $key_gl, $start_gl, $end_gl ) = split( "\t", $line_gl );
$hash_gl{$key_gl} = [ $start_gl, $end_gl ];
}
while ( my ( $key_gl, $val_gl ) = each %hash_gl ) {
#print "$key_gl => #{$val_gl}\n";
}
open( my $fgm, "/home/gaurav/GMR/new_gm.txt" ) or die "Can't open the file";
while ( my $line_gm = <$fgm> ) {
chomp $line_gm;
my ( $key_gm, $start_gm, $end_gm ) = split( "\t", $line_gm );
$hash_gm{$key_gm} = [ $start_gm, $end_gm ];
}
while ( my ( $key_gm, $val_gm ) = each %hash_gm ) {
#print "$key_gm => #{$val_gm}\n";
}
for ( sort keys %hash_gl ) {
unless ( exists $hash_gm{$_} ) {
print "$_: not found in second hash\n";
next;
}
if ( $hash_gm{$_} == $hash_gl{$_} ) {
print "$_: values are equal\n";
} else {
print "$_: values are not equal\n";
}
}
Kindly tell the errors in this as I am not getting the desired output.Also , I am sorry that I am new to this forum so I am not able to do the editing correctly.
After reading your files, the two hashes look like this. I created the output using Data::Dump's function dd.
my %hash_gl = (
1 => [ 114029, 17 ],
2 => [ 284, 1624 ],
3 => [ 1803, 2942 ],
4 => [ 3070, 3282 ],
5 => [ 3295, 442 ],
KEY_gl => [ "Start_gl", "End_gl" ],
);
my %hash_gm = (
1 => [ 115000, 17 ],
2 => [ 284, 1624 ],
3 => [ 1803, 2942 ],
4 => [ 3070, 3282 ],
5 => [ 3295, 4422 ],
KEY_gm => [ "Start_gm", "End_gm" ],
);
As you can see, the values are array refs. You put them in array refs when saying $hash_gl{$key_gl} == [ $start_gl, $end_gl ]; (and the same for gm).
When you compare the two, you are using ==, which is used for numerical comparison. If you print one of the $hash_gm{$_} values, you will get something like this:
ARRAY(0x3bb114)
That's because it's an array ref. You cannot compare those using ==.
You now have two possibilities:
you can do the comparison yourself; to do that, you need to go into the array ref and compare each value:
if ( $hash_gm{$_}->[0] == $hash_gl{$_}->[0]
&& $hash_gm{$_}->[1] == $hash_gl{$_}->[1] )
{
print "$_: values are equal\n";
} else {
print "$_: values are not equal\n";
}
you can install and use Array::Utils
use Array::Utils 'array_diff';
# later...
if (! array_diff( #{ $hash_gm{$_} }, #{ $hash_gl{$_} } )) {
print "$_: values are equal\n";
} else {
print "$_: values are not equal\n";
}
I would go with the first solution as that is more readable because you do not need the dereferencing and the effort to install a module just to save half a line of code is not worth it.
Assuming that you want to compare the values, say the start position, here's how I'd do it:
use warnings;
use strict;
open my $in, '<', '1.txt' or die "$!\n";
open my $in2, '<', '2.txt' or die "$!\n";
my (%hash1, %hash2);
while (<$in>){
chomp;
next unless /^\s+/;
my ($key, $start, $stop) = /\s+(\d+)\s+(\d+)\s+(\d+)/;
$hash1{$key} = [$start, $stop];
}
while (<$in2>){
chomp;
next unless /^\s+/;
my ($key, $start, $stop) = /\s+(\d+)\s+(\d+)\s+(\d+)/;
$hash2{$key} = [$start, $stop];
}
for my $key (sort keys %hash1){
if (#{$hash1{$key}}[0] == #{$hash2{$key}}[0]){
print "start matches: file1 #{$hash1{$key}}[0]\tfile2 #{$hash2{$key}}[0]\n";
}
else {print "start doesn't match: file1 #{$hash1{$key}}[0]\t file2 #{$hash2{$key}}[0]\n"};
}
#!/usr/bin/perl
use warnings;
use strict;
use feature 'say';
my %hash_gl = (
1 => [ 114029, 17 ],
2 => [ 284, 1624 ],
3 => [ 1803, 2942 ],
4 => [ 3070, 3282 ],
5 => [ 3295, 442 ],
);
my %hash_gm = (
1 => [ 115000, 17 ],
2 => [ 284, 1624 ],
3 => [ 1803, 2942 ],
4 => [ 3070, 3282 ],
5 => [ 3295, 4422 ],
);
sub check_hash_size {
my $hash_gl = shift;
my $hash_gm = shift;
if ((keys %$hash_gl) != (keys %$hash_gm)) {
say "the hashes are 2 different sizes";
}
else
{
say "the hashes are the same size";
}
}
sub diag_hashes {
my $hash_gl = shift;
my $hash_gm = shift;
for my $gl_key ( keys %$hash_gl ) {
if ( (scalar #{$$hash_gl{$gl_key}}) != (scalar #{$$hash_gm{$gl_key}}) ) {
say "$gl_key entry arrays are different sizes";
}
else
{
say "arrays are the same size for key $gl_key";
}
if ( ((scalar #{$$hash_gl{$gl_key}}) or (scalar #{$$hash_gm{$gl_key}})) > 2 ) {
say "$gl_key entry array exceeds 2 values";
}
if ($$hash_gl{$gl_key}[0] eq $$hash_gm{$gl_key}[0]) {
say "$gl_key start is the same in both hashes";
}
else
{
say "** key $gl_key start is different";
}
if ($$hash_gl{$gl_key}[1] eq $$hash_gm{$gl_key}[1]) {
print "$gl_key end is the same in both hashes","\n";
}
else
{
say "** key $gl_key end is different";
}
}
}
check_hash_size( \%hash_gl ,\%hash_gm);
diag_hashes( \%hash_gl ,\%hash_gm);

List of paths into hash array tree in Perl

I got an array of paths
C:\A
C:\B\C
D:\AB
and I'd like to have these in a hash array tree so I can go through them in a TT2 template.
What I mean is like this:
#dirs = [
{
name => "C:",
subs => [
{
name => "A",
subs => [],
},
{
name => "B",
subs => [
{
name => "C",
subs => [],
}
],
}
]
},
{
name => "D:",
subs => [
{
name => "AB",
subs => [],
}
],
}
]
I also know that I'm probably doing brainderp here so I'm open to other approaches, only requirement is turning that list of paths into something you can rebuild as a tree with the TT2 Template Toolkit.
Also what's that structure called? I just thought of hash array tree but I bet that's wrong.
Here's a very short approach. Note that this can only be so simple because I changed your data format to a hash of hashes which perfectly matches your tree structure. See the code below to transform the resulting structure to yours.
my $tree = {root => {}};
foreach my $input (<DATA>) { chomp $input;
my $t = $tree;
$t = $t->{$_} //= {} for split /\\/ => $input;
}
use Data::Dumper; print Dumper $tree;
__DATA__
C:\A
C:\B\C
D:\AB
C:\B\A
C:\B\A\C
Output:
$VAR1 = {
'C:' => {
'A' => {},
'B' => {
'A' => {
'C' => {}
},
'C' => {}
}
},
'D:' => {
'AB' => {}
}
};
To transform this data structure into yours, simply use this code:
sub transform {
my $tree = shift;
my #children = ();
while (my ($name, $children) = each %$tree) {
push #children, {
name => $name,
subs => [ transform($children) ],
}
}
return #children;
}
my $AoH_tree = {name => 'root', subs => [transform($tree)] };
Done. :) For a completely different approach with much more sugar, power and readability, but much more LOC, see my other answer.
This is a longer but much more readable and more comfortable solution. You don't have to (and probably don't want to) use this, but maybe it can help (not only you) to learn more about different approaches. It introduces a small Moo class for tree nodes which can add names recursively to itself with readable sorting and stringification methods.
Edit: for a completely different and extremely short alternative, see my other answer. I divided it up in two answers because they are completely different approaches and because this answer is already long enough. ;)
Tree class
Note this is basically no more than your nested AoHoAoH... structure - with a litte bit sugar added. ;)
# define a tree structure
package Tree;
use Moo; # activates strict && warnings
use List::Util 'first';
# name of this node
has name => (is => 'ro');
# array ref of children
has subs => (is => 'rw', isa => sub { die unless ref shift eq 'ARRAY' });
Now after the basic preparations (our objects have one scalar name and one array ref subs) we come to the main part of this answer: the recursive add_deeply method. Note that from here everything reflects the recursive nature of your data structure:
# recursively add to this tree
sub add_deeply {
my ($self, #names) = #_;
my $next_name = shift #names;
# names empty: do nothing
return unless defined $next_name;
# find or create a matching tree
my $subtree = first {$_->name eq $next_name} #{$self->subs};
push #{$self->subs}, $subtree = Tree->new(name => $next_name, subs => [])
unless defined $subtree;
# recurse
$subtree->add_deeply(#names);
}
The following two methods are not that important. Basically they are here to make the output pretty:
# sort using node names
sub sort {
my $self = shift;
$_->sort for #{$self->subs}; # sort my children
$self->subs([ sort {$a->name cmp $b->name} #{$self->subs} ]); # sort me
}
# stringification
use overload '""' => \&to_string;
sub to_string {
my $self = shift;
my $prefix = shift // '';
# prepare
my $str = $prefix . '{TREE name: "' . $self->name . '"';
# stringify children
if (#{$self->subs}) {
$str .= ", children: [\n";
$str .= $_->to_string(" $prefix") for #{$self->subs};
$str .= "$prefix]";
}
# done
return $str . "}\n";
}
How to use this
Now comes the simple part. Just read the input (from __DATA__ here) and add_deeply:
# done with the tree structure: now use it
package main;
# parse and add names to a tree
my $tree = Tree->new(name => 'root', subs => []);
foreach my $line (<DATA>) {
chomp $line;
$tree->add_deeply(split /\\/ => $line);
}
# output
$tree->sort;
print $tree;
__DATA__
C:\A
C:\B\C
D:\AB
C:\B\A
C:\B\A\C
Output:
{TREE name: "root", children: [
{TREE name: "C:", children: [
{TREE name: "A"}
{TREE name: "B", children: [
{TREE name: "A", children: [
{TREE name: "C"}
]}
{TREE name: "C"}
]}
]}
{TREE name: "D:", children: [
{TREE name: "AB"}
]}
]}
I did one with a complex hash structure keeping track of already placed nodes, and then I did this one. More steps, but somewhat leaner code.
while ( <> ) {
chomp;
my $ref = \#dirs;
foreach my $dir ( split /\\/ ) {
my $i = 0;
$i++ while ( $ref->[$i] and $ref->[$i]{name} ne $dir );
my $r = $ref->[$i] ||= { name => $dir, subs => [] };
$ref = $r->{subs};
}
}