perl prevent key value duplicates - perl

I'm looping through a file line by line, it has key->value pair that I'm then outputting to xml. How can I do a check to make sure I haven't already outputted this key/value pair?
In C# I would do it so easy by inserting into dictionary then just using .Contains(), any tips in perl?

Perl has the defined and exists keywords that operate on hash elements.
$hash{'foo'} = 'bar';
print defined $hash{'foo'}; # prints 1
print exists $hash{'foo'}; # prints 1
For most purposes, they do the same thing. The one subtle difference is when the hash value is the special "undefined" value:
$hash{'baz'} = undef;
print defined $hash{'baz'}; # doesn't print 1
print exists $hash{'baz'}; # prints 1

You can do the same thing using a perl hash.
my %seen;
while (my $line = <$filehandle>)
{
next if ($seen{$line});
print $line;
$seen{$line} = 1;
}

Related

print lines after finding a key word in perl

I have a variable $string and i want to print all the lines after I find a keyword in the line (including the line with keyword)
$string=~ /apple /;
I'm using this regexp to find the key word but I do not how to print lines after this keyword.
It's not really clear where your data is coming from. Let's assume it's a string containing newlines. Let's start by splitting it into an array.
my #string = split /\n/, $string;
We can then use the flip-flop operator to decide which lines to print. I'm using \0 as a regex that is very unlikely to match any string (so, effectively, it's always false).
for (#string) {
say if /apple / .. /\0/;
}
Just keep a flag variable, set it to true when you see the string, print if the flag is true.
perl -ne 'print if $seen ||= /apple/'
If your data in scalar variable we can use several methods
Recommended method
($matching) = $string=~ /([^\n]*apple.+)/s;
print "$matching\n";
And there is another way to do it
$string=~ /[^\n]*apple.+/s;
print $&; #it will print the data which is match.
If you reading the data from file, try the following
while (<$fh>)
{
if(/apple/)
{
print <$fh>;
}
}
Or else try the following one liner
perl -ne 'print <> and exit if(/apple/);' file.txt

Appending values to Hash if key is same in Perl

Problem is to read a file with value at every new line. Content of file looks like
3ssdwyeim3,3ssdwyeic9,2017-03-16,09:10:35.372,0.476,EndInbound
3ssdwyeim3,3ssdwyfyyn,2017-03-16,09:10:35.369,0.421,EndOutbound
3ssdwyfxc0,3ssdwyfxfi,2017-03-16,09:10:35.456,0.509,EndInbound
3ssdwyfxc0,3ssdwyhg0v,2017-03-16,09:10:35.453,0.436,EndOutbound
With the string before first comma being the Key and string in between last and second last comma the Value
i.e. for the first line 3ssdwyeim3 becomes the key and 0.476 Value.
Now as we are looping over each line if the key exists we have to concatenate the values separated by comma.
Hence for the next new line as key already exists key remains 3ssdwyeim3 but the value is updated to 0.476,0.421.
Finally we have to print the keys and values in a file.
I have written a code to achieve the same, which is as follows.
sub findbreakdown {
my ( $out ) = #_;
my %timeLogger;
open READ, "out.txt" or die "Cannot open out.txt for read :$!";
open OUTBD, ">$out\_breakdown.csv" or die "Cannot open $out\_breakdown.csv for write :$!";
while ( <READ> ) {
if ( /(.*),.*,.*,.*,(.*),.*/ ) {
$btxnId = $1;
$time = $2;
if ( !$timeLogger{$btxnId} ) {
$timeLogger{$btxnId} = $time;
}
else {
$previousValue = $timeLogger{$btxnId};
$newValue = join ",", $previousValue, $time;
$timeLogger{$btxnId} = $newValue;
}
}
foreach ( sort keys %timeLogger ) {
print OUTBD "$_ ,$timeLogger{$_}\n";
}
}
close OUTBD;
close READ;
}
However Something is going wrong and its printing like this
3ssdwyeim3,0.476
3ssdwyeim3,0.476,0.421
3ssdwyeim3,0.476,0.421
3ssdwyfxc0,0.509
3ssdwyeim3,0.476,0.421
3ssdwyfxc0,0.509,0.436
3ssdwyeim3,0.476,0.421
3ssdwyfxc0,0.509,0.436
Whereas expected is:
3ssdwyeim3,0.476,0.421
3ssdwyfxc0,0.509,0.436
Your program is behaving correctly, but you are printing the current state of the entire hash after you process each line.
Therefore you are printing hash keys before they have the complete set of values, and you have many duplicated lines.
If you move the foreach loop that prints to the end of your program (or simply use the debugger to inspect the variables) you will find that the final state of the hash is exactly what you expect.
Edit: I previously thought the problem was the below, but it's because I misread the sample data in your question.
This regular expression is not ideal:
if (/(.*),.*,.*,.*,(.*),.*/) {
The .* is greedy and will match as much as possible (including some content with commas). So if any line contains more than six comma-separated items, more than one item will be included in the first matching group. This may not be a problem in your actual data, but it's not an ideal way to write the code. The expression is more ambiguous than necessary.
It would be better written like this:
if (/^([^,]*),[^,]*,[^,]*,[^,]*,([^,]*),[^,]*$/) {
Which would only match lines with exactly six items.
Or consider using split on the input line, which would be a cleaner solution.
This is much simpler than you have made it. You can just split each line into fields and use push to add the value to the list corresponding to the key
I trust you can modify this to read from an external file instead of the DATA file handle?
use strict;
use warnings 'all';
my %data;
while ( <DATA> ) {
my #fields = split /,/;
push #{ $data{$fields[0]} }, $fields[-2];
}
for my $key ( sort keys %data ) {
print join(',', $key, #{ $data{$key} }), "\n";
}
__DATA__
3ssdwyeim3,3ssdwyeic9,2017-03-16,09:10:35.372,0.476,EndInbound
3ssdwyeim3,3ssdwyfyyn,2017-03-16,09:10:35.369,0.421,EndOutbound
3ssdwyfxc0,3ssdwyfxfi,2017-03-16,09:10:35.456,0.509,EndInbound
3ssdwyfxc0,3ssdwyhg0v,2017-03-16,09:10:35.453,0.436,EndOutbound
output
3ssdwyeim3,0.476,0.421
3ssdwyfxc0,0.509,0.436

Perl - How to create commands that users can input in console?

I'm just starting in Perl and I'm quite enjoying it. I'm writing some basic functions, but what I really want to be able to do is to use those functions intelligently using console commands. For example, say I have a function adding two numbers. I'd want to be able to type in console "add 2, 4" and read the first word, then pass the two numbers as parameters in an "add" function. Essentially, I'm asking for help in creating some basic scripting using Perl ^^'.
I have some vague ideas about how I might do this in VB, but Perl, I have no idea where I'd start, or what functions would be useful to me. Is there something like VB.net's "Split" function where you can break down the contents of a scalar into an array? Is there a simple way to analyse one word at a time in a scalar, or iterate through a scalar until you hit a separator, for example?
I hope you can help, any suggestions are appreciated! Bear in mind, I'm no expert, I started Perl all of a few weeks ago, and I've only been doing VB.net half a year.
Thank you!
Edit: If you're not sure what to suggest and you know any simple/intuitive resources that might be of help, that would also be appreciated.
Its rather easy to make a script which dispatches to a command by name. Here is a simple example:
#!/usr/bin/env perl
use strict;
use warnings;
# take the command name off the #ARGV stack
my $command_name = shift;
# get a reference to the subroutine by name
my $command = __PACKAGE__->can($command_name) || die "Unknown command: $command_name\n";
# execute the command, using the rest of #ARGV as arguments
# and print the return with a trailing newline
print $command->(#ARGV);
print "\n";
sub add {
my ($x, $y) = #_;
return $x + $y;
}
sub subtract {
my ($x, $y) = #_;
return $x - $y;
}
This script (say its named myscript.pl) can be called like
$ ./myscript.pl add 2 3
or
$ ./myscript.pl subtract 2 3
Once you have played with that for a while, you might want to take it further and use a framework for this kind of thing. There are several available, like App::Cmd or you can take the logic shown above and modularize as you see fit.
You want to parse command line arguments. A space serves as the delimiter, so just do a ./add.pl 2 3 Something like this:
$num1=$ARGV[0];
$num2=$ARGV[1];
print $num1 + $num2;
will print 5
Here is a short implementation of a simple scripting language.
Each statement is exactly one line long, and has the following structure:
Statement = [<Var> =] <Command> [<Arg> ...]
# This is a regular grammar, so we don't need a complicated parser.
Tokens are seperated by whitespace. A command may take any number of arguments. These can either be the contents of variables $var, a string "foo", or a number (int or float).
As these are Perl scalars, there is no visible difference between strings and numbers.
Here is the preamble of the script:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
strict and warnings are essential when learning Perl, else too much weird stuff would be possible. The use 5.010 is a minimum version, it also defines the say builtin (like a print but appends a newline).
Now we declare two global variables: The %env hash (table or dict) associates variable names with their values. %functions holds our builtin functions. The values are anonymous functions.
my %env;
my %functions = (
add => sub { $_[0] + $_[1] },
mul => sub { $_[0] * $_[1] },
say => sub { say $_[0] },
bye => sub { exit 0 },
);
Now comes our read-eval-loop (we don't print by default). The readline operator <> will read from the file specified as the first command line argument, or from STDIN if no filename is provided.
while (<>) {
next if /^\s*\#/; # jump comment lines
# parse the line. We get a destination $var, a $command, and any number of #args
my ($var, $command, #args) = parse($_);
# Execute the anonymous sub specified by $command with the #args
my $value = $functions{ $command }->(#args);
# Store the return value if a destination $var was specified
$env{ $var } = $value if defined $var;
}
That was fairly trivial. Now comes some parsing code. Perl “binds” regexes to strings with the =~ operator. Regexes may look like /foo/ or m/foo/. The /x flags allows us to include whitespace in our regex that doesn't match actual whitespace. The /g flag matches globally. This also enables the \G assertion. This is where the last successful match ended. The /c flag is important for this m//gc style parsing to consume one match at a time, and to prevent the position of the regex engine in out string to being reset.
sub parse {
my ($line) = #_; # get the $line, which is a argument
my ($var, $command, #args); # declare variables to be filled
# Test if this statement has a variable declaration
if ($line =~ m/\G\s* \$(\w+) \s*=\s* /xgc) {
$var = $1; # assign first capture if successful
}
# Parse the function of this statement.
if ($line =~ m/\G\s* (\w+) \s*/xgc) {
$command = $1;
# Test if the specified function exists in our %functions
if (not exists $functions{$command}) {
die "The command $command is not known\n";
}
} else {
die "Command required\n"; # Throw fatal exception on parse error.
}
# As long as our matches haven't consumed the whole string...
while (pos($line) < length($line)) {
# Try to match variables
if ($line =~ m/\G \$(\w+) \s*/xgc) {
die "The variable $1 does not exist\n" if not exists $env{$1};
push #args, $env{$1};
}
# Try to match strings
elsif ($line =~ m/\G "([^"]+)" \s*/xgc) {
push #args, $1;
}
# Try to match ints or floats
elsif ($line =~ m/\G (\d+ (?:\.\d+)? ) \s*/xgc) {
push #args, 0+$1;
}
# Throw error if nothing matched
else {
die "Didn't understand that line\n";
}
}
# return our -- now filled -- vars.
return $var, $command, #args;
}
Perl arrays can be handled like linked list: shift removes and returns the first element (pop does the same to the last element). push adds an element to the end, unshift to the beginning.
Out little programming language can execute simple programs like:
#!my_little_language
$a = mul 2 20
$b = add 0 2
$answer = add $a $b
say $answer
bye
If (1) our perl script is saved in my_little_language, set to be executable, and is in the system PATH, and (2) the above file in our little language saved as meaning_of_life.mll, and also set to be executable, then
$ ./meaning_of_life
should be able to run it.
Output is obviously 42. Note that our language doesn't yet have string manipulation or simple assignment to variables. Also, it would be nice to be able to call functions with the return value of other functions directly. This requires some sort of parens, or precedence mechanism. Also, the language requires better error reporting for batch processing (which it already supports).

Grep to find item in Perl array

Every time I input something the code always tells me that it exists. But I know some of the inputs do not exist. What is wrong?
#!/usr/bin/perl
#array = <>;
print "Enter the word you what to match\n";
chomp($match = <STDIN>);
if (grep($match, #array)) {
print "found it\n";
}
The first arg that you give to grep needs to evaluate as true or false to indicate whether there was a match. So it should be:
# note that grep returns a list, so $matched needs to be in brackets to get the
# actual value, otherwise $matched will just contain the number of matches
if (my ($matched) = grep $_ eq $match, #array) {
print "found it: $matched\n";
}
If you need to match on a lot of different values, it might also be worth for you to consider putting the array data into a hash, since hashes allow you to do this efficiently without having to iterate through the list.
# convert array to a hash with the array elements as the hash keys and the values are simply 1
my %hash = map {$_ => 1} #array;
# check if the hash contains $match
if (defined $hash{$match}) {
print "found it\n";
}
You seem to be using grep() like the Unix grep utility, which is wrong.
Perl's grep() in scalar context evaluates the expression for each element of a list and returns the number of times the expression was true.
So when $match contains any "true" value, grep($match, #array) in scalar context will always return the number of elements in #array.
Instead, try using the pattern matching operator:
if (grep /$match/, #array) {
print "found it\n";
}
This could be done using List::Util's first function:
use List::Util qw/first/;
my #array = qw/foo bar baz/;
print first { $_ eq 'bar' } #array;
Other functions from List::Util like max, min, sum also may be useful for you
In addition to what eugene and stevenl posted, you might encounter problems with using both <> and <STDIN> in one script: <> iterates through (=concatenating) all files given as command line arguments.
However, should a user ever forget to specify a file on the command line, it will read from STDIN, and your code will wait forever on input
I could happen that if your array contains the string "hello", and if you are searching for "he", grep returns true, although, "he" may not be an array element.
Perhaps,
if (grep(/^$match$/, #array)) more apt.
You can also check single value in multiple arrays like,
if (grep /$match/, #array, #array_one, #array_two, #array_Three)
{
print "found it\n";
}

perl split on empty file

I have basically the following perl I'm working with:
open I,$coupon_file or die "Error: File $coupon_file will not Open: $! \n";
while (<I>) {
$lctr++;
chomp;
my #line = split/,/;
if (!#line) {
print E "Error: $coupon_file is empty!\n\n";
$processFile = 0; last;
}
}
I'm having trouble determining what the split/,/ function is returning if an empty file is given to it. The code block if (!#line) is never being executed. If I change that to be
if (#line)
than the code block is executed. I've read information on the perl split function over at
http://perldoc.perl.org/functions/split.html and the discussion here about testing for an empty array but not sure what is going on here.
I am new to Perl so am probably missing something straightforward here.
If the file is empty, the while loop body will not run at all.
Evaluating an array in scalar context returns the number of elements in the array.
split /,/ always returns a 1+ elements list if $_ is defined.
You might try some debugging:
...
chomp;
use Data::Dumper;
$Data::Dumper::Useqq = 1;
print Dumper( { "line is" => $_ } );
my #line = split/,/;
print Dumper( { "split into" => \#line } );
if (!#line) {
...
Below are a few tips to make your code more idiomatic:
The special variable $. already holds the current line number, so you can likely get rid of $lctr.
Are empty lines really errors, or can you ignore them?
Pull apart the list returned from split and give the pieces names.
Let Perl do the opening with the "diamond operator":
The null filehandle <> is special: it can be used to emulate the behavior of sed and awk. Input from <> comes either from standard input, or from each file listed on the command line. Here's how it works: the first time <> is evaluated, the #ARGV array is checked, and if it is empty, $ARGV[0] is set to "-", which when opened gives you standard input. The #ARGV array is then processed as a list of filenames. The loop
while (<>) {
... # code for each line
}
is equivalent to the following Perl-like pseudo code:
unshift(#ARGV, '-') unless #ARGV;
while ($ARGV = shift) {
open(ARGV, $ARGV);
while (<ARGV>) {
... # code for each line
}
}
except that it isn't so cumbersome to say, and will actually work.
Say your input is in a file named input and contains
Campbell's soup,0.50
Mac & Cheese,0.25
Then with
#! /usr/bin/perl
use warnings;
use strict;
die "Usage: $0 coupon-file\n" unless #ARGV == 1;
while (<>) {
chomp;
my($product,$discount) = split /,/;
next unless defined $product && defined $discount;
print "$product => $discount\n";
}
that we run as below on Unix:
$ ./coupons input
Campbell's soup => 0.50
Mac & Cheese => 0.25
Empty file or empty line? Regardless, try this test instead of !#line.
if (scalar(#line) == 0) {
...
}
The scalar method returns the array's length in perl.
Some clarification:
if (#line) {
}
Is the same as:
if (scalar(#line)) {
}
In a scalar context, arrays (#line) return the length of the array. So scalar(#line) forces #line to evaluate in a scalar context and returns the length of the array.
I'm not sure whether you're trying to detect if the line is empty (which your code is trying to) or whether the whole file is empty (which is what the error says).
If the line, please fix your error text and the logic should be like the other posters said (or you can put if ($line =~ /^\s*$/) as your if).
If the file, you simply need to test if (!$lctr) {} after the end of your loop - as noted in another answer, the loop will not be entered if there's no lines in the file.