How to display the hash values from my sample data - Perl

I'm learning Perl at the moment and would like help with this exercise.
My objective is to display the hash values for PartID 1, 2 and 3.
My current output displays only the lot, wafer, program, version, testnames, testnumbers, hilimit, lolimit and partid values.
Sample data:
lot=lot123
wafer=1
program=prgtest
version=1
Testnames,T1,T2,T3
Testnumbers,1,2,3
Hilimit,5,6,7
Lolimit,1,2,3
PartID,,,,
1,3,0,5
2,4,3,2
3,5,6,3
This is my code:
#!/usr/bin/perl
use strict;
use Getopt::Long;
my $file = "";
GetOptions ("infile=s" => \$file ) or die("Error in command line arguments\n");
my $lotid = "";
open(DATA, $file) or die "Couldn't open file $file";
while(my $line = <DATA>) {
#print "$line";
if ( $line =~ /^lot=/ ) {
#print "$line \n";
my ($dump, $lotid) = split /=/, $line;
print "$lotid\n";
}
elsif ($line =~ /^program=/ ) {
my ($dump, $progid) = split /=/, $line;
print "$progid \n";
}
elsif ($line =~ /^wafer=/ ) {
my ($dump, $waferid) = split /=/, $line;
print "$waferid \n";
}
elsif ($line =~ /^version=/ ) {
my ($dump, $verid) = split /=/, $line;
print "$verid \n";
}
elsif ($line =~ /^testnames/i) {
my ($dump, @arr) = split /\,/, $line;
foreach my $e (@arr) {
print $e, "\n";
}
}
elsif ($line =~ /^testnumbers/i) {
my ($dump, @arr1) = split /\,/, $line;
foreach my $e1 (@arr1) {
print $e1, "\n";
}
}
elsif ($line =~ /^hilimit/i) {
my ($dump, @arr2) = split /\,/, $line;
foreach my $e2 (@arr2) {
print $e2, "\n";
}
}
elsif ($line =~ /^lolimit/i) {
my ($dump, @arr3) = split /\,/, $line;
foreach my $e3 (@arr3) {
print $e3, "\n";
}
}
}
Please help me extend my code to display the hash for PartID 1, 2 and 3.

So I've rewritten your code a little to use a few more modern Perl idioms (along with some comments to explain what I've done). The bit I've added is near the bottom.
#!/usr/bin/perl
use strict;
# Added 'warnings' which you should always use
use warnings;
# Use say() instead of print()
use feature 'say';
use Getopt::Long;
my $file = "";
GetOptions ("infile=s" => \$file)
or die ("Error in command line arguments\n");
# Use a lexical variable for a filehandle.
# Use the (safer) 3-argument version of open().
# Add $! to the error message.
open(my $fh, '<', $file) or die "Couldn't open file $file: $!";
# Read each record into $_ - which makes the following code simpler
while (<$fh>) {
# Remove the trailing newline so say() doesn't print an extra blank line
chomp;
# Match on $_
if ( /^lot=/ ) {
# Use "undef" instead of a $dump variable.
# split() works on $_ by default.
my (undef, $lotid) = split /=/;
# Use say() instead of print() - less punctuation :-)
say $lotid;
}
elsif ( /^program=/ ) {
my (undef, $progid) = split /=/;
say $progid;
}
elsif ( /^wafer=/ ) {
my (undef, $waferid) = split /=/;
say $waferid;
}
elsif ( /^version=/ ) {
my (undef, $verid) = split /=/;
say $verid;
}
elsif ( /^testnames/i) {
my (undef, @arr) = split /\,/;
# Changed all of these similar pieces of code
# to use the same variable names. As they are
# defined in different code blocks, they are
# completely separate variables.
foreach my $e (@arr) {
say $e;
}
}
elsif ( /^testnumbers/i) {
my (undef, @arr) = split /\,/;
foreach my $e (@arr) {
say $e;
}
}
elsif ( /^hilimit/i) {
my (undef, @arr) = split /\,/;
foreach my $e (@arr) {
say $e;
}
}
elsif ( /^lolimit/i) {
my (undef, @arr) = split /\,/;
foreach my $e (@arr) {
say $e;
}
}
# And here's the new bit.
# If we're on the "partid" line, then read the next
# three lines, split each one and print the first
# element from the list returned by split().
elsif ( /^partid/i) {
say +(split /,/, <$fh>)[0] for 1 .. 3;
}
}
Update: By the way, there are no hashes anywhere in this code :-)
Update 2: I've just realised that you only have three different ways to process the data. So you can simplify your code drastically by using slightly more complex regexes.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use Getopt::Long;
my $file = "";
GetOptions ("infile=s" => \$file)
or die ("Error in command line arguments\n");
open(my $fh, '<', $file) or die "Couldn't open file $file: $!";
while (<$fh>) {
# Remove the trailing newline so say() doesn't print an extra blank line
chomp;
# Single value - just print it.
if ( /^(?:lot|program|wafer|version)=/ ) {
my (undef, $value) = split /=/;
say $value;
}
# List of values - split and print.
elsif ( /^(?:testnames|testnumbers|hilimit|lolimit)/i) {
my (undef, @arr) = split /\,/;
foreach my $e (@arr) {
say $e;
}
}
# Extract values from following lines.
elsif ( /^partid/i) {
say +(split /,/, <$fh>)[0] for 1 .. 3;
}
}
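If an actual hash keyed by PartID is what the exercise is after, one possible shape is a hash of hashes that maps each PartID to its test names and values. The following is only a sketch under that assumption: the names %part and $in_parts are made up for illustration, and the sample data is read from an in-memory filehandle so the script runs without an input file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

# Sample data from the question, read through an in-memory filehandle.
my $sample = <<'END';
lot=lot123
wafer=1
program=prgtest
version=1
Testnames,T1,T2,T3
Testnumbers,1,2,3
Hilimit,5,6,7
Lolimit,1,2,3
PartID,,,,
1,3,0,5
2,4,3,2
3,5,6,3
END

open(my $fh, '<', \$sample) or die "Couldn't open in-memory data: $!";

my (@testnames, %part);    # %part is a hypothetical name
my $in_parts = 0;          # true once the PartID header has been seen

while (<$fh>) {
    chomp;
    if (/^testnames,/i) {
        # Drop the label, keep the test names.
        (undef, @testnames) = split /,/;
    }
    elsif (/^partid/i) {
        # Every line after this one is treated as a part record.
        $in_parts = 1;
    }
    elsif ($in_parts) {
        my ($id, @values) = split /,/;
        # Hash slice: pair each test name with its value for this part.
        @{ $part{$id} }{@testnames} = @values;
    }
}

for my $id (sort { $a <=> $b } keys %part) {
    say "PartID $id: ",
        join ', ', map { sprintf '%s=%s', $_, $part{$id}{$_} } @testnames;
}
```

A single value can then be looked up directly, for example $part{2}{T2}.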

Reading text file into hash and accessing values perl

I am trying to read text file content into hash but having some problem reading as well as accessing it.
resctrl_top
    /path/to/a/
vdm05top
    /path/to/b/
    /path/to/c/
    /path/to/d/
    /path/to/e/
    /path/to/f/
The file format is as shown above. My desired output is a hash with the non-indented lines as keys and the indented path lines as values. I would also like to know how to access the values for each key.
resctrl_top => /path/to/a/
vdm05top => /path/to/b/,/path/to/c/,...
Below is what I have tried:
use strict;
use warnings;
my %hash;
open FILE, "filename.txt" or die $!;
my $key;
while (my $line = <FILE>) {
chomp($line);
if ($line !~ /^\s/) {
($key) = $line =~ /^\S+/g;
$hash{$key} = [];
} else {
$line =~ s/^\s+//;
push @{ $hash{$key} }, $line;
}
}
close FILE;
foreach (keys %hash){
print "$key => $hash{$key}\n";
}
Try this way:
use strict;
use warnings;
use Data::Dumper;
my %hash;
my $key;
while (my $line = <DATA>) {
chomp($line);
if ($line !~ /^\s/) {
$key = $line;
} else {
$line =~ s/\s//g;
push (@{$hash{$key}} , $line);
}
}
my %final;
foreach my $k (keys %hash){
my $val = join(",", @{$hash{$k}});
$final{$k} = $val; #New hash will have key and respective values
}
print Dumper(\%final);
__DATA__
resctrl_top
    /path/to/a/
vdm05top
    /path/to/b/
    /path/to/c/
    /path/to/d/
    /path/to/e/
    /path/to/f/
Result:
$VAR1 = {
'vdm05top' => '/path/to/b/,/path/to/c/,/path/to/d/,/path/to/e/,/path/to/f/',
'resctrl_top' => '/path/to/a/'
};
Hope this solves your problem.
Here's a pretty simple solution.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use Data::Dumper; # Just for output
my ($key, %hash); # Declare globals
while (<DATA>) { # Quick hack - read from DATA
chomp;
if (/^\s/) { # If the line starts with a space
s/^\s+//;
push @{$hash{$key}}, $_;
} else { # The line is a key
$key = $_;
}
}
say Dumper \%hash;
__DATA__
resctrl_top
    /path/to/a/
vdm05top
    /path/to/b/
    /path/to/c/
    /path/to/d/
    /path/to/e/
    /path/to/f/
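As for accessing the values: when each key holds an array reference, as in the solution just above, the reference has to be dereferenced before use. Printing $hash{$key} directly only shows something like ARRAY(0x...). A short sketch, with the hash written out literally rather than read from a file:

```perl
use strict;
use warnings;
use feature 'say';

# The structure built above: each key maps to an array reference.
my %hash = (
    resctrl_top => ['/path/to/a/'],
    vdm05top    => ['/path/to/b/', '/path/to/c/', '/path/to/d/',
                    '/path/to/e/', '/path/to/f/'],
);

# All paths for one key: dereference the array reference.
my @paths = @{ $hash{vdm05top} };
say scalar(@paths), " paths for vdm05top";

# A single path by index.
say "first: $hash{vdm05top}[0]";

# Every key with its paths joined by commas.
for my $key (sort keys %hash) {
    say "$key => ", join ',', @{ $hash{$key} };
}
```

Note that the loop prints the loop variable $key, not a variable left over from reading the file.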

Correct use of Perl "exists"

I have two files. The first two columns in both are chromosome loci and genotypes, for instance chr1:1736464585 and T/G.
I have put the first two columns into a hash. I want to check whether the hash key (the chromosome locus) exists in the second file.
I have written this Perl program and have tried many variations, but I'm not sure if I'm using exists correctly. It gives the error "exists argument is not a HASH or ARRAY element or a subroutine".
#!/usr/bin/perl
use strict;
use warnings;
my $output = "annotated.txt";
open( O, ">>$output" );
my $filename = "datatest.txt";
my $filename2 = "MP2.txt";
chomp $filename;
chomp $filename2;
my %hash1 = ();
open( FN1, $filename ) or die "Can't open $filename: $!";
my @lines = <FN1>;
foreach my $line (@lines) {
my @split = split /\t/, $line;
if ( $line =~ /^chr/ ) {
my ( $key, $value ) = ( $split[0], $split[1] );
$hash1{$key} = $value;
}
}
my $DATA;
open( $DATA, $filename2 ) or die $!;
my @lines2 = <$DATA>;
foreach my $line2 (@lines2) {
my @split2 = split /\t/, $line2;
if ( $line2 =~ /^chr/ ) {
if ( exists %hash1{$key} ) {
print "$line2\n";
}
}
}
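The syntax of the following line is incorrect:
if (exists %hash1{$key}) { ... }
A single hash element is accessed with the $ sigil, so this should be:
if (exists $hash1{$key}) { ... }
Note also that $key is declared inside the first loop and is not in scope in the second one. There, the locus read from the second file is in $split2[0].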
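Put together, the lookup might be sketched as follows. The two arrays of tab-separated lines are hypothetical stand-ins for the contents of the two files, and the variable names $locus, $genotype and @matches are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical tab-separated lines standing in for the two input files.
my @file1 = ("chr1:1736464585\tT/G\n", "chr2:555\tA/C\n");
my @file2 = ("chr1:1736464585\tT/T\n", "chr3:999\tG/G\n");

# Build the lookup hash from the first file: locus => genotype.
my %hash1;
for my $line (@file1) {
    next unless $line =~ /^chr/;
    chomp $line;
    my ($locus, $genotype) = split /\t/, $line;
    $hash1{$locus} = $genotype;
}

# Keep the lines of the second file whose locus was seen in the first.
my @matches;
for my $line (@file2) {
    next unless $line =~ /^chr/;
    chomp $line;
    my ($locus) = split /\t/, $line;
    push @matches, $line if exists $hash1{$locus};
}

print "$_\n" for @matches;
```

exists tests only whether the key is present, so it also works when the stored value is empty or zero.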

Perl output format

I'm reading a log file and grouping it based on the 'Program' name and in turn its ID.
LOG FILE
------------------------------------------
DEV: COM-1258
Program:Testing
Reviewer:Jackie
Description:New Entries
rev:r145201
------------------------------------------
QA: COM-9696
Program:Testing
Reviewer:Poikla
Description:Some random changes
rev:r112356
------------------------------------------
JIRA: COM-1234
Program:Development
Reviewer:John Wick
Description:Genral fix
rev:r345676
------------------------------------------
JIRA:COM-1234
Program:Development
Reviewer:None
Description:Updating Received
rev:r909276
------------------------------------------
JIRA: COM-6789
Program:Testing
Reviewer:Balise Mat
Description:Audited
rev:r876391
------------------------------------------
JIRA: COM-8585
Program:Testing
Reviewer:Gold frt
Description: yet to be reviewed
rev:r565639
The code I have,
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Terse = 1;
my $file = "log.txt";
open FH, $file or die "Couldn't open file: [$!]\n";
my $data = {};
my $hash = {};
while (<FH>)
{
my $line = $_;
chomp $line;
if ($line =~ m/(-){2,}/)
{
my $program = $hash->{Program} || '';
my $jira = $hash->{JIRA} || $hash->{QA} || $hash->{DEV} || '';
if ($program && $jira)
{
push @{$data->{$program}{$jira}}, $hash;
$hash = {};
}
}
else
{
if ($line =~ m/:/)
{
my ($key, $value) = split /:\s*/, $line;
$hash->{$key} = $value;
}
elsif ($line =~ m#/# && exists $hash->{Files})
{
$hash->{Files} .= "\n$line";
}
}
}
print 'data = ' . Dumper($data);
foreach my $prg (sort keys %{$data})
{
print "============================================================\n";
print " PROGRAM : $prg\n";
print "============================================================\n";
foreach my $jira (sort keys %{$data->{$prg}})
{
print "******************\n";
print "JIRA ID : $jira\n";
print "******************\n";
foreach my $hash (@{$data->{$prg}{$jira}})
{
foreach my $key (keys %{$hash})
{
# print the data except Program and JIRA
next if $key =~ m/(Program|JIRA|DEV|QA)/;
print " $key => $hash->{$key}\n";
}
print "\n";
}
}
}
I need to print the output in the format below, but I can't get there with my current logic. Any ideas would be really helpful.
PROGRAM: Development
Change IDs:
1.JIRA
a.COM-1234
PROGRAM: Testing
Change IDs:
1.JIRA
a.COM-6789
b.COM-8585
2.QA
a.COM-9696
3.DEV
a.COM-1258
I would write it like this:
use strict;
use warnings 'all';
use List::Util 'uniq';
my $file = 'log.txt';
open my $fh, $file or die "Couldn't open file: [$!]\n";
my @data;
{
my %item;
while ( <$fh> ) {
chomp;
if ( eof or /\-{2,}/ ) {
push @data, { %item } if keys %item;
%item = ();
}
else {
my ( $key, $value ) = split /\s*:\s*/;
next unless $value;
$item{$key} = $value;
$item{jira} = $key if grep { $key eq $_ } qw/ JIRA DEV QA /;
}
}
}
my %data;
{
for my $item ( @data ) {
my ($prog, $jira) = @{$item}{qw/ Program jira /};
push @{ $data{$prog}{$jira} }, $item->{$jira};
}
}
for my $prog ( sort keys %data ) {
printf "PROGRAM: %s\n", $prog;
print "Change IDs:\n";
my $n = 1;
for my $jira ( qw/ JIRA QA DEV / ) {
next unless my $codes = $data{$prog}{$jira};
printf "%d.%s\n", $n++, $jira;
my $l = 'a';
printf " %s.%s\n", $l++, $_ for sort(uniq(@$codes));
}
print "\n";
}
output
PROGRAM: Development
Change IDs:
1.JIRA
a.COM-1234
PROGRAM: Testing
Change IDs:
1.JIRA
a.COM-6789
b.COM-8585
2.QA
a.COM-9696
3.DEV
a.COM-1258
#!/usr/bin/perl -w
use strict;
use warnings;
use Data::Dumper;
my $file = 'test';
my $hash;
my $id_hash = ();
my $line_found = 0;
my $line_count = 1;
my $ID;
my $ID_num;
open (my $FH, '<', "$file") or warn $!;
while (my $line = <$FH> ) {
chomp($line);
if ( $line =~ m/------------------------------------------/){
$line_found = 1;
$line_count++;
next;
}
if ( $line_found ) {
$line =~ m/(.*?):(.*)/;
$ID = $1;
$ID_num = $2;
$line_found = 0;
}
if ( $line =~ m/Program:(.*)/ ) {
my $pro = $1;
push @{$hash->{$pro}->{$ID}}, ($ID_num) ;
}
$line_count++;
}
close $FH;
foreach my $pro (keys %$hash){
# print Dumper($pro);
print "PROGRAM:\t$pro\nChange IDs:\n";
# Dereference explicitly: calling keys/values on a reference was removed in Perl 5.24
foreach my $ids (keys %{ $hash->{$pro} }){
print "\t1. $ids\n";
foreach my $id (@{ $hash->{$pro}->{$ids} }){
print "\t\ta. $id\n";
}
}
}
OUTPUT
PROGRAM: Testing
Change IDs:
1. QA
a. COM-9696
1. DEV
a. COM-1258
1. JIRA
a. COM-6789
a. COM-8585
PROGRAM: Development
Change IDs:
1. JIRA
a. COM-1234
a. COM-1234
Adjust the output format to suit your needs.

Perl : Need to append two columns if the ID's are repeating

If an id is repeated, I want to append its app1 and app2 values and print the id only once.
Input:
id|Name|app1|app2
1|abc|234|231|
2|xyz|123|215|
1|abc|265|321|
3|asd|213|235|
Output:
id|Name|app1|app2
1|abc|234,265|231,321|
2|xyz|123|215|
3|asd|213|235|
Output I'm getting:
id|Name|app1|app2
1|abc|234,231|
2|xyz|123,215|
1|abc|265,321|
3|asd|213,235|
My Code:
#!/usr/bin/perl
use strict;
use warnings;
my $basedir = 'E:\Perl\Input\\';
my $file ='doctor.txt';
my $counter = 0;
my %RepeatNumber;
my $pos=0;
open(OUTFILE, '>', 'E:\Perl\Output\DoctorOpFile.csv') || die $!;
open(FH, '<', join('', $basedir, $file)) || die $!;
my $line = readline(FH);
unless ($counter) {
chomp $line;
print OUTFILE $line;
print OUTFILE "\n";
}
while ($line = readline(FH)) {
chomp $line;
my @obj = split('\|',$line);
if($RepeatNumber{$obj[0]}++) {
my $str1= join("|",$obj[0]);
my $str2=join(",",$obj[2],$obj[3]);
print OUTFILE join("|",$str1,$str2);
print OUTFILE "\n";
}
}
This should do the trick:
use strict;
use warnings;
my $file_in = "doctor.txt";
open (FF, '<', $file_in) or die "Can't open $file_in: $!";
my $temp = <FF>; # remove first line
my %out;
while (<FF>)
{
my ($id, $Name, $app1, $app2) = split /\|/, $_;
$out{$id}[0] = $Name;
push @{$out{$id}[1]}, $app1;
push @{$out{$id}[2]}, $app2;
}
foreach my $key (keys %out)
{
print $key, "|", $out{$key}[0], "|", join (",", @{$out{$key}[1]}), "|", join (",", @{$out{$key}[2]}), "\n";
}
EDIT
To see what the %out contains (in case it's not clear), you can use
use Data::Dumper;
and print it via
print Dumper(\%out);
I'd tackle it like this:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
use 5.14.0;
my %stuff;
#extract the header row.
#use the regex to remove the linefeed, because
#we can't chomp it inline like this.
#works since perl 5.14
#otherwise we could just chomp (@header) later.
my ( $id, @header ) = split( /\|/, <DATA> =~ s/\n//r );
while (<DATA>) {
#turn this row into a hash of key-values.
my %row;
( $id, @row{@header} ) = split(/\|/);
#print for diag
print Dumper \%row;
#iterate each key, and insert into $row.
foreach my $key ( keys %row ) {
push( @{ $stuff{$id}{$key} }, $row{$key} );
}
}
#print for diag
print Dumper \%stuff;
print join ("|", "id", @header ),"\n";
#iterate ids in the hash
foreach my $id ( sort keys %stuff ) {
#join this record by '|'.
print join('|',
$id,
#turn inner arrays into comma separated via map.
map {
my %seen;
#use grep to remove dupes - e.g. "abc,abc" -> "abc"
join( ",", grep !$seen{$_}++, @$_ )
} @{ $stuff{$id} }{@header}
),
"\n";
}
__DATA__
id|Name|app1|app2
1|abc|234|231|
2|xyz|123|215|
1|abc|265|321|
3|asd|213|235|
This is perhaps a bit overkill for your application, but it should handle arbitrary column headings and arbitrary numbers of duplicates. It coalesces repeated values, though, so the two abc entries don't end up as abc,abc.
Output is:
id|Name|app1|app2
1|abc|234,265|231,321
2|xyz|123|215
3|asd|213|235
Here is another way of doing it that doesn't use a hash (in case you want to be more memory efficient). My contribution starts below the open() calls:
#!/usr/bin/perl
use strict;
use warnings;
my $basedir = 'E:\Perl\Input\\';
my $file ='doctor.txt';
open(OUTFILE, '>', 'E:\Perl\Output\DoctorOpFile.csv') || die $!;
select(OUTFILE);
open(FH, '<', join('', $basedir, $file)) || die $!;
print(scalar(<FH>));
my @lastobj;
foreach my $obj (sort { $a->[0] <=> $b->[0] }
                 map { chomp; [ split /\|/ ] } <FH>) {
    if (@lastobj && $obj->[0] eq $lastobj[0]) {
        # Same id as the previous row: append app1 and app2.
        @lastobj = (@{$obj}[0..1],
                    $lastobj[2] . ',' . $obj->[2],
                    $lastobj[3] . ',' . $obj->[3]);
    }
    else {
        # New id: flush the previous row, then start a new one.
        print(join('|', @lastobj), "|\n") if @lastobj;
        @lastobj = @{$obj}[0..3];
    }
}
print(join('|', @lastobj), "|\n") if @lastobj;
Note that split, without its third argument, discards trailing empty fields, which is why you have to add the last bar yourself. If you don't chomp, you won't need to supply the bar or the trailing newline, but you would have to keep the fifth field, $obj->[4].
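The trailing-field behaviour of split can be checked in isolation. This small sketch uses one row from the sample input and compares the default call with a negative limit, which keeps trailing empty fields:

```perl
use strict;
use warnings;

my $row = '1|abc|234|231|';

# Default: trailing empty fields are dropped.
my @default = split /\|/, $row;

# A negative limit keeps them, so the empty field after the last bar survives.
my @keep = split /\|/, $row, -1;

printf "default: %d fields, limit -1: %d fields\n",
    scalar @default, scalar @keep;
```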

Extracting multiple lines of record/data using a subroutine or functions

Can you show me how to create a subroutine or function using this code?
Basically I want to make my code into a subroutine so I'll be able to re-use it without making my script too long.
Here is my script:
#!/usr/local/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Carp qw(croak);
my @fields;
my ($tmp_var, $rec_type, $country, $header, $Combline, $records, $line);
my $filename = 'data5.txt';
open (my $input_fh, '<', $filename ) or croak "Can't open $filename: $!";
open my $OUTPUTA, ">", 'drp1.txt' or die $!;
open my $OUTPUTB, ">", 'drp2.txt' or die $!;
while (<$input_fh>) {
$line = _trim($_);
@fields = split (/\|/, $line);
$rec_type = $fields[0];
$country = $fields[1];
my $string = substr $fields[1], 0, 1;
$header = $line if(/^INVHDR/);
if ($rec_type eq 'INVDET') {
if ($string eq 'I') {
$records = $header . $line;
print $OUTPUTA $records, scalar <$input_fh>;
}
else {
$records = $header . $line;
print $OUTPUTB $records, scalar <$input_fh>;
}
}
}
close $OUTPUTA or die $!;
close $OUTPUTB or die $!;
sub _trim {
my $word = shift;
if ( $word ) {
$word =~ s/\s*\|/\|/g; #remove trailing spaces
$word =~ s/"//g; #remove double quotes
}
return $word;
}
This is the part of the script that I want to put in a subroutine or function:
$line = _trim($_);
@fields = split (/\|/, $line);
$rec_type = $fields[0];
$country = $fields[1];
my $string = substr $fields[1], 0, 1;
$header = $line if (/^INVHDR/);
if ($rec_type eq 'INVDET') {
if ($string eq 'I') {
$records = $header . $line;
print $OUTPUTA $records, scalar <$input_fh>;
}
else {
$records = $header . $line;
print $OUTPUTB $records, scalar <$input_fh>;
}
}
I would suggest breaking it out a little differently and expanding on your _trim function, turning it into a parse function:
use strict;
use warnings;
my $filename = 'data5.txt';
open( my $input_fh, '<', $filename ) or die "Can't open $filename: $!";
open( my $OUTPUTA, '>', 'drp1.txt' ) or die $!;
open( my $OUTPUTB, '>', 'drp2.txt' ) or die $!;
my $header = '';
while (<$input_fh>) {
if ($_ =~ /^INVHDR/) {
$header = $_;
}
if ($_ =~ /^INVDET/) {
my @data = parse($_);
my $line = $header . join('|', @data);
# scalar <$input_fh> is almost certainly not doing what you expect,
# though I'm not sure what you're trying to accomplish with it
if ( $data[1] =~ /^I/ ) {
print $OUTPUTA $line;
} else {
print $OUTPUTB $line;
}
}
}
sub parse {
my $input = shift || return;
$input =~ s/"//g; # remove double quotes
# Here I've combined the removal of trailing spaces with the split.
my @fields = split( m{\s*\|}, $input );
return @fields;
}
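To see what a parse function of this kind returns, it can be run on a single record. The sample line below is hypothetical (the real contents of data5.txt are not shown in the question), and this variant also chomps the input, which the version above does not do.

```perl
use strict;
use warnings;

# A variant of parse() that also strips the trailing newline.
sub parse {
    my $input = shift;
    return unless defined $input;
    chomp $input;
    $input =~ s/"//g;                   # remove double quotes
    return split m{\s*\|}, $input;      # split on bars, dropping spaces before them
}

# Hypothetical record: spaces before the bars and a quoted field.
my @fields = parse(qq{INVDET|IN   |"widget"  |42\n});

print join('|', @fields), "\n";
```

Because the sub returns a plain list of fields, the caller can decide which output file to write to, for example by testing $fields[1].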