Text file to hash table - perl

I want to create a hash table for the data in my file.
The file contains a bunch of commands that are written as
===|showcommand|
Every time I see this delimiter, I want to create a hash key and store the lines below it as an array in the value, until the next delimiter.
The next delimiter should do the same thing: create a hash key with the delimiter name and store the lines that follow it in an array as the value.

my %commands;
my $name;
my $body;
while (<>) {
    if (my ($new_name) = /===\|([^|]*)\|/) {
        $commands{$name} = $body if defined($name);
        $name = $new_name;
        $body = '';
    } else {
        $body .= $_;
    }
}
$commands{$name} = $body if defined($name);
This assumes the body of each command starts on the line after the header and stops on the line before the next header.
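The loop above stores each body as one string. Since the question asks for an array per key, here is a minimal sketch of the same idea that pushes each line onto an array reference instead. The sub name parse_commands and the in-memory sample text are illustrative assumptions; in practice you would pass your real filehandle.

```perl
use strict;
use warnings;

# Hypothetical helper: reads "===|name|" headers from a filehandle and
# collects the lines that follow each header into an array reference.
sub parse_commands {
    my ($fh) = @_;
    my %commands;
    my $name;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^===\|([^|]*)\|/) {
            $name = $1;
            $commands{$name} = [];    # start an empty list for this header
        }
        elsif (defined $name) {       # ignore anything before the first header
            push @{ $commands{$name} }, $line;
        }
    }
    return \%commands;
}

# Sample input held in a string; open() on a scalar ref gives an in-memory filehandle.
my $text = "===|showcommand|\ncmd1\ncmd2\n===|testcommand|\ncommand1\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $commands = parse_commands($fh);
close $fh;

print "$_: @{ $commands->{$_} }\n" for sort keys %$commands;
```

Returning a reference means the caller gets one scalar back instead of a flattened key/value list.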

You probably have it working already, but here is a small addition regarding the question of how to return the hash from a function.
Here's an example:
Input file (I used the following, which I think has a similar structure to your input file):
===|showcommand|
cmd1
cmd2
cmd3
cmd4
===|testcommand|
command1
command2
command3
===|anothercommand|
another1
another2
another3
another4
Perl script:
use strict;
use warnings;
# Calling ReadCommandFile to build the hash.
my %commands = ReadCommandFile("./commands.txt");
# ReadCommandFile - reads the given file and builds
# a hash of command name => array of lines.
sub ReadCommandFile
{
    my $file = shift;
    my %hash = ();
    my $name;
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    while (<$fh>)
    {
        if ($_ =~ /===\|(.*)\|/)
        {
            $name = $1;
            $hash{$name} = [];
        }
        elsif (defined $name)    # skip anything before the first header
        {
            my $line = $_;
            chomp $line;
            push(@{$hash{$name}}, $line);
        }
    }
    close($fh);
    return %hash;
}
As a result, you should get the following hash (output from Data::Dumper; the key order may vary between runs):
$VAR1 = 'anothercommand';
$VAR2 = [
'another1',
'another2',
'another3',
'another4'
];
$VAR3 = 'showcommand';
$VAR4 = [
'cmd1',
'cmd2',
'cmd3',
'cmd4'
];
$VAR5 = 'testcommand';
$VAR6 = [
'command1',
'command2',
'command3'
];
You can then access individual elements like this:
Get the third command from "showcommand":
print "\nCommand #3: " . $commands{'showcommand'}[2];
Output: Command #3: cmd3
The data from the file is copied to a hash and the commands are added as an array under the respective keywords.
Thanks!
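A variation on the same idea: returning %hash flattens it into a key/value list that is then copied into %commands. That works, but a common alternative, especially for large files, is to return a reference to the hash. A minimal sketch, assuming the same file format; read_command_file_ref is a hypothetical name, and the sample file is written with File::Temp only so the example runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical variant of ReadCommandFile that returns a hash reference
# instead of a flattened key/value list.
sub read_command_file_ref {
    my ($file) = @_;
    my %hash;
    my $name;
    open my $in, '<', $file or die "Cannot open $file: $!";
    while (my $line = <$in>) {
        chomp $line;
        if ($line =~ /===\|(.*)\|/) {
            $name = $1;
            $hash{$name} = [];
        }
        elsif (defined $name) {
            push @{ $hash{$name} }, $line;
        }
    }
    close $in;
    return \%hash;    # the caller receives a reference; no list copying
}

# Write a small sample file so the example is self-contained.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "===|showcommand|\ncmd1\ncmd2\ncmd3\n===|testcommand|\ncommand1\n";
close $out;

my $commands = read_command_file_ref($path);
print "Command #3: $commands->{showcommand}[2]\n";
```

With a reference, elements are reached through the arrow, as in $commands->{showcommand}[2].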

Related

Split string into a hash of hashes (perl)

At the moment I'm a little confused.
I am looking for a way to turn a string with an indefinite number of words (separated by slashes) into a nested hash.
These "strings" are output from a text database.
For example, given
"office/1/hardware/mouse/count/200"
(the next one can be longer or shorter),
this structure must be created from it:
{
    office => {
        1 => {
            hardware => {
                mouse => {
                    count => 200
                }
            }
        }
    }
}
Any ideas?
Work backwards. Split the string. Use the last two elements to make the inner-most hash. While more words exist, make each one the key of a new hash, with the inner hash as its value.
my $s = "office/1/hardware/mouse/count/200";
my @word = split(/\//, $s);
# Bottom level taken explicitly
my $val = pop @word;
my $key = pop @word;
my $h = { $key => $val };
# Test the array, not the popped key, so a key such as "0" doesn't end the loop
while (@word)
{
    $h = { pop(@word) => $h };
}
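Wrapped in a sub, the work-backwards approach might look like the sketch below. path_to_hash is a made-up name. The loop condition tests @words so that a key such as 0 (which is false in Perl) does not end the loop early.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper implementing the "work backwards" idea:
# the last word becomes the innermost value, and every earlier word
# wraps the structure built so far in a new one-key hash.
sub path_to_hash {
    my ($path) = @_;
    my @words = split m{/}, $path;
    my $h = pop @words;                 # innermost value, e.g. 200
    while (@words) {                    # loop on the array, not on the key
        $h = { pop(@words) => $h };
    }
    return $h;
}

my $tree = path_to_hash("office/1/hardware/mouse/count/200");
local $Data::Dumper::Sortkeys = 1;
print Dumper($tree);
```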
A simple recursive function should do it:
use strict;
use warnings;
use Data::Dumper;
sub foo {
    my $str = shift;
    my ($key, $rest) = split m|/|, $str, 2;
    if (defined $rest) {
        return { $key => foo($rest) };
    } else {
        return $key;
    }
}
my $hash = foo("foo/bar/baz/2");
print Dumper $hash;
This gives the output:
$VAR1 = {
'foo' => {
'bar' => {
'baz' => '2'
}
}
};
But like I said in the comment: What do you intend to use this for? It is not a terribly useful structure.
If there are many lines to be read into a single hash and the lines have a variable number of fields, you have big problems and the other two answers will clobber data by either smashing sibling keys or overwriting final values. I'm supposing this because there is no rational reason to convert a single line into a hash.
You will have to walk down the hash with each field. This will also give you the most control over the process.
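our $hash = {};
our $eolmark = "\000";
while (my $line = <...>) {    # <...> stands for your input filehandle
    chomp $line;
    my @fields = split /\//, $line;
    my $count = @fields;
    my $h = $hash;
    my $i = 0;
    for (@fields) {
        if (++$i == $count) {
            $h->{$_}{$eolmark} = 1;       # last field: mark the end of the record
        } else {
            $h = $h->{$_} ||= {};         # create the level if needed, then descend
        }
    }
}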
$h->{$_}{$eolmark} = 1
You need the special "end of line" key so that you can recognize the end of a record and still permit longer records to coexist. If you had two records, foo/bar/baz and foo/bar/baz/quux, then without the marker the second would overwrite the final value of the first.
$h = $h->{$_} ||= {}
This statement is a very handy idiom: it creates the next level of the hash if it doesn't exist yet and takes a shortcut reference to it, all in one step. Never do a hash lookup more than once.
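To see the end-of-line marker at work, here is a self-contained sketch that feeds a few hard-coded records (standing in for the lines read from the input) through the same $h = $h->{...} ||= {} walk-down idiom, then lists the complete records back out of the tree. The sample records and the records_in helper are hypothetical, added only for the demonstration.

```perl
use strict;
use warnings;

# The "\0" end-of-record marker lets foo/bar/baz and foo/bar/baz/quux coexist.
my $eolmark = "\000";
my $tree    = {};
my @records = ("foo/bar/baz", "foo/bar/baz/quux", "foo/qux");   # sample data

for my $record (@records) {
    my @fields = split m{/}, $record;
    my $last   = pop @fields;
    my $h      = $tree;
    for my $field (@fields) {
        $h = $h->{$field} ||= {};    # create the level if missing, then descend
    }
    $h->{$last}{$eolmark} = 1;       # mark that a record ends here
}

# Hypothetical helper: list every complete record stored in the tree.
sub records_in {
    my ($node, $prefix) = @_;
    my @found;
    for my $key (sort keys %$node) {
        if ($key eq $eolmark) {      # a record ends at this node
            push @found, $prefix;
            next;
        }
        my $path = defined $prefix ? "$prefix/$key" : $key;
        push @found, records_in($node->{$key}, $path);
    }
    return @found;
}

print "$_\n" for records_in($tree);
```

Both foo/bar/baz and foo/bar/baz/quux come back out, which is exactly what the marker is for.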
HTH

Ignore empty lines in a CSV file using Perl

I have to read a CSV file, abc.csv, select a few fields from it and write them to a new CSV file, def.csv.
Below is my code. I am trying to ignore empty lines from abc.csv.
use strict;
use warnings;
use Text::CSV;
use FileHandle;

genNewCsv();
sub genNewCsv {
    my $csv  = Text::CSV->new( { eol => "\n" } );    # eol: end each printed row with a newline
    my $aCsv = "abc.csv";
    my $mCsv = "def.csv";
    my $fh = FileHandle->new( $aCsv, "r" );
    my $nf = FileHandle->new( $mCsv, "w" );
    $csv->print( $nf, [ "id", "flops", "ub" ] );
    while ( my $row = $csv->getline($fh) ) {
        my $id    = $row->[0];
        my $flops = $row->[2];
        next if ( $id =~ /^\s+$/ );    # Ignore empty lines
        my $ub = "TRUE";
        $csv->print( $nf, [ $id, $flops, $ub ] );
    }
    $nf->close();
    $fh->close();
}
But I get the following error:
Use of uninitialized value $flops in pattern match (m//)
How do I ignore the empty lines in the CSV file?
I have looked at the Stack Overflow question "Remove empty lines and space with Perl", but it didn't help.
You can skip the entire row if any of the fields are empty:
unless (defined($id) && defined($flops) && defined($ub)) {
    next;
}
You tested if $id was empty, but not $flops, which is why you got the error.
You should also be able to do
unless ($id && $flops && $ub) {
    next;
}
There, an empty string (or the string "0") would evaluate to false.
Edit: Now that I think about it, what do you mean by ignoring lines that aren't there?
You can also do this:
my $id = $row->[0] || 'No Val';    # 'No Val' can be anything you want to signify that the field was empty
This gives the variable a default value if the first value evaluates to false.
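One caveat with ||: it also replaces legitimate false values such as 0 or an empty string. On Perl 5.10 or later, the defined-or operator // substitutes the default only when the value is undef. A small sketch with a made-up row:

```perl
use strict;
use warnings;

# Made-up row: the first field is the string "0", the third field is missing.
my $row = [ "0", "x" ];

my $id_or  = $row->[0] || 'No Val';    # "0" is false, so this gives 'No Val'
my $id_dor = $row->[0] // 'No Val';    # "0" is defined, so this keeps "0"
my $flops  = $row->[2] // 'No Val';    # element 2 does not exist, so 'No Val'

print "$id_or $id_dor $flops\n";
```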
Run this on your file's lines first to keep only the non-empty ones:
sub removeblanks {
    my @lines = ();
    for (@_) {
        push @lines, $_ unless ( $_ =~ /^\s*$/ );    # add any other conditions here
    }
    return \@lines;
}
You can do the following to skip empty lines:
while (my $row = $csv->getline($fh)) {
    next unless grep $_, @$row;
    ...
You could use List::MoreUtils to check if any of the fields are defined:
use List::MoreUtils qw(any);
while (my $row = ...) {
    next unless any { defined } @{ $row };
    ...
}
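A small, parser-independent sketch of the check the last two answers perform: treat a row as empty when none of its fields contains a non-whitespace character. Unlike grep $_, @$row, it does not treat a row holding only 0 as empty, and unlike the any { defined } test, it also treats fields that are empty strings or whitespace as empty. row_is_blank and the sample rows are assumptions for illustration; in the question's loop you would call next if row_is_blank($row); right after getline.

```perl
use strict;
use warnings;

# Hypothetical helper: true when no field of the row contains a
# non-whitespace character (covers undef, "" and "   ").
sub row_is_blank {
    my ($row) = @_;
    return !grep { defined($_) && /\S/ } @$row;
}

# Sample rows as a CSV parser might return them.
my @rows = ( ["1", "a", "10"], [""], [undef, "  "], ["0", "", ""] );
my @kept = grep { !row_is_blank($_) } @rows;
print scalar(@kept), " rows kept\n";
```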

Perl Help Needed: Replacing values

I have an input file like this:
Input file
I need to replace the value #pSBSB_ID="*" in the record where #rectype=#pRECTYPE="SBSB" with the #pMEME_SSN="034184233" value from the #pRECTYPE="SMSR" record, and then delete the row where #rectype=#pRECTYPE="SMSR".
Example:
So, after the changes have been made, the file should look like this:
....#pRECTYPE="SBSB", #pGWID="17199269", #pINPUT_METHOD="E", #pGS08="005010X220A1", #pSBSB_FAM_UPDATE_CD="UP", #pSBSB_ID="034184233".....
....#pRECTYPE="SBEL", #pSBEL_EFF_DT="01/01/2013", #pSBEL_UPDATE_CD="TM", #pCSPD_CAT="M", #pCSPI_ID="MHMO1003"
.
.
.
Update
I tried the code below:
The input files have the .mms extension, and there are multiple files to process.
my $save_for_later;
my $record;
my @KwdFiles;
my $r;
my $FilePath = $ARGV[0];
chdir($FilePath);
@KwdFiles = <*>;
foreach $File (@KwdFiles)
{
    unless (substr($File, length($File) - 4, length($File)) eq '.mms')
    {
        next;
    }
    unless (open(INFILE, "$File"))
    {
        print "Unable to open file: $File";
        exit(0);
    }
    print "Successfully opened the file: \"$File\" for processing\n\n";
    while ( my $record = <INFILE> ) {
        my %r = $record =~ /\#(\w+) = '(.*?)'/xg;
        if ($r{rectype} eq "SMSR") {
            $save_for_later = $r{pMEME_SSN};
            next;
        }
        elsif ($r{rectype} eq "SBSB" and $r{pSBSB_ID} eq "*") {
            $record =~ s|(\#pSBSB_ID = )'.*?'|$1'$save_for_later'|x;
        }
        close(INFILE);
    }
}
But I am still not getting the updated values in the file.
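#!/usr/bin/perl
use strict;
use warnings;
open(my $in,  '<', 'in.txt')  or die "Cannot open in.txt: $!";
open(my $out, '>', 'out.txt') or die "Cannot open out.txt: $!";
my $CUR_RECID = 0;    # rec_id of the last 'ABC' record seen (0 = none pending)
while (<$in>) {
    if ($CUR_RECID) {
        s/recname='.+?'/recname='$CUR_RECID'/ if /rectype='DEF'/;
        $CUR_RECID = 0;
    }
    $CUR_RECID = $1 if /rectype='ABC'.+?rec_id='(.+?)'/;
    print $out $_;    # copy every line, modified or not
}
close $out;
close $in;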
Try the whole script. There's no need for a separate function; this code does everything.
Run this script from your terminal with the files to be modified as arguments:
use strict;
use warnings;
$^I = '.bak'; #modify original file and create a backup of the old ones with .bak appended to the name
my $replacement;
while (<>) {
$replacement = $1 if m/(?<=\#pMEME_SSN=)("\d+")/; #assume replacement will be on the first line of every file.
next if m/^\s*\#pRECTYPE="SMSR"/;
s/(?<=\#pSBSB_ID=)("\*")/$replacement/g;
print;
}
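The in-place script above assumes the replacement value appears on the first line of every file. A sketch that does not depend on line order reads a file's lines first, finds the SSN in one pass and rewrites in a second. It assumes each record is on a single line and uses the pRECTYPE, pMEME_SSN and pSBSB_ID fields shown in the question; because the field prefix may be @ rather than # in the real files, the patterns accept either. fix_records and the sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: takes all lines of one file and returns the fixed lines.
# Pass 1 finds the pMEME_SSN value on the SMSR record; pass 2 drops the SMSR
# record and copies that value into pSBSB_ID="*" on the SBSB record.
# The field prefix is matched as either "#" or "@" (an assumption).
sub fix_records {
    my @lines = @_;

    my $ssn;
    for (@lines) {
        if (/[\@#]pRECTYPE="SMSR"/ && /[\@#]pMEME_SSN=("[^"]*")/) {
            $ssn = $1;    # keep the quotes, e.g. "034184233"
            last;
        }
    }

    my @out;
    for my $line (@lines) {
        next if $line =~ /[\@#]pRECTYPE="SMSR"/;    # delete the SMSR row
        if (defined $ssn && $line =~ /[\@#]pRECTYPE="SBSB"/) {
            $line =~ s/([\@#]pSBSB_ID=)"\*"/$1$ssn/;
        }
        push @out, $line;
    }
    return @out;
}

# Made-up sample lines modeled on the example in the question.
my @input = (
    '#pRECTYPE="SMSR", #pMEME_SSN="034184233"',
    '#pRECTYPE="SBSB", #pGWID="17199269", #pSBSB_ID="*"',
    '#pRECTYPE="SBEL", #pSBEL_EFF_DT="01/01/2013"',
);
print "$_\n" for fix_records(@input);
```

To apply it to the .mms files, you could read each file into an array, call fix_records on it, and write the returned lines back out.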

Relative Record Separator in Perl

I have data that looks like this:
id:40108689 --
chr22_scrambled_bysegments:10762459:F : chr22:17852459:F (1.0),
id:40108116 --
chr22_scrambled_bysegments:25375481:F : chr22_scrambled_bysegments:25375481:F (1.0),
chr22_scrambled_bysegments:25375481:F : chr22:19380919:F (1.0),
id:1 --
chr22:21133765:F : chr22:21133765:F (0.0),
So each record is separated by id:[somenumber] --
How can I read the data into a hash of arrays like this:
$VAR = { 'id:40108689' => [' chr22_scrambled_bysegments:10762459:F : chr22:17852459:F (1.0),'],
'id:40108116' => ['chr22_scrambled_bysegments:25375481:F :chr22_scrambled_bysegments:25375481:F (1.0)',
'chr22_scrambled_bysegments:25375481:F : chr22:19380919:F (1.0),'
#...etc
}
I tried to approach this using the record separator, but I'm not sure how to generalize it.
{
local $/ = " --\n"; # How to include variable content id:[number] ?
while ($content = <INFILE>) {
chomp $content;
print "$content\n" if $content; # Skip empty records
}
}
my $result = {};
my $last_id;
while (my $line = <INFILE>) {
    chomp $line;
    if ($line =~ /(id:\d+) --/) {
        $last_id = $1;
        next;
    }
    next unless $last_id;    # Just in case the file doesn't start with an id line
    push @{ $result->{$last_id} }, $line;
}
use Data::Dumper;
print Dumper $result;
This uses the normal record separator.
It uses $last_id to keep track of the last id line encountered, updating it whenever another id line appears, and pushes every non-id line onto the array stored under that id in the hash.
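If the whole file fits in memory, another option is to skip the record separator entirely and split on the header line itself. Because the pattern contains a capturing group, split returns the captured ids alongside the bodies, so the result can be assigned straight into a hash. A minimal sketch with the sample data inlined as a here-doc; it assumes every header looks exactly like id:<digits> -- at the start of a line.

```perl
use strict;
use warnings;
use Data::Dumper;

my $text = <<'END';
id:40108689 --
chr22_scrambled_bysegments:10762459:F : chr22:17852459:F (1.0),
id:40108116 --
chr22_scrambled_bysegments:25375481:F : chr22_scrambled_bysegments:25375481:F (1.0),
chr22_scrambled_bysegments:25375481:F : chr22:19380919:F (1.0),
id:1 --
chr22:21133765:F : chr22:21133765:F (0.0),
END

# The capturing group makes split return ("", id, body, id, body, ...);
# the leading empty field (before the first header) is discarded.
my (undef, %chunks) = split /^(id:\d+) --\n/m, $text;

# Turn each body into an array of lines.
my %result;
$result{$_} = [ split /\n/, $chunks{$_} ] for keys %chunks;

$Data::Dumper::Sortkeys = 1;
print Dumper(\%result);
```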

How to use a perl module that you have written?

I've just written my first Perl module and am having trouble getting it to work with a script I also wrote. Here is the error that the Perl interpreter displays when I try to run the script that uses my newly created module.
Error message:
scraper_tools_v1.pm did not return a true value at getYid.pl line 5.
BEGIN failed--compilation aborted at getYid.pl line 5.
scraper_tools_v1.pm is the Perl module which I have written and getYid.pl is the Perl script which attempts to utilize the scraper_tools_v1.pm module.
Here is the code for the scraper_tools_v1.pm file:
#!/usr/bin/perl
package scraper_tools_v1;
use strict;
use warnings;
use WWW::Curl::Easy;
# Note this function expects a single parameter which should be in the form of a URL
sub getWebPage($)
{
# Setting up the Curl parameters
my $curl = WWW::Curl::Easy->new; # create a variable to store the curl object
# A parameter set to 1 tells the library to include the header in the body output.
# This is only relevant for protocols that actually have headers preceding the data (like HTTP).
$curl->setopt(CURLOPT_HEADER, 1);
# Setting the target URL to retrieve with the passed parameter
$curl->setopt(CURLOPT_URL, @_);
# Declaring a variable to store the response from the Curl request
my $response_body = '';
# Creating a file handle for CURL to output to, then redirecting our output to the $response_body variable
open(my $fileb, ">",\$response_body) or die $!;
$curl->setopt(CURLOPT_WRITEDATA, $fileb);
# getting the return code from the header to see if the GET was successful
my $return_code = $curl->perform;
# capturing the response code from the GET request in the HTTP header, i.e... 200, 404, 500, etc...
# 200 is success
my $response_code = $curl->getinfo(CURLINFO_HTTP_CODE);
# if the return code is zero than the request was a success
if ($return_code == 0)
{
# A little debug output to keep you informed
print ("Success ". $response_code.": ".@_."\n");
# return whatever was contained on the web page that we just got using a GET
return $response_body;
}
else
{
print ("Failure ". $response_code.": ".@_."\n");
}
close($fileb); # close the file-handle
}
And here is the getYid.pl script which attempts to use the above module
#!/usr/bin/perl
use strict;
use warnings;
use scraper_tools_v1;
my %cat_links; # Hash that stores categories and their numbers (ID's)
my $web_page = scraper_tools_v1->getWebPage("http://something.com/categoryindex.aspx");
my @lines = split(/\n/, $web_page);
foreach my $line (@lines)
{
chomp($line);
if ($line =~ /<option value=\"{1}(.+)\">(.+)<\/option>/)
{
my $num = $1;
my $desc = $2;
$desc =~ s/\s+&\s+/ & /;
$cat_links{$desc} = $num;
}
}
my @allTargetUrls; # make a new array to store all the links we need to extract listings from
$web_page = ''; # Reset this variable so we can reuse it.
my $totalNumberOfListings = 0;
foreach my $key (keys %cat_links)
{
my $target = "http://something.com/categorydetail.aspx?id=$cat_links{$key}&exact_phrase=0";
$web_page = scraper_tools_v1->getWebPage($target);
@lines = split(/\n/, $web_page);
foreach my $line (@lines)
{
my $pages;
chomp($line);
if ($line =~ /We found (\d) listings for your search\./)
{
my $listingsInCat = $1;
print ("$cat_links{$key}, $listingsInCat");
$totalNumberOfListings += $listingsInCat;
}
if ($line =~ /Page 1 of (\d)/)
{
$pages = $1;
}
for (my $i = 1; $i <= $pages; $i++)
{
#build the target urls
my $pageUrl = "http://something.com/categorydetail.aspx?id=$key&search=&exact_phrase=True&city=&state=&zipcode=&page=$i";
push(@allTargetUrls, $pageUrl);
}
}
print("Total number of listings = ".$totalNumberOfListings);
}
Any help in resolving this issue would be greatly appreciated. Please note that I have tested both files independently for interpreter errors and found nothing. Thanks to all for taking a look.
When you write a Perl module, you should always end the file with the line
1;
Perl executes the code at the module's top level when the module is loaded (by use or require), and the last statement evaluated in the file must return a true value. If it doesn't (1 is true), you'll get the error you describe. Essentially, Perl is telling you that the initialisation code in your module didn't succeed.
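A self-contained way to see this rule in action is to generate two throwaway modules, one ending in 1; and one ending in a false value, and require each inside eval. The module names GoodModule and BadModule and the temporary directory are made up for the demonstration; only File::Temp and File::Spec, which ship with Perl, are used.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Write two throwaway modules into a temporary directory:
# GoodModule.pm ends with "1;", BadModule.pm ends with a false value.
my $dir = tempdir(CLEANUP => 1);
my %source = (
    GoodModule => "package GoodModule;\nsub hello { return 'hi' }\n1;\n",
    BadModule  => "package BadModule;\nsub hello { return 'hi' }\n0;\n",
);
for my $name (keys %source) {
    my $path = File::Spec->catfile($dir, "$name.pm");
    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} $source{$name};
    close $out;
}
unshift @INC, $dir;    # let require find the modules

# require with a bareword runs at run time, so eval can catch the failure.
my $good_ok   = eval { require GoodModule; 1 };
my $bad_ok    = eval { require BadModule; 1 };
my $bad_error = $@;

print "GoodModule loaded: ", ($good_ok ? "yes" : "no"), "\n";
print "BadModule error: $bad_error";
```

The second require fails with the same "did not return a true value" message as in the question.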