Storing a file in a hash - Only stores first line? - perl

I am trying to read a file and store it in a hash. When I print out the contents of the hash, only the first line from the file is stored.
#!/usr/local/bin/perl
use strict;
use warnings;
use Data::Dump;
local $/ = "";
my %parameters;
open(my $PARAMS, 'SimParams.conf')
or die "Unable to open file, $!";
while (<$PARAMS>) {
    my @temp = split(/:\s*|\n/);
    $parameters{$temp[0]} = $temp[1];
}
dd(\%parameters);
exit 0
The dd(\%parameters) shows only the first line of the file as key and value. How can I get all 3 lines to be Key and Value pairings in this hash?
EDIT: SimParams file as requested:
RamSize: 1000
PageSize: 200, 200
SysClock: 1
The dd output is:
{ RamSize => "1000\r" }

The line
local $/ = "";
is reading your 3-line file as one chunk, the entire file. If you eliminate that code, your hash should be created.
You should probably chomp your input to remove the newline. Place it in your code before splitting into @temp.
chomp;
Borodin best explains what local $/ = ""; does.

Setting $/ to the null string enables paragraph mode. Each time you read from $PARAMS (which should be $params because it is a local variable) you will be given the next block of data until a blank line is encountered.
It looks like there are no blank lines in your data, so the read will return the entire contents of the file.
You don't say why you modified the value of $/, but it looks like just removing that assignment will get your code working properly.
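Putting the advice from both answers together, a minimal sketch of the corrected program might look like this; it keeps the question's file and variable names, and uses s/\s+\z// instead of a plain chomp so the stray \r visible in the dump is stripped as well:
#!/usr/local/bin/perl
use strict;
use warnings;
use Data::Dump;

my %parameters;
open(my $PARAMS, '<', 'SimParams.conf')
    or die "Unable to open file, $!";
while (<$PARAMS>) {
    s/\s+\z//;                       # remove the newline and the trailing \r
    my @temp = split /:\s*/;
    $parameters{$temp[0]} = $temp[1];
}
close $PARAMS;

dd(\%parameters);
exit 0;
With the sample SimParams.conf this prints all three key/value pairs, with "200, 200" kept intact as the PageSize value.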


Program argument is 100 but returns the value as 0100

Right now I am trying to do an assignment where I have to
- Extract information from an HTML file
- Save it to a scalar
- Run a regular expression to find the number of seats available in the designated course (the program argument is the course number for example 100 for ICS 100)
- If the course has multiple sessions, I have to find the sum of the seats available and print
- The output is just the number of seats available
The problem here is that when I was debugging and checking that the variable holding the program argument stored the correct value, it appeared to be storing the value with an extra 0 in front of it.
ex.) perl filename.pl 100
$ARGV[0] returns as 0100
I've tried storing the matched values in an array, using multiple scalar variables, and changing my regular expression, but none of it worked.
die "Usage: perl NameHere_seats.pl course_number" if (@ARGV < 1);
# This variable will store the .html file contents
my $fileContents;
# This variable will store the sum of seats available in the array @seatAvailable
my $sum = 0;
# This variable will store the program argument
my $courseNum = $ARGV[0];
# Open the file to read contents all at once
open (my $fh, "<", "fa19_ics_class_availability.html") or die ("Couldn't open 'fa19_ics_class_availability.html'\n");
# use a naked block to limit the scope of $/
{
#use local $/ to get <$fh> to read the whole file, and not one line
local $/;
$fileContents = <$fh>;
}
# Close the file handle
close $fh;
# Uncomment the line below to check if you've successfully extracted the text
# print $fileContents;
# Check if the course exists
die "No courses matched...\n" if ($ARGV[0] !~ m/\b(1[0-9]{2}[A-Z]?|2[0-8][0-9][A-Z]?|29[0-3])[A-Z]?\b/);
while ($fileContents =~ m/$courseNum(.+?)align="center">(\d)</) {
my $num = $2;
$sum = $sum + $num;
}
print $sum;
# Use this line as error checking to make sure $ARGV[0] is storing the proper number
print $courseNum;
The current output I receive when the program argument is 100 is just 0, and I assume that's because the regular expression never matches anything, so the sum stays at 0. The output should be 15...
This is a link to the .html page > https://laulima.hawaii.edu/access/content/user/emeyer/ics/215/FA19/01/perl/fa19_ics_class_availability.html
You're getting "0100" because you have two print() statements.
print $sum;
...
print $courseNum;
And because there are no newlines or other output between them, you get the two values printed out next to each other. $sum is '0' and $courseNum is '100'.
So why is $sum zero? Well, that's because your regex isn't picking up the data you want it to match. Your regex looks like this:
m/$courseNum(.+?)align="center">(\d)</
You're looking for $courseNum followed by a number of other characters, followed by 'align="center">' and then your digit. This doesn't work for a number of reasons.
The string "100" appears many times in your text. Many times it doesn't even mean a course number (e.g. "100%"). Perhaps you should look for something more precise ("ICS $courseNum").
The .+? doesn't do what you think it does. The dot doesn't match newline characters unless you use the /s option on the match operator.
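As a quick illustration of that /s behaviour, here is a made-up two-line snippet (not the real HTML from the assignment):
my $text = qq{ICS 100</td>\n<td align="center">15</td>};

# Without /s the dot cannot cross the newline, so the match fails;
# with /s it succeeds.
print "without /s: ", ($text =~ /100(.+?)align="center">(\d+)/  ? "match" : "no match"), "\n";
print "with /s:    ", ($text =~ /100(.+?)align="center">(\d+)/s ? "match" : "no match"), "\n";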
But even if you fix those first two problems, it still won't work as there are a number of numeric table cells for each course and you're doing nothing to ensure that you're grabbing the last one. Your current code will get the "Curr. Enrolled" column, not the "Seats Avail" one.
This is a non-trivial HTML parsing problem. It shouldn't be addressed using regexes (HTML should never be parsed using regexes). You should look at one of the HTML parsing modules from CPAN - I think I'd use Web::Query.
Update: An example solution using Web::Query:
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use File::Basename;
use Web::Query;
my $course_num = shift
or die 'Usage: perl ' . basename $0 . " course_number\n";
my $source = 'fa19_ics_class_availability.html';
open my $fh, '<', $source
or die "Cannot open '$source': $!\n";
my $html = do { local $/; <$fh> };
my $count_free;
wq($html)
    # Get each table row in the table
    ->find('table.listOfClasses tr')
    ->each(sub {
        my ($i, $elem) = @_;
        my @tds;
        # Get each <td> in the <tr>
        $elem->find('td')->each(sub { push @tds, $_[1] });
        # Ignore rows that don't have 13 columns
        return if @tds != 13;
        # Ignore rows that aren't about the right course
        return if $tds[2]->text ne "ICS $course_num";
        # Add the number of available places
        $count_free += $tds[8]->text;
    });
say $count_free;

perl - fetch column names from file

I have the following command in my perl script:
my @files = `find $basedir/ -type f -iname '$sampleid*.summary.csv'`; # there are multiple summary.csv files in my basedir; I store them in an array
my $summary = `tail -n 1 $files[0]`; # each summary.csv contains a header line and a line with data; here I fetch the last line
chomp($summary);
my @sp = split(/,/,$summary); # I split based on ','
my $gender = $sp[11]; # the value from column 11 is stored in $gender
my $qc = $sp[2]; # the value from column 2 is stored in $qc
Now, I'm experiencing the situation where my *summary.csv files don't have the same number of columns. They do all have 2 lines, where the first line represents the header.
What I want now is not to store the value from column 11 in $gender, but to store the value from the column named 'Gender' in $gender.
How can I achieve this?
First try at solution:
my %hash = ();
my $header = `head -n 1 $files[0]`; #reading the header
chomp ($header);
my @colnames = split (/,/,$header);
my $keyfield = $colnames[#here should be the column with the name 'Gender']
push @{ $hash{$keyfield} };
my $gender = $sp[$keyfield]
You will have to read the header line as well as the data to know which column holds which information. This is most easily done by writing actual Perl code instead of shelling out to various command line utilities. See further below for that solution.
Fixing your solution also requires a hash. You need to read the header line first, store the header fields in an array (as you've already done), and then read the data line. The data needs to be a hash, not an array. A hash is a map of keys and values.
# read the header and create a list of header fields
my $header = `head -n 1 $files[0]`;
chomp ($header);
my @colnames = split (/,/,$header);
# read the data line
my $summary = `tail -n 1 $files[0]`;
chomp($summary);
my %sp; # use a hash for the data, not an array
# use a hash slice to fill in the columns
@sp{@colnames} = split(/,/,$summary);
my $gender = $sp{Gender};
The tricky part here is this line.
@sp{@colnames} = split(/,/,$summary);
We have declared %sp as a hash, but we now access it with a @ sigil. That's because we are taking a hash slice, as indicated by the curly braces {}. The slice we take is all elements with the names of the values in @colnames. There is more than one value, so the return value is not a scalar (with a $) any more. There is a list of return values, so the sigil turns to @. Now we use that list on the left hand side (that's called an lvalue), and assign the result of the split to that list.
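For illustration only, a hand-rolled equivalent of that slice assignment would be an explicit loop over the column names:
# Equivalent to the hash slice above, written out long-hand
my @values = split(/,/,$summary);
for my $i (0 .. $#colnames) {
    $sp{ $colnames[$i] } = $values[$i];
}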
Doing it with modern Perl
The following program will use File::Find::Rule to replace your find command, and Text::CSV to read the CSV file. It grabs all the files, then opens one at a time. The header line will be read first, and fed into the Text::CSV object, so that it can then give back a hash reference, which you can use to access every field by name.
I've written it in a way that it will only read one line for each file, as you said there are only two lines per file. You can easily extend that to be a loop.
use strict;
use warnings;
use File::Find::Rule;
use Text::CSV;
my $sampleid;
my $basedir;
my $csv = Text::CSV->new(
    {
        binary => 1,
        sep    => ',',
    }
) or die "Cannot use CSV: " . Text::CSV->error_diag;

my @files = File::Find::Rule->file()->name("$sampleid*.summary.csv")->in($basedir);

foreach my $file (@files) {
    open my $fh, '<', $file or die "Can't open $file: $!";

    # get the headers
    my @cols = @{ $csv->getline($fh) };
    $csv->column_names(@cols);

    # read the first line
    my $row = $csv->getline_hr($fh);

    # do whatever you want with the row
    print "$file: ", $row->{Gender};
}
Please note that I have not tested this program.
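If a file ever contained more than one data row, the single getline_hr call inside the foreach above could be replaced by its own loop, for example:
# read every remaining data line instead of just the first one
while ( my $row = $csv->getline_hr($fh) ) {
    print "$file: ", $row->{Gender}, "\n";
}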

How to read large files with different line delimiters?

I have two very large XML files that have different kinds of line endings.
File A has CR LF at the end of each XML record. File B has only CR at the end of each XML record.
In order to read File B properly, I need to set the built-in Perl variable $/ to "\r".
But if I'm using the same script with File A, the script does not read each line in the file and instead reads it as a single line.
How can I make the script compatible with text files that have various line-ending delimiters? In the code below, the script reads XML data and then uses a regex to split records on a specific record-ending XML tag such as </record>. Finally it writes the requested records to a file.
open my $file_handle, '+<', $inputFile or die $!;
local $/ = "\r";
while (my $line = <$file_handle>) {    # read the file line-by-line; does not load the whole file into memory
    $current_line = $line;
    if ($spliceAmount > $recordCounter) {    # if the splice amount hasn't been reached yet
        push(@setofRecords, $current_line);  # start adding each line to the set of records array
        if ($current_line =~ m|$recordSeparator|) {    # check for the node to splice on
            $recordCounter++;    # if the record separator was found (end of that record) then increment the record counter
        }
    }
    # don't close the file because we need to read the last line
}
$current_line =~/(\<\/\w+\>$)/;
$endTag = $1;
print "\n\n";
print "End Tag: $endTag \n\n";
close $file_handle;
While you may not need it for this, in principle you should use an XML parser to parse XML. I'd recommend XML::LibXML, or perhaps XML::Simple to start off with.
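As an illustration only, a minimal XML::LibXML sketch might look like this; it assumes the records are <record> elements, which is just a guess based on the closing tag mentioned in the question, and the file name is illustrative:
use strict;
use warnings;
use XML::LibXML;

my $inputFile = 'FileB.xml';    # illustrative; use your own path

# XML::LibXML does not care whether the file uses \r, \n or \r\n line endings.
my $doc = XML::LibXML->load_xml(location => $inputFile);
for my $record ($doc->findnodes('//record')) {
    print $record->toString, "\n";
}
For files too large to hold as a DOM in memory, XML::LibXML::Reader provides a pull parser that reads the document node by node.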
If the file isn't too big to hold in memory, you can slurp the whole thing into a scalar and split it into the correct lines yourself with a suitably flexible regular expression. For example,
local $/ = undef;
my $data = <$file_handle>;
my @lines = split /(?>\r\n)|(?>\r)|(?>\n)/, $data;
foreach my $line (@lines) {
...
}
The (?>...) construct is an atomic group rather than a look-ahead; it stops the regex engine from backtracking inside each alternative, though with these fixed alternatives a plain /\r\n|\r|\n/ behaves the same, because \r\n is tried before a lone \r. Note that split discards the delimiters, so the end-of-line characters do not end up in @lines; since you would normally chomp them anyway, that is usually what you want.

Perl: How to add a line to sorted text file

I want to add a line to a text file in Perl that keeps its data in sorted order. I have seen examples which show how to append data at the end of the file, but I want the new data to end up in the right sorted position.
Please guide me how can it be done.
Basically, what I have tried so far:
I open the file and grep its contents to see if the line I want to add already exists. If it does, then exit; else add it to the file (such that the data remains in sorted order).
open(my $FH, $file) or die "Failed to open file $file \n";
@file_data = <$FH>;
close($FH);
my $line = grep (/$string1/, @file_data);
if($line) {
print "Found\n";
exit(1);
}
else
{
#add the line to the file
print "Not found!\n";
}
Here's an approach using Tie::File so that you can easily treat the file as an array, and List::BinarySearch's bsearch_str_pos function to quickly find the insert point. Once you've found the insert point, you check to see if the element at that point is equal to your insert string. If it's not, splice it into the array. If it is equal, don't splice it in. And finish up with untie so that the file gets closed cleanly.
use strict;
use warnings;
use Tie::File;
use List::BinarySearch qw(bsearch_str_pos);
my $insert_string = 'Whatever!';
my $file = 'something.txt';
my @array;
tie @array, 'Tie::File', $file or die $!;

my $idx = bsearch_str_pos $insert_string, @array;

splice @array, $idx, 0, $insert_string
    if $array[$idx] ne $insert_string;

untie @array;
The bsearch_str_pos function from List::BinarySearch is an adaptation of a binary search implementation from Mastering Algorithms with Perl. Its convenient characteristic is that if the search string isn't found, it returns the index point where it could be inserted while maintaining the sort order.
Since you have to read the contents of the text file anyway, how about a different approach?
Read the lines in the file one-by-one, comparing against your target string. If you read a line equal to the target string, then you don't have to do anything.
Otherwise, you eventually read a line 'greater' than your current line according to your sort criteria, or you hit the end of the file. In the former case, you just insert the string at that position, and then copy the rest of the lines. In the latter case, you append the string to the end.
If you don't want to do it that way, you can do a binary search in @file_data to find the spot to add the line without having to examine all of the entries, then insert it into the array before outputting the array to the file.
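Building on the List::BinarySearch example above, that "binary search, splice, write back" idea might look roughly like this, assuming @file_data is already sorted and $string1 ends in a newline like the lines read from the file:
use List::BinarySearch qw(bsearch_str_pos);

my $idx = bsearch_str_pos $string1, @file_data;
splice @file_data, $idx, 0, $string1
    unless defined $file_data[$idx] && $file_data[$idx] eq $string1;

# write the (possibly extended) array back out
open(my $OUT, '>', $file) or die "Failed to open file $file for writing: $!";
print {$OUT} @file_data;
close($OUT);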
Here's a simple version that reads from stdin (or the filename(s) specified on the command line) and appends 'string to append' to the output if it's not found in the input. Output is printed on stdout.
#! /usr/bin/perl
$found = 0;
$append='string to append';
while (<>) {
    $found = 1 if (m/$append/o);
    print;
}
print "$append\n" unless ($found);
Modifying it to edit a file in-place (with perl -i) and taking the append string from the command line would be quite simple.
A 'simple' one-liner to insert a line without using any module could be:
perl -ni -le '$insert = "lemon"; $eq = ($insert cmp $_); if ($eq == 0) { $found++ } elsif ($eq == -1 && !$found) { print $insert; $found++ } print'
given a list.txt whose content is:
ananas
apple
banana
pear
the output is:
ananas
apple
banana
lemon
pear
{
    local ($^I, @ARGV) = ("", $file);   # Enable in-place editing of $file

    while (<>) {
        # If we found the line exactly, bail out without printing it twice
        last if $_ eq $insert;

        # If we found the place where the line should be, insert it
        if ($_ gt $insert) {
            print $insert;
            print;
            last;
        }

        print;
    }

    # We've passed the insertion point, now output the rest of the file
    print while <>;
}
Essentially the same answer as pavel's, except with a lot of readability added. Note that $insert should already contain a trailing newline.
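For completeness, the two variables that block assumes could be set up like this (the file name is only illustrative):
my $file   = 'list.txt';
my $insert = "lemon\n";    # trailing newline included, as noted above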

How can I check if contents of one file exist in another in Perl?

Requirement:-
File1 has contents like -
ABCD00000001,\some\some1\ABCD00000001,Y,,5 (this indicates there are 5 files in total in the unit)
File2 has contents such as ABCD00000001
So what I need to do is check if ABCD00000001 from File2 exists in File1 -
if yes {
    print the output to Output.txt until it finds another ',Y,,X' }
else { no, keep checking }
Anyone? Any help is greatly appreciated.
Hi Arkadiy, the output should be: for any filename from File2 (e.g. ABCD00000001) found in File1, everything from one ',Y,,' line to the next.
For example, the File1 structure will be:
ABCD00000001,\some\some1\ABCD00000001,Y,,5
ABCD00000001,\some\some1\ABCD00000002
ABCD00000001,\some\some1\ABCD00000003
ABCD00000001,\some\some1\ABCD00000004
ABCD00000001,\some\some1\ABCD00000005
ABCD00000001,\some\some1\ABCD00000006,Y,,2
so the output should contain all the lines between
ABCD00000001,\some\some1\ABCD00000001,Y,,5 and
ABCD00000001,\some\some1\ABCD00000006,Y,,2
#!/usr/bin/perl -w
use strict;
my $optFile = "C:\\Documents and Settings\\rgolwalkar\\Desktop\\perl_scripts\\SampleOPT1.opt";
my $tifFile = "C:\\Documents and Settings\\rgolwalkar\\Desktop\\perl_scripts\\tif_to_stitch.txt";
print "Reading OPT file now\n";
open (OPT, $optFile);
my @opt_in_array = <OPT>;
close(OPT);
foreach (@opt_in_array) {
    print();
}
print "\nReading TIF file now\n";
open (TIF, $tifFile);
my @tif_in_array = <TIF>;
close(TIF);
foreach (@tif_in_array) {
    print();
}
So all it does is read the 2 files (FYI: I am new to programming).
Try breaking up your problem into discrete steps. It seems that you need to do this (although your question is not very clear):
open file1 for reading
open file2 for reading
read file1, line by line:
for each line in file1, check if there is particular content anywhere in file2
Which part are you having difficulty with? What code have you got so far? Once you have a line in memory, you can compare it to another string using a regular expression, or perhaps a simpler form of comparison.
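A rough sketch of those steps, assuming File2 holds a single key per line and using the ',Y,,' marker from the sample data (the file names are illustrative):
use strict;
use warnings;

open my $keys_fh, '<', 'File2.txt' or die "Cannot open File2.txt: $!";
chomp(my $key = <$keys_fh>);            # e.g. ABCD00000001
close $keys_fh;

open my $in,  '<', 'File1.txt'  or die "Cannot open File1.txt: $!";
open my $out, '>', 'Output.txt' or die "Cannot open Output.txt: $!";

my $printing = 0;
while (my $line = <$in>) {
    if (!$printing) {
        # start at the ",Y,,N" line that begins with the key from File2
        if ($line =~ /^\Q$key\E,.*,Y,,\d+/) {
            print {$out} $line;
            $printing = 1;
        }
    }
    else {
        print {$out} $line;
        # stop after the next ",Y,,N" line (it is included in the output)
        $printing = 0 if $line =~ /,Y,,\d+/;
    }
}
close $in;
close $out;
With the sample File1 above and ABCD00000001 in File2, this writes the six lines from the ,Y,,5 record through the ,Y,,2 record to Output.txt.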
OK, I'll bite (partially)...
First, some general comments. use strict and -w are good, but you are not checking the result of open or explicitly stating your desired read/write mode.
The contents of your OPT file kinda sorta looks like it is CSV and the second field looks like a Windows path, true? If so, use the appropriate library from CPAN to parse CSV and verify your file names. Misery and pain can be the result otherwise...
As Ether stated earlier, you need to read the file OPT then match the field you want. If the first file is CSV, first you need to parse it without destroying your file names.
Here is a small snippet that will parse your OPT file. At this point, all it does is print the fields, but you can add logic to match to the other file easily. Just read (slurp) the entire second file into a single string and match with your chosen field from the first:
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new();
my @opt_fields;

while (<DATA>) {
    if ($csv->parse($_)) {
        push @opt_fields, [ $csv->fields() ];
    } else {
        my $err = $csv->error_input;
        print "Failed to parse line: $err";
    }
}

foreach my $ref (@opt_fields) {
    # foreach my $field (@$ref) { print "$field\n"; }
    print "The anon array: @$ref\n";
    print "Use to match?: $ref->[0]\n";
    print "File name?: $ref->[1]\n";
}
__DATA__
ABCD00000001,\some\some1\ABCD00000001,Y,,5
ABCD00000001,\some\some1\ABCD00000002
ABCD00000001,\some\some1\ABCD00000003
ABCD00000001,\some\some1\ABCD00000004
ABCD00000001,\some\some1\ABCD00000005
ABCD00000001,\some\some1\ABCD00000006,Y,,2