Comparing hash elements to array elements in Perl

I have a program that compares each line of two files; each line contains one word. It simply reads the two files, stores the data in arrays, and compares the elements of the two arrays.
The first file contains:
straight
work
week
belief time
saturday
wagon
australia
sunday
french
...
and the second file contains:
firepower
malaise
bryson
wagon
dalglish
french
...
Comparing the files this way takes a long time, so I proposed another solution, but it doesn't work:
#!/usr/bin/perl
use strict;
use warnings;

open( FIC,  $ARGV[0] );
open( FICC, $ARGV[1] );

print "choose the name of the file\n";
chomp( my $fic2 = <STDIN> );
open( FIC2, ">$fic2" );

my $i = 0;
my $j = 0;
my @b = ();
my %stops;

while (<FIC>)            # read each line into $_
{
    chomp;               # remove newline from $_
    $_ =~ s/\s+$//;
    $stops{$_} = $i;     # add the line to the hash
    $i++;
}
close FIC;

while (<FICC>) {
    my $ligne = $_;
    $ligne =~ s/\s+$//;
    $b[$i] = lc($ligne); # @b contains the data
    $i++;
}

foreach my $che (@b) {
    chomp($che);
    print FIC2 $che;
    print FIC2 " ";
    print FIC2 $stops{"$che"};   # this returns nothing
    print FIC2 "\n";
}
The problem is with the expression $stops{"$che"}: when the element doesn't exist in the hash %stops, it prints nothing and produces a warning:
Use of uninitialized value in print at c:/ats2/hash.pl line 44.

Does this do what you want?
join <(sort file1) <(sort file2) >result
Works in bash.
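If you would rather stay in Perl, here is a minimal sketch of the hash-based approach; the lexical filehandles and variable names are illustrative, not taken from the original script. The key point is to guard the lookup with exists, so a word that is missing from %stops never triggers the uninitialized-value warning:

#!/usr/bin/perl
use strict;
use warnings;

my ($file1, $file2) = @ARGV;

# Build a hash of the words in the first file, keyed by lowercased word
open my $fh1, '<', $file1 or die "Cannot open $file1: $!";
my %stops;
my $i = 0;
while (my $word = <$fh1>) {
    $word =~ s/\s+$//;
    $stops{ lc $word } = $i++;
}
close $fh1;

# For each word in the second file, print it with its index if it was seen
open my $fh2, '<', $file2 or die "Cannot open $file2: $!";
while (my $word = <$fh2>) {
    $word =~ s/\s+$//;
    my $key = lc $word;
    if ( exists $stops{$key} ) {   # the guard avoids "Use of uninitialized value"
        print "$key $stops{$key}\n";
    }
}
close $fh2;

A hash lookup is effectively constant time, so this also avoids comparing every element of one array against every element of the other.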

Related

Attach frequent words in text file

I have two files. The first contains frequent word sequences extracted from a text file.
a.txt :
big pizza
eat big pizza
...
The text file is
b.txt :
i eat big pizza .my big pizza ...
My problem is to add bbb between the words of each sequence that exists in file a.txt, and to write a new file.
So the result will be:
i eatbbbbigbbbpizza.my bigbbbpizza...
Below is my script. It adds bbb only between pairs of words. How can I correct this?
use strict;
use warnings;
use autodie;

my ($f1, $f2) = ('a.txt', 'b.txt');

open( my $fh, $f1 );
my @seq;
foreach ( <$fh> ) {
    chomp;
    s/^\s+|\s+$//g;
    push @seq, $_;
}
close $fh;

open($fh, $f2);
foreach ( <$fh> ) {
    foreach my $r (@seq) {
        my $t = $r =~ s/ /bbb/r;
        if (/$r/) {
            s/$r/$t/g;
        }
    }
    print;
}
close $fh;
close $fh;
All that is wrong is your line
my $t = $r =~ s/ /bbb/r;
This substitution runs just once, and so replaces only the first space with bbb.
You need to use a global substitution instead. And while we're changing this line it's best to also replace the space with \h+, which matches any amount of "horizontal space", including both tabs and spaces:
my $t = $r =~ s/\h+/bbb/gr;
As it stands, your code will find and replace substrings of other entries in @seq if they appear earlier in the array. In this case, that means big pizza will be found first and converted to bigbbbpizza, after which
eat big pizza can no longer be found. You need to first sort your array in descending order of length so that longer phrases are found before shorter ones:
@seq = sort { length($b) <=> length($a) } @seq;
Then your program will work a little better.
Here is the modified code:
use strict;
use warnings;
use autodie;

my ($f1, $f2) = ('a.txt', 'b.txt');

open(my $fh, $f1);
my @seq;
foreach ( <$fh> )
{
    chomp;
    s/^\s+|\s+$//g;
    push @seq, $_;
}
close $fh;

@seq = sort bylen @seq;    # sort @seq by descending length

open($fh, $f2);
foreach ( <$fh> ) {
    foreach my $r (@seq) {
        my $t = $r =~ s/ /bbb/gr;
        s/$r/$t/g;    # you may need to take care of cases of extra spaces
    }
    print;
}
close $fh;

exit 0;

sub bylen {
    length($b) <=> length($a);
}
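One further caveat that neither version addresses: $r is interpolated into the substitution as a regular expression, so a phrase containing metacharacters such as . or ? would be treated as a pattern rather than as literal text. If that can happen in a.txt, a hedged variant of the inner loop escapes the phrase with \Q...\E:

foreach my $r (@seq) {
    my $t = $r =~ s/\h+/bbb/gr;
    s/\Q$r\E/$t/g;    # \Q...\E makes the phrase match literally
}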

Comparing 2 strings, one from file and one declared

This program doesn't print that the strings are equal, but when they are printed they appear to be the same. Can someone please explain?
#!/usr/bin/perl
$str = "print \"I want this to work\\n\";";
print $str."\n";

open FILE, "<", "check2.doc" or die "buhuhuhu";
my $str2;
while (<FILE>) {
    $str2 = $_;
}
close FILE;

print "$str2\n";
if ( $str eq $str2 ) {
    print "they are equal\n";
}
But in the output there is an extra line at the bottom, due to the second string $str2:
print "I want this to work\n";
print "I want this to work\n";
-----empty line-----
Here is the file check2.doc
print "I want this to work\n";
Does anyone know why they are not equal???
The file read includes the \n, so you have to remove it:
$str2 = $_;
chomp $str2;
And, if your file has only one line, replace the while loop with:
$str2 = <FILE>;
chomp $str2;
The line in the file is created by
$str."\n"
Of course that's not equal to
$str
You need to remove the trailing newline:
my $str2 = <FILE>;
chomp($str2);

Parsing a CSV file and Hashing

I am trying to parse a CSV file and read in all of the zip codes. I want to create a hash where each key is a zip code and the value is the number of times it appears in the file. Then I want to print out the contents as zip code - count. Here is the Perl script I have so far.
use strict;
use warnings;
my %hash = qw (
zipcode count
);
my $file = $ARGV[0] or die "Need CSV file on command line \n";
open(my $data, '<', $file) or die "Could not open '$file $!\n";
while (my $line = <$data>) {
    chomp $line;
    my @fields = split "," , $line;
    if ( exists($hash{$fields[2]}) ) {
        $hash{$fields[1]}++;
    } else {
        $hash{$fields[1]} = 1;
    }
}
my $key;
my $value;
while (($key, $value) = each(%hash)) {
    print "$key - $value\n";
}
exit;
You don't say which column your zip code is in, but you are using the third field to check for an existing hash element, and then the second field to increment it.
There is no need to check whether a hash element already exists: Perl will happily create a non-existent hash element and increment it to 1 the first time you access it.
There is also no need to explicitly open any files passed as command line parameters: Perl will open them and read them if you use the <> operator without a file handle.
This reworking of your own program may work. It assumes the zip code is in the second column of the CSV; if it is anywhere else, just change ++$counts{$fields[1]} accordingly.
use strict;
use warnings;

@ARGV or die "Need CSV file on command line\n";

my %counts;

while (my $line = <>) {
    chomp $line;
    my @fields = split /,/, $line;
    ++$counts{$fields[1]};
}

while (my ($key, $value) = each %counts) {
    print "$key - $value\n";
}
Sorry if this is off-topic, but if you're on a system with the standard Unix text processing tools, you could use this command to count the number of occurrences of each value in field #2, and not need to write any code.
cut -d, -f2 filename.csv | sort | uniq -c
which will generate something like this output, where the count is listed first, and the zipcode second:
12 12345
2 56789
34 78912
1 90210

File manipulation in Perl

I have a simple .csv file that I want to extract data out of and write to a new file.
I want to write a script that reads in the file, reads each line, then splits and restructures the columns into a different order; and if a line in the .csv contains 'xxx', it should not be written to the output file.
I have already managed to read in a file and create a secondary file; however, I am new to Perl and still trying to work out the commands. The following is a test script I wrote to get to grips with Perl, and I was wondering if I could alter it to do what I need.
open (FILE, "c1.csv") || die "couldn't open the file!";
open (F1, ">c2.csv") || die "couldn't open the file!";
#print "start\n";
sub trim($);
sub trim($)
{
my $string = shift;
$string =~ s/^\s+//;
$string =~ s/\s+$//;
return $string;
}
$a = 0;
$b = 0;
while ($line=<FILE>)
{
chop($line);
if ($line =~ /xxx/)
{
$addr = $line;
$post = substr($line, length($line)-18,8);
}
$a = $a + 1;
}
print $b;
print " end\n";
Any help is much appreciated.
To manipulate CSV files it is better to use one of the available modules on CPAN. I like Text::CSV:
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1, empty_is_undef => 1 })
    or die "Cannot use CSV: " . Text::CSV->error_diag();

open my $fh, "<", 'c1.csv' or die "ERROR: $!";
$csv->column_names('field1', 'field2');

while ( my $l = $csv->getline_hr($fh) ) {
    next if ( $l->{'field1'} =~ /xxx/ );
    printf "Field1: %s Field2: %s\n", $l->{'field1'}, $l->{'field2'};
}
close $fh;
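The question also asks for the columns to be written out in a different order to a new file. A minimal sketch of that step with Text::CSV might look like the following; the new column order (third, first, second) and the output file name c2.csv are assumptions, since the question never states them:

use Text::CSV;

my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use CSV: " . Text::CSV->error_diag();

open my $in,  '<', 'c1.csv' or die "ERROR: $!";
open my $out, '>', 'c2.csv' or die "ERROR: $!";

while ( my $row = $csv->getline($in) ) {
    next if grep { /xxx/ } @$row;        # skip any line that contains 'xxx'
    my @reordered = @{$row}[2, 0, 1];    # assumed new column order
    $csv->print($out, \@reordered);      # Text::CSV quotes and joins the fields
    print {$out} "\n";
}

close $in;
close $out;

Setting eol => "\n" in the constructor would let $csv->print append the newline itself.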
If you only need to do this once, so you won't need the program later, you can do it with a one-liner:
perl -F, -lane 'next if /xxx/; @n = map { s/(^\s*|\s*$)//g; $_ } @F; print join(",", (map { $n[$_] } qw(2 0 1)));'
Breakdown:
perl -F, -lane
     ^^^ ^    <- split lines at ',' and store the fields in array @F
next if /xxx/;    # skip lines that contain xxx
@n = map { s/(^\s*|\s*$)//g; $_ } @F;
    # trim spaces from the beginning and end of each field
    # and store the result in a new array @n
print join(",", (map { $n[$_] } qw(2 0 1)));
    # recombine array @n into a new order - here 2 0 1,
    # join the fields with commas,
    # and print
Of course, for repeated use, or in a bigger project, you should use a CPAN module. And the above one-liner has many caveats too.

How can I iterate through nested arrays?

I have created an array as follows
while (defined ($line = <STDIN>))
{
    chomp ($line);
    push @stack, ($line);
}
Each line has two numbers:
15 6
2 8
How do I iterate over each item in each line?
i.e. I want to print
15
6
2
8
I understand it's something like
foreach (@{stack}) (@stack) {
    print "?????";
}
This is where I am stuck.
See the perldsc documentation. That's the Perl Data Structures Cookbook, which has examples for dealing with arrays of arrays. From what you're doing though, it doesn't look like you need an array of arrays.
For your problem of taking two numbers per line and outputting one number per line, just turn the whitespace into newlines:
while( <> ) {
s/\s+/\n/; # turn all whitespace runs into newlines
print; # it's ready to print
}
With Perl 5.10, you can use the new \h character class that matches only horizontal whitespace:
while( <> ) {
s/\h+/\n/; # turn all horizontal whitespace runs into newlines
print; # it's ready to print
}
As a Perl one-liner, that's just:
% perl -pe 's/\h+/\n/' file.txt
#!/usr/bin/perl
use strict;
use warnings;
while ( my $data = <DATA> ) {
    my @values = split ' ', $data;
    print $_, "\n" for @values;
}
__DATA__
15 6
2 8
Output:
C:\Temp> h
15
6
2
8
Alternatively, if you want to store each line in @stack and print it out later:
my @stack = map { [ split ] } grep { chomp; length } <DATA>;
The line above slurps everything coming from the DATA filehandle into a list of lines (because <DATA> happens in list context). The grep chomps each line and filters by length after chomping (to avoid getting any trailing empty lines in the data file -- you can skip it if there are none). The map then splits each line on whitespace and creates an anonymous array reference for each line. Finally, those array references are stored in the elements of @stack. You might want to use Data::Dumper to look at @stack to understand what's going on.
print join("\n", #$_), "\n" for #stack;
Now we loop over each entry in @stack, dereferencing each array in turn and joining its elements with newlines, to print one element per line.
Output:
C:\Temp> h
15
6
2
8
The long way of writing essentially the same thing (with less memory consumption) would be:
my @stack;
while ( my $line = <DATA> ) {
    last unless $line =~ /\S/;
    my @values = split ' ', $line;
    push @stack, \@values;
}

for my $ref ( @stack ) {
    print join("\n", @$ref), "\n";
}
Finally, if you wanted to do something other than printing all the values, say, summing all the numbers, you should store one value per element of @stack:
use List::Util qw( sum );

my @stack;
while ( my $line = <DATA> ) {
    last unless $line =~ /\S/;
    my @values = split ' ', $line;
    push @stack, @values;
}
printf "The sum is %d\n", sum @stack;
#!/usr/bin/perl
while ($line = <STDIN>) {
    chomp ($line);
    push @stack, $line;
}

# prints each line
foreach $line (@stack) {
    print "$line\n";
}

# splits each line into items using ' ' as the separator
# and prints the items
foreach $line (@stack) {
    @items = split / /, $line;
    foreach $item (@items) {
        print $item . "\n";
    }
}
I use 'for' for "C" style loops, and 'foreach' for iterating over lists.
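A trivial illustration of that distinction (not part of the answer above), using the same @stack as the script:

# C-style 'for' loop over the indices
for (my $i = 0; $i < @stack; $i++) {
    print "$stack[$i]\n";
}

# 'foreach' over the list itself
foreach my $line (@stack) {
    print "$line\n";
}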
#!/usr/bin/perl
use strict;
use warnings;

open IN, "< read.txt"
    or die "Can't read in 'read.txt'!";

my $content = join '', <IN>;

while ($content =~ m`(\d+)`g) {
    print "$1\n";
}