I'm trying to go through a bunch of text files twice to look for two different values. However, the seek $fh, 0, 0 doesn't seem to work. Why?
Please help
My codes:
use strict;
use warnings;
...
read_in_data_employer();
read_in_data_union();
process_files ($FileFolder);
close $FileHandle;
...
sub process_files
{
opendir (DIR, $FileFolder)
or die "Unable to open $FileFolder: $!";
my #files = grep { /.pdf.txt/ } readdir (DIR);
closedir (DIR);
#files = map { $FileFolder . '/' . $_ } #files;
foreach my $file (#files)
{
open (my $txtfile, $file) or die "error opening $file\n";
print "$file";
LookForEmployer:
{
print $FileHandle "\t";
while (my $line=<$txtfile>)
{
foreach (#InputData_Employers)
{
if ($line =~ /\Q$_/i)
{
print $FileHandle "$_";
last LookForEmployer;
}
}
}
}
seek ($txtfile, 0, 0);
LookForUnion:
{
print $FileHandle "\t";
while (my $line=<$txtfile>)
{
print "$.\n";
foreach (#InputData_Unions)
{
if ($line =~ /\Q$_/i)
{
print $FileHandle "$_";
last LookForUnion;
}
}
}
}
close $txtfile
}
}
Output:
>perl "test.pl" test "employers.txt" "unions.txt" output.txt
test/611-2643-03 (801-0741).pdf.txt12
13
14
15
16
17
18
19
20
21
22
test/611-2643-05 (801-0741).pdf.txt
7
8
9
10
11
12
test/611-2732-21 (805-0083).pdf.txt
2
3
4
5
6
7
8
test/611-2799-17 (801-0152).pdf.txt
6
7
8
9
10
11
12
13
14
Thanks
Files don't have line numbers. They don't even have lines. Files just have bytes. That means you can't just ask the system "What line of the file is at this position?"
But, since you're seeking to the start of the file, all you need is to reset $..
use Fcntl qw( SEEK_SET );
seek($txtfile, 0, SEEK_SET)
or die("seek: $!\n");
$. = 0;
By the way, you program is insanely inefficient. Load the data into hashes or into a database!
I have a file input as below;
#
volume stats
start_time 1
length 2
--------
ID
0x00a,1,2,3,4
0x00b,11,12,13,14
0x00c,21,22,23,24
volume stats
start_time 2
length 2
--------
ID
0x00a,31,32,33,34
0x00b,41,42,43,44
0x00c,51,52,53,54
volume stats
start_time 3
length 2
--------
ID
0x00a,61,62,63,64
0x00b,71,72,73,74
0x00c,81,82,83,84
#
I need output in below format;
1 33 36 39 42
2 123 126 129 132
3 213 216 219 222
#
Below is my code;
#!/usr/bin/perl
use strict;
use warnings;
#use File::Find;
# Define file names and its location
my $input = $ARGV[0];
# Grab the vols stats for different intervals
open (INFILE,"$input") or die "Could not open sample.txt: $!";
my $date_time;
my $length;
my $col_1;
my $col_2;
my $col_3;
my $col_4;
foreach my $line (<INFILE>)
{
if ($line =~ m/start/)
{
my #date_fields = split(/ /,$line);
$date_time = $date_fields[1];
}
if ($line =~ m/length/i)
{
my #length_fields = split(/ /,$line);
$length = $length_fields[1];
}
if ($line =~ m/0[xX][0-9a-fA-F]+/)
{
my #volume_fields = split(/,/,$line);
$col_1 += $volume_fields[1];
$col_2 += $volume_fields[2];
$col_3 += $volume_fields[3];
$col_4 += $volume_fields[4];
#print "$col_1\n";
}
if ($line =~ /^$/)
{
print "$date_time $col_1 $col_2 $col_3 $col_4\n";
$col_1=0;$col_2=0;$col_3=0;$col_4=0;
}
}
close (INFILE);
#
my code result is;
1
33 36 39 42
2
123 126 129 132
#
BAsically, for each time interval, it just sums up the columns for all the lines and displays all the columns against each time interval.
$/ is your friend here. Try setting it to '' to enable paragraph mode (separating your data by blank lines).
#!/usr/bin/env perl
use strict;
use warnings;
local $/ = '';
while ( <> ) {
my ( $start ) = m/start_time\s+(\d+)/;
my ( $length ) = m/length\s+(\d+)/;
my #row_sum;
for ( m/(0x.*)/g ) {
my ( $key, #values ) = split /,/;
for my $index ( 0..$#values ) {
$row_sum[$index] += $values[$index];
}
}
print join ( "\t", $start, #row_sum ), "\n";
}
Output:
1 33 36 39 42
2 123 126 129 132
3 213 216 219 222
NB - using tab stops for output. Can use sprintf if you need more flexible options.
I would also suggest that instead of:
my $input = $ARGV[0];
open (my $input_fh, '<', $input) or die "Could not open $input: $!";
You would be better off with:
while ( <> ) {
Because <> is the magic filehandle in perl, that - opens files specified on command line, and reads them one at a time, and if there isn't one, reads STDIN. This is just like how grep/sed/awk do it.
So you can still run this with scriptname.pl sample.txt or you can do curl http://somewebserver/sample.txt | scriptname.pl or scriptname.pl sample.txt anothersample.txt moresample.txt
Also - if you want to open the file yourself, you're better off using lexical vars and 3 arg open:
open ( my $input_fh, '<', $ARGV[0] ) or die $!;
And you really shouldn't ever be using 'numbered' variables like $col_1 etc. If there's numbers, then an array is almost always better.
Basically, a block begins with start_time and ends with a line of of whitespace. If instead end of block is always assured to be an empty line, you can change the test below.
It helps to use arrays instead of variables with integer suffixes.
When you hit the start of a new block, record the start_time value. When you hit a stat line, update column sums, and when you hit a line of whitespace, print the column sums, and clear them.
This way, you keep your program's memory footprint proportional to the longest line of input as apposed to the largest block of input. In this case, there isn't a huge difference, but, in real life, there can be. Your original program was reading the entire file into memory as a list of lines which would really cause your program's memory footprint to balloon when used with large input sizes.
#!/usr/bin/env perl
use strict;
use warnings;
my $start_time;
my #cols;
while (my $line = <DATA>) {
if ( $line =~ /^start_time \s+ ([0-9]+)/x) {
$start_time = $1;
}
elsif ( $line =~ /^0x/ ) {
my ($id, #vals) = split /,/, $line;
for my $i (0 .. $#vals) {
$cols[ $i ] += $vals[ $i ];
}
}
elsif ( !($line =~ /\S/) ) {
# guard against the possibility of
# multiple blank/whitespace lines between records
if ( #cols ) {
print join("\t", $start_time, #cols), "\n";
#cols = ();
}
}
}
# in case there is no blank/whitespace line after last record
if ( #cols ) {
print join("\t", $start_time, #cols), "\n";
}
__DATA__
volume stats
start_time 1
length 2
--------
ID
0x00a,1,2,3,4
0x00b,11,12,13,14
0x00c,21,22,23,24
volume stats
start_time 2
length 2
--------
ID
0x00a,31,32,33,34
0x00b,41,42,43,44
0x00c,51,52,53,54
volume stats
start_time 3
length 2
--------
ID
0x00a,61,62,63,64
0x00b,71,72,73,74
0x00c,81,82,83,84
Output:
1 33 36 39 42
2 123 126 129 132
3 213 216 219 222
When I run your code, I get warnings:
Use of uninitialized value $date_time in concatenation (.) or string
I fixed it by using \s+ instead of / /.
I also added a print after your loop in case the file does not end with a blank line.
Here is minimally-changed code to produce your desired output:
use strict;
use warnings;
# Define file names and its location
my $input = $ARGV[0];
# Grab the vols stats for different intervals
open (INFILE,"$input") or die "Could not open sample.txt: $!";
my $date_time;
my $length;
my $col_1;
my $col_2;
my $col_3;
my $col_4;
foreach my $line (<INFILE>)
{
if ($line =~ m/start/)
{
my #date_fields = split(/\s+/,$line);
$date_time = $date_fields[1];
}
if ($line =~ m/length/i)
{
my #length_fields = split(/\s+/,$line);
$length = $length_fields[1];
}
if ($line =~ m/0[xX][0-9a-fA-F]+/)
{
my #volume_fields = split(/,/,$line);
$col_1 += $volume_fields[1];
$col_2 += $volume_fields[2];
$col_3 += $volume_fields[3];
$col_4 += $volume_fields[4];
}
if ($line =~ /^$/)
{
print "$date_time $col_1 $col_2 $col_3 $col_4\n";
$col_1=0;$col_2=0;$col_3=0;$col_4=0;
}
}
print "$date_time $col_1 $col_2 $col_3 $col_4\n";
close (INFILE);
__END__
1 33 36 39 42
2 123 126 129 132
3 213 216 219 222
in blah.txt:
/a/b/c-test
in blah.pl
1 my #dirs;
2 $ws = '/a/b/c-test/blah/blah'; <--- trying to match this
3 sub blah{
4 my $err;
5 open(my $fh, "<", "blah.txt") or $err = "catn do it\n";
6 if ($err) {
7 print $err;
8 return;
9 } else {
10 while(<$fh>){
11 chomp;
12 push #dirs, $_;
13 }
14 }
15 close $fh;
16 print "successful\n";
17 }
18
19
20 blah();
21
22 foreach (#dirs) {
23 print "$_\n"; #/a/b/c-test
24 if ($_ =~ /$ws/ ) { <--- didnt match it
25 print "GOT IT!\n";
26 } else {
27 print "didnt get it\n";
28 }
29 }
~
perl blah.pl
successful
/a/b/c-test
didnt get it
I am not quite sure why it is not matching.
Anyone know?
Consider,
if ($ws =~ /$_/ ) {
instead of,
if ($_ =~ /$ws/ ) {
as /a/b/c-test/blah/blah contains /a/b/c-test string, not otherwise.
As a side notes:
use at least strict and warnings
read and process file in while() loop instead of filling array first
if you must fill array, use my #dirs = <$fh>; chomp(#dirs);
The following table represents a tab-delimited file that I have.
1 2 3
A Jack 01
A Mary 02
A Jack 03
B Mary 04
B Mike 05
B Mary 06
C Mike 07
C Mike 08
C Jack 09
I would like to parse this text file and create multiple text files based on columns 1 & 2. Each text file would contain data (column 3) where column 1&2 are the same. So in this example, the data would be organized as follows:
> file1.txt
A Jack 01
A Jack 03
> file2.txt
A Mary 02
> file3.txt
B Mary 04
B Mary 06
> file4.txt
B Mike 05
> file5.txt
C Mike 07
C Mike 08
> file6.txt
C Jack 09
#What would be the best way to tackle this? The only method I can think of is to create a 2-dimensional array and then comparing every row/col pair.
edit: The following code seems to work. Is there a better alternative?
#!/usr/bin/perl
use strict;
use warnings;
my $file = "/home/user/Desktop/a.txt";
my #array =();
open(FILE, $file) || die "cannot open file";
while(my $line = <FILE>){
my #row = split("\t", $line);
my $two = $row[1]."\t".$row[2];
push(#array, $two);
}
close FILE;
#array = uniq(#array);
my $counter = 0;
foreach my $x (#array){
#unique tab
open(SEC, $file) || die "cannot open 2nd time\n";
while(my $line = <SEC>){
if($line =~ /($x)/){
my $output = "txt".$counter.".txt";
print "hit!\n";
open(OUT, '>>', $output) || die "cannot write out";
print OUT $line."\n";
close OUT;
}
}
$counter++;
}
close SEC;
sub uniq {
return keys %{{ map { $_ => 1 } #_ }};
}
I know how to sort it via command line (sort -t: -k1,2 a.txt) but I'm wondering how to do it within perl and writeout the multiple files.
Concatenate the two first fields and use them as hash keys. Each hash key points to an array where you add all relevant lines:
#!/usr/bin/perl
use strict;
use warnings;
open my $fh, "/home/johan/Desktop/tabdel.txt"
or die $!;
<$fh>; # Skip header
my $data = {};
while (my $line = <$fh>) {
# match the fields
next unless $line =~ /^(\S+)\s+(\S+)\s+\S+/;
# append $line to the hash value, key is the concatenated two first fields:
push #{ $data->{"$1 $2"}->{'lines'} }, "$line";
}
my $file_count = 0;
foreach my $key (sort keys %{$data}) {
my $qfn = "file".(++$file_count).".txt";
open(my $fh, '>', $qfn)
or die $!;
foreach my $line (#{ $data->{$key}->{'lines'} }) {
print $fh $line;
}
}
Perhaps the following will be helpful:
use strict;
use warnings;
my %hash;
my $n = 1;
while (<>) {
next if $. == 1;
my #a = split;
if ( !exists $hash{ $a[0] }{ $a[1] } ) {
open $hash{ $a[0] }{ $a[1] }, '>', 'file' . $n++ . '.txt' or die $!;
}
print { $hash{ $a[0] }{ $a[1] } } $_;
}
Usage: perl script.pl inFile.txt
Acknowledging ikegami's point about possibly running out of file handles in case your dataset is large, here's a option that collects results in a hash and then prints the results to files (the usage is the same as the above script):
use strict;
use warnings;
my ( %hash, %seen );
my $n = 0;
while (<>) {
next if $. == 1;
my ( $key, $elem ) = /(.+\s+)(\d+)\Z/;
push #{ $hash{$key} }, $elem;
}
for my $key ( sort { $hash{$a}->[0] <=> $hash{$b}->[0] } keys %hash ) {
$n++ if !$seen{$key}++;
open my $fh, '>', 'file' . $n . '.txt' or die $!;
print $fh "$key$_\n" for #{ $hash{$key} };
}
I am trying to read a file and save the lines which starts with $path till it encounters first balnk line in an array. I have the below code, bt it only prints the path name and not the lines. Could some-one have a look.
Below are the contents of the $file:
\sbd\archieve\date\form
-rwxrwxrwx 1 etd maadm 4354270 Aug 16 21:56 COMAHCUT.dat.20120816.ftpd.201208162156*
-rw-r--r-- 1 etd maadm 0 Aug 16 21:56 COMAHCUT.DONE.20120816.ftpd.201208162156
\sbd\single\archieve\date\form
-rwxr-xr-x 1 etd maadm 1362780 Aug 15 22:02 COMAINS.dat.ftpd.201208152203*
-rwxr-xr-x 1 etd maadm 0 Aug 15 22:02 COMAINS.DONE.ftpd.201208152203*
Below is the code i tried:
#!/usr/bin/perl
my $file = "/home/pauler/practice/DataIt/line.txt";
open (INFO, $file) or die "Cannot open the file $file :$! \n";
my $path = "\sbd\archieve\date\form";
foreach $line (<INFO>) {
if ($line =~ m/$path/) {
push (#array1, $line);
last if ($line =~ m/^$/);
print #array1;
}
}
You can take advantage of the fact, that filehandles remember their position in the file.
use strict;
use warnings;
my #array;
my $path = '\sbd\archieve\date\form';
while ( my $line = <DATA> ) {
next unless $line =~ /\Q$path\E/;
push #array, $line;
while ( my $line = <DATA> ) {
last if $line =~ /^\s*$/;
push #array, $line;
}
}
print #array;
__DATA__
\sbd\archieve\date\form
-rwxrwxrwx 1 etd maadm 4354270 Aug 16 21:56 COMAHCUT.dat.20120816.ftpd.201208162156*
-rw-r--r-- 1 etd maadm 0 Aug 16 21:56 COMAHCUT.DONE.20120816.ftpd.201208162156
\sbd\single\archieve\date\form
-rwxr-xr-x 1 etd maadm 1362780 Aug 15 22:02 COMAINS.dat.ftpd.201208152203*
-rwxr-xr-x 1 etd maadm 0 Aug 15 22:02 COMAINS.DONE.ftpd.201208152203*
The flip-flop operator .. saves life ... our you code. It stays false until the expression on the left returns true, and remains true until the expression on the right turns true ... then it is false again until the left expressions evaluates to true again.
# read lines into $_ for cleaner code
while (<INFO>) {
if (/$path/ .. /^$/) {
push #array1, $_;
}
}
print #array1;
Oh, and a note on paths ... I know no single Operating System that really needs backslashes, not even Windows … Using normal slashes / will save you from weird escape sequences and other magic that lurks in the dark