How can I select records from a CSV file based on timestamps in Perl?

I have a CSV file with a timestamp in the first column, as in the sample below:
Time,head1,head2,...head3
00:00:00,24,22,...,n
00:00:01,34,55,...,n
and so on...
I want to extract the rows within a specific time range, e.g. from 11:00:00 to 16:00:00, together with the header, and put them into an array. I have written the code below to get the header into an array.
#!/usr/bin/perl -w
use strict;
my $start = $ARGV[0];
my $end = $ARGV[1];
my $line;
$line =<STDIN>;
my $header = [ split /[,\n]/, $line ];
I need help filtering the data in the file by the selected time range and building an array of the matching rows.

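I kind of cheated: removing the colons turns a time like 11:00:00 into the number 110000, so the times can be compared numerically. A proper program would probably use DateTime and compare the values with DateTime->compare. If you're only expecting input in this format, my "cheat" should work.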
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
use DateTime;
my $start = 110000;
my $end = 160000;
my $csv = Text::CSV->new () or die "Cannot use CSV: ".Text::CSV->error_diag ();
my @lines;
open my $fh, "<:encoding(utf8)", "file.csv" or die "Error opening file: $!";
push @lines, $csv->getline( $fh ); # keep the header row
while ( my $row = $csv->getline( $fh ) ) {
my $time = $row->[0];
$time =~ s/://g; # "11:00:00" becomes "110000"
if ($time >= $start and $time <= $end) {
push @lines, $row;
}
}
$csv->eof or $csv->error_diag();
close $fh;
# do something here with @lines
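A slightly more robust variant of the same idea, sketched with core Perl only: instead of stripping the colons, convert each HH:MM:SS value to seconds since midnight, so a cell that is not a time (such as the "Time" header) is detected explicitly rather than being compared as a number. It assumes plain comma-separated data with no quoted fields; `to_seconds` and `rows_in_range` are hypothetical helper names.

```perl
use strict;
use warnings;

# Hypothetical helper: convert "HH:MM:SS" to seconds since midnight.
# Returns undef for anything that does not look like a time
# (for example the "Time" header cell).
sub to_seconds {
    my ($t) = @_;
    return undef unless defined $t && $t =~ /^(\d{1,2}):(\d{2}):(\d{2})$/;
    return $1 * 3600 + $2 * 60 + $3;
}

# Hypothetical helper: read comma-separated lines from $fh and return an
# array reference holding the header row followed by every row whose time
# lies in [$start, $end] (both ends inclusive).
# Assumes plain CSV with no quoted or embedded commas.
sub rows_in_range {
    my ($fh, $start, $end) = @_;
    my ($lo, $hi) = (to_seconds($start), to_seconds($end));
    my @rows;
    my $header = <$fh>;
    return \@rows unless defined $header;
    chomp $header;
    push @rows, [ split /,/, $header ];
    while (my $line = <$fh>) {
        chomp $line;
        my @fields = split /,/, $line;
        my $secs = to_seconds($fields[0]);
        next unless defined $secs;    # skip lines without a valid time
        push @rows, \@fields if $secs >= $lo && $secs <= $hi;
    }
    return \@rows;
}

# Small demo on an in-memory "file".
my $sample = "Time,head1,head2\n10:59:59,1,2\n11:00:00,3,4\n15:30:00,5,6\n16:00:01,7,8\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $rows = rows_in_range($fh, '11:00:00', '16:00:00');
print join(',', @$_), "\n" for @$rows;
```

With this sample it prints the header followed by the 11:00:00 and 15:30:00 rows. For real files with quoted fields, the Text::CSV approach above is the safer parser.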

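Just a start:
my $start = "01:00:01";
my $end = "11:00:01";
while (<>) {
chomp;
# The flip-flop (..) is true from the first line starting with $start
# through the next line starting with $end; both timestamps must exist.
if ( /^$start/ .. /^$end/ ) {
my @s = split /,/;
# do what you want with @s here.
}
}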
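The range (flip-flop) operator becomes true on the first line matching its left pattern and stays true up to and including the line matching its right pattern. A minimal sketch of that behaviour on an in-memory list, assuming both boundary timestamps actually appear in the data; `\Q...\E` makes the timestamps match literally and `^` anchors them to the first column. `between_markers` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines from the first one starting with
# $start up to and including the next one starting with $end.
# Caveat: a flip-flop keeps its state between calls, so if a call ends
# before $end is seen, the next call starts "inside" the range.
sub between_markers {
    my ($start, $end, @lines) = @_;
    my @picked;
    for (@lines) {
        push @picked, $_ if /^\Q$start\E/ .. /^\Q$end\E/;
    }
    return @picked;
}

my @lines = ('10:00:00,a', '11:00:00,b', '12:00:00,c', '16:00:00,d', '17:00:00,e');
print "$_\n" for between_markers('11:00:00', '16:00:00', @lines);
```

With this data it returns the 11:00:00, 12:00:00 and 16:00:00 lines. If a boundary timestamp is missing from the file, a comparison-based filter is the more reliable choice.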

#!/usr/bin/perl -w
use strict;
my $start = '11:00:00';
my $end = '16:00:00';
my @data;
chomp ($_ = <>); # read the header and remove the trailing newline
push @data, [split /,/]; # add header
while(<>) {
chomp;
my @line = split /,/;
# string comparison works because the times are zero-padded HH:MM:SS
next if $line[0] lt $start or $line[0] gt $end;
push @data, [@line]; # $data[i][j] contains j-th element of i-th line.
}

Related

Make the same edit for each column in a multi-column file

I have multiple CSV files with varying numbers of columns that I need to reformat into a fixed-format text file.
At the moment, I comment and uncomment the lines for the columns that need to be edited, but it's tedious, and I can't add new columns without changing the program first.
Is there a simpler way of reading, splitting and editing all columns, regardless of the number of columns in the file?
Here is my code thus far:
use strict;
use warnings;
my $input_file = 'FILENAME.csv';
my $output_file = 'FILENAME.txt';
open (INPUT, "<", "$input_file") or die "\n !! Cannot open $input_file: $!";
open (OUTPUT, ">>", "$output_file") or die "\n !! Cannot create $output_file: $!";
while ( <INPUT> ) {
my $line = $_;
$line =~ s/\s*$//g;
my ( $a, $b, $c, $d, $e, $f, $g, $h, $i, $j ) = split('\,', $line);
$a = sprintf '%10s', $a;
$b = sprintf '%10s', $b;
$c = sprintf '%10s', $c;
$d = sprintf '%10s', $d;
$e = sprintf '%10s', $e;
$f = sprintf '%10s', $f;
$g = sprintf '%10s', $g;
$h = sprintf '%10s', $h;
$i = sprintf '%10s', $i;
$j = sprintf '%10s', $j;
print OUTPUT "$a$b$c$d$e$f$g$h$i$j\n";
}
close INPUT;
close OUTPUT;
exit;
Do you mean something like this?
perl -aF/,/ -lne 'print map sprintf("%10s", $_), @F' FILENAME.csv > FILENAME.txt
Any time you're using sequential variables, you should be using an array. And in this case, since you only use the array once, you don't even need to do more than hold it temporarily.
Also, use lexical filehandles; it's better practice.
#!/usr/bin/env perl
use strict;
use warnings;
my $input_file = 'FILENAME.csv';
my $output_file = 'FILENAME.txt';
my $format = '%10s';
open( my $input_fh, "<", $input_file ) or die "\n !! Cannot open $input_file: $!";
open( my $output_fh, ">>", $output_file ) or die "\n !! Cannot create $output_file: $!";
while (<$input_fh>) {
chomp; # keep the newline out of the last column's padding
print {$output_fh} join( "", map { sprintf $format, $_ } split /,/ ), "\n";
}
close $input_fh;
close $output_fh;
exit;
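The same idea wrapped in a small subroutine, so the number of columns never appears in the code. This is a sketch assuming plain comma-separated input without quoted fields; `fixed_width` is a hypothetical name, and `%*s` takes the column width from the argument list.

```perl
use strict;
use warnings;

# Hypothetical helper: right-justify every comma-separated field of $line
# in a column of $width characters, however many fields the line has.
sub fixed_width {
    my ($line, $width) = @_;
    chomp $line;    # don't let the newline count toward the last field
    return join '', map { sprintf '%*s', $width, $_ } split /,/, $line;
}

print fixed_width("a,bb,ccc\n", 5), "\n";
```

For example, `fixed_width("a,bb,ccc\n", 5)` returns `"    a   bb  ccc"`.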

I need the output in the following format in Perl:

File:
a,b,c,d,e,f
1,2,3,4,3,2
9,8,7,6,5,0
2,3,4,6,7,8
I need output like this:
a=1,d=4,c=3,a=9,d=6,c=7,a=2,d=6,c=4
but my program gives this:
a=1,d=4,c=3a=9,d=6,c=7a=2,d=6,c=4 (there is no comma between each c value and the following a)
My script is:
open (my $fh, '<', 'parse.txt') or die "Cannot open parse.txt: $!";
my @arr;
my $dummy=<$fh>;
while (<$fh>) {
chomp;
$a = substr $_, 0,1;
$b = substr $_, 6,1;
$c = substr $_, 4,1;
print "a=$a,d=$b,c=$c";
}
close ($fh);
my $prefix = '';
while (<$fh>) {
chomp;
my @fields = split /,/;
print $prefix."a=$fields[0],d=$fields[3],c=$fields[2]";
$prefix = ',';
}
print("\n");
or
my @recs;
while (<$fh>) {
chomp;
my @fields = split /,/;
push @recs, "a=$fields[0],d=$fields[3],c=$fields[2]";
}
print(join(',', @recs), "\n");
Instead of printing the values, you could append them to a string and include a comma after the "c" value. Then, after the loop, remove the final comma from the string and print it. This keeps the entire output in memory, which can be a problem if your input file is very large, but for a reasonably sized file there shouldn't be any substantial issue.
my $output;
my $dummy=<$fh>;
while (<$fh>) {
chomp;
$a = substr $_, 0,1;
$b = substr $_, 6,1;
$c = substr $_, 4,1;
$output .= "a=$a,d=$b,c=$c,";
}
chop $output;
print $output;
If your fields have separators, split the line and collect the elements you need:
use warnings;
use strict;
use feature 'say';
my $file = 'parse.txt';
open my $fh, '<', $file or die "Can't open $file: $!";
my $dummy = <$fh>;
my @res;
while (<$fh>)
{
my ($a, $d, $c) = (split /,/)[0,3,2];
push @res, "a=$a,d=$d,c=$c";
}
say join ',', @res;
or pick the order in the assignment
my ($a, $c, $d) = (split /,/)[0,2,3];
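A self-contained sketch of the split-and-join approach, run against the sample file from the question held in memory. It assumes the header line is only skipped and that the labels and column indices are hard-coded as in the answers above; `pick_columns` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: for each data line, take fields 0, 3 and 2,
# label them a, d and c, and join all records with commas so there is
# no trailing separator.
sub pick_columns {
    my ($fh) = @_;
    my $header = <$fh>;    # skip the header line
    my @recs;
    while (my $line = <$fh>) {
        chomp $line;
        my ($a_val, $d_val, $c_val) = (split /,/, $line)[0, 3, 2];
        push @recs, "a=$a_val,d=$d_val,c=$c_val";
    }
    return join ',', @recs;
}

my $file = "a,b,c,d,e,f\n1,2,3,4,3,2\n9,8,7,6,5,0\n2,3,4,6,7,8\n";
open my $fh, '<', \$file or die "Cannot open in-memory file: $!";
print pick_columns($fh), "\n";
```

This prints `a=1,d=4,c=3,a=9,d=6,c=7,a=2,d=6,c=4`, the output requested in the question. Using `$a_val` instead of `$a` avoids shadowing the special variables used by `sort`.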

Perl: Need to append two columns if the IDs are repeating

If an id is repeated, I want to append its app1 and app2 values and print the record only once.
Input:
id|Name|app1|app2
1|abc|234|231|
2|xyz|123|215|
1|abc|265|321|
3|asd|213|235|
Output:
id|Name|app1|app2
1|abc|234,265|231,321|
2|xyz|123|215|
3|asd|213|235|
Output I'm getting:
id|Name|app1|app2
1|abc|234,231|
2|xyz|123,215|
1|abc|265,321|
3|asd|213,235|
My Code:
#! usr/bin/perl
use strict;
use warnings;
my $basedir = 'E:\Perl\Input\\';
my $file ='doctor.txt';
my $counter = 0;
my %RepeatNumber;
my $pos=0;
open(OUTFILE, '>', 'E:\Perl\Output\DoctorOpFile.csv') || die $!;
open(FH, '<', join('', $basedir, $file)) || die $!;
my $line = readline(FH);
unless ($counter) {
chomp $line;
print OUTFILE $line;
print OUTFILE "\n";
}
while ($line = readline(FH)) {
chomp $line;
my @obj = split('\|',$line);
if($RepeatNumber{$obj[0]}++) {
my $str1= join("|",$obj[0]);
my $str2=join(",",$obj[2],$obj[3]);
print OUTFILE join("|",$str1,$str2);
print OUTFILE "\n";
}
}
This should do the trick:
use strict;
use warnings;
my $file_in = "doctor.txt";
open (FF, '<', $file_in) or die "Cannot open $file_in: $!";
my $temp = <FF>; # remove first line
my %out;
while (<FF>)
{
my ($id, $Name, $app1, $app2) = split /\|/, $_;
$out{$id}[0] = $Name;
push @{$out{$id}[1]}, $app1;
push @{$out{$id}[2]}, $app2;
}
foreach my $key (sort { $a <=> $b } keys %out)
{
print $key, "|", $out{$key}[0], "|", join (",", @{$out{$key}[1]}), "|", join (",", @{$out{$key}[2]}), "\n";
}
EDIT
To see what %out contains (in case it's not clear), you can use
use Data::Dumper;
and print it via
print Dumper(\%out);
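If you want the ids in the order they first appear in the file, rather than in hash or numeric order, you can keep an array of ids alongside the hash. A sketch assuming the pipe-separated layout from the question, with a trailing `|` on data lines; `merge_by_id` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: merge pipe-separated records that share an id,
# keeping the first Name and joining app1/app2 values with commas.
# Ids are emitted in the order they first appear. Returns output lines.
sub merge_by_id {
    my ($fh) = @_;
    chomp(my $header = <$fh>);
    my (%by_id, @order);
    while (my $line = <$fh>) {
        chomp $line;
        my ($id, $name, $app1, $app2) = split /\|/, $line;
        unless ($by_id{$id}) {
            push @order, $id;    # remember first appearance
            $by_id{$id} = { name => $name, app1 => [], app2 => [] };
        }
        push @{ $by_id{$id}{app1} }, $app1;
        push @{ $by_id{$id}{app2} }, $app2;
    }
    my @out = ($header);
    for my $id (@order) {
        my $r = $by_id{$id};
        # trailing '|' reproduces the layout of the expected output
        push @out, join('|', $id, $r->{name},
            join(',', @{ $r->{app1} }), join(',', @{ $r->{app2} })) . '|';
    }
    return @out;
}

my $data = "id|Name|app1|app2\n1|abc|234|231|\n2|xyz|123|215|\n1|abc|265|321|\n3|asd|213|235|\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
print "$_\n" for merge_by_id($fh);
```

With the sample input this reproduces the expected output from the question line for line, including the header.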
I'd tackle it like this:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
use 5.14.0;
my %stuff;
#extract the header row.
#use the regex to remove the linefeed, because
#we can't chomp it inline like this.
#works since perl 5.14
#otherwise we could just chomp (@header) later.
my ( $id, @header ) = split( /\|/, <DATA> =~ s/\n//r );
while (<DATA>) {
#turn this row into a hash of key-values.
my %row;
( $id, @row{@header} ) = split(/\|/);
#print for diag
print Dumper \%row;
#iterate each key, and insert into $row.
foreach my $key ( keys %row ) {
push( @{ $stuff{$id}{$key} }, $row{$key} );
}
}
#print for diag
print Dumper \%stuff;
print join ("|", "id", @header ),"\n";
#iterate ids in the hash
foreach my $id ( sort keys %stuff ) {
#join this record by '|'.
print join('|',
$id,
#turn inner arrays into comma separated via map.
map {
my %seen;
#use grep to remove dupes - e.g. "abc,abc" -> "abc"
join( ",", grep !$seen{$_}++, @$_ )
} @{ $stuff{$id} }{@header}
),
"\n";
}
__DATA__
id|Name|app1|app2
1|abc|234|231|
2|xyz|123|215|
1|abc|265|321|
3|asd|213|235|
This is perhaps a bit overkill for your application, but it should handle arbitrary column headings and arbitrary numbers of duplicates. It does coalesce repeated values, though, so the two abc entries don't end up as abc,abc.
Output is:
id|Name|app1|app2
1|abc|234,265|231,321
2|xyz|123|215
3|asd|213|235
Here is another way of doing it which doesn't use a hash (in case you want to be more memory efficient); my contribution is the code below the open calls:
#!/usr/bin/perl
use strict;
use warnings;
my $basedir = 'E:\Perl\Input\\';
my $file ='doctor.txt';
open(OUTFILE, '>', 'E:\Perl\Output\DoctorOpFile.csv') || die $!;
select(OUTFILE);
open(FH, '<', join('', $basedir, $file)) || die $!;
print(scalar(<FH>));
my @lastobj = (undef);
foreach my $obj (sort {$a->[0] <=> $b->[0]}
map {chomp; [split(/\|/)]} <FH>) {
if(defined($lastobj[0]) &&
$obj->[0] eq $lastobj[0])
{@lastobj = (@{$obj}[0..1],
$lastobj[2].','.$obj->[2],
$lastobj[3].','.$obj->[3])}
else
{
if(defined($lastobj[0]))
{print(join('|',@lastobj),"|\n")}
@lastobj = @{$obj}[0..3];
}
}
print(join('|',@lastobj),"|\n");
Note that split, without its third (limit) argument, discards trailing empty fields, which is why you have to add the last bar. If you don't chomp, you won't need to supply the bar or the trailing newline, but you would have to keep the fifth field as well.
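A quick way to see the trailing-field behaviour described above: without a limit, split drops trailing empty fields, while a negative limit keeps them. The sample line is taken from the question's input.

```perl
use strict;
use warnings;

my $line = '1|abc|234|231|';

my @default = split /\|/, $line;        # trailing empty field removed
my @keep    = split /\|/, $line, -1;    # negative limit keeps it

print scalar(@default), "\n";    # 4
print scalar(@keep), "\n";       # 5
```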

Dynamic array of hashes in Perl

I have a CSV file like this:
name,email,salary
a,b@b.com,1000
d,e@e.com,2000
Now, I need to transform this to an array of hash-maps in Perl, so when I do something like:
$table[1]{"email"}
it returns e@e.com.
The code I wrote is:
open(DATA, "<$file") or die "Cannot open the file\n";
my @table;
#fetch header line
$line = <DATA>;
my @header = split(',',$line);
#fetch data tuples
while($line = <DATA>)
{
my %map;
my @row = split(',',$line);
for($index = 0; $index <= $#header; $index++)
{
$map{"$header[$index]"} = $row[$index];
}
push(@table, %map);
}
close(DATA);
But I am not getting the desired results. Can you help? Thanks in advance.
This line
push(@table, %map)
should be
push(@table, \%map)
You want @table to be a list of hash references; your code adds each key and each value in %map to the list as a separate element.
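A small demonstration of the difference, using two rows from the question's sample: pushing the hash flattens it into separate key and value elements, while pushing a reference adds one element you can index as `$table[1]{email}`. This relies on `my %map` being declared inside the loop (as in the question), so each row gets a fresh hash.

```perl
use strict;
use warnings;

my (@flat, @table);
for my $row (['a', 'b@b.com'], ['d', 'e@e.com']) {
    my %map = (name => $row->[0], email => $row->[1]);    # fresh hash per row
    push @flat,  %map;     # adds four scalars: two keys, two values
    push @table, \%map;    # adds one hash reference
}

print scalar(@flat), "\n";       # 8
print scalar(@table), "\n";      # 2
print $table[1]{email}, "\n";    # e@e.com
```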
There is no need to reinvent the wheel here. You can do this with the Text::CSV module.
#!/usr/bin/perl
use strict;
use warnings;
use v5.16;
use Text::CSV;
my $csv = Text::CSV->new;
open my $fh, "<:encoding(utf8)", "data.csv" or die "data.csv: $!";
$csv->column_names( $csv->getline ($fh) );
while (my $row = $csv->getline_hr ($fh)) {
say $row->{email};
}
Something like this perhaps:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
my @table;
chomp(my $header = <DATA>);
my @cols = split /,/, $header; # Should really use a real CSV parser here
while (<DATA>) {
chomp;
my %rec;
@rec{@cols} = split /,/;
push @table, \%rec;
}
say $table[1]{email};
__END__
name,email,salary
a,b@b.com,1000
d,e@e.com,2000

How to extract the last element of a string and use it to grow an array inside a loop

I have a dataset like this:
10001;02/07/98;TRIO;PI;M^12/12/59^F^^SP^09/12/55
;;;;;M1|F1|SP1;11;10;12;10;12;11;1.82;D16S539
;;;;;M1|F1|SP1;8;8;8;8;10;8;3.45;D7S820
;;;;;M1|F1|SP1;14;12;12;11;14;11;1.57;D13S317
;;;;;M1|F1|SP1;12;12;13;12;13;8;3.27;D5S818
;;;;;M1|F1|SP1;12;12;12;12;12;8;1.51;CSF1PO
;;;;;M1|F1|SP1;8;11;11;11;11;8;1.79;TPOX
;;;;;M1|F1|SP1;6;9;9;6;8;6;1.31;TH01
I'm trying to extract the last element of each line that does not start with a number, i.e. all lines except the first one. I want to put these values into an array called @markers.
I'm trying to do that with the following code:
#!usr/bin/perl
use warnings;
use strict;
open FILE, 'test' || die $!;
while (my $line = <FILE>) {
my @fields = (split /;/), $line;
if ($line !~ m/^[0-9]+/) {
my @markers = splice @fields, 0, @fields - 1;
}
}
But that does not work. Can anyone help please?
Thanks
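You create a new variable named @markers on every pass of the loop.
my @fields = (split /;/), $line; means (my @fields = (split /;/, $_)), $line;. You meant my @fields = (split /;/, $line);
'test' || die $! is the same as just 'test': || binds more tightly than the comma separating open's arguments, and the string 'test' is always true, so die is never reached. Use the low-precedence or instead: open FILE, 'test' or die $!;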
use strict;
use warnings;
open my $FILE, '<', 'test'
or die $!;
my @markers;
while (<$FILE>) {
chomp;
next if /^\s*\z/; # Skip blank lines.
my @fields = split /;/;
push @markers, $fields[-1]
if $fields[0] eq '';
}
You aren't using split() correctly. I have fixed it in the code below, which prints the values:
#!/usr/bin/perl
use warnings;
use strict;
open FILE, 'test' or die $!;
while (my $line = <FILE>) {
my @fields = split( /;/, $line);
if ($line !~ m/^[0-9]+/) {
print "$fields[-1]";
# my @markers = splice @fields, 0, @fields - 1;
}
}
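The same fix as a reusable subroutine, checked against the sample data from the question. It is a sketch assuming, as in the first answer, that the lines to collect start with an empty first field; `extract_markers` is a hypothetical name. The list slice `(split /;/, $copy)[-1]` takes the last field without an intermediate array.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the last ';'-separated field of every line
# whose first field is empty, skipping blank lines.
sub extract_markers {
    my @markers;
    for my $line (@_) {
        chomp(my $copy = $line);
        next if $copy =~ /^\s*\z/;
        push @markers, (split /;/, $copy)[-1] if $copy =~ /^;/;
    }
    return @markers;
}

my @lines = (
    "10001;02/07/98;TRIO;PI;M^12/12/59^F^^SP^09/12/55\n",
    ";;;;;M1|F1|SP1;11;10;12;10;12;11;1.82;D16S539\n",
    ";;;;;M1|F1|SP1;8;8;8;8;10;8;3.45;D7S820\n",
    ";;;;;M1|F1|SP1;14;12;12;11;14;11;1.57;D13S317\n",
    ";;;;;M1|F1|SP1;12;12;13;12;13;8;3.27;D5S818\n",
    ";;;;;M1|F1|SP1;12;12;12;12;12;8;1.51;CSF1PO\n",
    ";;;;;M1|F1|SP1;8;11;11;11;11;8;1.79;TPOX\n",
    ";;;;;M1|F1|SP1;6;9;9;6;8;6;1.31;TH01\n",
);
print join(' ', extract_markers(@lines)), "\n";
```

With the sample data this prints the seven markers from D16S539 to TH01; the first line is skipped because it starts with a number.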