How to extract the last element of a string and use it to grow an array inside a loop - perl

I have a dataset like this:
10001;02/07/98;TRIO;PI;M^12/12/59^F^^SP^09/12/55
;;;;;M1|F1|SP1;11;10;12;10;12;11;1.82;D16S539
;;;;;M1|F1|SP1;8;8;8;8;10;8;3.45;D7S820
;;;;;M1|F1|SP1;14;12;12;11;14;11;1.57;D13S317
;;;;;M1|F1|SP1;12;12;13;12;13;8;3.27;D5S818
;;;;;M1|F1|SP1;12;12;12;12;12;8;1.51;CSF1PO
;;;;;M1|F1|SP1;8;11;11;11;11;8;1.79;TPOX
;;;;;M1|F1|SP1;6;9;9;6;8;6;1.31;TH01
I'm trying to extract the last element of each line that does not start with a number, i.e. all lines except the first one. I want to put these values in an array called @markers.
This is the code I'm trying:
#!usr/bin/perl
use warnings;
use strict;
open FILE, 'test' || die $!;
while (my $line = <FILE>) {
    my @fields = (split /;/), $line;
    if ($line !~ m/^[0-9]+/) {
        my @markers = splice @fields, 0, @fields - 1;
    }
}
But that does not work. Can anyone help please?
Thanks

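You create a new variable named @markers on every pass of the loop.
my @fields = (split /;/), $line; means (my @fields = (split /;/, $_)), $line;. You meant my @fields = (split /;/, $line);
'test' || die $! is the same as just 'test', because || binds more tightly than the open list operator, so the die can never run. Use the low-precedence or instead.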
use strict;
use warnings;
open my $FILE, '<', 'test'
    or die $!;
my @markers;
while (<$FILE>) {
    chomp;
    next if /^\s*\z/; # Skip blank lines.
    my @fields = split /;/;
    push @markers, $fields[-1]
        if $fields[0] eq '';
}
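The same loop can be tried without a data file by reading from an in-memory filehandle. This is only a sketch: the $sample string is a shortened copy of the data in the question, and opening a filehandle on a scalar reference is used purely for illustration.

```perl
use strict;
use warnings;

# Shortened copy of the sample data from the question (assumption: same layout).
my $sample = <<'END';
10001;02/07/98;TRIO;PI;M^12/12/59^F^^SP^09/12/55
;;;;;M1|F1|SP1;11;10;12;10;12;11;1.82;D16S539
;;;;;M1|F1|SP1;8;8;8;8;10;8;3.45;D7S820

;;;;;M1|F1|SP1;6;9;9;6;8;6;1.31;TH01
END

# Open a read-only filehandle on the string instead of a file on disk.
open my $fh, '<', \$sample
    or die "Can't open in-memory file: $!";

my @markers;    # declared once, outside the loop, so it keeps growing
while (my $line = <$fh>) {
    chomp $line;
    next if $line =~ /^\s*\z/;       # skip blank lines
    my @fields = split /;/, $line;
    push @markers, $fields[-1]       # last field of the line
        if $fields[0] eq '';         # only lines that start with ';'
}
close $fh;

print join(',', @markers), "\n";     # prints D16S539,D7S820,TH01
```

Because @markers is declared before the loop, each push adds to the same array instead of creating a fresh one per line.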

You aren't using function split() correctly. I have fixed it in the code below and printed the values:
#!/usr/bin/perl
use warnings;
use strict;
open FILE, 'test' || die $!;
while (my $line = <FILE>) {
    my @fields = split( /;/, $line);
    if ($line !~ m/^[0-9]+/) {
        print "$fields[-1]";
        # my @markers = splice @fields, 0, @fields - 1;
    }
}

Related

I need the output in following way in perl

# File-
# a,b,c,d,e,f
# 1,2,3,4,3,2
# 9,8,7,6,5,0
# 2,3,4,6,7,8
# i need output like this:-
# a=1,d=4,c=3,a=9,d=6,c=7,a=2,d=6,c=4
# but my program gives this:-
# a=1,d=4,c=3a=9,d=6,c=7a=2,d=6,c=4 (there is no , after c and a)
my script is :-
open ($fh, 'parse.txt');
my @arr;
my $dummy=<$fh>;
while (<$fh>) {
chomp;
$a = substr $_, 0,1;
$b = substr $_, 6,1;
$c = substr $_, 4,1;
print "a=$a,d=$b,c=$c";
}
close (IN);
my $prefix = '';
while (<$fh>) {
    chomp;
    my @fields = split /,/;
    print $prefix."a=$fields[0],d=$fields[3],c=$fields[2]";
    $prefix = ',';
}
print("\n");
or
my @recs;
while (<$fh>) {
    chomp;
    my @fields = split /,/;
    push @recs, "a=$fields[0],d=$fields[3],c=$fields[2]";
}
print(join(',', @recs), "\n");
Instead of printing out the values you could append them to a string and include a comma after the "c" value. Then at the end of the loop, erase the final comma from the string and print it out. There are some scalability problems if your input file is too large. But if it's a reasonable size there shouldn't be any substantial issue.
my $output;
my $dummy=<$fh>;
while (<$fh>) {
chomp;
$a = substr $_, 0,1;
$b = substr $_, 6,1;
$c = substr $_, 4,1;
$output .= "a=$a,d=$b,c=$c,";
}
chop $output;
print $output;
If your fields are separated by a delimiter, split the line and collect the elements you need:
use warnings;
use strict;
use feature 'say';
my $file = 'parse.txt';
open my $fh, '<', $file or die "Can't open $file: $!";
my $dummy = <$fh>;
my @res;
while (<$fh>)
{
    my ($a, $d, $c) = (split /,/)[0,3,2];
    push @res, "a=$a,d=$d,c=$c";
}
say join ',', @res;
or pick the order in the assignment
my ($a, $c, $d) = (split /,/)[0,2,3];

Perl : Need to append two columns if the ID's are repeating

If id gets repeated I am appending app1, app2 and printing it once.
Input:
id|Name|app1|app2
1|abc|234|231|
2|xyz|123|215|
1|abc|265|321|
3|asd|213|235|
Output:
id|Name|app1|app2
1|abc|234,265|231,321|
2|xyz|123|215|
3|asd|213|235|
Output I'm getting:
id|Name|app1|app2
1|abc|234,231|
2|xyz|123,215|
1|abc|265,321|
3|asd|213,235|
My Code:
#! usr/bin/perl
use strict;
use warnings;
my $basedir = 'E:\Perl\Input\\';
my $file ='doctor.txt';
my $counter = 0;
my %RepeatNumber;
my $pos=0;
open(OUTFILE, '>', 'E:\Perl\Output\DoctorOpFile.csv') || die $!;
open(FH, '<', join('', $basedir, $file)) || die $!;
my $line = readline(FH);
unless ($counter) {
chomp $line;
print OUTFILE $line;
print OUTFILE "\n";
}
while ($line = readline(FH)) {
    chomp $line;
    my @obj = split('\|',$line);
    if($RepeatNumber{$obj[0]}++) {
        my $str1= join("|",$obj[0]);
        my $str2=join(",",$obj[2],$obj[3]);
        print OUTFILE join("|",$str1,$str2);
        print OUTFILE "\n";
    }
}
This should do the trick:
use strict;
use warnings;
my $file_in = "doctor.txt";
open (FF, "<$file_in");
my $temp = <FF>; # remove first line
my %out;
while (<FF>)
{
    my ($id, $Name, $app1, $app2) = split /\|/, $_;
    $out{$id}[0] = $Name;
    push @{$out{$id}[1]}, $app1;
    push @{$out{$id}[2]}, $app2;
}
foreach my $key (keys %out)
{
    print $key, "|", $out{$key}[0], "|", join (",", @{$out{$key}[1]}), "|", join (",", @{$out{$key}[2]}), "\n";
}
EDIT
To see what the %out contains (in case it's not clear), you can use
use Data::Dumper;
and print it via
print Dumper(%out);
I'd tackle it like this:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
use 5.14.0;
my %stuff;
# Extract the header row.
# Use the regex to remove the linefeed, because
# we can't chomp it inline like this.
# Works since perl 5.14;
# otherwise we could just chomp (@header) later.
my ( $id, @header ) = split( /\|/, <DATA> =~ s/\n//r );
while (<DATA>) {
    # Turn this row into a hash of key-values.
    my %row;
    ( $id, @row{@header} ) = split(/\|/);
    # Print for diagnostics.
    print Dumper \%row;
    # Iterate over each key and insert into %stuff.
    foreach my $key ( keys %row ) {
        push( @{ $stuff{$id}{$key} }, $row{$key} );
    }
}
# Print for diagnostics.
print Dumper \%stuff;
print join( "|", "id", @header ), "\n";
# Iterate over the ids in the hash.
foreach my $id ( sort keys %stuff ) {
    # Join this record by '|'.
    print join( '|',
        $id,
        # Turn inner arrays into comma-separated strings via map.
        map {
            my %seen;
            # Use grep to remove dupes - e.g. "abc,abc" -> "abc".
            join( ",", grep !$seen{$_}++, @$_ )
        } @{ $stuff{$id} }{@header}
    ),
    "\n";
}
__DATA__
id|Name|app1|app2
1|abc|234|231|
2|xyz|123|215|
1|abc|265|321|
3|asd|213|235|
This is perhaps a bit overkill for your application, but it should handle arbitrary column headings and arbitrary numbers of duplicates. It does coalesce repeated values, though, so the two abc entries don't end up as abc,abc.
Output is:
id|Name|app1|app2
1|abc|234,265|231,321
2|xyz|123|215
3|asd|213|235
Another way of doing it, which doesn't use a hash (in case you want to be more memory efficient). My contribution starts below the open calls:
#!/usr/bin/perl
use strict;
use warnings;
my $basedir = 'E:\Perl\Input\\';
my $file ='doctor.txt';
open(OUTFILE, '>', 'E:\Perl\Output\DoctorOpFile.csv') || die $!;
select(OUTFILE);
open(FH, '<', join('', $basedir, $file)) || die $!;
print(scalar(<FH>));
my @lastobj;
foreach my $obj (sort { $a->[0] <=> $b->[0] }
                 map  { chomp; [ split /\|/ ] } <FH>) {
    if (@lastobj && $obj->[0] eq $lastobj[0]) {
        # Same id as the previous row: append app1 and app2.
        @lastobj = (@$obj[0..1],
                    $lastobj[2] . ',' . $obj->[2],
                    $lastobj[3] . ',' . $obj->[3]);
    }
    else {
        # New id: flush the previous record, if there is one.
        print(join('|', @lastobj), "|\n") if @lastobj;
        @lastobj = @$obj[0..3];
    }
}
print(join('|', @lastobj), "|\n") if @lastobj;
Note that split, without its third argument, drops trailing empty fields, which is why you have to add the last bar back. If you don't chomp, you won't need to supply the bar or the trailing newline, but you would have to keep $obj->[4] as well.
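A minimal sketch of how split treats trailing empty fields, and of what happens when the bar is not escaped in the pattern. The sample string is one row from the question's input; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $line = '1|abc|234|231|';

# Default: trailing empty fields are removed from the result.
my @default = split /\|/, $line;

# A negative limit keeps trailing empty fields.
my @keep = split /\|/, $line, -1;

print scalar(@default), "\n";   # 4
print scalar(@keep), "\n";      # 5

# An unescaped '|' is an alternation of two empty patterns, so it
# matches between every character instead of at the bars.
my @chars = split /|/, 'a|b';
print scalar(@chars), "\n";     # 3: 'a', '|', 'b'
```

With the -1 limit the trailing bar survives a join('|', ...) round trip, so it does not have to be added back by hand.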

Split list of delimited lines to hash

The following produces what i want.
#!/usr/bin/env perl
use 5.020;
use warnings;
use Data::Dumper;
sub command {
<DATA>
#in the reality instead of the DATA I have
#qx(some weird shell command what produces output like in the DATA);
}
my @lines = grep { !/^\s*$/ } command();
chomp @lines;
my $data;
#how to write the following nicer - more compact, elegant, etc.. ;)
for my $line (@lines) {
    my @arr = split /:/, $line;
    $data->{$arr[0]}->{text} = $arr[1];
    $data->{$arr[0]}->{par} = $arr[2];
    $data->{$arr[0]}->{val} = $arr[3];
}
say Dumper $data;
__DATA__
line1:some text1:par1:val1
line2:some text2:par2:val2
line3:some text3:par3:val3
Wondering how to write the loop in more perlish form. ;)
You can assign to a hash slice:
for my $line (@lines) {
    my ($id, @arr) = split /:/, $line;
    @{ $data->{$id} }{qw{ text par val }} = @arr;
}
Also, use the following instead of qx, so you don't need to store all the lines in an array:
open my $PIPE, '-|', 'command' or die $!;
while (<$PIPE>) {
# ...
}
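The hash-slice assignment can be tried in isolation. The sketch below assumes two hard-coded lines in the same colon-separated layout as the question's data, in place of the real command output.

```perl
use strict;
use warnings;

# Hard-coded stand-ins for the command output (assumption: same layout).
my @lines = (
    'line1:some text1:par1:val1',
    'line2:some text2:par2:val2',
);

my $data;
for my $line (@lines) {
    my ($id, @arr) = split /:/, $line;
    # Assign three keys of the inner hash in one statement;
    # $data and $data->{$id} are autovivified as hash references.
    @{ $data->{$id} }{qw(text par val)} = @arr;
}

print "$data->{line2}{par}\n";   # par2
```

The slice pairs keys and values by position, so the order of the key list has to match the order of the fields after the id.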

how to write my results to external file in perl

I am trying to read some particular columns from my data into an output file. I have succeeded in reading one column at a time, but I want to extract several columns of interest at once (I have the list of columns I want in a separate text file), because extracting individual columns and joining them into one file would be tedious. Here is the code I used to extract a single column:
#!/usr/bin/perl
use strict;
use warnings;
open (DATA, "<file.txt") or die ("Unable to open file");
my $search_string = "IADC512444";
my $header = <DATA>;
my @header_titles = split /\t/, $header;
my $extract_col = 0;
for my $header_line (@header_titles) {
    last if $header_line =~ m/$search_string/;
    $extract_col++;
}
print "Extracting column $extract_col\n";
while ( my $row = <DATA> ) {
    last unless $row =~ /\S/;
    chomp $row;
    my @cells = split /\t/, $row;
    print "$cells[$extract_col] ";
}
Is it possible to extract all the columns I want at once, instead of only IADC512444, and write them to an output file on my hard disk? Please help me solve this problem.
Thanks
If you need to print the contents to a file on disk, open a file in write mode and print to it. If you want more columns, access the corresponding elements of the @cells array. In this example I print the column you are already printing plus columns 1 and 2:
open(OUT_FILE,">path_to_out_file") || die "cant open file...";
while ( my $row = <DATA> ) {
    last unless $row =~ /\S/;
    chomp $row;
    my @cells = split /\t/, $row;
    #print "$cells[$extract_col] ";
    print OUT_FILE "$cells[$extract_col],$cells[1],$cells[2]\n";
}
close(OUT_FILE);
I have tweaked the code a little to suit your requirement.
In the variable $req_hdr_string, list the column names you need, separated by commas.
The string is split and the names are stored in a hash.
Then I get the position of each required column from the header and print only those columns.
#!/usr/bin/perl
use strict;
use warnings;
open (DATA, "<h11.txt") or die ("Unable to open file");
my $req_hdr_string = "abc,ghi,mno,";
my %req_hdrs = ();
my %extract_col = ();
foreach(split /,/, $req_hdr_string)
{
print "req hdr is:$_\n";
$req_hdrs{$_} = $_;
}
my $index = 0;
my $header = <DATA>;
chomp $header;
foreach (split /\t/, $header)
{
print "input is:|$_|\n";
if(exists $req_hdrs{$_})
{
print "\treq index is:$index\n";
$extract_col{$index} = 1;
}
$index++;
}
open(OUT_FILE,">out_file") || die "cant open file...";
while ( my $row = <DATA> )
{
    last unless $row =~ /\S/;
    chomp $row;
    my @cells = split /\t/, $row;
    # Sort numerically so that column 10 comes after column 9.
    foreach $index (sort { $a <=> $b } keys %extract_col)
    {
        print OUT_FILE "$cells[$index],";
    }
    print OUT_FILE "\n";
}
close(OUT_FILE);
close(DATA);

How can I select records from a CSV file based on timestamps in Perl?

I have CSV file which has timestamp in the first column as shown below sample:
Time,head1,head2,...head3
00:00:00,24,22,...,n
00:00:01,34,55,...,n
and so on...
I want to filter out the data with specific time range like from 11:00:00 to 16:00:00 with the header and put into an array. I have written the below code to get the header in an array.
#!/usr/bin/perl -w
use strict;
my $start = $ARGV[0];
my $end = $ARGV[1];
my $line;
$line =<STDIN>;
my $header = [ split /[,\n]/, $line ];
I need help on how to filter data from file with selected time range and create an array of that.
I kind of cheated. A proper program would probably use DateTime and then compare with DateTime's compare function. If you're only expecting input in this format, my "cheat" should work.
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
use DateTime;
my $start = 110000;
my $end = 160000;
my $csv = Text::CSV->new () or die "Cannot use CSV: ".Text::CSV->error_diag ();
my #lines;
open my $fh, "<:encoding(utf8)", "file.csv" or die "Error opening file: $!";
while ( my $row = $csv->getline( $fh ) ) {
    my $time = $row->[0];
    $time =~ s/\://g;
    if ($time >= $start and $time <= $end) {
        push @lines, $row;
    }
}
$csv->eof or $csv->error_diag();
close $fh;
# do something here with @lines
just a start
my $start="01:00:01";
my $end = "11:00:01";
while(<>){
    chomp;
    if ( /$start/../$end/ ){
        @s = split /,/ ;
        # do what you want with @s here.
    }
}
#!/usr/bin/perl -w
use strict;
my $start = '11:00:00';
my $end = '16:00:00';
my @data;
chomp ($_ = <STDIN>); # remove trailing newline character
push @data, [split /,/]; # add header
while(<>) {
    chomp;
    my @line = split /,/;
    next if $line[0] lt $start or $line[0] gt $end;
    push @data, [@line]; # $data[i][j] contains j-th element of i-th line.
}
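The string comparison in the last script works because zero-padded HH:MM:SS values sort the same way as text and as times. A small sketch of that idea, assuming a few made-up rows in place of the real CSV input:

```perl
use strict;
use warnings;

my ($start, $end) = ('11:00:00', '16:00:00');

# Hypothetical sample rows; the real input would come from the CSV file.
my @rows = (
    '00:00:01,34,55',
    '11:00:00,1,2',
    '13:30:15,3,4',
    '16:00:01,5,6',
);

my @selected;
for my $row (@rows) {
    my ($time) = split /,/, $row;
    # lt/gt compare strings; this is enough for zero-padded HH:MM:SS.
    next if $time lt $start or $time gt $end;
    push @selected, $row;
}

print "$_\n" for @selected;   # the 11:00:00 and 13:30:15 rows
```

This would not hold for times written without leading zeros (for example 9:05:00), which is where a module such as DateTime becomes the safer choice.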