how to write my results to external file in perl - perl

I am trying to read some particular columns from myu data into my output file, i succeed in this reading one cloumn at a time but i want to read some more columns of my interest at a time (i have list of column i want to extract in a separate tex file) because extract individual column and joining them to make one separate file will become hectic to me, here is the code i tried to extract single coulmn,
#!/usr/bin/perl
use strict;
use warnings;
open (DATA, "<file.txt") or die ("Unable to open file");
my $search_string = "IADC512444";
my $header = <DATA>;
my #header_titles = split /\t/, $header;
my $extract_col = 0;
for my $header_line (#header_titles) {
last if $header_line =~ m/$search_string/;
$extract_col++;
}
print "Extracting column $extract_col\n";
while ( my $row = <DATA> ) {
last unless $row =~ /\S/;
chomp $row;
my #cells = split /\t/, $row;
print "$cells[$extract_col] ";
}
is there any possibility to extract all columns at a time instead of only IADC512444 i want from my textfile into outfile on to my harddisc? please help me in solving this problem,
Thanks

If you need to print the contents to a file on disk then you should open a file in write mode and write to it. Also if you want more columns you can do that by accessing corresponding element in the array cells. In this example i am printing the column you are printing plus column 1 and 2
open(OUT_FILE,">path_to_out_file") || die "cant open file...";
while ( my $row = <DATA> ) {
last unless $row =~ /\S/;
chomp $row;
my #cells = split /\t/, $row;
#print "$cells[$extract_col] ";
print OUT_FILE "$cells[$extract_col],$cells[1],$cells[2]\n";
}
close(OUT_FILE)
I have tweaked the code little bit to suit your requirement.
In the variable req_hdr_string you should say the column names which you require separated by ,
So it will be splitted and stored in a hash.
Then from the header i get the position of the column and print only those
#!/usr/bin/perl
use strict;
use warnings;
open (DATA, "<h11.txt") or die ("Unable to open file");
my $req_hdr_string = "abc,ghi,mno,";
my %req_hdrs = ();
my %extract_col = ();
foreach(split /,/, $req_hdr_string)
{
print "req hdr is:$_\n";
$req_hdrs{$_} = $_;
}
my $index = 0;
my $header = <DATA>;
chomp $header;
foreach (split /\t/, $header)
{
print "input is:|$_|\n";
if(exists $req_hdrs{$_})
{
print "\treq index is:$index\n";
$extract_col{$index} = 1;
}
$index++;
}
open(OUT_FILE,">out_file") || die "cant open file...";
while ( my $row = <DATA> )
{
last unless $row =~ /\S/;
chomp $row;
my #cells = split /\t/, $row;
foreach $index (sort keys%extract_col)
{
print OUT_FILE "$cells[$index],";
}
print OUT_FILE "\n";
}
close(OUT_FILE);
close(DATA);

Related

How to check whether one file's value contains in another text file? (perl script)

I would like to check one of the file's values contains on another file. if one of the value contains it will show there is existing bin for that specific, if no, it will show there is no existing bin limit. the problem is I am not sure how to check all values at once.
first DID1 text file value contain :
L84A:D:O:M:
L84C:B:E:D:
second DID text file value contain :
L84A:B:E:Q:X:F:i:M:Y:
L84C:B:E:Q:X:F:i:M:Y:
L83A:B:E:Q:X:F:i:M:Y:
if first 4words value are match, need to check all value for that line.
for example L84A in first text file & second text file value has M . it should print out there is an existing M bin
below is my code :
use strict;
use warnings;
my $filename = 'DID.txt';
my $filename1 = 'DID1.txt';
my $count = 0;
open( FILE2, "<$filename1" )
or die("Could not open log file. $!\n");
while (<FILE2>) {
my ($number) = $_;
chomp($number);
my #values1 = split( ':', $number );
open( FILE, "<$filename" )
or die("Could not open log file. $!\n");
while (<FILE>) {
my ($line) = $_;
chomp($line);
my #values = split( ':', $line );
foreach my $val (#values) {
if ( $val =~ /$values1[0]/ ) {
$count++;
if ( $values[$count] =~ /$values1[$count]/ ) {
print
"Yes ,There is an existing bin & DID\n #values1\n";
}
else {
print "No, There is an existing bin & DID\n";
}
}
}
}
}
I cannot check all value. please help to give any advice on it since this is my first time learning for perl language. Thanks a lot :)
Based on my understanding I write this code:
use strict;
use warnings;
#use ReadWrite;
use Array::Utils qw(:all);
use vars qw($my1file $myfile1cnt $my2file $myfile2cnt #output);
$my1file = "did1.txt"; $my2file = "did2.txt";
We are going to read both first and second files (DID1 and DID2).
readFileinString($my1file, \$myfile1cnt); readFileinString($my2file, \$myfile2cnt);
In first file, as per the OP's request the first four characters should be matched with second file and then if they matched we need to check rest of the characters in the first file with the second one.
while($myfile1cnt=~m/^((\w){4})\:([^\n]+)$/mig)
{
print "<LineStart>";
my $lineChk = $1; my $full_Line = $3; #print ": $full_Line\n";
my #First_values = split /\:/, $full_Line; #print join "\n", #First_values;
If the first four digit matched then,
if($myfile2cnt=~m/^$lineChk\:([^\n]+)$/m)
{
Storing the rest of the content in the same and to be split with colon and getting the characters to be matched with first file contents.
my $FullLine = $1; my #second_values = split /:/, $FullLine;
Then search each letter first and second content which matched line...
foreach my $sngletter(#First_values)
{
If the letters are matched with first and second file its going to be printed.
if( grep {$_ eq "$sngletter"} #second_values)
{
print "Matched: $sngletter\t";
}
}
}
else { print "Not Matched..."; }
This is just information that the line end.
print "<LineEnd>\n"
}
#------------------>Reading a file
sub readFileinString
#------------------>
{
my $File = shift;
my $string = shift;
use File::Basename;
my $filenames = basename($File);
open(FILE1, "<$File") or die "\nFailed Reading File: [$File]\n\tReason: $!";
read(FILE1, $$string, -s $File, 0);
close(FILE1);
}
Read search pattern and data into hash (first field is a key), then go through data and select only field included into pattern for this key.
use strict;
use warnings;
use feature 'say';
my $input1 = 'DID1.txt'; # look for key,pattern(array)
my $input2 = 'DID.txt'; # data - key,elements(array)
my $pattern;
my $data;
my %result;
$pattern = file2hash($input1); # read pattern into hash
$data = file2hash($input2); # read data into hash
while( my($k,$v) = each %{$data} ) { # walk through data
next unless defined $pattern->{$k}; # skip those which is not in pattern hash
my $find = join '|', #{ $pattern->{$k} }; # form search pattern for grep
my #found = grep {/$find/} #{ $v }; # extract only those of interest
$result{$k} = \#found; # store in result hash
}
while( my($k,$v) = each %result ) { # walk through result hash
say "$k has " . join ':', #{ $v }; # output final result
}
sub file2hash {
my $filename = shift;
my %hash;
my $fh;
open $fh, '<', $filename
or die "Couldn't open $filename";
while(<$fh>) {
chomp;
next if /^\s*$/; # skip empty lines
my($key,#data) = split ':';
$hash{$key} = \#data;
}
close $fh;
return \%hash;
}
Output
L84C has B:E
L84A has M

truncate all lines in a file while preserving whole words

I'm trying to shorten each line of a file to 96 characters while preserving whole words. If a line is under or equal to 96 chars, I want to do nothing with that line. If it over 96 chars, I want it cut it down to the closest amount less than 96 while preserving whole words. When I run this code, I get a blank file.
use Text::Autoformat;
use strict;
use warnings;
#open the file
my $filename = $ARGV[0]; # store the 1st argument into the variable
open my $file, '<', $filename;
open my $fileout, '>>', $filename.96;
my #file = <$file>; #each line of the file into an array
while (my $line = <$file>) {
chomp $line;
foreach (#file) {
#######
sub truncate($$) {
my ( $line, $max ) = #_;
# always do nothing if already short enough
( length( $line ) <= $max ) and return $line;
# forced to chop a word anyway
if ( $line =~ /\s/ ) {
return substr( $line, 0, $max );
}
# otherwise truncate on word boundary
$line =~ s/\S+$// and return $line;
die; # unreachable
}
#######
my $truncated = &truncate($line,96);
print $fileout "$truncated\n";
}
}
close($file);
close($fileout);
You have no output because you have no input.
1. my #file = <$file>; #each line of the file into an array
2. while (my $line = <$file>) { ...
The <$file> operation line 1 is in list context "consumes" all the input and loads it into #file. The <$file> operation in line 2 has no more input to read, so the while loop does not execute.
You either want to stream from the filehandle
# don't call #file = <$file>
while (my $line = <$file>) {
chomp $line;
my $truncated = &truncate($line, 96);
...
}
Or read from the array of file contents
my #file = <$file>;
foreach my $line (#file) {
chomp $line;
my $truncated = &truncate($line, 96);
...
}
If the input is large, the former format has the advantage of just loading a single line into memory at a time.

cant retrieve values from hash reversal (Perl)

I've initialized a hash with Names and their class ranking as follows
a=>5,b=>2,c=>1,d=>3,e=>5
I've this code so far
my %Ranks = reverse %Class; #As I need to find out who's ranked first
print "\nFirst place goes to.... ", $Ranks{1};
The code only prints out
"First place goes to...."
I want it to print out
First place goes to....c
Could you tell me where' I'm going wrong here?
The class hash prints correctly
but If I try to print the reversed hash using
foreach $t (keys %Ranks) {
print "\n $t $Ranks{$t}"; }
It prints
5
abc23
cab2
ord
If this helps in any way
FULL CODE
#Script to read from the data file and initialize it into a hash
my %Code;
my %Ranks;
#Check whether the file exists
open(fh, "Task1.txt") or die "The File Does Not Exist!\n", $!;
while (my $line = <fh>) {
chomp $line;
my #fields = split /,/, $line;
$Code{$fields[0]} = $fields[1];
$Class{$fields[0]} = $fields[2];
}
close(fh);
#Prints the dataset
print "Code \t Name\n";
foreach $code ( keys %Code) {
print "$code \t $Code{$code}\n";
}
#Find out who comes first
my %Ranks = reverse %Class;
foreach $t (keys %Ranks)
{
print "\n $t $Ranks{$t}";
}
print "\nFirst place goes to.... ", $Ranks{1}, "\n";
When you want to check what your data structures actually contain, use Data::Dumper. use Data::Dumper; local $Data::Dumper::Useqq = 1; print(Dumper(\%Class));. You'll find un-chomped newlines.
You need to use chomp. At present your $fields[2] value has a trailing newline.
Change your file read loop to this
while (my $line = <fh>) {
chomp $line;
my #fields = split /,/, $line;
$Code{$fields[0]} = $fields[1];
$Class{$fields[0]} = $fields[2];
}

How to extract the last element of a string and use it to grow an array inside a loop

I have a dataset like this:
10001;02/07/98;TRIO;PI;M^12/12/59^F^^SP^09/12/55
;;;;;M1|F1|SP1;11;10;12;10;12;11;1.82;D16S539
;;;;;M1|F1|SP1;8;8;8;8;10;8;3.45;D7S820
;;;;;M1|F1|SP1;14;12;12;11;14;11;1.57;D13S317
;;;;;M1|F1|SP1;12;12;13;12;13;8;3.27;D5S818
;;;;;M1|F1|SP1;12;12;12;12;12;8;1.51;CSF1PO
;;;;;M1|F1|SP1;8;11;11;11;11;8;1.79;TPOX
;;;;;M1|F1|SP1;6;9;9;6;8;6;1.31;TH01
I'm trying to extract the last element of the lines which does not start with a number, i.e. all lines except the first one. I want to put these values inside an array called #markers.
I'm trying that by the following code:
#!usr/bin/perl
use warnings;
use strict;
open FILE, 'test' || die $!;
while (my $line = <FILE>) {
my #fields = (split /;/), $line;
if ($line !~ m/^[0-9]+/) {
my #markers = splice #fields, 0, #fields - 1;
}
}
But that does not work. Can anyone help please?
Thanks
You create a new variable named #markers every pass of the loop.
my #fields = (split /;/), $line; means (my #fields = (split /;/, $_)), $line;. You meant my #fields = (split /;/, $line);
'test' || die $! is the same as just 'test'.
use strict;
use warnings;
open my $FILE, '<', 'test'
or die $!;
my #markers;
while (<$FILE>) {
chomp;
next if /^\s*\z/; # Skip blank lines.
my #fields = split /;/;
push #markers, $fields[-1]
if $fields[0] eq '';
}
You aren't using function split() correctly. I have fixed it in the code below and printed the values:
#!/usr/bin/perl
use warnings;
use strict;
open FILE, 'test' || die $!;
while (my $line = <FILE>) {
my #fields = split( /;/, $line);
if ($line !~ m/^[0-9]+/) {
print "$fields[-1]";
# my #markers = splice #fields, 0, #fields - 1;
}
}

How can I find the elements appearing in two columns of a tab-delimited file?

I have a file which is tab delimited, and has two columns, A and B.
I want to count the number of times an element in B is repeated in A. I could have done it in Excel, but since the two columns contain more than 200k elements, it hangs.
I tried with this code but it counts elements in itself:
my %counts = ();
for (#A) {
$count{$_}++;
}
foreach my $k(keys %counts) {
print "$k\t$count{$k}\n";
}
Try this solution:
use strict;
use warnings;
my %countx;
my #y;
my $file = 'ab.txt';
open my $fh, '<', $file or die "Couldn't open $file";
while (my $line = <$fh>) {
chomp $line; # remove newline
# I've avoided using $a and $b because they are special variables in perl
my ( $x, $y ) = split /\t/, $line;
$countx{ $x }++;
push #y, $y;
}
close $fh;
foreach my $y (#y) {
my $count = $countx{ $y } || 0;
print "$y\t$count\n";
}