Perl array prints as GLOB(0x...)

I have a file containing a list of email addresses separated by semicolons, laid out much like this (but much larger):
$ cat email_casper.txt
casper1@foo.com; casper2@foo.com; casper3@foo.com; casper.casper4@foo.com;
#these throw outlook error :
#casper101@foo.com ; casper100@foo.com
#cat /tmp/emailist.txt | tr '\n' '; '
#cat /tmp/emallist.txt | perl -nle 'print /\<(.*)\>/' | sort
I want to break it up on the semicolons, so I suck the whole file into an array, supposedly splitting the contents on semicolons:
#!/usr/bin/perl
use strict;
use warnings;
my $filename = shift @ARGV;
open(my $fh, '<', $filename) or die "Could not open file $filename $!";
my @values = split(';', $fh);
foreach my $val (@values) {
print "$val\n";
}
exit 0 ;
But instead of the addresses, the script gives me a GLOB. I just don't know what is going on.
$ ./spliton_semi.pl email_casper.txt
GLOB(0x80070b90)
If I use Data::Dumper I get
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper ;
my $filename = shift @ARGV;
open(my $fh, '<', $filename) or die "Could not open file $filename $!";
my @values = split(';', $fh);
print Dumper \@values;
This is what Dumper prints:
$ ./spliton_semi.pl email_casper.txt
$VAR1 = [
'GLOB(0x80070b90)'
];

You do not "suck the whole file into an array". You don't even attempt to read from the file handle. Instead, you pass the file handle to split. Expecting a string, it stringifies the file handle into GLOB(0x80070b90).
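To see exactly what split receives, you can interpolate the filehandle into a string yourself. The sketch below is illustrative only: it uses an in-memory filehandle (open on a reference to a scalar) in place of email_casper.txt, and the variable names are made up for the example. The GLOB address will differ on every run.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# An in-memory "file" (a filehandle opened on a scalar reference) stands in
# for email_casper.txt, so the example runs without a file on disk.
my $text = 'casper1@foo.com; casper2@foo.com;' . "\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";

# This is what split() sees when it is handed the filehandle itself.
my $as_string = "$fh";
print "$as_string\n";    # prints something like GLOB(0x55d0c8a1e2b8)

# Reading a line from the handle first gives split() the actual text.
my $line = <$fh>;
chomp $line;
my @values = split /\s*;\s*/, $line;
print "$_\n" for @values;
```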
You could read the file into an array of lines as follows:
my @lines = <$fh>;
for my $line (@lines) {
...
}
But it's far simpler to read one line at a time.
while ( my $line = <$fh> ) {
...
}
In fact, there is no reason not to use ARGV here, simplifying your program to the following:
#!/usr/bin/perl
use strict;
use warnings;
use feature qw( say );
while (<>) {
chomp;
say for split /\s*;\s*/, $_;
}
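If the addresses might be spread over several lines, or have stray spaces around the semicolons (like the commented-out casper101/casper100 line in the question), the same loop can be wrapped in a small function that also trims each line and drops empty fields. This is a sketch, not a drop-in replacement: the sub name emails_from_handle is hypothetical, and an in-memory filehandle stands in for the real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every semicolon-separated address read from $fh,
# trimming the whitespace around each one and skipping empty fields.
sub emails_from_handle {
    my ($fh) = @_;
    my @emails;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/^\s+|\s+$//g;    # trim both ends of the line
        push @emails, grep { length } split /\s*;\s*/, $line;
    }
    return @emails;
}

# Demo on an in-memory file holding both spacing styles from the question.
my $text = 'casper1@foo.com; casper2@foo.com;' . "\n"
         . 'casper101@foo.com ; casper100@foo.com' . "\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
my @emails = emails_from_handle($fh);
print "$_\n" for @emails;
```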

This line
my @values = split(';', $fh);
is not reading from the filehandle like you think it is. You're actually passing the filehandle itself to split, which turns it into a string like GLOB(0x80070b90).
You want this:
my $line = <$fh>;
my @values = split(';', $line);
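my $line = <$fh>; reads only the first line, which is enough for the one-line file in the question. If the list could wrap onto several lines, one option is to read the whole file at once by localizing $/ (slurp mode) and then split the combined text. A minimal sketch; split_whole_file is a hypothetical name and the in-memory filehandle replaces the real file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read an entire filehandle into one string by undefining $/ (slurp mode),
# then split it on semicolons, tolerating newlines and spaces around them.
sub split_whole_file {
    my ($fh) = @_;
    my $contents = do { local $/; <$fh> };    # $/ is undef only inside this block
    return grep { length } split /\s*;\s*/, $contents;
}

my $text = 'casper1@foo.com; casper2@foo.com;' . "\n" . 'casper3@foo.com;' . "\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
my @values = split_whole_file($fh);
print "$_\n" for @values;
```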

Starting point:
#!/usr/bin/perl
use strict;
use warnings;
open(my $fh, '<', 'dummy.txt')
or die "$!";
my @values;
while (<$fh>) {
chomp;
# original
# push(@values, split(';', $_));
# handle white space
push(@values, split(/\s*;\s*/, $_));
}
close($fh);
foreach my $val (@values) {
print "$val\n";
}
exit 0;
Output for your example:
$ perl dummy.pl
casper1@foo.com
casper2@foo.com
casper3@foo.com
casper.casper4@foo.com

Related

Perl: extract duplicate sequences from a multi-FASTA file

I have a big FASTA file, input.fasta, which contains many duplicate sequences. I want to enter a header name and extract all the sequences with the matching header. I know this could easily be done with awk/sed/grep, but I need Perl code.
input.fasta
>OGH38127_some_organism
PAAALGFSHLARQEDSALTPKHYTWTAPGEGDVRAPCPVLNTLANHEFLPHNGKNITVDK
AITALGDAMNISPALATTFFTGGLKTNPTPNATWFDLDMLHKHNVLEHDGSLSRRDMHFD
TSNKFDAATFANFLSYFDANATVLGVNETADARARHAYDMSKMNPEFTITSSMLPIMVGE
SVMMMLVWGSVEEPGAQRDYFEYFFRNERLPVELGWTPGETEIGVPVVTAMITAMVAASP
TDVP
>ABC14110_some_different_org_name
WWVAPGPGDSRGPCPGLNTLANHGYLPHDGKGITLSILADAMLDGFNIARSDALLLFTQ
AIRTSPQYPATNSFNLHDLGRDQLNRHNVLEHDASLSRADDFFGSNHIFNETVFDESRAY
AMLANSKIARQINSKAFNPQYKFTSKTEQFSLGEIAAPIIAFGNSTSGEVNRTLVEYFFM
NERLPIELGWKKSEDGIALDDILRVTQMISKAASLITPSALSWTAETLTP
>OGH38127_some_organism
LPWSRPGPGAVRAPCPMLNTLANHGFLPHDGKNISEARTVQALGRALNIEKELSQFLFEK
ALTTNPHTNATTFSLNDLSRHNLLEHDASLSRQDAYFGDNHDFNQTIFDETRSYWPHPVI
DIQAAALSRQARVNTSIAKNPTYNMSELGLDFSYGETAAYILILGDKDFGKVNRSWVEYL
FENERLPVELGWTRHNETITSDDLNTMLEKVVN
.
.
.
I have tried with the following script but it is not giving any output.
script.pl
#!/perl/bin/perl -w
use strict;
use warnings;
print "Enter a fasta header to search for:\n";
my $head = <>;
my $file = "input.fasta";
open (READ, "$file") || die "Cannot open $file: $!.\n";
my %seqs;
my $header;
while (my $line = <READ>){
chomp $line;
$line =~ s/^>(.*)\n//;
if ($line =~ m/$head/){
$header = $1;
}
}
close (READ);
open( my $out , ">", "out.fasta" ) or die $!;
my @count_seq = keys %seqs;
foreach (@count_seq){
print $out $header, "\n";
print $out $seqs{$header}, "\n";
}
exit;
Please help me correct this script.
Thanks!
If you use the BioPerl module Bio::SeqIO to handle parsing the FASTA file, it becomes really simple:
#!/usr/bin/perl
use warnings;
use strict;
use Bio::SeqIO;
my ($file, $name) = @ARGV;
my $in = Bio::SeqIO->new(-file => $file, -format => "fasta");
my $out = Bio::SeqIO->new(-fh => \*STDOUT, -format => "fasta");
while (my $s = $in->next_seq) {
$out->write_seq($s) if $s->display_id eq $name;
}
Run it with: perl grep_fasta.pl input.fasta OGH38127_some_organism
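For reference, the question's script never stores anything in %seqs, and $head still carries the newline read by <>, so it needs a chomp before it can match. If you would rather avoid BioPerl and keep everything in memory, a hash of arrays (header => list of sequences) keeps every sequence under a duplicated header. This is a sketch under those assumptions: read_fasta is a hypothetical name, and the records are shortened fragments of the question's data held in an in-memory filehandle.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse FASTA records from $fh into a hash of arrays:
#   header => [ sequence1, sequence2, ... ]
# so that a duplicated header keeps every one of its sequences.
sub read_fasta {
    my ($fh) = @_;
    my %seqs;
    my $header;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;
        if ($line =~ /^>(.*)/) {
            $header = $1;
            push @{ $seqs{$header} }, '';    # start a new, empty sequence
        }
        elsif (defined $header) {
            $seqs{$header}[-1] .= $line;     # append to the latest sequence
        }
    }
    return \%seqs;
}

# Shortened sample records; the real sequences are much longer.
my $fasta = join "\n",
    '>OGH38127_some_organism', 'PAAALG', 'FSHLAR',
    '>ABC14110_some_different_org_name', 'WWVAPG',
    '>OGH38127_some_organism', 'LPWSRP', '';
open(my $fh, '<', \$fasta) or die "Cannot open in-memory file: $!";
my $seqs = read_fasta($fh);

my $wanted = 'OGH38127_some_organism';
print ">$wanted\n$_\n" for @{ $seqs->{$wanted} || [] };
```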
There's no need to store the sequences in memory, you can print them directly when reading the file. Use a flag variable ($inside in the example) that tells you whether you're reading the desired sequence or not.
#! /usr/bin/perl
use warnings;
use strict;
my ($file, $header) = @ARGV;
my $inside;
open my $in, '<', $file or die $!;
while (<$in>) {
$inside = $1 eq $header if /^>(.*)/;
print if $inside;
}
Run as
perl script.pl file.fasta OGH38127_some_organism > output.fasta
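The flag approach is easy to test if the loop is moved into a sub that returns the matching lines instead of printing them. A sketch of the same technique; records_for and the tiny records A and B are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the lines (headers included) of every FASTA record whose
# header equals $wanted, using the same on/off flag as the answer above.
sub records_for {
    my ($fh, $wanted) = @_;
    my ($inside, @out) = (0);
    while (my $line = <$fh>) {
        # A header line switches the flag on or off; (.*) stops before "\n".
        $inside = ($1 eq $wanted) if $line =~ /^>(.*)/;
        push @out, $line if $inside;
    }
    return @out;
}

my $fasta = ">A\nAAA\n>B\nBBB\n>A\nCCC\n";
open(my $fh, '<', \$fasta) or die "Cannot open in-memory file: $!";
my @matched = records_for($fh, 'A');
print @matched;
```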

How to assign data into a hash from an input file

I am new to perl.
My input file contains:
james1
84012345
aaron5
2332111 42332
2345112 18238
wayne[2]
3505554
Question: I am not sure of the correct way to read the input and set each name as a key and the numbers as its values. For example, "james" is a key and "84012345" is its value.
This is my code:
#!/usr/bin/perl -w
use strict;
use warnings;
use Data::Dumper;
my $input= $ARGV[0];
my %hash;
open my $data , '<', $input or die " cannot open file : $_\n";
my @names = split ' ', $data;
my @values = split ' ', $data;
@hash{@names} = @values;
print Dumper \%hash;
I'mma go over your code real quick:
#!/usr/bin/perl -w
-w is not recommended. You should use warnings; instead (which you're already doing, so just remove -w).
use strict;
use warnings;
Very good.
use Data::Dumper;
my $input= $ARGV[0];
OK.
my %hash;
Don't declare variables before you need them. Declare them in the smallest scope possible, usually right before their first use.
open my $data , '<', $input or die " cannot open file : $_\n";
You have a spurious space at the beginning of your error message and $_ is unset at this point. You should include $input (the name of the file that failed to open) and $! (the error reason) instead.
my @names = split ' ', $data;
my @values = split ' ', $data;
Well, this doesn't make sense. $data is a filehandle, not a string. Even if it were a string, this code would assign the same list to both @names and @values.
@hash{@names} = @values;
print Dumper \%hash;
My version (untested):
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
@ARGV == 1
or die "Usage: $0 FILE\n";
my $file = $ARGV[0];
my %hash;
{
open my $fh, '<', $file or die "$0: can't open $file: $!\n";
local $/ = '';
while (my $paragraph = readline $fh) {
my @words = split ' ', $paragraph;
my $key = shift @words;
$hash{$key} = \@words;
}
}
print Dumper \%hash;
The idea is to set $/ (the input record separator) to "" for the duration of the input loop, which makes readline return whole paragraphs, not lines.
The first (whitespace separated) word of each paragraph is taken to be the key; the remaining words are the values.
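Paragraph mode only helps if the entries are separated by blank lines; the listing in the question shows none, so whether the real file has them is an assumption. Under that assumption, here is a testable sketch of the same loop (the sub name parse_paragraphs is hypothetical, and the sample text lives in an in-memory filehandle). Because $/ is localized inside the sub, normal line-by-line reading comes back as soon as it returns.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse blank-line-separated paragraphs: first word = key, rest = values.
sub parse_paragraphs {
    my ($fh) = @_;
    my %hash;
    local $/ = '';    # paragraph mode: one or more blank lines end a record
    while (my $paragraph = readline $fh) {
        my @words = split ' ', $paragraph;
        my $key = shift @words;
        $hash{$key} = \@words;
    }
    return \%hash;
}

# Assumed layout: the question's data with blank lines between entries.
my $text = "james1\n84012345\n\naaron5\n2332111 42332\n2345112 18238\n\nwayne[2]\n3505554\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
my $hash = parse_paragraphs($fh);
print "$_ => @{ $hash->{$_} }\n" for sort keys %$hash;
```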
You have opened a file with open() and attached the file handle to $data. The regular way of reading data from a file is to loop over each line, like so:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my $input = $ARGV[0];
my %hash;
open my $data, '<', $input or die "Cannot open $input: $!\n";
while (my $line = <$data>) {
chomp $line; # Removes extra newlines (\n)
if ($line) { # Skips empty lines
my ($key, $value) = split ' ', $line;
$hash{$key} = $value;
}
}
print Dumper \%hash;
OK, +1 for using strict and warnings.
First, take a look at the $/ variable, which controls how a file is broken into records when it's read in.
$data is a filehandle; you need to read the data from it. If the file is not too big, you can load it all into an array; if it's large, you can loop over it one record at a time. See the <> operator in perlop.
Looking at your code, it appears that you want to end up with the following data structure from your input file:
%hash = (
james1 => [
84012345,
],
aaron5 => [
2332111,
42332,
2345112,
18238,
],
'wayne[2]' => [
3505554,
],
);
See perldsc on how to do that.
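If the file has no blank lines, one way to build that structure is perldsc's push @{ $hash{$key} }, ... idiom, deciding per line whether it starts a new key. The rule used below (a line beginning with a digit holds values, anything else is a key) is an assumption based on the sample input, and parse_records is a hypothetical name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build key => [numbers...] without relying on blank lines: a line that starts
# with a digit holds values for the most recent key; any other line is a new key.
# (Treating "starts with a digit" as the marker is an assumption about the data.)
sub parse_records {
    my ($fh) = @_;
    my (%hash, $key);
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;    # skip blank lines
        if ($line =~ /^\s*\d/) {
            die "Values before any key: $line\n" unless defined $key;
            push @{ $hash{$key} }, split ' ', $line;    # autovivifies the array
        }
        else {
            ($key) = split ' ', $line;
            $hash{$key} ||= [];
        }
    }
    return \%hash;
}

my $text = "james1\n84012345\naaron5\n2332111 42332\n2345112 18238\nwayne[2]\n3505554\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
my $hash = parse_records($fh);
print "$_ => @{ $hash->{$_} }\n" for sort keys %$hash;
```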
All the documentation can be read using the perldoc command which comes with Perl. Running perldoc on its own will give you some tips on how to use it and running perldoc perldoc will give you possibly far more info than you need at the moment.

Perl print loop: replace the previously printed result

Is it possible to replace previously printed output in Perl?
Contents of simple.csv:
string1
string2
string3
My script:
#!/usr/bin/perl
use strict;
use warnings;
my $file = 'simple.csv';
open my $info, $file or die "Could not open $file: $!";
while( my $line = <$info>) {
sleep(2);
print $line ;
}
close $info;
The output looks like:
string1
string2
string3
How can I change it so the output stays on a single line, with each string replacing the previous one: string1, then string2, then string3?
Use printf to control where newline characters go. Use a backspace (\b) to move back over the last output, or simply issue a carriage return (\r) to move to the beginning of the line. Output buffering is also disabled (via $|) so that each string appears immediately. Something like this meets your requirement (as I understand it):
#!/usr/bin/env perl
use strict;
use warnings;
my $file = 'simple.csv';
open my $info, '<', $file or die "Can't open '$file': $!\n";
$|++; #...don't buffer output...
while ( my $line = <$info> ) {
chomp $line; #...remove ending newline...
# printf "%s%s", $line, "\b" x length($line); # alternative-1
printf "%s\r", $line; # alternative-2
sleep 2;
}
close $info;
print "\n"; #...leave a clean output...
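One caveat with \r: if a later string is shorter than the one before it, the tail of the longer string stays on screen. A common fix is to pad every string to the same width. A sketch; status_line is a hypothetical helper, and the shorter 'str2' is made up to show the effect:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(max);

# Build the text that redraws one status line in place: "\r" returns to
# column 0, and "%-*s" left-justifies the text, padding it with spaces to
# $width so a shorter string fully covers a longer one printed before it.
sub status_line {
    my ($text, $width) = @_;
    return sprintf "\r%-*s", $width, $text;
}

$| = 1;    # flush after every print so each update shows immediately
my @items = ('string1', 'str2', 'string3');
my $width = max map { length($_) } @items;
for my $item (@items) {
    print status_line($item, $width);
    sleep 1;
}
print "\n";
```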
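I think the print statement by itself can do that:
#!/usr/bin/perl
use strict;
use warnings;
my $file = 'simple.csv';
open my $info, '<', $file or die "Could not open $file: $!";
$| = 1; # autoflush, so each partial line appears immediately
while( my $line = <$info>) {
chomp($line);
print "$line\r";
sleep(2);
}
close $info;
print "\n"; # move past the last string before the program exits
The carriage return (\r) moves the cursor back to the beginning of the line on each print, so each string overwrites the previous one. Setting $| to 1 matters here: STDOUT is line-buffered when writing to a terminal, so without it nothing would appear until the program ends.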

Loop through file in perl and remove strings with less than 4 characters

I am trying to bring in a file, loop through it, remove any strings that have less than four characters, and then print the list. I come from a JavaScript world and Perl is brand new to me.
use strict;
use warnings;
sub lessThan4 {
open( FILE, "<names.txt" );
my @LINES = <FILE>;
close( FILE );
open( FILE, ">names.txt" );
foreach my $LINE ( @LINES ) {
print FILE $LINE unless ( $LINE.length() < 4 );
}
close( FILE );
}
use strict;
use warnings;
# automatically throw exception if open() fails
use autodie;
sub lessThan4 {
my @LINES = do {
# modern perl uses lexical, and three arg open
open(my $FILE, "<", "names.txt");
<$FILE>;
};
# remove newlines
chomp(@LINES);
open(my $FILE, ">", "names.txt");
foreach my $LINE ( @LINES ) {
print $FILE "$LINE\n" unless length($LINE) < 4;
# possible alternative to 'unless'
# print $FILE "$LINE\n" if length($LINE) >= 4;
}
close($FILE);
}
You're basically there. I hope you'll find some comments on your code useful.
# Well done for including these. So many new Perl users don't
use strict;
use warnings;
# Perl programs traditionally use all lower-case subroutine names
sub lessThan4 {
# 1/ You should use lexical variables for filehandles
# 2/ You should use the three-argument version of open()
# 3/ You should always check the return value from open()
open( FILE, "<names.txt" );
# Upper-case variable names in Perl are assumed to be global variables.
# This is a lexical variable, so name it using lower case.
my @LINES = <FILE>;
close( FILE );
# Same problems with open() here.
open( FILE, ">names.txt" );
foreach my $LINE ( @LINES ) {
# This is your biggest problem. Perl doesn't yet embrace the idea of
# calling methods to get properties of a variable. You need to call
# length() as a function.
print FILE $LINE unless ( $LINE.length() < 4 );
}
close( FILE );
}
Rewriting to take all that into account, we get the following:
use strict;
use warnings;
sub less_than_4 {
open( my $in_file_h, '<', 'names.txt' ) or die "Can't open file: $!";
my @lines = <$in_file_h>;
close( $in_file_h );
open( my $out_file_h, '>', 'names.txt' ) or die "Can't open file: $!";
foreach my $line ( @lines ) {
# Note: $line will include the newline character, so you might need
# to increase 4 to 5 here
print $out_file_h $line unless length $line < 4;
}
close( $out_file_h );
}
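As the comment in the rewrite notes, length() also counts the newline. Chomping the lines first sidesteps that off-by-one, and grep expresses "keep the long ones" directly. A sketch on an in-memory list (names borrowed from the sample data in another answer; keep_long is hypothetical). To write the result back to names.txt, print each kept line followed by "\n", as in the autodie answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Keep only the entries with at least 4 characters. Chomping first means the
# trailing newline is not counted, so "John\n" is judged as "John" (length 4).
sub keep_long {
    my @lines = @_;    # copy, so the caller's array is left untouched
    chomp @lines;
    return grep { length($_) >= 4 } @lines;
}

my @names = ("Williams\n", "John\n", "Joe\n", "Lee\n", "Albert\n", "Sun\n");
print "$_\n" for keep_long(@names);
```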
I am trying to bring in a file, loop through it, remove any strings that have less than four characters, and then print the list.
I suppose you need to remove strings from the file which are less than 4 chars in length.
#!/usr/bin/perl
use strict;
use warnings;
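open(my $FH, "<", "names.txt") or die "Cannot open names.txt: $!";
my @final_list;
while (my $line = <$FH>) {
# keep only the whitespace-separated words with at least 4 characters
push @final_list, grep { length($_) >= 4 } split(/\s+/, $line);
}
close($FH);
print "\nWords with at least 4 chars: @final_list\n";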
#Please try this one:
use strict;
use warnings;
my @new;
while(<DATA>)
{
# Collect the short lines; length() counts the trailing newline,
# so this keeps lines with at most 3 visible characters
push(@new, $_) unless(length($_) > 4);
}
print @new;
__DATA__
Williams
John
Joe
Lee
Albert
Francis
Sun

How can I read the lines of a file into an array in Perl?

I have a file named test.txt that looks like this:
Test
Foo
Bar
But I want to put each line into an array and print the lines like this:
line1 line2 line3
But how can I do this?
#!/usr/bin/env perl
use strict;
use warnings;
my @array;
open(my $fh, "<", "test.txt")
or die "Failed to open file: $!\n";
while(<$fh>) {
chomp;
push @array, $_;
}
close $fh;
print join " ", @array;
Here is my single liner:
perl -e 'chomp(@a = <>); print join(" ", @a)' test.txt
Explanation:
read the file line by line into the @a array
chomp(..) removes the end-of-line characters from each line
join concatenates @a, using a space as the separator
print outputs the result
the file name is passed as a parameter
One more answer for you to choose from:
#!/usr/bin/env perl
open(FILE, "<", "test.txt") or die("Can't open file");
@lines = <FILE>;
close(FILE);
chomp(@lines);
print join(" ", @lines);
If you find yourself slurping files frequently, you could use the File::Slurp module from CPAN:
use strict;
use warnings;
use File::Slurp;
my @lines = read_file('test.txt');
chomp @lines;
print "@lines\n";
The most basic example looks like this:
#!/usr/bin/env perl
use strict;
use warnings;
open(F, "<", "test.txt") or die("Cannot open test.txt: $!\n"); # (1)
my @lines = ();
while(<F>) { chomp; push(@lines, $_); } # (2)
close(F);
print "@lines"; # (3) stringify
(1) is the place where the file is opened.
(2) Filehandles work nicely in list context (whether an expression is in scalar or list context is decided by the left-hand side of the assignment), so if you assign <F> to an array, all the lines are slurped into the array. The lines are delimited (terminated) by the value of $/, the input record separator. If you use English;, you can use $RS or $INPUT_RECORD_SEPARATOR instead. This value defaults to the newline character \n.
While this seemed like a nice idea, I forgot that if you print all the lines, the trailing \n will be printed too. Baaad me.
Originally the code was:
my @lines = <F>;
instead of the while loop. This is still a viable alternative, but you should replace (3) with chomping and then printing/stringifying all the elements:
for (@lines) { chomp; }
print "@lines";
(3) Stringifying means converting an array to a string and inserting the value $" between the array elements. This defaults to a space.
See: the perlvar page.
So the actual 2nd try is:
#!/usr/bin/env perl
use strict;
use warnings;
open(F, "<", "test.txt") or die("Cannot open test.txt: $!\n"); # (1)
my @lines = <F>; # (2)
close(F);
chomp(@lines);
print "@lines"; # (3) stringify
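The separator used when an array is interpolated into a string is $", a space by default, and it can be changed locally. A small sketch; the lines are hard-coded to stand in for the contents of test.txt:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lines = ("Test\n", "Foo\n", "Bar\n");    # as if read with <F>
chomp @lines;

print "@lines\n";        # $" defaults to a space: Test Foo Bar
{
    local $" = ', ';     # change the separator only inside this block
    print "@lines\n";    # Test, Foo, Bar
}
my $joined = join ' ', @lines;    # join() does the same thing explicitly
print "$joined\n";
```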
This is the simplest version I could come up with:
perl -l040 -pe';' < test.txt
Which is roughly equivalent to:
perl -pe'
chomp; $\ = $/; # -l
$\ = "\040"; # -040 (octal 040 is a space)
'
and:
perl -e'
LINE:
while (<>) {
chomp; $\ = $/; # -l
$\ = " "; # -040
} continue {
print or die "-p destination: $!\n";
}
'
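To see what -l040 does: with -p, -l chomps each input line and sets $\, the output record separator, to the character with octal code 040 (a space), which print then appends. A sketch that imitates that behavior by hand and captures the output in a string through an in-memory filehandle; the hard-coded lines stand in for test.txt. It also shows the trailing space the one-liner leaves after the last word:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Capture output in a string via an in-memory filehandle.
my $buffer = '';
open(my $out, '>', \$buffer) or die "Cannot open in-memory output: $!";

my @input = ("Test\n", "Foo\n", "Bar\n");
for my $line (@input) {
    chomp $line;        # what -l does to each input line
    local $\ = " ";     # what -l040 sets: print appends a space
    print {$out} $line;
}
close $out;
print "[$buffer]\n";    # [Test Foo Bar ]  (note the trailing space)
```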
This code does it (assuming the code below is saved as script.pl):
use strict;
use warnings;
my @array = <>;
chomp @array;
print "@array";
It is run by:
perl script.pl [your file]