I am looking for the easiest way to look for text in a huge file and save it into some variables for later use.
The file format is:
>gi|24585363|ref|NP_724239.1| short neuropeptide F precursor [Drosophila melanogaster]
MFHLKRELSQGCALALICLVSLQMQQPAQAEVSSAQGTPLSNLYDNLLQREYAGPVVFPNHQVERKAQRS
PSLRLRFGRSDPDMLNSIVEKRWFGDVNQKPIRSPSLRLRFGRRDPSLPQMRRTAYDDLLERELTLNSQQ
QQQQLGTEPDSDLGADYDGLYERVVRKPQRLRWGRSVPQFEANNADNEQIERSQWYNSLLNSDKMRRMLV
ALQQQYEIPENVASYANDEDTDTDLNNDTSEFQREVRKPMRLRWGRSTGKAPSEQKHTPEETSSIPPKTQ
N
>gi|442619471|ref|NP_001262643.1| neuropeptide F, isoform C [Drosophila melanogaster]
MCQTMRCILVACVALALLAAGCRVEASNSRPPRKNDVNTMADAYKFLQDLDTYYGDRARVRFGKRGSLMD
ILRNHEMDNINLGKNANNGGEFARGFNEEEIF
>gi|442619469|ref|NP_001262642.1| neuropeptide F, isoform B [Drosophila melanogaster]
MCQTMRCILVACVALALLAAGCRVEASNSRPPRKNDVNTMADAYKFLQDLDTYYGDRARVRFGKRGSLMD
ILRNHEMDNINLGKNANNGGEFARGFNEEEIF
Every sequence starts with ">".
I tried this:
open (FILE, $fastaFile);
while (<FILE>) {
    chomp;
    ($name, $name2) = split(/:/);
    print "Name: $name\n";
    print "Name2: $name2\n";
}
close (FILE);
exit;
I have never needed to look for specific text before. Perhaps it would be easy to use grep, only I don't know how.
The biggest problem for me is that I have the results from my other program in one file, and I need to find those results in a second file.
My main program gave me these results:
>gi|24585363|ref|NP_724239.1|
>gi|442619469|ref|NP_001262642.1|
and I need to find them in the second file, save the header into $name, and put the sequence for that name into $sequence:
$name = ">gi|24585363|ref|NP_724239.1|"
$sequence = "MFHLKRELSQGCALALICLVSLQMQQPAQAEVSSAQGTPLSNLYDNLLQREYAGPVVFPNHQVERKAQRS
PSLRLRFGRSDPDMLNSIVEKRWFGDVNQKPIRSPSLRLRFGRRDPSLPQMRRTAYDDLLERELTLNSQQ
QQQQLGTEPDSDLGADYDGLYERVVRKPQRLRWGRSVPQFEANNADNEQIERSQWYNSLLNSDKMRRMLV
ALQQQYEIPENVASYANDEDTDTDLNNDTSEFQREVRKPMRLRWGRSTGKAPSEQKHTPEETSSIPPKTQ
N"
Can anyone give me advice on how to proceed? Whether to use grep or some other way to get there.
This should help get you what you want. You will need to install BioPerl.
#!/usr/bin/perl
use warnings;
use strict;
use Bio::SeqIO;
my $seqFile = Bio::SeqIO->new('-format' => 'fasta', '-file' => 'myFasta.fasta');
while (my $seqObj = $seqFile->next_seq()) {
    print "Seen Sequence " . $seqObj->display_id . "\n";
    print "Sequence: " . $seqObj->seq() . "\n";
}
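Since you also need to pull out only the records whose IDs appear in your results file, one way to do that (a rough sketch, not tested against your data; 'results.txt' is a placeholder name for the file your main program produced) is to load the wanted IDs into a hash and check each record's display_id against it:
#!/usr/bin/perl
use warnings;
use strict;
use Bio::SeqIO;
# Load the wanted IDs (lines like ">gi|24585363|ref|NP_724239.1|") into a hash.
# 'results.txt' is a placeholder; use the real name of your results file.
my %wanted;
open(my $res, '<', 'results.txt') or die "Can't open results.txt: $!";
while (my $id = <$res>) {
    chomp $id;
    $id =~ s/^>//;    # Bio::SeqIO's display_id has no leading ">"
    $wanted{$id} = 1;
}
close($res);
my $seqFile = Bio::SeqIO->new('-format' => 'fasta', '-file' => 'myFasta.fasta');
while (my $seqObj = $seqFile->next_seq()) {
    next unless $wanted{ $seqObj->display_id };    # skip records you don't want
    my $name     = '>' . $seqObj->display_id;      # restore the ">" if you need it
    my $sequence = $seqObj->seq();
    print "Name: $name\nSequence: $sequence\n";
}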
I'm trying to write a Perl script that saves the whole contents of every file containing a specific string, 'PYAG_GENERATED', into a single .txt/.tmp file, one after another. The file names follow a specific pattern, 'output_nnnn.txt', where nnnn is 0001, 0002, and so on. But I don't know how many files with this 'output_nnnn.txt' name are present.
I'm new to Perl and I don't know how to resolve this issue to get the output correctly. Can anyone help me? Thanks in advance.
I've tried to write the Perl script in different ways but nothing ends up in the output file. I'm giving one of my attempts here. 'new_1.txt' is the new file where I want to save the expected output and "PYAG_GENERATED" is the specific string I'm looking for in the files.
open(NEW,">>new_1.txt") or die "could not open:$!";
$find2="PYAG_GENERATED";
$n='0001';
while('output_$n.txt'){
    if(/find2/){
        print NEW;
    }
    $n++;
}
close NEW;
I expect that the output file 'new_1.txt' will contain the whole contents of the files (with filename pattern 'output_nnnn.txt') that have the 'PYAG_GENERATED' string at least once inside.
Well, you tried I guess.
Welcome to the wonderful world of Perl, where there are always a dozen ways of doing X :-) Here is one possible way to achieve what you want. I put in a lot of comments that I hope are helpful. It's also a bit verbose for the sake of clarity; I'm sure it could be golfed down to 5 lines of code.
use warnings; # Always start your Perl code with these two lines,
use strict; # and Perl will tell you about possible mistakes
use experimental 'signatures';
use File::Slurp;
# this is a subroutine/function, a block of code that can be called from
# somewhere else. it takes two arguments that the caller must provide
sub find_in_file( $filename, $what_to_look_for )
{
    # the open function opens $filename for reading
    # (that's what the "<" means, ">" stands for writing)
    # if successful, open will return true and we will have a "file handle" in the variable $in
    # if not, open will return false ...
    open( my $in, "<", $filename )
        or die $!; # ... and the program will exit here. The variable $! will contain the error message
    # now we read the file using a loop
    # readline will give us the next line in the file
    # or something false when there is nothing left to read
    while ( my $line = readline($in) )
    {
        # now we test whether the current line contains what
        # we are looking for.
        # the index function gives us the index of a string within another string.
        # for example index("abc", "c") will give us 2,
        # and -1 when nothing is found, so we test for >= 0
        if ( index( $line, $what_to_look_for ) >= 0 )
        {
            # we found what we were looking for
            # so we don't need to keep looking in this file anymore
            # so we must first close the file
            close( $in );
            # and then we indicate to the caller that the search was successful
            # this will immediately end the subroutine
            return 1;
        }
    }
    # If we arrive here the search was unsuccessful
    # so we tell that to the caller
    return 0;
}
# Here starts the main program
# First we get a list of files
# we want to look at
my @possible_files = glob( "where/your/files/are/output_*.txt" );
# Here we will store the files that we are interested in, aka that contain PYAG_GENERATED
my @wanted_files;
# and now we can loop over the files and see if they contain what we are looking for
foreach my $filename ( @possible_files )
{
    # here we use the function we defined earlier
    if ( find_in_file( $filename, "PYAG_GENERATED" ) )
    {
        # with push we can add things to the end of an array
        push @wanted_files, $filename;
    }
}
# We are finished searching, now we can start adding the files together
# if we found any
if ( scalar @wanted_files > 0 )
{
    # Now we could code that ourselves: open the files, loop through them and write out
    # line by line. But we make life easy for us and just
    # use two functions from the module File::Slurp,
    # which you may need to install from CPAN
    foreach my $filename ( @wanted_files )
    {
        append_file( "new_1.txt", read_file( $filename ) );
    }
    print "Output created from " . (scalar @wanted_files) . " files\n";
}
else
{
    print "No input files\n";
}
use strict;
use warnings;
my @a;
my $i=1;
my $find1="PYAG_GENERATED";
my $n=1;
my $total_files=47276; #got this no. of files by writing 'ls' command in the terminal
while($n<=$total_files){
    open(NEW,"<output_$n.txt") or die "could not open:$!";
    my $join=join('',<NEW>);
    $a[$i]=$join;
    #print "$a[10]";
    $n++;
    $i++;
}
close NEW;
for($i=1;$i<=$total_files;$i++){
    if($a[$i]=~m/$find1/){
        open(NEW1,">>new_1.tmp") or die "could not open:$!";
        print NEW1 $a[$i];
    }
}
close NEW1;
I am reading values from a column in my SQL database. Now I want to convert the column values into a single comma-separated row (e.g. "abc","xyz").
Source Data:-
amol
aakash
shami
krishna
Output expected: ( "amol","aakash","shami","krishna")
Code which I am trying:-
#!/usr/bin/env perl
$t = `date`;
#print $t;
$GCMS_SERVER = $ENV{DSQUERY};
$GCMS_USERNAME = $ENV{GCMS_USERNAME};
$GCMS_PASSWORD = $ENV{GCMS_PASSWORD};
$GCMS_DATABASE = $ENV{GCMS_DATABASE};
#print "test\n";
my $query = "SELECT Label FROM FeedGenSource WHERE BaseFileName ='aldgctna'";
#print "SQL =$query\n";
my $sqlcmd = (qq/
set nocount on
go
use $GCMS_DATABASE
go
$query
/);
open(::DBCMD, "sqsh -S$GCMS_SERVER -U$GCMS_USERNAME -h -w100 -P- <<EOF
$GCMS_PASSWORD
$sqlcmd
go
EOF
|") || die("Could not communicate with DB ");
while(<::DBCMD>){
    print "$_";
}
#print "done\n";
close ::DBCMD;
exit
This is the Perl script. I want to use the output in a query as an IN clause.
Your output comes from this section of your code:
while(<::DBCMD>){
    print "$_";
}
The filehandle ::DBCMD is connected to the output from your sqsh command. Your code reads each record from that filehandle and prints it to STDOUT.
If you want to do something cleverer with the output, then you're going to have to store the data in some kind of data structure (probably an array in this case) and manipulate that.
I expect you want something like this:
my @data;
while (<::DBCMD>) {
    chomp; # remove the newline
    push @data, $_;
}
# And then:
print join(',', @data), "\n";
To print the exact output that you ask for, you would need this:
print '(', join(',', map { qq["$_"] } @data), ")\n";
But I have to ask... why are you making your life so difficult by manipulating data that comes back from sqsh? You should really look at Perl's database interface library, DBI. That will make your life far simpler.
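For example, a DBI version might look roughly like this (a sketch only; it assumes DBD::Sybase is installed, since sqsh suggests a Sybase/SQL Server backend, and it reuses the environment variables from your script):
#!/usr/bin/env perl
use strict;
use warnings;
use DBI;
# Connect using the same environment variables as the sqsh version.
# The DSN format is DBD::Sybase's; adjust it for your driver if needed.
my $dbh = DBI->connect(
    "dbi:Sybase:server=$ENV{DSQUERY};database=$ENV{GCMS_DATABASE}",
    $ENV{GCMS_USERNAME},
    $ENV{GCMS_PASSWORD},
    { RaiseError => 1, PrintError => 0 },
);
# selectcol_arrayref returns a reference to an array holding the first
# column of every row, so there is no sqsh output to parse.
my $labels = $dbh->selectcol_arrayref(
    "SELECT Label FROM FeedGenSource WHERE BaseFileName = 'aldgctna'"
);
print '(', join(',', map { qq["$_"] } @$labels), ")\n";
$dbh->disconnect;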
A few other tips:
Always have use strict and use warnings in your code. And fix the issues they will reveal.
Use Perl's built-in date and time tools instead of shelling out to date.
Using :: on your bareword filehandle achieves nothing. Just DBCMD works the same way and is less confusing.
I'm trying to find a way to print a progress bar on the command line while parsing logfiles. Get logfiles => foreach file => foreach line {do}.
The idea: I want to print a part of the progress bar in every "foreach file" iteration. Meaning: print the whole bar if you just parse 1 file, print half of the bar for each file when you parse 2 files, and so on. You will find the specific code at the bottom.
The problem: the output (print "*") is printed after ALL foreach iterations are done, not in between. Details are in the code.
Does someone have an idea how to print inside a foreach, or can tell me what the problem is? I don't get it :(.
my @logfiles=glob($logpath);
print "<------------------>\n";
$vari=20/(scalar @logfiles);
foreach my $logfile (@logfiles){
    open(LOGFILEhandle, $logfile);
    @lines = <LOGFILEhandle>;
    print "*" x $vari; #won't work, only after loop. Even a "print "*";" doesn't work
    foreach my $line (@lines){
        #print "*"; works "in between". print "*" x $vari; does not.
        if ($line=~/xyz/){
            ......
            ......
        }
        close(LOGFILEhandle);
    }
}
I would like to suggest the Term::ProgressBar module to avoid reinventing the wheel.
#!/usr/bin/perl
use strict;
use warnings;
use Term::ProgressBar;
my @files = qw (file1 file2 file3 file4);
my $progress = Term::ProgressBar->new(scalar @files);
for (0..@files) {
    $| = 1;
    sleep(1); #introducing sleep for demo purpose otherwise bar will fill up quickly
    #open the file, do some operations and when you are done
    #update the bar
    $progress->update($_);
}
You are suffering from buffering. The output is buffered until a certain amount is reached or you print a newline. To change this behaviour simply add
$| = 1 ;
at the top of your file. This will turn on autoflush for STDOUT.
There is more than one way to do it; a little longer and less cryptic is Borodin's suggestion:
STDOUT->autoflush();
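Applied to the loop in the question, that might look something like this (a sketch; $logpath and the parsing inside the inner loop are placeholders taken from the question):
use strict;
use warnings;
$| = 1;    # autoflush STDOUT so each "*" appears immediately
my $logpath = '/path/to/logs/*.log';    # assumption: your glob pattern
my @logfiles = glob($logpath);
my $vari = 20 / (scalar @logfiles);
print "<------------------>\n";
foreach my $logfile (@logfiles) {
    print "*" x $vari;    # now shows up while this file is being processed
    open(my $fh, '<', $logfile) or die "Can't open $logfile: $!";
    while (my $line = <$fh>) {
        # ... parse $line here ...
    }
    close($fh);
}
print "\n";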
I'm experiencing a rather odd problem while using Data::Dumper to try and check on my importing of a large list of data into a hash.
My Data looks like this in another file.
##Product ID => Market for product
ABC => Euro
XYZ => USA
PQR => India
Then in my script, I'm trying to read in my list of data into a hash like so:
open(CONFIG_DAT_H, "<", $config_data);
while(my $line = <CONFIG_DAT_H>) {
    if($line !~ /^\#/) {
        chomp($line);
        my @words = split(/\s*\=\>\s/, $line);
        %product_names->{$words[0]} = $words[1];
    }
}
close(CONFIG_DAT_H);
print Dumper (%product_names);
My parsing is working for the most part, in that I can find all of my data in the hash, but when I print it using Data::Dumper it doesn't print properly. This is my output.
$VAR1 = 'ABC';
';AR2 = 'Euro
$VAR3 = 'XYZ';
';AR4 = 'USA
$VAR5 = 'PQR';
';AR6 = 'India
Does anybody know why the Dumper is printing the '; characters over the first two letters on my second column of data?
There is one unclear thing in the code: is product_names a hash or a hashref?
If it is a hash, you should use $product_names{key} syntax, not %product_names->{key}, and you need to pass a reference to Data::Dumper, so Dumper(\%product_names).
If it is a hashref then it should be labelled with the correct sigil, so $product_names->{key} and Dumper($product_names).
As noted by mob, if your input has line endings other than \n it needs to be cleaned up more explicitly, say with s/\s*$//, as per the comment. See the answer by ikegami.
I'd also like to add that the loop can be simplified by losing the if branch:
open my $config_dat_h, "<", $config_data or die "Can't open $config_data: $!";
while (my $line = <$config_dat_h>)
{
    next if $line =~ /^\#/; # or /^\s*\#/ to account for possible spaces
    # ...
}
I have changed to the lexical filehandle, the recommended practice with many advantages. I have also added a check for open, which should always be in place.
Hmm... this appears wrong to me, even if you're using Perl 6:
%product_names->{$words[0]} = $words[1];
I don't know Perl 6 very well, but in Perl 5 the element access should look like the following, considering that %product_names exists and is declared:
$product_names{...} = ... ;
If you could post the full code, I can help solve this problem.
The file uses CR LF as line endings. This would become evident by adding the following to your code:
local $Data::Dumper::Useqq = 1;
You could convert the file to use unix line endings (seeing as you are on a unix system). This can be achieved using the dos2unix utility.
dos2unix config.dat
Alternatively, replace
chomp($line);
with the more flexible
$line =~ s/\s+\z//;
Note: %product_names->{$words[0]} makes no sense. It happens to do what you want in old versions of Perl, but it rightfully throws an error in newer versions. $product_names{$words[0]} is the proper syntax for accessing the value of an element of a hash.
Tip: You should be using print Dumper(\%product_names); instead of print Dumper(%product_names);.
Tip: You might also find local $Data::Dumper::Sortkeys = 1; useful. Data::Dumper has such bad defaults :(
Tip: Using split(/\s*=>\s*/, $line, 2) instead of split(/\s*=>\s*/, $line) would permit the value to contain =>.
Tip: You shouldn't use global variable without reason. Use open(my $CONFIG_DAT_H, ...) instead of open(CONFIG_DAT_H, ...), and replace other instances of CONFIG_DAT_H with $CONFIG_DAT_H.
Tip: Using next if $line =~ /^#/; would avoid a lot of indenting.
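Putting those tips together, the reading loop might look something like this (a sketch; the config file name is an assumption, use your own):
use strict;
use warnings;
use Data::Dumper;
my $config_data = 'config.dat';    # assumption: the path to your data file
my %product_names;
open(my $CONFIG_DAT_H, '<', $config_data)
    or die "Can't open $config_data: $!";
while (my $line = <$CONFIG_DAT_H>) {
    next if $line =~ /^#/;           # skip comment lines
    next unless $line =~ /\S/;       # skip blank lines
    $line =~ s/\s+\z//;              # strip trailing whitespace, including CR LF
    my ($id, $market) = split(/\s*=>\s*/, $line, 2);
    $product_names{$id} = $market;
}
close($CONFIG_DAT_H);
local $Data::Dumper::Sortkeys = 1;
print Dumper(\%product_names);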
Hello everyone. I'm a beginner in Perl and I'm facing some problems: I want to put the strings running from AA to \ into an array and save it. There are about 2000-3000 such records in a txt file, each starting with the same initials, i.e., AA, and ending with \. I'm doing it this way; please correct me if I'm wrong.
Input File
AA c0001
BB afsfjgfjgjgjflffbg
CC table
DD hhhfsegsksgk
EB jksgksjs
\
AA e0002
BB rejwkghewhgsejkhrj
CC chair
DD egrhjrhojohkhkhrkfs
VB rkgjehkrkhkh;r
\
Source code
$flag = 0
while ($line = <ifh>)
{
if ( $line = m//\/g)
{
$flag = 1;
}
while ( $flag != 0)
{
for ($i = 0; $i <= 10000; $i++)
{ # Missing brace added by editor
$array[$i] = $line;
} # Missing brace added by editor
}
} # Missing close brace added by editor; position guessed!
print $ofh, $line;
close $ofh;
Welcome to StackOverflow.
There are multiple issues with your code. First, please post compilable Perl; I had to add three braces to give it the remotest chance of compiling, and I had to guess where one of them went (and there's a moderate chance it should be on the other side of the print statement from where I put it).
Next, experts have:
use warnings;
use strict;
at the top of their scripts because they know they will miss things if they don't. As a learner, it is crucial for you to do the same; it will prevent you making errors.
With those in place, you have to declare your variables as you use them.
Next, remember to indent your code. Doing so makes it easier to comprehend. Perl can be incomprehensible enough at the best of times; don't make it any harder than it has to be. (You can decide where you like braces - that is open to discussion, though it is simpler to choose a style you like and stick with it, ignoring any discussion because the discussion will probably be fruitless.)
Is the EB vs VB in the data significant? It is hard to guess.
It is also not clear exactly what you are after. It might be that you're after an array of entries, one for each block in the file (where the blocks end at the line containing just a backslash), and where each entry in the array is a hash keyed by the first two letters (or first word) on the line, with the remainder of the line being the value. This is a modestly complex structure, and probably beyond what you're expected to use at this stage in your learning of Perl.
You have the line while ($line = <ifh>). This is not invalid in Perl if you opened the file the old fashioned way, but it is not the way you should be learning. You don't show how the output file handle is opened, but you do use the modern notation when trying to print to it. However, there's a bug there, too:
print $ofh, $line; # Print two values to standard output
print $ofh $line; # Print one value to $ofh
You need to look hard at your code, and think about the looping logic. I'm sure what you have is not what you need. However, I'm not sure what it is that you do need.
Simpler solution
From the comments:
I want to flag each record starting from AA to \ as record 0 till record n and want to save it in a new file with all the record numbers.
Then you probably just need:
#!/usr/bin/env perl
use strict;
use warnings;
my $recnum = 0;
while (<>)
{
    chomp;
    if (m/^\\$/)
    {
        print "$_\n";
        $recnum++;
    }
    else
    {
        print "$recnum $_\n";
    }
}
This reads from the files specified on the command line (or standard input if there are none), and writes the tagged output to standard output. It prefixes each line except the 'end of record' marker lines with the record number and a space. Choose your output format and file handling to suit your needs. You might argue that the chomp is counter-productive; you can certainly code the program without it.
Overly complex solution
Developed in the absence of clear direction from the questioner.
Here is one possible way to read the data, but it uses moderately advanced Perl (hash references, etc). The Data::Dumper module is also useful for printing out Perl data structures (see: perldoc Data::Dumper).
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my @data;
my $hashref = { };
my $nrecs = 0;
while (<>)
{
    chomp;
    if (m/^\\$/)
    {
        # End of group - save to data array and start new hash
        $data[$nrecs++] = $hashref;
        $hashref = { };
    }
    else
    {
        m/^([A-Z]+)\s+(.*)$/;
        $hashref->{$1} = $2;
    }
}
foreach my $i (0..$nrecs-1)
{
    print "Record $i:\n";
    foreach my $key (sort keys %{ $data[$i] })
    {
        print " $key = $data[$i]->{$key}\n";
    }
}
print Data::Dumper->Dump([ \@data ], [ '@data' ]);
Sample output for example input:
Record 0:
AA = c0001
BB = afsfjgfjgjgjflffbg
CC = table
DD = hhhfsegsksgk
EB = jksgksjs
Record 1:
AA = e0002
BB = rejwkghewhgsejkhrj
CC = chair
DD = egrhjrhojohkhkhrkfs
VB = rkgjehkrkhkh;r
$@data = [
{
'EB' => 'jksgksjs',
'CC' => 'table',
'AA' => 'c0001',
'BB' => 'afsfjgfjgjgjflffbg',
'DD' => 'hhhfsegsksgk'
},
{
'CC' => 'chair',
'AA' => 'e0002',
'VB' => 'rkgjehkrkhkh;r',
'BB' => 'rejwkghewhgsejkhrj',
'DD' => 'egrhjrhojohkhkhrkfs'
}
];
Note that this data structure is not optimized for searching except by record number. If you need to search the data in some other way, then you need to organize it differently. (And don't hand this code in as your answer without understanding it all - it is subtle. It also does no error checking; beware faulty data.)
It can't be right. I can see two main issues with your while-loop.
Once you enter the following loop
while ( $flag != 0)
{
...
}
you'll never break out because you do not reset the flag whenever you find a break-line. You'll have to parse your input and exit the loop if necessary.
And second, you never read any input within this loop and thus process the same $line over and over again.
You should not put that loop inside your code; instead you can use the following pattern (pseudo-code):
if flag != 0
append item to array
else
save array to file
start with new array
end
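A concrete Perl version of that pattern might look like this (a sketch; the input and output file names are placeholders):
use strict;
use warnings;
my @record;    # lines of the current AA .. \ block
open my $ifh, '<', 'input.txt'  or die "Can't open input.txt: $!";
open my $ofh, '>', 'output.txt' or die "Can't open output.txt: $!";
while (my $line = <$ifh>) {
    chomp $line;
    if ($line =~ /^\\$/) {                 # end-of-record marker found
        print {$ofh} "$_\n" for @record;   # save the block to the file
        @record = ();                      # start with a new array
    }
    else {
        push @record, $line;               # append the line to the current block
    }
}
close $ofh;
close $ifh;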
I believe what you want is to split the file's content at \, though it's not too clear.
To achieve this you can slurp the file into a variable by undefining the input record separator, then split the content.
To find out about Perl's special variables related to filehandles, read perlvar.
#!perl
use strict;
use warnings;
my $content;
{
    open my $fh, '<', 'test.txt' or die "Can't open test.txt: $!";
    local $/; # slurp mode
    $content = <$fh>;
    close $fh;
}
my @blocks = split /\\/, $content;
Make sure to localize modifications of Perl's special variables to not interfere with different parts of your program.
If you want to keep the separator you could set $/ to \ directly and skip split.
#!perl
use strict;
use warnings;
my @blocks;
{
    open my $fh, '<', 'test.txt' or die "Can't open test.txt: $!";
    local $/ = '\\'; # separate at \
    @blocks = <$fh>;
    close $fh;
}
Here's a way to read your data into an array. As I said in a comment, "saving" this data to a file is pointless unless you change it, because if I were to print the @data array below to a file, it would look exactly like the input file.
So, you need to tell us what it is you want to accomplish before we can give you an answer about how to do it.
This script follows these rules (exactly):
Find a line that begins with "AA", and save that into $line.
Concatenate every new line from the file into $line.
When you find a line that begins with a backslash \, stop concatenating lines and save $line into @data.
Then, find the next line that begins with "AA" and start the loop over.
These matching regexes are pretty loose, as they will match AAARGH and \bonkers as well. If you need them stricter, you can try /^\\$/ and /^AA$/, but then you need to watch out for whitespace at the beginning and end of line. So perhaps /^\s*\\\s*$/ and /^\s*AA\s*$/ instead.
The code:
use warnings;
use strict;
my $line="";
my @data;
while (<DATA>) {
    if (/^AA/) {
        $line = $_;
        while (<DATA>) {
            $line .= $_;
            last if /^\\/;
        }
    }
    push @data, $line;
}
use Data::Dumper;
print Dumper \@data;
__DATA__
AA c0001
BB afsfjgfjgjgjflffbg
CC table
DD hhhfsegsksgk
EB jksgksjs
\
AA e0002
BB rejwkghewhgsejkhrj
CC chair
DD egrhjrhojohkhkhrkfs
VB rkgjehkrkhkh;r
\