How to read specific lines from a file and store them in an array using Perl?

How can I read uncommented lines from a file and store them in an array?
file.txt looks like this:
request abcd uniquename "zxsder,azxdfgt"
request abcd uniquename1 "nbgfdcbv.bbhgfrtyujk"
request abcd uniquename2 "nbcvdferr,nscdfertrgr"
#request abcd uniquename3 "kdgetgsvs,jdgdvnhur"
#request abcd uniquename4 "hvgsfeyeuee,bccafaderryrun"
#request abcd uniquename5 "bccsfeueiew,bdvdfacxsfeyeueiei"
Now I have to store the uncommented lines (the first 3 lines of this file) in an array. Is it possible to do this by pattern matching on the string name or with a regex? If so, how?
The code below stores all the lines in an array.
open (F, "test.txt") || die "Could not open test.txt: $!\n";
@test = <F>;
close F;
print @test;
How can I do it for only the uncommented lines?

If you know your comments will have # at the beginning of the line, you can use
next if $_ =~ m/^#/;
Or use whatever variable you read each line into instead of $_.
The regex matches a # sign at the beginning of the line.
To add the remaining lines to an array, you can use push(@arr, $_).
#!/usr/bin/perl
# Should always include these
use strict;
use warnings;
my @lines; # Hold the lines you want
open (my $file, '<', 'test.txt') or die $!; # Open the file for reading
while (my $line = <$file>)
{
    next if $line =~ m/^#/; # Skip the line if it is a comment,
    push (@lines, $line);   # otherwise add it to the array.
}
close $file;
foreach (@lines) # Print the values that we got
{
    print $_; # each line still ends with its own newline
}

You could do:
perl -nle 'push @ary, $_ unless /^#/; END { print join "\n", @ary }' file.txt
This skips any line that begins with #. Otherwise the line is added to an array for later use.

The smallest change to your original program would probably be:
open (F, "test.txt") || die "Could not open test.txt: $!\n";
@test = grep { $_ !~ /^#/ } <F>;
close F;
print @test;
But I'd highly recommend rewriting that slightly to use current best practices.
# Safety net
use strict;
use warnings;
# Lexical filehandle, three-arg open
open (my $fh, '<', 'test.txt') || die "Could not open test.txt: $!\n";
# Declare @test.
# Don't explicitly close filehandle (closed automatically as $fh goes out of scope)
my @test = grep { $_ !~ /^#/ } <$fh>;
print @test;
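The grep approach above can be wrapped in a small reusable sub. This is only a sketch: the name read_uncommented and the in-memory sample data are illustrative assumptions, and unlike the answers above it also treats a line as a comment when whitespace precedes the #.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the lines of a filehandle that are not comments.
# A line counts as a comment if its first non-blank character is '#'.
sub read_uncommented {
    my ($fh) = @_;
    my @lines = grep { !/^\s*#/ } <$fh>;
    chomp @lines;    # drop trailing newlines so callers can format freely
    return @lines;
}

# Sample data held in a string so the sketch runs without test.txt.
my $sample = <<'END';
request abcd uniquename "zxsder,azxdfgt"
request abcd uniquename1 "nbgfdcbv.bbhgfrtyujk"
#request abcd uniquename3 "kdgetgsvs,jdgdvnhur"
END

open(my $fh, '<', \$sample) or die "Could not open in-memory file: $!\n";
my @test = read_uncommented($fh);
close $fh;

print "$_\n" for @test;
```

To use it on the real file, open test.txt with a three-argument open and pass that filehandle instead.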

Related

how to combine the code to make the output is on the same line?

Can you help me combine both of these programs to display the output in rows with two columns? The first column is for $1 and the second column is for $2.
Kindly help me to solve this. Thank you :)
This is my code 1.
#!/usr/local/bin/perl
#!/usr/bin/perl
use strict ;
use warnings ;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
my $input = "par_disp_fabric.all_max_lowvcc_qor.rpt.gz";
my $output = "par_disp_fabric.all_max_lowvcc_qor.txt";
gunzip $input => $output
or die "gunzip failed: $GunzipError\n";
open (FILE, '<',"$output") or die "Cannot open $output\n";
while (<FILE>) {
my $line = $_;
chomp ($line);
if ($line=~ m/^\s+Timing Path Group \'(\S+)\'/) {
$line = $1;
print ("$1\n");
}
}
close (FILE);
This is my code 2.
my $input = "par_disp_fabric.all_max_lowvcc_qor.rpt.gz";
my $output = "par_disp_fabric.all_max_lowvcc_qor.txt";
gunzip $input => $output
or die "gunzip failed: $GunzipError\n";
open (FILE, '<',"$output") or die "Cannot open $output\n";
while (<FILE>) {
my $line = $_;
chomp ($line);
if ($line=~ m/^\s+Levels of Logic:\s+(\S+)/) {
$line = $1;
print ("$1\n");
}
}
close (FILE);
This is my output for code 1, which contains 26 lines of data:
**async_default**
**clock_gating_default**
Ddia_link_clk
Ddib_link_clk
Ddic_link_clk
Ddid_link_clk
FEEDTHROUGH
INPUTS
Lclk
OUTPUTS
VISA_HIP_visa_tcss_2000
ckpll_npk_npkclk
clstr_fscan_scanclk_pulsegen
clstr_fscan_scanclk_pulsegen_notdiv
clstr_fscan_scanclk_wavegen
idvfreqA
idvfreqB
psf5_primclk
sb_nondet4tclk
sb_nondetl2tclk
sb_nondett2lclk
sbclk_nondet
sbclk_sa_det
stfclk_scan
tap4tclk
tapclk
The output of code 2 has the same number of lines.
paste is useful for this. Assuming your shell is bash, you can use process substitutions:
paste <(perl script1.pl) <(perl script2.pl)
That emits columns separated by a tab character. For prettier output, you can pipe the output of paste to column:
paste <(perl script1.pl) <(perl script2.pl) | column -t -s $'\t'
With this, you don't need to try to "merge" your Perl programs.
To combine the two scripts and to output two items of data on the same line, you need to hold on until the end of the file (or until you have both data items) and then output them at once. So you need to combine both loops into one:
#!/usr/bin/perl
use strict ;
use warnings ;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
my $input = "par_disp_fabric.all_max_lowvcc_qor.rpt.gz";
my $output = "par_disp_fabric.all_max_lowvcc_qor.txt";
gunzip $input => $output
or die "gunzip failed: $GunzipError\n";
open (FILE, '<',"$output") or die "Cannot open $output\n";
my( $levels, $timing );
while (<FILE>) {
my $line = $_;
chomp ($line);
if ($line=~ m/^\s+Levels of Logic:\s+(\S+)/) {
$levels = $1;
}
if ($line=~ m/^\s+Timing Path Group \'(\S+)\'/) {
$timing = $1;
}
}
print "$levels, $timing\n";
close (FILE);
You still haven't given us one vital piece of information: what does the input data look like? Most importantly, are the two pieces of information you're looking for on the same line?
[Update: Looking more closely at your regexes, I see it's possible for both pieces of information to be on the same line - as they are both supposed to be the first item on the line. It would be helpful if you were clearer about that in your question.]
I think this will do the right thing, no matter what the answer to your question is. I've also added the improvements I suggested in my answer to your previous question:
#!/usr/bin/perl
use strict ;
use warnings ;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
my $zipped = "par_disp_fabric.all_max_lowvcc_qor.rpt.gz";
my $unzipped = "par_disp_fabric.all_max_lowvcc_qor.txt";
gunzip $zipped => $unzipped
or die "gunzip failed: $GunzipError\n";
open (my $fh, '<', $unzipped) or die "Cannot open '$unzipped': $!\n";
my ($levels, $timing);
while (<$fh>) {
chomp;
if (m/^\s+Levels of Logic:\s+(\S+)/) {
$levels = $1;
}
if (m/^\s+Timing Path Group \'(\S+)\'/) {
$timing = $1;
}
# If we have both values, then print them out and
# set the variables to 'undef' for the next iteration.
# Test with defined() so that a value of 0 still counts.
if (defined $levels and defined $timing) {
print "$levels, $timing\n";
undef $levels;
undef $timing;
}
}
close ($fh);
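The single-pass pairing idea can be tried without the gzipped report. The sketch below is built on assumptions: the sample text is a guess at the report layout (the question never shows it), and the 40-character column width is arbitrary. It collects each pair and prints the path group first, then the logic levels, as two aligned columns.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample of the report; the real layout is not shown in the question.
my $report = <<'END';
  Timing Path Group 'Lclk'
  -----------------------------------
  Levels of Logic:             12
  Timing Path Group 'tapclk'
  -----------------------------------
  Levels of Logic:              0
END

open(my $fh, '<', \$report) or die "Cannot open in-memory report: $!\n";

my @rows;
my ($timing, $levels);
while (my $line = <$fh>) {
    $timing = $1 if $line =~ /^\s+Timing Path Group '(\S+)'/;
    $levels = $1 if $line =~ /^\s+Levels of Logic:\s+(\S+)/;

    # Test with defined() so that a value of 0 still counts as found.
    if (defined $timing and defined $levels) {
        push @rows, [$timing, $levels];
        undef $timing;
        undef $levels;
    }
}
close $fh;

# Left-justify the first column so the second column lines up.
printf "%-40s %s\n", @$_ for @rows;
```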

To trim lines based on line number in perl

My Perl script generates a text file that usually contains 200 lines. Sometimes it exceeds 200 lines (for example, 217 lines). I need to trim off everything from the 201st line onward. I have used a counter to trim the excess lines. Is there a simpler and more efficient way to do this?
Code:
#!/usr/bin/perl -w
use strict;
use warnings;
my $filename1="channel.txt";
my $filename2="channel1.txt";
my $fh;
my $fh1;
my $line;
my $line1;
my $count=1;
open $fh, '<', $filename1 or die "Can't open > $filename1: $!";
open $fh1, '>', $filename2 or die "Can't open > $filename2: $!";
while(my $line = <$fh>)
{
chomp $line;
chomp $line1;
if($count<201)
{
print $fh1 "$line\n";
}
$count++;
}
close ($fh1);
close($fh);
As I mentioned in my comment (this is a short version of it), if you are just trying to trim the file, you can use a Perl one-liner instead of writing a whole script:
perl -pe 'last if $. == 201;' input.txt > result.txt
-p processes the file line by line and prints each line
-e executes the Perl code given on the command line
You can also do this in a Perl script (this assumes the file has at least 200 lines):
open my $fh, "<", "input.txt" or die "Can't open input.txt: $!";
open my $wh, ">", "result.txt" or die "Can't open result.txt: $!";
print $wh scalar <$fh> for (1..200);
xxfelixxx already gave you the correct answer. I am just updating my earlier answer to clean up your code and to write back to the original file:
use strict;
use warnings;
my @array;
my $filename="channel.txt";
open my $fh, '<', $filename or die "Can't open < $filename: $!";
while( my $line = <$fh> ) {
    last if $. > 200;
    push @array, $line;
}
close($fh);
open $fh, '>', $filename or die "Can't open > $filename: $!";
print $fh @array;
close($fh);
There is no need to keep your own counter: Perl has a special variable, $., which keeps track of the input line number. You can simplify your loop like so:
while ( my $line = <$fh> ) {
    last if $. > 200;
    print $fh1 $line;
}
perldoc perlvar - Search for INPUT_LINE_NUMBER.
To write back to the original file, input.txt, without using redirection:
perl -pi.tmp -we "last if $.>200;" input.txt
where
-i : edits the file in place; the original is kept as a backup
     with the given suffix appended (here input.txt.tmp)
-w : command line flag to enable warnings
-p : magic; basically equivalent to coding:
LINE: while (defined($_ = <ARGV>)) {
    "your code here"
} continue {
    print;
}
-e : perl code follows this flag (enclosed in double quotes for MSWin32 aficionados)
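Another way to trim the file in place, without a temporary copy or an array of lines, is to read up to the cutoff and then truncate at that position. This is a sketch, not something taken from the answers above: the helper name keep_first_lines is made up, and it assumes the line limit is at least 1. It relies only on the built-ins tell and truncate.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: keep only the first $max lines of $path, in place.
sub keep_first_lines {
    my ($path, $max) = @_;
    open(my $fh, '+<', $path) or die "Can't open $path: $!";
    while (<$fh>) {
        last if $. >= $max;    # stop right after reading line $max
    }
    # tell() reports the offset just past the last line read.
    truncate($fh, tell($fh)) or die "Can't truncate $path: $!";
    close($fh) or die "Can't close $path: $!";
    return;
}

# Trim channel.txt to 200 lines if it exists in the current directory.
keep_first_lines('channel.txt', 200) if -e 'channel.txt';
```

A file that already has fewer lines than the limit is left unchanged, because the loop reaches end of file and the truncate length equals the current size.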

My perl script isn't working, I have a feeling it's the grep command

I'm trying to search one file for instances of a number
and print the line if the other file contains that number.
#!/usr/bin/perl
open(file, "textIds.txt"); #
@file = <file>; #file looking into
# close file; #
while(<>){
$temp = $_;
$temp =~ tr/|/\t/; #puts tab between name and id
@arrayTemp = split("\t", $temp);
@found=grep{/$arrayTemp[1]/} <file>;
if (defined $found[0]){
#if (grep{/$arrayTemp[1]/} <file>){
print $_;
}
@found=();
}
print "\n";
close file;
#the input file lines have the format of
#John|7791 154
#Smith|5432 290
#Conor|6590 897
#And in the file the format is
#5432
#7791
#6590
#23140
There are some issues in your script.
Always include use strict; and use warnings;. This would have told you about the odd things in your script in advance.
Never use barewords as filehandles, as they are global identifiers. Use three-parameter open instead: open( my $fh, '<', 'textIds.txt');
Either use autodie; or check whether the open succeeded.
You read textIds.txt into the array @file, but later on (in your grep) you try to read from that filehandle again (with <file>). As @PaulL said, this will always return nothing, because the file has already been read to the end.
Replacing | with tabs and then splitting at tabs is not necessary. You can split at tabs and pipes at the same time (assuming "John|7791 154" is really "John|7791\t154").
You're talking about "input file" and "in file" without saying exactly which is which. I assume your "textIds.txt" is the one with only the numbers and the other input file is the one read from STDIN (with the |'s in it).
With this in mind your script could be written as:
#!/usr/bin/perl
use strict;
use warnings;
# Open 'textIds.txt' and slurp it into the array @file:
open( my $fh, '<', 'textIds.txt') or die "cannot open file: $!\n";
my @file = <$fh>;
close($fh);
# iterate over STDIN and compare with lines from 'textIds.txt':
while( my $line = <>) {
    # split "John|7791\t154" into ("John", "7791", "154"):
    my ($name, $number1, $number2) = split(/\||\t/, $line);
    # compare $number1 to each member of @file and print if found:
    if ( grep( /$number1/, @file) ) {
        print $line;
    }
}
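Note that grep with /$number1/ matches substrings, so an ID such as 779 would also match the line 7791. A possible alternative, sketched here under assumptions, is an exact lookup in a hash. The helper name filter_by_id and the sample records (including the made-up "Mary" line) are illustrative only, and the sketch splits on pipes or any whitespace because the exact separator in the input is not certain.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the records whose ID (the field after '|')
# appears in the list of wanted IDs.
sub filter_by_id {
    my ($ids, $records) = @_;
    my %wanted = map { $_ => 1 } @$ids;    # exact-match lookup table
    my @matches;
    for my $record (@$records) {
        # "John|7791 154" -> ("John", "7791", "154")
        my (undef, $id) = split /[|\s]+/, $record;
        push @matches, $record if defined $id and $wanted{$id};
    }
    return @matches;
}

my @ids     = qw(5432 7791 6590 23140);
my @records = ('John|7791 154', 'Smith|5432 290', 'Mary|779 001');
print "$_\n" for filter_by_id(\@ids, \@records);
```

With real files, the ID list would come from textIds.txt (chomped) and the records from the other input.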

Want to add random string to identifier line in fasta file

I want to add a random string to the existing identifier line in a FASTA file.
So I get:
MMETSP0259|AmphidiniumcarteCMP1314aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Then the sequence follows on the next lines as normal. I think I have a problem with the output format. This is what I get:
MMETSP0259|AmphidiniumCMP1314aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
CTTCATCGCACATGGATAACTGTGTACCTGACTaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab
TCTGGGAAAGGTTGCTATCATGAGTCATAGAATaaaaaaaaaaaaaaaaaaaaaaaaaaaaaac
It's added to every line. (I altered length to fit here.) I want just to add to the identifier line.
This is what i have so far:
use strict;
use warnings;
my $currentId = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
my $header_line;
my $seq;
my $uniqueID;
open (my $fh,"$ARGV[0]") or die "Failed to open file: $!\n";
open (my $out_fh, ">$ARGV[0]_longer_ID_MMETSP.fasta");
while( <$fh> ){
if ($_ =~ m/^(\S+)\s+(.*)/) {
$header_line = $1;
$seq = $2;
$uniqueID = $currentId++;
print $out_fh "$header_line$uniqueID\n$seq";
} # if
} # while
close $fh;
close $out_fh;
Thanks very much, any ideas will be greatly appreciated.
Your program isn't working because the regex ^(\S+)\s+(.*) matches every line in the input file. For instance, \S+ matches CTTCATCGCACATGGATAACTGTGTACCTGACT; the newline at the end of the line matches \s+; and nothing matches .*.
Here's how I would code your solution. It simply appends $current_id to the end of any line that contains a pipe | character.
use strict;
use warnings;
use 5.010;
use autodie;
my ($filename) = @ARGV;
my $current_id = 'a' x 57;
open my $in_fh, '<', $filename;
open my $out_fh, '>', "${filename}_longer_ID_MMETSP.fasta";
while ( my $line = <$in_fh> ) {
    chomp $line;
    $line .= $current_id if $line =~ tr/|//;
    print $out_fh $line, "\n";
}
close $out_fh;
output
MMETSP0259|AmphidiniumCMP1314aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
CTTCATCGCACATGGATAACTGTGTACCTGACT
TCTGGGAAAGGTTGCTATCATGAGTCATAGAAT
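The question's own code used $currentId++ to give each header a different suffix. A minimal sketch of that idea follows, relying on Perl's string increment ('aaa' becomes 'aab'). The helper name tag_headers, the short three-letter ID, and the second sample record are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: append a distinct suffix to every line containing '|'.
sub tag_headers {
    my ($start_id, @lines) = @_;
    my $id = $start_id;
    my @out;
    for my $line (@lines) {
        chomp(my $copy = $line);
        # $id++ yields the current value, then advances 'aaa' to 'aab'
        $copy .= $id++ if $copy =~ tr/|//;
        push @out, $copy;
    }
    return @out;
}

print "$_\n" for tag_headers('aaa',
    'MMETSP0259|AmphidiniumCMP1314', 'CTTCATCG',
    'MMETSP0260|OtherSpecies',       'TCTGGGAA');
```

Sequence lines pass through unchanged because they contain no pipe character.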

Extract file contents between given lines using perl

I want to use only sed from within Perl to capture the contents of a file between lines 1000 and 2000.
I tried the code below but it didn't work. Can someone help me with this, please?
$firstLIne="1000";
$lastline="2000";
$output=`sed -n '$firstLIne,$lastline'p sample.txt`;
Here is another pure perl solution:
my ($firstline, $lastline) = (1000,2000);
open my $fh, '<', 'sample.txt' or die "$!";
while(<$fh>){
print if $. == $firstline .. $. == $lastline;
}
If you don't use the variables anywhere else, you can use a special case of the .. operator: when an operand is a constant expression, it is automatically compared to $. (see the fourth paragraph of the operator's perldoc entry):
while(<$fh>){
print if 1000 .. 2000;
}
Here is the important part from the perldoc for the .. operator:
In scalar context, ".." returns a boolean value. The operator is bistable, like a flip-flop, and emulates the line-range (comma) operator of sed, awk, and various editors.
Edit Per request, with storing the intermediate lines in a variable.
my ($firstline, $lastline) = (1000,2000);
my $output = '';
open my $fh, '<', 'sample.txt' or die $!;
while(<$fh>){
$output .= $_ if $. == $firstline .. $. == $lastline;
}
print $output;
Also, if your file isn't too big (it fits completely into memory), you can read it into a list and select the lines you're interested in. List indices start at 0, so line N is at index N-1:
my $output = join '', (<$fh>)[$firstline-1 .. $lastline-1];
For comparison, to do this in Perl only, one could write:
my $firstLine=1000;
my $lastLine=2000;
my $fn="sample.txt";
my $output;
open (my $fh, "<", $fn) or die "Could not open file '$fn': $!\n";
while (<$fh>) {
last if $. > $lastLine;
$output .= $_ if $. >= $firstLine;
}
close($fh);
Note that this will stop reading from the file after line $lastLine, so if the file contains 100,000 lines it will only read the first 2000.
If you just want to print out the lines then:
perl -ne 'print if 1000 .. 2000' example_data.txt
should work.
If you want to incorporate that into a script somehow then you can "semi-slurp" the filehandle:
use strict;
use warnings;
open my $filehandle, 'example_data.txt' or die $!;
my $lines_1k_to_2k ;
while (<$filehandle>) {
$lines_1k_to_2k .= $_ if 1000 .. 2000 ;
}
print $lines_1k_to_2k ;
The .= operator will add the lines to the string in variable $lines_1k_to_2k only if they are in the range 1000 .. 2000
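The early-exit loop shown earlier can be packaged as a sub that returns the selected lines as a list. This is a sketch under assumptions: the name extract_lines is hypothetical, and a ten-line in-memory file stands in for sample.txt so the example runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return lines $first..$last (1-based, inclusive)
# of a filehandle, and stop reading once $last has been passed.
sub extract_lines {
    my ($fh, $first, $last) = @_;
    my @wanted;
    while (my $line = <$fh>) {
        last if $. > $last;
        push @wanted, $line if $. >= $first;
    }
    return @wanted;
}

# Ten numbered lines stand in for the real file.
my $data = join '', map { "line $_\n" } 1 .. 10;
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!\n";
print extract_lines($fh, 3, 5);
close $fh;
```

For the original question, the call would be extract_lines($fh, 1000, 2000) on a handle opened for sample.txt.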