open multiline text file in perl

open multiline text file in perl - perl

I have the following text file
Hello my
name is
Jeff
and I want to open it using perl. However, when using this code
use strict;
use warnings;
open(my $fh, "<", "input.txt") or die "cannot open input text!";
my $text= <$fh>;
print $text;
The output text is only Hello my. How can I make it print all of my input text lines and not just the first one ?

If you read the section on I/O operators in perldoc perlop, you will see the following:
In scalar context, evaluating a filehandle in angle brackets yields the next line from that file
And, a bit further down:
If a <FILEHANDLE> is used in a context that is looking for a list, a list comprising all input lines is returned, one line per list element.
If you assign <FILEHANDLE> to a scalar variable, you will get the next line (actually record) from the file. This is what you are doing:
my $text = <$fh>;
If you assign <FILEHANDLE> to an array, you will get all of the remaining data from the file, with each line (actually record) in a separate element of the array.
my #text = <$fh>;
If you want to get all data from a file into a scalar, there are a few approaches you can take. The naive approach is to read the data in list context and then join it using an empty string:
my $text = join '', <$fh>;
You can use $/ to change Perl's idea of a "record" (often done in a do block):
my $text = do { local $/; <$fh> };
I like the slurp method from Path::Tiny.
use Path::Tiny;
my $text = path($filename)->slurp;

In scalar context, evaluating a filehandle in angle brackets yields the next line from that file
so you can use while loop to read the lines (preferred method)
while (<$fh>) #output will store into default variable $_
{
print ;
}
Or you can use the array but you are trying to storing the all content of the file at time in the variable
my #array = <$fh>;
And the slurp mode to read all file in the scalar variable
my $multiline = do { local $/; <$fh>; } #$/ input record seprator

Related

reading and printing a text file in Perl

I have simple question:
why the first code does not print the first line of the file but the second one does?
#! /usr/bin/perl
use warnings;
use strict;
my $protfile = "file.txt";
open (FH, $protfile);
while (<FH>) {
print (<FH>);
}
#! /usr/bin/perl
use warnings;
use strict;
my $protfile = "file.txt";
open (FH, $protfile);
while (my $file = <FH>) {
print ("$file");
}

Context.
Your first program tests for end-of-file on FH by reading the first line, then reads FH in list context as an argument to print. That translates to the whole file, as a list with one line per item. It then tests for EOF again, most likely detects it, and stops.
Your second program iterates by line, each one read in scalar context to variable $file, and prints them individually. It detects EOF by a special case in the while syntax. (see the code samples in the documentation)
So the specific reason why your program doesn't print the first line in one case is that it's lost in the argument to while. Do note that the two programs' structure is pretty different: the first only runs a single while iteration, while the second iterates once per line.
PS: nowadays, the recommended way to manage files tends towards lexical filehandles (open my $file, 'name'; print <$file>;).

Because you are comsuming the first line with the <> operator and then using it again in the print, so the first line has already gone but you are not printing it. <> is the readline operator. You need to print the $_ variable, or assign it to a defined variable as you are doing in the second code. You could rewrite the first:
print;
And it would work, because print uses $_ if you don't give it anything.

When used in scalar context, <FH> returns the next single line from the file.
When used in list context, <FH> returns a list of all remaining lines in the file.
while (my $file = <FH>) is a scalar context, since you're assigning to a scalar. while (<FH>) is short for while(defined($_ = <FH>)), so it is also a scalar context. print (<FH>); makes it a list context, since you're using it as argument to a function that can take multiple arguments.
while (<FH>) {
print (<FH>);
}
The while part reads the first line into $_ (which is never used again). Then the print part reads the rest of the lines all at once, then prints them all out again. Then the while condition is checked again, but since there are now no lines left, <FH> returns undef and the loop quits after just one iteration.
while (my $file = <FH>) {
print ("$file");
}
does more what you probably expect: reads and then prints one line during each iteration of the loop.
By the way, print $file; does the same as print ("$file");

while (<FH>) {
print (<FH>);
}
use this instead:
while (<FH>) {
print $_;
}

I want to replace a sequence name in fasta file with another name

I have one fasta file and one text file fasta file contains sequences in fasta format and text file contains name of genes now I want to replace name of the sequences in fasta file after '>' sign with the gene names in text file
I am new to perl though I have written a script but I don't know why its not working can anyone help me on that please
following is my script:
print"Enter annotated file...";
$f1=<STDIN>;
print"Enter sequence file...";
$f2=<STDIN>;
open(FILE1,$f1) || die"Can't open $f1";
#annotfile=<FILE1>;
open(FILE2,$f2) || die"Can't open $f2";
#seqfile=<FILE2>;
#d=split('\t',#annotfile[0]);
for($i=0;$i<scalar(#annotfile);$i++)
{
#curr_all=split('\t',#annotfile[$i]);
#curr_id[$i]=#curr_all[0];
#gene_nm[$i]=#curr_all[1];
}
for($j=0;$j<scalar(#seqfile);$j++)
{
$id=#curr_id[$j];
$gene=#gene_nm[$j];
#seqfile[$j]=~s/$id[$j]/$gene[$j]/g;
print #seqfile[$j];
}
my files looks like following:
annot.txt
pool75_contig_389 ubiquitin ligase e3a
pool75_contig_704 tumor susceptibility
pool75_contig_1977 serine threonine-protein phosphatase 4 catalytic subunit
pool75_contig_3064 bardet-biedl syndrome 2 protein P
pool75_contig_2499 succinyl- ligase
goat300.fasta
goat300.fasta
>pool75_contig_704
CCCTTTCTCCCTTCCCAACATTCAGAGATACTGAATCGAAACTCTTACTGTCTGTTAGAT
GACAAAGAGTTATCCATCCTACATACTCCAATTTCCTTCCGCAACTTGTGATTTCGCCGC
TTGAATCTTGACGCCGTGCGTCCACAGTTTGTTGTGTTTTATCAATCAAGGTCATTATCA
ACCGAAGACGCTATCTATTTTCTTGGCGAAGCTCTCGGAAAGGAGCCATCGAAATGGAAG
TATTTCTCAAGAAAGTCCGCGAGTTATCCCGGAAGCAGTTC
>pool75_contig_389
GACCTATACCGGACCGTCACTGAAAGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
ACGATCCAGGCATGGAGTTGTGGTGACGAGTAGGAGGGTCACCGTGGTGAGCGGGAAGCC
TCGGGCGTGAGCCTGGGTGGAGCCGCCACGGGTGCAGATCTTGGTGGTAGTAGCAAATAT
TCAAGTGAGAACCTTGAAGGCCGAGGTGGAGAAGGNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTCATTTGTAT
CGCCCGGAAAACGTCACAAGAACGGGAGTTGCGTACAGAA
>pool75_contig_1977
AAGGGACACCGTTGGGTGAGGCGAGCTGCGTTCCTCGAACCATGGCTTCAAAAAGCGACT
TAGACCGTCAGATTGAACAGCTCAGGGCCTGCAAGCTCATTACAGAGGATGAGGTTAAGG
CACTCTGCGCTAAGGCGCGTGAGATTTTAATTGAAGAGAGTAATGTCCAGTGCGTGGACT
CACCTGTCACGGTTTGTGGCGATATCCACGGCCAGTTTTACGACTTGATTGAACTGTTTA
AAGTGGGCGGAGATGTTC
>pool75_contig_3064
TTACTATTTCTGGGCCTTAAGACTGGCTTAGTCGCTTACGACCCTTATAACAATGTAGAT
GTATATTATAAGGATCTTCCTGATGGTGCTAACGCTATGTTAATTTATTCAAACTCACCG
ACAAAGGAACAGAATATGCTTTGGCAGGTGGAAACTGTTCGATAATTGGATTGAACGACG
GCGGATGCGAGGTATTTTGGACAGTCACTGGCGACTCCGTTTGCTCTCTTTGCTCGATTA
AATCCGACAGCGATAAGTCAAGAGATTTTGTGGTTGGCTCTGAAGATTTTGACATCCGAA
TCTTCCATGGGGATGCCATAATATATGAAATCACGGAGTCTGATG
>pool75_contig_2499
AAGAGAAGAGGTGAGTTTGAGTATTGTTTGTGTGTGTGTGGTTGGGTGAGTGTGTGGTAT
GTGGTGTATGTGTGTGATGAATGTATGTGAAAGAGAGTGATGAATCTCATGGATATGTTC
GAGTTCGTGGTTTCCATTGATCGGTTATAGCCGAGATGATGGATGTGTTCCATGTGTCTG
ATTTCAGTTTAGGATTGTGTTGATGATGTTGATGATGAAAATTGTTGATGGTGATGACGA
TAGTGATGATGATGACGATGTTTCGGATAATGGTGATGATGATGATGGTTCCGACGATGA
TGTTTCGCTTGATGATGGTGATAATGATGACTCCGAAAATAACGTTGACTCGGATGAG

Consider using Bio::SeqIO to parse your Fasta dataset, instead of doing it yourself. Bio::SeqIO lives for this task, and is well developed for it. Additionally, if you're in bioinformatics, it would serve you well to get to know Bio::SeqIO. Given this, consider the following:
use strict;
use warnings;
use Bio::SeqIO;
open my $fh, '<', 'annot.txt' or die $!;
my %annot = map { /(\S+)\s+(.+)/; $1 => $2 } <$fh>;
close $fh;
my $in = Bio::SeqIO->new( -file => 'goat300.fasta', -format => 'Fasta' );
while ( my $seq = $in->next_seq() ) {
my $seqID = $annot{ $seq->id } // $seq->id;
print "$seqID\n" . $seq->seq . "\n";
}
Output on your datasets:
tumor susceptibility
CCCTTTCTCCCTTCCCAACATTCAGAGATACTGAATCGAAACTCTTACTGTCTGTTAGATGACAAAGAGTTATCCATCCTACATACTCCAATTTCCTTCCGCAACTTGTGATTTCGCCGCTTGAATCTTGACGCCGTGCGTCCACAGTTTGTTGTGTTTTATCAATCAAGGTCATTATCAACCGAAGACGCTATCTATTTTCTTGGCGAAGCTCTCGGAAAGGAGCCATCGAAATGGAAGTATTTCTCAAGAAAGTCCGCGAGTTATCCCGGAAGCAGTTC
ubiquitin ligase e3a
GACCTATACCGGACCGTCACTGAAAGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNACGATCCAGGCATGGAGTTGTGGTGACGAGTAGGAGGGTCACCGTGGTGAGCGGGAAGCCTCGGGCGTGAGCCTGGGTGGAGCCGCCACGGGTGCAGATCTTGGTGGTAGTAGCAAATATTCAAGTGAGAACCTTGAAGGCCGAGGTGGAGAAGGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTCATTTGTATCGCCCGGAAAACGTCACAAGAACGGGAGTTGCGTACAGAA
serine threonine-protein phosphatase 4 catalytic subunit
AAGGGACACCGTTGGGTGAGGCGAGCTGCGTTCCTCGAACCATGGCTTCAAAAAGCGACTTAGACCGTCAGATTGAACAGCTCAGGGCCTGCAAGCTCATTACAGAGGATGAGGTTAAGGCACTCTGCGCTAAGGCGCGTGAGATTTTAATTGAAGAGAGTAATGTCCAGTGCGTGGACTCACCTGTCACGGTTTGTGGCGATATCCACGGCCAGTTTTACGACTTGATTGAACTGTTTAAAGTGGGCGGAGATGTTC
bardet-biedl syndrome 2 protein P
TTACTATTTCTGGGCCTTAAGACTGGCTTAGTCGCTTACGACCCTTATAACAATGTAGATGTATATTATAAGGATCTTCCTGATGGTGCTAACGCTATGTTAATTTATTCAAACTCACCGACAAAGGAACAGAATATGCTTTGGCAGGTGGAAACTGTTCGATAATTGGATTGAACGACGGCGGATGCGAGGTATTTTGGACAGTCACTGGCGACTCCGTTTGCTCTCTTTGCTCGATTAAATCCGACAGCGATAAGTCAAGAGATTTTGTGGTTGGCTCTGAAGATTTTGACATCCGAATCTTCCATGGGGATGCCATAATATATGAAATCACGGAGTCTGATG
succinyl- ligase
AAGAGAAGAGGTGAGTTTGAGTATTGTTTGTGTGTGTGTGGTTGGGTGAGTGTGTGGTATGTGGTGTATGTGTGTGATGAATGTATGTGAAAGAGAGTGATGAATCTCATGGATATGTTCGAGTTCGTGGTTTCCATTGATCGGTTATAGCCGAGATGATGGATGTGTTCCATGTGTCTGATTTCAGTTTAGGATTGTGTTGATGATGTTGATGATGAAAATTGTTGATGGTGATGACGATAGTGATGATGATGACGATGTTTCGGATAATGGTGATGATGATGATGGTTCCGACGATGATGTTTCGCTTGATGATGGTGATAATGATGACTCCGAAAATAACGTTGACTCGGATGAG
The hash %annot is initialized by reading and capturing the contents of your annot.txt data. A Bio::SeqIO object is created using your goat300.fasta file data. The while loop iterates through your fasta sequences. The variable $seqID either takes the associated value of the key in the %annot hash or it keeps the current sequence ID (the // notation means defined or, so that insures $seqID will be defined). Finally, the Fasta record is printed.
Hope this helps!

There were a lot of warnings in your code, and your approach was inefficient. Let me first show you a working Perl program. I'll explain afterwards.
#!/usr/bin/perl
use strict;
use warnings;
# Read the annotations file
print"Enter annotated file...\n";
# my $f1 = <STDIN>;
my $f1 = 'annot.txt';
open(my $fh_annotations, '<', $f1) or die "Can't open $f1";
my #annotfile = <$fh_annotations>;
close $fh_annotations;
# Read the sequence file
print"Enter sequence file...\n";
# my $f2 = <STDIN>;
my $f2 = 'goat300.fasta';
open(my $fh_genes, '<', $f2) or die "Can't open $f2";
my #seqfile = <$fh_genes>;
close $fh_genes;
# Process the annotations data
my %names; # this hash is going to hold the names
foreach my $line (#annotfile) {
chomp $line; # remove newline
my #fields = split /\t/, $line; # split into array
$names{$fields[0]} = $fields[1]; # save in the hash as key->value pair
}
# Process the sequence data
foreach my $line (#seqfile) {
# Look at each line
if ($line =~ m/>(.+)$/) {
# If there is a heading there, remember it...
if (exists $names{$1}) {
# ... check if we know a name for it and replace it in the line
$line =~ s/($1)/$names{$1}/;
}
}
# output the line (this would be done to another filehandle)
print $line;
}
This reads both files and saves them in memory, just like yours did. But instead of trying to build two arrays for the names, I went with a hash, which is a key/value pair. Think of it like an array with names instead of numbers and no particular sorting.
Once these names are set up, I can process the sequence file. I simply look at each line and check if there is a heading there, by looking for the > sign. If it's there (it goes into $1 because of the parenthesis), I look if we have a hash entry (with exists) in our %names hash. If we do, we can replace the heading with the proper name.
After that, we could write it out to a new file. I'm just printing it.
I've used a few other techniques. Unfortunately the literature people get in a BioPerl context is quite outdated. Please take this advice, it will make your live easier.
Always use strict and warnings. They will tell you about problems with your code.
Always declare your variables with my. This is not like other languages, where you need to set up a variable at the top of your problem. You can declare it where you need it. The vars only live in a certain scope, which means between the nearest enclosing { and } brackets, or block.
Use three-argument open and lexical file handles for security. Read more here.
Perl offers foreach as an alternative to the C for loop. In this case, it made things a lot easier.
One more thing about this program: While this example data was rather short, I believe your actual data might be a lot larger. Consider processing the sequence file while you read it so you do not run out of memory. There's no need to save all the lines, unless you want to do something else with them.
open my $fh_out, '>', $filename_out or die $!;
open my $fh_in, '<', $filename_in or die $!;
while (my $line = <$fh_in>) {
# do stuff with the line, like your regex
print $fh_out $line;
}
close $fh_in;
close $fh_out;

Perl - Move Pointer to Start of Line

I have 2 files.
Obfuscated file called input.txt
A second file called mapping.txt consisting of key value pairs.
I want to find every occurrence of the key from mapping.txt in input.txt and replace it with the value corresponding to the key.
Please note that I want to overwrite the contents of the line in input.txt everytime a successful match occurs.
I have written the following code:
#! /usr/bin/perl
use strict;
use warnings;
(my $mapping,my $input)=#ARGV;
open(MAPPING,'<',$mapping) || die("couldn't read from the file, $mapping with error: $!\n");
while(<MAPPING>)
{
chomp $_;
my $line=$_;
(my $key,my $value)=split("=",$line);
open(INPUT,'+<',$input);
while(<INPUT>)
{
chomp $_;
if(index($_,$key)!=-1)
{
$_=~s/\Q$key/$value/g;
# move pointer to beginning of line
print INPUT $_."\n";
}
}
close INPUT;
}
close MAPPING;
Brief Overview of the code:
Opens the mapping.txt file in read mode.
Since each line is a key value pair, it splits it into key and value.
Opens the input.txt file in overwrite mode.
Checks if the key is found in the current line.
If the key is found, then substitute the key with the value ignoring any meta characters in the key (by prefixing \Q)
At this point, the file pointer would be at the end of the line since the previous statement would scan the entire line to find the key and replace it.
If I could move the file pointer to the start of the line, then I can overwrite with:
print INPUT $_,"\n"
I tried looking up the seek function however unable to figure out a way to use it for this purpose.
Once this is done, then the code will close the file. It will pick the next key value pair from mapping.txt and again scan the input file from beginning looking for matches and replacing them.
The most important point is, each time the inner while loop will be operating on the input.txt which was modified in the previous iteration of inner while loop. This way, any successful Find and Replace operations would keep on getting saved in the input.txt file.
How do I do this?
Thanks.

First of all you should use lexical file handles, the three-parameter form of open, and always check the status to make sure that an open has succeeded (as you do with the mapping file but not the input file).
The solution you suggest, of rewinding to the start of the line before using print will not work because you cannot update part of a file unless your replacement data is exactly the same size as the data it is replacing. This will not generally be true in your situation.
There are a number of solutions to this, the first and simplest is to invert the loops and put the read loop for the mapping file inside the read loop for the input file. Your code would look like this:
use strict;
use warnings;
my ($mapping, $input) = #ARGV;
open my $infh, '<', $input or die "Unable to open '$input': $!";
while (my $line = <$input>) {
open my $mapfh, '<', $mapping or die "Unable to open '$mapping': $!";
while (<$mapfh>) {
chomp;
my ($key, $value) = split /=/;
$line =~ s/\Q$key/$value/g;
}
print $line;
}
but your output is sent to STDOUT and you will have to arrange the output to be saved to a file and renamed appropriately.
An alternative here is to use the -I command-line option, which forces a file to be renamed automatically, and a backup saved if required. Using a bare -I will modify the file in-place by deleting the old file and renaming the new output, while giving the parameter a value like -I.bak will rename the old file by appending .bak instead of deleting it. The -I option applies only to files read from ARGV using an empty <> operator, and setting the built-in variable $^I to a value (or to the empty string '') has the same effect. The code looks like this:
use strict;
use warnings;
my $mapping = shift #ARGV;
$^I = '.bak';
while (my $line = <>) {
open my $mapfh, '<', $mapping or die "Unable to open '$mapping': $!";
while (<$mapfh>) {
chomp;
my ($key, $value) = split /=/;
$line =~ s/\Q$key/$value/g;
}
print $line;
}
A third, and neater alternative is to use Tie::File, which maps a Perl array to the file contents and reflects all modifications of the array back to the original file. This is an example:
use strict;
use warnings;
use Tie::File;
my ($mapping, $input) = #ARGV;
tie my #input, 'Tie::File', $input or die "Unable to open '$input': $!";
for my $line (#input) {
open my $mapfh, '<', $mapping or die "Unable to open '$mapping': $!";
while (<$mapfh>) {
chomp;
my ($key, $value) = split /=/;
$line =~ s/\Q$key/$value/g;
}
}
Finally, it is highly inefficient to keep opening and reading the mapping file for every line of input, and it is best to build a regex from its contents and use it throughout the program. This version first builds a hash %mapping from the mapping file and then creates a regex by applying quotemeta to each hash key to escape any regex metacharacters, and then joining them with the regex alternation operator |. The keys are sorted by descending length so that the longest matches are found and replaced in priority over the shorter ones.
use strict;
use warnings;
use Tie::File;
my ($mapping, $input) = #ARGV;
open my $mapfh, '<', $mapping or die "Unable to open '$mapping': $!";
my %mapping = map { chomp; /\S/ ? split /=/ : () } <$mapfh>;
my $regex = join '|', map quotemeta, sort { length $b <=> length $b } keys %mapping;
tie my #input, 'Tie::File', $input or die "Unable to open '$input': $!";
for my $line (#input) {
$line =~ s/($regex)/$mapping{$1}/g;
}

If I could move the file pointer to the start of the line, then I can overwrite with:
print INPUT $_,"\n"
Your premise is wrong: Assuming the byte sequence 00 01 02 and the rule 01 = A1 A2, the resulting byte sequence would be 00 A1 A2 and not 00 A1 A2 02. Ways around this include:
Use the Tie::File module.
Write to another file, and rename the second file to the original, once your pass is complete. This is probably most efficient and scalable.
seeking is not a good idea: You would be constrained to fix-length substitutions, and seek and tell operate on bytes, not characters. If you really have to use in-place editing, you could use this loop:
my $beginning_of_line = tell $fh;
while (<$fh>) {
# do processing
seek $fh, $beginning_of_line, 0;
# do update
} continue {$beginning_of_line = tell $fh}
Also, you make several passes over the input file. Assuming the token sequence a b c and the rules b = d e and d = f, you would produce the sequences a f e c or a d e c depending on the order of the rules! This may not be what you want.
Also, consider the ambiguity between the rules a = c and a b = d over the input a b. Does this produce c b or d?

Using Perl to parse a CSV file from a particular row to the end of the file

am very new to Perl and need your help
I have a CSV file xyz.csv with contents:
here level1 and er values are strings names...not numbers...
level1,er
level2,er2
level3,er3
level4,er4
I parse this CSV file using the script below and pass the fields to an array in the first run
open(my $d, '<', $file) or die "Could not open '$file' $!\n";
while (my $line = <$d>) {
chomp $line;
my #data = split "," , $line;
#XYX = ( [ "$data[0]", "$data[1]" ], );
}
For the second run I take an input from a command prompt and store in variable $val. My program should parse the CSV file from the value stored in variable until it reaches the end of the file
For example
I input level2 so I need a script to parse from the second line to the end of the CSV file, ignoring the values before level2 in the file, and pass these values (level2 to level4) to the #XYX = (["$data[1]","$data[1]"],);}
level2,er2
level3,er3
level4,er4
I input level3 so I need a script to parse from the third line to the end of the CSV file, ignoring the values before level3 in the file, and pass these values (level3 and level4) to the #XYX = (["$data[0]","$data[1]"],);}
level3,er3
level4,er4
How do I achieve that? Please do give your valuable suggestions. I appreciate your help

As long as you are certain that there are never any commas in the data you should be OK using split. But even so it would be wise to limit the split to two fields, so that you get everything up to the first comma and everything after it
There are a few issues with your code. First of all I hope you are putting use strict and use warnings at the top of all your Perl programs. That simple measure will catch many trivial problems that you could otherwise overlook, and so it is especially important before you ask for help with your code
It isn't commonly known, but putting a newline "\n" at the end of your die string prevent Perl from giving file and line number details in the output of where the error occurred. While this may be what you want, it is usually more helpful to be given the extra information
Your variable names are verly unhelpful, and by convention Perl variables consist of lower-case alphanumerics and underscores. Names like #XYX and $W don't help me understand your code at all!
Rather than splitting to an array, it looks like you would be better off putting the two fields into two scalar variables to avoid all that indexing. And I am not sure what you intend by #XYX = (["$data[1]","$data[1]"],). First of all do you really mean to use $data[1] twice? Secondly, your should never put scalar variables inside double quotes, as it does something very specific, and unless you know what that is you should avoid it. Finally, did you mean to push an anonymous array onto #XYX each time around the loop? Otherwise the contents of the array will be overwritten each time a line is read from the file, and the earlier data will be lost
This program uses a regular expression to extract $level_num from the first field. All it does it find the first sequence of digits in the string, which can then be compared to the minimum required level $min_level to decide whether a line from the log is relevant
use strict;
use warnings;
my $file = 'xyz.csv';
my $min_level = 3;
my #list;
open my $fh, '<', $file or die "Could not open '$file' $!";
while (my $line = <$fh>) {
chomp $line;
my ($level, $error) = split ',', $line, 2;
my ($level_num) = $level =~ /(\d+)/;
next unless $level_num >= $min_level;
push #list, [ $level, $error ];
}

For deciding which records to process you can use the "flip-flop" operator (..) along these lines.
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
my $level = shift || 'level1';
while (<DATA>) {
if (/^\Q$level,/ .. 0) {
print;
}
}
__DATA__
level1,er
level2,er2
level3,er3
level4,er4
The flip-flop operator returns false until its first operand is true. At that point it returns false until its second operand is true; at which point it returns false again.
I'm assuming that your file is ordered so that once you start to process it, you never want to stop. That means that the first operand to the flip-flop can be /^\Q$level,/ (match the string $level at the start of the line) and the second operand can just be zero (as we never want it to stop processing).
I'd also strongly recommend not parsing CSV records using split /,/. That may work on your current data but, in general, the fields in a CSV file are allowed to contain embedded commas which will break this approach. Instead, have a look at Text::CSV or Text::ParseWords (which is included with the standard Perl distribution).
Update: I seem to have got a couple of downvotes on this. It would be great if people would take the time to explain why.

#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
my #XYZ;
my $file = 'xyz.csv';
open my $fh, '<', $file or die "$file: $!\n";
my $level = shift; # get level from commandline
my $getall = not defined $level; # true if level not given on commandline
my $parser = Text::CSV->new({ binary => 1 }); # object for parsing lines of CSV
while (my $row = $parser->getline($fh)) # $row is an array reference containing cells from a line of CSV
{
if ($getall # if level was not given on commandline, then put all rows into #XYZ
or # if level *was* given on commandline, then...
$row->[0] eq $level .. 0 # ...wait until the first cell in a row equals $level, then put that row and all subsequent rows into #XYZ
)
{
push #XYZ, $row;
}
}
close $fh;

#!/usr/bin/perl
use strict;
use warnings;
open(my $data, '<', $file) or die "Could not open '$file' $!\n";
my $level = shift ||"level1";
while (my $line = <$data>) {
chomp $line;
my #fields = split "," , $line;
if($fields[0] eq $level .. 0){
print "\n$fields[0]\n";
print "$fields[1]\n";
}}
This worked....thanks ALL for your help...

Reading Input File and Putting It Into An Array with Space Delimiter in Perl

I am trying to take an input file with #ARGV array and write it's all elements to an Array with delimiter space in perl.
sample input parameter is a txt file for example:
0145 2145
4578 47896
45 78841
1249 24873
(there are multiple of lines but here it is not shown)
the problem is:
I do not know how to take for example ARG[0] to an array
I want to take every string of that inputfile as single strings in other words the line1 0145 2145 will not be a string it will be two distinct string by delimiting with space.

I think this is what you want. The #resultarray in this code will end up holding a list of digits.
If you're giving your program a file name to use as input from the command line, take it off the ARGV array and open it as a filehandle:
my $filename = $ARGV[0];
open(my $filehandle, '<', $filename) or die "Could not open $filename\n";
Once you have the filehandle you can loop over each line of the file using a while loop. chomp takes that new line character off the end of each line. Use split to split each line up based on whitespace. This returns an array (#linearray in my code) containing a list of the numbers within that line. I then push my line array on to the end of my #resultarray.
my #resultarray;
while(my $line = <$filehandle>){
chomp $line;
my #linearray = split(" ", $line);
push(#resultarray, #linearray);
}
And remember to add
use warnings;
use strict;
at the top of your perl file to help you if you get any problems in your code.
Just to clarify how you can deal with different inputs to your program. The following command:
perlfile.pl < inputfile.txt
Will take the contents of inputfile.txt and pipe it to the STDIN filehandle. You can then use this filehandle to access the contents of inputfile.txt
while(my $line = <STDIN>){
# do something with this $line
}
But you can also give your program a number of file names to read by placing them after the execution command:
perlfile.pl inputfile1.txt inputfile2.txt
These file names will be read as strings and placed into the #ARGV array so that the array will look like this:
#ARGV:
[0] => "inputfile1.txt"
[1] => "inputfile2.txt"
Since these are just the names of files, you need to open the file in perl before you access the file's contents. So for inputfile1.txt:
my $filename1 = shift(#ARGV);
open(my $filehandle1, '<', $filename1) or die "Can't open $filename1";
while(my $line = <$filehandle1>){
# do something with this line
}
Note how I've used shift this time to get the next element within the array #ARGV. See the perldoc for more details on shift. And also more information on open.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

open multiline text file in perl - perl

Related

reading and printing a text file in Perl

I want to replace a sequence name in fasta file with another name

Perl - Move Pointer to Start of Line

Using Perl to parse a CSV file from a particular row to the end of the file

Reading Input File and Putting It Into An Array with Space Delimiter in Perl

Categories

Resources