How do i pattern match and keep writing to new file until another pattern match - perl

My goal is to find and print all the lines in a "big.v" file starting from pattern match "module" until "endmodule" into individual files.
big.v: module test;
<bunch of code>
endmodule
module foo;
<bunch of code>
endmodule
And the individual files would look like:
test.v : module test;
..
endmodule
foo.v: module test1;
..
endmodule
I got most of it working using:
use strict;
use warnings;
#open(my $fh, ">", $f1) || die "Couldn't open '".$f."' for writing because: ".$!;
while (<>) {
my $line = $_;
if ($line =~ /(module)(\s+)(\w+)(.*)/) {
my $modname = $3;
open(my $fh1, ">", $modname.".v") ;
print $fh1 $line."\n";
## how do i keep writing next lines to this file until following pattern
if ($line =~ /(endmodule)(\s+)(.*)/) { close $fh1;}
}
}
Thanks,

There's a useful perl construct called the 'range operator':
http://perldoc.perl.org/perlop.html#Range-Operators
It works like this:
while ( <$file> ) {
if ( m/startpattern/ .. m/endpattern/ ) {
print;
}
}
So given your example - I think this should do the trick:
my $output;
while ( my $line = <STDIN> ) {
if ( $line =~ m/module/ .. m/endmodule/ ) {
my ( $modname ) = ( $line =~ m/module\s+(\w+)/ );
if ( defined $modname) {
open ( $output, ">", "$modname.v" ) or warn $!;
}
print {$output} $line;
}
}
Edit: But given your source data - you don't actually need to use a range operator I don't think. You could just close/reopen new 'output' files as you go. This assumes that you could 'cut up' your file based on 'module' lines, which isn't necessarily a valid assumption.
But sort of more like this:
use strict;
use warnings;
open ( my $input, "<", "big.v" ) or die $!;
my $output;
while ( my $line = <$input> ) {
if ( $line =~ m/^\s*module/ ) {
#start of module line found
#close filehandle if it's open
close($output) if defined $output;
#extract the module name from the line.
my ($modulename) = ( $line =~ m/module\s+(\w+)/ );
#open new output file (overwriting)
open( $output, ">", "$modulename.v" ) or warn $!;
}
#this test might not be necessary.
if ( defined $output ) {
print {$output} $line;
}
}

Related

Duplicate values in column

I have a original file which has following columns,
02-May-2018,AAPL,Sell,0.25,1000
02-May-2018,C,Sell,0.25,2000
02-May-2018,JPM,Sell,0.25,3000
02-May-2018,WFC,Sell,0.25,5000
02-May-2018,AAPL,Sell,0.25,7000
02-May-2018,GOOG,Sell,0.25,8000
02-May-2018,GOOG,Sell,0.25,9000
02-May-2018,C,Sell,0.25,2000
02-May-2018,AAPL,Sell,0.25,3000
I am trying to print this original line if I see value in the second column more then 2 times.. for example, if I see AAPL more then 2 times desired result should print
02-May-2018,AAPL,Sell,0.25,1000
02-May-2018,AAPL,Sell,0.25,7000
02-May-2018,AAPL,Sell,0.25,3000
So Far, I have written the following which prints results multiple times which is wrong.. can you please help on what I am doing wrong?
open (FILE, "<$TMPFILE") or die "Could not open $TMPFILE";
open (OUT, ">$TMPFILE1") or die "Could not open $TMPFILE1";
%count = ();
#symbol = ();
while ($line = <FILE>)
{
chomp $line;
(#data) = split(/,/,$line);
$count{$data[1]}++;
#keys = sort {$count{$a} cmp $count{$b}} keys %count;
for my $key (#keys)
{
if ( $count{$key} > 2 )
{
print "$line\n";
}
}
}
I'd do it something like this - store lines you've seen in a 'buffer' and print them out again if the condition is hit (before continuing to print as you go):
#!/usr/bin/env perl
use strict;
use warnings;
my %buffer;
my %count_of;
while ( my $line = <> ) {
my ( $date, $ticker, #values ) = split /,/, $line;
#increment the count
$count_of{$ticker}++;
if ( $count_of{$ticker} < 3 ) {
#count limit not hit, so stash the current line in the buffer.
$buffer{$ticker} .= $line;
next;
}
#print the buffer if the count has been hit
if ( $count_of{$ticker} == 3 ) {
print $buffer{$ticker};
}
#only gets to here once the limit is hit, so just print normally.
print $line;
}
With your input data, this outputs:
02-May-2018,AAPL,Sell,0.25,1000
02-May-2018,AAPL,Sell,0.25,7000
02-May-2018,AAPL,Sell,0.25,3000
Simple answer:
push #{ $lines{(split",")[1]} }, $_ while <>;
print #{ $lines{$_} } for grep #{ $lines{$_} } > 2, sort keys %lines;
perl program.pl inputfile > outputfile
You need to read the input file twice, because you don't know the final counts until you get to the end of the file
use strict;
use warnings 'all';
my ($TMPFILE, $TMPFILE1) = qw/ infile outfile /;
my %counts;
{
open my $fh, '<', $TMPFILE or die "Could not open $TMPFILE: $!";
while ( <$fh> ) {
my #fields = split /,/;
++$counts{$fields[1]};
}
}
open my $fh, '<', $TMPFILE or die "Could not open $TMPFILE: $!";
open my $out_fh, '>', $TMPFILE1 or die "Could not open $TMPFILE1: $!";
while ( <$fh> ) {
my #fields = split /,/;
print $out_fh $_ if $counts{$fields[1]} > 2;
}
output
02-May-2018,AAPL,Sell,0.25,1000
02-May-2018,AAPL,Sell,0.25,7000
02-May-2018,AAPL,Sell,0.25,3000
This should work:
use strict;
use warnings;
open (FILE, "<$TMPFILE") or die "Could not open $TMPFILE";
open (OUT, ">$TMPFILE1") or die "Could not open $TMPFILE1";
my %data;
while ( my $line = <FILE> ) {
chomp $line;
my #line = split /,/, $line;
push(#{$data{$line[1]}}, $line);
}
foreach my $key (keys %data) {
if(#{$data{$key}} > 2) {
print "$_\n" foreach #{$data{$key}};
}
}

Merge two files based on the starting of the line

I want to merge two files into one using perl. Below are the sample files.
***FILE 1***
XDC123
XDC456
XDC678
BB987
BB654
*** FILE 2 ***
XDC876
XDC234
XDC789
BB456
BB678
And I want the merged file to look like:
***MERGED FILE***
XDC123
XDC456
XDC678
XDC876
XDC234
XDC789
BB987
BB654
BB456
BB678
For the above functionality I have written the below perl script snippet:
#!/usr/bin/env perl;
use strict;
use warnings;
my $file1 = 'C:/File1';
my $file2 = 'C:/File2';
my $file3 = 'C:/File3';
open( FILEONE, '<$file1' );
open( FILETWO, '<$file2' );
open( FILETHREE, '>$file3' );
while (<FILEONE>) {
if (/^XDC/) {
print FILETHREE;
}
if (/^BB/) {
last;
}
}
while (<FILETWO>) {
if (/^XDC/) {
print FILETHREE;
}
if (/^BB/) {
last;
}
}
while (<FILEONE>) {
if (/^BB/) {
print FILETHREE;
}
}
while (<FILETWO>) {
if (/^BB/) {
print FILETHREE;
}
}
close($file1);
close($file2);
close($file3);
But the merged file that is generated from the above code looks like:
***FILE 3***
XDC123
XDC456
XDC678
XDC876
XDC234
XDC789
BB654
BB678
The first line that starts from BB is missed out from both the files. Any help on this will be appreciated. Thank you.
The problem is, you iterate each file to the end, but never 'rewind' for if you're wanting to start over.
So your while ( <FILEONE> ) { line consumes (and discards) the first line that matches m/^BB/ - the last exits the "while" loop, but only after it's already read the line.
However that's assuming you get your open statements right, because:
open( FILEONE, '>$file1' );
Actually empties it, it doesn't read from it. So I am assuming you've transposed your code, and introduced new errors whilst doing so.
As a style point - you should really use 3 argument open, with lexical filehandles.
So instead:
#!/usr/bin/env perl;
use strict;
use warnings;
my $file1 = 'C:/File1';
my $file2 = 'C:/File2';
my $file3 = 'C:/File3';
my #lines;
foreach my $file ( $file1, $file2 ) {
open( my $input, '<', $file ) or die $!;
push( #lines, <$input> );
close($input);
}
open( my $output, '>', $file3 ) or die $!;
print {$output} sort #lines;
close($output)
(Although as noted in the comments - if that's all you want to do, the unix sort utility is probably sufficient).
However, if you need to preserve the numeric ordering, whilst sorting on the alphabetical, you need a slightly different data structure:
#!/usr/bin/env perl;
use strict;
use warnings;
my $file1 = 'C:/File1';
my $file2 = 'C:/File2';
my $file3 = 'C:/File3';
my %lines;
foreach my $file ( $file1, $file2 ) {
open( my $input, '<', $file ) or die $!;
while ( my $line = <$file> ) {
my ( $key ) = $line =~ m/^(\D+)/;
push %{$lines{$key}}, $line;
}
close($input);
}
open( my $output, '>', $file3 ) or die $!;
foreach my $key ( sort keys %lines ) {
print {$output} #{$lines{$key}};
}
close($output)

Correct use of Perl "exists"

I have two files. The first two columns in both are chromosome loci and genotypes, for instance chr1:1736464585 and T/G.
I have put the first two columns into a hash. I want to check whether the hash key (the chromosome locus) exists in the second file.
I have written this Perl program and have tried many variations but I'm not sure if I'm using exists correctly: it gives the error exists is not an HASH or ARRAY element or a subroutine.
#!/usr/bin/perl
use strict;
use warnings;
my $output = "annotated.txt";
open( O, ">>$output" );
my $filename = "datatest.txt";
my $filename2 = "MP2.txt";
chomp $filename;
chomp $filename2;
my %hash1 = ();
open( FN1, $filename ) or die "Can't open $filename: $!";
my #lines = <FN1>;
foreach my $line (#lines) {
my #split = split /\t/, $line;
if ( $line =~ /^chr/ ) {
my ( $key, $value ) = ( $split[0], $split[1] );
$hash1{$key} = $value;
}
}
my $DATA;
open( $DATA, $filename2 ) or die $!;
my #lines2 = <$DATA>;
foreach my $line2 (#lines2) {
my #split2 = split /\t/, $line2;
if ( $line2 =~ /^chr/ ) {
if ( exists %hash1{$key} ) {
print "$line2\n";
}
}
}
The syntax of the following line is incorrect:
if (exists %hash1{$key}) { ... }
This should be:
if (exists $hash1{$key}) { ... }

match string in file and replacement with other string

I have a file containing lines as follows
#comments abc
#comments xyz
SerialPort=100
Baudrate=9600
Parity=2
Databits=8
Stopbits=1
also I have array #in = ( SerialPort=500 , Baudrate=300, parity=0, Databits=16, Stopbits=0 ),these array elements read from browser, I am trying to write perl script to match "SerialPort" in file and replace SerialPort=100 in file with SerialPort=500 of array, I want match all other elments in loop I tried code not working please improve the code which is below, I think regular expression is not working and each time if condition to match and substitution resulting false, and also when I look at file after execution of script file consists of duplicates.
#!/usr/bin/perl
$old_file = "/home/work/conf";
open (fd_old, "<", $old_file) || die "cant open file";
#read_file = <fd_old>;
close (fd_old);
#temp = ();
$flag = 0;
foreach $infile ( #read_file )
{
foreach $rr ( #in )
{
($key, $value ) = split(/=/, $rr );
if ( $infile =~ s/\b$key\b(.*)/$rr/ )
{
push ( #temp , $infile );
$flag = 0;
}
else
{
$flag = 1;
}
}
if ( $flag )
{
push (#temp, $infile );
}
}
open ( fd, ">", $old_file ) || die "can't open";
print fd #temp;
close(fd);
Perl 101: use strict; use warnings;.
Prefix variable names with $.
$old_file is undef when you try to open it.
And spell falg correctly, which if you'd turned on those options, you'd have been told about.
Also: When asking questions on SO, it's helpful if you point out what's not working.
#Maruti: Never write a perl program without use strict; and use warnings;. I have modified your code. Just have a look.
Code:
#!/usr/bin/perl
use strict;
use warnings;
my $old_file = "/home/work/conf";
open (my $fh, "<", $old_file) || die "cant open file";
my #read_file = <$fh>;
close ($fh);
my #temp = ();
my #in = ('SerialPort=500' , 'Baudrate=300', 'parity=0', 'Databits=16', 'Stopbits=0');
foreach my $infile ( #read_file )
{
foreach my $rr ( #in )
{
my ($key, $value) = split(/=/, $rr );
if ( $infile =~ m/\b$key\b\=\d+/ && $infile =~ /#.*/)
{
$infile =~ s/\b$key\b\=\d+/$rr/ig;
}
}
push (#temp, $infile );
}
open (my $out, ">", $old_file ) || die "can't open";
foreach my $res(#temp)
{
print $out $res;
}
close($out);

Using perl, how do I search a text file for _NN (at the end of a word) and print the word in front?

This gives the whole line:
#!/usr/bin/perl
$file = 'output.txt';
open(txt, $file);
while($line = <txt>) {
print "$line" if $line =~ /_NN/;
}
close(txt);
#!/usr/bin/perl
use strict;
use warnings FATAL => "all";
binmode(STDOUT, ":utf8") || die;
my $file = "output.txt";
open(TEXT, "< :utf8", $file) || die "Can't open $file: $!";
while(<TEXT>) {
print "$1\n" while /(\w+)_NN\b/g;
}
close(TEXT) || die "Can't close $file: $!";
Your answer script reads a bit awkwardly, and has a couple of potential errors. I'd rewrite the main logic loop like so:
foreach my $line (grep { /expend_VB/ } #sentences) {
my #nouns = grep { /_NN/ } split /\s+/, $line;
foreach my $word (#nouns) {
$word =~ s/_NN//;
print "$word\n";
}
print "$line\n" if scalar(#nouns);
}
You need to put the my declaration inside the loop - otherwise it will persist longer than you want it to, and could conceivably cause problems later.
foreach is a more common perl idiom for iterating over a list.
print "$1" if $line =~ /(\S+)_NN/;
#!/usr/bin/perl
use strict;
use warnings FATAL => "all";
my $search_key = "expend"; ## CHANGE "..." to <>
open(my $tag_corpus, '<', "ch13tagged.txt") or die $!;
my #sentences = <$tag_corpus>; # This breaks up each line into list
my #words;
for (my $i=0; $i <= #sentences; $i++) {
if ( defined( $sentences[$i] ) and $sentences[$i] =~ /($search_key)_VB.*/i) {
#words = split /\s/,$sentences[$i]; ## \s is a whitespace
for (my $j=0; $j <= #words; $j++) {
#FILTER if word is noun:
if ( defined( $words[$j] ) and $words[$j] =~ /_NN/) {
#PRINT word and sentence:
print "**",split(/_\S+/,$words[$j]),"**", "\n";
print split(/_\S+/,$sentences[$i]), "\n"
}
} ## put print sentences here to print each sentence after all the nouns inside
}
}
close $tag_corpus || die "Can't close $tag_corpus: $!";