How can I count paragraphs in text file using Perl? - perl

I need to write Perl code that counts the paragraphs in a text file. I tried this, but it doesn't work:
open(READFILE, "<$filename")
or die "could not open file \"$filename\":$!";
$paragraphs = 0;
my($c);
while($c = getc(READFILE))
{
if($C ne"\n")
{
$paragraphs++;
}
}
close(READFILE);
print("Paragraphs: $paragraphs\n");

See perlfaq5: How can I read in a file by paragraphs?
local $/ = ''; # enable paragraph mode
open my $fh, '<', $file or die "can't open $file: $!";
1 while <$fh>;
my $count = $.;
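As a rough sketch of how the paragraph-mode idea above could be packaged for reuse, the sub below keeps its own counter instead of reading $. afterwards. The name count_paragraphs is invented here for illustration and does not come from perlfaq5.

```perl
use strict;
use warnings;

# Hypothetical helper (name chosen for this sketch): count the paragraphs
# in the file named by $file using paragraph mode.
sub count_paragraphs {
    my ($file) = @_;
    local $/ = '';    # paragraph mode: one or more blank lines end a record
    open my $fh, '<', $file or die "can't open $file: $!";
    my $count = 0;
    while (defined(my $paragraph = <$fh>)) {
        $count++;     # each record read is one paragraph
    }
    close $fh;
    return $count;
}
```

Because $/ is localized inside the sub, callers keep the default line-by-line reading behaviour after it returns.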

Have a look at the Beginning Perl book at http://www.perl.org/books/beginning-perl/. In particular, the following chapter will help you: http://docs.google.com/viewer?url=http%3A%2F%2Fblob.perl.org%2Fbooks%2Fbeginning-perl%2F3145_Chap06.pdf

If you're determining paragraphs by a double-newline ("\n\n") then this will do it:
open READFILE, "<$filename"
or die "cannot open file `$filename' for reading: $!";
my @paragraphs;
{local $/; @paragraphs = split "\n\n", <READFILE>} # slurp-split
my $num_paragraphs = scalar @paragraphs;
__END__
Otherwise, just change the "\n\n" in the code to use your own paragraph separator. It may even be a good idea to use the pattern \n{2,}, just in case someone went crazy on the enter key.
If you are worried about memory consumption, then you may want to do something like this (sorry for the hard-to-read code):
my $num_paragraphs;
{local $/; $num_paragraphs = @{[ <READFILE> =~ /\n\n/g ]} + 1}
Although, if you want to keep using your own code, you can change if($C ne"\n") to if($c eq "\n").
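To illustrate the \n{2,} suggestion, here is a minimal sketch that counts paragraphs in a string that has already been read into memory. The sub name count_paragraphs_in_string is an assumption made for this example, and it additionally ignores empty pieces, which the snippets above do not do.

```perl
use strict;
use warnings;

# Hypothetical helper (name chosen for this sketch): count paragraphs in
# a string, treating any run of two or more newlines as one separator.
sub count_paragraphs_in_string {
    my ($text) = @_;
    # split drops trailing empty fields by itself; the grep also discards
    # an empty leading field caused by blank lines at the very start.
    my @paragraphs = grep { /\S/ } split /\n{2,}/, $text;
    return scalar @paragraphs;
}
```

This still holds the whole file in memory, so it suits small files only.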

Related

Recursive search in Perl?

I'm incredibly new to Perl, and have never been a phenomenal programmer. I have some successful BVA routines for controlling microprocessor functions, but never anything embedded or multi-faceted. Anyway, my question today is about a boggle I cannot get over when trying to figure out how to remove duplicate lines of text from a text file I created.
The file could have several of the same lines of text in it, not sequentially placed, which is problematic as I'm practically comparing the file to itself, line by line. So, if the first and third lines are the same, I'll write the first line to a new file, not the third. But when I compare the third line, I'll write it again since the first line is "forgotten" by my current code. I'm sure there's a simple way to do this, but I have trouble keeping things simple in code. Here's the code:
my $searchString = pseudo variable "ideally an iterative search through the source file";
my $file2 = "/tmp/cutdown.txt";
my $file3 = "/tmp/output.txt";
my $count = "0";
open (FILE, $file2) || die "Can't open cutdown.txt \n";
open (FILE2, ">$file3") || die "Can't open output.txt \n";
while (<FILE>) {
print "$_";
print "$searchString\n";
if (($_ =~ /$searchString/) and ($count == "0")) {
++ $count;
print FILE2 $_;
} else {
print "This isn't working\n";
}
}
close (FILE);
close (FILE2);
Excuse the way filehandles and scalars do not match. It is a work in progress... :)
The secret of checking for uniqueness is to store the lines you have seen in a hash and only print lines that don't exist in the hash.
Updating your code slightly to use more modern practices (three-arg open(), lexical filehandles) we get this:
my $file2 = "/tmp/cutdown.txt";
my $file3 = "/tmp/output.txt";
open my $in_fh, '<', $file2 or die "Can't open cutdown.txt: $!\n";
open my $out_fh, '>', $file3 or die "Can't open output.txt: $!\n";
my %seen;
while (<$in_fh>) {
print $out_fh $_ unless $seen{$_}++;
}
But I would write this as a Unix filter. Read from STDIN and write to STDOUT. That way, your program is more flexible. The whole code becomes:
#!/usr/bin/perl
use strict;
use warnings;
my %seen;
while (<>) {
print unless $seen{$_}++;
}
Assuming this is in a file called my_filter, you would call it as:
$ ./my_filter < /tmp/cutdown.txt > /tmp/output.txt
Update: But this doesn't use your $searchString variable. It's not clear to me what that's for.
If your file is not very large, you can store each line read from the input file as a key in a hash variable, and then print the hash keys (ordered). Something like this:
my %lines = ();
my $order = 1;
open my $fhi, "<", $file2 or die "Cannot open file: $!";
while( my $line = <$fhi> ) {
$lines {$line} = $order++;
}
close $fhi;
open my $fho, ">", $file3 or die "Cannot open file: $!";
#Sort the keys, only if needed
my @ordered_lines = sort { $lines{$a} <=> $lines{$b} } keys(%lines);
for my $key( @ordered_lines ) {
print $fho $key;
}
close $fho;
You need two things to do that:
a hash to keep track of all the lines you have seen
a loop reading the input file
This is a simple implementation, called with an input filename and an output filename.
use strict;
use warnings;
open my $fh_in, '<', $ARGV[0] or die "Could not open file '$ARGV[0]': $!";
open my $fh_out, '>', $ARGV[1] or die "Could not open file '$ARGV[1]': $!";
my %seen;
while (my $line = <$fh_in>) {
# check if we have already seen this line
if (not $seen{$line}) {
print $fh_out $line;
}
# remember this line
$seen{$line}++;
}
To test it, I've included it with the DATA handle as well.
use strict;
use warnings;
my %seen;
while (my $line = <DATA>) {
# check if we have already seen this line
if (not $seen{$line}) {
print $line;
}
# remember this line
$seen{$line}++;
}
__DATA__
foo
bar
asdf
foo
foo
asdfg
hello world
This will print
foo
bar
asdf
asdfg
hello world
Keep in mind that the memory consumption will grow with the file size. It should be fine as long as the text file is smaller than your RAM. Perl's hash memory consumption grows a little faster than linearly, but your data structure is very flat.
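The hash-based idea shared by the answers above can also be sketched as a small function that works on a list of lines rather than on filehandles. The name unique_lines is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper (name chosen for this sketch): return the given lines
# with later duplicates dropped, keeping first-seen order.
sub unique_lines {
    my @lines = @_;
    my %seen;
    # $seen{$_}++ is false (0) only the first time a line is encountered.
    return grep { !$seen{$_}++ } @lines;
}
```

Called in list context, for example as my @unique = unique_lines(<$in_fh>), it reads all lines into memory first, so the memory caveat above applies here too.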

censored words in perl

I'm trying to make a censored-words script,
but I don't know why my script isn't censoring the words properly.
The censoring is only about 80% correct.
This is my code:
#!/usr/bin/perl -w
use strict;
my @text;
my @cencoredText;
my $file = "blabla\\text.txt";
open(FH, "<", $file) or die "cant open file";
while(<FH>)
{
push(@text,$_);
}
close(FH);
my $cencoredFile = "blabla\\forbidden.txt";
open(FH2, "<", $cencoredFile) or die "cant open file";
while(<FH2>)
{
push(@cencoredText,$_);
}
close(FH2);
for(my $i=0; $i<@cencoredText; $i++)
{
for(my $j=0; $j<@text; $j++)
{
$text[$j] =~ s/${cencoredText[$i]}/censored/g;
}
}
The two files open and the Perl script gets the data from them.
I don't know what's wrong.
Thanks!
To answer your direct question, you need to chomp the newline off of the end of each input line that you read into your two arrays @text and @cencoredText:
...
while( <FH> ) {
chomp;
push(@text,$_);
}
close(FH);
my $cencoredFile = "blabla\\forbidden.txt";
open(FH2, "<", $cencoredFile) or die "cant open file";
while(<FH2>) {
chomp;
push(@cencoredText,$_);
}
...
A few points not directly related to what you asked:
Are arrays really the best data structure choice to indicate that a word should be censored?
I am going to say no. One problem is that to identify words that should be censored, you currently loop through each word in @cencoredText, then for each of those words you loop through each line of @text. If you have N lines of text and M forbidden words, then you have an overall complexity of O(N*M), which is not very good as N and M increase. If you used a hash to represent words that should be censored, you could reduce this to O(max(N,M)).
Alternatively, you could construct a pattern with each forbidden word and do a global substitution across your entire input file.
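A minimal sketch of that single-pattern approach might look like the following. The sub name censor_text is an assumption for illustration, and like the original loop it matches substrings, not whole words.

```perl
use strict;
use warnings;

# Hypothetical helper (name chosen for this sketch): replace every
# forbidden word found in $text with the string "censored".
sub censor_text {
    my ($text, @forbidden) = @_;
    chomp @forbidden;                         # drop newlines left from file reads
    my @words = grep { length } @forbidden;   # skip empty entries
    return $text unless @words;
    # Longest words first, so "badly" is tried before "bad";
    # quotemeta keeps characters such as "." or "*" literal.
    my $alternation = join '|',
        map { quotemeta($_) } sort { length($b) <=> length($a) } @words;
    $text =~ s/(?:$alternation)/censored/g;
    return $text;
}
```

With the whole input file slurped into one string, a single call such as censor_text($whole_text, @cencoredText) replaces the nested loops.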

Reading and writing to the same file

I'm using this code I found online to read a properties file in my Perl script:
open (CONFIG, "myfile.properties");
while (<CONFIG>){
chomp; #no new line
s/#.*//; #no comments
s/^\s+//; #no leading white space
s/\s+$//; #no trailing white space
next unless length;
my ($var, $value) = split (/\s* = \s*/, $_, 2);
$$var = $value;
}
Is it possible to also write to the text file inside this while loop? Let's say the text file looks like this:
#Some comments
a_variale = 5
a_path = /home/user/path
write_to_this_variable = ""
How can I put some text in write_to_this_variable?
It is not really practical to overwrite text files where you have variable length records (lines). It is normal to copy the file, something like this:
my $filename = 'myfile.properties';
open(my $in, '<', $filename) or die "Unable to open '$filename' for read: $!";
my $newfile = "$filename.new";
open(my $out, '>', $newfile) or die "Unable to open '$newfile' for write: $!";
while (<$in>) {
s/(write_to_this_variable =) ""/$1 "some text"/;
print $out $_;
}
close $in;
close $out;
rename $newfile,$filename or die "unable to rename '$newfile' to '$filename': $!";
You might have to sanitise the text you are writing with something like \Q if it contains non-alphanumerics.
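The copy-and-rename approach can be generalised into a small sub, sketched here under a few assumptions: set_property is a hypothetical name, the file uses simple "key = value" lines, and the new value replaces everything after the equals sign. \Q...\E is applied to the key so that any regex metacharacters in it are matched literally.

```perl
use strict;
use warnings;

# Hypothetical helper (name chosen for this sketch): rewrite $filename so
# that the line for $key carries $new_value, via a temporary copy.
sub set_property {
    my ($filename, $key, $new_value) = @_;
    my $newfile = "$filename.new";
    open(my $in, '<', $filename) or die "Unable to open '$filename' for read: $!";
    open(my $out, '>', $newfile) or die "Unable to open '$newfile' for write: $!";
    while (my $line = <$in>) {
        # Keep "key = " as captured, replace the rest of the line;
        # the trailing newline is left untouched.
        $line =~ s/^([ \t]*\Q$key\E[ \t]*=[ \t]*).*$/$1$new_value/;
        print {$out} $line;
    }
    close $in;
    close $out or die "Unable to close '$newfile': $!";
    rename $newfile, $filename or die "unable to rename '$newfile' to '$filename': $!";
    return;
}
```

A call such as set_property('myfile.properties', 'write_to_this_variable', '"some text"') would then update only that one line.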
This is an example of a program that uses the Config::Std module to read and write a simple config file like yours. As far as I know it is the only module that will preserve any comments in the original file.
There are two points to note:
The first hash key in $props{''}{write_to_this_variable} forms the name of the config file section that will contain the value. If there are no sections, as for your file, then you must use an empty string here.
If you need quotes around a value, then you must add these explicitly when you are assigning to the hash element, as I do here with '"Some text"'.
I think the rest of the program is self-explanatory.
use strict;
use warnings;
use Config::Std { def_sep => ' = ' };
my %props;
read_config 'myfile.properties', %props;
$props{''}{write_to_this_variable} = '"Some text"';
write_config %props;
output
#Some comments
a_variale = 5
a_path = /home/user/path
write_to_this_variable = "Some text"

Read and Write to a file in perl

Hi
this
is just
an example.
Let's assume the above is out.txt. I want to read out.txt and write onto the same file.
<Hi >
<this>
<is just>
<an example.>
Modified out.txt.
I want to add tags in the beginning and end of some lines.
As I will be reading the file several times I cannot keep writing it onto a different file each time.
EDIT 1
I tried using "+<" but it's giving an output like this:
Hi
this
is just
an example.
<Hi >
<this>
<is just>
<an example.>
**out.txt**
EDIT 2
Code for reference :
open(my $fh, "+<", "out.txt");# or die "cannot open < C:\Users\daanishs\workspace\CCoverage\out.txt: $!";
while(<$fh>)
{
$s1 = "<";
$s2 = $_;
$s3 = ">";
$str = $s1 . $s2 . $s3;
print $fh "$str";
}
The very idea of what you are trying to do is flawed. The file starts as
H i / t h i s / ...
If you were to change it in place, it would look as follows after processing the first line:
< H i > / i s / ...
Notice how you clobbered "th"? You need to make a copy of the file, modify the copy, then replace the original with the copy.
The simplest way is to make this copy in memory.
my $file;
{ # Read the file
open(my $fh, '<', $qfn)
or die "Can't open \"$qfn\": $!\n";
local $/;
$file = <$fh>;
}
# Change the file
$file =~ s/^(.*)\n/<$1>\n/mg;
{ # Save the changes
open(my $fh, '>', $qfn)
or die "Can't create \"$qfn\": $!\n";
print($fh $file);
}
If you wanted to use the disk instead:
rename($qfn, "$qfn.old")
or die "Can't rename \"$qfn\": $!\n";
open(my $fh_in, '<', "$qfn.old")
or die "Can't open \"$qfn\": $!\n";
open(my $fh_out, '>', $qfn)
or die "Can't create \"$qfn\": $!\n";
while (<$fh_in>) {
chomp;
$_ = "<$_>";
print($fh_out "$_\n");
}
unlink("$qfn.old");
Using a trick, the above can be simplified to
local @ARGV = $qfn;
local $^I = '';
while (<>) {
chomp;
$_ = "<$_>";
print(ARGVOUT "$_\n");
}
Or as a one-liner:
perl -i -ple'$_ = "<$_>"' file
Read the contents into memory, then prepare the required string as you write to your file. (Seeking back to byte zero with SEEK_SET is required.)
#!/usr/bin/perl
open(INFILE, "+<in.txt");
@a=<INFILE>;
seek INFILE, 0, SEEK_SET ;
foreach $i(@a)
{
chomp $i;
print INFILE "<".$i.">"."\n";
}
If you are worried about amount of data being read in memory, you will have to create a temporary result file and finally copy the result file to original file.
You could use Tie::File for easy random access to the lines in your file:
use Tie::File;
use strict;
use warnings;
my $filename = "out.txt";
my @array;
tie @array, 'Tie::File', $filename or die "can't tie file \"$filename\": $!";
for my $line (@array) {
$line = "<$line>";
# or $line =~ s/^(.*)$/<$1>/g; # -- whatever modifications you need to do
}
untie @array;
Disclaimer: Of course, this option is only viable if the file is not shared with other processes. Otherwise you could use flock to prevent shared access while you modify the file.
Disclaimer-2 (thanks to ikegami): Don't use this solution if you have to edit big files and are concerned about performance. Most of the performance loss is mitigated for small files (less than 2MB, though this is configurable using the memory arg).
One option is to open the file twice: Open it once read-only, read the data, close it, process it, open it again read-write (no append), write the data, and close it. This is good practice because it minimizes the time you have the file open, in case someone else needs it.
If you only want to open it once, then you can use the +< file type - just use the seek call between reading and writing to return to the beginning of the file. Otherwise, you finish reading, are at the end of the file, and start writing there, which is why you get the behavior you're seeing.
Need to specify
use Fcntl qw(SEEK_SET);
in order to use
seek INFILE, 0, SEEK_SET;
Thanks user1703205 for the example.
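Putting the "+<" and seek advice together, a minimal sketch could look like this. The sub name wrap_lines_in_place is invented for illustration, and the truncate call is an extra precaution added here (not part of the answers above) in case the rewritten content ever ends up shorter than the original.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Hypothetical helper (name chosen for this sketch): wrap every line of
# $filename in angle brackets, using a single read-write filehandle.
sub wrap_lines_in_place {
    my ($filename) = @_;
    open(my $fh, '+<', $filename) or die "Can't open \"$filename\": $!\n";
    my @lines = <$fh>;                 # read everything before writing
    chomp @lines;
    seek($fh, 0, SEEK_SET) or die "Can't seek in \"$filename\": $!\n";
    print {$fh} "<$_>\n" for @lines;   # overwrite from the start of the file
    truncate($fh, tell($fh)) or die "Can't truncate \"$filename\": $!\n";
    close($fh) or die "Can't close \"$filename\": $!\n";
    return;
}
```

As with the other in-memory approaches, the whole file is held in memory between the read and the write.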

How do I read the contents of a small text file into a scalar in Perl?

I have a small text file that I'd like to read into a scalar variable exactly as it is in the file (preserving line separators and other whitespace).
The equivalent in Python would be something like
buffer = ""
try:
    file = open("fileName", 'rU')
    try:
        buffer += file.read()
    finally:
        file.close()
except IOError:
    buffer += "The file could not be opened."
This is for simply redisplaying the contents of the file on a web page, which is why my error message is going into my file buffer.
From the Perl Cookbook:
my $filename = 'file.txt';
open( FILE, '<', $filename ) or die 'Could not open file: ' . $!;
undef $/;
my $whole_file = <FILE>;
I would localize the changes though:
my $whole_file = '';
{
local $/;
$whole_file = <FILE>;
}
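To show why localizing matters, here is a small sketch (slurp_file is a name assumed for this example) in which $/ is undefined only while the sub runs, so code that reads line by line afterwards is unaffected.

```perl
use strict;
use warnings;

# Hypothetical helper (name chosen for this sketch): return the whole
# content of $filename as a single string.
sub slurp_file {
    my ($filename) = @_;
    open(my $fh, '<', $filename) or die 'Could not open file: ' . $!;
    local $/;                  # undefined only until this sub returns
    my $whole_file = <$fh>;    # one read returns the entire file
    close $fh;
    return $whole_file;
}
```

With a plain undef $/ instead, every later <FILE> read in the program would also slurp, which is rarely what you want.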
As an alternative to what Alex said, you can install the File::Slurp module (cpan -i File::Slurp from the command line) and use this:
use File::Slurp;
# Read data into a variable
my $buffer = read_file("fileName");
# or read data into an array
my @buffer = read_file("fileName");
Note that this dies (well... croaks, but that's just the proper way to call die from a module) on errors, so you may need to run this in an eval block to catch any errors.
If I don't have Slurp or Perl6::Slurp nearby, then I normally go with:
open my $fh, '<', 'file.txt' or die $!;
my $whole_file = do { local $/; <$fh> };
There is a discussion of the various ways to read a file here.
I don't have enough reputation to comment, so I apologize for making this another post.
@Harold Bamford: $/ should not be an obscure variable to a Perl programmer. A beginner may not know it, but he or she should learn it. The join method is a poor choice for the reasons stated in the article linked by hackingwords above. Here's the relevant quotation from the article:
That needlessly splits the input file into lines (join provides a list context to <FH>) and then joins up those lines again. The original coder of this idiom obviously never read perlvar and learned how to use $/ to allow scalar slurping.
You could do something like:
$data_file="somefile.txt";
open(DAT, $data_file);
@file_data = <DAT>;
close(DAT);
That'll give you the file contents in an array, that you can use for whatever you want, for example, if you wanted each individual line, you could do something like:
foreach $LINE (@file_data)
{
dosomethingwithline($LINE);
}
For a full usage example:
my $result;
$data_file = "somefile.txt";
my $opened = open(DAT, $data_file);
if (!$opened)
{
$result = "Error.";
}
else
{
@lines = <DAT>;
foreach $LINE (@lines)
{
$result .= $LINE;
}
close(DAT);
}
Then you can use $result however you need. Note: This code is untested, but it should give you an idea.
I'd tweak draegtun's answer like this, to make it do exactly what was being asked:
my $buffer;
if ( open my $fh, '<', 'fileName' ) {
$buffer = do { local $/; <$fh> };
close $fh;
} else {
$buffer = 'The file could not be opened.';
}
Just join all lines together into a string:
open(F, $file) or die $!;
my $content = join("", <F>);
close F;
(It was previously suggested to use join "\n" but that will add extra newlines. Each line already has a newline at its end when it's read.)