I'm trying Perl on Mac.
I have to read an RTF text file. The content of the file is "36" (without the double quotes). That's it, just two characters.
Here is the code I have to read it.
#!/usr/bin/perl
use strict;
use warnings;
my $file = "verInfo.rtf";
unless(open FILE, $file) {
# Die with error message
# if we can't open it.
die "\nUnable to open $file\n";
}
my $oldversion = <FILE>;
print "conent is $oldversion";
close FILE;
Remember, all I want is to read the value 36 from the file and store it as an integer in $oldversion.
But when I read the file and print it, it prints the following:
conent is {\rtf1\ansi\ansicpg1252\cocoartf1038\cocoasubrtf360
I'm not able to read 36.
You're not reading a text file, you're reading an RTF file. You made the file with TextEdit, right? TextEdit saves things as text/rtf rather than text/plain by default. If you want to save the file as plain text, use "Format | Make Plain Text" (AKA Shift-Cmd-T) before you save it; then you'll get a simple text file with just your "36" in it.
The text is there:
{\rtf1\ansi\ansicpg1252\cocoartf1038\cocoasubrtf360
^^
You have an RTF file. Use an RTF parser.
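If the file is re-saved as plain text as suggested above, reading the number could look something like the sketch below. The helper name read_version and the file name verInfo.txt are made up for illustration, and the code assumes the first line of the file holds nothing but the digits.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read the first line of a plain-text file and
# return it as an integer. Assumes the file was saved as plain text,
# not RTF.
sub read_version {
    my ($path) = @_;
    open my $fh, '<', $path or die "Unable to open $path: $!\n";
    my $line = <$fh>;
    close $fh;
    die "$path is empty\n" unless defined $line;
    $line =~ s/^\s+|\s+$//g;    # strip the newline and stray whitespace
    die "Expected digits in $path, got '$line'\n" unless $line =~ /^\d+$/;
    return $line + 0;           # force numeric context
}

# Only runs when the (assumed) file exists, so the sketch is safe to run as-is.
if (-e 'verInfo.txt') {
    my $oldversion = read_version('verInfo.txt');
    print "content is $oldversion\n";
}
```

The digit check is there so that an RTF header such as {\rtf1... fails loudly instead of being silently treated as 0.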
I have a CSV file that I'm parsing using Perl. The file is a BOM produced by Solidworks 2015 that was saved as an XLS file, then opened in Excel and saved as a CSV file.
There are cells that have line breaks. When I read a line with such a cell from the file, the line comes in with the line breaks. For example, one of the lines read looks like this:
74,,74,1,1,"SJ-TL303202-DET-074-
001",PDSI,"2.25"" DIA. X 8.00""",A2,513,1,
It reads in as a single line in Perl.
When I turn on Show All Characters in Notepad++, I can see the line breaks are caused by [CR][LF].
So I thought this would work to remove the line feeds:
$line =~ s/[\r\n]+//g;
but it does not.
You don't give much of a sample of your CSV data, but what you show is perfectly valid. A text field may contain newlines if you wish, as long as it is enclosed in double quotes.
The Text::CSV module will process it quite happily as long as you enable the binary option in the constructor call, and you may reformat the data as you wish before you write it back out again.
This program expects the path to the input file as a parameter on the command line, and it will write the modified data to STDOUT, which you can redirect on the command line, like this:
$ perl fix_csv.pl input.csv > output.csv
I've assumed that your data contains only 7-bit ASCII data, and it should work whether you're running it on a Windows system or on Linux.
use strict;
use warnings 'all';
my ($csv_file) = @ARGV;
use Text::CSV;
open my $fh, '<', $csv_file or die qq{Unable to open "$csv_file" for input: $!};
my $csv = Text::CSV->new( { binary => 1 } );
while ( my $row = $csv->getline( $fh ) ) {
tr/\r\n//d for @$row;
$csv->combine(@$row);
print $csv->string, "\n";
}
output
74,,74,1,1,SJ-TL303202-DET-074-001,PDSI,"2.25"" DIA. X 8.00""",A2,513,1,
I have the following code for extracting the text from HTML files and writing it to a text file. The HTML contains Kannada text (UTF-8). When the program runs I get a text file with text in it, but it is not in the proper format; the text is unreadable.
use utf8;
use HTML::FormatText;
my $string = HTML::FormatText->format_file(
'a.html',
leftmargin => 0, rightmargin => 50
);
open mm,">t1.txt";
print mm "$string";
Please help me: how should I handle file encodings while processing files like this?
If I understand you correctly, you want the output file to be UTF-8 encoded so that the characters from the Kannada language are encoded in the output correctly. Your code is probably trying (and failing) to encode incorrectly into ISO-8859-1 instead.
If so, then what you can do is make sure your file is opened with a UTF-8 encoding filter.
use HTML::FormatText;
open my $htmlfh, '<:encoding(UTF-8)', 'a.html' or die "cannot open a.html: $!";
my $content = do { local $/; <$htmlfh> }; # read all content from file
close $htmlfh;
my $string = HTML::FormatText->format_string(
$content,
leftmargin => 0, rightmargin => 50
);
open my $mm, '>:encoding(UTF-8)', 't1.txt' or die "cannot open t1.txt: $!";
print $mm $string;
For further reading, I recommend checking out these docs:
perlunitut
perlunifaq
perlunicode
A few other notes:
The use utf8 line only tells Perl that the source code of your script/library is itself UTF-8 encoded. It does not make any changes to how you read or write files.
Avoid using two-argument forms of open() like in your example. It may allow a malicious user to compromise your system in certain cases. (Though your usage in this example happens to be safe.)
When opening a file, you need to add an or die afterwards or failures to read or write the file will be silently ignored.
Update 3/12: I changed it to read the file in UTF-8 and send that to HTML::FormatText. If your a.html file is saved with a BOM character at the start, it may have done the right thing anyway, but this should make it always assume UTF-8 for the incoming file.
I am a perl novice, but have read the "Learning Perl" by Schwartz, foy and Phoenix and have a weak understanding of the language. I am still struggling, even after using the book and the web.
My goal is to be able to do the following:
Search a specific folder (current folder) and grab filenames with full path. Save filenames with complete path and current foldername.
Open a template file and insert the filenames with full path at a specific location (e.g. using substitution) as well as current foldername (in another location in the same text file, I have not gotten this far yet).
Save the new modified file to a new file in a specific location (current folder).
I have many files/folders that I want to process and plan to copy the perl program to each of these folders so the perl program can make new .
I have gotten so far ...:
use strict;
use warnings;
use Cwd;
use File::Spec;
use File::Basename;
my $current_dir = getcwd;
open SECONTROL_TEMPLATE, '<secontrol_template.txt' or die "Can't open SECONTROL_TEMPLATE: $!\n";
my @secontrol_template = <SECONTROL_TEMPLATE>;
close SECONTROL_TEMPLATE;
opendir(DIR, $current_dir) or die $!;
my @seq_files = grep {
/gz/
} readdir (DIR);
open FASTQFILENAMES, '> fastqfilenames.txt' or die "Can't open fastqfilenames.txt: $!\n";
my @fastqfiles;
foreach (@seq_files) {
$_ = File::Spec->catfile($current_dir, $_);
push(@fastqfiles,$_);
}
print FASTQFILENAMES @fastqfiles;
open (my ($fastqfilenames), "<", "fastqfilenames.txt") or die "Can't open fastqfilenames.txt: $!\n";
my @secontrol;
foreach (@secontrol_template) {
$_ =~ s/#/$fastqfilenames/eg;
push(@secontrol,$_);
}
open SECONTROL, '> secontrol.txt' or die "Can't open SECONTROL: $!\n";
print SECONTROL @secontrol;
close SECONTROL;
close FASTQFILENAMES;
My problem is that I cannot figure out how to use my list of files to replace the "#" in my template text file:
my @secontrol;
foreach (@secontrol_template) {
$_ =~ s/#/$fastqfilenames/eg;
push(@secontrol,$_);
}
The substitute function will not replace the "#" with the list of files listed in $fastqfilenames. I get the "#" replaced with GLOB(0x8ab1dc).
Am I doing this the wrong way? Should I not use substitute as this can not be done, and then rather insert the list of files ($fastqfilenames) in the template.txt file? Instead of the $fastqfilenames, can I substitute with content of file (e.g. s/A/{r file.txt ...). Any suggestions?
Cheers,
JamesT
EDIT:
This made it all better.
foreach (@secontrol_template) {
s/#/$fastqfilenames/g;
push @secontrol, $_;
}
And, as both answers pointed out, $fastqfilenames was a filehandle.
replaced this: open (my ($fastqfilenames), "<", "fastqfilenames.txt") or die "Can't open fastqfilenames.txt: $!\n";
with this:
my $fastqfilenames = join "\n", @fastqfiles;
made it all good. Thanks both of you.
$fastqfilenames is a filehandle. You have to read the information out of the filehandle before you can use it.
However, you have other problems.
You are printing all of the filenames to a file, then reading them back out of the file. This is not only a questionable design (why read from the file again, since you already have what you need in an array?), it also won't even work:
Perl buffers file I/O for performance reasons. The lines you have written to the file may not actually be there yet, because Perl is waiting until it has a large chunk of data saved up, to write it all at once.
You can override this buffering behavior in a few different ways (closing the file handle being the simplest if you are done writing to it), but as I said, there is no reason to reopen the file again and read from it anyway.
Also note, the /e option in a regex replacement evaluates the replacement as Perl code. This is not necessary in your case, so you should remove it.
Solution: Instead of reopening the file and reading it, just use the @fastqfiles variable you previously created when replacing in the template. It is not clear exactly what you mean by replacing # with the filenames.
Do you want to replace each # with a list of all filenames together? If so, you will probably need to join the filenames together in some way before doing the replacement.
Do you want to create a separate version of the template file for each filename? If so, you need an inner for loop that goes over each filename for each template. You will also need something other than a simple in-place replacement, because the replacement changes the original string the first time through. If you are on Perl 5.14 or later, you can use the /r option to replace non-destructively: push(@secontrol, s/#/$file_name/gr); Otherwise, copy to another variable before doing the replacement.
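Both readings of the question can be sketched roughly as follows. The sub names fill_with_all and fill_per_file, the sample template lines and the sample paths are all invented for illustration; "#" is the placeholder character from the question, and the /r flag needs Perl 5.14 or later.

```perl
use 5.014;    # the /r substitution flag needs Perl 5.14 or later
use strict;
use warnings;

# Option 1: replace every '#' with the whole list of filenames,
# joined with newlines. The template lines themselves are not modified.
sub fill_with_all {
    my ($template_lines, $files) = @_;
    my $joined = join "\n", @$files;
    return map { s/#/$joined/gr } @$template_lines;
}

# Option 2: produce one filled-in copy of the template per filename.
# Returns a list of array references, one per file.
sub fill_per_file {
    my ($template_lines, $files) = @_;
    my @copies;
    for my $file_name (@$files) {
        push @copies, [ map { s/#/$file_name/gr } @$template_lines ];
    }
    return @copies;
}

# Made-up sample data standing in for the real template and file list.
my @template = ("input = #\n", "mode = fast\n");
my @files    = ('/data/a.fastq.gz', '/data/b.fastq.gz');

print fill_with_all(\@template, \@files);
print @$_ for fill_per_file(\@template, \@files);
```

Because /r returns the modified copy and leaves $_ alone, the same template array can be reused for every filename.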
$_ =~ s/#/$fastqfilenames/eg;
$fastqfilenames is a file handle, not the file contents.
In any case, I recommend the use of Text::Template module in order to do this kind of work (file text substitution).
I downloaded a csv file using Net::FTP. When I look at this file in text editor or excel or even when I cut/paste it has line breaks and looks like this:
000000000G911|06
0000000000CDR|25|123
0000000000EGP|19
When I read the file in Perl it sees the entire text as one line like this:
000000000G911|060000000000CDR|25|1230000000000EGP|19
I have tried reading it using
tie @lines, 'Tie::File', "C:/Programs/myfile.csv", autochomp=>0 or die "Can't read file: $!\n";
foreach $l (@lines1)
{print "$l\n";
}
and
open FILE, "<$filename" or die $!;
my @lines=<FILE>;
foreach $l (@lines)
{print "$l\n";
}
close FILE;
The file has line breaks in a format that Perl is not recognizing because it is coming from a different operating system. The other programs are automatically detecting the different line break format, but Perl doesn't do that.
If you have Net::FTP perform the transfer in ASCII mode (e.g. $ftp->ascii to enable this mode), this should be taken care of and corrected for you.
Alternatively, you can figure out what is being used for line breaks and then set the special $/ variable to that value.
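If changing the transfer mode is not an option and you would rather not guess the separator for $/, another approach is to slurp the file and split on any of the common line endings. This is only a sketch; the helper name read_lines_any_eol is made up, and it assumes the file is small enough to hold in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines of a file regardless of whether
# it uses CRLF (Windows), LF (Unix) or bare CR (classic Mac) line endings.
sub read_lines_any_eol {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Can't read $path: $!\n";
    my $content = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    return () unless defined $content && length $content;
    # CRLF is listed first so it is not treated as two separate breaks.
    return split /\r\n|\r|\n/, $content;
}

# Usage: perl script.pl myfile.csv
if (@ARGV) {
    print "$_\n" for read_lines_any_eol($ARGV[0]);
}
```

If you already know the file uses bare carriage returns, setting local $/ = "\r"; before reading line by line gives the same result without loading everything at once.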
I'm writing out a CSV file using Perl. The data going into the CSV contains Unicode characters. I'm using the following to write the CSV out:
#OPEN THE FILE FOR WRITE
open(my $fh, ">:utf8", "rpt-".$datestring.".csv")
or die "cannot open < rpt.csv: $!";
That is writing the characters correctly inside the file but doesn't appear to be including the UTF8 Byte Order Mark. This in turn throws off my users trying to open the file in Excel. Is there a way to force the Byte Order Mark to be written?
I attempted it the following way:
print $fh "\x{EFBBBF}";
I ended up with gibberish at the top of the file. Any help would be greatly appreciated.
Try doing this:
print $fh chr(65279);
after opening the file.
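The reason this works: chr(65279) is the character U+FEFF, and the UTF-8 layer on the filehandle encodes it as the three bytes EF BB BF. Writing "\x{EFBBBF}" instead asks for a single character with code point 0xEFBBBF, not those three bytes, which is why it came out as gibberish. A minimal sketch, where the helper name open_csv_with_bom, the file name and the sample rows are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: open a CSV for writing as UTF-8 and emit a BOM
# first, so that programs such as Excel can detect the encoding.
sub open_csv_with_bom {
    my ($path) = @_;
    open my $fh, '>:encoding(UTF-8)', $path
        or die "cannot open $path for writing: $!";
    print {$fh} "\x{FEFF}";    # U+FEFF, the same character as chr(65279)
    return $fh;
}

# Made-up file name and data, just to show the helper in use.
my $fh = open_csv_with_bom('rpt-example.csv');
print {$fh} "name,city\n";
print {$fh} "Zo\x{EB},K\x{F8}benhavn\n";
close $fh or die "close failed: $!";
```

The BOM has to be the very first thing printed to the handle, before any header row.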