Parse UTF-8 HTML to ASCII CSV using Perl

First off, I am a little new to this, so the answer may be that it is up to the consumer. However, I have the following code:
#!/usr/bin/perl
open(RESPONSE, "response.xml") or die $!;
$result = "";
while (<RESPONSE>) {
    next unless $. > 1;
    $line = $_;
    $line =~ s{<html><body>}{};
    $line =~ s{</body></html>}{};
    $result .= $line;
}
print "$result";
exit 0;
But this still outputs \n and \r\n literally. I tried adding the following...
use Encode;
...
$final = decode_utf8($result);
print "$final";
But I still see the characters when I open the file generated by this shell command:
perl parse.pl > "outfile.csv"
So for example
<html><body>test,a\r\ntest2,b</body></html>
stays as test,a\r\ntest2,b in the CSV.
Thanks!

If you want to parse HTML or XML then use an HTML or XML parser. If you want to create a CSV file then use a CSV file module.
This problem has nothing at all to do with the differences between Unicode and ASCII.
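For what it's worth, the literal \r\n in the sample suggests the file contains the two-character escape sequences themselves rather than real line endings, so a plain substitution is enough to translate them. A minimal sketch (the payload string here is a made-up stand-in for response.xml):

```perl
use strict;
use warnings;

# Hypothetical stand-in for the payload from response.xml.
my $result = '<html><body>test,a\r\ntest2,b</body></html>';

# Strip the wrapper tags (fine for markup this trivial; reach for a
# real parser such as HTML::Parser for anything more involved).
$result =~ s{</?(?:html|body)>}{}g;

# Translate the literal backslash sequences into real newlines.
$result =~ s/\\r\\n|\\n/\n/g;

print $result, "\n";
```

After the second substitution the string holds an actual newline between the two records, which is what the CSV consumer expects.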

Perl Substitute String in Text File [duplicate]

I want to replace the word "blue" with "red" in all text files named as 1_classification.dat, 2_classification.dat and so on. I want to edit the same file so I tried the following code, but it does not work. Where am I going wrong?
@files = glob("*_classification.dat");
foreach my $file (@files)
{
open(IN,$file) or die $!;
<IN>;
while(<IN>)
{
$_ = '~s/blue/red/g';
print IN $file;
}
close(IN)
}
Use a one-liner:
$ perl -pi.bak -e 's/blue/red/g' *_classification.dat
Explanation
-p processes, then prints <> line by line
-i activates in-place editing. Files are backed up using the .bak extension
The substitution acts on the implicit variable $_, which holds the contents of the file line by line
None of the existing answers here have provided a complete example of how to do this from within a script (not a one-liner). Here is what I did:
rename($file, $file . '.bak');
open(IN, '<' . $file . '.bak') or die $!;
open(OUT, '>' . $file) or die $!;
while(<IN>)
{
$_ =~ s/blue/red/g;
print OUT $_;
}
close(IN);
close(OUT);
$_='~s/blue/red/g';
Uh, what??
Just
s/blue/red/g;
or, if you insist on using a variable (which is not necessary when using $_, but I just want to show the right syntax):
$_ =~ s/blue/red/g;
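A tiny self-contained demonstration of the difference (the sample strings are invented):

```perl
use strict;
use warnings;

my $broken = "blue sky";
$broken = '~s/blue/red/g';        # assigns the literal 13-character string

my $fixed = "blue sky";
$fixed =~ s/blue/red/g;           # performs the substitution in place

print "$fixed\n";                 # red sky
```

The broken version throws away the line read from the file and replaces it with the quoted text of the substitution itself, which is why nothing in the output ever changes.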
It can be done using a single line:
perl -pi.back -e 's/oldString/newString/g;' inputFileName
Note that oldString is processed as a regular expression.
If it contains any of the special characters {}[]()^$.|*+?, escape them with a backslash (for example, \[) unless you want them interpreted as part of the pattern.
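If the old string comes from a variable, Perl can also do the escaping for you with \Q...\E (quotemeta). A small sketch with made-up strings:

```perl
use strict;
use warnings;

my $old  = 'price($)';            # contains the metacharacters ( ) $
my $new  = 'cost';
my $line = 'total price($) due';

# \Q...\E quotes every metacharacter in $old, so it matches literally.
(my $out = $line) =~ s/\Q$old\E/$new/g;
print "$out\n";                   # total cost due
```

Without \Q...\E, the ( ) would be treated as a capture group and $ as an end-of-string anchor, and the match would fail.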

Perl LWP::Simple File.txt in Array not spaces

The file has no spaces, and I need to keep every field in a corresponding array element.
The content ends up in a variable; the real file is larger, but this sample is enough.
my $file = "http://www.ausa.com.ar/autopista/carteleria/plano/mime.txt";
&VPM4362=008000&VPM4381=FFFFFF&VPM4372=FFFFFF&VPM4391=008000&VPM4382=FFFF00&VPM4392=FF0000&VPM4182=FFFFFF&VPM4181=FFFF00&VPM4402=FFFFFF&VPM4401=FFFF00&VPM4412=008000&VPM4411=FF0000&VPM4422=FFFFFF&VPM4421=FFFFFF&VPM4322=FFFF00&CPMV001_1_Ico=112&CPMV001_1_1=AHORRE 15%&CPMV001_1_2=ADHIERASE AUPASS&CPMV001_1_3=AUPASS.COM.AR&CPMV002_1_Ico=0&CPMV002_1_1=ATENCION&CPMV002_1_2=RADARES&CPMV002_1_3=OPERANDO&CPMV003_1_Ico=0&CPMV003_1_1=ATENCION&CPMV003_1_2=RADARES&CPMV003_1_3=OPERANDO&CPMV004_1_Ico=255&CPMV004_1_1= &CPMV004_1_2=&CPMV004_1_3=&CPMV05 _1_Ico=0&CPMV05 _1_1=ATENCION&CPMV05 _1_2=RADARES&CPMV05 _1_3=OPERANDO&CPMV006_1_Ico=0&CPMV006_1_1=ATENCION&CPMV006_1_2=RADARES&CPMV006_1_3=OPERANDO&CPMV007_1_Ico=0&CPMV007_1_1=ATENCION&CPMV007_1_2=RADARES&CPMV007_1_3=OPERANDO&CPMV08 _1_Ico=0&CPMV08 _1_1=ATENCION&CPMV08
The code:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
my $file = "http://www.ausa.com.ar/autopista/carteleria/plano/mime.txt";
my $mime = get($file);
my @new;
foreach my $line ($mime) {
$line =~ s/&/ /g;
push(@new, $line);
}
print "$new[0]\n";
I tried it this way, but everything ends up together in a single array element. The output I need:
print "$new[1]\n";
VPM4381=FFFFFF
You don't want to substitute on &, you want to split on &.
@new = split /&/, $line;
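To illustrate with a shortened stand-in for the question's data (splitting each field again on = is an extra step beyond what was asked):

```perl
use strict;
use warnings;

my $line = 'VPM4362=008000&VPM4381=FFFFFF&VPM4372=FFFFFF';

# split breaks the string into a list of fields; s/&/ /g would only
# replace the delimiters inside one long string.
my @new = split /&/, $line;
print "$new[1]\n";                 # VPM4381=FFFFFF

# Each field can then be split once more into a key/value pair.
my %color = map { split /=/, $_, 2 } @new;
print "$color{VPM4381}\n";         # FFFFFF
```

The hash form is often more convenient than positional indexing when the fields are named key=value pairs, as they are here.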

Perl - customize log files for displaying only specific errors

I am new to Perl. We have a log file similar to the one below:
SQL> @D:\Luntbuild_Testing\ASCMPK\Files\MAIN\DATABASE\HOST\FILES\DDL\20120412_152632__1_CLTM_EVENT_ACC_ROLE_BLOCK.DDL
SQL> CREATE TABLE CLTM_EVENT_ACC_ROLE_BLOCK
2 (
3 EVENT_CODE VARCHAR2(4) ,
4 ACC_ROLE VARCHAR2(20)
5 )
6 ;
CREATE TABLE CLTM_EVENT_ACC_ROLE_BLOCK
*
ERROR at line 1:
ORA-00955: name is already used by an existing object
SQL> @D:\Luntbuild_Testing\ASCMPK\Files\MAIN\DATABASE\HOST\FILES\DDL\20120412_173845__2_CLTM_EVENT_ACC_ROLE_BLOCK.DDL
SQL> DROP TABLE CLTM_EVENT_ACC_ROLE_BLOCK;
Table dropped.
Now I need a script to display only the script paths that have ORA-XXX errors. It should print the SQL> @D:\Luntbuild_Testing\ path only when it is associated with an ORA-xxx error. I have tried the code below; can you please help me enhance it?
$file = 'c:\data.txt';
open(txt, $file);
while($line = <txt>) {
print "$line" if $line =~ /> @/; #here i want the output to display the path of the script with only ORA-xxx errors and ignore if there are no errors
print "$line" if $line =~ /ORA-/;
}
close(txt);
Instead of immediately printing the line when you see the > # marker, store it in a variable, and only print it out if and when you actually see an error:
$file = 'c:\data.txt';
open(txt, $file);
while($line = <txt>) {
$fn = $line if $line =~ /> @/; #here i want the output to display the path of the script with only ORA-xxx errors and ignore if there are no errors
print $fn, $line if $line =~ /ORA-/;
}
close(txt);
Also: it's good practice to write use strict; and use warnings; at the top of your script. use strict; forces you to explicitly name your local variables with my, which catches a lot of errors due to misspellings.
I would do something pretty similar to what you tried:
$file = 'c:\data.txt';
open(F, $file);
my $last_cmd = '';
while (<F>) {
$last_cmd = $_ if /^SQL\> \@D:/;
print $last_cmd if /^ORA-/;
}
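Here is a self-contained version of that idea, run against a trimmed-down copy of the log (the script paths are invented):

```perl
use strict;
use warnings;

my $log = <<'END';
SQL> @D:\scripts\create_table.sql
ERROR at line 1:
ORA-00955: name is already used by an existing object
SQL> @D:\scripts\drop_table.sql
Table dropped.
END

open my $fh, '<', \$log or die $!;
my $last_cmd = '';
my @hits;
while (my $line = <$fh>) {
    $last_cmd = $line if $line =~ /^SQL> \@/;     # remember the latest script
    push @hits, $last_cmd if $line =~ /ORA-\d+/;  # report it only on an error
}
print @hits;    # SQL> @D:\scripts\create_table.sql
```

Only the script that actually produced an ORA error is printed; the successful drop_table.sql run is silently skipped.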

This is a Perl script to grep for an entered pattern in docx files. Can anybody please point out my mistakes to make it work?

#!usr/bin/perl
#script: patternsearch.pl : Program to search for specific pattern inside the file.
print ("Prgramme name: $0 \n");
print ("Enter pattern: \n");
chop ($pattern = <STDIN>);
print ("Enter the absolute folder path: \n");
chop ($folder = <STDIN>);
print ("Enter file type: \n");
chop ($filetype = <STDIN>);
die ("pattern not entered??? \n") if ($pattern eq " ");
if ($filetype eq "txt") {
foreach $search (`find $folder -type f -name "*.$filetype"`) {
do `grep -H $pattern $search>> patternsearch.txt`;
}
}
else {
foreach $search (`find $folder -type f -name "*.$filetype"`) {
do `antiword $search | grep -H $pattern >> patternsearch.txt`;
}
}
print ("Taskcompleted \n");
*.docx files are not plain text or even actually XML -- they're zipped bundles of XML and other stuff. You can't grep for text in the zipped file. You could unzip a *.docx, and then grep in the contents -- although in my experience the XML is written without line breaks, such that each grep hit would be the entire contents of the document.
You really should
use strict;
use warnings;
at the start of every program, and declare all your variables with my at the point of first use. This applies especially when you are asking for help with your program, as it will quickly draw attention to a lot of simple mistakes.
You ought to use chomp instead of chop, as the latter just removes the last character from a string whereas the former checks to see if it is a line terminator (newline) before it removes it.
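The difference is easy to see on a filename with and without a trailing newline (the names are made up):

```perl
use strict;
use warnings;

my $with_newline = "report.docx\n";
my $plain        = "report.docx";

chomp $with_newline;   # removes the trailing newline: "report.docx"
chop  $plain;          # blindly drops the last character: "report.doc"
print "$with_newline\n$plain\n";
```

This is exactly the trap with chop on user input: if the user's line happens to lack a newline, chop eats a real character instead.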
The only problems I can find are that you don't chomp the output of your backtick find commands (you should write chomp $search before the grep or antiword commands) and that there is no do before a backticks command (to paraphrase Yoda). Remove the do from before grep and antiword and your program may work.
If you have any further problems, please explain what output you expect, and what you are getting.

How can I append characters to a line in a file?

I have a CSV file that was extracted from a ticketing system (I have no direct DB access to) and need to append a couple of columns to this from another database before creating reports off of it in Excel.
I'm using Perl to pull data out of the other database and would like to just append the additional columns to the end of each line as I process the file.
Is there a way to do this without having to basically create a new file? The basic structure is:
foreach $line (@lines) {
my ($vars here....) = split (',',$line);
## get additional fields
## append new column data to line
}
You could look at DBD::CSV to treat the file as if it were a database (which would also handle escaping special characters for you).
You can use Tie::File (in the Perl core since Perl 5.8) to modify a file in place:
#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;
my $file = shift;
tie my @lines, "Tie::File", $file
or die "could not open $file: $!\n";
for my $line (@lines) {
$line .= join ",", '', get_data();
}
sub get_data {
my $data = <DATA>;
chomp $data;
return split /-/, $data
}
__DATA__
1-2-3-4
5-6-7-8
You can also use in-place editing with the @ARGV/<> trick by setting $^I:
#!/usr/bin/perl
use strict;
use warnings;
$^I = ".bak";
while (my $line = <>) {
chomp $line;
$line .= join ",", '', get_data();
print "$line\n";
}
sub get_data {
my $data = <DATA>;
chomp $data;
return split /-/, $data
}
__DATA__
1-2-3-4
5-6-7-8
Despite any nice interfaces, you eventually have to read the file line by line. You might even have to do more than that if some quoted fields can have embedded newlines. Use something that knows about CSV to avoid those problems. Text::CSV_XS should save you most of the hassle of odd cases.
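Text::CSV_XS is the robust choice from CPAN. As a core-module-only illustration of the append step, Text::ParseWords can handle simple quoted fields; a sketch with made-up data:

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);   # core module; Text::CSV_XS is sturdier

my @lines = ('id,name', '1,"Smith, J."');
my @extra = ('status', 'open');

for my $i (0 .. $#lines) {
    # parse_line honors quoting, so the embedded comma is not a split point;
    # the second argument "keep" preserves the original quotes on output.
    my @fields = parse_line(',', 1, $lines[$i]);
    push @fields, $extra[$i];
    $lines[$i] = join ',', @fields;
}
print "$_\n" for @lines;
# id,name,status
# 1,"Smith, J.",open
```

Note the quoted field with an embedded comma survives the round trip, which a naive split(',', $line) would mangle.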
Consider using the -i option to edit <> files in-place.