how to remove comments syntax only? - perl

I want collect all tags in from XML file. How can I remove comments syntax only?
XML File:
<xml>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Holt</surname>
<given-names> Maurice<!--<xref ref-type="fn" rid="fnI_1"><sup>1</sup></xref>--></given-names>
</name>
</contrib>
</contrib-group>
</xml>
I need output as:
<xml>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Holt</surname>
<given-names> Maurice<xref ref-type="fn" rid="fnI_1"><sup>1</sup></xref></given-names>
</name>
</contrib>
</contrib-group>
</xml>
How can I remove comments.. without remove contains?
script:
#!/usr/bin/perl
use warnings;
use strict;
use XML::Twig;
open(my $output , '>', "split.xml") || die "can't open the Output $!\n";
my $xml = XML::Twig->new( twig_handlers => { xref => sub{comments => 'drop'} } );
$xml->parsefile("sample.xml");
$xml->print($output);
I can't do it... How can I remove <!-- --> only without remove contain?

#!/usr/bin/perl
use warnings;
use strict;
use XML::Twig;
open my $output , '>', 'split.xml' or die "Can't open: $!\n";
my $xml = XML::Twig->new( comments => 'process', # Turn on comment processing
twig_handlers =>
{ '#COMMENT' => \&uncomment }
);
$xml->parsefile('sample.xml');
$xml->print($output);
sub uncomment {
my ($xml, $comment) = #_;
$comment->set_outer_xml($comment->text); # Replace the comment with its contents.
}

Related

In perl I want change the tag content(firstclip) to secondclip in my existing xml file

<?xml version="1.0"?>
<root>
<Arguments>
<apkName>Player
<testUseCase>PlayVideo</testUseCase>
<id>1</id>
<clipName>firstclip</clipName>
</apkName>
</Arguments>
</root>
I tried this code: but its not working and its keeping player name in new tag with name with content and order also changing ..
use XML::Simple;
my $xml_file = "test.xml";
my $xml = XMLin(
$xml_file,
KeepRoot => 1,
ForceArray => 1,
)
$xml->{root}->[0]->{Arguments}->[0]->{apkName}->[0]->{clipName}->[0] = 'secondclip';
XMLout(
$xml,
XMLDecl =>1,
KeepRoot => 1 ,
NoAttr => 1,
OutputFile => $xml_file,
);
Don't use XML::Simple:
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
#parse your XML.
my $twig = XML::Twig -> new -> parsefile ( 'your_file.xml' );
#search for, and modify 'clipName' nodes containing the text 'firstclip'.
$_ -> set_text('secondclip') for $twig -> findnodes('//clipName[string()="firstclip"]');
$twig -> set_pretty_print('indented');
$twig -> print;
Although are you sure apkName actually looks like that? It seem odd that the 'close' tag would be where it is.
To rewrite your existing file - XML::Twig has a parsefile_inplace mechanism, but I'd suggest it's overly complicated for what you're trying to do, and instead you just want to
open ( my $output, '>', 'output.new.xml' ) or die $!;
print {$output} $twig -> sprint;

XML::Twig - Inplace editing

I'm trying to append timestamp at the end of xdp file. I am using XML::Twig. On running the script timestamp (<testing>4619314911532861</testing>) gets added at the end but the output is coming on STDOUT instead of testdata.xdp. What am I missing?
Code:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
my $twig=XML::Twig->new(pretty_print => 'indented');
my $file = 'testdata.xdp';
$twig->parsefile_inplace($file, '.bak');
my $root= $twig->root;
my #children= $root->children;
foreach my $child (#children){
my $eblg= new XML::Twig::Elt( 'testing', localtime);
$eblg->paste( 'last_child', $child);
}
$twig->flush;
The problem here is - parsefile_inplace works as a standalone thing. It replaces the source file immediately after the parse operation is complete.
So to use it like that, you need to do your 'work' within twig_handlers. If you do this, it'll parse/modify/overwrite.
E.g.:
sub insert_after_all {
my ( $twig, $element ) = #_;
my $eblg= new XML::Twig::Elt( 'testing', localtime);
$eblg->paste( 'last_child', $element);
$twig -> flush;
}
my $twig = XML::Twig->new(pretty_print => 'indented',
twig_handlers => { '_all_' => \&insert_after_all } );
my $file = 'testdata.xdp';
$twig->parsefile_inplace($file, '.bak');
Otherwise - rename the source, and print {$new_fh} $twig -> sprint;

How to get a POD section into a string?

How to get a POD section into a variable:
pod2usage(-verbose => 99, -sections => "DESCRIPTION"); # Goes on STDOUT
=head1 DESCRIPTION
A description
=cut
I just want to assign DESCRIPTION to a variable instead displaying it on STDOUT.
I am currently investigating this way. It is complicated and it doesn't work:
sub pod2scalar {
use File::Temp 'tempfile';
my ($fh, $filename) = tempfile(UNLINK => 1);
open OLDOUT, '>&STDOUT';
{
local *STDOUT;
open STDOUT, ">", $filename or warn "Can't open $filename: $!";
#pod2usage(#_); # Doesn't work... I don't know why...
print STDOUT "This is captured in \$str";
close STDOUT;
}
open STDOUT, '>&OLDOUT' or die "Can't restore stdout: $!";
close OLDOUT or die "Can't close OLDOUT: $!";
open $fh, "<", $filename or warn "Can't open $filename: $!";
my $str = do { local $/, <$fh> };
close $fh;
$str;
}
You can open a scalar variable for output by passing a reference to open in place of a file name
Then you can provide the file handle as the value of the -output option of pod2usage to get the data sent to your scalar variable
You will also want to set an -exitval of 'NOEXIT' so that you get a chance to use what you have captured
It would look like this
use Pod::Usage 'pod2usage';
sub pod2scalar {
open my $fh, '>', \my $text;
pod2usage(#_, -output => $fh, -exitval => 'NOEXIT');
$text;
}
There is an excellent perl library Capture::Tiny that simplifies saving stdout/stderr.
By default pod2usage exits the program, so you must specify -exitval => "noexit".
Here is a full working example:
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use Pod::Usage;
use Capture::Tiny ':all';
=head1 DESCRIPTION
A description
=cut
my $stdout = capture_merged {
pod2usage(-verbose => 99, -sections => "DESCRIPTION", -exitval => "noexit");
};
say "## Captured";
say $stdout;
__END__
This will output:
## Captured
Description:
A description

How to put data from CSV file to Perl hash

I have Perl and CSV file with something like:
"Name","Lastname"
"Homer","Simpsons"
"Ned","Flanders"
In this CSV file I have header in the first line and in other lines there are
data.
I want to convert this CSV file to such Perl data:
[
{
Lastname => "Simpsons",
Name => "Homer",
},
{
Lastname => "Flanders",
Name => "Ned",
},
]
I've written the function that users Text::CSV and doing what I need.
Here is the sample script:
#!/usr/bin/perl
use strict;
use warnings FATAL => 'all';
use 5.010;
use utf8;
use open qw(:std :utf8);
use Text::CSV;
sub read_csv {
my ($filename) = #_;
my #first_line;
my $result;
my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
open my $fh, "<:encoding(utf8)", $filename or die "$filename: $!";
while (my $row = $csv->getline ($fh)) {
if (not #first_line) {
#first_line = #{$row};
} else {
push #{$result}, { map { $first_line[$_] => $row->[$_] } 0..$#first_line };
}
}
close $fh;
return $result;
}
my $data = read_csv('sample.csv');
This works fine but this function I want to use in several scripts. I'm
greatly suprised that Text::CSV doesn't have this feature.
My question. What should I do to simplify solving such tasks in the future for
me and others?
Should I use some Perl module from CPAN, should I try to add this function to
Text::CSV, or something else?
Huh? Why so complicated? First, we fetch the header outside of the loop:
my $headers = $csv->getline($fh) or die "no header";
Assign these to be the column names:
$csv->column_names(#$headers);
Then, each call to getline_hr will provide a hashref:
while (my $hashref = $csv->getline_hr($fh)) {
push #$result, $hashref;
}
We can also use getline_hr_all:
$result = $csv->getline_hr_all($fh);
In other words, it ain't complex, most pieces are already provided by Text::CSV, and it can be done in very few lines.
Also, a module like this seems to already exist: Text::CSV::Slurp. (note: reverse dependency search through metacpan is awesome)
It's probably not a standard feature because different people will want their CSV files parsed into different data structures.
Why not create your own module that wraps this function?
package CSVRead;
use strict;
use warnings;
use 5.010;
use open qw(:std :utf8);
use Text::CSV;
require Exporter;
our #ISA = qw(Exporter);
our #EXPORT = qw(read_csv);
sub read_csv {
my ($filename) = #_;
my #first_line;
my $result;
my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
open my $fh, "<:encoding(utf8)", $filename or die "$filename: $!";
while (my $row = $csv->getline ($fh)) {
if (not #first_line) {
#first_line = #{$row};
} else {
push #{$result}, { map { $first_line[$_] => $row->[$_] } 0..$#first_line };
}
}
close $fh;
return $result;
}
Then, use it like this:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use Data::Dumper;
use CSVRead;
my $data = read_csv('sample.csv');
say Dumper $data;

How to keep data marked as UTF-8 after parsing with HTML::Tree?

I wrote a script, where i slurp in UTF-8 encoded HTML-file and then parse it to tree with HTML::Tree. Problem is that after parsing the strings are not marked as UTF-8 anymore.
As _utf8_on() is not recommended way to set flag on, i am looking for proper way.
My simplified code-example:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use utf8::all;
use autodie;
use HTML::Tree;
use Encode qw/is_utf8/;
my $file = shift;
my $tree;
if ($file) {
my $content = slurp_in( 'file' => $file );
$tree = html_tree('content' => $content);
} else {
die "no file";
}
my $title = $tree->look_down(_tag => 'title');
$title = $title->as_HTML('');
if ( is_utf8( $title ) ) {
say "OK: $title";
} else {
say "NOT OK: $title";
}
## SUBS
##
sub slurp_in {
my %v = #_;
open(my $fh, "<:utf8", $v{file}) || die "no $v{file}: $!";
local $/;
my $content = (<$fh>);
close $fh;
if ($content) {
return $content;
} else {
die "no content in $v{file} !";
}
}
sub html_tree {
my %v = #_;
my $tree = HTML::Tree->new();
$tree->utf8_mode(1); ## wrong call here, no such method, but no warnings on it!
$tree->parse( $v{content} );
if ($tree) {
return $tree;
} else {
die "no tree here";
}
}
Your code is overcomplicated, and you employ utf8::all and decode manually and call that strange method all at once. Rhetorically asking, what do you expect to achieve that way? I do not have the patience to find out the details what goes wrong and where, especially since you did not post any input with which your program fails to do the expected, so I drastically reduce it to a much simpler one. This works:
#!/usr/bin/env perl
use 5.010;
use strict;
use warnings FATAL => ':all';
use File::Slurp qw(read_file); # autodies on error
use HTML::Tree qw();
my $file = shift;
die 'no file' unless $file;
my $tree = HTML::Tree->new_from_content(
read_file($file, binmode => ':encoding(UTF-8)')
);
my $title = $tree->look_down(_tag => 'title');
$title->as_HTML(''); # returns a Perl string