XML::Twig - Inplace editing - perl

I'm trying to append a timestamp at the end of an XDP file using XML::Twig. On running the script, the timestamp (<testing>4619314911532861</testing>) does get added at the end, but the output goes to STDOUT instead of testdata.xdp. What am I missing?
Code:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new( pretty_print => 'indented' );
my $file = 'testdata.xdp';
$twig->parsefile_inplace( $file, '.bak' );

my $root     = $twig->root;
my @children = $root->children;
foreach my $child (@children) {
    my $eblg = XML::Twig::Elt->new( 'testing', localtime );
    $eblg->paste( 'last_child', $child );
}
$twig->flush;

The problem here is that parsefile_inplace works as a standalone operation: it replaces the source file immediately after the parse is complete.
So to use it like that, you need to do your 'work' within twig_handlers. If you do this, it will parse, modify, and overwrite.
E.g.:
sub insert_after_all {
    my ( $twig, $element ) = @_;
    my $eblg = XML::Twig::Elt->new( 'testing', localtime );
    $eblg->paste( 'last_child', $element );
    $twig->flush;
}

my $twig = XML::Twig->new(
    pretty_print  => 'indented',
    twig_handlers => { '_all_' => \&insert_after_all },
);
my $file = 'testdata.xdp';
$twig->parsefile_inplace( $file, '.bak' );
Otherwise, rename the source file yourself and write the modified twig with print {$new_fh} $twig->sprint;.
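For completeness, a minimal sketch of that rename-and-reprint alternative, assuming the same testdata.xdp file as above (the backup suffix and the modification step are placeholders):

use strict;
use warnings;
use File::Copy qw(move);
use XML::Twig;

my $file = 'testdata.xdp';
move( $file, "$file.bak" ) or die "cannot rename $file: $!";

my $twig = XML::Twig->new( pretty_print => 'indented' );
$twig->parsefile("$file.bak");
# ... modify the twig here, e.g. paste elements onto $twig->root ...

open my $new_fh, '>', $file or die "cannot write $file: $!";
print {$new_fh} $twig->sprint;
close $new_fh;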

Related

Using Perl I want to filter files with names matching a particular pattern using SFTP

I tried the snippet below, but it always returns the names of all files present in the current directory.
my %args = ( "user" => $user, "password" => $pass, "warn" => 0 );
$con = Net::SFTP->new( $server, %args );
@files = $con->ls( $dir, wanted => "*.csv" );
It seems there is no wanted option for the ls method. However, you can pass a callback. Here is an example using Text::Glob and a callback function wanted():
use strict;
use warnings;
use feature 'say';
use Net::SFTP;
use Text::Glob qw( match_glob );

my $con = Net::SFTP->new( 'host', user => 'user', password => 'pass' );
my $dir = '.';    # specify the directory
my @files;        # store filenames here

$con->ls( $dir, sub { wanted( $_[0], '*.csv' ) } );
say for @files;   # print the matched filenames

sub wanted {
    my ( $info, $pat ) = @_;
    my $filename = $info->{filename};
    if ( match_glob( $pat, $filename ) ) {
        push @files, $filename;
    }
}
You can do it this way with the Net::SFTP::Foreign module. I believe the same approach should work for Net::SFTP.
my @files = map { $_->{filename} } grep { $_->{filename} =~ /\.csv$/i } @{ $con->ls($dir) };
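A minimal self-contained sketch of that Net::SFTP::Foreign approach (host and credentials are placeholders); its ls method also accepts a wanted option taking a regex, if you prefer to filter within the call itself:

use strict;
use warnings;
use Net::SFTP::Foreign;

my $con = Net::SFTP::Foreign->new( 'host', user => 'user', password => 'pass' );
$con->die_on_error("SFTP connection failed");

my $dir   = '.';
my @files = map  { $_->{filename} }
            grep { $_->{filename} =~ /\.csv$/i }
            @{ $con->ls($dir) };
# alternatively: my $entries = $con->ls( $dir, wanted => qr/\.csv$/i );
print "$_\n" for @files;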

$twig->purge is giving empty file

I may be asking a basic question, but it's killing me. The following is my code snippet:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

my $twig = new XML::Twig( twig_handlers => { TRADE => \&TRADE } );
$twig->parsefile('1510.xml');
$twig->set_pretty_print('indented');
$twig->print_to_file('out.xml');

sub TRADE {
    my ( $twig, $TRADE ) = @_;
    # added delete in place of cut
    $TRADE->cut($TRADE)
        unless $TRADE->att('origin') eq "COMPUTER";
}
This is working as expected: it gives me all TRADE elements whose 'origin' equals 'COMPUTER'.
But I need to handle XML files of up to 1 GB, and in that case the script dies with a segmentation error because it consumes huge amounts of memory.
Hence, to resolve the issue, I am trying to use the 'purge' feature of XML::Twig, so I modified the code to:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

my $twig = new XML::Twig( twig_handlers => { TRADE => \&TRADE } );
$twig->parsefile('1510.xml');
$twig->set_pretty_print('indented');
$twig->print_to_file('out.xml');

sub TRADE {
    my ( $twig, $TRADE ) = @_;
    # added delete in place of cut
    $TRADE->cut($TRADE)
        unless $TRADE->att('origin') eq "COMPUTER";
    $twig->purge;
}
This gives me an empty file. I am trying to discard the twigs that have already been processed in order to use memory efficiently, but I don't know why it produces a blank output file.
Sample XML:
<TRADEEXT>
<TRADE origin = 'COMPUTER'/>
<TRADE origin = 'COMP'/>
<TRADE origin = 'COMPP'/>
</TRADEEXT>
output file:
<TRADEEXT>
<TRADE origin = 'COMPUTER'/>
</TRADEEXT>
You should probably use flush (to a filehandle) instead of purge: flush outputs the twig that has been parsed so far and frees the memory, while purge only frees the memory.
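A minimal sketch of what that flush-based variant could look like, assuming the same file names as the question (flush accepts an optional filehandle, and a final flush after parsing writes out the rest of the document):

use strict;
use warnings;
use XML::Twig;

open my $out, '>', 'out.xml' or die "cannot create out.xml: $!";

my $twig = XML::Twig->new(
    pretty_print  => 'indented',
    twig_handlers => {
        TRADE => sub {
            my ( $twig, $TRADE ) = @_;
            $TRADE->delete unless $TRADE->att('origin') eq 'COMPUTER';
            $twig->flush($out);    # print what has been parsed so far, then free it
        },
    },
);
$twig->parsefile('1510.xml');
$twig->flush($out);    # flush whatever remains after the last handler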
That said, if all you want is to remove the TRADE elements that don't have the proper attribute, you could do something like this:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

open( my $out, '>:utf8', "out.xml" ) or die "cannot create output file out.xml: $!";

my $twig = XML::Twig->new(
    pretty_print => 'indented',
    twig_roots   => {
        'TRADE[@origin != "COMPUTER"]' => sub { $_->delete; },
    },
    twig_print_outside_roots => $out,
)->parsefile('1510.xml');
This will leave some extra empty lines in the file, which you can remove later. The twig_roots handler is triggered for each element you need to remove and deletes it, while the twig_print_outside_roots option causes all other elements to be printed as-is.
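If those leftover blank lines bother you, one way to strip them afterwards (note this removes every whitespace-only line, which is harmless for this output):

perl -i -ne 'print unless /^\s*$/' out.xml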

Read ini files without section names

I want to make a configuration file which holds some objects, like this (where, of course, none of the parameters can be considered a primary key):
param1=abc
param2=ghj

param1=bcd
param2=hjk
; always the same parameters
This file could be read, let's say, with Config::IniFiles, because it maps directly onto an INI file like this:
[0]
param1=abc
param2=ghj
[1]
param1=bcd
param2=hjk
with, for example, something like
perl -pe 'print "[", ($section++ || 0), "]\n" if m/^\s*$/ || !$section'
And finish with
open my $fh, '<', "/path/to/config_file.ini" or die $!;
my $cfg = Config::IniFiles->new( -file => $fh );
# ... parse the sections starting from 0 here ...
But here I am asking myself some questions, because this is becoming quite complex...
(A) Is there a way to transform the $fh so that the Perl one-liner doesn't have to be executed BEFORE reading the file sequentially, i.e. to transform the file while Perl is actually reading it?
or
(B) Is there a module that can read my wonderful flat database, or something approaching it? I seem to remember that GNU coreutils does this kind of flat-file reading, but I cannot remember how.
You can create a simple subclass of Config::INI::Reader:
package MyReader;
use strict;
use warnings;
use base 'Config::INI::Reader';

sub new {
    my $class = shift;
    my $self  = $class->SUPER::new(@_);
    $self->{section} = 0;
    return $self;
}

sub starting_section { 0 }

sub can_ignore { 0 }

sub parse_section_header {
    my ( $self, $line ) = @_;
    return $line =~ /^\s*$/ ? ++$self->{section} : undef;
}

1;
With your input this gives:
% perl -MMyReader -MData::Dumper -e 'print Dumper( MyReader->read_file("cfg") )'
$VAR1 = {
          '1' => {
                   'param2' => 'hjk',
                   'param1' => 'bcd'
                 },
          '0' => {
                   'param2' => 'ghj',
                   'param1' => 'abc'
                 }
        };
You can use a variable reference instead of a file name to create a filehandle that reads from it:
use strict;
use warnings;
use autodie;
use Config::IniFiles;

my $config = "/path/to/config_file.ini";

my $content = do {
    local $/;
    open my $fh, "<", $config;
    "\n" . <$fh>;
};

# one-liner replacement
my $section = 0;
$content =~ s/^\s*$/ "\n[" . $section++ . "]" /mge;

open my $fh, '<', \$content;
my $cfg = Config::IniFiles->new( -file => $fh );
# ...
You can store the modified data in a real file or a string variable, but I suggest that you use paragraph mode by setting the input record separator $/ to the empty string, like this:
use strict;
use warnings;

{
    local $/ = '';    # Read file in "paragraphs"
    my $section = 0;
    while (<DATA>) {
        printf "[%d]\n", $section++;
        print;
    }
}

__DATA__
param1=abc
param2=ghj

param1=bcd
param2=hjk
output
[0]
param1=abc
param2=ghj
[1]
param1=bcd
param2=hjk
Update
If you read the file into a string, adding section identifiers as above, then you can read the result directly into a Config::IniFiles object using a string reference, for instance
my $config = Config::IniFiles->new(-file => \$modified_contents)
This example shows the tie interface, which results in a Perl hash containing the configuration information. I have used Data::Dump only to show the structure of the resulting hash.
use strict;
use warnings;
use Config::IniFiles;

my $config;
{
    open my $fh, '<', 'config_file.ini' or die "Couldn't open config file: $!";
    my $section = 0;
    local $/ = '';
    while (<$fh>) {
        $config .= sprintf "[%d]\n", $section++;
        $config .= $_;
    }
}

tie my %config, 'Config::IniFiles', -file => \$config;

use Data::Dump;
dd \%config;
output
{
    # tied Config::IniFiles
    "0" => {
        # tied Config::IniFiles::_section
        param1 => "abc",
        param2 => "ghj",
    },
    "1" => {
        # tied Config::IniFiles::_section
        param1 => "bcd",
        param2 => "hjk",
    },
}
You may want to perform operations on a stream of objects (as in PowerShell) instead of a stream of text, so:
use strict;
use warnings;
use English;

sub operation {
    # do something with objects
    ...
}

{
    local $INPUT_RECORD_SEPARATOR = '';    # objects are separated by empty lines
    while (<STDIN>) {
        # key=value pairs: keys cannot contain '=', which is the delimiter;
        # values are printable characters (one line only)
        my %object = ( m/^ ([^=]+) = ([[:print:]]*) $ /xmsg );
        operation( \%object );
    }
}
I also like the other answers.

How to put data from CSV file to Perl hash

I am using Perl and have a CSV file with something like:
"Name","Lastname"
"Homer","Simpsons"
"Ned","Flanders"
In this CSV file the first line is a header, and the other lines contain data.
I want to convert this CSV file to the following Perl data structure:
[
    {
        Lastname => "Simpsons",
        Name     => "Homer",
    },
    {
        Lastname => "Flanders",
        Name     => "Ned",
    },
]
I've written a function that uses Text::CSV and does what I need. Here is the sample script:
#!/usr/bin/perl
use strict;
use warnings FATAL => 'all';
use 5.010;
use utf8;
use open qw(:std :utf8);
use Text::CSV;

sub read_csv {
    my ($filename) = @_;
    my @first_line;
    my $result;
    my $csv = Text::CSV->new( { binary => 1, auto_diag => 1 } );
    open my $fh, "<:encoding(utf8)", $filename or die "$filename: $!";
    while ( my $row = $csv->getline($fh) ) {
        if ( not @first_line ) {
            @first_line = @{$row};
        }
        else {
            push @{$result}, { map { $first_line[$_] => $row->[$_] } 0 .. $#first_line };
        }
    }
    close $fh;
    return $result;
}
my $data = read_csv('sample.csv');
This works fine, but I want to use this function in several scripts, and I'm greatly surprised that Text::CSV doesn't have this feature.
My question: what should I do to simplify solving such tasks in the future, for me and for others?
Should I use some Perl module from CPAN, should I try to add this function to Text::CSV, or something else?
Huh? Why so complicated? First, we fetch the header outside of the loop:
my $headers = $csv->getline($fh) or die "no header";
Assign these to be the column names:
$csv->column_names(@$headers);
Then, each call to getline_hr will provide a hashref:
while ( my $hashref = $csv->getline_hr($fh) ) {
    push @$result, $hashref;
}
We can also use getline_hr_all:
$result = $csv->getline_hr_all($fh);
In other words, it ain't complex; most pieces are already provided by Text::CSV, and it can be done in very few lines.
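Assembled into one function, a minimal sketch of that approach might look like this (read_csv and the file name mirror the question; this is not the asker's original code):

use strict;
use warnings;
use Text::CSV;

sub read_csv {
    my ($filename) = @_;
    my $csv = Text::CSV->new( { binary => 1, auto_diag => 1 } );
    open my $fh, "<:encoding(utf8)", $filename or die "$filename: $!";
    my $headers = $csv->getline($fh) or die "no header";
    $csv->column_names(@$headers);
    my $result = $csv->getline_hr_all($fh);    # arrayref of hashrefs
    close $fh;
    return $result;
}

my $data = read_csv('sample.csv');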
Also, a module like this seems to already exist: Text::CSV::Slurp. (Note: reverse dependency search through MetaCPAN is awesome.)
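For reference, a short sketch of how Text::CSV::Slurp is typically used, based on its documented load method (the file name is illustrative):

use strict;
use warnings;
use Text::CSV::Slurp;

# load returns an arrayref of hashrefs keyed by the header row;
# extra options are passed through to Text::CSV
my $data = Text::CSV::Slurp->load( file => 'sample.csv', binary => 1 );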
It's probably not a standard feature because different people will want their CSV files parsed into different data structures.
Why not create your own module that wraps this function?
package CSVRead;
use strict;
use warnings;
use 5.010;
use open qw(:std :utf8);
use Text::CSV;

require Exporter;
our @ISA    = qw(Exporter);
our @EXPORT = qw(read_csv);

sub read_csv {
    my ($filename) = @_;
    my @first_line;
    my $result;
    my $csv = Text::CSV->new( { binary => 1, auto_diag => 1 } );
    open my $fh, "<:encoding(utf8)", $filename or die "$filename: $!";
    while ( my $row = $csv->getline($fh) ) {
        if ( not @first_line ) {
            @first_line = @{$row};
        }
        else {
            push @{$result}, { map { $first_line[$_] => $row->[$_] } 0 .. $#first_line };
        }
    }
    close $fh;
    return $result;
}

1;    # a module must return a true value
Then, use it like this:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use Data::Dumper;
use CSVRead;
my $data = read_csv('sample.csv');
say Dumper $data;

How to add one more node's information to an XML file

I have written a script that creates one XML file from multiple files. The script looks like this:
#!/usr/bin/perl
use warnings;
use strict;
use XML::LibXML;
use Carp;
use File::Find;
use File::Spec::Functions qw( canonpath );
use XML::LibXML::Reader;
use Digest::MD5 'md5';

if ( @ARGV == 0 ) {
    push @ARGV, "c:/main/work";
    warn "Using default path $ARGV[0]\n Usage: $0 path ...\n";
}

open( my $allxml, '>', "all_xml_contents.combined.xml" )
    or die "can't open output xml file for writing: $!\n";
print $allxml '<?xml version="1.0" encoding="UTF-8"?>',
    "\n<Shiporder xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">\n";

my %shipto_md5;
find(
    sub {
        return unless ( /(_stc\.xml)$/ and -f );
        extract_information();
        return;
    },
    @ARGV
);
print $allxml "</Shiporder>\n";

sub extract_information {
    my $path = $_;
    if ( my $reader = XML::LibXML::Reader->new( location => $path ) ) {
        while ( $reader->nextElement('data') ) {
            my $elem = $reader->readOuterXml();
            my $md5  = md5($elem);
            print $allxml $reader->readOuterXml() unless ( $shipto_md5{$md5}++ );
        }
    }
    return;
}
With the above script I am extracting the data node information from all the XML files and storing it in a new XML file. But I have one more node, starting with "details", whose information I also need to extract and add to the file. I tried this:
$reader->nextElement( 'details' );
my $information = $reader->readOuterXml();
I added this in the while loop, but how can I assign or print this data into the same file ($allxml)? Please help me with this problem.
After your suggestion I tried the following, and it gives an error:
#!/usr/bin/perl
use warnings;
use strict;
use XML::LibXML;
use Carp;
use File::Find;
use File::Spec::Functions qw( canonpath );
use XML::LibXML::Reader;

if ( @ARGV == 0 ) {
    push @ARGV, "V:/main/work";
    warn "Using default path $ARGV[0]\n Usage: $0 path ...\n";
}

my $libXML = new XML::LibXML;
my $outputDom = $libXML->parse_string('<?xml version="1.0" encoding="UTF-8"?>',
    '<Shiporder xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">');
my $shiporder = $outputDom->documentElement;

find(
    sub {
        return unless ( /(_stc\.xml)$/ and -f );
        extract_information();
        return;
    },
    @ARGV
);

sub extract_information {
    my $path = $_;
    if ( my @inputDom = XML::LibXML->load_xml( location => $path ) ) {
        $inputDom->findnodes('//data || //deatils');
        foreach (@$inputDom) {
            $shiporder->appendChild( $_->parentNode->cloneNode(1) );
        }
        $outputDom->toFile("allfiles.xml");
    }
}
The error is something like ":1: parser error : Start tag expected, '<' not found". Can you help me with the script? I am very new to Perl.
You would do a lot better if you used what XML::LibXML and related modules give you; it is a very large and comprehensive module and allows you to do a lot in few lines.
You can use the parser to start a new DOM document using parse_string, storing the root node using documentElement. From there, use parse_file to load up each of your input files, then findnodes on the input files to extract the nodes you want to clone. Then append a clone of your input nodes to the output document, and finally use the toFile method to write out your output.
Something like:
my $libXML = XML::LibXML->new;
my $outputDom = $libXML->parse_string(
    '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
    . '<Shiporder xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>' . "\n"
);
my $shiporder = $outputDom->documentElement;
...
my $inputDom = $libXML->parse_file('some_file_name');
my @nodes = $inputDom->findnodes('//data | //details');    # use a more suitable xpath
foreach my $node (@nodes) {
    $shiporder->appendChild( $node->parentNode->cloneNode(1) );    # if you want the parent too...
}
...
$outputDom->toFile('some_output_file');
You will have to allow for namespaces and whatnot, but this gives one approach to start with.
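On the namespace point: if the input documents declare a default namespace, plain //data style XPath expressions won't match, and you would register a prefix through XML::LibXML::XPathContext. A sketch continuing the code above (the namespace URI is a placeholder):

use XML::LibXML::XPathContext;

my $xpc = XML::LibXML::XPathContext->new($inputDom);
$xpc->registerNs( x => 'http://example.com/ns' );
foreach my $node ( $xpc->findnodes('//x:data | //x:details') ) {
    $shiporder->appendChild( $node->cloneNode(1) );
}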