Perl code to delete a multi-line XML node

I have an xml file test.xml
<many-nested-roots>
<foo>
<bar>
</bar>
</foo>
<other-random-nodes></other-random-nodes>
<foo>
<bar>
<foobar>
</foobar>
</bar>
</foo>
<!-- multiple such blocks not in any particular order -->
</many-nested-roots>
I need to delete xml node <foo><bar></bar></foo> but not <foo><bar><foobar></foobar></bar></foo>.
EDIT: The node <foo><bar></bar></foo> occurs multiple times and randomly across a heavily nested XML.
What I tried which doesn't work:
perl -ne 'print unless /^\s*<foo>\n\s*<bar>\n\s*<\/bar>\n\s*<\/foo>/' test.xml
^ This doesn't match, because with -n each line is read separately, so the pattern never sees a newline
perl -ne 'print unless /<foo>/ ... /<\/foo>/' test.xml
^ This deletes all the tags including <foobar>
perl -ne 'print unless /<foo>.*?<bar>.*?<\/bar>.*?<\/foo>/s' test.xml
^ I used /s to let . match newlines, but with -n the file is still read one line at a time, so it still doesn't work.

A one-liner using XML::LibXML and an XPath expression to find the nodes to delete:
perl -MXML::LibXML -E '
my $dom = XML::LibXML->load_xml(location => $ARGV[0]);
$_->unbindNode for $dom->documentElement->find("//foo/bar[count(*)=0]/..")->@*;
print $dom->serialize' test.xml
(Postfix dereference needs perl 5.24 or newer; old versions of perl need @{$dom->...} instead of $dom->...->@*)
Or using xmlstarlet (not perl, but very handy for scripted manipulation of XML files):
xmlstarlet ed -d '//foo/bar[count(*)=0]/..' test.xml

As @Shawn and @tshiono said, you should not use a regex but an XML parser. Here is an example (not a one-liner) using Mojo::DOM, provided by Mojolicious:
#!/usr/bin/env perl
use Mojo::Base -strict, -signatures;
use Mojo::DOM;
use Mojo::File 'path';
my $dom = Mojo::DOM->new->xml(1)->parse(path($ARGV[0])->slurp);
$dom->find("foo bar")->each(
sub ($el, $i) { $el->parent->remove if $el->children->size == 0 }
);
print $dom;
If you save it as myscript.pl you can call it with ./myscript.pl test.xml.

Would you please try:
perl -0777 -pe 's#<foo>\s*<bar>\s*</bar>\s*</foo>\s*##g' test.xml
The -0777 option tells perl to slurp the whole file at once, so the regex can match across lines.
Please note it is not recommended to parse XML files with regexes. Perl has several modules for handling XML, such as XML::LibXML or XML::Twig (XML::Simple is discouraged by its own documentation these days). As a standalone program, xmlstarlet is a nice tool for manipulating XML files.

Related

How to escape ',' sign at command line?

When I run this (source):
`perl -d:DebugHooks::DbInteract='s;e [\@DB::stack\,\@DB::goto_frames]' -e '1'`
The module gets two arguments:
s;e [\@DB::goto_frames\
\@DB::stack];
But I want to get only one:
s;e [\@DB::goto_frames,\@DB::stack];
How to escape ',' sign?
It's got to be your module splitting on the comma, not perl or the shell. Running the following under bash, I get only a single argument in @ARGV:
$ perl -w -E 'say join "\n", ("---", @ARGV, "---")' 's;e [\@DB::stack\,\@DB::goto_frames]'
---
s;e [\@DB::stack\,\@DB::goto_frames]
---
Edit:
I stand corrected. It is being split on commas by perl, presumably to allow passing multiple arguments to a module, as I proved to myself by creating a module at ./Devel/DbInteract.pm containing:
package Devel::DbInteract;
use strict;
use warnings;
use 5.010;
sub import {
say 'Received ' . scalar @_ . ' params:';
say join "\n", @_;
}
1;
and running the command:
$ PERL5LIB=. perl -d:DbInteract='s;e [\@DB::stack,\@DB::goto_frames]' -e ''
Received 3 params:
Devel::DbInteract
s;e [\@DB::stack
\@DB::goto_frames]
Judging by the source linked in the asker's answer, there does not appear to be any provision for escaping values or any other way to prevent this splitting. Your options, then, would be to either work around it by joining the params back together or submitting a patch to the perl source to add an allowance for escaping.
Perl does not care about escaping: https://github.com/Perl/perl5/blob/blead/perl.c#L3240-L3264
the -d flag simply injects the following as line zero of the script:
use Devel::DbInteract split(/,/,q{s;e [\@DB::stack\,\@DB::goto_frames]});
Some people advised me a simpler approach than patching perl: just use => instead of the comma, which in my case gives:
`perl -d:DebugHooks::DbInteract='s;e [\@DB::stack => \@DB::goto_frames]' -e '1'`

Perl how do I add text to specifically second line of file?

Trying to do this sort of thing in perl:
sed '1 a<!-- $Header: $\n Purpose: system generated file -->' -i test.xml
Add the header block and purpose to line #2 in the file for xml, shell scripts, etc...
Don't want to do this either:
`sed '1 a<!-- \$Header: \$\n Purpose: system generated file -->' -i test.xml`
But realize it's an option if absolutely necessary.
If you only pass one file, you can use the following:
perl -i -pe'
$_ .= "<!-- \$Header: \$\n Purpose: system generated file -->\n" if $. == 1;
' test.xml
If you might pass multiple files, you'll need to add a line so that $. is reset at the end of each file.
perl -i -pe'
$_ .= "<!-- \$Header: \$\n Purpose: system generated file -->\n" if $. == 1;
close(ARGV) if eof;
' test*.xml
(Note: eof() with parentheses means something different than plain eof. How awful is that!)
I added line breaks for readability. The commands will work as is, but you can remove the line breaks if you so desire.
Try this way:
perl -ple '++$i == 2 and $_ = "changed" # change $_ as you want' in.txt > out.txt

Perl script+Remove all lines except the one's starting with "Only in"

I have a file which has many lines. A few of those start with "Only in". So I want to retain only the lines which start with "Only in" and delete the rest. Can someone please tell me what regex command I could use.
Something like " %s/!(Only in)/rm -rf that line " Sorry for mixing up verilog, unix and perl here. Can someone help me with the same
perl -ne 'print if /^Only in /' file
See also grep.
perl -i -ne '/^Only in/ and print' file
And in a script :
use strict;
use warnings;
$^I = "";   # enable in-place editing (what the -i switch sets); use ".bak" to keep backups
while (<>) {
    /^Only in/ and print;
}
Both solutions edit the file in place, without you having to create temporary files yourself.

How can I convert Perl one-liners into complete scripts?

I find a lot of Perl one-liners online. Sometimes I want to convert these one-liners into a script, because otherwise I'll forget the syntax of the one-liner.
For example, I'm using the following command (from nagios.com):
tail -f /var/log/nagios/nagios.log | perl -pe 's/(\d+)/localtime($1)/e'
I'd like to replace it with something like this:
tail -f /var/log/nagios/nagios.log | ~/bin/nagiostime.pl
However, I can't figure out the best way to quickly throw this stuff into a script. Does anyone have a quick way to throw these one-liners into a Bash or Perl script?
You can convert any Perl one-liner into a full script by passing it through the B::Deparse compiler backend that generates Perl source code:
perl -MO=Deparse -pe 's/(\d+)/localtime($1)/e'
outputs:
LINE: while (defined($_ = <ARGV>)) {
s/(\d+)/localtime($1);/e;
}
continue {
print $_;
}
The advantage of this approach over decoding the command line flags manually is that this is exactly the way Perl interprets your script, so there is no guesswork. B::Deparse is a core module, so there is nothing to install.
Take a look at perlrun:
-p
causes Perl to assume the following loop around your program, which makes it iterate over filename arguments somewhat like sed:
LINE:
while (<>) {
... # your program goes here
} continue {
print or die "-p destination: $!\n";
}
If a file named by an argument cannot be opened for some reason, Perl warns you about it, and moves on to the next file. Note that the lines are printed automatically. An error occurring during printing is treated as fatal. To suppress printing use the -n switch. A -p overrides a -n switch.
BEGIN and END blocks may be used to capture control before or after the implicit loop, just as in awk.
So, simply take this chunk of code, insert your code at the "# your program goes here" line, and voilà, your script is ready!
Thus, it would be:
#!/usr/bin/perl -w
use strict; # or use 5.012 if you've got newer perls
while (<>) {
s/(\d+)/localtime($1)/e
} continue {
print or die "-p destination: $!\n";
}
That one's really easy to store in a script!
#! /usr/bin/perl -p
s/(\d+)/localtime($1)/e
The -e option introduces Perl code to be executed—which you might think of as a script on the command line—so drop it and stick the code in the body. Leave -p in the shebang (#!) line.
In general, it's safest to stick to at most one "clump" of options in the shebang line. If you need more, you could always throw their equivalents inside a BEGIN {} block.
Don't forget chmod +x ~/bin/nagiostime.pl
You could get a little fancier and embed the tail part too:
#! /usr/bin/perl -p
BEGIN {
die "Usage: $0 [ nagios-log ]\n" if #ARGV > 1;
my $log = #ARGV ? shift : "/var/log/nagios/nagios.log";
#ARGV = ("tail -f '$log' |");
}
s/(\d+)/localtime($1)/e
This works because the code written for you by -p uses Perl's "magic" (2-argument) open that processes pipes specially.
With no arguments, it transforms nagios.log, but you can also specify a different log file, e.g.,
$ ~/bin/nagiostime.pl /tmp/other-nagios.log
Robert has the "real" answer above, but it's not very practical. The -p switch does a bit of magic, and other options have even more magic (e.g. check out the logic behind the -i flag). In practice, I'd simply just make a bash alias/function to wrap around the oneliner, rather than convert it to a script.
Alternatively, here's your oneliner as a script: :)
#!/usr/bin/bash
# takes any number of arguments: the filenames to pipe to the perl filter
tail -f "$@" | perl -pe 's/(\d+)/localtime($1)/e'
There are some good answers here if you want to keep the one-liner-turned-script around and possibly even expand upon it, but the simplest thing that could possibly work is just:
#!/usr/bin/perl -p
s/(\d+)/localtime($1)/e
Perl will recognize parameters on the hashbang line of the script, so instead of writing out the loop in full, you can just continue to do the implicit loop with -p.
But writing the loop explicitly and using -w and "use strict;" are good if you plan to use it as a starting point for writing a longer script.
#!/usr/bin/env perl
while(<>) {
s/(\d+)/localtime($1)/e;
print;
}
The while loop and the print is what -p does automatically for you.

How do I run a Perl script on multiple input files with the same extension?

How do I run a Perl script on multiple input files with the same extension?
perl scriptname.pl file.aspx
I'm looking to have it run for all aspx files in the current directory
Thanks!
In your Perl file,
my @files = <*.aspx>;
for my $file (@files) {
    # do something.
}
The <*.aspx> is called a glob.
you can pass those files to perl with wildcard
in your script
foreach (@ARGV) {
print "file: $_\n";
# open your file here...
#..do something
# close your file
}
on command line
$ perl myscript.pl *.aspx
You can use glob explicitly, to use shell parameters without depending to much on the shell behaviour.
for my $file ( map { glob($_) } @ARGV ) {
print $file, "\n";
};
You may need to guard against duplicate file names when more than one parameter expands to the same file.
For a simple one-liner with -n or -p, you want
perl -i~ -pe 's/foo/bar/' *.aspx
The -i~ says to modify each target file in place, and leave the original as a backup with an ~ suffix added to the file name. (Omit the suffix to not leave a backup. But if you are still learning or experimenting, that's a bad idea; removing the backups when you're done is a much smaller hassle than restoring the originals from a backup if you mess something up.)
If your Perl code is too complex for a one-liner (or just useful enough to be reusable) obviously replace -e '# your code here' with scriptname.pl ... though then maybe refactor scriptname.pl so that it accepts a list of file name arguments, and simply use scriptname.pl *.aspx to run it on all *.aspx files in the current directory.
If you need to recurse a directory structure and find all files with a particular naming pattern, the find utility is useful.
find . -name '*.aspx' -exec perl -pi~ -e 's/foo/bar/' {} +
If your find does not support -exec ... + try with -exec ... \; though it will be slower and launch more processes (one per file you find instead of as few as possible to process all the files).
To only scan some directories, replace . (which names the current directory) with a space-separated list of the directories to examine, or even use find to find the directories themselves (and then perhaps explore -execdir for doing something in each directory that find selects with your complex, intricate, business-critical, maybe secret list of find option predicates).
Maybe also explore find2perl to do this directory recursion natively in Perl.
If you are on a Linux machine, you could try something like this (looping over the glob directly, rather than parsing ls output):
for i in /tmp/*.aspx; do perl scriptname.pl "$i"; done
For example, to handle perl scriptname.pl *.aspx *.asp:
On Linux the shell expands wildcards, so the perl can simply be
for (@ARGV) {
operation($_); # do something with each file
}
Windows doesn't expand wildcards, so expand them inside perl as follows; the for loop then processes each file in the same way as above:
for (map { glob } @ARGV) {
operation($_); # do something with each file
}
For example, this will print the expanded list under Windows
print "$_\n" for(map {glob} #ARGV);
You can also pass the path where you have your aspx files and read them one by one.
#!/usr/bin/perl -w
use strict;
my $path = shift;
my @files = glob "$path/*.aspx";   # use the given path instead of shelling out to ls
foreach my $file (@files) {
    # do something...
}