XML removeChild, but the output has a blank row - perl

This is a simple removeChild test. Although the XML element is removed, an empty blank row stays in its place. How come? By the way, my source XML file does have indents, but even when I remove them I get the same result. So what's the point of being able to removeChild a row if it still retains a blank space?
Is there a way to re-format the resulting XML lines prior to outputting them to the file?
foreach my $XYZ ($doc->findnodes("//EE1"))
{
my $library = $XYZ->parentNode;
$library->removeChild($XYZ);
}
print {$FH} $doc->toString(0);
RESULT IN OUTPUT FILE:
<?xml version="1.0"?>
<TopTag>
<AA1>ZNY</AA1>
<AA2>111</AA2>
<BB1>
<CC1>ZNY</CC1>
<CC2>
<DD1>
<-----blank line remains
<EE2>2000</EE2>
</DD1>
<DD1>
<-----blank line remains
<EE2>5000</EE2>
</DD1>
</CC2>
</BB1>
<AA1>ZNY2</AA1>
<AA2>2</AA2>
</TopTag>

The empty lines come from text nodes containing whitespace. Consider the following document:
<doc>
<elem/>
</doc>
The doc element contains the following nodes:
A text node containing a newline and two space characters.
An element node with the elem element.
Another text node containing a newline.
If the elem element is removed, only the two text nodes remain, which results in a blank line.
The easiest way to reindent an XML::LibXML document is to use the module XML::LibXML::PrettyPrint. Also have a look at this question.

Remove newlines that are preceded by another newline (positive look-behind assertion) and optional whitespace in between.
my $output = $doc->toString(0);
$output =~ s/(?<=\n)\s*\n//g;
print {$FH} $output;
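As a quick check of the substitution above, here is a minimal sketch that applies it to a string shaped like the output in the question. The sub name strip_blank_lines and the sample string are illustrative assumptions, not part of the original answer. Note that the regex works on the serialized text, so it would also drop blank lines inside text content or CDATA sections.

```perl
use strict;
use warnings;

# Hypothetical helper: drop whitespace-only lines that follow a newline.
sub strip_blank_lines {
    my ($text) = @_;
    $text =~ s/(?<=\n)\s*\n//g;
    return $text;
}

# Sample mimicking what is left after <EE1> is removed from <DD1>.
my $sample = "<DD1>\n    \n    <EE2>2000</EE2>\n</DD1>\n";
print strip_blank_lines($sample);
```

The indentation of the following line is kept because only the whitespace up to and including the last newline of the blank run is removed.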

You can use the no_blanks option for load_xml(). It discards whitespace-only text nodes while parsing your XML:
use XML::LibXML;
my $dom = XML::LibXML->load_xml(location => $filename, no_blanks => 1);
Since the whitespace is removed, you need to then use:
print $dom->toString(1);
to get nicely formatted output.
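Putting the pieces together, a minimal sketch might look like the following. It assumes XML::LibXML is installed; the sub name remove_elements and the shortened sample document are made up for illustration, and the XML is parsed from a string instead of a file to keep the example self-contained.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: parse without blank text nodes, remove every node
# matched by $xpath, and return the document re-indented by toString(1).
sub remove_elements {
    my ($xml, $xpath) = @_;
    my $doc = XML::LibXML->load_xml(string => $xml, no_blanks => 1);
    for my $node ($doc->findnodes($xpath)) {
        $node->parentNode->removeChild($node);
    }
    return $doc->toString(1);
}

my $xml = <<'XML';
<TopTag>
  <DD1>
    <EE1>1000</EE1>
    <EE2>2000</EE2>
  </DD1>
</TopTag>
XML

print remove_elements($xml, '//EE1');
```

Because the whitespace text nodes never enter the tree, nothing is left behind when EE1 is removed, and the serializer is free to add its own indentation.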

Related

My Perl variable to variable substitutions do not work

I have a substitution to make in a Perl script, which I do not seem to get working. I have a string in a text file which has the form:
T+30H
The string T+30H has to be written in many files and has to change from file to file. It is two digits and sometimes three digits. First I define the variable:
my $wrffcr=qr{T+\d+H};
After reading the file containing the string, I have the following substitution command (starting with the file capture)
@scrptlines=<$NCLSCRPT>;
foreach $scrptlines (@scrptlines) {
$scrptlines =~ s/$wrffcr/T+$fcrange2[$jj]H/g;
}
$fcrange2[$jj] is defined and I confirm its value by printing its value just before the above 4 lines of code.
print "$fcrange2[$jj]\n";
When I run my script, nothing changes for this particular substitution. I suspect it is to do with the way I define the string to be substituted.
I will appreciate any assistance.
Zilore Mumba
Watch out for the first + in my $wrffcr=qr{T+\d+H};. It'll make it match 1 or more Ts, not T followed by a +. You probably want
my $wrffcr=qr{T\+\d+H};
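A small sketch of the difference between the two patterns. The variable names $loose and $strict and the sub set_forecast_hour are illustrative assumptions. Note that the + only needs escaping in the pattern; in the replacement part it is already a literal character.

```perl
use strict;
use warnings;

my $loose  = qr{T+\d+H};     # one or more "T", then digits, then "H"
my $strict = qr{T\+\d+H};    # literal "T+", then digits, then "H"

# Hypothetical helper: replace the hour count in every T+<digits>H token.
sub set_forecast_hour {
    my ($line, $hours) = @_;
    $line =~ s/$strict/T+${hours}H/g;
    return $line;
}

print "loose matches\n"  if 'T+30H' =~ $loose;     # not printed
print "strict matches\n" if 'T+30H' =~ $strict;    # printed
print set_forecast_hour('title = "T+30H"', 120), "\n";
```

Writing ${hours} with braces keeps the variable name clearly separated from the H that follows it.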

Search for a match, then take the number after the match and add 4 to it: is it possible in Perl?

I am a beginner in Perl and I need to modify a txt file by keeping all the previous data in it and only adding 4 to every number related to a specific tag (< COMPRESSED-SIZE >). The file has many lines and tags and looks like the sample below. I need to find all the < COMPRESSED-SIZE > tags and add 4 to the number specified near the tag:
< SOURCE-START-ADDRESS >01< /SOURCE-START-ADDRESS >
< COMPRESSED-SIZE >132219< /COMPRESSED-SIZE >
< UNCOMPRESSED-SIZE >229376< /UNCOMPRESSED-SIZE >
So I guess I need to do something like this: search for the keyword (match), store the number 132219 in a variable, add the second number (4) to it, and replace 132219 with the result 132223. The rest of the file must remain unchanged; only the numbers related to this tag must change. I cannot search for the number instead of the tag because the number could change while the tag will always remain the same. I also need to find all the tags with this name and replace the numbers near them by adding 4 to them. I already have code for finding something after a keyword, because I also needed to search for another tag, but that script does something else: it adds a number in front of a keyword. I think I could use this code for what I need, but I do not know how to make the calculation and keep the rest of the file intact, or whether it is possible in Perl.
while (my $row = <$inputFileHandler>)
{
if(index($row,$Data_Pattern) != -1){
my $extract = substr($row, index($row,$Data_Pattern) + length($Data_Pattern), length($row));
my $counter_insert = sprintf "%08d", $counter;
my $spaces = " " x index($row,$Data_Pattern);
$data_to_send ="what i need to add" . $extract;
print {$outs} $spaces . $Data_Pattern . $data_to_send;
$counter = $counter + 1;
}
else
{
print {$outs} $row;
next;
}
}
Maybe you could help me with a block of code for my needs, $Data_Pattern is the match. Thank you very much!
This is a classic one-liner Perl task. Basically you would do something like
$ perl -i.bak -pe's/^< COMPRESSED-SIZE >\K(\d+)/$1 + 4/e' yourfile.txt
Which will in essence copy and replace your file with a new, edited file. This can be very dangerous, especially if you are a Perl newbie. The -i switch is here used with the .bak extension which saves a backup in yourfile.txt.bak. This does not make this operation safe, however, as running the command twice will overwrite the backup.
It is advisable to make a separate backup of the target file before using this command.
-i.bak edit "in-place", the file is overwritten, a backup of the original is created with extension .bak.
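-p wraps the code in a loop that reads each line of the files named as arguments, runs the code on it, and prints the line back.
s/ // the substitution operator, which is applied to all lines of the file.
^ inside the regex looks for beginning of line.
\K keep everything matched so far (to the left of \K) out of the replaced text, so only the digits are replaced.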
(\d+) capture () 1 or more digits \d+ and store them in $1
/e treat the right hand side of the substitution operator as an expression and use the result as the replacement string. In this case it will increase your number and return the sum.
The long version of this command is
while (<>) {
    s/^< COMPRESSED-SIZE >\K(\d+)/$1 + 4/e;
    print;    # write every line back, changed or not
}
Which can be placed in a file and run with the -i switch.
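To try the substitution without touching any file, here is a minimal sketch that applies it to an in-memory string. The sub name bump_compressed_size is hypothetical, and the /m and /g flags are added so that ^ matches at every line start within the string. Like the one-liner, it assumes the tag sits at the very beginning of the line; if the real file indents its tags, the ^ anchor would have to be relaxed.

```perl
use strict;
use warnings;

# Hypothetical helper: add $delta to every number that directly follows
# the tag at the start of a line; all other text is returned unchanged.
sub bump_compressed_size {
    my ($text, $delta) = @_;
    $text =~ s/^< COMPRESSED-SIZE >\K(\d+)/$1 + $delta/gme;
    return $text;
}

my $input = <<'TXT';
< SOURCE-START-ADDRESS >01< /SOURCE-START-ADDRESS >
< COMPRESSED-SIZE >132219< /COMPRESSED-SIZE >
< UNCOMPRESSED-SIZE >229376< /UNCOMPRESSED-SIZE >
TXT

print bump_compressed_size($input, 4);
```

Only the COMPRESSED-SIZE line changes (132219 becomes 132223); the UNCOMPRESSED-SIZE line is left alone because it does not start with the exact tag text.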

String variable position being overwritten in write-host

If I run the below code, $SRN can be written as output or added to another variable, but trying to include either another variable or regular text causes it to be overwritten from the beginning of the line. I'm assuming it's something to do with how I'm assigning $autocode and $SRN initially but can't tell what it's trying to do.
# Load the property set to allow us to get to the email body.
$item.load($psPropertySet) # Load the data.
$bod = ($item.Body.Text -creplace '(?m)^\s*\r?\n','') -split "\n" # Get the body text, remove blank lines, split on line breaks to create an array (otherwise it is a single string).
$autocode = $bod[4].split('-')[2] # Get line 4 (should be Title), split on dash, look for 3rd element, this should contain our automation code.
$SRN = $bod[1] -replace 'ID: ','' # Get line 2 (should be ID), find and replace the preceding text.
# Skip processing if autocode does not match our list of handled ones.
if ($autocode -cin $autocodes)
{
write-host "$SRN $autocode"
write-host "$autocode $SRN"
write-host "$SRN test"
$var = "$SRN $autocode"
$var
}
The code results in this, you can see if $SRN isn't at the start of the line it is fine. Unsure where the extra spaces come from either:
KRNE8385
KRNE SR1788385
test8385
KRNE8385
I would expect to see this:
SR1788385 KRNE
KRNE SR1788385
SR1788385 test
SR1788385 KRNE
LotPings pointed me down the right path, both variables still had either "0D" or "\r" in them. My regex replace was only getting rid of them on blank lines, and I split the array on "\n" only. Changing line 3 in the original code to the below appears to have resolved the issue. First time seeing Format-Hex, but it appears to be excellent for troubleshooting such issues.
$bod = ($item.Body.Text -creplace '(?m)^\s*\r?\n','') -split "\r\n"

Using perl to split over multiple lines

I'm trying to write a perl script to process a log4net log file. The fields in the log file are separated by a semi-colon. My end goal is to capture each field and populate a mysql table.
Usually I have lines that look a little like this (all on a single line)
DEBUG;2017-06-13T03:56:38,316-05:00;2017-06-13 08:56:38,316;79ab0b95-7f58-44a8-a2c6-1f8feba1d72d;(null);WorkerStartup 1;"Starting services."
These are easy to process. I can simply split by semicolon to get the information I need.
However, occasionally the "message" field at the end may span several lines, especially if there is a stack trace. I would want to capture the entire message as a single column. I cannot use split by semicolon, because the next lines would typically look like:
at some.random.classname
at another.classname
...
Can someone give some tips how to solve this problem?
The following solution relies on a complete record containing an even number of " characters: while the count in the buffered text is odd ($p=~y/"//%2), the message is still open and the next line is appended. This test may be replaced by any other condition that indicates the field is not complete.
The number of columns is fixed at 7 by the split limit (to allow ; in the last field). The split may be replaced, for example, by @array = map {s/;$//r} $p=~/\G(?:"[^"]*"|[^;])*;/g; (the /r flag needs Perl 5.14 or later).
The file is read line by line, but a record is only passed to the process sub once it is complete. The $p variable stores the text read so far, and the last record is processed in the END block.
perl -ne '
sub process {
@array = split /;/,$p,7;
# do something with array
print ((join "\n---\n", @array),"\n");
}
if ($p=~y/"//%2) {
$p.=$_;
next;
}
process;
$p=$_;
END{process}
' < logfile.txt
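The same buffering idea can be written as a standalone sub, which is easier to test than a one-liner. This is only a sketch under the same assumption as above (a multi-line message is wrapped in double quotes); the sub name join_records and the sample lines are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: group physical lines into logical records.
# A record is treated as complete once it holds an even number of
# double quotes, i.e. every opened quoted message has been closed.
sub join_records {
    my @lines = @_;
    my @records;
    my $pending = '';
    for my $line (@lines) {
        $pending .= $line;
        my $quotes = ($pending =~ tr/"//);    # count quotes so far
        if ($quotes % 2 == 0) {
            push @records, $pending;
            $pending = '';
        }
    }
    push @records, $pending if length $pending;    # unterminated tail
    return @records;
}

my @lines = (
    qq{DEBUG;t1;t2;id1;(null);WorkerStartup 1;"Starting services."\n},
    qq{ERROR;t1;t2;id2;(null);Worker 2;"Failed\n},
    qq{   at some.random.classname\n},
    qq{   at another.classname"\n},
);

for my $record (join_records(@lines)) {
    my @fields = split /;/, $record, 7;    # limit keeps ; inside the message
    print scalar(@fields), " fields, message: $fields[6]";
}
```

Each record can then be split with a limit of 7, so the whole message, stack trace included, ends up in the last field.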

Perl get array count so I can start a foreach loop at a certain array element

I have a file that I am reading in. I'm using Perl to reformat the date. It is a comma-separated file. In one of the files, I know that element.0 is a zipcode and element.1 is a counter. Each row can have 1-n number of cities. I need to know the number of elements from element.3 to the end of the line so that I can reformat them properly. I wanted to use a foreach loop starting at element.3 to format the other elements into a single string.
Any help would be appreciated. Basically I am trying to read in a csv file and create a cpp file that can then be compiled on another platform as a plug-in for that platform.
Best Regards
Michael Gould
You can do something like this to get the fields from a line:
my @fields = split /,/, $line;
To access all elements from 3 to the end, do this:
foreach my $city (@fields[3..$#fields])
{
#do stuff
}
(Note, based on your question I assume you are using zero-based indexing. Thus "element 3" is the 4th element).
Alternatively, consider Text::CSV to read your CSV file, especially if you have things like escaped delimiters.
Well if your line is being read into an array, you can get the number of elements in the array by evaluating it in scalar context, for example
my $elems = @line;
or to be really sure
my $elems = scalar(@line);
Although in that case the scalar is redundant, it's handy for forcing scalar context where it would otherwise be list context. You can also find the index of the last element of the array with $#line.
After that, if you want to get everything from element 3 onwards you can use an array slice:
my @threeonwards = @line[3 .. $#line];
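A short end-to-end sketch of the split, slice, and count steps. The sub name cities_from_line and the sample row are made up for illustration (the question does not say what element 2 holds, so "GA" is just a placeholder), and a plain split is only suitable when fields never contain quoted commas.

```perl
use strict;
use warnings;

# Hypothetical helper: return every field from index 3 to the end.
# The slice is empty when the line has fewer than four fields.
sub cities_from_line {
    my ($line) = @_;
    chomp $line;
    my @fields = split /,/, $line;
    return @fields[3 .. $#fields];
}

my @cities = cities_from_line("30301,2,GA,Atlanta,Decatur\n");
my $count  = @cities;    # array in scalar context gives the element count
print "$count cities: ", join(' | ', @cities), "\n";
```

For the sample row this prints "2 cities: Atlanta | Decatur".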