How can I replace the following string:
<value>-myValue</value>
<value>1234</value>
so that it becomes:
<value>-myValue</value>
<value>0</value>
Please take into account that there is a line break.
Script
sed -e '/<value>-myValue</,/<value>/{ /<value>[0-9][0-9]*</ s/[0-9][0-9]*/0/; }' data
From a line containing <value>-myValue< to the next line containing <value>, if the line matches <value>XX< where XX is a string of one or more digits, replace the string of digits with 0.
Input
This is not something to change
<value>-myValue</value>
<value>1234</value>
<value>myValue</value>
<value>1234</value>
nonsense
<value>-myValue</value>
<value>abcd</value>
<value>-myValue</value>
<value>4321</value>
stuffing
Output
This is not something to change
<value>-myValue</value>
<value>0</value>
<value>myValue</value>
<value>1234</value>
nonsense
<value>-myValue</value>
<value>abcd</value>
<value>-myValue</value>
<value>0</value>
stuffing
If this is XML, TLP is right that an XML parser would be superior. Continuing on with your sed approach, however, consider:
$ sed '/<value>-myValue/ {N; s/<value>[[:digit:]]\+/<value>0/}' file
<value>-myValue</value>
<value>0</value>
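The `\+` in the command above is a GNU extension to basic regular expressions, so it may not be accepted by other sed implementations. A rough portable sketch of the same N-based idea follows; the function name zero_after_marker is hypothetical, and it assumes two -myValue lines never follow each other directly (N would swallow the second one).

```shell
# Hypothetical helper: reads stdin, writes the edited text to stdout.
zero_after_marker() {
  sed '/<value>-myValue</{
    N
    s/<value>[0-9][0-9]*</<value>0</
  }'
}

# N appends the next line to the pattern space, so the substitution
# can see the marker line and the numeric line together.
printf '<value>-myValue</value>\n<value>1234</value>\n' | zero_after_marker
```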
You can possibly simplify this a bit, depending on what criteria you specifically want to use:
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
my $twig = XML::Twig->new( 'pretty_print' => 'indented_a' )->parse( \*DATA );
foreach my $value ( $twig->findnodes('//value') ) {
    if (    $value->trimmed_text eq '-myValue'
        and $value->next_sibling('value')
        and $value->next_sibling('value')->text =~ m/^\d+$/ )
    {
        $value->next_sibling('value')->set_text('1234');
    }
}
$twig->print;
__DATA__
<root>
<value>-myValue</value>
<value>0</value>
</root>
This outputs:
<root>
<value>-myValue</value>
<value>1234</value>
</root>
It parses your XML.
Looks for all nodes with a tag of value.
Checks that the node's text is -myValue and that it has a following value sibling.
Checks that the sibling is purely numeric, i.e. it matches the regex ^\d+$.
Replaces the content of that sibling with 1234.
This works regardless of how the XML is formatted. That is the fundamental problem with processing XML as text: there are many entirely valid ways to write the same document that are semantically identical.
Related
On a Unix system I have an input text file containing long multi-line strings.
I now want to remove line breaks only between two patterns (<remarks> and </remarks>), which can be on different lines.
Example input file:
text1 text2 <remarks> text3
text4 text5 </remarks> text6 text7 text8
Result output for the above input file should be:
text1 text2 <remarks> text3 text4 text5 </remarks> text6 text7 text8
I would prefer to use sed or Perl or maybe awk to do the job.
I do not see a solution as the newlines can happen "randomly" and text is just some log messages.
Here is a more detailed look of the input file I need to process. It does not contain a root XML section, but for testing I might just add one manually. Also there may be many "remarks" sections.
Here is a snippet of the input file (the full file is very long). The filename is test:
<paymentTerm keyValue1="8" objectType="PAYMENTTERM" />
<paymentType keyValue1="20" objectType="PAYMENTTYPE" />
<priceList keyValue1="1" objectType="PRICELIST" />
<remarks>Zollanmeldung ab 250 €
Lager Adresse:
Hessen-Ring 456
D-64546 Mörfelden-Walldorff
eine Stunde vor Ankunft melden unter Mobile
Neu Spedition
A&R Logistics Group
Storkenburgstrasse 99
D-62546 Mörfelden-Walldorf
www.asp.de</remarks>
<salesPersons>
<PERSON keyValue1="2" keyValue2="SALESEMPLOYEE" objectType="PERSON" />
</salesPersons>
<shippingType keyValue1="5" objectType="SHIPPINGTYPE" />
As stated above I want to remove the linebreaks ONLY between the patterns "remarks" and "/remarks".
I tried the Perl XML Parsing suggested by borodin like this:
use strict;
use warnings 'all';
use XML::Twig;
use constant XML_FILE => 'test';
my $twig = XML::Twig->new(
    twig_handlers => {
        remarks => sub { $_->set_text($_->trimmed_text) }
    }
);
$twig->parsefile(XML_FILE);
$twig->print;
It works, but prints everything on one line.
With GNU awk for multi-char RS:
$ awk -v RS='</?remarks>' -v ORS= '!(NR%2){gsub(/\n/,OFS)} {print $0 RT}' file
text1 text2 <remarks> text3 text4 text5 </remarks> text6 text7 text8
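The command above relies on gawk's regex RS and its RT variable. Where only a POSIX awk is available, something along these lines might do as a rough equivalent. The function name join_remarks is hypothetical, and the sketch assumes that no line has text after a `<remarks>` section closes and another one opens on that same line.

```shell
# Hypothetical helper: joins the lines between <remarks> and </remarks>
# with a single space and leaves every other line break alone.
join_remarks() {
  awk '
    /<remarks>/   { inside = 1 }   # an opening tag starts a section
    /<\/remarks>/ { inside = 0 }   # a closing tag ends it
    { printf "%s%s", $0, (inside ? " " : "\n") }
  ' "$@"
}

printf 'text1 text2 <remarks> text3\ntext4 text5 </remarks> text6 text7 text8\n' | join_remarks
```

Like any text-based approach, this treats the tags as plain patterns and knows nothing about XML structure.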
XML can represent the same information in many different ways, so processing it with regular expressions is always a risk. It is far better to use a proper XML module to process XML data. This solution uses XML::Twig.
In the constructor for the $twig object you can specify a callback that is called automatically every time a given XML element is encountered in the input.
The trimmed_text method removes leading and trailing whitespace from the text of the element, and turns any internal whitespace sequences, including line breaks, into a single space. That is exactly what you are asking for here, so a call to set_text is all that is necessary to update the string.
The file to be processed is specified by the XML_FILE constant, which you should modify to give the path to your own data file. The modified XML is printed to STDOUT.
use strict;
use warnings 'all';
use open qw/ :std :encoding(UTF-8) /;
use XML::Twig;
use constant XML_FILE => 'remarks.xml';
my $twig = XML::Twig->new(
    keep_spaces   => 1,
    twig_handlers => {
        remarks => sub { $_->set_text($_->trimmed_text) }
    }
);
$twig->parsefile(XML_FILE);
$twig->print;
input
Your sample data is invalid XML, so I have edited it to look like this. I have added the XML declaration that you said in a comment that you had, and I have added a root element <data>
<?xml version="1.0" encoding="UTF-8"?>
<data>
<paymentTerm keyValue1="8" objectType="PAYMENTTERM" />
<paymentType keyValue1="20" objectType="PAYMENTTYPE" />
<priceList keyValue1="1" objectType="PRICELIST" />
<remarks>Zollanmeldung ab 250 €
Lager Adresse:
Hessen-Ring 456
D-64546 Mörfelden-Walldorff
eine Stunde vor Ankunft melden unter Mobile
Neu Spedition
A&amp;R Logistics Group
Storkenburgstrasse 99
D-62546 Mörfelden-Walldorf
www.asp.de</remarks>
<salesPersons>
<PERSON keyValue1="2" keyValue2="SALESEMPLOYEE" objectType="PERSON" />
</salesPersons>
<shippingType keyValue1="5" objectType="SHIPPINGTYPE" />
</data>
output
<?xml version="1.0" encoding="UTF-8"?>
<data>
<paymentTerm keyValue1="8" objectType="PAYMENTTERM"/>
<paymentType keyValue1="20" objectType="PAYMENTTYPE"/>
<priceList keyValue1="1" objectType="PRICELIST"/>
<remarks>Zollanmeldung ab 250 € Lager Adresse: Hessen-Ring 456 D-64546 Mörfelden-Walldorff eine Stunde vor Ankunft melden unter Mobile Neu Spedition A&amp;R Logistics Group Storkenburgstrasse 99 D-62546 Mörfelden-Walldorf www.asp.de</remarks>
<salesPersons>
<PERSON keyValue1="2" keyValue2="SALESEMPLOYEE" objectType="PERSON"/>
</salesPersons>
<shippingType keyValue1="5" objectType="SHIPPINGTYPE"/>
</data>
my $string = "<name>
POWERDOWN_SUPPORT
</name>
<bool>
<value> true </value>
</bool>";
if ($string=~ s/POWERDOWN_SUPPORT<\/name><bool><value>.*?<\/value>/<false>/ims) {
print "$string\n";
}
How do I get the replacement to work?
Expected output:
<name>
POWERDOWN_SUPPORT
</name>
<bool>
<value> false </value>
</bool>
This may work:
s/(POWERDOWN_SUPPORT\s*?<\/name>\s*?<bool>\s*?<value>).*?(<\/value>)/$1 false $2/s
Use an XML-aware tool; parsing XML with regular expressions is hard and error-prone.
For example, xsh, a wrapper around XML::LibXML:
open file.xml ;
my $v = //name[normalize-space(.)='POWERDOWN_SUPPORT']/following-sibling::bool/value[normalize-space(.)='true'];
set $v/text() 'false' ;
save :b ;
I've done this before with sed, but I would like to know how this is done in Perl.
I have a CSV file that looks something like this:
IFB, Northpole, Alaska, 907-555-5555,,,,
Walmart, Fairbanks, Alaska,,,,,
Chicken, Anchorage, Alaska, 907-555-5555,,,,,
Beef, Somewhere,,,,, Over the Rainbow,,,907-555-5555
etc...
What I need is to remove the extra commas, but not the single commas separating the values. This means I need to remove any run of more than one comma, no matter where it occurs in the file.
Desired output:
IFB, Northpole, Alaska, 907-555-5555
Walmart, Fairbanks, Alaska
Chicken, Anchorage, Alaska, 907-555-5555
Beef, Somewhere, Over the Rainbow, 907-555-5555
Here is an example:
#!/usr/bin/env perl
use strict;
use warnings;
while (<DATA>) {
    s/,+[\t ]*/, /g;    # Collapse each run of commas (and following blanks) into ", "
    s/, $//;            # Remove the ", " left over at the end of the line
    print;
}
__DATA__
IFB, Northpole, Alaska, 907-555-5555,,,,
Walmart, Fairbanks, Alaska,,,,,
Chicken, Anchorage, Alaska, 907-555-5555,,,,,
Beef, Somewhere,,,,,Over the Rainbow,,,907-555-5555
etc...
You can also use Perl as sed with perl -pe:
perl -pe 's/,+[\t ]*/, /g; s/, $//' myfile
The XML file looks like below:
<?xml version="1.0"?>
<application name="pos">
<artifact id="123" type="war" cycle="Release7-Sprint1">
<jira/>
<jenkins/>
<deployment/>
<scm>
<transaction id="1234" user="">
<file name=""/>
<file name=""/>
</transaction>
</scm>
</artifact>
</application>
My code is below. It works fine when I hard-code the value of the name attribute instead of using a variable. I am referring to the line my $query = '//application[@name="pos"]';
my $manifestDoc = $manifestFileParser->parse_file($manifestFile);
my $changeLogDoc = $changeLogParser->parse_file($changeLogXml );
my $changeLogRoot = $changeLogDoc->getDocumentElement;
#my $applicationName = pos;
my $query = '//application[@name="pos"]';
my $applicationNode = $manifestDoc->findnodes($query);
my $artifactNode = $manifestDoc->createElement('artifact');
$artifactNode->setAttribute("id",$artifactID);
$artifactNode->setAttribute("type",$artifactType);
$artifactNode->setAttribute("cycle",$releaseCycle);
$applicationNode->[0]->appendChild($artifactNode);
But if I change $query to use a variable ($applicationName) instead of a hard-coded attribute value, it fails with the following error:
Can't call method "appendChild" on an undefined value at updateManifest.pl line
Modified code:
my $applicationName = "pos" ;
my $query = '//application[@name="$applicationName"]';
Not sure what is wrong. Anything to do with quotes?
Any help is much appreciated.
The expression '//application[@name="$applicationName"]' is a literal string with exactly those contents: single quotes do not interpolate variables. If you used double quotes, both @name and $applicationName would be interpolated.
You have three options:
Use double quotes, but escape the #:
qq(//application[\@name="$applicationName"])
The qq operator is equivalent to double quotes "…" but can have arbitrary delimiters, which avoids the need to escape the " inside the string.
Concatenate the string:
'//application[@name="' . $applicationName . '"]'
This often has a tendency to be hard to read. I'd avoid this solution.
Use a sprintf pattern to build the string from a template:
sprintf '//application[@name="%s"]', $applicationName
If you don't already know printf patterns, you can find them documented in perldoc -f sprintf.
Let's say I have the following lines:
1:a:b:c
2:d:e:f
3:a:b
4:a:b:c:d:e:f
How can I edit this with sed (or Perl) so that it reads:
1a1b1c
2d2e2f
3a3b
4a4b4c4d4e4f
I have done this with awk like this:
awk -F':' '{gsub(/:/, $1, $0); print $0}'
but it takes ages to complete, so I am looking for something faster.
'Tis a tad tricky, but it can be done with sed (assuming the file data contains the sample input):
$ sed '/^\(.\):/{
s//\1/
: retry
s/^\(.\)\([^:]*\):/\1\2\1/
t retry
}' data
1a1b1c
2d2e2f
3a3b
4a4b4c4d4e4f
$
You may be able to flatten the script to one line with semicolons; sed on Mac OS X is a bit cranky at times and objected to some parts, so it is split out into 6 lines. The first line matches lines starting with a single character and a colon, and starts a sequence of operations for when that is recognized. The first substitute replaces, for example, '1:' by just '1'. The : retry is a label for branching to, which is a key part of this. The next substitution copies the first character on the line over the first colon. The t retry goes back to the label if the substitute changed anything. The last line delimits the entire sequence of operations for the initially matched line.
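One way to keep the script on a single command line without relying on semicolons is to pass each command as its own -e fragment, since sed joins the fragments with newlines. This is only a sketch: the function name prefix_fields is hypothetical, and the first substitution spells the pattern out again instead of reusing it through the empty regex s//.

```shell
# Hypothetical helper: reads stdin (or the files given as arguments).
prefix_fields() {
  sed -e '/^\(.\):/{' \
      -e 's/^\(.\):/\1/' \
      -e ':retry' \
      -e 's/^\(.\)\([^:]*\):/\1\2\1/' \
      -e 't retry' \
      -e '}' "$@"
}

printf '1:a:b:c\n4:a:b:c:d:e:f\n' | prefix_fields
```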
#!/usr/bin/perl
use warnings;
use strict;
while (<DATA>) {
    if ( s/^([^:]+)// ) {
        my $delim = $1;
        s/:/$delim/g;
    }
    print;
}
__DATA__
1:a:b:c
2:d:e:f
3:a:b
4:a:b:c:d:e:f
use feature qw/ say /;
use strict;
use warnings;
while( <DATA> ) {
    chomp;
    my @elements = split /:/;
    my $interject = shift @elements;
    local $" = $interject;
    say $interject, "@elements";
}
__DATA__
1:a:b:c
2:d:e:f
3:a:b
4:a:b:c:d:e:f
Or on the Linux shell command line:
perl -aF/:/ -pe '$i=shift @F;$_=$i.join $i,@F;' infile.txt