Let's assume the following XML file:
some text
<addresses>
<something/>
</addresses>
some more text
<addresses xmlns="namespace">
<could be anything/>
</addresses>
some other text
<addresses>
<something else/>
</addresses>
...
I need to replace the first </addresses> following the first <addresses xmlns="namespace"> with </namespace:addresses> so that the file becomes:
some text
<addresses>
<something/>
</addresses>
some more text
<addresses xmlns="namespace">
<could be anything/>
</namespace:addresses>
some other text
<addresses>
<something else/>
</addresses>
...
I am aware of this similar thread, but none of the following solutions changes anything:
sed -e '/<addresses xmlns="namespace">/!b' -e ':a' -e "s/<\/namespace:addresses>/<\/addresses>/;t trail" -e 'n;ba' -e ':trail' -e 'n;btrail' file.xml
sed -e "/<addresses xmlns=\"namespace\">/,/./ s/<\/namespace:addresses>/<\/addresses>/" file.xml
sed -e "/<addresses xmlns=\"namespace\">/,/<\/namespace:addresses>/ s/<\/namespace:addresses>/<\/addresses>/" file.xml
For instance:
sed -e "/<addresses xmlns=\"namespace\">/,/./ s/<\/namespace:addresses>/<\/addresses>/" file.xml
some text
<addresses>
<something/>
</addresses>
some more text
<addresses xmlns="namespace">
<could be anything/>
</addresses>
some other text
<addresses>
<something else/>
</addresses>
...
Maybe this issue is linked to the sed I'm using: 4.7-1ubuntu1 on impish/21.10 or even 4.8-1.
Any suggestion?
I'm open to any other tool (perl/awk), the simpler, the better.
It is much easier with perl than with sed:
perl -0777 -i -pe 's~<(addresses)\s+xmlns="namespace">[^<]*(?:<(?!/\1>)[^<]*)*\K</\1>~</namespace:$1>~' file
See the online demo. Details:
<(addresses)\s+xmlns="namespace">[^<]*(?:<(?!/\1>)[^<]*)*\K</\1> - the regex pattern matching
< - a < char
(addresses) - Group 1 ($1): addresses
\s+ - one or more whitespaces
xmlns="namespace"> - a fixed string
[^<]*(?:<(?!/\1>)[^<]*)* - a much faster alternative to (?s:.)*? - basically, matches any text up to a </addresses> string
\K - match reset operator that omits all text matched so far from the current match memory buffer
</\1> - (this is what is finally consumed and will be replaced): </ + Group 1 value (so as not to repeat addresses) + >
</namespace:$1> - the replacement is </namespace: + Group 1 value + >.
It replaces the first occurrence because the -0777 slurps the file into a single multiline text and there is no g flag.
Note the difference between the \1 backreference syntax used inside the pattern and the $1 backreference used in the replacement part of the Perl command.
See the online demo:
s=' some text
<addresses>
<something/>
</addresses>
some more text
<addresses xmlns="namespace">
<could be anything/>
</addresses>
some other text
<addresses>
<something else/>
</addresses>
...'
perl -0777 -pe 's~<(addresses)\s+xmlns="namespace">[^<]*(?:<(?!/\1>)[^<]*)*\K</\1>~</namespace:$1>~' <<< "$s"
Output:
some text
<addresses>
<something/>
</addresses>
some more text
<addresses xmlns="namespace">
<could be anything/>
</namespace:addresses>
some other text
<addresses>
<something else/>
</addresses>
...
Related
I have a file with text as follows:
###interest1 moreinterest1### sometext ###interest2###
not-interesting-line
sometext ###interest3###
sometext ###interest4### sometext othertext ###interest5### sometext ###interest6###
I want to extract all strings between the ### markers.
My desired output would be something like this:
interest1 moreinterest1
interest2
interest3
interest4
interest5
interest6
I have tried the following:
grep '###' file.txt | sed -e 's/.*###\(.*\)###.*/\1/g'
This almost works but only seems to grab the first instance per line, so the first line in my output only grabs
interest1 moreinterest1
rather than
interest1 moreinterest1
interest2
Here is a single awk command to achieve this: it makes ### the field separator and prints each even-numbered field:
awk -F '###' '{for (i=2; i<NF; i+=2) print $i}' file
interest1 moreinterest1
interest2
interest3
interest4
interest5
interest6
Here is an alternative grep + sed solution:
grep -oE '###[^#]*###' file | sed -E 's/^###|###$//g'
This assumes there are no # characters in between ### markers.
With GNU awk for multi-char RS (with ### as the record separator, the strings of interest are exactly the even-numbered records):
$ awk -v RS='###' '!(NR%2)' file
interest1 moreinterest1
interest2
interest3
interest4
interest5
interest6
You can use pcregrep:
pcregrep -o1 '###(.*?)###' file
The regex - ###(.*?)### - matches ###, then captures into Group 1 zero or more chars other than line break chars, as few as possible, and then ### matches the closing ###.
The -o1 option outputs the Group 1 value only.
See the regex demo online.
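For the sample file above, this should print each captured value on its own line (a quick check, assuming pcregrep is installed):
$ pcregrep -o1 '###(.*?)###' file
interest1 moreinterest1
interest2
interest3
interest4
interest5
interest6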
sed 't x
s/###/\
/;D; :x
s//\
/;t y
D;:y
P;D' file
Replacing "###" with newline, D, then conditionally branching to P if a second replacement of "###" is successful.
This might work for you (GNU sed):
sed -n 's/###/\n/g;/[^\n]*\n/{s///;P;D}' file
Replace all occurrences of ### with newlines.
If a line contains a newline, remove any characters before and including the first newline, print the details up to and including the following newline, delete those details and repeat.
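A quick check against the sample input (assuming GNU sed, since \n in the replacement is a GNU extension):
$ sed -n 's/###/\n/g;/[^\n]*\n/{s///;P;D}' file
interest1 moreinterest1
interest2
interest3
interest4
interest5
interest6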
I am having a problem deleting a range of text from a file. See the file example below:
<transaction>
some text
some text
some text
</transaction>
<transaction>
some text
some text
some text
</transaction>
<transaction>
some text
some text
some text
</transaction>
I only want to delete the text beginning with the first <transaction> and ending with
the first </transaction>. The delete should include the <transaction> and </transaction> tags themselves.
I think this can be accomplished using sed. But I have been unable to make it work.
awk '/transaction/ {b++} b>2'
This counts every line containing transaction (opening or closing tag) and only starts printing once the counter exceeds 2, i.e. after the first closing </transaction> has been seen.
Output:
<transaction>
some text
some text
some text
</transaction>
<transaction>
some text
some text
some text
</transaction>
If you only want to delete the lines with the tags, use:
sed -e '/<\/\?transaction>/d' file.txt
If you want to delete the tags and the text between them, use:
sed -e '/<transaction>/,/<\/transaction>/d' file.txt
If your input is like the one in the example, you can do that more easily with awk:
awk '{ if (p) print $0 }; $0=="</transaction>" { p = 1 }' input.txt
Edit:
if you need to skip the lines from, e.g., the 4th <transaction> to the next one:
awk 'BEGIN { p = 0 }; $0=="<transaction>" { p++ }; { if (p != 4) print $0 }' input.txt
This might work for you (GNU sed):
sed -n '/<transaction>/{:a;n;/<\/transaction>/!ba;:b;n;p;bb};p' file
This puts the sed invocation into grep-like mode (-n). It prints any lines before the first instance of <transaction>, does not print any lines thereafter until the closing </transaction> tag has passed, and then prints the remainder of the file.
Another solution expects the text to be well formed:
sed '1,/<\/transaction>/{/<transaction>/h;G;//!P;d}' file
How can I replace the following string:
<value>-myValue</value>
<value>1234</value>
And turn it into:
<value>-myValue</value>
<value>0</value>
Please take into account that there is a line break.
Script
sed -e '/<value>-myValue</,/<value>/{ /<value>[0-9][0-9]*</ s/[0-9][0-9]*/0/; }' data
From a line containing <value>-myValue< to the next line containing <value>, if the line matches <value>XX< where XX is a string of one or more digits, replace the string of digits with 0.
Input
This is not something to change
<value>-myValue</value>
<value>1234</value>
<value>myValue</value>
<value>1234</value>
nonsense
<value>-myValue</value>
<value>abcd</value>
<value>-myValue</value>
<value>4321</value>
stuffing
Output
This is not something to change
<value>-myValue</value>
<value>0</value>
<value>myValue</value>
<value>1234</value>
nonsense
<value>-myValue</value>
<value>abcd</value>
<value>-myValue</value>
<value>0</value>
stuffing
If this is XML, TLP is right that an XML parser would be superior. Continuing on with your sed approach, however, consider:
$ sed '/<value>-myValue/ {N; s/<value>[[:digit:]]\+/<value>0/}' file
<value>-myValue</value>
<value>0</value>
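If your sed does not support the GNU \+ extension, an interval expression should be a portable equivalent (a minor, untested variant of the same command):
sed '/<value>-myValue/ {N; s/<value>[[:digit:]]\{1,\}/<value>0/}' file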
You can possibly simplify this a bit, depending on what criteria you specifically want to use:
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
my $twig = XML::Twig->new( 'pretty_print' => 'indented_a' )->parse( \*DATA );
foreach my $value ( $twig->findnodes('//value') ) {
    if (    $value->trimmed_text eq '-myValue'
        and $value->next_sibling('value')
        and $value->next_sibling('value')->text =~ m/^\d+$/ )
    {
        $value->next_sibling('value')->set_text('1234');
    }
}
$twig->print;
__DATA__
<root>
<value>-myValue</value>
<value>0</value>
</root>
This outputs:
<root>
<value>-myValue</value>
<value>1234</value>
</root>
It parses your XML.
Looks for all nodes with a tag of value.
Checks that its text is -myValue and that it has a following value sibling.
Checks that the sibling is 'just numeric', i.e. matching the regex ^\d+$.
Replaces the content of that sibling with 1234.
And it will work on the XML regardless of formatting, which is the fundamental problem with XML: there are many entirely valid ways of writing a document that are semantically identical.
I want to extract this data http://code.google.com/p/memcached-session-manager/wiki/JMXStatistics via JMX, but using only the command line.
This is because it is the only way I have to access the server.
Any pointer would help.
Thanks in advance.
I put together this quick-n-dirty solution after banging my head against the JMX protocol's inability to tunnel out over ssh.
The approach: (1) find the jmxterm tool, (2) feed it a simple command that runs an MBean operation (after examining what JConsole did when attached to a local Tomcat), and (3) format the output into something readable.
#!/bin/bash
# wget http://downloads.sourceforge.net/cyclops-group/jmxterm-1.0-alpha-4-uber.jar
JARFILE="jmxterm-1.0-alpha-4-uber.jar"
JMXHOST=127.0.0.1
JMXPORT=9003
function die() { echo $1 ; exit 1 ; }
function checkexe() { [ -x `which $1` ] || die "can't find executable program $1" ; }
PROGS="java grep cat sed tr xsltproc xmllint"
for P in $PROGS ; do checkexe $P ; done
[ -f $JARFILE ] || die "can't see jar file $JARFILE"
## jmxterm commands to get thread stack dump
cat >/tmp/myscript.jmx<<EOF
domain java.lang
bean java.lang:type=Threading
run dumpAllThreads 1 1
quit
EOF
## get the stack traces
java -jar $JARFILE -l $JMXHOST:$JMXPORT -i /tmp/myscript.jmx > /tmp/allthreads.txt
grep "threadId" /tmp/allthreads.txt || die "stack trace get seemed to fail ?!"
## turn them into xml
cat /tmp/allthreads.txt | \
sed \
-e "s|\[ |<array>|g" -e "s| \]|</array>|g" \
-e "s|{|<obj>|g" -e "s|}|</obj>|g" \
-e "s|\([^ <>]*\) = \([^ <>]*\);|<\1>\2</\1>|g" \
-e "s|\([^ ]*\) = \([^;]*\) *;|<\1>\2</\1>|g" | \
tr '\r\n\t' ' ' | tr -s ' ' | \
sed \
-e "s|\([^ =]*\) = \([^;=]*\);|<\1>\2</\1>|g" \
-e "s|\([^ ]*\) = \([^;]*\);|<\1>\2</\1>|g"\
-e "s| *, *||g" > /tmp/allthreads.xml
## xsl to convert xml-ified stack traces to simpler format
cat >/tmp/jmxterm_threads.xslt<<EOF
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<jmx>
<xsl:for-each select="/array/obj">
<xsl:sort select="threadId" data-type="number"/>
<xsl:apply-templates select="." mode="thread"/>
</xsl:for-each>
</jmx>
</xsl:template>
<xsl:template match="obj" mode="thread">
<thread id="{threadId/text()}">
<xsl:copy-of select="threadName"/>
<xsl:copy-of select="threadState"/>
<xsl:copy-of select="suspended"/>
<xsl:copy-of select="inNative"/>
<xsl:copy-of select="waitedCount"/>
<xsl:copy-of select="blockedCount"/>
<xsl:apply-templates select="stackTrace"/>
</thread>
</xsl:template>
<xsl:template match="stackTrace">
<stackTrace>
<xsl:for-each select="./array/obj">
<stackfn class="{className}" method="{methodName}" file="{fileName}" line="{lineNumber}"/>
</xsl:for-each>
</stackTrace>
</xsl:template>
</xsl:stylesheet>
EOF
## simplify xml
xsltproc /tmp/jmxterm_threads.xslt /tmp/allthreads.xml | xmllint --format - > /tmp/allthreads_simple.xml
cat /tmp/allthreads_simple.xml
For CLI access I recommend jmx4perl and the readline-based JMX shell j4psh. They require a Jolokia agent on the server side, though. These agents are available in several flavours (war, osgi, mule, jdk6). Since memcached-session-manager runs within a Tomcat, you probably should use the WAR agent and simply drop it into the webapps/ directory.
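Once the agent is deployed, a single attribute read from the shell might look roughly like this (hypothetical host and port; check the jmx4perl documentation for the exact syntax):
jmx4perl http://yourserver:8080/jolokia read java.lang:type=Memory HeapMemoryUsage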
I have a file which contains "title" written in it many times. How can I find the number of times "title" is written in that file using the sed command provided that "title" is the first string in a line? e.g.
# title
title
title
should output the count = 2, because in the first line title is not the first string.
Update
I used awk to find the total number of occurrences as:
awk '$1 ~ /title/ {++c} END {print c}' FS=: myFile.txt
But how can I tell awk to count only those lines having title as the first string, as explained in the example above?
Never say never. Pure sed (although it may require the GNU version).
#!/bin/sed -nf
# based on a script from the sed info file (info sed)
# section 4.8 Numbering Non-blank Lines (cat -b)
# modified to count lines that begin with "title"
/^title/! be
x
/^$/ s/^.*$/0/
/^9*$/ s/^/0/
s/.9*$/x&/
h
s/^.*x//
y/0123456789/1234567890/
x
s/x.*$//
G
s/\n//
h
:e
$ {x;p}
Explanation:
#!/bin/sed -nf
# run sed without printing output by default (-n)
# using the following file as the sed script (-f)
/^title/! be # if the current line doesn't begin with "title" branch to label e
x # swap the counter from hold space into pattern space
/^$/ s/^.*$/0/ # if pattern space is empty start the counter at zero
/^9*$/ s/^/0/ # if the counter consists only of nines, prepend a zero to make room for the carry
s/.9*$/x&/ # mark the position of the last digit before a sequence of nines (if any)
h # copy the marked counter to hold space
s/^.*x// # delete everything before the marker
y/0123456789/1234567890/ # increment the digits that were after the mark
x # swap pattern space and hold space
s/x.*$// # delete everything after the marker leaving the leading digits
G # append hold space to pattern space
s/\n// # remove the newline, leaving all the digits concatenated
h # save the counter into hold space
:e # label e
$ {x;p} # if this is the last line of input, swap in the counter and print it
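For the example from the question, saving the script above as counter (as in the trace below) and running it should print 2 (a quick check; as noted, it may require GNU sed):
$ printf '# title\ntitle\ntitle\n' | sed -nf counter
2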
Here are excerpts from a trace of the script using sedsed:
$ echo -e 'title\ntitle\nfoo\ntitle\nbar\ntitle\ntitle\ntitle\ntitle\ntitle\ntitle\ntitle\ntitle' | sedsed-1.0 -d -f ./counter
PATT:title$
HOLD:$
COMM:/^title/ !b e
COMM:x
PATT:$
HOLD:title$
COMM:/^$/ s/^.*$/0/
PATT:0$
HOLD:title$
COMM:/^9*$/ s/^/0/
PATT:0$
HOLD:title$
COMM:s/.9*$/x&/
PATT:x0$
HOLD:title$
COMM:h
PATT:x0$
HOLD:x0$
COMM:s/^.*x//
PATT:0$
HOLD:x0$
COMM:y/0123456789/1234567890/
PATT:1$
HOLD:x0$
COMM:x
PATT:x0$
HOLD:1$
COMM:s/x.*$//
PATT:$
HOLD:1$
COMM:G
PATT:\n1$
HOLD:1$
COMM:s/\n//
PATT:1$
HOLD:1$
COMM:h
PATT:1$
HOLD:1$
COMM::e
COMM:$ {
PATT:1$
HOLD:1$
PATT:title$
HOLD:1$
COMM:/^title/ !b e
COMM:x
PATT:1$
HOLD:title$
COMM:/^$/ s/^.*$/0/
PATT:1$
HOLD:title$
COMM:/^9*$/ s/^/0/
PATT:1$
HOLD:title$
COMM:s/.9*$/x&/
PATT:x1$
HOLD:title$
COMM:h
PATT:x1$
HOLD:x1$
COMM:s/^.*x//
PATT:1$
HOLD:x1$
COMM:y/0123456789/1234567890/
PATT:2$
HOLD:x1$
COMM:x
PATT:x1$
HOLD:2$
COMM:s/x.*$//
PATT:$
HOLD:2$
COMM:G
PATT:\n2$
HOLD:2$
COMM:s/\n//
PATT:2$
HOLD:2$
COMM:h
PATT:2$
HOLD:2$
COMM::e
COMM:$ {
PATT:2$
HOLD:2$
PATT:foo$
HOLD:2$
COMM:/^title/ !b e
COMM:$ {
PATT:foo$
HOLD:2$
. . .
PATT:10$
HOLD:10$
PATT:title$
HOLD:10$
COMM:/^title/ !b e
COMM:x
PATT:10$
HOLD:title$
COMM:/^$/ s/^.*$/0/
PATT:10$
HOLD:title$
COMM:/^9*$/ s/^/0/
PATT:10$
HOLD:title$
COMM:s/.9*$/x&/
PATT:1x0$
HOLD:title$
COMM:h
PATT:1x0$
HOLD:1x0$
COMM:s/^.*x//
PATT:0$
HOLD:1x0$
COMM:y/0123456789/1234567890/
PATT:1$
HOLD:1x0$
COMM:x
PATT:1x0$
HOLD:1$
COMM:s/x.*$//
PATT:1$
HOLD:1$
COMM:G
PATT:1\n1$
HOLD:1$
COMM:s/\n//
PATT:11$
HOLD:1$
COMM:h
PATT:11$
HOLD:11$
COMM::e
COMM:$ {
COMM:x
PATT:11$
HOLD:11$
COMM:p
11
PATT:11$
HOLD:11$
COMM:}
PATT:11$
HOLD:11$
The ellipsis represents lines of output I omitted here. The line with "11" on it by itself is where the final count is output. That's the only output you'd get when the sedsed debugger isn't being used.
Revised answer
Succinctly, you can't - sed is not the correct tool for the job (it cannot count).
sed -n '/^title/p' file | grep -c .
This looks for lines starting with title and prints them, feeding the output into grep to count them (grep -c . counts the non-empty lines). Or, equivalently:
grep -c '^title' file
Original answer - before the question was edited
Succinctly, you can't - it is not the correct tool for the job.
grep -c title file
sed -n /title/p file | wc -l
The second uses sed as a surrogate for grep and sends the output to 'wc' to count lines. Both count the number of lines containing 'title', rather than the number of occurrences of title.
You could fix that with something like:
cat file |
tr ' ' '\n' |
grep -c title
The 'tr' command converts blanks into newlines, thus putting each space separated word on its own line, and therefore grep only gets to count lines containing the word title. That works unless you have sequences such as 'title-entitlement' where there's no space separating the two occurrences of title.
I don't think sed would be appropriate, unless you use it in a pipeline to convert your file so that the word you need appears on separate lines, and then use grep -c to count the occurrences.
I like Jonathan's idea of using tr to convert spaces to newlines. The beauty of this method is that successive spaces get converted to multiple blank lines but it doesn't matter because grep will be able to count just the lines with the single word 'title'.
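For example, on a single line containing the word twice, the difference between counting matching lines and counting occurrences shows up like this:
$ printf 'title title\n' | grep -c title
1
$ printf 'title title\n' | tr ' ' '\n' | grep -c title
2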
Just one gawk command will do. Don't use grep -c because it only counts lines with "title" in them, regardless of how many "title"s there are in each line.
$ more file
# title
# title
one
two
#title
title title
three
title junk title
title
four
fivetitlesixtitle
last
$ awk '!/^#.*title/{m=gsub("title","");total+=m}END{print "total: "total}' file
total: 7
If you just want "title" as the first string, use "==" instead of ~:
awk '$1 == "title"{++c}END{print c}' file
sed 's/title/title\n/g' file | grep -c title
This might work for you:
sed '/^title/!d' file | sed -n '$='
The first sed deletes every line that does not start with title; the second prints the line number of its last input line ($=), which is the count.