Deleting text from a file - sed

I am having a problem deleting a range of text from a file. See the file example below:
<transaction>
some text
some text
some text
</transaction>
<transaction>
some text
some text
some text
</transaction>
<transaction>
some text
some text
some text
</transaction>
I only want to delete the range that begins with the first <transaction> and ends with
the first </transaction>. The deletion should include the <transaction> and </transaction> lines themselves.
I think this can be accomplished using sed, but I have been unable to make it work.

awk '/transaction/ {b++} b>2'
Output:
<transaction>
some text
some text
some text
</transaction>
<transaction>
some text
some text
some text
</transaction>

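If you only want to delete the lines that contain the tags, use:
sed -e '/<\/\?transaction>/d' file.txt
If you want to delete the tags and the text between them, use the following (note that it removes every <transaction> block, not just the first):
sed -e '/<transaction>/,/<\/transaction>/d' file.txt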
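Since the range form deletes every block, here is a minimal sketch of limiting the deletion to the first block with a small state machine in awk. The sample data and the variable names (sample, first_removed, state) are made up for illustration, and it assumes each tag sits alone on its line, as in the question.

```shell
# Hypothetical sample: one header line followed by two blocks.
sample='header
<transaction>
a
</transaction>
<transaction>
b
</transaction>'

# state: 0 = before the first block, 1 = inside it, 2 = past it.
first_removed=$(printf '%s\n' "$sample" | awk '
  state == 0 && $0 == "<transaction>"  { state = 1 }   # first opening tag
  state != 1                           { print }       # keep everything outside it
  state == 1 && $0 == "</transaction>" { state = 2 }   # first closing tag ends the skip
')
printf '%s\n' "$first_removed"
```

Unlike the flag-only awk approach, this keeps any lines that appear before the first <transaction>.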

If your input is like the one in the example, you can do that more easily with awk:
awk '{ if (p) print $0 }; $0=="</transaction>" { p = 1 }' input.txt
Edit:
if you need to skip the lines from, e.g., the 4th <transaction> to the next one:
awk 'BEGIN { p = 0 }; $0=="<transaction>" { p++ }; { if (p != 4) print $0 }' input.txt
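The counter idea can be parameterized so the block number is not hard-coded. This is only a sketch: drop_block is a hypothetical helper name, the sample data is invented, and the tags are assumed to be alone on their lines.

```shell
# drop_block N: hypothetical helper that removes the Nth block read from stdin.
drop_block() {
  awk -v n="$1" '
    $0 == "<transaction>" { count++ }   # count opening tags
    count + 0 != n + 0    { print }     # print unless we are in block n
  '
}

blocks='<transaction>
a
</transaction>
<transaction>
b
</transaction>
<transaction>
c
</transaction>'

second_removed=$(printf '%s\n' "$blocks" | drop_block 2)
printf '%s\n' "$second_removed"
```

As with the one-liner it is based on, any lines between the closing tag of block N and the next opening tag are dropped as well.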

This might work for you (GNU sed):
sed -n '/<transaction>/{:a;n;/<\/transaction>/!ba;:b;n;p;bb};p' file
The -n option puts sed into grep-like mode, where nothing is printed unless requested. The script prints any lines before the first instance of <transaction>, does not print any lines from there until the tag </transaction> has passed, and then prints the remainder of the file.
Another solution expects the text to be well formed:
sed '1,/<\/transaction>/{/<transaction>/h;G;//!P;d}' file

Related

Extract substrings between strings

I have a file with text as follows:
###interest1 moreinterest1### sometext ###interest2###
not-interesting-line
sometext ###interest3###
sometext ###interest4### sometext othertext ###interest5### sometext ###interest6###
I want to extract all strings between pairs of ### markers.
My desired output would be something like this:
interest1 moreinterest1
interest2
interest3
interest4
interest5
interest6
I have tried the following:
grep '###' file.txt | sed -e 's/.*###\(.*\)###.*/\1/g'
This almost works but only seems to grab the first instance per line, so the first line in my output only grabs
interest1 moreinterest1
rather than
interest1 moreinterest1
interest2
Here is a single awk command that makes ### the field separator and prints each even-numbered field:
awk -F '###' '{for (i=2; i<NF; i+=2) print $i}' file
interest1 moreinterest1
interest2
interest3
interest4
interest5
interest6
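To see why the even-numbered fields are the interesting ones, it can help to print every field with its index. The input line below is invented for the illustration, and the variable names are arbitrary.

```shell
# Hypothetical one-line input with two marked substrings.
line='a ###x### b ###y###'

# Print each field with its number; brackets make empty fields visible.
fields=$(printf '%s\n' "$line" |
  awk -F '###' '{ for (i = 1; i <= NF; i++) printf "%d:[%s]\n", i, $i }')
printf '%s\n' "$fields"
```

The text between markers lands in fields 2 and 4, and the last field is empty because the line ends with a marker, which is why the loop in the answer stops at i<NF.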
Here is an alternative grep + sed solution:
grep -oE '###[^#]*###' file | sed -E 's/^###|###$//g'
This assumes there are no # characters in between ### markers.
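A small experiment shows what that assumption means in practice. The input string here is invented: its first marked substring contains a single #, so the grep pattern cannot match it and pairs up the wrong markers, while the field-splitting awk approach is not affected.

```shell
# Hypothetical input: "a#b" and "c" are the marked substrings.
tricky='###a#b### x ###c###'

with_grep=$(printf '%s\n' "$tricky" |
  grep -oE '###[^#]*###' | sed -E 's/^###|###$//g')
with_awk=$(printf '%s\n' "$tricky" |
  awk -F '###' '{ for (i = 2; i < NF; i += 2) print $i }')

printf 'grep+sed: [%s]\n' "$with_grep"
printf 'awk:      [%s]\n' "$with_awk"
```

The grep pipeline returns the text between the two marked substrings instead of the substrings themselves, so it is only safe when the marked text never contains #.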
With GNU awk for multi-char RS:
$ awk -v RS='###' '!(NR%2)' file
interest1 moreinterest1
interest2
interest3
interest4
interest5
interest6
You can use pcregrep:
pcregrep -o1 '###(.*?)###' file
The regex, ###(.*?)###, matches ###, then captures into Group 1 any zero or more characters other than line break characters, as few as possible, and then matches the closing ###.
The -o1 option outputs only the Group 1 value.
sed 't x
s/###/\
/;D; :x
s//\
/;t y
D;:y
P;D' file
Replacing "###" with newline, D, then conditionally branching to P if a second replacement of "###" is successful.
This might work for you (GNU sed):
sed -n 's/###/\n/g;/[^\n]*\n/{s///;P;D}' file
Replace all occurrences of ###'s by newlines.
If a line contains a newline, remove any characters before and including the first newline, print the details up to and including the following newline, delete those details and repeat.

SED Insert text after a specific multi-line text field

I am looking to search for a specific multi-line text and add new lines of text after it. In this example I need to add a blank line and text after "oldText", but only where it appears under "[old-text]":
[old-text]
oldText
[inserted-new-text]
newTxt
[alsoOld-text]
oldText
Here's what I have so far but the syntax is not correct:
printf "[old-text]\noldText"|sed '/\[old-text]\noldTex\t/a [inserted-new-text]\nnewTxt'
$ sed -e '/\[old-text\]/{N;s/oldText/&\n\n[inserted-new-text]\nnewTxt/}' inputFile
Use /<pattern>/ to find the [old-text] and then use N; to go to the next line and replace.
$ printf "[old-text]\noldText" | \
sed -e '/\[old-text\]/{N;s/oldText/&\n\n[inserted-new-text]\nnewTxt/}'
[old-text]
oldText

[inserted-new-text]
newTxt
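The \n in the replacement is a GNU sed feature. For comparison, here is a sketch of the same idea in awk, which remembers the previous line instead of joining two lines with N. The variable names (prev, inserted) are arbitrary, and the input is the sample from the question.

```shell
inserted=$(printf '[old-text]\noldText\n[alsoOld-text]\noldText\n' | awk '
  { print }                                   # always echo the current line
  prev == "[old-text]" && $0 == "oldText" {   # two-line sequence just completed
    print ""
    print "[inserted-new-text]"
    print "newTxt"
  }
  { prev = $0 }                               # remember this line for the next one
')
printf '%s\n' "$inserted"
```

Only the oldText that directly follows [old-text] triggers the insertion; the one under [alsoOld-text] is left alone.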

delete string for each line with sed

My file contains an arbitrary number of lines. On each line, I would like to remove any string that appears before or after the reference string.
The reference string and the strings to remove are separated by spaces.
The file contains:
test.user.passs
test.user.location
global.user
test.user.tel
global.pass
test.user.email string_err
#ttt...> test.user.car ->
test.user.address
è_ 788 test.user.housse
test.user.child
{kl78>&é} global.email
global.foo
test.user.foo
How can I use sed to remove the strings at the start and end of each line that contains the "test" string, where they are separated from it by a space or tab?
The desired result is:
test.user.passs
test.user.location
global.user
test.user.tel
global.pass
test.user.email
test.user.car
test.user.address
test.user.housse
test.user.child
{kl78>&é} global.email
global.foo
test.user.foo
I interpret your question as: find the first word that consists of "word characters and at least one dot".
Tcl:
echo '
set fh [open [lindex $argv 1] r]
while {[gets $fh line] != -1} {puts [regexp -inline {\w+(?:\.\w+)+} $line]}
' | tclsh - file
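sed (assuming one dotted word per line; the optional prefix must end in whitespace so that the greedy .* cannot swallow the start of the word)
sed -r 's/^(.*[[:space:]])?([[:alpha:]]+(\.[[:alpha:]]+)+).*/\2/' file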
perl
perl -nE '/(\w+(\.\w+)+)/ and say $1' file
Using sed, for lines that have exactly one space-separated word before and one after the reference string:
sed -r 's/^[^ ]+[ ]+([^ ]+)[ ]+[^ ]*/\1/' file
This might work for you (GNU sed):
sed -r 's/.*(test\S+).*/\1/' file
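Since \S is a GNU extension, a field-based awk sketch may be useful where GNU sed is not available. It assumes the reference string is a whitespace-separated word beginning with "test." and leaves every other line untouched. The three input lines are taken from the question, and the variable name is arbitrary.

```shell
trimmed=$(printf '%s\n' \
  'test.user.email string_err' \
  '#ttt...> test.user.car ->' \
  'global.foo' | awk '
  {
    # Print the first field that starts with "test." and skip the rest of the line.
    for (i = 1; i <= NF; i++)
      if (index($i, "test.") == 1) { print $i; next }
  }
  { print }   # no such field: keep the line as it is
')
printf '%s\n' "$trimmed"
```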

Search for a particular multiline pattern using awk and sed

I want to read from the file /etc/lvm/lvm.conf and check for the below pattern that could span across multiple lines.
tags {
hosttags = 1
}
There could be any amount of whitespace between tags and {, between { and hosttags, and so forth. Also, { could follow tags on the next line instead of being on the same line.
I'm planning to use awk and sed to do this.
While reading lvm.conf, it should skip empty lines and comments.
I am doing that with:
data=$(awk < cat `cat /etc/lvm/lvm.conf`
/^#/ { next }
/^[[:space:]]*#/ { next }
/^[[:space:]]*$/ { next }
.
.
How can I use sed to find the pattern I described above?
Are you looking for something like this
sed -n '/{/,/}/p' input
i.e. print lines between tokens (inclusive)?
To delete lines containing # and empty lines or lines containing only whitespace, use the following ([[:blank:]] matches a space or a tab):
sed -n '/{/,/}/p' input | sed '/#/d' | sed '/^[[:blank:]]*$/d'
update
If empty lines are just empty lines (no ws), the above can be shortened to
sed -e '/#/d' -e '/^$/d' input
update2
To check if the pattern tags {... is present in file, use
$ tr -d '\n' < input | grep -o 'tags\s*{[^}]*}'
tags { hosttags = 1# this is a comment}
The tr part above removes all newlines, i.e. turns everything into one single line (this works well as long as the file isn't too large). grep then searches for the tags pattern and outputs all matches.
The return code from grep will be 0 if the pattern was found, 1 if not.
Return code is stored in variable $?. Or pipe the above to wc -l to get the number of matches found.
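A sketch of using that return code directly in a script follows. It deviates slightly from the command above: newlines are replaced by spaces rather than deleted, so words on adjacent lines cannot run together, and POSIX character classes are used instead of \s. The temporary file stands in for the real configuration file and its contents are invented.

```shell
# Hypothetical stand-in for /etc/lvm/lvm.conf, with the brace on its own line.
conf=$(mktemp)
printf 'tags\n{\n    hosttags = 1\n}\n' > "$conf"

# grep -q prints nothing; only its exit status is used by the if.
if tr '\n' ' ' < "$conf" |
   grep -Eq 'tags[[:space:]]*\{[[:space:]]*hosttags[[:space:]]*=[[:space:]]*1[[:space:]]*\}'
then
  result=found
else
  result=missing
fi
echo "$result"
rm -f "$conf"
```

This stricter pattern does not tolerate a comment between the 1 and the closing brace, so comments would need to be stripped first.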
update3
Regex for searching for tags { hosttags=1 } with any amount of whitespace anywhere:
'tags\s*{\s*hosttags\s*=\s*1[^}]*}'
try this line:
awk '/^\s*#|^\s*$/{next}1' /etc/lvm/lvm.conf
One could try preprocessing the file first: remove comments and empty lines, and insert an empty line after each closing curly brace so that the second awk can process the blocks in paragraph mode.
awk 'NF && $1!~/^#/{print; if(/}/) print x}' file | awk '/pattern/' RS=
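Here is a sketch of that two-stage pipeline on invented input. The first awk drops comments and blank lines and emits an empty line after every closing brace; the second runs in paragraph mode (empty RS), so each block is one record. The pattern in the second stage is only an example of what /pattern/ might be.

```shell
block=$(printf '%s\n' \
  '# comment' \
  'devices {' \
  '  dir = 1' \
  '}' \
  '' \
  'tags {' \
  '  # inline comment' \
  '  hosttags = 1' \
  '}' |
  awk 'NF && $1 !~ /^#/ { print; if (/}/) print "" }' |
  awk -v RS= '/hosttags[ \t]*=[ \t]*1/')
printf '%s\n' "$block"
```

Only the tags block is printed, because the devices block is a separate record that does not match.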

awk or sed replacing specific $3 with adjacent $2

File looks like
$1 $2 $3
Text Text2 *
Text Text4 Text3
I would like to search for *'s within a file and replace them with the text in the column next to them, while keeping the rest of the info. Basically, replace the * logically with column 2.
Currently I am working with either sed or awk:
awk : awk '{ if($3=*) {print$2}}' works... but I would like to keep $1 and $2 as well
sed : sed -r 's/[*]//g' I can't get the regular expression to replace with $2 properly
Any quick help, tips or tricks?
Contents of file.txt:
Text Text2 *
Text Text4 Text3
One way using awk:
awk '$3 == "*" { $3=$2 }1' file.txt
Results:
Text Text2 Text2
Text Text4 Text3
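One side effect worth knowing: assigning to a field makes awk rebuild the whole record, joining the fields with OFS (a single space by default). The sketch below uses an invented line with wider spacing to show this, plus a variant that sets OFS to a tab. The variable names are arbitrary.

```shell
# Input columns are separated by three spaces.
rebuilt=$(printf 'Text   Text2   *\n' | awk '$3 == "*" { $3 = $2 } 1')
printf '%s\n' "$rebuilt"

# Same edit, but with a tab as the output field separator.
tabbed=$(printf 'Text   Text2   *\n' | awk -v OFS='\t' '$3 == "*" { $3 = $2 } 1')
printf '%s\n' "$tabbed"
```

Lines that are not modified keep their original spacing, since the record is only rebuilt when a field is assigned.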
With GNU sed \S and \s can be used to represent non-space and space respectively, so you could accomplish what you want like this:
sed -r '/(\S+)\s+(\S+)\s+\*/ s//\1 \2 \2/'
The empty pattern in s//.../ reuses the most recently used regular expression, here the one in the address, including its capture groups.
If it is run on the input listed by steve:
sed -r '/(\S+)\s+(\S+)\s+\*/ s//\1 \2 \2/' file.txt
Output:
Text Text2 Text2
Text Text4 Text3
If you want to preserve inter-column whitespace use:
sed -r '/(\S+)(\s+)(\S+)(\s+)\*/ s//\1\2\3\4\3/'
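For sed implementations without \S and \s, the same empty-pattern trick can be sketched with POSIX character classes. This assumes a sed that accepts -E for extended regular expressions; the input is the sample file from the question.

```shell
fixed=$(printf 'Text Text2 *\nText Text4 Text3\n' |
  sed -E '/([^[:space:]]+)[[:space:]]+([^[:space:]]+)[[:space:]]+\*/ s//\1 \2 \2/')
printf '%s\n' "$fixed"
```

The address selects lines whose third column is *, and the empty pattern in s// reuses that same expression, so \1 and \2 refer to its groups.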
awk solution:
awk '$3=="*"{$3=$2}1' file