SED append on next n-th line, not append to next line - sed

How can I get sed to append sometext two line after it find a match?
For example:
text0
text1
text2
text3
after I match text0, I want to append text4 after the next 2 line, that is:
text0
text1
text2
text4
text3

I'd say:
sed -e '/text0/ { N; N; a text4' -e '}' filename
That is:
/text0/ { # when finding a line that matches text0
N # fetch two more lines
N
# and only then append text4
a text4
}
When using this as a one-liner, it is necessary to split it into two -e options so that the a command doesn't attempt to append a line text4 }.
Alternatively, you could use
sed '/text0/ { N; N; s/$/\ntext4/; }' filename
this avoids using the somewhat unwieldy a command but requires you to escape some metacharacters in the replacement text (such as \ and &).

Perl solution:
perl -pe 'push #append, 3 + $. if /text0/;
shift #append, print "text4\n" if $append[0] == $.;
' input.txt > output.txt
You might need some more tweaking if the string is to be appended after the end of the input.
Explanation:
$. is the line number.
when /text0/ is matched, the line number where the append should happen is pushed into the array #append.
when the current line number corresponds to the one at the beginning of the array, the line is printed and the first element of the array is discarded.
It also means it works for overlapping matches and appends.

I would have used awk for this:
awk '/text0/ {f=NR} f && NR==f+2 {$0=$0RS"text4"}1' file
text0
text1
text2
text4
text3
When pattern is found, set f to current line number.
When f is true and two lines later f && NR==f+2 add new text $0=$0RS"text4".
1 print the result

Related

Eliminate duplicate words across lines

I'd like a sed script that eliminates repeated words in a text file on one or more lines. For example:
this is is is a text file file it is littered with duplicate words
words words on one or more lines lines
lines
lines
should transform to:
this is a text file it is littered with duplicate words
on one or more lines
This awk script produces the correct output:
{
for (i = 1; i <= NF; i++) {
word = $i
if (word != last) {
if (i < NF) {
next_word = $(i+1)
if (word != next_word) {
printf("%s ", word)
}
} else {
printf("%s\n", word)
}
}
}
last = word
}
but I'd really like a sed "one-liner".
This works with GNU sed, at least for the example input:
$ sed -Ez ':a;s/(\<\S+)(\s+)\1\s+/\1\2/g;ta' infile
This is a text file and is littered with duplicate words
on one or more lines
The -E option is just there to avoid having to escape the capture group parentheses and + quantifiers.
-z treats the input as null byte separated, i.e., as a single line.
The commmand is then structured as
:a # label
s///g # substitution
ta # jump to label if substitution did something
And the substitution is this:
s/(\<\S+)(\s+)\1\s+/\1\2/g
First capture group: (\<\S+) – a complete word (start of word boundary, one or more non-space characters
Second capture group: (\s+) – any number of blanks after that first word
\1\s+ – the first word again plus whatever blanks follow it
This preserves the whitespace after the first word and discards the whitespace after the duplicate.
Note that -E, -z, \<, \S and \s are all GNU extensions to POSIX sed.
With sed, you can use
sed -E 's/([a-z]+) +\1/\1/g'
Note that it works for duplicates. Not for triplicates or line breaks.
This can be fixed, by joining all the lines and looping.
sed -E ':a;N;s/(\b[a-z]+\b)([ \n])[ \n]*\b\1\b */\1\2/g;ba'
sed -En '
H
${
g
s/^\n//
s/(\<([[:alnum:]]+)[[:space:]]+)(\2([[:space:]]+|$))+/\1/g
p
}
' file
This is a text file with duplicate words
on one or more lines
where
H -- append each line to the hold space
${...} -- on the last line, perform the enclosed commands
g -- replace pattern space with the contents of the hold space
s/^\n// -- remove leading newline (side-effect of H on first line)
s/(\<([[:alnum:]]+)[[:space:]]+)(\2([[:space:]]+|$))+/\1/g
..1..2............2............1..........................
the key here is to capture the text and the spaces separately so that the back reference can match with differing whitespace.
captured expression #1 is the first word and it's whitespace (which can contain newlines), and the capture #2 is just the word.

Join two specific lines with sed

I'm trying to manipulate a dataset with sed so I can do it in a batch because the datasets have the same structure.
I've a dataset with two rows (first line in this example is the 7th row) like this:
Enginenumber; ABX 105;Productionnumber.;01 2345 67-
"",,8-9012
What I want:
Enginenumber; ABX 105;Productionnumber.;01 2345 67-8-9012
So the numbers (8-9012) in the second line have been added at the end of the first line because those numbers belong to each other
What I've tried:
sed '8s/7s/' file.csv
But that one does not work and I think that one will just replace whole row 7. The 8-9012 part is on row 8 of the file and I want that part added to row 7. Any ideas and is this possible?
Note: In the question's current form, a sed solution is feasible - this was not the case originally, where the last ;-separated field of the joined lines needed transforming as a whole, which prompted the awk solution below.
Joining lines 7 and 8 as-is, merely by removing the line break between them, can be achieved with this simple sed command:
sed '7 { N; s/\n//; }' file.csv
awk solution:
awk '
BEGIN { FS = OFS = ";" }
NR==7 { r = $0; getline; sub(/^"",,/, ""); $0 = r $0 }
1
' file.csv
Judging by the OP's comments, an additional problem is the presence of CRLF line endings in the input. With GNU Awk or Mawk, adding RS = "\r\n" to the BEGIN block is sufficient to deal with this (or RS = ORS = "\r\n", if the output should have CRLF line endings too), but with BSD Awk, which only supports single-character input record separators, more work is needed.
BEGIN { FS = OFS = ";" } tells Awk to split the input lines into fields by ; and to also use ; on output (when rebuilding the line).
Pattern NR==7 matches input line 7, and executes the associated action ({...}) with it.
r = $0; getline stores line 7 ($0 contains the input line at hand) in variable r, then reads the next line (getline), at which point $0 contains line 8.
sub(/^"",,/, "") then removes substring "",, from the start of line 8, leaving just 8-9012.
$0 = r $0 joins line 7 and modified line 8, and by assigning the concatenation back to $0, the string assigned is split into fields by ; anew, and the resulting fields are joined to form the new $0, separated by OFS, the output field separator.
Pattern 1 is a common shorthand that simply prints the (possibly modified) record at hand.
With sed:
sed '/^[^"]/{N;s/\n.*,//;}' file
/^[^"]/: search for lines not starting with ", and if found:
N: next line is appended to the pattern space
s/\n.*,//: all characters up to last , are removed from second line

delete string for each line with sed

My file contains x number of lines, I would like to remove the string before and after the reference string at the beginning and end of each line.
The reference string and string to remove are separated by space.
The file contains :
test.user.passs
test.user.location
global.user
test.user.tel
global.pass
test.user.email string_err
#ttt...> test.user.car ->
test.user.address
è_ 788 test.user.housse
test.user.child
{kl78>&é} global.email
global.foo
test.user.foo
How to remove the string at the start of each line which contain "test" string and also the end of each line separated by space or tab with sed?
The desired result is :
test.user.passs
test.user.location
global.user
test.user.tel
global.pass
test.user.email
test.user.car
test.user.address
test.user.housse
test.user.child
{kl78>&é} global.email
global.foo
test.user.foo
I interpret your question as: find the first word that is "word characters and at least one dots"
Tcl:
echo '
set fh [open [lindex $argv 1] r]
while {[gets $fh line] != -1} {puts [regexp -inline {\w+(?:\.\w+)+} $line]}
' | tclsh - file
sed
sed -r 's/.*\<([[:alpha:]]+(\.[[:alpha:]]+)).*/\1/' file
perl
perl -nE '/(\w+(\.\w+)+)/ and say $1' file
using sed like
sed -r 's/^[^ ]+[ ]+([^ ]+)[ ]+[^ ]*/\1/' file
This might work for you (GNU sed):
sed -r 's/.*(test\S+).*/\1/' file

Remove newline depending on the format of the next line

I have a special file with this kind of format :
title1
_1 texthere
title2
_2 texthere
I would like all newlines starting with "_" to be placed as a second column to the line before
I tried to do that using sed with this command :
sed 's/_\n/ /g' filename
but it is not giving me what I want to do (doing nothing basically)
Can anyone point me to the right way of doing it ?
Thanks
Try following solution:
In sed the loop is done creating a label (:a), and while not match last line ($!) append next one (N) and return to label a:
:a
$! {
N
b a
}
After this we have the whole file into memory, so do a global substitution for each _ preceded by a newline:
s/\n_/ _/g
p
All together is:
sed -ne ':a ; $! { N ; ba }; s/\n_/ _/g ; p' infile
That yields:
title1 _1 texthere
title2 _2 texthere
If your whole file is like your sample (pairs of lines), then the simplest answer is
paste - - < file
Otherwise
awk '
NR > 1 && /^_/ {printf "%s", OFS}
NR > 1 && !/^_/ {print ""}
{printf "%s", $0}
END {print ""}
' file
This might work for you (GNU sed):
sed ':a;N;s/\n_/ /;ta;P;D' file
This avoids slurping the file into memory.
or:
sed -e ':a' -e 'N' -e 's/\n_/ /' -e 'ta' -e 'P' -e 'D' file
A Perl approach:
perl -00pe 's/\n_/ /g' file
Here, the -00 causes perl to read the file in paragraph mode where a "line" is defined by two consecutive newlines. In your example, it will read the entire file into memory and therefore, a simple global substitution of \n_ with a space will work.
That is not very efficient for very large files though. If your data is too large to fit in memory, use this:
perl -ne 'chomp;
s/^_// ? print "$l " : print "$l\n" if $. > 1;
$l=$_;
END{print "$l\n"}' file
Here, the file is read line by line (-n) and the trailing newline removed from all lines (chomp). At the end of each iteration, the current line is saved as $l ($l=$_). At each line, if the substitution is successful and a _ was removed from the beginning of the line (s/^_//), then the previous line is printed with a space in place of a newline print "$l ". If the substitution failed, the previous line is printed with a newline. The END{} block just prints the final line of the file.

awk or sed replacing specific $3 with ajacent $2

File looks like
$1 $2 $3
Text Text2 *
Text Text4 Text3
I would like to search for *'s within a file and replace the with the text in the column next to it. While keeping the rest of the info... basicly replace the * logicaly with column 2.
Currently I am working with either sed or awk
awk : awk '{ if($3=*) {print$2}}' works... but I would like to keep $1,2 aswell
sed : sed -r 's/[*]//g' I can't get reg expression to replace with $2 properly
Any quick help, tips or tricks?
Contents of file.txt:
Text Text2 *
Text Text4 Text3
One way using awk:
awk '$3 == "*" { $3=$2 }1' file.txt
Results:
Text Text2 Text2
Text Text4 Text3
With GNU sed \S and \s can be used to represent non-space and space respectively, so you could accomplish what you want like this:
sed -r '/(\S+)\s+(\S+)\s+\*/ s//\1 \2 \2/'
The empty s/// command implicitly uses the matches from //.
If it is run on the input listed by steve:
sed -r '/(\S+)\s+(\S+)\s+\*/ s//\1 \2 \2/' file.txt
Output:
Text Text2 Text2
Text Text4 Text3
If you want to preserve inter-column whitespace use:
sed -r '/(\S+)(\s+)(\S+)(\s+)\*/ s//\1\2\3\4\3/'
awk solution:
awk '$3=="*"{$3=$2}1' file