I need to find the line with the first occurrence of a pattern, then replace the whole line with a completely new one.
I found this command that replaces the first occurrence of a pattern, but not the whole line:
sed -e "0,/something/ s//other-thing/" <in.txt >out.txt
If in.txt is
one two three
four something
five six
something seven
As a result, I get this in out.txt:
one two three
four other-thing
five six
something seven
However, when I try to modify this code to replace the whole line, as follows:
sed -e "0,/something/ c\COMPLETE NEW LINE" <in.txt >out.txt
This is what I get in out.txt:
COMPLETE NEW LINE
five six
something seven
Do you have any idea why the first line is lost?
When used with two addresses, the c\ command deletes every line of the range, from the line matching the first address through the line matching the second address inclusive, and prints the text following c\ once the second address matches. If no line matches the second address, it simply deletes everything from the first matching line through the last line. In your command, the range 0,/something/ covers lines 1 and 2, so both are deleted and replaced by a single COMPLETE NEW LINE. Since you want to replace one line only, you shouldn't use c\ on an address range. Also note that, in normal usage, c\ is immediately followed by a newline character.
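To see which lines that range covers, here is a small sketch assuming GNU sed (needed for the 0,/regexp/ form). It recreates the question's in.txt and prints only the lines inside 0,/something/:

```shell
#!/bin/sh
# Recreate the question's sample input.
printf '%s\n' 'one two three' 'four something' 'five six' 'something seven' > in.txt

# Print only the lines inside the range 0,/something/ (GNU sed).
# Expected: "one two three" and "four something", i.e. lines 1 and 2.
sed -n '0,/something/p' in.txt
```

Both printed lines belong to the range, so c\ deletes both of them and emits its text only once, which is why "one two three" disappears.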
The 0,/regexp/ address range is a GNU sed extension. Unlike 1,/regexp/, it also tries to match regexp on the first input line, so the range can end on line 1. So, the correct command in GNU sed could be:
sed '0,/something/{/something/c\
COMPLETE NEW LINE
}' < in.txt
or, simplified as pointed out by Sundeep (an empty regex // reuses the last regex used):
sed '0,/something/{//c\
COMPLETE NEW LINE
}' < in.txt
or, if a literal newline character in the script is not desirable, a one-liner:
sed -e '0,/something/{//cCOMPLETE NEW LINE' -e '}' < in.txt
This one-liner, pointed out by potong, also works:
sed '0,/something/!b;//cCOMPLETE NEW LINE' in.txt
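A quick check of the corrected command, assuming GNU sed. The first part recreates the question's in.txt; the second part uses a hypothetical first.txt whose first line already matches, to show why 0,/regexp/ rather than 1,/regexp/ is needed:

```shell
#!/bin/sh
# Recreate the question's sample input.
printf '%s\n' 'one two three' 'four something' 'five six' 'something seven' > in.txt

# Change only the first matching line (GNU sed: 0,/regexp/ range).
sed '0,/something/{/something/c\
COMPLETE NEW LINE
}' < in.txt > out.txt
cat out.txt    # one two three / COMPLETE NEW LINE / five six / something seven

# Hypothetical input (first.txt) whose first line matches.
printf '%s\n' something x something y > first.txt
# 0,/something/ can end on line 1, so only line 1 is changed: NEW x something y
sed -e '0,/something/{/something/cNEW' -e '}' first.txt
# 1,/something/ never tests line 1 as the end, so the range runs to line 3: NEW x NEW y
sed -e '1,/something/{/something/cNEW' -e '}' first.txt
```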
This might work for you (GNU sed):
sed '1!b;:a;/something/!{n;ba};cCOMPLETE NEW LINE' file
Set up a loop that is entered only on the first line.
Within the loop, if the keyword is not found in the current line, print the current line, fetch the next one and repeat until the end of the file or a match is found.
When a match is found, change the contents of the current line to the required result.
N.B. The c command terminates any further processing of sed commands in the same way the d command does.
Any lines that follow the match start new cycles; because they are not line 1, 1!b branches to the end of the script, so they are printed without any further processing.
An alternative, which uses the hold space as a "match already done" flag:
sed 'x;/./{x;b};x;/something/h;//cCOMPLETE NEW LINE' file
Or (specific to GNU sed and bash, whose $'...' quoting embeds the newline):
sed $'0,/something/{//cCOMPLETE NEW LINE\n}' file
Just use awk:
$ awk '!done && sub(/something/,"other-thing"){done=1} {print}' file
one two three
four other-thing
five six
something seven
$ awk '!done && sub(/.*something.*/,"other-thing"){done=1} {print}' file
one two three
other-thing
five six
something seven
$ awk '!done && /something/{$0="other-thing"; done=1} {print}' file
one two three
other-thing
five six
something seven
And look how trivially you can replace the Nth occurrence of something instead:
$ awk -v n=1 '/something/ && (++cnt == n){$0="other-thing"} {print}' file
one two three
other-thing
five six
something seven
$ awk -v n=2 '/something/ && (++cnt == n){$0="other-thing"} {print}' file
one two three
four something
five six
other-thing
I have a file containing many blocks of lines. Each block contains one number of multiple digits (15353580, for instance). I need to extract all these numbers and put them as a column in a new file.
I came across this thread. Its sed command does the job but does not separate the numbers from each other. Using the second example ("123 he23llo") from the most-voted answer, I would like to get 123-23 instead of 12323, where '-' stands for a line break. How can I do so?
You can use this sed command:
sed -e 's/[^0-9]\+/-/g' -e 's/-$//'
Example:
$ echo "123 hel43lo 23fds" | sed -e 's/[^0-9]\+/-/g' -e 's/-$//'
123-43-23
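Since the '-' in the question stands for a line break, here is a sketch assuming GNU sed (where \n in the replacement inserts a newline) and GNU grep (for -o). The extra s/^\n// drops the empty first line you would otherwise get when the input starts with a non-digit:

```shell
#!/bin/sh
# GNU sed: turn each run of non-digits into a newline, then drop a
# leading or trailing newline so that each number ends up on its own line.
echo "123 hel43lo 23fds" | sed -e 's/[^0-9]\+/\n/g' -e 's/^\n//' -e 's/\n$//'

# GNU grep alternative: -o prints every match on a separate line.
echo "123 hel43lo 23fds" | grep -o '[0-9]\+'
```

Both commands print 123, 43 and 23 on separate lines; on a whole file, the grep form already produces the single column of numbers the question asks for.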
I don't understand the dollar sign ($) in sed scripts. Does it stand for the last line of a file, or is it a counter in sed?
I want to reverse the order of lines of /etc/passwd (emulating tac), like this:
$ cat /etc/passwd | wc -l               # 52 lines
$ sed '1!G;h;$!d' /etc/passwd | wc -l   # 52: works correctly
$ sed '1!G;h;$d' /etc/passwd | wc -l    # 1326: no ! after $
$ sed '1!G;h;$p' /etc/passwd | wc -l    # 1430: p instead of !d
The last two examples don't work right. Can anyone tell me what the dollar sign stands for?
All the commands "work right." They just do something you don't expect. Let's consider the first version:
sed '1!G;h;$!d'
Start with the first two commands:
1!G; h
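After these two commands have been executed, the pattern space and the hold space both contain all the lines read so far, in reverse order: G appends a newline and the hold space to the pattern space (on every line except the first), and h then copies the pattern space back into the hold space.
At this point, if we did nothing else, sed would take its default action, which is to print the pattern space. So: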
After the first line is read, it would print the first line.
After the second line is read, it would print the second line followed by the first line.
After the third line is read, it would print the third line, followed by the second line, followed by the first line.
And so on.
If we are emulating tac, we don't want that. We want it to print only after it has read in the last line. So, that is where the following command comes in:
$!d
$ means the last line. $! means not-the-last-line. $!d means delete if we are not on the last line. Thus, this tells sed to delete the pattern space unless we are on the last line, in which case it will be printed, displaying all lines in reverse order.
With that in mind, consider your second example:
sed '1!G;h;$d'
This prints all the partial tacs except the last one. With 52 input lines, that is 1 + 2 + ... + 51 = 1326 lines.
Your third example:
sed '1!G;h;$p'
This prints all the partial tacs up through the last one, but the last one is printed twice: $p explicitly prints the pattern space on the last line, in addition to the implicit print that happens anyway. With 52 input lines, that is (1 + 2 + ... + 52) + 52 = 1378 + 52 = 1430 lines.
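The same counting can be checked on a smaller input. This sketch uses printf to produce four lines, so the expected counts are 6 (1+2+3) for $d and 14 (1+2+3+4, plus 4 for the explicit print) for $p:

```shell
#!/bin/sh
# Four input lines: 1, 2, 3, 4.
printf '%s\n' 1 2 3 4 | sed '1!G;h;$!d'            # full tac: 4 3 2 1, one per line
printf '%s\n' 1 2 3 4 | sed '1!G;h;$d' | wc -l     # 6: every partial tac except the last
printf '%s\n' 1 2 3 4 | sed '1!G;h;$p' | wc -l     # 14: all partial tacs, the last one twice
```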
I need to replace a pattern in a file, but only if the line is followed by an empty line. Suppose I have the following file:
test
test

test

...
The following command would replace all occurrences of test with xxx:
cat file | sed 's/test/xxx/g'
But I need to replace test only if the next line is empty. I have tried matching a hex code, but that doesn't work:
cat file | sed 's/test\x0a/xxx/g'
The desired output should look like this:
test
xxx

xxx

...
Suggested solutions for sed, perl and awk:
sed
sed -rn '1h;1!H;${g;s/test([^\n]*\n\n)/xxx\1/g;p;}' file
I got the idea from "sed multiline search and replace": slurp the entire file into sed's hold space, then do a global replacement on the whole chunk at once.
perl
$ perl -00 -pe 's/test(?=[^\n]*\n\n)$/xxx/m' file
-00 turns on paragraph mode, which makes perl read chunks separated by one or more empty lines (just what the OP is looking for). The positive lookahead (?=...) anchors the substitution to the last line of the chunk.
Caveat: -00 will squash multiple empty lines into single empty lines.
awk
$ awk 'NR==1 {l=$0; next}
/^$/ {gsub(/test/,"xxx", l)}
{print l; l=$0}
END {print l}' file
This stores the previous line in l and, if the current line is empty, substitutes the pattern in l; then it prints l. At the end, it prints the very last line.
Output in all three cases
test
xxx

xxx

...
This might work for you (GNU sed):
sed -r '$!N;s/test(\n\s*)$/xxx\1/;P;D' file
This keeps a two-line window over the whole file: if the second line is empty (or contains only whitespace) and the first line ends with the pattern, it makes the substitution.
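A quick check of the two-line-window command against the awk answer, assuming GNU sed (for -r and \s) and a hypothetical sample.txt shaped like the question's input:

```shell
#!/bin/sh
# Hypothetical sample.txt: the first "test" is followed by another "test",
# the other two are each followed by an empty line.
printf 'test\ntest\n\ntest\n\n' > sample.txt

# GNU sed: two-line window; change a line ending in "test" when the
# next line is empty (or whitespace only).
sed -r '$!N;s/test(\n\s*)$/xxx\1/;P;D' sample.txt

# The awk answer, which should print the same lines:
# test, xxx, (empty), xxx, (empty)
awk 'NR==1 {l=$0; next}
     /^$/  {gsub(/test/,"xxx", l)}
           {print l; l=$0}
     END   {print l}' sample.txt
```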
Using sed
sed -r ':a;$!{N;ba};s/test([^\n]*\n(\n|$))/xxx\1/g'
Explanation:
:a          # set label a
$!{         # if not at the last line
N           # add a newline to the pattern space, then append the next line of input to it
ba          # unconditionally branch to label a (if the label is omitted, the next cycle is started)
}
# In short, :a;$!{N;ba} reads the whole file into the pattern space.
s/test([^\n]*\n(\n|$))/xxx\1/g   # replace the keyword if the next line is empty, whether more input follows it (\n\n) or it is the last line (\n$)
Both grep and sed handle input line by line and, as far as I know, getting either of them to handle multiple lines isn't very straightforward. What I'm looking for is an alternative (or alternatives) to these two programs that treats newlines as just another character. Is there any tool that fits these criteria?
The tool you want is awk. It is record-oriented, not line-oriented, and you can specify your record-separator by setting the builtin variable RS. In particular, GNU awk lets you set RS to any regular expression, not just a single character.
Here is an example where awk uses blank lines to separate records. If you show us what data you have, we can help you with it.
cat file
first line
second line
third line

fourth line
fifth line
sixth line

seventh line
eight line

more data
Running awk on this, with a blank line as the record separator, rebuilds each record on a single line:
awk -v RS= '{$1=$1}1' file
first line second line third line
fourth line fifth line sixth line
seventh line eight line
more data
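PS: RS is not set to file here; -v RS= sets RS to the empty string, which is equivalent to RS="". An empty RS turns on paragraph mode: records are separated by one or more blank lines, and a newline always acts as a field separator.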
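A minimal illustration of paragraph mode, assuming any POSIX awk. The input has two blank lines between its two records and a newline inside the first one:

```shell
#!/bin/sh
# Two records separated by two blank lines; the first record spans two lines.
# With RS empty (paragraph mode), print each record number and its field count.
printf 'a b\nc\n\n\nd\n' | awk -v RS= '{print NR, NF}'
# Expected output:
# 1 3
# 2 1
```

The first record spans two lines but has 3 fields, because the newline acts as a field separator, and the run of two blank lines counts as a single record separator.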
1) sed can handle a block of lines together; it does not always work line by line.
In sed, I normally use :loop; $!{N; b loop}; to load all the lines into the pattern space, delimited by newlines.
Sample:
Productivity
Google Search\
Tips
"Web Based Time Tracking,
Web Based Todo list and
Reduce Key Stores etc"
Result (removing the content between double quotes):
sed -e ':loop; $!{N; b loop}; s/\"[^\"]*\"//g' thegeekstuff.txt
Productivity
Google Search\
Tips
You should read Unix Sed Tutorial: 6 Examples for Sed Branching Operation, which explains in detail how this works:
http://www.thegeekstuff.com/2009/12/unix-sed-tutorial-6-examples-for-sed-branching-operation/
2) For grep, check whether your grep supports the -z option, which makes it read NUL-terminated lines instead of newline-terminated ones, so it no longer handles the input line by line:
-z, --null-data
Treat the input as a set of lines, each terminated by a zero
byte (the ASCII NUL character) instead of a newline. Like the
-Z or --null option, this option can be used with commands like
sort -z to process arbitrary file names.
I want to list the lines in a batch file that are longer than 120 characters. I thought of using sed, but I was not successful. How can I achieve this?
Is there any way to get such a list other than with sed?
Thanks.
Another way to do this using awk:
awk 'length($0) > 120' file
You can use grep and its repetition quantifier:
grep '.\{120\}' script.sh
Using sed, you have some alternatives:
sed -e '/.\{120\}/!d'
sed -e '/^.\{0,119\}$/d'
sed -ne '/.\{120\}/p'
The first option selects lines that do not contain at least 120 characters (the ! after the address applies the command to lines that do not match it) and deletes them, i.e. does not print them.
The second option matches lines that contain, from start (^) to end ($), between zero and 119 characters in total. These lines are also deleted.
The third option uses the -n flag, which tells sed not to print lines by default, but only when told to. Here we match lines that have at least 120 characters and use p to print them.
Note that the grep and sed commands select lines of at least 120 characters, whereas the awk command selects lines strictly longer than 120; for strictly longer lines, use 121 in the patterns (and 120 as the upper bound in the second sed form).
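To see the difference between "at least 120" and "more than 120" characters, this sketch builds a hypothetical lines.txt containing lines of 119, 120 and 121 characters and counts what each command selects:

```shell
#!/bin/sh
# Hypothetical test file: three lines of 119, 120 and 121 "x" characters.
awk 'BEGIN { for (n = 119; n <= 121; n++) { s = ""; for (i = 0; i < n; i++) s = s "x"; print s } }' > lines.txt

awk 'length($0) > 120' lines.txt | wc -l    # 1: only the 121-character line
grep -c '.\{120\}' lines.txt                # 2: the 120- and 121-character lines
sed -n '/.\{121\}/p' lines.txt | wc -l      # 1: pattern raised to 121 for "more than 120"
```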