Remove duplicate words in a line with sed

Remove duplicate words in a line with sed - sed

Purely academic, but it's frustrating me.
I want to correct this text:
there there are are multiple lexical errors in this line line
using sed. I've got this far:
sed 's/\([a-z][a-z]*[ ,\n][ ,\n]*\)\1/\1/g' < file.text
It corrects everything except the final doubled up words!
there are multiple lexical errors in this line line
Can a sed guru please explain why the above doesn't deal with the words at the end?

This is because in the last case (line) your regex memory 1 will have line (line followed by a space) in it and you are searching for its repetition. Since there is not space after the last line the match fails.
To fix this add a space after the ending word line.
Alternatively you can change the regex to:
sed -e 's/\b\([a-z]\+\)[ ,\n]\1/\1/g'
See it

Related

about use sed Modify the file？

I have a question about using sed to modify file. My file content:
<data-value name="WLS_INSTALL_DIR" value="/home/Oracle/wlserver_10.3">
I want to replace the content of field value="/home/Oracle/wlserver_10.3"
to get this result:
<data-value name="WLS_INSTALL_DIR" value="/u03/Middle_home/Oracle/wlserver_10.3">
I use sed:
sed "6 i/^value=/>/s/value= />\(.*\)/value=\"\/u03\/Oracle/Middleware/wlserver_10.3"\" \/\ /u03/silent.xml

Your sed script has a number of issues.
First off, anything that looks like 6istuff will simply write everything after i ("insert") verbatim as a new line before the sixth line. (Some dialects require a newline after the i and will basically do nothing.)
Secondly, ^value= does not match your input; it would only select a line starting with the string value= (the ^ metacharacter means beginning of line).
Thirdly, the /> in your subsitution regex terminates the substitution and so everything from > onwards is parsed as invalid flags for the substitution. I cannot see the purpose of this part, anyway; it doesn't match your data, and so the regex fails.
What remains after removing all these superfluous and erroneous details is a more or less useful sed script. (I assume the 6 to address only the sixth line of input is intentional, although you don't mention this in the question at all.) I have made some additional minor improvements, such as using % as the substitution delimiter and tightening the regex so that it only ever substitutes a double-quoted value.
sed '6s%value="[^"]*"%value="/u03/Oracle/Middleware/wlserver_10.3"%' /u03/silent.xml
Better than 6 would perhaps be to identify the line with /name="WLS_INSTALL_DIR"/.
Still, as alluded to in a comment, the proper way to manipulate XML is with a dedicated tool such as xsltproc.

Try:
sed 's|/home|/u03/Middle_home|'

Use Sed to modify a line that has an initial space and contains a comma

This should be extremely simple, but for the life of me I just can't get gnu-sed to do it this afternoon.
The file in question has lines that look like this:
PART NUMBER PART NUMBER QUANTITY WEIGHT -999 -4,999 -9,999
w/ UL APPROVAL
MIN-3
I need to prepend every line like the "MIN-3" line with a ">" character, and the only thing specifically differentiating those lines from the others are two things:
The first character is a space " ".
The lines do not contain a comma.
I've tried mostly things like any of the following:
/^ +[^,]+$/ s/^/>/
/^ +[\w\-]+$/ s/^/>/
/^ +(\w|\-)+$/ s/^/>/
I will admit, I am somewhat new to sed. :)
Edit: Answers that use perl, or awk could also be appreciated, though my initial target is sed.

try this:
sed '/^ [^,]*$/s/^/>/'
the output is, only the line with MIN-3 with leading >
sed default uses basic regex. so the + should be \+ in your script. I think that could be the problem killing your time. You could add -r however, to let sed use extended-regex.

According to your description this should do:
sed 's/^\([ ][^,]*\)$/> \1/' input
which matches the complete line if the line starts with a space and then contains anything but a comma until the end.

Here is a simple answer:
sed 's/^ [^,]*$/>&/'

Replace 3 lines with another line SED Syntax

This is a simple question, I'm not sure if i'm able to do this with sed/awk
How can I make sed search for these 3 lines and replace with a line with a determined string?
<Blarg>
<Bllarg>
<Blllarg>
replace with
<test>
I tried with sed "s/<Blarg>\n<Bllarg>\n<Blllarg>/<test>/g" But it just don't seem to find these lines. Probably something with my break line character (?) \n. Am I missing something?

Because sed usually handles only one line at a time, your pattern will never match. Try this:
sed '1N;$!N;s/<Blarg>\n<Bllarg>\n<Blllarg>/<test>/;P;D' filename

This might work for you:
sed '/<Blarg>/ {N;N;s/<Blarg>\n<Bllarg>\n<Blllarg>/<test>/}' <filename>
It works as follows:
Search the file till <Blarg> is found
Then append the two following lines to the current pattern space using N;N;
Check if the current pattern space matches <Blarg>\n<Bllarg>\n<Blllarg>
If so, then substitute it with <test>

You can use range addresses with regular expressions an the c command, which does exactly what you are asking for:
sed '/<Blarg>/,/<Blllarg>/c<test>' filename

Can I use the sed command to replace multiple empty line with one empty line?

I know there is a similar question in SO How can I replace mutliple empty lines with a single empty line in bash?. But my question is can this be implemented by just using the sed command?
Thanks

Give this a try:
sed '/^$/N;/^\n$/D' inputfile
Explanation:
/^$/N - match an empty line and append it to pattern space.
; - command delimiter, allows multiple commands on one line, can be used instead of separating commands into multiple -e clauses for versions of sed that support it.
/^\n$/D - if the pattern space contains only a newline in addition to the one at the end of the pattern space, in other words a sequence of more than one newline, then delete the first newline (more generally, the beginning of pattern space up to and including the first included newline)

You can do this by removing empty lines first and appending line space with G command:
sed '/^$/d;G' text.txt
Edit2: the above command will add empty lines between each paragraph, if this is not desired, you could do:
sed -n '1{/^$/p};{/./,/^$/p}'
Or, if you don't mind that all leading empty lines will be stripped, it may be written as:
sed -n '/./,/^$/p'
since the first expression just evaluates the first line, and prints it if it is blank.
Here: -n option suppresses pattern space auto-printing, /./,/^$/ defines the range between at least one character and none character (i.e. empty space between newlines) and p tells to print this range.

Remove Leading Whitespace from File

My shell has a call to 'fortune' in my .login file, to provide me with a little message of the day. However, some of the fortunes begin with one leading whitespace line, some begin with two, and some don't have any leading whitespace lines at all. This bugs me.
I sat down to wrapper fortune with my own shell script, which would remove all the leading whitespace from the input, without destroying any formatting of the actual fortune, which may intentionally have lines of whitespace.
It doesn't appear to be an easy one-liner two-minute fix, and as I read(reed) through the man pages for sed and grep, I figured I'd ask our wonderful patrons here.

Using the same source as Dav:
# delete all leading blank lines at top of file
sed '/./,$!d'
Source: http://www.linuxhowtos.org/System/sedoneliner.htm?ref=news.rdf
Additionally, here's why this works:
The comma separates a "range" of operation. sed can accept regular expressions for range definitions, so /./ matches the first line with "anything" (.) on it and $ specifies the end of the file. Therefore,
/./,$ matches "the first not-blank line to the end of the file".
! then inverts that selection, making it effectively "the blank lines at the top of the file".
d deletes those lines.

# delete all leading blank lines at top of file
sed '/./,$!d'
Source: http://www.linuxhowtos.org/System/sedoneliner.htm?ref=news.rdf
Just pipe the output of fortune into it:
fortune | sed '/./,$!d'

How about:
sed "s/^ *//" < fortunefile

i am not sure about how your fortune message actually looks like, but here's an illustration
$ string=" my message of the day"
$ echo $string
my message of the day
$ echo "$string"
my message of the day
or you could use awk
echo "${string}" | awk '{gsub(/^ +/,"")}1'

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Remove duplicate words in a line with sed - sed

Related

about use sed Modify the file？

Use Sed to modify a line that has an initial space and contains a comma

Replace 3 lines with another line SED Syntax

Can I use the sed command to replace multiple empty line with one empty line?

Remove Leading Whitespace from File

Categories

Resources