Remove a string which is not present in parentheses ()? [closed] - sed

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I have a file containing data like the following, and I want to remove everything that is not inside parentheses.
hello (welcome) to chennai (hai)
hello (how) this is for testing (with)
[is] this (bhuvanesh)
I want the following output:
(welcome) (hai)
(how) (with)
(bhuvanesh)

You can use the following sed command:
sed 's/[^(]*\(([^)]\+)\)[^(]*/\1/g' input.txt
Explanation:
I'm using the substitute command. In its basic form it looks like this:
s/SEARCH/REPLACE/g
The g at the end means global: sed replaces all occurrences of SEARCH on each line, not just the first.
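A minimal sketch of what the g flag changes, using an arbitrary sample string (any POSIX sed behaves the same here):

```shell
#!/bin/sh
# Without g, only the first match on the line is replaced.
echo 'a a a' | sed 's/a/b/'    # prints: b a a
# With g, every non-overlapping match on the line is replaced.
echo 'a a a' | sed 's/a/b/g'   # prints: b b b
```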
The SEARCH pattern looks like this:
[^(]*\(([^)]\+)\)[^(]*
I'll try to explain it step by step...
[^(]*
[] is a character class; the ^ at the beginning negates it, so it matches any character that is not listed. We list only a single character, the opening parenthesis (. The * means this can occur zero or more times. In short, this matches everything up to the first opening parenthesis (.
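To see [^(]* on its own, a small sketch using the first line of the sample input: without g, sed removes only the first run of non-"(" characters, which is everything before the first opening parenthesis.

```shell
#!/bin/sh
# [^(]* is greedy: it consumes every character up to (not including) the first "(".
echo 'hello (welcome) to chennai (hai)' | sed 's/[^(]*//'
# prints: (welcome) to chennai (hai)
```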
\(([^)]\+)\)
(...) is a matching group. In basic sed regular expressions it needs to be escaped: \(...\). The first character inside the group is the opening parenthesis (. Next comes the character class [^)], which matches every character except the closing parenthesis ). The quantifier \+ (a GNU extension in basic regular expressions) means there must be at least one character between the parentheses in your input; if you want to allow empty content, use * as the quantifier instead. Then come the closing parenthesis ) and the end of the matching group \).
Because of the matching group, the matched content is now available as \1.
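A sketch of the group and the \+ quantifier in isolation, assuming GNU sed (for \+). The input strings are made up for illustration, and .* is used around the group only to keep the one-group demo short:

```shell
#!/bin/sh
# Assumes GNU sed: \+ ("one or more") is a GNU extension in basic regexes.
# \(...\) captures "(how)" and \1 writes it back.
echo 'hello (how) there' | sed 's/.*\(([^)]\+)\).*/\1/'
# prints: (how)

# With \+, empty parentheses do not match, so the line is left unchanged.
echo 'empty () here' | sed 's/.*\(([^)]\+)\).*/\1/'
# prints: empty () here

# With * instead of \+, empty content is allowed.
echo 'empty () here' | sed 's/.*\(([^)]*)\).*/\1/'
# prints: ()
```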
The last part of the search pattern is the same as the first part:
[^(]*
It matches everything until the next opening parenthesis.
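The REPLACE pattern is simple: it throws away everything except the content of matching group \1. Note that nothing is put between consecutive groups, so the first line comes out as (welcome)(hai). To get the space shown in the desired output, use \1 followed by a space as the replacement and then strip the trailing space: sed 's/[^(]*\(([^)]\+)\)[^(]*/\1 /g; s/ $//' input.txt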
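A sketch comparing the command as given with a space-separated variant, run on the question's sample lines. It assumes GNU sed, and keep_parens is a hypothetical helper name used only here:

```shell
#!/bin/sh
# Assumes GNU sed (\+ in a basic regex). keep_parens is a hypothetical name.
keep_parens() {
  # Keep each "(...)" group followed by one space, then drop the trailing space.
  sed 's/[^(]*\(([^)]\+)\)[^(]*/\1 /g; s/ $//'
}

# The command as given joins the groups: prints (welcome)(hai)
echo 'hello (welcome) to chennai (hai)' | sed 's/[^(]*\(([^)]\+)\)[^(]*/\1/g'

# The spaced variant reproduces the desired output.
printf '%s\n' 'hello (welcome) to chennai (hai)' \
  'hello (how) this is for testing (with)' \
  '[is] this (bhuvanesh)' | keep_parens
# prints:
# (welcome) (hai)
# (how) (with)
# (bhuvanesh)
```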

This awk would do:
awk -F"[()]" '{for (i=2;i<=NF;i+=2) printf "(%s) ",$i;print ""}' file
(welcome) (hai)
(how) (with)
(bhuvanesh)
Or, without the parentheses:
awk -F"[()]" '{for (i=2;i<=NF;i+=2) printf "%s ",$i;print ""}' file
welcome hai
how with
bhuvanesh
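Why the loop starts at 2 and steps by 2: with -F"[()]" both "(" and ")" act as field separators, so text inside parentheses always lands in the even-numbered fields. A small sketch that prints every field of the first sample line:

```shell
#!/bin/sh
# Both "(" and ")" separate fields; the trailing ")" leaves an empty last field.
echo 'hello (welcome) to chennai (hai)' |
  awk -F'[()]' '{ for (i = 1; i <= NF; i++) printf "%d:[%s]\n", i, $i }'
# prints:
# 1:[hello ]
# 2:[welcome]
# 3:[ to chennai ]
# 4:[hai]
# 5:[]
```

Note that both awk loops above print a space after every group, so each output line ends with a trailing space.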

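Try this one (it needs GNU sed for -r and is tailored to the sample input: the second command expects two parenthesized groups on a line, and it joins them without a space):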
sed -r 's/\[.*\][^(]*//g ; s/.*(\(.*\)).*(\(.*\))/\1\2/g'

Related

sed - find multiple phrases and replace them [duplicate]

This question already has an answer here:
replace multiple strings in one line with sed
(1 answer)
Closed 1 year ago.
I have a sed command that searches for one phrase and, if it is found, replaces the whole line:
sed 's/.*phrase.*/123/'
That works great, but how can I use multiple phrases and replace the whole line if any one of them is found?
I tried the command below, without success:
sed 's/.*phrase1|phrase2.*/123/'
Using GNU sed.
You need to use an alternation operator like this:
sed 's/.*\(phrase1\|phrase2\).*/123/' # POSIX BRE way
sed -E 's/.*(phrase1|phrase2).*/123/' # POSIX ERE way
In POSIX BRE, \(phrase1\|phrase2\) means a capturing group that matches either phrase1 or phrase2. \| is a GNU extension for an alternation operator in a POSIX BRE pattern.
In POSIX ERE (enabled with -E option), you need to remove the backslashes from the above mentioned constructs: (phrase1|phrase2).
Grouping is necessary because it makes the alternation apply only to the grouped constructs; without it, .*phrase1\|phrase2.* would match either everything from the start of the line up to the last phrase1, or everything from phrase2 to the end of the line.
You are almost there: you need to put the different phrases between (escaped) parentheses. Also, add g at the end if your pattern can match more than once per line and you want every occurrence replaced; with the leading and trailing .*, a single match already consumes the whole line.
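A sketch of both forms plus the ungrouped pitfall, assuming GNU sed (\| in a basic regex is a GNU extension). The sample lines are invented for illustration:

```shell
#!/bin/sh
# Assumes GNU sed. Grouped alternation (BRE): the whole line becomes 123.
printf '%s\n' 'has phrase1 here' 'nothing to see' 'and phrase2 too' |
  sed 's/.*\(phrase1\|phrase2\).*/123/'
# prints:
# 123
# nothing to see
# 123

# Same result with extended regular expressions (-E).
printf '%s\n' 'has phrase1 here' 'nothing to see' 'and phrase2 too' |
  sed -E 's/.*(phrase1|phrase2).*/123/'

# Without the group, the alternatives are ".*phrase1" and "phrase2.*",
# so only the text from phrase2 onward is replaced.
echo 'x phrase2 y' | sed 's/.*phrase1\|phrase2.*/123/'
# prints: x 123
```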
You can take a look at the docs:
https://www.gnu.org/software/sed/manual/sed.html

Keep lines containing "list of different words" like pattern [duplicate]

This question already has answers here:
How to make sed remove lines not matched by a substitution
(4 answers)
Boolean OR in sed regex
(4 answers)
Closed 4 years ago.
How can I keep all lines matching any of these words,
toto OR titi OR clic OR SOMETHING, and delete all other lines?
If I do sed '/toto/ p ' file, I cannot also select titi, for example.
What I am looking for is something like the Perl regular expression
^(word1|word2|word3|andsoon).*. However, I need it in sed because it will be integrated into a bigger sed script.
The goal is to keep all lines starting with word where word is any word from a set of words.
The answer here depends a bit on how your master script is called. Imagine you have a file with the following content:
foo
car
bar
and you are interested in the lines matching "foo" or "bar", then you can do:
sed '/foo\|bar/!d'
sed -n '/foo\|bar/!d;p'
sed -n '/foo\|bar/p'
All of these output:
foo
bar
If you just do:
sed '/foo\|bar/p'
the matching lines are printed twice:
foo
foo
car
bar
bar
As you see, there is a bit of different handling depending on the usage of the -n flag.
-n, --quiet, --silent suppress automatic printing of pattern space
source: man sed
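A sketch that runs the three equivalent commands and the duplicating one on the foo/car/bar example, assuming GNU sed (for \| in a basic regex):

```shell
#!/bin/sh
# Assumes GNU sed (\| alternation in a basic regex).
input='foo
car
bar'

# Three equivalent ways to keep only the lines matching foo or bar.
printf '%s\n' "$input" | sed '/foo\|bar/!d'
printf '%s\n' "$input" | sed -n '/foo\|bar/!d;p'
printf '%s\n' "$input" | sed -n '/foo\|bar/p'

# Without -n, p prints matching lines again on top of the automatic print,
# so this outputs 5 lines: foo foo car bar bar.
printf '%s\n' "$input" | sed '/foo\|bar/p'
```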
In general, my suggestion is to delete the lines you don't need at the beginning of your sed script.
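Applied to the original question (keep lines that start with one of several words), a sketch using extended regex syntax. The word list comes from the question; the s/^/kept: / command is a hypothetical stand-in for the rest of a bigger script. Note that ^(toto|titi|clic) has no word boundary, so it would also keep a line starting with "totoro".

```shell
#!/bin/sh
# Delete unwanted lines first; d ends the cycle, so later commands
# in the script only ever see the kept lines.
printf '%s\n' 'toto one' 'other line' 'titi two' 'clic three' 'x toto' |
  sed -E '/^(toto|titi|clic)/!d
s/^/kept: /'
# prints:
# kept: toto one
# kept: titi two
# kept: clic three
```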

Using sed how to remove last character only in the first line [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
How can I use sed to remove the last character from only the first line of a file?
You can for example use this:
sed '1 s/.$//' file
Explanation
1 indicates the line in which we want to perform the action.
Given the syntax s/text/replacement/, we look for any character (.) followed by $, which indicates the end of the line. Hence we match the last character before the end of the line and replace it with nothing; that is, we remove the last character of the line.
To edit the file in place and keep a backup copy, use -i.bak, e.g. sed -i.bak '1 s/.$//' file.
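A sketch of the in-place edit, assuming GNU sed (where -i.bak edits the file and keeps the original as FILE.bak). It works on a throwaway copy of the test file in a temporary directory:

```shell
#!/bin/sh
# Assumes GNU sed. Work on a temporary copy so no real file is touched.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf 'hello this is some text\nand this is something else\n' > "$tmp/a"
sed -i.bak '1 s/.$//' "$tmp/a"

cat "$tmp/a"       # first line is now: hello this is some tex
cat "$tmp/a.bak"   # the unmodified original
```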
Test
$ cat a
hello this is some text
and this is something else
$ sed '1 s/.$//' a
hello this is some tex
and this is something else
For fun, let's see how to accomplish this with awk:
awk -v FS= -v OFS= 'NR==1{NF=NF-1}1' file
This sets the input and output field separators (FS, OFS) to empty (same as BEGIN{FS=OFS=""}), so every single character is a field; splitting on an empty FS works in GNU awk, but POSIX leaves it unspecified. Then, when the record number is 1 (that is, on the first line), it decrements the number of fields (NF) so that the last character is dropped. Finally, 1 is a true condition that makes awk perform its default action: {print $0}.

Sed expression that converts some matches to uppercase

This sed expression converts an input string into a two-line output string. Each of the two output lines is composed of substrings from the input. The first line needs to be converted to upper case:
s:random_stuff_\(choice1\|choice2\){\([^}]*\)}:\U\1\n\2:
The aim is to convert
random_stuff_choice1{This is a sentence that MAY contain AnyThing}
random_stuff_choice2{This is another similar sentence}
into
CHOICE1
This is a sentence that MAY contain AnyThing
CHOICE2
This is another similar sentence
The problem is that \U applies to everything following it, so the second line is also uppercased. Is it possible to make \U apply to the first match only?
With sed:
$ sed 's/.*\(choice[0-9]\+\){\([^}]*\)}/\U\1\n\E\2/' file
CHOICE1
This is a sentence that MAY contain AnyThing
CHOICE2
This is another similar sentence
With awk:
$ awk -F'{|}' 'gsub(/.*_/,""){print toupper($1)"\n"$2}' file
CHOICE1
This is a sentence that MAY contain AnyThing
CHOICE2
This is another similar sentence
Use \E to cancel the \U:
s:random_stuff_\(choice1\|choice2\){\([^}]*\)}:\U\1\E\n\2:
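A sketch running the \E version on the question's two input lines, next to the version without \E. It assumes GNU sed, since \U, \E and \n in the replacement are GNU extensions:

```shell
#!/bin/sh
# Assumes GNU sed (\U, \E and \n in the replacement are GNU extensions).
printf '%s\n' \
  'random_stuff_choice1{This is a sentence that MAY contain AnyThing}' \
  'random_stuff_choice2{This is another similar sentence}' |
  sed 's:random_stuff_\(choice1\|choice2\){\([^}]*\)}:\U\1\E\n\2:'
# prints:
# CHOICE1
# This is a sentence that MAY contain AnyThing
# CHOICE2
# This is another similar sentence

# Without \E, the uppercasing carries on into the second line.
echo 'random_stuff_choice2{This is another similar sentence}' |
  sed 's:random_stuff_\(choice1\|choice2\){\([^}]*\)}:\U\1\n\2:'
# prints:
# CHOICE2
# THIS IS ANOTHER SIMILAR SENTENCE
```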

Confining Substitution to Match Space Using sed?

Is there a way to substitute only within the match space using sed?
I.e. given the following line, is there a way to substitute only the "." chars that are contained within the matching single quotes and protect the "." chars that are not enclosed by single quotes?
Input:
'ECJ-4YF1H10.6Z' ! 'CAP' ! '10.0uF' ! 'TOL' ; MGCDC1008.S1 MGCDC1009.A2
Desired result:
'ECJ-4YF1H10-6Z' ! 'CAP' ! '10_0uF' ! 'TOL' ; MGCDC1008.S1 MGCDC1009.A2
Or is this just a job to which perl or awk might be better suited?
Thanks for your help,
Mark
Give the following a try which uses the divide-and-conquer technique:
sed "s/\('[^']*'\)/\n&\n/g;s/\(\n'[^.]*\)\.\([^']*Z'\)/\1-\2/g;s/\(\n'[^.]*\)\.\([^']*uF'\)/\1_\2/g;s/\n//g" inputfile
Explanation:
s/\('[^']*'\)/\n&\n/g - Add newlines before and after each pair of single quotes with their contents
s/\(\n'[^.]*\)\.\([^']*Z'\)/\1-\2/g - Using a newline and the single quotes to key on, replace the dot with a dash for strings that end in "Z"
s/\(\n'[^.]*\)\.\([^']*uF'\)/\1_\2/g - Using a newline and the single quotes to key on, replace the dot with an underscore for strings that end in "uF"
s/\n//g - Remove the newlines added in the first step
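A sketch that runs the four-step command on the question's input line, assuming GNU sed (\n in both the pattern and the replacement is a GNU extension):

```shell
#!/bin/sh
# Assumes GNU sed. The input is the sample line from the question.
line="'ECJ-4YF1H10.6Z' ! 'CAP' ! '10.0uF' ! 'TOL' ; MGCDC1008.S1 MGCDC1009.A2"
printf '%s\n' "$line" |
  sed "s/\('[^']*'\)/\n&\n/g;s/\(\n'[^.]*\)\.\([^']*Z'\)/\1-\2/g;s/\(\n'[^.]*\)\.\([^']*uF'\)/\1_\2/g;s/\n//g"
# prints: 'ECJ-4YF1H10-6Z' ! 'CAP' ! '10_0uF' ! 'TOL' ; MGCDC1008.S1 MGCDC1009.A2
```

The unquoted MGCDC1008.S1 and MGCDC1009.A2 keep their dots because neither is followed by Z' or uF'.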
You can restrict the command to acting only on certain lines:
sed "/foo/{s/\('[^']*'\)/\n&\n/g;s/\(\n'[^.]*\)\.\([^']*Z'\)/\1-\2/g;s/\(\n'[^.]*\)\.\([^']*uF'\)/\1_\2/g;s/\n//g}" inputfile
where you would substitute some regex in place of "foo".
Some versions of sed like to be spoon-fed (instead of separating commands with semicolons, use -e):
sed -e "/foo/{s/\('[^']*'\)/\n&\n/g" -e "s/\(\n'[^.]*\)\.\([^']*Z'\)/\1-\2/g" -e "s/\(\n'[^.]*\)\.\([^']*uF'\)/\1_\2/g" -e "s/\n//g}" inputfile
$ cat phoo1234567_sedFix.sed
#! /bin/sed -f
/'[0-9][0-9]\.[0-9][a-zA-Z][a-zA-Z]'/s/'\([0-9][0-9]\)\.\([0-9][a-zA-Z][a-zA-Z]\)'/'\1_\2'/
This answers your specific question. If the pattern you need to fix isn't always like the example you provided, then you'll need multiple copies of this line, with the regular expressions modified to match your other change targets.
Note that the command is in two parts: "/'[0-9][0-9]\.[0-9][a-zA-Z][a-zA-Z]'/" says it must match lines with this pattern, while the trailing "s/'\([0-9][0-9]\)\.\([0-9][a-zA-Z][a-zA-Z]\)'/'\1_\2'/" is the part that does the substitution (the quotes around \1_\2 put back the single quotes that the pattern consumes). You can add a 'g' after the final '/' to make this substitution happen on all instances of this pattern in each line.
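A quick check of that substitution on the question's sample line, run inline rather than through a sed -f script. The single quotes are written back in the replacement, since a plain \1_\2 would drop them. As the answer says, only the uF value is handled; '10.6Z' is left alone.

```shell
#!/bin/sh
line="'ECJ-4YF1H10.6Z' ! 'CAP' ! '10.0uF' ! 'TOL' ; MGCDC1008.S1 MGCDC1009.A2"
# Two digits, a dot, a digit and two letters, all inside single quotes.
printf '%s\n' "$line" |
  sed "s/'\([0-9][0-9]\)\.\([0-9][a-zA-Z][a-zA-Z]\)'/'\1_\2'/g"
# prints: 'ECJ-4YF1H10.6Z' ! 'CAP' ! '10_0uF' ! 'TOL' ; MGCDC1008.S1 MGCDC1009.A2
```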
The \(\) pairs in the match pattern become the numbered buffers on the substitution side of the command (i.e. \1, \2). This is what gives sed power that the standard awk sub() and gsub() functions lack (GNU awk's gensub() does support back-references).
If you're going to do much of this kind of work, I highly recommend O'Reilly's sed & awk book. The time spent going through how sed works will be paid back many times.
I hope this helps.
P.S. as you appear to be a new user, if you get an answer that helps you please remember to mark it as accepted, or give it a + (or -) as a useful answer.
This is a job best suited to awk or any language that supports splitting strings.
IMO, using sed for this task, which is regex-based, is doable but hard to read and debug, hence not the most appropriate tool for the job. No offense to sed fanatics.
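awk '{
  for (i = 1; i <= NF; i++) {
    if ($i ~ /\047/) {       # the field contains a single quote (octal \047)
      gsub(/\./, "_", $i)    # replace every literal dot in that field
    }
  }
}1' file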
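The above says: for each field (the field separator is whitespace by default), check whether it contains a single quote, and if it does, replace each "." with "_". Note that the dot must be written as the regex /\./; a plain "." would match any character and turn the whole field into underscores. Also, every quoted field gets "_", so '10.6Z' becomes '10_6Z' rather than '10-6Z'. This method is simple and doesn't need complicated regexes.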
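A sketch of the same field-by-field idea on the question's sample line. As a variant (not the answer's exact code), the single quote is passed in with -v q="'" and tested with index(), which sidesteps the \047 escape:

```shell
#!/bin/sh
line="'ECJ-4YF1H10.6Z' ! 'CAP' ! '10.0uF' ! 'TOL' ; MGCDC1008.S1 MGCDC1009.A2"
printf '%s\n' "$line" |
  awk -v q="'" '{
    for (i = 1; i <= NF; i++)
      if (index($i, q))          # the field contains a single quote
        gsub(/\./, "_", $i)      # /\./ is a literal dot
  } 1'
# prints: 'ECJ-4YF1H10_6Z' ! 'CAP' ! '10_0uF' ! 'TOL' ; MGCDC1008.S1 MGCDC1009.A2
```

Changing a field makes awk rebuild the line with single spaces (the default OFS), which matches the sample's spacing.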