Printing text between regexps

Printing text between regexps - sed

I tried the '/pat1/,/pat2/p', but I want to print only the text between the patterns, not the whole line. How do I do that?

A pattern range is for multiline patterns. This is how you'd do that:
sed -n '/pat1/,/pat2/{/pat1\|pat2/!p}' inputfile
-n - don't print by default
/pat1/,/pat2/ - within the two patterns inclusive
/pat1\|pat2/!p - print everything that's not one of the patterns
What you may be asking for is what's between two patterns on the same line. One of the other answers will do that.
Edit:
A couple of examples:
$ cat file1
aaaa bbbb cccc
123 start 456
this is what
I want
789 end 000
xxxx yyyy zzzz
$ sed -n '/start/,/end/{/start\|end/!p}' file1
this is what
I want
You can shorten it by telling sed to use the most recent pattern again (//):
$ sed -n '/.*start.*/,/^[0-9]\{3\} end 0*$/{//!p}' file1
this is what
I want
As you can see, I didn't have to duplicate the long, complicated regex in the second part of the command.
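For comparison, the same exclusive range can be sketched in POSIX awk with a flag variable. This is an alternative technique, not part of the answer above; the helper name print_between and the flag f are made up for illustration, and the start/end patterns are the ones from file1.

```shell
# Print the lines strictly between a line matching "start" and a line
# matching "end". print_between is a hypothetical helper name.
print_between() {
  awk '
    /end/   { f = 0 }   # closing pattern: clear the flag first so this line is skipped
    f                   # flag is set: print the current line
    /start/ { f = 1 }   # opening pattern: printing begins with the next line
  ' "$@"
}

# Same content as file1, fed on standard input.
printf '%s\n' 'aaaa bbbb cccc' '123 start 456' 'this is what' 'I want' \
  '789 end 000' 'xxxx yyyy zzzz' | print_between
```

The order of the three rules is what excludes the boundary lines: the end line clears the flag before the print rule runs, and the start line sets it only after the print rule has run.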

sed -r 's/pat1(.*)pat2/\1/g' somefile.txt

I don't know what kind of pattern you used, but I think this is also possible with a regular expression:
sed -r 's/^(.*)pat1(.*)pat2(.*)$/\2/g' myfile
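A substitution on its own still prints lines that contain no match. A minimal sketch, assuming both patterns sit on the same line, combines -n with the p flag so that only the captured text is printed; between_on_line is a hypothetical helper name and the sample input is made up.

```shell
# Print only the text between pat1 and pat2 on lines that contain both.
# between_on_line is a hypothetical helper name.
between_on_line() {
  # -n suppresses the default output; the p flag prints a line only when
  # the substitution succeeded. Both .* are greedy, so on lines with
  # repeated patterns the result is not the shortest possible match.
  sed -n 's/.*pat1\(.*\)pat2.*/\1/p' "$@"
}

printf '%s\n' 'other TEXT' 'xx pat1 text i want pat2 yy' | between_on_line
```

The spaces around the captured text are kept, since they lie between the two patterns.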

You can use awk.
$ cat file
other TEXT
pat1 text i want pat2
pat1 TEXT I
WANT
pat2
other text
$ awk -vRS="pat2" 'RT{gsub(/.*pat1/,"");print}' file
text i want
TEXT I
WANT
This solution also works for patterns that span multiple lines. Note that RT is specific to GNU awk (gawk).

Related

Improving sed program - conditions

I use this code according to this question.
$ names=(file1.txt file2.txt file3.txt) # Declare array
$ printf 's/%s/a-&/g\n' "${names[@]%.txt}" # Generate sed replacement script
s/file1/a-&/g
s/file2/a-&/g
s/file3/a-&/g
$ sed -f <(printf 's/%s/a-&/g\n' "${names[@]%.txt}") f.txt
TEXT
\connect{a-file1}
\begin{a-file2}
\connect{a-file3}
TEXT
75
How can I add conditions that solve the following problem?
names=(file1.txt file2.txt file3file2.txt)
That is, one file name occurs as part of another file name, so a- is added more than once.
I tried
sed -f <(printf 's/{%s}/{s-&}/g\n' "${files[@]%.tex}")
but the result is
\input{a-{file1}}
I need to match {%s} and place a- between { and %s.

It's not clear from the question how conflicting input should be resolved. In particular, the code replaces any instance of file1 with a-file1, even inside strings like 'foofile1'.
On the surface, the goal seems to be to change whole tokens only (e.g., foofile1 should not be affected by the file1 substitution). This can be achieved by adding a word-boundary assertion (\b, a GNU sed extension) before and after the file name, which prevents the pattern from matching inside longer file names.
printf 's/\\b%s\\b/a-&/g\n' "${names[@]%.txt}"
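As a rough illustration of the word-boundary idea, the sketch below generates the script for the three names from the question and applies it to made-up input lines. It assumes GNU sed (for \b); make_script is a hypothetical helper, and the names are passed as plain arguments with the .txt suffix already removed instead of through a bash array.

```shell
# Build one substitution per name. \b is a GNU sed extension (assumption:
# GNU sed is installed). make_script is a hypothetical helper name.
make_script() {
  printf 's/\\b%s\\b/a-&/g\n' "$@"
}

# file2 is not rewritten inside file3file2, because the 3 before it is a
# word character and so there is no word boundary at that position.
printf '%s\n' '\connect{file1}' '\begin{file2}' '\connect{file3file2}' |
  sed "$(make_script file1 file2 file3file2)"
```

Each name receives the a- prefix exactly once, including file3file2.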

This explanation is too long for a comment, so I am adding it as an answer. I am not sure whether my previous answer was clear, but it handles this case: it replaces exact file names only and never parts of longer names.
Let's say the array and the input file are as follows:
names=(file1.txt file2.txt file3file2.txt)
echo "${names[*]}"
file1.txt file2.txt file3file2.txt
cat file1
TEXT
\connect{file1}
\begin{file2}
\connect{file3}
TEXT
75
Now when we run following code:
awk -v arr="${names[*]}" '
BEGIN{
FS=OFS="{"
num=split(arr,array," ")
for(i=1;i<=num;i++){
sub(/\.txt/,"",array[i])
array1[array[i]"}"]
}
}
$2 in array1{
$2="a-"$2
}
1
' file1
The output is as follows. Note that file3 is NOT replaced, since it is not present in the array.
TEXT
\connect{a-file1}
\begin{a-file2}
\connect{file3}
TEXT
75

Remove whitespaces till we find comma, but this should start skipping first comma in each line of a file

I am in the learning phase of sed and awk commands, trying some complicated logic but couldn't get solution for the below.
File contents:
This is apple,apple.com 443,apple2.com 80,apple3.com 232,
We talk on 1 banana,banana.com 80,banannna.com 23,
take 5 grape,grape5.com 23,
When I try with
$ cat sample.txt | sed -e 's/[[:space:]][^,]*,/,/g'
,apple.com,apple2.com,apple3.com,
,banana.com,banannna.com,
,grape5.com,
is ok but I want to skip this sed for the first comma in each line, so expected output is
This is apple,apple.com,apple2.com,apple3.com,
We talk on 1 banana,banana.com,banannna.com,
take 5 grape,grape5.com,
Any help is appreciated.

If you are using GNU sed, you can do something like
sed -e 's/[[:space:]][^,]*,/,/2g' file
Here 2g combines two flags: 2 starts the substitution at the 2nd match, and g continues it through the remaining matches on the line.
The output for the above command.
sed -e 's/[[:space:]][^,]*,/,/2g' file
This is apple,apple.com,apple2.com,apple3.com,
We talk on 1 banana,banana.com,banannna.com,
take 5 grape,grape5.com,
An excerpt from the man page of GNU sed
g
Apply the replacement to all matches to the regexp, not just the first.
number
Only replace the numberth match of the regexp.
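To make the difference between the flags concrete, here is a small sketch on a made-up line. The variables line and pat are illustrative only, and the combined 2g form assumes GNU sed.

```shell
# A made-up sample line and the pattern from the answer.
line='x 1,y 2,z 3,'
pat='[[:space:]][^,]*,'

printf '%s\n' "$line" | sed "s/$pat/,/g"    # every match:        x,y,z,
printf '%s\n' "$line" | sed "s/$pat/,/2"    # only the 2nd match: x 1,y,z 3,
printf '%s\n' "$line" | sed "s/$pat/,/2g"   # 2nd match onwards:  x 1,y,z,
```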

awk '{gsub(/[ ]+/," "); gsub(/com [0-9]+/,"com")}1' file
This is apple,apple.com,apple2.com,apple3.com,
We talk on 1 banana,banana.com,banannna.com,
take 5 grape,grape5.com,
The first gsub removes extra space and the next one takes away unwanted numbers between com and comma.

Select specific items from a file using sed

I'm very much a junior when it comes to the sed command, and my Bruce Barnett guide sits right next to me, but one thing has been troubling me. With a file, can you filter it using sed to select only specific items? For example, in the following file:
alpha|november
bravo|october
charlie|papa
alpha|quebec
bravo|romeo
charlie|sahara
Would it be possible to set a command to return only the bravos, like:
bravo|october
bravo|romeo

With sed:
sed '/^bravo|/!d' filename
Alternatively, with grep (because it's sort of made for this stuff):
grep '^bravo|' filename
or with awk, which works nicely for tabular data,
awk -F '|' '$1 == "bravo"' filename
The first two use a regular expression, selecting those lines that match it. In ^bravo|, ^ matches the beginning of the line and bravo| the literal string bravo|, so this selects all lines that begin with bravo|.
The awk way splits the line across the field separator | and selects those lines whose first field is bravo.
You could also use a regex with awk:
awk '/^bravo\|/' filename
...but I don't think this plays to awk's strengths in this case.
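One detail worth spelling out: awk uses extended regular expressions, where an unescaped | means alternation, so a literal pipe has to be escaped or put in a bracket expression. The sketch below uses the rows from the question; rows is a hypothetical helper that prints them.

```shell
# Sample rows from the question; rows is a hypothetical helper name.
rows() {
  printf '%s\n' 'alpha|november' 'bravo|october' 'charlie|papa' \
    'alpha|quebec' 'bravo|romeo' 'charlie|sahara'
}

rows | awk '/^bravo[|]/'             # regex with a literal pipe in brackets
rows | awk -F '|' '$1 == "bravo"'    # field comparison, no regex involved
```

Both commands select the same two bravo rows.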

Another solution with sed:
sed -n '/^bravo|/p' filename
-n option => no printing by default.
If line begins with bravo|, print it (p)

There are (at least) two ways with sed.
Deleting the unwanted lines:
sed '/^bravo|/!d' YourFile
Printing only the wanted lines:
sed -n '/^bravo|/p' YourFile
If no other constraint or action applies, both are equivalent, and grep is the better tool.
If more commands follow, the behavior differs: d ends the cycle and moves straight to the next line, whereas p prints the line and then continues with the following commands.
Note that the pipe must not be escaped here: in basic regular expressions a bare | is literal, and GNU sed treats \| as alternation.
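The difference between d and p only shows once another command follows. A minimal sketch, using the rows from the question and a made-up follow-up substitution; the address is written with an unescaped |, which is literal in basic regular expressions, and rows is a hypothetical helper.

```shell
# Sample rows from the question; rows is a hypothetical helper name.
rows() {
  printf '%s\n' 'alpha|november' 'bravo|october' 'charlie|papa' \
    'alpha|quebec' 'bravo|romeo' 'charlie|sahara'
}

# d drops non-matching lines at once, so s only runs on the bravo rows and
# its result is printed at the end of the cycle.
rows | sed -e '/^bravo|/!d' -e 's/|/ -> /'

# p prints the bravo rows before s runs; with -n the substituted text is
# never printed, so the rows come out unchanged.
rows | sed -n -e '/^bravo|/p' -e 's/|/ -> /'
```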

sed delete lines not containing specific string

I'm new to sed and I have the following question. In this example:
some text here
blah blah 123
another new line
some other text as well
another line
I want to delete all lines except those that contain either string 'text' and or string 'blah', so my output file looks like this:
some text here
blah blah 123
some other text as well
Any hints how this can be done using sed?

This might work for you:
sed '/text\|blah/!d' file
some text here
blah blah 123
some other text as well

You want to print only those lines which match either 'text' or 'blah' (or both), where the distinction between 'and' and 'or' is rather crucial.
sed -n -e '/text/{p;d;}' -e '/blah/{p;d;}' your_data_file
The -n means don't print by default. The first pattern searches for 'text', prints the line if it matches and starts the next cycle with the following line (d); the second pattern does the same for 'blah'. If the 'd' were not there, a line containing 'text and blah' would be printed twice. Although I could have used just -e '/blah/p', the symmetry is better, especially if you need to extend the list of matched words.
If your version of sed supports extended regular expressions (for example, GNU sed does, with -r), then you can simplify that to:
sed -r -n -e '/text|blah/p' your_data_file
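The double-printing issue is easy to reproduce. The sketch below uses made-up input with one line that contains both words; sample is a hypothetical helper, and d is used to end the cycle after the first match.

```shell
# Made-up input; the second line contains both words.
# sample is a hypothetical helper name.
sample() {
  printf '%s\n' 'some text here' 'text and blah' 'blah blah 123' 'another line'
}

# Two independent p commands: "text and blah" is printed twice.
sample | sed -n -e '/text/p' -e '/blah/p'

# Ending the cycle with d after the first match prints each line once.
sample | sed -n -e '/text/{p;d;}' -e '/blah/p'
```

The first command prints four lines, the second three.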

You could simply do it through awk,
$ awk '/blah|text/' file
some text here
blah blah 123
some other text as well

Are you looking for the grep?
Here is an example to look for different texts.
cat yourfile.txt | grep "text\|blah"

returning the entire sentence and not just line

grep shows the lines where the search word is found. I have a text file where there is no line break and entire text is on a single line. Is there any way to instruct grep to show the contents of the left and right (just like -after, -before)?
I would like to see the entire sentence, that is, the words between two full stops (the sentence in which the word is found).

Use awk with the period as record separator and filter the records on the required pattern:
awk -v RS="." '/pattern/' file
Which is shorthand for:
awk -v RS="." '/pattern/{print}' file
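A quick sketch on a made-up one-line text shows what the record separator does; the variable text and the search word are illustrative only.

```shell
# Made-up single-line input with three sentences.
text='First sentence. The word is here. Last one.'

# Each record is the text between two periods; the period itself is the
# separator, so it is not part of the printed record.
printf '%s\n' "$text" | awk -v RS='.' '/word/'
```

This prints the matching sentence with its leading space and without the trailing period.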

You can temporarily chop the text into lines:
cat text.txt | sed 's/\./.\n/g' | grep pattern

Try this:
grep -o '[^.]*word[^.]*\.' file
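Applied to a made-up one-line text, the grep -o approach can be tried as follows; the variable text is illustrative only.

```shell
# Made-up single-line input with three sentences.
text='First sentence. The word is here. Last one.'

# -o prints only the matched part: everything between the neighbouring
# periods around "word", plus the closing period.
printf '%s\n' "$text" | grep -o '[^.]*word[^.]*\.'
```

Here the closing period is kept, and the space after the previous sentence is included at the start of the match.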

We Keep Coding
