sed: How can I print all matches in each line?

I have a two-line string:
> a="Microarchitectural Data Sampling (MDS) aka CVE-2018-12126, CVE-2018-12127,CVE-2018-12130, CVE-2019-11091, publicly announced by Intel on 5/14/2019, this has high visibility and lots of public media exposure.\nMicroarchitectural Data Sampling (MDS) aka CVE-2018-12126, CVE-2018-12127,CVE-2018-12130, CVE-2019-11091, publicly announced by Intel on 5/14/2019, this has high visibility and lots of public media exposure."
> echo -e $a
Microarchitectural Data Sampling (MDS) aka CVE-2018-12126, CVE-2018-12127,CVE-2018-12130, CVE-2019-11091, publicly announced by Intel on 5/14/2019, this has high visibility and lots of public media exposure.
Microarchitectural Data Sampling (MDS) aka CVE-2018-12126, CVE-2018-12127,CVE-2018-12130, CVE-2019-11091, publicly announced by Intel on 5/14/2019, this has high visibility and lots of public media exposure.
What I want to print is:
CVE-2018-12126 CVE-2018-12127 CVE-2018-12130 CVE-2019-11091
CVE-2018-12126 CVE-2018-12127 CVE-2018-12130 CVE-2019-11091
# OR
CVE-2018-12126
CVE-2018-12127
CVE-2018-12130
CVE-2019-11091
CVE-2018-12126
CVE-2018-12127
CVE-2018-12130
CVE-2019-11091
I've tried below:
> echo -e $a | sed -r 's/.*(CVE-[0-9]{4}-[0-9]{4,6}).*/\1/g'
CVE-2019-11091
CVE-2019-11091
It only prints the last match on each line.
How can I print all of the matches?

Use grep with -o option that will output matched substrings only:
grep -o 'CVE-[0-9]\{4\}-[0-9]\{4,6\}' file > outputfile
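Note that the braces in \{4\} are escaped because grep uses POSIX BRE (basic regular expression) syntax by default; with grep -E the same pattern can be written with plain {4}.
With sed, the easy solution is to use two steps: wrap each match in newlines, then print only the lines that match your pattern exactly (\n in the replacement is understood by GNU sed, but not by every sed):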
pat='CVE-[0-9]\{4\}-[0-9]\{4,6\}'
sed "s/$pat/\n&\n/g" file.txt | sed -n "/^$pat\$/p" > outputfile
Output:
CVE-2018-12126
CVE-2018-12127
CVE-2018-12130
CVE-2019-11091
CVE-2018-12126
CVE-2018-12127
CVE-2018-12130
CVE-2019-11091
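To see why the attempt in the question prints only one identifier per line, and how the two-step approach differs, here is a minimal sketch. The function names last_cve and all_cves and the shortened sample line are illustrative assumptions, and the \n in the replacement assumes GNU sed.

```shell
# Shortened sample line (assumption: same shape as the text in the question)
line='MDS aka CVE-2018-12126, CVE-2018-12127,CVE-2018-12130, announced by Intel'

# The question's approach: the leading greedy .* consumes as much as it can,
# so the capture group is left with only the last identifier on the line.
last_cve() {
  sed -E 's/.*(CVE-[0-9]{4}-[0-9]{4,6}).*/\1/'
}

# Two-step approach: put every match on its own line (GNU sed understands \n
# in the replacement), then print only the lines that consist of a match.
all_cves() {
  pat='CVE-[0-9]\{4\}-[0-9]\{4,6\}'
  sed "s/$pat/\n&\n/g" | sed -n "/^$pat\$/p"
}

printf '%s\n' "$line" | last_cve
printf '%s\n' "$line" | all_cves
```

The first call prints a single identifier, the second prints one identifier per output line.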

This might work for you (GNU sed):
sed -E '/\n/!s/CVE-[0-9]{4}-[0-9]{4,6}/\n&\n/g;/^CVE-[0-9]{4}-[0-9]{4,6}/P;D' file
Surround the required strings by newlines and then print those lines only.
Or if you prefer:
regexp='CVE-[0-9]\{4\}-[0-9]\{4,6\}'
sed '/\n/!s/'$regexp'/\n&\n/g;/^'$regexp'/P;D' file
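The answers above produce the second output format from the question (one identifier per line). For the first format, with all matches from an input line joined on one output line, one possible sketch uses awk rather than sed. The function name cves_per_line is an assumption, and the pattern is spelled without {n,m} intervals because some older awk implementations lack them, so it accepts four or more trailing digits rather than four to six.

```shell
# Print the CVE identifiers of each input line, space-separated, on one line.
# Lines without any identifier produce no output.
cves_per_line() {
  awk '{
    out = ""
    rest = $0
    # Repeatedly take the leftmost match, then continue after it
    while (match(rest, /CVE-[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]+/)) {
      out = out (out == "" ? "" : " ") substr(rest, RSTART, RLENGTH)
      rest = substr(rest, RSTART + RLENGTH)
    }
    if (out != "") print out
  }'
}
```

A usage along the lines of `printf '%b\n' "$a" | cves_per_line` should then give one line of identifiers per line of input.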

Related

Extract version from string cross platform

I need to create a line in makefile which will extract the version from string, and will work cross-platform, ideally without dependencies.
This is what I had
echo "golangci-lint has version 1.42.0 built..." | grep -oP '\d+\.\d+\.\d'
result: 1.42.0
But it doesn't work on macOS.
I tried to do it with sed like this, but it doesn't work either:
echo "golangci-lint has version 1.42.0 built ..." | sed -n 's/.*\(\d+\.\d+\.\d\).*/\1/p'
grep -ow '[0-9][0-9.]\+[0-9]'
That uses only a basic regular expression, and options that BSD grep and GNU grep share.
You can use
echo "golangci-lint has version 1.42.0 built ..." | sed -En 's/.*([0-9]+\.[0-9]+\.[0-9]+).*/\1/p'
Details:
-E - enables the POSIX ERE syntax
-n - suppresses the default line output
.*([0-9]+\.[0-9]+\.[0-9]+).* - any text, then Group 1 capturing one or more digits, ., one or more digits, ., one or more digits and the rest of the line
\1 - the replacement is just Group 1 value
p - only the substitution result is printed.
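One caveat with the leading .* is that it is greedy, so with a multi-digit major version it can swallow all but the last digit of the first component. The sketch below contrasts the command above with a variant that requires a non-digit (or the start of the line) before the version. The function names and the 11.42.0 sample are assumptions made for illustration.

```shell
# The command from the answer, wrapped in a function
version_greedy() {
  sed -En 's/.*([0-9]+\.[0-9]+\.[0-9]+).*/\1/p'
}

# Variant: group 1 must end in a non-digit (or be the start of the line),
# so the leading text cannot eat digits that belong to the version.
version_anchored() {
  sed -En 's/(^|.*[^0-9])([0-9]+\.[0-9]+\.[0-9]+).*/\2/p'
}

echo "golangci-lint has version 11.42.0 built ..." | version_greedy
echo "golangci-lint has version 11.42.0 built ..." | version_anchored
```

For the single-digit major version in the question both variants give the same result.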
With your sample input, you could try the following awk program, which prints only the matched version string from each line.
echo "golangci-lint has version 1.42.0 built ..." |
awk '
{
match($0,/[0-9]+\.[0-9]+\.[0-9]+/)
print substr($0,RSTART,RLENGTH)
}
'
Explanation: echo prints the line and pipes it to awk as standard input, where the match function looks for the regex. match sets RSTART and RLENGTH, and substr then prints just the matched text. Note that the print is unconditional, so a line without a match produces an empty line.
Explanation of regex:
[0-9]+\.[0-9]+\.[0-9]+: one or more digits, followed by a dot, one or more digits, another dot, and one or more digits.
-P is an experimental feature of GNU grep that is not available in the BSD grep shipped with macOS. However, the default grep on macOS can handle this easily with the -E switch, as long as you use [0-9] or [[:digit:]] in place of \d in your search pattern:
s="golangci-lint has version 1.42.0 built..."
grep -Eo '([0-9]+\.)+[0-9]+' <<< "$s"
# or else
grep -Eo '([[:digit:]]+\.)+[[:digit:]]+' <<< "$s"
1.42.0
As a side note, I have GNU grep installed on my Mac via Homebrew.
Suggesting the following:
echo "golangci-lint has version 1.42.0 built..." | grep -o '[0-9\.]\{4,\}'
Explanation
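[0-9\.] --- matches a single digit or dot (.); the backslash is not needed inside a bracket expression, where POSIX treats it as a literal character
\{4,\} --- repeats the preceding match 4 or more times.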
This awk is 100% POSIX:
awk 'match($0, /[0-9][0-9.]+[0-9]/) {print substr($0, RSTART, RLENGTH)}'
It will always print the first match and only (up to) one match per line. There can be zero or more dots in the number, but leading/trailing dots won't get printed.
grep -o is quite portable, but not every platform supported by Go has it, e.g. IBM AIX. Also note that if a line has multiple matches, it prints each match on its own line.
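The difference between the two behaviours can be sketched as follows. The sample line with two version numbers and the function names first_version and all_versions are assumptions made for illustration.

```shell
# Hypothetical line containing two version numbers
line='built with go 1.17.1 and golangci-lint 1.42.0'

# awk: match() reports only the leftmost match, so at most one per line
first_version() {
  awk 'match($0, /[0-9][0-9.]+[0-9]/) {print substr($0, RSTART, RLENGTH)}'
}

# grep -o: every match is printed on its own line
all_versions() {
  grep -oE '[0-9][0-9.]+[0-9]'
}

printf '%s\n' "$line" | first_version
printf '%s\n' "$line" | all_versions
```

In a Makefile recipe, keep in mind that every $ in the awk program has to be written as $$.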

sed: remove lines that match neither of two patterns

I am trying to create a filter command to reduce the lines in a log file. Assume each line contains a date partition:
/iamthepath01/20200301/file01.txt
/iamthepath02/20200302/file02.txt
....
/iamthepathxx/20210619/filexx.txt
Then, from thousands of lines, I only want to keep the ones that contain either of two strings in the path
/202106
/202105
and remove all other lines.
I have tried the following command:
sed -i -e '\(/202105\|/202106\)!d' ~/log.txt
The above command threw:
sed: -e expression #1, char 24: unterminated address regex
You can use
sed -i '/\/20210[56]/!d' ~/log.txt
Or, if you need to use more specific alternatives and further enhance the pattern:
sed -i -E '/\/(202105|202106)/!d' ~/log.txt
Details:
-i - GNU sed option for in-place file editing
-E - option enabling POSIX ERE regex syntax
/\/20210[56]/ - regex that matches /20210 and then either 5 or 6
\/(202105|202106) - the POSIX ERE pattern that matches / and then either 202105 or 202106
!d - removes the lines not matching the pattern.
See the online demo:
#!/bin/bash
s='/iamthepath01/20200301/file01.txt
/iamthepath02/20200302/file02.txt
/iamthepathxx/20210619/filexx.txt'
sed '/\/20210[56]/!d' <<< "$s"
Output:
/iamthepathxx/20210619/filexx.txt
sed is the wrong tool for this. If you want a script that's as fragile as the sed one then use grep as it's the tool that exists solely to do a simple g/re/p (hence the name) like you're doing:
$ grep '/20210[56]' file
/iamthepathxx/20210619/filexx.txt
or if you want a more robust solution that focuses just on the part of the line you want to match and so will avoid false matches, then use awk:
$ awk -F '/' '$3 ~ /^20210[56]/' file
/iamthepathxx/20210619/filexx.txt
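A small sketch of the false match that the field-based awk test avoids. The second sample path, whose file name happens to start with 202106, is a hypothetical addition, as are the function names.

```shell
# Sample paths; the second one has 202106 in the file name, not in the date field
sample='/iamthepath01/20200301/file01.txt
/iamthepath02/20200302/202106-summary.txt
/iamthepathxx/20210619/filexx.txt'

# Substring test: matches /20210[56] anywhere in the line
by_substring() {
  grep '/20210[56]'
}

# Field test: only the third /-separated field (the date directory) is checked
by_field() {
  awk -F '/' '$3 ~ /^20210[56]/'
}

printf '%s\n' "$sample" | by_substring
printf '%s\n' "$sample" | by_field
```

The substring test keeps the 202106-summary.txt line as well, while the field test keeps only the line whose date directory starts with 202105 or 202106.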
This might work for you (GNU sed):
sed -ni '\#/20210[56]#p' file
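This uses sed's grep-like -n option to turn off implicit printing and the -i option to edit the file in place.
Normally sed uses /.../ to delimit the regex in an address, but another delimiter may be used if its first occurrence is escaped, e.g. \#...#. This is also why the command in the question failed: the leading \( was read as the start of an address delimited by (, and no closing ( followed, hence "unterminated address regex".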
So the above solution will filter the existing file down to lines that contain either /202105 or /202106.
N.B. grep will almost certainly be faster in finding the above lines however the use of the -i option may be the ultimate reason for choosing sed (although the same outcome can be achieved by tacking on the > tmpFile && mv tmpFile file to a grep solution).
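The grep plus temporary file variant mentioned above could look roughly like this. It is a sketch under assumptions: the function name filter_log_in_place is made up, mktemp is assumed to be available, and the replaced file ends up with the permissions of the temporary file rather than those of the original.

```shell
# Keep only lines containing /202105 or /202106, replacing the file afterwards
filter_log_in_place() {
  file=$1
  tmp=$(mktemp "${file}.XXXXXX") || return 1
  grep '/20210[56]' "$file" > "$tmp"
  status=$?
  # grep exits with 1 when nothing matched; only 2 and above signal an error
  if [ "$status" -le 1 ]; then
    mv "$tmp" "$file"
  else
    rm -f "$tmp"
    return "$status"
  fi
}
```

Checking the exit status explicitly avoids the pitfall of a plain `grep ... > tmpFile && mv tmpFile file`, which leaves the original untouched when no line matches.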

Parsing a line with sed using regular expression

Using sed I want to parse Heroku's log-runtime-metrics like this one:
2016-01-29T00:38:43.662697+00:00 heroku[worker.2]: source=worker.2 dyno=heroku.17664470.d3f28df1-e15f-3452-1234-5fd0e244d46f sample#memory_total=54.01MB sample#memory_rss=54.01MB sample#memory_cache=0.00MB sample#memory_swap=0.00MB sample#memory_pgpgin=17492pages sample#memory_pgpgout=3666pages
the desired output is:
worker.2: 54.01MB (54.01MB is being memory_total)
I could not manage although I tried several alternatives including:
sed -E 's/.+source=(.+) .+memory_total=(.+) .+/\1: \2/g'
What is wrong with my command? How can it be corrected?
The .+ after source= and memory_total= are both greedy, so they accept as much of the line as possible. Use [^ ] to mean "anything except a space" so that it knows where to stop.
sed -E 's/.+source=([^ ]+) .+memory_total=([^ ]+) .+/\1: \2/g'
Putting your content into https://regex101.com/ makes it really obvious what's going on.
I'd go for the old-fashioned, reliable, non-extended sed expressions and make sure that the patterns are not too greedy:
sed -e 's/.*source=\([^ ]*\) .*memory_total=\([^ ]*\) .*/\1: \2/'
The -e is not the opposite of -E. -E enables extended regular expressions and originated in BSD (Mac OS X) sed; the traditional GNU sed spelling is -r, although current GNU sed accepts -E as well. The -e simply means that the next argument is an expression in the script.
This produces your desired output from the given line of data:
worker.2: 54.01MB
Bonus question: There are some odd lines within the stream, I can usually filter them out using a grep pipe like | grep memory_total. However if I try to use it along with the sed command, it does not work. No output is produced with this:
heroku logs -t -s heroku | grep memory_total | sed.......
Sometimes grep | sed is necessary, but it is often redundant (unless you are using a grep feature that isn't readily supported by sed, such as Perl regular expressions).
You should be able to use:
sed -n -e '/memory_total=/ s/.*source=\([^ ]*\) .*memory_total=\([^ ]*\) .*/\1: \2/p'
The -n means "don't print by default". The /memory_total=/ matches the lines you're after; the s/// content is the same as before. I removed the g suffix that was there previously; the regex would never match multiple times anyway. I added the p to print the line when the substitution occurs.
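As a quick check of the combined filter-and-substitute command, here is a sketch that runs it over a two-line stream. The metrics line is abbreviated from the sample in the question, the second log line is a hypothetical "odd" line, and the function name memory_summary is an assumption.

```shell
# Abbreviated metrics line from the question plus one hypothetical odd line
stream='2016-01-29T00:38:43.662697+00:00 heroku[worker.2]: source=worker.2 dyno=heroku.17664470.d3f28df1 sample#memory_total=54.01MB sample#memory_rss=54.01MB sample#memory_swap=0.00MB
2016-01-29T00:38:44.000000+00:00 heroku[worker.2]: Starting process'

# Only lines containing memory_total= are rewritten and printed;
# [^ ]* stops each capture at the next space instead of running greedily.
memory_summary() {
  sed -n -e '/memory_total=/ s/.*source=\([^ ]*\) .*memory_total=\([^ ]*\) .*/\1: \2/p'
}

printf '%s\n' "$stream" | memory_summary
```

Note that the pattern expects a space after the memory_total value, so a line where that field comes last would not be rewritten.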

Extract pattern between a substring and first occurrence of numeric in a string

Following is the content of a file:
xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r
xxx_component2-3.0-1-fg3sdhd.xc-linux-x86-64-Release-devel.r
xxx_component3-1.0-2-3gsjcgd.xc-linux-x86-64-Release-devel.r
xxx_component4-0.0-2-2acd314.xc-linux-x86-64-Release-devel.r
I want to extract component names component1 component2 etc.
This is what I tried:
for line in `sed -n -e '/^xxx-/p' $file`
do
comp=`echo $line | sed -e '/xxx-/,/[0-9]/p'`
echo "comp - $comp"
done
I also tried sed -e 's/.*xxx-\(.*\)[^0-9].*/\1/'
This is based on some information from the internet. Please give me a sed command and, if possible, explain it step by step.
Part 2. I also need to extract the version number from the string.
The version number starts with a digit and ends before the . that is followed by xc-linux.
As you can see, to keep it unique it has random alphanumeric characters (length 7) as part of the version number.
For example :
xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r
In this string the version number is : 1.0-2-2acd314
There are quite a few ways to extract the data. The simplest form would be grep.
GNU grep:
You can grab the required data using GNU grep with PCRE option -P:
$ cat file
xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r
xxx_component2-3.0-1-fg3sdhd.xc-linux-x86-64-Release-devel.r
xxx_component3-1.0-2-3gsjcgd.xc-linux-x86-64-Release-devel.r
xxx_component4-0.0-2-2acd314.xc-linux-x86-64-Release-devel.r
$ grep -oP '(?<=_)[^-]*' file
component1
component2
component3
component4
Here we use a positive lookbehind assertion to capture everything after the _ up to, but not including, the next -.
awk:
$ awk -F"[_-]" '{print $2}' file
component1
component2
component3
component4
Here we tell awk to use - and _ as delimiters and print the second column.
sed:
Having said that, you can also use sed to extract required data using group capture:
$ sed 's/.*_\([^-]*\)-.*/\1/' file
component1
component2
component3
component4
The regex states that match any character zero or more times up to an _. From that point onwards, capture everything until a - in a group. In the replacement part we just use the data captured in the group by calling it using back reference, that is \1.
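The same group-capture idea can be extended to Part 2 of the question, the version number. This is a sketch under the assumption that every line has the shape shown in the sample file (name, then -, then the version, then .xc-linux); the function names are made up for illustration.

```shell
# Print "component version" for each line that has the expected shape
component_and_version() {
  sed -n 's/^[^_]*_\([^-]*\)-\(.*\)\.xc-linux.*/\1 \2/p'
}

# Print only the version: everything between the first - and .xc-linux
version_only() {
  sed -n 's/^[^-]*-\(.*\)\.xc-linux.*/\1/p'
}

echo 'xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r' | component_and_version
echo 'xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r' | version_only
```

Because of -n and the p flag, lines that do not match the expected shape are dropped rather than printed unchanged.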

grep/sed: how can I search for a pattern that spans two lines?

SOLUTION
Initial solution
find . -type f -exec sed -i ':a;N;$!ba;s/\n //g' {} + | grep -l "672.15687489"
Initial post:
I was wondering how to search for a pattern in a file. The catch is that the pattern is split across two lines and I don't know where the split falls.
Example:
The pattern: _"672.15687489"_
But, in the file could be one of these several options:
672.15\n687489
672.156\n87489
672.1568\n7489
672.15687\n489
...
I don't care how the pattern is split; the only thing I want is the name of the file that has the pattern.
Thank you for the hilarious sed | grep "solution":
sed -i ':a;N;$!ba;s/\n //g' {} + | grep -l "672.15687489"
but in reality, just use awk. Here's a GNU awk solution that won't change your original file, doesn't require multiple commands and a pipe, and does not require a James Bond decoder ring to understand an arcane combination of letters and punctuation marks:
$ cat file
foo
672.15
687489
bar
$ gawk -v RS='\0' '{gsub(/\n/,"")} /672.15687489/{print FILENAME; exit}' file
file
All you need to know is that setting RS to the NUL character tells gawk to read the whole file as a single record (provided the file itself contains no NUL bytes). Other awks may or may not support this, but GNU awk does. There are other awk solutions, all of which would be clearer than the posted sed+grep solution.
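For systems without GNU awk, one possible alternative (a swapped-in technique, not the approach from the answer above) is to delete the newlines with tr and search the result as a fixed string with grep -F. The function name find_split_pattern is an assumption, and files with CRLF line endings are not handled by this sketch.

```shell
# Print the name of every given file that contains the literal string $1
# once all newlines are removed. Nothing is modified on disk.
find_split_pattern() {
  needle=$1
  shift
  for f in "$@"; do
    # -F treats the needle literally (the dot is not a wildcard), -q is silent
    if tr -d '\n' < "$f" | grep -qF -- "$needle"; then
      printf '%s\n' "$f"
    fi
  done
}
```

It could be combined with find along the lines of `find . -type f -exec sh -c '...' sh {} +`, or called directly as `find_split_pattern 672.15687489 *.txt`. It also reports files where the string is not split at all, or is split across more than two lines.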