using sed for extracting multiple matches


I have the following line:
echo AS:i:0 UQ:i:0 ZZ:Z:mus.sup NM:i:0 MD:Z:50 ZZ:Z:cas.sup CO:Z:endOfLine|sed 's/.*\(ZZ:Z:.*[ ]\).*/\1/g'
which outputs:
ZZ:Z:cas.sup
I'd like to use sed to extract both ZZ:Z entries from the line (please avoid awk, since the position of the ZZ:Z entries may differ from line to line in my file).
Preferred output:
ZZ:Z:mus.sup ZZ:Z:cas.sup
Or possibly:
ZZ:Z:mus.sup
ZZ:Z:cas.sup
Thanks.

Try grep with the -o (or --only-matching) flag:
$ grep -o 'ZZ:Z:[^ ]*' <<< "AS:i:0 UQ:i:0 ZZ:Z:mus.sup NM:i:0 MD:Z:50 ZZ:Z:cas.sup CO:Z:endOfLine"
ZZ:Z:mus.sup
ZZ:Z:cas.sup
Or with GNU sed, based on an answer by @potong:
sed 's/ZZ:Z:/\n&/g;s/[^\n]*\n\(ZZ:Z:[^ ]*\)[^\n]*/\1 /g;s/.$//'
If you have only two occurrences of the pattern per line:
sed -n 's/.*\(ZZ:Z[^ ]*\).*\(ZZ:Z[^ ]*\).*/\1 \2/p' <<< "AS:i:0 UQ:i:0 ZZ:Z:mus.sup NM:i:0 MD:Z:50 ZZ:Z:cas.sup CO:Z:endOfLine"

You can certainly do it with sed, but a tr and grep solution may be more natural, because you effectively have separate logical records that happen to sit on a single line:
echo AS:i:0 UQ:i:0 ZZ:Z:mus.sup NM:i:0 MD:Z:50 ZZ:Z:cas.sup CO:Z:endOfLine | tr ' ' '\n' | grep "ZZ:Z"
If you want everything back on a single line, append | tr '\n' ' ' to convert the newlines back into spaces.
Of course you could also replace grep with sed in this solution.
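As a sketch of that last remark, the pipeline below swaps grep for sed -n with a p command and rejoins the matches with paste. The function name extract_zz is only illustrative, and anchoring the pattern with ^ is an added assumption so that a field merely containing ZZ:Z somewhere in the middle is not picked up.

```shell
# Hypothetical helper: print every ZZ:Z: field read from stdin, space-separated.
# Caveat: matches from ALL input lines end up on one output line.
extract_zz() {
  # tr puts each field on its own line, sed keeps the ZZ:Z: ones,
  # paste -s joins them back with a single space.
  tr ' ' '\n' | sed -n '/^ZZ:Z:/p' | paste -sd' ' -
}

echo 'AS:i:0 UQ:i:0 ZZ:Z:mus.sup NM:i:0 MD:Z:50 ZZ:Z:cas.sup CO:Z:endOfLine' | extract_zz
# prints: ZZ:Z:mus.sup ZZ:Z:cas.sup
```

For a multi-line file where each line should keep its own output line, the function would have to be called once per input line, for example from a while read loop.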

Related

insert semi colon after 10 digit number

I have lines that start like this: 2141058222 11/22/2017 and I want to append a ; at the end of the ten digit number like this: 2141058222; 11/22/2017.
I've tried sed with sed -i 's/^[0-9]\{10\}\\$/;&/g' which does nothing.
What am I missing?

Try this:
echo "2141058222 11/22/2017" | sed -r 's/^([0-9]{10})/&;/'

echo "2141058222 11/22/2017" | sed 's/ /; /'
Output:
2141058222; 11/22/2017

If the input is always in the format specified, GNU cut works, and might even be more efficient than sed:
cut -c -10,11- --output-delimiter ';' <<< "2141058222 11/22/2017"
Output:
2141058222; 11/22/2017
For an input file that'd be:
cut -c -10,11- --output-delimiter ';' file
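As for what the original attempt was missing, there are two things: \\$ demands a literal backslash right before the end of the line, so the pattern never matches a line like the sample, and ;& would have put the semicolon before the number rather than after it. A minimal corrected sketch in plain basic regex syntax (no -r needed) follows; the function name add_semicolon is illustrative.

```shell
# Hypothetical helper: append ';' to a leading 10-digit number on each line.
add_semicolon() {
  # ^[0-9]\{10\}  ten digits at the start of the line (BRE interval syntax)
  # &;            the matched digits followed by a semicolon
  sed 's/^[0-9]\{10\}/&;/'
}

echo '2141058222 11/22/2017' | add_semicolon
# prints: 2141058222; 11/22/2017
```

Note that this also fires on the first ten digits of a longer number; if that matters, the following space can be included in the pattern.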

Print pattern on a string with special character

How can I print only the string figure from the following line?
\begin{figure}[h!]
I tried:
firstLine='\begin{figure}[h!]'
echo $firstLine | sed -n 's/\\begin{\(.*\)}/\1/p'
but it returns figure[h!] instead of figure.
It seems the issue comes from the [] or ! characters.

firstLine='\begin{figure}[h!]'
echo "$firstLine" | sed 's/.*{\(.*\)}.*/\1/'
Output:
figure
With your code (add .*):
echo $firstLine | sed -n 's/\\begin{\(.*\)}.*/\1/p'

This might work for you (GNU sed):
sed 's/.*{\(.*\)}.*/\1/' file
This assumes there is only one {...} expression and one line.
A more rigorous solution would be:
sed -n 's/.*\\begin{\([^}]*\)}.*/\1/p' file
However nothing would be output if no match was found.
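To illustrate the difference between the two commands, the sketch below feeds both a line with a second brace group. The \label{fig:a} suffix and the function names are made up for the demonstration. Because .* is greedy, the short form latches onto the last {...} pair, while the [^}]* form stays with \begin.

```shell
# Made-up test line with two {...} groups.
line='\begin{figure}[h!]\label{fig:a}'

# Greedy form: the leading .* swallows everything up to the LAST '{'.
greedy_name() { sed 's/.*{\(.*\)}.*/\1/'; }

# Stricter form: tied to \begin, and [^}]* cannot run past the first '}'.
begin_name() { sed -n 's/.*\\begin{\([^}]*\)}.*/\1/p'; }

printf '%s\n' "$line" | greedy_name   # prints: fig:a
printf '%s\n' "$line" | begin_name    # prints: figure
```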

Extract data between two strings using either AWK or SED

I'm trying to extract data/urls (in this case - someurl) from a file that contains them within some tag ie.
xyz>someurl>xyz
I don't mind using either awk or sed.

I think the easiest way is with cut:
$ echo "xyz>someurl>xyz" | cut -d'>' -f2
someurl
With awk it can be done like this:
$ echo "xyz>someurl>xyz" | awk 'BEGIN { FS = ">" } ; { print $2 }'
someurl
With sed it is a little trickier:
$ echo "xyz>someurl>xyz" | sed 's/\(.*\)>\(.*\)>\(.*\)/\2/g'
someurl
This splits the line into blocks of the form something1>something2>something3 and prints the second one.
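Since .* is greedy, that sed command picks the field before the last > when a line has more than two separators. A sketch that always takes the second field uses [^>]* instead; the function name and the four-field sample are illustrative.

```shell
# Hypothetical helper: print the text between the first and second '>'.
second_field() {
  # [^>]* cannot cross a '>', so each piece is pinned to one field;
  # -n ... p prints only lines that actually contain two separators.
  sed -n 's/^[^>]*>\([^>]*\)>.*/\1/p'
}

echo 'xyz>someurl>xyz' | second_field        # prints: someurl
echo 'xyz>someurl>more>xyz' | second_field   # prints: someurl
```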

grep was born to extract things:
kent$ echo "xyz>someurl>xyz"|grep -Po '>\K[^>]*(?=>)'
someurl
you could kill a fly with a bomb of course:
kent$ echo "xyz>someurl>xyz"|awk -F\> '$0=$2'
someurl

If your grep supports the -P option, you can use lookahead and lookbehind assertions to pick out the URL.
$ echo "xyz>someurl>xyz" | grep -oP '(?<=xyz>).*(?=>xyz)'
someurl
This is just a sample to get you started, not the final answer.

Unix - Removing everything after a pattern using sed

I have a file which looks like below:
memory=500G
brand=HP
color=black
battery=5 hours
For every line, I want to remove everything after = and also the =.
Eventually, I want to get something like:
memory:brand:color:battery:
(All on one line with colons after every word)
Is there a one-line sed command that I can use?

sed -e ':a;N;$!ba;s/=.\+\n\?/:/mg' /my/file
Adapted from this fine answer.
To be frank, however, I'd find something like this more readable:
cut -d = -f 1 /my/file | tr \\n :

Here's one way using GNU awk:
awk -F= '{ printf "%s:", $1 } END { printf "\n" }' file.txt
Result:
memory:brand:color:battery:
If you don't want a colon after the last word, you can use GNU sed like this:
sed -n 's/=.*//; H; $ { g; s/\n//; s/\n/:/g; p }' file.txt
Result:
memory:brand:color:battery

This might work for you (GNU sed):
sed -i ':a;$!N;s/=[^\n]*\n\?/:/;ta' file

perl -F= -ane '{print $F[0].":"}' your_file
tested below:
> cat temp
abc=def,100,200,dasdas
dasd=dsfsf,2312,123,
adasa=sdffs,1312,1231212,adsdasdasd
qeweqw=das,13123,13,asdadasds
dsadsaa=asdd,12312,123
> perl -F= -ane '{print $F[0].":"}' temp
abc:dasd:adasa:qeweqw:dsadsaa:

My approach takes two steps.
First step (the -E flag is needed because the pattern uses extended regex syntax):
sed -E 's/([a-z]+)=.*/\1:/' Filename > a
cat a
memory:
brand:
color:
battery:
Second step:
sed -e 'N;s/\n//' a | sed -e 'N;s/\n//'
My output is
memory:brand:color:battery:
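The second step above joins exactly four lines. A sketch that works for any number of lines, using only sed and paste, is shown below; the function name is illustrative, the file name is passed as an argument, and mktemp is assumed to be available for the demo.

```shell
# Hypothetical helper: print the keys of a key=value file, colon-separated.
keys_joined() {
  # s/=.*//  drops the first '=' and everything after it on each line;
  # paste -s joins all lines, -d: uses ':' as the separator.
  sed 's/=.*//' "$1" | paste -sd: -
}

# Demo with a temporary copy of the sample file.
tmp=$(mktemp)
printf '%s\n' 'memory=500G' 'brand=HP' 'color=black' 'battery=5 hours' > "$tmp"
keys_joined "$tmp"                      # prints: memory:brand:color:battery
printf '%s:\n' "$(keys_joined "$tmp")"  # prints: memory:brand:color:battery:
rm -f "$tmp"
```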

Can sed search & replace on a match if that match is only part of a line?

The sed below will output the input exactly. What I'd like to do is replace all occurrences of _ with - in the first matching group (\1), but not in the second. Is this possible?
echo 'abc_foo_bar=one_two_three' | sed 's/\([^=]*\)\(=.*\)/\1\2/'
abc_foo_bar=one_two_three
So, the output I'm hoping for is:
abc-foo-bar=one_two_three
I'd prefer not to resort to awk since I'm doing a string of other sed commands too, but I'll resort to that if I have to.
Edit: Minor fix to RE

You can do this in sed using the hold space:
$ echo 'abc_foo_bar=one_two_three' | sed 'h; s/[^=]*//; x; s/=.*//; s/_/-/g; G; s/\n//g'
abc-foo-bar=one_two_three

You could use awk instead of sed as follows:
echo 'abc_foo_bar=one_two_three' | awk -F= -vOFS== '{gsub("_", "-", $1); print $1, $2}'
The output would be, as expected:
abc-foo-bar=one_two_three

You could use ghc instead of sed as follows:
echo "abc_foo_bar=one_two_three" | ghc -e "getLine >>= putStrLn . uncurry (++) . Control.Arrow.first (map (\x -> if x == '_' then '-' else x)) . break (== '=')"
The output would be, as expected:
abc-foo-bar=one_two_three

This might work for you:
echo 'abc_foo_bar=one_two_three' |
sed 's/^/\n/;:a;s/\n\([^_=]*\)_/\1-\n/;ta;s/\n//'
abc-foo-bar=one_two_three
Or this:
echo 'abc_foo_bar=one_two_three' |
sed 'h;s/=.*//;y/_/-/;G;s/\n.*=/=/'
abc-foo-bar=one_two_three
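The second command can be spelled out step by step. The sketch below is the same hold-space idea with one assumed tweak: the last substitution uses [^=]* instead of .*, so a value that itself contains = is kept whole. The function name is illustrative, and lines without any = are not handled.

```shell
# Hypothetical helper: replace _ with - in the key part of key=value lines.
dash_keys() {
  sed '
    # copy the whole line to the hold space
    h
    # keep only the key: delete from the first = onward
    s/=.*//
    # turn every _ in the key into -
    y/_/-/
    # append a newline plus the original line from the hold space
    G
    # drop the newline and the original key, keeping "=value"
    s/\n[^=]*=/=/
  '
}

echo 'abc_foo_bar=one_two_three' | dash_keys   # prints: abc-foo-bar=one_two_three
echo 'a_b=c=d_e' | dash_keys                   # prints: a-b=c=d_e
```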
