Extract data between two strings using either AWK or SED

I'm trying to extract data/URLs (in this case, someurl) from a file that contains them within some tag, i.e.
xyz>someurl>xyz
I don't mind using either awk or sed.

I think the easiest way is with cut:
$ echo "xyz>someurl>xyz" | cut -d'>' -f2
someurl
With awk it can be done like this:
$ echo "xyz>someurl>xyz" | awk 'BEGIN { FS = ">" } ; { print $2 }'
someurl
With sed it is a little trickier:
$ echo "xyz>someurl>xyz" | sed 's/\(.*\)>\(.*\)>\(.*\)/\2/g'
someurl
The pattern captures three blocks, something1>something2>something3, and prints the second one.
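Because `.*` is greedy, the sed command above picks the second-to-last block when a line has more than two `>` characters (for `a>b>c>d` it prints `c`). A minimal sketch that always takes the text between the first and second `>`, assuming that is what is wanted; the function name `extract_middle` is made up for illustration:

```shell
# Hypothetical helper: print the text between the first and second '>'.
# [^>]* cannot run across a '>', so the match is pinned to the first two.
# Lines without two '>' characters pass through unchanged.
extract_middle() {
    sed 's/^[^>]*>\([^>]*\)>.*$/\1/'
}

echo "xyz>someurl>xyz" | extract_middle
echo "a>b>c>d" | extract_middle
```

The first call prints someurl and the second prints b.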

grep was born to extract things:
kent$ echo "xyz>someurl>xyz"|grep -Po '>\K[^>]*(?=>)'
someurl
You could kill a fly with a bomb, of course:
kent$ echo "xyz>someurl>xyz"|awk -F\> '$0=$2'
someurl
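One caveat with the `$0=$2` shortcut: awk uses the assigned value as the pattern, so a line whose second field is empty or numerically zero prints nothing. A small sketch of the difference, using a made-up input whose middle field is 0:

```shell
# Made-up input: the middle field is the string "0".
line='xyz>0>xyz'

# Shortcut form: the value of the assignment (0) is false, so nothing prints.
short=$(echo "$line" | awk -F'>' '$0=$2')

# Explicit form: always prints the second field.
explicit=$(echo "$line" | awk -F'>' '{ print $2 }')

echo "shortcut: [$short]  explicit: [$explicit]"
```

For real URLs the shortcut is fine; the explicit `{ print $2 }` is the safer habit when fields can be empty or zero.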

If your grep supports the -P option, you can use lookahead and lookbehind assertions to identify the URL.
$ echo "xyz>someurl>xyz" | grep -oP '(?<=xyz>).*(?=>xyz)'
someurl
This is just a sample to get you started, not the final answer.
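Where grep has no -P support, roughly the same extraction can be sketched with sed, assuming the literal markers `xyz>` and `>xyz` from the sample (real tags would need their own patterns):

```shell
# -n plus the p flag prints only lines where the substitution matched,
# which mimics grep -o for one match per line.
echo "xyz>someurl>xyz" | sed -n 's/.*xyz>\(.*\)>xyz.*/\1/p'

# A line without the markers produces no output.
echo "nothing to see here" | sed -n 's/.*xyz>\(.*\)>xyz.*/\1/p'
```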

Related

Insert a semicolon after a 10-digit number

I have lines that start like this: 2141058222 11/22/2017 and I want to append a ; at the end of the ten digit number like this: 2141058222; 11/22/2017.
I've tried sed with sed -i 's/^[0-9]\{10\}\\$/;&/g' which does nothing.
What am I missing?
Try this:
echo "2141058222 11/22/2017" | sed -r 's/^([0-9]{10})/&;/'
echo "2141058222 11/22/2017" | sed 's/ /; /'
Output:
2141058222; 11/22/2017
If the input is always in the format specified, GNU cut works, and might even be more efficient than sed:
cut -c -10,11- --output-delimiter ';' <<< "2141058222 11/22/2017"
Output:
2141058222; 11/22/2017
For an input file that'd be:
cut -c -10,11- --output-delimiter ';' file
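As for what the original attempt was missing: `\\$` demands a literal backslash at the end of the line right after the ten digits, so the pattern never matches, and `;&` would put the semicolon before the number rather than after it. A minimal sketch in plain (basic) sed syntax that only touches lines starting with exactly ten digits followed by a space; the function name `add_semicolon` is made up:

```shell
# Hypothetical helper: append ';' to a leading ten-digit number.
# \{10\} is the basic-regex repetition; the trailing space keeps
# longer digit runs from matching.
add_semicolon() {
    sed 's/^\([0-9]\{10\}\) /\1; /'
}

printf '%s\n' '2141058222 11/22/2017' 'short 11/22/2017' | add_semicolon
```

The first line gets its semicolon; the second is left alone because it does not start with ten digits.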

How to get wheel users with specific prefix letters

cat /etc/group | grep wheel
wheel:x:10:I0173203,i04317303,raccount,d454523,c564566,C555533,D2354546
I want to extract only the users whose names start with c/C, i/I or d/D.
How do I get this desired output?
I0173203 i04317303 d454523 c564566 C555533 D2354546
I would use awk for this:
$ awk -F[:,] '/^wheel/ {
for(i=4;i<=NF;i++) if($i~/^[cCiIdD]/) printf "%s%s",$i,(i==NF?RS:OFS)
}' /etc/group
I0173203 i04317303 d454523 c564566 C555533 D2354546
You can also use perl:
perl -nle '@m=(m/[:,]([iIcCdD]\w+)/g) if $_=~/^wheel/ }{ print "@m"' /etc/group
grep '^wheel:' /etc/group | sed 's/^.*:\(.*\)$/\1/' | sed 's/,/\n/g' | grep -E '^[cCiIdD]'
Run the first command in the chain and look at the result, then add the second and look again, and so on.
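The same step-by-step idea can be sketched with cut, tr, grep and paste, which also puts the names back on one line as in the desired output. The sample line is copied from the question so the sketch does not depend on the local /etc/group; the function name `filter_members` is made up:

```shell
# Sample group line taken from the question.
wheel_line='wheel:x:10:I0173203,i04317303,raccount,d454523,c564566,C555533,D2354546'

# Hypothetical helper: read group lines on stdin, print matching members.
filter_members() {
    cut -d: -f4 |            # keep only the member list (4th field)
    tr ',' '\n' |            # one member per line
    grep '^[cCiIdD]' |       # keep names starting with c, i or d (any case)
    paste -s -d ' ' -        # join the survivors with spaces
}

printf '%s\n' "$wheel_line" | filter_members
```

On a real system the input would come from something like `grep '^wheel:' /etc/group` instead of the sample variable.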

How to display text between nested Parenthesis using sed or awk or grep?

I have a file in which one line contains nested parentheses, and I want to display only the words inside the innermost parentheses.
Example:
(abc (defg) or hij(klmn)) and (opq(rstuv))
Expected Result:
defg
klmn
rstuv
I have tried with awk - awk -F "[(())]" '{ for (i=2; i<NF; i+=2) print $i}'
I have tried with sed - sed 's/.*(\([a-zA-Z0-9_]*\)).*/\1/'
Using perl global matching and lazy quantifiers:
#! /usr/bin/perl -n
use feature 'say';
while (/\((.*?\)[^(]*?)\)/g) {
$m=$1;
while ($m =~ /\((.*?)\)/g) {
say $1;
}
}
Output:
defg
klmn
rstuv
Maybe with grep?
$ echo "(abc (defg) or hij(klmn)) and (opq(rstuv))" | grep -o "([a-z]*)"
(defg)
(klmn)
(rstuv)
It catches the groups of ( + letters + ).
I tried to get rid of the parentheses but could not. This is my approach:
grep -Po '(?<=()[a-z]*(?=))'
but grep reports "lookbehind assertion is not fixed length". Most likely the unescaped parentheses are parsed as grouping rather than as literal characters, which pulls the variable-length [a-z]* into the lookbehind.
This might work for you (GNU sed):
sed -r 's/\(([^()]*)\)/\n\1\n/;s/[^\n]*\n//;/[^()]/P;D' file
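A minimal sketch that avoids lookarounds entirely: let grep -o find each innermost group, then strip the parentheses with tr. This assumes a grep with the -o option (GNU and BSD grep both have it); the function name `inner_groups` is made up:

```shell
# Hypothetical helper: print the contents of every innermost (...) group.
# In basic regex syntax ( and ) are literal, and [^()]* refuses to cross
# another parenthesis, so only the innermost groups match.
inner_groups() {
    grep -o '([^()]*)' | tr -d '()'
}

echo '(abc (defg) or hij(klmn)) and (opq(rstuv))' | inner_groups
```

Each match is printed on its own line: defg, klmn, rstuv.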

Unix - Removing everything after a pattern using sed

I have a file which looks like below:
memory=500G
brand=HP
color=black
battery=5 hours
For every line, I want to remove everything after = and also the =.
Eventually, I want to get something like:
memory:brand:color:battery:
(All on one line with colons after every word)
Is there a one-line sed command that I can use?
sed -e ':a;N;$!ba;s/=.\+\n\?/:/mg' /my/file
Adapted from this fine answer.
To be frank, however, I'd find something like this more readable:
cut -d = -f 1 /my/file | tr \\n :
Here's one way using GNU awk:
awk -F= '{ printf "%s:", $1 } END { printf "\n" }' file.txt
Result:
memory:brand:color:battery:
If you don't want a colon after the last word, you can use GNU sed like this:
sed -n 's/=.*//; H; $ { g; s/\n//; s/\n/:/g; p }' file.txt
Result:
memory:brand:color:battery
This might work for you (GNU sed):
sed -i ':a;$!N;s/=[^\n]*\n\?/:/;ta' file
perl -F= -ane '{print $F[0].":"}' your_file
Tested below:
> cat temp
abc=def,100,200,dasdas
dasd=dsfsf,2312,123,
adasa=sdffs,1312,1231212,adsdasdasd
qeweqw=das,13123,13,asdadasds
dsadsaa=asdd,12312,123
> perl -F= -ane '{print $F[0].":"}' temp
abc:dasd:adasa:qeweqw:dsadsaa:
My approach takes two steps.
First step (-E is needed because the pattern uses extended syntax for the group and the +):
sed -E 's/([a-z]+)=.*/\1:/' Filename > a
cat a
memory:
brand:
color:
battery:
Second step (each pass joins pairs of lines, so two passes are enough for this four-line file):
sed -e 'N;s/\n//' a | sed -e 'N;s/\n//'
Output:
memory:brand:color:battery:
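For the variant without a trailing colon, a short sketch along the lines of the cut | tr answer above, with paste doing the joining since it places the delimiter only between lines. The function name `keys_joined` is made up, and the input is the sample file from the question fed through stdin:

```shell
# Hypothetical helper: read key=value lines on stdin, print the keys
# joined by ':' on a single line.
keys_joined() {
    cut -d= -f1 | paste -s -d: -
}

printf '%s\n' 'memory=500G' 'brand=HP' 'color=black' 'battery=5 hours' | keys_joined
```

This prints memory:brand:color:battery. Appending a colon afterwards, if it is wanted, is easier than removing one.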

Can sed search & replace on a match if that match is only part of a line?

The sed below will output the input exactly. What I'd like to do is replace all occurrences of _ with - in the first matching group (\1), but not in the second. Is this possible?
echo 'abc_foo_bar=one_two_three' | sed 's/\([^=]*\)\(=.*\)/\1\2/'
abc_foo_bar=one_two_three
So, the output I'm hoping for is:
abc-foo-bar=one_two_three
I'd prefer not to resort to awk since I'm doing a string of other sed commands too, but I'll resort to that if I have to.
Edit: Minor fix to RE
You can do this in sed using the hold space:
$ echo 'abc_foo_bar=one_two_three' | sed 'h; s/[^=]*//; x; s/=.*//; s/_/-/g; G; s/\n//g'
abc-foo-bar=one_two_three
You could use awk instead of sed as follows:
echo 'abc_foo_bar=one_two_three' | awk -F= -vOFS== '{gsub("_", "-", $1); print $1, $2}'
The output would be, as expected:
abc-foo-bar=one_two_three
You could use ghc instead of sed as follows:
echo "abc_foo_bar=one_two_three" | ghc -e "getLine >>= putStrLn . (\(k, v) -> map (\x -> if x == '_' then '-' else x) k ++ v) . break (== '=')"
The output would be, as expected:
abc-foo-bar=one_two_three
This might work for you:
echo 'abc_foo_bar=one_two_three' |
sed 's/^/\n/;:a;s/\n\([^_=]*\)_/\1-\n/;ta;s/\n//'
abc-foo-bar=one_two_three
Or this:
echo 'abc_foo_bar=one_two_three' |
sed 'h;s/=.*//;y/_/-/;G;s/\n.*=/=/'
abc-foo-bar=one_two_three
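For readers new to the hold space, here is the last command spelled out one step per line. It is a sketch with one small change that is mine, not the answerer's: the final substitution uses `[^=]*` instead of `.*`, on the assumption that values may themselves contain `=`. The function name `dash_key` is made up:

```shell
# Hypothetical helper: replace '_' with '-' only before the first '='.
dash_key() {
    sed '
        # Save a copy of the whole line in the hold space.
        h
        # Pattern space: keep only the key (text before the first "=").
        s/=.*//
        # Transliterate every "_" in the key to "-".
        y/_/-/
        # Append a newline plus the saved original line.
        G
        # Drop the newline and the original key, keeping "=" and the value.
        s/\n[^=]*=/=/
    '
}

echo 'abc_foo_bar=one_two_three' | dash_key
echo 'a_b=c_d=e_f' | dash_key
```

The first call prints abc-foo-bar=one_two_three; the second prints a-b=c_d=e_f, leaving the value untouched even though it contains a second `=`.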