Sed replacing last comma fails

I'm working on a sed script that takes a bunch of lines and turns them into an argument list for MATLAB (single-quoted, comma-separated).
It's working well so far:
[script to generate list] | sed -n "s#\(.*$\)#'\1',#p#" | tr '\n' ' '
But this leaves me with a trailing comma.
By testing, I can remove it with
[list of comma separated values] | sed -n 's#,$##p#'
but, when putting it all together:
[script to generate list] | sed -n "s#\(.*$\)#'\1',#;s#,$##p#" | tr '\n' ' '
Outputs nothing.
I suspect it has something to do with not having a p on the first command of the sed script. I don't want that command to print anything, though; I want its result passed on to the next command in the script (isn't that the default?).
Edit:
[script to generate list] outputs a list of file paths, for example:
./work/matlab_stun_gun/tex/fullTest.pdf
./Downloads/Howfar(tetra2) fixed.pdf
./work/savdocs/win_tests/tex/texReport.pdf
./Downloads/AcademicAudit.pdf
./work/matlab_stun_gun/report.pdf
./Downloads/PMB_4DVMC.pdf
./work/savdocs/win_tests/tex/mouseHeatMap.pdf
./Downloads/Geometry.pdf
./work/savdocs/win_tests/tex/mouseHeatMap.pdf
./work/matlab_stun_gun/tex/fullTest.pdf
The list generator is just find . -name "*.pdf" | pickl -n 10, adjusted for file type, number, etc. This is going to become a general-purpose script.
Expected output would be :
'./work/savdocs/win_tests/tex/mClickss.pdf', './Downloads/Howfar(tetra2) fixed.pdf', './Downloads/MedPhys_defDOSXYZ.pdf', './Downloads/MedPhys_defDOSXYZ.pdf', './report.pdf', './work/savdocs/win_tests/tex/cSwitchs.pdf', './tex/zoomIn.pdf', './tex/fullTest.pdf', './temp/tex/zoomIn.pdf', './tex/zoomIn.pdf'
(Note the lack of trailing comma)

You are experiencing a multi-faceted problem here, in the sense that each of your attempts has something wrong with it.
Start with [list of comma separated values] | sed -n 's#,$##p#'. Keep in mind that tr turns every newline into a space. Your separator therefore becomes ', ' (comma-space) rather than just ',', and the whole list ends up as a single line ending in ', '. The pattern ,$ never matches that line, so with -n the second sed prints nothing. You can fix that by matching with sed -n 's#, $##p'. If you insist on using the -n flag, that is the correct solution. In full:
[script to generate list] | \
sed -n "s#\(.*$\)#'\1',#p" | \
tr '\n' ' ' | \
sed -n 's#, $##p'
The problem with your combination attempt, [script to generate list] | sed -n "s#\(.*$\)#'\1',#;s#,$##p#" | tr '\n' ' ', is that you need to apply tr before you remove the trailing comma. Even if this printed anything, it would add a comma to each line, strip it off again immediately, and only then replace the newlines with spaces. The correct order is already shown above.
It prints nothing at all because of quoting. Inside double quotes the shell expands $# (the number of positional parameters, usually 0 at an interactive prompt), so the second command reaches sed as s#,0#p#. That command replaces ",0" with "p" and has no p flag, so with -n nothing is ever printed. Put the parts of the script that contain $ in single quotes, or write \$. You can separate multiple commands with ; or pass them as separate -e options. Either way they run in a single sed process, which is more efficient than piping through several seds. For example:
sed -n -e "s#\(.*$\)#'\1',#" -e 's#,$##p'
This is of course going to strip off the commas as soon as you add them to each line, but it shows the correct syntax for doing so.
Further Improvements
You probably don't need the -n flag for sed (and, consequently, the p flag of the s command). The -n flag is only useful if you want to print just the lines you explicitly select. You want to print everything, so it does not apply to you.
You also don't need an explicit capture group, since & in the replacement stands for the entire match (GNU sed also accepts \0). Here is an example:
[script to generate list] | sed "s/.*/'&',/" | tr '\n' ' ' | sed 's/, $//'
Finally, you can remove the trailing characters without starting another process. This is especially convenient if you are already capturing the output with $(...):
RESULT=$([script to generate list] | sed "s/.*/'&',/" | tr '\n' ' ')
RESULT=${RESULT%, }
or (bash 4.2 or later):
RESULT=${RESULT::-2}
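For a reusable version, here is a minimal sketch that avoids the trailing comma in the first place instead of stripping it afterwards. The $! address applies the comma-appending substitution to every line except the last, and paste -s joins the lines with spaces. The function name to_matlab_args is hypothetical. Doubling embedded single quotes assumes MATLAB's '' escape for quoted char arrays.

```shell
#!/usr/bin/env bash
# to_matlab_args (hypothetical name): read lines on stdin and print them
# single-quoted and comma-separated on one line, with no trailing comma.
to_matlab_args() {
  # -e "s/'/''/g"     double embedded single quotes (assumes MATLAB-style escaping)
  # -e "s/.*/'&'/"    wrap every line in single quotes
  # -e '$!s/$/,/'     append a comma to every line except the last one
  #                   (single quotes keep the shell away from $ and !)
  sed -e "s/'/''/g" -e "s/.*/'&'/" -e '$!s/$/,/' | paste -sd ' ' -
}

printf '%s\n' ./a.pdf './Howfar(tetra2) fixed.pdf' ./b.pdf | to_matlab_args
# prints: './a.pdf', './Howfar(tetra2) fixed.pdf', './b.pdf'
```

Because no comma is ever written after the last element, no follow-up sed or parameter expansion is needed.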

Related

Regex: how to match up to a character or the end of a line?

I am trying to separate out parts of a path as follows. My input path takes the following possible forms:
bucket
bucket/dir1
bucket/dir1/dir2
bucket/dir1/dir2/dir3
...
I want to separate the first part of the path (bucket) from the rest of the string if present (dir1/dir2/dir3/...), and store both in separate variables.
The following gives me something close to what I want:
❯ BUCKET=$(echo "bucket/dir1/dir2" | sed 's#\(^[^\/]*\)[\/]\(.*\)#\1#')
❯ EXTENS=$(echo "bucket/dir1/dir2" | sed 's#\(^[^\/]*\)[\/]\(.*\)#\2#')
❯ echo $BUCKET $EXTENS
bucket dir1/dir2
However, it fails if the input is only bucket (without a slash):
❯ BUCKET=$(echo "bucket" | sed 's#\(^[^\/]*\)[\/]\(.*\)#\1#')
❯ EXTENS=$(echo "bucket" | sed 's#\(^[^\/]*\)[\/]\(.*\)#\2#')
❯ echo $BUCKET $EXTENS
bucket bucket
... because without a '/' the regex doesn't match at all, so no substitution takes place and sed prints the input unchanged. When the input is just 'bucket', I would like $EXTENS to be set to the empty string "".
Thanks!
For something this simple you could use bash built-ins (parameter expansion) instead of launching sed:
$ path="bucket/dir1/dir2"
$ bucket="${path%%/*}"
$ extens="${path#$bucket}"
$ printf '|%s|%s|\n' "$bucket" "$extens"
|bucket|/dir1/dir2|
$ path="bucket"
$ bucket="${path%%/*}"
$ extens="${path#$bucket}"
$ printf '|%s|%s|\n' "$bucket" "$extens"
|bucket||
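The question asked for dir1/dir2 without the leading slash. Here is a small sketch that adds one more expansion to drop it. It also quotes $bucket inside the pattern, so characters such as * in a bucket name are matched literally rather than as a glob. The function name split_path is hypothetical.

```shell
#!/usr/bin/env bash
# split_path (hypothetical name): print |first component|rest| using only
# parameter expansion, without a leading slash on the rest.
split_path() {
  local path=$1 bucket extens
  bucket=${path%%/*}          # remove the longest suffix that starts with a slash
  extens=${path#"$bucket"}    # remove the bucket; quoted, so no glob matching
  extens=${extens#/}          # remove the leading slash, if there is one
  printf '|%s|%s|\n' "$bucket" "$extens"
}

split_path "bucket/dir1/dir2"   # prints |bucket|dir1/dir2|
split_path "bucket"             # prints |bucket||
```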
But if you really want to use sed and capture groups:
$ declare -a bucket_extens
$ mapfile -td '' bucket_extens < <(printf '%s' "bucket/dir1/dir2" | sed -E 's!([^/]*)(.*)!\1\x00\2!')
$ printf '|%s|%s|\n' "${bucket_extens[@]}"
|bucket|/dir1/dir2|
$ mapfile -td '' bucket_extens < <(printf '%s' "bucket" | sed -E 's!([^/]*)(.*)!\1\x00\2!')
$ printf '|%s|%s|\n' "${bucket_extens[@]}"
|bucket||
We use the extended regex (-E) to simplify a bit, and ! as separator of the substitute command. The first capture group is simply anything not containing a slash and the second is everything else, including nothing if there's nothing else.
In the replacement string we separate the two capture groups with a NUL character (\x00). We then use mapfile to assign the result to bash array bucket_extens.
The NUL trick is a way to deal with file names containing spaces, newlines, and so on, because NUL is the only character that cannot appear in a file path. The -d '' option of mapfile (bash 4.4 or later) indicates that records are separated by NUL instead of the default newline, and -t strips that delimiter from each element.
Don't capture anything. Instead, just match what you don't want and replace it with nothing:
BUCKET=$(echo "bucket" | sed 's#/.*##') # bucket
BUCKET=$(echo "bucket/dir1/dir2" | sed 's#/.*##') # bucket
EXTENS=$(echo "bucket" | sed 's#[^/]*##') # blank
EXTENS=$(echo "bucket/dir1/dir2" | sed 's#[^/]*##') # /dir1/dir2
Because your regex requires a slash, a string with no slashes will not match. Let's make the slash optional with /\? (the backslash before ? is required in sed's basic regular expressions, and \? itself is a GNU extension). Then please try:
#!/bin/bash
#path="bucket/dir1/dir2"
path="bucket"
bucket=$(echo "$path" | sed 's#\(^[^/]*\)/\?\(.*\)#\1#')
extens=$(echo "$path" | sed 's#\(^[^/]*\)/\?\(.*\)#\2#')
echo "$bucket" "$extens"
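You don't need to escape a slash with a backslash when # is the delimiter.
By convention, lowercase names are recommended for user variables, since uppercase names are conventionally used by environment and shell variables.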
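Because \? is a GNU extension to BRE, the script above may fail with other seds. Here is a sketch of the same idea using sed -E, where ? is a standard ERE quantifier. It assumes GNU sed or BSD/macOS sed, both of which accept -E. The function name split_path_ere is hypothetical.

```shell
#!/usr/bin/env bash
# split_path_ere (hypothetical name): same split as above, using ERE so the
# optional slash is written /? instead of the GNU-only BRE form /\?.
split_path_ere() {
  local bucket extens
  bucket=$(printf '%s\n' "$1" | sed -E 's#^([^/]*)/?(.*)#\1#')
  extens=$(printf '%s\n' "$1" | sed -E 's#^([^/]*)/?(.*)#\2#')
  printf '|%s|%s|\n' "$bucket" "$extens"
}

split_path_ere "bucket/dir1/dir2"   # prints |bucket|dir1/dir2|
split_path_ere "bucket"             # prints |bucket||
```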

Using a single sed call to split and grep

This is mostly out of curiosity: I am trying to get the same behavior as:
echo -e "test1:test2:test3"| sed 's/:/\n/g' | grep 1
in a single sed command.
I already tried
echo -e "test1:test2:test3"| sed -e "s/:/\n/g" -n "/1/p"
But I get the following error:
sed: can't read /1/p: No such file or directory
Any idea on how to fix this and combine different types of commands into a single sed call?
Of course this is heavily simplified compared to the real use case, and I know I can get around it by using multiple calls. Again, this is just out of curiosity.
EDIT: I am mostly interested in the sed tool, I already know how to do it using other tools, or even combinations of those.
EDIT2: Here is a more realistic script, closer to what I am trying to achieve:
arch=linux64
base=https://chromedriver.storage.googleapis.com
split="<Contents>"
curl $base \
| sed -e 's/<Contents>/<Contents>\n/g' \
| grep $arch \
| sed -e 's/^<Key>\(.*\)\/chromedriver.*/\1/' \
| sort -V > out
What I would like to simplify is the curl line, turning it into something like:
curl $base \
| sed 's/<Contents>/<Contents>\n/g' -n '/1/p' -e 's/^<Key>\(.*\)\/chromedriver.*/\1/' \
| sort -V > out
Here are some alternatives, awk and sed based:
sed -E "s/(.*:)?([^:]*1[^:]*).*/\2/" <<< "test1:test2:test3"
awk -v RS=":" '/1/' <<< "test1:test2:test3"
# or also
awk 'BEGIN{RS=":"} /1/' <<< "test1:test2:test3"
Or, using your logic, you would need to pipe a second sed command:
sed "s/:/\n/g" <<< "test1:test2:test3" | sed -n "/1/p"
The awk solution looks cleanest.
Details
In the sed solution, (.*:)? optionally matches any run of characters followed by a :. Next, ([^:]*1[^:]*) captures into Group 2 a field containing 1 (zero or more characters other than :, a 1, then zero or more characters other than :). Finally, .* matches the rest of the line. The replacement keeps only the Group 2 contents.
In the awk solution, the record separator is set to :, and the /1/ regex prints only the records that contain 1.
This might work for you (GNU sed):
sed 's/:/\n/;/^[^\n]*1/P;D' file
Replace the first : with a newline, and if the first line in the pattern space contains 1, print it (P).
Then delete that first line (D) and repeat on what remains.
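Here is the same P/D loop as a reusable sketch, assuming GNU sed (\n in the replacement and inside a bracket expression are GNU extensions). The function name fields_matching is hypothetical. Its argument is pasted into the regex unescaped, so it should not contain / or other regex metacharacters.

```shell
#!/usr/bin/env bash
# fields_matching (hypothetical name): print each :-separated field that
# matches the regex given as $1, one per line. GNU sed assumed.
fields_matching() {
  # s/:/\n/       move the first field onto its own line
  # /^[^\n]*RE/P  print that first line if it matches
  # D             delete it and rerun the script on the remainder without
  #               reading new input; with no newline left, D acts like d
  sed "s/:/\n/;/^[^\n]*$1/P;D"
}

printf 'test1:test2:test3\nbob3:bob2:fred2\n' | fields_matching 2
# prints test2, bob2 and fred2, one per line
```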
An alternative:
sed -Ez 's/:/\n/g;s/^[^1]*$//mg;s/\n+/\n/;s/^\n//' file
This slurps the whole file into memory and replaces all colons by newlines. All lines that do not contain 1 are removed and surplus newlines deleted.
An alternative to the really ugly sed is: grep -o '\w*2\w*'
$ printf "test1:test2:test3\nbob3:bob2:fred2\n" | grep -o '\w*2\w*'
test2
bob2
fred2
grep -o prints only the matching parts of each line, one per line.
Or: grep -o '[^:]*2[^:]*'
echo -e "test1:test2:test3" | sed -En 's/:/\n/g;/^[^\n]*2[^\n]*(\n|$)/P;//!D'
sed -n doesn't print unless told to
sed -E allows using parens to match (\n|$) which is newline or the end of the pattern space
P prints the pattern buffer up to the first newline.
D trims the pattern buffer up to the first newline
[^\n] is a character class that matches anything except a newline
// (an empty regex) reuses the last regex that was used
//! then applies the following command only when that last regex did not match
So, after you split into newlines, you want to make sure the 2 character is between the start of the pattern buffer ^ and the first newline.
And, if there is not the character you are looking for, you want to D delete up to the first newline.
At that point, it works for one line of input, with one string containing the character you're looking for.
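To handle several matches within a line, make D unconditional. D deletes up to the first newline and restarts the cycle on what remains without reading new input, so every field gets tested. The :a label and ta below are not actually needed: commands after D never run, so the looping comes from D itself.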
$ printf "test1:test2:test3\nbob3:bob2:fred2\n" | \
sed -En ':a s/:/\n/g;/^[^\n]*2[^\n]*(\n|$)/P;D;ta'
test2
bob2
fred2
This is simply NOT a job for sed. With GNU awk for multi-char RS:
$ echo "test1:test2:test3:test4:test5:test6"| awk -v RS='[:\n]' '/1/'
test1
$ echo "test1:test2:test3:test4:test5:test6"| awk -v RS='[:\n]' 'NR%2'
test1
test3
test5
$ echo "test1:test2:test3:test4:test5:test6"| awk -v RS='[:\n]' '!(NR%2)'
test2
test4
test6
$ echo "foo1:bar1:foo2:bar2:foo3:bar3" | awk -v RS='[:\n]' '/foo/ || /2/'
foo1
foo2
bar2
foo3
With any awk you'd just have to strip the \n from the final record before operating on it:
$ echo "test1:test2:test3:test4:test5:test6"| awk -v RS=':' '{sub(/\n$/,"")} /1/'
test1

How to inject a line feed to replace a delimiter

/usr/bin/sed 's/,/\\n/g' comma-delimited.txt > newline-separated.txt
This doesn't work for me. I just get the ',' removed, but the tokens are no longer delimited at all.
Your sed probably doesn't support \n in the replacement (older seds and BSD/macOS sed don't), so you need to put a literal LF character in your substitution, i.e.
/usr/bin/sed 's/,/
/g' comma-delimited.txt > newline-separated.txt
You may even need to escape the LF (POSIX requires the backslash), so make sure there are no whitespace characters after the final '\':
/usr/bin/sed 's/,/\
/g' comma-delimited.txt > newline-separated.txt
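Here are both sed variants wrapped as functions for a quick check, together with tr. tr is the simplest tool when one single character is replaced by another, and it needs no regex at all. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# commas_to_newlines_sed (hypothetical name): replacement containing a
# backslash-escaped literal newline. The second line of the sed script must
# start in column 1, otherwise its indentation becomes part of the replacement.
commas_to_newlines_sed() {
  sed 's/,/\
/g'
}

# commas_to_newlines_tr (hypothetical name): plain character translation.
commas_to_newlines_tr() {
  tr ',' '\n'
}

printf 'a,b,c\n' | commas_to_newlines_sed
printf 'a,b,c\n' | commas_to_newlines_tr
```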
This might work for you:
echo a,b,c,d,e | sed 'G;:a;s/,\(.*\(.\)\)/\2\1/;ta;s/.$//'
a
b
c
d
e
Explanation:
G appends a newline (followed by the empty hold space) to the pattern space.
:a;s/,\(.*\(.\)\)/\2\1/;ta replaces the first comma with the last character in the pattern space, i.e. that newline, and loops until no comma is left.
s/.$// removes the trailing newline.
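I tried the following. It looks clumsy but does the job, and it is easy to understand. I use tr to replace the placeholder § with a newline. The only caveat is that the placeholder must be something that does NOT occur in the string(s). Also note that § is two bytes in UTF-8 and GNU tr works byte by byte, so a single-byte placeholder is safer there.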
ps -fu $USER | grep java | grep DML| sed -e "s/ -/§ -/g" | tr "§" "\n"
will print the command line with each option on its own line, indented by one space. DML is just some server name.
On AIX 7, answer #3 worked well:
I need to insert a newline at the beginning of a paragraph so I can do grep -p to filter for 'mksysb' in the resulting 'stanza'
lsnim -l | /usr/bin/sed 's/^[a-zA-Z]/\^J&/'
(The initial line actually had an escaped newline, which ^J above stands for:
lsnim -l | /usr/bin/sed 's/^[a-zA-Z]/\
&/'
Recalling the command from the shell history displayed it with the ^J syntax.)

How to use a sed one-liner to parse "rec:id=1&name=zz&age=21" into "1 zz 21"?

I can chain multiple sed substitutions and an awk operation to achieve this, but is there a single sed substitution that can do it?
Also is there any other tool that is more suitable for this parsing task?
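You could try the following. sed regexes have no non-greedy quantifier (.*? is effectively just .* here), so [^&]* is used to stop each value at the next &:
sed -r 's!rec:id=([^&]*)&name=([^&]*)&age=([^&]*)!\1 \2 \3!' input_file
If you don't know the rec:id etc. in advance but you know there are three pairs, you could try:
sed -r 's![^=]+=([^&]*)&[^=]+=([^&]*)&[^=]+=([^&]*)!\1 \2 \3!' input_file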
If you don't know how many &name=value pairs you're after in advance but want to output all the values, you could try something like:
grep -P -o '(?<==)([^&]*)(?=&|$)' | xargs
Here -P means "Perl regex". The regex says "find a string that is preceded by an equals sign and followed by an & (or the end of the string)". The -o option prints just the matches (i.e. 1, zz and 21), each on its own line. Finally, | xargs moves them onto one line, space-separated (i.e. 1\nzz\n21 becomes 1 zz 21).
This might work for you:
echo "rec:id=1&name=zz&age=21" | sed 's/[^=]*=\([^&]*\)/\1 /g'
1 zz 21
However, this leaves an extra space at the end. To remove it, use:
echo "rec:id=1&name=zz&age=21" | sed 's/[^=]*=\([^&]*\)/\1 /g;s/ $//'
1 zz 21
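Another single-sed sketch avoids the trailing space altogether. It deletes everything up to the first = and then turns every &key= into a space. This works for any number of pairs, assuming values contain no &. The function name rec_values is hypothetical.

```shell
#!/usr/bin/env bash
# rec_values (hypothetical name): print the values of a "prefix:k=v&k=v..." record.
rec_values() {
  # s/^[^=]*=//     drop everything up to and including the first =
  # s/&[^=]*=/ /g   replace each "&key=" with a single space
  printf '%s\n' "$1" | sed 's/^[^=]*=//; s/&[^=]*=/ /g'
}

rec_values "rec:id=1&name=zz&age=21"   # prints 1 zz 21
```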
How about parsing the values directly into variables?
inbound="rec:id=1&name=zz&age=21"
eval $(echo $inbound | cut -c5- | tr \& "\n")
echo "Name:$name, ID:$id, Age:$age"
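Or, slightly more arcane, let the shell split on & itself. IFS must be changed before the command substitution is expanded, so it cannot be a prefix assignment on eval. With IFS=\& eval $(...), the result is still split on whitespace, and eval then parses each & as the background operator.
inbound="rec:id=1&name=zz&age=21"
IFS=\&; eval $(cut -c5- <<< "$inbound"); unset IFS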
echo "Name:$name, ID:$id, Age:$age"
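eval executes whatever the string contains as shell code; a value such as $(date) would be run. For untrusted input, a sketch without eval may therefore be preferable. It assumes bash 4 or later for the associative array and that the record starts with rec:. The names parse_rec and kv are hypothetical.

```shell
#!/usr/bin/env bash
# parse_rec (hypothetical name): split "rec:k1=v1&k2=v2..." into an associative
# array without eval, then print the fields of interest.
parse_rec() {
  local -A kv=()
  local -a pairs
  local pair
  # Here IFS is used by read itself, so the split really happens on &
  IFS='&' read -r -a pairs <<< "${1#rec:}"
  for pair in "${pairs[@]}"; do
    kv[${pair%%=*}]=${pair#*=}     # key before the first =, value after it
  done
  printf 'Name:%s, ID:%s, Age:%s\n' "${kv[name]-}" "${kv[id]-}" "${kv[age]-}"
}

parse_rec "rec:id=1&name=zz&age=21"   # prints Name:zz, ID:1, Age:21
```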

Filter text based in a multiline match criteria

I have the following sed command, which I need to execute as a single line:
cat File | sed -n '
/NetworkName/ {
N
/\n.*ims3/ p
}' | sed -n 1p | awk -F"=" '{print $2}'
Assume that the contents of File are:
System.DomainName=shayam
System.Addresses=Fr6
System.Trusted=Yes
System.Infrastructure=No
System.NetworkName=AS
System.DomainName=ims5.com
System.DomainName=Ram
System.Addresses=Fr9
System.Trusted=Yes
System.Infrastructure=No
System.NetworkName=Peer
System.DomainName=ims7.com
System.DomainName=mani
System.Addresses=Hello
System.Trusted=Yes
System.Infrastructure=No
System.NetworkName=Peer
System.DomainName=ims3.com
After executing the command, you should get only Peer as the output. Can anyone please help me out?
You can use a single nawk command, and you can lose the useless cat:
nawk -F"=" '/NetworkName/{n=$2;getline;if($2~/ims3/){print n} }' File
You can use sed as well, as proposed by others, but I prefer less regex and less clutter.
The above saves the value of the network name in n, then reads the next line with getline and checks its second field against ims3. If it matches, it prints n.
Either put that code in a separate .sh file and run it as your single-line command, or join the sed script's lines with semicolons:
cat File | sed -n '/NetworkName/ { N; /\n.*ims3/ p }' | sed -n 1p | awk -F"=" '{print $2}'
Assuming that you want the network name for the domain ims3, this command line works without sed:
grep -B 1 ims3 File | head -n 1 | awk -F"=" '{print $2}'
So, you want the network name where the domain name on the following line includes 'ims3', and not the one where the following line includes 'ims7' (even though the network names in the example are the same).
sed -n '/NetworkName/{N;/ims3/{s/.*NetworkName=\(.*\)\n.*/\1/p;};}' File
This avoids abuse of felines, too (not to mention reducing the number of commands executed).
Tested on Mac OS X 10.6.4, but there's no reason to think it won't work elsewhere too.
However, empirical evidence shows that Solaris sed differs from Mac OS X sed. It can still all be done in one sed command, but it needs three lines:
sed -n '/NetworkName/{N
/ims3/{s/.*NetworkName=\(.*\)\n.*/\1/p;}
}' File
Tested on Solaris 10.
You just need to put -e pretty much everywhere you'd break the command at a newline or have a semicolon. You don't need the extra call to sed or awk or cat.
sed -n -e '/NetworkName/ {' -e 'N' -e '/\n.*ims3/ s/[^\n]*=\(.*\).*/\1/P' -e '}' File
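To look up any domain rather than a hard-coded ims3, here is a small awk sketch. It uses awk rather than sed because passing the value with -v avoids splicing it into a regex. It assumes, as in the sample, that each DomainName line immediately follows its NetworkName line. The function name network_for_domain is hypothetical, and index() performs a plain substring match.

```shell
#!/usr/bin/env bash
# network_for_domain (hypothetical name): read the key=value data on stdin and
# print the NetworkName whose immediately following DomainName contains $1.
network_for_domain() {
  awk -F'=' -v dom="$1" '
    $1 == "System.NetworkName" { net = $2; next }   # remember it, check the next line
    $1 == "System.DomainName" && net != "" && index($2, dom) { print net }
    { net = "" }                                    # only the very next line counts
  '
}

printf '%s\n' System.NetworkName=AS System.DomainName=ims5.com \
  System.NetworkName=Peer System.DomainName=ims3.com | network_for_domain ims3
# prints Peer
```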