optimize sed multiple expressions including white spaces and square brackets


I have the following command working fine, but for learning purposes I'd like to know how to combine its sed expressions into one:
[user@localhost]$ echo '[lib:Library10] [idx:10] [Fragment] [75] [color]'| sed -e 's/\]//g' -e 's/\[//g' -e 's/\s\+/\t/g' -e 's/\:/\t/'
lib Library10 idx:10 Fragment 75 color

sed 's/[][]//g; s/:\|\s\+/\t/g'
Demonstrating:
$ echo '[lib:Library10] [idx:10] [Fragment] [75] [color]'| sed 's/[][]//g; s/:\|\s\+/\t/g'
lib Library10 idx 10 Fragment 75 color
$ echo '[lib:Library10] [idx:10] [Fragment] [75] [color]'| sed 's/[][]//g; s/:\|\s\+/\t/g' | od -c
0000000 l i b \t L i b r a r y 1 0 \t i d
0000020 x \t 1 0 \t F r a g m e n t \t 7 5
0000040 \t c o l o r \n
0000047
If you want to put a right bracket in a character class, it must be the first character (or come right after a leading ^), so [][] will match either a left or a right bracket.
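The \t, \s and \| escapes used above are GNU sed extensions. As a rough sketch of the same idea for other sed implementations (the variable names input, tab and result are only illustrative), the tab can be passed in through a shell variable and the whitespace run written with a POSIX character class:

```shell
# Sketch: same transformation without GNU-only escapes.
input='[lib:Library10] [idx:10] [Fragment] [75] [color]'
tab=$(printf '\t')   # a literal tab character

# 1. [][] deletes both kinds of bracket (the ] comes first in the class).
# 2. Each run of whitespace becomes one tab.
# 3. Each colon becomes one tab.
result=$(printf '%s\n' "$input" |
    sed -e 's/[][]//g' \
        -e "s/[[:space:]]\{1,\}/$tab/g" \
        -e "s/:/$tab/g")

printf '%s\n' "$result"
```

Piping the result through od -c, as in the demonstration above, is a quick way to confirm that the separators really are tabs.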

You can group it in two blocks:
$ sed -re 's/(\]|\[)//g' -e 's/(\s+|\:)/\t/g' <<< "[lib:Library10] [idx:10] [Fragment] [75] [color]"
lib Library10 idx 10 Fragment 75 color
That is,
sed -e 's/\]//g' -e 's/\[//g'     -e 's/\s\+/\t/g' -e 's/\:/\t/'
    --------------------------     ------------------------------
    |     delete ] and [     |     | replace \s+ and : with tab |
    --------------------------     ------------------------------
    -re 's/(\]|\[)//g'             -e 's/(\s+|\:)/\t/g'
By pieces:
sed -e 's/\]//g' -e 's/\[//g'
can be compacted as:
sed -re 's/(\]|\[)//g'
by joining the alternatives as (alternative1|alternative2), which needs the -r option (extended regular expressions in GNU sed).
The same applies to the other expression.
As a side note, tr can be a better choice for deleting the [ and ] characters:
$ echo '[lib:Library10] [idx:10] [Fragment] [75] [color]' | tr -d '[]'
lib:Library10 idx:10 Fragment 75 color
And to replace : with \t you can also use tr:
$ echo '[lib:Library10] [idx:10] [Fragment] [75] [color]' | tr ':' '\t'
[lib Library10] [idx 10] [Fragment] [75] [color]
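The two tr calls can also be chained to do the whole job without sed. This is only a sketch (the variable names are illustrative), and unlike the sed version it turns every single space into a tab rather than collapsing runs of whitespace:

```shell
# Sketch: delete the brackets, then map both ':' and ' ' to a tab.
input='[lib:Library10] [idx:10] [Fragment] [75] [color]'

result=$(printf '%s\n' "$input" | tr -d '[]' | tr ': ' '\t\t')

printf '%s\n' "$result"
```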

Related

How to Replace special Character in Unix Command

My source data contains special characters that are not in a readable format. Can anyone help with the following?
Source data:
Commands tried:
sed 's/../t/g' test.txt > test2.txt

You can use tr to keep only printable characters:
tr -cd "[:print:]" <test.txt > test2.txt
This uses tr's delete option (-d) on the non-printable characters; the -c option complements the [:print:] set.
If you want to replace those special chars by something else (ex: X):
tr -c "[:print:]" "X" <test.txt > test2.txt
With sed, you could try this to replace non-printable characters with X:
sed -r 's/[^[:print:]]/X/g' test.txt > test2.txt
On my machine it works for some characters but fails on bytes above 127 (maybe because the one I tried is displayed as ▒), whereas tr works perfectly.
Inline examples (printf to generate special chars + filter + od to show bytes):
$ printf "\x01ABC\x05\xff\xe0" | od -c
0000000 001 A B C 005 377 340
0000007
$ printf "\x01ABC\x05\xff\xe0" | sed "s/[^[:print:]]//g" | od -c
0000000 A B C 377 340
0000005
$ printf "\x01ABC\x05\xff\xe0" | tr -cd "[:print:]" | od -c
0000000 A B C
0000003
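One likely explanation for the difference is the locale: in a multibyte locale such as UTF-8, bytes like \377 are not valid characters, so sed's bracket expression never matches them. A hedged sketch, assuming that forcing the C locale for the sed call is acceptable for your data (the variable name cleaned is illustrative):

```shell
# Sketch: run sed in the C locale so every byte is treated as a
# single character; bytes above 127 are then non-printable.
cleaned=$(printf '\001ABC\005\377\340' |
    LC_ALL=C sed 's/[^[:print:]]//g')

printf '%s\n' "$cleaned"
```

Octal escapes are used in printf here because \x escapes are not supported by every printf implementation.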

How to convert a ASCII NULL (NUL) into single spacing in a text file using Unix command?

When I BCP the data in SQL Server, I get a NUL-like character in the output file, and I want to replace it with a single blank space.
When I used the sed command below, it removed the NUL character, but there is no single space left between the two delimiters.
sed 's/\x0/ /g' output file name
Example: after the sed command, my output file looks like this:
PHMO||P00000005233
PHMO||P00000005752
But I need a single space between those delimiters:
PHMO| |P00000005233
PHMO| |P00000005752

The usual approach to this would be using tr. However, solutions with tr and sed are not portable. (The question is tagged "unix", so only portable solutions are interesting).
Here is a simple demo script
#!/bin/sh
date
tr '\000' ' ' <$0.in
date
sed -e 's/\x00/ /g' <$0.in
which I named foo, and its input (with the ASCII NUL shown here as ^@):
this is a null: "^@"
Running with GNU tr and sed:
Fri Apr 1 04:41:15 EDT 2016
this is a null: " "
Fri Apr 1 04:41:15 EDT 2016
this is a null: " "
With OSX:
Fri Apr 1 04:41:53 EDT 2016
this is a null: " "
Fri Apr 1 04:41:53 EDT 2016
this is a null: "^@"
With Solaris 10 (and 11, though there may be a recent change):
Fri Apr 1 04:38:08 EDT 2016
this is a null: ""
Fri Apr 1 04:38:08 EDT 2016
this is a null: ""
Bear in mind that sed is line-oriented, and that ASCII NUL is considered a binary (non-line) character. If you want a portable solution, then other tools such as Perl (which do not have that limitation) are useful. For that case one could add this to the script:
perl -np -e 's/\0/ /g' <$0.in
The intermediate tool awk is no better in this instance. Going to Solaris again, with these lines:
for awk in awk nawk mawk gawk
do
echo "** $awk:"
$awk '{ gsub("\0"," "); print; }' <$0.in
done
I see this output:
** awk:
awk: syntax error near line 1
awk: illegal statement near line 1
** nawk:
nawk: empty regular expression
source line number 1
context is
{ gsub("\0"," >>> ") <<<
** mawk:
this is a null: " "
** gawk:
this is a null: " "
Further reading:
sed - stream editor (POSIX)
tr - translate characters (POSIX), which notes
Unlike some historical implementations, this definition of the tr utility correctly processes NUL characters in its input stream. NUL characters can be stripped by using:
tr -d '\000'
perlrun - how to execute the Perl interpreter
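For the data in the question, the tr approach can be checked with a small sketch like the one below. It assumes a tr that handles NUL as POSIX describes (the Solaris run above shows that not every implementation does), and the sample record is taken from the question:

```shell
# Sketch: build one record with a NUL between the two delimiters,
# then translate every NUL byte to a single space.
fixed=$(printf 'PHMO|\000|P00000005233\n' | tr '\000' ' ')

printf '%s\n' "$fixed"
```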

This is an easy job for sed. Let's start by creating a test file, since you didn't provide one:
$ echo -e "one,\x00,two,\x00,three" > a
$ echo -e "four,\x00,five,\x00,six" >> a
As you can see it contains ASCII 0:
$ od -c a
0000000 o n e , \0 , t w o , \0 , t h r e
0000020 e \n f o u r , \0 , f i v e , \0 ,
0000040 s i x \n
0000044
Now let's run sed:
$ sed 's/\x00/ /g' a > b
And check the output:
$ cat b
one, ,two, ,three
four, ,five, ,six
$ od -c b
0000000 o n e , , t w o , , t h r e
0000020 e \n f o u r , , f i v e , ,
0000040 s i x \n
0000044

It can be done quite easily with Perl:
cat -v inputfile.txt
abc^@def^@ghij^@klmnop^@qrstuv^@wxyz
perl -np -e 's/\0/ /g' <inputfile.txt >outputfile.txt
cat -v outputfile.txt
abc def ghij klmnop qrstuv wxyz

Why does sed only replace the first character?

$ echo lcdefghijklmnopqrstblvcxyz | tr [a-i] [1-9] | sed 's/j/10/' | sed 's/k/11/' | sed 's/l/12/' | sed 's/m/13/' | sed 's/n/14/' | sed 's/o/15/' | sed 's/p/16/' | sed 's/q/17/' | sed 's/r/18/' | sed 's/s/19/' | sed 's/t/20/' | sed 's/u/21/' | sed 's/v/22/' | sed 's/w/23/' | sed 's/x/24/' | sed 's/y/25/' | sed 's/z/26/'
1234567891011l13141516171819202l223242526
The long command is intended to replace a..z with 1..26. Notice there are 3 "l" characters in the echoed string. Why is the first one correctly converted to "12" yet the other two (results 11l13 and 202l223) aren't?
I tried this on both my Windows 7 PC running Cygwin (bash 4.3.33(1)-release (x86_64-unknown-cygwin)) and on my MacBook Pro running Terminal (bash 3.2) and got the same results. I expected the result to be 1..26 concatenated. This is part of a bigger problem that I reduced to this test case.

You need the g flag for the substitution to be repeated:
$ echo lll | sed 's/l/12/'
12ll
$ echo lll | sed 's/l/12/'g
121212
Without the g flag, s replaces the first instance, as documented in man sed.
Also, you can put all of those commands in a single invocation of sed. You don't need all those pipes:
sed 's/j/10/g;s/k/11/g;s/l/12/g...'

Multiple sed commands (with g switch)
Under bash, you could try something like:
c=1 o=
for i in {a..z};do
o+="s/$i/$((c++))/g;"
done
sed -e "$o" <<<'lcdefghijklmnopqrstblvcxyz'
1234567891011121314151617181920212223242526
or
fold -s <<< ${o//;/; }
s/a/1/g; s/b/2/g; s/c/3/g; s/d/4/g; s/e/5/g; s/f/6/g; s/g/7/g; s/h/8/g;
s/i/9/g; s/j/10/g; s/k/11/g; s/l/12/g; s/m/13/g; s/n/14/g; s/o/15/g; s/p/16/g;
s/q/17/g; s/r/18/g; s/s/19/g; s/t/20/g; s/u/21/g; s/v/22/g; s/w/23/g; s/x/24/g;
s/y/25/g; s/z/26/g;
then
sed -e '
s/a/1/g; s/b/2/g; s/c/3/g; s/d/4/g; s/e/5/g; s/f/6/g; s/g/7/g; s/h/8/g;
s/i/9/g; s/j/10/g; s/k/11/g; s/l/12/g; s/m/13/g; s/n/14/g; s/o/15/g; s/p/16/g;
s/q/17/g; s/r/18/g; s/s/19/g; s/t/20/g; s/u/21/g; s/v/22/g; s/w/23/g; s/x/24/g;
s/y/25/g; s/z/26/g;
' <<<'lcdefghijklmnopqrstblvcxyz'
1234567891011121314151617181920212223242526
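The {a..z} expansion and the += operator are bash features. As a sketch of the same script-building idea in plain POSIX sh (the variable names script, n and letter are illustrative), the letters can be listed explicitly:

```shell
# Sketch: build "s/a/1/g;s/b/2/g;...;s/z/26/g;" in a loop, then run
# it in a single sed invocation.
script=''
n=1
for letter in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
    script="${script}s/${letter}/${n}/g;"
    n=$((n + 1))
done

result=$(printf '%s\n' 'lcdefghijklmnopqrstblvcxyz' | sed -e "$script")

printf '%s\n' "$result"
```

This works because each substitution only looks for a letter, so digits produced by earlier substitutions are never touched again.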

This might work for you (GNU sed):
sed -r '1{x;s/^/a1b2c3d4e5f6g7h8i9j10k11l12m13n14o15p16q17r18s19t20u21v22w23x24y25z26/;x};G;:a;s/([a-z])(.*\n.*\1([0-9]+))/\3\2/;ta;P;d' file
This uses a lookup table to translate the required strings.

Can sed search & replace on a match if that match is only part of a line?

The sed below will output the input exactly. What I'd like to do is replace all occurrences of _ with - in the first matching group (\1), but not in the second. Is this possible?
echo 'abc_foo_bar=one_two_three' | sed 's/\([^=]*\)\(=.*\)/\1\2/'
abc_foo_bar=one_two_three
So, the output I'm hoping for is:
abc-foo-bar=one_two_three
I'd prefer not to resort to awk since I'm doing a string of other sed commands too, but I'll resort to that if I have to.
Edit: Minor fix to RE

You can do this in sed using the hold space:
$ echo 'abc_foo_bar=one_two_three' | sed 'h; s/[^=]*//; x; s/=.*//; s/_/-/g; G; s/\n//g'
abc-foo-bar=one_two_three
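The one-liner is easier to follow when written one command per line. The sketch below is the same technique with a comment for each step (the variable name result is illustrative):

```shell
# Sketch: edit the part before '=' while a copy of the rest waits
# in the hold space.
result=$(echo 'abc_foo_bar=one_two_three' | sed '
# copy the whole line to the hold space
h
# pattern space: drop everything before "=", leaving "=one_two_three"
s/[^=]*//
# swap: pattern space is the full line again, hold space has the tail
x
# keep only the part before "="
s/=.*//
# replace underscores in that part only
s/_/-/g
# append a newline plus the saved tail
G
# remove the joining newline
s/\n//
')

printf '%s\n' "$result"
```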

You could use awk instead of sed as follows:
echo 'abc_foo_bar=one_two_three' | awk -F= -vOFS== '{gsub("_", "-", $1); print $1, $2}'
The output would be, as expected:
abc-foo-bar=one_two_three

You could use ghc instead of sed as follows:
echo "abc_foo_bar=one_two_three" | ghc -e "getLine >>= putStrLn . uncurry (++) . (map (\x -> if x == '_' then '-' else x) *** id) . break (== '=')"
The output would be, as expected:
abc-foo-bar=one_two_three

This might work for you:
echo 'abc_foo_bar=one_two_three' |
sed 's/^/\n/;:a;s/\n\([^_=]*\)_/\1-\n/;ta;s/\n//'
abc-foo-bar=one_two_three
Or this:
echo 'abc_foo_bar=one_two_three' |
sed 'h;s/=.*//;y/_/-/;G;s/\n.*=/=/'
abc-foo-bar=one_two_three

How to change what sed thinks is the line delimiter

As I'm new with sed, I'm having the fun of seeing that sed doesn't think that the \r character is a valid line delimiter.
Does anyone know how to tell sed which character(s) I'd like it to use as the line delimiter when processing many lines of text?

You can specify it with awk's RS (record separator) variable: awk 'BEGIN {RS = "\r"} ...
Or you can convert with: tr '\r' '\n'

(For making the examples below clearer and less ambiguous, I'll use the od util extensively.)
There is no flag that makes sed treat \r as the line delimiter. I think the best solution is the one given in the previous answers: using tr. Suppose you have a file such as the one below:
$ od -xc slashr.txt
0000000 6261 0d63 6564 0d66
a b c \r d e f \r
0000010
There are various ways of using tr; the one we want takes two parameters, two different characters, and replaces every occurrence of the first with the second. Sending the file content as input to tr '\r' '\n' gives the following result:
$ tr '\r' '\n' < slashr.txt | od -xc
0000000 6261 0a63 6564 0a66
a b c \n d e f \n
0000010
Great! Now we can use sed:
$ tr '\r' '\n' < slashr.txt | sed 's/^./#/'
#bc
#ef
$ tr '\r' '\n' < slashr.txt | sed 's/^./#/' | od -xc
0000000 6223 0a63 6523 0a66
# b c \n # e f \n
0000010
But I presume you need to use \r as the line delimiter, right? In this case, just use tr '\n' '\r' to reverse the conversion:
$ tr '\r' '\n' < slashr.txt | sed 's/^./#/' | tr '\n' '\r' | od -xc
0000000 6223 0d63 6523 0d66
# b c \r # e f \r
0000010

As far as I know, you can't. What's wrong with using a newline as the delimiter? If your input has DOS-style \r\n line endings it can be preprocessed to remove them and, if necessary, they can be returned afterwards.
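For the DOS-style case, the preprocessing and restoring could look like the sketch below. It assumes every line should end in \r\n afterwards; the sample input and the variable names cr and hex are illustrative:

```shell
# Sketch: strip the carriage returns, edit with sed, then put a
# carriage return back at the end of every line.
cr=$(printf '\r')   # a literal carriage return for the sed script

hex=$(printf 'abc\r\ndef\r\n' |
    tr -d '\r' |
    sed -e 's/^./#/' -e "s/\$/$cr/" |
    od -An -tx1 | tr -d ' \n')

# Hex dump of the result, so the restored \r\n endings are visible.
printf '%s\n' "$hex"
```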
