How to change the decimal separator with awk/sed?

How to change number format (different decimal separator) from XXXXXX.XXX to XXXXXX,XXX using sed or awk?

How rigorous do you want to be? You could change all . characters, as others have suggested, but that will allow a lot of false positives if you have more than just numbers. A bit stricter would be to require that there are digits on both sides of the point:
$ echo 123.324 2314.234 adfdasf.324 1234123.daf 255.255.255.0 adsf.asdf a1.1a |
> sed 's/\([[:digit:]]\)\.\([[:digit:]]\)/\1,\2/g'
123,324 2314,234 adfdasf.324 1234123.daf 255,255,255,0 adsf.asdf a1,1a
That does allow changes in a couple of odd cases, namely 255.255.255.0 and a1.1a, but handles "normal" numbers cleanly.
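If those odd cases matter, one stricter option is to treat the input as whitespace-separated tokens and only touch tokens that consist entirely of digits, a point, and digits. The sketch below assumes that definition of a number (no sign, no exponent); swap_decimal is a hypothetical helper name, not something from the answer above.

```shell
# Hypothetical helper: swap the decimal point only in tokens of the
# exact form digits.digits; everything else is left untouched.
swap_decimal() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /^[0-9]+\.[0-9]+$/)
        sub(/\./, ",", $i)
    print
  }'
}

echo '123.324 2314.234 adfdasf.324 1234123.daf 255.255.255.0 adsf.asdf a1.1a' | swap_decimal
# 123,324 2314,234 adfdasf.324 1234123.daf 255.255.255.0 adsf.asdf a1.1a
```

One trade-off: when awk modifies a field it rebuilds the line with single spaces between fields, so runs of whitespace are collapsed on lines that were changed.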

Wouldn't this be more accurate, since the OP was talking about numbers? It makes sure there is a digit right before the dot, because the document could hold other dots that the OP doesn't want to substitute.
sed 's/\([0-9]\)\./\1,/g'

If you want to replace the decimal separator for cosmetic purposes
In most cases tr is probably the easiest way to substitute characters :
$ echo "0.3"|tr '.' ','
0,3
Of course, if your input mixes numbers and strings, you will need a more robust approach, like the one proposed by Michael J. Barber, or something even stricter.
If you want to replace the decimal separator for computation purposes
By default gawk (GNU awk, i.e. the awk of most GNU/Linux distributions) uses the dot as decimal separator :
$ echo $LC_NUMERIC
fr_FR.UTF-8
$ echo "0.1 0.2"|awk '{print $1+$2}'
0.3
$ echo "0,1 0,2"|awk '{print $1+$2}'
0
However you can force it to use the decimal separator of the current locale using the --use-lc-numeric option :
$ echo $LC_NUMERIC
fr_FR.UTF-8
$ echo "0.1 0.2"|awk --use-lc-numeric '{print $1+$2}'
0
$ echo "0,1 0,2"|awk --use-lc-numeric '{print $1+$2}'
0,3
If the input format is different from the current locale, you can of course redefine LC_NUMERIC temporarily :
$ echo $LC_NUMERIC
fr_FR.UTF-8
$ echo "0.1 0.2"|LC_NUMERIC=en_US.UTF-8 awk --use-lc-numeric '{print $1+$2}'
0.3
$ echo "0,1 0,2"|LC_NUMERIC=fr_FR.UTF-8 awk --use-lc-numeric '{print $1+$2}'
0,3
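If depending on the locale is not desirable, another option is to convert the separators explicitly around the arithmetic. This is only a sketch under stated assumptions: the input holds two comma-decimal numbers per line, add_comma_decimals is a hypothetical name, and LC_ALL=C is set so that awk reads and prints with a dot regardless of the environment.

```shell
# Hypothetical helper: add two comma-decimal numbers without relying
# on the locale. Commas become dots before the addition, and the dot
# in the result is turned back into a comma afterwards.
add_comma_decimals() {
  LC_ALL=C awk '{
    a = $1; b = $2
    gsub(/,/, ".", a)
    gsub(/,/, ".", b)
    s = sprintf("%.10g", a + b)
    sub(/\./, ",", s)
    print s
  }'
}

echo "0,1 0,2" | add_comma_decimals
# 0,3
```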

To substitute only the decimal commas in this line:
Total,"14333,374","1243750945,5","100,00%","100,00%","100,00%",1 639 600,"100,00%"
I used back-references (and MacOSX, so I need the -E option):
echo 'Total,"14333,374","1243750945,5","100,00%","100,00%","100,00%",1 639 600,"100,00%"' | sed -E 's/("[0-9]+),([0-9]+%?")/\1\.\2/g'
resulting in
Total,"14333.374","1243750945.5","100.00%","100.00%","100.00%",1 639 600,"100.00%"
The sed command says: "Find every string of the form: double quote, one or more digits, a comma, one or more digits, an optional %, double quote; then replace it with the first captured group, a dot, and the second captured group."

I think
s/\./,/g
should do what you want, unless you need something more specific.

If you have bash, ksh, etc.:
var=XXX.XXX
echo "${var/./,}"

You could do this:
echo "XXX.XX" | sed 's/\./,/g'
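Quoting matters for a sed script like this one. Without quotes the shell removes the backslash before sed sees it, so the pattern becomes a bare dot, which matches any character. A small demonstration with an illustrative input value:

```shell
# Quoted: sed receives s/\./,/g and replaces only literal dots.
echo "12.34" | sed 's/\./,/g'
# 12,34

# Unquoted: the shell strips the backslash, sed receives s/./,/g,
# and every character is replaced.
echo "12.34" | sed s/\./,/g
# ,,,,,
```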

Since the question is also tagged awk:
awk 'gsub(/\./,",")||1'
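For readers wondering about the trailing ||1: gsub returns the number of substitutions it made, and awk prints a line only when the pattern is true. Without ||1, lines that contain no dot would yield 0 and be dropped. The input below is made up for illustration.

```shell
# Without "|| 1": only lines where at least one dot was replaced are printed.
printf '1.5\nabc\n' | awk 'gsub(/\./, ",")'
# 1,5

# With "|| 1": the pattern is always true, so every line is printed.
printf '1.5\nabc\n' | awk 'gsub(/\./, ",") || 1'
# 1,5
# abc
```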

Related

How to remove after second period in a string using sed

In my script, I have a version number such as 15.03.2 stored in the variable $STRING. These numbers always change. I want to strip it down to 15.03 (or whatever it will be next time).
How do I remove everything after the second . using sed?
Something like:
$(echo "$STRING" | sed "s/\.^$\.//")
(I don't know what ^, $ and others do, but they look related, so I just guessed.)
I think the better tool here is cut
echo '15.03.2' | cut -d . -f -2
This might work for you (GNU sed):
sed 's/\.[^.]*//2g' file
This removes the second and every later occurrence of a period followed by zero or more non-period characters.
$ echo '15.03.2' | sed 's/\([^.]*\.[^.]*\)\..*/\1/'
15.03
More generally to skip N periods:
$ echo '15.03.2.3.4.5' | sed -E 's/(([^.]*\.){2}[^.]*)\..*/\1/'
15.03.2
$ echo '15.03.2.3.4.5' | sed -E 's/(([^.]*\.){3}[^.]*)\..*/\1/'
15.03.2.3
$ echo '15.03.2.3.4.5' | sed -E 's/(([^.]*\.){4}[^.]*)\..*/\1/'
15.03.2.3.4
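Since the value already lives in a shell variable, the same result can also be had without an external command by using POSIX parameter expansion. This is a sketch that assumes the string contains at least one period; first_two_fields is a hypothetical helper name.

```shell
# Hypothetical helper: keep only the first two period-separated fields.
first_two_fields() {
  s=$1
  major=${s%%.*}     # everything before the first period
  rest=${s#*.}       # everything after the first period
  minor=${rest%%.*}  # up to the next period, if there is one
  printf '%s.%s\n' "$major" "$minor"
}

first_two_fields 15.03.2
# 15.03
```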

tcsh & sed: no output

I'm trying to replace the 3rd column of a file with itself plus the value of column 2 (without any space). I get the proper values for variables c and a, but then sed doesn't give any output. Any clue?
#!/bin/tcsh
setenv c `cat lig_mod.pdb | awk '{print $3}'`
echo $c
setenv a `cat lig_mod.pdb | awk '{print $3=$3$2}'`
echo $a
sed -i "" 's/^'"${c}"'$/^'"${a}"'$/g' lig_mod.pdb
Even though awk is usually better for column parsing, this sed one-liner can work for you as well:
sed -i 's/ \(\w*\) \(\w*\) / \1 \2\1 /1' lig_mod.pdb
The '/1' at the end denotes the occurrence you want to change, which for the 2nd and 3rd columns is the first, but you could use it for any adjacent columns.
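For comparison, the awk approach alluded to above could look like the sketch below. The sample line is invented for illustration and merge_cols is a hypothetical name. Note that awk rebuilds each line with single spaces once a field is assigned, which may matter if the .pdb file relies on fixed-width columns.

```shell
# Hypothetical helper: append column 2 to column 3 on every line.
merge_cols() {
  awk '{ $3 = $3 $2; print }'
}

printf 'ATOM 1 C1 LIG\n' | merge_cols
# ATOM 1 C11 LIG
```

To apply it to the file, the output would be written to a temporary file and then moved over lig_mod.pdb, since plain awk does not edit in place.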

Uppercase to Lowercase with Sed and character classes

I'd like to convert a string from upper to lower case. I know there are different ways of solving this problem, but I'd like to understand why this command doesn't work:
echo "aa" | sed 's/'[:upper:]'/'[:lower:]'/g'
Is it a wrong way to use the classes of characters?
To convert from lowercase to uppercase, you can use (GNU sed, since \U is a GNU extension):
echo "aW123bR" | sed -r 's/[a-z]+/\U&/g'
The tr command is an interesting alternative:
echo "aW123bR" | tr '[:lower:]' '[:upper:]'
In sed, the y command is used for mapping sets of characters:
sed 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/'
It requires a literal list of characters, not character classes.
Another possible solution with gawk :
$ echo "HELLO"|awk '{print tolower($0)}'
hello
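As for why the command in the question does not work: a class such as [:upper:] is only recognised inside a bracket expression, so it has to be written [[:upper:]], and it only has meaning on the match side of s. The replacement side is plain text, so sed cannot map one class onto another there. A short illustration with made-up input:

```shell
# The class works on the match side when wrapped in a bracket expression.
echo "aBcD" | sed 's/[[:upper:]]/x/g'
# axcx

# tr maps one class onto the other, which is the conversion asked for.
echo "aBcD" | tr '[:upper:]' '[:lower:]'
# abcd
```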

sed/awk : match a pattern and return everything between the end of the pattern and a semicolon

I have a line:
<random junk>TYPE=snp;<more random junk>
and I need to return everything between the end of TYPE= and the ; (in this case snp, but it could be any of a number of text strings).
I tried various sed / awk solutions but I can't seem to get it working. I have the feeling this is a simple problem so, sorry about that.
This seems to work:
sed 's/.*TYPE=\(.*\);.*/\1/'
EDIT:
Ah, so there can be semicolons in the random junk. Try this:
sed 's/.*TYPE=\([^;]*\);.*/\1/'
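One thing to keep in mind with the s command alone: a line that contains no TYPE= is printed unchanged. If only the extracted values are wanted, combining -n with the p flag is a common way to suppress everything else. A sketch, with extract_type as a hypothetical name and invented sample lines:

```shell
# Hypothetical helper: print the TYPE value, and nothing at all for
# lines that have no TYPE= in them.
extract_type() {
  sed -n 's/.*TYPE=\([^;]*\).*/\1/p'
}

echo 'foo=bar;baz=fnu;TYPE=snp;XAI=0;XAM=0' | extract_type
# snp
echo 'no type field here' | extract_type
# (prints nothing)
```

Because the pattern is not anchored to a field boundary, a key that merely ends in TYPE= (for example a hypothetical SUBTYPE=) would match too.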
This requires GNU grep:
grep -Po '(?<=TYPE=)[^;]+'
meaning: preceded by "TYPE=", find some non-semicolon characters
One way using GNU sed:
sed -r 's/.*TYPE=([^;]+).*/\1/' file.txt
Since you also tagged this awk:
$ text='<random junk>TYPE=snp;<more random junk>'
$ echo "$text" | awk -FTYPE= '{sub(/;.*/,"",$2); print $2}'
snp
$ text='foo=bar;baz=fnu;TYPE=snp;XAI=0;XAM=0'
$ echo "$text" | awk -FTYPE= '{sub(/;.*/,"",$2); print $2}'
snp
(Only using the variable to keep the lines from wrapping.)
Or, to parse this as set of variable=value pairs rather than just a string of text:
$ echo "$text" | awk -vRS=";" -F= '$1=="TYPE" {print $2}'
snp
You can also do this in pure bash, if you want:
$ t="red=blue;TYPE=snp;XAI=0.0037843;XAM=0.0170293;XAS=0.013245;XRI=0;XRM=0"
$ t=${t#*TYPE=}
$ t=${t%%;*}
$ echo $t
snp

How to use a sed one-liner to parse "rec:id=1&name=zz&age=21" into "1 zz 21"?

I can chain multiple sed substitutions and an awk operation to achieve this, but is there a single sed substitution that can do it?
Also is there any other tool that is more suitable for this parsing task?
You could try:
sed -r 's!rec:id=([^&]*)&name=([^&]*)&age=([^&]*)!\1 \2 \3!' input_file
If you don't know the key names (rec:id etc.) in advance but you know there are three, you could try:
sed -r 's![^=]+=([^&]*)&[^=]+=([^&]*)&[^=]+=([^&]*)!\1 \2 \3!' input_file
If you don't know how many &name=value pairs you're after in advance but want to output all the values, you could try something like:
grep -P -o '(?<==)([^&]*)(?=&|$)' | xargs
where -P means "Perl regex", the regex says "find the string preceded by an equals sign and followed by an & (or the end of the string)", -o means to print just the matches (i.e. the 1, zz, and 21), each on its own line, and the | xargs joins those lines into one space-separated line (i.e. 1\nzz\n21 becomes 1 zz 21).
This might work for you:
echo "rec:id=1&name=zz&age=21" | sed 's/[^=]*=\([^&]*\)/\1 /g'
1 zz 21
However, this leaves an extra space at the end. To remove it, use:
echo "rec:id=1&name=zz&age=21"|sed 's/[^=]*=\([^&]*\)/\1 /g;s/ $//'
1 zz 21
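On the question of a more suitable tool: awk can split on both = and & at once, which puts the values in the even-numbered fields. This sketch assumes exactly three key=value pairs, as in the example line.

```shell
# With = and & as field separators the fields are:
# $1=rec:id  $2=1  $3=name  $4=zz  $5=age  $6=21
echo "rec:id=1&name=zz&age=21" | awk -F'[=&]' '{ print $2, $4, $6 }'
# 1 zz 21
```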
How about parsing the values directly into variables?
inbound="rec:id=1&name=zz&age=21"
eval $(echo $inbound | cut -c5- | tr \& "\n")
echo "Name:$name, ID:$id, Age:$age"
Or even better, though slightly more arcane:
inbound="rec:id=1&name=zz&age=21"
IFS=\& eval $(cut -c5- <<< $inbound)
echo "Name:$name, ID:$id, Age:$age"
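Keep in mind that eval runs whatever the input contains as shell code, so it is only reasonable for trusted data. Where that is a concern, the pairs can be read one at a time instead. A minimal sketch, assuming the rec: prefix from the example; list_pairs is a hypothetical name.

```shell
# Hypothetical helper: print each key/value pair without using eval.
list_pairs() {
  # Strip the leading "rec:", put one pair per line, then split
  # each line on the first "=" into key and value.
  printf '%s\n' "${1#rec:}" | tr '&' '\n' |
  while IFS='=' read -r key value; do
    printf '%s -> %s\n' "$key" "$value"
  done
}

list_pairs "rec:id=1&name=zz&age=21"
# id -> 1
# name -> zz
# age -> 21
```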