Add leading 0 in sed substitution - sed

I have input data:
foo 24
foobar 5 bar
bar foo 125
and I'd like to have output:
foo 024
foobar 005 bar
bar foo 125
So I can use this sed substitutions:
s,\([a-z ]\+\)\([0-9]\)\([a-z ]*\),\100\2\3,
s,\([a-z ]\+\)\([0-9][0-9]\)\([a-z ]*\),\10\2\3,
But, can I make one substitution, that will do the same? Something like:
if (one digit) then two leading 0
elif (two digits) then one leading 0
Regards.

I doubt that the "if - else" logic can be incorporated in one substitution command without saving the intermediate data (length of the match for instance). It doesn't mean you can't do it easily, though. For instance:
$ N=5
$ sed -r ":r;s/\b[0-9]{1,$(($N-1))}\b/0&/g;tr" infile
foo 00024
foobar 00005 bar
bar foo 00125
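It uses a loop rather than an if/else: each pass adds one zero to every number that is shorter than $N digits, and the loop ends when no more substitutions can be made. :r defines a label named r, and tr branches back to that label whenever the preceding s command substituted something. The script is in double quotes so that the shell expands $(($N-1)) before sed sees it. See the sed documentation on flow control for more.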
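The same loop can be wrapped in a small shell function. This is only a sketch: the name pad_numbers is made up for illustration, and it assumes GNU sed (for \b and -E) and a width of at least 2. The label, the substitution and the branch are passed as separate -e expressions so the label name is unambiguous.

```shell
# pad_numbers WIDTH: zero-pad every standalone number on stdin to WIDTH digits.
# Hypothetical helper; assumes GNU sed (\b, -E) and WIDTH >= 2.
pad_numbers() {
    max_short=$(( $1 - 1 ))   # longest digit run that still needs padding
    sed -E -e ':r' -e "s/\\b[0-9]{1,${max_short}}\\b/0&/g" -e 'tr'
}

# Width 3 gives the output asked for in the question.
printf 'foo 24\nfoobar 5 bar\nbar foo 125\n' | pad_numbers 3
```

With the question's input this prints foo 024, foobar 005 bar and bar foo 125, one per line.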

Use two substitute commands: the first finds single-digit numbers and inserts two zeroes in front of them, and the second finds two-digit numbers and inserts one zero. GNU sed is needed because the word-boundary escape (\b) used to delimit the numbers is a GNU extension.
sed -e 's/\b[0-9]\b/00&/g; s/\b[0-9]\{2\}\b/0&/g' infile
EDIT to add a test:
Content of infile:
foo 24 9
foo 645 bar 5 bar
bar foo 125
Run previous command with following output:
foo 024 009
foo 645 bar 005 bar
bar foo 125

Add the maximum number of leading zeros first, then keep only the last N characters of the line (8 here). Because of the ^ and $ anchors, this assumes the number is the whole line:
echo 55 | sed -e 's:^:0000000:' -e 's:0\+\(.\{8\}\)$:\1:'
00000055

You seem to have the sed options covered, so here's one way with GNU awk (RT is a gawk extension):
BEGIN { RS="[ \n]"; ORS=OFS="" }
/^[0-9]+$/ { $0 = sprintf("%03d", $0) }
{ print $0, RT }
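If gawk is not available, a field-by-field loop is a possible alternative. This is a sketch, not the answer's method: the name pad_fields is hypothetical, and unlike the RT version it rebuilds any modified line with single spaces between fields, so the original spacing is lost on those lines.

```shell
# pad_fields: zero-pad every all-digit field on stdin to 3 digits (plain awk).
# Hypothetical helper; modified lines are re-joined with single spaces.
pad_fields() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+$/)          # only fields made purely of digits
                $i = sprintf("%03d", $i)
        print
    }'
}

printf 'foo 24\nfoobar 5 bar\nbar foo 125\n' | pad_fields
```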

I find the following sed approach to pad an integer with zeroes to 5 (n) digits quite straightforward:
sed -e "s/\<\([0-9]\{1,4\}\)\>/0000\1/; s/\<0*\([0-9]\{5\}\)\>/\1/"
If there is at least one and at most 4 (n-1) digits, add 4 (n-1) zeroes in front.
If there is any number of zeroes followed by 5 (n) digits after the first transformation, keep just these last 5 (n) digits.
When there happen to be more than 5 (n) digits, this approach behaves the usual way: nothing is padded or trimmed.
Input:
0
1
12
123
1234
12345
123456
1234567
Output:
00000
00001
00012
00123
01234
12345
123456
1234567
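The two steps can be parameterised on n. The following is a sketch under a few assumptions: pad_to is a made-up name, GNU sed is needed for \< and \>, n must be at least 2, and (as in the command above) only the first number on each line is touched.

```shell
# pad_to N: zero-pad the first number on each line of stdin to N digits.
# Hypothetical helper; assumes GNU sed (\< and \>) and N >= 2.
pad_to() {
    n=$1
    zeros=$(printf "%0$(( n - 1 ))d" 0)      # a string of n-1 zeroes
    sed -e "s/\\<\\([0-9]\\{1,$(( n - 1 ))\\}\\)\\>/${zeros}\\1/" \
        -e "s/\\<0*\\([0-9]\\{${n}\\}\\)\\>/\\1/"
}

printf '7\n1234\n123456\n' | pad_to 5
```

For this input the result is 00007, 01234 and 123456, matching the table above.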

This might work for you (GNU sed):
echo '1.23 12,345 1 12 123 1234 1' |
sed 's/\(^\|\s\)\([0-9]\(\s\|$\)\)/\100\2/g;s/\(^\|\s\)\([0-9][0-9]\(\s\|$\)\)/\10\2/g'
1.23 12,345 001 012 123 1234 001
or perhaps a little easier on the eye:
sed -r 's/(^|\s)([0-9](\s|$))/\100\2/g;s/(^|\s)([0-9][0-9](\s|$))/\10\2/g'

Related

How to substitute with basic regex with alternating signs?

I want to do the following to all of the statements in the file:
Input: xblahxxblahxxblahblahx
Output: <blah><blah><blahblah>
So far I am thinking of using sed -i 's/x/</g' something.ucli
You can use
sed 's/x\([^x]*\)x/<\1>/g'
Details:
x - an x
\([^x]*\) - Group 1 (\1 refers to this group value from the replacement pattern): zero or more (*) chars other than x ([^x])
x - an x
See the online demo:
#!/bin/bash
s='xblahxxblahxxblahblahx'
sed 's/x\([^x]*\)x/<\1>/g' <<< "$s"
# => <blah><blah><blahblah>
If x is a multi-character string, e.g. xyz, it will be easier with perl:
perl -pe 's/xyz(.*?)xyz/<$1>/g'

sed: behavior of H and D

My sed script is this:
# script.sed
1,3H
1,3g
3D
When I run it, I get the following:
$ seq 5 | sed -f script.sed
1
1
2
4
5
However, this seems wrong to me. On line 3, once the D command is executed, the pattern space has
1
2
3
When the cycle is restarted, H should set the hold space to:
<empty_line>
1
2
3
1
2
3
Then, g should set the pattern space to the same content. D will then remove the first (empty) line. Every time the cycle is restarted, the hold space will effectively double. Hence, this should lead to an infinite loop.
What am I missing?
Below, I show how I interpret the expected execution, showing as an ordered pair the result of the command, with the pattern space first and the hold space following:
1: H(1,\n1) g(\n1,\n1) > \n1\n
2: H(2,\n1\n2) g(\n1\n2,\n1\n2) > \n1\n2\n
3: H(3,\n1\n2\n3) g(\n1\n2\n3,\n1\n2\n3) D(,\n1\n2\n3) >
4: > 4\n
5: > 5\n
If I take the output of this interpretation and concatenate it into an echo command with the -e option, I get:
$ echo -e '\n1\n\n1\n2\n4\n5\n'
1
1
2
4
5

Using tab as sed separator

I would like to add a new tab-delimited row at the top of the file inp.txt.
This is the input produced by R:
inp <- 'AX-1 1 125
AX-2 2 456
AX-3 3 3445'
inp <- read.table(text=inp, header=F)
write.table(inp, "inp.txt", col.names=F, row.names=F, quote=F, sep="\t")
That's what I am trying to do:
sed -i '1i The name\tThe pos\tThe pos2\' inp.txt
However, the three column names (The name, The pos, The pos2) are not separated by tabs in the output file; it just contains the literal \t string. Can someone help me with the syntax?
Put the tab in a variable:
tab=$(printf '\t')
or
tab=$'\t'
Then you can use it in your sed script:
sed -i "1i The name${tab}The pos${tab}The pos2" inp.txt
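A quick way to check that a real tab ends up in the header is to run the same insert on a stream instead of editing the file in place. This is a sketch: add_header is a hypothetical name, the sample rows are taken from the question, and the one-line "1i text" form assumes GNU sed.

```shell
# A literal tab character, obtained portably via printf.
tab=$(printf '\t')

# add_header: prepend the tab-separated header row to stdin.
# Hypothetical helper; the one-line "1i text" form is a GNU sed feature.
add_header() {
    sed "1i The name${tab}The pos${tab}The pos2"
}

printf 'AX-1\t1\t125\nAX-2\t2\t456\n' | add_header
```

Piping the result through cat -A (GNU coreutils) shows each tab as ^I, which makes it easy to tell a real tab from a literal \t.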

sed: Delete first line of hold space?

How do I delete the first line of the hold space in sed?
I've tried
x;
s/.*\n//;
x;
But .*\n matches up to the last newline, deleting all the lines except for the last one.
This should remove the first line from the hold space:
x;s/[^\n]*\n//
Example:
kent$ sed -n 'H;${x;p}' <(seq 3)
1
2
3
Remove the first (empty) line:
kent$ sed -n 'H;${x;s/[^\n]*\n//;p}' <(seq 3)
1
2
3
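Alternatively, avoid the empty line altogether. The hold space is empty by default, so appending with H always leaves a leading newline; store the first line with h instead, e.g. 1h;1d.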
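Both ideas can be compared side by side. This is a sketch assuming GNU sed; the second command uses 1!H, a common variation that is not taken from the answers above, so that the first line is copied with h and only the later lines are appended.

```shell
# Strip the leading empty line after collecting everything with H (GNU sed).
seq 3 | sed -n 'H;${x;s/^[^\n]*\n//;p}'

# Or never create it: copy line 1 with h, append all other lines with H.
seq 3 | sed -n '1h;1!H;${x;p}'
```

Each command prints the lines 1, 2 and 3 with no empty line in front.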

Multiple mathematical operations on a file containing numbers

I have extracted the following data using 'grep' & 'sed' pipes from a file and now I want to apply a calculation to the last two numbers on each line, delete them and replace them with a single number.
Mathematical operations
Add the numbers together
divide by 2
multiply by 141
ROUNDUP to whole number
File Data
AJ29 IO_0_VRN_10 77.234 78.011
AJ30 IO_L1P_T0_100M 89.886 90.789
AJ31 IO_L1N_T0_100S 101.388 102.406
AK29 IO_L2P_T0_101M 66.163 66.828
AL29 IO_L2N_T0_101S 63.626 64.266
So the line starting AJ29 should appear as:
AJ29 IO_0_VRN_10 10945
I could put it in MS excel / Open Office calc and do this but want to avoid MS and keep it in a single linux script if it is possible. Hope you can help. The script I have so far is below and ideally I'd like to add a few more pipes to achieve this.
grep IOB xc7vx690tffg1930.pkg | sed 's/pin//g' | sed 's/IOB_[A-Za-z0-9]*//g' | sed 's/ /-/g' | sed 's/\t//g' | sed 's/^[-]*//g' | sed 's/-/ /g' | sed 's/ [0-9][0-9] //g' | sed 's/[[:space:]]\+/,/g' | sed 's/,X[0-9A-Z]*,//g' | sed 's/,[0-9]*[A-Z],//g' | sed 's/N\.A\.,/,/g' | sed 's/,$//g' | sed 's/,/ /g'
For calculations, use awk!
$ awk '{$(NF-1)=sprintf("%.0f", ($(NF-1) + $NF)/2 * 141); NF--}1' file
AJ29 IO_0_VRN_10 10945
AJ30 IO_L1P_T0_100M 12738
AJ31 IO_L1N_T0_100S 14367
AK29 IO_L2P_T0_101M 9376
AL29 IO_L2N_T0_101S 9016
This replaces the penultimate field with the result of (penultimate + last)/2 * 141. To round it, we use the %.0f format, as described in "Awk printf number in width and round it up". Note that %.0f rounds to the nearest integer rather than always rounding up.
Also, it looks to me that you are piping way too many things: I counted one call to grep and 13 (!) to sed. You can probably use sed -e 'first block' -e 'second block' ... instead.
Explanation
In awk, NF refers to the number of fields on the current line. Since $n refers to the field number n, with $(NF-1) we refer to the penultimate field.
{...}1 do stuff and then print the resulting line. 1 evaluates as True and anything True triggers awk to perform its default action, which is to print the current line.
($(NF-1) + $NF)/2 * 141 performs the calculation: (penultimate + last) / 2 * 141.
{$(NF-1)=sprintf( ... ) assign the result of the previous calculation to the penultimate field. Using sprintf with %.0f we make sure the rounding is performed, as described above.
{...; NF--} once the calculation is done, we have its result in the penultimate field. To remove the last column, we just say "hey, decrease the number of fields" so that the last one gets "removed".
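The question asks for ROUNDUP, while %.0f rounds to the nearest integer, so the two disagree on lines such as AJ31 (14367.477 becomes 14367 instead of 14368). If a true round-up is wanted, one option is a small ceiling function. This is a sketch: round_up_avg and ceil are made-up names, and the leading fields are printed explicitly rather than by decrementing NF.

```shell
# round_up_avg: replace the last two fields with ceil((a + b) / 2 * 141).
# Hypothetical helper; output fields are joined with single spaces.
round_up_avg() {
    awk '
    # ceil(x): smallest integer >= x (int() truncates toward zero)
    function ceil(x,    t) { t = int(x); return (x > t) ? t + 1 : t }
    {
        line = ""
        for (i = 1; i <= NF - 2; i++)      # keep every field but the last two
            line = line $i " "
        print line ceil(($(NF-1) + $NF) / 2 * 141)
    }'
}

printf 'AJ29 IO_0_VRN_10 77.234 78.011\nAJ31 IO_L1N_T0_100S 101.388 102.406\n' | round_up_avg
```

For these two lines it prints AJ29 IO_0_VRN_10 10945 and AJ31 IO_L1N_T0_100S 14368.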