Why does sed (insert line) output spaces between each character?

I have split a larger data file into individual 2-column files for each field. This results in something like this:
0.00 3.02211e+07
1.00 3.02211e+07
2.00 3.02211e+07
3.00 3.02211e+07
4.00 3.02211e+07
5.00 3.01295e+07
6.00 3.00608e+07
7.00 2.99768e+07
When I try to add a row via sed,
sed -i '1i pressure-prof' myfile.txt
the output has a space character between each character (including the existing spaces). If I look in Notepad++, the extra spaces appear as the ASCII "NULL" character. In the terminal it looks like this:
pressure-prof
0 . 0 0 3 . 0 2 2 1 1 e + 0 7
1 . 0 0 3 . 0 2 2 1 1 e + 0 7
2 . 0 0 3 . 0 2 2 1 1 e + 0 7
3 . 0 0 3 . 0 2 2 1 1 e + 0 7
4 . 0 0 3 . 0 2 2 1 1 e + 0 7
5 . 0 0 3 . 0 1 2 9 5 e + 0 7
6 . 0 0 3 . 0 0 6 0 8 e + 0 7
7 . 0 0 2 . 9 9 7 6 8 e + 0 7
This is on Windows, and I think sed is being provided by cygwin or msys2. I don't know if that has anything to do with the output format issues.
Yes, I can resort to opening up files in a text editor and just adding that way. I would like to be able to utilize sed in the future though.
Thanks for any thoughts and assistance.

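tr -d ' ' < myfile.txt | sed '2,$ s/./& /4' > mf2 && mv mf2 myfile.txt
Run that after you've finished adding your rows. tr first strips every space, and sed then re-inserts a single space after the fourth character of each data line (line 2 onward), leaving the header alone. This relies on the first column always being four characters wide, as in 0.00, and on the stray characters really being spaces; if they are NUL bytes, as the Notepad++ display suggests, tr -d ' ' will not remove them.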
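NUL bytes between every character usually point to a UTF-16 encoded file rather than real spaces, which would also explain why the line inserted by sed looks normal while the rest does not. If that is the case here (it is only a guess from the symptoms), converting the file to UTF-8 before editing avoids the problem. A minimal sketch, assuming iconv is available and the file starts with a byte-order mark; prepend_header is a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): write a UTF-8 copy of a
# UTF-16 file with a header line prepended.
# Assumption: the input is UTF-16 with a byte-order mark, which iconv uses to
# detect the byte order.
prepend_header() {
    # $1 = header text, $2 = UTF-16 input file, $3 = UTF-8 output file
    {
        printf '%s\n' "$1"
        iconv -f UTF-16 -t UTF-8 "$2"
    } > "$3"
}

# Demo on a throwaway UTF-16 sample file
demo_dir=$(mktemp -d)
printf '0.00 3.02211e+07\n1.00 3.02211e+07\n' |
    iconv -f UTF-8 -t UTF-16 > "$demo_dir/myfile.txt"
prepend_header 'pressure-prof' "$demo_dir/myfile.txt" "$demo_dir/myfile.utf8.txt"
cat "$demo_dir/myfile.utf8.txt"
```

Once the file is UTF-8, sed -i '1i pressure-prof' should behave as expected on it.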

Related

Replace value if just first character of line matches in sed command

I am trying to replace the value 2 with N.A if the first column is less than or equal to 5 (that is, between 0 and 5), using this command:
sed -e '/^[0-5]/ s/2/N.A/g' snp.example.1 > result2
For instance, the input line
4 2 0 0 2 0 0 2 0 2 0
is converted to:
4 N.A 0 0 N.A 0 0 N.A 0 N.A 0
But the pattern only tests the first character of the line rather than the whole first column, so values are also replaced on lines where they should not be.
For instance, this input line should not be changed, as its first column contains a value (33) that is greater than 5:
33 2 2 2 2 2 2 2 2 2 2
But it also gets converted:
33 N.A N.A N.A N.A N.A N.A N.A N.A N.A N.A
Your kind help will be highly appreciated.
You can do this with sed, but awk seems a better choice for making numerical comparisons:
awk '$1 < 6{gsub("2","N.A")}1' input
To restrict the changes to lines whose first number is 5 or less, try:
sed -e '/^[0-5] / s/2/N.A/g' example
Note the space after [0-5]: it forces the first column to be a single digit.
For example, consider this input file:
$ cat example
4 2 0 0 2 0 0 2 0 2 0
33 2 2 2 2 2 2 2 2 2 2
Our command produces:
$ sed -e '/^[0-5] / s/2/N.A/g' example
4 N.A 0 0 N.A 0 0 N.A 0 N.A 0
33 2 2 2 2 2 2 2 2 2 2
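Both commands above replace every character 2 on a matching line, so a value such as 12 would become 1N.A, and a first column of 2 would be rewritten as well. If only whole fields equal to 2 in the remaining columns should change, a field-wise loop is one option. This is a sketch under that assumption; mask_twos is a made-up name:

```shell
#!/bin/sh
# Hypothetical helper: on rows whose first column is <= 5, replace fields
# (from column 2 onward) that are exactly 2 with N.A.
mask_twos() {
    awk '$1 <= 5 {
        for (i = 2; i <= NF; i++)
            if ($i == 2) $i = "N.A"   # whole-field comparison, so 12 is kept
    } 1' "$@"
}

# Demo with the sample rows from the question plus one row containing 12
printf '4 2 0 0 2 0 0 2 0 2 0\n33 2 2 2 2 2 2 2 2 2 2\n5 12 2\n' | mask_twos
```

Because awk rebuilds a modified line with single spaces, this assumes space-separated input.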

replace character with increasing numbers per line

I have large matrix files consisting of only "0" and "a" in columns and I want to do what this does:
perl -pe 'BEGIN { our $i = 1; } s/a/($i++)/ge;' < FILE > NEW_FILE
but only increment once for each line instead of every instance on each line.
So if my first line in the file is:
0 0 a a a
The perl command gives me:
0 0 1 2 3
While I would want
0 0 1 1 1
and on the next line for instance 2 0 2 0 2 and so on...
This should be possible to do with awk, but using:
awk '{ i=1; gsub(/a/,(i+1));print}' tmp2
just gives me 0's and 2's for all lines...
Just increment before, not on every substitution:
awk '{i++; gsub(/a/,i)}1' file
This way, the variable gets updated once per line (record), not once per substitution.
The same applies to the Perl script:
perl -pe 'BEGIN { our $i = 0; } $i++; s/a/$i/ge;' file
Test
$ cat a
0 0 a a a
2 3 a a a
$ awk '{i++; gsub(/a/,i)}1' a
0 0 1 1 1
2 3 2 2 2
$ perl -pe 'BEGIN { our $i = 0; } $i++; s/a/$i/ge;' a
0 0 1 1 1
2 3 2 2 2
You can simply replace every occurrence of a with the current line number:
perl -pe 's/a/$./g' FILE > NEW_FILE
perl -pe'$i++;s/a/$i/g'
or, if you want to increment only on lines where a substitution takes place:
perl -pe'/a/&&$i++;s/a/$i/g'
In action:
$ cat a
0 0 a a a
1 2 0 0 0
2 3 a a a
$ perl -pe'$i++;s/a/$i/g' a
0 0 1 1 1
1 2 0 0 0
2 3 3 3 3
$ perl -pe'/a/&&$i++;s/a/$i/g' a
0 0 1 1 1
1 2 0 0 0
2 3 2 2 2
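For completeness, the "increment only on lines that contain an a" variant can also be written in awk. A minimal sketch; number_rows is a made-up name:

```shell
#!/bin/sh
# Hypothetical helper: replace every "a" with a counter that advances only
# on lines that actually contain an "a".
number_rows() {
    awk '/a/ { n++; gsub(/a/, n) } 1' "$@"
}

# Demo with the three-line sample used above
printf '0 0 a a a\n1 2 0 0 0\n2 3 a a a\n' | number_rows
```

The pattern /a/ guards the block, so lines without an a pass through untouched and do not advance the counter.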

Percentage of each value in a row respective to first value

I have a big file containing data in the following format.
11 6 2 3
19 5 1 13
9 3 0 6
15 7 1 7
7 6 0 1
9 3 4 2
For every row, I want to calculate the percentage of each value from the 2nd column onward relative to the first column's value. Something like (6/11)*100; (2/11)*100; (3/11)*100 for every row in the file.
Expected output
54.5 18.1 27.2
26.3 5.2 68.4
...
...
I've tried in awk,
awk '{a=($2/$1)*100; b=($3/$1)*100; c=($4/$1)*100}END{print a, b, c}'
and the result is awk: cmd. line:1: (FILENAME=try FNR=27) fatal: division by zero attempted. Is that only due to the presence of 0 in some of the rows, or is something wrong with the awk one-liner?
Yes, you have zeros in some $1s. You need something like:
$ cat file
11 6 2 3
0 5 1 13
9 3 0 6
15 7 1 7
0 6 0 1
9 3 4 2
$ awk 'BEGIN{CONVFMT="%.1f"} {for (i=2;i<=NF;i++) $i=($1==0?"NaN":$i*100/$1)} 1' file
11 54.5 18.2 27.3
0 NaN NaN NaN
9 33.3 0 66.7
15 46.7 6.7 46.7
0 NaN NaN NaN
9 33.3 44.4 22.2
Replace "NaN" with whatever else you want displayed when $1 is zero if you don't like "NaN" (you should have included that case in your sample input).
Using perl from command line,
perl -lane 'print join "\t", map $F[0] ? $_*100/$F[0] : "Nan", @F[1..$#F]' file
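The expected output in the question omits the first column, while the awk answer keeps it. If only the percentages are wanted, they can be collected into a separate output string. A sketch with one decimal place and NaN for a zero divisor, as in the answer above; row_percentages is a made-up name:

```shell
#!/bin/sh
# Hypothetical helper: for each row, print columns 2..NF as a percentage of
# column 1 with one decimal place; print NaN when column 1 is zero.
row_percentages() {
    awk '{
        out = ""
        for (i = 2; i <= NF; i++) {
            cell = ($1 == 0) ? "NaN" : sprintf("%.1f", $i * 100 / $1)
            out = out (i > 2 ? " " : "") cell
        }
        print out
    }' "$@"
}

# Demo on three sample rows, one of them with a zero in column 1
printf '11 6 2 3\n0 5 1 13\n9 3 0 6\n' | row_percentages
```

Unlike the CONVFMT approach, sprintf formats whole numbers as well, so a zero percentage prints as 0.0.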

Pattern multiply - sed?

I have rows like this:
0 1 1
I would like to repeat each value, say 2 or 4 times, to get a pattern like this (here 4 times):
0 0 0 0 1 1 1 1 1 1 1 1
Now, I have some piece of old code, which basically does this in the case of multiplying by 5.
But I cannot convert this script to do it for example 2 or 4 times...
Can anyone help me to figure it out?
Here is the code:
sed -e 's/\([01]\)/\1\1\1\1/7g ; s/\([01]\{2,\}\)/\1\1\1/g ; s/\b\([01]\)\b/\1\1\1\1\1/g ; s/\([01]\)\B/\1 /g'
$ echo '0 1 1' | sed -r 's/\S/& & & & &/g'
0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
Using sed to repeat 4 times:
kent$ echo "0 1 1
1 1 0"|sed 's/[01]/& & & &/g'
0 0 0 0 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 0 0 0 0
With awk, you can pass the number of repetitions as a parameter, e.g. to repeat 5 times:
kent$ echo "0 1 1
1 1 0"|awk -v t=5 '{f=1;while(f<=NF){ n=1;while(n<=t){printf "%s ",$f;n++;}f++;} print "";}'
0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
This might work for you:
echo -e '0 1 1\n1 1 1 0 0' | sed "s/\S/$(echo {1..4}| sed 's/\S*/\&/g')/g"
0 0 0 0 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0
This is an over-the-top solution, but it is programmable: change 4 to any value you wish to multiply by.
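The awk loop shown earlier leaves a trailing space on every line. If that matters, the repeated fields can be collected in a string first. A sketch that takes the repeat count as its first argument; repeat_fields is a made-up name:

```shell
#!/bin/sh
# Hypothetical helper: repeat every whitespace-separated field N times.
# $1 = repeat count; remaining arguments = input files (stdin if none).
repeat_fields() {
    times=$1
    shift
    awk -v t="$times" '{
        out = ""
        for (f = 1; f <= NF; f++)
            for (n = 1; n <= t; n++)
                out = out (out == "" ? "" : " ") $f   # no trailing space
        print out
    }' "$@"
}

# Demo: repeat each value 4 times, then 2 times
echo '0 1 1' | repeat_fields 4
printf '0 1 1\n1 1 0\n' | repeat_fields 2
```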

Line extraction depending on range for specific columns

I would like to extract some lines from a text file, and I have recently started experimenting with sed.
I have a file with this structure:
88 3 3 0 0 1 101 111 4 3
89 3 3 0 0 1 3 4 112 102
90 3 3 0 0 1 102 112 113 103
91 3 3 0 0 2 103 113 114 104
What I would like to do is extract lines according to the second column. I use something like this in my bash script (argument 2 is the input file):
sed -n '/^[0-9]* [23456789]/ p' < $2 > out
However, the second column also has entries outside the range [23456789], for instance 10. Since 10 is composed of 1 and 0, I guess both characters would have to be in the range, but there are also entries with '1' in the second column that I do not want to keep. So how can I match '10' but not '1'?
Best,
Umut
sed -rn '/^[0-9]* ([23456789]|10)/ p' < $2 > out
You need extended regular expression support (-r) to have the | (or) operator.
Another interesting way is:
sed -rn '/^[0-9]* ([23456789]|[0-9]{2,})/ p' < $2 > out
This means either one of [23456789] or two or more digits in a row.
The instant you see variable-sized columns in your data, you should start thinking about awk:
awk '$2 > 1 && $2 < 11 {print}{}'
will do the trick assuming your file format is correct.
sed -rn '/^[0-9]* (2|3|4|5|6|7|8|9|10)/p' < $2 > out
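One caveat with the sed patterns above: they only test how the second column begins, so a value such as 23 or 100 would also match the [23456789] or 10 alternatives. A numeric comparison avoids that. Here is a sketch of the awk approach with the bounds passed in as arguments; filter_by_col2 and the inclusive bounds are assumptions for illustration:

```shell
#!/bin/sh
# Hypothetical helper: keep rows whose second column lies in [lo, hi].
# $1 = lower bound, $2 = upper bound (both inclusive);
# remaining arguments = input files (stdin if none).
filter_by_col2() {
    lo=$1
    hi=$2
    shift 2
    awk -v lo="$lo" -v hi="$hi" '$2 >= lo && $2 <= hi' "$@"
}

# Demo: second-column values 3, 1, 10 and 23 with the range 2..10
printf '88 3 3 0\n89 1 3 0\n90 10 3 0\n91 23 3 0\n' | filter_by_col2 2 10
```

In the asker's script this could be called as filter_by_col2 2 10 "$2" > out.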