I have split a larger data file into individual 2-column files for each field. This results in something like this:
0.00 3.02211e+07
1.00 3.02211e+07
2.00 3.02211e+07
3.00 3.02211e+07
4.00 3.02211e+07
5.00 3.01295e+07
6.00 3.00608e+07
7.00 2.99768e+07
When I try to add a row via sed,
sed -i '1i pressure-prof' myfile.txt
the output has a space character between each character (including existing spaces). If I look in Notepad++, the extra spaces appear as the ASCII "NUL" character. In the terminal it looks like this:
pressure-prof
0 . 0 0 3 . 0 2 2 1 1 e + 0 7
1 . 0 0 3 . 0 2 2 1 1 e + 0 7
2 . 0 0 3 . 0 2 2 1 1 e + 0 7
3 . 0 0 3 . 0 2 2 1 1 e + 0 7
4 . 0 0 3 . 0 2 2 1 1 e + 0 7
5 . 0 0 3 . 0 1 2 9 5 e + 0 7
6 . 0 0 3 . 0 0 6 0 8 e + 0 7
7 . 0 0 2 . 9 9 7 6 8 e + 0 7
This is on Windows, and I think sed is being provided by Cygwin or MSYS2. I don't know if that has anything to do with the output format issues.
Yes, I can resort to opening the files in a text editor and adding the row that way. I would like to be able to use sed in the future, though.
Thanks for any thoughts and assistance.
cat myfile.txt | tr -d ' ' | sed 's/./0 /4' | sed '1s/0 //' > mf2 && mv mf2 myfile.txt
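Run that after you've finished adding your rows. tr first deletes all the spaces, then the first sed replaces the fourth character of each line with "0 " to restore the space between the two columns, and the last sed removes that "0 " again from the first (header) line.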
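An alternative worth considering: NUL bytes between every character usually mean the file is UTF-16 encoded (Windows tools often write that), and sed is treating it as single-byte text. The sketch below assumes that is the case here, which the question does not confirm. It re-encodes the file as UTF-8 while adding the header line. The function name prepend_line is made up for illustration.

```python
def prepend_line(path, header, in_encoding="utf-16", out_encoding="utf-8"):
    """Insert `header` as the first line of `path`, re-encoding the file.

    Assumption: the file is UTF-16 (with a byte-order mark). If it is
    some other encoding, pass that as `in_encoding`.
    """
    # Read the whole file as text; Python decodes the 2-byte characters,
    # so the NUL bytes disappear from the decoded string.
    with open(path, "r", encoding=in_encoding) as f:
        text = f.read()

    # Write back as UTF-8 with plain "\n" line endings, header first.
    with open(path, "w", encoding=out_encoding, newline="\n") as f:
        f.write(header + "\n" + text)
```

After a conversion like this, sed -i '1i ...' on the UTF-8 file should behave as expected, because every character is a single byte again.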
How can I select the lines that contain a value < 10 from a large matrix of 21 columns and 150 rows? E.g.:
miRNameIDs degradome AGO LKM......till 21
osa-miR159a 0 42 42
osa-miR396e 0 7 9
vun-miR156a 121 77 4
ppt-miR156a 12 7 4
gma-miR6300 118 2 0
bna-miR156a 0 114 48
gma-miR156k 0 46 1
osa-miR1882e 0 7 0
.
.
.
Desired output is:-
miRNameIDs degradome AGO LKM......till 21
vun-miR156a 121 77 4
gma-miR6300 118 2 0
bna-miR156a 0 114 48
.
.
.
till 150 rows
Using a perl one-liner
perl -ane 'print if $. == 1 || grep {$_ > 50} @F[1..$#F]' file.txt
Explanation:
Switches:
-a: Splits the line on whitespace and loads the fields into the array @F
-n: Creates a while(<>){...} loop for each “line” in your input file.
-e: Tells perl to execute the code on command line.
Code:
$. == 1: Checks if the current line is line number 1.
grep {$_ > 50} @F[1..$#F]: Looks at each entry of the array after the first (the ID column) to see if it is greater than 50.
||: Logical OR operator. If either of the conditions above is true, the line is printed.
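For comparison, here is roughly the same logic as a small Python function. This is only a sketch: the name filter_rows and the threshold default are mine, and it assumes every column after the first holds a number on all lines except the header.

```python
def filter_rows(lines, threshold=50):
    """Keep the header line plus any row with a value above `threshold`."""
    kept = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()  # split on whitespace, like perl -a
        # Line 1 is the header; for other lines, test the columns after the ID.
        if lineno == 1 or any(float(v) > threshold for v in fields[1:]):
            kept.append(line)
    return kept
```

Called on the lines of file.txt, it returns the header and the rows that have at least one value greater than 50.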
Why is Ghostscript pswrite encoding my text in its output? Consider the following MWE:
%!PS-Adobe-3.0
%%Title: mwe.ps
%%Pages: 001
%%BoundingBox: 0 0 595 842
%%EndComments
%%Page: 1 1
%%PageBoundingBox: 0 0 595 842
0 0 1 setrgbcolor
0 0 595 842 rectfill
1 0 0 setrgbcolor
247 371 100 100 rectfill
/Times-Roman findfont
72 scalefont
setfont
newpath
247 300 moveto
(Chris) show
showpage
Saving this MWE to file and viewing in GSview will display a blue page with red square and my name underneath. Now run this file through Ghostscript 9.06 with the following command line:
"c:\Program Files\gs\gs9.06\bin\gswin64c.exe" ^
-dSAFER -dBATCH -dNOPAUSE ^
-sDEVICE=pswrite -sPAPERSIZE=a4 -r72 -sOutputFile=mwe_gs.ps mwe.ps
See the Ghostscript output below. Can someone please explain what is happening here? Whilst the two rectfill commands are still apparent, my text (Chris) has been encoded and is no longer distinguishable.
Is there an alternative postscript device which would retain my text please?
<snip>
%%Page: 1 1
%%PageBoundingBox: 0 0 595 842
%%BeginPageSetup
GS_pswrite_2_0_1001 begin
595 842 /a4 setpagesize
/pagesave save store 197 dict begin
1 1 scale
%%EndPageSetup
gsave mark
255 0 r6
0 0 595 842 rf
255 0 r3
247 371 100 100 rf
Q q
0 0 595 0 0 842 ^ Y
255 0 r3
249 299 43 50 /5D
$C
,6CW56m1G"ZORNkWR*rB:!c2;9rlWTH="2^^[(q"h>cG<omZ2l^=qC[XbO:8_[?kji-8^"N#3q*
jhL~>
,
289 300 41 49 /0P
$C
4r?0p$m<EkK3,0>s8W-!s8W-!s8W,u]<1irI=*p=<t0>_#<)>Is8K6,aTi'$~>
,
325 300 30 33 /5I
$C
49S"pc4+Rhs8W-!s8W)oqdD:saRZq[4+k%):]~>
,
349 300 24 49 /0T
$C
4q%Ms%;PqCs8W-!s8W%1_qkn/K?*sYFSGd:5Q~>
,
377 299 23 34 /5M
$C
-TQR7$&O'!K+D:XribR9;$mr4#sqUi.T#,dX=Y&Llg+F`d^HC#%$"]~>
,
cleartomark end end pagesave restore
showpage
%%PageTrailer
%%Trailer
%%Pages: 1
%%EOF
NOTE: This might seem an odd activity, but I'm exploring the idea of using Ghostscript to 'clean up' PostScript output from a Matlab application.
The 'text' has been converted to images, not vector paths. This is a serious limitation of the pswrite device and one of the reasons it is deprecated; you should use the ps2write device instead. The only reason the pswrite device is still included at all is for epswrite, which uses it (which is why the pswrite and epswrite output looks the same). At some point there will be an eps2write device and pswrite will be binned.
ps2write output is, by default, compressed. If you want uncompressed output, use the -dCompressPages=false switch on the command line.
If all you want is the location of the text you might consider the txtwrite device. The default implementation of this creates a plain text representation of the input, but you can have it output a faked up XML instead which includes things like the origin of the text.
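If you are scripting the conversion, the command line from the question with ps2write swapped in could be assembled like this. It is only a sketch: the function names are made up, and the flags are just the ones already mentioned in this thread (nothing here has been checked against a particular Ghostscript version).

```python
import subprocess


def ps2write_command(gs_exe, src, dst, compress=False):
    """Build a Ghostscript argument list that uses the ps2write device."""
    cmd = [
        gs_exe,
        "-dSAFER", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=ps2write",
        "-sPAPERSIZE=a4",
        "-r72",
    ]
    if not compress:
        # ps2write compresses by default; this switch asks for readable output.
        cmd.append("-dCompressPages=false")
    cmd += ["-sOutputFile=" + dst, src]
    return cmd


def run_ps2write(gs_exe, src, dst, compress=False):
    """Run the conversion; raises CalledProcessError if Ghostscript fails."""
    subprocess.run(ps2write_command(gs_exe, src, dst, compress), check=True)
```

For the files in the question, the call would be something like run_ps2write(r"c:\Program Files\gs\gs9.06\bin\gswin64c.exe", "mwe.ps", "mwe_gs.ps").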
Here is a simple example of the show operator being redefined to display position information about the show, along with performing the standard show operation. With ghostscript you can run multiple files, so the header file would be a prefix to the other file, which alters standard behavior.
The redefined show could have included the font name and size. The data could have been written to a disk file rather than dumped to the console. Any other operator could also have been redefined, such as rectfill, fill, stroke... Because the original operator is also called, you can convert a .ps to .pdf using the pdfwrite device while at the same time obtaining position information.
gswin32c.exe -dBATCH -dNOPAUSE header.ps trash.ps
gswin32c.exe -sDEVICE=pdfwrite -dCompressPages=false -sOutputFile=test.pdf header.ps trash.ps
output
currentpoint x:247.0 y:300.0 pathbbox 249.015,298.992 400.066,349.184 text:Chris currentrgbcolor:1.0,0.0,0.0( )
currentpoint x:50.0 y:90.0 pathbbox 50.8682,89.2852 181.327,139.184 text:Fred currentrgbcolor:1.0,0.0,0.0( )
currentpoint x:150.0 y:200.0 pathbbox 150.867,184.298 304.154,247.673 text:Mary currentrgbcolor:1.0,0.0,0.0( )
currentpoint x:300.0 y:350.0 pathbbox 300.867,348.993 598.79,398.681 text:Mr. Green currentrgbcolor:0.0,1.0,0.0( )
currentpoint x:100.0 y:400.0 pathbbox 100.866,399.202 358.547,449.183 text:Mr. Blue currentrgbcolor:0.0,0.0,1.0( )
Header.ps
/mydict 5 dict def
mydict begin
/show
{
(currentpoint ) print
currentpoint exch 10 string cvs ( x:) print print 10 string cvs ( y:) print print
gsave dup false charpath flattenpath
( pathbbox ) print
pathbbox
4 -1 roll 10 string cvs print (,) print
3 -1 roll 10 string cvs print ( ) print
2 -1 roll 10 string cvs print (,) print
10 string cvs print ( ) print
grestore
( text:) 10 string cvs print
dup print ( ) print
( currentrgbcolor:) print
currentrgbcolor
3 -1 roll 10 string cvs print (,) print
2 -1 roll 10 string cvs print (,) print
10 string cvs print ( ) ==
systemdict /show get exec
} def
trash.ps
%!PS-Adobe-3.0
%%Title: mwe.ps
%%Pages: 001
%%BoundingBox: 0 0 595 842
%%EndComments
%%Page: 1 1
%%PageBoundingBox: 0 0 595 842
0 0 1 setrgbcolor
0 0 595 842 rectfill
1 0 0 setrgbcolor
247 371 100 100 rectfill
/Times-Roman findfont
72 scalefont
setfont
newpath
247 300 moveto (Chris) show
50 90 moveto (Fred) show
150 200 moveto (Mary) show
0 1 0 setrgbcolor
300 350 moveto (Mr. Green) show
0 0 1 setrgbcolor
100 400 moveto (Mr. Blue) show
showpage
The text has been converted to vector paths. 249 299 43 50 /5D begins the first letter "C", then 289 300 is the "h", 325 300 the "r"....
What pswrite has done is eliminate the need for a font, so while your original code used /Times-Roman, the distilled code doesn't need any font, but rather draws the text using vectors.
I'm not sure exactly what you are after, but you could try "ps2write" or "epswrite" as alternatives to "pswrite". pswrite is used to write to ps level 1 standard and ps2write will write ps level 2 output. Nobody requires ps level 1 anymore, so level 2 would be acceptable. The epswrite will write to encapsulated postscript (eps).
I'm trying to add a bunch of 0s at the end of a line. The line is identified by the fact that it is followed by a line which starts with "expr1".
In Vim, what I do is:
s/\nexpr1/ 0 0 0 0 0 0\rexpr1/
and it works fine. I know that in Ubuntu \n is what is normally used to terminate the line, but whenever I use it in the replacement I get a ^@ symbol, so \r works fine for me. I thought I'd use this with sed, but it hasn't really worked. Here is what I normally write:
sed "s/\nexpr1/ 0 0 0 0 0 0\rexpr1/" infile > outfile
The end-of-line marker is $. Try this:
s/$/ 0 0 0 0 0 0/
Depending on your environment, you might need to escape the $.
awk '{$0=$0" 0 0 0 0 0 "}1' file > tmp && mv tmp file
ruby -i.bak -ne '$_=$_.chomp!+" 0 0 0 0 0\n";print' file
awk '$(NF + 1) = " 0 0 0 0 0 0"' infile > outfile
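Note that sed reads one line at a time and strips the trailing newline before applying the expression, so a pattern like \nexpr1 never matches unless lines are joined first (e.g. with N). The commands above also append the zeros to every line, not only to the line before "expr1". If only those lines should change, a look-ahead by one line is needed. Here is a minimal sketch of that idea in Python (the function name and defaults are made up for illustration):

```python
def append_before_marker(lines, marker="expr1", suffix=" 0 0 0 0 0 0"):
    """Append `suffix` to every line whose next line starts with `marker`.

    `lines` is a list of lines without trailing newlines,
    e.g. the result of text.splitlines().
    """
    out = []
    for i, line in enumerate(lines):
        # Look one line ahead; the last line has no successor.
        nxt = lines[i + 1] if i + 1 < len(lines) else None
        if nxt is not None and nxt.startswith(marker):
            line += suffix
        out.append(line)
    return out
```

Read infile with splitlines(), pass the list through the function, and write the result joined with "\n" to outfile.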
I would like to extract some lines from a text file. I have started to experiment with sed lately.
I have a file with the following structure:
88 3 3 0 0 1 101 111 4 3
89 3 3 0 0 1 3 4 112 102
90 3 3 0 0 1 102 112 113 103
91 3 3 0 0 2 103 113 114 104
What I would like to do is extract lines according to the value in the second column. I use something like this in my bash script (argument 2 is the input file):
sed -n '/^[0-9]* [23456789]/ p' < $2 > out
However, the second column also has entries outside the range [23456789], for instance 10. Since 10 is composed of the characters 1 and 0, I guess both would have to be in the range, but there are also entries with '1' in the second column that I do not want to keep. How can I match the '10's but not the '1's?
Best,
Umut
sed -rn '/^[0-9]* ([23456789]|10)/ p' < $2 > out
You need extended regular expression support (-r) to use the | (or) operator.
Another interesting way is:
sed -rn '/^[0-9]* ([23456789]|[0-9]{2,})/ p' < $2 > out
This matches either a single digit from 2 to 9 or two or more digits, so it accepts not only 10 but also 11, 12 and any other multi-digit value.
The instant you see variable-sized columns in your data, you should start thinking about awk:
awk '$2 > 1 && $2 < 11 {print}{}'
will do the trick assuming your file format is correct.
sed -rn '/^[0-9]* (2|3|4|5|6|7|8|9|10)/p' < $2 > out
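If the selection ever moves out of the shell, the numeric comparison from the awk answer is easy to reproduce. Below is a small sketch in Python; the function name and the default bounds 2..10 are my own choice, taken from the values discussed in this thread.

```python
def select_by_second_column(lines, low=2, high=10):
    """Return the lines whose second whitespace-separated field is an
    integer between `low` and `high` (inclusive)."""
    selected = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or one-column lines
        try:
            value = int(fields[1])
        except ValueError:
            continue  # second field is not an integer
        if low <= value <= high:
            selected.append(line)
    return selected
```

Because the field is compared as a number, 10 is kept and 1 is dropped without any special handling in the pattern.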