copy and replacing from file1 to file specific line - sed

I have two files, file1.traj and file2.traj. Both these files contain identical data and the data are arranged in same format in them.
The first line of both files is a comment. At line 7843 of both files there is cartesian coordinate X, Y and Z. And at line 15685
there is another cartesian coordinate X, Y and Z. The number of lines in between two cartesian coordinates are 7841.
What I want to do is copy ONLY the X Y Z coordinates from file1.traj to file2.traj throughout the whole file.
I tried to use paste command but I cant do for specified line alone.
Here i showed the data format in the file. I used the number for clarity purpose.
line.1 trajectory generated by ptraj
line.2 5.844 4.178 7.821 6.423 4.054 8.578 6.606 4.907 6.827 7.557
line.3 4.385 6.722 6.877 6.384 7.283 5.950 6.884 7.565 7.668 6.282
line.2 8.474 7.721 7.127 8.928 7.628 7.205 6.259 8.589 6.712 6.110
line.3 7.712 8.602 6.643 8.151 8.654 7.495 6.940 7.183 4.871 6.108
line.4 7.887 4.864 7.755 7.814 3.754 8.697 7.267 3.724 7.081 7.633
line.5 2.478 6.246 8.089 2.604 8.026 8.853 3.943 6.623 5.754 4.529
.
.
.
. 1.516 41.749 54.260 0.108 41.176 54.536 -0.626 40.627 53.818 -0.303
. 41.920 42.179 3.556 3.251 41.623 3.530 2.472 42.558 2.678 3.304
. 44.723 1.496 5.937 44.339 1.355 6.803 44.866 0.614 5.593 52.401
line.7842 86.323 2.974 52.385 85.816 3.785 51.879 85.808 2.359
line.7843 104.140 159.533 88.303
line.7844 4.792 5.052 8.317 5.279 4.463 8.898 5.663 5.341 7.220 6.267
line.7845 4.438 7.137 6.477 6.566 7.627 5.857 7.407 7.936 7.301 6.170
. 8.741 7.647 7.020 9.023 7.315 7.107 6.475 8.171 6.435 6.413
. 7.823 8.416 6.704 8.208 8.473 7.582 6.560 7.126 5.141 5.816
.
.
.
. 52.050 7.905 42.026 38.561 1.747 39.847 39.375 2.235 39.972 38.634
. 1.382 38.965 0.810 0.477 39.394 0.717 -0.349 39.867 0.222 1.081
. 39.847 43.073 5.033 2.756 43.387 5.428 1.942 42.256 4.598 2.511
line.15683 47.302 4.261 7.071 47.801 4.632 7.799 47.256 4.968 6.428 54.279
line.15684 0.498 3.477 53.964 0.612 2.580 53.500 0.612 4.021
line.15685 104.140 159.533 88.303
line.15686 4.970 4.868 7.979 5.342 4.250 8.612 5.988 5.450 7.184 6.903
line 15687 4.861 7.246 6.381 6.921 7.550 5.526 7.597 7.536 6.953 7.009
.
.
Is there any possibility to use awk or sed?

Using AWK you can do it,i don't think its perfect answer ,but its tricky
sushanth#mymachine:~$ cat file1.txt
104.140 159.533 88.303
5.844 4.178 7.821 6.423 4.054 8.578 6.606 4.907 6.827 7.557
4.385 6.722 6.877 6.384 7.283 5.950 6.884 7.565 7.668 6.282
8.474 7.721 7.127 8.928 7.628 7.205 6.259 8.589 6.712 6.110
104.140 159.533 88.303
7.712 8.602 6.643 8.151 8.654 7.495 6.940 7.183 4.871 6.108
7.887 4.864 7.755 7.814 3.754 8.697 7.267 3.724 7.081 7.633
2.478 6.246 8.089 2.604 8.026 8.853 3.943 6.623 5.754 4.529
here i have used print only lines of 35 characters or longer using awk
sushanth#mymachine:~$ cat file1.txt | awk 'length < 35' > file2.txt
104.140 159.533 88.303
104.140 159.533 88.303

Related

Ghostscript -- convert PS to PNG, rotate, and scale

I have an application that creates very nice data plots rendered in PostScript with letter size and landscape mode. An example of the input file is at http://febo.com/uploads/blip.ps. [ Note: this image renders properly in a viewer, but the PNG conversion comes out with the image sideways. ] I need to convert these PostScript files into PNG images that are scaled down and rotated 90 degrees for web presentation.
I want to do this with ghostscript and no other external tool, because the conversion program will be used on both Windows and Linux systems and gs seems to be a common denominator. (I'm creating a perl script with a "PS2png" function that will call gs, but I don't think that's relevant to the question.)
I've searched the web and spent a couple of days trying to modify examples I've found, but nothing I have tried does the combination of (a) rotate, (b) resize, (c) maintain the aspect ratio and (d) avoid clipping.
I did find an example that injects a "scale" command into the postscript stream, and that seems to work well to scale the image to the desired size while maintaining the aspect ratio. But I can't find a way to rotate the resized image so that the, e.g., 601 x 792 point (2504 x 3300 pixel) postscript input becomes an 800 x 608 pixel png output.
I'm looking for the ghostscript/postscript fu that I can pass to the gs command line to accomplish this.
I've tried gs command lines with various combinations of -dFIXEDMEDIA, -dFitPage, -dAutoRotatePages=/None, or /All, -c "<> setpagedevice", changing -dDISPLAYWIDTHPOINTS and -dDISPLAYHEIGHTPOINTS, -g[width]x[height], -dUseCropBox with rotated coordinates, and other things I've forgotten. None of those worked, though it wouldn't surprise me if there's a magic combination of some of them that will. I just haven't been able to find it.
Here is the core code that produces the scaled but not rotated output:
## "$molps" is the input ps file read to a variable
## insert the PS "scale" command
$molps = $sf . " " . $sf . " scale\n" . $molps;
$gsopt1 = " -r300 -dGraphicsAlphaBits=4 -dTextAlphaBits=4";
$gsopt1 = $gsopt1 . " -dDEVICEWIDTHPOINTS=$device_width_points";
$gsopt1 = $gsopt1 . " -dDEVICEHEIGHTPOINTS=$device_height_points";
$gsopt1 = $gsopt1 . " -sOutputFile=" . $outfile;
$gscmd = "gs -q -sDEVICE=pnggray -dNOPAUSE -dBATCH " . $gsopt1 . " - ";
system("echo \"$molps\" \| $gscmd");
$device_width_points and $device_height_points are calculated by taking the original image size and applying the scaling factor $sf.
I'll be grateful to anyone who can show me the way to accomplish this. Thanks!
Better Answer:
You almost had it with your initial research. Just set orientation in the gs call:
... | gs ... -dAutoRotatePages=/None -c '<</Orientation 3>> setpagedevice' ...
cf. discussion of setpagedevice in the Red Book, and ghostscript docs (just before section 6.2)
Original Answer:
As well as "scale", you need "rotate" and "translate", not necessarily in that order.
Presumably these are single-page PostScript files?
If you know the bounding box of the Postscript, and the dimensions of the png, it is not too arduous to calculate the necessary transformation. It'll be about one line of code. You just need to ensure you inject it in the correct place.
Chapter 6 of the Blue Book has lots of details
A ubc.ca paper provides some illustrated examples (skip to page 4)
Simple PostScript file to play around with. You'll just need the three translate,scale,rotate commands in some order. The rest is for demonstrating what's going on.
%!
% function to define a 400x400 box outline, origin at 0,0 (bottom left)
/box { 0 0 moveto 0 400 lineto 400 400 lineto 400 0 lineto closepath } def
box clip % pretend the box is our bounding box
clippath stroke % draw initial black bounding box
(Helvetica) findfont 50 scalefont setfont % setup a font
% draw box, and show some text # 100,100
box stroke
100 100 moveto (original) show
% try out some transforms
1 0 0 setrgbcolor % red
.5 .5 scale
box stroke
100 100 moveto (+scaled) show
0 1 0 setrgbcolor % green
300 100 translate
box stroke
100 100 moveto (+translated) show
0 0 1 setrgbcolor % blue
45 rotate
box stroke
100 100 moveto (+rotated) show
showpage
It may be possible to insert the calculated transformation into the gs commandline like this:
... | gs ... -c '1 2 scale 3 4 translate 5 6 rotate' -# ...
Thanks to JHNC, I think I have it licked now, and for the benefit of posterity, here's what worked. (Please upvote JHNC, not this answer.)
## code to determine original size, scaling factor, rotation goes above
my $device_width_points;
my $device_height_points;
my $orientation;
if ($rotation) {
$orientation = 3;
$device_width_points = $ytotal_png_pt;
$device_height_points = $xtotal_png_pt;
} else {
$orientation = 0;
$device_width_points = $xtotal_png_pt;
$device_height_points = $ytotal_png_pt;
}
my $orientation_string =
" -dAutoRotatePages=/None -c '<</Orientation " .
$orientation . ">> setpagedevice'";
## $ps = .ps file read into variable
## insert the PS "scale" command
$ps = $sf . " " . $sf . " scale\n" . $ps;
$gsopt1 = " -r300 -dGraphicsAlphaBits=4 -dTextAlphaBits=4";
$gsopt1 = $gsopt1 . " -dDEVICEWIDTHPOINTS=$device_width_points";
$gsopt1 = $gsopt1 . " -dDEVICEHEIGHTPOINTS=$device_height_points";
$gsopt1 = $gsopt1 . " -sOutputFile=" . $outfile;
$gsopt1 = $gsopt1 . $orientation_string;
$gscmd = "gs -q -sDEVICE=pnggray -dNOPAUSE -dBATCH " . $gsopt1 . " - ";
system("echo \"$ps\" \| $gscmd");
One of the problems I had was that some options apparently don't play well together -- for example, I tried using the -g option to set the output size in pixels, but in that case the rotation didn't work. Using the DEVICE...POINTS commands instead did work.

How to extract data set from a text file?

Am quite new in the Unix field and I am currently trying to extract data set from a text file. I tried with sed, grep, awk but it seems to only work with extracting lines, but I want to extract an entire dataset... Here is an example of file from which I'd like to extract the 2 data sets (figures after the lines "R.Time Intensity")
[Header]
Application Name LabSolutions
Version 5.87
Data File Name C:\LabSolutions\Data\Antoine\170921_AC_FluoSpectra\069_WT3a derivatized lignin LiCl 430_GPC_FOREVER_430_049.lcd
Output Date 2017-10-12
Output Time 12:07:32
[Configuration]
Instrument Name BOTAN127-Instrument1
Instrument # 1
Line # 1
# of Detectors 3
Detector ID Detector A Detector B PDA
Detector Name Detector A Detector B PDA
# of Channels 1 1 2
[LC Chromatogram(Detector A-Ch1)]
Interval(msec) 500
# of Points 9603
Start Time(min) 0,000
End Time(min) 80,017
Intensity Units mV
Intensity Multiplier 0,001
Ex. Wavelength(nm) 405
Em. Wavelength(nm) 430
R.Time (min) Intensity
0,00000 -709779
0,00833 -709779
0,01667 17
0,02500 3
0,03333 7
0,04167 19
0,05000 9
0,05833 5
0,06667 2
0,07500 24
0,08333 48
[LC Chromatogram(Detector B-Ch1)]
Interval(msec) 500
# of Points 9603
Start Time(min) 0,000
End Time(min) 80,017
Intensity Units mV
Intensity Multiplier 0,001
R.Time (min) Intensity
0,00000 149
0,00833 149
0,01667 -1
I would greatly appreciate any idea. Thanks in advance.
Antoine
awk '/^[^0-9]/&&d{d=0} /R.Time/{d=1}d' file
Brief explanation,
Set d as a flag to determine print line or not
/^[^0-9]/&&d{d=0}: if regex ^[^0-9] matched && d==1, disabled d
/R.Time/{d=1}: if string "R.Time" searched, enabled d
awk '/R.Time/,/LC/' file|grep -v -E "R.Time|LC"
grep part will remove the R.Time and LC lines that come as a part of the output from awk
I think it's a job for sed.
sed '/R.Time/!d;:A;N;/\n$/!bA' infile

Boolean function confusion

I'm trying to do the Kmap for F(A,B,C,D)= A’B’C’D’+AC’D’+B’CD’+A’BCD+BC’D . I'm getting a little confused because not all the variable groupings have the same number of variables. some have 4 and some have 3. is this equivalent to F(A,B,C,D) = F(0,2,4,5,7) ? I don't know if you have to do something extra if theres a variable missing. like in the 2nd grouping (AC'D') theres no B. so do we have to do something to compensate for the missing term or is this just 4.
I'm not sure where those numbers 0,4,2,5,7 are coming from, a Karnaugh map (assuming that's what you meant) simply specifies the truth output for given inputs.
If a term is missing, then it has no effect on the outcome so either of its two possible values will affect the truth output. So, in essence, the following two expressions are identical:
AC'D' <=> A(B)C'D' + A(B')C'D'
If more than one term is missing, then you simply allow for more than two possibilities (2n, where n is the number of missing terms). So A would also be:
ABCD + ABCD' + ABC'D + ABC'D' + AB'CD + AB'CD' + AB'C'D + AB'C'D'
(A matched against the 23 = 8 possibilities for the B, C and D variables).
Hence the map for your particular function:
A'B'C'D' + AC'D' + B'CD' + A'BCD+BC'D
would be, for term one A'B'C'D':
AB:00 01 10 11
CD:00 T . . .
01 . . . .
10 . . . .
11 . . . .
or'ed with term two AC'D', equivalent to ABC'D + AB'C'D:
AB:00 01 10 11
CD:00 . . T T
01 . . . .
10 . . . .
11 . . . .
or'ed with term three B'CD', which expands to AB'CD + A'B'CD:
AB:00 01 10 11
CD:00 . . . .
01 . . . .
10 T . T .
11 . . . .
and, finally, or'ed with term four BC'D, equal to ABC'D + A'BC'D:
AB:00 01 10 11
CD:00 . . . .
01 . T . T
10 . . . .
11 . . . .
Combining all these gives you:
AB:00 01 10 11
CD:00 T . T T
01 . T . T
10 T . T .
11 . . . .
A term with 3 variables just means that it covers 2 cells in the Karnaugh map.
AB 00 01 11 10
CD
00 XX YYYYYY
01
11
10
So XX is A'B'C'D' and YYYYYY is AC'D'.
The point in AC'D' is that the value of B is irrelevant, so it can be 1 or 0, so there are 2 cells.
Good luck
Where the variables are missing call the don't care condition. If you are making a truth table for the expression, you don't care if the variables can take both 0 or 1 as per reduce the expression. Or use X (cross) for them

If matched then print all using awk

I have a file which contains many sub-sections each starting with [begin] and ending with [end]:
[begin li1_1378184738754_91]
header=7075|lime|0|0|109582|0|1|2700073||0|0|0|[355]|1|0|ssb-li1-1378184738754-90||0||LIME |0|saved=true|0.002406508312038836|0|[ser=zu1:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=uzu6:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=xzs5:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=sv-stda-zu3:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=hzu8:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=lzu3:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=yzu2:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=xzu7:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer]|0|null|false|40||false|
attrs=0|0|0||0|
ptitle=690751404|1|1|1|Rest:1998636||||||2700401|175619|900.5636134725806|0.985486|39.166666666666664|$9.99|100.0|1|||
seller=1998636|1|9.99|1|-1||0|||||true||4.7937584|10412|false|
ptitle=5543369186|2|1|1|Rest:1533891||||||2700211|19615|886.8211044369053|0.776121|34.0|$119.99|100.0|1|||
seller=1533891|1|119.99|3|-1|1.0:text,In+size+6.0%2C7.0%2C8.0%2C8.5%2C9.0%2C9.5%2C10.0%2C...,0.0,,,,0,0,|2|||||true||2.95|20|true|
ptitle=622529158|3|1|1|||||||2700408|67402|796.5289827432475|0.893899|63.0|$5.27|100.0|1|||
seller=4281413|1|5.27|1|-1||0|||||true||4.695052|1769|true|
ptitle=5507199621|4|1|1|||||||2700220|56412|706.9031281251306|0.791171|45.0|$99.99|100.0|1|||
seller=4806107|1|-1.0|1|-1|1.0:sale,$,30.000000000000014,0.0,,,,0,0,:text,In+size+6.0%2C6.5%2C7.0%2C7.5%2C8.0%2C8.5%2C9.0%2C9...,0.0,,,,0,0,|2||||$130 $30.00 off|false||5.0|1|false|
ptitle=5502728013|5|1|1|||||||900000|0|698.7772340643119|0.836740|75.0|$40.95|100.0|1|||
seller=955448|1|40.95|1|-1||0|||||false||4.142857|7|false|
ptitle=840662011|6|1|1|Rest:265238||||||300233|62718|683.2927820751431|0.995513|52.0|$22.95|100.0|1|||
seller=265238|1|22.95|1|-1||0|||||false||4.478261|23|false|
ptitle=848084980|8|1|1|||||||2700073|145653|670.4809846773688|0.880587|60.0|$24.99|100.0|1|||
seller=5267046|1|24.99|1|-1||0|||||true||0.0|0|false|
ptitle=891200492|9|1|1|Rest:1030132||||||2701003|17215|668.8437575254773|0.825491|32.0|$519.99|100.0|1|||
seller=1030132|1|519.99|1|-1||0|||||false||4.7391305|23|false|
ptitle=641974054|10|1|1|||||||900000|69433|667.6678790058678|0.752129|57.0|$4.19|100.0|1|||
seller=3365158|1|4.19|1|-1||0|||||true||4.70907|4410|true|
ptitle=517591869|12|1|1|Rest:4802895||||||2700408|127644|643.0972570735605|0.893899|17.25|$23.95|100.0|1|||
seller=4318776|1|-1.0|3|-1||0|||||false||0.0|0|false|
ptitle=541549480|13|1|1|Rest:1180414||||||2702000|105832|597.4904572011968|0.752129|24.666666666666664|$8.27|100.0|1|||
seller=4636561|1|8.27|1|-1||0|||||false||4.8283377|734|true|
ptitle=1020561900|14|1|1|||||||2700063|159813|594.4717491579845|0.934869|75.0|$5.39|100.0|1|||
seller=4722645|1|5.39|1|-1|1.0:sale,$,0.6000000000000005,0.0,,,,0,0,:text,Free+Shipping+on+All+Orders%21,0.0,201301010000/,,,0,0,|2||||$5.99 $0.60 off|true||4.3942246|1593|true|
ptitle=507792308|15|1|1|Rest:4683455||||||2702000|105832|591.7739184402442|0.768311|22.5|$9.48|100.0|1|||
seller=4910651|1|-1.0|2|-1||0|||||false||5.0|1|false|
ptitle=1090571346|16|1|1|Rest:4452919||||||2700211|20824|776.4814913363535|0.776121|35.0|$59.99|100.0|1|||
seller=1533891|1|59.99|1|-1|1.0:sale,$,49.99999999999999,0.0,,,,0,0,:text,In+size+7.5%2C8.0%2C8.5%2C9.0%2C9.5%2C10.0%2C10.5...,0.0,,,,0,0,|2||||$110 $50.00 off|true||2.95|20|true|
ptitle=573017390|17|1|1|||||||2700073|91937|679.695660577044|0.880587|33.5|$14.85|100.0|1|||
seller=4281413|1|14.85|1|-1||0|||||true||4.695052|1769|true|
ptitle=5502723300|18|1|1|||||||900000|0|639.3095640940136|0.836740|75.0|$50.95|100.0|1|||
seller=955448|1|50.95|1|-1||0|||||false||4.142857|7|false|
ptitle=940022974|20|1|1|||||||2700600|58701|569.9503499778303|0.875839|59.0|$14.40|100.0|1|||
seller=4825227|1|14.4|1|12||0|||||true||4.0289855|276|true|
ptitle=5513277553|21|1|1|||||||2700220|56412|565.2712749001105|0.776121|44.33333333333333|$129.95|100.0|1|||
seller=4825252|1|129.95|1|23||0|||||true||4.0289855|276|true|
ptitle=890329961|22|1|1|||||||2700408|133796|564.7642425785796|0.837916|34.75|$61.95|100.0|1|||
seller=4825235|1|61.95|4|19||0|||||true||4.0289855|276|true|
ptitle=753852910|24|1|1|||||||2700073|146738|557.7419123688652|0.934869|47.69230769230769|$26.99|100.0|1|||
seller=4722645|1|26.99|10|-1|1.0:sale,$,3.0,0.0,,,,0,0,:text,Free+Shipping+on+All+Orders%21,0.0,201301010000/,,,0,0,|2||||$29.99 $3.00 off|true||4.3942246|1593|true|
ptitle=654738989|26|1|1|||||||900000|84012|554.7756559595525|0.752129|57.0|$3.19|100.0|1|||
seller=3365158|1|3.19|1|-1||0|||||true||4.70907|4410|true|
ptitle=707747307|27|1|1|Rest:4736009||||||2700063|76249|552.234395428327|0.889614|19.857142857142854|$6.39|100.0|1|||
seller=4736009|1|6.39|1|-1||0|||||false||4.8071113|15356|true|
ptitle=63531001|28|1|1|||||||2700408|82712|625.0421885589608|0.893899|47.166666666666664|$7.69|100.0|1|||
seller=4281413|1|7.69|3|-1||0|||||true||4.695052|1769|true|
ptitle=5502728016|29|1|1|||||||900000|0|605.9895507237038|0.836740|75.0|$503.00|100.0|1|||
seller=955448|1|503.0|1|-1||0|||||false||4.142857|7|false|
ptitle=507792308|31|1|1|Rest:4683455||||||2702000|105832|559.6902659046442|0.752129|22.5|$8.99|100.0|1|||
seller=5105812|1|-1.0|1|-1||0|||||false||0.0|0|false|
ptitle=753852910|32|1|1|||||||2700073|146738|545.9987095658629|0.870929|47.69230769230769|$22.49|100.0|1|||
seller=4143386|1|22.49|6|-1|1.0:sale,$,7.5,0.0,,,,0,0,:text,Free+Shipping+on+Orders+Over+%24100,0.0,201109010000/201409302359,,,0,0,|2||||$29.99 $7.50 off|false||4.7316346|2355|true|
ptitle=5513277553|33|1|1|Rest:1533891||||||2700220|56412|653.3133907916089|0.825491|44.33333333333333|$149.99|100.0|1|||
seller=1533891|1|149.99|3|-1|1.0:text,In+size+5.0%2C5.5%2C6.0%2C6.5%2C7.0%2C7.5%2C8.0%2C8...,0.0,,,,0,0,|2|||||true||2.95|20|true|
ptitle=63531001|34|1|1|||||||2700408|82712|541.8233547780552|0.893899|47.166666666666664|$7.72|100.0|1|||
seller=2370155|1|7.72|4|-1||0|||||false||4.85|40|false|
ptitle=1018957017|35|1|1|||||||2700073|145653|540.6093714604533|0.860614|56.0|$25.95|100.0|1|||
seller=5036683|1|25.95|1|-1||0|||||false||4.8405056|366|false|
ptitle=743682867|36|1|1|||||||2700073|63437|539.5985846455641|0.870929|58.0|$46.99|100.0|1|||
seller=193176|1|46.99|1|-1||0|||||true||4.8511987|1418|true|
ptitle=679858288|37|1|1|||||||2700063|188669|535.1360632897284|0.902031|30.0|$12.41|100.0|1|||
seller=4143386|1|12.41|2|-1|1.0:sale,$,1.379999999999999,0.0,,,,0,0,:text,Free+Shipping+on+Orders+Over+%24100,0.0,201109010000/201409302359,,,0,0,|2||||$13.79 $1.38 off|false||4.7316346|2355|true|
ptitle=994328713|38|1|1|||||||2700073|71463|534.7715925279717|0.870929|58.0|$1.29|100.0|1|||
seller=1787388|1|1.29|1|-1||0|||||false||4.680464|3624|false|
ptitle=886915818|40|1|1|||||||2700444|201835|529.7519801432289|0.934869|65.5|$44.99|100.0|1|||
seller=4561883|1|44.99|2|-1||0|||||true||4.7913384|508|false|
seller_hidden=227502|990765963|1147436601|-1
seller_hidden=5310958|622529158|5645627277|-1
seller_hidden=4825254|5543369186|5651114316|23
seller_hidden=5289138|5548930281|5653769481|-1
[end li1_1378184738754_91]
I am trying to run the command cat /home/nextag/logs/OutpdirImpressions.log.2013-09-02 | awk -F "$begin" '{print $0}' | awk '$0 ~ "header=7075" {print $0}'
As per this command i want to split the entire file into sub-sections beginning with the word 'begin'. Now in that i want those sub-sections which contains 'header=7075'
Expected output is that it will print the entire sub-section(those which contain that string), but i am getting only this portion as output:
header=7075|lime|0|0|109582|0|1|2700073||0|0|0|[355]|1|0|ssb-li1-1378184738754-90||0||LIME
|0|saved=true|0.002406508312038836|0|[ser=zu1:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=uzu6:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=xzs5:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=sv-stda-zu3:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=hzu8:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=lzu3:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=yzu2:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=xzu7:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer]|0|null|false|40||false|
I have tried using if in various ways, but it doesn't works. Can somebody please help me.
I tried awk -F "$begin" '{if($0 ~ "header=7075") {print $0}}' /home/nextag/logs/OutpdirImpressions.log.2013-09-02. It gave the same result
Can somebody please suggest that how do i get the complete sub-section in the result
Try this awk one-liner:
awk '$1=="[end"{p=0}/^header=7075/{p=1}p' file
In parts:
$1=="[end"{p=0} if you reach a line, with the first word "[end", then set the flag to zero
/^header=7075/{p=1} If you reach a line, which begins with "header=7075", set set the flag to one
p if the flag is non-zero, print the current line (equivalent to p{print} or p{print $0} or p!=0{print $0}

create file by feching corresponding data from other files?

I have a list of SNPs for example (let's call it file1):
SNP_ID chr position
rs9999847 4 182120631
rs999985 11 107192257
rs9999853 4 148436871
rs999986 14 95803856
rs9999883 4 870669
rs9999929 4 73470754
rs9999931 4 31676985
rs9999944 4 148376995
rs999995 10 78735498
rs9999963 4 84072737
rs9999966 4 5927355
rs9999979 4 135733891
I have another list of SNP with corresponding P-value (P) and BETA (as shown below) for different phenotypes here i have shown only one (let's call it file2):
CHR SNP BP A1 TEST NMISS BETA SE L95 U95 STAT P
1 rs3094315 742429 G ADD 1123 0.1783 0.2441 -0.3 0.6566 0.7306 0.4652
1 rs12562034 758311 A ADD 1119 -0.2096 0.2128 -0.6267 0.2075 -0.9848 0.3249
1 rs4475691 836671 A ADD 1111 -0.006033 0.2314 -0.4595 0.4474 -0.02608 0.9792
1 rs9999847 878522 A ADD 1109 -0.2784 0.4048 -1.072 0.5149 -0.6879 0.4916
1 rs999985 890368 C ADD 1111 0.179 0.2166 -0.2455 0.6034 0.8265 0.4087
1 rs9999853 908247 C ADD 1110 -0.02015 0.2073 -0.4265 0.3862 -0.09718 0.9226
1 rs999986 918699 G ADD 1111 -1.248 0.7892 -2.795 0.2984 -1.582 0.114
Now I want to make two files named file3 and file4 such that:
file3 should contain:
SNPID Pvalue_for_phenotype1 Pvalue_for_phenotype2 Pvalue_for_phenotype3 and so on....
rs9999847 0.9263 0.00005 0.002 ..............
The first column (SNPIDs) in file3 will be fixed (all the snps in my chip will be listed here), and i want to write a programe so that it will match snp id in file3 and file2 and will fetch the P-value for that corresponding snp id and put it in file3 from file2.
file4 should contain:
SNPID BETAvale_for_phenotype1 BETAvale_for_phenotype2 BETAvale_for_phenotype3 .........
rs9999847 0.01812 -0.011 0.22
the 1st column (SNPIDs) in file4 will be fixed (all the SNPs in my chip will be listed here), and I want to write a program so that it will match SNP ID in file4 and file2 and will fetch the BETA for that corresponding SNP ID and put it in file4 from file2.
it's a simple exercise about How to transfer the data of columns to rows (with awk)?
file2 to file3.
I assumed that you have got machine with large RAM, because I think that you have got million lines into file2.
you could save this code into column2row.awk file:
#!/usr/bin/awk -f
BEGIN {
snp=2
val=12
}
{
if ( vector[$snp] )
vector[$snp] = vector[$snp]","$val
else
vector[$snp] = $val
}
END {
for (snp in vector)
print snp","vector[snp]
}
where snp is column 2 and val is column 12 (pvalue).
now you could run script:
/usr/bin/awk -f column2row.awk file2 > file3
If you have got small RAM, then you could divide load:
cat file1 | while read l; do s=$(echo $l|awk '{print $1}'); grep -w $s file2 > $s.snp; /usr/bin/awk -f column2row.awk $s.snp >> file3; done
It recovers from $l (line) first parameter ($s, snp name), search $s into file2 and create small file about each snp name.
and then it uses awk script to generate file3.
file2 to file4.
you could modify value about val into awk script from column 12 to 7.