Using sed to copy data between two numerical patterns to a new file

I'm running a bunch (~320) of computational chemistry experiments, and I need to pull a small amount of data out of each file so that I can do some work on it in MatLab.
I'm pretty sure I can use sed to make this work, but try as I might I don't seem to be able to do so.
I need all of the data starting at the line beginning with "1 1" and ending with the line starting with "33 33".
I J FI(I,J) k(I,J) K(I,J)
1 1 -337.13279 -0.06697 -0.00430
2 2 3804.89120 8.52972 0.54787
3 3 3195.69653 6.01702 0.38648
4 4 3189.18684 5.99253 0.38490
5 5 3183.73262 5.97205 0.38359
6 6 3174.47525 5.93737 0.38136
7 7 3167.88746 5.91275 0.37978
8 8 1628.80868 1.56311 0.10040
9 9 1623.56055 1.55306 0.09975
10 10 1518.21620 1.35806 0.08723
11 11 1476.93012 1.28520 0.08255
12 12 1341.24087 1.05990 0.06808
13 13 1312.30373 1.01466 0.06517
14 14 1264.73004 0.94242 0.06053
15 15 1185.62592 0.82822 0.05320
16 16 1175.54013 0.81419 0.05230
17 17 1170.41211 0.80710 0.05184
18 18 1090.20196 0.70027 0.04498
19 19 1039.29190 0.63639 0.04088
20 20 1015.00116 0.60699 0.03899
21 21 1005.05773 0.59516 0.03823
22 22 986.55965 0.57345 0.03683
23 23 917.65537 0.49615 0.03187
24 24 842.93089 0.41863 0.02689
25 25 819.00146 0.39520 0.02538
26 26 758.39720 0.33888 0.02177
27 27 697.11173 0.28632 0.01839
28 28 628.75684 0.23292 0.01496
29 29 534.75856 0.16849 0.01082
30 30 499.35579 0.14692 0.00944
31 31 422.01320 0.10493 0.00674
32 32 409.30255 0.09870 0.00634
33 33 227.12411 0.03039 0.00195
33 2nd derivatives larger than 0.371D-04 over 561
MatLab is not a fan of text, so I'd like to not use text delimiters (though there are some in the header of this data section) and keep the data contained to only the numeric lines.
The data files contain a lot of other numbers as well, so I need to match the occurrence of "1 1" at the start of the line and "33 33" as the end of the copy. These 'indices' exist only in this block of info.
I attempted to use
% sed -n /"1 1"/,/"33 33"/p input.file > output.file
But I get a WHOLE BUNCH of data in the output file as it copies everything that shows up between any "1" and "33"
Is there any way to do what I'm looking for?
Also, I'm using the tcsh as that is what my servers run.

How about using awk
awk '$1=="1"&&$2=="1"{t=1};t;$1=="33"&&$2=="33"{t=0}' file
As recommended by @mklement0: if there is only one block, you can avoid processing the remainder of the file by changing the command to:
awk '$1=="1"&&$2=="1"{t=1};t;$1=="33"&&$2=="33"{exit}' file
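The same flag-based idea can be sketched in Python, in case the extraction ends up inside a larger script. This is only an illustration: the function name extract_block and the shortened sample rows are made up here, and it compares whole fields just as the awk version does.

```python
def extract_block(lines, start=("1", "1"), stop=("33", "33")):
    """Yield lines from the row whose first two fields equal `start`
    through the row whose first two fields equal `stop`, inclusive."""
    copying = False
    for line in lines:
        key = tuple(line.split()[:2])   # first two whitespace-separated fields
        if key == start:
            copying = True              # like awk's t=1
        if copying:
            yield line
        if copying and key == stop:
            return                      # like awk's exit: ignore the rest


# Shortened, hypothetical sample modelled on the data in the question.
sample = [
    " some 1 1 unrelated numbers 33",
    " I J FI(I,J) k(I,J) K(I,J)",
    " 1 1 -337.13279 -0.06697 -0.00430",
    " 2 2 3804.89120 8.52972 0.54787",
    " 33 33 227.12411 0.03039 0.00195",
    " 33 2nd derivatives larger than 0.371D-04 over 561",
]

block = list(extract_block(sample))
for row in block:
    print(row)
```

Because the comparison is on whole fields, a line that merely contains "1 1" somewhere in the middle does not start the copy.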

Your problem is twofold. First, there are two blanks between the ones, but your regex only allows for one (judging from the now indented code). Second, you are probably not precise enough; the /1 1/ pattern matches 11 11, for example, and 111 111 and so on.
So, you should consider:
sed -n -e '/^ *1  *1 /,/^33  *33 /p' -e '/^33  *33 /q' input.file > output.file
The patterns are anchored to the start of line by the ^ (caret). The numbers are separated by one or more blanks (there are other, longer-winded ways of writing that in standard sed; the + option is not standard sed but is widely available). And the numbers are terminated by a blank. The chances are that the first expression alone will give you what you want. The second expression terminates the search early when it recognizes the 33 33 input line, which can save a significant amount of file I/O and hence processing time if the input file is big enough.
If the lines with ID numbers in the hundreds have some different format, then it should be fairly straight-forward to tweak the regexes to match what is used. If the data contains tabs instead of (or as well as) blanks, you can tweak the regexes to manage that, too.
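To see why the anchoring and the trailing blank matter, here is a small Python check of a loose pattern against an anchored one. The two regular expressions are stand-ins written for this illustration (Python syntax rather than sed syntax), and the test rows are taken from or modelled on the question's data.

```python
import re

loose = re.compile(r"1 1")        # roughly what /1 1/ asks for
strict = re.compile(r"^ *1 +1 ")  # anchored, blank-separated, blank-terminated

rows = [
    "1 1 -337.13279 -0.06697 -0.00430",   # the wanted start line
    "11 11 1476.93012 1.28520 0.08255",   # contains "1 1" inside "11 11"
    "  1   1 -337.13279",                 # wider spacing between the ones
    "0.371 1 1",                          # "1 1" somewhere in the middle
]

# For each row: (does the loose pattern match?, does the strict pattern match?)
results = [(bool(loose.search(r)), bool(strict.search(r))) for r in rows]
for row, flags in zip(rows, results):
    print(flags, repr(row))
```

The loose pattern fires on three of the four rows, including the "11 11" row, while the anchored pattern accepts only the rows that really begin with the two indices.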

If your data is all formatted exactly the same as this file, then you can use sed to just read the 3rd through the 35th line (rows 1 1 - 33 33). This is a lot easier than parsing the values, but does require that the files have a standard format:
sed -n 3,35p data.txt
Another cheap way would be to grep for only numeric lines, and take only the first 33:
grep "^[0-9 ][0-9 .-]*$" data.txt | head -n 33
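For comparison, the "numeric lines only, first 33" idea might look like this in Python. It is a sketch under the same assumption as the grep command, namely that the wanted rows are the first purely numeric lines in the file; the function name and the sample rows are illustrative.

```python
import re
from itertools import islice

# Same character classes as the grep pattern; fullmatch supplies the ^...$ anchors.
NUMERIC_LINE = re.compile(r"[0-9 ][0-9 .-]*")


def first_numeric_lines(lines, limit=33):
    """Return at most `limit` lines made up only of digits, blanks, dots and minus signs."""
    matching = (ln for ln in lines if NUMERIC_LINE.fullmatch(ln))
    return list(islice(matching, limit))


sample = [
    "I J FI(I,J) k(I,J) K(I,J)",
    "1 1 -337.13279 -0.06697 -0.00430",
    "2 2 3804.89120 8.52972 0.54787",
    "3 3 3195.69653 6.01702 0.38648",
    "33 2nd derivatives larger than 0.371D-04 over 561",
]

picked = first_numeric_lines(sample, limit=2)
print(picked)
```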

Related

Can't get the written file content in q?

I've copied the exact example from Q for Mortals, as follows:
q)h:hopen `:D:/q4m/raw
q)h[42]
548i
q)h 10 20 30
548i
q)hclose h
q)get `:D:/q4m/raw
'D:/q4m/raw
[0] get `:D:/q4m/raw
Looking in the directory, I can see the file was created there. Why can't I get it?
Instead, if I do:
q)h:hopen `:D:/q4m/L
q)h[42]
628i
q)h[10 20 30]
628i
q)hclose h
q)get `:D:/q4m/L
0 1 2 3 4 42 10 20 30
Now things work normally. Why?

After testing the given code, I believe your issue may be in how you initialise the file.
I assume in the code that works that you use some variation of
`:D:/q4m/L set til 5
before.
However this is not done for
`:D:/q4m/raw
If you were to use
`:D:/q4m/raw set til 5
or alternatively
.[`:D:/q4m/raw;();:;()]
beforehand then the first set of code will work.
Additionally, we can look at the binary of each file using
read1 `:D:/q4m/raw
and
read1 `:D:/q4m/L
If the output does not include 07 near the beginning, the file is not being recognised as a proper kdb list. That is, hopen simply appends to the binary file instead of amending it. (Notice that the 05 byte, which indicates the length of the list, doesn't increase when you add via the handle.)
For example, with the first method you get
q)read1 `:D:/q4m/raw
0x2a000000000000000a0000000000000014000000000000001e00000000000000
which doesn't really mean anything in q.
The second method gives
q)read1 `:D:/q4m/L
0xfe2007000000000005000000000000000000000000000000010000000000000002000000000..
which is a proper kdb list (notice the 07 which indicates type).
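The two dumps can be picked apart with a few lines of Python to make the difference concrete. This is a sketch based only on the bytes shown above: the raw file is read as bare little-endian 8-byte integers, and the positions of the type byte (offset 2) and the count (offset 8) in the second file are inferred from that dump, not taken from a format specification.

```python
import struct

# First dump: no header at all, just four 8-byte little-endian integers.
raw = bytes.fromhex(
    "2a00000000000000" "0a00000000000000"
    "1400000000000000" "1e00000000000000"
)
values = struct.unpack("<4q", raw)
print(values)  # (42, 10, 20, 30)

# Second dump: the first 16 bytes as shown (the rest is truncated in the post).
head = bytes.fromhex("fe20070000000000" "0500000000000000")
type_byte = head[2]                          # the 07 mentioned in the answer
(count,) = struct.unpack_from("<q", head, 8)  # the 05 length field
print(type_byte, count)  # 7 5
```

So the raw file really does hold 42 10 20 30, just without the header that get needs in order to treat it as a list.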
If you wish to instead just read in /q4m/raw then I suggest setting an empty list, hopen to that list and pass it `:D:/q4m/raw as follows
q)`:empty set 0#0
`:empty
q)h:hopen `:empty
q)h read1 `:D:/q4m/raw
3i
q)get `:empty
42 10 20 30
This will only work if all entries are the same type.

finding matching pattern, performing calculations and shifting the columns of file

I need help formatting the file below, with a calculation applied to rows that match a particular pattern:
hnt 1 454 454
gft 10 8844 8853
step 2 23 24
str 10 Check sum(00244-00240) 420 434
dert 03 14 16
ghh 33 Check sum(12366-12361) 8008 8046
I need a four-column file, produced by performing a subtraction on each row that contains the text "Check sum".
I want to remove the text "Check sum" and then subtract the numbers given in ( ). For example, (00244-00240) evaluates to 4, and this 4 is added to the column on its left, which has the value 10,
so that value becomes 14. After this calculation, the remaining values on that row shift left, giving a four-column table instead of a six-column one.
The desired output is
hnt 1 454 454
gft 10 8844 8853
step 2 23 24
str 14 420 434
dert 03 14 16
ghh 38 8008 8046
I am new to shell scripting and would appreciate help getting the desired output above using awk, sed, or both. I am also fine with a solution that uses other commands in a Unix shell script instead.

Perl can do the parsing and the addition:
perl -pe 's/(\d+)\s+Check sum\((\d+)([+-]\d+)\)/$1+$2+$3/e' file
hnt 1 454 454
gft 10 8844 8853
step 2 23 24
str 14 420 434
dert 03 14 16
ghh 38 8008 8046
If you want the output to look prettier, pipe the result through column -t
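For readers who would rather not use the /e modifier, the same substitution can be sketched in Python with a replacement function. The pattern is carried over from the Perl one-liner; the function name fold_checksum is made up for this example.

```python
import re

# Same pattern as the Perl one-liner: the left-hand number, then
# "Check sum(A-B)" (or A+B), with the sign captured as part of B.
PATTERN = re.compile(r"(\d+)\s+Check sum\((\d+)([+-]\d+)\)")


def fold_checksum(line):
    """Replace 'N Check sum(A-B)' with the single number N + A - B."""
    def repl(m):
        return str(int(m.group(1)) + int(m.group(2)) + int(m.group(3)))
    return PATTERN.sub(repl, line)


rows = [
    "hnt 1 454 454",
    "str 10 Check sum(00244-00240) 420 434",
    "dert 03 14 16",
    "ghh 33 Check sum(12366-12361) 8008 8046",
]
converted = [fold_checksum(r) for r in rows]
for r in converted:
    print(r)
```

Rows without "Check sum" pass through untouched, so leading zeros such as 03 are preserved.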

You can try this one liner:
sed "s/Check//;s/sum//;s/(//;s/)//" filename|awk '/-/{sub(/-/," ");calc=$3-$4;$2=$2+calc;$3="";$4=""}1'
Note:
This answer does not retain the original white-spaces for the rows having calculation performed

Perl print $(^b)

When I found Perl's $^O, I was curious whether there are more variables like this, because ^ reminded me of a regular expression. When I enter
print "$(^b)";
it comes up with some numbers:
1000 81 90 91 92 93 100 150 1000
What do these mean? Is this some kind of 0xdeadbeef?

I think you are just printing out the value of $(.
The real gid of this process. If you are on a machine that supports membership in multiple groups simultaneously, gives a space separated list of groups you are in. The first number is the one returned by getgid() , and the subsequent ones by getgroups() , one of which may be the same as the first number.
However, a value assigned to $( must be a single number used to set the real gid. So the value given by $( should not be assigned back to $( without being forced numeric, such as by adding zero. Note that this is different to the effective gid ($) ) which does take a list.
You can change both the real gid and the effective gid at the same time by using POSIX::setgid() . Changes to $( require a check to $! to detect any possible errors after an attempted change.
Here is the comparison:
diff <(perl -le 'print "$(";') <(perl -le 'print "$(^b)";')
1c1
< 20 20 402 12 33 61 79 80 81 98 100 204 401
---
> 20 20 402 12 33 61 79 80 81 98 100 204 401^b)
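As a rough cross-check from outside Perl, the same list can be assembled in Python on a Unix-like system from the real gid and the supplementary groups, assuming the behaviour quoted from perlvar above. The trailing "^b)" is appended only to mimic what the interpolated string prints; the variable names are illustrative.

```python
import os

real_gid = os.getgid()     # first number: what getgid() returns
groups = os.getgroups()    # the rest: what getgroups() returns

# Space-separated list in the same shape that perlvar describes for $(
dollar_paren = " ".join(str(g) for g in [real_gid, *groups])

# "$(^b)" interpolates $( and then prints the leftover characters literally.
line = dollar_paren + "^b)"
print(line)
```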

See the documentation on perldoc perlvar for a list of all the various built-in variables (along with their use English; equivalent names).

What do the numbers on a Git Diff header mean? [duplicate]

Every time I run git diff, for every change I made, I get some sort of header with numbers, for example:
@@ -169,14 +167,12 @@ function Browser(window, document, body, XHR, $log) {.....
I wonder what the four numbers mean? I guess -169 means that this particular line of code that follows was originally on line 169 but is now on line 167? And what do 14 and 12 mean?

This header introduces a set of changes, called a hunk. Each hunk starts with a line that contains, enclosed in @@, the starting line and number of lines (start,count) in the file before (with a -) and after (with a +) the changes. After that come the lines from the file. Lines starting with a - are deleted, lines starting with a + are added. Each changed region is surrounded by 3 lines of context before and after.
An addition looks like this:
@@ -75,6 +103,8 @@
foo
bar
baz
+line1
+line2
more context
and more
and still context
That means, in the original file before line 78 (= 75 + 3 lines of context) add two lines. These will be lines 106 (= 103 + 3 lines of context) through 107 after all changes.
Note the difference in from numbers (-75 vs +103), this means that there were other changes in this file before this particular hunk, that added 28 (103 - 75) lines of code.
A deletion looks like this:
@@ -75,7 +75,6 @@
foo
bar
baz
-line1
more context
and more
and still context
That means, delete line 78 (= 75 + 3 lines of context) in the original file. The unchanged context will be on lines 75 to 80 after all changes.
Note that from numbers in this hunk are equal (-75 and +75), this means that either there were no changes before this hunk, or amount of added and deleted lines in previous changes are the same.
Finally, a change looks like this:
@@ -70,7 +70,7 @@
foo
bar
baz
-red
+blue
more context
and more
still context
That means, change line 73 (= 70 + 3 lines of context) in the file before all changes, which contains red to blue. The changed line is also line 73 (= 70 + 3 lines of context) in the file after all changes.
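The arithmetic in these three examples can be reproduced with a short Python sketch that parses a hunk header. The regular expression and the function name parse_hunk_header are written for this illustration; real hunk headers are delimited by @@, and a count that is left out of a header stands for 1.

```python
import re

HUNK = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return (old_start, old_count, new_start, new_count) for a unified-diff hunk header."""
    m = HUNK.match(line)
    if m is None:
        raise ValueError("not a hunk header: %r" % line)
    old_start, old_count, new_start, new_count = m.groups()
    return (
        int(old_start),
        1 if old_count is None else int(old_count),   # omitted count means 1
        int(new_start),
        1 if new_count is None else int(new_count),
    )


old_start, old_count, new_start, new_count = parse_hunk_header("@@ -75,6 +103,8 @@")
print(new_count - old_count)   # net lines added by this hunk: 2
print(new_start - old_start)   # net lines added by earlier hunks: 28

browser = parse_hunk_header(
    "@@ -169,14 +167,12 @@ function Browser(window, document, body, XHR, $log) {"
)
print(browser)  # (169, 14, 167, 12)
```

For the header from the question, 14 old lines become 12 new lines, and the hunk starts two lines earlier than before because earlier changes removed a net two lines.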

I wonder what the four numbers mean?
Let's analyze a simple example.
The format is basically the same as the diff -u unified diff.
We start with numbers from 1 to 16 and remove 2, 3, 14 and 15:
diff -u <(seq 16) <(seq 16 | grep -Ev '^(2|3|14|15)$')
Output:
@@ -1,6 +1,4 @@
1
-2
-3
4
5
6
@@ -11,6 +9,4 @@
11
12
13
-14
-15
16
@@ -1,6 +1,4 @@ means:
-1,6 means that this piece of the first file starts at line 1 and shows a total of 6 lines. Therefore it shows lines 1 to 6.
1
2
3
4
5
6
- means "old", as we usually invoke it as diff -u old new.
+1,4 means that this piece of the second file starts at line 1 and shows a total of 4 lines. Therefore it shows lines 1 to 4.
+ means "new".
We only have 4 lines instead of 6 because 2 lines were removed! The new hunk is just:
1
4
5
6
@@ -11,6 +9,4 @@ for the second hunk is analogous:
on the old file, we have 6 lines, starting at line 11 of the old file:
11
12
13
14
15
16
on the new file, we have 4 lines, starting at line 9 of the new file:
11
12
13
16
Note that line 11 is the 9th line of the new file because we have already removed 2 lines on the previous hunk: 2 and 3.
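The same experiment can be repeated without a shell using Python's difflib, which also emits unified diffs with 3 lines of context by default. This is just a way to reproduce the two headers, with the "old" and "new" labels chosen arbitrarily.

```python
import difflib

old = [str(n) for n in range(1, 17)]                         # like seq 16
new = [s for s in old if s not in {"2", "3", "14", "15"}]    # like the grep -Ev filter

diff = list(difflib.unified_diff(old, new, fromfile="old", tofile="new", lineterm=""))
headers = [line for line in diff if line.startswith("@@")]

for line in diff:
    print(line)
```

The two hunk headers it produces are @@ -1,6 +1,4 @@ and @@ -11,6 +9,4 @@, matching the diff -u output above.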

Summary:
Assume git diff will output [0-3] lines of context [before/after] [first/last] changes
@@ -[original file's number of first line displayed],[context lines + removed lines] +[changed file's number of first line displayed],[context lines + added lines] @@

understanding P4 describe / diff summary (-ds) option

Greetings P4 folks,
I am trying to understand the P4 describe -ds output. I am assuming that this is the same as the p4 diff -ds output.
Here is an example of the "Differences ..." block:
==== //depot/Groups/mygroup/trunk/main/FooBar.java#5 (text) ====
add 7 chunks 13 lines
deleted 1 chunks 1 lines
changed 16 chunks 92 / 118 lines
~
The add and deleted lines are clear to me, but why are there two numbers for the changed lines (92 / 118)?
Thanks!
- JsD

They are the number of lines before and after the change, within the changed chunks. In other words, 92 lines were changed across 16 locations to become 118 lines of code.
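To make the before/after pair concrete, here is a small Python sketch that produces the same kind of summary from two versions of a file using difflib. It is not Perforce's algorithm, and a different diff engine may split the chunks differently; the function name diff_summary and the sample lines are invented for the illustration.

```python
import difflib


def diff_summary(old, new):
    """Count added, deleted and changed chunks and lines, in the spirit of -ds output."""
    summary = {"add": [0, 0], "deleted": [0, 0], "changed": [0, 0, 0]}
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            summary["add"][0] += 1
            summary["add"][1] += j2 - j1
        elif tag == "delete":
            summary["deleted"][0] += 1
            summary["deleted"][1] += i2 - i1
        elif tag == "replace":
            summary["changed"][0] += 1
            summary["changed"][1] += i2 - i1   # lines before the change
            summary["changed"][2] += j2 - j1   # lines after the change
    return summary


old = ["a", "b", "c", "d", "e"]
new = ["a", "B1", "B2", "c", "e", "f"]
summary = diff_summary(old, new)

print("add %d chunks %d lines" % tuple(summary["add"]))
print("deleted %d chunks %d lines" % tuple(summary["deleted"]))
print("changed %d chunks %d / %d lines" % tuple(summary["changed"]))
```

Here the single changed chunk replaces one old line (b) with two new lines (B1, B2), so it is reported as 1 / 2, in the same way that 92 old lines became 118 new lines in the question.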