What do the numbers on a Git Diff header mean? [duplicate] - diff

Every time I run git diff, for every change I made, I get some sort of header with numbers, for example:
@@ -169,14 +167,12 @@ function Browser(window, document, body, XHR, $log) {.....
What do the four numbers mean? I guess -169 means that the line of code that follows was originally on line 169 but is now on line 167? And what do 14 and 12 mean?

This header starts a hunk, that is, one set of changes. Each hunk begins with a line that contains, enclosed in @@, the starting line and the number of lines (start,count) in the file before the changes (prefixed with -) and after them (prefixed with +). After that come the lines from the file. Lines starting with - are deleted, lines starting with + are added. By default, each changed line is surrounded by 3 lines of context before and after.
An addition looks like this:
@@ -75,6 +103,8 @@
foo
bar
baz
+line1
+line2
more context
and more
and still context
That means, in the original file before line 78 (= 75 + 3 lines of context) add two lines. These will be lines 106 (= 103 + 3 lines of context) through 107 after all changes.
Note the difference in the from numbers (-75 vs +103). It means that earlier changes in this file, before this particular hunk, added a net 28 (103 - 75) lines.
A deletion looks like this:
@@ -75,7 +75,6 @@
foo
bar
baz
-line1
more context
and more
and still context
That means, delete line 78 (= 75 + 3 lines of context) in the original file. The unchanged context will be on lines 75 to 80 after all changes.
Note that the from numbers in this hunk are equal (-75 and +75). This means that either there were no changes before this hunk, or the earlier changes added and deleted the same number of lines.
Finally, a change looks like this:
@@ -70,7 +70,7 @@
foo
bar
baz
-red
+blue
more context
and more
still context
That means: change line 73 (= 70 + 3 lines of context) of the original file from red to blue. The changed line is also line 73 (= 70 + 3 lines of context) in the file after all changes.
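The arithmetic in the three examples above can be sketched in Python. This is a minimal illustration, not part of Git: the names `HUNK_RE` and `parse_hunk_header` are made up here, and the pattern assumes the standard unified-diff header layout, where an omitted count means 1.

```python
import re

# Matches a unified-diff hunk header such as "@@ -75,6 +103,8 @@ optional text".
# Each count is optional; when it is missing, the count is 1.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return (old_start, old_count, new_start, new_count) for a hunk header."""
    m = HUNK_RE.match(line)
    if m is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count = m.groups()
    # A missing count (None) defaults to 1; an explicit "0" stays 0.
    return (int(old_start), int(old_count or 1),
            int(new_start), int(new_count or 1))


old_start, old_count, new_start, new_count = parse_hunk_header("@@ -75,6 +103,8 @@")
# Last line covered by the hunk on each side of the diff.
old_end = old_start + old_count - 1
new_end = new_start + new_count - 1
print(f"old file: lines {old_start}-{old_end}, new file: lines {new_start}-{new_end}")
```

For the addition example this reports lines 75-80 of the old file and lines 103-110 of the new file, which is where the 6 and the 8 in the header come from.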

I wonder what does the four numbers mean?
Let's analyze a simple example
The format is basically the same as the diff -u unified diff.
We start with numbers from 1 to 16 and remove 2, 3, 14 and 15:
diff -u <(seq 16) <(seq 16 | grep -Ev '^(2|3|14|15)$')
Output:
@@ -1,6 +1,4 @@
1
-2
-3
4
5
6
@@ -11,6 +9,4 @@
11
12
13
-14
-15
16
@@ -1,6 +1,4 @@ means:
-1,6 means that this piece of the first file starts at line 1 and shows a total of 6 lines. Therefore it shows lines 1 to 6.
1
2
3
4
5
6
- means "old", as we usually invoke it as diff -u old new.
+1,4 means that this piece of the second file starts at line 1 and shows a total of 4 lines. Therefore it shows lines 1 to 4.
+ means "new".
We only have 4 lines instead of 6 because 2 lines were removed! The new hunk is just:
1
4
5
6
@@ -11,6 +9,4 @@ for the second hunk is analogous:
on the old file, we have 6 lines, starting at line 11 of the old file:
11
12
13
14
15
16
on the new file, we have 4 lines, starting at line 9 of the new file:
11
12
13
16
Note that line 11 is the 9th line of the new file because we have already removed 2 lines on the previous hunk: 2 and 3.
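The same experiment can be reproduced without a shell. The sketch below uses Python's standard difflib module instead of diff -u; the variable names are arbitrary, and difflib is a separate implementation, so it is only expected to agree with diff on simple inputs like this one.

```python
import difflib

# Numbers 1..16 as the "old" file; drop 2, 3, 14 and 15 for the "new" file.
old = [str(n) for n in range(1, 17)]
new = [s for s in old if s not in {"2", "3", "14", "15"}]

# n=3 asks for three lines of context, matching the default of diff -u.
diff = list(difflib.unified_diff(old, new, fromfile="old", tofile="new",
                                 n=3, lineterm=""))

# Keep only the hunk headers.
headers = [line for line in diff if line.startswith("@@")]
print("\n".join(headers))
```

It prints the two hunk headers discussed above, @@ -1,6 +1,4 @@ and @@ -11,6 +9,4 @@.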

Summary:
By default, git diff shows up to 3 lines of context before the first and after the last change in a hunk (fewer near the start or end of the file).
@@ -[original file's number of first line displayed],[context lines + removed lines] +[changed file's number of first line displayed],[context lines + added lines] @@
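As a rough check of that summary, the two counts can be recomputed from the body of a hunk. This is a sketch with a made-up helper name, `count_hunk_lines`; it assumes the usual unified-diff prefixes (a space for context, - for removed, + for added).

```python
def count_hunk_lines(body_lines):
    """Return (old_count, new_count) for the body lines of one hunk."""
    old_count = new_count = 0
    for line in body_lines:
        if line.startswith("\\"):
            # Marker such as "\ No newline at end of file": not a file line.
            continue
        if line.startswith("-"):
            old_count += 1          # exists only in the old file
        elif line.startswith("+"):
            new_count += 1          # exists only in the new file
        else:
            old_count += 1          # context line: present in both files
            new_count += 1
    return old_count, new_count


# Body of a hunk that adds two lines between three lines of context on each side.
hunk = [" foo", " bar", " baz", "+line1", "+line2",
        " more context", " and more", " and still context"]
print(count_hunk_lines(hunk))
```

For this body the counts are 6 and 8: six context lines on the old side, and six context lines plus two added lines on the new side.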

finding matching pattern, performing calculations and shifting the columns of file

I need help reformatting the file below, with a calculation applied to the rows that match a certain pattern:
hnt 1 454 454
gft 10 8844 8853
step 2 23 24
str 10 Check sum(00244-00240) 420 434
dert 03 14 16
ghh 33 Check sum(12366-12361) 8008 8046
I need a four-column file, produced by performing a subtraction on the rows that contain the text "Check sum".
I want to remove the text "Check sum" and then subtract the numbers given in ( ). For example, (00244-00240) gives '4', and this '4' is added to the column on its left, which has the value '10'.
So that value becomes '14'. After this calculation, the remaining values on that row shift left, giving a four-column table instead of a six-column one.
The desired output is
hnt 1 454 454
gft 10 8844 8853
step 2 23 24
str 14 420 434
dert 03 14 16
ghh 38 8008 8046
I am new to shell scripting and would appreciate your help getting the desired output above using awk or sed or both. I am also OK with a solution that uses other commands in a Unix shell script instead of awk and sed.
Perl can do the parsing and the addition:
perl -pe 's/(\d+)\s+Check sum\((\d+)([+-]\d+)\)/$1+$2+$3/e' file
hnt 1 454 454
gft 10 8844 8853
step 2 23 24
str 14 420 434
dert 03 14 16
ghh 38 8008 8046
If you want the output to look prettier, pipe the result through column -t
You can try this one liner:
sed "s/Check//;s/sum//;s/(//;s/)//" filename | awk '/-/{sub(/-/," "); $2=$2+$3-$4; $3=""; $4=""}1'
Note:
This answer does not retain the original white-spaces for the rows having calculation performed
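For comparison, here is the same transformation sketched in Python. The function name `fold_check_sum` and the exact pattern are illustrative assumptions based only on the sample rows in the question (a number, the literal text "Check sum", then two numbers in parentheses separated by a minus sign).

```python
import re

# A number, whitespace, then "Check sum(A-B)".
PATTERN = re.compile(r"(\d+)\s+Check sum\((\d+)-(\d+)\)")


def fold_check_sum(line):
    """Replace 'N Check sum(A-B)' with the single value N + (A - B)."""
    def repl(match):
        base, a, b = (int(g) for g in match.groups())
        return str(base + a - b)
    return PATTERN.sub(repl, line)


rows = [
    "gft 10 8844 8853",
    "str 10 Check sum(00244-00240) 420 434",
    "dert 03 14 16",
    "ghh 33 Check sum(12366-12361) 8008 8046",
]
for row in rows:
    print(fold_check_sum(row))
```

Rows without "Check sum" pass through unchanged, so a value such as 03 keeps its leading zero.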

Using sed to copy data between two numerical patterns to a new file

I'm running a bunch (~320) computational chemistry experiments and I need to pull a small amount of the data out of each of the files so that I can do some work on it in MatLab.
I'm pretty sure I can use sed to make this work, but try as I might I don't seem to be able to do so.
I need all of the data starting at the line beginning with "1 1" and ending with the line starting with "33 33".
I J FI(I,J) k(I,J) K(I,J)
1 1 -337.13279 -0.06697 -0.00430
2 2 3804.89120 8.52972 0.54787
3 3 3195.69653 6.01702 0.38648
4 4 3189.18684 5.99253 0.38490
5 5 3183.73262 5.97205 0.38359
6 6 3174.47525 5.93737 0.38136
7 7 3167.88746 5.91275 0.37978
8 8 1628.80868 1.56311 0.10040
9 9 1623.56055 1.55306 0.09975
10 10 1518.21620 1.35806 0.08723
11 11 1476.93012 1.28520 0.08255
12 12 1341.24087 1.05990 0.06808
13 13 1312.30373 1.01466 0.06517
14 14 1264.73004 0.94242 0.06053
15 15 1185.62592 0.82822 0.05320
16 16 1175.54013 0.81419 0.05230
17 17 1170.41211 0.80710 0.05184
18 18 1090.20196 0.70027 0.04498
19 19 1039.29190 0.63639 0.04088
20 20 1015.00116 0.60699 0.03899
21 21 1005.05773 0.59516 0.03823
22 22 986.55965 0.57345 0.03683
23 23 917.65537 0.49615 0.03187
24 24 842.93089 0.41863 0.02689
25 25 819.00146 0.39520 0.02538
26 26 758.39720 0.33888 0.02177
27 27 697.11173 0.28632 0.01839
28 28 628.75684 0.23292 0.01496
29 29 534.75856 0.16849 0.01082
30 30 499.35579 0.14692 0.00944
31 31 422.01320 0.10493 0.00674
32 32 409.30255 0.09870 0.00634
33 33 227.12411 0.03039 0.00195
33 2nd derivatives larger than 0.371D-04 over 561
MatLab is not a fan of text, so I'd like to not use text delimiters (though there are some in the header of this data section) and keep the data contained to only the numeric lines.
The data files contain a lot of other numbers as well, so I need to match the occurrence of "1 1" at the start of the line and "33 33" as the end of the copy. These 'indices' exist only in this block of info.
I attempted to use
% sed -n /"1 1"/,/"33 33"/p input.file > output.file
But I get a WHOLE BUNCH of data in the output file as it copies everything that shows up between any "1" and "33"
Is there any way to do what I'm looking for?
Also, I'm using the tcsh as that is what my servers run.
How about using awk
awk '$1=="1"&&$2=="1"{t=1};t;$1=="33"&&$2=="33"{t=0}' file
As recommended by @mklement0, if there is only one block you can avoid processing the remainder of the file by changing the command to:
awk '$1=="1"&&$2=="1"{t=1};t;$1=="33"&&$2=="33"{exit}' file
Your problem is twofold. First, there are two blanks between the ones, but your regex only allows for one (judging from the now indented code). Second, you are probably not precise enough; the /1 1/ pattern matches 11 11, for example, and 111 111 and so on.
So, you should consider:
sed -n -e '/^ *1  *1 /,/^33  *33 /p' -e '/^33  *33 /q' input.file > output.file
The patterns are anchored to the start of line by the ^ (caret). The numbers are separated by one or more blanks (there are other, longer-winded ways of writing that in standard sed; the + option is not standard sed but is widely available). And the numbers are terminated by a blank. The chances are that the first expression alone will give you what you want. The second expression terminates the search early when it recognizes the 33 33 input line, which can save a significant amount of file I/O and hence processing time if the input file is big enough.
If the lines with ID numbers in the hundreds have some different format, then it should be fairly straight-forward to tweak the regexes to match what is used. If the data contains tabs instead of (or as well as) blanks, you can tweak the regexes to manage that, too.
If your data is all formatted exactly the same as this file, then you can use sed to just read the 3rd through the 35th line (rows 1 1 - 33 33). This is a lot easier than parsing the values, but does require that the files have a standard format:
sed -n 3,35p data.txt
Another cheap way would be to grep for only numeric lines, and take only the first 33:
grep "^[0-9 ][0-9 .-]*$" data.txt | head -n 33
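The field-based matching used in the awk answer can also be sketched in Python. The function name `extract_block` and its defaults are assumptions made for illustration; comparing whole whitespace-separated fields is what keeps a row such as "11 11" from being mistaken for "1 1".

```python
def extract_block(lines, first=("1", "1"), last=("33", "33")):
    """Return the lines from the row starting with `first` to the row starting with `last`."""
    out = []
    inside = False
    for line in lines:
        key = tuple(line.split()[:2])   # first two fields of the row
        if not inside and key == first:
            inside = True
        if inside:
            out.append(line)
            if key == last:
                break                   # stop reading once the block is complete
    return out


sample = [
    "11 11 0.5 unrelated row",
    "I J FI(I,J) k(I,J) K(I,J)",
    "1 1 -337.13279 -0.06697 -0.00430",
    "2 2 3804.89120 8.52972 0.54787",
    "33 33 227.12411 0.03039 0.00195",
    "33 2nd derivatives larger than 0.371D-04 over 561",
]
for row in extract_block(sample):
    print(row)
```

With a real file, the list could be replaced by an open file object, since the function only iterates over lines.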

output format of cvs diff

I modified line 494 of a certain file and used cvs diff -u4 to see what I had changed. cvs outputs something like:
@@ -490,9 +490,9 @@
if (!(hPtr->hStatus & (HOST_STAT_UNAVAIL | HOST_STAT_UNLICENSED |
HOST_STAT_UNREACH))){
printf(" %s:\n",
_i18n_msg_get(ls_catd,NL_SETN,1612, "CURRENT LOAD USED FOR SCHEDULING")); /* catgets 1612 */
- prtLoad(hPtr, lsInfo);
+ prtLoad(hPtr, lsInfo,bhostParams);
if (lsbSharedResConfigured_) {
/* there are share resources */
retVal = makeShareFields(hPtr->host, lsInfo, &nameTable,
I don't understand what the first line, "@@ -490,9 +490,9 @@", means. I did modify line 494, so why does CVS write 490 instead? Could anyone tell me what "@@ -490,9 +490,9 @@" means?
The "u" gives you a unified diff and the "4" give you 4 lines of context on either side. From the WP entry I just linked:
The format of the range information line is as follows:
@@ -l,s +l,s @@
The hunk range information contains two hunk ranges. The range for the
hunk of the original file is preceded by a minus symbol, and the range
for the new file is preceded by a plus symbol. Each hunk range is of
the format l,s where l is the starting line number and s is the number
of lines the change hunk applies to for each respective file.
So basically the number isn't the line that was changed. It's the start of the range being displayed in that hunk. Using your example, the hunk starts at line 490 and 9 lines were in the range. The reason the range covers 9 lines is because of the one line you changed and the four lines of context on either side.
Note that your example seems to have some newlines stripped. I would recommend you fix it so it is clear for other people.
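The 490 and the 9 can be derived with a little arithmetic. The sketch below uses a hypothetical helper, `hunk_range`, and only covers the simple case of one isolated, in-place change; the file length of 1000 lines is an assumption, since the question does not state it.

```python
def hunk_range(first_changed, last_changed, context, total_lines):
    """Return (start, length) of the hunk around one in-place change."""
    # Context cannot extend before line 1 or past the end of the file.
    start = max(1, first_changed - context)
    end = min(total_lines, last_changed + context)
    return start, end - start + 1


# One changed line (494) with 4 lines of context, as in cvs diff -u4.
start, length = hunk_range(494, 494, 4, total_lines=1000)
print(f"@@ -{start},{length} +{start},{length} @@")
```

The start is 494 - 4 = 490 and the length is 4 + 1 + 4 = 9, which matches the header in the question.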

How to extract certain columns from a big Notepad text file?

I have a big text file and the data in it are in 5 columns, but I need just the first and the last column of that.
It will take many days and probably with mistake if I want to enter the data of this two column one-by-one from here to another file.
Is there a fast way to do this?
For example:
1 1.0000000000000000 0.0000000000 S {0}
2 1.5000000000000000 0.3010299957 C {2}
3 1.7500000000000000 0.6020599913 S {0,2}
4 2.0000000000000000 0.7781512504 C {3}
5 2.3333333333333333 1.0791812460 C {3,2}
6 2.5000000000000000 1.3802112417 S {3,0,2}
7 2.5277777777777778 1.5563025008 S {0,3}
8 2.5833333333333333 1.6812412374 S {3,0,0,2}
9 2.8000000000000000 1.7781512504 C {5,2}
10 3.0000000000000000 2.0791812460 C {5,0,2}
I need the first column (numbering) and the last inside { }.
ALT + Left Mouse Click puts you in Column Mode Select. It's quite a useful shortcut that may help you.
In Notepad++, you can do the replacement with a regular expression.
Find what:
^( +\d+).+\{([\d,]+)\}$
Replace with:
\1 \2
This changes:
1 1.0000000000000000 0.0000000000 S {0}
2 1.5000000000000000 0.3010299957 C {2}
3 1.7500000000000000 0.6020599913 S {0,2}
4 2.0000000000000000 0.7781512504 C {3}
5 2.3333333333333333 1.0791812460 C {3,2}
6 2.5000000000000000 1.3802112417 S {3,0,2}
7 2.5277777777777778 1.5563025008 S {0,3}
8 2.5833333333333333 1.6812412374 S {3,0,0,2}
9 2.8000000000000000 1.7781512504 C {5,2}
10 3.0000000000000000 2.0791812460 C {5,0,2}
to:
1 0
2 2
3 0,2
4 3
5 3,2
6 3,0,2
7 0,3
8 3,0,0,2
9 5,2
10 5,0,2
If you do not want the leading space, move it outside the first group:
^ +(\d+).+\{([\d,]+)\}$
\1 \2
which changes the text to:
1 0
2 2
3 0,2
4 3
5 3,2
6 3,0,2
7 0,3
8 3,0,0,2
9 5,2
10 5,0,2
You should use awk or gawk, which is also available on Windows. Use gawk "{print $1,$5}" inpfile > outfile. I copied your file and named it 'one'. The output below consists of the 1st and 5th columns of your file.
>gawk "{print $1, $5}" one
1 {0}
2 {2}
3 {0,2}
4 {3}
5 {3,2}
6 {3,0,2}
7 {0,3}
8 {3,0,0,2}
9 {5,2}
10 {5,0,2}
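If neither an editor nor gawk is at hand, the same extraction can be sketched in Python. The names `ROW` and `first_and_braces` are illustrative, and the pattern assumes every data row looks like the sample: a leading number, and a final field in braces holding only digits and commas.

```python
import re

# Leading number, anything in between, then the final {...} field.
ROW = re.compile(r"^\s*(\d+)\s.*\{([\d,]*)\}\s*$")


def first_and_braces(line):
    """Return (first column, text inside the braces), or None if the row does not match."""
    m = ROW.match(line)
    return (m.group(1), m.group(2)) if m else None


rows = [
    "1 1.0000000000000000 0.0000000000 S {0}",
    "3 1.7500000000000000 0.6020599913 S {0,2}",
    "10 3.0000000000000000 2.0791812460 C {5,0,2}",
]
for row in rows:
    number, items = first_and_braces(row)
    print(number, items)
```

Returning None for non-matching rows lets the caller decide whether to skip or report lines that do not follow the expected layout.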
You can import it into Excel and manipulate it there.
If you are using .NET, FileHelpers may save you a lot of time. From your post we can't tell what technology you are hoping to use to accomplish this.
Ultraedit has a tool for selecting columns and opens large files (I tried a 900 Mb file on a 2008 desktop and it opened in 3 minutes). I think it has a demo version fully operational.
Excel could work if you do not have too many rows.
Cheers,
One more way is to copy the data to MS word file.
Then use
{Alt + left mouse click}
Then you can drag on the selected column and you can see only a single column is selected.
Copy and paste wherever you want.
There is only one way to convolve ungodly amounts of data. That is with the command prompt.
$ cat text.txt | sed 's/{.*,//;s/  */ /g;s/[{}]//g' | awk '{print $1","$5}' > clean_text.csv
This 15 second fix is not available in Windows OS. It will take you less time to download and install Linux on that old dead computer in your closet than it will to get your data in and out of Excel.
Happy coding!

understanding P4 describe / diff summary (-ds) option

Greetings P4 folks,
I am trying to understand the p4 describe -ds output. I assume it is the same as the p4 diff -ds output.
Here is an example of the "Differences ..." block:
==== //depot/Groups/mygroup/trunk/main/FooBar.java#5 (text) ====
add 7 chunks 13 lines
deleted 1 chunks 1 lines
changed 16 chunks 92 / 118 lines
~
I understand the add and deleted lines, but why are there two numbers for the changed lines (92 / 118)?
Thanks!
- JsD
They are the number of lines before and after the change, within the changed chunks. In other words, 92 lines were changed across 16 locations to become 118 lines of code.
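To make the before/after reading concrete, the net change in file length can be worked out from the three summary lines. This is a sketch only: `net_line_change` is a made-up helper, and the pattern assumes the summary looks exactly like the example in the question.

```python
import re

# "add 7 chunks 13 lines" or "changed 16 chunks 92 / 118 lines"
LINE_RE = re.compile(r"\s*(add|deleted|changed) \d+ chunks (\d+)(?: / (\d+))? lines")


def net_line_change(summary):
    """Return how many lines the file grew (or shrank, if negative)."""
    net = 0
    for line in summary.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue                      # ignore anything that is not a summary line
        kind, first, second = m.groups()
        if kind == "add":
            net += int(first)
        elif kind == "deleted":
            net -= int(first)
        elif second is not None:          # changed: "before / after"
            net += int(second) - int(first)
    return net


summary = """add 7 chunks 13 lines
deleted 1 chunks 1 lines
changed 16 chunks 92 / 118 lines"""
print(net_line_change(summary))
```

For the example this gives 13 - 1 + (118 - 92) = 38, so the new revision is 38 lines longer than the old one.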