understanding P4 describe / diff summary (-ds) option - diff

Greetings P4 folks,
I am trying to understand the P4 describe -ds output. I am assuming that this is the same as the p4 diff -ds output.
Here is an example of the "Differences ..." block:
==== //depot/Groups/mygroup/trunk/main/FooBar.java#5 (text) ====
add 7 chunks 13 lines
deleted 1 chunks 1 lines
changed 16 chunks 92 / 118 lines
~
Now I understand that add and deleted lines are clear but why are there two numbers for the changed lines (92 / 118).
Thanks!
- JsD

They are the number of lines before and after the change, within the changed chunks. In other words, 92 lines were changed across 16 locations to become 118 lines of code.

Related

Why is no output files written in prinseqlite perl loop?

I am completely new to this type of coding/command lines, so I am sorry if I am asking this question in a wrong way.
I want to loop over all files in a directory (I am quality trimming DNA sequencing files (.fastq format))
I have written this loop:
for i in *.fastq; do
perl /apps/prinseqlite/0.20.4/prinseq-lite.pl -fastq $i -min_len 220 -max_len 240 -min_qual_mean 30 -ns_max_n 5 -trim_tail_right 15 -trim_tail_left 15 -out_good /proj/forhot/qfiltered/looptest/$i_filtered.fastq -out_bad null; done
The code itself seems to work, I can see in my terminal that it is taking the right files and it is doing the trimming (it is writing a summary log in the terminal as it goes), but no output files are generated - i.e these ones:
-out_good /proj/forhot/qfiltered/looptest/$i_filtered.fastq
If I run the code in a non-loop way, just on one file it works (= the output is generated). link this example:
prinseq-lite.pl -fastq 60782_merged_rRNA.fastq -min_len 220 -max_len 240 -min_qual_mean 30 -ns_max_n 5 -trim_tail_right 15 -trim_tail_left 15 -out_good 60782_merged_rRNA_filt_codeTEST.fastq -out_bad null
Is there a simple reason/answer to this?
This problem has nothing to do with Perl at all.
/proj/forhot/qfiltered/looptest/$i_filtered.fastq is read by the shell as interpolating the contents of i_filtered. There is no such shell variable, so this argument turns into /proj/forhot/qfiltered/looptest/.fastq ($i_filtered turns into nothing).
Therefore all of your prinseq-lite.pl executions place their output in the same file, which (because its name starts with a .) is "hidden": You need to use ls -a to see it, not just ls.
Fix
... -out_good /proj/forhot/qfiltered/looptest/${i}_filtered.fastq
Note that this would give you e.g. 60782_merged_rRNA.fastq_filtered.fastq for an input file of 60782_merged_rRNA.fastq. If you want to get rid of the duplicate .fastq part, you need something like:
... -out_good /proj/forhot/qfiltered/looptest/"${i%.fastq}"_filtered.fastq

Diff command - avoiding monolithic grouping of consecutive differing lines

Playing around with the standard linux diff command, I could not find a way to avoid the following type of grouping in its output (the output listings here assume the unified format)
This question aims at the case that each line differs by little from its counterpart in the other file, and it's more useful to see each line next to its counterpart.
I would like instead of having groups like this show up in the comparison output:
- line 1
- line 2
- line 3
+ line 1 modified
+ line 2 modified
+ line 3 modified
To get this:
- line 1
+ line 1 modified
- line 2
+ line 2 modified
- line 3
+ line 3 modified
Of course, this is a convenience question as this can be accomplished by writing your own code to post-process the diff output, or diverging from the lcs algorithm with your own algorithm. I don't think variants like wdiff etc. would help much, as the plain diff -U0 output format fits my needs very well except for this grouping property, whereas wdiff introduces other aspects that are not optimal for my case.
I'm looking for a command-line way, or a library that can be used in code, not a UI tool.
I was trying to solve this myself. The closest I go was this:
diff -y -W 10000 file1 file2 | grep '|' | sed 's/\s*|\s*/\n/g'
The one issue is that this assumes there are no "white space" difference at the beginning of the lines (or that you don't care about it).

Using sed to copy data between two numerical patterns to a new file

I'm running a bunch (~320) computational chemistry experiments and I need to pull a small amount of the data out of each of the files so that I can do some work on it in MatLab.
I'm pretty sure I can use sed to make this work, but try as I might I don't seem to be able to do so.
I need all of the data starting at the line beginning with "1 1" and ending with the line starting with "33 33".
I J FI(I,J) k(I,J) K(I,J)
1 1 -337.13279 -0.06697 -0.00430
2 2 3804.89120 8.52972 0.54787
3 3 3195.69653 6.01702 0.38648
4 4 3189.18684 5.99253 0.38490
5 5 3183.73262 5.97205 0.38359
6 6 3174.47525 5.93737 0.38136
7 7 3167.88746 5.91275 0.37978
8 8 1628.80868 1.56311 0.10040
9 9 1623.56055 1.55306 0.09975
10 10 1518.21620 1.35806 0.08723
11 11 1476.93012 1.28520 0.08255
12 12 1341.24087 1.05990 0.06808
13 13 1312.30373 1.01466 0.06517
14 14 1264.73004 0.94242 0.06053
15 15 1185.62592 0.82822 0.05320
16 16 1175.54013 0.81419 0.05230
17 17 1170.41211 0.80710 0.05184
18 18 1090.20196 0.70027 0.04498
19 19 1039.29190 0.63639 0.04088
20 20 1015.00116 0.60699 0.03899
21 21 1005.05773 0.59516 0.03823
22 22 986.55965 0.57345 0.03683
23 23 917.65537 0.49615 0.03187
24 24 842.93089 0.41863 0.02689
25 25 819.00146 0.39520 0.02538
26 26 758.39720 0.33888 0.02177
27 27 697.11173 0.28632 0.01839
28 28 628.75684 0.23292 0.01496
29 29 534.75856 0.16849 0.01082
30 30 499.35579 0.14692 0.00944
31 31 422.01320 0.10493 0.00674
32 32 409.30255 0.09870 0.00634
33 33 227.12411 0.03039 0.00195
33 2nd derivatives larger than 0.371D-04 over 561
MatLab is not a fan of text, so I'd like to not use text delimiters (though there are some in the header of this data section) and keep the data contained to only the numeric lines.
The data files contain a lot of other numbers as well, so I need to match the occurrence of "1 1" at the start of the line and "33 33" as the end of the copy. These 'indices' exist only in this block of info.
I attempted to use
% sed -n /"1 1"/,/"33 33"/p input.file > output.file
But I get a WHOLE BUNCH of data in the output file as it copies everything that shows up between any "1" and "33"
Is there any way to do what I'm looking for?
Also, I'm using the tcsh as that is what my servers run.
How about using awk
awk '$1=="1"&&$2=="1"{t=1};t;$1=="33"&&$2=="33"{t=0}' file
Recommand by #mklement0, if there is only one block, to avoid processing the remainder of the file you can update the command to:
awk '$1=="1"&&$2=="1"{t=1};t;$1=="33"&&$2=="33"{exit}' file
Your problem is twofold. First, there are two blanks between the ones, but your regex only allows for one (judging from the now indented code). Second, you are probably not precise enough; the /1 1/ pattern matches 11 11, for example, and 111 111 and so on.
So, you should consider:
sed -n -e '/^ *1 *1 /,/^33 *33 /p' -e '/^33 33 /q' input.file > output.file
The patterns are anchored to the start of line by the ^ (caret). The numbers are separated by one or more blanks (there are other, longer-winded ways of writing that in standard sed; the + option is not standard sed but is widely available). And the numbers are terminated by a blank. The chances are that the first expression alone will give you what you want. The second expression terminates the search early when it recognizes the 33 33 input line, which can save a significant amount of file I/O and hence processing time if the input file is big enough.
If the lines with ID numbers in the hundreds have some different format, then it should be fairly straight-forward to tweak the regexes to match what is used. If the data contains tabs instead of (or as well as) blanks, you can tweak the regexes to manage that, too.
If you data is all formatted exactly the same as this file, then you can use sed to just read the 3rd through the 35th line (rows 1 1 - 33 33). This is a lot easier than parsing the values, but does require that the files have a standard format:
sed -n 3,35p data.txt
Another cheap way would be to grep for only numeric lines, and take only the first 33:
grep "^[0-9 ][0-9 .-]*$" data.txt | head -n 33

output format of cvs diff

I modified line 494 of a certain file, and use cvs diff -u4 to see what I have modified, cvs outputs something like :
## -490,9 +490,9 ##
if (!(hPtr->hStatus & (HOST_STAT_UNAVAIL | HOST_STAT_UNLICENSED |
HOST_STAT_UNREACH))){
printf(" %s:\n",
_i18n_msg_get(ls_catd,NL_SETN,1612, "CURRENT LOAD USED FOR SCHEDULING")); /* catgets 1612 */
- prtLoad(hPtr, lsInfo);
+ prtLoad(hPtr, lsInfo,bhostParams);
if (lsbSharedResConfigured_) {
/* there are share resources */
retVal = makeShareFields(hPtr->host, lsInfo, &nameTable,
I didn't understand what the first line "## -490,9 +490,9 ##" mean, I did modify line 494, but why CVS writes 490 instead? Could anyone tell me what does "## -490,9 +490,9 ##" mean?
The "u" gives you a unified diff and the "4" give you 4 lines of context on either side. From the WP entry I just linked:
The format of the range information line is as follows:
## -l,s +l,s ##
The hunk range information contains two hunk ranges. The range for the
hunk of the original file is preceded by a minus symbol, and the range
for the new file is preceded by a plus symbol. Each hunk range is of
the format l,s where l is the starting line number and s is the number
of lines the change hunk applies to for each respective file.
So basically the number isn't the line that was changed. It's the start of the range being displayed in that hunk. Using your example, the hunk starts at line 490 and 9 lines were in the range. The reason the range covers 9 lines is because of the one line you changed and the four lines of context on either side.
Note that your example seems to have some newlines stripped. I would recommend you fix it so it is clear for other people.

What do the numbers on a Git Diff header mean? [duplicate]

This question already has answers here:
What does the “##…##” meta line with at signs in svn diff or git diff mean?
(3 answers)
Closed 7 years ago.
Every time I run git diff, for each single changes I made, I get some sort of header with numbers, for example:
## -169,14 +167,12 ## function Browser(window, document, body, XHR, $log) {.....
I wonder what does the four numbers mean? I guess -169 means that this particular line of code that follows was originally in line 169 but now is in 167? And what do 14 and 12 mean?
This header is called set of change, or hunk. Each hunk starts with a line that contains, enclosed in ##, the line or line range from,no-of-lines in the file before (with a -) and after (with a +) the changes. After that come the lines from the file. Lines starting with a - are deleted, lines starting with a + are added. Each line modified by the patch is surrounded with 3 lines of context before and after.
An addition looks like this:
## -75,6 +103,8 ##
foo
bar
baz
+line1
+line2
more context
and more
and still context
That means, in the original file before line 78 (= 75 + 3 lines of context) add two lines. These will be lines 106 (= 103 + 3 lines of context) through 107 after all changes.
Note the difference in from numbers (-75 vs +103), this means that there were other changes in this file before this particular hunk, that added 28 (103 - 75) lines of code.
A deletion looks like this:
## -75,7 +75,6 ##
foo
bar
baz
-line1
more context
and more
and still context
That means, delete line 78 (= 75 + 3 lines of context) in the original file. The unchanged context will be on lines 75 to 80 after all changes.
Note that from numbers in this hunk are equal (-75 and +75), this means that either there were no changes before this hunk, or amount of added and deleted lines in previous changes are the same.
Finally, a change looks like this:
## -70,7 +70,7 ##
foo
bar
baz
-red
+blue
more context
and more
still context
That means, change line 73 (= 70 + 3 lines of context) in the file before all changes, which contains red to blue. The changed line is also line 73 (= 70 + 3 lines of context) in the file after all changes.
I wonder what does the four numbers mean?
Let's analyze a simple example
The format is basically the same the diff -u unified diff.
We start with numbers from 1 to 16 and remove 2, 3, 14 and 15:
diff -u <(seq 16) <(seq 16 | grep -Ev '^(2|3|14|15)$')
Output:
## -1,6 +1,4 ##
1
-2
-3
4
5
6
## -11,6 +9,4 ##
11
12
13
-14
-15
16
## -1,6 +1,4 ## means:
-1,6 means that this piece of the first file starts at line 1 and shows a total of 6 lines. Therefore it shows lines 1 to 6.
1
2
3
4
5
6
- means "old", as we usually invoke it as diff -u old new.
+1,4 means that this piece of the second file starts at line 1 and shows a total of 4 lines. Therefore it shows lines 1 to 4.
+ means "new".
We only have 4 lines instead of 6 because 2 lines were removed! The new hunk is just:
1
4
5
6
## -11,6 +9,4 ## for the second hunk is analogous:
on the old file, we have 6 lines, starting at line 11 of the old file:
11
12
13
14
15
16
on the new file, we have 4 lines, starting at line 9 of the new file:
11
12
13
16
Note that line 11 is the 9th line of the new file because we have already removed 2 lines on the previous hunk: 2 and 3.
Summary:
Assume git diff will output [0-3] lines of context [before/after] [first/last] changes
## -[original file's number of first line displayed],[context lines + removed lines] +[changed file's number of first line displayed],[context lines + added lines] ##