How does Mercurial's bisect work when the range includes branching?

If the bisect range includes multiple branches, how does hg bisect's search work? Does it effectively bisect each sub-branch separately (I would think that would be inefficient)?
For instance, borrowing, with gratitude, a diagram from an answer to this related question, what if the bisect got to changeset 7 on the "good" right-side branch first?
# 12:8ae1fff407c8:bad6
|
o 11:27edd4ba0a78:bad5
|
o 10:312ba3d6eb29:bad4
|\
| o 9:68ae20ea0c02:good33
| |
| o 8:916e977fa594:good32
| |
| o 7:b9d00094223f:good31
| |
o | 6:a7cab1800465:bad3
| |
o | 5:a84e45045a29:bad2
| |
o | 4:d0a381a67072:bad1
| |
o | 3:54349a6276cc:good4
|/
o 2:4588e394e325:good3
|
o 1:de79725cb39a:good2
|
o 0:2641cc78ce7a:good1
Will it then look only between 7 and 12, missing the real first-bad changeset that we care about (thus using "dumb" numerical order)? Or is it smart enough to use the full topology and to know that the first bad could be below 7 on the right-side branch, or could still be anywhere on the left-side branch?
The purpose of my question is both (a) to understand the algorithm better, and (b) to understand whether I can liberally extend my initial bisect range without thinking hard about which branch I go to. I've been in high-branching bisect situations where it kept asking me after every test to extend beyond the next merge, so that the whole procedure was essentially O(n). I'm wondering whether I can just throw the first "good" marker way back past some nest of merges without thinking about it much, and whether that would save time and still give correct results.

To quote from Mercurial: The Definitive Guide:
The hg bisect command is aware of the “branchy” nature of a Mercurial
project's revision history, so it has no problems dealing with
branches, merges, or multiple heads in a repository. It can prune
entire branches of history with a single probe, which is how it
operates so efficiently.
The code that does the work is in hbisect.py and actually looks at the descendant and ancestor sets of each node whose state has been determined.
It looks to me like the changeset to test is chosen by weighting "how central" it is in the graph of those yet to be tested (i.e., bisecting by ancestors vs. non-ancestors, rather than by chronology):
x = len(a)          # number of ancestors
y = tot - x         # number of non-ancestors
value = min(x, y)   # how good is this test?
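In other words, the next changeset to test is the one that maximizes min(ancestors, non-ancestors) among the remaining suspects, so whichever way the test goes, the largest possible set is pruned; that is what lets a single probe discard a whole branch. Here is a minimal sketch of that selection rule (simplified from hbisect.py; the candidates set, the ancestors_of mapping, and their representation are illustrative, not Mercurial's actual data structures):

def choose_probe(candidates, ancestors_of):
    # candidates: set of changesets still under suspicion
    # ancestors_of: dict mapping each changeset to the set of its ancestors
    tot = len(candidates)
    best, best_value = None, -1
    for node in candidates:
        x = len(ancestors_of[node] & candidates)  # ancestors still suspect
        y = tot - x                               # non-ancestors still suspect
        value = min(x, y)   # guaranteed prune size, whichever way the test goes
        if value > best_value:
            best, best_value = node, value
    return best

So the search is topological, not numerical: each verdict prunes a whole ancestor or descendant set rather than a numeric interval, which is why throwing the first "good" marker far back past a nest of merges is safe.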

Circular Queue theory

I need help understanding the circular queue concept. I read a couple of posts on Stack Overflow, and none of the answers resolve the mental block I'm having.
For example, say I have 8 cells in a circular queue.
        Head                Tail
empty | U | I | S | K | M | empty | empty
Say I insert two characters, F and P, which will make the queue change to:
Tail    Head
empty | U | I | S | K | M | F | P
Now let's make things interesting: what if I remove 3 entries?
Tail                            Head
empty | empty | empty | empty | K | M | F | P
Clearly my head and tail have now changed, and I have 3 newly available spots. But for good measure I wanted to add two more entries:
        Tail            Head
A | B | empty | empty | K | M | F | P
Here are my questions:
Did I implement this right? LOL. What happens when you fill the queue up completely, so that the tail and head are in the same position, i.e. at "K"? If someone can explain this concept in a little more detail and with more clarity, I would appreciate it.
Thanks!
It looks to me like you have it right. You could make your diagram clearer by showing the integer values for head and tail.
There are many explanations and examples of circular queues. I have found no better explanation than the one I posted in an answer some time ago here. It explains how head and tail show whether the queue is empty, has room, or is full.
In the last row of your diagram, the queue has room for 2 more items. Adding a third would make tail = head, and would overwrite K, which you don't want to do.
When tail = head, the queue is empty. Testing for a full queue is slightly more complicated. See the link for a full explanation.
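For reference, here is a minimal sketch of the convention that answer describes, in which one slot is deliberately left unused so that "empty" and "full" remain distinguishable (the function names are illustrative):

def is_empty(head, tail):
    return head == tail

def is_full(head, tail, capacity):
    # One slot always stays free: inserting past this point would
    # advance tail onto head and overwrite the oldest item.
    return (tail + 1) % capacity == head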
Did I implement this right?
Yes, indeed you did.
What happens when you fill the queue up completely, so that the tail and head are in the same position, i.e. at "K"?
K will be overwritten. This overflow condition can be checked by the condition TAIL == HEAD.
If someone can explain this concept in a little more detail and with more clarity, I would appreciate it.
What you have to understand is that in a traditional linear FIFO queue, the elements need to be shifted whenever the maximum size is reached. For example, if the queue has a size of 5, then after 5 consecutive inserts (numbers 1-5) and then a delete (number 1 gets deleted), the queue becomes [null, 2, 3, 4, 5]. You can see that although there is room for 1 more element, you cannot insert unless you shift all the elements up by one. This is why a circular queue is used: it doesn't need element shifting.
However, if your queue is constantly overflowing, the entire purpose of the queue is lost. I would recommend using a linked list (linear or circular), as it dynamically adds and deletes elements.
Remember that a queue is used in practice when there is a limitation on memory, e.g. an input/output stream buffer. When memory is plentiful and overwriting data is not preferred, linked lists are used.
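To make the head/tail bookkeeping from these answers concrete, here is a minimal circular queue sketch in Python; the refuse-on-full policy and all names are illustrative choices, not something prescribed by the thread:

class CircularQueue:
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0   # index of the next item to remove
        self.tail = 0   # index of the next free slot to insert into

    def enqueue(self, item):
        nxt = (self.tail + 1) % len(self.buf)
        if nxt == self.head:
            raise OverflowError("queue is full")  # refuse rather than overwrite
        self.buf[self.tail] = item
        self.tail = nxt

    def dequeue(self):
        if self.head == self.tail:
            raise IndexError("queue is empty")
        item = self.buf[self.head]
        self.buf[self.head] = None
        self.head = (self.head + 1) % len(self.buf)
        return item

# Replaying the example above: fill the queue, remove three, then
# watch the tail wrap around the end of the array.
q = CircularQueue(8)
for ch in "UISKMFP":
    q.enqueue(ch)
q.dequeue(); q.dequeue(); q.dequeue()   # remove U, I, S; head moves to K
q.enqueue("A"); q.enqueue("B")          # tail wraps back past index 0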

Org (version 7.9) converts periods and hyphens in tables to 0

I am using the Org mode that comes with Emacs 24.3, and I am having an issue where, when Org creates a table from the result of a code block, it replaces characters like '-' and '.' with 0 (integer zero). Then, when I pass the table to another code block that expects a column of strings, I get type errors, etc.
I haven't been able to find anything useful, as the problem seems to be practically un-Googleable. Has anyone had the same problem? If I update to the latest version of org-mode, will that fix it?
EDIT:
I updated to Org 8.2 and this problem seems to have gone away. Now I have another (related) problem: returning a table with a cell containing a string consisting of a single double-quote character ('"' in Python) messes something up; Org added 2 extra columns to the table, one of which had something like
(quote (quote ) ())
in it. The reason my tables have things like this in them is that I'm working with part-of-speech tags from natural language data.
It's pretty obvious Org is doing some work to try to interpret the table contents, and it does not deal well with metacharacters. Technically I think these are bugs: Org should deal better with unexpected input.
EDIT 2:
Here is a minimal reproduction with Org 7.9.3f (system Python is 3.4):
#+TBLNAME: table
| DT | The |
| . | . |
| - | - |
#+BEGIN_SRC python :var table=table
return table
#+END_SRC
#+RESULTS:
| DT | The |
| 0 | 0 |
| 0 | 0 |
Incidentally, Org does not like the '"' character at all, in tables or in code blocks (I just get an "End of file during parsing" message when the above table has a cell with just '"' in it). It's probably better to avoid it altogether, so I think my problem is solved. If nobody wants to add anything, I'll answer this myself in a day or so.
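As a stopgap for the downstream type errors (it cannot recover cells that Org has already collapsed to 0), the receiving block can coerce every cell to a string before use. A minimal sketch, where table is the Org-supplied variable as in the reproduction above:

def as_strings(table):
    # Normalize whatever Org hands us, so code expecting a column of
    # strings doesn't crash on cells that Org read as numbers.
    return [[str(cell) for cell in row] for row in table]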

PostgreSQL VCL controls

One of the products I write software for is an accounting type application. It is written in C++, uses C++ Builder and VCL controls, connects to PostgreSQL database running on Linux.
The PostgreSQL database is currently at version 8.4.x. We use UTF8 encoding. Everything works pretty well.
We are running tests of our software against PostgreSQL v9.2.3 with the exact same encoding, and we are finding a problem in which all our text-editing inputs come back with their line breaks replaced by literal \r\n character sequences.
So, for example, you enter 3 lines of text, hitting the enter key after each line, then save it and read it back; I get one line with the real line-ending characters gone. When we fetch the data from the database, we wind up with one line like so: line1\r\nline2\r\nline3\r\n, where "\r\n" is displayed literally instead of arriving as the bytes 0x0D, 0x0A in the stream.
Our application is not Unicode-aware; it uses Borland's AnsiString. (We are in the process of migrating this app to C++ Builder XE.) Does anyone know what might be causing this, or can anyone offer some things to try to fix it in the current code base while the larger conversion is underway?
I've tried the Borland DBText and DBRichText controls and they both do the same thing.
The other point I should mention is that we only tested against the new PostgreSQL on the server and are still using an 8.x PostgreSQL client library (psql.lib). So the client and server versions aren't exactly at the same level, but I don't suspect this is the issue; any insight is certainly welcome.
UPDATE:
Here are some command line results from the two versions of PostgreSQL.
Version 9.2.3
testdb=# select * from notes where oid=5146352;
docid | docno | username | created | followup | reminder | subject | comments
-------+----------+----------+-------------------------------+----------+----------+-----------+-----------------------------
3001 | 11579522 | eric | 2013-02-15 22:38:24.136517+00 | f | f | Test Note | line1\r\nline2\r\nline3\r\n
Version 8.4.8
testdb=# select * from notes where oid=16490575;
docid | docno | username | created | followup | reminder | subject | comments
-------+----------+----------+------------------------------+----------+----------+--------------+----------
3001 | 11579522 | eric | 2013-02-18 20:15:23.10943-05 | f | f | <> | line1\r
: line2\r
: line3\r
:
Not sure how to format this for SO, but in the 8.4.8 command-line output I have 3 lines printed on the screen, whereas the 9.2.3 version concatenates the output into a single line.
The insert for both databases comes from the same client. So something changed in the way PostgreSQL handles newline characters, and I'm wondering if there is a config setting to revert to the old behavior, or something I can do within my SELECT statement to get the old behavior back.
8.4 has standard_conforming_strings set to off by default, and 9.2 has it on by default.
When it's off, in a literal string, '\n' means a newline as in the C language, whereas when it's on, it means a backslash character followed by the character n.
To go back to the 8.4 behavior, you may issue SET standard_conforming_strings=off inside your sessions
or
ALTER DATABASE yourdb SET standard_conforming_strings=off;
for it to persist and be the default for new connections to this database.
Long term, it's recommended to adapt your code to deal with standard_conforming_strings set to on, since that's the way forward.
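Two related notes. If you control the SQL text, C-style escapes keep working under standard_conforming_strings = on when the literal is written as an escape string, e.g. E'line1\r\nline2'. And from client code, bound parameters sidestep escaping entirely, because the value travels separately from the SQL string. A minimal sketch using Python's psycopg2 (the notes table and docno value come from the question; the connection details, and the use of psycopg2 rather than the asker's C++ stack, are illustrative):

import psycopg2

conn = psycopg2.connect(dbname="testdb")  # placeholder connection details
cur = conn.cursor()

comments = "line1\r\nline2\r\nline3\r\n"  # real 0x0D 0x0A control characters

# The driver sends the value as a bound parameter, so it arrives with
# genuine CR/LF bytes regardless of the standard_conforming_strings setting.
cur.execute("UPDATE notes SET comments = %s WHERE docno = %s",
            (comments, 11579522))
conn.commit()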
Your problem looks like it is related to the Postgres config variable standard_conforming_strings. Before PostgreSQL 9.1 this was turned off by default, which is why Postgres did not treat backslashes literally but interpreted them. According to the SQL standard, however, backslashes should be treated literally. So, from PostgreSQL 9.1 on, this config variable is turned on by default, and you see your \r\n as literal text instead of interpreted escapes.
Although this is not the right approach, to make it work in your case you can edit your server's configuration file (postgresql.conf) and turn this setting off (standard_conforming_strings = off).

Stata mmerge update replace gives wrong output

I wanted to test what happens if I replace a variable with one of a different data type:
clear
input id x0
1 1
2 13
3 .
end
list
save tabA, replace
clear
input id str5 x0
1 "1"
2 "23"
3 "33"
end
list
save tabB, replace
use tabA, clear
mmerge id using tabB, type(1:1) update replace
list
The result is:
+--------------------------------------------------+
| id x0 _merge |
|--------------------------------------------------|
1. | 1 1 in both, master agrees with using data |
2. | 2 13 in both, master agrees with using data |
3. | 3 . in both, master agrees with using data |
+--------------------------------------------------+
This seems very strange to me. I expected a breakdown or a reported disagreement. Is this a bug, or am I missing something?
mmerge is user-written (Jeroen Weesie, SSC, 2002).
If you use the official merge in an up-to-date Stata, you will get what you expect.
. merge 1:1 id using tabB, update replace
x0 is str5 in using data
r(106);
I have not looked inside mmerge. My own guess is that what you see is a feature from the author's point of view, namely that it's not a problem if one variable is numeric and one variable is string so long as their contents agree. But why are you not using merge directly? There was a brief period several years ago when mmerge had some advantages over merge, but that's long past. BTW, I agree in wanting my merges to be very conservative and not indulgent on variable types.

Easy to remember fingerprints for data?

I need to create fingerprints for RSA keys that users can memorize or at least easily recognize. The following ideas have come to mind:
Break the SHA1 hash into portions of, say 4 bits and use them as coordinates for Bezier splines. Draw the splines and use that picture as a fingerprint.
Use the SHA1 hash as input for some fractal algorithm. The result would need to be unique for a given input, i.e. the output can't be a solid square half the time.
Map the SHA1 hash to entries in a word list (as used in spell checkers or password lists). This would create a passphrase consisting of real words.
Instead of a word list, use some other large data set like Google maps (map the SHA1 hash to map coordinates and use the map region(s) as a fingerprint)
Any other ideas? I'm sure this has been implemented in one form or another.
OpenSSH contains something like that, under the name "visual host key". Try this:
ssh -o VisualHostKey=yes somesshhost
where somesshhost is some machine with an SSH server running. It will print out a "fingerprint" of the server key, both in hexadecimal and as an ASCII-art image, which may look like this:
+--[ RSA 2048]----+
| .+ |
| + o |
| o o + |
| + o + |
| . o E S |
| + * . |
| X o . |
| . * o |
| .o . |
+-----------------+
Or like this:
+--[ RSA 1024]----+
| .*BB+ |
| . .++o |
| = oo. |
| . =o+.. |
| So+.. |
| ..E. |
| |
| |
| |
+-----------------+
Apparently, this is inspired by techniques described in this article. OpenSSH is open source, with a BSD-like license, so chances are that you could simply reuse their code (it seems to be in the key.c file, function key_fingerprint_randomart()).
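For the curious, the underlying technique is often called the "drunken bishop": each pair of bits from the key's hash moves a marker diagonally on a small board, and each cell's visit count is rendered as an increasingly heavy character. A simplified sketch (the 17x9 board matches OpenSSH's; the hash choice, the symbol table, and the omission of the start/end markers are simplifications, not OpenSSH's exact code):

import hashlib

def randomart(data, w=17, h=9):
    board = [[0] * w for _ in range(h)]
    x, y = w // 2, h // 2                    # the bishop starts in the center
    for byte in hashlib.md5(data).digest():
        for shift in (0, 2, 4, 6):           # consume two bits at a time
            bits = (byte >> shift) & 3
            x += -1 if bits in (0, 2) else 1 # low bit picks left/right
            y += -1 if bits in (0, 1) else 1 # high bit picks up/down
            x = min(max(x, 0), w - 1)        # clamp at the board's walls
            y = min(max(y, 0), h - 1)
            board[y][x] += 1
    symbols = " .o+=*BOX@%&#"                # heavier glyph = more visits
    return "\n".join("".join(symbols[min(c, len(symbols) - 1)] for c in row)
                     for row in board)

print(randomart(b"ssh-rsa AAAA...example key..."))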
For item 3 (entries in a word list), see RFC-1751 - A Convention for Human-Readable 128-bit Keys, which notes that
The authors of S/Key devised a system to make the 64-bit one-time
password easy for people to enter.
Their idea was to transform the password into a string of small
English words. English words are significantly easier for people to
both remember and type. The authors of S/Key started with a
dictionary of 2048 English words, ranging in length from one to four
characters. The space covered by a 64-bit key (2^64) could be covered
by six words from this dictionary (2^66) with room remaining for
parity. For example, an S/Key one-time password of hex value:
EB33 F77E E73D 4053
would become the following six English words:
TIDE ITCH SLOW REIN RULE MOT
You could also use a compound fingerprint to improve memorability, like English words followed (or preceded) by one or more key-dependent images.
For generating the image, you could use things like Identicon, Wavatar, MonsterID, or RoboHash.
Example:
TIDE ITCH SLOW
REIN RULE MOT
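A minimal sketch of the S/Key idea applied to a key's hash: consume the digest 11 bits at a time and use each chunk as an index into a 2048-entry dictionary (the placeholder word list and the six-word length are illustrative; the real S/Key dictionary and its parity bits are not reproduced here):

import hashlib

WORDS = ["word%04d" % i for i in range(2048)]   # stand-in for a real word list

def fingerprint_words(data, n_words=6):
    bits = int.from_bytes(hashlib.sha1(data).digest(), "big")
    out = []
    for _ in range(n_words):
        out.append(WORDS[bits & 0x7FF])  # 11 bits per word, since 2**11 == 2048
        bits >>= 11
    return " ".join(out)

print(fingerprint_words(b"ssh-rsa AAAA...example key..."))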
I found something called random art which generates an image from a hash. There is a Python implementation available for download: http://www.random-art.org/about/
There is also a paper about using random art for authentication: http://sparrow.ece.cmu.edu/~adrian/projects/validation/validation.pdf
It's from 1999; I don't know if further research has been done on this.
Your first suggestion (draw the path of splines for every four bytes, then fill using the nonzero fill rule) is exactly what I use for visualization in hashblot.
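Not hashblot's actual code, but here is a minimal sketch of the question's first suggestion: split a SHA-1 digest into 4-bit values, pair them up as grid coordinates, and join the points with quadratic Bezier segments in an SVG path (the canvas size and scaling are illustrative, and the fill step is omitted for brevity):

import hashlib

def fingerprint_svg(data, size=160):
    digest = hashlib.sha1(data).digest()
    nibbles = []
    for byte in digest:
        nibbles += [byte >> 4, byte & 0x0F]  # two 4-bit values per byte
    step = size / 15.0                       # map the 0-15 range onto the canvas
    pts = [(nibbles[i] * step, nibbles[i + 1] * step)
           for i in range(0, len(nibbles) - 1, 2)]
    d = "M %.1f %.1f " % pts[0]
    for i in range(1, len(pts) - 1, 2):      # control point, then endpoint
        d += "Q %.1f %.1f %.1f %.1f " % (pts[i] + pts[i + 1])
    return ('<svg xmlns="http://www.w3.org/2000/svg" width="%d" height="%d">'
            '<path d="%s" fill="none" stroke="black"/></svg>' % (size, size, d))

print(fingerprint_svg(b"ssh-rsa AAAA...example key..."))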