Easy to remember fingerprints for data?

I need to create fingerprints for RSA keys that users can memorize or at least easily recognize. The following ideas have come to mind:
1. Break the SHA1 hash into portions of, say, 4 bits and use them as coordinates for Bezier splines. Draw the splines and use that picture as a fingerprint.
2. Use the SHA1 hash as input for some fractal algorithm. The result would need to be unique for a given input, i.e. the output can't be a solid square half the time.
3. Map the SHA1 hash to entries in a word list (as used in spell checkers or password lists). This would create a passphrase consisting of real words.
4. Instead of a word list, use some other large data set like Google Maps (map the SHA1 hash to map coordinates and use the map region(s) as a fingerprint).
Any other ideas? I'm sure this has been implemented in one form or another.

OpenSSH contains something like that, under the name "visual host key". Try this:
ssh -o VisualHostKey=yes somesshhost
where somesshhost is some machine running an SSH server. It will print out a "fingerprint" of the server key, both in hexadecimal and as an ASCII-art image, which may look like this:
+--[ RSA 2048]----+
| .+ |
| + o |
| o o + |
| + o + |
| . o E S |
| + * . |
| X o . |
| . * o |
| .o . |
+-----------------+
Or like this:
+--[ RSA 1024]----+
| .*BB+ |
| . .++o |
| = oo. |
| . =o+.. |
| So+.. |
| ..E. |
| |
| |
| |
+-----------------+
Apparently, this is inspired by techniques described in this article. OpenSSH is open source, with a BSD-like license, so chances are that you could simply reuse their code (it seems to be in the key.c file, function key_fingerprint_randomart()).
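To give a feel for how such an image can be derived from a hash, here is a minimal Python sketch of a "drunken bishop" style random walk, which is the general idea behind the visual host key. It is written from memory of the technique, not copied from key.c, so the glyph table, the bit order and the border (which omits the key type label) are assumptions and the output should not be expected to match OpenSSH byte for byte.

```python
import hashlib

# Assumed glyph table: the more often a cell is visited, the "heavier" the glyph.
SYMBOLS = " .o+=*BOX@%&#/^"
WIDTH, HEIGHT = 17, 9


def randomart(digest: bytes) -> list[str]:
    """Render a digest as an ASCII-art picture using a bounded random walk."""
    grid = [[0] * WIDTH for _ in range(HEIGHT)]
    x, y = WIDTH // 2, HEIGHT // 2          # the walk starts in the centre
    start = (x, y)
    for byte in digest:
        for _ in range(4):                  # four 2-bit moves per byte, low bits first
            x += 1 if byte & 0x1 else -1    # bit 0 chooses left/right
            y += 1 if byte & 0x2 else -1    # bit 1 chooses up/down
            x = min(max(x, 0), WIDTH - 1)   # stay inside the field
            y = min(max(y, 0), HEIGHT - 1)
            grid[y][x] = min(grid[y][x] + 1, len(SYMBOLS) - 1)
            byte >>= 2
    end = (x, y)
    rows = []
    for r in range(HEIGHT):
        cells = []
        for c in range(WIDTH):
            if (c, r) == end:
                cells.append("E")           # end marker wins over the start marker
            elif (c, r) == start:
                cells.append("S")
            else:
                cells.append(SYMBOLS[grid[r][c]])
        rows.append("|" + "".join(cells) + "|")
    border = "+" + "-" * WIDTH + "+"
    return [border, *rows, border]


if __name__ == "__main__":
    # Any digest works; the key bytes here are a placeholder.
    print("\n".join(randomart(hashlib.md5(b"example public key").digest())))
```

Because the walk is deterministic, the same key always yields the same picture, while a different key sends the walk along a different path.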

For item 3 (entries in a word list), see RFC-1751 - A Convention for Human-Readable 128-bit Keys, which notes that
The authors of S/Key devised a system to make the 64-bit one-time
password easy for people to enter.
Their idea was to transform the password into a string of small
English words. English words are significantly easier for people to
both remember and type. The authors of S/Key started with a
dictionary of 2048 English words, ranging in length from one to four
characters. The space covered by a 64-bit key (2^64) could be covered
by six words from this dictionary (2^66) with room remaining for
parity. For example, an S/Key one-time password of hex value:
EB33 F77E E73D 4053
would become the following six English words:
TIDE ITCH SLOW REIN RULE MOT
You could also use a compound fingerprint to improve memorability, like English words followed (or preceded) by one or more key-dependent images.
For generating the image, you could use things like Identicon, Wavatar, MonsterID, or RoboHash.
Example:
TIDE ITCH SLOW
REIN RULE MOT
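The word-list mapping can be sketched in a few lines of Python. This is not the RFC 1751 encoding: the 4096-entry "dictionary" below is a hypothetical one made of generated consonant-vowel pseudo-words so that the example stays self-contained, and the names word_for and fingerprint_words are made up for illustration. A real system would index into a curated list of actual words.

```python
import hashlib

# Hypothetical dictionary: every consonant-vowel-consonant-vowel pseudo-word
# over 16 consonants and 4 vowels (16 * 4 * 16 * 4 = 4096 = 2**12 entries).
CONSONANTS = "bdfghjklmnprstvz"   # 16 letters -> 4 bits each
VOWELS = "aiou"                   # 4 letters  -> 2 bits each


def word_for(index: int) -> str:
    """Map a 12-bit index to one pseudo-word."""
    return (
        CONSONANTS[(index >> 8) & 0xF]
        + VOWELS[(index >> 6) & 0x3]
        + CONSONANTS[(index >> 2) & 0xF]
        + VOWELS[index & 0x3]
    )


def fingerprint_words(data: bytes, words: int = 4) -> str:
    """Turn the leading 12 * words bits of SHA-1(data) into pseudo-words."""
    if not 1 <= words <= 13:      # SHA-1 gives 160 bits, i.e. 13 full 12-bit chunks
        raise ValueError("words must be between 1 and 13")
    digest = int.from_bytes(hashlib.sha1(data).digest(), "big")
    out = []
    for i in range(words):
        shift = 160 - 12 * (i + 1)          # take chunks from the top of the digest
        out.append(word_for((digest >> shift) & 0xFFF))
    return " ".join(out)


if __name__ == "__main__":
    print(fingerprint_words(b"example public key"))
```

Using fewer words is easier to remember but covers fewer bits of the hash, so it is easier for an attacker to find a second key with the same phrase; the default of four words (48 bits) is only a placeholder.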

I found something called random art which generates an image from a hash. There is a Python implementation available for download: http://www.random-art.org/about/
There is also a paper about using random art for authentication: http://sparrow.ece.cmu.edu/~adrian/projects/validation/validation.pdf
It's from 1999; I don't know if further research has been done on this.

Your first suggestion (draw the path of splines for every four bytes, then fill using the nonzero fill rule) is exactly what I use for visualization in hashblot.

How to write in two columns like a table in Linux man pages?

I'm creating a custom man page for my C library, and I'd like to do a thing like this
LIST OF FUNCTIONS |<--- terminal window side
|
Function Description |
function1 function1's description |
function2 function2's description |
which is longer than the |<--- here if the text
first one | overlaps out of the window,
function3 function3's description | it auto-aligns to Description
... ... |
How could I do that?
I think that it's a combination of https://tldp.org/HOWTO/Man-Page/q3.html and then use GROFF - https://www.linuxjournal.com/article/1158
.SH DESCRIPTION
.B foo
frobnicates the bar library by tweaking internal
symbol tables. By default it parses all baz segments
and rearranges them in reverse order by time for the
.BR xyzzy (1)
linker to find them. The symdef entry is then compressed
using the WBG (Whiz-Bang-Gizmo) algorithm.
All files are processed in the order specified.
There is a command on the linuxjournal site with the following:
$ groff -Tascii -man coffee.man | more
The groff man page starts with the following:
The man macro package for groff is used to produce manual pages
(“man pages”) like the one you are reading.

Postgres Full-Text Search with Hyphen and Numerals

I have observed what seems to me an odd behavior of Postgres' to_tsvector function.
SELECT to_tsvector('english', 'abc-xyz');
returns
'abc':2 'abc-xyz':1 'xyz':3
However,
SELECT to_tsvector('english', 'abc-001');
returns
'-001':2 'abc':1
Why not something like this?
'abc':2 'abc-001':1 '001':3
And what should I do to be able to search by the numeric portion alone, without the hyphen?
Seems the text search parser identifies the hyphen followed by digits to be the sign of a signed integer. Debug with ts_debug():
SELECT * FROM ts_debug('english', 'abc-001');
alias | description | token | dictionaries | dictionary | lexemes
-----------+-----------------+-------+--------------+------------+---------
asciiword | Word, all ASCII | abc | {simple} | simple | {abc}
int | Signed integer | -001 | {simple} | simple | {-001}
Other text search configurations (like 'simple' instead of 'english') won't help as the parser itself is "at fault" here (debatable).
A simple way around it (other than modifying the parser, which I never tried) would be to pre-process strings and replace hyphens with an em dash (—) or just blanks, to make sure those are identified as "Space symbols". (Actual signed integers lose their negative sign in the process.)
SELECT to_tsvector('english', translate('abc-001', '-', '—'))
@@ to_tsquery ('english', '001'); -- true now
This can be circumvented with the absval option of the dict_int module, available from PostgreSQL 13. See the official documentation.
But in case you're stuck with an earlier PG version, here's the generalized version of a "number or negative number" workaround in a query.
select regexp_replace($$'test' & '1':* & '2'$$::tsquery::text,
'''([.\d]+''(:\*)?)', '(''\1 | ''-\1)', 'g')::tsquery;
This results in:
'test' & ( '1':* | '-1':* ) & ( '2' | '-2' )
It replaces lexemes that look like positive numbers with "number or negative number" kind of subqueries.
The double cast ::tsquery::text is just there to show how you would pass a tsquery cast to text.
Note that it handles prefix matching numeric lexemes as well.
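If the query text is assembled on the client rather than in SQL, the same rewrite can be done there. The following Python sketch applies the regular expression from the query above with re.sub; the function name expand_numeric_lexemes is hypothetical, and the input is assumed to already be in the quoted text form that a tsquery prints as.

```python
import re

# Same pattern as the regexp_replace() call: a quoted lexeme made of digits
# and dots, optionally followed by the prefix-match marker ":*".
_NUMERIC_LEXEME = re.compile(r"'([.\d]+'(:\*)?)")


def expand_numeric_lexemes(tsquery_text: str) -> str:
    """Rewrite each numeric lexeme N into ('N' | '-N'), keeping any ':*'."""
    return _NUMERIC_LEXEME.sub(r"('\1 | '-\1)", tsquery_text)


if __name__ == "__main__":
    print(expand_numeric_lexemes("'test' & '1':* & '2'"))
```

The result still has to be cast to tsquery on the server; PostgreSQL then prints it with its own spacing around the parentheses.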

Org (version 7.9) converts periods and hyphens in tables to 0

I am using the Org mode that comes with Emacs 24.3, and I am having an issue that when Org creates a table from the result of a code block it is replacing characters like '-' and '.' with 0 (integer zero). Then when I pass the table to another code block that's expecting a column of strings I get type errors etc.
I haven't been able to find anything useful, as it seems to be practically un-Googleable. Has anyone had the same problem? If I update to the latest version of org-mode, will that fix it?
EDIT:
I updated to Org 8.2 and this problem seems to have gone away. Now I have another (related) problem, where returning a table with a cell containing a string consisting of one double quote character ('"' in python) messes something up; Org added 2 extra columns to the table, one had something like
(quote (quote ) ())
in it. The reason my tables have things like this in them is that I'm working with part-of-speech tags from natural language data.
It's pretty obvious Org is doing some stuff to try to interpret the table contents, and not dealing well with meta characters. Technically I think these are bugs where Org should be dealing better with unexpected input.
EDIT 2:
Here is a minimal reproduction with Org 7.9.3f (system Python is 3.4):
#+TBLNAME: table
| DT | The |
| . | . |
| - | - |
#+BEGIN_SRC python :var table=table
return table
#+END_SRC
#+RESULTS:
| DT | The |
| 0 | 0 |
| 0 | 0 |
Incidentally, Org does not like the '"' character at all, in tables or in code blocks (I just get a "End of file during parsing" message when the above table has a cell with just '"' in it). It's probably just better to avoid it altogether, so I think my problem is solved. If nobody wants to add anything, I'll answer this myself in a day or so.

Search text between symbol

I have this text (taken from concatenated field row)
Astronomic Event 2013/1434H - Aceh ....
How do we search it by the keywords 2013 or 1434h?
I have tried the code below, but it returns no rows.
to_tsvector result:
'2013/1434h':8,12 'aceh':1 'bin.....
Sample Case:
WITH sample_table as
(SELECT to_tsvector('Astronomic Event 2013/1434H - Aceh') sample_content)
SELECT *
FROM sample_table, to_tsquery('2013') q
WHERE sample_content @@ q
How do we search it by the keywords 2013 or 1434h?
It seems like you want to replace:
to_tsquery('1434h') q
with:
to_tsquery('1434h | 2013') q
http://www.postgresql.org/docs/current/static/functions-textsearch.html
Side note: the to_tsquery() syntax is extremely capricious. It allows little if any leeway in the input, and many of the assumptions in Postgres are anything but end-user friendly.
More often than not, you'll be better off using plainto_tsquery(), which allows any amount of garbage to be thrown at it. Thus, consider pre-processing the string before issuing the query. For instance, you could split the string, and OR the original parts together:
where sc.text_index @@ (plainto_tsquery('1434h') || plainto_tsquery('2013'))
Doing so will make your code a bit more complex, but it won't rely on your users needing to understand that (contrary to what they're accustomed to in Google) they should enter 'quick & brown & fox & jumps & lazy & dog' instead of plain 'The quick brown fox jumps over the lazy dog'.
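The pre-processing step could look roughly like this on the client side. It is a sketch under a few assumptions: the helper name build_or_query is made up, the column name defaults to the text_index column used above, and the %s placeholders follow the style of drivers such as psycopg, so adjust them to whatever your driver expects.

```python
def build_or_query(user_input: str, column: str = "text_index"):
    """Split free-form input on whitespace and OR the parts together.

    Returns a WHERE clause with one placeholder per term plus the list of
    parameters to bind. The column name is interpolated into the SQL, so it
    must come from trusted code and never from user input.
    """
    terms = user_input.split()
    if not terms:
        raise ValueError("search string is empty")
    # tsquery || tsquery yields the OR of the two queries.
    clause = " || ".join(["plainto_tsquery(%s)"] * len(terms))
    return f"WHERE {column} @@ ({clause})", terms


if __name__ == "__main__":
    sql, params = build_or_query("1434h 2013")
    print(sql)
    print(params)
```

Binding the terms as parameters keeps whatever the user typed out of the SQL text itself.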
Edit: I ended up actually trying your sample query, and it seems you're actually running into a parser issue:
# SELECT alias, description, token FROM ts_debug('Astronomic Event 2013/1434H - Aceh');
alias | description | token
-----------+-------------------+------------
asciiword | Word, all ASCII | Astronomic
blank | Space symbols |
asciiword | Word, all ASCII | Event
blank | Space symbols |
file | File or path name | 2013/1434H
blank | Space symbols |
blank | Space symbols | -
asciiword | Word, all ASCII | Aceh
(8 rows)
http://www.postgresql.org/docs/current/static/textsearch-parsers.html
It looks like you might need to write (or find) and configure an app-specific parser. Having never done this personally, the best I can do is to highlight that Postgres allows this and includes a sample:
http://www.postgresql.org/docs/current/static/test-parser.html
Alternatively, change your tsvector-related trigger so that it matches e.g. \d{4}/\d+[a-zA-Z] or whatever seems most appropriate, and adds spaces accordingly, before converting it to a tsvector. Something as simple as the following might do the trick if you never need to store file names:
SELECT alias, description, token
FROM ts_debug(replace('Astronomic Event 2013/1434H - Aceh', '/', ' / '));
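If you would rather not touch every slash, the narrower pattern mentioned above can be applied before the text reaches to_tsvector. Here is a small Python sketch of that idea; the function name split_year_pairs is hypothetical, and the pattern only covers the "four digits, slash, digits plus one letter" shape from the example.

```python
import re

# Matches e.g. "2013/1434H": a 4-digit year, a slash, then digits and a letter.
_YEAR_PAIR = re.compile(r"(\d{4})/(\d+[a-zA-Z])")


def split_year_pairs(text: str) -> str:
    """Put spaces around the slash so the parser sees two separate tokens."""
    return _YEAR_PAIR.sub(r"\1 / \2", text)


if __name__ == "__main__":
    print(split_year_pairs("Astronomic Event 2013/1434H - Aceh"))
```

Real file paths such as /usr/lib are left alone because they do not match the pattern.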

PostgreSQL VCL controls

One of the products I write software for is an accounting type application. It is written in C++, uses C++ Builder and VCL controls, connects to PostgreSQL database running on Linux.
The PostgreSQL database is currently at version 8.4.x. We use UTF8 encoding. Everything works pretty well.
We are running tests of our software against PostgreSQL v9.2.3 with exact same encoding and are finding a problem in which all our text editing inputs are replacing multiple lines with \r\n characters.
For example, if you enter 3 lines of text, hitting the Enter key after each line, then save it and read it back, you get one line with the line endings gone. When we fetch the data from the database, we wind up with one line like so: line1\r\nline2\r\nline3\r\n where "\r\n" is displayed literally instead of the bytes 0x0D, 0x0A appearing in the stream.
Our application is not Unicode aware; it uses Borland's AnsiString. (We are in the process of migrating this app to C++ Builder XE.) Does anyone know what might be causing this, or can you offer some things to try to fix this in the current code base while the larger conversion is underway?
I've tried the Borland DBText and DBRichText controls and they both do the same thing.
The other point I should mention is that we only tested against the new PostgreSQL on the server and are still using an 8.x PostgreSQL client library (psql.lib). So the client and server versions aren't at the same level. I don't suspect this is the issue, but any insight is certainly welcome.
UPDATE:
Here are some command line results from the two versions of PostgreSQL.
Version 9.2.3
testdb=# select * from notes where oid=5146352;
docid | docno | username | created | followup | reminder | subject | comments
-------+----------+----------+-------------------------------+----------+----------+-----------+-----------------------------
3001 | 11579522 | eric | 2013-02-15 22:38:24.136517+00 | f | f | Test Note | line1\r\nline2\r\nline3\r\n
Version 8.4.8
testdb=# select * from notes where oid=16490575;
docid | docno | username | created | followup | reminder | subject | comments
-------+----------+----------+------------------------------+----------+----------+--------------+----------
3001 | 11579522 | eric | 2013-02-18 20:15:23.10943-05 | f | f | <> | line1\r
: line2\r
: line3\r
:
Not sure how to format this for SO, but in the 8.4.8 command line output, I have 3 new lines printed on the screen, whereas the 9.2.3 version concatenates the output.
The insert for both databases is the same client. So something changed in the way PostgreSQL handles new line characters and I'm wondering if there is a config setting to revert the old behavior or something I can do within my select statement to get the old behavior back.
8.4 has standard_conforming_strings set to off by default, and 9.2 has it on by default.
When it's off, in a literal string, '\n' means a newline as in the C language, whereas when it's on, it means a backslash character followed by the character n.
To go back to the 8.4 behavior, you may issue SET standard_conforming_strings=off inside your sessions
or
ALTER DATABASE yourdb SET standard_conforming_strings=off;
for it to persist and be the default for new connections to this database.
Long term, it's recommended to adapt your code to work with standard_conforming_strings set to on, since that's the way forward.
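To make the difference concrete, here is a deliberately simplified Python model of how the body of an ordinary '...' literal is read under each setting. It is not PostgreSQL's lexer: only a few escapes are modelled, the name interpret_literal is made up, and quote doubling is ignored.

```python
import re

# A handful of C-style escapes; the real server recognises more (octal, hex, ...).
_ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f", "\\": "\\"}


def interpret_literal(body: str, standard_conforming_strings: bool) -> str:
    """Return the value a plain '...' literal with this body would produce."""
    if standard_conforming_strings:
        # on (default since 9.1): a backslash is an ordinary character.
        return body
    # off (default in 8.4): backslash sequences are interpreted as escapes;
    # an unknown escape just yields the character after the backslash.
    return re.sub(r"\\(.)", lambda m: _ESCAPES.get(m.group(1), m.group(1)), body)


if __name__ == "__main__":
    sent = r"line1\r\nline2\r\n"   # what the client puts between the quotes
    print(repr(interpret_literal(sent, True)))
    print(repr(interpret_literal(sent, False)))
```

With the setting on, the stored value keeps the backslashes, which matches the 9.2.3 output in the question; with it off, the value contains real carriage returns and newlines, as in the 8.4.8 output.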
Your problem looks related to the Postgres config variable standard_conforming_strings. Before Postgres 9.1, this was turned off by default. That's why Postgres did not treat backslashes literally but interpreted them. But according to the SQL standard, backslashes should be treated literally. So, from Postgres 9.1, this config variable has been turned on by default, and you see your \r\n as literal text instead of its interpretation.
Although this is not the right approach, to make it work in your case you need to edit your server's configuration file (postgresql.conf) and turn off this setting (standard_conforming_strings=off).