Perl getpwuid() and getpwnam() - perl

I'm learning Perl's *nix system tools and I've been staring at the following two sentences for several minutes:
You can think of getpwuid() and getpwnam() operators as random access -- they grab a specific entry by key so you have to have a key to start with. Another way of accessing the password file is sequential access -- grabbing each entry in some apparently random order.
I'm 99% sure this is a typo, but if it isn't I'm clearly missing a key idea. Can anyone shed some light on the subject?
Thanks in advance.

Not a typo, but very poorly worded. getpwuid looks up a passwd entry by UID. getpwnam looks up a password entry by name. These are "random access" like system memory is "random access"; you can pick which one you want by providing a key. (For system memory, the "key" is the address. For getpwuid, the key is the UID. For getpwnam, the key is the name.)
These are in contrast to getpwent, which simply returns the "next" entry from the passwd file. The entries will be returned in some unspecified order. This is "sequential access", like reading a file from disk, although with getpwent you do not know in what order the results will appear.
The wording is confusing because they use the word "random" for both the phrase "random access" (like a memory) and "apparently random order" (by which they mean "unspecified order").
They should have said "unspecified order" or "indeterminate order" rather than "apparently random order".
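To make the distinction concrete, here is a minimal sketch in Python rather than Perl, using the standard pwd module, which exposes the same passwd database. The helper names (lookup_by_uid, lookup_by_name, all_entries) are made up for illustration, and pwd.getpwall() stands in for a getpwent loop.

```python
import pwd


def lookup_by_uid(uid):
    """Keyed ("random access") lookup: fetch one passwd entry by numeric UID."""
    try:
        return pwd.getpwuid(uid)
    except KeyError:
        return None  # no entry for that UID


def lookup_by_name(name):
    """Keyed ("random access") lookup: fetch one passwd entry by login name."""
    try:
        return pwd.getpwnam(name)
    except KeyError:
        return None  # no entry for that name


def all_entries():
    """Sequential access: every entry, in whatever order the system returns them."""
    return pwd.getpwall()


if __name__ == "__main__":
    # No key needed here; the order of the output is unspecified.
    for entry in all_entries():
        print(entry.pw_name, entry.pw_uid)
```

The keyed lookups need a UID or a name up front; the sequential walk needs nothing, but makes no promise about order.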

Related

Checking whether a particular word is a noun or verb

I tried to check with Microsoft Word (using VBA code) whether "pump" (as in "oil pump") is a noun or a verb.
According to Microsoft Word it is a verb (here "pump" is actually a noun).
I need to check this for a list of words (mostly technical).
Is it possible to compare against some database?
Or is there another approach?
I think the only answer here is "It can't be done".
Even with context, you'd need human interpretation to determine the word type in some cases.
Time flies like an arrow.
It can mean that time passes very quickly. In that case,
Time (noun) flies (verb) like an arrow (prepositional phrase).
Or it can mean that a group of insects have a preference for pointy things
Time flies (compound noun) like (verb) an arrow (noun as an object).
Or it can be a suggestion to measure the speed of insects in the same way an arrow does.
Time (verb) flies (noun) like an arrow (prepositional phrase).
The Merriam Webster Learner's Dictionary has seven possible word types for "like": verb, noun, preposition, adjective, noun (yes, again, but with another meaning), adverb, conjunction. Each of these has several sub-categories for slightly different use cases. And they don't even mention the teenager use ("And I'm like REALLY, and he's like YES...")
The reason that dictionary entries (not the MS Word spelling dictionary, but references that explain the use and meaning) are so complex is that language is complex.
It is impossible to write some VBA, throw in some RegEx and determine the word class without fault.
I'd create a context sensitive "local dictionary" for your project.
If you make "oil pump" a single entry - and search for that first, you can eliminate false readings.
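A rough sketch of that "local dictionary" idea, in Python rather than VBA. The dictionary contents and the classify helper are hypothetical; the only point is that multi-word entries are tried before single words, so "oil pump" wins over "pump".

```python
# Hypothetical project-specific dictionary: phrases and words mapped to a word class.
LOCAL_DICTIONARY = {
    "oil pump": "noun",
    "pump": "verb",
    "valve": "noun",
}


def classify(text, dictionary=LOCAL_DICTIONARY):
    """Tag text by trying the longest dictionary phrase first at each position."""
    words = text.lower().split()
    max_len = max(len(key.split()) for key in dictionary)
    tagged = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for size in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + size])
            if phrase in dictionary:
                tagged.append((phrase, dictionary[phrase]))
                i += size
                break
        else:
            # Nothing matched at this position; leave it for a human to decide.
            tagged.append((words[i], "unknown"))
            i += 1
    return tagged
```

Anything tagged "unknown" still needs human interpretation; the sketch only removes the false readings you already know about.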

Is there an idempotent hash function?

Is there a hash function that is idempotent? I know MD5 and SHA256 are not:
$ echo -n "hello world" | md5sum
5eb63bbbe01eeed093cb22bb8f5acdc3 -
$ echo -n "5eb63bbbe01eeed093cb22bb8f5acdc3" | md5sum
c0b0ef2d0f76f0133b83a9b82c1c7326 -
$ echo -n "hello world" | sha256sum
b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9 -
$ echo -n "b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9" | sha256sum
049da052634feb56ce6ec0bc648c672011edff1cb272b53113bbc90a8f00249c -
Is there a hash algorithm that can do something like this?
$ echo -n "hello world" | idempotentsum
abcdef1234567890
$ echo -n "abcdef1234567890" | idempotentsum
abcdef1234567890
If such an algorithm does exist, is it useful cryptographically? I.e. with reasonable inputs, is it computationally infeasible to guess the input with a known output?
If such an algorithm does not exist, does it not exist because nobody has bothered to find it or is it a mathematical impossibility?
Context
I'm working on a system where a user may want to save a password in a password manager. A particularly paranoid user may prefer to save the password in a hashed form rather than in plain text. I'd like the user to be able to authenticate with this hashed password. Rather than simply trying the authentication twice (once assuming the user's password is hashed and once assuming it is not), I wondered if there was an algorithm to let me only do it once.
I know there are alternative ways of allowing users to store authentication tokens rather than plain-text passwords. But this idea popped into my head, and I am curious. I couldn't find anything about this on Google or SO.
EDIT: I am not suggesting that allowing a user to authenticate with a hashed password means it is OK for the server to not salt/hash the password. The server must still salt/hash the original password or the client-side hashed password.
EDIT: I am not suggesting that allowing the user to log in with a client-side hashed password is a genuine security improvement. As far as I know the only possible benefit this would add is if the user used this password for more than one purpose. In that case, if the user's hashed password was discovered by an attacker, then only access to my service would be compromised rather than all services sharing that password. However, best practice is to not use the same password for multiple services.
Such a function is actually quite easy to find, and it doesn't weaken the cryptography of the system (except in an obvious and trivial way). We can actually transform any hashing function into an idempotent hashing function so long as we have a way of identifying if a given value is a potential output of the hashing function (in more formal language, if an element of the domain is also an element of the range).
(A potential method of doing this is just checking the size of the input element, as most hash functions attempt to uniformly output values up to a given size. This ignores the possibility of incorrectly identifying a value that can never be output from the hash function, but that would be specific to individual hashing functions.)
We then create a new method that checks whether a value can be output from the hashing function, and if so, returns the value back. Otherwise, the function proceeds as normal and hashes the value. This new function is as secure as the original function except when hashing values in its range, for which it is completely insecure, but that's unavoidable in an idempotent hashing function.
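A minimal sketch of that wrapper in Python, assuming SHA-256 as the underlying hash and "64 lowercase hex characters" as the test for "could be an output". The name idempotent_sha256 is made up for illustration.

```python
import hashlib
import re

# Assumption: anything shaped like a SHA-256 hex digest is treated as already hashed.
_SHA256_HEX = re.compile(r"[0-9a-f]{64}")


def idempotent_sha256(text):
    """Return text unchanged if it looks like a SHA-256 hex digest, else hash it."""
    if _SHA256_HEX.fullmatch(text):
        return text  # already in the range of the hash: pass it through
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

Applying it twice gives the same result as applying it once. The cost is the one described above: any input that happens to be 64 hex characters is returned as-is, so such inputs get no protection at all.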
If such an algorithm does exist, is it useful cryptographically?
Well, consider this: a hash typically is a map between two sets:
A -> B
where B is the set of possible hashes, and A is the set of things that are hashable.
Now, usually A is much bigger than B -- hashes are like shorter "checksums" that can be calculated from much larger streams of data.
Typically, you'd still want as few collisions as possible in your hash, meaning that statistically, all elements of B should have the same number of elements of A mapping to them, and the elements of A that map to the same element of B should be "far away" from each other under some metric. This implies that B tries as hard as possible to be the whole set of words of a constant length. It will be immensely harder to find a systematic function that does that but still maps each element of B to itself; you "enforce" collisions. In general, that's a cryptographic weakness, and a serious one at that.
Now, considering your password case: I don't see how that would make any sense. It's cryptographically a bad idea to let your user authenticate with either his/her hashed password or in plain, because no matter what you'd do, you'd give away full information on how to forge authentication to all eavesdroppers.

BFX field too large for a data item increase -s

I am getting the above error when trying to run a script to produce a report. It is a pre-existing script that has been run successfully many times before. Research has told me that it has something to do with the stack size? I'm running 10.2B02 in WRQ Reflections. Can anyone tell me what this statement means and how I look up the value of my -s?
Thanks,
Paul
-s is a client startup parameter. You mention "Reflections" so you are probably using a character terminal session. The -s parameter is on the command line used to start Progress (which might be inside a script). If there is a -pf somefile.pf on the command line then it is inside that "parameter file". If it is not specified the default value is 40. The maximum value is limited by available memory but setting it in the hundreds or even in the thousands is not unheard of.
You can also get the startup values by sending a SIGUSR1 to the _progres process that the session is running, i.e. kill -USR1 <pid>. That will (safely) create a "protrace.<pid>" file that includes startup parameters and a 4GL stack trace. The file will appear in either the current directory, the home directory or the temp-file directory (I forget which, just look for protrace*).
This error usually means that your code is manipulating a field that is too large. (Like the error says.) That might be for a lot of reasons.
One common possibility is string concatenation in a loop.
Or you might be calling lots of sub-procedures and passing parameters around.
If "nothing has changed" in the code then it probably just means that some data structure has grown slightly larger over time and increasing -s is really no big deal so long as it solves the problem.
If you keep having to increase it then it is more likely that you have some sort of coding issue. Maybe you're passing things by value that ought to be passed by reference or maybe you have runaway recursion. Or something else. You'd need to provide a lot more detail to say for sure.
It is also possible (but unlikely) that you have a corrupt data record that appears to have a field in it that is too large. You could run "proutil dbName -C dbanalys" as an initial step to see if that is true.
Part of the error message is non-standard -- I'm not certain which log file it is coming from or how it got there (applications can write their own messages) but it seems that it might have something to do with trying to send an e-mail. So I'd be suspicious that either the list of recipients got too long or that the body of the e-mail is too large.

xkcd: Externalities

So the April 1, 2013 xkcd Externalities web comic features a Skein 1024 1024 hash breaking contest. I'm assuming that this must be nothing more than a brute force effort where random strings are hashed in an effort to match Randall's posted hash? Is this correct?
Also, my knowledge of Skein hashing theory is virtually non-existent but being a halfway decent programmer I was able to download and run both SkeinFish (C#) and Maarten Bodewes Skein implementation (Java) locally in 1024 1024 mode with some input strings. The hashes that they gave, however, were different than the hash that xkcd returned for the same input. This may be an extremely naive question but do different Skein implementations give different hashes? And what Skein implementation is xkcd using?
Thanks for pardoning my ignorance!
There are several different iterations of the skein algorithm. XKCD is using version 1.3, which is also the most recent. Sources can be found here (look for "V1.3")
Interestingly enough, this brute-force method is the same one employed by Bitcoin to "mine" bitcoins. The big differences are the hash algorithm (SHA-256 in that case) and the target hash (which is dynamically determined to be any hash starting with a certain number of zeros.) It takes a lot of work to discover the hash, but once it has been found it is trivial to verify the source bits and that the resulting hash meets the criteria.
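As a toy illustration of "hard to find, trivial to verify", here is a short Python sketch of the leading-zeros idea. It is a simplification: it uses a single SHA-256 pass and counts zero hex digits, and the function names are made up for this example, so it is not how Bitcoin or the xkcd contest actually scored hashes.

```python
import hashlib


def meets_target(data, nonce, zero_digits):
    """Cheap check: does SHA-256(data + nonce) start with enough zero hex digits?"""
    digest = hashlib.sha256(data + str(nonce).encode("ascii")).hexdigest()
    return digest.startswith("0" * zero_digits)


def find_nonce(data, zero_digits):
    """Expensive search: try nonces one by one until the target is met."""
    nonce = 0
    while not meets_target(data, nonce, zero_digits):
        nonce += 1
    return nonce
```

Each extra zero digit multiplies the expected search work by 16, while the verification stays a single hash.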
Here's the source code the Stanford team used. We ran this on about a hundred 8-core EC2 servers for a while, but not the whole competition.
https://github.com/jhiesey/skeincrack
If you were hashing non-alphanumeric characters (spaces, punctuation, etc.), you may have been getting different results due to HTML form encoding. The "enctype" attribute on the form XKCD was hosting was "application/octet-stream", which according to https://developer.mozilla.org/en-US/docs/HTML/Element/form is not a browser-supported standard. I assume the browser falls back on the URL-encoding type when it sees one it doesn't recognize.
I observed the string "=" being submitted URL-encoded in Chrome, and returning a different hash than what I got locally with the latest pyskein. But when I submitted it with this curl command line (no longer works), I got the expected hash:
curl -X POST --data-binary "hashable==" "http://almamater.xkcd.com/?edu=school.edu"
The Stanford code in another answer does the same thing, and they apparently had some success. I never got any random data to locally hash to a better score than even my own school, so I never got a chance to test thoroughly how to pass arbitrary data in properly. I don't know what the exact behavior was (e.g., perhaps if you omitted hashable= the server would detect that and just hash the whole POST body), but it may have intentionally been a little tricky as part of April Fool's.
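The encoding effect is easy to reproduce locally. The sketch below is in Python and uses SHA-256 only as a stand-in, since the standard library has no Skein; digest_of is a made-up helper. It shows that the raw string "=" and its URL-encoded form are different inputs and therefore hash differently.

```python
import hashlib
from urllib.parse import quote


def digest_of(text):
    """Stand-in hash (SHA-256, not Skein) of the UTF-8 bytes of text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


raw = "="
encoded = quote(raw, safe="")  # what a URL-encoding form submission would send

if __name__ == "__main__":
    print(repr(raw), digest_of(raw))
    print(repr(encoded), digest_of(encoded))
```

If the server hashes whatever bytes arrive, a browser that URL-encodes the field and a client that sends the raw bytes will see different hashes for what looks like the same input.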

When should forms be cleaned?

By "cleaned" I mean formatting inputs such as "a1b2c3" into "A1B 2C3" or "5551234567" into "(555) 123-4567". I figure we have few options:
(1) As the user is typing. For instance, when a user is typing a postal code, all letters are instantly capitalized, or after the user types 3 digits of a phone number, it puts brackets around them.
(2) When the field loses focus.
(3) Never. Formatting happens on the server-side only, just before it is inserted into the DB. The user never gets to see how it was formatted unless it is displayed on the site somewhere.
(3b) If there were form errors, or on the confirmation page. If there are form errors and the form needs to be re-displayed, the formatting on the valid inputs will appear, or if you have a confirmation page (are these inputs correct?) they will show there.
(4) Never ever. Data should be dumped into the database as-is and only formatted in the template/view just before it is displayed back to the user.
What do you think? I think I like (2). Reminds me of how code-formatting works in Visual Studio (happens when you close a brace or type a semi-colon).
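For reference, the two transformations mentioned at the top of the question could be sketched like this in Python. The function names are hypothetical, and the rules (a six-character letter/digit postal code, a ten-digit phone number) are assumptions taken from the two examples. Where the function gets called (on keystroke, on blur, or on the server) is the actual question.

```python
import re


def format_postal_code(raw):
    """Return 'A1B 2C3' style text, or None if raw is not letter-digit x3."""
    compact = re.sub(r"[^A-Za-z0-9]", "", raw).upper()
    if not re.fullmatch(r"[A-Z]\d[A-Z]\d[A-Z]\d", compact):
        return None  # leave invalid input for validation to report
    return f"{compact[:3]} {compact[3:]}"


def format_phone(raw):
    """Return '(555) 123-4567' style text, or None if raw lacks exactly 10 digits."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 10:
        return None  # leave invalid input for validation to report
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
```

Both functions accept input that is already formatted, so running them again on their own output leaves it unchanged.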
I like to either filter the field just after it loses focus (when it is critical that the field be formatted correctly before they move on to the next field, which is rare), or filter the field content as soon as the user hits the "SUBMIT" button (or whatever you want to call it) to send the data to the server.
This has a few advantages for me:
The user's input is not interrupted with annoying "auto-corrections" - being auto-corrected can sometimes feel like demonic possession if it is not done well.
The user really neither cares, nor needs to know that you do not want the (,), or -, in your phone number field... so take it out quietly for them. No notes, or instructions needed.
Also, I ALWAYS filter the field values anyway to protect against any kind of code-injection attacks (which are alarmingly easy to pull off if you know what you are doing). I have read about entire databases being compromised because the author did not remove potential SQL markup from submitted data.... it makes me shudder.
It also allows me to check for ALL input errors (if any), or non-filled-out required fields and report a single set of issues to the user at a single time... I have been to sites that give you so many messages while filling out a form it feels a bit like having a nagging relative over your shoulder.
I'd go with either (1) or (2), depending on the kind of input. (1) is probably most user-friendly if done right, but it will be more complex to implement neatly (e.g., what happens if I delete a digit from a hyphenated phone number - or a hyphen?). Go with (1) if you can afford it, otherwise (2).
I follow the same method I use for validation: once on the client side, once on the server side. Whether it happens when the field loses focus or as they type doesn't really matter.
As the user is typing. For instance, when a user is typing a postal code, all letters are instantly capitalized, or after the user types 3 digits of a phone number, it puts brackets around them.
This type of input is excellent for things such as entering serial codes or CD keys for software or games. I notice a lot of people get confused whether or not the code is case sensitive or if they should be inputting the dashes as well.
If you have an iPhone you'll notice when entering a phone number it is also auto formatted with brackets and spaces as you enter it. But this often turns out to be confusing as a partially typed number is not always 'grouped' correctly.
Answer: It all depends on context.