User Names and White-Spaces [closed] - user-input

Over the past many years I have registered on various applications and platforms, hosted online or offline.
Why are white-spaces not allowed in user names, given that spaces are very natural in names and most computing systems can handle them efficiently?
(Many people can raise similar questions about other special characters which are illegal. But their case is far more understandable as they are not even natural to real world naming schemes. And surely!)

One subtle problem with spaces in user names is that the space character is "invisible", and two consecutive spaces may look very similar to a single space. Errors that arise from entering two spaces instead of one can be hard to spot, and this is one reason to disallow spaces altogether.
Some systems may disallow spaces but still allow a non-breaking space. A smart user can use this fact to include a space in their user name.
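A minimal sketch of one way a system could sidestep both problems, assuming it is acceptable to treat any run of whitespace as a single space. The function name normalize_username is hypothetical and not taken from any particular framework:

```python
import unicodedata


def normalize_username(raw: str) -> str:
    """Collapse whitespace so visually identical names compare equal."""
    # NFKC folds compatibility characters; for example the no-break
    # space U+00A0 becomes an ordinary space U+0020.
    folded = unicodedata.normalize("NFKC", raw)
    # split() with no arguments splits on any run of whitespace and
    # discards leading and trailing whitespace.
    return " ".join(folded.split())
```

With this in place, "John  Smith" (two spaces) and "John Smith" typed with a non-breaking space would both be stored as "John Smith".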

I think in reality this is probably one of those conventions that needs to be broken. Most systems now deal with a lot of sophisticated data and routinely process text that includes spaces correctly. I was delighted to discover that FogBugz (another plug) will accept your email address, your username, or your real name, as you entered it, as your username when you log on.
This is simply a convention left over from the days of 8-letter file names and probably also 8-letter user names. I would suggest you allow it in your web app and let the world follow you. :)

I imagine it's because some code somewhere is still processing the input as a set of space-separated parameters, much the way the Windows command prompt handles unquoted file names. For example, if you passed the user name on the command line to an external executable written in C, a name containing one space would arrive in the C application as two arguments.
While this mightn't happen much in practice any more, as with many special characters, I guess it's the reason the restriction is there.
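The splitting problem can be sketched in a few lines. This is only an illustration: the adduser command string is a hypothetical stand-in for whatever external program receives the name, and nothing is actually executed.

```python
import shlex


def naive_tokens(command_line: str) -> list:
    """Split on whitespace, the way an unquoted command line is parsed."""
    return command_line.split()


def build_command(username: str) -> str:
    """Quote the user name so it survives as a single argument.

    "adduser" is a hypothetical command used only for illustration.
    """
    return "adduser " + shlex.quote(username)


def shell_tokens(command_line: str) -> list:
    """Split the way a POSIX shell would, honouring quotes."""
    return shlex.split(command_line)
```

naive_tokens("adduser John Smith") yields three tokens, so the program sees "John" and "Smith" as separate arguments, while shell_tokens(build_command("John Smith")) keeps the name together as one argument.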

The only reason I know of that makes any sense is that if you're parsing tokens on whitespace, putting space in a user name will cause it to fail.
However, I do agree with you: in today's environment, there's probably not a lot of reason to continue doing it, except where legacy compatibility makes sense (*nix, etc).

I think it's because there is a general tendency to trim input field values before they are actually saved, say in a database.
Since we trim off the white spaces, you can imagine there would be a big issue if we allowed them in a password or user name and a user entered a password such as "PWD ".

It prevents confusingly similar name combinations e.g. "John Smith" and "JohnSmith". It also makes it easier to automatically recognize names that appear within text.

It depends where they're going to be used. Not using spaces in unix user names makes sense for the same reason it makes sense not to use them in unix filenames - they're a pain to type at the command line. That said, unix does allow spaces in user names as well as in filenames.
I can see no reason for things like web apps not to allow spaces.
Actually the thing that annoys me most is web apps not allowing @ in user names. When it's something with millions of users, the chances of a name I really want being available are small, so I like to use my email address, which at least is guaranteed to be unique.


What are "dont" and "isnt" in the pretrained GloVe vector files (e.g. glove.6B.50d.txt)?

I found these 2 words "dont" and "isnt" in the vector file glove.6B.50d.txt downloaded from https://nlp.stanford.edu/projects/glove/. I wonder if they were originally "don't" and "isn't". This will likely depend on the sentence_to_word parsing algorithms they used. If someone is familiar, please confirm if this is the case.
A secondary question is whether this is a common way to deal with the apostrophe in words like "don't", "isn't", "hasn't" and so on, i.e. just replace the apostrophe with an empty string so that "don" and "t" become one word.
Finally, I am also not sure whether GloVe comes with an API to do sentence_to_word parsing, so that you can be consistent with what the researchers did originally.
I think dont and isnt really are originally don't and isn't. I have seen a few other such examples. I suspect this is just the specific way GloVe researchers handle this.
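For illustration, the convention described in the question (delete the apostrophe, then split into words) can be sketched as below. This is not the GloVe authors' preprocessing, which is not shown in this thread; strip_apostrophes is a hypothetical helper, and the lowercasing is an assumption based on the glove.6B vocabulary being uncased.

```python
import re


def strip_apostrophes(sentence: str) -> list:
    """Tokenize after deleting apostrophes, so "don't" becomes "dont"."""
    lowered = sentence.lower()
    # Remove both the ASCII apostrophe and the typographic one (U+2019).
    cleaned = lowered.replace("'", "").replace("\u2019", "")
    # Keep runs of letters and digits; punctuation is dropped.
    return re.findall(r"[a-z0-9]+", cleaned)
```

If you tokenize your own text this way before looking words up, "don't" and "isn't" will map to the "dont" and "isnt" rows in the vector file.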

Io language user input

I recently started messing around with the Io programming language and think it's pretty fun and simple to learn. But I also hate that there is so little documentation and support for it. Normally I come to SO for help, but even on here the topic is sparse.
I am learning from the book Seven Languages in Seven Weeks, which I like, but there the author mainly talks about the deeper uses of Io.
My question is probably extremely simple but I can't find an answer anywhere... How do you actually ask a user for input? I've found ways to pass along set strings, read in strings from files, but I can't find a way to ask a user for input.
What I'm working on now is writing a function that accepts 2 parameters: a string and a substring to find in that string. The function finds the substring in the string and prints the index. I don't even know if I should be asking the user for input or doing this another way...
I'm trying to get some keyboard time in on Io but it's frustrating :/
Also, does anyone know of any IRC channels that are friendly to beginners? Not necessarily just Io, but in general?
Thanks guys.
On the topic of IRC, there's irc.freenode.net and the #io channel. We're not always active, but if you hang around, I usually poke in at least once a day.
On the topic of user input, however, you can do this:
x := File standardInput readLine
This will get a single line of input, up to where the user hit the enter/return key, and capture that in x.

Form fields validation: which characters to include / exclude? [closed]

A google search gave me the methods to validate form fields, but I can already construct them. My question is, which are the characters that are safe to include and which are to exclude in a form field? Specifically, username and password.
A brief explanation would be nice too.
Thanks.
You need to exclude all characters that will never appear in your data.
Would it make any sense to accept special characters if your usernames/passwords must contain only alphanumeric characters?
Look at some regex references for Java or for PHP.
There is a regexp reference table which could be useful too.
If you give us more information about the language you are using, we could maybe help you more.
Have a good day!
[UPDATE]
There is the security reference, which is very good, and the OWASP website, which is a real reference for any web-security topic; look at the OWASP Cheat Sheets.
#**Cross-Site Scripting Vulnerabilities?**
#for any programming language, the chars you should reject or handle properly are:
> < ( ) [ ] ' " ; : / |
#for PHP, tools to handle with care:
strip_tags(), utf8_decode(), htmlspecialchars(), strtr()
#do Positive/Negative filtering
#check Encoding
#**SQL Injection ?**
#etc...
[/UPDATE]
If you properly sanitize your input and output, there's nothing you need to be afraid of.
Note: I'm assuming you're using PHP as your server side language.
First, use PDO (or MySQLi) with prepared statements, to eliminate the risk of SQL Injection.
Second, anything that will be displayed on your site should be sanitized against XSS attacks (so that users don't register a username of <script>doSomeEvilStuff()</script>).
That's basically it. If you're really paranoid, use a whitelist (allow only certain characters) rather than a blacklist (disallow only certain characters): someone will always find a way around a blacklist, whereas a whitelist lets through only what you have explicitly permitted.
For usernames, I don't see the need for anything more than /[a-zA-Z0-9_.\s!$%^&*\-+=]/. You may think otherwise. In any case, don't allow /[`<>(){}\[\]]/.
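The whitelist idea and the output-escaping step can be sketched as follows. This is written in Python rather than PHP purely for illustration; the allowed character set and the 3 to 30 length limit are assumptions chosen for the example, not recommendations from the answers above, and both function names are hypothetical.

```python
import html
import re

# Assumed whitelist: letters, digits, underscore, dot, hyphen and space,
# between 3 and 30 characters long.
ALLOWED_USERNAME = re.compile(r"[A-Za-z0-9_.\- ]{3,30}")


def is_valid_username(candidate: str) -> bool:
    """Accept the name only if every character is on the whitelist."""
    return ALLOWED_USERNAME.fullmatch(candidate) is not None


def render_username(username: str) -> str:
    """Escape the name before it is written into an HTML page."""
    return html.escape(username)
```

Validation decides what may be stored; escaping is still applied on output, so a name that slipped through (or was stored before the rule existed) cannot inject markup.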

When should forms be cleaned?

By "cleaned" I mean formatting inputs such as "a1b2c3" into "A1B 2C3" or "5551234567" into "(555) 123-4567". I figure we have a few options:
(1) As the user is typing. For instance, when a user is typing a postal code, all letters are instantly capitalized, or after the user types 3 digits of a phone number, it puts brackets around them.
(2) When the field loses focus.
(3) Never. Formatting happens on the server-side only, just before it is inserted into the DB. The user never gets to see how it was formatted unless it is displayed on the site somewhere.
(3b) If there were form errors, or on the confirmation page. If there are form errors and the form needs to be re-displayed, the formatting on the valid inputs will appear, or if you have a confirmation page (are these inputs correct?) they will show there.
(4) Never ever. Data should be dumped into the database as-is and only formatted in the template/view just before it is displayed back to the user.
What do you think? I think I like (2). Reminds me of how code-formatting works in Visual Studio (happens when you close a brace or type a semi-colon).
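Whichever option is chosen, the cleaning step itself is small. A rough sketch of the two transformations from the question, assuming six-character postal codes and ten-digit phone numbers (the function names are hypothetical, and input that does not fit is returned unchanged so validation can report it):

```python
import re


def format_postal_code(raw: str) -> str:
    """Turn "a1b2c3" into "A1B 2C3"; leave anything else untouched."""
    compact = re.sub(r"[^A-Za-z0-9]", "", raw).upper()
    if len(compact) != 6:
        return raw
    return compact[:3] + " " + compact[3:]


def format_phone(raw: str) -> str:
    """Turn "5551234567" into "(555) 123-4567"; leave anything else untouched."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 10:
        return raw
    return "(" + digits[:3] + ") " + digits[3:6] + "-" + digits[6:]
```

Because both functions strip existing punctuation first, running them again on already formatted input gives the same result, so the same code could run on blur, on submit, and on the server.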
I like to either filter the field just after it loses focus (when it is critical that the field be formatted correctly before they move on to the next field - which is rarely), or I filter the field content as soon as the user hits the "SUBMIT" button (or whatever you want to call it) to send the data to the server.
This has a few advantages for me:
The user's input is not interrupted with annoying "auto-corrections" - being auto-corrected can sometimes feel like demonic possession if it is not done well.
The user really neither cares, nor needs to know that you do not want the (,), or -, in your phone number field... so take it out quietly for them. No notes, or instructions needed.
Also, I ALWAYS filter the field values anyway to protect against any kind of code-injection attacks (which are alarmingly easy to pull off if you know what you are doing). I have read about entire databases being compromised because the author did not remove potential SQL markup from submitted data.... it makes me shudder.
It also allows me to check for ALL input errors (if any), or non-filled-out required fields and report a single set of issues to the user at a single time... I have been to sites that give you so many messages while filling out a form it feels a bit like having a nagging relative over your shoulder.
I'd go with either (1) or (2), depending on the kind of input. (1) is probably most user-friendly if done right, but it will be more complex to implement neatly (e.g., what happens if I delete a digit from a hyphenated phone number - or a hyphen?). Go with (1) if you can afford it, otherwise (2).
I follow the same method I use for validation: once on the client side, once on the server side. Whether it happens when the field loses focus or as they type doesn't really matter.
As the user is typing. For instance, when a user is typing a postal code, all letters are instantly capitalized, or after the user types 3 digits of a phone number, it puts brackets around them.
This type of input is excellent for things such as entering serial codes or CD keys for software or games. I notice a lot of people get confused whether or not the code is case sensitive or if they should be inputting the dashes as well.
If you have an iPhone you'll notice when entering a phone number it is also auto formatted with brackets and spaces as you enter it. But this often turns out to be confusing as a partially typed number is not always 'grouped' correctly.
Answer: It all depends on context.

Do Perl CGI programs have a buffer overflow or script vulnerability for HTML contact forms?

My hosting company says it is possible to fill an HTML form text input field with just the right amount of garbage bytes to cause a buffer overflow/resource problem when used with Apache/HTTP POST to a CGI-Bin Perl script (such as NMS FormMail).
They say a core dump occurs at which point an arbitrary script (stored as part of the input field text) can be run on the server which can compromise the site. They say this isn't something they can protect against in their Apache/Perl configuration—that it's up to the Perl script to prevent this by limiting number of characters in the posted fields. But it seems like the core dump could occur before the script can limit field sizes.
This type of contact form and method is in wide use by thousands of sites, so I'm wondering if what they say is true. Can you security experts out there enlighten me—is this true? I'm also wondering if the same thing can happen with a PHP script. What do you recommend for a safe site contact script/method?
I am not sure about the buffer overflow, but in any case it can't hurt to limit the POST size anyway. Just add the following on top of your script:
use CGI qw/:standard/;
$CGI::POST_MAX=1024 * 100; # max 100K posts
$CGI::DISABLE_UPLOADS = 1; # no uploads
Ask them to provide you with a specific reference to the vulnerability. I am sure there are versions of Apache where it is possible to cause buffer overflows by specially crafted POST requests, but I don't know any specific to NMS FormMail.
You definitely should ask for specifics from your hosting company. There are a lot of unrelated statements in there.
A "buffer overflow" and a "resource problem" are completely different things. A buffer overflow suggests that you will crash perl or mod_perl or httpd themselves. If this is the case, then there is a bug in one of these components, and they should reference the bug in question and provide a timeline for when they will be applying the security update. Such a bug would certainly make Bugtraq.
A resource problem on the other hand, is a completely different thing. If I send you many megabytes in my POST, then I could eat an arbitrary amount of memory. This is resolvable by configuring the LimitRequestBody directive in httpd.conf. The default is unlimited. This has to be set by the hosting provider.
They say a core dump occurs at which point an arbitrary script (stored as part of the input field text) can be run on the server which can compromise the site. They say this isn't something they can protect against in their Apache/Perl configuration—that it's up to the Perl script to prevent this by limiting number of characters in the posted fields. But it seems like the core dump could occur before the script can limit field sizes.
Again, if this is creating a core dump in httpd (or mod_perl), then it represents a bug in httpd (or mod_perl). Perl's dynamic and garbage-collected memory management is not subject to buffer overflows or bad pointers in principle. This is not to say that a bug in perl itself cannot cause this, just that the perl language itself does not have the language features required to cause core dumps this way.
By the time your script has access to the data, it is far too late to prevent any of the things described here. Your script of course has its own security concerns, and there are many ways to trick perl scripts into running arbitrary commands. There just aren't many ways to get them to jump to arbitrary memory locations in the way that's being described here.
FormMail has been vulnerable to such problems in the past, so I believe your ISP was using this to illustrate. Bad practices in any Perl script could lead to such woe.
I recommend ensuring the perl script verifies all user input if possible. Otherwise only use trusted scripts and ensure you keep them updated.