XMPP DIGEST-MD5: algorithm to compute the response

I'm reading up on the XMPP authentication flow. As I understand it, the response is calculated on the client using an algorithm that goes like this:
Create a string of the form "username:realm:password". Call this string X.
Compute the 16 octet MD5 hash of X. Call the result Y.
Create a string of the form "Y:nonce:cnonce:authzid". Call this string A1.
Create a string of the form "AUTHENTICATE:digest-uri". Call this string A2.
Compute the 32 hex digit MD5 hash of A1. Call the result HA1.
Compute the 32 hex digit MD5 hash of A2. Call the result HA2.
Create a string of the form "HA1:nonce:nc:cnonce:qop:HA2". Call this string KD.
Compute the 32 hex digit MD5 hash of KD. Call the result Z.
The result Z is then sent in the response field of the challenge response.
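The steps above can be sketched in Python as follows. This is a minimal sketch rather than a reference implementation: it assumes every field is encoded as UTF-8 (RFC 2831 has additional charset rules that are ignored here), and the function name `digest_md5_response` is hypothetical.

```python
import hashlib


def digest_md5_response(username, realm, password, nonce, cnonce,
                        nc, qop, digest_uri, authzid=None):
    """Compute the DIGEST-MD5 response value following the listed steps."""
    # X = "username:realm:password"; Y = the raw 16-octet MD5 of X.
    y = hashlib.md5(f"{username}:{realm}:{password}".encode("utf-8")).digest()
    # A1 = Y ":" nonce ":" cnonce [":" authzid]. Y is raw bytes here, not hex.
    a1 = y + f":{nonce}:{cnonce}".encode("utf-8")
    if authzid is not None:
        a1 += f":{authzid}".encode("utf-8")
    # A2 = "AUTHENTICATE:" digest-uri (the form used when qop is "auth").
    a2 = f"AUTHENTICATE:{digest_uri}".encode("utf-8")
    ha1 = hashlib.md5(a1).hexdigest()  # 32 hex digits
    ha2 = hashlib.md5(a2).hexdigest()  # 32 hex digits
    kd = f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}".encode("utf-8")
    return hashlib.md5(kd).hexdigest()  # Z, the value of the response field


# Example call; the password "secret" is made up for illustration.
z = digest_md5_response("rob", "cataclysm.cx", "secret",
                        "OA6MG9tEQGm2hh", "OA6MHXh6VqTrRk",
                        "00000001", "auth", "xmpp/cataclysm.cx")
```

Note that Y enters A1 as 16 raw bytes, while HA1 and HA2 enter KD as hex strings; mixing the two up is a common reason a computed response does not match.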
An example of a decoded challenge response is:
username="rob",realm="cataclysm.cx",nonce="OA6MG9tEQGm2hh",cnonce="OA6MHXh6VqTrRk",nc=00000001,qop=auth,digest-uri="xmpp/cataclysm.cx",response=d388dad90d4bbd760a152321f2143af7,charset=utf-8,authzid="rob@cataclysm.cx/myResource"
My concern is that I haven't seen anywhere how that response field is used on the server. Is there any use case that really uses it? Can someone please point to references, insights, or implementations that show how the response field is used and how it actually plays a role in determining the authenticity of the user?
Thanks!

I'm currently working on the DIGEST-MD5 authentication mechanism, using RFC 2831 for reference. The answer to your question is that the server generates the response again on its side and compares it with the response field received from the client. If they match, the client is authenticated; otherwise authentication fails.
The reason for this comparison is that a hashed string cannot be reversed to its source string, so the server cannot recover anything from the response itself. Instead it computes the hash from the credentials it has stored, and the two values are equal only if the client used the same source string.
If my answer does not clarify your doubt, let me know.
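The server-side check described in this answer might look roughly like the sketch below. It assumes the server stores Y (the raw MD5 of "username:realm:password") instead of the plaintext password, which RFC 2831 mentions as an option; the names `expected_response` and `verify` and the dict layout of `fields` are hypothetical. A real server would also confirm that the nonce is one it issued and that nc has not been reused, which is omitted here.

```python
import hashlib
import hmac


def expected_response(stored_y, fields):
    """Recompute the response from the stored secret and the client's fields."""
    a1 = stored_y + f":{fields['nonce']}:{fields['cnonce']}".encode("utf-8")
    if fields.get("authzid"):
        a1 += f":{fields['authzid']}".encode("utf-8")
    a2 = f"AUTHENTICATE:{fields['digest-uri']}".encode("utf-8")
    ha1 = hashlib.md5(a1).hexdigest()
    ha2 = hashlib.md5(a2).hexdigest()
    kd = ":".join([ha1, fields["nonce"], fields["nc"],
                   fields["cnonce"], fields["qop"], ha2])
    return hashlib.md5(kd.encode("utf-8")).hexdigest()


def verify(stored_y, fields):
    """Return True only if the client's response matches the server's own value."""
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(expected_response(stored_y, fields),
                               fields["response"])
```

The server never "decodes" the response: it only repeats the client's computation and checks for equality.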

How to implement MongoDB ObjectId validation from scratch?

I'm developing a front-end app where I want to support searching data by the id, so I'm going to have an "object id" field. I want to validate the object id to make sure it's a valid MongoDB ObjectId before sending it to the API.
So I searched for how to do it and found this thread, where all the answers suggest using an implementation provided by a MongoDB driver or ORM such as mongodb or mongoose. However, I don't want to go that way, because I don't want to install an entire database driver/ORM in my front-end app just to use some id validation. I'd rather implement the validation myself.
Unfortunately, I couldn't find an existing implementation. Then I tried checking the ObjectId spec and implementing the validation myself, but that didn't work out either.
The specification says...
The ObjectID BSON type is a 12-byte value consisting of three different portions (fields):
a 4-byte value representing the seconds since the Unix epoch in the highest order bytes,
a 5-byte random number unique to a machine and process,
a 3-byte counter, starting with a random value.
Which doesn't make much sense to me. When it says the ObjectId has 12 bytes, it makes me think that the string representation is going to have 12 characters (1 byte = 1 char), but it doesn't. Most object ids have 24 characters.
Finally, I searched mongodb's and mongoose's source code, but I didn't have much luck with that either. The best I could do was find this line of code, but I don't know where to go from there.
TL;DR: What is the actual algorithm to check if a given string is a valid MongoDB Object Id?
Your find is correct, you just stopped too early. The isValid comes from the underlying bson library: https://github.com/mongodb/js-bson/blob/a2a81bc1bc63fa5bf3f918fbcaafef25aca2df9d/src/objectid.ts#L297
And yes, you got it right: there is not much to validate. Any 12 bytes can be an ObjectId. The reason you see 24 characters is that not all 256 byte values map to printable characters, so the ObjectId is usually presented in hex format, 2 characters per byte. The regexp to validate the hex representation of 12 bytes would be /^[0-9a-f]{24}$/i (the ^ and $ anchors make it match the whole string rather than any 24-character substring).
TL;DR: check the constructor of ObjectId in the bson library for the official validation algorithm
Hint: you don't need most of it, as you are limited to string input on frontend.
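For string input, the check boils down to "exactly 24 hex characters". A minimal sketch follows, written in Python for illustration only (the same regular expression carries over to JavaScript on the front end); the helper names `is_valid_object_id` and `object_id_timestamp` are hypothetical and are not part of the bson library.

```python
import re

# Exactly 24 hex characters; fullmatch() requires the whole string to match.
_OBJECT_ID_HEX = re.compile(r"[0-9a-fA-F]{24}")


def is_valid_object_id(value):
    """True if value is a 24-character hex string (12 bytes, 2 chars per byte)."""
    return isinstance(value, str) and _OBJECT_ID_HEX.fullmatch(value) is not None


def object_id_timestamp(value):
    """Seconds since the Unix epoch, read from the 4 highest-order bytes."""
    if not is_valid_object_id(value):
        raise ValueError("not a 24-character hex string")
    return int.from_bytes(bytes.fromhex(value)[:4], "big")
```

The timestamp helper only illustrates the layout quoted from the specification; it is not needed for validation, since any 12 bytes are acceptable.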

GRPC test client GUI that supports representing a bytes type as a hex string?

MongoDB's ObjectID type is a 12 byte array. When you view the database, you can see it displayed as: ObjectId("6000e9720C683f1b8e638a49").
We also want to share this value with SQL server and pass it into a GRPC request.
When the same value stored in MS SQL server as a binary(12) column, it is displayed as: 0x6000E9720C683F1B8E638A49. It's simple enough to convert this representation to the Mongo representation.
However, when trying to pass it via GRPC as a bytes type, BloomRPC requires that we represent it in the format: "id": [96,0,233,114,12,104,63,27,142,99,138,73]
So I'm looking for a GRPC test client GUI application to replace BloomRPC that will support a hex string format similar to MongoDB or SQL server to represent the underlying byte array. Anyone have a line on something like this that could work?
We could just represent it as a string in the proto, but my personal opinion is that it should be unnecessary to do this. It will require our connected services to convert bytes->string->bytes on every GRPC call. The other 2 tools seem to be happy having a byte array in the background and representing it as a string in the front end, so if we could just get our last tool to behave the same, that would be great.
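This does not answer the tooling question, but as a stopgap the conversion between the hex form and the decimal array that BloomRPC expects is short enough to script. A minimal sketch, with hypothetical helper names:

```python
def hex_to_byte_list(hex_string):
    """Convert '0x6000E9...' or '6000e9...' into a list of byte values."""
    cleaned = hex_string[2:] if hex_string.lower().startswith("0x") else hex_string
    return list(bytes.fromhex(cleaned))


def byte_list_to_hex(values):
    """Convert a list of byte values back into a lowercase hex string."""
    return bytes(values).hex()


print(hex_to_byte_list("0x6000E9720C683F1B8E638A49"))
# [96, 0, 233, 114, 12, 104, 63, 27, 142, 99, 138, 73]
```

The output matches the array shown in the BloomRPC example above, so the three representations all describe the same 12 bytes.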

utf-8 gets stored differently on server side (JAVA)

I'm trying to figure out the answer to one of my other questions, but maybe this will help me.
When I persist an entity to the server, the byte[] property holds different information than what I persisted. I'm persisting in UTF-8 to the server.
An example.
{"name":"asd","image":[91,111,98,106,101,99,116,32,65,114,114,97,121,66,117,102,102,101,114,93],"description":"asd"}
This is the payload I send to the server.
This is what the server has
{"id":2,"name":"asd","description":"asd","image":"W29iamVjdCBBcnJheUJ1ZmZlcl0="}
As you can see, the image byte array is different.
What I'm trying to do is get the image bytes saved on the server and display them on the front end, but I don't know how to get the original bytes.
No, you are wrong. Both payloads hold the same ASCII string: [object ArrayBuffer].
You are confusing the data with its representation. The data is the same, but in the two examples the binary data is represented in two different ways:
the first as an array of bytes (in decimal), the second in Base64, a classic representation for binary data (the trailing = character gives it away).
So you just have different representations of the same data; the data itself is stored in the same manner.
You may need to show how you turn the binary data into the form you send (as in your example), since that is where the actual content is decided.
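The claim is easy to check by decoding both representations. A short sketch in Python, using only the two payloads quoted in the question:

```python
import base64

# The decimal array sent by the client.
sent = bytes([91, 111, 98, 106, 101, 99, 116, 32, 65, 114,
              114, 97, 121, 66, 117, 102, 102, 101, 114, 93])

# The Base64 string returned by the server.
stored = base64.b64decode("W29iamVjdCBBcnJheUJ1ZmZlcl0=")

print(sent == stored)        # True
print(sent.decode("ascii"))  # [object ArrayBuffer]
```

Both forms decode to the same 20 bytes, and those bytes spell the literal text [object ArrayBuffer] rather than image data, so the bytes were already wrong before they were sent.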

Decrypt SHA256 hash to original string?

Is it possible to take an original hash value, and then decode it back to the original string?
hash('sha256', $login_password_for_login) gets me a hash, as shown below, but I'd like to go from the hash value back to the original string.
With $login_password_for_login = 12345, the hash function gives me this:
5994471abb01112afcc18159f6cc74b4f511b99806da59b3caf5a9c173cacfc5
I'd like to be able to retrieve the original number or string that I had for the login password. How do I reverse the hash and get that original string?
You don't 'decrypt' the hashes because hashing is not encryption.
As for undoing the hash function to get the original string, there is no way to go from hash to original item, as hashing is a one-direction action. You can take an item and get a hash, but you can't take the hash and get the original item.
Note that hashes should NOT be confused with encryption; encryption is a different process where you take an item, encrypt it with some type of key (either a preshared symmetric key or an asymmetric key pair such as PGP keys), and later decrypt it. Hashes do not work that way.
In comments, you indicate that you're trying to save a passcode in the database. The problem is that you don't want someone who breaches the DB to be able to decrypt passcodes immediately, which is why hashing is so attractive.
The idea, then, is to use salted hashes: generate a salt per user, store that salt in the DB as its own field, and store the salted hash of the original password string alongside it.
Then, to verify that a password was entered properly, get the salt from the DB, take the user's input for the password, and compute the salted hash of that input using the stored salt. Compare the resulting hash to the salted hash stored in the DB. If they match, you have a validated password; if they don't match, it's invalid.
This way, no passwords are stored in a form that can be readily decrypted, which means that in a data breach of your site the passwords cannot easily be retrieved. (This doesn't rule out someone breaching your database, copying the data, and trying to brute-force the passwords, but depending on the password complexity you enforce and the effort an attacker is willing to spend to get credentials, this is less likely to succeed.)
I'd write an example of this in a language I understand, but as you don't define what language you're working with, it's not going to be possible for me to write a useful example for you here.
That said, if you're working with PHP, you may find this document on crackstation.net about doing secure salted password hashing properly useful; there are already PHP implementations that do this properly, so supposedly you wouldn't have to write your own code.
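For readers not tied to PHP, the salt-and-compare flow described in this answer can be sketched in Python. Note one substitution: this sketch uses PBKDF2 (a deliberately slow, salted construction from the standard library) rather than a single salted SHA-256. The function names and the iteration count are assumptions chosen for illustration, not recommendations from the answer.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; tune for your hardware


def hash_password(password, salt=None):
    """Return (salt, digest); a fresh random salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, stored_digest):
    """Re-hash the input with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored_digest)


# At registration: store both values in the user's record.
salt, stored = hash_password("12345")
# At login: recompute with the stored salt and compare.
print(verify_password("12345", salt, stored))  # True
print(verify_password("54321", salt, stored))  # False
```

Only the salt and the digest are stored; the original password never needs to be recovered to check a login.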
Hashes cannot be decrypted, as they are not encryption.
Although the output of a hash function often looks similar to that of an encryption function, hashing is actually an extremely lossy form of data compression. When I say "extremely lossy", I mean "all of the original data is destroyed in order to get a fixed length." Since none of your original data remains, you cannot decrypt a hash.
That being said, hashes can be used to emulate encryption. What you do is that, when a person registers, you make a tuple containing the hashes of their username and password. Then, when somebody tries to login, you compare the hashes like this*:
import hashlib
from login_info import logins  # A list of (username_hash, password_hash) tuples.

def hasher(string: str) -> str:
    # hashlib only accepts bytes, so encode the string first.
    return hashlib.sha256(string.encode("utf-8")).hexdigest()

def login(username: str, password: str) -> int:  # Returns 0 if login correct, else 1.
    user = hasher(username)
    pass_hash = hasher(password)  # "pass" is a reserved word in Python.
    for stored_user, stored_pass in logins:
        if stored_user == user:
            # Found the user; the login succeeds only if the password hash matches.
            return 0 if stored_pass == pass_hash else 1
    # No entry matched the username.
    return 1
* Nota Bene: I am using Python 3 for the example, as my PHP and Javascript are a little out of practice.
EDIT: On second thought, it is actually possible to (somewhat) reverse a hash by guessing. Basically, you take the hash, then hash each candidate input (or look the hash up in a precomputed table of known hashes) and check whether the result matches. This is why you should always salt password hashes.
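To make the guessing attack concrete, here is a small sketch that recovers a short numeric password from an unsalted SHA-256 hash by trying every candidate. The function name `crack_numeric` is hypothetical, and the search space is deliberately tiny (numbers without leading zeros, up to 5 digits).

```python
import hashlib


def crack_numeric(target_hex, max_digits=5):
    """Try every number below 10**max_digits; return the match or None."""
    for n in range(10 ** max_digits):
        candidate = str(n)
        digest = hashlib.sha256(candidate.encode("ascii")).hexdigest()
        if digest == target_hex:
            return candidate
    return None


target = hashlib.sha256(b"12345").hexdigest()
print(crack_numeric(target))  # 12345
```

Nothing is decrypted here: the attacker simply repeats the forward computation until the outputs agree. With a per-user salt, a precomputed table of hashes no longer applies and every user has to be attacked separately.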

Unique identifier for an email

I am writing a C# application which allows users to store emails in a MS SQL Server database. Many times, multiple users will be copied on an email from a customer. If they all try to add the same email to the database, I want to make sure that the email is only added once.
MD5 springs to mind as a way to do this. I don't need to worry about malicious tampering, only to make sure that the same email will map to the same hash and that no two emails with different content will map to the same hash.
My question really boils down to how one would combine multiple fields into one MD5 (or other) hash value. Some of these fields will have a single value per email (e.g. subject, body, sender email address) while others will have multiple values (varying numbers of attachments, recipients). I want to develop a way of uniquely identifying an email that will be platform and language independent (not based on serialization). Any advice?
What volume of emails do you plan on archiving? If you don't expect the archive to require many terabytes, I think this is a premature optimization.
Since each field can be represented as a string or array of bytes, it doesn't matter how many values it contains, it all looks the same to a hash function. Just hash them all together and you will get a unique identifier.
EDIT: Pseudocode example
# initialize the hash object
hash = md5()
# feed each field into the running hash
hash.update(from_str)
hash.update(to_str)
hash.update(cc_str)
hash.update(body_str)
hash.update(...) # the rest of the email fields
# compute the identifier string
id = hash.hexdigest()
You will get the same output if you replace all the update calls with
# concatenate all fields and hash
hash.update(from_str + to_str + cc_str + body_str + ...)
How you extract the strings and interface will vary based on your application, language, and api.
It doesn't matter that different email clients might produce different formatting for some of the fields when given the same input; this will give you a hash unique to the original email.
Have you looked at some other headers like (in my mail, OS X Mail):
X-Universally-Unique-Identifier: 82d00eb8-2a63-42fd-9817-a3f7f57de6fa
Message-Id: <EE7CA968-13EB-47FB-9EC8-5D6EBA9A4EB8@example.com>
The Message-Id is at least strongly recommended (RFC 5322 says every message SHOULD have one), so in practice it is almost always present. That field could well be the same for the same mailing (sent to multiple recipients). That would be more effective than hashing.
Not the answer to the question, but maybe the answer to the problem :)
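The question targets C#, but the idea is language independent; here is a minimal sketch in Python using the standard library's email parser. The function name `email_key` and the "id:" / "sha256:" prefixes are hypothetical, and falling back to a hash of the raw message when the header is missing is an assumption added for the sketch.

```python
import hashlib
from email import message_from_string


def email_key(raw_message):
    """Prefer the Message-Id header; fall back to a hash of the raw text."""
    msg = message_from_string(raw_message)
    message_id = msg.get("Message-ID")  # header lookup is case-insensitive
    if message_id:
        return "id:" + message_id.strip()
    digest = hashlib.sha256(raw_message.encode("utf-8")).hexdigest()
    return "sha256:" + digest


raw = "Message-Id: <abc123@example.com>\nSubject: hello\n\nBody text\n"
print(email_key(raw))  # id:<abc123@example.com>
```

Whether two copies of a message share a Message-Id depends on how the sender's software delivered them, so it is worth confirming against real samples before relying on it as the only key.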
Why not just hash the raw message? It already encodes all the relevant fields except the envelope sender and recipient, and you can add those as headers yourself, before hashing. It also contains all the attachments, the entire body of the message, etc, and it's a natural and easy representation. It also doesn't suffer from the easily generated hash collisions of mikerobi's proposal.
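The collision concern is that plain concatenation loses the boundaries between fields, so different field values can produce the same input to the hash. The sketch below shows the problem and one common remedy, prefixing each field with its length. The function names are hypothetical, and SHA-256 is used in place of MD5 purely as a choice for the sketch.

```python
import hashlib


def naive_id(fields):
    """Hash the fields back to back, with nothing marking where each ends."""
    h = hashlib.sha256()
    for field in fields:
        h.update(field.encode("utf-8"))
    return h.hexdigest()


def length_prefixed_id(fields):
    """Hash each field preceded by its byte length, so boundaries are kept."""
    h = hashlib.sha256()
    for field in fields:
        data = field.encode("utf-8")
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


print(naive_id(["ab", "c"]) == naive_id(["a", "bc"]))                      # True
print(length_prefixed_id(["ab", "c"]) == length_prefixed_id(["a", "bc"]))  # False
```

Hashing the raw message sidesteps the issue because the message format already delimits its own headers and body.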