Securely sharing data with BigQuery/ Hashing - hash

I need to share some data I currently have stored in BigQuery, and I would like to do this securely. I note from the documentation that I can hash values. My question is: if I hash a value, how can the recipient read it?
Do I need to send anything along with the hashed data in order for it to be "un-hashed", or can I just tell the recipient which method I used to hash it?

Related

How can I find this hashing algorithm?

Hi, I have a string that is saved as a hash in an Azure SQL DB, but I can't work out which algorithm was used. We want to migrate our users to a Firestore DB, but we apparently need the algorithm for the first login.
Hashed String: VxfCosOIw7PDrsOqw78YwqtCwoxUK8KCwpVkw5LCn0hcf8OgZsKEwpTDqSvDmMOMwql+
Original String: Drag2311
Salt: +2zPSiLUCzdASr3dS1fRrH6vxEAU6/V4kr/73uVmRoo=
I've seen on other posts that people asked for the original string, so I have posted all the relevant information I have, and hope that someone can help me.
EDIT: I have checked the code and couldn't find anything related to hashing, and I am relatively sure that it is server-side encryption. It uses a CMK and a CEK, but I still have a hard time finding a way to look up the configured algorithm.
I have found out that I need to know the keys of the column in MS SQL column encryption, so I either have to contact the people who set it up or Azure Support.
So rather than doing that, I'm going to take a less efficient route: migrating the data of those accounts but not the accounts themselves.
In the Firebase Functions I will define that, when a new account is created, it should check whether the email exists in a JSON object containing the email and user ID. If it exists, it will link the corresponding user ID and its old data to that account, rather than creating a new set of data.
Thanks for your time. I realise I should have done it this way from the start.

How to recalculate private data hash from Hyperledger Fabric

I need to recompute the hash of private data to prove the integrity of the data. According to the documentation, when private data collections are used the private data is stored in side databases (SideDBs) and a hash of the data is stored on the ledger. The question splits into two subquestions:
How to access the hash of the private data?
Which method to use to recompute the hash that is saved on the ledger?
Thanks in advance.
I use Hyperledger Fabric v1.4.2 with private data. I followed marbles example.
I expect to be able to calculate the private data hash and verify that it corresponds to the hash saved in the ledger.
To get the SHA256 hash (using the Fabric 1.4.x contract API), use:
let pdHashBytes = await ctx.stub.getPrivateDataHash(collectionA, readKey);
let actual_hash = pdHashBytes.toString('hex');
You can calculate the hash of the private data that was written, for example on Ubuntu, as shown below:
echo -n "{\"name\":\"Joe\",\"quantity\":999}" |shasum -a 256
and verify that the two hashes match. Those are the mechanics of the private data read and verify pattern. Now let's add information about salting, as mentioned elsewhere in this post.
For most uses of private data, you'll most likely use a random salt so the private data cannot be brute force attacked in the permissioned blockchain network (between agreed parties). The salt is passed along in the same transient field as the private data. And (later on), it will need to be included with the private data itself, when recalculating the private data hash. See https://hyperledger-fabric.readthedocs.io/en/release-1.4/private-data-arch.html#protecting-private-data-content
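As a rough illustration of recomputing a salted private data hash off-chain, here is a minimal Python sketch. It assumes the salt is stored as an extra field inside the private JSON value and that the hash covers the exact stored bytes; the `salt` field name and the `serialize` and `verify` helpers are hypothetical and not part of the Fabric API.

```python
import hashlib
import json
import secrets


def serialize(record: dict) -> bytes:
    """Compact, key-sorted JSON so both parties hash identical bytes."""
    return json.dumps(record, separators=(",", ":"), sort_keys=True).encode("utf-8")


def private_data_hash(value_bytes: bytes) -> str:
    """SHA-256 hex digest of the bytes written to the private collection."""
    return hashlib.sha256(value_bytes).hexdigest()


def verify(record: dict, ledger_hash: str) -> bool:
    """Recompute the hash from the record (salt included) and compare."""
    return private_data_hash(serialize(record)) == ledger_hash


# Same bytes as the shasum example when no salt is present.
unsalted_record = {"name": "Joe", "quantity": 999}

# Hypothetical salted record: the "salt" field is an assumption of this sketch.
salted_record = dict(unsalted_record, salt=secrets.token_hex(16))

# Stand-in for the hex string returned by getPrivateDataHash.
ledger_hash = private_data_hash(serialize(salted_record))

print(verify(salted_record, ledger_hash))    # record with its salt
print(verify(unsalted_record, ledger_hash))  # same record, salt missing
```

The point of the sketch is that whoever recalculates the hash needs the salt as well as the data, which is why the salt travels with the private data.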
Don't use it; private data is a security hole.
It amazes me that nobody has mentioned this before, so I guess I had better point it out now before more damage is done.
The logic behind private data is simple: it puts data in a local embedded data store and puts a hash of that data on the blockchain.
The issue is that cryptographic hash is not an encryption mechanism, same data hashed by anyone using the same hashing algorithm (which is also very standardized) will always get the same hash! This is exactly what hash functions are designed for, and that’s why we use hash in digital signature to allow anyone to validate signed data.
However, this also means anyone can “decrypt” the data behind the hash by using dictionary attack.
Hashing is cheap. The cost of each hash on a normal laptop CPU core is about 3 microseconds, so I can create 1 billion candidate hashes within one hour on a single laptop CPU core and compare them to the hashes on the Hyperledger Fabric DLT.
And I am just talking about using a single core on my laptop, not even 50% of its power.
Why is it dangerous? Because an attacker who is connected to a blockchain system knows the range of the data being hashed (e.g. trade ID, item name, bank name, address, cell phone number), so they can easily mount a dictionary attack to recover the true data behind the hash.
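The dictionary attack described here can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the `trade-NNNN` value format and the 10,000-value range are hypothetical, chosen to stand in for "a known, small range of possible values".

```python
import hashlib
from typing import Optional


def sha256_hex(text: str) -> str:
    """SHA-256 hex digest of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# The hash an observer can see; the value behind it is a hypothetical trade ID.
observed_hash = sha256_hex("trade-4821")


def dictionary_attack(target_hash: str) -> Optional[str]:
    """Hash every candidate in the known value range until one matches."""
    for n in range(10_000):
        candidate = f"trade-{n:04d}"
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None


recovered = dictionary_attack(observed_hash)
print(recovered)
```

Nothing is being decrypted: the attacker simply guesses every plausible input and checks which guess produces the observed hash.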
How about adding salt to each data to be hashed? Well, that’s one thing Hyperledger Fabric didn’t do.
In their defense, Hyperledger didn’t implement salt because it is difficult to pass salts to counterparties. You can’t use the DLT to pass the salt value because attackers would see it, so you have to create another P2P connection with the counterparty. If you need to create a connection with every counterparty, what’s the point of using a blockchain in the first place?
It’s just scary that so many people are using this security hole.

Is it possible for JWT to generate the same token twice?

Is it safe to use only tokens to access or change a user's data in a database?
Imagine I get a request with only a JWT inside it, and I want to change something for only that user using his token, which is stored in the database. How can I be certain that no two users have the same token stored in the database? Do I need to get his username (another piece of data stored in the database) and compare both of them, or is the token alone enough?
Probably™. The chances of two users' details hashing to the same string (a collision) are pretty darn small.
The header will be quite common, depending on the algorithm, but the payload will vary wildly. The signature is a product of the first two, so a collision depends on the payload. This has some info on that (see the accepted answer): https://crypto.stackexchange.com/questions/2558/how-many-rsa-keys-before-a-collision
The simplest answer is that it is possible, but it's very, very unlikely to happen.
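To make the "signature is a product of the first two" point concrete, here is a minimal hand-rolled HS256 token built with only the Python standard library. It is a sketch for illustration, not a replacement for a JWT library; the secret and the claim values are hypothetical.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url encoding without padding, as used by JWT."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_token(payload: dict, secret: bytes) -> str:
    """Build header.payload.signature, signing the first two parts with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":"), sort_keys=True).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"


secret = b"demo-secret"  # hypothetical signing key
alice_1 = make_token({"sub": "alice", "iat": 1700000000}, secret)
alice_2 = make_token({"sub": "alice", "iat": 1700000000}, secret)
bob = make_token({"sub": "bob", "iat": 1700000000}, secret)

print(alice_1 == alice_2)  # identical header, payload and key
print(alice_1 == bob)      # different "sub" claim
```

Because the payload is part of the token itself, two tokens are identical only when their header, payload and key are identical. Putting a per-user claim such as `sub` in the payload therefore keeps different users' tokens distinct without relying on the signature alone.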

Decrypt SHA256 hash to original string?

Is it possible to take an original hash value, and then decode it back to the original string?
hash('sha256', $login_password_for_login) gets me a hash, as shown below, but I'd like to go from the hash value back to the original string.
With $login_password_for_login = 12345, the hash function gives me this:
5994471abb01112afcc18159f6cc74b4f511b99806da59b3caf5a9c173cacfc5
I'd like to be able to retrieve the original number or string that I had for the login password. How do I reverse the hash and get that original string?
You don't 'decrypt' the hashes because hashing is not encryption.
As for undoing the hash function to get the original string, there is no way to go from hash to original item, as hashing is a one-direction action. You can take an item and get a hash, but you can't take the hash and get the original item.
Note that hashes should NOT be confused with encryption; encryption is a different process where you can take an item, encrypt it with some type of key (either a pre-shared symmetric key or asymmetric keys like PGP keys), and then later decrypt it. Hashes do not work that way.
In comments, you indicate that you're trying to save a passcode in the database. The problem is, you don't want someone who can breach the DB to be immediately able to decrypt passcodes, which is why hashing is so attractive.
The idea, then, is that you would consider using salted hashes: store the salt on a per-user basis in the DB as its own record, and store the salted hash of the original password string in the database alongside it.
Then, to verify that a password is entered properly, get the salt from the DB, get the user input for a given password, and then, using the salt from the DB, compute the salted hash for that input. Take that resultant hash and compare it to the salted hash stored in the DB. If they match, you have a validated password; if they don't match, it's invalid.
This way, there's actually no decryption of any passwords readily doable, which means in a data breach situation of your site the passwords are not easily able to be retrieved. (This doesn't rule out someone breaching your database, copying down the data, and trying to brute-force the passwords, but depending on what you enforce for password complexity and the effort a hacker wants to actually go through to get credentials, this is less likely to happen)
I'd write an example of this in a language I understand, but as you don't define what language you're working with, it's not going to be possible for me to write a useful example for you here.
That said, if you're working with PHP, you may find this document on crackstation.net about doing secure salted password hashing properly useful; there are already PHP implementations that do this properly, so you supposedly wouldn't have to write your own code.
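For readers who just want to see the salt-and-compare flow described above, here is a minimal Python sketch (Python is used purely for illustration). It swaps in PBKDF2 from the standard library in place of a single salted hash, which is a choice made for this sketch; the iteration count and the `register` and `verify` names are assumptions, not something the answer prescribes.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 200_000  # assumed work factor for this sketch


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a salted hash from the password using PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def register(password: str) -> Tuple[bytes, bytes]:
    """Create a fresh per-user salt and return (salt, salted_hash) for storage."""
    salt = os.urandom(16)
    return salt, hash_password(password, salt)


def verify(password: str, salt: bytes, stored_hash: bytes) -> bool:
    """Re-hash the input with the stored salt and compare in constant time."""
    return hmac.compare_digest(hash_password(password, salt), stored_hash)


salt, stored_hash = register("12345")
print(verify("12345", salt, stored_hash))
print(verify("54321", salt, stored_hash))
```

Only the salt and the derived hash are stored; the original password never is, and verification works by repeating the computation rather than by reversing it.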
Hashes cannot be decrypted, as they are not encryption.
Although the output of a hash function often looks similar to that of an encryption function, hashing is actually an extremely lossy form of data compression. When I say "extremely lossy", I mean "all of the original data is destroyed in order to get a fixed length." Since none of your original data remains, you cannot decrypt a hash.
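A quick way to see the "fixed length" point is to hash inputs of very different sizes. The sketch below uses Python's `hashlib`; the one-million-byte input is just an arbitrary example, and the reference digest is copied from the question.

```python
import hashlib

# Digest quoted in the question for the input 12345.
digest_from_question = "5994471abb01112afcc18159f6cc74b4f511b99806da59b3caf5a9c173cacfc5"

short_digest = hashlib.sha256(b"12345").hexdigest()
long_digest = hashlib.sha256(b"x" * 1_000_000).hexdigest()

# Both digests have the same length regardless of the input size.
print(len(short_digest), len(long_digest))

# A hash cannot be reversed, but a guess can be checked by hashing it again.
print(short_digest == digest_from_question)
```

Since a five-byte input and a one-million-byte input both collapse to 64 hex characters, the digest cannot contain enough information to rebuild the input.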
That being said, hashes can be used to emulate encryption. What you do is that, when a person registers, you make a tuple containing the hashes of their username and password. Then, when somebody tries to login, you compare the hashes like this*:
import hashlib
from login_info import logins  # A list of (username_hash, password_hash) tuples.

def hasher(string: str) -> str:
    """Return the SHA-256 hex digest of a string (encoded as UTF-8)."""
    return hashlib.sha256(string.encode("utf-8")).hexdigest()

def login(username: str, password: str) -> int:  # Returns 0 if login correct, else 1.
    user = hasher(username)
    pass_hash = hasher(password)  # "pass" is a reserved word in Python
    for stored_user, stored_pass in logins:
        if stored_user == user:
            # Username found: succeed only if the password hash matches too.
            return 0 if stored_pass == pass_hash else 1
    # No entry matched the username.
    return 1
* Nota Bene: I am using Python 3 for the example, as my PHP and Javascript are a little out of practice.
EDIT: On second thought, it is actually possible to (somewhat) reverse a hash. Basically, you take the hash and look it up in a precomputed table of known input/hash pairs (a lookup table or rainbow table) to see if any entry matches. This is why you should always salt password hashes.

Unique identifier for an email

I am writing a C# application which allows users to store emails in a MS SQL Server database. Many times, multiple users will be copied on an email from a customer. If they all try to add the same email to the database, I want to make sure that the email is only added once.
MD5 springs to mind as a way to do this. I don't need to worry about malicious tampering, only to make sure that the same email will map to the same hash and that no two emails with different content will map to the same hash.
My question really boils down to how one would combine multiple fields into one MD5 (or other) hash value. Some of these fields will have a single value per email (e.g. subject, body, sender email address) while others will have multiple values (varying numbers of attachments, recipients). I want to develop a way of uniquely identifying an email that will be platform and language independent (not based on serialization). Any advice?
What volume of emails do you plan on archiving? If you don't expect the archive to require many terabytes, I think this is a premature optimization.
Since each field can be represented as a string or array of bytes, it doesn't matter how many values it contains, it all looks the same to a hash function. Just hash them all together and you will get a unique identifier.
EDIT: Pseudocode example
# initialize the hash object
hash = md5()
# feed each field into the hash
hash.update(from_str)
hash.update(to_str)
hash.update(cc_str)
hash.update(body_str)
hash.update(...) # the rest of the email fields
# compute the identifier string
id = hash.hexdigest()
You will get the same output if you replace all the update calls with
# concatenate all fields and hash
hash.update(from_str + to_str + cc_str + body_str + ...)
How you extract the strings and interface will vary based on your application, language, and api.
It doesn't matter that different email clients might produce different formatting for some of the fields when given the same input, this will give you a hash unique to the original email.
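The claim that a chain of update calls gives the same result as hashing the concatenation can be checked with Python's `hashlib`. The field values below are made up for the sketch.

```python
import hashlib

# Hypothetical field values: from, to, cc, body.
fields = ["alice@example.com", "bob@example.com", "", "Hello Bob"]

# Feed the fields into the hash one at a time.
incremental = hashlib.md5()
for field in fields:
    incremental.update(field.encode("utf-8"))

# Hash the concatenation of all fields in one call.
concatenated = hashlib.md5("".join(fields).encode("utf-8"))

print(incremental.hexdigest() == concatenated.hexdigest())
```

The hash only ever sees one continuous byte stream, so the boundaries between fields are not part of what gets hashed.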
Have you looked at some other headers like (in my mail, OS X Mail):
X-Universally-Unique-Identifier: 82d00eb8-2a63-42fd-9817-a3f7f57de6fa
Message-Id: <EE7CA968-13EB-47FB-9EC8-5D6EBA9A4EB8@example.com>
The Message-Id, at least, is expected on practically every message (RFC 5322 says a message SHOULD have one). That field could well be the same for the same mailing (sent to multiple recipients). That would be more effective than hashing.
Not the answer to the question, but maybe the answer to the problem :)
Why not just hash the raw message? It already encodes all the relevant fields except the envelope sender and recipient, and you can add those as headers yourself, before hashing. It also contains all the attachments, the entire body of the message, etc, and it's a natural and easy representation. It also doesn't suffer from the easily generated hash collisions of mikerobi's proposal.
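The "easily generated hash collisions" mentioned here come from plain concatenation: moving a character from one field to the next leaves the concatenated bytes unchanged. The Python sketch below shows this, along with one possible mitigation (length-prefixing each field). The mitigation is an addition for illustration and is not proposed in any of the answers; `naive_id` and `framed_id` are hypothetical names.

```python
import hashlib
from typing import List


def naive_id(fields: List[str]) -> str:
    """Hash the fields back to back, with no separation between them."""
    h = hashlib.md5()
    for field in fields:
        h.update(field.encode("utf-8"))
    return h.hexdigest()


def framed_id(fields: List[str]) -> str:
    """Prefix each field with its byte length so field boundaries are hashed too."""
    h = hashlib.md5()
    for field in fields:
        data = field.encode("utf-8")
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


email_a = ["ab", "c"]  # e.g. subject "ab", body "c"
email_b = ["a", "bc"]  # different fields, same concatenation "abc"

print(naive_id(email_a) == naive_id(email_b))
print(framed_id(email_a) == framed_id(email_b))
```

Hashing the raw message, as suggested above, avoids the problem in a different way: the message format already delimits its own headers and body.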