I am learning the design of the KDC and see that the protocol needs three rounds of information exchange. The TGT step looks redundant to me, since the KDC could just send the service ticket in the first round.
So why does the second round exist? What is the point of the TGT exchange?
It's not unnecessary. It's there as a long term optimization.
With Kerberos you have the two flows between the KDC and client:
AS-REQ: Exchanges a human-supplied credential (e.g. password, certificate, etc.) for a ticket.
TGS-REQ: Exchanges a KDC-supplied ticket for another ticket.
The AS-REQ can request any ticket it wants. In practice it only requests krbtgt. The AS-REQ is designed to evaluate the used credential, look up the identity in the backing directory, apply any policy, and whatever else the KDC thinks is actually an expensive operation. Credential verification/derivation/etc. can be an expensive operation. Querying the directory for things like (say in Active Directory's case) group membership is incredibly expensive. This is expensive for the client because it's most likely always doing key derivation, and it's expensive for the KDC because it's always going to query the directory.
If you ask for krbtgt you unlock access to the TGS-REQ flow.
The TGS-REQ flow verifies the krbtgt, looks up the requested service in the directory, and copies the internal contents of the krbtgt ticket into the requested service ticket. That is orders of magnitude faster because it skips most of the stuff that happened in AS-REQ flow. It does still query the directory, but that's cheap compared to everything else. The client doesn't do any key derivation now.
More importantly now you don't need to keep the long term credential in memory anymore because you have the TGT.
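To make the cost difference concrete, here is a toy sketch of the two flows. It is not real Kerberos: tickets here are plain JSON with an HMAC tag instead of encrypted blobs, the client and KDC are collapsed into one process, and KDC_KEY, SALT, DIRECTORY and the function names are all made up for illustration. What it does show is that the slow key derivation and the directory lookup happen once in as_req, while tgs_req only verifies the TGT and copies its contents.

```python
import hashlib
import hmac
import json
import os

# Toy model only: a "ticket" is JSON plus an HMAC tag made with a KDC-only key.
KDC_KEY = os.urandom(32)                               # hypothetical krbtgt key
SALT = b"EXAMPLE.COMalice"                             # hypothetical salt
DIRECTORY = {"alice": {"groups": ["staff", "vpn"]}}    # stand-in for the directory
LONG_TERM_KEYS = {
    "alice": hashlib.pbkdf2_hmac("sha256", b"correct horse", SALT, 100_000)
}


def _seal(body: dict) -> dict:
    raw = json.dumps(body, sort_keys=True).encode()
    return {"body": body, "tag": hmac.new(KDC_KEY, raw, hashlib.sha256).hexdigest()}


def _verify(ticket: dict) -> dict:
    raw = json.dumps(ticket["body"], sort_keys=True).encode()
    expected = hmac.new(KDC_KEY, raw, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode(), ticket["tag"].encode()):
        raise PermissionError("ticket was not issued by this KDC")
    return ticket["body"]


def as_req(user: str, password: bytes) -> dict:
    """Expensive path: key derivation plus the directory lookup, done once."""
    derived = hashlib.pbkdf2_hmac("sha256", password, SALT, 100_000)
    if not hmac.compare_digest(derived, LONG_TERM_KEYS[user]):
        raise PermissionError("bad credential")
    groups = DIRECTORY[user]["groups"]
    return _seal({"user": user, "groups": groups, "service": "krbtgt"})


def tgs_req(tgt: dict, service: str) -> dict:
    """Cheap path: verify the TGT and copy its contents into a service ticket."""
    body = _verify(tgt)
    if body["service"] != "krbtgt":
        raise PermissionError("not a TGT")
    return _seal({**body, "service": service})


tgt = as_req("alice", b"correct horse")            # password needed only here
http_ticket = tgs_req(tgt, "HTTP/web.example.com") # no password, no key derivation
```

After the first call the password can be dropped from memory; every further service ticket is obtained from the TGT alone.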
When building distributed systems, you must ensure that the client and the server eventually end up with a consistent view of the data they operate on, i.e. they never get out of sync. Extra care is needed because the network cannot be considered reliable: after a network failure the client never knows whether the operation succeeded, and may decide to retry the call.
Consider a microservice that exposes a simple CRUD API to an unbounded set of clients, some maintained in-house by the same team, some by other teams, and some by other companies.
In this example, a client requests the creation of a new entity, which the microservice successfully creates and persists, but the network fails and the client connection times out. The client will most probably retry, unknowingly persisting the same entity a second time. Here is one possible solution I came up with:
Use client-generated identifier to prevent duplicate post
This could mean the primary key as is, the client half of a composite key generated jointly by client and server, or a token issued by the service. The service would either persist the entity or reply with an OK message if an entity with that identifier is already present.
But there is more to this: what if the client gives up after the network failure (although the entity got persisted), mutates its internal view of the entity, and later decides to persist it in the service with the same ID? At this point, and in general, would it be reasonable for the service to just silently:
Update the existing entity with the state that client posted
Or should the service answer with a more specific status code describing what happened? The point is that the developer of the service cannot really influence the clients' design decisions.
So, what are some sensible practices to keep the state consistent across distributed systems and avoid the most common pitfalls in the case of network and system failure?
There are some things that you can do to minimize the impact of the client-server out-of-sync situation.
The first measure that you can take is to let the client generate the entity IDs, for example by using GUIDs. This prevents the server from generating a new entity every time the client retries a CreateEntityCommand.
In addition, you can make the command handling idempotent. This means that if the server receives a second CreateEntityCommand, it just silently ignores it (i.e. it does not throw an exception). This depends on the use case; some commands cannot be made idempotent (like updateEntity).
Another thing that you can do is to de-duplicate commands. This means that every command that you send to a server must be tagged with a unique ID. This can also be a GUID. When the server receives a command with an ID that it has already processed, it ignores the command and gives a positive response (i.e. 200), maybe including some meta-information about the fact that the command was already processed. The command de-duplication can be placed on top of the stack, as a separate layer, independent of the domain (i.e. in front of the Application layer).
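A minimal sketch of these measures together, using an in-memory stand-in for the service. EntityService, create_entity and the response fields are illustrative names, not part of any framework; a real service would persist both dictionaries.

```python
import uuid


class EntityService:
    """In-memory stand-in for the microservice; all names are illustrative."""

    def __init__(self):
        self.entities = {}      # entity_id -> stored state
        self.processed = set()  # command ids that were already handled

    def create_entity(self, command_id: str, entity_id: str, state: dict) -> dict:
        # De-duplication layer: a command id seen before is acknowledged, not re-run.
        if command_id in self.processed:
            return {"status": 200, "id": entity_id, "duplicate": True}
        self.processed.add(command_id)
        # Idempotent handling: an already existing entity id is not an error.
        if entity_id in self.entities:
            return {"status": 200, "id": entity_id, "duplicate": False}
        self.entities[entity_id] = dict(state)
        return {"status": 201, "id": entity_id, "duplicate": False}


service = EntityService()
entity_id = str(uuid.uuid4())    # generated once by the client
command_id = str(uuid.uuid4())   # generated once per logical command, reused on retry

first = service.create_entity(command_id, entity_id, {"name": "widget"})
retry = service.create_entity(command_id, entity_id, {"name": "widget"})
```

The important detail on the client side is that both IDs are generated before the first attempt and reused for every retry.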
There are some RESTful APIs, as follows:
api/v1/billing/invoices/{invoiceNumber}
api/v1/billing/transactions/{transactionNumber}
And each invoice or transaction belongs to a specific account.
When implementing the RESTful APIs, we must ensure that each account can only view its own invoices and transactions.
How should we isolate the data in RESTful APIs?
Of course, we can pass the account number to the API, such as:
api/v1/billing/invoices/{invoiceNumber}?accountNumber=XXX
api/v1/billing/{accountNumber}/invoices/{invoiceNumber}
But the invoice number already uniquely identifies a resource, so I do not want to complicate things.
Is there any other way to solve this problem?
You are mixing a lot of things here.
This is not a REST problem, this is a security problem. More precisely, it's an OWASP Top 10 2013 Insecure Direct Object Reference vulnerability.
Let's make it simple: you have a URL like this
.../superSensitiveStuff/1
and you want to prevent the owner of "1" from accessing ".../superSensitiveStuff/2"
To the best of my knowledge, there are three ways of dealing with this issue:
enforcing integrity in request URLs. This strategy does not apply to all cases: it only works in those scenarios where the client issues a request to a resource previously communicated by the server. In this case, the server may add a query param like this
.../superSensitiveStuff/1?sec=HMAC(.../superSensitiveStuff/1)
where HMAC is a keyed-hash message authentication code, computed with a secret key known only to the server. If the parameter is missing, the server drops the request; if it is present, the server can verify that the URL is exactly the one it authorized, because the HMAC value guarantees its integrity (for additional info, see the link above).
using unpredictable references. The problem here is that a user can guess another ID: "uhmm... I have resource number 1, let me check whether resource number 2 exists". If you drop sequences and move to long random numbers, this becomes very hard to do. The resource will become
.../superSensitiveStuff/195A23FR3548...32OT465
This is good because it's effective and cheap.
exploiting a mixed RBAC-ABAC approach. RBAC stands for Role-Based Access Control, and this is what you are using. The leading A of the second acronym stands for Attribute. This means that access is granted on the basis of a user role and an attribute. In this case the attribute is the userId, since the user must be authenticated to access private resources. In a few words, when a user requests a specific .../superSensitiveStuff resource, it is loaded from the repository where you keep the ownership information for that resource. It could be a DB, for example, and your SuperSensitiveStuff Java business model could look like this
public class SuperSensitiveStuff {
    private String userId;
    private String secretStuff;
    ...
}
Now, in your controller, you can do the following
String principal = getPrincipal(); // the ID of the logged-in user
SuperSensitiveStuff resource = myService.load(id); // load the resource using the {id} in the request path
if (resource.getUserId().equals(principal)) {
    return resource; // 200 OK, this is an authorized access
} else {
    throw new EvilAttemptException(); // 403 Forbidden: the user is authenticated but does not own the resource
}
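For the first two approaches, a rough sketch in Python could look like the following. SERVER_KEY, sign_url, is_authorized and new_reference are names invented for this example, and the parsing assumes that sec is the only query parameter. The signature here covers only the path; if links must not be shared between users, you would probably also mix the user ID into the signed message.

```python
import hashlib
import hmac
import secrets

SERVER_KEY = secrets.token_bytes(32)   # hypothetical secret, never sent to clients


def sign_url(path: str) -> str:
    """Return the path with a `sec` parameter the server can verify later."""
    tag = hmac.new(SERVER_KEY, path.encode(), hashlib.sha256).hexdigest()
    return f"{path}?sec={tag}"


def is_authorized(url: str) -> bool:
    """Accept only URLs whose `sec` value matches the HMAC of their path."""
    path, sep, tag = url.partition("?sec=")
    if not sep:
        return False                    # parameter missing: drop the request
    expected = hmac.new(SERVER_KEY, path.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), tag.encode())


def new_reference() -> str:
    """Second approach: an unpredictable identifier instead of a sequence."""
    return secrets.token_urlsafe(32)


signed = sign_url("/superSensitiveStuff/1")
forged = signed.replace("/superSensitiveStuff/1", "/superSensitiveStuff/2")
```

A client that edits the ID in a signed URL ends up with a tag that no longer matches, so is_authorized(forged) is False while is_authorized(signed) is True.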
I'm in the midst of finalising the set of cmdlets for a server application. Part of the application includes security principal management and data object management, and "expiration" of both (timed and manual). After the expiration date, login and access for the security principal is refused and access to the data owned by that principal is optionally prevented (either immediately by deletion or as part of automatic maintenance by marking it as expired).
From the output of Get-Verb, I cannot see an obvious synonym for Expire, which is the most natural choice of verb for the action being undertaken here. Expire on a security principal expires the principal and may also expire all their stored data, while expire of a data object is restricted to that object.
Set- is already in use for both object types, and has a partial overlap in functionality (Expire- forces a date in the past, and removes data, while Set- will allow future or past dates but NOT remove the data).
In this fashion Expire is combining two operations (Set+Remove) and for data-security reasons, we wouldn't want to force separation into the two operations (that's already possible).
For this reason, I also consider that Disable- is not appropriate since it suggests the possibility of reversal with Enable-.
I also think Remove- by itself is inappropriate since there are data records specifically not deleted as part of the operation.
Unpublish seems very close at least for the data, but again it seems that the intent is for Unpublish and Publish to be paired, and in this case it would not be reversible. It also does not make sense when applied to the security principal.
So which (if any) standard verb would you expect to use, if you wanted to expire something?
Looking at the list of approved verbs, two jump out at me:
Deny (dn): Refuses, objects, blocks, or opposes the state of a resource or process.
Revoke (rk): Specifies an action that does not allow access to a resource. This verb is paired with Grant.
I wouldn't worry too much if there is not a paired operation, since that happens with some of the built-in cmdlets. Stop-Computer, for example, has no paired Start-Computer. There is Remove-Variable, but no Add-Variable (there is New-Variable). I think that it is only important if a paired command exists that it is named consistently.
Another option may be to use something like Set-ObjectExpiration/Get-ObjectExpiration, especially if it makes sense to query when objects are going to expire.
What about Invoke? It could be Invoke-ExpireAppObject or something like that.
There really isn't an approved verb that fits your scenario based on the MS recommendations.
I have been using POST in a REST API to create objects. Every once in a while, the server will create the object, but the client will be disconnected before it receives the 201 Created response. The client only sees a failed POST request, and tries again later, and the server happily creates a duplicate object...
Others must have had this problem, right? But I google around, and everyone just seems to ignore it.
I have 2 solutions:
A) Use PUT instead, and create the (GU)ID on the client.
B) Add a GUID to all objects created on the client, and have the server enforce their UNIQUE-ness.
A doesn't match existing frameworks very well, and B feels like a hack. How do other people solve this in the real world?
Edit:
With Backbone.js, you can set a GUID as the id when you create an object on the client. When it is saved, Backbone will do a PUT request. Make your REST backend handle PUT to non-existing IDs, and you're set.
Another solution that's been proposed for this is POST Once Exactly (POE), in which the server generates single-use POST URIs that, when used more than once, will cause the server to return a 405 response.
The downsides are that 1) the POE draft was allowed to expire without any further progress on standardization, and thus 2) implementing it requires changes to clients to make use of the new POE headers, and extra work by servers to implement the POE semantics.
By googling you can find a few APIs that are using it though.
Another idea I had for solving this problem is that of a conditional POST, which I described and asked for feedback on here.
There seems to be no consensus on the best way to prevent duplicate resource creation in cases where the client cannot generate the unique URI itself (so PUT is not an option) and POST is needed.
I always use B -- detection of dups due to whatever problem belongs on the server side.
Detection of duplicates is a kludge, and can get very complicated. Genuine distinct but similar requests can arrive at the same time, perhaps because a network connection is restored. And repeat requests can arrive hours or days apart if a network connection drops out.
All of the discussion of identifiers in the other answers has the goal of giving an error in response to duplicate requests, but this will normally just incite a client to get or generate a new ID and try again.
A simple and robust pattern to solve this problem is as follows: Server applications should store all responses to unsafe requests, then, if they see a duplicate request, they can repeat the previous response and do nothing else. Do this for all unsafe requests and you will solve a bunch of thorny problems. Repeat DELETE requests will get the original confirmation, not a 404 error. Repeat POSTS do not create duplicates. Repeated updates do not overwrite subsequent changes etc. etc.
"Duplicate" is determined by an application-level id (that serves just to identify the action, not the underlying resource). This can be either a client-generated GUID or a server-generated sequence number. In this second case, a request-response should be dedicated just to exchanging the id. I like this solution because the dedicated step makes clients think they're getting something precious that they need to look after. If they can generate their own identifiers, they're more likely to put this line inside the loop and every bloody request will have a new id.
Using this scheme, all POSTs are empty, and POST is used only for retrieving an action identifier. All PUTs and DELETEs are fully idempotent: successive requests get the same (stored and replayed) response and cause nothing further to happen. The nicest thing about this pattern is its Kung-Fu (Panda) quality. It takes a weakness: the propensity for clients to repeat a request any time they get an unexpected response, and turns it into a force :-)
I have a little Google doc here if anyone cares.
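Here is a small sketch of that store-and-replay pattern with server-generated action identifiers. ReplayingServer and its method names are invented for the example, and everything lives in memory; a real server would persist the stored responses.

```python
import itertools


class ReplayingServer:
    """Stores the response to each unsafe request and replays it on repeats."""

    def __init__(self):
        self._next_action = itertools.count(1)
        self.responses = {}    # action_id -> stored (status, body)
        self.resources = {}    # path -> current value

    def post_action(self) -> int:
        """The only POST: hands out a server-generated action identifier."""
        return next(self._next_action)

    def put(self, action_id: int, path: str, value: str):
        if action_id in self.responses:
            return self.responses[action_id]       # replay, change nothing
        status = 200 if path in self.resources else 201
        self.resources[path] = value
        self.responses[action_id] = (status, value)
        return self.responses[action_id]

    def delete(self, action_id: int, path: str):
        if action_id in self.responses:
            return self.responses[action_id]       # replay the confirmation
        status = 204 if self.resources.pop(path, None) is not None else 404
        self.responses[action_id] = (status, None)
        return self.responses[action_id]


server = ReplayingServer()
a1 = server.post_action()
server.put(a1, "/orders/7", "v1")
a2 = server.post_action()
server.put(a2, "/orders/7", "v2")
late_retry = server.put(a1, "/orders/7", "v1")    # arrives hours later
value_after_retry = server.resources["/orders/7"]  # still "v2"

a3 = server.post_action()
first_delete = server.delete(a3, "/orders/7")
repeat_delete = server.delete(a3, "/orders/7")     # same confirmation, not a 404
```

The late retry of the first PUT gets its original response back and does not overwrite the later update, and the repeated DELETE gets the original confirmation.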
You could try a two step approach. You request an object to be created, which returns a token. Then in a second request, ask for a status using the token. Until the status is requested using the token, you leave it in a "staged" state.
If the client disconnects after the first request, they won't have the token and the object stays "staged" indefinitely or until you remove it with another process.
If the first request succeeds, you have a valid token and you can grab the created object as many times as you want without it recreating anything.
There's no reason why the token can't be the ID of the object in the data store. You can create the object during the first request. The second request really just updates the "staged" field.
Server-issued Identifiers
If you are dealing with the case where it is the server that issues the identifiers, create the object in a temporary, staged state. (This is an inherently non-idempotent operation, so it should be done with POST.) The client then has to do a further operation on it to transfer it from the staged state into the active/preserved state (which might be a PUT of a property of the resource, or a suitable POST to the resource).
Each client ought to be able to GET a list of their resources in the staged state somehow (maybe mixed with other resources) and ought to be able to DELETE resources they've created if they're still just staged. You can also periodically delete staged resources that have been inactive for some time.
You do not need to reveal one client's staged resources to any other client; they need exist globally only after the confirmatory step.
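A rough sketch of the staged/confirmed life cycle for server-issued identifiers follows. StagedStore and its method names are made up for the example, and the clock is passed in explicitly so the cleanup step is easy to follow.

```python
import uuid


class StagedStore:
    """Server-issued ids: every object starts staged and must be confirmed."""

    def __init__(self):
        self.objects = {}   # id -> {"owner", "data", "staged", "touched"}

    def post_create(self, owner: str, data: dict, now: float) -> str:
        """Non-idempotent POST: each call creates a new staged object."""
        object_id = str(uuid.uuid4())
        self.objects[object_id] = {"owner": owner, "data": data,
                                   "staged": True, "touched": now}
        return object_id

    def put_confirm(self, owner: str, object_id: str, now: float) -> None:
        """Idempotent PUT: confirming twice leaves the object active."""
        obj = self.objects[object_id]
        if obj["owner"] != owner:
            raise PermissionError("not your object")
        obj["staged"] = False
        obj["touched"] = now

    def get_staged(self, owner: str) -> list:
        """A client sees only its own staged objects."""
        return [oid for oid, obj in self.objects.items()
                if obj["owner"] == owner and obj["staged"]]

    def purge(self, now: float, max_age: float) -> int:
        """Periodic cleanup of staged objects that were never confirmed."""
        stale = [oid for oid, obj in self.objects.items()
                 if obj["staged"] and now - obj["touched"] > max_age]
        for oid in stale:
            del self.objects[oid]
        return len(stale)


store = StagedStore()
kept = store.post_create("alice", {"name": "invoice"}, now=0.0)
store.put_confirm("alice", kept, now=1.0)
orphan = store.post_create("alice", {"name": "invoice"}, now=2.0)  # reply lost in transit
```

The orphan never becomes visible as a real resource; the client can find it through get_staged, or the periodic purge removes it.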
Client-issued Identifiers
The alternative is for the client to issue the identifiers. This is mainly useful where you are modeling something like a filestore, as the names of files are typically significant to user code. In this case, you can use PUT to do the creation of the resource as you can do it all idempotently.
The down-side of this is that clients are able to create IDs, and so you have no control at all over what IDs they use.
There is another variation of this problem. Having the client generate a unique ID means we are asking the customer to solve this problem for us. Consider an environment where we have publicly exposed APIs and hundreds of clients integrating with them. In practice, we have no control over the client code or over the correctness of its uniqueness implementation. Hence, it would probably be better for the service itself to have the intelligence to recognize a duplicate request. One simple approach would be to calculate and store a checksum of every request based on attributes of the user input, define a time threshold (x minutes), and compare every new request from the same client against the ones received in the past x minutes. If the checksum matches, the request could be a duplicate, and the service can apply some challenge mechanism for the client to resolve this.
If a client makes two different requests with the same parameters within x minutes, it might be worth making sure that this is intentional, even if each comes with a unique request ID.
This approach may not be suitable for every use case. However, I think it is useful where the business impact of executing the second call is high and can potentially cost the customer money. Consider a payment processing engine where an intermediate layer ends up retrying a failed request, or a customer double-clicks and the client layer submits two requests.
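As a sketch of the checksum idea, assuming requests are JSON-like dictionaries: DuplicateDetector and its check method are hypothetical names, and the caller supplies the current time in seconds. What to do on a match (challenge, reject, ask for confirmation) is left to the application.

```python
import hashlib
import json


class DuplicateDetector:
    """Flags requests whose payload repeats within a time window per client."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.seen = {}   # (client_id, checksum) -> time of the last such request

    def check(self, client_id: str, payload: dict, now: float) -> bool:
        """Return True if an identical payload arrived within the window."""
        # sort_keys makes the checksum independent of attribute order.
        raw = json.dumps(payload, sort_keys=True).encode()
        key = (client_id, hashlib.sha256(raw).hexdigest())
        last = self.seen.get(key)
        self.seen[key] = now
        return last is not None and now - last <= self.window


detector = DuplicateDetector(window_seconds=300)
payment = {"amount": "19.99", "currency": "EUR", "card": "tok_123"}
first_seen = detector.check("client-a", payment, now=0)
double_click = detector.check("client-a", payment, now=2)
```

Here first_seen is False and double_click is True, so the second submission can be challenged before any money moves.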
Design
Automatic (without the need to maintain a manual black list)
Memory optimized
Disk optimized
Algorithm [solution 1]
REST arrives with UUID
Web server checks if UUID is in Memory cache black list table (if yes, answer 409)
Server writes the request to DB (if it was not filtered by the Memory Cache, e.g. ETS)
DB checks if the UUID is repeated before writing
If yes, answer 409 for the server, and blacklist to Memory Cache and Disk
If not repeated write to DB and answer 200
Algorithm [solution 2]
REST arrives with UUID
Save the UUID in the Memory Cache table (expire for 30 days)
Web server checks if UUID is in Memory Cache black list table [return HTTP 409]
Server writes the request to DB [return HTTP 200]
In solution 2, the Memory Cache blacklist is built ONLY in memory, so the DB is never checked for duplicates. A 'duplicate' is defined as any request with the same UUID that arrives within a given period of time. We also replicate the Memory Cache table on disk, so that we can fill it again before starting up the server.
In solution 1, there will never be a duplicate, because we always check on disk ONLY once before writing, and if the request is a duplicate, the following round trips are handled by the Memory Cache. This solution is better for Big Query, because requests there are not idempotent, but it is also less optimized.
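Solution 1 can be sketched with SQLite standing in for the database and a plain set standing in for the memory cache table. The table name, the handle function and the use of a primary key for the uniqueness check are assumptions of this example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE requests (uuid TEXT PRIMARY KEY, body TEXT)")
blacklist = set()     # stand-in for the memory cache black list table


def handle(request_uuid: str, body: str) -> int:
    """Return the HTTP status the web server would answer with."""
    if request_uuid in blacklist:
        return 409                      # filtered before touching the DB
    try:
        with db:                        # commits on success, rolls back on error
            db.execute("INSERT INTO requests VALUES (?, ?)", (request_uuid, body))
    except sqlite3.IntegrityError:
        blacklist.add(request_uuid)     # later repeats stop at the cache
        return 409
    return 200


statuses = [handle("u-1", "payload") for _ in range(3)]
```

The first call writes the row and returns 200, the second hits the uniqueness check in the database and is blacklisted, and the third is answered from the cache without a database round trip.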
HTTP response code for POST when resource already exists
I'm writing an app whose main purpose is to keep a list of users' purchases.
I would like to ensure that even I as a developer (or anyone with full access to the database) could not figure out how much money a particular person has spent or what he has bought.
I initially came up with the following scheme:
--------------+------------+-----------
user_hash | item | price
--------------+------------+-----------
a45cd654fe810 | Strip club | 400.00
a45cd654fe810 | Ferrari | 1510800.00
54da2241211c2 | Beer | 5.00
54da2241211c2 | iPhone | 399.00
User logs in with username and password.
From the password calculate user_hash (possibly with salting etc.).
Use the hash to access the user's data with normal SQL queries.
Given enough users, it should be almost impossible to tell how much money a particular user has spent by just knowing his name.
Is this a sensible thing to do, or am I completely foolish?
I'm afraid that if your application can link a person to their data, any developer/admin can.
The only thing you can do is make the link harder to establish, to slow the developer/admin down, but if you make it harder to link users to data, you will make it harder for your server too.
Idea based on @no's idea:
You can have a classic user/password login to your application (hashed password, or whatever), and a special "pass" used to keep your data secure. This "pass" wouldn't be stored in your database.
When your client logs in to your application, he would have to provide user/password/pass. The user/password is checked against the database, and the pass is used to load/write data.
When you need to write data, you make a hash of your "username/pass" couple, and store it as a key linking your client to your data.
When you need to load data, you make a hash of your "username/pass" couple, and load every data matching this hash.
This way the database alone contains no link between your data and your user.
On the other hand (as I said in a comment to @no), beware of collisions. Plus, if your user types a wrong "pass", you can't check it.
Update: For the last part, I had another idea: you can store in your database a hash of your "pass/password" couple, so that you can check whether the "pass" is okay.
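A rough sketch of that scheme, with dictionaries standing in for the two tables. The function names, the PBKDF2 parameters and the way the couples are combined are assumptions of this example; the normal user/password login check is left out to keep it short.

```python
import hashlib

ITERATIONS = 100_000   # assumed work factor


def link_key(username: str, data_pass: str) -> str:
    """Key that ties rows to a user; derived from the username/pass couple."""
    return hashlib.pbkdf2_hmac("sha256", data_pass.encode(),
                               b"link:" + username.encode(), ITERATIONS).hex()


def pass_check(username: str, data_pass: str, password: str) -> str:
    """Hash of the pass/password couple, stored so a mistyped pass is caught."""
    secret = (data_pass + "\x00" + password).encode()
    return hashlib.pbkdf2_hmac("sha256", secret,
                               b"check:" + username.encode(), ITERATIONS).hex()


users = {}       # username -> {"pass_check": ...}; the pass itself is never stored
purchases = {}   # link_key -> list of (item, price)


def register(username: str, data_pass: str, password: str) -> None:
    users[username] = {"pass_check": pass_check(username, data_pass, password)}


def add_purchase(username, data_pass, password, item, price) -> None:
    if users[username]["pass_check"] != pass_check(username, data_pass, password):
        raise ValueError("wrong pass")
    purchases.setdefault(link_key(username, data_pass), []).append((item, price))


def load_purchases(username: str, data_pass: str) -> list:
    return purchases.get(link_key(username, data_pass), [])


register("alice", "tulip-7", "login-password")
add_purchase("alice", "tulip-7", "login-password", "Beer", 5.00)
```

Note that the protection is only as strong as the pass is hard to guess: anyone with database access can try candidate passes against the stored values offline.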
Create a users table with:
user_id: an identity column (auto-generated id)
username
password: make sure it's hashed!
Create a product table like in your example:
user_hash
item
price
The user_hash will be based off of user_id which never changes. Username and password are free to change as needed. When the user logs in, you compare username/password to get the user_id. You can send the user_hash back to the client for the duration of the session, or an encrypted/indirect version of the hash (could be a session ID, where the server stores the user_hash in the session).
Now you need a way to hash the user_id into user_hash and keep it protected.
1. If you do it client-side as @no suggested, the client needs to have the user_id. This is a big security hole (especially if it's a web app): the hash can easily be tampered with and the algorithm is freely available to the public.
2. You could have it as a function in the database. Bad idea, since the database has all the pieces to link the records.
3. For web sites or client/server apps you could have it in your server-side code. Much better, but then one developer has access to both the hashing algorithm and the data.
4. Have another developer write the hashing algorithm (which you don't have access to) and stick it on another server (which you also don't have access to) as a TCP/web service. Your server-side code would then pass the user ID and get a hash back. You wouldn't have the algorithm, but you can send all the user IDs through to get all their hashes back. Not a lot of benefits to #3, though the service could have logging and such to try to minimize the risk.
If it's simply a client-database app, you only have choices #1 and 2. I would strongly suggest adding another [business] layer that is server-side, separate from the database server.
Edit:
This overlaps some of the previous points. Have 3 servers:
Authentication server: Employee A has access. Maintains user table. Has web service (with encrypted communications) that takes user/password combination. Hashes password, looks up user_id in table, generates user_hash. This way you can't simply send all user_ids and get back the hashes. You have to have the password which isn't stored anywhere and is only available during authentication process.
Main database server: Employee B has access. Only stores user_hash. No userid, no passwords. You can link the data using the user_hash, but the actual user info is somewhere else.
Website server: Employee B has access. Gets login info, passes to authentication server, gets hash back, then disposes login info. Keeps hash in session for writing/querying to the database.
So Employee A has user_id, username, password and algorithm. Employee B has user_hash and data. Unless employee B modifies the website to store the raw user/password, he has no way of linking to the real users.
Using SQL profiling, Employee A would get user_id, username and password hash (since user_hash is generated later in code). Employee B would get user_hash and data.
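To illustrate the split, here is a sketch with the authentication server and the main database as two classes. How user_hash is derived is not spelled out above, so the HMAC over the user_id with a key that never leaves the authentication server is an assumption of this example, as are all class and method names.

```python
import hashlib
import hmac
import secrets


class AuthServer:
    """Employee A's server: holds the user table, never the purchase data."""

    def __init__(self):
        self._hash_key = secrets.token_bytes(32)   # hypothetical; stays on this server
        self._users = {}                           # username -> (user_id, salt, pw_hash)

    def register(self, username: str, password: str) -> None:
        salt = secrets.token_bytes(16)
        pw_hash = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
        self._users[username] = (len(self._users) + 1, salt, pw_hash)

    def login(self, username: str, password: str) -> str:
        user_id, salt, pw_hash = self._users[username]
        candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
        if not hmac.compare_digest(candidate, pw_hash):
            raise PermissionError("bad credentials")
        # user_hash is only released after a successful password check.
        return hmac.new(self._hash_key, str(user_id).encode(),
                        hashlib.sha256).hexdigest()


class MainDatabase:
    """Employee B's server: rows are keyed by user_hash only."""

    def __init__(self):
        self.rows = []    # (user_hash, item, price)

    def insert(self, user_hash: str, item: str, price: float) -> None:
        self.rows.append((user_hash, item, price))

    def select(self, user_hash: str) -> list:
        return [(item, price) for h, item, price in self.rows if h == user_hash]


# The website server only passes values through and keeps the hash in the session.
auth = AuthServer()
auth.register("alice", "s3cret")
main_db = MainDatabase()
session_hash = auth.login("alice", "s3cret")
main_db.insert(session_hash, "Beer", 5.00)
```

Because login requires the password, Employee B cannot feed a list of user IDs to the authentication server and collect the matching hashes.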
Keep in mind that even without actually storing the person's identifying information anywhere, merely associating enough information all with the same key could allow you to figure out the identity of the person associated with certain information. For a simple example, you could call up the strip club and ask which customer drove a Ferrari.
For this reason, when you de-identify medical records (for use in research and such), you have to remove birthdays for people over 89 years old (because people that old are rare enough that a specific birthdate could point to a single person) and remove any geographic coding that specifies an area containing fewer than 20,000 people. (See http://privacy.med.miami.edu/glossary/xd_deidentified_health_info.htm)
AOL found out the hard way when they released search data that people can be identified just by knowing what searches are associated with an anonymous person. (See http://www.fi.muni.cz/kd/events/cikhaj-2007-jan/slides/kumpost.pdf)
The only way to ensure that the data can't be connected to the person it belongs to is to not record the identity information in the first place (make everything anonymous). Doing this, however, would most likely make your app pointless. You can make this more difficult to do, but you can't make it impossible.
Storing user data and identifying information in separate databases (and possibly on separate servers) and linking the two with an ID number is probably the closest thing that you can do. This way, you have isolated the two data sets as much as possible. You still must retain that ID number as a link between them; otherwise, you would be unable to retrieve a user's data.
In addition, I wouldn't recommend using a hashed password as a unique identifier. When a user changes their password, you would then have to go through and update all of your databases to replace the old hashed password IDs with the new ones. It is usually much easier to use a unique ID that is not based on any of the user's information (to help ensure that it will stay static).
This ends up being a social problem, not a technological problem. The best solution will be a social one. After hardening your systems to guard against unauthorized access (hackers, etc.), you will probably get better mileage working on establishing trust with your users and implementing a system of policies and procedures regarding data security. Include specific penalties for employees who misuse customer information. Since a single breach of customer trust is enough to ruin your reputation and drive all of your users away, the temptation of misusing this data by those with "top-level" access is less than you might think (since the collapse of the company usually outweighs any gain).
The problem is that if someone already has full access to the database then it's just a matter of time before they link up the records to particular people. Somewhere in your database (or in the application itself) you will have to make the relation between the user and the items. If someone has full access, then they will have access to that mechanism.
There is absolutely no way of preventing this.
The reality is that by having full access we are in a position of trust. This means that the company managers have to trust that even though you can see the data, you will not act in any way on it. This is where little things like ethics come into play.
Now, that said, a lot of companies separate the development and production staff. The purpose is to remove Development from having direct contact with live (ie:real) data. This has a number of advantages with security and data reliability being at the top of the heap.
The only real drawback is that some developers believe they can't troubleshoot a problem without production access. However, this is simply not true.
Production staff would then be the only ones with access to the live servers. They will typically be vetted to a larger degree (criminal history and other background checks), commensurate with the type of data you have to protect.
The point of all this is that this is a personnel problem; and not one that can truly be solved with technical means.
UPDATE
Others here seem to be missing a very important and vital piece of the puzzle. Namely, that the data is being entered into the system for a reason. That reason is almost universally so that it can be shared. In the case of an expense report, that data is entered so that accounting can know who to pay back.
Which means that the system, at some level, will have to match users and items without the data entry person (ie: a salesperson) being logged in.
And because that data has to be tied together without all parties involved standing there to type in a security code to "release" the data, then a DBA will absolutely be able to review the query logs to figure out who is who. And very easily I might add regardless of how many hash marks you want to throw into it. Triple DES won't save you either.
At the end of the day all you've done is make development harder with absolutely zero security benefit. I can't emphasize this enough: the only way to hide data from a dba would be for either 1. that data to only be accessible by the very person who entered it or 2. for it to not exist in the first place.
Regarding option 1, if the only person who can ever access it is the person who entered it.. well, there is no point for it to be in a corporate database.
It seems like you're right on track with this, but you're just overthinking it (or I simply don't understand it).
Write a function that builds a new string based on the input (which will be their username or something else that can't change over time).
Use the returned string as a salt when building the user hash (again, I would use the userID or username as an input for the hash builder because they won't change like the user's password or email).
Associate all user actions with the user hash.
No one with only database access can determine what the hell the user hashes mean. Even an attempt at brute forcing it by trying different seed, salt combinations will end up useless because the salt is determined as a variant of the username.
I think you've answered your own question with your initial post.
Actually, there's a way you could possibly do what you're talking about...
You could have the user type his name and password into a form that runs a purely client-side script which generates a hash based on the name and pw. That hash is used as a unique id for the user, and is sent to the server. This way the server only knows the user by hash, not by name.
For this to work, though, the hash would have to be different from the normal password hash, and the user would be required to enter their name / password an additional time before the server would have any 'memory' of what that person bought.
The server could remember what the person bought for the duration of their session and then 'forget', because the database would contain no link between the user accounts and the sensitive info.
edit
In response to those who say hashing on the client is a security risk: It's not if you do it right. It should be assumed that a hash algorithm is known or knowable. To say otherwise amounts to "security through obscurity." Hashing doesn't involve any private keys, and dynamic hashes could be used to prevent tampering.
For example, you take a hash generator like this:
http://baagoe.com/en/RandomMusings/javascript/Mash.js
// From http://baagoe.com/en/RandomMusings/javascript/
// Johannes Baagoe <baagoe#baagoe.com>, 2010
function Mash() {
  var n = 0xefc8249d;
  var mash = function(data) {
    data = data.toString();
    for (var i = 0; i < data.length; i++) {
      n += data.charCodeAt(i);
      var h = 0.02519603282416938 * n;
      n = h >>> 0;
      h -= n;
      h *= n;
      n = h >>> 0;
      h -= n;
      n += h * 0x100000000; // 2^32
    }
    return (n >>> 0) * 2.3283064365386963e-10; // 2^-32
  };
  mash.version = 'Mash 0.9';
  return mash;
}
See how n changes: because it is kept between calls, each time you hash a string you get something different.
Hash the username+password using a normal hash algo. This will be the same as the key of the 'secret' table in the database, but will match nothing else in the database.
Append the hashed pass to the username and hash it with the above algorithm.
Base-16 encode var n and append it to the original hash with a delimiter character.
This will create a unique hash (it will be different each time) which can be checked by the system against each column in the database. The system can be set up to allow a particular unique hash only once (say, once a year), preventing MITM attacks, and none of the user's information is passed across the wire. Unless I'm missing something, there is nothing insecure about this.