How to generate one-time-use links? Any CMS or framework solutions?

I'm making a site for a writers' management company. They get tons of script submissions every day from prospective and often unsolicited writers. The new site will allow a prospective writer to submit a short logline / sample of his or her idea. This idea gets sent to an email account at the management group. If the management group likes what they see, they want to be able to approve that submission from within the email and have a unique link dispatched to the submitter to upload their full script. This link would either only work once, or only for a certain amount of time so that only the intended recipient could use it.
So, can anyone point me in the direction of some sort of (I'm assuming PHP + MySQL) CMS or framework that could accomplish this? I've searched a lot, but I can't seem to figure out the right way to phrase this query to a search engine.
I have moderate programming experience, but not much with PHP outside of some simple WordPress hacks.
Thanks!

I will just give you general guidelines on a simple way to construct such a system.
I assume that the Writer is somehow registered in the system, and his/her profile contains a valid mail address.
So, when he submits the sample, you would create an entry in the "Sample" table. Then you would mail a Manager with the sample and a link. This link would point to a script, giving the database "id" of the sample as a parameter (this script should verify that the manager is logged on -- if not, show the login screen and after successful login redirect him back).
This script would then be aware of the Manager's intention to allow the Writer to submit his work. Now the fun begins.
There are many possibilities:
You can create an entry in an appropriate "SubmitAuthorizations" DB table containing the id of the Writer and the date this authorization was given (i.e., the date when the row was added to your DB). Then you simply send a mail to the Writer with a link like "upload.php?id=42", where the id is the authorization id. This script would check if the logged-in user is the correct Writer, and if he is within the allowed timeframe (by comparing the stored "authorization date" and the current date).
The next is the one I prefer: no special table just for handling something trivial (let's say you will never want to "edit" an authorization, nor "cancel" it, but it may still "expire"). You simply give the Writer a link with 2 parameters: the date the authorization was given and an authorization key, like: "upload.php?authDate=20091030&key=87a62d726ef7..."
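Here is a minimal Node.js sketch of this first option. It assumes an in-memory `Map` in place of the "SubmitAuthorizations" table. It also assumes a hypothetical 7-day window, because the answer leaves the timeframe open:

```javascript
// Hypothetical allowed window: 7 days (not specified in the answer).
const ALLOWED_WINDOW_MS = 7 * 24 * 60 * 60 * 1000;

// In-memory stand-in for the "SubmitAuthorizations" table.
const submitAuthorizations = new Map(); // authorization id -> { writerId, authorizedAt }
let nextAuthId = 1;

// Called when the Manager approves a sample: insert a row and return its id,
// which goes into the mailed link, e.g. "upload.php?id=42".
function grantAuthorization(writerId, now = new Date()) {
  const id = nextAuthId++;
  submitAuthorizations.set(id, { writerId, authorizedAt: now });
  return id;
}

// Called by the upload script: the logged-in user must be the authorized
// Writer, and the request must fall inside the allowed timeframe.
function canUpload(authId, loggedInWriterId, now = new Date()) {
  const row = submitAuthorizations.get(authId);
  if (!row) return false;                              // unknown authorization id
  if (row.writerId !== loggedInWriterId) return false; // not the intended Writer
  const age = now.getTime() - row.authorizedAt.getTime();
  return age >= 0 && age <= ALLOWED_WINDOW_MS;         // within the timeframe
}
```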
Let me explain how it works.
The script would first verify if the Writer is logged on (if not, show the login page with a redirection after successful login).
So, now it's time to validate the request: that is, check that this is not a "forged" link. How to do this? It's just a "smart" way of constructing this authorization key.
You can do something like:
key = hash(concat(userId, ";", authDate, ";", seed));
Well, here hash() is what we call a "one-way function", like MD5, SHA1, etc. Then concat() is simply a string concatenation function. Finally, seed is something like a "master password": completely random and never changed (if you change it, all the issued links stop working). It is there just to increase security: let's say a hacker correctly guesses you are using MD5 (which is easy) and then tries to hack your system by hashing some combinations of the username and the date; without the seed, none of those hashes will match a valid key.
Also, for a request to be valid, it must be in the correct time frame.
So, if both the key is valid, and the date is within the time frame, you are able to accept an upload.
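Below is a Node.js sketch of generating and validating such a link. Instead of the answer's hash(concat(userId, ";", authDate, ";", seed)), it swaps in HMAC-SHA256 with the seed as the key, which is the standard construction for a keyed hash. Several details are assumptions: the `SEED` value, the 7-day window, and the function names. The YYYYMMDD date format comes from the example link above. The userId is assumed to come from the logged-in session, not from the link:

```javascript
const crypto = require('crypto');

// Hypothetical secret seed: random, server-side only, never changed
// (changing it invalidates every link already issued).
const SEED = 'replace-with-a-long-random-secret';
const VALID_DAYS = 7; // hypothetical time frame

// key = HMAC-SHA256(seed, userId + ";" + authDate)
function makeKey(userId, authDate) {
  return crypto.createHmac('sha256', SEED)
    .update(`${userId};${authDate}`)
    .digest('hex');
}

// Parse "YYYYMMDD" into a UTC Date, or return null if malformed.
function parseAuthDate(s) {
  const m = /^(\d{4})(\d{2})(\d{2})$/.exec(s);
  if (!m) return null;
  return new Date(Date.UTC(Number(m[1]), Number(m[2]) - 1, Number(m[3])));
}

// Build the relative link mailed to the Writer.
function makeUploadLink(userId, authDate) {
  return `upload.php?authDate=${authDate}&key=${makeKey(userId, authDate)}`;
}

// Valid only if the key matches (not forged) and the date is in range.
function isValidRequest(userId, authDate, key, now = new Date()) {
  const issued = parseAuthDate(authDate);
  if (!issued) return false;
  const expected = Buffer.from(makeKey(userId, authDate), 'utf8');
  const given = Buffer.from(String(key), 'utf8');
  // timingSafeEqual throws on a length mismatch, so check the length first.
  if (expected.length !== given.length || !crypto.timingSafeEqual(expected, given)) {
    return false;
  }
  const ageDays = (now.getTime() - issued.getTime()) / 86400000;
  return ageDays >= 0 && ageDays <= VALID_DAYS;
}
```

The constant-time comparison avoids leaking, through response timing, how many leading characters of a guessed key were correct.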
Some points to note:
This is a very simple system, but might be exactly what you need.
You should avoid MD5 for the hashing function; take something like SHA1 instead (today SHA-256 is the safer choice, since practical SHA-1 collisions have been demonstrated).
For the link sent to the Writer, you could "obfuscate" the parameter names, i.e., call them "k" for the "key" and "d" for the "authDate".
For the date, you could choose another, more "cryptic" format, like the Unix epoch.
Finally, you can encode the parameters with something like "base64" (or simply apply some character-replacing function like rot13, but one that also takes digits into account) just to make them more difficult to guess.
Just for completeness, in the validation script you can also check whether the Writer has already sent a file within the time frame, making it impossible for him to send many files within the time frame.
I have recently implemented something like this twice at the company I work for, for two completely different uses. Once you get the idea, it is extremely simple to implement -- maybe less than 10 lines of code for the whole key-generation and validation process.
On one of them, the agent equivalent to your Writer had no account in the system (actually, it would be his first contact with the system); there was only his "profile" on the system, managed by someone else. In this case, you would have to include the Writer's id in the parameters to the "Upload" script as well.
I hope this helps and that it was clear enough. If I find the time, I will blog about it with a working example in some language.
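Here is a small Node.js sketch of the "only one file per authorization" check. The `usedAuthorizations` set is a hypothetical stand-in for a database lookup such as "has this Writer uploaded since authDate?". The `isValid` argument is assumed to be the result of the key and time-frame check:

```javascript
// Hypothetical record of authorizations already used; in practice this
// would be a database query or a unique constraint.
const usedAuthorizations = new Set();

// Accept at most one upload per (userId, authDate) pair.
function acceptUploadOnce(userId, authDate, isValid) {
  if (!isValid) return false;                       // forged or expired link
  const marker = `${userId};${authDate}`;
  if (usedAuthorizations.has(marker)) return false; // already uploaded once
  usedAuthorizations.add(marker);
  return true;
}
```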

Related

What is the best way to import data into holochain from another source, like mongo?

MongoDB => Holochain Rust DHT
How to import, if possible
If I am using a different app backend, like mongo, and I get my holochain set up correctly and configured, is there a way to get the data from mongo to holochain? How would I do that?
Definitely technologically possible; you could write a nodejs script, fire up a Holochain container with the holochain-nodejs library, and import all the data as one agent. Then when users join the HC-based network, they vouch for their identity in some way and 'claim' all the data as theirs.
Here's a sketch of how it could look:
You (let's call you 'agent 0') import all the data.
For each user, you create an 'anchor' with the user's ID (I'll explain anchors in a sec) and link each piece of data to the anchor.
You also record that user's password hash as a private entry on your own source chain.
A user joins the network and is required to prove continuity of identity.
They do this by using node-to-node messaging to send their user ID and their password hash to you privately. You authorise them to claim their identity by publishing an entry that says that "agent public key x = user ID". (You would probably want to link from your authorisation entry to their user ID anchor and their public key too, for convenience's sake.)
The user collects all their data by asking for all the links to their user ID anchor.
The user then publishes each piece of their data to their own source chain as a way of 'claiming' ownership of it.
Now, every redundant copy of the data in the DHT has two authors in its metadata fields -- you and the user that actually owns the data. Peers validate that piece of data by saying, "Is agent 0 already the author of this piece of data? If so, has agent 0 published an authorisation entry that says that the new author of this data is allowed to claim/republish it?"
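The validation rule that peers apply can be modeled in a few lines of Node.js. This is a conceptual sketch with plain in-memory maps and hypothetical function names. It is not the Holochain API:

```javascript
// Entries imported by agent 0: entry hash -> legacy user ID that owns it.
const importedByAgent0 = new Map();
// Authorisation entries published by agent 0: "agent public key x = user ID".
const authorisations = new Map();

function importEntry(entryHash, legacyUserId) {
  importedByAgent0.set(entryHash, legacyUserId);
}

function authorise(agentKey, legacyUserId) {
  authorisations.set(agentKey, legacyUserId);
}

// The check a peer runs when `newAuthor` republishes `entryHash`.
function validateClaim(entryHash, newAuthor) {
  // "Is agent 0 already the author of this piece of data?"
  if (!importedByAgent0.has(entryHash)) return false;
  // "Has agent 0 published an authorisation entry that says the new author
  //  of this data is allowed to claim/republish it?"
  return authorisations.get(newAuthor) === importedByAgent0.get(entryHash);
}
```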
Problems with this approach (not insurmountable):
Agent 0 has to be online all the time, because they never know when a new user is going to sign up and try to claim their data.
Agent 0 has to import a ton of data. (I don't think it'd be vastly time-prohibitive, though.)
For relational data, there's the chicken-and-egg problem of how to create links if the data doesn't exist. I'm thinking not of linking data to data -- that can be done on initial import -- but linking data to humans, who now have a public key which might not exist on the DHT yet because they haven't joined the network. That would always have to happen per-user once they join, and it could create some cyclic dependency problems.
Anchors
Re: anchors, an anchor is just a pattern that consists of a base and a link -- the base is a simple string, so it's easy for anyone who knows the string to find it by hash. It acts as, well, an anchor to hang links off of. That's why I'm recommending using it to connect legacy user IDs to pieces of content. You can get sample source code for implementing the anchor pattern at https://github.com/holochain/mixins/tree/master/anchors (note that this is for the legacy version of Holochain, so it's written in JavaScript).
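The anchor idea can be modeled conceptually in Node.js. The anchor's address is a deterministic hash of a known string, so anyone who knows the string can recompute the address and find the links hanging off it. This sketch uses in-memory structures and SHA-256 as a stand-in for the DHT's own hashing. It is not the Holochain or mixins API, and the `legacy-user:` naming is hypothetical:

```javascript
const crypto = require('crypto');

// Anchor address -> list of linked entry hashes.
const links = new Map();

// Deterministic address: the same string always yields the same address.
function anchorAddress(anchorText) {
  return crypto.createHash('sha256').update(anchorText).digest('hex');
}

// Hang a piece of content off the anchor for a given string.
function linkToAnchor(anchorText, entryHash) {
  const addr = anchorAddress(anchorText);
  if (!links.has(addr)) links.set(addr, []);
  links.get(addr).push(entryHash);
}

// Anyone who knows the string can find everything linked to it.
function getLinks(anchorText) {
  return links.get(anchorAddress(anchorText)) || [];
}
```

For example, after `linkToAnchor('legacy-user:17', 'QmPost1')`, a user who proves they are legacy user 17 can call `getLinks('legacy-user:17')` to collect their data.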
(Answer provided by pauldaoust.)

CQRS - When a command cannot resolve to a domain

I'm trying to wrap my head around CQRS. I'm drawing from the code example provided here. Please be gentle; I'm very new to this pattern.
I'm looking at a logon scenario. I like this scenario because it's not really demonstrated in any examples I've read. In this case I do not know what the aggregate id of the user is, or even if there is one, as all I start with is a username and password.
In the fohjin example, events are always fired from the domain (if needed) and the command handler calls some method on the domain. However, if a user logon is invalid, I have no domain to call anything on. Also, most, if not all, of the base Command/Event classes defined in the fohjin project pass around an aggregate id.
In the case of the event LogonFailure I may want to update a LogonAudit report.
So my question is: how to handle commands that do not resolve to a particular aggregate? How would that flow?
public void Execute(UserLogonCommand command)
{
    User user = null; // user looked up by username somehow; should I query the report database to resolve the username to an id?
    if (user == null || user.Password != command.Password)
    {
        // What to do here? I want to raise an event somehow that doesn't target a specific user
    }
    else
    {
        user.LogonSuccessful();
    }
}
You should take into account that in most cases CQRS and DDD are suitable only for some parts of the system. It is very uncommon to model an entire system with CQRS concepts; it fits best in the parts with a complex business domain, and I wouldn't call logging a user in a particularly complex business scenario. In fact, in most cases it's not business-related at all. The actual business domain starts when the user is already identified.
Another thing to remember is that, due to eventual consistency, it is extremely beneficial to check as much as we can using only the query side, without even creating any commands/events.
Assuming, however, that the information about successful / failed user log-ins is meaningful, I'd model your scenario with the following steps:
User provides name and password
Name/password is validated against some kind of query database
When provided credentials are valid, RegisterValidUserCommand(userId) is executed, which results in the proper event
If provided credentials are not valid, RegisterInvalidCredentialsCommand(providedUserName) is executed, which results in the proper event
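The steps above can be sketched in Node.js under some assumptions. `credentialsReadModel` is a hypothetical query-side map from username to id and salted scrypt hash. `dispatch` is a hypothetical command bus that just records commands. Neither comes from the fohjin example:

```javascript
const crypto = require('crypto');

// Hypothetical query-side store: username -> { userId, salt, passwordHash }.
const credentialsReadModel = new Map();

// Hypothetical command bus: it only records what was dispatched.
const dispatched = [];
const dispatch = (command) => dispatched.push(command);

// Salted scrypt hash of a password (returns a 32-byte Buffer).
const hashPassword = (password, salt) => crypto.scryptSync(password, salt, 32);

function handleLogon(userName, password) {
  // Step 2: validate name/password against the query side only.
  const row = credentialsReadModel.get(userName);
  const valid = row !== undefined &&
    crypto.timingSafeEqual(row.passwordHash, hashPassword(password, row.salt));

  if (valid) {
    // Step 3: the user is known, so the command can carry the user id.
    dispatch({ type: 'RegisterValidUserCommand', userId: row.userId });
  } else {
    // Step 4: there is no aggregate id, so the command carries the provided name.
    dispatch({ type: 'RegisterInvalidCredentialsCommand', providedUserName: userName });
  }
  return valid;
}
```

The invalid-credentials command is keyed by the name the user typed, so a LogonAudit-style report can project it without any user aggregate being loaded.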
The point is that checking user credentials is not necessarily part of business domain.
That said, there is another related concept: not every command or event needs to be business-related, so it is possible to handle events that don't need aggregates to be loaded.
For example you want to change data that is informational-only and in no way affects business concepts of your system, like information about person's sex (once again, assuming that it has no business meaning).
In that case, when you handle SetPersonSexCommand there's actually no need to load an aggregate, as that information doesn't even have to be located on entities. Instead, you create PersonSexSetEvent, register it, and publish it so the query side can project it to the screen/report.
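A minimal Node.js sketch of such an informational-only command follows. The handler creates, registers, and publishes the event without loading any aggregate. `eventStore`, `subscribers`, and the report projection are hypothetical in-memory stand-ins:

```javascript
// Hypothetical in-memory event store and subscriber list.
const eventStore = [];
const subscribers = [];

function publish(event) {
  eventStore.push(event);                 // register the event
  subscribers.forEach((fn) => fn(event)); // publish it to the query side
}

// No aggregate is loaded or validated: just create, register, and publish.
function handleSetPersonSex(command) {
  publish({ type: 'PersonSexSetEvent', personId: command.personId, sex: command.sex });
}

// Query side: project the event into a report row.
const personReport = new Map();
subscribers.push((event) => {
  if (event.type === 'PersonSexSetEvent') {
    const row = personReport.get(event.personId) || {};
    personReport.set(event.personId, { ...row, sex: event.sex });
  }
});
```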

Creation Concurrency with CQRS and EventStore

Baseline info:
I'm using an external OAuth provider for login. If the user logs into the external OAuth, they are OK to enter my system. However this user may not yet exist in my system. It's not really a technology issue, but I'm using JOliver EventStore for what it's worth.
Logic:
I'm not given a guid for new users. I just have an email address.
I check my read model before sending a command: if the user email exists, I issue a Login command with the ID; if not, I issue a CreateUser command with a generated ID. My issue is in the case of a new user.
A save occurs in the event store with the new ID.
Issue:
Assume two create commands are somehow issued before the read model is updated, due to a browser refresh or some other anomaly that occurs before consistency with the read model is achieved. That's OK; that's not my problem.
What Happens:
Because the new ID is a Guid comb, there's no chance the event store will know that these two CreateUser commands represent the same user. By the time they get to the read model, the read model will know (because they have the same email) and can merge the two records or take some other compensating action. But now my read model is out of sync with the event store which still thinks these are two separate entities.
Perhaps it doesn't matter because:
Replaying the events will have the same effect on the read model, so that should be OK.
Because both commands are duplicate "Create" commands, they should contain identical information, so it's not like I'm losing anything in the event store.
Can anybody illuminate how they handled similar issues? If some compensating action needs to occur does the read model service issue some kind of compensation command when it realizes it's got a duplicate entry? Is there a simpler methodology I'm not considering?
You're very close to what I'd consider a proper possible solution. The scenario, if I may summarize, is somewhat like this:
Perform the OAuth-entication.
Using the read model decide between a recurring visitor and a new visitor, based on the email address.
In case of a new visitor, send a RegisterNewVisitor command message that gets handled and stored in the eventstore.
Assume there is some concurrency going on that, for the same email address, causes two RegisterNewVisitor messages, each containing what the system thinks is the key associated with the email address. These keys (guids) are different.
Detect this duplicate key issue in the read model and merge both read model records into one record.
Now instead of merging the records in the read model, why not send a ResolveDuplicateVisitorEmailAddress { Key1, Key2 } towards your domain model, leaving it up to the domain model (the codified form of the business decision to be taken) to resolve this issue. You could even have a dedicated read model to deal with these kind of issues, the other read model will just get a kind of DuplicateVisitorEmailAddressResolved event, and project it into the proper records.
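Here is a Node.js sketch of that flow. The read model detects the duplicate and sends ResolveDuplicateVisitorEmailAddress to the domain instead of merging on its own. Several pieces are hypothetical: the event shape `{ key, email }`, the keep-the-first-key policy, and all function names. The real policy is the business decision that belongs in the domain model:

```javascript
// Hypothetical read-model index: email -> visitor keys seen for it.
const visitorKeysByEmail = new Map();

// Hypothetical command bus towards the domain model.
const domainCommands = [];
const sendToDomain = (cmd) => domainCommands.push(cmd);

// Projection of a (hypothetical) NewVisitorRegistered event.
function onNewVisitorRegistered({ key, email }) {
  const keys = visitorKeysByEmail.get(email) || [];
  keys.push(key);
  visitorKeysByEmail.set(email, keys);
  if (keys.length > 1) {
    // Do not merge here: hand the decision to the domain model.
    sendToDomain({ type: 'ResolveDuplicateVisitorEmailAddress', key1: keys[0], key2: keys[keys.length - 1] });
  }
}

// Domain side; the "keep the first key" policy is an assumption.
function resolveDuplicate({ key1, key2 }) {
  return { type: 'DuplicateVisitorEmailAddressResolved', keptKey: key1, retiredKey: key2 };
}

// Read model: project the resolution so only the kept key remains.
function onDuplicateResolved({ retiredKey }) {
  for (const [email, keys] of visitorKeysByEmail) {
    if (keys.includes(retiredKey)) {
      visitorKeysByEmail.set(email, keys.filter((k) => k !== retiredKey));
    }
  }
}
```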
Word of warning: You've asked a technical question and I gave you a technical, possible solution. In general, I would not apply this technique unless I had some business indicator that this is worth investing in (what's the frequency of a user logging in concurrently for the first time? Maybe solving it this way is just a way of ignoring the root cause: flaky OAuth, no register-new-visitor process in place, etc.). There are other technical solutions to this problem, but I wanted to give you the one closest to what you already have in place. They range from registering new visitors sequentially to keeping an in-memory projection of the visitors not yet in the read model.

GWT: Pragmatic unlocking of an entity

I have a GWT (+GAE) webapp that allows users to edit Customer entities. When a user starts editing, the lockedByUser attribute is set on the Customer entity. When the user finishes editing the Customer, the lockedByUser attribute is cleared.
No Customer entity can be modified by 2 users at the same time. If a user tries to open the Customer screen which is already opened by a different user, he gets a "Customer XYZ is being modified by user ABC" message.
The question is what is the most pragmatic and robust way to handle the case where the user forcefully closes the browser and hence the lockedByUser attribute is not cleared.
My first thought is a timer on the user side that would update the lockRefreshedTime every 30 seconds or so. A different user trying to modify the Customer would then look at the lockRefreshedTime, and if the refresh happened more than, say, 35 seconds ago, it would acquire the lock by setting the lockedByUser and updating the lockRefreshedTime.
Thanks,
Matyas
FWIW, your lock with expiry approach is the one used by WebDAV (and implemented in tools like Microsoft Word, for instance).
To cope with network latency, you should renew your lock at least half-way through the lock lifetime (e.g. the lock expires after 2 minutes, and you renew it every minute).
Have a look there for much more details on how clients and servers should behave: https://www.rfc-editor.org/rfc/rfc4918#section-6 (note that, for example, they always assume failure is possible: "a client MUST NOT assume that just because the timeout has not expired, the lock still exists"; see https://www.rfc-editor.org/rfc/rfc4918#section-6.6 )
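Here is a minimal Node.js sketch of the lock-with-expiry idea. It uses the 2-minute lifetime and 1-minute renewal from the example above, plus the question's `lockedByUser`/`lockRefreshedTime` fields. The in-memory `customers` map and the function names are hypothetical. On a real datastore, the read-check-write in `tryLock` has to run atomically (for example, inside a transaction):

```javascript
const LOCK_LIFETIME_MS = 2 * 60 * 1000;         // lock expires after 2 minutes
const RENEW_INTERVAL_MS = LOCK_LIFETIME_MS / 2; // client renews every minute

// Hypothetical store: customerId -> { lockedByUser, lockRefreshedTime }.
const customers = new Map();

// Acquire or renew: succeeds if unlocked, expired, or already held by `user`.
function tryLock(customerId, user, now = Date.now()) {
  const c = customers.get(customerId) || { lockedByUser: null, lockRefreshedTime: 0 };
  const expired = now - c.lockRefreshedTime > LOCK_LIFETIME_MS;
  if (c.lockedByUser && c.lockedByUser !== user && !expired) {
    return false; // "Customer XYZ is being modified by user ABC"
  }
  customers.set(customerId, { lockedByUser: user, lockRefreshedTime: now });
  return true;
}

// Explicit release, e.g. when the user finishes editing.
function unlock(customerId, user) {
  const c = customers.get(customerId);
  if (c && c.lockedByUser === user) {
    customers.set(customerId, { lockedByUser: null, lockRefreshedTime: 0 });
  }
}
```

Renewals go through `tryLock` too. A client whose lock has already expired and been taken by someone else therefore gets `false` back instead of assuming it still holds the lock.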
Another approach is to have an explicit lock/unlock flow, rather than an implicit one.
Alternatively, you could allow several users to update the customer at the same time, using a "one field at a time" approach: send an RPC to update a specific field on each ValueChangeEvent on that field. Handling conflicts (another user has updated the field) is then made a bit easier, or could simply be ignored. If user A changed the customer's address from "foo" to "bar", it really means to set "bar" in the field, not to change it from "foo" to "bar". So if the actual value on the server has already been updated by user B from "foo" to "baz", that wouldn't be a problem: user A would probably still have set the value to "bar", and whether it changes from "foo" or from "baz" doesn't really matter.
Using a per-field approach, "implicit locks" (the time it takes to edit and send the changes to the server) are much shorter, because they're reduced to a single field.
The "challenge" then is to update the form in near real-time when another user saved a change to the edited customer; or you could choose to not do that (not try to do it in near real-time).
The way to go is this:
Execute code on window close in GWT
You have to ask the user to confirm to really close the window in edit mode.
If the user really wants to exit you can then send an unlock call.

How to separate a person's identity from his personal data?

I'm writing an app whose main purpose is to keep a list of users' purchases.
I would like to ensure that even I as a developer (or anyone with full access to the database) could not figure out how much money a particular person has spent or what he has bought.
I initially came up with the following scheme:
--------------+------------+------------
user_hash     | item       | price
--------------+------------+------------
a45cd654fe810 | Strip club |     400.00
a45cd654fe810 | Ferrari    | 1510800.00
54da2241211c2 | Beer       |       5.00
54da2241211c2 | iPhone     |     399.00
User logs in with username and password.
From the password calculate user_hash (possibly with salting etc.).
Use the hash to access users data with normal SQL-queries.
Given enough users, it should be almost impossible to tell how much money a particular user has spent by just knowing his name.
Is this a sensible thing to do, or am I completely foolish?
I'm afraid that if your application can link a person to their data, any developer/admin can.
The only thing you can do is making it harder to do the link, to slow the developer/admin, but if you make it harder to link users to data, you will make it harder for your server too.
Idea based on #no's idea:
You can have a classic user/password login to your application (hashed password, or whatever), and a special "pass" used to keep your data secure. This "pass" wouldn't be stored in your database.
When your client logs in to your application, they would have to provide user/password/pass. The user/password is checked against the database, and the pass would be used to load/write data.
When you need to write data, you make a hash of your "username/pass" couple, and store it as a key linking your client to your data.
When you need to load data, you make a hash of your "username/pass" couple, and load every data matching this hash.
This way it's impossible to make a link between your data and your user.
On the other hand (as I said in a comment to #no), beware of collisions. Plus, if your user writes a bad "pass", you can't check it.
Update: For the last part, I had another idea: you can store in your database a hash of your "pass/password" couple, so you can check whether your "pass" is okay.
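Here is a minimal Node.js sketch of the username/pass data key plus the "pass/password" verifier from the update. The answer leaves the hash open, so scrypt (a slow, salted key-derivation function) is an assumption here. The `data-key:`/`verifier:` prefixes and the in-memory `verifiers`/`purchases` stores are hypothetical:

```javascript
const crypto = require('crypto');

// Key linking a client to their rows; computed per request from the
// username and the "pass", and never stored next to the users table.
function dataKey(username, pass) {
  return crypto.scryptSync(pass, `data-key:${username}`, 32).toString('hex');
}

// Hash of the "pass/password" couple, stored so a mistyped pass can be detected.
function passVerifier(pass, password) {
  return crypto.scryptSync(pass, `verifier:${password}`, 32).toString('hex');
}

// Hypothetical in-memory stand-ins for the two tables.
const verifiers = new Map(); // username -> passVerifier(pass, password)
const purchases = [];        // { key, item, price }: no username column

function isPassCorrect(username, pass, password) {
  return verifiers.get(username) === passVerifier(pass, password);
}

function savePurchase(username, pass, item, price) {
  purchases.push({ key: dataKey(username, pass), item, price });
}

function loadPurchases(username, pass) {
  const key = dataKey(username, pass);
  return purchases.filter((p) => p.key === key);
}
```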
Create a users table with:
user_id: an identity column (auto-generated id)
username
password: make sure it's hashed!
Create a product table like in your example:
user_hash
item
price
The user_hash will be based off of user_id which never changes. Username and password are free to change as needed. When the user logs in, you compare username/password to get the user_id. You can send the user_hash back to the client for the duration of the session, or an encrypted/indirect version of the hash (could be a session ID, where the server stores the user_hash in the session).
Now you need a way to hash the user_id into user_hash and keep it protected.
1. If you do it client-side as #no suggested, the client needs to have user_id. Big security hole (especially if it's a web app): the hash can easily be tampered with, and the algorithm is freely available to the public.
2. You could have it as a function in the database. Bad idea, since the database has all the pieces to link the records.
3. For web sites or client/server apps you could have it in your server-side code. Much better, but then one developer has access to both the hashing algorithm and the data.
4. Have another developer write the hashing algorithm (which you don't have access to) and stick it on another server (which you also don't have access to) as a TCP/web service. Your server-side code would then pass the user ID and get a hash back. You wouldn't have the algorithm, but you could still send all the user IDs through to get all their hashes back. Not much benefit over #3, though the service could have logging and such to try to minimize the risk.
If it's simply a client-database app, you only have choices #1 and 2. I would strongly suggest adding another [business] layer that is server-side, separate from the database server.
Edit:
This overlaps some of the previous points. Have 3 servers:
Authentication server: Employee A has access. Maintains user table. Has web service (with encrypted communications) that takes user/password combination. Hashes password, looks up user_id in table, generates user_hash. This way you can't simply send all user_ids and get back the hashes. You have to have the password which isn't stored anywhere and is only available during authentication process.
Main database server: Employee B has access. Only stores user_hash. No userid, no passwords. You can link the data using the user_hash, but the actual user info is somewhere else.
Website server: Employee B has access. Gets login info, passes to authentication server, gets hash back, then disposes login info. Keeps hash in session for writing/querying to the database.
So Employee A has user_id, username, password and algorithm. Employee B has user_hash and data. Unless employee B modifies the website to store the raw user/password, he has no way of linking to the real users.
Using SQL profiling, Employee A would get user_id, username and password hash (since user_hash is generated later in code). Employee B would get user_hash and data.
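Here is a sketch of the authentication server's part (Employee A's side) in Node.js. The answer prescribes none of the following, so they are assumptions: scrypt for the stored password hash, HMAC-SHA256 for user_hash keyed with a secret that exists only on that server, and an in-memory `users` map for the table:

```javascript
const crypto = require('crypto');

// Secret that never leaves the authentication server.
const AUTH_SECRET = crypto.randomBytes(32);

// Hypothetical user table: username -> { userId, salt, passwordHash }.
const users = new Map();

function register(username, userId, password) {
  const salt = crypto.randomBytes(16);
  users.set(username, { userId, salt, passwordHash: crypto.scryptSync(password, salt, 32) });
}

// Web-service entry point: returns user_hash, or null on bad credentials.
// user_hash depends on the plaintext password, which is not stored anywhere,
// so the user IDs alone cannot be pushed through here to get every hash.
function authenticate(username, password) {
  const u = users.get(username);
  if (!u) return null;
  const attempt = crypto.scryptSync(password, u.salt, 32);
  if (!crypto.timingSafeEqual(attempt, u.passwordHash)) return null;
  return crypto.createHmac('sha256', AUTH_SECRET)
    .update(`${u.userId};${password}`)
    .digest('hex');
}
```

Because the password is an input, this user_hash changes whenever the password changes. That is the drawback raised in another answer about using password-based identifiers.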
Keep in mind that even without actually storing the person's identifying information anywhere, merely associating enough information all with the same key could allow you to figure out the identity of the person associated with certain information. For a simple example, you could call up the strip club and ask which customer drove a Ferrari.
For this reason, when you de-identify medical records (for use in research and such), you have to remove birthdays for people over 89 years old (because people that old are rare enough that a specific birthdate could point to a single person) and remove any geographic coding that specifies an area containing fewer than 20,000 people. (See http://privacy.med.miami.edu/glossary/xd_deidentified_health_info.htm)
AOL found out the hard way when they released search data that people can be identified just by knowing what searches are associated with an anonymous person. (See http://www.fi.muni.cz/kd/events/cikhaj-2007-jan/slides/kumpost.pdf)
The only way to ensure that the data can't be connected to the person it belongs to is to not record the identity information in the first place (make everything anonymous). Doing this, however, would most likely make your app pointless. You can make this more difficult to do, but you can't make it impossible.
Storing user data and identifying information in separate databases (and possibly on separate servers) and linking the two with an ID number is probably the closest thing that you can do. This way, you have isolated the two data sets as much as possible. You still must retain that ID number as a link between them; otherwise, you would be unable to retrieve a user's data.
In addition, I wouldn't recommend using a hashed password as a unique identifier. When a user changes their password, you would then have to go through and update all of your databases to replace the old hashed password IDs with the new ones. It is usually much easier to use a unique ID that is not based on any of the user's information (to help ensure that it will stay static).
This ends up being a social problem, not a technological problem. The best solutions will be a social solution. After hardening your systems to guard against unauthorized access (hackers, etc), you will probably get better mileage working on establishing trust with your users and implementing a system of policies and procedures regarding data security. Include specific penalties for employees who misuse customer information. Since a single breach of customer trust is enough to ruin your reputation and drive all of your users away, the temptation of misusing this data by those with "top-level" access is less than you might think (since the collapse of the company usually outweighs any gain).
The problem is that if someone already has full access to the database then it's just a matter of time before they link up the records to particular people. Somewhere in your database (or in the application itself) you will have to make the relation between the user and the items. If someone has full access, then they will have access to that mechanism.
There is absolutely no way of preventing this.
The reality is that by having full access we are in a position of trust. This means that the company managers have to trust that even though you can see the data, you will not act in any way on it. This is where little things like ethics come into play.
Now, that said, a lot of companies separate the development and production staff. The purpose is to remove Development from having direct contact with live (i.e. real) data. This has a number of advantages, with security and data reliability being at the top of the heap.
The only real drawback is that some developers believe they can't troubleshoot a problem without production access. However, this is simply not true.
Production staff would then be the only ones with access to the live servers. They will typically be vetted to a larger degree (criminal history and other background checks), commensurate with the type of data you have to protect.
The point of all this is that this is a personnel problem; and not one that can truly be solved with technical means.
UPDATE
Others here seem to be missing a very important and vital piece of the puzzle. Namely, that the data is being entered into the system for a reason. That reason is almost universally so that it can be shared. In the case of an expense report, that data is entered so that accounting can know who to pay back.
Which means that the system, at some level, will have to match users and items without the data entry person (i.e. a salesperson) being logged in.
And because that data has to be tied together without all parties involved standing there to type in a security code to "release" the data, a DBA will absolutely be able to review the query logs to figure out who is who. And very easily, I might add, regardless of how many hash marks you want to throw into it. Triple DES won't save you either.
At the end of the day all you've done is make development harder with absolutely zero security benefit. I can't emphasize this enough: the only way to hide data from a dba would be for either 1. that data to only be accessible by the very person who entered it or 2. for it to not exist in the first place.
Regarding option 1, if the only person who can ever access it is the person who entered it... well, there is no point in it being in a corporate database.
It seems like you're right on track with this, but you're just overthinking it (or I simply don't understand it).
Write a function that builds a new string based on the input (which will be their username or something else that can't change over time).
Use the returned string as a salt when building the user hash (again, I would use the userID or username as an input for the hash builder, because they won't change like the user's password or email).
Associate all user actions with the user hash.
No one with only database access can determine what the hell the user hashes mean. Even an attempt at brute forcing it by trying different seed, salt combinations will end up useless because the salt is determined as a variant of the username.
I think you've answered your own question with your initial post.
Actually, there's a way you could possibly do what you're talking about...
You could have the user type his name and password into a form that runs a purely client-side script which generates a hash based on the name and pw. That hash is used as a unique id for the user, and is sent to the server. This way the server only knows the user by hash, not by name.
For this to work, though, the hash would have to be different from the normal password hash, and the user would be required to enter their name / password an additional time before the server would have any 'memory' of what that person bought.
The server could remember what the person bought for the duration of their session and then 'forget', because the database would contain no link between the user accounts and the sensitive info.
edit
In response to those who say hashing on the client is a security risk: It's not if you do it right. It should be assumed that a hash algorithm is known or knowable. To say otherwise amounts to "security through obscurity." Hashing doesn't involve any private keys, and dynamic hashes could be used to prevent tampering.
For example, you take a hash generator like this:
http://baagoe.com/en/RandomMusings/javascript/Mash.js
// From http://baagoe.com/en/RandomMusings/javascript/
// Johannes Baagoe <baagoe#baagoe.com>, 2010
function Mash() {
  var n = 0xefc8249d;
  var mash = function(data) {
    data = data.toString();
    for (var i = 0; i < data.length; i++) {
      n += data.charCodeAt(i);
      var h = 0.02519603282416938 * n;
      n = h >>> 0;
      h -= n;
      h *= n;
      n = h >>> 0;
      h -= n;
      n += h * 0x100000000; // 2^32
    }
    return (n >>> 0) * 2.3283064365386963e-10; // 2^-32
  };
  mash.version = 'Mash 0.9';
  return mash;
}
See how n changes: each time you hash a string, you get something different.
Hash the username+password using a normal hash algo. This will be the same as the key of the 'secret' table in the database, but will match nothing else in the database.
Append the hashed pass to the username and hash it with the above algorithm.
Base-16 encode var n and append it to the original hash with a delimiter character.
This will create a unique hash (different each time) which can be checked by the system against each column in the database. The system can be set up to allow a particular unique hash only once (say, once a year), preventing MITM attacks, and none of the user's information is passed across the wire. Unless I'm missing something, there is nothing insecure about this.