Is it possible to manage users/identities in a data store that exhibits eventual consistency? - nosql

Is it possible to create/store user accounts in a data store that exhibits eventual consistency?
It seems impossible to manage account creation without a heap of architectural complexity to avoid situations where two accounts with the same UID (e.g. email address) can occur.
Do users of eventual consistency stores use a separate consistent DB as an identity store, or are there solutions/patterns that I should be exploring?
Thanks in advance,
Jamie

It is possible to do user management in an eventually consistent data store. We do it. It works under the following assumptions:
Conflicts shouldn't happen, and when they do there's a clear path to conflict resolution. If the account ID is a person's email address and two separate people try to register under the same email, there's a bigger problem. What we do in this case is block both new accounts as soon as the conflict is discovered and send an email to the address in conflict explaining to the user that there's an issue (possible fraud). You can either ask the user to reset the account or ask them to contact support.
Repeated accesses by the same user within the timeframe in which the data is inconsistent go to the same replica. For instance, if a person just registered and the next request is a login, you must validate that login against the data replica where the new registration details exist. So if the eventual consistency is due to multiple data centers in different geographic locations, and under normal conditions a request goes to the geographically closest data center, you're OK.
There are some edge cases, such as a user who registered against one data center, then that center crashed, and now the user cannot log in even though they can still see the application, served from some other data center. You can calculate the expected frequency of this case based on your number of daily new users and average data center downtime. Then decide whether it's worth worrying about one user in a (million/billion/whatever your number is) having a problem and possibly contacting support. I faced the same decision not long ago and decided that from a cost-benefit perspective the answer is no.
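To make the "block both accounts" step concrete, here is a minimal in-memory sketch of what could run when two replicas reconcile. The `Account` fields, `merge_replicas`, and the replica lists are all hypothetical names invented for illustration, and a real store would surface conflicts through its own mechanism.

```python
from dataclasses import dataclass


@dataclass
class Account:
    account_id: str        # unique per registration attempt (hypothetical field)
    email: str             # the user-facing UID discussed above
    blocked: bool = False


def merge_replicas(replica_a, replica_b):
    """Merge two replicas and block every account whose email collides.

    Returns (accounts keyed by account_id, sorted list of emails to notify).
    """
    merged = {}
    for replica in (replica_a, replica_b):
        for acct in replica:
            merged[acct.account_id] = acct

    # Group accounts by normalized email to find duplicates.
    by_email = {}
    for acct in merged.values():
        by_email.setdefault(acct.email.lower(), []).append(acct)

    conflicts = []
    for email, accounts in by_email.items():
        if len(accounts) > 1:
            for acct in accounts:
                acct.blocked = True   # block both sides of the conflict
            conflicts.append(email)   # this address gets the warning email
    return merged, sorted(conflicts)


us = [Account("a1", "jamie@example.com"), Account("a2", "sam@example.com")]
eu = [Account("b1", "Jamie@example.com")]
accounts, to_notify = merge_replicas(us, eu)
```

Here `to_notify` holds the one address that was registered on both replicas, and only the two colliding accounts end up blocked.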

Related

How to Implement "Your contact just joined app" feature

I am building a mobile app (in flutter firebase, but the answer does not have to be firebase specific). And I would like to implement a feature that notifies users whenever anyone from their contact list joins the app. This seems like a very EXPENSIVE feature.
Off the top of my head, I see a lambda/cloud function that is triggered every time a user joins and then searches a database of users and their respective contacts for the new user's phone number. To me, this solution does not scale well for two reasons: the total number of users may be in the millions, and many users may join simultaneously.
My better solution is to get the user's contacts upon joining and then search a database of current users' contacts for any of the phone numbers of the newly joined user.
Is there a solution better than the second one? If so, what is it? If the second solution is standard, what kind of backend storage mechanism provides the best search and retrieval time for a database of users and their respective contacts?
With a large number of users, I would not use the first solution, because it may slow down the sign-up process. Instead, I would create a cron job that runs at a specific time or periodically. It gets the list of users who signed up that day or that hour (whatever you prefer), checks whether each new user is related to any user in the database, and sends the notification right away. A better solution: create a temporary table in a database on another server, insert the notification information there, and create another cron job on the second server that runs at a specific time to send the notifications.
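Whichever trigger is used (sign-up hook or cron job), the lookup itself can be a single key read if contacts are stored as an inverted index from phone number to the users who list it. A rough in-memory sketch follows; `ContactIndex` and its methods are made-up names, and it assumes phone numbers are already normalized to one format.

```python
from collections import defaultdict


class ContactIndex:
    """Inverted index: phone number -> ids of registered users who list it."""

    def __init__(self):
        self._watchers = defaultdict(set)

    def register(self, user_id, phone, contacts):
        """Add a new user and return the existing users to notify."""
        # Everyone who already uploaded this phone number as a contact.
        to_notify = sorted(self._watchers.get(phone, set()) - {user_id})
        # Record the new user's own contacts for future sign-ups.
        for contact_phone in contacts:
            self._watchers[contact_phone].add(user_id)
        return to_notify


index = ContactIndex()
first = index.register("alice", "+15550001", ["+15550002", "+15550003"])
second = index.register("bob", "+15550002", ["+15550001"])
```

When bob joins, the index already knows alice has his number, so the cost per sign-up is one lookup plus one write per uploaded contact. Any key-value store with set or list values could play the role of `_watchers`.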

I'm trying to find a way to identify deleted user accounts from my system based on their email address without violating GDPR / privacy

My business recruits people for focus groups and one of our main selling points is that we ensure that recruits don't see the same researcher more than once.
Often we will be given customer lists from our clients where a condition of the job is that we delete the user records at the end of the project. Whilst we are able to keep the associated data that ties them to a project (for business stats etc.), we need to remove identifying and contact information: email addresses and phone numbers, which are how we identify a specific person's account.
My issue is:
What can I do to ensure that, if these deleted users show up in my system again, I can identify their association with old projects / focus groups, so that we can prevent them from signing up again and being placed in a focus group with a researcher they have already seen?
My first thought was, upon "deletion", to hash their email address and remove the plaintext address, and check this hashed address against new accounts, to link their old db associations with this new account.
I am fairly new to security / privacy concepts, so I'm not sure whether this would be secure, or if being able to identify the link to the old account is a violation of privacy.
You're on the right track here. Hashing the email address or phone number means that you've effectively put that data "beyond use". So long as you delete all the other data relating to it, it does not represent "personal data" in the GDPR sense.
Also consider the basis for processing – if you are legally obliged (either by statute or by the original contract with your users) to implement this suppression mechanism, then you would be permitted to do so even if it was personal data.
Note this email marketing industry body that recommends and promotes hash-based suppression lists, and this site suggesting the same.
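A small sketch of such a suppression list is below. It uses a keyed hash (HMAC-SHA-256) rather than a bare hash, which is a substitution on my part: email addresses are guessable, so an unkeyed hash can be reversed by hashing candidate addresses. The `SECRET` value, the function names and the project labels are placeholders, not anything from the question.

```python
import hashlib
import hmac

# Placeholder key; in practice keep it outside the database that stores the hashes.
SECRET = b"replace-with-a-key-kept-outside-the-database"


def suppression_key(email, secret=SECRET):
    """Return a hex digest that stands in for the deleted email address."""
    normalized = email.strip().lower()   # so "Pat@Example.com " matches later
    return hmac.new(secret, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# At "deletion": keep only the digest and the project associations.
suppressed = {suppression_key("Pat@Example.com"): ["project-17"]}


def previous_projects(email):
    """At sign-up: look up old project links for a new email address."""
    return suppressed.get(suppression_key(email), [])


returning = previous_projects(" pat@example.com ")
newcomer = previous_projects("new@example.com")
```

The same approach would apply to phone numbers, provided they are normalized to one format before hashing.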

What happens to data that is returned by an API and ignored

I hope you are well. I am not a developer and I wanted to draw from the massive pool of expertise in here. I have an odd-ish question: I cannot accept the answer that I have been given, as it does not add up from a security perspective.
The situation is that our API passes a token with a reference number for payment to a card payment provider, which is Payment Card Industry Data Security Standard (PCI DSS) compliant. We do not want that responsibility, hence we contract them. The customer enters all the details (name, card number, etc.) on the contractor's site. They have a secure reporting portal that we use for reporting on daily transactions, refunds, etc., so there is no need for us to have any data other than a reference number to marry up with the token sent from us. It transpired earlier today that their API returns not only the token with the unique reference we need but also the name, last 4 digits of the card, address and other identifiable information, which we do not need or want to have sight of.
The contractor's reply was, and I quote, "just ignore the data that it is return through the API and you do not need". I asked them a number of times what happens to that data, and they did not provide a direct reply; they just said other organisations use it that way with no issues, which, as you would expect, has driven me absolutely berserk.
I have found this 5-year-old answer that says it disappears into the ether. I can't accept that data just disappears; insert GDPR concerns here.
What happens to unused function return values?
Apologies for the rant
TLDR: We send a token with no identifiable personal information to the card payment provider through the API. The card provider's API returns name, card, address and other identifiable data. The card provider's response: just ignore the information returned from the API that you do not need.
Thank you in advance for all your help.
So since you use a website to contact this API I will try to break down what is occurring.
You enter a number on your website, which in turn becomes the key reference for the API call to the payment processor. The processor receives the reference number and grabs the info pertaining to that number from their database. They then send this data as a response to your API call, and the data is returned to the website. Now I am just speculating here, but I am guessing your website does not do anything with this data except display it. If this is the case, the data is sitting in volatile memory on the server the website is running on. Volatile memory (RAM) is memory that is not long-lived: once the space is needed it will be overwritten, and if the system is turned off it will be wiped. Even while this data is in volatile memory, it is only used in the context of your session on the website. Once you leave the page, there is no real way (no easy one, anyway) to get that data back. It may still exist in RAM, but it is no longer accessible to anyone and will be destroyed or overwritten once the server realizes it is not being used anymore.
There is a chance however that your website does save the API responses you get back in your own database. It sounds like this is not the case but I cannot be sure. But to ultimately answer the question, you can ignore this data and it is not very vulnerable or accessible to the outside world, so you don't have to worry about it getting into the wrong hands in this case. I hope this helps you some! Let me know if I can clarify anything for you further!
If no one uses the data, or looks at it, or stores it -- if it's just ignored -- then, yes, it disappears.
More specifically, in the computer that receives it, it's probably written into some space in volatile memory, and then the space is reused and overwritten the next time a response comes in. Conceptually, at least.
It's possible that the receiving application has some kind of logs that are writing out data that is received, regardless of whether the app uses it or not, but other than that, without knowing what the app is doing, it's impossible to guess further.
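If you want "ignore it" to be something you can point to in code, one option is to drop everything except the fields you need as soon as the response is parsed, and to log or store only the filtered result. This is only a sketch: the field names `token` and `reference` and the sample payload are invented, since the provider's real response format isn't shown in the question.

```python
import json

# Hypothetical names for the only fields the integration actually needs.
ALLOWED_FIELDS = {"token", "reference"}


def keep_allowed(raw_body):
    """Parse the provider's JSON response and discard every other field."""
    payload = json.loads(raw_body)
    return {key: value for key, value in payload.items() if key in ALLOWED_FIELDS}


# Made-up example of a response that carries more than was asked for.
raw = json.dumps({
    "token": "tok_123",
    "reference": "REF-9",
    "name": "A. Customer",
    "last4": "4242",
    "address": "1 High St",
})
safe = keep_allowed(raw)   # pass only this on to logs, databases, templates
```

The unfiltered body still exists briefly in memory while it is parsed, as described above, but nothing downstream of `keep_allowed` ever sees the name, card digits or address.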

Open REST API attached to a database- what stops a bad actor spamming my db?

I'm a client side developer with little experience of server side, and I'm struggling to understand how to make a database-backed website without requiring users to login.
The use case is fairly straightforward. The user lands on a website, uploads an image, and performs some processing on that image. Clicking 'share' POSTs JSON to my endpoint, stores it in a DB, and returns a unique URL in a textbox (e.g. https://example.com/art/12345) which allows the user to share their artwork with others, or just to come back and do more editing later on.
What stops somebody from doing, POST <data> https://example.com/art 100 million times and filling my pay-as-you-go database?
I've seen examples of this link based method of sharing between users on plenty of sites but I don't understand how to stop abuse, or whether it is safe to just open up an API which allows writes to a database. I do not want users to have to login.
I believe the simplest method is having a quota, either by username for logged in users or by IP, if you don't require logins or only want to allow free usage to a certain point. Perhaps you could have a smaller quota for non-logged in users than for logged in users and even larger for paying users.
Your server-side code that handles the POSTs and stores data into the database would have to take care of that. I'd add it to a user_data table on mine, with an additional column that tracks total space used.
Then, when the user adds new data, increase the total space used. When they delete old data (I have versioned web pages so that eventually the user will be able to roll back to previous versions), the space used decreases. Having another page that shows where they're using space makes it easier to decide what to delete to stay under a quota of X MBs/GBs/TBs/etc., or maybe just offer an /api/delete_old_pages endpoint, or one for notes or comments, or all of the above.
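As a rough illustration of that bookkeeping, the sketch below keeps the "total space used" counter in a dictionary keyed by IP address or username. `QuotaTracker` and its method names are hypothetical; in a real service the counter would live in the database column described above and the check would run inside the POST handler.

```python
class QuotaTracker:
    """Track bytes stored per client and refuse writes over a fixed quota."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used = {}   # client key (IP or username) -> bytes currently stored

    def try_store(self, client, size):
        """Return True and record the usage if the write fits in the quota."""
        current = self.used.get(client, 0)
        if current + size > self.limit_bytes:
            return False   # the handler would answer with an error here
        self.used[client] = current + size
        return True

    def release(self, client, size):
        """Give space back when the client deletes old data."""
        self.used[client] = max(0, self.used.get(client, 0) - size)


tracker = QuotaTracker(limit_bytes=1000)
ip = "203.0.113.7"
first_ok = tracker.try_store(ip, 600)    # fits
second_ok = tracker.try_store(ip, 500)   # would exceed 1000 bytes
tracker.release(ip, 600)                 # client deleted the first upload
third_ok = tracker.try_store(ip, 500)    # fits again
```

Different limits for anonymous, logged-in and paying users would just mean choosing `limit_bytes` per client class.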

iPhone app - preventing spam

I've developed an app that allows users to upload some photos and share them on Facebook/Dropbox/Twitter etc. Recently it went live in the app store.
However, I'm having a problem now: a bot is creating accounts and uploading many photos on my server. I've temporarily disabled the app, but now I'm looking for an efficient way to prevent this bot from doing this.
The bot's IP address changes very often, so it's impossible to block the IP. He creates accounts with very realistic names and email addresses, so it's hard to find out which users are real and which are created by the bot.
I was thinking of using a captcha, but I'm not sure if my app will be rejected by Apple if I implement this. I'm preferably looking for a way so I can prevent him from doing his work and so I don't have to resend the app to Apple again.
Could anyone give me some advice on what I could possibly do?
Thanks!
This is how I solved a similar problem:
I implemented a token generator, which generates a one-time token for every single data transfer with the server, even for sending login data, a file, etc. This token is generated by a secret algorithm and can be verified server-side, since you know how it is generated.
After one token is used, put it in a temporary list for the next X minutes/hours/days (depending on how many data transfers your server can handle). When a user tries to send data with a used token (i.e. the token matches one in the "banned" list), you can be sure that someone's trying to spam you -> mark the account as "spammer" and decide what you wish to do.
The algorithm must produce a different token each time (the best way would be a one-way hash), but you have to ensure specific "properties" with which you can prove its authenticity.
So one very simple example:
Your algorithm in the client generates a number between 1000000000000000000000 and 99999999999999999999999; this number is then multiplied by 12456564 and incremented by 20349.
The server receives a specific command and data, along with the generated token. It then checks whether (token - 20349) % 12456564 is 0. If it is, the token was likely generated by your "secret" algorithm.
It's a very basic example but you get the idea…
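Spelled out in code, the toy scheme above might look like the sketch below. The constants come from the example; `make_token`, `TokenChecker` and the status strings are names I made up, and the used-token set never expires here, whereas the description above keeps tokens only for X minutes/hours/days. The arithmetic check is the example's illustration of the idea, not a cryptographic construction.

```python
import secrets

MULTIPLIER = 12456564
OFFSET = 20349
LOW = 1000000000000000000000      # lower bound from the example
HIGH = 99999999999999999999999    # upper bound from the example


def make_token():
    """Client side: pick a random number in range and apply the 'secret' formula."""
    n = LOW + secrets.randbelow(HIGH - LOW + 1)
    return n * MULTIPLIER + OFFSET


class TokenChecker:
    """Server side: verify the formula and reject tokens that were already used."""

    def __init__(self):
        self.used = set()   # the temporary "banned" list

    def check(self, token):
        if token in self.used:
            return "replayed"   # same token sent twice: treat as spam
        if (token - OFFSET) % MULTIPLIER != 0:
            return "invalid"    # not produced by the expected formula
        self.used.add(token)
        return "ok"


checker = TokenChecker()
token = make_token()
first_use = checker.check(token)
second_use = checker.check(token)
forged = checker.check(token + 1)
```

A fresh token passes once, the replay is flagged, and a token that does not fit the formula is rejected.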