What are some patters for designing REST API for user-based platform in AWS? - rest

I am trying to shift towards serverless architecture when it comes to building REST API. I came from Ruby on Rails background.
I have successfully understood and adapted services such as Api Gateway, Cognito, RDS and Lambda functions, however I am struggling with putting it all together in optimal way.
My case is the following. I have a simple user based platform when there are multiple resources related to application members say blog application.
I have used Cognito for the sake of authentication and Aurora as the database service for keeping thing like articles and likes..
Since the database and Cognito user pool are decoupled, it is hard for me to do things like:
Fetching users that liked particular article
Fetching users comments
It seems problematic for me because I need to pass some unique Cognito user identifier (retrieved during authorization phase in API gateway) to lambda function which will then save the database record with an external reference to this user. On the other hand, If I were to fetch particular users, firstly I must fetch their identifiers from my relation database and then request users details from Cognito user pool..I lack some standard ways of accessing current user in my lambda functions as well as mechanisms for easily associating databse record with that user..
I have not found some convincing recommended patterns for designing such applications even though it seems like a very common problem and I am having hard time struggling if my approach is correct..
I would appreciate some comments on what are some patterns to consider when designing simple user based platform and what are the pitfalls of my solution. Any articles and examples will also be very helpfull.
Thanks in advance.

These sound like standard problems associated with distributed, indpependent, databases. You can no longer delegate all relationships to the database and get a result aggregating them in some way. You have to do the work yourself by calling one database, then the other.
For a case like this:
Fetching users that liked particular article
You would look up the "likes" database to determine user IDs of those who liked it, then look up the "users" database to determine user details such as name and avatar.
Most patterns follow standard database advice, e.g. in the above example, you could follow the performance-oriented pattern of de-normalising - store user data such as name and avatar against each "like", as long as you feel the extra storage and burden of keeping it consistent is justified by the reduction in queries (probably too many Likes to justify this).
Another important practice is using bulk queries to avoid N+1 queries. This is what Rails does with the includes syntax, but you may have to do it yourself here. In my example, it should only take two queries because the second query should get all required user data in one go, by querying for users matching the list of user IDs.
Finally, I'd suggest you try to abstract things. This kind of code gets messy fast, so be sure to build a well-encapsulated data layer that isolates application code from dealing with the mess of multiple databases.

Related

Keycloak users - is a good idea to differentiate users by their country?

I'm designing a fairly complex backend and now I have a doubt. Is a good idea in Keycloak to differentiate users in different keycloak groups by their country when I create them during a sign-in for example?
I was thinking that it could be useful to better manage users in the future.
What do you think?
There is no direct solution for such question. It clearly depends on your application. If in the future your application will provide services based on the country of each user it might be good idea as your application might get this information about the user directly from Keycloak.
If you are planning to do some researches about your users it also might be good idea as some statistics might be country related or you would like to get country related outputs (to relocate your cloud instances near to majority of your users etc..)
There might be faster database lookups with such additional information but I don't know if Keycloak currently provides functionality for this. On the other hand, if I will sign up to your service while I am chilling on my holidays on the other side of the world from where I usually live your record will be useless. Therefore this action could bring more issues to implementation of your application while you might not need it at all.
If you have no plans for such functionalities there is simply no reason to do such thing. Present web services tend to store more data then they actually need to. For example in majority of recent database leaks you can see LAST geological coordination's point stored with each user. While these might be unnecessary for precise advertisements targeting and unnecessary users screening, there is really no reason to store last geological coordination of each user. Such information might change with each user login and should be determined in "runtime". If services do not benefit from such data users are under threat for no reason.
You should determine what is needed by your application and what is not. You should never store or expose any additional information's about your users regardless how well your application is secured.

REST: Get query only changeable objects

I'm having a bunch of apis which return several types of data.
All users can query all data by using a GET rest api.
A few users can also change data. What is a common approach when designing REST API, to query only the data that can be changed by the current user but still allow the api to return all data (for display mode).
To explain it further:
The software manages projects. All projects are accessible for all users (also anonymous) via an api (let's call it GET api/projects).
A user has the ability to see a list of all projects he is involved in and which he can edit.
The api should return exactly the same data but limited to the projects he is involed in.
Should I create an additonal parameter, or maybe pass an http header, or what else?
There's no one-size-fits-all answer to this, so I will give you one recommendation that works for some people.
I don't really like creating resources that have 'complex access control'. Instead, I would prefer to create distinct resources for different access levels.
If you want to return limit results for people that have partial access, it might be better to create new resources that reflect this.
I think this might also help a bit thinking about the abstract role of a person who is not allowed to do everything. The abstraction probably doesn't exist per-property, but it exists somewhere as a business rule.

How to store large user-specific data

So I'm in the middle of planning a little web app that will require quite large amounts of data stored on a user level, in one case, the system would take a large object from a system level and make a "user specific" version, a user can have multiple ones of these. Simplest would be to compare it to a form stored in a google spreadsheet, where the user is expected to use the template spreadsheet, then change not only the answers but also the question.
Security wise I am quite OK
In the second case there is requirement to store multiple objects, size about 250k to maybe 3mb, once again on a user specific level, with a potential to move it to a system level so additional users can access it. As an example, say the user can upload pictures, but may not want to share all of them. However, a user may choose to "publish" a small number of them because they are happy with those specific pictures.
What design patterns should I consider using specifically around web apps where the user have decent amounts of data? For example, would it make most sense to use a single large database and have a table that keeps track of resources or create separate tables per user?
I have considered putting it all in a mongo database.
Your approach may be wrong.
If you want to store user based binary data and make it accessible for the user itself or the community, you would need a hierarchic structure like so:
userid1
pic1,pic2,pic3
userid2
pic4,pic5,pic6
community
pic7,pic8
You could then grant read permissions to "community" for all users, and permission for each user to its own directory.
Usually there is nothing wrong using a database to store binary files if you consider partitioning, role permissions and an applicable interface to access the data.
My suggestion is to use a binary repository like Artifactory.
It provides hierarchic structures, simple search queries using HTTP requests and has caching abilities for frequently queried objects.
I also think that http requests are a lot easier to use and also there is an abstraction layer to the data which is more secure.
Artifactory is free.

REST stateless with database

A little backstory
I have to develop a web application for college. This web application has to do with managing different locations using google maps like pinning new locations adding custom descriptions and so on. The login part is done using facebook (login with facebook). The more interesting part would be that the queries (client-server) would have to be done by using REST.
The part that i try to understand
If i use a database to store my user's unique ID, their online status (online/offline) and somehow (didn't settle actually on the idea) to keep a JSON on the server that would contain each user's pinned locations, would all this actually be ok with the REST paradigm ?
I find mixed answers on the internet and i don't know how to think of the statelessness of the application correctly. A session would not be created but the credentials from the database would be necessary for the users to communicate with each other.
The other side of the question
Considering that i'm mistaken and i shouldn't use the database to store the credentials and locations like that, how am i supposed to keep all that data ? I'm thinking something like JSON cached client-side but what if my client changes the computer, wouldn't this mean that he loses all his data? (Also wouldn't this make MVC handicapped by not having a model?) How do i really keep track of all things.
You're making this way too hard on yourself, try to keep it simple since you probably have a deadline. REST is a way of using APIs with HTTP verbs like GET, POST, PUT, and DELETE. It says nothing about how to store the data behind your APIs.
As for storing the data, a database should be fine. Storing it as JSON in the db could work, but in the end you'll have to parse the json every time that you want to use it, so I would suggest that you store it in a DB in such a way that it can be read easily.
For a beginner (especially if you're doing this for a school project), I would definitely suggest that you set up a relational database like Microsoft SQL Database (Microsoft Stack), or a MySQL/PosGres Database (I think this is what they'd use in linux), but if you wanna skip the relational db approach (because it might not be all that "easy" to get going), you can always try a NoSQL database like MongoDB.
Relevant links to help:
http://rest.elkstein.org/ (REST explained)
http://www.restapitutorial.com/lessons/httpmethods.html (REST verbs)
http://en.wikipedia.org/wiki/Relational_database (what is a relational db)
http://en.wikipedia.org/wiki/Database_normalization (Kinda the goal of relational db.. but note you can go too far...http://lemire.me/blog/archives/2010/12/02/over-normalization-is-bad-for-you/)
http://www.mongodb.com/nosql-explained (NoSQL explanation)

neo4j - graph database along with a relational database?

this is a question on best practice, i understand that there are a lot of different options for doing this, but i would like your opinions as to how you would approach solving this problem. Please take it as though performance is critical in this system, in other words scalable.
I have recently found the wonders of graph database, so i came up with a theoretical situation where a company wants to manage it's customers relationships, and in order to do so they are going to use neo4j which is great, and allows for really great management of the customers, different staff members and their relationships, which is all great, however the company now wants to create a web based interface which will need authentication, and anyone in the neo4j database should be able to login to the system in order to see how they are related to other people in the company's database, so each user must have a password/email/id associated with their name.
So my question is, in this case scenario, is it best to store the password_hash/password_salt/id/email in a mysql database and then based on the node look it up on the mysql database. Or is it better to store the password_hash/password_salt/id/email in the hash tables inside the nodes.
Also each store has 1000s of products, and they can be stored in the graph database or i can store the products in the mysql database and then look up the product there, and do the changes there, because the products are not related to each other, so no point in storing them in the graph database, so should they be not stored there to improve performance?
So my question boils down to this: is it best for large projects to use a graph database along with the more common rdms database such as mysql? if not, then what is the point at which you start to use these two database systems?
apologies in advance for my lack of knowledge regarding database terminology.
Graph DB is mainly used for maintaining relations. If app has a graph DB that does not mean that app needs to store everything in Graph DB.
Every node request on Graph is in memory and thus if you have unnecessary properties in your node it will be bloated and may make things slower and take more memory.I usually decide what needs to go in graph and what needs to go in DB by very simple rule.
High level property (that defines the relation and other important properties that defines the node) goes in graph whereas additional information goes in RDMS.
For example in FB may be FBID, Name goes in Graph as it defines the relationship of one node with another. But when user clicks on someones facebook ID, he/she gets to see other users DOB, Age , College .All these can go in RDBMS.
PS: RDMS has another advantage, it can be used for quick analytics. I know with graph also you can do that but i am not sure if its as scalable and easy as RDBMS.
Downside to this approach is : You need to maintain two DBS.
Unless you have a proven case for a two-DB solution, I'd say fewer moving parts would keep you more agile, more able to change things quickly. If later you find a use case that is difficult, then weigh up the cost/ benefit of introducing a second storage. A two-DB architecture is not unheard of, but comes with an overhead.
Specific to security, there is no reason why Neo4j or any other reasonable NOSQL solution couldn't do that: http://spring.neo4j.org/docs#tutorial_security
You should use both in case there is data where it does not make much sense to store it in a graph DB such as neo4j/orientDB (and some data would be better off in a graph DB as opposed to a relational DB). Forcing data on one platform may cause issues with performance/scalability down the line.