Isolate users' data and allow easy connection - SaaS

I'm looking for the fastest way to:
Isolate users' data, e.g. by a foreign key such as user_id
Allow easy connection via common SQL engines, e.g. PostgreSQL, MySQL, BigQuery, ..., Google Sheets
The aim is to let SaaS customers export their data quickly. I have considered https://github.com/cenit-io/cenit but I don't know if it's a good choice. Ideally, the solution should also provide a basic frontend.
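As a point of reference, the simplest version of the foreign-key approach is just a per-tenant filtered query streamed out as CSV. A rough sketch, assuming psycopg2 and a hypothetical invoices table keyed by user_id:

    import csv
    import io
    import psycopg2

    def export_user_csv(conn, user_id):
        # Dump one tenant's rows as CSV, isolated by the user_id foreign key.
        buf = io.StringIO()
        writer = csv.writer(buf)
        with conn.cursor() as cur:
            cur.execute(
                "SELECT id, amount, issued_at FROM invoices WHERE user_id = %s",
                (user_id,),
            )
            writer.writerow([desc[0] for desc in cur.description])
            writer.writerows(cur.fetchall())
        return buf.getvalue()

    conn = psycopg2.connect("dbname=app")  # placeholder connection string
    print(export_user_csv(conn, user_id=42))

This keeps the isolation in application code; letting tenants connect directly from BI tools or Google Sheets would additionally need database-level isolation, along the lines of the row-level security discussion below.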

Related

Should I have a separate database to store financial data for each user on my PostgreSQL server?

I am creating accounting/invoicing software and my database is PostgreSQL. Should I create a separate database for each user, since the data is sensitive financial data? Or is having a user foreign key secure enough? If I host the database on AWS, I understand that I could have a few DB servers across multiple availability zones and regions, so that if one is compromised it wouldn't affect everyone, even if many users have info stored in a single database. Is this safe enough? Thanks!
In general, no. Encrypt the data so that if someone exfiltrates a dump, they can't actually use it without the decryption key. If you're worried that someone with admin access can see users' information, then you might want to consider user-level encryption for all fields related to personally identifiable information.
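A minimal sketch of that field-level option, using the cryptography package; key management is hand-waved here (in practice the key would come from a KMS or secret store) and the field value is made up:

    from cryptography.fernet import Fernet

    # Placeholder: generate a key inline; real deployments load it from a secret store.
    key = Fernet.generate_key()
    fernet = Fernet(key)

    def encrypt_pii(value: str) -> bytes:
        # Encrypt a single PII field before it is written to the database.
        return fernet.encrypt(value.encode("utf-8"))

    def decrypt_pii(token: bytes) -> str:
        return fernet.decrypt(token).decode("utf-8")

    ciphertext = encrypt_pii("John Doe, IBAN DE00 0000 0000 00")
    print(decrypt_pii(ciphertext))

A database dump then only exposes ciphertext; without the key the PII fields are unusable.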
There are a few ways you could go about it, but I wouldn't create a new DB for every customer. It would be too expensive and a pain to maintain and evolve.
To me, this sounds like you are creating a multi-tenant application.
I'd personally use the row-level security feature in Postgres (see this article) or create a separate schema for each customer.
You can add an extra layer of protection with encryption at rest, which AWS supports (link).
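For the row-level security option, a minimal sketch, assuming psycopg2, a hypothetical invoices table with a user_id column, and a session setting that carries the current tenant:

    import psycopg2

    # One-time setup (run as the table owner); all names are hypothetical.
    SETUP_SQL = """
    ALTER TABLE invoices ENABLE ROW LEVEL SECURITY;

    -- Only rows whose user_id matches the session setting are visible.
    CREATE POLICY tenant_isolation ON invoices
        USING (user_id = current_setting('app.current_user_id')::int);

    -- Application role subject to the policy (superusers and table owners bypass RLS).
    CREATE ROLE app_user NOLOGIN;
    GRANT SELECT, INSERT, UPDATE, DELETE ON invoices TO app_user;
    """

    def fetch_invoices_for(conn, user_id):
        # Set the tenant context for this session, then query; RLS filters the rows.
        with conn.cursor() as cur:
            cur.execute("SELECT set_config('app.current_user_id', %s, false)", (str(user_id),))
            cur.execute("SELECT id, amount, issued_at FROM invoices")
            return cur.fetchall()

For direct SQL connections from BI or export tools, a variant that keys the policy off current_user (one login role per tenant) avoids relying on the session setting.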

Should visualization tools like Tableau or Looker be used for multi-tenant systems?

Visualization tools like Tableau, Looker, and Apache Superset are not meant to be used for multi-tenant products.
For example, a product with thousands of users would like analytics on their data. This needs to be secure so that company A cannot see company B's visualizations. For this to work, these tools need to understand whether a user has the privileges to view the data, which is usually achieved through cookies after the user has logged in.
To ensure data is only accessed by authorized users, these third-party tools should not be used. Instead, sticking to Ruby on Rails with d3.js, Highcharts, etc. is the better option. The data can be managed much more easily through the same authentication methods you use to log in, so the data stays secure.
Actually, Looker handles multi-tenant data situations just fine. It is quite a common use case for Looker.
You can bind attributes to users that will force the right SQL to be written to guarantee that the user only sees appropriate data.
https://docs.looker.com/reference/explore-params/access_filter
We've got lots of customers building extranets for their businesses this way.
Disclosure: I work at Looker.
The complexity of multi-tenant deployments goes far beyond the setup of some filter:
Data privacy - you are one typo away from a data privacy breach with such filters. You should use the database's security and privacy capabilities to isolate your tenants.
Performance - you need to scale the underlying database to handle the load of concurrent users.
Customization - your tenants might need to load and analyze their own custom data. They need custom reports, etc.
Take a look at gooddata.com and their workspaces.
Disclosure: I work at GoodData

What are some patterns for designing a REST API for a user-based platform in AWS?

I am trying to shift towards a serverless architecture for building REST APIs. I come from a Ruby on Rails background.
I have successfully understood and adopted services such as API Gateway, Cognito, RDS and Lambda functions; however, I am struggling with putting it all together in an optimal way.
My case is the following. I have a simple user-based platform where there are multiple resources related to application members, say a blog application.
I have used Cognito for authentication and Aurora as the database service for keeping things like articles and likes.
Since the database and the Cognito user pool are decoupled, it is hard for me to do things like:
Fetching users that liked a particular article
Fetching users' comments
It seems problematic because I need to pass some unique Cognito user identifier (retrieved during the authorization phase in API Gateway) to the Lambda function, which will then save the database record with an external reference to this user. On the other hand, if I want to fetch particular users, I must first fetch their identifiers from my relational database and then request the users' details from the Cognito user pool. I lack a standard way of accessing the current user in my Lambda functions, as well as a mechanism for easily associating a database record with that user.
I have not found any convincing recommended patterns for designing such applications, even though it seems like a very common problem, and I am having a hard time judging whether my approach is correct.
I would appreciate some comments on what patterns to consider when designing a simple user-based platform, and what the pitfalls of my solution are. Any articles and examples would also be very helpful.
Thanks in advance.
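For reference, with a Cognito user pool authorizer in front of a Lambda proxy integration (REST API Gateway), the verified token claims arrive inside the event, so the handler can read the caller's sub and use it as the foreign key. A minimal Python sketch; the likes table and request shape are hypothetical:

    import json
    import psycopg2

    conn = psycopg2.connect("dbname=app")  # placeholder; e.g. the Aurora endpoint

    def handler(event, context):
        # Stable Cognito user identifier, already verified by API Gateway.
        claims = event["requestContext"]["authorizer"]["claims"]
        user_sub = claims["sub"]
        article_id = json.loads(event["body"])["article_id"]

        # Store the like with an external reference to the Cognito user.
        with conn, conn.cursor() as cur:
            cur.execute(
                "INSERT INTO likes (article_id, user_sub) VALUES (%s, %s)",
                (article_id, user_sub),
            )
        return {"statusCode": 201, "body": json.dumps({"ok": True})}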
These sound like standard problems associated with distributed, independent databases. You can no longer delegate all relationships to a single database and get an aggregated result; you have to do the work yourself by calling one database, then the other.
For a case like this:
Fetching users that liked a particular article
You would look up the "likes" database to determine the user IDs of those who liked it, then look up the "users" database to get user details such as name and avatar.
Most patterns follow standard database advice. For example, in the case above you could follow the performance-oriented pattern of denormalising: store user data such as name and avatar against each "like", as long as you feel the extra storage and the burden of keeping it consistent are justified by the reduction in queries (there are probably too many likes to justify this here).
Another important practice is using bulk queries to avoid N+1 queries. This is what Rails does with its includes syntax, but you may have to do it yourself here. In my example it should only take two queries, because the second query should get all the required user data in one go by querying for users matching the list of user IDs.
Finally, I'd suggest you try to abstract things. This kind of code gets messy fast, so be sure to build a well-encapsulated data layer that isolates application code from the mess of dealing with multiple databases.
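A sketch of that two-query pattern wrapped in a single data-layer function, assuming the likes table stores Cognito subs and a local user_profiles table mirrors the basic user attributes (the names are made up):

    def users_who_liked(conn, article_id):
        with conn.cursor() as cur:
            # Query 1: collect the IDs of everyone who liked the article.
            cur.execute("SELECT user_sub FROM likes WHERE article_id = %s", (article_id,))
            user_subs = [row[0] for row in cur.fetchall()]
            if not user_subs:
                return []
            # Query 2: one bulk lookup for their details instead of N+1 queries.
            cur.execute(
                "SELECT user_sub, name, avatar_url FROM user_profiles WHERE user_sub = ANY(%s)",
                (user_subs,),
            )
            return [{"sub": s, "name": n, "avatar": a} for s, n, a in cur.fetchall()]

Callers just ask for "users who liked X" and never see that two stores are involved; if the details live only in Cognito, the second step becomes per-user Cognito lookups, which is a good reason to mirror the few attributes you actually display.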

Firebase queries and Swift

I have a string, e.g. "My name is John", stored in Firebase.
How would I query Firebase so I can find all the posts that contain "John"?
I can search for the first term in a string now using:
DataService.dataService.BASE_REF.child("Posts").child(selectedComment.commentKey)
    .queryOrderedByChild("userComment")
    .queryStartingAtValue(comment)
    .queryEndingAtValue(comment + "\u{F8FF}")
    .observeSingleEventOfType(.Value, withBlock: { (snapshot) in
        // handle the matching posts in snapshot
    })
where comment = "My"
I read about using Elasticsearch with Firebase, but wanted to check if there is an easier way in Firebase before I look at Elasticsearch/Flashlight for Firebase.
Unfortunately, Firebase doesn't support searching through content like that (in any language SDK). From a Google Groups post in July 2016:
As a company that understands search, we're also a company that understands using the best tool for the job. For fuzzy matching and contains, a NoSQL, realtime data store isn't the correct tool--these queries would be slow and scale poorly. BigQuery or ElasticSearch are the right tool for providing useful results in a scalable and robust manner.
Right now, this involves deploying a small node script to sync your search results with the realtime data, as explained in the article with the sample Flashlight lib. In the future, it will become more "effortless" as we add integrations between Firebase and Cloud products, particularly Cloud Functions and BigQuery interoperability.
BigQuery is, as I understand it, not specifically designed for user-facing search.
Elasticsearch (specifically, the Firebase plugin Flashlight) is a potential solution, but as you alluded to, it's an incredible amount of overhead (deploying/managing or renting an ES cluster, configuring the plugin, etc.). If content search is an important enough part of your app to justify that time/$, you may want to consider solutions beyond Firebase for your database needs, as it's by far one of the service's weakest areas.
In my opinion, you have a few options beyond Flashlight:
Algolia, a search-as-a-service provider, does offer an integration with Firebase, but I've never used it and so can't offer much more than to say that it exists (a rough indexing sketch follows this list).
Another alternative might be maintaining a collection of the documents you want to search on another service, like AWS CloudSearch.
Depending on the stage of your project and your needs, consider other Backends-as-a-Service that support more in terms of querying, e.g. GraphQL-as-a-service backends like Scaphold.io, Graph.cool, and Reindex, which are all built on SQL databases and (I believe) all support multiple types of querying.
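If you do maintain an external index, the moving parts are small. A rough sketch of pushing post records into Algolia with its Python client (the app ID, API key, index name, and record shape are placeholders):

    from algoliasearch.search_client import SearchClient

    client = SearchClient.create("YOUR_APP_ID", "YOUR_ADMIN_API_KEY")  # placeholders
    index = client.init_index("posts")

    # Records mirrored from Firebase; objectID lets re-syncs overwrite the same record.
    records = [
        {"objectID": "post-1", "userComment": "My name is John"},
        {"objectID": "post-2", "userComment": "Johnny went home"},
    ]
    index.save_objects(records)

    # Full-text, typo-tolerant search over the mirrored comments.
    for hit in index.search("John")["hits"]:
        print(hit["objectID"], hit["userComment"])

The sync itself (writing each new or changed Firebase post into the index) would live in whatever backend process you already run, or in a Cloud Function.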

How to store large user-specific data

So I'm in the middle of planning a little web app that will require quite large amounts of data stored per user. In one case, the system would take a large object from the system level and make a "user-specific" version of it; a user can have multiple of these. The simplest comparison is a form stored in a Google spreadsheet, where the user is expected to take the template spreadsheet and then change not only the answers but also the questions.
Security-wise I am quite OK.
In the second case there is a requirement to store multiple objects, roughly 250 KB to maybe 3 MB each, once again per user, with the potential to move them to the system level so additional users can access them. As an example, say the user can upload pictures but may not want to share all of them; however, a user may choose to "publish" a small number of them because they are happy with those specific pictures.
What design patterns should I consider, specifically for web apps where users have decent amounts of data? For example, would it make the most sense to use a single large database with a table that keeps track of resources, or to create separate tables per user?
I have considered putting it all in a MongoDB database.
Your approach may be wrong.
If you want to store user-based binary data and make it accessible to the owning user or to the community, you need a hierarchic structure like this:
userid1
    pic1, pic2, pic3
userid2
    pic4, pic5, pic6
community
    pic7, pic8
You could then grant read permission on "community" to all users, and give each user permission on their own directory.
Usually there is nothing wrong with using a database to store binary files, as long as you consider partitioning, role permissions, and an applicable interface for accessing the data.
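Since MongoDB was already under consideration, a minimal sketch of that layout using GridFS, with an owner and a published flag stored in each file's metadata (the field and database names are made up):

    import gridfs
    from pymongo import MongoClient

    db = MongoClient()["app"]  # placeholder connection
    fs = gridfs.GridFS(db)

    # Store a picture under its owner; private by default.
    with open("pic1.png", "rb") as f:
        fs.put(f, filename="pic1.png", metadata={"owner": "userid1", "published": False})

    # The user's own pictures.
    own_pics = fs.find({"metadata.owner": "userid1"})

    # The "community" view: only pictures their owners chose to publish.
    for grid_out in fs.find({"metadata.published": True}):
        print(grid_out.filename, grid_out.length)

The permission check (does the requester own the file, or is it published?) stays in the application layer, which mirrors the directory-permission idea above.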
My suggestion is to use a binary repository like Artifactory.
It provides hierarchic structures, simple search queries via HTTP requests, and caching for frequently queried objects.
I also think HTTP requests are a lot easier to use, and the abstraction layer over the data makes access more secure.
Artifactory is free.
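Roughly what that HTTP interface looks like in practice; the server, repository, path, and credentials below are placeholders:

    import requests

    BASE = "https://artifactory.example.com/artifactory"  # placeholder server
    AUTH = ("user", "api-key")                             # placeholder credentials

    # Deploy a file into a per-user path in a generic repository.
    with open("pic1.png", "rb") as f:
        requests.put(f"{BASE}/user-files/userid1/pic1.png", data=f, auth=AUTH)

    # Look the artifact up again by name via the REST search API.
    resp = requests.get(
        f"{BASE}/api/search/artifact",
        params={"name": "pic1", "repos": "user-files"},
        auth=AUTH,
    )
    print(resp.json())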