How to store large user-specific data - mongodb

So I'm in the middle of planning a little web app that will require quite large amounts of data stored at the user level. In one case, the system would take a large object from the system level and make a "user-specific" version of it; a user can have several of these. The simplest comparison is a form stored in a Google spreadsheet, where the user is expected to start from the template spreadsheet and then change not only the answers but also the questions.
Security-wise I am quite OK.
In the second case there is a requirement to store multiple objects, about 250 KB to maybe 3 MB each, once again at a user-specific level, with the potential to move them to the system level so additional users can access them. As an example, say the user can upload pictures, but may not want to share all of them. However, a user may choose to "publish" a small number of them because they are happy with those specific pictures.
What design patterns should I consider, specifically for web apps where users have decent amounts of data? For example, would it make more sense to use a single large database with a table that keeps track of resources, or to create separate tables per user?
I have considered putting it all in a mongo database.

Your approach may be wrong.
If you want to store user-based binary data and make it accessible to the user or to the community, you would need a hierarchic structure like so:
userid1
    pic1, pic2, pic3
userid2
    pic4, pic5, pic6
community
    pic7, pic8
You could then grant all users read permission on "community", and grant each user permission on their own directory.
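As a rough illustration of that permission rule, here is a minimal Python sketch. It is not tied to Artifactory or MongoDB: the `STORE` dictionary and the helper names (`can_read`, `publish`, `readable_items`) are made up for this example, and "publishing" is assumed to mean moving an item into the shared directory.

```python
# Hypothetical in-memory stand-in for the directory layout above.
STORE = {
    "userid1": ["pic1", "pic2", "pic3"],
    "userid2": ["pic4", "pic5", "pic6"],
    "community": ["pic7", "pic8"],
}


def can_read(user_id: str, directory: str) -> bool:
    """Everyone may read 'community'; a user may also read their own directory."""
    return directory == "community" or directory == user_id


def publish(user_id: str, item: str) -> None:
    """Move an item from the user's own directory into 'community'."""
    STORE[user_id].remove(item)  # raises ValueError if the user does not own it
    STORE["community"].append(item)


def readable_items(user_id: str) -> list:
    """All items the given user is allowed to see, in sorted order."""
    return sorted(
        item
        for directory, items in STORE.items()
        if can_read(user_id, directory)
        for item in items
    )
```

For instance, after `publish("userid1", "pic2")`, a call to `readable_items("userid2")` would include `pic2` alongside that user's own pictures.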
Usually there is nothing wrong with using a database to store binary files, provided you consider partitioning, role permissions and a suitable interface for accessing the data.
My suggestion is to use a binary repository like Artifactory.
It provides hierarchic structures, simple search queries using HTTP requests and has caching abilities for frequently queried objects.
I also think that HTTP requests are a lot easier to use, and they add an abstraction layer over the data, which is more secure.
Artifactory is free.

Related

What are some patterns for designing a REST API for a user-based platform in AWS?

I am trying to shift towards a serverless architecture for building REST APIs. I come from a Ruby on Rails background.
I have successfully understood and adopted services such as API Gateway, Cognito, RDS and Lambda functions; however, I am struggling to put it all together in an optimal way.
My case is the following. I have a simple user-based platform where multiple resources are related to application members, say a blog application.
I have used Cognito for authentication and Aurora as the database service for keeping things like articles and likes.
Since the database and Cognito user pool are decoupled, it is hard for me to do things like:
Fetching users that liked particular article
Fetching users' comments
This seems problematic because I need to pass a unique Cognito user identifier (retrieved during the authorization phase in API Gateway) to the Lambda function, which then saves the database record with an external reference to this user. On the other hand, if I want to fetch particular users, I must first fetch their identifiers from my relational database and then request the user details from the Cognito user pool. I lack a standard way of accessing the current user in my Lambda functions, as well as a mechanism for easily associating a database record with that user.
I have not found any convincing recommended patterns for designing such applications, even though it seems like a very common problem, and I am having a hard time judging whether my approach is correct.
I would appreciate comments on patterns to consider when designing a simple user-based platform, and on the pitfalls of my solution. Any articles and examples would also be very helpful.
Thanks in advance.
These sound like standard problems associated with distributed, independent databases. You can no longer delegate all relationships to the database and get a result aggregating them in some way. You have to do the work yourself by calling one database, then the other.
For a case like this:
Fetching users that liked particular article
You would look up the "likes" database to determine user IDs of those who liked it, then look up the "users" database to determine user details such as name and avatar.
Most patterns follow standard database advice, e.g. in the above example, you could follow the performance-oriented pattern of de-normalising - store user data such as name and avatar against each "like", as long as you feel the extra storage and burden of keeping it consistent is justified by the reduction in queries (probably too many Likes to justify this).
Another important practice is using bulk queries to avoid N+1 queries. This is what Rails does with the includes syntax, but you may have to do it yourself here. In my example, it should only take two queries because the second query should get all required user data in one go, by querying for users matching the list of user IDs.
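To make the two-query idea concrete, here is a minimal Python sketch. Two separate in-memory SQLite connections stand in for the "likes" database and the user store (in the question that would be Aurora and the Cognito user pool); the table layout, column names and the `users_who_liked` helper are assumptions made up for this example.

```python
import sqlite3

# Stand-in for the "likes" database.
likes_db = sqlite3.connect(":memory:")
likes_db.execute("CREATE TABLE likes (article_id INTEGER, user_id TEXT)")
likes_db.executemany(
    "INSERT INTO likes VALUES (?, ?)",
    [(1, "u1"), (1, "u2"), (2, "u3")],
)

# Stand-in for the separate "users" store.
users_db = sqlite3.connect(":memory:")
users_db.execute("CREATE TABLE users (user_id TEXT PRIMARY KEY, name TEXT)")
users_db.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("u1", "Ann"), ("u2", "Bob"), ("u3", "Cy")],
)


def users_who_liked(article_id):
    # Query 1: which user IDs liked the article?
    rows = likes_db.execute(
        "SELECT user_id FROM likes WHERE article_id = ?", (article_id,)
    ).fetchall()
    user_ids = [row[0] for row in rows]
    if not user_ids:
        return []

    # Query 2: fetch all matching users in one bulk lookup,
    # instead of one lookup per user (the N+1 pattern).
    placeholders = ",".join("?" * len(user_ids))
    rows = users_db.execute(
        f"SELECT user_id, name FROM users "
        f"WHERE user_id IN ({placeholders}) ORDER BY user_id",
        user_ids,
    ).fetchall()
    return [{"user_id": uid, "name": name} for uid, name in rows]
```

However many users liked the article, this makes exactly two round trips. Keeping the function as the only place that knows about both stores is also a small instance of the data-layer encapsulation suggested below.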
Finally, I'd suggest you try to abstract things. This kind of code gets messy fast, so be sure to build a well-encapsulated data layer that isolates application code from dealing with the mess of multiple databases.

neo4j - graph database along with a relational database?

This is a question on best practice. I understand that there are a lot of different options for doing this, but I would like your opinions on how you would approach this problem. Please assume that performance is critical in this system; in other words, it must be scalable.
I have recently discovered the wonders of graph databases, so I came up with a theoretical situation in which a company wants to manage its customers' relationships. To do so they are going to use neo4j, which allows for really great management of the customers, the different staff members and their relationships. However, the company now wants to create a web-based interface that will need authentication, and anyone in the neo4j database should be able to log in to the system to see how they are related to other people in the company's database. So each user must have a password/email/id associated with their name.
So my question is: in this scenario, is it best to store the password_hash/password_salt/id/email in a MySQL database and then look it up there based on the node? Or is it better to store the password_hash/password_salt/id/email in the hash tables inside the nodes?
Also, each store has 1000s of products. They can be stored in the graph database, or I can store the products in the MySQL database, look them up there and make the changes there. Because the products are not related to each other, there seems to be no point in storing them in the graph database, so should they be kept out of it to improve performance?
So my question boils down to this: is it best for large projects to use a graph database along with a more common RDBMS such as MySQL? If not, at what point do you start to use these two database systems together?
Apologies in advance for my lack of knowledge regarding database terminology.
A graph DB is mainly used for maintaining relations. Having a graph DB in an app does not mean the app needs to store everything in it.
Every node request on the graph is served from memory, so unnecessary properties on a node bloat it, which may make things slower and take more memory. I usually decide what goes in the graph and what goes in the DB by a very simple rule.
High-level properties (those that define the relation, and other important properties that define the node) go in the graph, whereas additional information goes in the RDBMS.
For example, in FB, maybe FBID and Name go in the graph, as they define the relationship of one node with another. But when a user clicks on someone's Facebook ID, he/she gets to see the other user's DOB, age and college. All of these can go in the RDBMS.
PS: An RDBMS has another advantage: it can be used for quick analytics. I know you can do that with a graph too, but I am not sure it is as scalable and easy as with an RDBMS.
The downside to this approach is that you need to maintain two DBs.
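A small Python sketch of that split, under heavy assumptions: plain dictionaries stand in for the graph database (this is not the neo4j API), an in-memory SQLite table stands in for the RDBMS, and every name and value here is invented for illustration.

```python
import sqlite3

# Graph side (stand-in): only the ID, the name and the relationships.
graph_nodes = {
    "fb1": {"name": "Alice"},
    "fb2": {"name": "Bob"},
    "fb3": {"name": "Carol"},
}
graph_edges = {"fb1": {"fb2", "fb3"}, "fb2": {"fb1"}, "fb3": {"fb1"}}

# Relational side (stand-in): the additional details, keyed by the same ID.
rdbms = sqlite3.connect(":memory:")
rdbms.execute(
    "CREATE TABLE profiles (fbid TEXT PRIMARY KEY, dob TEXT, college TEXT)"
)
rdbms.executemany(
    "INSERT INTO profiles VALUES (?, ?, ?)",
    [("fb1", "1990-01-01", "College A"), ("fb2", "1988-05-17", "College B")],
)


def friend_names(fbid):
    """Relationship traversal touches only the lightweight graph data."""
    return sorted(graph_nodes[f]["name"] for f in graph_edges.get(fbid, ()))


def profile_details(fbid):
    """Details are fetched from the relational side only when requested."""
    row = rdbms.execute(
        "SELECT dob, college FROM profiles WHERE fbid = ?", (fbid,)
    ).fetchone()
    return None if row is None else {"dob": row[0], "college": row[1]}
```

The shared key (`fbid` here) is what ties the two stores together, so it has to be kept consistent in both.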
Unless you have a proven case for a two-DB solution, I'd say fewer moving parts would keep you more agile, more able to change things quickly. If later you find a use case that is difficult, then weigh up the cost/benefit of introducing a second storage. A two-DB architecture is not unheard of, but comes with an overhead.
Specific to security, there is no reason why Neo4j or any other reasonable NoSQL solution couldn't do that: http://spring.neo4j.org/docs#tutorial_security
You should use both in case there is data where it does not make much sense to store it in a graph DB such as neo4j/orientDB (and some data would be better off in a graph DB as opposed to a relational DB). Forcing data on one platform may cause issues with performance/scalability down the line.

MongoDB permissions-based modelling problem

I'm trying to model a simple, experimental app as I learn Symfony and Doctrine.
My data model requires some flexibility, so I'm currently looking into the possibility of using either an EAV model or a document store in MongoDB.
Here are my basic requirements:
Users will be able to store and share their favourite things (TV prog, website, song etc).
The list of possible 'things' a user can store is unknown. For example, a user may want to store their favourite animal.
Users can share their favourite things with other users. However, a user can decide what he / she shares with each other user. For example, a user may share their favourite movie with one user, but not another.
A typical user will log in and view all the favourite things from their list of friends, depending on what those friends have decided to share. The user will also update their own favourite things, and the changes will be reflected when other users view that user's profile. Finally, the user may change which of their friends can see which of their favourite things.
I've worked a lot with Magento, which uses the EAV model extensively. However, I'm adding another layer of complexity by restricting which users can see what information.
I'm instantly drawn to MongoDB as the schemaless format gives me the flexibility I require. However, I'm not sure how easy (or efficient) it will be to access the data once it's saved. I'm also concerned about how changes to the data will be managed, e.g. a user changes their favourite film.
I'm hoping someone can point me in the right direction. This is purely a demo app I'm building to further my knowledge, but I'm treating it like a real-world app where data access times are super-important.
Modelling this kind of app in a traditional relational DB makes me sweat when I think about the crazy number of joins I'd need to get the data for one user.
Thanks for reading this far, and please let me know if I can provide any more information.
Regards,
Fish
You need to choose a model based on how you need to access the data.
If you just need to filter out some values when viewing the user profile, a single document for each user would work quite well, with each favorite within that having a list of authorized user/group IDs that is applied in the application code. Both read and write are single operations on a known document in this case, so will be fast.
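Here is a minimal Python sketch of that single-document layout, with the permission filter applied in application code. The document shape, the `shared_with` field name and the `visible_favorites` helper are assumptions for illustration; a plain dictionary stands in for what would be one MongoDB document per user.

```python
# Hypothetical shape of one user's document.
user_doc = {
    "_id": "fish",
    "favorites": [
        {"type": "movie", "value": "Alien", "shared_with": ["alice", "bob"]},
        {"type": "animal", "value": "Otter", "shared_with": ["alice"]},
        {"type": "song", "value": "Yesterday", "shared_with": []},
    ],
}


def visible_favorites(doc, viewer_id):
    """Return the favorites of `doc`'s owner that `viewer_id` may see."""
    is_owner = viewer_id == doc["_id"]
    return {
        fav["type"]: fav["value"]
        for fav in doc["favorites"]
        # The owner sees everything; others only what was shared with them.
        if is_owner or viewer_id in fav["shared_with"]
    }
```

Changing a favourite, or who may see it, is then an update to one field of one known document, which is why both reads and writes stay cheap in this layout.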
If you need views across multiple profiles though, your main document should probably be the favorite. You'll need to set up the right indexes, but performance shouldn't be a problem.
Actually, the permissions you describe don't add that much complexity to an EAV schema - as long as attributes can have multiple values the permissions list is just one more attribute.

How do we share data between two different services

I am currently working on a web service which is periodically polled. It does not store its state and is instantiated every time it is queried. Essentially, it retrieves the state of other external entities, e.g. databases, and delivers it back to the requester.
Recently, the need to store state has arisen, in that:
There is a need to continuously collect data from a particular source and store the bits that are important/relevant
There is a need to collect the aggregate of a particular data source over a period of time
I came up with the following idea:
My main concern here is the fact that I am using a static class (essentially a global) to share data between the two services. Is there a better way of doing this?
edit: Thanks for the responses thus far. Apologies for the vagueness of this question: I am just trying to work out the best way to share data across different services and am unsure of the specifics (i.e. what is required). The platform I am developing on is the .NET Framework, and both services are simply WCF services hosted as a Windows service.
The database route sounds like the most conventional way to go. However, I am reluctant to go down that path for now, given that at this point only relatively small amounts of data are transferred (mainly because of deployment/setup issues: it introduces the need to create new tables, etc. in addition to simply installing the software). This may of course change in the future, and the database route might be the way to go at that point.
Is there any other way besides adding a database persistence layer?
If you need to collect and aggregate data, you might want to consider using a database between the two layers. Or have I misunderstood something?
You should consider enhancing your question with more requirements: pretty much all options are open here.
Sure - how about data binding? I don't have a lot of information to go on here about your platform, but most sufficiently advanced systems offer it in some form.
You could replace your static shared data with some database representation, with a caching layer (like memcached) between the database and the webservice, so that most of the time the data is available very quickly from the cache, but can be retrieved from the database as needed.
I appreciate that you want to keep the architecture simple. Depending on the number of items you have to look up and their permanence, you might just consider leveraging your file system or a message queue. It sounds like you want a file system, because that would have the least impact on your design.
If you start dealing with tens of thousands of small files, your directories could get hard to navigate and file lookups could get slow. I typically aim for about 1000 - 10000 files per directory, and concoct a routine that generates a path to the file from the file name pattern. Keeping the number of subdirectories even is important; some file systems have a limit on the number of subdirectories in a parent directory.
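One possible shape for such a path routine, sketched in Python. Note that it swaps in a hash of the file name rather than a file name pattern, since a hash spreads files evenly without knowing anything about the names; the `sharded_path` function and its parameters are assumptions for this example.

```python
import hashlib
from pathlib import PurePosixPath


def sharded_path(root, filename, levels=2, width=2):
    """Build root/<aa>/<bb>/filename from a hash of the file name.

    With width=2 hex characters there are at most 256 subdirectories
    per level, and the hash keeps them roughly evenly filled.
    """
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(root, *parts, filename)
```

The same file name always maps to the same path, so lookups need no index: calling `sharded_path("/data", "report.json")` again recomputes where the file lives.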

Storing a list of Categorized URLs - Sqlite DB, XML, or pList? Structure Design?

Hey guys, I want to store a categorized list of URLs. This is an internet radio streaming app, and so I want to have at least three links for each genre:
the free streaming URL with commercials
the premium streaming URL at 128 kbps
the premium streaming URL at 256 kbps
So every genre will have these three URLs.
For the premium streams, there are also 'geo-localized' streaming URLs, or 'mirrors', for specific global areas. For example if I am in the United States, I can choose a closest location of the available mirrors for potentially better streaming quality/reliability.
These URLs can, though I doubt often, change, and so I will want to be able to update them, meaning the storage can't be read only. I don't know exactly how I should store the information, let alone in what type of storage: sqlite db, XML, or property lists. I'm new to all of this so I'm sorry if any of those is stupid for this situation, heh.
As for the structure, I'm not sure how to accomplish that either. I can possibly have separate files/databases, whatever I end up using, for each location, or I could have one big one that is something like:
Rock
    Los Angeles
        Free stream
        Premium Stream - 128kbps
        Premium Stream - 256kbps
But I figure the database/file would quickly become huge.
I guess I can also have separate files/databases for the free and premium streams, given that premium users most likely will only want to listen to premium streams (But still have the option of the 128kbps or 256kbps stream, depending on their network reliability). I could then have an option in the settings as to which streams to show; free or premium. This should cut down on the size.
I later want to present these URLs in a table view and navigation controller. The root view will be the list of the genres, and by drilling down into each genre it will show the free or premium streams. The location (such as Los Angeles) will be chosen in the settings, and will not appear in the table view.
I would appreciate your guys' suggestions. I tried to be as clear and specific as I could, sorry if I missed anything. I'm not asking for code, just what your ideas and suggestions are on how to design this persistent data store, and in what to store it, given that I'm new to this.
Thanks!
If I understand you, you want:
A list of genres, each containing a list of locations, each containing a list of qualities, each containing either a set of data (including URL) or just a URL. Either way, you can do this as a property list, though it is getting to the borderline of where you would want a property list as opposed to a database.
SQLite is on the iPhone, accessed through the standard C function API, although Core Data is not. SQLite would certainly allow a lot more structure in your database and your queries.
Either way, you could include some sort of seed ID for the database, and then query an online server to receive just the differences, which would reduce the need for transferring a large database over the net - but you'll need to determine just how large your database is before deciding whether that is worthwhile. Simply compressing the XML file might be all that is needed, since the XML would compress a large amount (probably to ~ 30% of its original size).
Alternatively, you probably only really need the area-appropriate sections of your database.
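To show what that nesting looks like as a property list, here is a small sketch using Python's standard `plistlib` module (on the device itself you would read the plist with the platform's own APIs). The key names and the example.com URLs are placeholders invented for this example.

```python
import plistlib

# genre -> location -> quality -> URL (all values are placeholders)
stations = {
    "Rock": {
        "Los Angeles": {
            "free": "http://example.com/rock/la/free",
            "premium_128": "http://example.com/rock/la/128",
            "premium_256": "http://example.com/rock/la/256",
        },
    },
}

# Serialize to an XML property list (the default format) and read it back.
data = plistlib.dumps(stations)
loaded = plistlib.loads(data)


def stream_url(plist, genre, location, quality):
    """Drill down genre -> location -> quality to reach a single URL."""
    return plist[genre][location][quality]
```

Updating a URL is then a matter of changing one value in the dictionary and writing the plist out again.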
Something else to consider is that for compatibility with pre OS 3.0 devices, Core Data (SQLite) isn't really an option.
Besides Core Data's (un)availability, I'm not sure that the data you need to store maps cleanly onto a relational data model. I'd lean towards the plist storage. I think it makes the most sense given your needs.