I've been told that it's bad to expose database internals, but I've started noticing lots of relatively high-profile sites doing it; e.g. Chartboost and ServerDensity both expose the MongoDB document _id field in their URLs.
Can someone shed some light on why that's bad? The only thing I can think of is that it's bad for SEO because the URLs aren't human-readable, but is that even true?
By "exposing database internals" I understand things like exposing the database server to the internet or letting users run arbitrary queries. Those are unquestionably bad. Likewise, if you somehow expose your database schema, a malicious user can use it to his advantage.
Using object IDs in URLs is fine. Humans don't memorize URLs anyway, and search engines don't care whether a link to a post is made of the post slug or the post ID.
Even Stack Overflow shows its database IDs in URLs. Whether it's a surrogate key or a natural key, you have to identify the resource somehow. Basically, every site uses some kind of identifier in the URL, usually the primary key. Why do you think they use MongoDB? It could even be a relational database with a GUID instead of a long integer primary key.
Even if you show someone your database schema, nothing will happen as long as you are protected from SQL injection.
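To illustrate why a known schema doesn't help an attacker when queries are parameterized, here is a minimal sketch using Python's built-in sqlite3 module. The users table and the find_user_by_name helper are hypothetical; the point is that user input is bound as a parameter and never spliced into the SQL text.

```python
import sqlite3

def find_user_by_name(conn: sqlite3.Connection, name: str):
    # The "?" placeholder sends `name` as a bound value, not as SQL text, so
    # input like "x' OR '1'='1" is compared literally instead of being executed.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchall()

# Hypothetical schema; knowing it gives an attacker nothing to inject into.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

print(find_user_by_name(conn, "alice"))         # [(1, 'alice')]
print(find_user_by_name(conn, "x' OR '1'='1"))  # [] : the injection attempt matches nothing
```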
I am trying to shift towards a serverless architecture for building REST APIs. I come from a Ruby on Rails background.
I have successfully understood and adopted services such as API Gateway, Cognito, RDS and Lambda functions; however, I am struggling to put it all together in an optimal way.
My case is the following: I have a simple user-based platform where multiple resources are related to application members, say, a blog application.
I use Cognito for authentication and Aurora as the database service for storing things like articles and likes.
Since the database and the Cognito user pool are decoupled, it is hard for me to do things like:
Fetching users that liked particular article
Fetching users' comments
It seems problematic to me because I need to pass a unique Cognito user identifier (retrieved during the authorization phase in API Gateway) to a Lambda function, which then saves the database record with an external reference to this user. On the other hand, if I want to fetch particular users, I must first fetch their identifiers from my relational database and then request the users' details from the Cognito user pool. I lack standard ways of accessing the current user in my Lambda functions, as well as mechanisms for easily associating a database record with that user.
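For reference, here is a minimal sketch of reading the current user inside a Lambda function, assuming a Lambda proxy integration behind a Cognito authorizer. REST APIs with a Cognito user pool authorizer put the token claims under requestContext.authorizer.claims, and HTTP APIs with a JWT authorizer put them under requestContext.authorizer.jwt.claims. The like_article_handler function, the articleId path parameter and the save_like call are hypothetical.

```python
import json

def current_user_sub(event: dict) -> str:
    """Return the Cognito `sub` claim that API Gateway placed in the event.

    Assumes a Lambda proxy integration: REST APIs expose claims under
    requestContext.authorizer.claims, HTTP APIs (JWT authorizer) under
    requestContext.authorizer.jwt.claims.
    """
    authorizer = event.get("requestContext", {}).get("authorizer", {})
    claims = authorizer.get("claims") or authorizer.get("jwt", {}).get("claims", {})
    sub = claims.get("sub")
    if not sub:
        raise PermissionError("request was not authorized by Cognito")
    return sub

def like_article_handler(event, context):
    # `sub` is a stable, unique user identifier, so it can be stored as the
    # external user reference (e.g. a hypothetical likes.user_sub column).
    user_sub = current_user_sub(event)
    article_id = event["pathParameters"]["articleId"]  # hypothetical route parameter
    # save_like(article_id, user_sub)  # hypothetical data-layer call
    return {"statusCode": 201, "body": json.dumps({"article": article_id, "user": user_sub})}
```

Keeping only the `sub` in Aurora means profile data stays in the user pool; the trade-off is the second lookup described below.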
I have not found any convincing recommended patterns for designing such applications, even though it seems like a very common problem, and I am having a hard time figuring out whether my approach is correct.
I would appreciate some comments on which patterns to consider when designing a simple user-based platform, and on the pitfalls of my solution. Any articles and examples would also be very helpful.
Thanks in advance.
These sound like standard problems associated with distributed, independent databases. You can no longer delegate all relationships to the database and get back a result that aggregates them. You have to do the work yourself by calling one database, then the other.
For a case like this:
Fetching users that liked particular article
You would look up the "likes" database to determine user IDs of those who liked it, then look up the "users" database to determine user details such as name and avatar.
Most patterns follow standard database advice, e.g. in the above example, you could follow the performance-oriented pattern of de-normalising - store user data such as name and avatar against each "like", as long as you feel the extra storage and burden of keeping it consistent is justified by the reduction in queries (probably too many Likes to justify this).
Another important practice is using bulk queries to avoid N+1 queries. This is what Rails does with the includes syntax, but you may have to do it yourself here. In my example, it should only take two queries because the second query should get all required user data in one go, by querying for users matching the list of user IDs.
Finally, I'd suggest you try to abstract things. This kind of code gets messy fast, so be sure to build a well-encapsulated data layer that isolates application code from dealing with the mess of multiple databases.
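The two-query lookup and the encapsulated data layer described above can be sketched roughly as follows. Two in-memory SQLite databases stand in for the separate "likes" and "users" stores; the ArticleLikes class, table names and columns are hypothetical.

```python
import sqlite3

class ArticleLikes:
    """Hypothetical data-layer class hiding that likes and users live in separate stores."""

    def __init__(self, likes_db: sqlite3.Connection, users_db: sqlite3.Connection):
        self.likes_db = likes_db
        self.users_db = users_db

    def users_who_liked(self, article_id: int) -> list:
        # Query 1: user IDs from the "likes" database.
        user_ids = [row[0] for row in self.likes_db.execute(
            "SELECT user_id FROM likes WHERE article_id = ?", (article_id,))]
        if not user_ids:
            return []
        # Query 2: every needed user in one round trip (IN list) rather than
        # one query per like, which would be the N+1 pattern.
        placeholders = ",".join("?" * len(user_ids))
        rows = self.users_db.execute(
            f"SELECT id, name FROM users WHERE id IN ({placeholders})", user_ids)
        by_id = {uid: name for uid, name in rows}
        # Skip likes whose user no longer exists in the users store.
        return [{"id": uid, "name": by_id[uid]} for uid in user_ids if uid in by_id]

likes_db = sqlite3.connect(":memory:")
likes_db.execute("CREATE TABLE likes (article_id INTEGER, user_id TEXT)")
likes_db.executemany("INSERT INTO likes VALUES (?, ?)", [(1, "u1"), (1, "u2"), (2, "u1")])
users_db = sqlite3.connect(":memory:")
users_db.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")
users_db.executemany("INSERT INTO users VALUES (?, ?)", [("u1", "Ann"), ("u2", "Ben")])

repo = ArticleLikes(likes_db, users_db)
print(repo.users_who_liked(1))
```

Application code only ever calls users_who_liked, so swapping the users store (for example, for a user directory service) stays inside the class. For very long ID lists you would likely want to chunk the IN query.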
A little backstory
I have to develop a web application for college. The application manages different locations using Google Maps: pinning new locations, adding custom descriptions and so on. Login is done with Facebook (Login with Facebook). The more interesting part is that the client-server queries have to be done using REST.
The part I'm trying to understand
If I use a database to store each user's unique ID and their online status (online/offline), and somehow (I haven't actually settled on the idea) keep a JSON document on the server containing each user's pinned locations, would all this actually be OK with the REST paradigm?
I find mixed answers on the internet, and I don't know how to think about the statelessness of the application correctly. No session would be created, but the credentials from the database would be necessary for the users to communicate with each other.
The other side of the question
Assuming I'm mistaken and I shouldn't use the database to store the credentials and locations like that, how am I supposed to keep all that data? I'm thinking of something like JSON cached client-side, but what if my client changes computers? Wouldn't that mean he loses all his data? (Also, wouldn't this handicap MVC by leaving it without a model?) How do I really keep track of everything?
You're making this way too hard on yourself; try to keep it simple, since you probably have a deadline. REST is a way of designing APIs around HTTP verbs like GET, POST, PUT and DELETE. It says nothing about how to store the data behind your APIs.
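To make the "REST says nothing about storage" point concrete, here is a minimal sketch of verbs mapped to CRUD operations on hypothetical /locations routes. A plain dict stands in for storage; the route names and the handle function are assumptions, not part of any framework.

```python
import json
from itertools import count
from typing import Dict, Optional, Tuple

# Stand-in for whatever storage sits behind the API (a database in a real app).
locations: Dict[int, dict] = {}
_ids = count(1)

def handle(method: str, path: str, body: Optional[str] = None) -> Tuple[int, str]:
    """Map HTTP verbs on the hypothetical /locations[/<id>] routes to CRUD operations.

    Each call receives everything it needs (no server-side session), which is what
    REST's statelessness refers to; the data itself still lives in storage.
    """
    parts = [p for p in path.split("/") if p]
    if not parts or parts[0] != "locations" or len(parts) > 2:
        return 404, ""
    if len(parts) == 1:                      # collection: /locations
        if method == "GET":
            return 200, json.dumps(list(locations.values()))
        if method == "POST":
            new_id = next(_ids)
            locations[new_id] = {**json.loads(body or "{}"), "id": new_id}
            return 201, json.dumps(locations[new_id])
        return 405, ""
    if not parts[1].isdigit() or int(parts[1]) not in locations:
        return 404, ""
    loc_id = int(parts[1])                   # item: /locations/<id>
    if method == "GET":
        return 200, json.dumps(locations[loc_id])
    if method == "PUT":
        locations[loc_id] = {**json.loads(body or "{}"), "id": loc_id}
        return 200, json.dumps(locations[loc_id])
    if method == "DELETE":
        del locations[loc_id]
        return 204, ""
    return 405, ""
```

For example, handle("POST", "/locations", '{"name": "Library"}') creates a pin and handle("GET", "/locations/1") reads it back. Replacing the dict with a database changes nothing about the API's shape.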
As for storing the data, a database should be fine. Storing it as JSON in the DB could work, but in the end you'll have to parse the JSON every time you want to use it, so I would suggest storing it in the DB in a way that can be read easily.
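One way to store the data "in a way that can be read easily" is a table per concept instead of a JSON blob. Below is a minimal sketch with SQLite; the table and column names (facebook_id, is_online, lat, lng) are hypothetical. Because the data lives on the server, a user who switches computers still sees their pins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id          INTEGER PRIMARY KEY,
    facebook_id TEXT UNIQUE NOT NULL,   -- ID obtained via Login with Facebook
    is_online   INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE locations (
    id          INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL REFERENCES users(id),
    lat         REAL NOT NULL,
    lng         REAL NOT NULL,
    description TEXT
);
CREATE INDEX idx_locations_user ON locations(user_id);
""")

def pinned_locations(conn, facebook_id):
    # One join reads a user's pins directly; there is no JSON blob to parse.
    return conn.execute(
        """SELECT l.lat, l.lng, l.description
           FROM locations l JOIN users u ON u.id = l.user_id
           WHERE u.facebook_id = ? ORDER BY l.id""",
        (facebook_id,),
    ).fetchall()

conn.execute("INSERT INTO users (facebook_id) VALUES (?)", ("fb-123",))
conn.execute("INSERT INTO locations (user_id, lat, lng, description) VALUES (?, ?, ?, ?)",
             (1, 44.43, 26.10, "Campus"))
print(pinned_locations(conn, "fb-123"))
```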
For a beginner (especially if you're doing this for a school project), I would definitely suggest setting up a relational database like Microsoft SQL Server (Microsoft stack), or a MySQL/PostgreSQL database (I think this is what they'd use on Linux). If you want to skip the relational DB approach (because it might not be all that "easy" to get going), you can always try a NoSQL database like MongoDB.
Relevant links to help:
http://rest.elkstein.org/ (REST explained)
http://www.restapitutorial.com/lessons/httpmethods.html (REST verbs)
http://en.wikipedia.org/wiki/Relational_database (what is a relational db)
http://en.wikipedia.org/wiki/Database_normalization (kind of the goal of a relational DB, but note that you can go too far: http://lemire.me/blog/archives/2010/12/02/over-normalization-is-bad-for-you/)
http://www.mongodb.com/nosql-explained (NoSQL explanation)
The official specifications probably don't mention this, or regard it as "outside the scope of this specification".
Say we have decided to separate the auth and resource servers.
In practice, why would we want the resource and auth servers to access the same database, or why would we want to keep two separate databases - one for each server?
I'm writing this question because of Entity Framework. If I shared the same database between both sites, I figure migrations would conflict even if the auth server only touched account-related tables and the resource server only touched data (POCO) tables. (I actually haven't tried this out yet, but I don't want to waste time experimenting, so I want to hear from someone who has come across this.)
But if I separate the databases, I lose the foreign key relationship between user and the data [s]he owns - but is that even necessary? It feels like it's one of those "we did that just because" practices.
To combine them and use migrations, you'd need to have a DbContext for migrations that includes all the types used by either auth or resource servers, and use that for scaffolding migrations. This seems to be a common practice for those that require multiple contexts and want to use migrations; it's worked well for me so far.
As far as the foreign key constraints, they're just that - constraints. They enforce your business rules (that Foo.Bar must contain a value that exists in Baz.Bar) at the database level. Whether or not this is necessary really depends on your use case. Very few things are done "just because", though many things are done without full understanding - "because I saw it done like this somewhere else". The somewhere else (or wherever they copied it from, etc.) may have a perfectly valid reason, which may or may not apply to your use case.
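To see what "enforcing the rule at the database level" means in practice, here is a minimal sketch using SQLite with the Foo/Baz names from the paragraph above. If auth data and resource data live in separate databases, no single database can run this check, so application code has to do it instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.execute("CREATE TABLE Baz (Bar INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE Foo (id INTEGER PRIMARY KEY, Bar INTEGER NOT NULL REFERENCES Baz(Bar))")
conn.execute("INSERT INTO Baz (Bar) VALUES (1)")

def try_insert_foo(conn: sqlite3.Connection, bar: int) -> bool:
    """Return True if the row was accepted, False if the FK constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO Foo (Bar) VALUES (?)", (bar,))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert_foo(conn, 1))   # True: 1 exists in Baz.Bar
print(try_insert_foo(conn, 42))  # False: no Baz row has Bar = 42
```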
I'm building a small web application using MongoDB and wondered whether it's good practice to show Mongo IDs publicly, in URLs for example.
Now I'm using the following url structure for user profiles: http://example.com/user/MONGOID
Does this have any security flaws or is it discouraged in some other way?
The answer depends on many things...
Using an ID in a URL is generally a bad idea. According to OWASP, it ranks #4 (Insecure Direct Object References) in the top 10 web security vulnerability list. But using it will not ruin your project.
To prevent the vulnerability, you must either:
Use it only on data that is public (like StackOverflow profiles)
Have some code intercept the request and validate that the user has the rights to see the resource (a profile, a page, a document, etc.)
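The second option, checking rights before serving the resource named in the URL, might look roughly like this. The Document type, the DOCUMENTS store and get_document are hypothetical stand-ins for your own models and storage.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Document:
    id: str
    owner_id: str
    is_public: bool
    body: str

# Hypothetical storage keyed by the ID that appears in the URL.
DOCUMENTS = {
    "d1": Document("d1", owner_id="alice", is_public=False, body="Alice's notes"),
    "d2": Document("d2", owner_id="bob", is_public=True, body="Bob's public profile"),
}

def get_document(doc_id: str, current_user_id: Optional[str]) -> Optional[Document]:
    """Return the document only if it is public or owned by the requesting user."""
    doc = DOCUMENTS.get(doc_id)
    if doc is None or not (doc.is_public or doc.owner_id == current_user_id):
        # Treat "forbidden" the same as "missing" so a guessed ID does not
        # reveal whether the resource exists; the caller maps None to a 404.
        return None
    return doc
```

Returning the same result for forbidden and missing resources is a design choice; returning 403 for forbidden ones is also common when existence is not sensitive.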
Using _id also ties your public URL to the back end. You will need some conversion if you change database technology. Or you may need to run changes that result in the object being destroyed and recreated with a different _id, like merging databases. You don't want your URL to change because of that.
Another thing is that _id values are not well distributed, so _id does not make a good sharding key. Being derived from a timestamp, all _id values are close together (linear, if you will), so new documents tend to go to the same shard (Mongo will spread them later, but you want a key that has high cardinality).
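You can see the timestamp prefix directly: an ObjectId is 12 bytes, and the first 4 are a big-endian Unix timestamp in seconds. This small sketch decodes it; the example value is the one used in MongoDB's own documentation.

```python
from datetime import datetime, timezone

def object_id_time(oid_hex: str) -> datetime:
    """Decode the creation time embedded in a MongoDB ObjectId (24 hex chars).

    The first 4 bytes (8 hex chars) are seconds since the Unix epoch, which is
    why IDs created around the same time share a leading prefix.
    """
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId is 24 hex characters")
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

print(object_id_time("507f1f77bcf86cd799439011"))  # 2012-10-17 21:13:27+00:00
```

A side effect worth knowing: anyone who sees an _id in a URL can read off when that record was created.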
So I prefer to pay now and use an ID field that belongs to the application (rather than to the database) from the start. You can store it in the _id field if you want, but consider adding another key to your document, indexing it, and using that in your URLs.
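A minimal sketch of that approach: generate a random, URL-safe public ID in application code and keep _id out of URLs. The public_id field name and new_user_document helper are hypothetical. With PyMongo you would then index the field, e.g. collection.create_index("public_id", unique=True), and look users up with find_one({"public_id": ...}); those calls are not executed here.

```python
import secrets

def new_user_document(name: str) -> dict:
    """Build a user document whose public ID is independent of _id."""
    return {
        # _id is omitted so the database assigns one; it never appears in URLs.
        "public_id": secrets.token_urlsafe(9),  # 9 random bytes -> 12 URL-safe chars
        "name": name,
    }

doc = new_user_document("julien")
url = f"http://example.com/user/{doc['public_id']}"
print(url)
```

Because the value is random rather than time-ordered, it also spreads better as a shard key and cannot be enumerated by counting.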
No, it does not have security implications.
All a person could do is guess the ID of some user, or go through all the IDs to get every user of the system.
Take Stack Overflow as an example. It uses the same pattern as you: http://stackoverflow.com/users/352959. 352959 is you, and there is nothing wrong with that. The only difference is that whenever you enter this URL in your browser, you are redirected to http://stackoverflow.com/users/352959/king-julien.
I can try to iterate through these numbers, and the next one is http://stackoverflow.com/users/352960, but all I can find out is that this is someone named John. And surely http://stackoverflow.com/users/1 is the creator of the site.
This is a question on best practice. I understand there are many different ways of doing this, but I would like your opinions on how you would approach the problem. Please assume that performance is critical in this system; in other words, it must be scalable.
I have recently discovered the wonders of graph databases, so I came up with a theoretical situation: a company wants to manage its customer relationships, and to do so it is going to use Neo4j. That works great and allows really good management of the customers, the different staff members and their relationships. However, the company now wants to create a web-based interface that needs authentication. Anyone in the Neo4j database should be able to log in to the system to see how they are related to other people in the company's database, so each user must have a password, email and ID associated with their name.
So my question is: in this scenario, is it best to store the password_hash/password_salt/id/email in a MySQL database and look them up there based on the node, or is it better to store the password_hash/password_salt/id/email as properties on the nodes themselves?
Also, each store has 1000s of products. They can be stored in the graph database, or I can store the products in the MySQL database and look them up and make changes there. Since the products are not related to each other, there's no point in storing them in the graph database, so should they be kept out of it to improve performance?
So my question boils down to this: for large projects, is it best to use a graph database alongside a more common RDBMS such as MySQL? If not, at what point do you start using both database systems?
Apologies in advance for my lack of knowledge of database terminology.
A graph DB is mainly used for maintaining relations. Having a graph DB does not mean the app needs to store everything in the graph DB.
Every node requested from the graph is held in memory, so if your nodes carry unnecessary properties they become bloated, which may make things slower and use more memory. I usually decide what goes in the graph and what goes in the relational DB with a very simple rule:
High-level properties (those that define the relationships, plus other important properties that define the node) go in the graph, whereas additional information goes in the RDBMS.
For example, at Facebook the FBID and name might go in the graph, since they define the relationship of one node with another. But when a user clicks on someone's Facebook profile, he/she gets to see that user's DOB, age, college and so on. All of these can go in the RDBMS.
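The split described above can be sketched like this. A plain in-memory adjacency map stands in for the graph database (it is not a Neo4j API), and an SQLite table holds the detail fields; all names and data are hypothetical.

```python
import sqlite3

# Graph side (stand-in): only relationship-defining data, kept small.
friends = {"fb1": {"fb2", "fb3"}, "fb2": {"fb1"}, "fb3": {"fb1"}}
names = {"fb1": "Ann", "fb2": "Ben", "fb3": "Cy"}

# Relational side: extra profile details, loaded only when a profile is opened.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE profiles (fbid TEXT PRIMARY KEY, dob TEXT, college TEXT)")
db.executemany("INSERT INTO profiles VALUES (?, ?, ?)",
               [("fb1", "1990-01-01", "College A"),
                ("fb2", "1991-02-02", "College B"),
                ("fb3", None, None)])

def friend_list(fbid):
    # Traversal touches only the lightweight graph data.
    return sorted(names[f] for f in friends.get(fbid, ()))

def profile(fbid):
    # Detail view combines the graph identity with the relational row.
    row = db.execute("SELECT dob, college FROM profiles WHERE fbid = ?", (fbid,)).fetchone()
    return {"fbid": fbid, "name": names[fbid], "dob": row[0], "college": row[1]} if row else None
```

The FBID acts as the shared key between the two stores, which is exactly the piece you then have to keep consistent in both.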
PS: An RDBMS has another advantage: it can be used for quick analytics. I know you can do that with a graph too, but I am not sure it's as scalable and easy as with an RDBMS.
The downside to this approach is that you need to maintain two DBs.
Unless you have a proven case for a two-DB solution, I'd say fewer moving parts will keep you more agile and more able to change things quickly. If you later find a use case that is difficult, then weigh up the cost/benefit of introducing a second store. A two-DB architecture is not unheard of, but it comes with an overhead.
Specific to security, there is no reason why Neo4j or any other reasonable NoSQL solution couldn't handle that: http://spring.neo4j.org/docs#tutorial_security
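Storing credentials on a node works because a salted hash is just a pair of string properties. Here is a minimal sketch using Python's hashlib.pbkdf2_hmac; the property names follow the question's password_hash/password_salt, and the iteration count is an assumption you should tune for your hardware. How the properties get written to Neo4j is left out.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumption: tune so one hash takes a noticeable fraction of a second

def make_password_fields(password: str) -> dict:
    """Return properties to store on the user's node (or in a table row)."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return {"password_salt": salt.hex(), "password_hash": digest.hex()}

def check_password(password: str, fields: dict) -> bool:
    salt = bytes.fromhex(fields["password_salt"])
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(digest.hex(), fields["password_hash"])
```

Since these are ordinary strings, the same two functions work whether the fields end up as node properties or as MySQL columns.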
You should use both if there is data that doesn't make much sense to store in a graph DB such as Neo4j/OrientDB (just as some data is better off in a graph DB than in a relational DB). Forcing all data onto one platform may cause performance/scalability issues down the line.