Display MongoId publicly or not? - mongodb

I'm building a small web application using MongoDB and wondered whether it's good practice to show MongoIds publicly, in URLs for example.
Now I'm using the following url structure for user profiles: http://example.com/user/MONGOID
Does this have any security flaws or is it discouraged in some other way?

The answer depends on many things.
Using an ID in a URL is generally a bad idea. OWASP calls the underlying flaw an insecure direct object reference, ranked #4 in its top 10 web security vulnerability list (2013 edition). But using it will not ruin your project.
To prevent the security vulnerability, you must either:
Use it only on data that is public (like StackOverflow profiles)
Have some code intercept the request and validate that the user has the rights to see the resource (a profile, a page, a document, etc.)
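The second option can be sketched as a small check that runs before the resource is returned. This is only an illustration in Python with an in-memory store; `Document`, `DOCUMENTS`, `can_view` and `fetch_document` are hypothetical names, not part of MongoDB or any framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    owner_id: str
    body: str
    public: bool = False


# Hypothetical in-memory store standing in for a MongoDB collection.
DOCUMENTS = {
    "507f1f77bcf86cd799439011": Document(owner_id="alice", body="private notes"),
    "507f191e810c19729de860ea": Document(owner_id="bob", body="public profile", public=True),
}


class Forbidden(Exception):
    """Raised when the requester may not see the resource."""


def can_view(doc: Document, requester_id: str) -> bool:
    # Public data is visible to everyone; private data only to its owner.
    return doc.public or doc.owner_id == requester_id


def fetch_document(doc_id: str, requester_id: str) -> Document:
    doc = DOCUMENTS.get(doc_id)
    if doc is None:
        raise KeyError(doc_id)
    # Knowing (or guessing) the ID is not enough: check rights on every request.
    if not can_view(doc, requester_id):
        raise Forbidden(doc_id)
    return doc
```

With a check like this in place, guessing an ID only gets an attacker a refusal, which is what makes exposing the ID tolerable.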
Using _id also ties your public URL to the back end. You will need some conversion if you change database technology. Or maybe you will need to run some changes that result in the object being destroyed and created again with a different _id, like merging databases. You don't want your URL to change because of that.
Another thing is that _id does not make a good sharding key. Because it starts with a timestamp, consecutive _id values are close together, linear if you will, so new documents tend to go to the same shard (Mongo will spread them later, but you want a key whose new values are spread evenly, not just one with high cardinality).
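To see the timestamp inside an ObjectId, here is a small sketch that reads it straight from the 24-character hex form. The first 4 bytes of an ObjectId are the creation time in seconds since the Unix epoch; the function names are mine and the sample ID is just an illustrative value.

```python
from datetime import datetime, timezone


def objectid_timestamp(object_id_hex: str) -> int:
    """Return the creation time (Unix seconds) embedded in an ObjectId."""
    if len(object_id_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    # The first 4 bytes (8 hex characters) are a big-endian timestamp.
    return int(object_id_hex[:8], 16)


def objectid_created_at(object_id_hex: str) -> datetime:
    return datetime.fromtimestamp(objectid_timestamp(object_id_hex), tz=timezone.utc)


print(objectid_created_at("507f1f77bcf86cd799439011"))
```

This also means anyone who sees the ID can tell when the document was created, which is worth keeping in mind before putting it in a URL.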
So I prefer to pay now, and use an id field that is private to the application from the start. You can store it in the _id field if you want, but consider adding another key to your document, indexing it, and using that in your URLs.
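A rough sketch of that idea, assuming a random URL-safe token as the public key. `UserStore`, `public_id` and the other names are hypothetical, and the dictionaries only stand in for a collection with a unique index on the extra field.

```python
import secrets


def new_public_id() -> str:
    # 12 random bytes encode to 16 URL-safe characters.
    return secrets.token_urlsafe(12)


class UserStore:
    def __init__(self) -> None:
        self._by_internal = {}  # keyed by the database _id
        self._by_public = {}    # plays the role of a unique index on public_id

    def add(self, internal_id: str, name: str) -> dict:
        user = {"_id": internal_id, "public_id": new_public_id(), "name": name}
        self._by_internal[internal_id] = user
        self._by_public[user["public_id"]] = user
        return user

    def find_by_public_id(self, public_id: str):
        return self._by_public.get(public_id)

    def reassign_internal_id(self, old_id: str, new_id: str) -> None:
        # E.g. after a migration: the _id changes, the public URL does not.
        user = self._by_internal.pop(old_id)
        user["_id"] = new_id
        self._by_internal[new_id] = user
```

The URL would then be built from `public_id` (for example `/user/<public_id>`), so recreating the document under a new _id leaves existing links intact.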

No, it does not have security implications.
All a person would be able to do is guess the ID of some user, or try to go through all IDs to get all users of the system.
Take Stack Overflow as an example. They use the same pattern as you: http://stackoverflow.com/users/352959 is you, and there is nothing bad about it. The only difference is that whenever you enter this in your browser you will be redirected to http://stackoverflow.com/users/352959/king-julien.
I can try to iterate through these numbers, and the next guy is http://stackoverflow.com/users/352960, but all I can find out is that this is some john. And surely http://stackoverflow.com/users/1 is the creator of the resource.

Related

How to store one-time data in a MongoDB database?

I am building a personal work/career portfolio web app project, and plan on using MongoDB for my database. (I plan to build the project using the MERN stack.) Most of my data is not one-time data (such as education and work experience). However, a few pieces of data, such as my personal summary (the content for my "About Me" section) and skills summary, are one-time only data (I think "single instance" might be a better fitting term). I would like to store all of the data in a database, and set up an admin end to manage and edit the data. However, I am not sure how to go about storing the one-time data in my MongoDB database.
One idea I had was to create a collection solely for the one-time data, and only allow the user (me) to update and read the documents in the collection. Another idea I had was placing all of my portfolio data into a single collection called "entries", and giving each "entry" a type (such as "Education" or "Personal Summary"). Then, when I retrieve the data from the collection, I would group together all the documents with the same value in their type field. I was thinking of storing each of the types as a constant on my server. However, my biggest concern with both ideas is whether they would be considered bad practice or not.
I would be very appreciative if anyone has any advice on how to solve this problem.
I implemented this a while back in one of my small projects. After discussing it with some professionals I'm in contact with, they said that the best approach would be to create a collection with a single document that contains all the information, like the links, about text, etc.
Another suggestion I received was to use Redis solely for storing this type of information.
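Here is a minimal sketch of the single-document idea, using a plain dictionary in place of a real collection. The fixed key `site_content` and the `SingletonStore` class are assumptions made for illustration; with a real driver the `update` method would roughly correspond to an upsert on a fixed _id.

```python
import copy

SITE_CONTENT_ID = "site_content"  # hypothetical fixed key for the one document


class SingletonStore:
    def __init__(self) -> None:
        self._collection = {}  # stands in for a collection keyed by _id

    def read(self) -> dict:
        # Return a copy so callers cannot mutate the stored document by accident.
        return copy.deepcopy(self._collection.get(SITE_CONTENT_ID, {}))

    def update(self, **fields) -> dict:
        # Create the document on first use, then merge in the new fields.
        doc = self._collection.setdefault(SITE_CONTENT_ID, {"_id": SITE_CONTENT_ID})
        doc.update(fields)
        return self.read()


store = SingletonStore()
store.update(about="Full-stack developer")
store.update(skills=["MongoDB", "React"])
print(store.read())
```

Because every write targets the same key, the collection never grows beyond one document, which is the property that makes this approach suit "single instance" data.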
Something that I implemented a long time back similar to the one collection, single doc approach: https://github.com/codelancedevs/Sundar-Clinic/tree/local-backend/src/api/app
Working on a similar approach here: https://github.com/kunalkeshan/Cam-O-Genics-Backend
Hope this is of some help, I'm still learning as to what might be the best approach. Open to any suggestions out there!

In general, would it be redundant to have two GET routes for users (one for ID and one for username)?

I'm building a CRUD for users in my rest API, and currently my GET route looks like this:
get("/api/users/:id")
But this just occurred to me: what if a user tries to search for other users via their username?
So I thought about implementing another route, like so:
get("/api/users/username/:id")
But this just looks a bit redundant to me. Even more so if my app should ever allow searching for actual names as well. Would I then need 3 routes?
So in this wonderful community, are there any experienced web developers that could tell me how they would handle having to search for a user via their username?
Obs: if you need more details, just comment about it and I'll promptly update my question 🙃
how they would handle having to search for a user via their username?
How would you support this on a web site?
You would probably have a form; that form would have an input control that would allow the user to provide a user name. When the user submits the form, the browser would copy the form input controls into an application/x-www-form-urlencoded document (as described by the HTML standard), then substitute that document as the query part of the form action, and submit the query.
So the resulting request would perhaps look like
GET /api/users?username=GuiMendel HTTP/x.y
You could, of course, have as many different forms as you like, with different combinations of input controls. Some of those forms might share actions, but not necessarily.
so I could just have my controller for GET "/api/users" redirect to an action based on the inputs?
REST doesn't care about "controllers" -- that's an implementation detail; the whole point is that the client doesn't need to know how the server produces a representation of the resource, we just need to know how to ask for it (via the "uniform interface").
Your routing framework might care a great deal, but again that's just another implementation detail hiding behind the facade.
For example, if there were no inputs, it would return all users (index), but with the input you suggested, it would filter out only users whose usernames matched the input? Did I get it right?
Yup, that's fine.
From the point of view of a REST client
/api/users
/api/users?username=GuiMendel
These identify different resources; the two resources don't have to have any meaningful relationship with each other at all. The machines don't care (human beings do care, so we normally design our identifiers in such a way that at least some human beings have an easy time of it -- for example, we might optimize our identifiers to make things easy when operators are reading the access logs).
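On the server side, one way to serve both identifiers from a single handler is to read the query part and filter only when a username is present. A minimal sketch using Python's standard library; the `USERS` list and `list_users` function are made up for illustration and are not tied to any routing framework.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical data standing in for the users table or collection.
USERS = [
    {"id": 1, "username": "GuiMendel"},
    {"id": 2, "username": "alice"},
]


def list_users(request_target: str) -> list:
    """Handle GET /api/users with an optional ?username=... filter."""
    query = parse_qs(urlsplit(request_target).query)
    usernames = query.get("username")
    if not usernames:
        # No filter supplied: return the whole index.
        return list(USERS)
    wanted = usernames[0]
    return [user for user in USERS if user["username"] == wanted]


print(list_users("/api/users"))
print(list_users("/api/users?username=GuiMendel"))
```

Adding another search field later (a real name, say) would then mean reading one more key from the query rather than adding a new route.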

What are some patterns for designing a REST API for a user-based platform in AWS?

I am trying to shift towards a serverless architecture when it comes to building REST APIs. I come from a Ruby on Rails background.
I have successfully understood and adopted services such as API Gateway, Cognito, RDS and Lambda functions; however, I am struggling with putting it all together in an optimal way.
My case is the following. I have a simple user-based platform where there are multiple resources related to application members, say a blog application.
I have used Cognito for authentication and Aurora as the database service for keeping things like articles and likes.
Since the database and Cognito user pool are decoupled, it is hard for me to do things like:
Fetching users that liked particular article
Fetching users comments
It seems problematic to me because I need to pass some unique Cognito user identifier (retrieved during the authorization phase in API Gateway) to a Lambda function, which will then save the database record with an external reference to this user. On the other hand, if I were to fetch particular users, I would first have to fetch their identifiers from my relational database and then request the users' details from the Cognito user pool. I lack a standard way of accessing the current user in my Lambda functions, as well as mechanisms for easily associating a database record with that user.
I have not found any convincing recommended patterns for designing such applications, even though it seems like a very common problem, and I am having a hard time judging whether my approach is correct.
I would appreciate some comments on what patterns to consider when designing a simple user-based platform and what the pitfalls of my solution are. Any articles and examples would also be very helpful.
Thanks in advance.
These sound like standard problems associated with distributed, independent databases. You can no longer delegate all relationships to the database and get a result aggregating them in some way. You have to do the work yourself by calling one database, then the other.
For a case like this:
Fetching users that liked particular article
You would look up the "likes" database to determine user IDs of those who liked it, then look up the "users" database to determine user details such as name and avatar.
Most patterns follow standard database advice, e.g. in the above example, you could follow the performance-oriented pattern of de-normalising - store user data such as name and avatar against each "like", as long as you feel the extra storage and burden of keeping it consistent is justified by the reduction in queries (probably too many Likes to justify this).
Another important practice is using bulk queries to avoid N+1 queries. This is what Rails does with the includes syntax, but you may have to do it yourself here. In my example, it should only take two queries because the second query should get all required user data in one go, by querying for users matching the list of user IDs.
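The two-query version can be sketched like this. Two separate in-memory SQLite connections stand in for the "likes" database and the user store; the table layout, sample rows and the `users_who_liked` function are assumptions for the sake of the example, and a real user lookup against a service such as Cognito would look different.

```python
import sqlite3

# Stand-in for the application database holding likes.
likes_db = sqlite3.connect(":memory:")
likes_db.execute("CREATE TABLE likes (article_id INTEGER, user_id TEXT)")
likes_db.executemany(
    "INSERT INTO likes VALUES (?, ?)", [(1, "u1"), (1, "u2"), (2, "u3")]
)

# Stand-in for the separate user store.
users_db = sqlite3.connect(":memory:")
users_db.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")
users_db.executemany(
    "INSERT INTO users VALUES (?, ?)", [("u1", "Ann"), ("u2", "Ben"), ("u3", "Cy")]
)


def users_who_liked(article_id: int) -> list:
    # Query 1: collect the IDs of everyone who liked the article.
    rows = likes_db.execute(
        "SELECT user_id FROM likes WHERE article_id = ?", (article_id,)
    ).fetchall()
    user_ids = [row[0] for row in rows]
    if not user_ids:
        return []
    # Query 2: fetch all matching users in one go instead of one query per user.
    placeholders = ",".join("?" for _ in user_ids)
    rows = users_db.execute(
        f"SELECT id, name FROM users WHERE id IN ({placeholders}) ORDER BY id",
        user_ids,
    ).fetchall()
    return [{"id": user_id, "name": name} for user_id, name in rows]


print(users_who_liked(1))
```

However many users liked the article, the function issues exactly two queries, which is the point of the bulk lookup.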
Finally, I'd suggest you try to abstract things. This kind of code gets messy fast, so be sure to build a well-encapsulated data layer that isolates application code from dealing with the mess of multiple databases.

REST-API design - allow custom IDs

We are designing an API which can be used by marketplaces and online shops to create payments for their customers.
To reduce the work the marketplaces and shops have to do to implement our API, we want to give them the ability to use their own user and contract IDs rather than storing the IDs we create. It makes it easier for them as they don't have to change or extend their databases. Internally, in our database, we will still use our own technical IDs. So far we do not run any checks on the custom IDs (e.g. uniqueness).
My question is whether it is a good idea in general to let the stores and marketplaces use their own IDs, or whether it is bad practice. And if our approach makes sense, should we run checks on the IDs we receive from the stores and marketplaces (e.g. uniqueness of a user ID within the store)?
Example payload for creating a new user via POST /users/:
{
  "customUserId": "fancyshopuserid12345",
  "name": "John",
  "surName": "Doe"
}
Now the shop can run a GET-request /users/fancyshopuserid12345 to retrieve the new user via our API.
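If a uniqueness check is wanted, one option is to scope it to the shop, so two shops can reuse the same custom ID without clashing. The sketch below is an assumption about how that could look, not a description of the API in question; `UserRegistry`, `DuplicateCustomId` and the `shop_id` argument are hypothetical.

```python
import uuid


class DuplicateCustomId(Exception):
    """Raised when a shop reuses one of its own custom user IDs."""


class UserRegistry:
    def __init__(self) -> None:
        self._users = {}      # internal technical ID -> user record
        self._by_custom = {}  # (shop_id, custom_user_id) -> internal ID

    def create_user(self, shop_id: str, custom_user_id: str,
                    name: str, sur_name: str) -> dict:
        key = (shop_id, custom_user_id)
        # Uniqueness is enforced per shop, not globally.
        if key in self._by_custom:
            raise DuplicateCustomId(custom_user_id)
        internal_id = str(uuid.uuid4())
        user = {
            "id": internal_id,
            "customUserId": custom_user_id,
            "name": name,
            "surName": sur_name,
        }
        self._users[internal_id] = user
        self._by_custom[key] = internal_id
        return user

    def get_user(self, shop_id: str, custom_user_id: str):
        internal_id = self._by_custom.get((shop_id, custom_user_id))
        return self._users.get(internal_id) if internal_id else None
```

In a database, the same rule would typically be a unique index over the pair of shop and custom ID, with the internal technical ID remaining the primary key.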
EDIT:
We go with both approaches now.
If a shop wants to use its own ID, it does so as in the example above; if it sets false as the value for customUserId, we set our internal ID as the value.
Personally, I think that it's an awesome feature!
And I don't see any problems here.
I also think that you don't have to validate customers' IDs; just check that they can't be used for injection into your persistence layer, and that will be enough.
Moreover, you don't violate any REST conventions, which is why I think it's a nice idea.
Well, a cool (RESTful) approach would be to receive URIs instead of custom IDs. That would unfortunately mean that those partner systems would have to publish their own resources in order to be able to link to them. This would also solve the uniqueness problem, since you would only have to check whether the URI exists.
If some shop systems are in fact built RESTfully, they may want to actually store a URI instead of an ID, to be able to navigate seamlessly through their own and your systems. They would only have to add your media types to their clients, and that's it.
Other than that, sure you can store IDs of third-party systems. I know of a few trading systems that do exactly that, storing all sorts of third-party IDs, of backend systems, of transport layer ids, etc. It is at least not unheard of.

Is it bad to expose database internals?

I've been told that it's bad to expose database internals but I've started noticing lots of relatively high profile sites doing it, e.g. Chartboost and ServerDensity both expose the MongoDB document _id field in their URLs.
Can someone shed some light as to why that's bad to do? The only thing I can think of is that it's bad for SEO because they're not human readable URLs, but is this even true?
By "exposing database internals" I understand stuff like exposing the database server to the internet or letting users run arbitrary queries. This stuff is unquestionably bad. Or, if you somehow expose your database schema, a malicious user can use this to their advantage.
Using object IDs in URLs is fine. Humans do not memorize URLs anyway, and search engines don't care whether the link to a post is made of a post slug or a post ID.
Even Stack Overflow shows its database IDs in URLs. The key could be surrogate or natural; either way, you have to identify the resource somehow. Basically, every site uses some kind of identifier in its URLs, usually the primary key. Why do you think they use MongoDB? It could just as well be a relational database with a GUID instead of a long integer PK.
Even if you show someone your database schema, nothing will happen as long as you are protected from SQL injection.