I'm using MongoMapper and am trying to come up with a way to use an abbreviated field name. I want to keep the keys as plain English but have the field names stored short (e.g. "name" maps to "_n"). I noticed a conversation about this, but it ended up closed (https://github.com/jnunemaker/mongomapper/pull/351), and I was wondering if anybody has an example of how to accomplish this. I'd be extremely grateful for any feedback!
Related to that conversation is another pull request, #353. It hasn't been pulled yet, but I'm guessing it likely will be; the maintainers may simply have forgotten about it.
When pulled, your case will look like:
class User
  include MongoMapper::Document
  key :name, String, :abbr => :_n
end
I recommend leaving a comment on #353 saying that you are looking for that feature.
In my project I have a Team entity. Its structure looks like this:
id: long; // primary key
teamNumber: string; // unique team number, numeric, but can contain leading zeroes
title: string; // team title
I have a REST endpoint to get a team by id, and the address pattern of this endpoint looks like this:
/teams/{id}
So, if I want to get a team by its id, for example for id=123, I make the GET-request to:
/teams/123
But, also it is required to have an endpoint to get a team by team number (not id).
And in this case I can't have a REST-endpoint with the same pattern:
/teams/{teamNumber}
Because it conflicts with the
/teams/{id}
I should have another unique address for this (get) request.
My question: what is the best practice to name REST-endpoint addresses for such cases?
Should I have something like:
/teams/team-by-number/{teamNumber}
Or is there a better approach?
If you have 2 unique ways to find these resources, and both can potentially collide (they're both numbers), then yes you will definitely need 2 different namespaces.
Your approach to this is fine.
I would question if you can't just use one of the two, because I can imagine that having 2 unique numeric identifiers can easily confuse people. If this is a hard requirement, then I think you're on the right path.
Well, what you are using is called a path parameter. You can also use query parameters, something like below:
/teams?id=123 or /teams?teamNumber=123
This is just another way. It's up to you.
In this conflicting case, according to the conventions, I'd recommend:
/teams/id/{id}
/teams/team-number/{teamNumber}
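A minimal sketch of why the two prefixed patterns above never collide, using only Python's standard library. The route table, the handler names and the in-memory TEAMS data are all made up for illustration; a real framework would bring its own routing.

```python
import re

# Hypothetical in-memory data standing in for the Team table.
TEAMS = [
    {"id": 123, "teamNumber": "00123", "title": "Alpha"},
    {"id": 7, "teamNumber": "123", "title": "Beta"},
]


def get_by_id(raw):
    # ids are numeric primary keys
    return next((t for t in TEAMS if t["id"] == int(raw)), None)


def get_by_team_number(raw):
    # team numbers stay strings, so leading zeroes are preserved
    return next((t for t in TEAMS if t["teamNumber"] == raw), None)


# Each lookup key gets its own path prefix, so "123" is never ambiguous.
ROUTES = [
    (re.compile(r"^/teams/id/(\d+)$"), get_by_id),
    (re.compile(r"^/teams/team-number/(\d+)$"), get_by_team_number),
]


def dispatch(path):
    for pattern, handler in ROUTES:
        match = pattern.match(path)
        if match:
            return handler(match.group(1))
    return None  # no route matched (a real API would answer 404)
```

Here dispatch("/teams/id/123") and dispatch("/teams/team-number/123") reach different teams, while the bare /teams/123 matches nothing.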
Let's say we have two models like this:
User:
- _id
- name
- email
Company:
- _id
- name
- slug
Now let's say I need to connect a user to the company. A user can have one company assigned. To do this, I can add a new field called companyID in the user model. But I'm not sending the _id field to the front end. All the requests that come to the API will have the slug only. There are two ways I can do this:
1) Add slug to relate the company: If I do this, I can take the slug sent from a request and directly query for the company.
2) Add the _id of the company: If I do this, I need to first use the slug to query for the company and then use the _id returned to query for the required data.
May I please know which way is the best? Is there any extra benefit when using the _id of a record for the relationship?
Agree with the 2nd approach. There are several issues to consider when deciding on which field to use as a join key (this is true of all DBs, not just Mongo):
The field must be unique. I'm not sure exactly what the 'slug' field in your schema represents, but if there is any chance this could be duplicated, then don't use it.
The field must not change. Strictly speaking, you can change a key field but the only way to safely do so is to simultaneously change it in all the child tables atomically. This is a difficult thing to do reliably because a) you have to know which tables are using the field (maybe some other developer added another table that you're not aware of) b) If you do it one at a time, you'll introduce race conditions c) If any of the updates fail, you'll have inconsistent data and corrupted parent-child links. Some SQL DBs have a cascading-update feature to solve this problem, but Mongo does not. It's a hard enough problem that you really, really don't want to change a key field if you don't have to.
The field must be indexed. Strictly speaking this isn't true, but if you're going to join on it, then you will be running a lot of queries on it, so you'll need to index it.
For these reasons, it's almost always recommended to use a key field that serves solely as a key field, with no actual information stored in it. Plenty of people have been burned using things like Social Security Numbers, drivers licenses, etc. as key fields, either because there can be duplicates (e.g. SSNs can be duplicated if people are using fake numbers, or if they don't have one), or the numbers can change (e.g. drivers licenses).
Plus, by doing so, you can format the key field to optimize for speed of unique generation and indexing. For example, if you use SSNs, you need to check the SSN against the rest of the DB to ensure it's unique. That takes time if you have millions of records. Similarly for slugs, which are text fields that need to be hashed and checked against an index. OTOH, MongoDB generates ObjectIds, which work much like UUIDs: a new key can be generated without first checking the existing records (the algorithm guarantees a high statistical likelihood of uniqueness).
The bottom line is that there are very good reasons not to use a "real" field as your key if you can help it. Fortunately for you, MongoDB already gives you a great key field which satisfies all the above criteria: the _id field. Therefore, you should use it. Even if slug is not a "real" field and you generate it the exact same way as an _id field, why bother? Why does a record have to have 2 unique identifiers?
The second issue in your situation is that you don't expose the company's _id field to the user. Intuitively, it seems like that should be a valuable piece of information that shouldn't be given out willy-nilly. But the truth is, it has no informational value by itself, because, as stated above, a key should have no actual information. The place to implement security is in the query, ensuring that the user doing the query has permission to access the record / specific fields that she's asking for. Hiding the key is a classic security-by-obscurity that doesn't actually improve security.
The only time to hide your primary key is if you're using a poorly thought-out key that does contain useful information. For example, an invoice Id that increments by 1 for each invoice can be used by someone to figure out how many orders you get in a day. Auto-increment Ids can also be easily guessed (if my invoice is #5, can I snoop on invoice #6?). Fortunately, Mongo's ObjectIds are UUID-like rather than a simple per-record counter, so there's very little information leaking out (an ObjectId does embed its creation timestamp, but if you're worried about that, you need far more in-depth security considerations than this post :-).
Look at it another way: if a slug reliably points to a specific company and user, then how is it more secure than just using the _id?
That said, there are some instances where exposing a secondary key (like slugs) is helpful, none of which have to do with security. For example, if in the future you need to migrate DB platforms and need to re-generate keys because the new platform can't use your old ones; or if users will be manually typing in identifiers, then it's helpful to give them something easier to remember like slugs. But even in those situations, you can use the slug as a handy identifier for users to use, but in your DB, you should still use the company ID to do the actual join (like in your option #2). Check out this discussion about the pros/cons of exposing _ids to users:
https://softwareengineering.stackexchange.com/questions/218306/why-not-expose-a-primary-key
So my recommendation would be to go ahead and give the user the company Id (along with the slug if you want a human-readable format e.g. for URLs, although mongo _ids can be used in a URL). They can send it back to you to get the user, and you can (after appropriate permission checks) do the join and send back the user data. If you don't want to expose the company Id, then I'd recommend your option #2, which is essentially the same thing except you're adding an additional query to first get the company Id. IMHO, that's a waste of cycles for no real improvement in security, but if there are other considerations, then it's still acceptable. And both of those options are better than using the slug as a primary key.
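To make option #2 concrete, here is a minimal sketch with plain Python lists standing in for the two collections. The sample records and the function names are made up for illustration; companyID is the field name from the question.

```python
# Hypothetical in-memory stand-ins for the two collections.
COMPANIES = [
    {"_id": "c1", "name": "Acme Inc", "slug": "acme"},
    {"_id": "c2", "name": "Globex", "slug": "globex"},
]
USERS = [
    {"_id": "u1", "name": "Ann", "email": "ann@example.com", "companyID": "c1"},
    {"_id": "u2", "name": "Bob", "email": "bob@example.com", "companyID": "c2"},
    {"_id": "u3", "name": "Cy", "email": "cy@example.com", "companyID": "c1"},
]


def company_id_for_slug(slug):
    """Step 1: translate the public slug into the internal _id."""
    for company in COMPANIES:
        if company["slug"] == slug:
            return company["_id"]
    return None


def users_for_slug(slug):
    """Step 2: join on the stable _id, never on the slug itself."""
    company_id = company_id_for_slug(slug)
    if company_id is None:
        return []
    return [u for u in USERS if u["companyID"] == company_id]


def rename_slug(old, new):
    # Only the company record changes; no user record is touched.
    for company in COMPANIES:
        if company["slug"] == old:
            company["slug"] = new
```

The extra lookup in step 1 is the cost of option #2. In exchange, rename_slug only ever has to update one record, which is the "key must not change" point from above.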
The second approach is the best one: add the _id of the company.
Using _id is the best practice for querying any kind of information; even complex queries can be built on _id, since it is a unique ObjectId created by MongoDB. Population (as offered by ODMs such as Mongoose) is the process of automatically replacing the specified paths in the document with document(s) from other collection(s). We may populate a single document, multiple documents, a plain object, multiple plain objects, or all objects returned from a query.
I am developing a generic REST API for my projects and I'm wondering what to do when I have a table/resource with 2 or more primary keys.
For example, let's suppose I have a table named "question" with two primary keys (date and type) and I need to create the REST URI for the resource. What is the best way to do it following the standard schema api/{resource}/{id}?
Maybe something like: api/question/{:date},{:type}? What is the best way to do it?
Thank you.
I think that what you call multiple primary keys is a composite key. Right?
You have some options.
Use api/questions/dates/:date/types/:type
Maybe, the best alternative for you is:
api/questions/dates/{:date}/types/{:type}
This reads more naturally as an HTTP resource for your case, even if it doesn't make sense to have api/question/dates/{:date} on its own in your application.
Use api/questions/:date/:type/
Another alternative is:
api/questions/:date/:type/
Use query parameter
If it's not a problem for you, instead of returning a single question object you can return an array of questions as the response, using a filter query like:
api/questions?date=2022-10-27&type=XYZ
Neither parameter is mandatory, but if the user sends both, the result will always be an array with a single element. This also brings some flexibility to your API, because the user can supply just one of them and still get some results. You need to check whether this behavior is valid for your case.
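A minimal sketch of that filter behavior, using Python's standard urllib.parse. The QUESTIONS data and the find_questions name are hypothetical, just enough to show that one, both or neither parameter can be supplied.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical rows keyed by the composite (date, type) pair.
QUESTIONS = [
    {"date": "2022-10-27", "type": "XYZ", "text": "first"},
    {"date": "2022-10-27", "type": "ABC", "text": "second"},
    {"date": "2022-10-28", "type": "XYZ", "text": "third"},
]


def find_questions(url):
    params = parse_qs(urlparse(url).query)
    # parse_qs maps each name to a list of values; keep the first one
    # and ignore any parameter that is not part of the key.
    wanted = {k: v[0] for k, v in params.items() if k in ("date", "type")}
    return [
        q for q in QUESTIONS
        if all(q[field] == value for field, value in wanted.items())
    ]
```

With both parameters the list has at most one element; with only date or only type it returns every match, and with none it returns everything.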
You're on the right path. I think you should definitely include both the date and the type in the resource URL if that's the only way you can uniquely identify it:
api/question/{date}_{type}
This is a good example of when to use a slug. This answer to "What is a slug" provides a good idea of how you can use your composite primary key in your API design.
With that, you have a few options at your disposal. Which is best is a matter of opinion and of what suits your needs.
api/question/{:date}/{:type} or api/question/{:key1}/{:key2}/.../{:keyn}
The same pattern could also be applied to the following.
api/question/{:date}_{:type}
api/question/{:date}-{:type}
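For the single-segment variants, the server has to split the segment back into its two parts. Below is a minimal sketch for the underscore form, assuming the date is sent in ISO format (YYYY-MM-DD); the function names are made up for illustration.

```python
from datetime import date


def parse_question_key(segment):
    """Split a '<date>_<type>' path segment into its two key parts."""
    # ISO dates contain '-' but never '_', so split on the first '_' only.
    raw_date, sep, question_type = segment.partition("_")
    if not sep or not question_type:
        raise ValueError(f"expected '<date>_<type>', got {segment!r}")
    return date.fromisoformat(raw_date), question_type


def build_question_key(day, question_type):
    """Inverse of parse_question_key: build the path segment."""
    return f"{day.isoformat()}_{question_type}"
```

Under the same ISO-date assumption, the hyphen form is harder to split because the date itself contains hyphens, so the separator should be a character that cannot appear in the first key.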
I do not think it is a good idea to have two primary keys for a resource. REST depends heavily on resources and their representations.
If you are stuck in a situation where you end up with two identifiers for a resource, then redesign your application (maybe by creating another key in the backend and mapping it to the other identifiers) and add these multiple keys as attributes of the resource.
The idea is "keep it simple" if you want to create truly world-class REST APIs.
Bonus: you don't need to teach clients/developers a few extra things about something fancy you did with your APIs.
Simple question I'm having trouble finding an answer to.
If I have a REST web service, and my design is not using url parameters, how can I specify two different keys to return the same resource by?
Example
I want (and have already implemented)
/Person/{ID}
which returns a person as expected.
Now I also want
/Person/{Name}
which returns a person by name.
Is this the correct RESTful format? Or is it something like:
/Person/Name/{Name}
You should only use one URI to refer to a single resource. Having multiple URIs will only cause confusion. In your example, confusion would arise due to two people having the same name. Which person resource are they referring to then?
That said, you can have multiple URIs refer to a single resource, but for anything other than the "true" URI you should simply redirect the client to the right place using a status code of 301 - Moved Permanently.
Personally, I would never implement a multi-ID scheme or redirection to support it. Pick a single identification scheme and stick with it. The users of your API will thank you.
What you really need to build is a query API, so focus on how you would implement something like a /personFinder resource which could take a name as a parameter and return potentially multiple matching /person/{ID} URIs in the response.
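A minimal sketch of such a finder, with a made-up in-memory PEOPLE table. The person_finder name and the response shape (a list of canonical URIs) are assumptions for illustration only.

```python
# Hypothetical data: the ID is unique, the name is not.
PEOPLE = {
    1: {"name": "Alex Smith"},
    2: {"name": "Sam Jones"},
    3: {"name": "Alex Smith"},
}


def person_finder(name):
    """Return the canonical URI of every person with this name."""
    return [
        f"/person/{person_id}"
        for person_id, person in sorted(PEOPLE.items())
        if person["name"] == name
    ]
```

A name search can return zero, one or several URIs, which is exactly why the name does not work as a second identifier under /Person/.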
I guess technically you could have both URIs point to the same resource (perhaps with one of them as the canonical resource), but I think you wouldn't want to do this from an implementation perspective. What if there is an overlap between IDs and names?
It sure does seem like a good place to use query parameters, but if you insist on not doing so, perhaps you could do
person/{ID}
and
personByName/{Name}
I generally agree with this answer that for clarity and consistency it'd be best to avoid multiple ids pointing to the same entity.
Sometimes however, such a situation arises naturally. An example I work with is Polish companies, which can be identified by their tax id ('NIP' number) or by their national business registry id ('KRS' number).
In such a case, I think one should first add the secondary id as a criterion to the search endpoint. Users will then be able to "translate" between the secondary id and the primary id.
However, if users still keep insisting on being able to retrieve an entity directly by the secondary id (as we experienced), one other possibility is to provide a "secret" URL, not described in the documentation, performing such an operation. This can be given to users who made the effort to ask for it, and the potential ambiguity and confusion is then on them, if they decide to use it, not on everyone reading the documentation.
In terms of ambiguity and confusion for the API maintainer, I think this can be kept reasonably minimal with a helper function to immediately detect and translate the secondary id to primary id at the beginning of each relevant API endpoint.
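A minimal sketch of such a helper. Everything here is an assumption for illustration: that NIP is the primary id, that the secret URL marks a KRS number with a "krs:" prefix, and the sample numbers in the lookup table.

```python
# Hypothetical lookup table from secondary id (KRS) to primary id (NIP).
KRS_TO_NIP = {"0000012345": "1234567890"}

SECONDARY_PREFIX = "krs:"  # assumed marker, not part of any real scheme


def to_primary_id(raw_id):
    """Translate a prefixed secondary id; pass a primary id through."""
    if raw_id.startswith(SECONDARY_PREFIX):
        secondary = raw_id[len(SECONDARY_PREFIX):]
        if secondary not in KRS_TO_NIP:
            raise LookupError(f"unknown secondary id {secondary!r}")
        return KRS_TO_NIP[secondary]
    return raw_id


def get_company(raw_id):
    # First line of every relevant endpoint: normalize the id once,
    # so the rest of the handler only ever sees the primary id.
    company_id = to_primary_id(raw_id)
    return {"nip": company_id}
```

An explicit marker is used here instead of guessing from the number's shape, since two id schemes can easily look alike.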
It obviously matters much less than normal what scheme is chosen for the secret URL.
I'm looking for a recommendation on how best to implement MongoDB foreign key ObjectId fields. There seem to be two possible options, either containing the nested _id field or without.
Take a look at the fkUid field below.
{'_id':ObjectId('4ee12488f047051590000000'), 'fkUid':{'_id':ObjectId('4ee12488f047051590000001')} }
OR
{'_id':ObjectId('4ee12488f047051590000000'), 'fkUid':ObjectId('4ee12488f047051590000001')}
Any recommendations would be much appreciated.
I'm having a hard time coming up with any possible advantages for putting an extra field "layer" in there, so I would personally just store the ObjectId directly in fkUid.
I suggest using the default DBRef implementation, which is described here http://www.mongodb.org/display/DOCS/Database+References and is compatible with most language-specific drivers.
If your question is about the naming of the field (what you have in the title), usually the convention is to name it after the object to which it refers.
The two ways you have mentioned mean the same thing, but they have different uses.
Storing fkUid as an object, like 'fkUid':{'_id':ObjectId('4ee12488f047051590000001')}, has its own pros. Let me give an example. Suppose there is a website where users can post images and view images posted by other users as well. When showing an image, the website also shows the name/username of the user who posted it. Stored this way, the reference can also carry details like 'fkUid':{'_id':ObjectId('4ee12488f047051590000001'), username: 'SOME_X'}. When you fetch the document from the db, you don't have to send another request to get the username for that specific _id.
Whereas with the second way, 'fkUid':ObjectId('4ee12488f047051590000001'), you have to send another request to the server just to get the name/username, even though nothing else from that object is needed.
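The difference can be sketched with plain Python dicts, using strings in place of real ObjectIds. The USERS table, the STATS counter and the function names are made up for illustration.

```python
# Hypothetical users collection, keyed by the id string.
USERS = {"4ee12488f047051590000001": {"username": "SOME_X"}}

# Counts the extra round trips to the users collection.
STATS = {"lookups": 0}

plain = {
    "_id": "4ee12488f047051590000000",
    "fkUid": "4ee12488f047051590000001",
}
nested = {
    "_id": "4ee12488f047051590000000",
    "fkUid": {"_id": "4ee12488f047051590000001", "username": "SOME_X"},
}


def username_plain(image):
    STATS["lookups"] += 1  # second query, needed only for the username
    return USERS[image["fkUid"]]["username"]


def username_nested(image):
    return image["fkUid"]["username"]  # already in the document
```

The nested form saves the second lookup, but the copied username has to be updated in every image document if the user ever changes it, so it is a trade-off rather than a free win.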