How does one generate a slug from user comments dynamically? - mongodb

I was reading an article on Mongo site where by they mention adding a slug to every user comment. http://docs.mongodb.org/manual/use-cases/storing-comments/
What I am stuck on is how to generate a slug dynamically?
Any tips?

You do this within an area of your comment creation known as before_save. This is basically an event that occurs after you have the information for the comment but you have not saved yet.
This slug is just a unique identifier, you don't have to use the one they provided and infact the one they provide might not be best for storage, whereby they use the date and time and a bit on the end to make it unique.
I personally make a slug out of the _ids of the current and previous documents and then separate with /, it works and sorts well also it's easy to use pre-fixed regexes on since it is just the string representation of the OjectId so less guess work needed.

Related

Seem like i can't handle response from mongodb when using hyphen in field name

I didn't see any recommendation about using hyphen in field name at all
Even with #serialName it still didn't work
#SerialName("created-date")
val created_date: String,
but It worked fine with underscore (now i'm using it)
The reason i used it in the first place is because I have used a few api and most of them used hyphen and i just want to follow the common name.
If anyone know why please kindly tell me. I might be missing any docs or sth
There is a page in MongoDB documentation, I'm putting a shortcut for restrictions based on field names https://docs.mongodb.com/manual/reference/limits/#mongodb-limit-Restrictions-on-Field-Names.
MongoDB can store various different field names even you can have "space" in field name. It is not a problem for MongoDB, but once your application receives MongoDB output, it should be deserialized. I have never used kotlinx.serialization before; hence I'm just guessing. What if the problem might be coming from serialization/deserialization process. You better check kotlinx.serialization, maybe something is there.

Is it possible to prevent the reading and/or setting of a field value with DBIx Class?

I'm working in a project that uses Catalyst and DBIx::Class.
I have a requirement where, under a certain condition, users should not be able to read or set a specific field in a table (e.g. the last_name field in a list of users that will be presented and may be edited by the user).
Instead of applying the conditional logic to each part of the project where that table field is read or set, risking old or new cases where the logic is missed, is it possible to implement the logic directly in the DBIx::Class based module, to never return or change the value of that field when the condition is met?
I've been trying to find the answer, and I'm still reading, but I'm somewhat new to DBIx::Class and its documentation. Any help would be highly appreciated. Thank you!
I‘d use an around Moose method modifier on the column accessor generated by DBIC.
This won‘t be a real security solution as you can still access data without the Result class, for example when using HashRefInflator.
Same for calling get_column.
Real security would be at the database level with column level security and not allowing the database user used by the application to fetch that field.
Another solution I can think of is an additional Result class for that table that doesn‘t include the column, maybe even defaulting to it and only use the one including the column when the user has a special role.

REST API - string or numerical identifier in URL

We're developing a REST API for our platform. Let's say we have organisations and projects, and projects belong to organisations.
After reading this answer, I would be inclined to use numerical ID's in the URL, so that some of the URLs would become (say with a prefix of /api/v1):
/organisations/1234
/organisations/1234/projects/5678
However, we want to use the same URL structure for our front end UI, so that if you type these URLs in the browser, you will get the relevant webpage in the response instead of a JSON file. Much in the same way you see relevant names of persons and organisations in sites like Facebook or Github.
Using this, we could get something like:
/organisations/dutchpainters
/organisations/dutchpainters/projects/nightwatch
It looks like Github actually exposes their API in the same way.
The advantages and disadvantages I can come up with for using names instead of IDs for URL definitions, are the following:
Advantages:
More intuitive URLs for end users
1 to 1 mapping of front end UI and JSON API
Disadvantages:
Have to use unique names
Have to take care of conflict with reserved names, such as count, so later on, you can still develop an API endpoint like /organisations/count and actually get the number of organisations instead of the organisation called count.
Especially the latter one seems to become a potential pain in the rear. Still, after reading this answer, I'm almost convinced to use the string identifier, since it doesn't seem to make a difference from a convention point of view.
My questions are:
Did I miss important advantages / disadvantages of using strings instead of numerical IDs?
Did Github develop their string-based approach after their platform matured, or did they know from the start that it would imply some limitations (like the one I mentioned earlier, it seems that they did not implement such functionality)?
It's common to use a combination of both:
/organisations/1234/projects/5678/nightwatch
where the last part is simply ignored but used to make the url more readable.
In your case, with multiple levels of collections you could experiment with this format:
/organisations/1234/dutchpainters/projects/5678/nightwatch
If somebody writes
/organisations/1234/germanpainters/projects/5678/wanderer
it would still map to the rembrandt, but that should be ok. That will leave room for editing the names without messing up url:s allready out there. Also, names doesn't have to be unique if you don't really need that.
Reserved HTTP characters: such as “:”, “/”, “?”, “#”, “[“, “]” and “#” – These characters and others are “reserved” in the HTTP protocol to have “special” meaning in the implementation syntax so that they are distinguishable to other data in the URL. If a variable value within the path contains one or more of these reserved characters then it will break the path and generate a malformed request. You can workaround reserved characters in query string parameters by URL encoding them or sometimes by double escaping them, but you cannot in path parameters.
https://www.serviceobjects.com/blog/path-and-query-string-parameter-calls-to-a-restful-web-service
Numerical consecutive IDs are not recommended anymore because it is very easy to guess records in your database and some might use that to obtain info they do not have access to.
Numerical IDs are used because the in the database it is a fixed length storage which makes indexing easy for the database. For example INT has 4 bytes in MySQL and BIGINT is 8 bytes so the number have the same length in memory (100 in INT has the same length as 200) so it is very easy to index and search for records.
If you have a lot of entries in the database then using a VARCHAR field to index is a bad idea. You should use a fixed width field like CHAR(32) and fill the difference with spaces but you have to add logic in your program to treat the differences when searching the database.
Another idea would be to use slugs but here you should take into consideration the fact that some records might have the same slug, depends on what are you using to form that slug. https://en.wikipedia.org/wiki/Semantic_URL#Slug
I would recommend using UUIDs since they have the same length and resolve this issue easily.

What is a secure way to prevent a user from updating specific fields in a MongoDB document?

I am trying to prevent users from updating certain fields in a mongodb object for which they are allowed to edit every other field. For instance the user should be able to edit/add/remove all fields except the field "permissions". My current approach is to test each key the user is trying to "$set" and see if it starts with the substring "permissions" (to cover dot notation). Example in python:
def sanitize_set(son):
return {"$set": {k: v for k, v in son.get("$set", {}).items()
if not k.startswith("permissions")}}
This approach is beautifully simple and seems to work. I wanted to reach out to the community to see if anyone else has tackled this issue before or sees obvious flaws in my approach. Thank you,
Joshua
Without seeing some example data with an explanation of what should/shouldn't be updatable - it's hard to say for sure, but the way I would prevent this would be to not allow the user to directly supply the fields they will be updating. For example say you had a function called update_employee which updated information in an employee document. If you implement it like this:
update_employee(employee):
db.employees.update({_id: session.user_id}, {$set: employee})
Whatever gets passed in as the employee object is what will be updated. Instead you could create the update object using the values passed in like so:
update_employee(employee):
updatedEmployee = {
email: employee.email,
address: employee.address,
phone: employee.phone
}
db.employees.update({_id: session.user_id}, {$set: updatedEmployee})
This way you have complete control over what is being updated in your database. So if an extra field (such as salary) is passed in, it will be ignored.
Since (as far as I know) does not have a field lock, what you can do in this case is create a routine to pick up the specific document, present it to the user in any way you wish, but simply only show the fields they are allowed to edit.
You can present the entire JSON representation to the user (editor) and have a routine which simply does not allow changes to the fields that are locked. In other words if you dont want field {"name": "Sam"} to be edited even if the editor changes this value to {"name": "Joe"} just kick it out before updating and only update fields which are allowed to be edited. Since it is all done in memory before an actual update (upsert) you have total control over what is being edited and what is not.
If you follow a scheme which does have a prefix say e_address where you have decided any field with e_ allows editing, the job is that much easier programmatically.
Even in user-defined roles I have not seen any possibility of locking specific fields in a collection. (I could be wrong here.)
The programming constructs here are simple though.
A. Pick up field to memory
B. Editor does editing
C. Only update fields which are allowed to be edited. Any other changes just ignore.
(I kept this answer generic as I do not use Python, though the construct should apply to any language.)

Benefits of RESTful URL

What are the benefits of
http://www.example.com/app/servlet/cat1/cat2/item
URL
over
http://www.example.com/app/servlet?catid=12345
URL
Could there be any problems if we use first URL because initially we were using the first URL and change to second URL. This is in context of large constantly changing content on website. Here categories can be infinite in number.
In relation to a RESTful application, you should not care about the URL template. The "better" one is the one that is easier for the application to generate.
In relation to indexing and SEO, sorry, but it is unlikely that the search engines are going to understand your hypermedia API to be able to index it.
To get a better understanding in regards to the URLs, have a look at:
Is That REST API Really RPC? Roy Fielding Seems to Think So
Richardson Maturity Model
One difference is that the second URL doesn't name the categories, so the client code and indeed human users need to look up some category name to number mapping page first, store those mappings, use them all the time, and refresh the list when previously unknown categories are encountered etc.. Given the first URL you necessarily know the categories even if the item page doesn't mention them (but the site may still need a list of categories somewhere anyway).
Another difference is that the first format encodes two levels of categorisation, whereas the second hides the number of levels. That might make things easier or harder depending on how variable you want the depth to be (now or later) and whether someone inappropriately couples code to 2-level depth (for example, by parsing the URLs with a regexp capturing the categories using two subgroups). Of course, the same problem could exist if they couple themselves to the current depth of categories listed in a id->category-path mapping page anyway....
In terms of SEO, if this is something you want indexed by search engines the first is better assuming the category names are descriptive of the content under them. Most engines favor URLs that match the search query. However, if category names can change you likely need to maintain 301 redirects when they do.
The first form will be better indexed by search engines, and is more cache friendly. The latter is both an advantage (you can decrease the load on your server) and a disadvantage (you aren't necessarily aware of people re-visiting your page, and page changes may not propagate immediately to the users: a little care must be taken to achieve this).
The first form also requires (somewhat) heavier processing to get the desired item from the URL.
If you can control the URL syntax, I'd suggest something like:
http://www.example.com/app/servlet/cat1/cat2/item/12345
or better yet, through URL rewrite,
http://www.example.com/cat1/cat2/item/12345
where 12345 is the resource ID. Then when you access the data (which you would have done anyway), are able to do so quickly; and you just verify that the record does match cat1, cat2 and item. Experiment with page cache settings and be sure to send out ETag (maybe based on ID?) and Last-Modified headers, as well as checking If-Modified-Since and If-None-Match header requests.
What we have here is not a matter of "better" indexing but of relevancy.
And so, 1st URL will mark your page as a more relevant to the subject (assuming correlation between page/cat name and subject matter).
For example: Let`s say we both want to rank for "Red Nike shoes", say (for a simplicity sake) that we both got the same "score" on all SEO factors except for URL.
In 1st case the URL can be http://www.example.com/app/servlet/shoes/nike/red-nice
and in the second http://www.example.com/app/servlet?itemid=12345.
Just by looking on both string you can intuitively sense which one is more relevant...
The 1st one tells you up-front "Heck yes, I`m all about Red Nike Shoes" while the 2nd one kinda mumbles "Red Nike Shoes? Did you meant item code 12345?"
Also, Having part of the KW in the URL will help you get more relevancy and also it can help you win "long-tail" goals without much work. (just having KW in URL can sometimes be enough)
But the issue goes even deeper.
The second type of URL includes parameters and those can (an 99.9% will) lead to duplicated content issue. When using parameters you`ll have to deal with questions like:
What happens for non-existent catid?
Is there a parameter verification? (and how full proof is it?)
and etc.
So why choose the second version? Because sometime you just don`t have a choice... :)