How to implement WordPress-like permalinks in a CMS

I was thinking about building a CMS, and I want to implement WordPress-like permalinks for my posts. How do I do that?
I mean, how do I define a custom URL structure for my pages?

What language are you using? I'm assuming you're thinking about PHP (given your reference to WordPress). You have a few options:
mod_rewrite
A router
In my opinion, the best option is to find a modern web framework that provides good routing functionality. Alternatively, look at modifying an existing CMS (many exist; you've clearly heard of WordPress).

I'd recommend creating links that pass URL parameters, such as "http://...?PostID=123&CatID=232&...", so that when someone clicks a particular link you can parse the parameters out of the URL and fetch the exact post by its ID, or even filter further on the other fields as needed.
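A minimal sketch of that approach in PHP (the posts table, its columns, and the connection details are all invented for illustration):

<?php
// post.php -- look up a post from the PostID and CatID query parameters,
// e.g. http://.../post.php?PostID=123&CatID=232
$pdo = new PDO('mysql:host=localhost;dbname=cms', 'user', 'pass');

$stmt = $pdo->prepare('SELECT * FROM posts WHERE id = :post AND category_id = :cat');
$stmt->execute([
    ':post' => (int) ($_GET['PostID'] ?? 0),
    ':cat'  => (int) ($_GET['CatID'] ?? 0),
]);
$post = $stmt->fetch(PDO::FETCH_ASSOC); // false if nothing matched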

If you want to build the whole thing yourself, first understand what a front controller is, as it really addresses the underlying issue of how you execute the same code for different URLs. With this understanding, there are two ways to attack the problem with this design pattern: URL rewriting or physical file generation.
URL Rewriting
With URL rewriting, you need to intercept the requested URL and send it on to your front controller. Typically this is accomplished at the web server level, although some application servers also act as web servers. With Apache, as others have posted, you would use mod_rewrite with a rule that looks something like this:
RewriteRule ^/(.*) /path/to/front/controller.ext [E=REQUEST_URI:%{REQUEST_URI},QSA,PT,NS]
With this rule, the path originally requested will be sent to the front controller as a variable called "REQUEST_URI"; in PHP it is available as $_SERVER['REQUEST_URI']. In the front controller, hash (e.g. MD5) this value and use it to look up the record in a database, taking into account that whatever hashing algorithm you use can produce collisions. The hash is necessary if you allow URLs longer than the maximum column size your database allows for varchar data, assuming you can't search on CLOBs.
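To make that concrete, here is a rough sketch of such a front controller in PHP; the permalinks table, its columns, and the connection details are invented for illustration:

<?php
// controller.ext from the rewrite rule above.
$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH); // drop any query string
$hash = md5($path); // fixed-length lookup key, as described above

$pdo  = new PDO('mysql:host=localhost;dbname=cms', 'user', 'pass');
$stmt = $pdo->prepare('SELECT * FROM permalinks WHERE url_hash = :h');
$stmt->execute([':h' => $hash]);
$row = $stmt->fetch(PDO::FETCH_ASSOC);

// Compare the stored URL too, to guard against hash collisions.
if ($row === false || $row['url'] !== $path) {
    http_response_code(404);
    exit('Not found');
}
// ... load and render the post identified by $row['post_id'] ...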
Physical File Generation
Physical file generation would create a file that maps to the permanent URL you're imagining, so you'd write something that creates/renames the file at the time the post is published. This removes the need for storing a hash; instead, you place information about the post you want to serve inside that file (i.e. the ID of the post) and pass it along to the front controller.
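A sketch of what that generation step might look like in PHP; the stub layout, the paths, and the shared render.php are all hypothetical:

<?php
// Run when a post is published: write a physical stub file at the
// permalink location that hands the post ID to the shared rendering code.
function publish_post(int $postId, string $slug): void
{
    $stub = "<?php\n"
          . "\$postId = {$postId}; // which post this file serves\n"
          . "require __DIR__ . '/render.php'; // shared front controller\n";
    file_put_contents(__DIR__ . "/{$slug}.php", $stub);
}

publish_post(123, 'my-first-post'); // creates my-first-post.php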
Recommendation
My preference is the URL rewriting approach, so you don't have to worry about writing dynamic code files out at runtime. That said, if you want something with less magic, or you're expecting a lot of requests, physical file generation is the way to go: it's more obvious and requires the server to do less work.

Related

In general, would it be redundant to have two GET routes for users (one for ID and one for username)?

I'm building a CRUD for users in my REST API, and currently my GET route looks like this:
get("/api/users/:id")
But this just occurred to me: what if a user tries to search for other users via their username?
So I thought about implementing another route, like so:
get("api/users/username/:id")
But this just looks a bit reduntant to me. Even more so if ever my app should allow searching for actual names as well. Would I then need 3 routes?
So, are there any experienced web developers in this wonderful community who could tell me how they would handle having to search for a user via their username?
Note: if you need more details, just comment and I'll promptly update my question 🙃
how they would handle having to search for a user via their username?
How would you support this on a web site?
You would probably have a form; that form would have an input control allowing the user to provide a user name. When the user submits the form, the browser copies the form's input controls into an application/x-www-form-urlencoded document (as described by the HTML standard), substitutes that document as the query part of the form action, and submits the request.
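For instance, the form might be as minimal as this (action and field name chosen to match the request shown below):

<form method="GET" action="/api/users">
  <input type="text" name="username">
  <button type="submit">Search</button>
</form>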
So the resulting request would perhaps look like
GET /api/users?username=GuiMendel HTTP/x.y
You could, of course, have as many different forms as you like, with different combinations of input controls. Some of those forms might share actions, but not necessarily.
so I could just have my controller for GET "/api/users" redirect to an action based on the inputs?
REST doesn't care about "controllers" -- that's an implementation detail; the whole point is that the client doesn't need to know how the server produces a representation of the resource, we just need to know how to ask for it (via the "uniform interface").
Your routing framework might care a great deal, but again that's just another implementation detail hiding behind the facade.
If, for example, there were no inputs, it would return all users (index), but with the input you suggested, it would return only the users whose usernames matched the input? Did I get it right?
Yup, that's fine.
From the point of view of a REST client
/api/users
/api/users?username=GuiMendel
These identify different resources; the two resources don't have to have any meaningful relationship with each other at all. The machines don't care (human beings do care, so we normally design our identifiers in such a way that at least some human beings have an easy time of it -- for example, we might optimize our identifiers to make things easy when operators are reading the access logs).
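Back on the implementation side of the facade, the branch described above might look like this in PHP (the data-access helpers are hypothetical):

<?php
// GET /api/users                    -> index: all users
// GET /api/users?username=GuiMendel -> only users matching that username
$username = $_GET['username'] ?? null;

$users = ($username === null)
    ? find_all_users()                   // hypothetical helper
    : find_users_by_username($username); // hypothetical helper

header('Content-Type: application/json');
echo json_encode($users);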

Yesod forms with page flow

Certain forms are too complicated to fit on one page. If, for example, a form involves large amounts of structured data, such as picking locations on a map, scheduling events in a calendar widget, or having certain parts of a form change depending on earlier input, it is valuable to be able to break a form up over multiple pages.
This is easy to do with dynamic web pages and JavaScript: one would simply create a tab widget with different pages, and the actual submitted form would contain the whole tab widget and all of its input fields, yielding a single POST request for the entire operation.
Sometimes, however, it takes a long time to generate certain input fields, and they might remain computationally intensive after the page has been generated, taxing the browser of a user on a low-end computer. Additionally, it becomes difficult or impossible to create forms that adapt themselves based on earlier input.
It therefore becomes necessary to split a single form over multiple full page requests.
This can prove difficult, especially since the first page of a form will POST to /location/a, which will issue a redirect to /location/b, which the client then requests with GET. Passing the stored form data from POST /location/a to GET /location/b is where the difficulty lies.
Erwin Vervaet, the creator of Spring Web Flow (a subproject of the Spring framework, which is mostly known for its dependency injection capabilities), once wrote a blog article demonstrating this functionality in said framework and comparing it to the Lift web framework, which implemented similar functionality. He then presented a challenge to other web frameworks, further described in a later article.
How would Yesod face this problem, especially considering its stateless REST-based nature?
Firstly, there's no pre-built solution to this in existence yet (that I'm aware of at least). And I'm not familiar with how the other frameworks mentioned solve the problem. So what I say here is pretty much conjecture. I'm fairly certain it would work, however.
The crux of the issue here is encoding page A's POST parameters into the GET request for page B. The simplest way to do that would be to stick page A's POST parameters into a session variable. However, doing so would break navigation pretty thoroughly: back/forward wouldn't work at all as described.
So we come back to REST: we need to encode the POST parameters into the request itself. That really means putting the information in either the requested path, or the query string. And the query string probably makes the most sense.
I'd be concerned about putting the raw POST parameters into the query string, as that would allow any proxy server to easily snoop the contents. So I'd like to leverage the existing cryptography from clientsession. In other words, we'll stick a signed, encrypted version of the previous form submission in a query string parameter.
To make it a bit more concrete:
1. User goes to page A via GET.
2. User submits page A via POST.
3. Server validates the form submission, gets a value, serializes it, and encrypts and signs it.
4. User is redirected to page B as a GET, with a query string parameter containing the encrypted, signed value from page A.
5. Continue this process as many times as desired.
6. On the final page, you can decrypt the query string parameter and have all of the form submissions.
This looks like it would be a fun add-on package to write if anyone's interested.
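In Yesod itself this would presumably build on clientsession; purely to illustrate the mechanics in a language-neutral way, here is a sketch of steps 3, 4, and 6 in PHP with libsodium (secretbox both encrypts and authenticates, giving the signed-and-encrypted property described above; get_server_secret_key is a hypothetical stand-in for real key management):

<?php
// Page A's handler, after validating the POST: encrypt the submission
// and redirect to page B with the ciphertext in the query string.
$key = get_server_secret_key(); // hypothetical persistent 32-byte secret

$nonce  = random_bytes(SODIUM_CRYPTO_SECRETBOX_NONCEBYTES);
$cipher = sodium_crypto_secretbox(json_encode($_POST), $nonce, $key);
$state  = sodium_bin2base64($nonce . $cipher, SODIUM_BASE64_VARIANT_URLSAFE_NO_PADDING);

header('Location: /form/page-b?state=' . urlencode($state));
exit;

// Page B's handler (and each page after it): decrypt the parameter to
// recover the earlier submissions; tampering makes the open() call fail.
$raw      = sodium_base642bin($_GET['state'], SODIUM_BASE64_VARIANT_URLSAFE_NO_PADDING);
$nonce    = substr($raw, 0, SODIUM_CRYPTO_SECRETBOX_NONCEBYTES);
$plain    = sodium_crypto_secretbox_open(substr($raw, SODIUM_CRYPTO_SECRETBOX_NONCEBYTES), $nonce, $key);
$previous = ($plain === false) ? null : json_decode($plain, true);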

How do you handle multiple files in a form submission using Apache2::Upload?

I'm writing a small web application using Perl, HTML::Mason and Apache.
I've been using Mason's usual <%args> method for receiving 'normal' form parameters, and Apache2::Upload for receiving files.
However, I want to write a page that allows a user to upload multiple files, and I'd like to take advantage of HTML5's multiple attribute on file input fields. To the server, this looks as though there were multiple file inputs in the form with the same name.
The interface for Apache2::Upload doesn't seem to directly support this, allowing you instead to just get the data for a file with a particular parameter name. The documentation alludes to using APR::Request::Param::Table, but I can't find any documentation for doing that.
Please note that I'm not interested in answers that involve adding extra file input fields with different names. This is trivial to handle on the server, and my question doesn't involve front-end scripting at all.
Use the multiple attribute (in the form as you described) and then, after submission, call the Apache request object's upload method. That will give you a list of Apache2::Upload instances.
Good luck!

Comparing data with RESTful API

For a website I am working on, I am defining a RESTful API. I believe I have it (mostly) correct, using proper resource URIs and proper use of GET/POST/PUT/DELETE.
However, there is one point where I can't quite figure out the proper way to do it "in" REST: comparing lists.
Let's say I have a bookstore and a customer can have a wishlist. The wishlist consists of books (their full Book records, i.e. name, synopsis, etc.), and a full copy of the list exists on the client. What would be a good way to design the RESTful API so that a client can check the correctness of its local wishlist (i.e. find out which books have been added to or removed from the wishlist on the server side)?
One option would be to just download the full wishlist from the server and compare it locally. However this is quite a large amount of data (due to the embedded content) and this is a mobile client with a low-bandwidth connection, so this would cause a lot of problems.
Another option would be to download not the whole wishlist (i.e. not including the book info) but only a list of the books' identifiers. This would be far less data (compared to the previous option), and the client could compare the lists locally. However, to get the full book record for each newly added book, a REST call would have to be made for every single new book. Again, as this is a mobile client with bad network connectivity, this could be problematic.
A third option, and my favorite, would be for the client to send its list of identifiers to the server, and for the server to compare it to the wishlist and return which books were removed plus the data for the books that were added. This would mean a single round trip and only the necessary amount of data. As the wishlist size is estimated to be less than 100 entries, sending just the IDs would be a minimal amount of data (~0.5 KB). However, I don't know what kind of call would be appropriate: it can't be GET, as we are sending data (and putting it all in the URL does not feel right), and it can't be POST/PUT, as we do not change anything on the server. Obviously it's not DELETE either.
How would you implement this third option?
Side question: how would you solve this problem (i.e. is option 3 misguided, and what better, simple solutions might there be)?
Thank you.
P.S.: A fourth option would be to implement a more sophisticated protocol where the server keeps track of changes to the list (additions/deletions) and the client can, e.g., query for changes based on a version identifier or simply a timestamp. However, I like the third option better, as implementation-wise it is much simpler and less error-prone on both client and server.
There is nothing in HTTP that says that POST must update the server. People seem to forget the following line in RFC 2616 regarding one use of POST:
Providing a block of data, such as the result of submitting a form, to a data-handling process;
There is nothing wrong with taking your client side wishlist and POSTing to a resource whose sole purpose is to return a set of differences.
POST /Bookstore/WishlistComparisonEngine
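A sketch of what that handler could look like in PHP (the request body shape and the data-access helpers are invented for illustration):

<?php
// POST /Bookstore/WishlistComparisonEngine
// Request body: {"ids": ["id-1", "id-2", ...]} -- the client's current list.
$body      = json_decode(file_get_contents('php://input'), true) ?: [];
$clientIds = $body['ids'] ?? [];
$serverIds = current_wishlist_ids(); // hypothetical helper

// On the client but no longer on the server: report just the IDs.
$removed = array_values(array_diff($clientIds, $serverIds));

// On the server but not yet on the client: return full Book records.
$added = array_map('load_book_record', // hypothetical helper
                   array_values(array_diff($serverIds, $clientIds)));

header('Content-Type: application/json');
echo json_encode(['removed' => $removed, 'added' => $added]);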
The whole concept behind REST is that you leverage the power of the underlying HTTP protocol.
In this case there are two HTTP headers that can help you find out if the list on your mobile device is stale. An added benefit is that the client on your mobile device probably supports these headers natively, which means you won't have to add any client side code to implement them!
If-Modified-Since: check to see if the server's copy has been updated since your client first retrieved it
ETag: check whether a unique identifier for your client's local copy matches the one on the server. An easy way to generate the unique string required for an ETag on your server is to just hash the service's text output using MD5.
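As a rough PHP illustration of the ETag half, hashing the response text as suggested (render_wishlist is a hypothetical stand-in for whatever produces the response):

<?php
// Serve the wishlist with an ETag; answer 304 if the client's copy matches.
$body = render_wishlist(); // hypothetical: the full response text
$etag = '"' . md5($body) . '"';

header('ETag: ' . $etag);
if (($_SERVER['HTTP_IF_NONE_MATCH'] ?? '') === $etag) {
    http_response_code(304); // client already has this exact representation
    exit;
}
echo $body;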
You might try reading Mark Nottingham's excellent HTTP caching tutorial for information on how these headers work.
If you are using Rails 2.2 or greater, there is built-in support for these headers.
Django 1.1 supports conditional view processing.
And this MIX video shows how to implement them with ASP.NET MVC.
I think the key problems here are the definitions of Book and Wishlist, and where the authoritative copies of Wishlists are kept.
I'd attack the problem this way. First you have Books, which are keyed by ISBN and carry all the metadata describing the book (title, authors, description, publication date, pages, etc.). Then you have Wishlists, which are merely lists of ISBNs. You'll also have Customer and other resources.
You could name Book resources something like:
/book/{isbn}
and Wishlist resources:
/customer/{customer}/wishlist
assuming you have one wishlist per customer.
The authoritative Wishlists are on the server, and the client has a local cached copy. Likewise the authoritative Books are on the server, and the client has cached copies.
The Book representation could be, say, an XML document with the metadata. The Wishlist representation would be a list of Book resource names (and perhaps snippets of metadata). The Atom and RSS formats seem good fits for Wishlist representations.
So your client-server synchronization would go like this:
GET /customer/{customer}/wishlist
for ( each Book resource name /book/{isbn} in the wishlist )
GET /book/{isbn}
This is fully RESTful, and lets the client later on do PUT (to update a Wishlist) and DELETE (to delete it).
This synchronization would be pretty efficient on a wired connection, but since you're on a mobile you need to be more careful. As marshally points out, HTTP 1.1 has a lot of optimization features. Do read that HTTP caching tutorial, and be sure your web server properly sets Expires headers, ETags, etc. Then make sure the client has an HTTP cache. If your app were browser-based, you could leverage the browser cache. If you're rolling your own app and can't find a caching library to use, you can write a really basic HTTP 1.1 cache that stores the returned representations in a database or in the file system. The cache entries would be indexed by resource name and hold the expiration dates, entity tag numbers, etc. This cache might take a couple of days or a week or two to write, but it is a general solution to your synchronization problems.
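For a sense of scale, a bare-bones version of such a cache in PHP with curl might start like this (file-based storage; Expires handling, eviction, and locking all omitted):

<?php
// Conditional GET with a tiny file-based cache keyed by resource name.
function cached_get(string $url): string
{
    $cacheFile = sys_get_temp_dir() . '/cache_' . md5($url);
    $headers = [];
    if (is_file($cacheFile)) {
        $etag = explode("\n", file_get_contents($cacheFile), 2)[0];
        if ($etag !== '') {
            $headers[] = 'If-None-Match: ' . $etag; // revalidate our copy
        }
    }

    $newEtag = '';
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER     => $headers,
        CURLOPT_HEADERFUNCTION => function ($ch, $line) use (&$newEtag) {
            if (stripos($line, 'ETag:') === 0) {
                $newEtag = trim(substr($line, 5));
            }
            return strlen($line); // tell curl we consumed the header line
        },
    ]);
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($status === 304) { // not modified: reuse the stored representation
        return explode("\n", file_get_contents($cacheFile), 2)[1];
    }
    file_put_contents($cacheFile, $newEtag . "\n" . $body);
    return $body;
}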
You can also consider using GZIP compression on the responses, as this cuts sizes down by maybe 60%. All major browsers and servers support it, and there are client libraries you can use if your programming language doesn't support it already (Java has GzipInputStream, for instance).
If I strip out the domain-specific details from your question, here's what I get:
In your RESTful client-server application, the client stores a local copy of a large resource. Periodically, the client needs to check with the server to determine whether its copy of the resource is up-to-date.
marshally's suggestion is to use HTTP caching, which IMO is a good approach provided it can be done within your app's constraints (e.g., authentication system).
The downside is that if the resource is stale in any way, you'll be downloading the entire list, which sounds like it's not feasible in your situation.
Instead, how about re-evaluating the need to keep a local copy of the Wishlist in the first place:
How is your client currently using the local Wishlist?
If you had to, how would you replace the local copy with data fetched from the server?
What have you done to minimize your client's data requirements when building its Wishlist view(s) and executing business logic?
Your third alternative sounds nice, but I agree that it doesn't feel too RESTful...
Here's another suggestion that may or may not work: if you keep a version history of your list, you could ask for updates since a specific version. This feels more like something that can be a GET operation. The version identifiers could either be simple version numbers (as in e.g. svn) or, if you want to support branching or other non-linear history, some kind of checksums (as in e.g. monotone).
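For instance (the URL shape and response format here are invented for illustration), the client remembers the last version it synced and asks only for what changed since then:

GET /customer/42/wishlist/changes?since=17 HTTP/1.1

HTTP/1.1 200 OK
Content-Type: application/json

{"version": 21,
 "added":   [{"isbn": "9990000000001", "title": "Example Book"}],
 "removed": ["9990000000002"]}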
Disclaimer: I'm not an expert on REST philosophy or implementation by any means.
Edit: Did you add that P.S. after I loaded the question, or did I simply not read your question all the way through before writing an answer? Sorry. I still think the versioning might be a good idea, though.

Adding pages "on the fly" with a CMS

I am in the process of building a website content management system for one of my clients. It's a highly customized system, so I cannot use any off-the-shelf solution.
I need to allow my client to add pages to the website on the fly. I have two options here:
(1) Create a database driven page in the format of www.mycompany.com/page.aspx?catID=5&pageID=3 (query the database with the category and page ID's, grab the data and show it on the page) - or -
(2) Allow the management system to create static pages, something like www.mycompany.com/company/aboutus.aspx and www.mycompany.com/company/company_history.aspx, etc.
I believe that, while the former is much easier to implement, the latter is better both for the user AND for Google.
My questions are (finally): (1) Would you agree that the latter is a better solution, and (2) what is the best way to implement such a solution? Should I create and update each file using the file system (i.e. the site's management system requires the user to supply a page/file name, page title, and content, and creates the page on the fly based on these parameters)? Is there a better way?
Thank you!
It's entirely possible to have database driven pages with nice URLs. StackOverflow itself is a great example - this question's URL is http://stackoverflow.com/questions/1119274/adding-pages-on-the-fly-with-a-cms-system, but the page is built from the database, not static HTML.
I would use the first solution, but mask the addresses using a custom request handler. Basically, give each of your pages a unique string ID (such as about-us) and then, with your request handler that takes all requests, find this particular page in the database and render it.
See this article for some additional info (I found it when googling for custom HTTP handlers in ASP.NET). In that article, the following handler is added:
<add verb="*" path="*.piechart" type="PieChartHandler"/>
You would probably want to catch all paths (*), excluding certain media paths used for CSS, images and JavaScript.
More resources:
Custom HTTP Handler
HttpHandler in ASP.Net
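Whatever the stack, the handler's job is the same: take the requested path, look the page up by its string ID, and render it. Sketched in PHP for brevity (the pages table, its columns, and the connection details are invented; an ASP.NET handler like the one above would do the equivalent):

<?php
// Catch-all handler: map the requested path to a page record by slug.
$slug = trim(parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH), '/'); // e.g. "about-us"

$pdo  = new PDO('mysql:host=localhost;dbname=cms', 'user', 'pass');
$stmt = $pdo->prepare('SELECT title, body FROM pages WHERE slug = :slug');
$stmt->execute([':slug' => $slug]);

if ($page = $stmt->fetch(PDO::FETCH_ASSOC)) {
    echo '<h1>' . htmlspecialchars($page['title']) . '</h1>' . $page['body'];
} else {
    http_response_code(404);
}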
I'd stay clear of static pages if I were you. Dynamic Data, MVC, and some good planning should take you a long way!
What you need to do is create one or more templates that each view/controller in MVC can use. Let whoever is responsible for the content manage it through dynamic data entities.
I would use the first idea, but work out a better URL scheme. If the system doesn't provide nice URLs (without ?), you'll have trouble getting search engines to index the whole site. Also, using numbers instead of words makes it hard for users to pass URLs around.
If you start to have performance problems, you could add caching that would generate static pages from time to time. I would avoid doing that until you have to; caching can cause many headaches along the way to getting it right.
Although the existing advice is more-or-less sound, the commentators have failed to consider one factor which, admittedly, you haven't given much detail on: are these pages that they'll edit once they're built, or are they one-shot creations? If the latter, your plan of generating static pages isn't quite so bad as they suggest. Why bother even thinking about database schemas and caching when you can just serve flat content?
It will probably make for pretty lifeless, end-of-the-road pages, but if that's what you want ...