Multiple 301 redirects in one line

This is probably easy for people who deal with these regularly, but I'm not sure what kind of code I need to achieve what I want. I know how to redirect individual URLs to other URLs, but I can't work out how to redirect lots of them at once.
Basically I set up my site structure kinda badly when I built my website. I have a bunch of URLs named:
crafting-alchemist-level-1-10.php
all in the root directory, where alchemist-level-1-10 is the page name and crafting is the site section. I have about 50 of these URLs, and I would like to move them all into a /crafting directory with the crafting- prefix cut off the file names.
I could do this individually, but there must be a way to do them all with a single line. Is there?
The redirects also need to preserve any query parameters after the .php.

Use mod_rewrite in your .htaccess
RewriteEngine On
RewriteRule ^(.*)/(.*)/(.*)$ $1-$2-$3.php
For more information (you will need to customize it a bit):
http://httpd.apache.org/docs/current/rewrite/intro.html#regex
EDIT
This will rewrite one/two/three to one-two-three.php.
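The question, though, is about sending the old flat URLs to the new /crafting/ directory, so the rule probably needs to run the other way and issue a 301. A rough sketch, assuming the renamed files already live under /crafting/ (any query string on the old URL is carried over automatically):
RewriteEngine On
# e.g. /crafting-alchemist-level-1-10.php -> /crafting/alchemist-level-1-10.php
RewriteRule ^crafting-(.+)\.php$ /crafting/$1.php [R=301,L]
With R=301 the browser is sent to the new URL, so search engines will pick up the new structure as well.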

Robots.txt for application

Is it possible for an application within a website to have its own robots.txt file?
For example, I have a site running under http://www.example.com and this has its robots.txt file.
We then have a separate site running as an application under this domain: http://www.example.com/website-app
Is it possible to keep a separate robots.txt file for the application, or do I need to put all the rules for the application into the main root robots.txt?
The robots.txt file needs to reside at /robots.txt; there is no way to tell a crawler that it can be found anywhere else (as you can with favicons, for example). So if you can, you should add the application's rules to your root robots.txt (or put your application on a subdomain instead, where it can have its own file).
If you want to control specific pages individually, you can use <meta> robots tags instead, as described at robotstxt.org. Since the tag has to be placed on every page, the crawler will still visit (but not index) at least one page, though it won't follow links on to other pages (unless you tell it to). For a small application in a subdirectory this might be an acceptable solution.
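For example, if the application should be kept out of the index entirely, the root robots.txt could include a block like this (a sketch using the path from the question; adjust the rules to whatever the application really needs):
User-agent: *
Disallow: /website-app/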

how can I override robots in a sub folder

I have a sub-domain for testing purposes. I have set robots.txt to disallow this folder.
Some of the results are still showing for some reason. I thought it may be because I hadn't set up the robots.txt originally and Google hadn't removed some of them yet.
Now I'm worried that the robots.txt files within the individual Joomla sites in this folder are causing Google to keep indexing them. Ideally I would like to stop that from happening, because I don't want to have to remember to set those robots.txt files back to allow crawling when the sites go live (just in case).
Is there a way to override these explicitly with a robots.txt in a folder above this folder?
As far as a crawler is concerned, robots.txt exists only in the site's root directory. There is no concept of a hierarchy of robots.txt files.
So if you have http://example.com and http://foo.example.com, then you would need two different robots.txt files: one for example.com and one for foo.example.com. When Googlebot reads the robots.txt file for foo.example.com, it does not take into account the robots.txt for example.com.
When Googlebot is crawling example.com, it will not under any circumstances interpret the robots.txt file for foo.example.com. And when it's crawling foo.example.com, it will not interpret the robots.txt for example.com.
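For example, to keep everything on the test host out of crawlers, the file served at http://foo.example.com/robots.txt (reusing the hostnames above) would contain something like:
User-agent: *
Disallow: /
That file applies only to foo.example.com; the one at http://example.com/robots.txt is read completely separately.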
Does that answer your question?
More info
When Googlebot crawls foo.com, it will read foo.com/robots.txt and use the rules in that file. It will not read and follow the rules in foo.com/portfolio/robots.txt or foo.com/portfolio/mydummysite.com/robots.txt. See the first two sentences of my original answer.
I don't fully understand what you're trying to prevent, probably because I don't fully understand your site hierarchy. But you can't change a crawler's behavior on mydummysite.com by changing the robots.txt file at foo.com/robots.txt or foo.com/portfolio/robots.txt.

rewrite/redirect url from subdomain to domain with wildcard?

I just moved a site from a subfolder to the HTML root, and I need some sort of .htaccess rule to rewrite all URLs so that old links to the site still work.
The old path is www.domain.com/stage/*
and should now point to www.domain.com/*
How can I achieve this?
Not sure what type of web server you are using; in nginx, for example, it should look something like this:
rewrite ^/stage/(.*)$ /$1 last;
Let me know if you need any refinements or if you need it for a specific web server.
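Since the question mentions .htaccess, an Apache equivalent would be roughly the following sketch (assumes mod_rewrite is enabled; R=301 sends a permanent redirect so old links resolve to the new root URLs):
RewriteEngine On
# e.g. /stage/some/page.html is redirected to /some/page.html
RewriteRule ^stage/(.*)$ /$1 [R=301,L]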

How to add RESTful type routes in Jekyll

The root of the site http://example.com correctly identifies index.html and renders it. In a similar manner, I want http://example.com/foo to fetch the foo.html present in the root of the directory. A site that uses this functionality is www.zachholman.com. I've seen his code on GitHub, but I'm still not able to find how it is done. Please help.
This feature is actually available in Jekyll. Just add the following line to your _config.yml:
permalink: pretty
This will enable links to posts and pages without the .html extension, e.g.
/about/ instead of /about.html
/YYYY/MM/DD/my-first-post/ instead of /YYYY/MM/DD/my-first-post.html
However, you lose the ability to customize permalinks... and the trailing slash is pretty ugly.
Edit: The trailing slash seems to be there by design
It's actually the server that needs adjusting, not Jekyll. By default, Jekyll is going to produce files with .html extensions. There may be a way around that, but it's unlikely that you really want to go that route. Instead, you need to let your web server know that you want those files served when a URL is requested with the file's basename (and no extension).
If your site is served via an Apache web server, you can enable the "MultiViews" option. In most cases, you can do that by creating an .htaccess file at your site root with the following line:
Options +MultiViews
With this option enabled, when Apache receives a request for:
http://example.com/foo
It will serve the file:
/foo.html
Note that the Apache server must be set up to allow the option to be set in the .htaccess file. If not, you would need to do it in the Apache config file itself. If your site is hosted on another web server, you'll need to look for an equivalent setting.
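If MultiViews isn't available on your host, a mod_rewrite rule is a common alternative; here is a minimal sketch (assuming mod_rewrite is enabled) that internally serves foo.html whenever /foo is requested and that file exists on disk:
RewriteEngine On
# Serve /foo from foo.html when the matching .html file exists
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^(.+)$ $1.html [L]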

htaccess vs hard coded

When creating a CMS, which would you recommend?
Making an .htaccess rewrite dynamically create the pages based on ?pg=name
or
Making an FTP connection to auto-create each file on the fly? This means that when a new page is created/edited/deleted, the admin panel would, on save, FTP into the site and create the page.
Pros and Cons
Pro (dynamic .htaccess approach): fewer files means less space.
Con (dynamic .htaccess approach): more ongoing overhead for Apache to rewrite every request.
Con (hard-coded files): more space taken.
Pro (hard-coded files): less work to find the file, since it's created once and only regenerated when changed.
Alright, let me clarify. Which is the better option?
1. Create index.php and have .htaccess rewrite everything to it, sending ?pg=name, and then get the content from the database.
2. Have the admin panel automatically FTP into the site when content is created/edited/deleted and create the page, so when someone requests the page it's hard-coded.
Without a doubt the best way to go for your CMS is using Apache mod_rewrite. This way you have more flexibility in the future for changing the way that you want URLs displayed, and it expedites the creation of new content so that it doesn't have to be uploaded via FTP every time.
If you have to use FTP to use your CMS, I'm afraid it won't be very scalable, which is one of the benefits of a CMS.
Your 'better option' is 1. Stick to mod_rewrite.
If you want to, you can mix those options: use .htaccess for nice names for your pages, rewriting them to ?pg=name, and then load the data from a file or the database.
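For illustration, option 1 usually boils down to a few lines of .htaccess like the following sketch (the pg parameter name comes from the question; assumes mod_rewrite is enabled and index.php handles the lookup):
RewriteEngine On
# Leave requests for real files and directories (CSS, images, ...) alone
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Map a nice URL like /about to index.php?pg=about
RewriteRule ^([A-Za-z0-9_-]+)/?$ index.php?pg=$1 [L,QSA]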