Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


.htaccess mod-rewrite regex apache confusion results in 10k 404's per day

I have reviewed the many questions posted here about .htaccess, Apache, mod_rewrite and regex, but I'm just not getting it. I tried a few different things, but either I am overcomplicating things or making beginner mistakes. Regardless, I've been at it for a few days now and have scrambled something along the way, as the 10,000 404s per day show.

My site

I have a WordPress site which contains over 23,000 posts broken down into just over 1200 categories. The site features streaming video files, industry news, show reviews, movies, phpbb forums, etc. and is structured like this:

  • site / base categories (0 and a-z) / sub categories (series name) / posts (episode name.html) for all streaming media episodes
  • site / movies / post title.html for all streaming movies
  • site / news / posttitle.html
  • site / reviews / posttitle.html
  • site / page.html for assorted pages
  • site / forums

Permalink structure is /%category%/%postname%.html

I am using the Yoast WordPress SEO plugin and have the option to append a trailing slash enabled for directories and categories.

Here is the current .htaccess:

    # BEGIN WordPress
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    RewriteRule ^index\.php$ - [L]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]
    </IfModule>

    # END WordPress

My examples

Many inbound links still use our old site structure, "/episode title/", which no longer exists. These incoming links need to redirect to "/watch-anime/single character (letter, number or symbol)/series title/episode title.html".

/one-piece-episode-528/

should be

/watch-anime/o/one-piece/one-piece-episode-528.html

A mistake I made caused a second problem: links of the form "/watch-anime/letter/series title/episode title/" that should be "/watch-anime/letter/series title/episode title.html". So we need to remove the trailing slash from single posts and add .html.

/watch-anime/w/welcome-to-the-nhk/welcome-to-the-nhk-episode-14/

should be

/watch-anime/w/welcome-to-the-nhk/welcome-to-the-nhk-episode-14.html

The same mistake, combined with the old site structure issue, caused a third problem: "/episode title.html" needs to be "/watch-anime/letter/series title/episode title.html".

/one-piece-episode-528.html

needs to be

/watch-anime/o/one-piece/one-piece-episode-528.html

As you can see, I've made a mess of things between migrating the site's post structure and my attempts to fix it. I am now asking for any help you can provide in getting a proper .htaccess file that will take care of these 301 redirects.

Thanks for any assistance you can provide!


1 Reply


RewriteMap cannot be declared in .htaccess files (Apache only allows the directive in server config or virtual host context), so here's my solution for a virtual host, which should work.

Create a RewriteMap file. See the Apache RewriteMap documentation for more information. This is a very simple text file where each line holds the wrong URL without its slashes, then at least one space, then the right URL, like this:

one-piece-episode-528 /watch-anime/o/one-piece/one-piece-episode-528.html
dexter-season-6-episode-1 /watch-interesting-stuff/d/dexter/dexter-season-6-episode-1.html
breaking-bad-full-season-3 /watch-interesting-stuff/b/breaking-bad/breaking-bad-full-season-3.html

and so on.

Convert this simple text file into a DBM hash map. For example:

httxt2dbm -i mapanime.txt -o mapanime.map

Now declare it in your vhost:

RewriteMap mapanime dbm:/pathtofile/mapanime.map
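
For a quick test you can skip the conversion step, since mod_rewrite also reads the plain text file directly through the txt: map type. A minimal sketch, assuming the unconverted mapanime.txt sits in the same placeholder directory:

    # Alternative to the dbm: declaration; use one or the other, not both.
    # Reads the space-separated text file as-is, no httxt2dbm step needed.
    RewriteMap mapanime txt:/pathtofile/mapanime.txt

For a large map the dbm version is still the one to prefer, because a DBM file is indexed and a text file is not.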

So all in all your vhost should look like:

<VirtualHost *>
    RewriteEngine On
    # a directive must stay on one line (or use a trailing backslash)
    RewriteMap mapanime dbm:/pathtofile/mapanime.map
    # don't touch the URL, but try to search if it exists in mapanime
    RewriteRule /([^/]*)/$ - [QSA,NC,E=VARANIME:${mapanime:$1|notfound}]
    # if VARANIME not empty *and*
    #   VARANIME different from "notfound"
    #   (the leading "!" negates the match):
    RewriteCond %{ENV:VARANIME} !^(notfound|)$
    # then redirect it to the right URL:
    # QSA = query string append
    # R = redirect, 301 = permanent redirect
    # L = last = don't go further
    RewriteRule . %{ENV:VARANIME} [QSA,R=301,L]
</VirtualHost>
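
The rules above only catch "/episode title/". The same map could also cover the "/episode title.html" links from the question with one condition and one rule. This is an untested sketch that would replace the two RewriteRule lines inside the vhost; it assumes the map keys carry neither slashes nor ".html" and that every real target in the map starts with "/":

    # $1 is the bare episode slug taken from either /slug/ or /slug.html.
    # The condition looks the slug up once: real targets start with "/",
    # the "notfound" default does not, so unknown slugs are left alone.
    RewriteCond ${mapanime:$1|notfound} ^(/.+)$
    # %1 is the target captured by the condition above.
    RewriteRule ^/([^/]+?)(?:/|\.html)$ %1 [R=301,L]

Pages such as /page.html, /news/ or /forums/ simply miss in the map and pass through to WordPress untouched.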

Hope this helps.

I don't see a simpler solution, but I'm pretty sure this one will work.
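
The second case in the question (a full /watch-anime/... path that ends in a slash instead of .html) should not need the map at all, since the target differs only in its suffix. A sketch for the same vhost, assuming episodes always sit exactly four segments deep and the second segment is a single character, as the question describes:

    # /watch-anime/<char>/<series>/<episode>/  ->  same path with .html
    # Category URLs are only three segments deep, so they keep their slash.
    # Inside .htaccess the pattern would start with ^(watch-anime/ instead,
    # and the rule would have to sit above the "# BEGIN WordPress" block.
    RewriteRule ^/(watch-anime/[^/]/[^/]+/[^/]+)/$ /$1.html [R=301,L]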

If it doesn't work: read my usual "two hints", and add the rewrite log to your question.

Two hints:

Please try the RewriteLog directive, which helps you track down such problems. Note that it only exists in Apache 2.2 and earlier:

# Trace:
# (!) file gets big quickly, remove in prod environments:
RewriteLog "/web/logs/mywebsite.rewrite.log"
RewriteLogLevel 9
RewriteEngine On
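
On Apache 2.4 those two directives no longer exist; rewrite tracing goes through LogLevel instead and ends up in the regular error log. A rough equivalent, where trace3 is just a starting verbosity I picked (the levels go up to trace8):

    # Apache 2.4: per-module log level for mod_rewrite, written to ErrorLog.
    # (!) equally noisy, remove in prod environments:
    LogLevel alert rewrite:trace3
    RewriteEngine On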

My favorite tool for checking regular expressions:

http://www.quanetic.com/Regex (choose preg(PCRE): mod_rewrite uses Perl-compatible regular expressions, not ereg(POSIX))

