Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


url rewriting - How to replace underscores with dashes in URLs with Nginx

I'm using Nginx for the first time ever, and have basically no knowledge of it.

I need to replace "_" with "-" in 100+ URLs. I figured there must be an easy way to do this with Nginx, but I can't find anything on Google.

Thanks!

Edit:

My URLs look like this, for example: http://www.mywebsite.com/this_category/page1.php

I need this to become: http://www.mywebsite.com/this-category/page1.php


1 Reply


No, there's no easy way to do this, but the rewrite engine can nonetheless be coerced into doing it, assuming you can put a reasonable cap on the number of underscores you need to convert in a single URL (or even if you can't; see the end of the answer).

Here's how I'd do it (tested code):

rewrite ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*)$ $1-$2-$3-$4-$5-$6-$7-$8-$9;
rewrite ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*)$ $1-$2-$3-$4-$5;
rewrite ^([^_]*)_([^_]*)_(.*)$ $1-$2-$3;
rewrite ^([^_]*)_(.*)$ $1-$2;

The four rewrites respectively translate the first 8, 4, 2, and 1 underscores in the URL to dashes. The underscore counts are decreasing powers of 2 on purpose: every number from 0 to 15 is the sum of a distinct subset of 8, 4, 2, and 1, so the 16 combinations of each rule either matching or not matching cover every case. This makes the block the most efficient set of rules that will translate anywhere from 0 to 15 underscores in a single URL.
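To make the power-of-2 idea concrete, here is the same block annotated with a trace for a URI containing five underscores (5 = 4 + 1). The path /a_b_c_d_e_f/page1.php is a made-up example, not one of the asker's URLs; the rules themselves are unchanged.

```nginx
# Hypothetical request URI: /a_b_c_d_e_f/page1.php  (5 underscores)

# 8-underscore rule: needs at least 8 underscores, so it does not match.
rewrite ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*)$ $1-$2-$3-$4-$5-$6-$7-$8-$9;

# 4-underscore rule: matches and converts the first four underscores.
#   /a_b_c_d_e_f/page1.php  ->  /a-b-c-d-e_f/page1.php
rewrite ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*)$ $1-$2-$3-$4-$5;

# 2-underscore rule: only one underscore is left, so it does not match.
rewrite ^([^_]*)_([^_]*)_(.*)$ $1-$2-$3;

# 1-underscore rule: matches and converts the remaining underscore.
#   /a-b-c-d-e_f/page1.php  ->  /a-b-c-d-e-f/page1.php
rewrite ^([^_]*)_(.*)$ $1-$2;
```

The URL from the question, /this_category/page1.php, has a single underscore, so only the last rule fires and produces /this-category/page1.php.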

You will also notice that I used [^_]* on every group except the last one, in every rule. This keeps the regexp engine from doing unneeded backtracking when the rule does not match. Basically, having nine universal stars .* in a regexp causes O(n^9) complexity (which is quite bad) in the "worst case", and the worst case is a non-match, which would actually be your most frequent case. (I can recommend this book for those who wish to really understand how a regexp is actually executed by the underlying library.)

For this reason, if you can put a limit smaller than 15 on the number of underscores, I would recommend taking away the first rule, or the first two. The last three rules alone will translate up to 7 underscores; the last two will translate up to 3.
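For instance, a trimmed block along these lines should be enough if no URL has more than 7 underscores. That limit is an assumption you would need to check against your own URLs.

```nginx
# Handles 0 to 7 underscores per URI (4 + 2 + 1).
rewrite ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*)$ $1-$2-$3-$4-$5;
rewrite ^([^_]*)_([^_]*)_(.*)$ $1-$2-$3;
rewrite ^([^_]*)_(.*)$ $1-$2;
```

A URI with 8 or more underscores would come out of this shorter block with at least one underscore left over, so only drop rules when you are sure of the cap.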

Finally, you didn't mention redirecting the user to the new URL. (As opposed to just serving the content at both the underscored URL and the correct one, which is usually frowned upon by the search engine nuts. Just FYI.) If that's what you need, you will have to put those rewrites into a special location that is triggered by the presence of an underscore in the URL, and that redirects the user to the new URL at the end of the four rewrites:

location ~ _ {
  rewrite ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*)$ $1-$2-$3-$4-$5-$6-$7-$8-$9;
  rewrite ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*)$ $1-$2-$3-$4-$5;
  rewrite ^([^_]*)_([^_]*)_(.*)$ $1-$2-$3;
  rewrite ^([^_]*)_(.*)$ $1-$2;
  rewrite ^ $uri permanent;
}

This also adds the benefit of translating an unlimited number of underscores in a single URL, at the expense of more than one redirect to the user's browser when a URL has more than 15 of them.
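As a rough sketch of where that location could sit in a full configuration: the listen port, the server_name (taken from the question's example URL), and the PHP location below are assumptions for illustration, not part of the tested answer. Nginx checks regex locations in the order they appear in the file and uses the first one that matches, so the underscore location is placed before a typical \.php$ location; otherwise a request for /this_category/page1.php would be caught by the PHP location first.

```nginx
server {
    listen 80;                      # assumed port
    server_name www.mywebsite.com;  # host from the question's example

    # Must come before other regex locations that could also match
    # underscored URIs, since the first matching regex location wins.
    location ~ _ {
        rewrite ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*)$ $1-$2-$3-$4-$5-$6-$7-$8-$9;
        rewrite ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*)$ $1-$2-$3-$4-$5;
        rewrite ^([^_]*)_([^_]*)_(.*)$ $1-$2-$3;
        rewrite ^([^_]*)_(.*)$ $1-$2;
        # Send a 301 to whatever the URI looks like after the rules above.
        rewrite ^ $uri permanent;
    }

    location ~ \.php$ {
        # Existing PHP handling (assumed to exist) stays here, unchanged.
    }
}
```

With this in place, a hypothetical URI with 17 underscores would take two redirects: the first pass converts 15 of them and redirects, the second request still contains an underscore, so it matches the location again, has its last 2 converted, and is redirected to the clean URL.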

HTH ;-P

