Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
97 views
in Technique[技术] by (71.8m points)

ios - Remove HTML Tags from an NSString on the iPhone

There are a couple of different ways to remove HTML tags from an NSString in Cocoa.

One way is to render the string into an NSAttributedString and then grab the rendered text.

Another way is to use NSXMLDocument's -objectByApplyingXSLTString method to apply an XSLT transform that does it.

Unfortunately, the iPhone doesn't support NSAttributedString or NSXMLDocument. There are too many edge cases and malformed HTML documents for me to feel comfortable using regex or NSScanner. Does anyone have a solution to this?

One suggestion has been to simply look for opening and closing tag characters, this method won't work except for very trivial cases.

For example these cases (from the Perl Cookbook chapter on the same subject) would break this method:

<IMG SRC = "foo.gif" ALT = "A > B">

<!-- <A comment> -->

<script>if (a<b && a>c)</script>

<![INCLUDE CDATA [ >>>>>>>>>>>> ]]>
Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

A quick and "dirty" (removes everything between < and >) solution, works with iOS >= 3.2:

-(NSString *) stringByStrippingHTML {
  NSRange r;
  NSString *s = [[self copy] autorelease];
  while ((r = [s rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound)
    s = [s stringByReplacingCharactersInRange:r withString:@""];
  return s;
}

I have this declared as a category os NSString.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...