Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


jsoup line feed

We're using Jsoup.clean(String, Whitelist) to process some input, and it appears that Jsoup is adding an extraneous line break just prior to acceptable tags. I've seen a few people post this issue around the internet, but haven't been able to track down a solution.

For instance, let's say we have a very simple string with some bold tags within it, like so:

String htmlToClean = "This is a line with <b>bold text</b> within it.";
String returnString = Jsoup.clean(htmlToClean, Whitelist.relaxed());
System.out.println(returnString);

What comes out of the call to the clean() method is something like so:

This is a line with 
<b>bold text</b> within it. 

Notice the extraneous line break ("\n") inserted just before the opening bold tag. I can't track down where in the source it is being added (although admittedly I'm new to Jsoup).

Has anyone encountered this problem and, better yet, found a way to stop this extra, unwanted character from being added to the string?


1 Reply


Hmm... I haven't seen an option for this on clean() itself.

If you parse the HTML into a Document, you get access to its output settings:

Document doc = Jsoup.parseBodyFragment(htmlToClean);
doc.outputSettings().prettyPrint(false);

System.out.println(doc.body().html());

With prettyPrint off you'll get the following output: This is a line with <b>bold text</b> within it.

You could write your own clean() method: the built-in one works on Document objects internally, so that is where you can disable prettyPrint.

Original methods:

public static String clean(String bodyHtml, Whitelist whitelist) {
    return clean(bodyHtml, "", whitelist);
}

public static String clean(String bodyHtml, String baseUri, Whitelist whitelist) {
    Document dirty = parseBodyFragment(bodyHtml, baseUri);
    Cleaner cleaner = new Cleaner(whitelist);
    Document clean = cleaner.clean(dirty);
    return clean.body().html();
}
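A minimal sketch of such a custom method, assuming the same Jsoup version as the snippets above (one that still ships org.jsoup.safety.Whitelist). The class name CompactCleaner and the method name cleanNoPrettyPrint are made up for illustration and are not part of Jsoup. It repeats the steps of the original clean() and only adds the prettyPrint(false) call before serializing:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Cleaner;
import org.jsoup.safety.Whitelist;

public final class CompactCleaner {
    private CompactCleaner() {}

    // Hypothetical helper (not a Jsoup API): mirrors Jsoup.clean(String, Whitelist),
    // but turns pretty-printing off on the cleaned Document before serializing it.
    public static String cleanNoPrettyPrint(String bodyHtml, Whitelist whitelist) {
        // Parse the fragment into a Document, as the original clean() does.
        Document dirty = Jsoup.parseBodyFragment(bodyHtml, "");
        // Keep only the tags and attributes the whitelist allows.
        Document clean = new Cleaner(whitelist).clean(dirty);
        // Disable pretty-printing so no line breaks are inserted around tags.
        clean.outputSettings().prettyPrint(false);
        return clean.body().html();
    }

    public static void main(String[] args) {
        String htmlToClean = "This is a line with <b>bold text</b> within it.";
        System.out.println(cleanNoPrettyPrint(htmlToClean, Whitelist.relaxed()));
    }
}
```

With the input from the question, this should return the string on a single line, with no line break before the <b> tag. Newer Jsoup releases may make this helper unnecessary: as far as I know there is a clean() overload that accepts a Document.OutputSettings, and Whitelist was later renamed Safelist, so check the Javadoc for the version you are on.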


...