This question is very similar to this one, but for Java instead of Python.
<body.content>
<block class="lead_paragraph">
<p>LEAD: Two police officers responding to a reported robbery at a Brooklyn tavern early yesterday were themselves held up by the robbers, who took their revolvers and herded them into a back room with patrons, the police said.</p>
</block>
<block class="full_text">
<p>LEAD: Two police officers responding to a reported robbery at a Brooklyn tavern early yesterday were themselves held up by the robbers, who took their revolvers and herded them into a back room with patrons, the police said.</p>
</block>
I'm trying to extract the text of the sentence with jsoup, without any of the XML formatting.
So I'm looking for:
LEAD: Two police officers responding to a reported robbery at a Brooklyn tavern early yesterday were themselves held up by the robbers, who took their revolvers and herded them into a back room with patrons, the police said.
UPDATE
In fact, my situation is a bit different: the text contains additional XML markup that I'd like to keep, namely the <PERSON> tags:
<block class="full_text">
<p><PERSON>SCHEINMAN</PERSON>--<PERSON>Alan</PERSON>. Happy Birthday. Thirteen years, many tears. Loving memories of your smile, humor, and laughter comfort us. You are always in our hearts. Love, <PERSON>Roni</PERSON>, <PERSON>Sandy</PERSON>, <PERSON>Jarret</PERSON>, <PERSON>Greg</PERSON>, <PERSON>Kate</PERSON>, and <PERSON>Auden Gray</PERSON></p>
</block></body.content></body></nitf>
The ideal output would be:
<PERSON>SCHEINMAN</PERSON>--<PERSON>Alan</PERSON>. Happy Birthday. Thirteen years, many tears. Loving memories of your smile, humor, and laughter comfort us. You are always in our hearts. Love, <PERSON>Roni</PERSON>, <PERSON>Sandy</PERSON>, <PERSON>Jarret</PERSON>, <PERSON>Greg</PERSON>, <PERSON>Kate</PERSON>, and <PERSON>Auden Gray</PERSON>
My attempt so far:
// Read the whole file into a single string.
StringBuilder sb = new StringBuilder();
try (BufferedReader br = new BufferedReader(new FileReader(filename)))
{
    String line;
    while ((line = br.readLine()) != null)
    {
        sb.append(line);
        sb.append(System.lineSeparator());
    }
}
String everything = sb.toString();

// Parse with jsoup and print the text of the full_text block.
// text() drops all markup, so the <PERSON> tags are lost here.
Document doc = Jsoup.parse(everything);
String text = doc.select("block.full_text").text();
System.out.println(text);
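For the UPDATE case, one direction that may be worth trying is to read the inner markup of the <p> element rather than its text. The following is a minimal sketch, not a confirmed solution. It assumes a reasonably recent jsoup on the classpath (one that provides `Parser.xmlParser()` and `selectFirst`), and the class name `InnerMarkupSketch` and method name `innerMarkup` are made up for illustration. It parses the input as XML so that tag names such as <PERSON> keep their case, and turns off pretty-printing so that `html()` does not add whitespace.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

// Hypothetical helper class, named for illustration only.
class InnerMarkupSketch {

    // Returns the inner markup of the first <p> inside <block class="full_text">,
    // or an empty string if no such element exists.
    static String innerMarkup(String xml) {
        // Parse as XML so tag names such as <PERSON> are not lower-cased.
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
        // Disable pretty-printing so html() does not insert extra whitespace.
        doc.outputSettings().prettyPrint(false);
        Element p = doc.selectFirst("block.full_text p");
        // html() keeps child tags; text() would strip them.
        return p == null ? "" : p.html();
    }

    public static void main(String[] args) {
        String xml = "<block class=\"full_text\">"
                + "<p><PERSON>Alan</PERSON>. Happy Birthday. Love, <PERSON>Roni</PERSON></p>"
                + "</block>";
        System.out.println(innerMarkup(xml));
    }
}
```

With the file contents from the attempt above, the call would be `innerMarkup(everything)`. Using `text()` instead of `html()` on the same element should give the plain sentence asked for in the first part of the question.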