I have an HTML document that looks like this:
<h1>Title</h1>
<p>Some additional content, can be multiple, various tags</p>
<h2><a id="123"></a>Foo</h2>
<p>Some additional content, can be multiple, various tags</p>
<h3><a id="456"></a>Bar</h3>
Now, for each anchor with an id, I want to find out the header hierarchy. For example, for the anchor with id="123" I would like to get something like [{level: 1, title: "Title"}, {level: 2, title: "Foo"}]; similarly, for the anchor with id="456" I would like to get [{level: 1, title: "Title"}, {level: 2, title: "Foo"}, {level: 3, title: "Bar"}].
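To make the expected result concrete, here is a minimal sketch of the hierarchy I'm after, independent of cheerio. It assumes the headings have already been collected in document order into a flat list; the names Heading, HierarchyEntry and buildHierarchies are made up for illustration and are not part of cheerio or of my actual code.

```typescript
// Hypothetical types for illustration only.
interface Heading {
  level: number;     // 1 for <h1>, 2 for <h2>, ...
  title: string;     // text content of the heading
  anchorId?: string; // id of the <a> inside the heading, if there is one
}

interface HierarchyEntry {
  level: number;
  title: string;
}

// Walk the headings in document order, keeping a stack of the current ancestors.
function buildHierarchies(headings: Heading[]): Record<string, HierarchyEntry[]> {
  const result: Record<string, HierarchyEntry[]> = {};
  const stack: HierarchyEntry[] = [];
  for (const h of headings) {
    // Pop entries at the same or a deeper level: they are not ancestors of h.
    let top = stack[stack.length - 1];
    while (top !== undefined && top.level >= h.level) {
      stack.pop();
      top = stack[stack.length - 1];
    }
    stack.push({ level: h.level, title: h.title });
    if (h.anchorId !== undefined) {
      // Store a copy so later pushes and pops do not change this result.
      result[h.anchorId] = stack.map((e) => ({ ...e }));
    }
  }
  return result;
}

const hierarchies = buildHierarchies([
  { level: 1, title: "Title" },
  { level: 2, title: "Foo", anchorId: "123" },
  { level: 3, title: "Bar", anchorId: "456" },
]);
console.log(JSON.stringify(hierarchies["123"]));
// prints [{"level":1,"title":"Title"},{"level":2,"title":"Foo"}]
```

With this approach, a later h2 that follows an h3 is handled as well: the deeper entries are popped before the new heading is pushed.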
My code looks like this so far:
const linkModel: IDictionary<ILinkModelEntry> = {};
const $ = cheerio.load(html);
$("a").each((_i, elt) => {
    const anchor = $(elt);
    const id = anchor.attr().id;
    if (id) {
        const parent = anchor.parent();
        const parentTag = parent.prop("tagName");
        let headerHierarchy: any[] = [];
        if (["H1", "H2", "H3", "H4", "H5", "H6"].includes(parentTag)) {
            let level = parseInt(parentTag[1]);
            headerHierarchy = [{level, text: parent.text()}];
            level--;
            while (level > 0) {
                const prevHeader = parent.prev("h" + level);
                const text = prevHeader.text();
                headerHierarchy.unshift({level, text});
                level--;
            }
        }
        linkModel["#" + id] = {originalId: id, count: count++, headerHierarchy};
    }
});
What am I doing wrong? The following always yields an empty string (i.e. "") for text:
const prevHeader = parent.prev("h" + level);
const text = prevHeader.text();