javascript - Get every item from markdown list under headings

Question

posted Feb 19, 2021 in Technique by 深蓝 (71.8m points)

Here is an example markdown file:

```markdown
# Test

## First List

* Hello World
* Lorem
* foo

## Second List

- item

## Third List

+ item 1
part of item 1
+ item 2

## Not a List

bla bla bla

## Empty

## Another List

bla bla bla



bla

* ITEM

## Nested List

### Inside Nested

* foo
* bar
```

I have this code so far:

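```javascript
const fs = require('fs');
const path = require('path');
// Wrapped in an async IIFE because top-level await isn't allowed in CommonJS.
(async () => {
  const markdown = await fs.promises.readFile(path.join(__dirname, 'test.md'), 'utf8');
  const regexp = /^#{1,6} (.*)[.\n]*[*\-+] (.*)/gm;
  const result = markdown.matchAll(regexp);
  console.log([...result].map(m => m.slice(1)));
})();
```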

This prints:

```
[
  [ 'First List', 'Hello World' ],
  [ 'Second List', 'item' ],
  [ 'Third List', 'item 1' ],
  [ 'Inside Nested', 'foo' ]
]
```

The first issue is that it only grabs the first item of each list. The second is that if an item spans multiple lines, only its first line is captured. Finally, it doesn't include Another List, because there is text between the heading and the list.
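The third issue comes from the [.\n]* part of the regex: inside a character class, . is a literal dot rather than "any character", so [.\n]* can only skip dots and newlines between the heading and the list marker. A quick check with two illustrative patterns (the names dotInClass and anyChar are made up for this demo):

```javascript
// Inside [...], "." means a literal dot, so this class matches only "." and "\n".
const dotInClass = /^[.\n]*$/;
// [\s\S] is a common idiom for "any character, including newlines".
const anyChar = /^[\s\S]*$/;

console.log(dotInClass.test('\n\n'));          // true
console.log(dotInClass.test('bla bla bla\n')); // false: letters and spaces are not in the class
console.log(anyChar.test('bla bla bla\n'));    // true
```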

I'm pretty new to regular expressions and not sure whether the regexp I have is even safe to use.

So basically, I want to find every list in the markdown file, put its items in an array, look for a heading somewhere above it (with no other list in between), and put that heading at the beginning of that array. It doesn't have to be in that format; an object would be fine too, I just thought an array would be simpler.

Desired result:

```
[
  ['First List', 'Hello World', 'Lorem', 'foo'],
  ['Second List', 'item'],
  ['Third List', 'item 1\npart of item 1', 'item 2'],
  ['Another List', 'ITEM'],
  ['Inside Nested', 'foo', 'bar']
]
```
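For comparison, the procedure described above can also be done without one large regex by walking the file line by line. This is a minimal sketch, not the answerer's method; the function name listsUnderHeadings and its rules are assumptions: headings are 1 to 6 "#" followed by a space, list markers are "*", "-" or "+" plus a space at the start of a line, and a non-blank line directly after an item line continues that item.

```javascript
// Hypothetical line-by-line parser (an alternative to a single regex).
function listsUnderHeadings(markdown) {
  const result = [];
  let heading = null; // text of the most recent heading
  let current = null; // [heading, ...items] for the list being read
  let inItem = false; // true while the previous line was part of an item

  for (const line of markdown.split(/\r?\n/)) {
    const headingMatch = /^#{1,6} (.*)$/.exec(line);
    const itemMatch = /^[*+-] (.*)$/.exec(line);

    if (headingMatch) {
      // A new heading ends any list in progress.
      heading = headingMatch[1];
      current = null;
      inItem = false;
    } else if (itemMatch) {
      if (current === null) {
        // First item of a new list: start its array with the heading.
        current = [heading];
        result.push(current);
      }
      current.push(itemMatch[1]);
      inItem = true;
    } else if (line.trim() === '') {
      // A blank line ends the current item but not the list.
      inItem = false;
    } else if (inItem) {
      // Continuation line: append it to the last item.
      current[current.length - 1] += '\n' + line;
    } else {
      // Ordinary paragraph text: any list before it is finished.
      current = null;
    }
  }
  return result;
}

const demo = '## Second List\n\n- item\n';
console.log(listsUnderHeadings(demo)); // [ [ 'Second List', 'item' ] ]
```

A list that appears before any heading gets null as its first element, and indented sub-lists are not parsed as separate items (an indented line right after an item is treated as continuation text).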

1 Reply

深蓝 · Answer 1 · 2021-02-19T04:07:41+0000

You can try this regex:

```
/(?<=#{1,6} (.*)\n(?:(?!#).*\n)*)(?=[+*-] (.*(?:\n(?![#+*-]).+)?))/g
```

Basically, it matches zero-width positions: at each position it tests whether a list item (e.g. * item) follows it and a title (e.g. # Title) precedes it, and captures the title and the item in separate groups. Whatever lies between them doesn't matter, unless it's another title (a line starting with #).
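To see that zero-width behavior directly, the sketch below runs the same regex on a small made-up input (the text constant is hypothetical) and keeps the match objects so their index and groups can be inspected. Each match is the empty string, and its index is the position of a list marker.

```javascript
// The answer's regex: zero-width matches, group 1 = heading, group 2 = item.
const regexp = /(?<=#{1,6} (.*)\n(?:(?!#).*\n)*)(?=[+*-] (.*(?:\n(?![#+*-]).+)?))/g;
const text = '## Third List\n\n+ item 1\npart of item 1\n+ item 2\n';

const matches = [...text.matchAll(regexp)];
for (const m of matches) {
  // m[0] is always '' because both assertions consume no characters.
  console.log(JSON.stringify(m[0]), m.index, text[m.index], m[1], JSON.stringify(m[2]));
}
```

Note that the optional (?:\n(?![#+*-]).+)? part captures at most one continuation line, so an item spanning three or more lines would be cut off after its second line.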

You can see the test cases here

The matchAll result will be

```
[
    ["", "First List", "Hello World"],
    ["", "First List", "Lorem"],
    ["", "First List", "foo"],
    ["", "Second List", "item"],
    ["", "Third List", "item 1\npart of item 1"],
    ["", "Third List", "item 2"],
    ["", "Another List", "ITEM"],
    ["", "Inside Nested", "foo"],
    ["", "Inside Nested", "bar"]
]
```

Since a regex can't produce a dynamic number of capturing groups, you need to group the items by heading manually.
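One way to do that grouping is with a Map keyed by heading text, instead of searching the accumulator with find on every match. This is a sketch assuming each match has the [fullMatch, heading, item] shape that matchAll produces for this regex; the function name groupByHeading and the sample data are hypothetical.

```javascript
// Group [fullMatch, heading, item] matches into [heading, ...items] arrays.
function groupByHeading(matches) {
  const groups = new Map(); // Maps iterate in insertion order
  for (const [, heading, item] of matches) {
    if (!groups.has(heading)) groups.set(heading, []);
    groups.get(heading).push(item);
  }
  // Convert to the [heading, ...items] shape from the question.
  return [...groups].map(([heading, items]) => [heading, ...items]);
}

// Hypothetical input in the same shape as the matchAll result shown above.
const sample = [
  ['', 'First List', 'Hello World'],
  ['', 'First List', 'Lorem'],
  ['', 'Second List', 'item'],
];
console.log(groupByHeading(sample));
```

Like the reduce version, this merges two sections whose headings have identical text into one array.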

And here's the full example:

```javascript
const markdown = `
# Test

## First List

* Hello World
* Lorem
* foo

## Second List

- item

## Third List

+ item 1
part of item 1
+ item 2

## Not a List

bla bla bla

## Empty

## Another List

bla bla bla



bla

* ITEM

## Nested List

### Inside Nested

* foo
* bar
`;

// Zero-width matches: group 1 is the nearest heading above, group 2 the list item.
const regexp = /(?<=#{1,6} (.*)\n(?:(?!#).*\n)*)(?=[+*-] (.*(?:\n(?![#+*-]).+)?))/g;
const matches = [...markdown.matchAll(regexp)];

// Collect items under their heading: [heading, item, item, ...].
const result = matches.reduce((acc, cur) => {
    const [title, item] = cur.slice(1);
    const target = acc.find(e => e[0] === title);
    if (target) {
        target.push(item);
    } else {
        acc.push([title, item]);
    }
    return acc;
}, []);
console.log(result);
```
