Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

javascript - Get every item from markdown list under headings

Here is an example markdown file:

# Test

## First List

* Hello World
* Lorem
* foo

## Second List

- item

## Third List

+ item 1
part of item 1
+ item 2

## Not a List

bla bla bla

## Empty

## Another List

bla bla bla



bla

* ITEM

## Nested List

### Inside Nested

* foo
* bar

I have this code so far:

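const fs = require('fs');
const path = require('path');

// Note: `await` has to run inside an async function.
const markdown = await fs.promises.readFile(path.join(__dirname, 'test.md'), 'utf8');
const regexp = /^#{1,6} (.*)[.\n]*[*\-+] (.*)/gm;
const result = markdown.matchAll(regexp);
console.log([...result].map(m => m.slice(1)));

Output:
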
[
  [ 'First List', 'Hello World' ],
  [ 'Second List', 'item' ],
  [ 'Third List', 'item 1' ],
  [ 'Inside Nested', 'foo' ]
]

The first issue is that it only grabs the first item of each list. The second is that if an item spans multiple lines, it only grabs the first line. Finally, it doesn't include Another List, because there is text between the heading and the list.

I'm pretty new to regular expressions and not sure whether the regexp I have is even safe to use.

So basically, I want to find every list in the markdown file and put its items in an array, then check whether there is a heading anywhere above it (with no other list in between) and put that heading at the beginning of that array. It doesn't necessarily have to be in that format; it could be an object too, I just thought an array would be simpler.

Desired result:

[
  ['First List', 'Hello World', 'Lorem', 'foo'],
  ['Second List', 'item'],
  ['Third List', 'item 1\npart of item 1', 'item 2'],
  ['Another List', 'ITEM'],
  ['Inside Nested', 'foo', 'bar']
]


1 Reply


You can try this regex:

/(?<=#{1,6} (.*)\n(?:(?!#).*\n)*)(?=[+*-] (.*(?:\n(?![#+*-]).+)?))/g

Basically, it matches zero-width positions: it checks that a list item (e.g. * item) follows the position and that a title (e.g. # Title) appears somewhere before it, and captures both in separate groups. Whatever lies between them doesn't matter, unless it's another title.
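
To make the zero-width behaviour concrete, here is a minimal sketch that runs the same pattern on a tiny input. The `sample` string and the `found` variable are made up for illustration; only the pattern comes from this answer.

```javascript
// Small input: one heading followed by two list items.
const sample = '## Fruits\n\n* apple\n* pear\n';

// Same pattern as above, written on one line with \n escapes.
const pattern = /(?<=#{1,6} (.*)\n(?:(?!#).*\n)*)(?=[+*-] (.*(?:\n(?![#+*-]).+)?))/g;

// Each match is an empty string; the useful data is in the index and the groups.
const found = [...sample.matchAll(pattern)].map(m => ({
    index: m.index,   // position where the list item starts
    matched: m[0],    // always '' because the pattern consists only of lookarounds
    title: m[1],      // captured inside the lookbehind
    item: m[2],       // captured inside the lookahead
}));
console.log(found);
```

Note that the optional part of the lookahead picks up at most one continuation line per item, which is enough for the example file in the question.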

You can see the test cases here

The matchAll result will be as follows (the first element of each entry is the empty zero-width match):

[
    ["", "First List", "Hello World"],
    ["", "First List", "Lorem"],
    ["", "First List", "foo"],
    ["", "Second List", "item"],
    ["", "Third List", "item 1\npart of item 1"],
    ["", "Third List", "item 2"],
    ["", "Another List", "ITEM"],
    ["", "Inside Nested", "foo"],
    ["", "Inside Nested", "bar"]
]

Since a regex cannot have a dynamic number of capturing groups, you need to group the matches together manually.

And here's the full example:

const markdown = `
# Test

## First List

* Hello World
* Lorem
* foo

## Second List

- item

## Third List

+ item 1
part of item 1
+ item 2

## Not a List

bla bla bla

## Empty

## Another List

bla bla bla



bla

* ITEM

## Nested List

### Inside Nested

* foo
* bar
`;

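// Zero-width match at the start of each list item; group 1 is the nearest
// heading above it, group 2 is the item text (plus one continuation line).
const regexp = /(?<=#{1,6} (.*)\n(?:(?!#).*\n)*)(?=[+*-] (.*(?:\n(?![#+*-]).+)?))/g;
const matches = [...markdown.matchAll(regexp)];

// Group the [title, item] pairs into one array per title.
const result = matches.reduce((acc, cur) => {
    const [title, item] = cur.slice(1);
    const target = acc.find(e => e[0] === title);
    if (target) {
        target.push(item);
    } else {
        acc.push([title, item]);
    }
    return acc;
}, []);
console.log(result);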
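
If lookbehind is a concern (it is an ES2018 feature and older engines don't support it), the same grouping can also be sketched as a plain line-by-line scan. This is an alternative to the regex above, not part of it: `listsByHeading` and `demo` are hypothetical names, and the sketch makes two assumptions that differ slightly from the regex version. Every non-blank line after an item is treated as a continuation of that item, and two headings with the same text produce two separate arrays.

```javascript
// Hypothetical helper: collect list items under the nearest heading above them.
function listsByHeading(markdown) {
    const result = [];
    let heading = null;  // text of the most recent heading, if any
    let current = null;  // array for that heading, created on its first item
    let inItem = false;  // true while the previous line belonged to a list item

    for (const line of markdown.split(/\r?\n/)) {
        const h = /^#{1,6} (.*)$/.exec(line);
        const li = /^[+*-] (.*)$/.exec(line);
        if (h) {
            // A new heading closes whatever list was being collected.
            heading = h[1];
            current = null;
            inItem = false;
        } else if (li && heading !== null) {
            if (!current) {
                current = [heading];
                result.push(current);
            }
            current.push(li[1]);
            inItem = true;
        } else if (inItem && line.trim() !== '') {
            // Assumption: a non-blank line right after an item continues it.
            current[current.length - 1] += '\n' + line;
        } else {
            inItem = false;
        }
    }
    return result;
}

const demo = '# Test\n\n## First List\n\n* Hello World\n* Lorem\n\n' +
    '## Third List\n\n+ item 1\npart of item 1\n+ item 2\n\n' +
    '## Not a List\n\nbla bla bla\n\n## Another List\n\nbla\n\n* ITEM\n';
console.log(listsByHeading(demo));
```

Headings without any list items (such as Test and Not a List) never get an array, so they don't appear in the result.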
