Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

javascript - How do you access the matched groups in a JavaScript regular expression?

I want to match a portion of a string using a regular expression and then access that parenthesized substring:

var myString = "something format_abc"; // I want "abc"

var arr = /(?:^|\s)format_(.*?)(?:\s|$)/.exec(myString);

console.log(arr);     // Prints: [" format_abc", "abc"] .. so far so good.
console.log(arr[1]);  // Prints: undefined  (???)
console.log(arr[0]);  // Prints: format_undefined (!!!)

What am I doing wrong?


I've discovered that there was nothing wrong with the regular expression code above: the actual string which I was testing against was this:

"date format_%A"

Reporting that "%A" is undefined seems very strange behaviour, but it is not directly related to this question, so I've opened a new one: Why is a matched substring returning "undefined" in JavaScript?


The issue was that console.log treats its first argument like a printf format string, and since the string I was logging ("%A") looked like a format specifier, it tried to substitute the value of the next parameter, which did not exist.
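
To illustrate the fix, here is a minimal sketch that passes the value as a substitution argument instead of as the format string. It assumes Node.js, where console.log formats its arguments with util.format; the helper name formatForLog is hypothetical, and the set of specifiers a console honours varies between environments.

```javascript
const util = require("util");

// Hypothetical helper: formats a value the way console.log("%s", value) would,
// so "%" sequences inside the value are never read as format specifiers.
function formatForLog(value) {
  return util.format("%s", value);
}

const captured = "%A"; // the substring captured from "date format_%A"

// Risky: the first argument may be interpreted as a format string.
console.log(captured);

// Safer: a fixed format string, with the data passed as a separate argument.
console.log("%s", captured);
console.log(formatForLog(captured));
```

Keeping the format string constant means the logged data can contain any characters without changing how the call is interpreted.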

  asked by nickf, translated from Stack Overflow

1 Reply

You can access capturing groups like this:

var myString = "something format_abc";
var myRegexp = /(?:^|\s)format_(.*?)(?:\s|$)/g;
var match = myRegexp.exec(myString);
console.log(match[1]); // abc
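
As a small illustrative sketch (the helper name firstGroup is an assumption, not part of the original answer): the array returned by exec holds the full match at index 0 and each capturing group after it. exec returns null when nothing matches, so it is worth checking before indexing.

```javascript
// Hypothetical helper: returns the first capturing group, or null if no match.
function firstGroup(regexp, str) {
  const match = regexp.exec(str);
  return match === null ? null : match[1];
}

// Same pattern as above, without the g flag, so no lastIndex state is kept.
const formatRe = /(?:^|\s)format_(.*?)(?:\s|$)/;

console.log(firstGroup(formatRe, "something format_abc")); // abc
console.log(firstGroup(formatRe, "nothing to see here"));  // null
```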

And if there are multiple matches you can iterate over them:

var myString = "something format_abc";
var myRegexp = /(?:^|\s)format_(.*?)(?:\s|$)/g;
var match = myRegexp.exec(myString);
while (match != null) {
  // matched text: match[0]
  // match start: match.index
  // capturing group n: match[n]
  console.log(match[0]);
  match = myRegexp.exec(myString);
}
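
The sample string above contains only one match, so here is a sketch with several. The helper collectGroups and the simpler pattern /format_(\w+)/g are assumptions made for illustration. The original pattern also consumes the whitespace after each match, so it can skip a token that directly follows another one.

```javascript
// Hypothetical helper: collects capturing group 1 from every match.
function collectGroups(regexp, str) {
  if (!regexp.global) {
    // Without the g flag, exec keeps returning the first match forever.
    throw new TypeError("collectGroups expects a regexp with the g flag");
  }
  regexp.lastIndex = 0; // start from the beginning of the string
  const groups = [];
  let match;
  while ((match = regexp.exec(str)) !== null) {
    groups.push(match[1]);
    // Step past zero-length matches so the loop always makes progress.
    if (match[0] === "") regexp.lastIndex++;
  }
  return groups;
}

const groups = collectGroups(/format_(\w+)/g, "format_abc format_def format_ghi");
console.log(groups.join(", ")); // abc, def, ghi
```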

Edit: 2019-09-10

As you can see, iterating over multiple matches this way was not very intuitive. This led to the proposal of the String.prototype.matchAll method, which is expected to ship in the ECMAScript 2020 specification. It gives us a clean API and solves multiple problems. It has started to land in major browsers and JS engines: Chrome 73+, Node 12+ and Firefox 67+.

The method returns an iterator and is used as follows:

const string = "something format_abc";
const regexp = /(?:^|\s)format_(.*?)(?:\s|$)/g;
const matches = string.matchAll(regexp);

for (const match of matches) {
  console.log(match);
  console.log(match.index);
}

Because it returns an iterator, matching is lazy: each match is computed only when it is requested. This is useful when handling a particularly large number of matches or very large strings.
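
A minimal sketch of that laziness, assuming an environment that supports matchAll: calling next() on the iterator performs only as much matching as is needed for one result. The helper name firstMatch and the pattern are illustrative assumptions.

```javascript
// Hypothetical helper: returns only the first match, or null if there is none.
function firstMatch(regexp, str) {
  const iterator = str.matchAll(regexp);   // no matching has happened yet
  const { value, done } = iterator.next(); // finds just the first match
  return done ? null : value;
}

const match = firstMatch(/format_(\w+)/g, "format_abc format_def");
console.log(match[1]);    // abc
console.log(match.index); // 0
```

The second occurrence, format_def, is never searched for because the iterator is not advanced again.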

If you need an array, the result can easily be converted with the spread syntax or the Array.from method:

function getFirstGroup(regexp, str) {
  const array = [...str.matchAll(regexp)];
  return array.map(m => m[1]);
}

// or:
function getFirstGroup(regexp, str) {
  return Array.from(str.matchAll(regexp), m => m[1]);
}
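
As a usage sketch of the Array.from form, the mapping function can also keep the position of each match. The sample text and the simplified pattern are assumptions chosen for illustration.

```javascript
// Assumed sample data; the pattern is simplified for illustration.
const text = "format_abc format_def";

// Array.from accepts the iterator plus a mapping function applied to each match.
const results = Array.from(text.matchAll(/format_(\w+)/g), (m) => ({
  group: m[1],
  index: m.index,
}));

console.log(JSON.stringify(results));
// [{"group":"abc","index":0},{"group":"def","index":11}]
```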

In the meantime, while this proposal gains wider support, you can use the official shim package.

Also, the internal workings of the method are simple. A roughly equivalent implementation using a generator function would be as follows:

function* matchAll(str, regexp) {
  const flags = regexp.global ? regexp.flags : regexp.flags + "g";
  const re = new RegExp(regexp, flags);
  let match;
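  // exec returns null once there are no more matches.
  while ((match = re.exec(str)) !== null) {
    yield match;
    // Step past zero-length matches, which would otherwise repeat forever.
    if (match[0] === "") re.lastIndex++;
  }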
}

A copy of the original regexp is created; this avoids side effects from the mutation of the lastIndex property while going through the multiple matches.

Also, we need to ensure the regexp has the global flag to avoid an infinite loop.
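
Both points can be observed directly. The following sketch uses an assumed simplified pattern and variable names chosen for illustration. It shows exec mutating lastIndex on a global regexp, a copy keeping its own lastIndex, and a non-global regexp returning the same match on every call.

```javascript
const shared = /format_(\w+)/g;
const input = "format_abc";

// exec on a global regexp resumes from lastIndex, which it updates itself.
const first = shared.exec(input);  // matches; lastIndex becomes 10
const second = shared.exec(input); // null: search resumed at the end; lastIndex resets to 0

// Copying the regexp gives the loop its own lastIndex to mutate.
const copy = new RegExp(shared, shared.flags);
copy.exec(input);
console.log(shared.lastIndex, copy.lastIndex); // 0 10

// Without the g flag, exec ignores lastIndex and always returns the first match,
// so a while loop over it would never terminate.
const nonGlobal = /format_(\w+)/;
const a = nonGlobal.exec(input);
const b = nonGlobal.exec(input);
console.log(a.index === b.index); // true
```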

I'm also happy to see that even this Stack Overflow question was referenced in the discussions of the proposal.

