Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


javascript - Get the string between 2 words when it contains those same words inside it too

I have strings in which I want to find two words: 'start' and 'end'.

'start' and 'end' always come as a pair (there may be other characters between them, but whenever there is a 'start' there is a matching 'end' too).

I am trying to write a regex that finds the first 'start' and then its own matching 'end', and returns the substring between them.


Example strings (I numbered each 'start'/'end' pair for clarity only; the real strings do not contain these indexes). The answer is always the text between the pair with index (1):

  1. something start something_needed end something // print 'something_needed'
  2. start(1) something start(2) something end(2) something end(1) start something end // print 'something start(2) something end(2) something'
  3. start(1) something start(2) start(3) something end(3) something start(4) end(4) something end(2) something end(1) something start(5) something end(5) // print 'something start(2) start(3) something end(3) something start(4) end(4) something end(2) something'

This is my solution in JavaScript, but I would prefer a regex-only answer.

I find the positions of all the 'start' words and then of all the 'end' words. Then, walking through them in order, I do count++ for every 'start' and count-- for every 'end'. When count == 0, I have reached the position of the matching 'end'.

function getStartEnd(str) {
    str = " "+str+" ";
    var start = matchPosArr(str, /[\d\s\r\n,()\[\]{}]+START+(?=[\d\s\r\n,()\[\]{}])/gi);
    var end = matchPosArr(str, /[\d\s\r\n,()\[\]{}]+END+(?=[\d\s\r\n,()\[\]{}])/gi);
    var count = 0;  // counter
    var si = 0;     // index of start array
    var ei = 0;     // index of end array
    var isStart = false;
    while (true) {
        if (ei >= end.length) {
            alert('error');
            break;
        }
        else if (si >= start.length) {
            ei++;
            count--;
            if (count == 0) {
                ei--;
            }
        }
        else if (start[si] > end[ei]) {
            ei++;
            count--;
        }
        else if (start[si] < end[ei]) {
            si++;
            count++;
        }
        if (count == 0 && isStart==true) {
            break;
        }
        isStart = true;
    }
    return str.substring(start[0]+("start ".length),end[ei]);
}
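function matchPosArr(str, regEx) {
    var pos = [];
    var match;  // declared locally so it does not leak into the global scope
    while ((match = regEx.exec(str)) != null) {
        pos.push(match.index);
    }
    return pos;
}

alert( getStartEnd(str) );  // str holds the input string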

1 Reply


Here is a possible solution from Matching Nested Constructs in JavaScript, Part 2.

Example usage:

matchRecursiveRegExp("START text START text END text more END text", "START", "END");

// (c) 2007 Steven Levithan <stevenlevithan.com>
// MIT License

/*** matchRecursiveRegExp
Accepts a string to search, a left and right format delimiter
as regex patterns, and optional regex flags. Returns an array
of matches, allowing nested instances of left/right delimiters.
Use the "g" flag to return all matches, otherwise only the
first is returned. Be careful to ensure that the left and
right format delimiters produce mutually exclusive matches.
Backreferences are not supported within the right delimiter
due to how it is internally combined with the left delimiter.
When matching strings whose format delimiters are unbalanced
to the left or right, the output is intentionally as a
conventional regex library with recursion support would
produce, e.g. "<<x>" and "<x>>" both produce ["x"] when using
"<" and ">" as the delimiters (both strings contain a single,
balanced instance of "<x>").

examples:
matchRecursiveRegExp("test", "\\(", "\\)")
returns: []
matchRecursiveRegExp("<t<<e>><s>>t<>", "<", ">", "g")
returns: ["t<<e>><s>", ""]
matchRecursiveRegExp("<div id=\"x\">test</div>", "<div\\b[^>]*>", "</div>", "gi")
returns: ["test"]

*/
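function matchRecursiveRegExp (str, left, right, flags) {
    var f = flags || "",
        g = f.indexOf("g") > -1,
        // strip any "g" from the caller's flags so the flag is never duplicated
        x = new RegExp(left + "|" + right, "g" + f.replace(/g/g, "")),
        l = new RegExp(left, f.replace(/g/g, "")),
        a = [],
        t, s, m;

    do {
        t = 0;  // current nesting depth
        while (m = x.exec(str)) {
            if (l.test(m[0])) {
                // left delimiter: remember where the outermost match starts
                if (!t++) s = x.lastIndex;
            } else if (t) {
                // right delimiter: closing the outermost level yields a match
                if (!--t) {
                    a.push(str.slice(s, m.index));
                    if (!g) return a;
                }
            }
        }
    } while (t && (x.lastIndex = s));  // unbalanced: retry after the last outer left delimiter

    return a;
}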
document.write(matchRecursiveRegExp("something start something_needed end something", "start", "end") + "<br/>");
document.write(matchRecursiveRegExp("start something start something end something end start something end", "start", "end")+ "<br/>");
document.write(matchRecursiveRegExp("start something start start something end something start end something end something end something start something end", "start", "end")+ "<br/>");
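Two things to watch with the calls above: the returned text keeps the spaces around it (for example " something_needed "), and a plain "start" or "end" pattern can also match inside longer words such as "restart" or "send". If only the first balanced pair is needed, the counting idea from the question can be done in a single pass. The following is only a minimal sketch under those assumptions; firstBalanced is a hypothetical helper name, not part of Steven Levithan's code, and it treats the delimiters as whole words via \b:

```javascript
// Sketch: return the trimmed text between the first "start" and its matching "end".
// firstBalanced is a hypothetical name; open/close are regex source strings.
function firstBalanced(str, open, close) {
    // \b keeps the delimiters from matching inside longer words (e.g. "send").
    // Group 1 is set only when the opening word matched.
    var token = new RegExp("\\b(?:(" + open + ")|" + close + ")\\b", "gi");
    var depth = 0;   // number of "start" words currently open
    var from = -1;   // index just past the outermost "start"
    var m;
    while ((m = token.exec(str)) !== null) {
        if (m[1] !== undefined) {        // opening word
            if (depth === 0) from = token.lastIndex;
            depth++;
        } else if (depth > 0) {          // closing word inside an open pair
            depth--;
            if (depth === 0) return str.slice(from, m.index).trim();
        }
    }
    return null;     // no balanced pair found
}

var result = firstBalanced(
    "start something start something end something end start something end",
    "start", "end");
// result is "something start something end something"
```

A closing word that appears before any opening word is simply skipped, and null is returned when no balanced pair exists.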
