Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

string - Equivalent pattern to "[\0-\x7F\xC2-\xF4][\x80-\xBF]*" in Lua 5.1

When answering this question, I wrote this code to iterate over the UTF-8 byte sequences in a string:

local str = "KORYTNAČKA"
for c in str:gmatch("[\0-\x7F\xC2-\xF4][\x80-\xBF]*") do
    print(c)
end

It works in Lua 5.2, but in Lua 5.1, it reports an error:

malformed pattern (missing ']')

I recall that Lua 5.1 does not support the \xhh escape in string literals, so I changed it to decimal escapes:

local str = "KORYTNAČKA"
for c in str:gmatch("[\0-\127\194-\244][\128-\191]*") do
    print(c)
end

But the error stays the same. How can I fix it?


1 Reply


I strongly suspect this happens because of the \0 in the pattern. The string that holds your pattern is treated as null-terminated, so it ends earlier than it should, and what the Lua pattern matcher actually parses is just: [. That is clearly a malformed pattern, and it triggers the error you are getting.

To test this idea, I made a small change to the pattern:

local str = "KORYTNAČKA"
for c in str:gmatch("[\x0-\x7F\xC2-\xF4][\x80-\xBF]*") do
    print(c)
end

That compiled and ran as expected on Lua 5.1.4.

Note: I have not actually looked at what the pattern is doing. I only removed the \0 by writing it as \x0, so the output of the modified code might not be what you expect.

Edit: As a workaround, you might consider replacing \0 with \\0 (to escape the null terminator) in your second code example:

local str = "KORYTNAČKA"
for c in str:gmatch("[\\0-\127\194-\244][\128-\191]*") do
    print(c)
end
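
Another option that keeps the zero byte in the set without embedding it in the pattern string: the Lua 5.1 reference manual states that a pattern cannot contain embedded zeros and offers the %z class instead. The following is only a sketch under that assumption. The names UTF8_CHAR and utf8_chars are made up for illustration, and the test string is written with decimal escapes (\196\140 is the UTF-8 encoding of Č) so that it does not depend on the source file's encoding.

```lua
-- Sketch for Lua 5.1: %z matches the zero byte, so the pattern string
-- itself never contains an embedded "\0". The rest of the set uses
-- decimal escapes, which Lua 5.1 string literals do support.
local UTF8_CHAR = "[%z\1-\127\194-\244][\128-\191]*"

-- Hypothetical helper: collect each UTF-8 byte sequence of s into a table.
local function utf8_chars(s)
    local chars = {}
    for c in s:gmatch(UTF8_CHAR) do
        chars[#chars + 1] = c
    end
    return chars
end

local chars = utf8_chars("KORYTNA\196\140KA")
print(#chars)                 --> 10
for _, c in ipairs(chars) do
    print(c)
end
```

Note that %z is marked as deprecated in later Lua versions, where "\0" can be used in patterns directly, so this form is mainly of interest when the code has to run on 5.1.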
