Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.0k views
in Technique[技术] by (71.8m points)

regex - How to do CamelCase split in python

What I was trying to achieve, was something like this:

>>> camel_case_split("CamelCaseXYZ")
['Camel', 'Case', 'XYZ']
>>> camel_case_split("XYZCamelCase")
['XYZ', 'Camel', 'Case']

So I searched and found this perfect regular expression:

(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])

As the next logical step I tried:

>>> re.split("(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])", "CamelCaseXYZ")
['CamelCaseXYZ']

Why does this not work, and how do I achieve the result from the linked question in python?

Edit: Solution summary

I tested all provided solutions with a few test cases:

string:                 ''
AplusKminus:            ['']
casimir_et_hippolyte:   []
two_hundred_success:    []
kalefranz:              string index out of range # with modification: either [] or ['']

string:                 ' '
AplusKminus:            [' ']
casimir_et_hippolyte:   []
two_hundred_success:    [' ']
kalefranz:              [' ']

string:                 'lower'
all algorithms:         ['lower']

string:                 'UPPER'
all algorithms:         ['UPPER']

string:                 'Initial'
all algorithms:         ['Initial']

string:                 'dromedaryCase'
AplusKminus:            ['dromedary', 'Case']
casimir_et_hippolyte:   ['dromedary', 'Case']
two_hundred_success:    ['dromedary', 'Case']
kalefranz:              ['Dromedary', 'Case'] # with modification: ['dromedary', 'Case']

string:                 'CamelCase'
all algorithms:         ['Camel', 'Case']

string:                 'ABCWordDEF'
AplusKminus:            ['ABC', 'Word', 'DEF']
casimir_et_hippolyte:   ['ABC', 'Word', 'DEF']
two_hundred_success:    ['ABC', 'Word', 'DEF']
kalefranz:              ['ABCWord', 'DEF']

In summary you could say the solution by @kalefranz does not match the question (see the last case) and the solution by @casimir et hippolyte eats a single space, and thereby violates the idea that a split should not change the individual parts. The only difference among the remaining two alternatives is that my solution returns a list with the empty string on an empty string input and the solution by @200_success returns an empty list. I don't know how the python community stands on that issue, so I say: I am fine with either one. And since 200_success's solution is simpler, I accepted it as the correct answer.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

As @AplusKminus has explained, re.split() never splits on an empty pattern match. Therefore, instead of splitting, you should try finding the components you are interested in.

Here is a solution using re.finditer() that emulates splitting:

def camel_case_split(identifier):
    matches = finditer('.+?(?:(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])|$)', identifier)
    return [m.group(0) for m in matches]

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...