Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

regex - python re.split lookahead pattern

I'm trying to use re.split() to extract the BCF number, the BTS number, and the LAC and CI from a log file that has a header and a regular structure inside:

==================================================================================
RADIO NETWORK CONFIGURATION IN BSC:
                                                         E P  B
                                      F                  T R  C D-CHANNEL  BUSY
                      AD OP           R  ET- BCCH/CBCH/  R E  S O&M LINK  HR  FR
 LAC   CI         HOP ST STATE  FREQ  T  PCM ERACH       X F  U NAME  ST
                                                                         /GP
===================== == ====== ==== == ==== =========== = = == ===== == === ===

BCF-0010  FLEXI MULTI  U WO                                   2 LM10  WO
10090 31335 BTS-0010  U WO                                                0   0
 KHAKHAATT070D    BB/- 
                                                                               7
              TRX-001  U WO      779  0 1348 MBCCH+CBCH    P  0
              TRX-002  U WO      659  0 1348                  1
              TRX-003  U WO      661  0 1348                  2
              TRX-004  U WO      670  0 1348                  0
              TRX-005  U WO      674  0 1348                  1
 10090 31336 BTS-0011  U WO                                                0   0
 KHAKHAATT200D    BB/- 
                                                                               7
              TRX-006  U WO      811  0 1348 MBCCH+CBCH    P  2
              TRX-009  U WO      845  0 1349                  2
              TRX-010  U WO      819  0 1349                  0
              TRX-011  U WO      823  0 1349                  1
              TRX-012  U WO      836  0 1349                  2
 10090 31337 BTS-0012  U WO                                                0   0
 KHAKHAATT340D    BB/- 
                                                                               5
              TRX-013  U WO      799  0 1349 MBCCH+CBCH    P  0
              TRX-014  U WO      829  0 1349                  1
              TRX-017  U WO      831  0 1302                  2
              TRX-018  U WO      834  0 1302                  1
              TRX-019  U WO      853  0 1302                  0
              TRX-020  U WO      858  0 1302                  2
              TRX-021  U WO      861  0 1302                  1

BCF-0020  FLEXI MULTI  U WO                                   0 LM20  WO
 10090 30341 BTS-0020  U WO                                                0   0
 KHAKHABYT100G    BB/- 
                                                                               1
              TRX-001  U WO       14  0 1856 MBCCH+CBCH    P  0
              TRX-002  U WO       85  0 1856                  1
 10090 30342 BTS-0021  U WO                                                0   0
 KHAKHABYT230G    BB/- 
                                                                               1
              TRX-003  U WO        4  0 1856 MBCCH+CBCH    P  2
              TRX-004  U WO       12  0 1856                  0
 10090 30343 BTS-0022  U WO                                                0   0
 KHAKHABYT340G    BB/- 
                                                                               1
              TRX-005  U WO       20  0 1856 MBCCH+CBCH    P  1
              TRX-006  U WO       22  0 1856                  2
 10090 30345 BTS-0025  U WO                                                0   0
 KHAKHABYT100D    BB/- 
                                                                               5
              TRX-007  U WO      793  0 1856 MBCCH+CBCH    P  0
              TRX-008  U WO      851  0 1856                  1
              TRX-009  U WO      834  0 1857                  2
              TRX-010  U WO      825  0 1857                  1
 10090 30346 BTS-0026  U WO                                                0   0
 KHAKHABYT230D    BB/- 
                                                                               4
              TRX-011  U WO      803  0 1857 MBCCH+CBCH    P  2
              TRX-012  U WO      860  0 1857                  0
              TRX-013  U WO      846  0 1857                  1
              TRX-014  U WO      844  0 1857                  2
              TRX-015  U WO      828  0 1857                  0
              TRX-016  U WO      813  0 1857                  1
 10090 30347 BTS-0027  U WO                                                0   2
 KHAKHABYT340D    BB/- 
                                                                               5
              TRX-017  U WO      801  0 1352 MBCCH+CBCH    P  2
              TRX-018  U WO      857  0 1352                  0
              TRX-019  U WO      840  0 1352                  1
              TRX-020  U WO      838  0 1352                  0
              TRX-021  U WO      836  0 1352                  1
              TRX-022  U WO      823  0 1352                  2
              TRX-023  U WO      821  0 1352                  0
              TRX-024  U WO      817  0 1352                  1

=======================================================================================

with this code:

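import re
import sqlite3

# Hypothetical: the original snippet uses "con" without showing where it comes from.
con = sqlite3.connect("eei.db")

def GetTheSentences(infile):
    # Recreate the target table (the "with con" block commits on success).
    with con:
        cur = con.cursor()
        cur.execute("DROP TABLE IF EXISTS eei")
        cur.execute("CREATE TABLE eei(BCF INT, BTS INT PRIMARY KEY)")
    with open(infile) as fp:
        # One chunk per "BCF-" block; the first chunk is the file header.
        for result_1 in re.split('BCF-', fp.read(), flags=re.UNICODE):
            BCF = result_1[:4]
            # Meant to cut result_1 right before every "BTS-".
            for result_2 in re.compile("(?=BTS-)").split(result_1):
                rec = re.search('TRX-', result_2)
                if rec is not None:
                    BTS = result_2[4:8]
                    print(BCF + "," + BTS)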
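For reference, here is what the lookahead split in the loop above produces when it does split. This is a minimal sketch that assumes Python 3.7 or later (where re.split() accepts zero-width patterns); the sample string is a hypothetical excerpt modelled on the log. The LAC and CI end up at the tail of the previous piece, so splitting on "(?=BTS-)" alone cannot keep "10090 31335 BTS-0010" together.

```python
import re

# Hypothetical excerpt modelled on one BCF chunk of the log above.
chunk = ("0010  FLEXI MULTI  U WO\n"
         "10090 31335 BTS-0010  U WO\n"
         " 10090 31336 BTS-0011  U WO\n")

# Python 3.7+ only: split at every zero-width position followed by "BTS-".
parts = re.compile("(?=BTS-)").split(chunk)

for part in parts:
    print(repr(part))
# '0010  FLEXI MULTI  U WO\n10090 31335 '
# 'BTS-0010  U WO\n 10090 31336 '
# 'BTS-0011  U WO\n'
```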

I need to split result_1 into BTS-related parts that include the 13 characters before "BTS-" (e.g. "10090 31335 BTS-0010") using a regex lookahead, and then split each part into result_3 pieces, one per TRX, but I have had no success.

Any help would be appreciated!
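One way to get BTS parts that keep their LAC and CI is to anchor the lookahead at the start of the line instead of at "BTS-". The following is a minimal sketch, assuming Python 3.7+ and assuming, as in the sample above, that every BTS record starts on its own line as "LAC CI BTS-nnnn" and every TRX record sits on its own line starting with "TRX-". The names parse_bcf_chunk and parse_log and the sample string are hypothetical.

```python
import re

# Python 3.7+ only: re.split() must accept zero-width patterns.
# Split point: start of a line of the form "LAC CI BTS-nnnn" (leading blanks allowed).
BTS_START = re.compile(r"^(?=[ \t]*\d+[ \t]+\d+[ \t]+BTS-)", re.MULTILINE)
# Split point: start of a line whose first field is "TRX-nnn".
TRX_START = re.compile(r"^(?=[ \t]*TRX-)", re.MULTILINE)
# Captures LAC, CI and the BTS number from the first line of a BTS part.
BTS_HEADER = re.compile(r"\s*(\d+)\s+(\d+)\s+BTS-(\d+)")


def parse_bcf_chunk(chunk):
    """Return (lac, ci, bts, trx_lines) for every BTS in one BCF chunk."""
    results = []
    # Piece 0 is whatever precedes the first BTS line (the BCF line); skip it.
    for bts_part in BTS_START.split(chunk)[1:]:
        lac, ci, bts = BTS_HEADER.match(bts_part).groups()
        # Piece 0 is the BTS line, site name and counters; skip it.
        # Keep only the first line of each TRX piece (drops trailing blank/footer lines).
        trx_lines = [p.splitlines()[0].strip()
                     for p in TRX_START.split(bts_part)[1:]]
        results.append((lac, ci, bts, trx_lines))
    return results


def parse_log(text):
    """Return (bcf, lac, ci, bts, trx_lines) tuples for a whole log."""
    rows = []
    # Chunk 0 is the file header before the first "BCF-"; skip it.
    for bcf_chunk in re.split("BCF-", text)[1:]:
        bcf = bcf_chunk[:4]
        for lac, ci, bts, trx_lines in parse_bcf_chunk(bcf_chunk):
            rows.append((bcf, lac, ci, bts, trx_lines))
    return rows


sample = (
    "BCF-0010  FLEXI MULTI  U WO      2 LM10  WO\n"
    "10090 31335 BTS-0010  U WO       0   0\n"
    " KHAKHAATT070D    BB/- \n"
    "                                 7\n"
    "              TRX-001  U WO      779  0 1348 MBCCH+CBCH    P  0\n"
    "              TRX-002  U WO      659  0 1348                  1\n"
    " 10090 31336 BTS-0011  U WO      0   0\n"
    " KHAKHAATT200D    BB/- \n"
    "                                 7\n"
    "              TRX-006  U WO      811  0 1348 MBCCH+CBCH    P  2\n"
)

for bcf, lac, ci, bts, trx_lines in parse_log(sample):
    print(bcf, lac, ci, bts, [t.split()[0] for t in trx_lines])
# 0010 10090 31335 0010 ['TRX-001', 'TRX-002']
# 0010 10090 31336 0011 ['TRX-006']
```

Because the split point is the beginning of the line, each BTS part starts with its own "LAC CI BTS-nnnn" line, and the same trick splits each BTS part into one piece per TRX.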

1 Reply

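Before Python 3.7, re.split() does not split on zero-length matches: Python 2.7 silently returns the string unsplit, and Python 3.5 and 3.6 reject a pattern that can only match the empty string with a ValueError.

Therefore, on those versions re.compile("(?=BTS-)").split(result_1) will never split your string. You need to find a solution without re.split(), use the third-party regex module, or move to Python 3.7 or later, where re.split() does split on zero-length matches. Even then, splitting right before "BTS-" leaves the LAC and CI at the end of the preceding piece, so the lookahead has to start at the beginning of the BTS line instead.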
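Avoiding re.split() altogether could look like the following minimal sketch: find the start of every BTS line with re.finditer() (a non-empty match, so nothing depends on zero-width splitting) and slice the chunk between consecutive starts. It assumes each BTS record begins on its own line as "LAC CI BTS-nnnn"; the function name split_bts_parts and the sample text are hypothetical. It uses only features available in both Python 2.7 and 3.x.

```python
import re

# Non-zero-width match of a BTS line: indentation, LAC, CI, "BTS-nnnn".
BTS_LINE = re.compile(r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+BTS-(\d+)", re.MULTILINE)


def split_bts_parts(chunk):
    """Return (lac, ci, bts, text) per BTS, where text runs up to the next BTS line."""
    matches = list(BTS_LINE.finditer(chunk))
    parts = []
    for i, m in enumerate(matches):
        # The part ends where the next BTS line starts, or at the end of the chunk.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(chunk)
        lac, ci, bts = m.groups()
        parts.append((lac, ci, bts, chunk[m.start():end]))
    return parts


chunk = ("0010  FLEXI MULTI  U WO\n"
         "10090 31335 BTS-0010  U WO\n"
         "              TRX-001  U WO      779\n"
         " 10090 31336 BTS-0011  U WO\n"
         "              TRX-006  U WO      811\n")

for lac, ci, bts, text in split_bts_parts(chunk):
    print("%s %s %s %d TRX line(s)" % (lac, ci, bts, text.count("TRX-")))
# 10090 31335 0010 1 TRX line(s)
# 10090 31336 0011 1 TRX line(s)
```

Since each slice starts at the matched line start (including any indentation), the LAC and CI stay with their BTS; the same finditer-and-slice idea can cut each BTS part into TRX pieces.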

