Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - splitting regex result in python3

Friends,

Thanks for your help in this thread, but due to my limited knowledge of Python I am unable to solve my problem. So here is the full version of what I am trying to do.

I will be very happy if someone can show me the way.

The input file

--------------------------------- potentials ----------------------------------
-------------------------------------------------------------------------------
 1.  Ni  type=1   np=1001 r1=1.0E-05  rnp=-1.35602175  pfile=Ni1.pot
 2.  Ni  type=2   np=1001 r1=1.0E-05  rnp=-1.35602175  pfile=Ni2.pot
 3.  Ni  type=3   np=1001 r1=1.0E-05  rnp=-1.35602175  pfile=Ni3.pot
 4.  Ni  type=4   np=1001 r1=1.0E-05  rnp=-1.35602175  pfile=Ni4.pot
 5.  Mn  type=5   np=1001 r1=1.0E-05  rnp=-1.68149622  pfile=Mn1.pot
 6.  Mn  type=6   np=1001 r1=1.0E-05  rnp=-1.68149622  pfile=Mn2.pot
 7.  Mn  type=7   np=1001 r1=1.0E-05  rnp=-1.68149622  pfile=Mn3.pot
 8.  Mn  type=8   np=1001 r1=1.0E-05  rnp=-1.68149622  pfile=Mn4.pot
 9.  Mn  type=9   np=1001 r1=1.0E-05  rnp=-1.68149622  pfile=Mn5.pot
 10. Mn  type=10  np=1001 r1=1.0E-05  rnp=-1.68149622  pfile=Mn6.pot
 11. Mn  type=11  np=1001 r1=1.0E-05  rnp=-1.68149622  pfile=Mn7.pot
 12. Mn  type=12  np=1001 r1=1.0E-05  rnp=-1.68149622  pfile=Mn8.pot
 13. Ge  type=13  np=1001 r1=1.0E-05  rnp=-1.35602175  pfile=Ge1.pot
 14. Si  type=14  np=1001 r1=1.0E-05  rnp=-1.35602175  pfile=Si1.pot
 15. Ge  type=15  np=1001 r1=1.0E-05  rnp=-1.35602175  pfile=Ge2.pot
 16. Si  type=16  np=1001 r1=1.0E-05  rnp=-1.35602175  pfile=Si2.pot
 17. Ge  type=17  np=1001 r1=1.0E-05  rnp=-1.35602175  pfile=Ge3.pot
 18. Si  type=18  np=1001 r1=1.0E-05  rnp=-1.35602175  pfile=Si3.pot
 19. Ge  type=19  np=1001 r1=1.0E-05  rnp=-1.35602175  pfile=Ge4.pot
 20. Si  type=20  np=1001 r1=1.0E-05  rnp=-1.35602175  pfile=Si4.pot
-------------------------------------------------------------------------------
------------------------------------- CPA -------------------------------------
-------------------------------------------------------------------------------
 1.  cpasite=5 nsubl=4 cpatypes=3,4,5,6
 2.  cpasite=6 nsubl=2 cpatypes=7,8
 3.  cpasite=7 nsubl=2 cpatypes=9,10
 4.  cpasite=8 nsubl=2 cpatypes=11,12
 5.  cpasite=9 nsubl=2 cpatypes=13,14
 6.  cpasite=12 nsubl=6 cpatypes=15,16,17,18,19,20

I have used the code:

#!/usr/bin/python3

import re
f1=open("file.str","r")
pattern3=r'(\d+)\.\s*(.*)\s+ type=(\d+).* pfile=(.*)'
pattern4=r'(\d+)\. \s* cpasite=(.*)\s* nsubl=(.*)\s* cpatypes=(.*)'
count=[]; atype=[]; apots=[]; files=[]
xx=[];ckomp=[]; csubl=[]; sites=[];xx2=[]
slist=[]
for line in f1:
  match3=re.search(pattern3,line)
  match4=re.search(pattern4,line)
  if match3:
    count.append(int(match3.group(1)))
    atype.append((match3.group(2)))
    apots.append((match3.group(3)))
    files.append(match3.group(4))
  if match4:
    xx.append(match4.group(1))
    xx2.append(match4.group(2))
    ckomp.append(match4.group(3))
    sites.append(match4.group(4))

print(sites)
print(files)
print(count)

which yields the result:

$ python tryeos.py 
['3,4,5,6', '7,8', '9,10', '11,12', '13,14', '15,16,17,18,19,20']
['Ni1.pot', 'Ni2.pot', 'Ni3.pot', 'Ni4.pot', 'Mn1.pot', 'Mn2.pot', 'Mn3.pot', 'Mn4.pot', 'Mn5.pot', 'Mn6.pot', 'Mn7.pot', 'Mn8.pot', 'Ge1.pot', 'Si1.pot', 'Ge2.pot', 'Si2.pot', 'Ge3.pot', 'Si3.pot', 'Ge4.pot', 'Si4.pot']
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

This is correct, but I am not sure how to group the results.

The problem is this: I have 20 atoms (count). sites shows which atoms belong together (e.g. 3, 4, 5 and 6 are together, as are 7 and 8, and also '15,16,17,18,19,20'). files holds the file name of each atom.

So, the intended output should be:

#count 1-2 are not grouped in sites, so, they are alone
   group=1
   atom=Ni1.pot

   group=2
   atom=Ni2.pot

#count 3-6 are grouped together
   group=5
   atom=Ni3.pot, Ni4.pot, Mn1.pot, Mn2.pot

#count 7 & 8 are grouped
   group=6
   atom=Mn3.pot, Mn4.pot

and so on.

Can I get some help on achieving this?

NB: the value of group= is not important; it can be any integer. In practice, I usually set it equal to .

After njzk2's answer, I tried to implement it as:

  for indices in sites:
    indices = map(int, indices.split(','))
    atoms = []
    for cnt in indices:
      i = count.index(cnt)
      atoms.append(files[i])
      del files[i]
      del count[i]
    print str(atoms)
    for f in files:
      print f 

It handles only the first group and then fails with an error:

$ python tryeos.py 
['Ni3.pot', 'Ni4.pot', 'Mn1.pot', 'Mn2.pot']
Ni1.pot
Ni2.pot
Mn3.pot
Mn4.pot
Mn5.pot
Mn6.pot
Mn7.pot
Mn8.pot
Ge1.pot
Si1.pot
Ge2.pot
Si2.pot
Ge3.pot
Si3.pot
Ge4.pot
Si4.pot
Traceback (most recent call last):
  File "tryeos.py", line 28, in <module>
    i = count.index(cnt)
ValueError: 3 is not in list

1 Reply

Your structure is probably not the most optimized for this problem, but you can still format your output this way:

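for indices in sites:
    # Each entry looks like '3,4,5,6'; convert it to integers
    indices = map(int, indices.split(','))
    atoms = []
    for cnt in indices:
        i = count.index(cnt)
        atoms.append(files[i])
        # Remove grouped atoms so that only ungrouped ones remain
        del files[i]
        del count[i]
    # A group of atoms
    print(atoms)
# Run this loop once, after all groups have been processed
for f in files:
    # A single file
    print(f)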
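
Note that this snippet deletes entries from files and count as it goes, so it must run exactly once, after the whole file has been read. The ValueError in the question looks consistent with the grouping loop having been indented inside the `for line in f1:` loop: it would then run again on the second CPA line, by which point atom 3 has already been deleted.

As an alternative that avoids mutating the lists, here is a minimal sketch that keeps two lookup tables and builds the groups in one pass. The names `POT_RE`, `CPA_RE` and `group_atoms` are made up for this illustration, the patterns are tightened versions of the ones in the question, and labelling ungrouped atoms with their own type number is an arbitrary choice (the question says the label is not important).

```python
import re

# Assumed line formats, taken from the sample input in the question.
POT_RE = re.compile(r'^\s*\d+\.\s+\S+\s+type=(\d+)\b.*\bpfile=(\S+)')
CPA_RE = re.compile(r'^\s*\d+\.\s+cpasite=(\d+)\s+nsubl=\d+\s+cpatypes=([\d,]+)')

def group_atoms(lines):
    """Return a list of (label, [pot_files]) in order of first appearance."""
    pfile_by_type = {}  # type number -> potential file name
    site_by_type = {}   # type number -> cpasite that contains it
    for line in lines:
        m = POT_RE.match(line)
        if m:
            pfile_by_type[int(m.group(1))] = m.group(2)
            continue
        m = CPA_RE.match(line)
        if m:
            for t in re.findall(r'\d+', m.group(2)):
                site_by_type[int(t)] = int(m.group(1))

    groups = {}  # key -> list of files; dicts keep insertion order (Python 3.7+)
    for t in sorted(pfile_by_type):
        # Tagged keys keep a cpasite number from colliding with a type number.
        if t in site_by_type:
            key = ('cpa', site_by_type[t])
        else:
            key = ('single', t)
        groups.setdefault(key, []).append(pfile_by_type[t])
    return [(key[1], atoms) for key, atoms in groups.items()]

# A shortened copy of the input from the question.
sample = """\
 1.  Ni  type=1   np=1001 r1=1.0E-05  rnp=-1.35602175  pfile=Ni1.pot
 2.  Ni  type=2   np=1001 r1=1.0E-05  rnp=-1.35602175  pfile=Ni2.pot
 3.  Ni  type=3   np=1001 r1=1.0E-05  rnp=-1.35602175  pfile=Ni3.pot
 4.  Ni  type=4   np=1001 r1=1.0E-05  rnp=-1.35602175  pfile=Ni4.pot
 5.  Mn  type=5   np=1001 r1=1.0E-05  rnp=-1.68149622  pfile=Mn1.pot
 6.  Mn  type=6   np=1001 r1=1.0E-05  rnp=-1.68149622  pfile=Mn2.pot
 7.  Mn  type=7   np=1001 r1=1.0E-05  rnp=-1.68149622  pfile=Mn3.pot
 8.  Mn  type=8   np=1001 r1=1.0E-05  rnp=-1.68149622  pfile=Mn4.pot
 1.  cpasite=5 nsubl=4 cpatypes=3,4,5,6
 2.  cpasite=6 nsubl=2 cpatypes=7,8
"""

for label, atoms in group_atoms(sample.splitlines()):
    print(f"   group={label}")
    print(f"   atom={', '.join(atoms)}")
    print()
```

To run it on the real file, pass the open file object instead of the sample, e.g. `with open("file.str") as f1: result = group_atoms(f1)`.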
