Take a look at list.index (also documented here):
    parts = my_str.split(' ')
    try:
        port_index = parts.index('Port')
    except ValueError:
        pass  # Port name not found
    else:
        port_name = ' '.join(parts[port_index:port_index + 2])
You can of course do more advanced processing. For example, grab a sequence of capitalized words, optionally preceded by a single "of":
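    def find_name(sentence):
        """
        Get the port name or None.
        """
        parts = sentence.split(' ')
        try:
            start = parts.index('Port')
        except ValueError:
            return None
        else:
            # 'Port' is the last word, so no name can follow it
            if start == len(parts) - 1:
                return None
            end = start + 1
            # Skip a single optional 'of'
            if parts[end] == 'of':
                end = end + 1
            # Consume capitalized words; [:1] tolerates empty strings
            # produced by consecutive spaces
            while end < len(parts) and parts[end][:1].isupper():
                end += 1
            # Nothing capitalized followed 'Port' or 'Port of'
            if end == start + 1 or (end == start + 2 and parts[start + 1] == 'of'):
                return None
            return ' '.join(parts[start:end])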
Of course you can do the same thing with regex:
    import re

    pattern = re.compile(r'Port(?:\s+of)?(\s+[A-Z]\S+)+')
    match = pattern.search(my_str)
    if match:  # search() returns None when no port name is found
        print(match.group())
The [A-Z] class covers only the ASCII letters A through Z, so this regex will not match names that start with any other uppercase letter (accented or non-Latin). You may want to investigate the solutions here for such port names.
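One way to sidestep the ASCII-only character class without leaving the standard library is to let the regex find only the "Port" / "Port of" prefix and then test each following word with str.isupper(), which is Unicode-aware. This is a minimal sketch rather than a definitive solution; the names PREFIX and find_name_unicode and the sample sentence are made up for illustration:

```python
import re

# Matches "Port" as a whole word, optionally followed by "of".
PREFIX = re.compile(r'\bPort(?:\s+of)?\b')


def find_name_unicode(sentence):
    """Hypothetical helper: return the port name or None."""
    match = PREFIX.search(sentence)
    if match is None:
        return None
    words = []
    for word in sentence[match.end():].split():
        # str.isupper() is Unicode-aware, unlike the [A-Z] class.
        if not word[:1].isupper():
            break
        words.append(word)
    if not words:
        return None
    return ' '.join(match.group().split() + words)


print(find_name_unicode('Delays are expected at the Port of Ålesund on July 5'))
# Port of Ålesund
```

Like the other versions, this keeps any punctuation attached to the last word, so you may want to strip it afterwards.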
Both the find_name function and the regex handle the following three test cases correctly:
'Strong winds may disrupt operations at the Port of Rotterdam on July 5'
'Strong winds may disrupt operations at the Port of Fos-sur-Mer on July 5'
'Strong winds may disrupt operations at Port Said on July 5'
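If you want to check this yourself, a small self-contained script along these lines should do. Note that find_name is restated here in a condensed form so the script runs on its own, and the cases dictionary with its expected values is my own addition, not part of the original sentences:

```python
import re


def find_name(sentence):
    """Condensed restatement of the word-based search."""
    parts = sentence.split()
    try:
        start = parts.index('Port')
    except ValueError:
        return None
    end = start + 1
    # Skip a single optional 'of'.
    if end < len(parts) and parts[end] == 'of':
        end += 1
    first_name_word = end
    # Consume the capitalized words that make up the name.
    while end < len(parts) and parts[end][0].isupper():
        end += 1
    if end == first_name_word:
        return None
    return ' '.join(parts[start:end])


pattern = re.compile(r'Port(?:\s+of)?(\s+[A-Z]\S+)+')

# Sentence -> port name both approaches are expected to return.
cases = {
    'Strong winds may disrupt operations at the Port of Rotterdam on July 5':
        'Port of Rotterdam',
    'Strong winds may disrupt operations at the Port of Fos-sur-Mer on July 5':
        'Port of Fos-sur-Mer',
    'Strong winds may disrupt operations at Port Said on July 5':
        'Port Said',
}

for sentence, expected in cases.items():
    match = pattern.search(sentence)
    assert find_name(sentence) == expected
    assert match is not None and match.group() == expected

print('all cases pass')
```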
You can likely improve the search further, but this should give you the tools to get a solid start. At some point, if the sentences become complex enough, you may want to use natural language processing of some kind. For example, look into the nltk package.