As other answers have mentioned, the GPE label in the pre-trained spaCy models covers countries, cities and states without distinguishing between them. However, there are workarounds, and several approaches can be used.
One approach: you could add a custom entity label to the model. There is a good article on Towards Data Science that could help you do that. Gathering training data for this could be a hassle, since you would need to tag each city or country at its position in the sentence. To quote an answer from Stack Overflow:
Spacy NER model training includes the extraction of other "implicit" features, such as POS and surrounding words.
When you attempt to train on single words, it is unable to get generalized enough features to detect those entities.
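To illustrate the point in the quote, training examples for a custom label are normally whole sentences with character offsets, not isolated words. Here is a minimal sketch of building such an example in plain Python. The `CITY` and `COUNTRY` labels and the `make_example` helper are hypothetical, and the `(text, {"entities": [...]})` layout is the one commonly shown in spaCy training tutorials, so check it against the version you use:

```python
def make_example(sentence, spans):
    """Build one training example from a sentence and (substring, label) pairs.

    Hypothetical helper: each substring is located left to right, starting
    after the end of the previous match.
    """
    entities = []
    cursor = 0
    for text, label in spans:
        start = sentence.index(text, cursor)  # raises ValueError if not found
        end = start + len(text)
        entities.append((start, end, label))
        cursor = end
    return (sentence, {"entities": entities})


# The entity is annotated inside its sentence, so the surrounding words
# are available as context during training.
example = make_example(
    "She moved from Tempe to Monterey in the United States.",
    [("Tempe", "CITY"), ("Monterey", "CITY"), ("United States", "COUNTRY")],
)
print(example)
```

You would need many such sentences per label, which is why the lookup approach below is less work.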
An easier workaround to this could be the following:
Install geonamescache
pip install geonamescache
Then use the following code to get the list of countries and cities
import geonamescache
gc = geonamescache.GeonamesCache()
# gets nested dictionary for countries
countries = gc.get_countries()
# gets nested dictionary for cities
cities = gc.get_cities()
The documentation states that you can get a host of other location options as well.
Use the following function to get all the values of a key with a certain name from a nested dictionary (obtained from this answer)
def gen_dict_extract(var, key):
    if isinstance(var, dict):
        for k, v in var.items():
            if k == key:
                yield v
            if isinstance(v, (dict, list)):
                yield from gen_dict_extract(v, key)
    elif isinstance(var, list):
        for d in var:
            yield from gen_dict_extract(d, key)
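To see what the generator returns, here is a small self-contained run on a hand-written dictionary. The sample data is made up for illustration and only loosely mimics the nested shape of the geonamescache results:

```python
def gen_dict_extract(var, key):
    """Yield every value stored under `key` anywhere in a nested dict/list."""
    if isinstance(var, dict):
        for k, v in var.items():
            if k == key:
                yield v
            if isinstance(v, (dict, list)):
                yield from gen_dict_extract(v, key)
    elif isinstance(var, list):
        for d in var:
            yield from gen_dict_extract(d, key)


# Illustrative sample only, not real geonamescache output.
sample = {
    "US": {"name": "United States", "iso": "US"},
    "FR": {"name": "France", "iso": "FR"},
}

names = list(gen_dict_extract(sample, "name"))
print(names)  # ['United States', 'France']
```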
Load up two lists of cities and countries respectively.
cities = [*gen_dict_extract(cities, 'name')]
countries = [*gen_dict_extract(countries, 'name')]
Then use the following code to differentiate:
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp('Resilience Engineering Institute, Tempe, AZ, United States; Naval Postgraduate School, Department of Operations Research, Monterey, CA, United States; Arizona State University, School of Sustainable Engineering and the Built Environment, Tempe, AZ, United States; Arizona State University, School for the Future of Innovation in Society, Tempe, AZ, United States')
for ent in doc.ents:
    # Only GPE entities are candidates for the country/city lookup
    if ent.label_ == 'GPE':
        if ent.text in countries:
            print(f"Country : {ent.text}")
        elif ent.text in cities:
            print(f"City : {ent.text}")
        else:
            print(f"Other GPE : {ent.text}")
Output:
City : Tempe
Other GPE : AZ
Country : United States
Country : United States
City : Tempe
Other GPE : AZ
Country : United States
City : Tempe
Other GPE : AZ
Country : United States
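The `Other GPE : AZ` lines show that US state abbreviations fall through both lists. One possible refinement is to check a third lookup of state codes, and to use sets so that membership tests stay fast on the large city list. Here is a minimal sketch in plain Python. The `classify_gpe` helper and the tiny hand-written sample sets are hypothetical; in practice you would build the sets from the geonamescache data:

```python
def classify_gpe(text, countries, cities, state_codes):
    """Return a coarse category for a GPE string using set lookups.

    Hypothetical helper: countries are checked first, then state codes,
    then cities; anything else is reported as "Other GPE".
    """
    name = text.strip()
    if name in countries:
        return "Country"
    if name in state_codes:
        return "State"
    if name in cities:
        return "City"
    return "Other GPE"


# Illustrative sample data only.
sample_countries = {"United States", "France"}
sample_cities = {"Tempe", "Monterey"}
sample_state_codes = {"AZ", "CA"}

for text in ["Tempe", "AZ", "United States", "Springfield"]:
    label = classify_gpe(text, sample_countries, sample_cities, sample_state_codes)
    print(f"{label} : {text}")
```

Note that exact string matching stays ambiguous for names that are both a city and a country or state, so the order of the checks is a design choice.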