When doing a regex pattern match, we get back the text that matched. What if I want the pattern (keyword) that was found in the content instead?
See the example below:
>>> import re
>>> r = re.compile('ERP|Gap', re.I)
>>> string = 'ERP is integral part of GAP, so erp can never be ignored, ErP!'
>>> r.findall(string)
['ERP', 'GAP', 'erp', 'ErP']
But I want the output to look like this: ['ERP', 'Gap', 'ERP', 'ERP']
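One way to get that list is to map each match back to its keyword through a lowercase lookup table. This is a minimal sketch, assuming the keywords are plain literals that differ from the matches only by letter case; `canonical` and `keywords` are illustrative names, not part of any library.

```python
import re

# Keywords in the exact spelling we want reported back.
keywords = ['ERP', 'Gap']

# Assumption: keywords are plain literals, so lowercasing a match
# recovers the lowercased keyword it came from.
canonical = {k.lower(): k for k in keywords}

# re.escape makes any regex metacharacters in a keyword match literally.
pattern = re.compile('|'.join(map(re.escape, keywords)), re.I)

string = 'ERP is integral part of GAP, so erp can never be ignored, ErP!'

# Replace every matched substring with the keyword it corresponds to.
found = [canonical[m.lower()] for m in pattern.findall(string)]
print(found)  # ['ERP', 'Gap', 'ERP', 'ERP']
```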
That's because if I do a group-by and sum on the original output, I get the following dataframe:
ERP 1
erp 1
ErP 1
GAP 1
gap 1
But what if I want the output to be grouped by the keywords I am searching for, like this:
ERP 3
Gap 2
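If the matches are normalized to their keywords before counting, the aggregation groups by keyword rather than by spelling. A sketch using `collections.Counter` instead of a dataframe, under the same assumption that keywords differ from matches only by case:

```python
import re
from collections import Counter

keywords = ['ERP', 'Gap']
canonical = {k.lower(): k for k in keywords}
pattern = re.compile('|'.join(map(re.escape, keywords)), re.I)

string = 'ERP is integral part of GAP, so erp can never be ignored, ErP!'

# Count matches per keyword rather than per spelling.
counts = Counter(canonical[m.lower()] for m in pattern.findall(string))

# Report in the order of the keyword list; missing keywords count as 0.
for keyword in keywords:
    print(keyword, counts[keyword])
```

For the sample string in the REPL session this prints `ERP 3` and `Gap 1`, since `GAP` appears only once there.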
MORE CONTEXT
I have a keyword list like this: ['ERP', 'Gap']. I have a string like this: "ERP, erp, ErP, GAP, gap".
I want to count the number of times each keyword appears in the string. When I do the pattern matching, I get the following output: ['ERP', 'erp', 'ErP', 'GAP', 'gap'].
When I aggregate and count, I get the following dataframe:
ERP 1
erp 1
ErP 1
GAP 1
gap 1
But I want the output to look like this:
ERP 3
Gap 2
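A variant that avoids relying on `lower()` agreeing with `re.I` is to give each keyword its own named group and read `Match.lastgroup`, which names the alternative that matched. This is a sketch; the group names `kw0`, `kw1`, ... are a hypothetical scheme chosen so keywords that are not valid Python identifiers still work.

```python
import re
from collections import Counter

keywords = ['ERP', 'Gap']
text = "ERP, erp, ErP, GAP, gap"

# Hypothetical naming scheme: one named group per keyword, kw0, kw1, ...
group_to_keyword = {f'kw{i}': k for i, k in enumerate(keywords)}
pattern = re.compile(
    '|'.join(f'(?P<kw{i}>{re.escape(k)})' for i, k in enumerate(keywords)),
    re.I,
)

# Only one alternative can match at a time, so m.lastgroup is the name
# of the keyword's group, which maps straight back to the list entry.
counts = Counter(group_to_keyword[m.lastgroup] for m in pattern.finditer(text))
print(dict(counts))  # {'ERP': 3, 'Gap': 2}
```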