regex - How to parse data effectively with python

Question



posted Feb 17, 2021 in Technique by 深蓝 (71.8m points)


Here is my code for extracting the header fields I want.
But I don't think it is efficient, because the number of searches grows with the number of fields.
Surely that doesn't matter for small data, but I'd like to know a better way.
So I want to extract them all at once, or at least more efficiently.
Sorry for my stupidity.

import re

data="""
Message-ID: <[email protected]>
Received: from 125.209.x.x (net58.219.x-x.host.lt-nn.net [91.219.x.x])
 by crcvmail15.google.com with ESMTP id +844Q-zuS122aEqk5CZDZg
 for <[email protected]>;
Received: from 125.209.x.x (net58.219.x-18.host.lt-nn.net [91.219.x.x])
 by crcvmail15.google.com with ESMTP id +844Q-zuS122aEqk5CZDZg
 for <[email protected]>;
 Tue, 22 Dec 2020 11:20:58 -0000
From: "test"<[email protected]>
To: [email protected]
Subject:example email
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
"""

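def searchHeader(field):
    # \W+ skips the colon (and any spaces) after the field name;
    # (.*?)\n captures the rest of that line.
    form = re.search(r'(' + field + r'\W+(.*?)\n)', data)
    if form:
        print(form.group())
    return form  # return the match so `res` below is not always None

fields = ['From','To','Cc','Subject','Message-ID','Date','(Return-Path|Reply-To)']
for field in fields:
    res = searchHeader(field)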
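One way to get the "extract at once" behaviour asked about here is to join the field names into a single alternation and scan the text once with re.finditer. This is a minimal sketch, assuming every wanted header fits on one line (folded headers such as Received: are not handled); the sample data, FIELDS_RE and extract_headers are hypothetical names for illustration, not from the question:

```python
import re

# Hypothetical, shortened stand-in for the question's `data`.
data = """
Message-ID: <abc@example.com>
From: "test"<test@example.com>
To: someone@example.com
Subject:example email
Reply-To: other@example.com
"""

# The question's field list, with the Return-Path/Reply-To group split into two names.
fields = ['From', 'To', 'Cc', 'Subject', 'Message-ID', 'Date', 'Return-Path', 'Reply-To']

# One alternation; re.escape keeps names literal, and re.M makes ^/$ match per line,
# so "To" only matches a line that starts with "To:".
FIELDS_RE = re.compile(
    r'^(?P<field>' + '|'.join(map(re.escape, fields)) + r'):[ \t]*(?P<value>.*)$',
    re.M,
)

def extract_headers(text):
    """Scan `text` once and return {field: value} for the wanted one-line headers."""
    found = {}
    for m in FIELDS_RE.finditer(text):
        # Keep the first occurrence, like re.search does in the question's loop.
        found.setdefault(m.group('field'), m.group('value'))
    return found

print(extract_headers(data))
```

Fields that are absent (Cc, Date here) simply do not appear in the result, so the caller can use .get() instead of checking for None per search.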


1 Reply

深蓝 · Answer 1 · 2021-02-16T17:45:35+0000

Depending on your definition of "effective", you can make use of named capture groups:

(?P<field>^[\w-]+): *(?P<value>[\s\S]+?)(?=^[\w-]+: *|\Z)

(?P<field>^[\w-]+) - name a capture group "field" and capture, from the beginning of the line, one or more characters that are either \w (word) characters or a - dash.
: * - match a colon followed by optional spaces (this part is not captured).
(?P<value>[\s\S]+?) - name a capture group "value" and lazily capture everything, including newlines. If you enable the dotall modifier, .+? could be used in place of [\s\S]+?. This ensures we capture the multiline values that follow Received:.
(?=^[\w-]+: *|\Z) - continue capturing the "value" until we hit a new "field" or the end of the string.
Because ^ has to match at the start of every line, not just the start of the string, the multiline flag must be enabled (re.M in Python).
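From Python, the pattern above could be used roughly like this. It is a sketch, not code from the answer: it assumes \Z as the Python spelling of "end of the string" and compiles with re.M; the sample data, parse_headers, and the step that unfolds continuation lines are hypothetical additions:

```python
import re

# Shortened, hypothetical header block with a folded (multi-line) Received header.
data = """
Received: from 125.209.x.x (host.example [91.219.x.x])
 by mail.example.com with ESMTP id abc123
 for <user@example.com>;
 Tue, 22 Dec 2020 11:20:58 -0000
From: "test"<test@example.com>
Subject:example email
"""

# The answer's pattern; re.M makes ^ match at each line start.
HEADER_RE = re.compile(
    r'(?P<field>^[\w-]+): *(?P<value>[\s\S]+?)(?=^[\w-]+: *|\Z)',
    re.M,
)

def parse_headers(text):
    """Return {field: [values]}; a list because fields like Received can repeat."""
    headers = {}
    for m in HEADER_RE.finditer(text):
        # Unfold continuation lines: collapse newline plus indentation into one space.
        value = re.sub(r'\s*\n\s*', ' ', m.group('value')).strip()
        headers.setdefault(m.group('field'), []).append(value)
    return headers

for field, values in parse_headers(data).items():
    print(field, values)
```

Storing lists rather than single strings keeps both Received: headers from the question's data instead of silently overwriting the first one.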

https://regex101.com/r/rBBRfM/1

You can see performance stats in the upper-right corner of regex101.
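To get comparable numbers locally rather than on regex101, timeit from the standard library can compare the question's one-search-per-field loop against a single pass. This is a rough sketch with synthetic, hypothetical data (per_field, one_pass and the 200 filler headers are illustrative names); timings depend on input size and field count, so measure on your own data:

```python
import re
import timeit

# Hypothetical header block: 200 filler headers plus a Subject line.
data = "\n".join(f"X-Header-{i}: value {i}" for i in range(200)) + "\nSubject: hello\n"

wanted = ['Subject', 'X-Header-5', 'X-Header-150', 'Cc']
wanted_set = set(wanted)

def per_field(text):
    """The question's approach: one re.search per wanted field."""
    out = {}
    for f in wanted:
        m = re.search(r'^' + re.escape(f) + r': *(.*)$', text, re.M)
        if m:
            out[f] = m.group(1)
    return out

ONE_PASS = re.compile(r'^([\w-]+): *(.*)$', re.M)

def one_pass(text):
    """Single scan with a generic pattern, keeping only the wanted fields."""
    out = {}
    for name, value in ONE_PASS.findall(text):
        if name in wanted_set and name not in out:
            out[name] = value
    return out

# Both approaches should agree before their speed is compared.
assert per_field(data) == one_pass(data)
for fn in (per_field, one_pass):
    print(fn.__name__, timeit.timeit(lambda: fn(data), number=200))
```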
