Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

shell - Fetch IMAP message body via telnet

I know that this is the command to get the whole message body:

[imap_code] UID FETCH [uid] BODY.PEEK[TEXT]

This returns the entire message body, but I need to exclude the attachments. I want only the message written by the sender, as text and/or HTML.

Is there a way?

Here is a full raw HTML mail with an attachment:

http://pastebin.com/FMEQdLM3

I would like to get only

<div dir="ltr">This is the message body<div><ul><li>one</li><li>two</li></ul></div></div>

or the plain text if there is no HTML version.

1 Reply

Messages are laid out as an arbitrary tree of parts: parent nodes are of type multipart/* or message/rfc822, and the leaves are of other types. The FETCH BODY[...] item lets you extract any one of these parts individually.

Unfortunately, there is no standard layout for messages. You can fetch the BODYSTRUCTURE item to get the MIME layout of a message, but its output is very difficult to parse by eye.
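To make the section numbering concrete, here is a minimal Python sketch that walks a message you already have locally (parsed with the standard `email` package, for example from a full BODY.PEEK[] fetch) and lists the IMAP section number of every leaf part. It is not a BODYSTRUCTURE parser. `imap_sections` is a hypothetical helper, and the numbering follows the RFC 3501 rules as I understand them: children of a multipart are numbered 1, 2, ... beneath their parent, a non-multipart message only has part 1, and the parts of an attached message/rfc822 are numbered beneath that attachment's own number.

```python
from email import message_from_string
from email.message import Message


def _section(prefix: str, index: int) -> str:
    # Section numbers are dot-separated paths; the top level has no prefix.
    return f"{prefix}.{index}" if prefix else str(index)


def imap_sections(msg: Message, prefix: str = "") -> list[tuple[str, str]]:
    """Return (section, content type) for every leaf part, as used in BODY[section]."""
    if not msg.is_multipart():
        # A non-multipart message (top level or attached) only has part 1.
        return [(_section(prefix, 1), msg.get_content_type())]
    leaves: list[tuple[str, str]] = []
    for index, part in enumerate(msg.get_payload(), start=1):
        number = _section(prefix, index)
        if part.get_content_maintype() == "multipart":
            # Container such as multipart/alternative: its children go beneath it.
            leaves.extend(imap_sections(part, number))
        elif part.get_content_type() == "message/rfc822":
            # Attached message: its own parts are numbered beneath this section.
            leaves.extend(imap_sections(part.get_payload(0), number))
        else:
            leaves.append((number, part.get_content_type()))
    return leaves


# Hypothetical sample: a text/html + text/plain body with one image attached.
SAMPLE = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="outer"

--outer
Content-Type: multipart/alternative; boundary="inner"

--inner
Content-Type: text/plain

This is the message body
--inner
Content-Type: text/html

<div dir="ltr">This is the message body</div>
--inner--
--outer
Content-Type: image/png
Content-Transfer-Encoding: base64

iVBORw0KGgo=
--outer--
"""

if __name__ == "__main__":
    for section, ctype in imap_sections(message_from_string(SAMPLE)):
        print(section, ctype)
```

On the sample it prints `1.1 text/plain`, `1.2 text/html` and `2 image/png`, which matches the last of the common layouts described below.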

That being said, there are a few common message layouts that will get you most of the way.

The simplest case is a message with a single body, either text/html or text/plain. Just fetch BODY[TEXT].
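If you would rather script these fetches than type them into a telnet session, here is a minimal sketch using Python's standard `imaplib`. It assumes an already-authenticated connection with a mailbox selected. `fetch_section` is a hypothetical helper, and the host, credentials and UID in the usage comments are placeholders. Like the command in the question, it uses BODY.PEEK so that fetching does not mark the message as read.

```python
import imaplib


def fetch_section(conn: imaplib.IMAP4, uid: str, section: str) -> bytes:
    """Fetch one body section (e.g. "TEXT", "1", "1.2") of the message with this UID."""
    typ, data = conn.uid("FETCH", uid, f"(BODY.PEEK[{section}])")
    if typ != "OK":
        raise imaplib.IMAP4.error(f"FETCH failed: {data!r}")
    for item in data:
        # Literal payloads arrive as (b'1 (UID 42 BODY[1.2] {123}', b'...') tuples.
        if isinstance(item, tuple):
            return item[1]
    raise imaplib.IMAP4.error(f"no body returned for UID {uid}")


# Usage sketch (hypothetical host, credentials and UID):
# conn = imaplib.IMAP4_SSL("imap.example.com")
# conn.login("user", "password")
# conn.select("INBOX")
# print(fetch_section(conn, "42", "TEXT").decode("utf-8", "replace"))
```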

Next is the multi-format message, with both text/html and text/plain versions. Its MIME structure generally looks like this:

+ multipart/alternative   [TEXT]
|- text/plain             [1]
`- text/html              [2]

In this case you want to fetch BODY[2].

If the message has a single body plus attachments, it will look something like this:

+ multipart/mixed or multipart/related  [TEXT]
|- text/html or text/plain              [1]
|- image/jpeg                           [2]
| ...
`- image/gif

In this case you want BODY[1].

The last common layout combines both: a multi-format body with attachments. It tends to look something like this:

+ multipart/mixed or multipart/related  [TEXT]
|-+ multipart/alternative               [1]
| |- text/plain                         [1.1]
| `- text/html                          [1.2]
|- image/jpeg                           [2]
|- image/gif                            [3]
|...
`- image/png

In this case, you probably want BODY[1.2]. Your sample message is of this type.
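The rules of thumb above can be turned into a small heuristic. This is a minimal Python sketch, assuming the message is parsed locally with the standard `email` package. Against a live server you would read the same tree from BODYSTRUCTURE, which this code does not parse. `choose_section` is a hypothetical name. It returns TEXT for a single-part message and follows the first child of multipart/mixed or multipart/related. In multipart/alternative it picks the preferred type, or otherwise the last alternative, which RFC 2046 describes as the most faithful. Real-world messages can break these assumptions.

```python
from email import message_from_string
from email.message import Message


def choose_section(msg: Message, prefer: str = "text/html") -> str:
    """Guess which FETCH BODY[...] section holds the readable body (heuristic)."""
    if msg.get_content_maintype() != "multipart":
        # Single-part message: the whole body is the text.
        return "TEXT"
    path: list[int] = []
    part = msg
    while part.get_content_maintype() == "multipart":
        children = part.get_payload()
        if part.get_content_subtype() == "alternative":
            types = [child.get_content_type() for child in children]
            # Take the preferred format if present; otherwise the last
            # alternative, which RFC 2046 orders as the most faithful.
            index = types.index(prefer) if prefer in types else len(children) - 1
        else:
            # mixed, related, ...: the body is conventionally the first child;
            # the remaining children are attachments or inline resources.
            index = 0
        path.append(index + 1)  # IMAP section numbers are 1-based
        part = children[index]
    return ".".join(str(n) for n in path)


if __name__ == "__main__":
    raw = (
        'Content-Type: multipart/alternative; boundary="a"\n\n'
        "--a\nContent-Type: text/plain\n\nhi\n"
        "--a\nContent-Type: text/html\n\n<p>hi</p>\n--a--\n"
    )
    print(choose_section(message_from_string(raw)))  # 2
```

For the multipart/mixed layout with a nested multipart/alternative shown above, it returns `1.2`. Passing `prefer="text/plain"` returns `1.1` instead.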


In addition, the bodies may be encoded as Quoted-Printable or Base64. Unfortunately, baseline IMAP does not provide any way for the server to decode them for you. Quoted-Printable is mostly readable if the message is ASCII, but it will have lots of `=` escapes throughout the body. If it's Base64, you're not going to be able to decipher it by eye. The [BINARY IMAP extension](https://www.rfc-editor.org/rfc/rfc3516) can help with this, but it is not widely deployed.
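Because the server will not decode the body for you, decoding has to happen on the client. The transfer encoding is named in the part's Content-Transfer-Encoding header and the charset in its Content-Type header. You can fetch just that part's MIME header with, for example, BODY.PEEK[1.2.MIME]. Here is a minimal Python sketch using the standard `base64` and `quopri` modules. `decode_body` is a hypothetical helper, and the UTF-8 default charset is an assumption that you should replace with the charset from the part's header.

```python
import base64
import quopri


def decode_body(raw: bytes, transfer_encoding: str, charset: str = "utf-8") -> str:
    """Decode a fetched body section according to its Content-Transfer-Encoding."""
    encoding = transfer_encoding.strip().lower()
    if encoding == "base64":
        # Line breaks inside the Base64 text are ignored by b64decode.
        data = base64.b64decode(raw)
    elif encoding == "quoted-printable":
        # Turns =XX escapes into bytes and joins soft line breaks ("=" at end of line).
        data = quopri.decodestring(raw)
    else:
        # 7bit, 8bit and binary need no transfer decoding.
        data = raw
    return data.decode(charset, errors="replace")


# Example: decode_body(b"PGRpdj5oaTwvZGl2Pg==", "base64") gives "<div>hi</div>"
```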
