Package pyzmail :: Module parse

Module parse

Useful functions to parse emails

Classes
  MailPart
Data related to a mail part (aka message content, attachment or embedded content in an email)
  PyzMessage
Inherits from email.message.Message.
  PzMessage
Old name and interface for PyzMessage.
Functions
str
_friendly_header(header)
Convert header returned by email.message.Message.get() into a user friendly string.
decode_mail_header(value, default_charset='us-ascii')
Decode a header value into a unicode string.
list
get_mail_addresses(message, header_name)
Retrieve all email addresses from one message header.
None or unicode
get_filename(part)
Find the filename of a mail part.
_search_message_content(contents, part)
Recursively search for message content (text or HTML) inside the structure of the email.
dict
search_message_content(mail)
Search for message content (text or HTML) inside the structure of the mail.
list
get_mail_parts(msg)
Return all parts of the message as a list of MailPart.
tuple
decode_text(payload, charset, default_charset)
Try to decode text content by trying multiple charsets until one succeeds.
PyzMessage
message_from_string(s, *args, **kws)
Parse a string into a PyzMessage object model.
PyzMessage
message_from_file(fp, *args, **kws)
Read a file and parse its contents into a PyzMessage object model.
PyzMessage
message_from_bytes(s, *args, **kws)
Parse a bytes string into a PyzMessage object model.
PyzMessage
message_from_binary_file(fp, *args, **kws)
Read a binary file and parse its contents into a PyzMessage object model.
Variables
  quoted = '"(?:\\\\[^\\r\\n]|[^\\\\"])*"'
  email_address_re = re.compile(r'^(?:[a-zA-Z0-9_!#\$%&\'\*\+/=\...
A regex that matches well-formed email addresses (from perlfaq9).
  _line_end_re = re.compile(r'\r\n|\n\r|\n|\r')
  __package__ = 'pyzmail'
  invalid_chars_in_filename = '\x00\x01\x02\x03\x04\x05\x06\x07\...
  invalid_windows_name = ['CON', 'PRN', 'AUX', 'NUL', 'COM1', 'C...
Function Details

_friendly_header(header)

Convert header returned by email.message.Message.get() into a user friendly string.

In Python 3, email.message.Message.get() returns a header.Header() with its charset set to charset.UNKNOWN8BIT when the header contains invalid characters; otherwise it returns a str, as Python 2.x does.

Parameters:
  • header (str or email.header.Header) - the header to convert into a user friendly string
Returns: str
the converted header

decode_mail_header(value, default_charset='us-ascii')

Decode a header value into a unicode string. Works like a smarter version of u"".join() applied to the output of email.header.decode_header().

Parameters:
  • value (str) - the value of the header.
  • default_charset (str) - if a charset used in the header is unknown (several charsets can be mixed in one header), this charset is used instead.
    >>> decode_mail_header('=?iso-8859-1?q?Courrier_=E8lectronique_en_Fran=E7ais?=')
    u'Courrier \xe8lectronique en Fran\xe7ais'

get_mail_addresses(message, header_name)

Retrieve all email addresses from one message header.

Parameters:
  • message (email.message.Message) - the email message
  • header_name (str) - the name of the header, can be 'from', 'to', 'cc' or any other header containing one or more email addresses
Returns: list
a list of the addresses as tuples, of the form [(u'Name', 'address@domain.com'), ...]
>>> import email
>>> import email.utils
>>> import email.mime.text
>>> msg=email.mime.text.MIMEText('The text.', 'plain', 'us-ascii')
>>> msg['From']=email.utils.formataddr(('Me', 'me@foo.com'))
>>> msg['To']=email.utils.formataddr(('A', 'a@foo.com'))+', '+email.utils.formataddr(('B', 'b@foo.com'))
>>> print msg.as_string(unixfrom=False)
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
From: Me <me@foo.com>
To: A <a@foo.com>, B <b@foo.com>
<BLANKLINE>
The text.
>>> get_mail_addresses(msg, 'from')
[(u'Me', 'me@foo.com')]
>>> get_mail_addresses(msg, 'to')
[(u'A', 'a@foo.com'), (u'B', 'b@foo.com')]
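For comparison, a similar result can be obtained in Python 3 with the standard library's email.utils.getaddresses(). The sketch below is only an approximation of the behavior documented above: the helper name addresses_sketch is hypothetical, and it skips the header decoding that pyzmail applies to display names.

```python
from email.mime.text import MIMEText
from email.utils import formataddr, getaddresses


def addresses_sketch(message, header_name):
    """Hypothetical helper: list (name, address) pairs from one header."""
    # get_all() returns every occurrence of the header, or the default.
    values = message.get_all(header_name, [])
    return getaddresses([str(value) for value in values])


msg = MIMEText('The text.', 'plain', 'us-ascii')
msg['From'] = formataddr(('Me', 'me@foo.com'))
msg['To'] = ', '.join([formataddr(('A', 'a@foo.com')),
                       formataddr(('B', 'b@foo.com'))])

senders = addresses_sketch(msg, 'from')
recipients = addresses_sketch(msg, 'to')
```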

get_filename(part)

Find the filename of a mail part. Many MUAs send attachments with the filename in the name parameter of the Content-Type header rather than in the filename parameter of the Content-Disposition header.

Parameters:
  • part (inherits from email.mime.base.MIMEBase) - the mail part
Returns: None or unicode
the filename or None if not found
>>> import email.mime.image
>>> attach=email.mime.image.MIMEImage('data', 'png')
>>> attach.add_header('Content-Disposition', 'attachment', filename='image.png')
>>> get_filename(attach)
u'image.png'
>>> print attach.as_string(unixfrom=False)
Content-Type: image/png
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="image.png"
<BLANKLINE>
ZGF0YQ==
>>> import email.mime.text
>>> attach=email.mime.text.MIMEText('The text.', 'plain', 'us-ascii')
>>> attach.add_header('Content-Disposition', 'attachment', filename=('iso-8859-1', 'fr', u'Fran\xe7ais.txt'.encode('iso-8859-1')))
>>> get_filename(attach)
u'Fran\xe7ais.txt'
>>> print attach.as_string(unixfrom=False)
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename*="iso-8859-1'fr'Fran%E7ais.txt"
<BLANKLINE>
The text.
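The lookup order described above (Content-Disposition first, then the name parameter of Content-Type) can be sketched in Python 3 with Message.get_param(). This is an illustration under assumptions, not pyzmail's code: the helper name filename_sketch is hypothetical, and RFC 2231 values are collapsed with email.utils.collapse_rfc2231_value().

```python
from email.mime.text import MIMEText
from email.utils import collapse_rfc2231_value


def filename_sketch(part):
    """Hypothetical helper: find a part's filename, or return None."""
    # Preferred location: Content-Disposition: attachment; filename="..."
    filename = part.get_param('filename', None, 'content-disposition')
    if filename is None:
        # Fallback used by many MUAs: Content-Type: ...; name="..."
        filename = part.get_param('name', None, 'content-type')
    if isinstance(filename, tuple):
        # RFC 2231 values arrive as (charset, language, value) tuples.
        filename = collapse_rfc2231_value(filename)
    return filename


with_disposition = MIMEText('The text.', 'plain', 'us-ascii')
with_disposition.add_header('Content-Disposition', 'attachment',
                            filename='image.png')

name_only = MIMEText('The text.', 'plain', 'us-ascii')
name_only.set_param('name', 'notes.txt')  # sets it on Content-Type

no_name = MIMEText('The text.', 'plain', 'us-ascii')
```

The standard library's own Message.get_filename() applies the same fallback, so the explicit version here mainly serves to make the order of the two lookups visible.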

_search_message_content(contents, part)

Recursively search for message content (text or HTML) inside the structure of the email. Used by search_message_content().

Parameters:
  • contents (dict) - contents already found in parent or sibling parts. The dictionary is filled in as the search proceeds; each key is the MIME type of the part.
  • part (inherits from email.mime.base.MIMEBase) - the part of the mail to search recursively.

search_message_content(mail)

Search for message content (text or HTML) inside the structure of the mail. This function is used by get_mail_parts() to set the is_body attribute of the MailPart objects.

Parameters:
  • mail (inherits from email.message.Message) - the message to search in.
Returns: dict
a dictionary of the form {'text/plain': text_part, 'text/html': html_part} where text_part and html_part inherit from email.mime.text.MIMEText and are respectively the text and HTML versions of the message content. Either part can be missing. The dictionary can even be empty if none of the parts match the requirements to be considered the content.
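The shape of the returned dictionary can be illustrated with a simplified Python 3 sketch built on Message.walk(). The real selection rules are not spelled out on this page, so the criteria below (first text/plain and first text/html part that is not marked as an attachment) are assumptions, and the helper name find_bodies_sketch is hypothetical.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def find_bodies_sketch(mail):
    """Hypothetical helper: map MIME type -> first matching body part."""
    contents = {}
    for part in mail.walk():
        if part.is_multipart():
            continue  # containers hold parts, they are not content
        content_type = part.get_content_type()
        if content_type not in ('text/plain', 'text/html'):
            continue
        if part.get_content_disposition() == 'attachment':
            continue  # assumed rule: attachments are never the body
        contents.setdefault(content_type, part)  # keep the first match
    return contents


msg = MIMEMultipart('mixed')
alternative = MIMEMultipart('alternative')
alternative.attach(MIMEText('hi', 'plain', 'us-ascii'))
alternative.attach(MIMEText('<p>hi</p>', 'html', 'us-ascii'))
msg.attach(alternative)
log = MIMEText('log', 'plain', 'us-ascii')
log.add_header('Content-Disposition', 'attachment', filename='log.txt')
msg.attach(log)

found = find_bodies_sketch(msg)
```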

get_mail_parts(msg)

Return all parts of the message as a list of MailPart. The attributes of each part are retrieved to fill in the corresponding MailPart object.

Parameters:
  • msg (inherits from email.message.Message) - the message
Returns: list
list of mail parts
>>> import email.mime.multipart
>>> msg=email.mime.multipart.MIMEMultipart(boundary='===limit1==')
>>> import email.mime.text
>>> txt=email.mime.text.MIMEText('The text.', 'plain', 'us-ascii')
>>> msg.attach(txt)
>>> import email.mime.image
>>> image=email.mime.image.MIMEImage('data', 'png')
>>> image.add_header('Content-Disposition', 'attachment', filename='image.png')
>>> msg.attach(image)
>>> print msg.as_string(unixfrom=False)    
Content-Type: multipart/mixed; boundary="===limit1=="
MIME-Version: 1.0
<BLANKLINE>
--===limit1==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
<BLANKLINE>
The text.
--===limit1==
Content-Type: image/png
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="image.png"
<BLANKLINE>
ZGF0YQ==
--===limit1==--
>>> parts=get_mail_parts(msg)
>>> parts
[MailPart<*text/plain charset=us-ascii len=9>, MailPart<image/png filename=image.png len=4>]
>>> # the star "*" means this is the mail content, not an attachment 
>>> parts[0].get_payload().decode(parts[0].charset)
u'The text.'
>>> parts[1].filename, len(parts[1].get_payload())
(u'image.png', 4)

decode_text(payload, charset, default_charset)

Try to decode text content by trying multiple charsets until one succeeds. First try charset, then default_charset, and finally popular charsets in this order: ascii, utf-8, utf-16, windows-1252, cp850. If all of them fail, use default_charset and replace the invalid characters.

Parameters:
  • payload (str) - the content to decode
  • charset (str or None) - the first charset to try if != None
  • default_charset (str or None) - the second charset to try if != None
Returns: tuple
a tuple of the form (payload, charset)
  • payload: the decoded payload; it is a unicode string whenever charset is not None
  • charset: the charset that was used to decode payload. If charset is None, something went wrong: if payload is unicode, invalid characters have been replaced and the charset used is default_charset; otherwise payload is still a byte string and nothing has been done.
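The fallback chain described above can be sketched as follows in Python 3. This is a minimal illustration that follows the documented order and return convention, not pyzmail's source; the function name decode_text_sketch is hypothetical, and treating an unknown charset name like a failed decode is an assumption.

```python
POPULAR_CHARSETS = ['ascii', 'utf-8', 'utf-16', 'windows-1252', 'cp850']


def decode_text_sketch(payload, charset, default_charset):
    """Hypothetical helper: return (decoded_payload, charset_used)."""
    # Order of attempts: charset, default_charset, then popular charsets.
    candidates = [c for c in (charset, default_charset) if c]
    candidates += POPULAR_CHARSETS
    for candidate in candidates:
        try:
            return payload.decode(candidate), candidate
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown charset, try the next one
    if default_charset:
        try:
            # Last resort: replace invalid characters, report charset None.
            return payload.decode(default_charset, 'replace'), None
        except LookupError:
            pass
    # Nothing worked: hand back the untouched byte string.
    return payload, None
```

Because cp850 assigns a character to every byte value, the last-resort branch is rarely reached in this sketch; it is kept to mirror the documented contract.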

message_from_string(s, *args, **kws)

Parse a string into a PyzMessage object model.

Parameters:
  • s (str) - the input string
Returns: PyzMessage
the PyzMessage object

message_from_file(fp, *args, **kws)

Read a file and parse its contents into a PyzMessage object model.

Parameters:
  • fp (text_file) - the input file (must be open in text mode if Python >= 3.0)
Returns: PyzMessage
the PyzMessage object

message_from_bytes(s, *args, **kws)

Parse a bytes string into a PyzMessage object model. (Python >= 3.2)

Parameters:
  • s (bytes) - the input bytes string
Returns: PyzMessage
the PyzMessage object

message_from_binary_file(fp, *args, **kws)

Read a binary file and parse its contents into a PyzMessage object model. (Python >= 3.2)

Parameters:
  • fp (binary_file) - the input file, must be open in binary mode
Returns: PyzMessage
the PyzMessage object
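The standard library's email package exposes functions with the same four names, which makes the text versus bytes distinction easy to try out. The Python 3 sketch below uses those standard library functions, so it returns plain email.message.Message objects rather than PyzMessage; the sample message RAW is made up for illustration.

```python
import email
import io

# A made-up message used only for this illustration.
RAW = b"From: Me <me@foo.com>\r\nSubject: Hello\r\n\r\nThe text.\r\n"
TEXT = RAW.decode('ascii')

# Text input: a str, or a file object opened in text mode.
from_string = email.message_from_string(TEXT)
from_file = email.message_from_file(io.StringIO(TEXT))

# Binary input: bytes, or a file object opened in binary mode.
from_bytes = email.message_from_bytes(RAW)
from_binary_file = email.message_from_binary_file(io.BytesIO(RAW))

subjects = [m['Subject'] for m in
            (from_string, from_file, from_bytes, from_binary_file)]
```

Parsing from bytes is generally the safer choice for messages read off the wire, since the raw data may contain characters that cannot be represented in a text-mode read.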

Variables Details

email_address_re

A regex that matches well-formed email addresses (from perlfaq9).
Value:
re.compile(r'^(?:[a-zA-Z0-9_!#\$%&\'\*\+/=\?\^`\{\}~\|-]+(?:\.[a-zA-Z0\
-9_!#\$%&\'\*\+/=\?\^`\{\}~\|-]+)*|"(?:\\[^\r\n]|[^\\"])*")@(?:[a-zA-Z\
0-9_!#\$%&\'\*\+/=\?\^`\{\}~\|-]+(?:\.[a-zA-Z0-9_!#\$%&\'\*\+/=\?\^`\{\
\}~\|-]+)*|\[(?:\\\S|[!-Z\^-~])*\])$')

invalid_chars_in_filename

Value:
'''\x00\x01\x02\x03\x04\x05\x06\x07\x08\t
\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\
\x1d\x1e\x1f<>:"/\\|?*\\%\''''

invalid_windows_name

Value:
['CON',
 'PRN',
 'AUX',
 'NUL',
 'COM1',
 'COM2',
 'COM3',
 'COM4',
...