6. mdx plugins

The functionality of manitou-mdx can be extended by writing Perl modules that are hooked to manitou-mdx as plugins. A plugin module may be a self-contained piece of code, but more often it serves as an interface to an external program, such as a spam filter or a converter that extracts text from a complex file format. Plugins may also use external Perl modules from the CPAN archive or other sources.

Here are some examples of what plugins can do:

6.1. Plugin types

Plugins can be called at different stages of the mail processing, depending on what kind of functionality they provide. There are five types of plugins:
  • incoming_preprocess_plugins are called before a message is opened and analyzed by manitou-mdx, and have read and write access to the raw mail file. For example, the sample SpamAssassin client is implemented as an incoming_preprocess plugin, because SpamAssassin expects a raw mail file, as does ClamAV.

  • incoming_mimeprocess_plugins are called after the message has been parsed by manitou-mdx, and before it's put into the database. These plugins have read/write access to the parsed form of the message (perl MIME object with decoded parts).

  • incoming_postprocess_plugins are called after a message has been committed into the database and all associated actions have been carried out.

  • outgoing_plugins are called before passing an outgoing message to the mail system. They can modify the message, for example by adding a digital signature or additional headers.

  • maintenance_plugins are called periodically by manitou-mdx to carry out special maintenance tasks such as archiving files or running statistics. They perform tasks that could be assigned to crontab scripts, except that they have access to an already opened database connection and to the current configuration options. They are also not run concurrently with the other mdx functions, such as importing or sending messages, so no database access contention occurs.

6.2. Declaration

Plugins are declared in the configuration file, through these directives:
      # in the [common] section of the configuration
      plugins_directory = /path/to/plugins

      # in a [mailbox] section (the plugin will be called only for mails to that mailbox)
      incoming_preprocess_plugins = plugin1(arg1,arg2,...) \
      plugin2(arg1,arg2,arg3...) \
      plugin3 \
      ...

      incoming_postprocess_plugins: same syntax as incoming_preprocess_plugins

      incoming_mimeprocess_plugins: same syntax as incoming_preprocess_plugins

      outgoing_plugins: same syntax as incoming_preprocess_plugins

      maintenance_plugins: the declaration begins with a time frequency specification, which is either:
          o an interval expressed as a number of hours. Example: 10h. That will make the plugin run every 10 hours, starting 10 hours after the program launch.
          o an interval expressed as a number of minutes. Example: 100mn.
          o a point in time. Example: 12h10 will make the plugin run every day at 12h10. The hour range is 0 to 23, 0 being midnight.
          o a point in time every hour. Example: *:20 will make the plugin run at 0:20, 1:20, 2:20 and so on. 

      Maintenance plugins are not tied to a mailbox, and thus should appear in the [common] section.
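The four forms of the time frequency specification can be told apart with a few regular expressions. The following is only a sketch, not the parser used by manitou-mdx: the function name parse_frequency and the shape of its return value are hypothetical, and accepting HH:MM alongside HHhMM is an assumption drawn from the sample configuration.

```perl
use strict;
use warnings;

# Classify a maintenance frequency specification (hypothetical helper).
# Returns a hash reference describing the schedule, or nothing if the
# text matches none of the documented forms.
sub parse_frequency {
  my ($spec) = @_;
  if ($spec =~ /\A(\d+)h\z/) {            # interval in hours, e.g. 10h
    return { type => "interval", minutes => $1 * 60 };
  }
  if ($spec =~ /\A(\d+)mn\z/) {           # interval in minutes, e.g. 100mn
    return { type => "interval", minutes => $1 + 0 };
  }
  # point in time every day, e.g. 12h10 (assumption: 23:30 is accepted too)
  if ($spec =~ /\A(\d{1,2})[h:](\d{2})\z/ && $1 <= 23 && $2 <= 59) {
    return { type => "daily", hour => $1 + 0, minute => $2 + 0 };
  }
  if ($spec =~ /\A\*:(\d{2})\z/ && $1 <= 59) {   # every hour, e.g. *:20
    return { type => "hourly", minute => $1 + 0 };
  }
  return;
}

for my $spec ("10h", "100mn", "12h10", "*:20") {
  my $freq = parse_frequency($spec);
  print "$spec is of type $freq->{type}\n";
}
```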

Invocations of maintenance plugins are serialized: two plugins never run at the same time. Also, mail import and export are stopped while a maintenance plugin runs. Example:

[common]
plugins_directory = /usr/local/manitou/plugins
maintenance_plugins = 5h vacuum \
		      23:30 reporting

[mailbox@example.com]
#  plugins for that mailbox
incoming_preprocess_plugins = spamc("tag for spam")
incoming_postprocess_plugins = guesslang("en","fr") \
                               anotherplugin(arg1,arg2,...)
outgoing_plugins = gpgsign

[mailbox2@example.com]
incoming_preprocess_plugins = spamc("another tag for spam")

Note that a declaration containing multiple plugins must be split across multiple lines, each line except the last one ending with a backslash.
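To make the continuation rule concrete, here is a rough sketch of how such a declaration could be read back into a list of plugin names and argument texts. It is not the configuration parser of manitou-mdx; join_continuations and split_plugins are hypothetical helpers, and the argument matching is deliberately naive.

```perl
use strict;
use warnings;

# Join lines ending with a backslash into one logical line.
sub join_continuations {
  my @lines = @_;
  my @logical;
  my $pending = "";
  for my $line (@lines) {
    if ($line =~ /\A(.*?)\s*\\\s*\z/s) {
      $pending .= "$1 ";               # continued on the next line
    } else {
      push @logical, $pending . $line;
      $pending = "";
    }
  }
  push @logical, $pending if length $pending;
  return @logical;
}

# Split a plugin list into [name, argument text] pairs.
# Limitation of this sketch: an argument containing ")" would be cut short.
sub split_plugins {
  my ($value) = @_;
  my @plugins;
  while ($value =~ /\G\s*([A-Za-z_][A-Za-z0-9_]*)(?:\(([^)]*)\))?/gc) {
    push @plugins, [ $1, defined $2 ? $2 : "" ];
  }
  return @plugins;
}

my @lines = (
  'incoming_postprocess_plugins = guesslang("en","fr") \\',
  '                               anotherplugin(arg1,arg2)',
);
my ($logical) = join_continuations(@lines);
my ($key, $value) = split /\s*=\s*/, $logical, 2;
for my $plugin (split_plugins($value)) {
  print "$key: $plugin->[0] with arguments ($plugin->[1])\n";
}
```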

6.3. Included plugins

The distribution archive includes these sample plugins:

  • spamc: an incoming_preprocess plugin that calls the SpamAssassin client (spamc) on incoming messages to determine their spam probability and add a user-defined tag when positive.

  • spam_learner: a maintenance plugin that runs sa-learn at regular intervals to feed the spamassassin Bayes database with new legitimate and spam messages. It handles the re-training of the spam filter in a way that requires no other action from the end-user than moving messages to or from the trashcan.

  • mswordindexer: a mime-process plugin that passes MS Word .doc attachments on to wvWare, converts them to Unicode text and adds their words to the full-text index of the body. This allows the user interface's search engine to retrieve those messages based on the contents of their attachments.

  • html_indexer: a mime-process plugin that passes the HTML contents of a message to the full-text indexer, allowing the search engine to retrieve these contents. The code requires the HTML::TreeBuilder and HTML::FormatText packages from the CPAN Perl Archive.

  • pdfindexer: an incoming_mimeprocess plugin that converts attached PDF files to text and feeds them to the full-text indexer. The conversion requires an external program, such as the pdftotext utility.

  • bayes_classify: a mime-process plugin that does automatic mail classification according to words contained in the message. The so-called "naive bayesian algorithm" has become popular for its efficiency in spam filtering, but can also be used more generally for automated classification: a Wikipedia entry explains this in detail. Read the case study on automatic language detection for a step by step introduction on how to set up and use the classifier plugin in Manitou-Mail.

  • attach_uploader: an outgoing plugin that replaces large attachments with a web link to the contents before sending a message. Such attachments are not welcome in Internet mail, and are sometimes rejected, because they bog down mail servers, eat into people's mailbox quotas, and may cause trouble when being fetched.

    When this plugin is enabled, large attachments (the inline maximum size is configurable) are uploaded to an FTP server instead of being sent along with the mail.

    The plugin lets Manitou-Mail users just attach files as usual and lets manitou-mdx decide whether they should be part of the message or taken outside.

  • attach_uploader_ssh: an outgoing plugin similar to attach_uploader except that it uses SSH instead of FTP to upload the files to the remote server, allowing the use of encryption and public/private key pairs for authentication.

  • tnef_decoder: an incoming_mimeprocess plugin that decodes the TNEF format produced by Outlook and replaces the application/ms-tnef part by the original attached contents. The code requires the Convert::TNEF module from CPAN.

A plugin is activated by installing the corresponding perl module file (with a .pm suffix) in the plugins directory, and declaring it in the configuration file, optionally with arguments, like this:

[common]
# plugins_directory = /usr/share/perl5/Manitou/Plugins
# learn new spam every 2 hours
maintenance_plugins = 2h spam_learner

[mailbox@company.com]
# these plugins will be called only for that mailbox
incoming_preprocess_plugins = spamc("tag for spam")
outgoing_plugins = gpgsign
incoming_postprocess_plugins = guesslang("en","fr") \
                               anotherplugin(arg1,arg2,...)

6.4. Developer information

6.4.1. Architecture

A plugin named 'myplugin' should provide a myplugin.pm file declaring a myplugin package matching this structure:

package myplugin;

sub init {
  my $dbh=shift;
  my @args=@_;	# user-supplied arguments

  # optionally do initialization stuff using $dbh and @args

  bless {}      # create the perl object and return a reference to it
}

sub finish {
  # do optional cleaning
  1;
}

sub process {
  my ($self, $context) = @_;

  # $self=the plugin object returned by init()

  # $context=reference to a hash set up by the caller, containing these keys:
  #   stage: string whose value is "preprocess", "mimeprocess", "postprocess", "outgoing" or "maintenance"
  #   dbh: the DBI connection object pointing to the mail database
  #   filename: full path of the mailfile (undef if $context->{stage} is "outgoing" or "maintenance")
  #   mail_id: an integer containing the unique id of the mail (undef if the stage is "preprocess" or "maintenance")
  #   mimeobj: the MIME object containing the mail (undef if the stage is "preprocess" or "maintenance")
  #   notice_log: a pointer to a function to log a notice message (to be called with the message as argument)
  #   error_log: a pointer to a function to log an error message (to be called with the message as argument)

  # The context is also used to communicate results to the caller and to the other plugins
  # along the chain. The keys that are going to be interpreted are:
  #   tags: perl array of tag names that should be assigned to the message.
  #   action: undef if no particular action (insert the incoming message as new)
  #     "discard" to discard an incoming message, "trash" to trash it


  # Process the mail...
  1;
}

1;   # the package should evaluate to 1
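
As an illustration of this structure, here is a minimal sketch of a complete plugin that tags messages whose raw mail file exceeds a given size. It is not part of the distribution: the name tagbig, its two arguments and their default values are hypothetical, and the code assumes that the tags entry of the context is an array reference to which plugins append.

```perl
package tagbig;
use strict;
use warnings;

# init receives the database handle, then the arguments from the
# configuration file: here a size limit in bytes and a tag name
# (both hypothetical, with arbitrary defaults).
sub init {
  my $dbh = shift;
  my ($max_size, $tag) = @_;
  my $self = { max_size => $max_size || 1_000_000, tag => $tag || "large" };
  return bless $self;
}

sub finish {
  1;
}

sub process {
  my ($self, $context) = @_;
  # Only meaningful at stages where a mail file exists on disk
  return 1 unless defined $context->{filename};
  my $size = -s $context->{filename};
  if (defined $size && $size > $self->{max_size}) {
    # Assumption: tags is an array reference shared along the chain
    push @{ $context->{tags} }, $self->{tag};
    $context->{notice_log}->("tagbig: $size bytes, tagged as $self->{tag}")
      if $context->{notice_log};
  }
  1;
}

1;
```

Under these assumptions, the plugin could be declared for a mailbox as incoming_preprocess_plugins = tagbig(500000,"big").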

The plugin's name must not contain any non-ASCII, punctuation or space characters. Digits are allowed except as the first character of the name.
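
The naming rule can be expressed as a single pattern. This is only a sketch: the helper name is_valid_plugin_name is hypothetical, and allowing the underscore is an assumption based on the names of the included plugins (spam_learner, html_indexer, and so on).

```perl
use strict;
use warnings;

# True when $name contains only ASCII letters, digits and underscores
# and does not start with a digit (hypothetical helper).
sub is_valid_plugin_name {
  my ($name) = @_;
  return $name =~ /\A[A-Za-z_][A-Za-z0-9_]*\z/ ? 1 : 0;
}

for my $name ("spam_learner", "2fast", "my-plugin") {
  printf "%s: %s\n", $name, is_valid_plugin_name($name) ? "valid" : "invalid";
}
```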

6.4.2. Loading and initialization

When manitou-mdx is launched, it searches the configuration file for all plugins and includes each corresponding perl module file once, using perl's require command. This means that in the example configuration shown, it would load these files from /usr/local/manitou/plugins: guesslang.pm, anotherplugin.pm, spamc.pm, gpgsign.pm (note that spamc will be instantiated twice). For each reference to a plugin, a new plugin object is instantiated: its init() function is called with a database handle followed by the arguments from the config file. For the example configuration, the following instantiations would occur:

$pl1 = guesslang::init($dbh, "en","fr");
$pl2 = anotherplugin::init($dbh, arg1,arg2,...);
$pl3 = spamc::init($dbh, "tag for spam");
$pl4 = gpgsign::init($dbh);
$pl5 = spamc::init($dbh, "another tag for spam");

In the example above, since spamc is used twice, two separate spamc objects are created, each by its own call to spamc::init. Variables used inside the spamc module may or may not be shared between different instances of a module: it is the responsibility of the plugin's writer to make the right choice between private variables inside the object ($self->{var}) and variables shared at the package level (my $var).
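
The difference between the two kinds of variables can be seen in a small sketch. The countdemo package below is hypothetical and exists only for this demonstration: one counter lives in a file-level lexical shared by all instances, the other inside each object.

```perl
package countdemo;
use strict;
use warnings;

my $calls = 0;   # file-level lexical: a single copy shared by every instance

sub init {
  my $dbh = shift;
  my ($tag) = @_;
  return bless { tag => $tag, calls => 0 };   # per-instance state
}

sub process {
  my ($self, $context) = @_;
  $calls++;            # counts calls across all instances
  $self->{calls}++;    # counts calls for this instance only
  1;
}

sub total_calls { return $calls }

1;

package main;

my $first  = countdemo::init(undef, "tag for spam");
my $second = countdemo::init(undef, "another tag for spam");
$first->process({});
$first->process({});
$second->process({});
printf "first=%d second=%d total=%d\n",
  $first->{calls}, $second->{calls}, countdemo::total_calls();
```

Here the per-instance counters end up at 2 and 1 while the shared counter reaches 3.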

The user-supplied arguments for the init function are passed verbatim from the configuration file, and thus are to be considered as perl text. (Technically, the call to the plugin's init function is embedded inside a perl eval call.) These user-supplied arguments should therefore be expressed as they would be inside a perl program, and every special character should be quoted according to perl syntax rules. Examples:

# @ sign quoted as per perl strings requirements
incoming_preprocess_plugins=spamc("tag\@cf")

# Single quotes can also be used to avoid perl variable interpolation
# inside strings
incoming_preprocess_plugins=spamc('tag@cf$')

# On the other hand, we may want to use perl's interpreter to evaluate
# variables: here we're passing the value of the MAIL_ARCH_DIR environment
# variable.
incoming_postprocess_plugins=archivemsg($ENV{MAIL_ARCH_DIR})
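
The effect of evaluating the argument text as perl can be reproduced with a short sketch. This is not the code of manitou-mdx: the instantiate function and the argdemo package are hypothetical, and the value given to MAIL_ARCH_DIR is made up for the demonstration.

```perl
use strict;
use warnings;

package argdemo;   # hypothetical plugin that simply records its arguments
sub init { my $dbh = shift; return bless { args => [@_] } }

package main;

# Build the call to init as perl source text and evaluate it, so that the
# argument text from the configuration is parsed by the perl interpreter.
sub instantiate {
  my ($dbh, $name, $arg_text) = @_;
  my $obj = eval "${name}::init(\$dbh, $arg_text)";
  die "cannot instantiate $name: $@" if $@;
  return $obj;
}

$ENV{MAIL_ARCH_DIR} = "/tmp/mail-archive";   # made-up value
for my $arg_text ('"tag\@cf"', q{'tag@cf$'}, '$ENV{MAIL_ARCH_DIR}') {
  my $plugin = instantiate(undef, "argdemo", $arg_text);
  print "$arg_text gives $plugin->{args}[0]\n";
}
```

The first two forms both hand a literal string to init, while the third is replaced by the content of the environment variable at instantiation time.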

6.4.3. Processing

6.4.3.1. Incoming mail

Each time a new message file appears in the mailfiles_directory associated with an identity, manitou-mdx carries out these actions:
  1. It changes the file suffix (the part at the right of the dot) from .received to .pid.processing

  2. If there are incoming_preprocess_plugins, it calls their process() functions, in the order of their declaration in the configuration file. The plugins are allowed to modify the mailfile if needed.

  3. If one of the plugins has marked the mail as "to be discarded", the mailfile suffix is changed to .discarded and the processing of this message is stopped.

  4. The mailfile is opened, parsed and put into a perl MIME object.

  5. If there are incoming_mimeprocess_plugins, their process() functions are called, in the order of their declaration in the configuration file. Plugins are allowed to modify the MIME object in memory in every way they see fit.

  6. Once again, if one of the plugins has marked the mail as "to be discarded", the mailfile suffix is changed to .discarded and the processing of this message is stopped.

  7. manitou-mdx does its own processing of the message, using the MIME object possibly modified by the plugins previously run. Then if no error has occurred, it inserts it into the database and commits the transaction.

  8. It changes the mailfile's suffix from .pid.processing to .processed, or to .error if an error has occurred during the previous step.

  9. If there are incoming_postprocess_plugins, it calls them in the order of their declaration in the configuration file. Once all the plugins have been called, it commits whatever changes have been made in the database.
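
The way a chain of plugins shares one context, and the way a "discard" request changes the fate of the mailfile, can be sketched as follows. This is an illustration of steps 1 to 3 only, not the actual code of manitou-mdx: run_chain, with_suffix and the two demo packages are hypothetical, no file is renamed on disk, reading "pid" as the process id is an assumption, and whether the chain stops at the first discard request is not stated above (this sketch runs every plugin).

```perl
use strict;
use warnings;

# Call process() on each plugin in declaration order, sharing one context.
# Returns true when the message should go on to the next step.
sub run_chain {
  my ($plugins, $context, $stage) = @_;
  $context->{stage} = $stage;
  for my $plugin (@$plugins) {
    $plugin->process($context);
  }
  return !(defined $context->{action} && $context->{action} eq "discard");
}

# Compute a file name with its suffix replaced (nothing is renamed on disk).
sub with_suffix {
  my ($path, $old, $new) = @_;
  (my $renamed = $path) =~ s/\Q$old\E\z/$new/;
  return $renamed;
}

package demo_tagger;      # hypothetical plugin: adds a tag
sub process { my ($self, $ctx) = @_; push @{ $ctx->{tags} }, "seen"; 1 }

package demo_discarder;   # hypothetical plugin: asks to drop the message
sub process { my ($self, $ctx) = @_; $ctx->{action} = "discard"; 1 }

package main;

my $pid  = $$;   # assumption: "pid" stands for the process id
my $file = with_suffix("/tmp/mail-1234.received", ".received", ".$pid.processing");
my %context = (filename => $file);
my @chain = (bless({}, "demo_tagger"), bless({}, "demo_discarder"));
unless (run_chain(\@chain, \%context, "preprocess")) {
  $file = with_suffix($file, ".$pid.processing", ".discarded");
}
print "$file\n";
```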

6.4.3.2. Outgoing mail

When a message is ready to be sent (its status has the "Outgoing" bit and not the "Sent" bit), manitou-mdx does the following:

  1. If there are outgoing_plugins attached to the identity of the message, their process functions are called with the outgoing mail built as a perl MIME object in memory, and the unique mail identifier in the database (mail_id).

  2. Once all plugins have been run, the message will be passed from the MIME object in memory (possibly modified by the plugins) to the local mailer. Note that it is the plugin's responsibility to update the message in the database to reflect whatever change it makes in the MIME object in memory.

6.4.3.3. Mailbox import

When importing a mailbox with manitou-mdx (using the --mboxfile option), incoming_mimeprocess_plugins are called if:

  • the --mailbox option is also specified on the command line with the value for an identity.

  • the configuration has a declaration for that identity.

  • that declaration includes incoming_mimeprocess_plugins lines.

Other kinds of plugins are not called. Each mail contained in the mailbox file goes through steps 4 to 7 of the incoming mail process described above.