MHonArc: Performance Tips


Introduction

This document is a guide on how to improve the performance of MHonArc.

The first two sections of this document cover the DOs and DON'Ts. The DOs are things you can do to improve the performance of mhonarc. The DON'Ts are things you should avoid doing because they decrease the performance of mhonarc. There is no requirement that you follow all, or any, of the DOs and DON'Ts. Depending on your needs, accepting a loss in performance is sometimes required to achieve a particular goal.


DOs

Break up a large archive into a set of smaller ones

MHonArc performance degrades as an archive gets larger. Therefore, a common performance improvement practice is to use a sequence of MHonArc archives to make up the complete archive of a mailing list. The smaller archives are generally organized by time period, such as by month.

An example of this practice is provided by mharc, where archives are organized by month or by year to avoid performance problems.

Minimize page layout settings

The more resource variables you use in page layout settings, the more processing time is required to render each page. Avoid unnecessary uses of resource variables.

Use FASTTEMPFILES

Use FASTTEMPFILES. Make sure to take note of the security implications before enabling this resource.

Use FIELDORDER

Use the FIELDORDER resource to define which header fields you want to show in message pages. Avoid using the special field value "-extra-". Only essential header fields should be specified.
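
As a minimal sketch, a resource file entry along the following lines limits message pages to a handful of header fields. The particular fields listed are only an illustration; choose the ones your readers actually need.

```text
<!-- Show only these header fields; -extra- is deliberately omitted -->
<FieldOrder>
from
to
subject
date
</FieldOrder>
```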

Use MAXSIZE

Use MAXSIZE to set a limit on the size of your archive. As mentioned earlier, MHonArc performance degrades as an archive gets larger.

If you need to keep older messages around, then see: Break up a large archive into a set of smaller ones. Also see the KEEPONRMM resource.
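
For illustration, a size limit might be set in a resource file as sketched below. The value shown is an arbitrary assumption, not a recommendation; pick a limit that suits your archive.

```text
<!-- Cap the archive size; 2000 is an arbitrary illustrative value -->
<MaxSize>
2000
</MaxSize>
```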

Use MIMEINCS and/or MIMEEXCS

MIMEINCS and MIMEEXCS allow you to explicitly define which media-types you will allow in your archive. Excluding media-types helps reduce message processing overhead, and it can improve the security of your archive.
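
A minimal sketch of an exclusion list follows. The media-types named here are assumptions chosen only for illustration; which types to exclude depends on what your list traffic contains.

```text
<!-- Exclude media-types that are not wanted in the archive -->
<MIMEExcs>
application/octet-stream
image/gif
</MIMEExcs>
```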

Use QUIET

QUIET disables informational diagnostics when processing. Error and warning diagnostics will still get printed.

Set TSLICE to smallest range possible

If you do not use $TSLICE$, set the TSLICE resource to "0:0:0" to avoid unnecessary page edits when messages are added to an archive.

If you do choose to use $TSLICE$, set TSLICE to the smallest range you plan on using to minimize the amount of processing overhead.
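
For an archive whose page layouts never reference $TSLICE$, the first suggestion above can be sketched in a resource file as:

```text
<!-- $TSLICE$ is not used in any page layout, so request an empty slice -->
<TSlice>
0:0:0
</TSlice>
```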

See Also: Don't use $TSLICE$.


DON'Ts

Don't do real-time archive updates

It is tempting to update an archive as soon as a message arrives, but if mail traffic volume is high, this can cause a bottleneck, with multiple mhonarc processes queuing up while waiting to lock the archive. Even if general traffic is not high, a burst of incoming mail can cause problems.

It is recommended to update an archive on a well-defined periodic basis. It avoids lock queuing problems and minimizes the overhead of invoking the perl interpreter for each incoming message. Facilities like cron (standard on Unix-like systems) can be used to invoke mhonarc on a periodic basis.

Cron-like invocation also makes archive administration easier, since it is easier to disable archive updates when administration tasks are required. Raw, incoming messages can still be queued up until administrative tasks are complete in order to avoid message bounces.

Don't use message specifications in resource variables

Many message content-related resource variables (e.g., $SUBJECT$, $FROM$, $DATE$) can take a message specification argument. If a specification is provided that references a message that is not the current message, MHonArc must resolve the specification to reference the proper message information to expand the variable.

Some message specifications are more costly to resolve than others. The more costly ones include: TEND, TNEXTTOP, TPARENT, TPREVTOP, TTOP.
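
As a hedged illustration, the layout below expands only a variable for the current message, so no other message has to be looked up. The choice of the SUBJECTHEADER resource and the markup inside it are assumptions made for this sketch. A variant such as $SUBJECT(TTOP)$ in the same place would instead require MHonArc to resolve the top of the thread for every message page.

```text
<!-- Expands the subject of the current message only -->
<SubjectHeader>
<h1>$SUBJECTNA$</h1>
<hr>
</SubjectHeader>
```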

Don't use DEFINEDERIVED

DEFINEDERIVED allows you to define extra files to generate for each message. The MHonArc documentation shows how this resource can be used to provide frame-based navigation. Although frame-based navigation seems "cool", avoid it. It normally does not provide any effective improvement in archive navigation, and it can be a barrier in some cases, such as for non-frame-aware browsers and for people with disabilities.

Don't use FIELDSTORE

Only use FIELDSTORE if you have a real need for it.

By default, this resource is nil.

Don't use FOLREFS

Disable FOLREFS unless you find it useful. The normal next and previous thread links may be sufficient.

Also, FOLREFS is not subject-aware like the thread links are. Therefore, FOLREFS can confuse readers when no follow-ups are listed but the thread index shows follow-ups (due to same message subject).

NOTE:

If you do decide to use $TSLICE$ in message pages, then you should definitely disable FOLREFS since it would be redundant, and probably inconsistent.
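
A minimal sketch of disabling this resource in a resource file, assuming the usual NO-prefixed element form for boolean resources:

```text
<!-- Do not add follow-up and reference links to message pages -->
<NoFolRefs>
```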

Don't use MAILTO

Disabling MAILTO may provide little to negligible performance gain, but if you do not care about email address linking, then there is no need to keep this resource enabled.

Don't use certain MIMEFILTERS filter options

MIMEFILTERS is used to register filters for media-types. Many of the filters provided with MHonArc support a myriad of options to customize their behavior. However, some of these options increase processing overhead. The following highlights filter options to avoid, or consider, from a performance perspective:

Filter options to avoid: The following options will degrade performance:

Filter options to consider: The following options will improve performance:

Don't use MODIFYBODYADDRESSES

Disabling MODIFYBODYADDRESSES may not be an option if you are protecting your archive from address harvesters.

Don't use MODTIME

MODTIME, when enabled, causes the modification time of each message file to be set to the date of the message. By default, this resource is disabled.

NOTE:

If you use a search indexer on your archive, enabling MODTIME may actually improve overall performance. Some search indexers key off of a file's modification time to determine if the file needs to be re-indexed. With MODTIME enabled, if a message file is edited by MHonArc (due to new messages being added or EDITIDX), the file will not be unnecessarily re-indexed.

Also, some search indexers may key off the file modification time for purposes of date ordering in search results. If this type of functionality is desired, you will need to enable MODTIME.

Don't use MULTIPG

MULTIPG causes MHonArc indexes to be printed across multiple pages, requiring more processing work than a single page. If you follow the advice in "Break up a large archive into a set of smaller ones", MULTIPG is generally not necessary.

Don't use OTHERINDEXES

OTHERINDEXES provides the ability to generate alternate indexes. Unless the users of your archive have a real need for indexes beyond the main and thread indexes already provided, avoid using this resource.

For example, many believe an author index is useful, along with the date and thread indexes. However, this may be a subjective perception versus knowing the real reading habits of archive readers. If there are definite needs for alternate navigational services, sometimes a search engine (if you are already using one) can indirectly provide these services.

Don't use SAVERESOURCES if you specify resource settings every time

It is common practice to specify resource settings (especially RCFILE) each time mhonarc is invoked. This is generally done to make administration easier since alternate invocations are not required when an archive is first created versus when it is updated.

If you specify resource settings every time, disable SAVERESOURCES to avoid the unnecessary saving of resource settings to the database.
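
Assuming the same resource file is passed on every invocation, the setting can be sketched as a single line in that file:

```text
<!-- Resources are supplied on every run, so do not store them in the database -->
<NoSaveResources>
```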

Don't use SUBJECTTHREADS

When SUBJECTTHREADS is enabled (the default), MHonArc will examine message subjects when computing threads. This is done because it is still common for some mail composition software to not include a message-id reference when a user replies to a message.

Subject-based detection adds extra processing overhead during thread computation. If you know messages in your archive define References and/or In-Reply-To header fields for message replies, then disable SUBJECTTHREADS.
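
If the messages in your archive are known to carry proper reply references, subject-based threading could be turned off with a line like the following sketch:

```text
<!-- Thread by References/In-Reply-To only; skip subject comparisons -->
<NoSubjectThreads>
```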

Don't use $TSLICE$

The $TSLICE$ resource variable generates a slice of the thread relative to the current message. $TSLICE$ is not part of MHonArc's default resource values, but many users like to include it within message pages as an additional (useful) navigational aid.

The following is an example of what a thread slice may look like:

The "next" and "previous" thread links are already provided within message pages, which may be sufficient for your needs.

See Also: Set TSLICE to smallest range possible.

Don't use USINGLASTPG

If you have disabled MULTIPG, then you do not need to worry about USINGLASTPG. If you choose to use MULTIPG, then disable USINGLASTPG if you can.

If you need to have a link to the last page of an index listing, alternatives to using $PG(LAST)$ can be implemented. For example, under Unix, a post-processing task can create/update a fixed-named symbolic link to the last index page, with archive pages referencing the symbolic link instead of using $PG(LAST)$.
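
As a sketch of the multi-page case described above, the following settings keep MULTIPG enabled while turning off USINGLASTPG. This assumes your page layouts no longer reference $PG(LAST)$.

```text
<!-- Multi-page indexes are wanted, but no layout uses $PG(LAST)$ -->
<MultiPg>
<NoUsingLastPg>
```

If multi-page indexes are not needed at all, a single <NoMultiPg> line would take the place of both settings.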


Character Encodings

MHonArc provides robust support for dealing with a variety of character encodings. For an overview of how textual data is processed by MHonArc, see the TEXTENCODE resource.

NOTE:

With respect to email, the term character sets (or charsets for short) is used when discussing character encodings. Both terms will be used interchangeably since the technical differences between the two terms are not relevant for this document.

Some charsets may incur a greater performance cost. If your archive consists only of English messages (US-ASCII charset), then there are no performance issues. But if your archive has non-English messages, especially in Asian-based encodings, there can be noticeable performance hits.

The following are suggestions you may follow to minimize the performance impacts of charset processing:

Avoid conversion if you do not need it

If your archive will only contain messages in a single encoding, then avoid unnecessary conversion processing. The following resource settings define the absolute minimum in text processing and cause archive messages to be rendered in the default locale of the web browser:

<!-- DECODEHEADS can be used to improve resource variable
     expansion.  See DECODEHEADS resource for more information.  -->
<DecodeHeads>

<!-- Only convert HTML specials -->
<CharsetConverters override>
plain;          mhonarc::htmlize;
default;        -decode-
</CharsetConverters>

If your locale is a non-English, non-Latin-1 one, you may need to specify the locale explicitly in archive pages; the default locale of web browsers may not match the locale of the archive. For example, if your locale is Polish (ISO-8859-2), then something like the following resource settings can be used:

<!-- 
     The following resource settings are just the default settings
     for each resource but with the appropriate <meta http-equiv>
     tag added.
  -->
<DefineVar chop>
HTTP-EQUIV
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2">
</DefineVar>

<IdxPgBegin>
<html>
<head>
<title>$IDXTITLE$</title>
$HTTP-EQUIV$
</head>
<body>
<h1>$IDXTITLE$</h1>
</IdxPgBegin>

<TIdxPgBegin>
<html>
<head>
<title>$TIDXTITLE$</title>
$HTTP-EQUIV$
</head>
<body>
<h1>$TIDXTITLE$</h1>
</TIdxPgBegin>


<MsgPgBegin>
<html>
<head>
<title>$SUBJECTNA$</title>
<link rev="made" href="mailto:$FROMADDR$">
$HTTP-EQUIV$
</head>
<body>
</MsgPgBegin>

Use the latest version of Perl

Perl 5.8, and later, provides built-in support for character encodings, with UTF-8 supported internally and the Encode module providing character encoding conversion facilities. MHonArc will leverage such features if available and applicable to improve performance.

If using older versions of Perl, MHonArc still provides robust character encoding support, but performance is not as good.

Use TEXTENCODE

If your archive will contain data in multiple encodings, consider using the TEXTENCODE resource. The TEXTENCODE resource allows you to convert all message data into a single encoding, simplifying subsequent processing done by MHonArc. The most common usage of TEXTENCODE is to normalize all message data to UTF-8 (Unicode). See the utf-8-encode.mrc example resource file for how to encode all text to UTF-8.
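
The core of such a setup is sketched below. The converter function and module names are given from memory of the MHonArc distribution and should be treated as assumptions. The shipped utf-8-encode.mrc file is the authoritative version and also adjusts related resources that this fragment omits.

```text
<!-- Convert all message text to UTF-8 as messages are read -->
<TextEncode>
utf-8; MHonArc::UTF8::to_utf8; MHonArc/UTF8.pm
</TextEncode>
```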

CAUTION:

Although most modern web browsers support UTF-8, not all search engines do. If you use a search engine, or plan to use one, verify that UTF-8 is supported.


$Date: 2006/06/10 02:42:58 $
MHonArc
Copyright © 2005, Earl Hood, mhonarc@mhonarc.org