INTRODUCTION

This is the development area for OpenDKIM's statistics collection system.

In earlier versions, the statistics system gathered numerous details about
messages and their DKIM signatures in order to produce a report about
the observed implementation of DKIM on the Internet.  The data recorded
included the number of signatures, percent of signature failures, first-party
vs. third-party signatures, ADSP statistics, etc.  These data were collected
and submitted to the Internet Engineering Task Force as part of a
requirement to advance DKIM to Draft Standard status, and this work was
very successful.

As of v2.5.0, the statistics system has been repurposed to support ongoing
research and development into open DKIM-based reputation services.  The data
collected has been drastically reduced to include only information about
message arrival times, From domains, signing domains, sending IP addresses,
and whether or not the message was reported as spam.

You are invited, but certainly not required, to provide this
information to The Trusted Domain Project.  The code that does the reporting is
open source and included in this package, so you can easily verify that
nothing other than the above is being revealed in the data thus submitted.
Participants who submit data will be invited to take part in an experimental
DKIM-based reputation system that should begin to appear in the near future.

This directory contains information and software for generating, receiving
and processing such reports, including commands for creating a MySQL database
to store the reports for later query, and a program that can receive reports
generated by opendkim-stats to put such data directly into that database.


INSTALLATION

These instructions are for setting up your own reputation database and a
feed to it.  For instructions about providing a feed to an external service
only, skip to the EXTERNAL FEEDS section.

This system can be made to work with any SQL-based system, but the provided
scripts and documentation presume MySQL.

1.      Compile OpenDKIM using the "--enable-stats" option:

        % ./configure --enable-stats
        % make

        If you wish to apply local extensions to the statistics reporting
        system, see the EXTENSIONS section below.

2.      Install MySQL.  Create credentials for an "opendkim" user, optionally
        with some password.  Note that this is a MySQL account and does NOT
        refer to a "real" UNIX user and password.  Create a database called
        "opendkim".  The user you create must be granted SELECT and INSERT
        access to that database.

3.      From inside the MySQL client, source the "mkdb.mysql" script.
        This will create the tables required to store the statistics reports.

4.      Install the "opendkim-stats" program someplace.

5.      Configure OpenDKIM to have statistics enabled, and begin reporting
        them to a file someplace.  This involves the "Statistics" setting.
        See the opendkim(8) and opendkim.conf(5) man pages for details.

6.      Restart opendkim.

7a.     To get a human-readable form of the recorded statistics, use:

                opendkim-stats /path/to/stats/file

        ...using, of course, the path to the statistics file you configured
        in opendkim.conf.  You can reset the contents of that file at any
        time by simply removing it or copying /dev/null on top of it.  The
        filter will create or append to the file when the next message is
        received.

7b.     To translate records in the recorded statistics file into SQL
        insert operations, use:

                opendkim-importstats /path/to/stats/file

        ...again using the path you specified in opendkim.conf.  You will
        probably also need one or more of the following:

                -d dbname       database name (default "opendkim")
                -p dbpasswd     database user's password (no default)
                -s scheme       database scheme (default "mysql")
                -u dbuser       database user (default "opendkim")

        Append "-r" to this to remove the statistics file on completion.

        You can also use the provided opendkim-genstats script to generate
        useful reports from the accumulated data.  Contribution of other
        reports you find useful would be welcome.

8.      To participate in The Trusted Domain Project's data collection work,
        ask for the submission address you should use and then execute
        the following command:

                opendkim-reportstats -register

        This will generate a GPG signing key pair and send the public key
        to The Trusted Domain Project so your signatures can be verified.
        Upon receipt of your key, we will add it to our key ring, which
        will cause our system to begin trusting your reports.  We will notify
        you when this has happened.

        Once this is done, you can arrange to have a cron job execute this
        command:

                opendkim-reportstats -sendstats

        This will send a signed copy of the current statistics data to The
        Trusted Domain Project and add the ".old" suffix to the filename.
        The filter will automatically start a new file when the next message
        is received.

        When your data copy is received, the OpenDKIM project will use the
        aforementioned "opendkim-importstats" tool to import your data into
        the central database.  You can view regularly generated reports,
        including your data, at:

                http://www.opendkim.org/stats/report.html


EXTERNAL FEEDS

If you are interested in producing statistics only for the purpose of
exporting them to a data aggregator other than yourself, follow only
steps 1, 5, 6 and 8 above.


FILE FORMAT

The format of the file written by the opendkim filter is described here.
If demand appears, a stable API for accessing it will be provided.  Until
then, application developers are advised not to rely on this information being
stable between versions.

A statistics file consists of lines of ASCII data which are delimited from
each other by a single LF (ASCII 10).

Empty lines and lines beginning with a non-alphabetic character are
ignored completely.

A line in the file that begins with a capital letter identifies the type
of record it represents.  The first such line also implicitly ends the global
values section.

There are currently these record types:

        M       identifies a message
        S       identifies a signature
        U       updates a message

A message record is a tab-separated, ordered sequence of fields, as follows:

        MTA-provided job/envelope ID (string)
        reporter (string; defaults to hostname)
        first domain found in the From: header field (string)
        SMTP client IP address (string)
        UNIX timestamp of message receive time
        message size, in bytes
        signature count
        ATPS status (-1 = not checked, 0 = no, 1 = yes)
        spam status (-1 = not checked, 0 = no, 1 = yes)

A signature record implicitly references the preceding message record.
There may be more than one signature record per message; there could also be
none.  As above, a signature record is a tab-separated, ordered sequence of
fields, as follows:

        domain of the signature
        pass (0 = no, 1 = yes)
        failed due to "bh" mismatch (0 = no, 1 = yes)
        "l=" tag value (-1 = not present)
        error code from signature
        DNSSEC value (see DKIM_DNSSEC_* constants from dkim.h)

Fields for which no value is known or appropriate should be represented as
"-" in the file.

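As an illustration only, the following Python sketch reads a statistics file
laid out as described above.  It is not part of OpenDKIM.  It assumes that
the record type letter is the first character of the line and that the
tab-separated fields follow it immediately; the field names are hypothetical
labels chosen for this example.  Lines that do not begin with a capital
letter are skipped, and "-" is read as "no value".

```python
# Hypothetical field labels; the order follows the lists above.
MESSAGE_FIELDS = ("jobid", "reporter", "fromdomain", "ipaddr", "msgtime",
                  "size", "sigcount", "atps", "spam")
SIGNATURE_FIELDS = ("domain", "passed", "bh_mismatch", "l_value",
                    "sigerror", "dnssec")


def split_record(line):
    """Return (type_letter, fields), or None for a line that is skipped."""
    if not line or not ("A" <= line[0] <= "Z"):
        return None
    # ASSUMPTION: the type letter is the first character and the
    # tab-separated fields start right after it.
    fields = [None if f == "-" else f for f in line[1:].split("\t")]
    return line[0], fields


def parse_stats(text):
    """Group "S" records under the "M" record that precedes them."""
    messages = []
    for line in text.split("\n"):
        record = split_record(line)
        if record is None:
            continue
        kind, fields = record
        if kind == "M":
            message = dict(zip(MESSAGE_FIELDS, fields))
            message["signatures"] = []
            messages.append(message)
        elif kind == "S" and messages:
            messages[-1]["signatures"].append(
                dict(zip(SIGNATURE_FIELDS, fields)))
        # "U" records and any other types are left out of this sketch.
    return messages
```

All values are kept as strings here; a real consumer would convert the
numeric fields as needed.  Because the format is not guaranteed to be stable
between versions, treat this as a starting point rather than a reference
parser.
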

UPDATES

As of v2.5.0, a "spam" column is present in the table tracking per-message
information.  When a batch of new message information is sent, this field
is typically populated with a spam value of 0 (not spam).  An update record
can be sent later that changes the value in this column.

The fields in this case are:

        MTA-provided job/envelope ID (string)
        reporter (string; defaults to hostname)
        UNIX timestamp of message receive time (0 if not known)
        spam indicator (-1 = not tested, 0 = not spam, 1 = spam)

The first three fields must match those given for the corresponding "M"
(new message) record at some time in the past.  Where the timestamp is 0,
the latest record matching the first two is updated.  The "spam" column for
that message will be replaced with the value found in the fourth field.
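
The matching rule can be sketched in Python as follows.  This is an
in-memory illustration, not the code used by opendkim-importstats.  The
dictionary keys are hypothetical names, and "latest" is assumed here to mean
the record with the largest receive timestamp.

```python
def apply_update(messages, jobid, reporter, msgtime, spam):
    """Apply one "U" record to a list of message dicts.

    Returns the updated message, or None if nothing matched.
    """
    candidates = [m for m in messages
                  if m["jobid"] == jobid and m["reporter"] == reporter]
    if msgtime != 0:
        # A non-zero timestamp must match as well.
        candidates = [m for m in candidates if m["msgtime"] == msgtime]
    if not candidates:
        return None
    # ASSUMPTION: "latest" means the largest receive timestamp.
    target = max(candidates, key=lambda m: m["msgtime"])
    target["spam"] = spam
    return target
```

For example, with two stored messages sharing a job ID and reporter, an
update carrying timestamp 0 changes only the more recent one, while an
update carrying the older timestamp changes only that record.
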


EXTENSIONS

For the purpose of allowing local experimentation, it is possible to
extend the statistics reporting to include data about a message not covered by
the basic schema distributed with OpenDKIM.

Extension data are collected and stored during execution of the Lua "final"
script and then written out to the stats file associated with that message.
These items are written on "X" lines in the form "name value" (i.e. name and
value separated by whitespace).  opendkim-importstats will insert these into
your database using "name" as the name of the column to be updated and "value"
as the value to be placed there.
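
Since "name" ends up as a column name, a consumer of these lines may want to
check it before using it.  The Python sketch below shows one way to split
and validate an "X" line.  It assumes the "X" is the first character of the
line, and the rule for acceptable names is a choice made for this example,
not something OpenDKIM specifies.

```python
import re

# Example rule only: letters, digits and underscores, not starting
# with a digit.
_COLUMN_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def parse_extension(line):
    """Split an "X" line into a (name, value) pair of strings."""
    # ASSUMPTION: the line starts with "X", followed by the name,
    # whitespace, and then the value.
    if not line.startswith("X"):
        raise ValueError("not an extension line")
    parts = line[1:].split(None, 1)
    if len(parts) != 2:
        raise ValueError("expected a name and a value")
    name, value = parts[0], parts[1].rstrip()
    if not _COLUMN_NAME.match(name):
        raise ValueError("unsuitable column name: %r" % name)
    return name, value
```
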

Choosing the "--enable-statsext" flag at build configuration time adds support
for this.  The mechanism for recording these extra statistics is the
odkim.stats() function, detailed in the opendkim-lua(3) man page.

In this way, one can create any supplementary message-specific columns as
is desirable and populate them whatever way is appropriate for each.

Participants who find that particular additional columns produce interesting
correlations are encouraged to share them with the OpenDKIM community.


CONVERTING TO THE IPADDRS SCHEMA

In version 2.3.0 of OpenDKIM, the schema changed to add a new "ipaddrs"
table.  The client IP address, previously stored in each row of the
"messages" table (causing duplication), moved to that table.  The
opendkim-importstats tool in 2.3.0 expects the new schema and is not
backward compatible with the schema used by prior releases.

To convert your existing databases, use these steps:

1) Run the stats/mkdb.mysql script from inside your MySQL client to create
   the required new table.  Existing tables will not be altered.

2) Run the following additional MySQL commands (lines are wrapped here for
   readability but each represents a single command):

        a) LOCK TABLES messages WRITE, ipaddrs WRITE;
        b) ALTER TABLE messages ADD COLUMN ip INT UNSIGNED AFTER ipaddr;
        c) INSERT INTO ipaddrs (addr, firstseen) 
                SELECT DISTINCT ipaddr, MIN(msgtime) FROM messages
                GROUP BY ipaddr;
        d) UPDATE messages
                SET ip = (SELECT id FROM ipaddrs WHERE addr = messages.ipaddr);
        e) ALTER TABLE messages MODIFY COLUMN ip INT UNSIGNED NOT NULL,
                DROP COLUMN ipaddr,
                ADD CONSTRAINT FOREIGN KEY(ip) REFERENCES ipaddrs(id)
                ON DELETE CASCADE;
        f) UNLOCK TABLES;

   The various ALTER TABLE commands may require a rebuild of the "messages"
   table.  This can take a large amount of time if the table is large.
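
To make the intent of commands (c) through (e) concrete, the Python sketch
below performs the same reshaping on an in-memory list of rows.  It is an
illustration only and does not touch a database.  The sequential ID
assignment is an assumption made for this example; the column names are the
ones used in the commands above.

```python
def normalize_ipaddrs(messages):
    """Move "ipaddr" out of each message row into a separate table.

    Returns (ipaddrs, converted): ipaddrs maps each address to its
    "id" and "firstseen", and converted holds the message rows with
    "ipaddr" replaced by an "ip" reference.
    """
    ipaddrs = {}
    for row in messages:
        entry = ipaddrs.get(row["ipaddr"])
        if entry is None:
            # ASSUMPTION: IDs are handed out sequentially from 1.
            ipaddrs[row["ipaddr"]] = {"id": len(ipaddrs) + 1,
                                      "firstseen": row["msgtime"]}
        else:
            # Same idea as MIN(msgtime) ... GROUP BY ipaddr in (c).
            entry["firstseen"] = min(entry["firstseen"], row["msgtime"])

    converted = []
    for row in messages:
        new_row = {k: v for k, v in row.items() if k != "ipaddr"}
        new_row["ip"] = ipaddrs[row["ipaddr"]]["id"]  # as in (d)
        converted.append(new_row)
    return ipaddrs, converted
```
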
