User Tools

Site Tools


sql_analysis

This is an old revision of the document!


Presence and count of specific mail headers

This query extracts, for each of the header fields from a fixed list, the number of occurrences of all distinct values of the field.

SELECT
 split_part(substr(LINES, 1+POSITION(E'\n'||FIELD IN LINES), 200), E'\n', 1) AS ct,
 COUNT(*)
FROM header, (VALUES ('X-Priority'), ('Importance'), ('Precedence'), ('Priority'), ('X-MSMail-Priority'), ('X-MS-Priority')) AS h(FIELD)
WHERE POSITION(E'\n'||FIELD IN LINES)>0
GROUP BY split_part(substr(LINES, 1+POSITION(E'\n'||FIELD IN LINES), 200), E'\n', 1)

Sample output:

            ct             | count 
---------------------------+-------
 Importance: high          |    32
 Importance: High          |   130
 Importance: low           |     1
 Importance: Medium        |     3
 Importance: normal        |    44
 Importance: Normal        |   802
 Precedence: bulk          | 19987
 Precedence: bulk          |     1
 Precedence: fm-user       |     3
 Precedence: junk          |     3
 Precedence: list          |  4026
 Priority: non-urgent      |    11
 Priority: Non-Urgent      |     1
 Priority: normal          |   464
 Priority: urgent          |   603
 Priority: Urgent          |    12
 X-MSMail-Priority: High   |   121
 X-MSMail-Priority: Low    |     1
 X-MSMail-Priority: Lowest |    11
 X-MSMail-Priority: Medium |     3
 X-MSMail-Priority: Middle |     8
 X-MSMail-Priority: Normal |   923
 X-Priority: 0             |     1
 X-Priority: 1             |   102
 X-Priority: 1 (High)      |     2
 X-Priority: 1 (Highest)   |   120
 X-Priority: 2             |     6
 X-Priority: 2 (High)      |     1
 X-Priority: 3             |  2892
 X-Priority: 3 (Normal)    |   635
 X-Priority: 5             |    14
 X-Priority: Normal        |     5

Duplicate messages

This query finds each message that share the exact same headers than another message with a lower mail_id, which means that it's a duplicate.

SELECT h1.mail_id FROM header h1, header h2 WHERE h1.lines=h2.lines AND h1.mail_id > h2.mail_id

Messages with specific attachment types

To retrieve all messages containing pdf files or any image file:

SELECT DISTINCT mail_id FROM attachments WHERE content_type='application/pdf' OR content_type LIKE 'image/%';
sql_analysis.1322655072.txt.gz · Last modified: 2011/11/30 12:11 by daniel