User Tools

Site Tools


sql_analysis

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
Next revisionBoth sides next revision
sql_analysis [2011/11/30 12:11] danielsql_analysis [2016/07/21 04:00] – [Presence and count of specific mail headers] daniel
Line 8: Line 8:
 FROM header, (values ('X-Priority'), ('Importance'), ('Precedence'), ('Priority'), ('X-MSMail-Priority'), ('X-MS-Priority')) as h(field) FROM header, (values ('X-Priority'), ('Importance'), ('Precedence'), ('Priority'), ('X-MSMail-Priority'), ('X-MS-Priority')) as h(field)
 WHERE position(E'\n'||field IN lines)>0 WHERE position(E'\n'||field IN lines)>0
-group by split_part(substr(lines, 1+position(E'\n'||field in lines), 200), E'\n', 1)+group by 1
 </code> </code>
  
Line 48: Line 48:
  X-Priority: Normal        |     5  X-Priority: Normal        |     5
 </code> </code>
 +
 +This alternative implementation uses regular expressions and differs in that it doesn't limit header values to 200 characters or any other fixed length.
 +<code sql>
 +select FIELD||':'||arr[1], count(*)
 + from (select FIELD, regexp_matches(lines, '(?:^|\n)' || FIELD || ':\s*([^\n]*)\n', 'g') as arr
 +       from header, 
 +         (VALUES ('X-Priority'), ('Importance'), ('Precedence'), ('Priority'),
 +             ('X-MSMail-Priority'), ('X-MS-Priority')) AS h(FIELD)
 +             where strpos(lines,FIELD)>0) l
 +GROUP BY 1 ORDER BY 1
 +</code>
 +
 +In this version, the '' strpos(lines,FIELD)>0 '' condition is not essential: it's introduced only as a first-pass filter to eliminate the headers that don't contain anywhere any of the searched fields.
 ====== Duplicate messages ====== ====== Duplicate messages ======
 This query finds each message that share the exact same headers than another message with a lower mail_id, which means that it's a duplicate. This query finds each message that share the exact same headers than another message with a lower mail_id, which means that it's a duplicate.
Line 59: Line 72:
 select distinct mail_id FROM attachments WHERE content_type='application/pdf' OR content_type like 'image/%'; select distinct mail_id FROM attachments WHERE content_type='application/pdf' OR content_type like 'image/%';
 </code> </code>
 +
 +====== Messages sent or received today  ======
 +<code>
 +select mail_id from mail where msg_date>=date_trunc('day',now());
 +</code>
 +
  
sql_analysis.txt · Last modified: 2019/01/24 17:55 by daniel