User Tools

Site Tools


sql_analysis

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revisionBoth sides next revision
sql_analysis [2012/01/28 02:55] danielsql_analysis [2016/07/21 04:00] – [Presence and count of specific mail headers] daniel
Line 8: Line 8:
 FROM header, (values ('X-Priority'), ('Importance'), ('Precedence'), ('Priority'), ('X-MSMail-Priority'), ('X-MS-Priority')) as h(field) FROM header, (values ('X-Priority'), ('Importance'), ('Precedence'), ('Priority'), ('X-MSMail-Priority'), ('X-MS-Priority')) as h(field)
 WHERE position(E'\n'||field IN lines)>0 WHERE position(E'\n'||field IN lines)>0
-group by split_part(substr(lines, 1+position(E'\n'||field in lines), 200), E'\n', 1)+group by 1
 </code> </code>
  
Line 48: Line 48:
  X-Priority: Normal        |     5  X-Priority: Normal        |     5
 </code> </code>
 +
 +This alternative implementation uses regular expressions and differs in that it doesn't limit header values to 200 characters or any other fixed length.
 +<code sql>
 +select FIELD||':'||arr[1], count(*)
 + from (select FIELD, regexp_matches(lines, '(?:^|\n)' || FIELD || ':\s*([^\n]*)\n', 'g') as arr
 +       from header, 
 +         (VALUES ('X-Priority'), ('Importance'), ('Precedence'), ('Priority'),
 +             ('X-MSMail-Priority'), ('X-MS-Priority')) AS h(FIELD)
 +             where strpos(lines,FIELD)>0) l
 +GROUP BY 1 ORDER BY 1
 +</code>
 +
 +In this version, the '' strpos(lines,FIELD)>0 '' condition is not essential: it's introduced only as a first-pass filter to eliminate the headers that don't contain anywhere any of the searched fields.
 ====== Duplicate messages ====== ====== Duplicate messages ======
 This query finds each message that share the exact same headers than another message with a lower mail_id, which means that it's a duplicate. This query finds each message that share the exact same headers than another message with a lower mail_id, which means that it's a duplicate.
sql_analysis.txt · Last modified: 2019/01/24 17:55 by daniel