This shows you the differences between two versions of the page.
Both sides previous revision Previous revision | Next revision Both sides next revision | ||
sql_analysis [2016/07/21 06:00] daniel [Presence and count of specific mail headers] |
sql_analysis [2016/10/04 18:17] daniel [Duplicate messages] |
||
---|---|---|---|
Line 67: | Line 67: | ||
</code> | </code> | ||
+ | A stricter version, comparing the md5 hashes of bodies (text and html parts) in addition to the headers: | ||
+ | <code sql> | ||
+ | select b2.mail_id from body b1, body b2, header h1, header h2 | ||
+ | where b1.mail_id < b2.mail_id | ||
+ | and h1.mail_id = b1.mail_id | ||
+ | and h2.mail_id = b2.mail_id | ||
+ | and md5(h1.lines) = md5(h2.lines) | ||
+ | and md5(b1.bodytext) is not distinct from md5(b2.bodytext) | ||
+ | and md5(b1.bodyhtml) is not distinct from md5(b2.bodyhtml); | ||
+ | </code> | ||
+ | |||
+ | The ''IS NOT DISTINCT'' comparator behaves as expected when ''bodytext'' or ''bodyhtml'' is NULL, as opposed | ||
+ | to the simple equality operator, for which ''NULL=NULL'' is false | ||
====== Messages with specific attachment types ====== | ====== Messages with specific attachment types ====== | ||
To retrieve all messages containing pdf files or any image file: | To retrieve all messages containing pdf files or any image file: |