  1. #!/bin/cat
  2. $Id: FAQ.Duplicates.txt,v 1.25 2021/10/06 20:21:13 gilles Exp gilles $
  3. This documentation is also available online at
  4. https://imapsync.lamiral.info/FAQ.d/
  5. https://imapsync.lamiral.info/FAQ.d/FAQ.Duplicates.txt
  6. =======================================================================
  7. Imapsync tips about duplicated messages issues.
  8. =======================================================================
  9. Questions answered in this FAQ are:
  10. Q. Without imapsync, I made several copies that partially failed and it
  11. ended with many duplicate or triplicate messages, or more. Can I clean
  12. up the account with imapsync and how?
  13. Q. How does imapsync identify messages and duplicates?
  14. Q. How can I know if imapsync will generate duplicates on a second run?
  15. Q: I found multiple copies, duplicates, when I run imapsync twice or
  16. more. What the hell is happening?
  17. Q. imapsync calculates 479 messages in a folder but only transfers 400
  18. messages. What is happening?
  19. Q. imapsync doesn't synchronize duplicates by default but I want to.
  20. How can I synchronize duplicates?
  21. Q. How can I remove duplicates in a lone account?
  22. Now the questions again with their answers.
  23. =======================================================================
  24. Q. Without imapsync, I made several copies that partially failed and it
  25. ended with many duplicate or triplicate messages, or more. Can I clean
  26. up the account with imapsync and how?
  27. R. Yes.
  28. See the Q/R "How can I remove duplicates in a lone account?" below.
  29. =======================================================================
  30. Q. How does imapsync identify messages and duplicates?
  31. R. By default, imapsync identifies messages by their "Message-Id"
  32. and "Received" headers. Usually, for a given message, "Message-Id"
  33. appears once while multiple "Received" headers are common.
  34. For imapsync, messages with the same "Message-Id" and "Received" headers
  35. are considered identical, i.e., duplicates.
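The identification rule above can be pictured with a small shell sketch.
It is not imapsync's actual code: the function name msg_key and the use
of cksum as a digest are assumptions made for illustration only. It keeps
the "Message-Id" and "Received" header lines of a message read on standard
input and reduces them to one key, so two messages giving the same key
would be treated as duplicates.

```shell
# msg_key: hypothetical helper, not part of imapsync.
# Reads one raw message on stdin and prints a key built only from
# its "Message-Id" and "Received" header lines.
msg_key() {
  awk '
    /^$/ { exit }                        # blank line: end of the header part
    /^[ \t]/ { if (keep) print; next }   # folded line: continues the previous header
    { keep = (tolower($0) ~ /^(message-id|received):/) }
    keep { print }
  ' | cksum | cut -d " " -f 1
}
```

Two copies of a message that differ only in "Subject" or body give the
same key, while a changed "Received" line gives a different key, which is
the situation that leads to duplicates.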
  36. =======================================================================
  37. Q. How can I know if imapsync will generate duplicates on a second run?
  38. R. To see if imapsync will generate duplicates on a second run, start
  39. a second run with the --dry option added. With --dry, imapsync
  40. shows whether it would mistakenly copy messages again, without
  41. really copying them:
  42. imapsync ... --dry
  43. The final stats should also show a positive value for the line
  44. "Messages skipped:" since most of the skipped messages are skipped
  45. because they are already on host2. Example of final stats:
  46. ++++ Statistics
  47. Transfer started on : Thu Aug 31 04:28:32 2017
  48. Transfer ended on : Thu Aug 31 04:28:44 2017
  49. Transfer time : 11.7 sec
  50. Folders synced : 1/1 synced
  51. Messages transferred : 0
  52. Messages skipped : 1555
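Reading the "Messages skipped" value can be automated. The sketch below
is a minimal example, assuming the imapsync output is piped in or saved
to a file and that the stats line looks like the one shown above; the
function name skipped_count is hypothetical.

```shell
# skipped_count: hypothetical helper, not part of imapsync.
# Reads imapsync output on stdin and prints the value of the last
# "Messages skipped" line, or nothing if there is no such line.
skipped_count() {
  sed -n 's/^.*Messages skipped[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*$/\1/p' | tail -n 1
}
```

A possible use is "imapsync ... --dry | skipped_count": a value close to
the number of messages already on host2 suggests that a second run would
not generate duplicates.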
  53. =======================================================================
  54. Q: I found multiple copies, duplicates, when I run imapsync twice or
  55. more. What the hell is happening?
  56. R0. First, some explanations to understand the issue.
  57. Normally and by default, imapsync doesn't generate duplicates.
  58. So, if it does generate duplicates, it means a problem occurred
  59. with message identification. This sometimes happens with IMAP
  60. servers that change the "Message-Id" header line or one or more
  61. of the "Received:" header lines in the header part of messages.
  62. By default, imapsync uses the "Message-Id" header line and the
  63. "Received:" header lines to identify messages on both sides.
  64. R1. This solution is R3 simplified.
  65. A quick practical solution, which works most of the time, is to
  66. change the way imapsync identifies messages. But since you're
  67. reading this because you encountered a duplicates issue,
  68. let's check this solution in a safe way.
  69. First use the same command with additional options:
  70. imapsync ... --useheader "Message-Id" --dry
  71. The previous command changes nothing but it will show you
  72. whether imapsync handles duplicates in a better way.
  73. The criterion is to look at the end of the sync for a line
  74. like this one:
  75. Messages skipped : 1555
  76. where 1555 is an example; the value should be close to the number
  77. of all messages already transferred.
  78. If you end with:
  79. Messages skipped : 0
  80. then don't go on, it means imapsync is still failing to
  81. identify messages.
  82. If you end with many messages skipped then it's very
  83. good and now you can safely resync the mailbox
  84. and get rid of the duplicate messages on host2 with:
  85. imapsync ... --useheader "Message-Id" --delete2duplicates
  86. End of the problem!
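The two steps of R1 can be chained so that the deleting run only happens
when the --dry run looks good. This is a sketch under assumptions, not an
official procedure: the function name dedupe_if_safe is hypothetical, its
arguments are your own base command line (imapsync plus hosts, users and
credentials), and it relies on the "Messages skipped" line having the
form shown above.

```shell
# dedupe_if_safe: hypothetical wrapper, not part of imapsync.
# "$@" is the base command line, e.g. imapsync with hosts and credentials.
dedupe_if_safe() {
  # Step 1: dry run with the alternate identification header.
  dry_log=$("$@" --useheader "Message-Id" --dry)
  skipped=$(printf '%s\n' "$dry_log" |
    awk -F: '/Messages skipped/ { gsub(/[^0-9]/, "", $2); n = $2 } END { print n + 0 }')
  # Step 2: only resync and delete duplicates if messages were recognized.
  if [ "$skipped" -gt 0 ]; then
    "$@" --useheader "Message-Id" --delete2duplicates
  else
    echo "Messages skipped is 0: identification still fails, nothing deleted" >&2
    return 1
  fi
}
```

The test "greater than 0" is deliberately crude. Comparing the value by
eye with the number of messages already transferred, as described above,
remains the safer check before letting anything delete messages on host2.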
  87. R2.
  88. A second solution is to use option --useuid.
  89. With option --useuid, imapsync doesn't use header lines
  90. to identify and compare messages in folders.
  91. Instead of some headers, --useuid tells imapsync to use
  92. the IMAP UIDs given by the IMAP servers on both sides.
  93. To avoid duplicates on next runs, imapsync uses a local cache
  94. where it keeps UIDs already transferred.
  95. imapsync ... --useuid
  96. There is an issue when --useuid is not used from the first run.
  97. Option --useuid doesn't generate duplicates if it is used from
  98. the first run, but it does generate duplicates after a previous
  99. run without --useuid (because it then uses a different method to identify
  100. the messages).
  101. A solution? Two solutions.
  102. The easiest is --delete2 if you are permitted to use it.
  103. Option --delete2 removes messages on host2
  104. that are not on host1. So, with --delete2 you go for resyncing all
  105. messages again. All previously transferred messages are deleted,
  106. and so are messages that were already on host2 without imapsync.
  107. So --useuid --delete2 is an easy way to remove duplicates but it
  108. is not suitable in all contexts. It suits the case where the host2
  109. account is considered a strict replica of the host1
  110. account, i.e., host2 is not active yet.
  111. A second solution, better if R3 works (see R3 below), is to build
  112. the cache before using --useuid:
  113. First sync:
  114. imapsync ... --useheader "Message-Id" --addheader --usecache
  115. Next syncs:
  116. imapsync ... --useuid
  117. imapsync ... --useuid
  118. ...
  119. R3.
  120. Best way if you can follow it.
  121. The symptom is multiple copies of the emails on the destination server.
  122. Some IMAP servers (Domino for example) change some headers for each
  123. message transferred. All messages are transferred again and again each
  124. time you run imapsync. This is bad of course. The explanation is that
  125. imapsync considers the messages different on each side because the
  126. default headers used to identify them have changed.
  127. You can look at the headers found by imapsync by using the --debug
  128. option (and search for the message on both sides). Header lines from
  129. the source server begin with a "FH:" prefix. Header lines from the
  130. destination server begin with a "TH:" prefix. Since --debug is very
  131. verbose, I suggest isolating an email in a specific folder in case you
  132. want to forward me the output.
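To spot which header lines differ between the two sides, the "FH:" and
"TH:" lines of a --debug log can be compared. The sketch below is only an
illustration: the function name header_diff is hypothetical, and it
assumes the log was saved to a file (or piped in) and that the prefixes
sit at the very start of each line, as described above.

```shell
# header_diff: hypothetical helper, not part of imapsync.
# Reads a --debug log (file arguments or stdin) and prints the header
# lines seen on one side only, sorted.
header_diff() {
  awk '
    /^FH:/ { sub(/^FH:[ \t]*/, ""); src[$0] = 1; next }   # source (host1) headers
    /^TH:/ { sub(/^TH:[ \t]*/, ""); dst[$0] = 1; next }   # destination (host2) headers
    END {
      for (h in src) if (!(h in dst)) print "only on host1: " h
      for (h in dst) if (!(h in src)) print "only on host2: " h
    }
  ' "$@" | sort
}
```

Headers reported on one side only are the ones the server changed, so
they are poor candidates for --useheader. This works best on a folder
holding a single message, as suggested above.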
  133. A way to avoid this problem is by using option --useheader with
  134. a different set than the default ones used by imapsync.
  135. The default set is equivalent to:
  136. imapsync ... --useheader "Message-Id" --useheader "Received"
  137. The question now is what can be used instead of the Message-Id
  138. and Received lines. Often Message-Id alone works:
  139. imapsync ... --useheader "Message-Id"
  140. Once imapsync does not generate duplicates, the previous duplicates
  141. can be deleted with option --delete2duplicates
  142. imapsync ... --useheader "Message-Id" --delete2duplicates
  143. Another good way toward a solution is to isolate two or three messages
  144. in a BUG folder and send me the --debug output by email to
  145. gilles@lamiral.info
  146. imapsync ... --debug --folder BUG
  147. I will take a close look at the log and modify imapsync to fix
  148. this faulty duplicate behavior.
  149. Remark. (Trick found by Tomasz Kaczmarski)
  150. Option --useheader "Message-Id" asks the server to send only header
  151. lines beginning with "Message-Id". Some (buggy) servers send the whole
  152. header (all lines) instead of the "Message-Id" line. In that case, a
  153. trick to keep the --useheader filtering behavior is to use
  154. --skipheader with a negative lookahead pattern:
  155. imapsync ... --skipheader "^(?!Message-Id)"
  156. Read it as "skip every header except Message-Id".
  157. =======================================================================
  158. Q. imapsync calculates 479 messages in a folder but only transfers 400
  159. messages. What is happening?
  160. R1. Unless --useuid is used, imapsync uses part of the header
  161. of a message to identify that message on both sides.
  162. By default the header part used is the lines "Message-Id:", "Message-ID:"
  163. and "Received:", or specific lines depending on --useheader and
  164. --skipheader. The whole header can be selected with --useheader ALL.
  165. Consequences:
  166. 1) Duplicate messages on host1 (identical header) are not transferred.
  167. The result is that you can have more messages on host1 than on host2.
  168. R2. With option --useuid, imapsync doesn't use headers to identify
  169. messages on both sides; it uses their IMAP UIDs instead.
  170. In that case duplicates on host1 are also transferred to host2.
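The gap between the message count and the transferred count can be
reproduced with a toy calculation. The sketch below assumes a
hypothetical input with one identification key per line, one line per
host1 message (a key stands for the identifying headers of a message);
the function name summarize_keys is made up for this example.

```shell
# summarize_keys: hypothetical helper, not part of imapsync.
# Reads one identification key per line and reports how many messages
# share a key with an earlier message.
summarize_keys() {
  awk '{ total++; if (!seen[$0]++) unique++ }
       END { printf "messages=%d unique=%d duplicates=%d\n", total, unique, total - unique }'
}
```

For five messages where three share the same key, this prints
"messages=5 unique=3 duplicates=2". In header mode only the "unique"
ones end up on host2, which is why host1 can hold more messages.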
  171. =======================================================================
  172. Q. imapsync doesn't synchronize duplicates by default but I want to.
  173. How can I synchronize duplicates?
  174. R1. Use the option --syncduplicates
  175. R2. Use the option --useuid
  176. If you have already synchronized two mailboxes without --useuid then
  177. using it right away will generate duplicates on host2. To avoid that
  178. behavior, you have to perform a first run with --usecache to build
  179. the local UID cache. Then do the next runs with --useuid.
  180. There are potential issues with --usecache. They can be solved.
  181. Read the document FAQ.Use_cache.txt
  182. https://imapsync.lamiral.info/FAQ.d/FAQ.Use_cache.txt
  183. So, to sum up how to synchronize duplicates:
  184. imapsync ... --tmpdir . --usecache
  185. imapsync ... --tmpdir . --useuid
  186. imapsync ... --tmpdir . --useuid
  187. ...
  188. Here --tmpdir value is the dot "." meaning "current directory".
  189. Surrounding it with double-quotes is optional.
  190. If the two mailboxes haven't been synchronized yet then the
  191. first run with --usecache is unnecessary.
  192. =======================================================================
  193. Q. How can I remove duplicates in a lone account?
  194. R. In order to remove duplicates in a lone account, just run imapsync
  195. with the same account as source and destination, plus the
  196. option --delete2duplicates, i.e., with
  197. host1 == host2, user1 == user2, password1 == password2
  198. imapsync ... --delete2duplicates
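Spelled out, the lone-account command could look like the sketch below.
The function name dedupe_lone_account, the IMAPSYNC variable and the
sample host, user and password-file values are assumptions made for
illustration. The options themselves (--host1, --user1, --passfile1 and
their host2 counterparts) are regular imapsync options; --passfile1 and
--passfile2 are used here in place of --password1 and --password2.

```shell
# dedupe_lone_account: hypothetical wrapper, not part of imapsync.
# Usage: dedupe_lone_account HOST USER PASSFILE
# IMAPSYNC may be set to another command; it defaults to "imapsync".
dedupe_lone_account() {
  host=$1 user=$2 passfile=$3
  # Same account on both sides: host1 == host2, user1 == user2.
  "${IMAPSYNC:-imapsync}" \
    --host1 "$host" --user1 "$user" --passfile1 "$passfile" \
    --host2 "$host" --user2 "$user" --passfile2 "$passfile" \
    --delete2duplicates
}
```

Setting IMAPSYNC=echo prints the command line without running it, which
is a cheap way to review the options first. Adding --dry to the real
command should likewise show what would happen without deleting
anything.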
  199. =======================================================================
  200. =======================================================================