123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279 |
- #!/bin/cat
- $Id: FAQ.Duplicates.txt,v 1.25 2021/10/06 20:21:13 gilles Exp gilles $
- This documentation is also available online at
- https://imapsync.lamiral.info/FAQ.d/
- https://imapsync.lamiral.info/FAQ.d/FAQ.Duplicates.txt
- =======================================================================
- Imapsync tips about duplicated messages issues.
- =======================================================================
- Questions answered in this FAQ are:
- Q. Without imapsync, I made several copies that partially failed and it
- ended with many duplicates/triplicates messages or more. Can I clean
- up the account with imapsync and how?
- Q. How does imapsync identify messages and duplicates?
- Q. How can I know if imapsync will generate duplicates on a second run?
- Q: I found multiple copies, duplicates, when I run imapsync twice or
- more. What the hell is happening?
-
- Q. imapsync calculates 479 messages in a folder but only transfers 400
- messages. What is happening?
- Q. imapsync doesn't synchronize duplicates by default but I want to.
- How can I synchronize duplicates?
- Q. How can I remove duplicates in a lone account?
- Now the questions again with their answers.
- =======================================================================
- Q. Without imapsync, I made several copies that partially failed and it
- ended with many duplicates/triplicates messages or more. Can I clean
- up the account with imapsync and how?
- R. Yes.
- See the Q/R "How can I remove duplicates in a lone account?" below.
- =======================================================================
- Q. How does imapsync identify messages and duplicates?
- R. Imapsync by default identify messages by their headers "Message-Id"
- and "Received". Usually, for a given message, "Message-Id" appears one
- time while multiple "Received" headers are common.
-
- For imapsync, messages with the same "Message-Id" and "Received" headers
- are consider identical, ie, duplicates.
- =======================================================================
- Q. How can I know if imapsync will generate duplicates on a second run?
- R. To see if imapsync will generate duplicates on a second run, start
- a second run with --dry option added. With --dry, imapsync will
- show whether it would mistakenly copy messages again, but without
- really copying them:
- imapsync ... --dry
- The final stats should also show a positive value for the line
- "Messages skipped:" since most of the skipped messages are skipped
- because they are already on host2. Example of final stats:
-
- ++++ Statistics
- Transfer started on : Thu Aug 31 04:28:32 2017
- Transfer ended on : Thu Aug 31 04:28:44 2017
- Transfer time : 11.7 sec
- Folders synced : 1/1 synced
- Messages transferred : 0
- Messages skipped : 1555
- =======================================================================
- Q: I found multiple copies, duplicates, when I run imapsync twice or
- more. What the hell is happening?
- R0. First, some explanations to understand the issue.
- Normally and by default, imapsync doesn't generate duplicates.
- So, if it does generate duplicates it means a problem occurs
- with message identification. It happens sometimes with IMAP
- servers changing the "Message-Id" header line or one or more
- of the "Received:" header lines in the header part of messages.
- By default, Imapsync uses "Message-Id" header line and
- "Received:" header lines to identify messages on both sides.
- R1. This solution is R3 simplified.
- A quick practical solution is to change the way imapsync
- identify messages that works most of the time. But since
- you're reading this because you encountered duplicates issue,
- let's check this solution in a safe way.
- First use the same commmand with additionnal options:
- imapsync ... --useheader "Message-Id" --dry
- The previous command does nothing real but it will show you
- if imapsync handles duplicates in a better way.
- The criterium is to search at the end of the sync for a line
- like this one:
- Messages skipped : 1555
- where 1555 is an example but reflects mostly the number
- of all messages already transferred.
- If you end with:
- Messages skipped : 0
- then don't go on, it means imapsync is still suffering to
- identify messages.
- If you end with many messages skipped then it's very
- good and now you can safely resync the mailbox
- and get rid of the dupplicates messages on host2 with:
- imapsync ... --useheader "Message-Id" --delete2duplicates
- End of the problem!
- R2.
- A second solution is to use option --useuid.
- With option --useuid, imapsync doesn't use header lines
- to identify and compare messages in folders.
- Instead of some headers, --useuid tell imapsync to use
- the imap UIDs given by imap servers on both sides.
- To avoid duplicates on next runs, imapsync uses a local cache
- where it keeps UIDs already transferred.
- imapsync ... --useuid
- There is an issue when --useuid is not used the first time.
- A big issue with --useuid is that it doesn't generate duplicates if
- used from the first time but it does generate duplicates after a previous
- run without --useuid (because it then uses a different method to identify
- the messages).
- A solution? Two solutions.
- The easiest is --delete2 if you are permitted to use it.
- Option --delete2 removes messages on host2
- that are not on host1. So, with --delete2 you go for resyncing all
- messages again. All previously transferred messages are deleted,
- but also messages previously there without imapsync.
- So --useuid --delete2 is an easy way to remove duplicates but it
- is not suitable in all contexts. The good context is that the host2
- account must be considered as a strict replication of the host1
- account, ie, host2 not active yet.
- A second solution, better if R3 works (see R3 below), is to build
- the cache before using --useuid
- First sync:
- imapsync ... --useheader "Message-Id" --addheader --usecache
- Next syncs:
- imapsync ... --useuid
- imapsync ... --useuid
- ...
- R3.
- Best way if you can follow it.
- Multiple copies of the emails on the destination server. Some IMAP
- servers (Domino for example) change some headers for each message
- transferred. All messages are transferred again and again each time you
- run imapsync. This is bad of course. The explanation is that imapsync
- considers messages are not the same on each side, default headers used
- to identify the messages have changed.
- You can look at the headers found by imapsync by using the --debug
- option (and search for the message on both part), Header lines from
- the source server begin with a "FH:" prefix, Header lines from the
- destination server begin with a "TH:" prefix. Since --debug is very
- verbose I suggest to isolate a email in a specific folder in case you
- want to forward me the output.
- A way to avoid this problem is by using option --useheader with
- a different set than the default ones used by imapsync.
- The default set is equivalent to:
- imapsync ... --useheader "Message-Id" --useheader "Received"
- The problem now is that what can be used instead of Message-Id
- and Received lines? Often standalone Message-Id works:
- imapsync ... --useheader "Message-Id"
- Once imapsync does not generate duplicates, the previous duplicates
- can be deleted with option --delete2duplicates
- imapsync ... --useheader "Message-Id" --delete2duplicates
- Another good way toward a solution is to isolate two or three messages
- in a BUG folder and send me the --debug output by email to
- gilles@lamiral.info
- imapsync ... --debug --folder BUG
- I will take a close look at the log and modify imapsync to fix
- this faulty duplicate behavior.
- Remark. (Trick found by Tomasz Kaczmarski)
- Option --useheader "Message-Id" asks the server to send only header
- lines beginning with "Message-Id". Some (buggy) servers send the whole
- header (all lines) instead of the "Message-Id" line. In that case, a
- trick to keep the --useheader filtering behavior is to use
- --skipheader with a negative lookahead pattern:
- imapsync ... --skipheader "^(?!Message-Id)"
- Read it as "skip every header except Message-Id".
- =======================================================================
- Q. imapsync calculates 479 messages in a folder but only transfers 400
- messages. What is happening?
- R1. Unless --useuid is used, imapsync considers a header part
- of a message to identify a message on both sides.
- By default the header part used is lines "Message-Id:" "Message-ID:"
- and "Received:" or specific lines depending on --useheader
- --skipheader. Whole header can be set by --useheader ALL
- Consequences:
- 1) Duplicate messages on host1 (identical header) are not transferred.
- The result is that you can have more messages on host1 than on host2.
- R2. With option --useuid imapsync doesn't use headers to identify
- messages on both sides but it uses their imap uid identifier.
- In that case duplicates on host1 are also transferred on host2.
- =======================================================================
- Q. imapsync doesn't synchronize duplicates by default but I want to.
- How can I synchronize duplicates?
- R1. Use the option --syncduplicates
- R2. Use the option --useuid
- If you have already synchronized two mailboxes without --useuid then
- using it right away will generate duplicates on host2. To avoid that
- behavior, you have to perform a first run with --usecache to build
- the local UID cache. Then the next runs with --useuid
- There are potentially issues with --usecache. They can be solved.
- Read the document FAQ.Use_cache.txt
- https://imapsync.lamiral.info/FAQ.d/FAQ.Use_cache.txt
- So to finalise how to synchronize duplicates:
-
- imapsync ... --tmpdir . --usecache
- imapsync ... --tmpdir . --useuid
- imapsync ... --tmpdir . --useuid
- ...
- Here --tmpdir value is the dot "." meaning "current directory".
- Surrounding it with double-quotes is optional.
- If the two mailboxes haven't been already synchronized then the
- first run with --usecache is useless.
- =======================================================================
- Q. How can I remove duplicates in a lone account?
- R. In order to remove duplicates in a lone account, just run imapsync
- on the same account as source and destination, plus the
- option --delete2duplicates, ie, with
- host1 == host2, user1 == user2, password1 == password2
-
- imapsync ... --delete2duplicates
- =======================================================================
- =======================================================================
|