  1. CSYNC User Guide
  2. ================
  3. Andreas Schneider <asn@cryptomilk.org>
  4. :Author Initials: ADS
  5. csync is a lightweight utility to synchronize files between two directories
  6. on a system or between multiple systems.
  7. It synchronizes bidirectionally and allows the user to keep two copies of files
  8. and directories in sync. csync uses widely adopted protocols, such as smb or
  9. sftp, so that there is no need for a server component. It is a user-level
  10. program which means you don't need to be a superuser or administrator.
  11. Together with a Pluggable Authentication Module (PAM), the intent is to provide
  12. Roaming Home Directories for Linux (see <<X80, The PAM Module>>).
  13. Introduction
  14. ------------
  15. It is often the case that we have multiple copies (called replicas) of a
  16. filesystem or part of a filesystem (for example on a notebook and desktop
  17. computer). Changes to each replica are often made independently, and as a
  18. result, they do not contain the same information. In that case, a file
  19. synchronizer is used to make them consistent again, without losing any
  20. information.
  21. The goal is to detect conflicting updates (files which have been modified) and
  22. propagate non-conflicting updates to each replica. If there are no conflicts
  23. left, we are done, and the replicas are identical. To resolve or handle
  24. conflicts there are several algorithms available. They are discussed in
  25. one of the following sections.
  26. Basics
  27. ------
  28. This section describes some basics of file synchronization.
  29. Paths
  30. ~~~~~
  31. A path normally refers to a point which contains a set of files which should be
  32. synchronized. It is specified relative to the root of the replica locally, but
  33. has to be absolute if you use a protocol. The path is just a sequence of names
  34. separated by '/'.
  35. NOTE: The path separator is always a forward slash '/', even for Windows.
  36. csync always uses the absolute path on remote replicas. This could be
  37. 'sftp://gladiac:secret@myserver/home/gladiac' for sftp.
  38. What is an update?
  39. ~~~~~~~~~~~~~~~~~~
  40. The contents of a path could be a file, a directory or a symbolic link
  41. (symbolic links are not supported yet). To be more precise, if the path refers
  42. to:
  43. - a regular file: the contents of the file are the byte stream and the
  44. metadata of the file.
  45. - a directory: then the content is the metadata of the directory.
  46. - a symbolic link: the content is the named file the link points to.
  47. csync keeps a record of each path which has been successfully synchronized. The
  48. path gets compared with the record and if it has changed since the last
  49. synchronization, we have an update. This is done by comparing the modification
  50. time or the change time (the modification time of the metadata). This is how
  51. updates are detected.
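
The comparison described above can be sketched as follows. This is only an illustration, not csync's actual code: the `Record` class and the `has_update` function are hypothetical names, and treating any difference in either timestamp as an update is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical record kept for a path after a successful sync."""
    mtime: float  # modification time of the file content
    ctime: float  # change time (modification time of the metadata)


def has_update(path: str, record: Record) -> bool:
    """Return True if the path changed since the record was written.

    Assumption: any difference in mtime or ctime counts as an update.
    """
    st = os.stat(path)
    return st.st_mtime != record.mtime or st.st_ctime != record.ctime
```

A caller would build the `Record` from `os.stat()` right after a successful synchronization and pass it to `has_update()` on the next run.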
  52. What is a conflict?
  53. ~~~~~~~~~~~~~~~~~~~
  54. A path is conflicting if it fulfills the following conditions:
  55. 1. it has been updated in one replica,
  56. 2. it or any of its descendants has been updated on the other replica too, and
  57. 3. its contents in the two replicas are not identical.
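
The three conditions can be written down as a small predicate. The sketch below is illustrative only: the function name, the use of sets of updated paths, and the `same_content` callback are assumptions made for the example.

```python
from typing import Callable, Set


def is_conflicting(path: str,
                   updated_a: Set[str],
                   updated_b: Set[str],
                   same_content: Callable[[str], bool]) -> bool:
    """Check the three conflict conditions for a path.

    updated_a / updated_b: paths updated in each replica (hypothetical input).
    same_content: returns True if the path's contents match in both replicas.
    """
    # Condition 1: the path has been updated in one replica.
    if path not in updated_a:
        return False
    # Condition 2: the path or one of its descendants was updated in the other.
    prefix = path.rstrip("/") + "/"
    touched_b = any(p == path or p.startswith(prefix) for p in updated_b)
    # Condition 3: the contents are not identical.
    return touched_b and not same_content(path)
```

For a full check the function would be called once per replica, with the roles of `updated_a` and `updated_b` swapped.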
  58. File Synchronization
  59. --------------------
  60. The primary goal of the file synchronizer is correctness. It may change
  61. scattered or large parts of the filesystem. Since this is mostly not monitored
  62. by the user, and the file synchronizer is in a position to harm the system,
  63. csync must be safe, even in the case of unexpected errors (e.g. disk full).
  64. What was done to make csync safe is described in the following sections.
  65. One problem concerning correctness is the handling of conflicts. Each file
  66. synchronizer tries to propagate conflicting changes to the other replica. At
  67. the end both replicas should be identical. There are different strategies to
  68. fulfill these goals.
  69. csync is a three-phase file synchronizer. This design was chosen so that
  70. user interaction is possible and it is easy to understand the
  71. process. The three phases are update detection, reconciliation and propagation.
  72. These will be described in the following sections.
  73. Update detection
  74. ~~~~~~~~~~~~~~~~
  75. There are different strategies for update detection. csync uses a state-based
  76. modtime-inode update detector. This means it uses the modification time to
  77. detect updates. It doesn't require many resources. A record of each file is
  78. stored in a database (called statedb) and compared with the current
  79. modification time during update detection. If the file has changed since the
  80. last synchronization an instruction is set to evaluate it during the
  81. reconciliation phase. If we don't have a record for a file we investigate, it
  82. is marked as new.
  83. It can be difficult to detect renaming of files. This problem is also solved
  84. by the record we store in the statedb. If we don't find the file by the name
  85. in the database, we search for the inode number. If the inode number is found
  86. then the file has been renamed.
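
A minimal sketch of this lookup order, assuming the statedb can be viewed as two dictionaries (one keyed by name, one by inode number). The `Entry` class, the `detect` function and the instruction names are hypothetical and do not reflect csync's real data structures.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Entry:
    name: str
    inode: int
    mtime: float


def detect(entry: Entry,
           by_name: Dict[str, Entry],
           by_inode: Dict[int, Entry]) -> str:
    """Classify a file found during update detection."""
    record = by_name.get(entry.name)
    if record is None:
        # Not found by name: search for the inode number instead.
        if entry.inode in by_inode:
            return "renamed"
        return "new"
    if entry.mtime != record.mtime:
        # Changed since the last sync: evaluate it during reconciliation.
        return "evaluate"
    return "unchanged"
```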
  87. Reconciliation
  88. ~~~~~~~~~~~~~~
  89. The most important component is the update detector, because the reconciler
  90. depends on it. The correctness of the reconciler is mandatory because it can damage
  91. a filesystem. It decides which file:
  92. * Stays untouched
  93. * Has a conflict
  94. * Gets synchronized
  95. * or is *deleted*
  96. A wrong decision of the reconciler leads in most cases to a loss of data. So
  97. there are several conditions which a file synchronizer has to follow.
  98. Algorithms
  99. ^^^^^^^^^^
  100. For conflict resolution several different algorithms could be implemented. The
  101. most common algorithms are the merge and the conflict algorithm. The first
  102. is a batch algorithm and the second is one which needs user interaction.
  103. Merge algorithm
  104. +++++++++++++++
  105. The merge algorithm is an algorithm which doesn't need any user interaction. It
  106. is simple and used for example by Microsoft for Roaming Profiles. If it detects
  107. a conflict (the same file changed on both replicas) then it will use the most
  108. recent file and overwrite the other. This means you can lose some data, but
  109. normally you want the latest file.
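
The decision rule of the merge algorithm is small enough to sketch in a few lines. The function name and the tie-breaking choice (replica A wins when both times are equal) are assumptions for illustration only.

```python
def merge_winner(mtime_a: float, mtime_b: float) -> str:
    """Return which replica's copy is kept when a file conflicts.

    The most recent file wins and overwrites the other one.
    Assumption: on equal modification times replica "a" is kept.
    """
    return "a" if mtime_a >= mtime_b else "b"
```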
  110. Conflict algorithm
  111. ++++++++++++++++++
  112. This is not implemented yet.
  113. If a file has a conflict the user has to decide which file should be used.
  114. Propagation
  115. ~~~~~~~~~~~
  116. The next stage of the file synchronizer is the propagator. It applies the
  117. calculated records to the current replica.
  118. The propagator uses a two-phase-commit mechanism to simulate an atomic
  119. filesystem operation.
  120. In the first phase we copy the file to a temporary file on the opposite
  121. replica. This has the advantage that we can check if the file which has been
  122. copied to the opposite replica has been transferred successfully. If the
  123. connection gets interrupted during the transfer we still have the original
  124. state of the file. This means no data will be lost.
  125. In the second phase the file on the opposite replica will be overwritten by
  126. the temporary file.
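
For two local paths, the two phases could look roughly like the sketch below. It is not csync's implementation: the `propagate` function and the temporary file prefix are hypothetical, and the file size comparison stands in for the transfer check.

```python
import os
import shutil
import tempfile


def propagate(src: str, dst: str) -> None:
    """Copy src over dst in two phases so dst is never half-written."""
    dst_dir = os.path.dirname(os.path.abspath(dst))
    # Phase 1: copy to a temporary file next to the destination.
    fd, tmp = tempfile.mkstemp(dir=dst_dir, prefix=".csync-tmp-")
    os.close(fd)
    try:
        shutil.copyfile(src, tmp)
        # Check that the transfer succeeded by comparing file sizes.
        if os.path.getsize(tmp) != os.path.getsize(src):
            raise OSError("size mismatch after transfer")
        # Phase 2: overwrite the destination with the temporary file.
        os.replace(tmp, dst)
    except BaseException:
        # On any failure the original destination is left untouched.
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
```

Creating the temporary file in the destination directory keeps both files on the same filesystem, which is what allows the final replace step to be a single rename.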
  127. After a successful propagation we have to merge the trees to reflect the
  128. current state of the filesystem tree. This updated tree will be written as a
  129. journal into the state database. It will be used during the update detection of
  130. the next synchronization. See above for a description of the state database
  131. during synchronization.
  132. Robustness
  133. ~~~~~~~~~~
  134. This is a very important topic. The file synchronizer should not crash, and if
  135. it has crashed, there should be no loss of data. To achieve this goal there are
  136. several mechanisms which will be discussed in the following sections.
  137. Crash resistance
  138. ^^^^^^^^^^^^^^^^
  139. The synchronization process can be interrupted by different events, for
  140. example:
  141. * the system could be halted due to errors.
  142. * the disk could be full or the quota exceeded.
  143. * the network or power cable could be pulled out.
  144. * the user could force a stop of the synchronization process.
  145. * various communication errors could occur.
  146. To ensure that no data is lost due to such an event, we enforce the following invariant:
  147. IMPORTANT: At every moment of the synchronization each file has either its
  148. original content or its correct final content.
  149. This means that the original content cannot be incorrect and no data can be lost
  150. until we overwrite it after a successful synchronization. Therefore, each
  151. interrupted synchronization process is a partial sync and can be continued and
  152. completed by simply running csync again. The only problem could be an error of
  153. the filesystem, so we reach this invariant only approximately.
  154. Transfer errors
  155. ^^^^^^^^^^^^^^^
  156. With the Two-Phase-Commit we check the file size after the file has been transferred
  157. and we are able to detect transfer errors. A more robust approach would be a
  158. transfer protocol with checksums, but this is not doable at the moment. We may
  159. add this in the future.
  160. Future filesystems, like btrfs, will help to compare checksums instead of the
  161. filesize. This will make the synchronization safer. This does not imply that it
  162. is unsafe now, but checksums are safer than simple filesize checks.
  163. Database loss
  164. ^^^^^^^^^^^^^
  165. It is possible that the state database could get corrupted. If this happens,
  166. all files get evaluated. In this case the file synchronizer won't delete any
  167. file, but it could occur that deleted files will be restored from the other
  168. replica.
  169. To prevent a corruption or loss of the database if an error occurs or the user
  170. forces an abort, the synchronizer is working on a copy of the database and will
  171. use a Two-Phase-Commit to save it at the end.
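
The idea of working on a copy and committing it at the end can be sketched as follows. The function names and the `.work` suffix are assumptions, and the statedb is treated as an opaque file.

```python
import os
import shutil


def open_working_copy(statedb: str) -> str:
    """Create the copy of the state database that the sync run works on."""
    work = statedb + ".work"  # hypothetical naming scheme
    if os.path.exists(statedb):
        shutil.copy2(statedb, work)
    else:
        open(work, "wb").close()  # first run: start with an empty database
    return work


def commit(statedb: str, work: str) -> None:
    """Successful run: replace the database with the working copy."""
    os.replace(work, statedb)


def abort(work: str) -> None:
    """Error or forced stop: discard the working copy, keep the original."""
    if os.path.exists(work):
        os.remove(work)
```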
  172. Getting started
  173. ---------------
  174. Installing csync
  175. ~~~~~~~~~~~~~~~~
  176. See the `README` and `INSTALL` files for install prerequisites and
  177. procedures. Packagers should take a look at <<X90, Appendix A: Packager Notes>>.
  178. Using the commandline client
  179. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  180. The synopsis of the commandline client is
  181. csync [OPTION...] SOURCE DESTINATION
  182. It synchronizes the content of SOURCE with DESTINATION and vice versa. The
  183. DESTINATION can be a local directory or a remote file server.
  184. csync /home/csync scheme://user:password@server:port/full/path
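
To make the parts of such a remote path explicit, the following sketch splits a replica argument with Python's standard `urllib.parse`. It only illustrates the URI layout shown above. The `parse_replica` helper is hypothetical and is not how csync itself parses its arguments.

```python
from urllib.parse import urlsplit


def parse_replica(uri: str) -> dict:
    """Split a replica argument into its components."""
    parts = urlsplit(uri)
    if not parts.scheme:
        # No scheme: treat the argument as a local directory.
        return {"scheme": None, "path": uri}
    return {
        "scheme": parts.scheme,      # e.g. "smb" or "sftp"
        "user": parts.username,      # None if not given
        "password": parts.password,  # None if not given
        "host": parts.hostname,
        "port": parts.port,          # None if not given
        "path": parts.path,          # absolute path on the remote replica
    }
```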
  185. Examples
  186. ^^^^^^^^
  187. To synchronize two local directories:
  188. csync /home/csync/replica1 /home/csync/replica2
  189. To synchronize a local directory with an smb server, use
  190. csync /home/csync smb://rupert.galaxy.site/Users/csync
  191. If you use kerberos, you don't have to specify a username or a password. If you
  192. don't use kerberos, the commandline client will ask about the user and the
  193. password. If you don't want to be prompted, you can specify it on the
  194. commandline:
  195. csync /home/csync smb://csync:secret@rupert.galaxy.site/Users/csync
  196. If you use the sftp protocol and want to specify a port, you do it the
  197. following way:
  198. csync /home/csync sftp://csync@krikkit.galaxy.site:2222/home/csync
  199. The remote destination is supported by plugins. By default csync ships with smb
  200. and sftp support. For more information, see the manpage of csync(1).
  201. Exclude lists
  202. ~~~~~~~~~~~~~
  203. csync provides exclude lists with simple shell wildcard patterns. There is a
  204. global exclude list, which is normally located in
  205. '/etc/csync/csync_exclude.conf' and already has some sane defaults. When you
  206. run csync for the first time, it will create an empty exclude list for the user.
  207. This file will be '~/.csync/csync_exclude.conf'. csync considers both
  208. configuration files and an additional one if you specify it.
  209. The entries in the file are newline separated. Use
  210. '/etc/csync/csync_exclude.conf' as an example.
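
How newline-separated shell wildcard patterns might be applied is sketched below with Python's `fnmatch` module. The helper names are hypothetical, the patterns in the usage note are made up for the example, and matching against both the full path and the last path component is an assumption rather than documented csync behaviour.

```python
from fnmatch import fnmatchcase
from typing import List


def load_patterns(text: str) -> List[str]:
    """Read newline-separated patterns, skipping empty lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def is_excluded(path: str, patterns: List[str]) -> bool:
    """Return True if the path matches any exclude pattern.

    Assumption: a pattern may match the full path or only the file name.
    """
    name = path.rsplit("/", 1)[-1]
    return any(fnmatchcase(path, p) or fnmatchcase(name, p) for p in patterns)
```

With the patterns `*~` and `*.tmp`, for instance, `is_excluded("docs/notes.txt~", ...)` is true while `docs/notes.txt` is still synchronized.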
  211. Debug messages and dry run
  212. ~~~~~~~~~~~~~~~~~~~~~~~~~~
  213. By default the csync client logs to stderr and you can increase the debug
  214. level with a commandline option.
  215. To simulate a run of the file synchronizer, you should set the priority to
  216. 'debug' for the categories 'csync.updater' and 'csync.reconciler' in the config
  217. file '~/.csync/csync_log.conf'. Then run csync with the '--dry-run' option.
  218. This will only run update detection and reconciliation.
  219. [[X80]]
  220. The PAM module
  221. ~~~~~~~~~~~~~~
  222. pam_csync is a PAM module to provide roaming home directories for a user
  223. session. This module is aimed at environments with central file servers where a
  224. user wishes to store his home directory. The Authentication Module verifies the
  225. identity of a user and triggers a synchronization with the server on the first
  226. login and the last logout. More information can be found in the manpage of the
  227. module pam_csync(8) or pam itself pam(8).
  228. [[X90]]
  229. Appendix A: Packager Notes
  230. --------------------------
  231. Read the `README`, `INSTALL` and `FAQ` files (in the distribution root
  232. directory).