Architecture
============

.. index:: architecture

The ownCloud project provides desktop sync clients that synchronize the
contents of local directories on desktop machines with an ownCloud server.

The syncing is done with csync_, a bidirectional file synchronization tool that
provides both a command line client and a library. A special module for
csync was written to synchronize with ownCloud’s built-in WebDAV server.

The ownCloud sync client is based on a tool called mirall, initially written by
Duncan Mac Vicar. Later, Klaas Freitag joined the project and enhanced it to work
with the ownCloud server. Both mirall and the ownCloud Client (oCC) are built from
the same source, currently hosted in the ownCloud source repo on gitorious.

oCC is written in C++ using the `Qt Framework`_. As a result, oCC runs on the
three major platforms: Linux, Windows and MacOS.

.. _csync: http://www.csync.org
.. _`Qt Framework`: http://www.qt-project.org

The Sync Process
----------------

First, it is important to recall what syncing is. Syncing tries to keep the files
in both repositories the same. That means if a file is added to one repository,
it is copied to the other repository. If a file is changed in one
repository, the change is propagated to the other repository. Also, if a file
is deleted on one side, it is deleted on the other. In ownCloud syncing there is
therefore no typical client/server relationship in which the server is always
the master.

This is the major difference from systems such as file backup, where only
changes and new files are propagated but files are never deleted.

oCC checks both repositories for changes at regular intervals. Each such check
is referred to as a sync run. Between sync runs, the local repository is
watched by a file system monitor that starts a sync run immediately
if something is edited, added or removed.

Sync by Time versus ETag
------------------------

.. index:: time stamps, file times, etag, unique id

Until the release of ownCloud 4.5 and ownCloud Client 1.1, ownCloud employed
a single file property to decide which file is newer and hence needs to be
synced to the other repository: the file's modification time.

The *modification timestamp* is part of the file's metadata. It is available on
every relevant filesystem and is the natural indicator for a file change.
Modification timestamps do not require special action to create and have
a general meaning. One design goal of csync is to not require a special server
component, which is why it was chosen as the backend component.

To compare the modification times of two files from different systems,
both systems need to share the same time base. Before version 1.1.0,
csync required both sides to run on exactly the same time, which can
be achieved through enterprise-standard `NTP time synchronisation`_ on all
machines.

Since this strategy is rather fragile without NTP, ownCloud 4.5 introduced a
unique number per file that changes whenever the file changes. Although it is a
unique value, it is not a hash of the file but a randomly chosen number, which
the server transmits in the Etag_ field. Since the file number is guaranteed to
change if the file changes, it can be used to determine whether one of the files
has changed.

.. note:: oCC 1.1 and newer require file ID capabilities on the ownCloud server,
   hence using them with a server earlier than 4.5.0 is not supported.

Before the 1.3.0 release of the client, the sync process could create false
conflict files if the clocks deviated. The original and the conflict file
differed only in their timestamp, not in content. This behaviour was changed to
a binary comparison that checks whether the files actually differ.

Just like files, directories also hold a unique id, which changes whenever
one of the contained files or directories gets modified. Since this is a
recursive process, it significantly reduces the effort required for a sync
cycle, because the client only walks directories with a modified unique id.

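
The pruning effect of directory ids can be illustrated with a small sketch. It
is not the csync implementation: ``RemoteNode``, ``find_changed`` and the
dictionary-based journal are hypothetical names chosen for this example, and the
remote tree is modelled in memory rather than fetched over WebDAV.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RemoteNode:
    """Hypothetical in-memory stand-in for a remote file or directory."""
    name: str
    etag: str                      # the unique id described above
    is_dir: bool = False
    children: List["RemoteNode"] = field(default_factory=list)


def find_changed(node: RemoteNode, journal: Dict[str, str], path: str = "") -> List[str]:
    """Return paths of files whose id differs from the journaled id.

    A directory whose id matches the journal is skipped entirely,
    so unchanged subtrees are never walked.
    """
    full = f"{path}/{node.name}" if path else node.name
    if journal.get(full) == node.etag:
        return []                  # unchanged id: prune this subtree
    if not node.is_dir:
        return [full]              # a file with a new or modified id
    changed: List[str] = []
    for child in node.children:
        changed.extend(find_changed(child, journal, full))
    return changed
```

With a journal that records the ids seen in the previous sync run, only the
branches leading to a modified file are visited; a directory whose id is
unchanged costs a single comparison regardless of how many files it contains.
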
This table outlines the sync methods attempted depending
on the server/client combination:

.. index:: compatibility table

+--------------------+-------------------+----------------------------+
| Server Version     | Client Version    | Sync Methods               |
+====================+===================+============================+
| 4.0.x or earlier   | 1.0.5 or earlier  | Time Stamp                 |
+--------------------+-------------------+----------------------------+
| 4.0.x or earlier   | 1.1 or later      | n/a (incompatible)         |
+--------------------+-------------------+----------------------------+
| 4.5 or later       | 1.0.5 or earlier  | Time Stamp                 |
+--------------------+-------------------+----------------------------+
| 4.5 or later       | 1.1 or later      | File ID, Time Stamp        |
+--------------------+-------------------+----------------------------+

It is highly recommended to upgrade to ownCloud 4.5 or later with ownCloud
Client 1.1 or later, since the time stamp-based sync mechanism can
lead to data loss in certain edge cases, especially when multiple clients
are involved and one of them is not in sync with NTP time.

.. _`NTP time synchronisation`: http://en.wikipedia.org/wiki/Network_Time_Protocol
.. _Etag: http://en.wikipedia.org/wiki/HTTP_ETag

Comparison and Conflict Cases
-----------------------------

In a sync run, the client first has to detect whether either of the two
repositories has changed files. On the local repository, the client traverses
the file tree and compares the modification time of each file with its previous
value, which is stored in the client's database. If no previous value is
stored, the file has been added to the local repository. Note that on
the local side, the modification time is a good attribute to detect changes
because it does not depend on time shifts between machines.

For the remote (i.e. ownCloud) repository, the client compares the ETag of each
file with its previous value. Again, the previous value is queried from the
database. If the ETag is still the same, the file has not changed.

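
The two detection passes share the same shape: compare a current attribute with
the value recorded after the last sync run. A minimal sketch, assuming the
journal can be viewed as a plain mapping from path to the last known value
(``classify`` and the state names are hypothetical, not part of oCC or csync):

```python
from typing import Dict, Hashable


def classify(current: Dict[str, Hashable], previous: Dict[str, Hashable]) -> Dict[str, str]:
    """Classify each path by comparing its current value with the journaled one.

    For the local repository the values would be modification times,
    for the remote repository they would be ETags.
    """
    states = {}
    for path, value in current.items():
        if path not in previous:
            states[path] = "new"        # no journal entry: file was added
        elif previous[path] != value:
            states[path] = "changed"    # value differs from the last sync run
        else:
            states[path] = "unchanged"
    return states
```

Calling ``classify`` once with local modification times and once with remote
ETags yields two result sets; a path reported as ``"changed"`` in both is a
candidate for the conflict case.
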
So what happens if a file has changed in both the local and the remote
repository since the last sync run? In that case it cannot easily be decided
which version of the file should be used. Moreover, changes on either side must
not be lost. This is called the conflict case, and the client solves it by
creating a conflict file from the older of the two files and saving the newer
one under the original file name. Conflict files are always created on the
client and never on the server. The conflict file has the same name as the
original file, with the timestamp of the conflict detection appended.

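
The conflict rule can be sketched as follows. The exact naming pattern used by
the client is not specified here, so the ``_conflict-YYYYMMDD-HHMMSS`` suffix is
an assumption, as are the function names and the choice to prefer the local
file when both modification times are equal.

```python
from datetime import datetime


def conflict_name(filename: str, detected_at: datetime) -> str:
    """Append the conflict detection time to the original name (assumed format)."""
    return f"{filename}_conflict-{detected_at.strftime('%Y%m%d-%H%M%S')}"


def resolve_conflict(filename: str, local_mtime: float, remote_mtime: float,
                     detected_at: datetime) -> dict:
    """Decide which version keeps the original name and how the other is saved.

    The newer version keeps the original file name; the older one is
    preserved on the client under the conflict name. A tie favours the
    local file, which is an assumption of this sketch.
    """
    if local_mtime >= remote_mtime:
        newer, older = "local", "remote"
    else:
        newer, older = "remote", "local"
    return {
        "keep_as_original": newer,
        "conflict_copy_of": older,
        "conflict_name": conflict_name(filename, detected_at),
    }
```

Because the older version is kept under a separate name rather than
overwritten, neither side's changes are lost, which is the requirement stated
above.
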
The Sync Journal
----------------

The client stores the ETag number in a per-directory database called the journal.
It is located in the application directory (until version 1.1) or stored as a
hidden file directly in the directory to be synced (later versions).

If the journal database is removed, oCC's CSync backend rebuilds the database
by comparing the files and their modification times. Therefore, make sure
that both server and client are synchronized to NTP time before restarting the
client after the database has been removed.

oCC also provides a button in the Settings dialog that resets the
journal. It can be used to recreate the journal database.