Python script for automatic synchronization based on inotify

René Mayrhofer rene at mayrhofer.eu.org
Wed Mar 9 21:17:07 CET 2011


Hi everybody,

Until now I have only been an avid lurker on this list and have not found the time to contribute. With this email I hope to change that: attached is a preview of a Python script/daemon that I have been meaning to release for ages but never got around to. In short, it does what Sparkleshare tries to do, but without any configuration GUI (just a dot-file for configuration and one Python script that runs the background thread).

My home directory (or at least the major part of it) has been under svn, and nowadays git, control for quite a while, and I have tried to make my life a little easier when it comes to synchronizing the clones between multiple machines for the non-source parts (i.e. normal documents). Early versions of this script were around even before Sparkleshare was announced, so I was pleased to see others going in the same direction. If not for lack of time, I would have tried to contribute some ideas to Sparkleshare as well. As I don't see that happening anytime soon (time-wise...), I at least want to push out what I have so far.


What does it do?
------------------------
Keep DVCS repositories in sync by automatically committing and pushing/pulling whenever changes happen.

How does it do it?
------------------------
0. Set up desktop notifications (for these nice bubble-style popups when anything happens) and log into a Jabber/XMPP account specified in the config file.

1. Monitor a specific path for changes with inotify.
At the moment, only one path is supported, and multiple script instances have to be run for multiple disjoint paths. This path is assumed to be (part of) a repository. Currently tested with git, but it should support most DVCS (the config file lets you specify the DVCS commands called when interacting with it).

2. When changes are detected, check them into the repository that is being monitored (or delete, or move, etc.).
It automatically ignores any patterns listed in .gitignore, and the config file allows excluding other directories (e.g. repositories within the main repository).

3. Wait for a configurable time. When nothing else changes in between, commit.

4. Wait a few seconds longer (again configurable) and, if nothing else is committed, initiate a push.

5. After the push has finished, send an XMPP message to self (that is, to all clients logged in with the same account) to notify the other clients of the push.

[At any time in between]. When receiving a proper XMPP message, pull from the repository.
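
Steps 3 and 4 amount to two quiet-period timers: one that restarts on every file event and triggers the commit, and one that restarts on every commit and triggers the push. The following is only a minimal sketch of that idea, not the code in autosync.py. The class name `QuietPeriodScheduler` and the parameters `commit_delay`, `push_delay` and `clock` are assumptions made for illustration.

```python
import time


class QuietPeriodScheduler:
    """Decide when to commit and push, based on quiet periods.

    Hypothetical helper, not taken from autosync.py. The clock is
    injectable so the logic can be exercised without sleeping.
    """

    def __init__(self, commit_delay=5.0, push_delay=10.0, clock=time.monotonic):
        self.commit_delay = commit_delay
        self.push_delay = push_delay
        self.clock = clock
        self.last_change = None  # time of the last uncommitted file event
        self.last_commit = None  # time of the last unpushed commit

    def file_changed(self):
        """Record a file event; this restarts the commit timer."""
        self.last_change = self.clock()

    def poll(self):
        """Return 'commit', 'push', or None for the current moment."""
        now = self.clock()
        if self.last_change is not None:
            if now - self.last_change >= self.commit_delay:
                self.last_change = None
                self.last_commit = now  # a new commit restarts the push timer
                return "commit"
            return None  # changes are still arriving; do not push yet
        if self.last_commit is not None and now - self.last_commit >= self.push_delay:
            self.last_commit = None
            return "push"
        return None
```

A main loop would call `file_changed()` from the inotify handler and `poll()` every second or so, running the configured commit or push command when one is returned.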


Thoughts that should be considered at some point but have not yet been implemented:
------------------------
- The XMPP push message already contains a parameter, namely the repository the push went to. Add another parameter to specify the repository in which the change happened so that others can try to pull directly from there, in case it is quicker. The main use case for this optimization is my standard one: the laptop sitting next to the desktop and both of them syncing each other's home directories. Going via the main, hosted server is considerably less efficient than pulling via a 1 Gbit/s LAN.

- Pulls and pushes can and should be optimized. At the moment, I take a conservative locking approach whenever a conflict may occur; performance is reasonable, but not stellar, on my main work tree of ca. 16GB (a cloned git repo). Specifically, the "optimized" pull lock strategy already described in the example config file still needs to be implemented.

- Implement another option for synchronization besides XMPP (idea: a simple broadcast reflector on a single TCP port that could even run on e.g. OpenWRT, or re-use whatever the Sparkleshare server does).

- Automatically adding some context to each commit message besides the automatic date/time would be useful for finding out why a change happened. Nepomuk anybody (just kidding, maybe, for now...)?

- Allow specifying commit messages via popups. When a popup is ignored, use the default commit message.
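
As an illustration of the "optimized" pull lock strategy mentioned in the list above (and described in the comments of the example config file), here is a minimal, single-threaded sketch: events that arrive during a pull are queued per path and replayed once the pull has finished. The names `PullLock`, `handle_event` and `run_pull` are hypothetical, and a real implementation would also need locking between the inotify thread and the XMPP thread.

```python
from collections import OrderedDict


class PullLock:
    """Queue file events while a pull runs and replay them afterwards.

    Hypothetical sketch of the 'optimized' strategy; `handle_event` is an
    assumed callback that re-checks one path against the DVCS.
    """

    def __init__(self, handle_event):
        self.handle_event = handle_event
        self.pulling = False
        self.queued = OrderedDict()  # path -> None, keeps first-seen order

    def on_event(self, path):
        if self.pulling:
            self.queued[path] = None  # remember the path, act later
        else:
            self.handle_event(path)

    def pull(self, run_pull):
        """Run `run_pull()` with event handling suspended, then replay."""
        self.pulling = True
        try:
            run_pull()
        finally:
            self.pulling = False
            paths, self.queued = list(self.queued), OrderedDict()
            for path in paths:
                self.handle_event(path)
```

Each queued path is replayed only once, so the DVCS client is asked to re-check just the files that saw events rather than rescanning the whole tree. Whether a replayed path really carries a local change is left to the DVCS, which is exactly the weak spot the config comments warn about.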

Installation
------------------------
Simple. Copy the attached .autosync-example config file to ~/.autosync, adjust it to your needs (paths including ignores and XMPP ID/password), then run the autosync.py script. Note that it currently needs a slightly extended version of jabberbot.py (e.g. in the same directory from which autosync.py is executed) to allow reception of messages from its own XMPP ID. I would like to push these minimal changes upstream, but haven't done that so far.
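
To make the config format more concrete, the sketch below reads a shortened copy of the example config with Python 3's `configparser` and fills the `%s` placeholders of a command template. It is not how autosync.py itself does it: the helper names `load_config` and `build_commands` are made up, and the rule that each line of a multi-line command consumes its placeholders in order is an assumption.

```python
import configparser
import os
import shlex

# Shortened copy of the example config, for illustration only.
EXAMPLE = """\
[autosync]
path = ~/amw
readfrequency = 5
ignorepath = .git .svn
    src/packages

[dcvs]
addcmd = git add %s
movecmd = git rm %s
    git add %s
"""


def load_config(text):
    # interpolation=None keeps the literal %s placeholders intact;
    # the default interpolation would reject them.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser


def build_commands(template, *paths):
    """Fill each %s in a (possibly multi-line) template with a quoted path."""
    quoted = tuple(shlex.quote(p) for p in paths)
    commands = []
    used = 0
    for line in template.splitlines():
        n = line.count("%s")
        commands.append(line % quoted[used:used + n])
        used += n
    return commands


cfg = load_config(EXAMPLE)
watch_path = os.path.expanduser(cfg["autosync"]["path"])
ignored = cfg["autosync"]["ignorepath"].split()
print(build_commands(cfg["dcvs"]["movecmd"], "old name.txt", "new.txt"))
```

Quoting the paths with `shlex.quote` matters because the commands are meant to be passed to a shell, and file names in a home directory often contain spaces.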

Disclaimer
------------------------
This is my first Python program that is longer than 100 lines. Please be easy on me with the patches, complaints and "what did you think, doing it this way?" messages. I have tried to comment wherever I found it necessary for my own understanding, but this is neither the best structured nor the most elegant program I ever wrote. Any hints for improving it are very welcome, and interoperability patches to work with Sparkleshare even more so. In the future, the two projects should definitely interoperate, which will come down to implementing each other's notification mechanism. My autosync Python script could then be used wherever headless operation might be required and/or Mono is not installed.
I have tested it between three systems and, in this version, it works reasonably well. However, there does seem to be the occasional kink when editors go crazy on temporary file creation, renaming, deleting originals, etc. These might be races, but I don't know for certain yet. Additional test cases are more than welcome. This script should be fairly safe to try, considering that the worst it will do is add a few hundred commits to your DVCS repo and push them to the configured default remote. But, after all, what is the point of using a DVCS if you can't roll back changes made by you or a buggy script? (Yes, I did have to do that a number of times while developing the manual inotify event coalescing to cooperate better with git add/remove/mv actions.)
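
For readers wondering what event coalescing means here: an editor that saves via a temporary file produces a burst of create/modify/delete events that should collapse into a single DVCS action per path. The sketch below shows one possible way to do that; it is not the algorithm used in autosync.py. The function name `coalesce` and the event kinds are assumptions, and a move is assumed to arrive as a delete of the source plus a create of the destination.

```python
def coalesce(events):
    """Reduce a batch of (kind, path) events to one net action per path.

    `kind` is 'create', 'modify' or 'delete'. Returns a dict mapping each
    path to 'add', 'modify' or 'rm'; paths with no net effect are dropped.
    """
    existed_before = {}  # path -> did it exist before the batch?
    exists_now = {}      # path -> does it exist after the events so far?
    for kind, path in events:
        if path not in existed_before:
            # The first event tells us the prior state: only a 'create'
            # implies the file was absent before this batch.
            existed_before[path] = kind != "create"
        exists_now[path] = kind != "delete"

    actions = {}
    for path, before in existed_before.items():
        after = exists_now[path]
        if before and after:
            actions[path] = "modify"
        elif after:
            actions[path] = "add"
        elif before:
            actions[path] = "rm"
        # neither before nor after: a temporary file, nothing to do
    return actions
```

With this approach, a temporary file that is created and deleted within the same batch never reaches the DVCS, and a file that is deleted and then recreated is treated as a plain modification.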


If there are any questions, don't hesitate to drop me a line. However, I might be unable to answer quickly, as I am just in the middle of a big teaching block. 

best regards,
Rene


-------------- next part --------------
A non-text attachment was scrubbed...
Name: autosync.py
Type: text/x-python
Size: 25928 bytes
Desc: not available
URL: <http://lists.madduck.net/pipermail/vcs-home/attachments/20110309/62d47181/attachment-0002.py>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: jabberbot.py
Type: text/x-python
Size: 15588 bytes
Desc: not available
URL: <http://lists.madduck.net/pipermail/vcs-home/attachments/20110309/62d47181/attachment-0003.py>
-------------- next part --------------
[autosync]
path = ~/amw
pidfile = ~/.autosync.pid
syncmethod = xmpp
#syncmethod = autosync-server

# There are currently two options for handling file notifications, as neither 
# one is perfect. You can choose between the 'conservative' option, which is
# slower but should work in every corner case, and the 'optimized' option, 
# which will consume less CPU and I/O resources on a remotely-triggered pull,
# but may miss local changes until the next time autosync is restarted or a
# manual commit is done on the repository.
#
# The problem is that during a pull from the remote repository, changes will
# be applied to the local file system and consequently generate file-changed
# events. These events are in turn translated to add/remove/move commands for
# the DVCS, which would duplicate the remote changes locally in the history and
# obviously doesn't work e.g. for file removes. Therefore, the file/dir changes
# caused by a remote pull must not be translated to local DCVS changes.
# The conservative strategy solves this problem by completely suspending event
# handling while the pull is active. Because it is possible that _real_ local
# changes occur concurrently to the pull, the startup command will be run after
# the pull has been finished and event processing was resumed again. This is a
# safe option, as all local changes that occurred before or during the pull
# will be picked up by the DCVS client. However, when no local changes occurred
# (which is more probable), then this strategy causes unnecessary I/O overhead.
#
# The optimized strategy also suspends the execution of local DCVS actions 
# triggered by file/directory events during the pull, but does not completely
# discard them. Instead, all events that occurred during the pull are recorded
# in an event queue which is replayed after the pull has finished. The 
# advantage is that a complete re-scan of the local repository is avoided and
# only those files/directories that saw some modification are re-checked for 
# local changes. The disadvantage is that this depends more strongly on the
# change detection capabilities (trivial ones done by autosync-dcvs and more
# complex ones done by the respective DCVS client) and it is therefore not 
# guaranteed that all local, concurrent changes are being detected. This option
# is still being evaluated for corner cases where it doesn't work, and 
# therefore is not yet the default strategy.
pulllock = conservative
#pulllock = optimized

# The number of seconds to wait for additional events before acting. Setting 
# this lower will increase the synchronization speed at the cost of CPU and
# transfer resources.
readfrequency = 5
ignorepath = .git .svn .hg src/packages src/java/openuat 
    src/csharp/sparkleshare src/cpp/cross/keepassx src/android/ipv6config 

# Note: addcmd, rmcmd, and modifycmd take one argument, movecmd takes two (first the source, then the destination).
# Note: statuscmd should return with code 0 when nothing has changed in the 
# local checked-out tree that needs to be committed and non-zero when a commit
# is required.
[dcvs]
# for git
statuscmd = git status | grep -iq "nothing to commit"
addcmd = git add %s
rmcmd = git rm %s
modifycmd = git add %s
# doesn't work when the source file no longer exists, git expects to move it itself
#movecmd = git mv %s %s
# use this instead, git will figure out that it was a move because the file is similar
movecmd = git rm %s 
    git add %s
startupcmd = git add -A
commitcmd = git commit -m "Autocommit"
pushcmd = git push
pullcmd = git pull
remoteurlcmd = git config --get remote.origin.url

# for mercurial
# plain "hg status" always exits with 0, so test for empty output instead
#statuscmd = test -z "$(hg status)"
#addcmd = hg add %s
#rmcmd = hg remove %s
#modifycmd = 
#movecmd = hg mv %s %s
#startupcmd = hg addremove
#commitcmd = hg commit -m "Autocommit"
#pushcmd = hg push
#pullcmd = hg pull -u

[xmpp]
username = your XMPP id here
password = your XMPP password here
alsonotify = if set, another XMPP id that will get notified when something happens

[autosync-server]
server = http://whatever.sync.server
username = your-username
password = your-password
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 836 bytes
Desc: This is a digitally signed message part.
URL: <http://lists.madduck.net/pipermail/vcs-home/attachments/20110309/62d47181/attachment-0001.pgp>

