Python script for automatic synchronization based on inotify

Fri Mar 11 10:16:01 CET 2011

On Wed, 9 Mar 2011 21:17:07 +0100
René Mayrhofer <rene at mayrhofer.eu.org> wrote:

> 
> What does it do?
> ------------------------
> Automatically keep DVCS repositories in sync whenever changes happen by automatically committing and pushing/pulling.
>

I like this distributed sync approach; however, I can see cases where
automatic pushes are not wanted. That should be easy to handle, as the
problem of monitoring changes (your 1.-4. below) and that of
synchronizing repositories (your 5. below) are quite independent. Do you
think synchronization could be triggered manually using some hook on
the central repository or on the VCS clients?

Maybe you could try to separate those problems more clearly in the
program.

I can see the vcs-home problem partitioned like this:
  a. Track changes (keeping host/user settings separate from sharable
     ones)
  b. Monitor/Store changes (automatic commits vs. manual commits)
  c. Distribute/Synchronize changes
  d. Notify physical users of changes
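
To make the a.-d. split concrete, here is a minimal Python sketch. All names are hypothetical and not taken from autosync.py. Committing, synchronizing and notifying are injected as plain callables, so automatic pushes can be switched off and `sync()` can be called from a hook or by hand instead:

```python
from typing import Callable, List


class Pipeline:
    """Hypothetical decomposition of autosync into independent stages.

    Each stage is a plain callable, so automatic pushes can be disabled
    and synchronization triggered manually (e.g. from a VCS hook).
    """

    def __init__(self,
                 commit: Callable[[List[str]], None],
                 synchronize: Callable[[], None],
                 notify: Callable[[str], None],
                 auto_sync: bool = True) -> None:
        self.commit = commit            # b. store changes
        self.synchronize = synchronize  # c. distribute changes
        self.notify = notify            # d. tell the user
        self.auto_sync = auto_sync

    def on_changes(self, paths: List[str]) -> None:
        """Called by whatever watches the tracked files, with a batch of paths."""
        self.commit(paths)
        self.notify(f"committed {len(paths)} path(s)")
        if self.auto_sync:
            self.sync()

    def sync(self) -> None:
        """Entry point that a manual trigger or a hook could call directly."""
        self.synchronize()
        self.notify("synchronized")
```

With `auto_sync=False` the monitor only commits, and distribution happens only when something explicitly calls `sync()`.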

I'd say your solution for c. is my preferred one for the moment. Are
you using XMPP resources to differentiate logins for the same account?
Naively I'd use some combination of user/host/repository as a resource
id (http://wiki.xmpp.org/web/Jabber_Resources).
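
Something like the following is what I have in mind. The naming scheme is only my assumption, and the character filter is much stricter than what XMPP actually allows in a resource:

```python
import re


def autosync_resource(user: str, host: str, repo_path: str) -> str:
    """Build an XMPP resource from user, host and repository (hypothetical scheme)."""
    # Use the last path component as the repository name ("root" for "/").
    repo = repo_path.rstrip("/").rsplit("/", 1)[-1] or "root"
    raw = f"autosync-{user}-{host}-{repo}"
    # Replace anything outside a conservative ASCII subset.
    return re.sub(r"[^A-Za-z0-9._-]", "_", raw)


def full_jid(bare_jid: str, resource: str) -> str:
    """A full JID is the bare JID followed by '/' and the resource."""
    return f"{bare_jid}/{resource}"
```

On each machine, `host` could simply come from `socket.gethostname()`.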

> How does it do it?
> ------------------------
> 0. Set up desktop notifications (for these nice bubble-style popups when anything happens) and log into a Jabber/XMPP account specified in the config file.
> 
> 1. Monitor a specific path for changes with inotify.
> At the moment, only one path is supported, and multiple script instances have to be run for multiple disjoint paths. This path is assumed to be (part of) a repository. Currently tested with git, but it should support most DVCS (the config file allows specifying the DVCS commands called when interacting with it).
> 
> 2. When changes are detected, check them into the repository that is being monitored (or delete, or move, etc.).
> It automatically ignores any patterns listed in .gitignore, and the config file allows excluding other directories (e.g. repositories within the main repository).
> 
> 3. Wait for a configurable time. When nothing else changes in between, commit.
> 
> 4. Wait a few seconds longer (again configurable) and, if nothing else is committed, initiate a push.
> 
> 5. After the push has finished, send an XMPP message to self (that is, to all clients logged in with the same account) to notify the other clients of the push.
> 
> [At any time in between]. When receiving a proper XMPP message, pull from the repository.
> 
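
Steps 3 and 4 amount to two chained quiet periods. Below is a minimal sketch of that timing logic. It is driven by explicit timestamps so it can be tested without sleeping. The class and parameter names are hypothetical, and this is not necessarily how autosync.py implements it:

```python
from typing import List, Optional


class AutosyncTimer:
    """Decide when to commit and push, given event times in seconds.

    commit_delay: quiet period after the last change before committing (step 3).
    push_delay: further quiet period after the last commit before pushing (step 4).
    """

    def __init__(self, commit_delay: float, push_delay: float) -> None:
        self.commit_delay = commit_delay
        self.push_delay = push_delay
        self.last_change: Optional[float] = None  # newest uncommitted change
        self.last_commit: Optional[float] = None  # newest unpushed commit

    def changed(self, now: float) -> None:
        """Record a file change; this restarts the commit countdown."""
        self.last_change = now

    def poll(self, now: float) -> List[str]:
        """Return the actions that are due at time `now`."""
        if self.last_change is not None:
            if now - self.last_change >= self.commit_delay:
                self.last_change = None
                self.last_commit = now
                return ["commit"]
            return []  # still inside the quiet period; any push has to wait
        if self.last_commit is not None and now - self.last_commit >= self.push_delay:
            self.last_commit = None
            return ["push"]
        return []
```

A real loop would call `changed()` from the inotify handler and `poll(time.monotonic())` periodically. For each returned action, it would run the configured commit or push command.
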
> 
> Thoughts that should be considered at some point but have not yet been implemented:
> ------------------------
> - The XMPP push message already contains a parameter, namely the repository the push went to. Add another parameter to specify the repository in which the change happened, so that others can try to pull directly from there in case that is quicker. The main use case for this optimization is my standard one: the laptop sitting next to the desktop, both of them syncing each other's home directories. Going via the main, hosted server is considerably less efficient than pulling over a 1GB/s LAN...
> 
> - Pulls and pushes can and should be optimized. At the moment, I take a conservative locking approach whenever a conflict may occur, and performance is reasonable on my main work tree with ca. 16GB (cloned Git repository), but not stellar. Specifically, the "optimized" pull lock strategy already described in the example config file should actually be implemented.
> 
> - Implement another option for synchronization besides XMPP (idea: a simple broadcast reflector on a single TCP port that could even run on e.g. OpenWRT, or re-use whatever the Sparkleshare server does).
> 
> - Automatically adding some context to each commit message besides the automatic date/time would be useful for finding out why a change happened. Nepomuk anybody (just kidding, maybe, for now...)?
> 
> - Allow specifying commit messages via popups. When the popup is ignored, use the default commit message.
> 
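
For the first point, the push notification could simply carry both repositories. Here is a sketch with a made-up JSON payload. The prefix and field names are assumptions; I don't know the current message format:

```python
import json
from typing import List, Optional, Tuple

PREFIX = "autosync-push "  # hypothetical marker for push notifications


def make_push_message(pushed_to: str, changed_in: Optional[str] = None) -> str:
    """Encode the repository pushed to and, optionally, where the change happened."""
    payload = {"pushed_to": pushed_to}
    if changed_in is not None:
        payload["changed_in"] = changed_in
    return PREFIX + json.dumps(payload, sort_keys=True)


def parse_push_message(body: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (pushed_to, changed_in), or None if body is not a valid push notice."""
    if not body.startswith(PREFIX):
        return None
    try:
        payload = json.loads(body[len(PREFIX):])
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    if not isinstance(payload, dict) or "pushed_to" not in payload:
        return None
    return payload["pushed_to"], payload.get("changed_in")


def pull_sources(pushed_to: str, changed_in: Optional[str]) -> List[str]:
    """Order pull candidates: the originating repository first (e.g. on the LAN)."""
    return [repo for repo in (changed_in, pushed_to) if repo]
```

A receiver would try the sources in order and fall back to the hosted server if the direct pull fails.
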
> Installation
> ------------------------
> Simple. Copy the attached .autosync-example config file to ~/.autosync, adapt it to your needs (paths including ignores, and XMPP ID/password), then run the autosync.py script. Note that it currently needs a slightly extended version of jabberbot.py (e.g. in the same directory from which autosync.py is executed) to allow reception of messages from its own XMPP ID. I would like to push these minimal changes upstream, but haven't done so yet.
> 
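
Since the DVCS commands come from the config file, here is a guess at how such a file could be read with Python's standard configparser. The section and key names below are hypothetical and will probably not match the real .autosync-example. For the actual file you would call `parser.read(os.path.expanduser("~/.autosync"))` instead of `read_string()`:

```python
import configparser
import os
import shlex

# Hypothetical layout; the real .autosync-example may use other names.
EXAMPLE = """
[autosync]
path = ~/home-repo
ignorepath = .git .svn

[dvcs]
addcmd = git add
rmcmd = git rm
commitcmd = git commit -m
pushcmd = git push
pullcmd = git pull
"""


def load_config(text: str) -> dict:
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "path": os.path.expanduser(parser.get("autosync", "path")),
        "ignore": parser.get("autosync", "ignorepath", fallback="").split(),
        # Store each DVCS command as an argv list so it can be passed to subprocess.
        "commands": {key: shlex.split(value) for key, value in parser.items("dvcs")},
    }
```

A command would then run as, e.g., `subprocess.run(cfg["commands"]["addcmd"] + [path], cwd=cfg["path"], check=True)`. Keeping the commands as argument lists avoids going through a shell.
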
> Disclaimer
> ------------------------
> This is my first Python program that is longer than 100 lines. Please go easy on me with the patches, complaints and "what were you thinking, doing it this way?" messages. I have tried to comment wherever I found it necessary for my own understanding, but this is neither the best structured nor the most elegant program I ever wrote. Any hints for improving it are very welcome, and interoperability patches to work with Sparkleshare even more so. In the future, the two projects should definitely interoperate, which will come down to implementing each other's notification mechanisms. My autosync Python script could then be used wherever headless operation is required and/or Mono is not installed.

Just a note about coding style: you could try to follow the PEP 8
guidelines for Python code, http://www.python.org/dev/peps/pep-0008/
There are automatic style checkers to help you with that.

> I have tested it between three systems and, in this version, it works reasonably well. However, there does seem to be the occasional kink when editors go crazy with temporary file creation, renaming, deleting originals, etc. These might be races, but I don't know for certain yet. Additional test cases are more than welcome. This script should be fairly safe to try, considering that the worst it will do is add a few hundred commits to your DVCS repo and push them to the configured default remote. But, after all, what is the point of using a DVCS if you can't roll back any changes made by you or a buggy script? (Yes, I did have to do that a number of times while developing the manual inotify event coalescing to cooperate better with git add/remove/mv actions.)
> 
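
Regarding the editor temp-file kinks: one way to coalesce a batch of inotify events is to keep only the net effect per path. Files that were both created and removed within the batch are dropped. Here is a sketch under these assumptions:

- events arrive as hypothetical (event name, path) pairs;
- directories are ignored;
- moves are expanded into rm plus add.

The last point is fine for git, which does not record renames and instead detects them when diffing.

```python
from typing import Dict, Iterable, List, Set, Tuple

Event = Tuple[str, str]  # (inotify event name, path relative to the repository root)


def coalesce(events: Iterable[Event]) -> List[Tuple[str, str]]:
    """Reduce a batch of file events to (operation, path) pairs, operation 'add' or 'rm'."""
    created: Set[str] = set()   # paths that did not exist before this batch
    ops: Dict[str, str] = {}    # net operation per path; later events override earlier ones

    for name, path in events:
        if name == "IN_CREATE":
            # IN_CREATE is only emitted for a new directory entry, so a first
            # sighting via IN_CREATE means the path is new in this batch.
            if path not in ops:
                created.add(path)
            ops[path] = "add"
        elif name in ("IN_MODIFY", "IN_CLOSE_WRITE", "IN_MOVED_TO"):
            ops[path] = "add"
        elif name in ("IN_DELETE", "IN_MOVED_FROM"):
            if path in created:
                # Created and removed again within the batch (typical editor temp
                # file): it never needs to reach the repository.
                created.discard(path)
                ops.pop(path, None)
            else:
                ops[path] = "rm"
        # Other events (IN_ACCESS, IN_OPEN, ...) do not change content and are ignored.

    return [(op, path) for path, op in sorted(ops.items())]
```

For example, an editor that writes a temporary file and renames it over the original produces four events. These collapse to a single add of the original file, with no stray operation on the temporary one.
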
> 
> If there are any questions, don't hesitate to drop me a line. However, I might be unable to answer quickly, as I am just in the middle of a big teaching block. 
> 
> best regards,
> Rene
> 
> 

Thanks for sharing,
   Antonio

-- 
Antonio Ospite
http://ao2.it

PGP public key ID: 0x4553B001

A: Because it messes up the order in which people normally read text.
   See http://en.wikipedia.org/wiki/Posting_style
Q: Why is top-posting such a bad thing?