dvcs-autosync+git is broken by design
Dieter Plaetinck
dieter at plaetinck.be
Sun Mar 4 13:25:54 CET 2012
I use dvcs-autosync+git on a daily basis, and have contributed to the project.
In my previous email (http://www.mail-archive.com/vcs-home@lists.madduck.net/msg00579.html) I described some issues with race conditions and with handling FS events correctly in dvcs-autosync. In short, dvcs-autosync's behavior is racy and can do the wrong thing, such as deleting your files when it shouldn't: if we start `git rm` but the file gets added again just before or while that happens, the new file is removed (this can be alleviated with `git rm --cached`). This prompted me to look into atomic operations (see point 3 in that mail), and more specifically to start this thread on the git mailing list:
http://git.661346.n2.nabble.com/PATCH-git-add-allow-ignore-missing-always-not-just-in-dry-run-td7201858.html
My assumption was that at least git would be able to work atomically, but I was wrong.
=> If you modify a file while 'git add' is running on that file, 'git add' may abort and give an error.
I think the most important information is in this post: http://git.661346.n2.nabble.com/PATCH-git-add-allow-ignore-missing-always-not-just-in-dry-run-tp7201858p7208529.html
So what we really need is a way to reliably bring the git index into a state that corresponds to the state of the working copy (WC). This doesn't need to be the most recent state if the working copy is in constant flux, as long as the index becomes consistent once the WC settles down. It has to work without ever executing 'git rm' (as that can remove a file if you recreate it just before or while the git command runs), and without failing on files that got removed just before or during the add/sync operation. All of that while keeping in mind that a file should not change while 'git add' is running on it (or else we run 'git add' repeatedly, with my patch applied, until it no longer gives an error).
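To make the "run 'git add' repeatedly until it doesn't give an error" idea concrete, here is a minimal sketch. It is only an illustration: the function name `add_until_stable`, the attempt limit and the delay are my own assumptions, and it uses plain `git add` without the patch discussed in the thread, so a file that disappears in the meantime will still make it give up.

```python
import subprocess
import time


def add_until_stable(path, max_attempts=5, delay=0.5, run=subprocess.call):
    """Retry `git add -- <path>` until it exits with status 0.

    `run` is injectable (an assumption made for testing); by default it is
    subprocess.call, which returns the exit status of the command.
    Returns True once an attempt succeeds, False if all attempts fail.
    """
    for _ in range(max_attempts):
        if run(["git", "add", "--", path]) == 0:
            return True
        # The file was probably being written to; wait and try again.
        time.sleep(delay)
    return False
```

Retrying does not make the operation atomic. It only relies on the working copy eventually settling down long enough for one 'git add' to succeed.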
How could we fix all this?
Some kind of snapshot mechanism? Preferably very light-weight and per-file/per-directory, and with COW (copy-on-write), because it should be transparent (files should never be blocked from being written to). Maybe some tricks with file descriptors, or implementing our own tool that can "sync WC changes to the index" for this particular use case? Or restricting dvcs-autosync to certain FS'es (LVM snapshots, a FUSE wrapper, ...)?
I think I just found a pretty simple solution. How about this:
* have dvcs-autosync collect inotify events the way it currently does, allowing some time for multiple events to coalesce.
* once it decides it's time to sync changes to the index (i.e. the WC has been stable for a certain amount of time), it only needs to look at the _last_ inotify event for each file:
-> if the last inotify event for the file was a create/modify, call this script: http://pastie.org/3518002
-> if it was a delete event: run `git rm --cached`
* if any further events happen in the meantime, just repeat the above. There is no chance of accidental file removal or of pushing the wrong things to the index, and it guarantees that the index will always be in sync with the WC once it settles down.
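The "only look at the last event" step could be sketched roughly as follows. This is not dvcs-autosync code: the event kinds, the function names and the "stage"/"unstage" labels are assumptions for illustration, and the create/modify branch only stands in for the script linked above, whose contents are not reproduced here.

```python
# Hypothetical event kinds; real inotify masks would be mapped onto these.
CREATE, MODIFY, DELETE = "create", "modify", "delete"


def last_event_per_path(events):
    """Collapse (path, kind) events, keeping only the last kind per path."""
    last = {}
    for path, kind in events:
        last[path] = kind  # later events overwrite earlier ones
    return last


def plan_index_updates(events):
    """Decide one index action per path from its last event.

    "unstage" stands for `git rm --cached <path>`; "stage" stands for
    calling the add script from the mail (placeholder, not shown here).
    """
    plan = []
    for path, kind in last_event_per_path(events).items():
        action = "unstage" if kind == DELETE else "stage"
        plan.append((action, path))
    return plan


if __name__ == "__main__":
    coalesced = [("a.txt", CREATE), ("b.txt", MODIFY), ("a.txt", DELETE)]
    for action, path in plan_index_updates(coalesced):
        print(action, path)
```

Because only the index is touched in both branches, a plan that turns out to be stale never deletes anything from the working copy. The next round of events simply produces a new plan.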
Dieter