[announce] Sharebox, a FUSE filesystem relying on git-annex

Joey Hess joey at kitenet.net
Thu Mar 31 19:50:46 CEST 2011


Christophe-Marie Duquesne wrote:
> I am currently writing a FUSE file system based on git-annex for
> replicating binary files on several machines. I thought I could share
> it here in order to get some ideas and contributors.

Wow, you have completely anticipated a blog post I was gonna make in a
few days that a) announces git-annex's support for using Amazon S3 as a git
"remote", and b) suggests that a free, distributed dropbox-type thing
could be built on this foundation.

My day, no, my week, is officially made. This is close enough to my
birthday that you are in the running for best birthday present. :)

> What are your goals?
> Seamless synchronization "à la dropbox".
> Ability to use with big binary files such as mp3/movies.
> Entirely decentralized.
> Don't use unnecessary space
> Keep it simple: avoid special VCS commands and keep a filesystem
> interface as much as possible.

100% agree with this list, although I think that explicitly not
mentioning what kind of large binary files a tool might be used to
store is a wise thing. ;)

> Why?
> Because sparkleshare and dvcs-autosync are bad at versioning binary files

I have not looked at sparkleshare, but have been wondering if it could
be adapted to be used as a GUI frontend for git annex.

> What do you have?
> A python implementation. It is about 600 sloc, and you'll find it on
> https://github.com/chmduquesne/sharebox
> Be careful, it is very alpha and it still does not have a proper
> conflict handler.
> 
> Hey, but copying is slow!
> On my machine, copying files to a sharebox fs is about 10 times slower
> than copying it on a normal fs. All the time is spent in python's
> os.write(): I guess the only way to work around this problem is to
> rewrite the whole thing in C, but I am keeping this for later.

I do wonder if a FUSE filesystem is really the best approach. Even a tight
C implementation will need to read/write entire file contents to put
them into the filesystem. Notice that git-annex avoids doing any copying
of large file content when adding a file (it even defaults to using a
backend that doesn't checksum, in order to preserve maximum speed).
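
To illustrate the "no copying" point, here is a minimal sketch of adding a
file by renaming it into a store directory and leaving a symlink behind.
The function name, the store layout, and the key format are assumptions made
up for this example, not git-annex's real on-disk format; the only point is
that the key is built from metadata, so the content is never read or written.

```python
import os


def add_without_copy(path, store_dir):
    """Move `path` into `store_dir` and leave a symlink in its place.

    Hypothetical helper: the key format below is a simplification
    invented for this sketch, not git-annex's actual key format.
    """
    st = os.stat(path)
    # Key from size, mtime and name only: no checksum, so no content read.
    key = "s{}-m{}--{}".format(
        st.st_size, int(st.st_mtime), os.path.basename(path)
    )
    os.makedirs(store_dir, exist_ok=True)
    target = os.path.join(store_dir, key)
    # On the same filesystem a rename only touches metadata.
    # Across filesystems os.rename raises OSError instead of copying.
    os.rename(path, target)
    os.symlink(os.path.abspath(target), path)
    return key
```

A FUSE layer, by contrast, sees the content arrive through write() calls and
has to pass every byte along, whatever language it is written in.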

I had been thinking more along the lines of an inotify daemon
that watches a directory (like dvcs-autosync), and drives git-annex.
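
A rough sketch of such a watcher follows. Python's standard library has no
inotify binding, so this version swaps in plain polling (snapshot the tree,
compare with the previous snapshot); a real daemon would react to inotify
events instead. All function names and the commit message are hypothetical.
The commands are only built as argument lists here, not executed.

```python
import os


def snapshot(root):
    """Map each regular file under `root` to (size, mtime_ns)."""
    state = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into git's own directory.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # assumed to be annexed already
            st = os.stat(full)
            state[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return state


def changed_paths(old, new):
    """Return paths that are new or differ between two snapshots."""
    return sorted(p for p in new if old.get(p) != new[p])


def annex_commands(paths):
    """Build the commands a daemon could run for the changed paths."""
    cmds = [["git", "annex", "add", p] for p in paths]
    if cmds:
        cmds.append(["git", "commit", "-m", "auto-sync"])
    return cmds
```

A daemon would call snapshot() in a loop, sleep between rounds, and hand each
command from annex_commands() to subprocess.run() with the watched directory
as the working directory.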

One real benefit of a filesystem is that you can support
modifying the files, and proxy that through to git-annex as a delete
of the old object and an add of the new object. That certainly has value
-- do you do it?
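
The "delete the old object, add the new object" idea can be shown with a toy
in-memory model. ToyAnnex is a hypothetical class invented for this sketch,
keyed by SHA-256 of the content; it says nothing about how git-annex or
sharebox actually store objects or when they discard old ones.

```python
import hashlib


class ToyAnnex:
    """Toy content-addressed store: names point at keys, keys at content."""

    def __init__(self):
        self.objects = {}  # key -> content
        self.names = {}    # filename -> key

    def write(self, name, content):
        """Record a new version of `name`; return (old_key, new_key)."""
        new_key = hashlib.sha256(content).hexdigest()
        old_key = self.names.get(name)
        # Add the new object and repoint the name at it.
        self.objects[new_key] = content
        self.names[name] = new_key
        # Delete the old object once no name refers to it any more.
        if (
            old_key is not None
            and old_key != new_key
            and old_key not in self.names.values()
        ):
            del self.objects[old_key]
        return old_key, new_key
```

Writing a file twice with different content leaves one object in the store:
the first write adds it, the second replaces it under a new key.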

-- 
see shy jo