[announce] Sharebox, a FUSE filesystem relying on git-annex

Joey Hess joey at kitenet.net
Sun Apr 3 17:18:05 CEST 2011


Dieter Plaetinck wrote:

> I think having support for this in git-annex would be very useful,
> even if it's not that efficient: if this can be dealt with in
> git-annex, individual "higher-level" projects like sharebox and
> dvcs-autosync have fewer headaches.  Not to mention that
> sharebox/dvcs-autosync would need to do really inefficient things to
> deal with it anyway (because they can't involve themselves in the
> actual git/dvcs tricks; they work on a higher level of abstraction),
> and it might also benefit some users who work with git-annex manually.
> How do you see this? How hard/cumbersome is it to implement this in
> git-annex? Why is it inefficient?  It's not really clear to me after
> reading the smudge information on
> http://www.kernel.org/pub/software/scm/git/docs/gitattributes.html

http://git-annex.branchable.com/todo/smudge/

> > 	if toobig
> > 		then git_annex_add file
> > 		else git_add file
> > 	git_commit file
> 
> unfortunately I don't think so:
> - with dvcs-autosync we often commit "early", as in, the file could still be in the process of being written to, or it could be modified again after we added it.
> From what I understand, we would need to forbid our users from changing the file after it is added to git-annex, and worse: if git-annex does its "move file, replace file with symlink" trick, while the user is writing to it, this might break things.

You're right. However, you would also not want to commit many partial
versions of a large file as it was being written.
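
One way to avoid both problems might be to wait until a file has been
quiet for a while before adding it, and only then choose between git
and git-annex by size. The sketch below is only an illustration of that
idea: the names is_settled and add_command, the 10 MB threshold and the
30 second quiet period are all assumptions, not anything sharebox or
dvcs-autosync actually does. An unchanged mtime also does not prove
that no process still has the file open for writing.

```python
import os
import time

# Hypothetical threshold; the thread does not say what "toobig" means.
TOOBIG_BYTES = 10 * 1024 * 1024
# Hypothetical quiet period before a file is treated as finished.
QUIET_SECONDS = 30.0


def is_settled(path, quiet_seconds=QUIET_SECONDS, now=None):
    """Return True if the file was last modified at least quiet_seconds ago."""
    if now is None:
        now = time.time()
    return now - os.stat(path).st_mtime >= quiet_seconds


def add_command(path, toobig_bytes=TOOBIG_BYTES):
    """Return the argv that would stage the file.

    Large files go to git-annex, everything else to plain git,
    mirroring the "if toobig" pseudocode quoted in this thread.
    """
    if os.path.getsize(path) > toobig_bytes:
        return ["git", "annex", "add", path]
    return ["git", "add", path]
```

A caller would skip files for which is_settled returns False and retry
them on a later pass, then run the returned command followed by a
commit.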

> - when a remote A pulls in the changes from remote B, for dropbox-like behavior it should also automatically:
>  * run `git annex get`
>  * git commit .git-annex/*/*.log
> Does this seem about right?

Yes.
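
As a rough sketch of those two steps, assuming the .git-annex/*/*.log
layout quoted above (the function names and the commit message are
made up for illustration):

```python
import glob
import os
import subprocess


def log_commit_command(repo_dir):
    """Return the argv committing the location logs, or None if there are none.

    Uses the .git-annex/*/*.log layout quoted in this thread.
    """
    pattern = os.path.join(repo_dir, ".git-annex", "*", "*.log")
    logs = sorted(glob.glob(pattern))
    if not logs:
        return None
    relative = [os.path.relpath(p, repo_dir) for p in logs]
    return ["git", "commit", "-m", "update location logs", "--"] + relative


def run_post_pull(repo_dir):
    """Run after pulling from another remote: fetch content, commit logs."""
    # Fetch the content of annexed files that is not present locally.
    subprocess.run(["git", "annex", "get", "."], cwd=repo_dir, check=True)
    # Build the commit only after the get, since the get updates the logs.
    commit = log_commit_command(repo_dir)
    if commit is not None:
        # git commit exits non-zero when nothing changed, so do not
        # treat a failure here as fatal.
        subprocess.run(commit, cwd=repo_dir, check=False)
```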

> - deletes will also need to propagate automatically (see next paragraph), still need to figure out how to do that best.
> Note that dropbox-like behavior is different from the behavior you usually expect from git-annex users.
> * usual git-annex behavior: every remote stands on its own, there is no forced "being in sync", so that deletes must happen as initiated by the user, and this way you can prevent them from removing files if you expect it could be the last instance of the file.
> * dropbox-like : remote A removes a file -> *all other remotes* should remove the file, so that their "working copy" looks the same. BUT the file should still be available *somewhere* so that a restore can be initiated (preferably from any of these nodes)
> 
> I see two solutions here:
> - centralized: have 1 (or more) remotes that always keep a copy of the files which are being removed on all other remotes, these would be backup-nodes, they don't follow the strict "always in sync" rule that applies to the regular nodes. (they follow the original git-annex idea more strictly)
> - decentralized: allow users to "remove files" by removing the symlink, but still keep the blob in .git-annex on at least one of the nodes, so that it can be restored from that.

Yes, that's the default behavior if the symlink is removed. There is
then a git annex unused pass that can be used to find and remove unused
content when space is needed. Given the size of modern drives, that
could be run nightly or something.
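
A nightly job could look roughly like the sketch below. It assumes
that git annex unused prints one "NUMBER KEY" pair per indented line,
and that the numbers can then be handed to git annex dropunused; the
exact output format is an assumption here and should be checked
against the installed version before relying on it. The function names
are invented for the example.

```python
import re

# Assumed shape of a result line: leading whitespace, a number, a key.
_UNUSED_LINE = re.compile(r"^\s+(\d+)\s+(\S+)\s*$")


def parse_unused(output):
    """Extract (number, key) pairs from the text printed by git annex unused."""
    pairs = []
    for line in output.splitlines():
        match = _UNUSED_LINE.match(line)
        if match:
            pairs.append((int(match.group(1)), match.group(2)))
    return pairs


def dropunused_command(numbers):
    """Return the argv that would drop the listed unused items."""
    return ["git", "annex", "dropunused"] + [str(n) for n in numbers]
```

The job would capture the output of git annex unused, pass it through
parse_unused, and run the resulting dropunused command only when space
is actually needed.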

-- 
see shy jo