git annex get performance issues with rsync

Adam Spiers vcs-home at adamspiers.org
Wed Jan 18 17:56:51 CET 2012


On Wed, Jan 18, 2012 at 4:09 PM, Joey Hess <joey at kitenet.net> wrote:
> Adam Spiers wrote:
>> One of my USB drives just died, so I'm doing a 'git annex get --not
>> --copies 1' to re-attain data redundancy.  It seems that a new rsync
>> instance is invoked for each file?  In my case, I have thousands of
>> photos which are big enough to be worth annexing but still not
>> individually huge, so it seems that the overhead of each rsync
>> invocation is significantly impacting throughput.  A quick empirical
>> test showed that in 20 seconds, 'git annex get' managed to transfer
>> 11 photos, whereas a single (manual) rsync run transferred 33.  Is
>> this easily fixable?
>
> No, it's on the todo list but very far down it.

OK.  You mean this?

    http://git-annex.branchable.com/todo/parallel_possibilities/

> You can enable ssh's connection sharing though. (ControlMaster)

The figures above were already measured with ControlMaster enabled.
It helps, but invoking rsync once per file still hurts a lot.
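
For a rough sense of scale, here is a back-of-envelope sketch in Python
using only the two figures from my test (11 vs. 33 photos in 20
seconds).  It assumes the photos are of similar size and that the
single rsync run has negligible per-file startup cost, neither of which
I have verified; the 1000-file count is just an illustrative number,
and the helper names are made up for this example.

```python
# Back-of-envelope estimate of per-invocation rsync overhead.
# Inputs are the two measurements from the 20-second test; everything
# else is derived under the assumptions stated above.

WINDOW_SECONDS = 20
FILES_ONE_RSYNC_PER_FILE = 11   # 'git annex get', one rsync per file
FILES_SINGLE_RSYNC_RUN = 33     # one manual rsync run over many files


def seconds_per_file(files_transferred, window=WINDOW_SECONDS):
    """Average wall-clock time spent per file in the test window."""
    return window / files_transferred


per_file_cost = seconds_per_file(FILES_ONE_RSYNC_PER_FILE)
batched_cost = seconds_per_file(FILES_SINGLE_RSYNC_RUN)

# Extra time per file attributable to starting a fresh rsync each time
# (assumes the batched run's per-file startup cost is roughly zero).
overhead_per_file = per_file_cost - batched_cost

# How many times slower the per-file approach was in this test.
slowdown = FILES_SINGLE_RSYNC_RUN / FILES_ONE_RSYNC_PER_FILE


def estimated_extra_minutes(n_files):
    """Extra minutes spent on rsync startup for n_files (hypothetical count)."""
    return n_files * overhead_per_file / 60


if __name__ == "__main__":
    print(f"per-file rsync:  {per_file_cost:.2f} s/file")
    print(f"single rsync:    {batched_cost:.2f} s/file")
    print(f"overhead:        {overhead_per_file:.2f} s/file")
    print(f"slowdown factor: {slowdown:.1f}x")
    print(f"extra time for 1000 files: {estimated_extra_minutes(1000):.1f} min")
```

So if these numbers hold, each additional rsync invocation costs a bit
over a second, which adds up quickly over thousands of photos.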

