git annex get performance issues with rsync
Adam Spiers
vcs-home at adamspiers.org
Wed Jan 18 17:56:51 CET 2012
On Wed, Jan 18, 2012 at 4:09 PM, Joey Hess <joey at kitenet.net> wrote:
> Adam Spiers wrote:
>> One of my USB drives just died, so I'm doing a 'git annex get --not
>> --copies 1' to re-attain data redundancy. It seems that a new rsync
>> instance is invoked for each file? In my case, I have thousands of
>> photos which are big enough to be worth annexing but still not
>> individually huge, so it seems that the overhead of each rsync
>> invocation is significantly impacting throughput. A quick empirical
>> test showed that in 20 seconds 'git annex get' transferred 11
>> photos, whereas a single (manual) rsync run transferred 33. Is this
>> easily fixable?
>
> No, it's on the todo list but very far down it.

OK. You mean this?

http://git-annex.branchable.com/todo/parallel_possibilities/

> You can enable ssh's connection sharing though. (ControlMaster)

The figures above were already with ControlMaster enabled.
It helps, but the rsync invocation per file still hurts a lot.
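
For a rough sense of scale, the figures quoted above can be turned into a per-file cost. This is only a back-of-the-envelope sketch: it assumes the photos in both 20-second runs were of broadly similar size, so that the whole difference can be blamed on per-invocation overhead. The function and constant names are made up for illustration.

```python
# Figures quoted in the message: files transferred in a 20-second window.
WINDOW_SECONDS = 20
FILES_PER_FILE_RSYNC = 11  # 'git annex get', one rsync invocation per file
FILES_SINGLE_RSYNC = 33    # one manual rsync run covering many files


def seconds_per_file(files, window=WINDOW_SECONDS):
    """Average wall-clock time spent per transferred file."""
    return window / files


def estimated_overhead_per_file():
    """Extra time per file attributed to starting a fresh rsync.

    Assumption: file sizes are comparable across both runs, so the
    difference in per-file time is all invocation overhead.
    """
    return (seconds_per_file(FILES_PER_FILE_RSYNC)
            - seconds_per_file(FILES_SINGLE_RSYNC))


# How many times faster the single rsync run was.
speedup = FILES_SINGLE_RSYNC / FILES_PER_FILE_RSYNC

if __name__ == "__main__":
    print(f"per-file rsync: {seconds_per_file(FILES_PER_FILE_RSYNC):.2f} s/file")
    print(f"single rsync:   {seconds_per_file(FILES_SINGLE_RSYNC):.2f} s/file")
    print(f"overhead:       {estimated_overhead_per_file():.2f} s/file")
    print(f"speedup:        {speedup:.1f}x")
```

Under that assumption the single run is 3x faster, and each extra rsync invocation costs on the order of 1.2 seconds, even with ControlMaster enabled.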