Building a tool to make the process automated

Lois Desplat lois.desplat at gmail.com
Wed Jun 13 03:24:50 CEST 2007


> I'd like to know a little more about Bazaar-NG's suitability for this
> task as compared to Subversion.


1. Performance and Repository Size
Bazaar has improved performance considerably over the last few versions, and
the developers are continuing to do so.
The link below is slightly outdated, but it shows some performance and size
measurements. It is important to note that Bzr has improved both performance
and repository size a lot since then, so I am not too worried about the
results it got there.
http://www.ada-france.org/debian/distributed-version-control-systems.html
Then there is this link:
http://keithp.com/blog/Repository_Formats_Matter.html
which includes this passage about SVN:

"The FSFS backend places one file per revision in a single directory; a test
import of Mozilla generated hundreds of thousands of files in this
directory, causing performance to plummet as more revisions were imported.
I'm not sure what each file contains, but it seems like revisions are
written as deltas to an existing revision, making damage to one file
propagate down through generations. Lack of strong error detection means
such errors will be undetected by the repository. CVS used to suffer badly
from this when NFS would randomly zero out blocks of files.

The Mozilla CVS repository was 2.7GB, imported to Subversion it grew to
8.2GB. Under Git, it shrunk to 450MB. Given that a Mozilla checkout is
around 350MB, it's fairly nice to have the whole project history (from 1998)
in only slightly more space."

Obviously the article is about Git, but it does show that SVN has huge
repository size requirements, which is definitely not wanted here. It is
interesting, though, that so many of you have used SVN without having any
problems with repository size. I really thought that was going to be a major
problem.

A big advantage of Bzr is that it is distributed, so you can back up or put
your repository almost anywhere (which partly solves the svn external issue).
Another is that you can push your changes using ssh, http or many other
methods.

Another advantage is that you don't get .svn directories everywhere. With
bzr you just get a single .bzr directory at the root of your repository. Yet
another advantage is lightweight checkouts, which let the history be stored
on a remote server or an external drive.

> I don't version caches. In general, I prefer to assume something is to
> be left unversioned unless I explicitly add it, although with
> appropriate configuration (and I expect this will involve writing a lot
> of policies and having an extensible mechanism for specifying policies)
> I could be open to changing that behavior.


Yes, it does. Once I get past the initial planning/design stage, I really
want to implement the main tools so that I can call for a round of testing
and get all these policies in place.
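To make the "unversioned unless explicitly added" idea concrete, here is a
minimal Python sketch of one possible policy mechanism. The VersioningPolicy
class, its include/exclude glob lists and the example paths are all
hypothetical (nothing here is part of SVN or Bzr); the only design assumption
is that exclusions win over inclusions, so a cache inside an included tree
stays out.

```python
import fnmatch
from dataclasses import dataclass, field


@dataclass
class VersioningPolicy:
    """Hypothetical policy: a path is unversioned unless explicitly included.

    Exclusions are checked first, so a cache directory inside an included
    tree still stays unversioned.
    """
    include: list = field(default_factory=list)
    exclude: list = field(default_factory=list)

    def should_version(self, path: str) -> bool:
        # Exclusions always take precedence over inclusions.
        if any(fnmatch.fnmatchcase(path, pat) for pat in self.exclude):
            return False
        # Default is "leave unversioned" unless an include pattern matches.
        return any(fnmatch.fnmatchcase(path, pat) for pat in self.include)


# Example patterns (assumptions); note that fnmatch's "*" also matches "/".
policy = VersioningPolicy(
    include=[".bashrc", ".vimrc", ".config/*"],
    exclude=["*/cache/*", ".cache/*"],
)

for p in [".bashrc", ".config/app/settings.ini", ".config/app/cache/blob",
          ".cache/thumb.png", "notes.txt"]:
    print(p, policy.should_version(p))
```

Keeping the patterns as plain data makes it easy to load them from a per-user
config file later, which is one way to get the extensible mechanism the
quoted message asks for.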

> One thing I've noticed is applications whose configurations you'd like
> to keep, but that automatically update metadata whenever you read the
> files, whether you care about it or not. This leads to lots of changes
> that you commit, even though you don't really want to. (Like bookmarks
> files, and other things that I discuss on the wiki). Having ways of
> dealing with this would be nice.


I found that a bit annoying with my $HOME directory under revision control.
It seems this could add a lot of unnecessary garbage (metadata) to the VCS,
even though keeping those files is useful.
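One way to deal with files that change on every read is to commit them as-is
but skip the commit when the only differences are volatile metadata lines.
This is a minimal Python sketch of that check; the `last_visited=` and
`last_opened=` line formats and the function names are hypothetical, and each
real application would need its own patterns.

```python
import re

# Hypothetical patterns for lines an application rewrites on every read.
VOLATILE = [re.compile(r"^\s*last_visited\s*="),
            re.compile(r"^\s*last_opened\s*=")]


def normalize(text: str, volatile=VOLATILE) -> str:
    """Drop lines matching any volatile pattern; keep everything else verbatim."""
    kept = [line for line in text.splitlines()
            if not any(p.search(line) for p in volatile)]
    return "\n".join(kept)


def has_real_change(old: str, new: str) -> bool:
    """True only if the two versions differ once volatile lines are ignored."""
    return normalize(old) != normalize(new)


old = "url=http://example.org\nlast_visited=1181690000\n"
new = "url=http://example.org\nlast_visited=1181699999\n"
edited = "url=http://example.org/new\nlast_visited=1181699999\n"

print(has_real_change(old, new))     # False: only the timestamp moved
print(has_real_change(old, edited))  # True: the bookmark itself changed
```

The file content is never rewritten here; the normalized form is only used to
decide whether a change is worth committing.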

> > Under SVN, how do you make sure that the repository does not grow too
> > large over the years? I would assume that after a few years, you
> > might want to just keep a monthly granularity of your backups so as
> > to reduce the size of the repository?
>
> Hasn't been a problem for me yet, in fact with my backup script (also on
> the wiki somewhere) I've found that my mail spool grows more quickly
> than my repository.
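The "monthly granularity" idea from my question can be sketched as a simple
selection rule: keep every revision after some cutoff date, and for older
history keep only the last revision of each calendar month. This minimal
Python sketch only computes which revision numbers would survive; the
function name and sample history are hypothetical, and actually dropping
revisions from Subversion would mean rebuilding the repository, which is
outside this sketch.

```python
from datetime import datetime


def monthly_keepers(revisions, cutoff):
    """Return the sorted revision numbers to keep.

    revisions: iterable of (rev_number, datetime) pairs.
    Revisions at or after `cutoff` are all kept; older ones are thinned to
    the latest revision of each calendar month.
    """
    keep = set()
    last_in_month = {}  # (year, month) -> (datetime, rev) of latest old revision
    for rev, when in revisions:
        if when >= cutoff:
            keep.add(rev)
            continue
        key = (when.year, when.month)
        if key not in last_in_month or when > last_in_month[key][0]:
            last_in_month[key] = (when, rev)
    keep.update(rev for _, rev in last_in_month.values())
    return sorted(keep)


history = [
    (1, datetime(2005, 1, 3)),
    (2, datetime(2005, 1, 20)),
    (3, datetime(2005, 2, 7)),
    (4, datetime(2007, 6, 1)),
    (5, datetime(2007, 6, 12)),
]
print(monthly_keepers(history, cutoff=datetime(2007, 1, 1)))  # [2, 3, 4, 5]
```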


Is there any way I could get some numbers? How big is your $HOME directory
without the version control files, and how big is the repository?
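For anyone willing to share numbers, this minimal Python sketch measures a
directory tree while skipping version-control metadata directories. The set
of directory names (.svn, .bzr, .git) and the function name are assumptions;
`du` would give similar figures.

```python
import os

# Directory names treated as version-control metadata (assumption).
ADMIN_DIRS = {".svn", ".bzr", ".git"}


def tree_size(root: str, skip_dirs=ADMIN_DIRS) -> int:
    """Total bytes of files under root, not descending into skip_dirs."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into metadata directories.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks so nothing is counted through a link
            try:
                total += os.path.getsize(path)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total
```

Something like `tree_size(os.path.expanduser("~"))` would give the size of
the working copy, and `tree_size(repo_path, skip_dirs=set())`, with
`repo_path` pointing at the repository on the server, would give the
repository size for comparison.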

> I think a big advantage is to keep the file around in the repository so
> that if I need to find it later, I can. What I do need is a better way
> to search through the repository and restore something when I need it.
> The svn command never seems to do it quite right, and when it does,
> it's not easy. Any ideas?


Apple unveiled Time Machine, and I like how you can just search for a file
that has been deleted and get it back through the Finder/Time Machine
interface. I guess adding a search command/box that returns all the matching
files that have ever been in the repository (visually or on the command line)
would fix that.
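A rough command-line version of that search box could scan the verbose log
for deletions. This minimal Python sketch assumes its input is the output of
`svn log --verbose --xml`, where each `<path>` element carries an `action`
attribute (`D` for deletions); the function name and sample log are
hypothetical. Since a file deleted in revision N still exists in revision
N - 1, the old content should then be retrievable with something like
`svn cat URL@41` for the hit below.

```python
import fnmatch
import posixpath
import xml.etree.ElementTree as ET


def find_deleted(log_xml: str, pattern: str):
    """Return (revision, path) pairs for deleted paths whose name matches pattern.

    log_xml is assumed to be the output of `svn log --verbose --xml`.
    """
    hits = []
    root = ET.fromstring(log_xml)
    for entry in root.iter("logentry"):
        rev = int(entry.get("revision"))
        for p in entry.iter("path"):
            path = (p.text or "").strip()
            # Match only on the file name, so "*.txt" finds it in any directory.
            if p.get("action") == "D" and fnmatch.fnmatchcase(
                    posixpath.basename(path), pattern):
                hits.append((rev, path))
    return hits


sample = """<?xml version="1.0"?>
<log>
<logentry revision="42">
<author>lois</author>
<date>2007-06-12T10:00:00.000000Z</date>
<paths>
<path action="D">/home/notes/todo.txt</path>
<path action="M">/home/.bashrc</path>
</paths>
<msg>clean up</msg>
</logentry>
</log>"""

print(find_deleted(sample, "*.txt"))  # [(42, '/home/notes/todo.txt')]
```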

> I suppose the flip side of this is that if you are ever the target of
> litigation or espionage, then you can't plausibly "lose" your old files
> (I'm not sure whether this is allowed or not), but I imagine the people
> developing software in a corporate setting are more concerned about
> this than most home users. It's also possible that if you really are
> concerned then you shouldn't be using version control anyway. (Although
> I'm open to being convinced otherwise about these)


I guess the deletion issue isn't that big of a problem then. It would still
be nice to be able to fully delete a file if you want, but I see now that it
is a very secondary feature.

Thank you for your response,

Lois.


