(valid) criticisms of Git addressed

Juliano F. Ravasi ml at juliano.info
Wed Aug 27 20:58:03 CEST 2008


martin f krafft wrote:
> If you emptied your inbox, why keep it around? I expect the tools
> I use to recreate empty directories as needed.

Yes, but some programs don't expect their directories to disappear. The
mail example was just that: an example.

> There is one thing to be said in favour of in-filesystem metadata,
> such as .gitattributes — conflicts in those are no different than
> conflicts in content files, and all of the standard and advanced
> conflict resolution mechanisms (merge drivers, git-rerere, etc.) can
> be used for those just as well. Surely, this could be remedied by
> exposing the metadata layer as files in the event of conflicts, but
> that would be a hack in my world, and likely come with other
> problems.

In this case it is very similar to Subversion. When conflicts happen in
properties, Subversion acts as if their containers (files or
directories) were directories, and the properties were files (except
that properties can't be copied or renamed). Then everything works just
as it does for files themselves.

In a sense, Svn properties are "small files inside files".

> This has not happened to me before, or well, it's not bitten me.
> Do you mean something like:

No... I mean, for binary files. Most binary formats we use today are
compressed, and the smallest change causes an "avalanche effect" that
makes the resulting file completely different from the original. It is
virtually impossible for Git to recognize the new file as a modified
copy of the old one. See this example:

# Create and commit test draft of image:

~/tmp/playground% convert -font DejaVu-Sans-Book -pointsize 72 label:Test draft.png
~/tmp/playground% git add draft.png
~/tmp/playground% git commit -m "First version of image."
Created initial commit 9325950: First version of image.
 1 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 draft.png

# Change a single pixel in image and rename it to its final name:

~/tmp/playground% convert draft.png -draw 'point 1,1' final.png
~/tmp/playground% rm draft.png
~/tmp/playground% git add final.png

# Commit and check:

~/tmp/playground% git commit -a -m "Final version of image."
Created commit ff1506c: Final version of image.
 2 files changed, 0 insertions(+), 0 deletions(-)
 delete mode 100644 draft.png
 create mode 100644 final.png
~/tmp/playground% git log -M --follow final.png
commit ff1506c6c6e99773c989fc61e8c0e9d73a0cf2db
Author: Juliano F. Ravasi <...>
Date:   Wed Aug 27 15:29:50 2008 -0300

    Final version of image.


See? Just a single-pixel change to an image, together with a rename from
draft.png to final.png, broke the history, because Git doesn't record
renames. It relies on content-similarity heuristics that may be valid
for text files, but not for other kinds of files.

Any human is capable of looking at both draft.png and final.png and
seeing clearly that they are almost the same file, and it makes complete
sense for them to share the history (since final.png was created based
on draft.png). But Git is not smart enough to look inside images to
check that the only difference between them is a single pixel... and it
wasn't designed for this purpose.
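
To make the "avalanche effect" concrete outside of Git, here is a rough
sketch in Python. It is only an illustration under assumptions: zlib
stands in for the compression inside an image format, difflib's
SequenceMatcher stands in for a content-similarity heuristic (it is not
Git's algorithm), and make_payload(), similarity() and compare() are
names made up for this example. One byte of the uncompressed data is
changed, and similarity is measured before and after compression.

```python
import random
import zlib
from difflib import SequenceMatcher

# Hypothetical vocabulary; it only exists to produce compressible data.
WORDS = [b"git", b"commit", b"rename", b"image",
         b"pixel", b"draft", b"final", b"history"]


def make_payload(size: int = 4000, seed: int = 0) -> bytes:
    """Build compressible pseudo-text standing in for raw pixel data."""
    rng = random.Random(seed)
    out = bytearray()
    while len(out) < size:
        out += rng.choice(WORDS) + b" "
    return bytes(out[:size])


def similarity(a: bytes, b: bytes) -> float:
    """Similarity ratio in [0, 1] (a stand-in heuristic, not Git's)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def compare(size: int = 4000) -> tuple[float, float]:
    """Return (similarity of raw data, similarity of compressed data)."""
    original = make_payload(size)
    edited = bytearray(original)
    edited[0] ^= 0xFF  # the "single pixel" change: flip one byte
    edited = bytes(edited)
    raw = similarity(original, edited)
    packed = similarity(zlib.compress(original), zlib.compress(edited))
    return raw, packed


if __name__ == "__main__":
    raw, packed = compare()
    print(f"raw similarity:        {raw:.4f}")
    print(f"compressed similarity: {packed:.4f}")
```

The raw data stays almost identical, while the compressed streams
generally share far less. For comparison, Git's -M option uses a 50%
similarity threshold by default when deciding whether a delete plus a
create counts as a rename.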

> If you store the encoding along with the filename, you'll run into
> a whole slew of other issues when transcoding.
> 
> My solution to this is just to have UTF-8 everywhere. I am all too
> glad to have waved goodbye to all those encoding nightmares that
> were iso8859-* and ascii.

Yes, but unfortunately, there are many issues that push people to keep
using legacy encodings. They are legacy, but not obsolete. Tons of
Portuguese-localized systems still rely on ISO-8859-1, tons of
Japanese-localized systems still rely on Shift-JIS or EUC-JP, and so
on... It is not simple to just convert everything to UTF-8; it is
something that must be planned, tested, etc.

> I think this is a feature. If we keep adding backwards-compatibility
> layers to tools, we not only make them bigger, more error-prone, and
> harder to maintain, but we also slow down the transition to better
> times.

You have a point. But even if Git embraces UTF-8 and encourages everyone
to use it, it should at least detect and reject any input that is not
normalized UTF-8, so that you don't end up with things like two files
with the same name in the repository, or names that can't be given any
Unicode interpretation (which is necessary when porting to Windows and
Mac OS X).
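
As a sketch of the kind of check I mean (Python, and not anything Git
actually implements): decode the raw name as UTF-8 and compare it with
its normalized form. The function name check_filename() and the choice
of NFC as the required form are assumptions for illustration; a real
tool would have to decide which normalization form to enforce.

```python
import unicodedata


def check_filename(raw: bytes, form: str = "NFC") -> str:
    """Return the decoded name, or raise ValueError if the bytes are
    not valid UTF-8 or the name is not in the given normalization form.
    """
    try:
        name = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(f"not valid UTF-8: {raw!r}") from exc
    if unicodedata.normalize(form, name) != name:
        raise ValueError(f"not {form}-normalized: {name!r}")
    return name


if __name__ == "__main__":
    composed = "a\u00e7\u00e3o.txt"                       # precomposed
    decomposed = unicodedata.normalize("NFD", composed)  # base + combining
    candidates = [
        composed.encode("utf-8"),
        decomposed.encode("utf-8"),
        composed.encode("iso-8859-1"),
    ]
    for raw in candidates:
        try:
            print("accepted:", check_filename(raw))
        except ValueError as err:
            print("rejected:", err)
```

The first two candidates display as the same name but are different byte
strings, which is exactly how two files with "the same name" can end up
in one repository. The third is the same name in ISO-8859-1, which does
not decode as UTF-8 at all.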

Regards,

-- 
Juliano F. Ravasi ·· http://juliano.info/
5105 46CC B2B7 F0CD 5F47 E740 72CA 54F4 DF37 9E96

"A candle loses nothing by lighting another candle." -- Erin Majors

* NOTE: Don't try to reach me through this address, use "contact@" instead.

