Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

git diff shows unicode symbols in angle brackets

I have a file containing Unicode text (Russian). When I fix a typo, I use git diff --color-words=. to see the changes I've made.

With Cyrillic characters, the output is garbled, with hex bytes shown in angle brackets:

$ cat p1
привет

$ cat p2
Привет

$ git diff --color-words=. --no-index p1 p2
diff --git 1/p1 2/p2
index d0f56e1..d84c480 100644
--- 1/p1
+++ 2/p2
@@ -1 +1 @@
<D0><BF><9F>ривет

It looks like git diff --color-words=. compares bytes rather than characters, which is not what I expected.

Is there a way to make git handle Unicode characters properly?
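
The byte-level reading can be checked outside git. A minimal sketch, assuming a POSIX shell with od, tr and sed available and a UTF-8 encoded script; utf8_hex is a hypothetical helper name, not a git or shell built-in. It prints the UTF-8 bytes of the two letters that differ between p1 and p2:

```shell
# Hypothetical helper: print the bytes of a string as space-separated hex.
utf8_hex() {
  # od dumps one hex byte per column; tr and sed collapse the padding.
  printf '%s' "$1" | od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

utf8_hex 'п'   # lowercase letter from p1, prints: d0 bf
utf8_hex 'П'   # uppercase letter from p2, prints: d0 9f
```

Both letters share the lead byte D0 and differ only in the second byte (BF versus 9F), which is consistent with the <D0><BF><9F> sequence in the diff output: one common byte, one removed byte, one added byte.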

Update about my environment: I get the same result on Mac OS and on a Linux host.

My shell variables are:

BASH=/bin/bash
HOSTTYPE=x86_64
LANG=ru_RU.UTF-8
OSTYPE=darwin10.0
PS1='\h:\W \u\$ '
SHELL=/bin/bash
SHELLOPTS=braceexpand:emacs:hashall:histexpand:history:interactive-comments:monitor
TERM=xterm-256color
TERM_PROGRAM=iTerm.app
_=-l

I have reset my git config to the default settings; it now reads:

$ git config -l
core.repositoryformatversion=0
core.filemode=true
core.bare=false
core.logallrefupdates=true
core.ignorecase=true

My git version:

$ git --version
git version 1.7.3.5

1 Reply

For me, less (the git pager) was to blame (thanks @kostix). To check whether that is your case too, try disabling the pager altogether:

git --no-pager diff p1 p2

My case was commit messages containing emojis; it's fundamentally the same problem though.

$ git log --oneline
93a1866 <U+1F43C>

$ git --no-pager log --oneline
93a1866 🐼

$ export LESS='--raw-control-chars'
$ git log --oneline
93a1866 🐼

$ git config --global core.pager 'less --raw-control-chars'
$ git log --oneline
93a1866 🐼

NB: the uppercase --RAW-CONTROL-CHARS option (-R) makes less pass through ANSI color escapes only; it will still munge other control chars (emoji included). The lowercase --raw-control-chars option (-r) passes everything through raw. My less is globally configured with --RAW-CONTROL-CHARS and my git pager with --raw-control-chars as above.
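
To try the raw pager mode for a single command without touching the global config or the LESS variable, one option is the GIT_PAGER environment variable, which git consults before core.pager. A small sketch; with_raw_pager is a hypothetical helper name introduced here for illustration:

```shell
# Hypothetical helper: run a command with less in raw mode as git's pager.
# GIT_PAGER applies to this one invocation only and overrides core.pager.
with_raw_pager() {
  GIT_PAGER='less --raw-control-chars' "$@"
}

# Usage (inside a repository, at an interactive terminal):
#   with_raw_pager git log --oneline
```

If the output looks right this way, the setting can then be made permanent with the git config command shown above.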

