Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Programmatically tell if a Unicode character takes up more than one character space in a terminal

I discovered that in the Mac OS X Terminal, some Unicode characters take up more than one character space. One example is U+27FC (LONG RIGHTWARDS ARROW FROM BAR), ⟼. It prints two characters wide, but the second half prints on top of whatever character comes next, so you have to write ⟼<space> for it to display correctly. For example, ⟼a prints like this: [image: the arrow overlapping the letter a] (I made the font size large so that you could see it, but it happens at all font sizes).

By the way, this is the Menlo font in the Mac OS X 10.6 Terminal application.

23B3 (SUMMATION TOP) actually prints two characters wide and two characters tall. At least in Safari, it does this in the browser too; notice how it overlaps with the line above.

However, in the terminal in Ubuntu, none of these characters print wider or taller than one character.

Is there a way to programmatically tell if a character takes up more than one space?

I'm using Python, so something that works either in pure Python or on POSIX (i.e., I can call some bash command using the os module) would be preferred.

Also, I should note that if I increase the "Character Spacing" setting in the terminal's font settings to 1.5 (from the default 1.0), the output looks like this: [image: the arrow followed by the letter a, spaced apart].

Also, it'd be nice if an answer could give some insight into all of this (i.e., why does it happen?)

1 Reply

While it's not relevant to the specific examples you give (all of which display at the size of a single character for me on Ubuntu), CJK characters have a Unicode property indicating that they are wider than normal, and they display at double width in some terminals.

For example, in Python:

import unicodedata

# 'a' is a normal (narrow) character; ASCII letters are classified 'Na'
assert unicodedata.east_asian_width('a') == 'Na'
# '愛' can be interpreted as a double-width (wide) character
assert unicodedata.east_asian_width('愛') == 'W'

Apart from this, I don't think there's a specification for how much space certain characters should take up, other than the size of the glyph in whatever font you are using (which your terminal is probably ignoring for the reason Ignacio gave).

For more info on the "East Asian Width" property, see http://www.unicode.org/reports/tr11/
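If you want a rough per-string estimate, the property can be folded into a small helper. The following is only a sketch under some assumptions: `display_width` is a hypothetical name, it assumes the terminal draws wide ('W') and fullwidth ('F') characters in two columns and combining marks in zero, and it counts everything else as one column. It cannot detect font-specific overflow like the arrow in the question, because that information is not in the Unicode data.

```python
import unicodedata


def display_width(text):
    """Estimate how many terminal columns `text` occupies (hypothetical helper)."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Assumption: combining marks are drawn over the previous character.
            continue
        # Assumption: 'W' (wide) and 'F' (fullwidth) take two columns,
        # every other character takes one.
        width += 2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1
    return width


if __name__ == '__main__':
    for sample in ('abc', '愛', 'a愛'):
        print(repr(sample), display_width(sample))
```

Characters classified as ambiguous ('A') are counted as one column here; some terminals configured for East Asian locales draw them at double width, so treat the result as an estimate rather than a guarantee.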

