I'm an embedded software engineer, and I'm currently learning English. I like photography. I also like programming in Python, and I'm learning TurboGears for web development!

Friday, May 04, 2007

A little about Python and Unicode

The content here is excerpted from "All about Python and Unicode":
In fact, you should forget all about bytes and think of Unicode strings as sets of symbols.
Unicode objects have no fixed computer representation.
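To make that last sentence concrete, here is a minimal sketch. It assumes Python 3, where `str` plays the role of the `unicode` type discussed in this post; the sample string `text` is only an illustration. The same two symbols turn into different byte sequences depending on which codec is asked to encode them.

```python
# Python 3 sketch: one Unicode string, several possible byte representations.
text = "\u0bc3\u1451"  # two code points, chosen only for illustration

utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")
utf32_bytes = text.encode("utf-32-le")

print(len(text))     # number of symbols (code points), not bytes
print(utf8_bytes)    # bytes produced by the UTF-8 codec
print(utf16_bytes)   # bytes produced by the UTF-16 (little-endian) codec
print(utf32_bytes)   # bytes produced by the UTF-32 (little-endian) codec

# Decoding with the matching codec gives back the same symbols.
assert utf8_bytes.decode("utf-8") == text
assert utf16_bytes.decode("utf-16-le") == text
```

None of these byte strings is "the" representation of the text; each is just what one particular codec produces.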

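The example below shows that Unicode really is just a "symbolic representation"; the actual data representation is a sequence of individual bytes such as '\xNN', so the job of a codec is to convert between the two.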
>>> uni2 = u'\x1a\u0bc3\u1451\U0001d10c'
>>> b = repr(uni2)
>>> b
"u'\\x1a\\u0bc3\\u1451\\U0001d10c'"
>>> uni2.encode("utf-8")
'\x1a\xe0\xaf\x83\xe1\x91\x91\xf0\x9d\x84\x8c'
>>> print uni2.encode("utf-8")
喁冡憫饾剬
>>> print uni2
Traceback (most recent call last):
  File "", line 1, in
UnicodeEncodeError: 'gbk' codec can't encode character u'\u0bc3' in position 1: illegal multibyte sequence
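The session above can be reproduced without a GBK console. This is a rough sketch, again assuming Python 3 (`str` instead of `unicode`); the variable names are mine, and asking for the "gbk" codec explicitly stands in for what `print` did implicitly on my console. Encoding to UTF-8 always works, encoding to GBK fails at the first symbol GBK has no bytes for, and printing UTF-8 bytes on a GBK console amounts to decoding them with the wrong codec.

```python
# Python 3 sketch of the REPL session above.
uni2 = "\x1a\u0bc3\u1451\U0001d10c"

# UTF-8 can represent every code point, so this always succeeds.
utf8_bytes = uni2.encode("utf-8")
print(utf8_bytes)

# GBK has no bytes for U+0BC3, so the codec raises an error.
try:
    uni2.encode("gbk")
    failed_at = None
except UnicodeEncodeError as exc:
    failed_at = exc.start  # index of the first symbol that could not be encoded
    print("gbk failed at position", failed_at, "-", exc.reason)

# One way around it: ask the codec to substitute a placeholder instead.
safe = uni2.encode("gbk", errors="replace")
print(safe)

# Reading the UTF-8 bytes as if they were GBK yields unrelated characters,
# which is what the garbled line in the session above shows.
mojibake = utf8_bytes.decode("gbk", errors="replace")
print(mojibake)
```

Whether to use `errors="replace"`, `errors="ignore"`, or a different output encoding depends on where the text is going; for a console, setting it to UTF-8 is usually the cleaner fix.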
