Life is Thinking & Feeling: little about Python and Unicode

Friday, May 04, 2007

little about Python and Unicode

The following is excerpted from "All about Python and Unicode":
In fact, you should forget all about bytes and think of Unicode strings as sets of symbols.
Unicode objects have no fixed computer representation.
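
As a rough illustration of the "no fixed computer representation" point, here is a minimal sketch. It assumes Python 3, where `str` is the Unicode type and `bytes` is a separate type (the session in this post is Python 2). The sample string and the variable names are mine, not from the quoted article. The same two symbols become three different byte sequences, and each one decodes back to the same string.

```python
# Python 3 sketch: one string of symbols, several possible byte representations.
text = "\u0bc3\u1451"  # two code points, chosen only for illustration

encodings = ("utf-8", "utf-16-le", "utf-32-le")
encoded = {name: text.encode(name) for name in encodings}

for name, data in encoded.items():
    # The byte length differs per encoding, but the decoded text is identical.
    print(name, len(data), data)
    assert data.decode(name) == text
```

None of these byte sequences is "the" string; each is just one codec's way of writing it down.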

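The example below shows that Unicode really is just a "symbolic representation". The actual data is represented byte by byte, in the form '\xNN', and the job of a codec is to convert between the two.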
>>> uni2 = u'\x1a\u0bc3\u1451\U0001d10c'
>>> b = repr(uni2)
>>> b
"u'\\x1a\\u0bc3\\u1451\\U0001d10c'"
>>> uni2.encode("utf-8")
'\x1a\xe0\xaf\x83\xe1\x91\x91\xf0\x9d\x84\x8c'
>>> print uni2.encode("utf-8")
喁冡憫饾剬
>>> print uni2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'gbk' codec can't encode character u'\u0bc3' in position 1: illegal multibyte sequence
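
The session above can be replayed as a script. This is a sketch that assumes Python 3 and uses the `gbk` codec directly in place of a GBK console; names such as `gbk_error` and `fallback` are illustrative only. It shows the two effects seen above: the UTF-8 bytes turn into unrelated characters when read as GBK, and encoding the string itself to GBK fails at the first symbol GBK cannot represent.

```python
# Python 3 sketch of the session above; "gbk" stands in for the console encoding.
uni2 = "\x1a\u0bc3\u1451\U0001d10c"

# Encoding turns the symbols into concrete bytes.
utf8_bytes = uni2.encode("utf-8")
print(utf8_bytes)

# Reading those UTF-8 bytes as if they were GBK gives unrelated characters,
# which is what printing the encoded string on a GBK console displays.
print(utf8_bytes.decode("gbk", errors="replace"))

# Printing the Unicode string itself requires encoding it to GBK first,
# and GBK has no mapping for U+0BC3, so the strict encode fails.
try:
    uni2.encode("gbk")
    gbk_error = None
except UnicodeEncodeError as exc:
    gbk_error = exc
    print("cannot encode:", exc)

# One possible workaround: escape whatever the target codec cannot represent.
fallback = uni2.encode("gbk", errors="backslashreplace")
print(fallback)
```

The `backslashreplace` handler is only one option; `"replace"` or `"ignore"` also avoid the exception, at the cost of losing the original characters.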
