In fact, you should forget all about bytes and think of Unicode strings as sets of symbols.
Unicode objects have no fixed computer representation.
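The example below shows that Unicode really is just a "symbolic representation"; the actual data representation is a sequence of bytes of the form '\xNN', one byte after another. The job of a codec is to convert between the two.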
>>> b = repr(uni2)
>>> b
"u'\\x1a\\u0bc3\\u1451\\U0001d10c'"
>>> uni2.encode("utf-8")
'\x1a\xe0\xaf\x83\xe1\x91\x91\xf0\x9d\x84\x8c'
>>> print uni2.encode("utf-8")
喁冡憫饾剬
>>> print uni2
Traceback (most recent call last): File "
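The same idea can be sketched in Python 3, where `str` holds the symbols and `bytes` holds one encoded form of them. This is only an illustration: it assumes `uni2` is the four-symbol string shown by `repr` above, and the choice of UTF-16 and ASCII as comparison encodings is mine, not part of the original session.

```python
# Python 3 sketch: the same symbols, several possible byte representations.
# Assumption: uni2 is the string whose repr appears in the session above.
uni2 = "\x1a\u0bc3\u1451\U0001d10c"

# Encoding turns symbols into bytes; the result depends on the codec chosen.
utf8 = uni2.encode("utf-8")
utf16 = uni2.encode("utf-16-le")

print(len(uni2))    # number of symbols (code points)
print(len(utf8))    # number of bytes in the UTF-8 form
print(len(utf16))   # number of bytes in the UTF-16 (little-endian) form

# Decoding with the matching codec recovers the original symbols.
assert utf8.decode("utf-8") == uni2
assert utf16.decode("utf-16-le") == uni2

# A codec that cannot represent these symbols refuses to encode them.
try:
    uni2.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
print(ascii_failed)
```

The string has the same four symbols throughout; only the byte sequence changes with the codec, which is why the string itself should not be thought of as bytes.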