What's New in Python 2.2
- Author: A.M. Kuchling
Introduction
This article explains the new features in Python 2.2.2, released on October 14, 2002. Python 2.2.2 is a bugfix release of Python 2.2, which was originally released on December 21, 2001.
Python 2.2 can be thought of as the "cleanup release". There are a few features such as generators and iterators that are completely new, but most of the changes, significant and far-reaching though they may be, are aimed at cleaning up irregularities and dark corners of the language design.
This article doesn't attempt to provide a complete specification of the new features, but instead provides a convenient overview. For full details, you should refer to the documentation for Python 2.2, such as the Python Library Reference and the Python Reference Manual. If you want to understand the complete implementation and design rationale for a change, refer to the PEP for a particular new feature.
PEPs 252 and 253: Type and Class Changes
The largest and most far-reaching changes in Python 2.2 are to Python's model of objects and classes. The changes should be backward compatible, so your code will continue to run unchanged, but the changes provide some amazing new capabilities. Before beginning this, the longest and most complicated section of this article, I'll provide an overview of the changes and offer some comments.
A long time ago I wrote a web page listing flaws in Python's design. One of the most significant flaws was that it's impossible to subclass Python types implemented in C. In particular, it's not possible to subclass built-in types, so you can't just subclass, say, lists in order to add a single useful method to them. The UserList module provides a class that supports all of the methods of lists and that can be subclassed further, but there's a lot of C code that expects a regular Python list and won't accept a UserList instance.
Python 2.2 fixes this, and in the process adds some exciting new capabilities. A brief summary:
- You can subclass built-in types such as lists and even integers, and your subclasses should work in every place that requires the original type. This makes Python's object-oriented programming more flexible and powerful. (A small sketch follows this list.)
- You can now define static methods and class methods, in addition to the instance methods available in previous versions of Python, giving you more flexibility in organizing the behaviour of your classes.
- It's also possible to automatically call methods on accessing or setting an instance attribute by using a new mechanism called properties. Many uses of __getattr__() can be rewritten to use properties instead, making the resulting code simpler and faster. As a small side benefit, attributes can now have docstrings, too.
- The list of legal attributes for an instance can be limited to a particular set using __slots__, making it possible to safeguard against typos and also make further optimizations possible in future versions of Python.
Some users have voiced concern about all these changes. Sure, they say, the new features are neat and lend themselves to all sorts of tricks that weren't possible in previous versions of Python, but they also make the language more complicated. Some people have said that they've always recommended Python for its simplicity, and feel that its simplicity is being lost.
Personally, I think there's no need to worry. Many of the new features are quite esoteric, and you can write a lot of Python code without ever needing to be aware of them. Writing a simple class is no more difficult than it ever was, so you don't need to bother learning or teaching them unless they're actually needed. Some very complicated tasks that were previously only possible from C are now possible in pure Python, and to my mind that's all for the better.
I'm not going to attempt to cover every single corner case and small change that were required to make the new features work. Instead this section will paint only the broad strokes. See the Related Links section for further sources of information about Python 2.2's new object model.
Old and New Classes
First, you should know that Python 2.2 really has two kinds of classes: classic (or old-style) classes, and new-style classes. The old-style class model is exactly the same as the class model in earlier versions of Python. All the new features described in this section apply only to new-style classes. This divergence isn't intended to last forever; eventually old-style classes will be dropped, possibly in Python 3.0.
So how do you define a new-style class? You do it by subclassing an existing new-style class. Most of Python's built-in types, such as integers, lists, dictionaries, and even files, are new-style classes now. A new-style class named object, the base class for all built-in types, has also been added, so if no built-in type is suitable, you can just subclass object:
class C(object):
    def __init__ (self):
        ...
    ...
This means that class statements that don't have any base classes are always classic classes in Python 2.2. (Actually, you can also change this by setting a module-level variable named __metaclass__ (see PEP 253 for the details), but it's easier to just subclass object.)
The type objects for the built-in types are available as built-ins, named using a clever trick. Python has always had built-in functions named int(), float(), and str(). In 2.2, they aren't functions any more, but type objects that behave as factories when called.
>>> int
<type 'int'>
>>> int('123')
123
To make the set of types complete, new type objects such as dict() and file() have been added. Here's a more interesting example, adding a lock() method to file objects:
class LockableFile(file):
    def lock (self, operation, length=0, start=0, whence=0):
        import fcntl
        return fcntl.lockf(self.fileno(), operation,
                           length, start, whence)
The now-obsolete posixfile module contained a class that emulated all of a file object's methods and also added a lock() method, but this class couldn't be passed to internal functions that expected a built-in file, something which is possible with our new LockableFile.
Descriptors
In previous versions of Python, there was no consistent way to discover what attributes and methods were supported by an object. There were some informal conventions, such as defining __members__ and __methods__ attributes that were lists of names, but often the author of an extension type or a class wouldn't bother to define them. You could fall back on inspecting the __dict__ of an object, but when class inheritance or an arbitrary __getattr__() hook were in use this could still be inaccurate.
The one big idea underlying the new class model is that an API for describing the attributes of an object using descriptors has been formalized. Descriptors specify the value of an attribute, stating whether it's a method or a field. With the descriptor API, static methods and class methods become possible, as well as more exotic constructs.
Attribute descriptors are objects that live inside class objects, and have a few attributes of their own:
- __name__ is the attribute's name.
- __doc__ is the attribute's docstring.
- __get__(object) is a method that retrieves the attribute value from object.
- __set__(object, value) sets the attribute on object to value.
- __delete__(object, value) deletes the value attribute of object.
For example, when you write obj.x, the steps that Python actually performs are:
descriptor = obj.__class__.x
descriptor.__get__(obj)
For methods, descriptor.__get__() returns a temporary object that's callable, and wraps up the instance and the method to be called on it. This is also why static methods and class methods are now possible; they have descriptors that wrap up just the method, or the method and the class. As a brief explanation of these new kinds of methods, static methods aren't passed the instance, and therefore resemble regular functions. Class methods are passed the class of the object, but not the object itself. Static and class methods are defined like this:
class C(object):
    def f(arg1, arg2):
        ...
    f = staticmethod(f)

    def g(cls, arg1, arg2):
        ...
    g = classmethod(g)
The staticmethod() function takes the function f(), and returns it wrapped up in a descriptor so it can be stored in the class object. You might expect there to be special syntax for creating such methods (def static f(), defstatic f(), or something like that) but no such syntax has been defined yet; that's been left for future versions of Python.
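To see the two kinds of methods in action, here is a small, hypothetical class (the Greeter name is invented for this overview) that defines one of each and calls them; it is only a sketch of the idioms just described:

class Greeter(object):
    def hello(name):
        # No 'self': nothing is passed implicitly to a static method.
        print 'Hello,', name
    hello = staticmethod(hello)

    def describe(cls):
        # A class method receives the class, not the instance.
        print 'This is class', cls.__name__
    describe = classmethod(describe)

Greeter.hello('world')      # callable without an instance
Greeter().describe()        # cls is Greeter, even when called on an instance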
More new features, such as __slots__ and properties, are also implemented as new kinds of descriptors, and it's not difficult to write a descriptor class that does something novel. For example, it would be possible to write a descriptor class that made it possible to write Eiffel-style preconditions and postconditions for a method. A class that used this feature might be defined like this:
from eiffel import eiffelmethod

class C(object):
    def f(self, arg1, arg2):
        # The actual function
        ...
    def pre_f(self):
        # Check preconditions
        ...
    def post_f(self):
        # Check postconditions
        ...

    f = eiffelmethod(f, pre_f, post_f)
Note that a person using the new eiffelmethod() doesn't have to understand anything about descriptors. This is why I think the new features don't increase the basic complexity of the language. There will be a few wizards who need to know about it in order to write eiffelmethod() or the ZODB or whatever, but most users will just write code on top of the resulting libraries and ignore the implementation details.
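For the curious, a descriptor along these lines can be written in pure Python. The sketch below is only a guess at what such an eiffelmethod class might look like; it is not the actual eiffel module used in the example above, and it ignores details such as access through the class rather than through an instance:

class eiffelmethod(object):
    def __init__(self, func, pre=None, post=None):
        self._func = func
        self._pre = pre
        self._post = post

    def __get__(self, obj, cls=None):
        # Return a callable that runs the precondition, the real
        # method, and then the postcondition.
        func, pre, post = self._func, self._pre, self._post
        def checked(*args, **kw):
            if pre:
                pre(obj)
            result = func(obj, *args, **kw)
            if post:
                post(obj)
            return result
        return checked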
Multiple Inheritance: The Diamond Rule
Multiple inheritance has also been made more useful through changing the name resolution rules. Consider this set of classes (diagram taken from PEP 253 by Guido van Rossum):
      class A:
        ^ ^  def save(self): ...
       /   \
      /     \
     /       \
    /         \
class B     class C:
    ^         ^  def save(self): ...
     \       /
      \     /
       \   /
        \ /
      class D
The lookup rule for classic classes is simple but not very smart; the base classes are searched depth-first, going from left to right. A reference to D.save() will search the classes D, B, and then A, where save() would be found and returned. C.save() would never be found at all. This is bad, because if C's save() method is saving some internal state specific to C, not calling it will result in that state never getting saved.
New-style classes follow a different algorithm that's a bit more complicated to explain, but does the right thing in this situation. (Note that Python 2.3 changes this algorithm to one that produces the same results in most cases, but produces more useful results for really complicated inheritance graphs.)
- List all the base classes, following the classic lookup rule and include a class multiple times if it's visited repeatedly. In the above example, the list of visited classes is [D, B, A, C, A].
- Scan the list for duplicated classes. If any are found, remove all but one occurrence, leaving the last one in the list. In the above example, the list becomes [D, B, C, A] after dropping duplicates.
Following this rule, referring to D.save() will return C.save(), which is the behaviour we're after. This lookup rule is the same as the one followed by Common Lisp. A new built-in function, super(), provides a way to get at a class's superclasses without having to reimplement Python's algorithm. The most commonly used form will be super(class, obj), which returns a bound superclass object (not the actual class object). This form will be used in methods to call a method in the superclass; for example, D's save() method would look like this:
class D (B,C):
    def save (self):
        # Call superclass .save()
        super(D, self).save()
        # Save D's private information here
        ...
super() can also return unbound superclass objects when called as super(class) or super(class1, class2), but this probably won't often be useful.
Attribute Access
A fair number of sophisticated Python classes define hooks for attribute access using __getattr__(); most commonly this is done for convenience, to make code more readable by automatically mapping an attribute access such as obj.parent into a method call such as obj.get_parent. Python 2.2 adds some new ways of controlling attribute access.
First, __getattr__(attr_name) is still supported by new-style classes, and nothing about it has changed. As before, it will be called when an attempt is made to access obj.foo and no attribute named foo is found in the instance's dictionary.
New-style classes also support a new method, __getattribute__(attr_name). The difference between the two methods is that __getattribute__() is always called whenever any attribute is accessed, while the old __getattr__() is only called if foo isn't found in the instance's dictionary.
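A tiny, hypothetical class (the Traced name is made up here) shows the difference between the two hooks:

class Traced(object):
    def __getattribute__(self, name):
        # Called for *every* attribute access.
        print '__getattribute__:', name
        return object.__getattribute__(self, name)

    def __getattr__(self, name):
        # Called only when normal lookup fails.
        print '__getattr__:', name
        return None

t = Traced()
t.colour = 'red'
t.colour       # only __getattribute__ runs
t.missing      # __getattribute__ runs, fails, then __getattr__ runs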
However, Python 2.2's support for properties will often be a simpler way to trap attribute references. Writing a __getattr__() method is complicated because to avoid recursion you can't use regular attribute accesses inside them, and instead have to mess around with the contents of __dict__. __getattr__() methods also end up being called by Python when it checks for other methods such as __repr__() or __coerce__(), and so have to be written with this in mind. Finally, calling a function on every attribute access results in a sizable performance loss.
property is a new built-in type that packages up three functions that get, set, or delete an attribute, and a docstring. For example, if you want to define a size attribute that's computed, but also settable, you could write:
class C(object):
    def get_size (self):
        result = ... computation ...
        return result
    def set_size (self, size):
        ... compute something based on the size
        and set internal state appropriately ...

    # Define a property.  The 'delete this attribute'
    # method is defined as None, so the attribute
    # can't be deleted.
    size = property(get_size, set_size,
                    None,
                    "Storage size of this instance")
That is certainly clearer and easier to write than a pair of __getattr__()/__setattr__() methods that check for the size attribute and handle it specially while retrieving all other attributes from the instance's __dict__. Accesses to size are also the only ones which have to perform the work of calling a function, so references to other attributes run at their usual speed.
Finally, it's possible to constrain the list of attributes that can be referenced on an object using the new __slots__ class attribute. Python objects are usually very dynamic; at any time it's possible to define a new attribute on an instance by just doing obj.new_attr=1. A new-style class can define a class attribute named __slots__ to limit the legal attributes to a particular set of names. An example will make this clear:
>>> class C(object):
...     __slots__ = ('template', 'name')
...
>>> obj = C()
>>> print obj.template
None
>>> obj.template = 'Test'
>>> print obj.template
Test
>>> obj.newattr = None
Traceback (most recent call last):
File "<stdin>", line 1, in ?
AttributeError: 'C' object has no attribute 'newattr'
Note how you get an AttributeError on the attempt to assign to an attribute not listed in __slots__.
PEP 234: Iterators
Another significant addition to 2.2 is an iteration interface at both the C and Python levels. Objects can define how they can be looped over by callers.
In Python versions up to 2.1, the usual way to make for item in obj work is to define a __getitem__() method that looks something like this:
def __getitem__(self, index):
    return <next item>
__getitem__() is more properly used to define an indexing operation on an object so that you can write obj[5] to retrieve the sixth element. It's a bit misleading when you're using this only to support for loops.
Consider some file-like object that wants to be looped over; the index parameter is essentially meaningless, as the class probably assumes that a series of __getitem__() calls will be made with index incrementing by one each time. In other words, the presence of the __getitem__() method doesn't mean that using file[5] to randomly access the sixth element will work, though it really should.
In Python 2.2, iteration can be implemented separately, and __getitem__() methods can be limited to classes that really do support random access. The basic idea of iterators is simple. A new built-in function, iter(obj) or iter(C, sentinel), is used to get an iterator. iter(obj) returns an iterator for the object obj, while iter(C, sentinel) returns an iterator that will invoke the callable object C until it returns sentinel to signal that the iterator is done.
Python classes can define an __iter__() method, which should create and return a new iterator for the object; if the object is its own iterator, this method can just return self. In particular, iterators will usually be their own iterators. Extension types implemented in C can implement a tp_iter function in order to return an iterator, and extension types that want to behave as iterators can define a tp_iternext function.
So, after all this, what do iterators actually do? They have one required method, next(), which takes no arguments and returns the next value. When there are no more values to be returned, calling next() should raise the StopIteration exception.
>>> L = [1,2,3]
>>> i = iter(L)
>>> print i
<iterator object at 0x8116870>
>>> i.next()
1
>>> i.next()
2
>>> i.next()
3
>>> i.next()
Traceback (most recent call last):
File "<stdin>", line 1, in ?
StopIteration
>>>
In 2.2, Python's for statement no longer expects a sequence; it expects something for which iter() will return an iterator. For backward compatibility and convenience, an iterator is automatically constructed for sequences that don't implement __iter__() or a tp_iter slot, so for i in [1,2,3] will still work. Wherever the Python interpreter loops over a sequence, it's been changed to use the iterator protocol. This means you can do things like this:
>>> L = [1,2,3]
>>> i = iter(L)
>>> a,b,c = i
>>> a,b,c
(1, 2, 3)
Iterator support has been added to some of Python's basic types. Calling iter() on a dictionary will return an iterator which loops over its keys:
>>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
... 'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
>>> for key in m: print key, m[key]
...
Mar 3
Feb 2
Aug 8
Sep 9
May 5
Jun 6
Jul 7
Jan 1
Apr 4
Nov 11
Dec 12
Oct 10
That's just the default behaviour. If you want to iterate over keys, values, or key/value pairs, you can explicitly call the iterkeys(), itervalues(), or iteritems() methods to get an appropriate iterator.
In a minor related change, the in operator now works on dictionaries, so key in dict is now equivalent to dict.has_key(key).
Files also provide an iterator, which calls the readline() method until there are no more lines in the file. This means you can now read each line of a file using code like this:
for line in file:
    # do something for each line
    ...
Note that you can only go forward in an iterator; there's no way to get the previous element, reset the iterator, or make a copy of it. An iterator object could provide such additional capabilities, but the iterator protocol only requires a next() method.
See also
- PEP 234 - Iterators
Written by Ka-Ping Yee and GvR; implemented by the Python Labs crew, mostly by GvR and Tim Peters.
PEP 255: Simple Generators
Generators are another new feature, one that interacts with the introduction of iterators.
You're doubtless familiar with how function calls work in Python or C. When you call a function, it gets a private namespace where its local variables are created. When the function reaches a return statement, the local variables are destroyed and the resulting value is returned to the caller. A later call to the same function will get a fresh new set of local variables.
But, what if the local variables weren't thrown away on exiting a function? What if you could later resume the function where it left off? This is what generators provide; they can be thought of as resumable functions.
Here's the simplest example of a generator function:
def generate_ints(N):
    for i in range(N):
        yield i
A new keyword, yield, was introduced for generators. Any function containing a yield statement is a generator function; this is detected by Python's bytecode compiler which compiles the function specially as a result. Because a new keyword was introduced, generators must be explicitly enabled in a module by including a from __future__ import generators statement near the top of the module's source code. In Python 2.3 this statement will become unnecessary.
When you call a generator function, it doesn't return a single value; instead it returns a generator object that supports the iterator protocol. On executing the yield statement, the generator outputs the value of i, similar to a return statement. The big difference between yield and a return statement is that on reaching a yield the generator's state of execution is suspended and local variables are preserved. On the next call to the generator's next() method, the function will resume executing immediately after the yield statement. (For complicated reasons, the yield statement isn't allowed inside the try block of a try...finally statement; read PEP 255 for a full explanation of the interaction between yield and exceptions.)
Here's a sample usage of the generate_ints() generator:
>>> gen = generate_ints(3)
>>> gen
<generator object at 0x8117f90>
>>> gen.next()
0
>>> gen.next()
1
>>> gen.next()
2
>>> gen.next()
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "<stdin>", line 2, in generate_ints
StopIteration
You could equally write for i in generate_ints(5), or a,b,c = generate_ints(3).
Inside a generator function, the return statement can only be used without a value, and signals the end of the procession of values; afterwards the generator cannot return any further values. return with a value, such as return 5, is a syntax error inside a generator function. The end of the generator's results can also be indicated by raising StopIteration manually, or by just letting the flow of execution fall off the bottom of the function.
You could achieve the effect of generators manually by writing your own class and storing all the local variables of the generator as instance variables. For example, returning a list of integers could be done by setting self.count to 0, and having the next() method increment self.count and return it. However, for a moderately complicated generator, writing a corresponding class would be much messier. Lib/test/test_generators.py contains a number of more interesting examples. The simplest one implements an in-order traversal of a tree using generators recursively.
# A recursive generator that generates Tree leaves in in-order.
def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x
Two other examples in Lib/test/test_generators.py produce solutions for the N-Queens problem (placing N queens on an NxN chess board so that no queen threatens another) and the Knight's Tour (a route that takes a knight to every square of an NxN chessboard without visiting any square twice).
The idea of generators comes from other programming languages, especially Icon (https://www2.cs.arizona.edu/icon/), where the idea of generators is central. In Icon, every expression and function call behaves like a generator. One example from "An Overview of the Icon Programming Language" at https://www2.cs.arizona.edu/icon/docs/ipd266.htm gives an idea of what this looks like:
sentence := "Store it in the neighboring harbor"
if (i := find("or", sentence)) > 5 then write(i)
In Icon the find() function returns the indexes at which the substring "or" is found: 3, 23, 33. In the if statement, i is first assigned a value of 3, but 3 is less than 5, so the comparison fails, and Icon retries it with the second value of 23. 23 is greater than 5, so the comparison now succeeds, and the code prints the value 23 to the screen.
Python doesn't go nearly as far as Icon in adopting generators as a central concept. Generators are considered part of the core Python language, but learning or using them isn't compulsory; if they don't solve any problems that you have, feel free to ignore them. One novel feature of Python's interface as compared to Icon's is that a generator's state is represented as a concrete object (the iterator) that can be passed around to other functions or stored in a data structure.
See also
- PEP 255 - Simple Generators
Written by Neil Schemenauer, Tim Peters, Magnus Lie Hetland. Implemented mostly by Neil Schemenauer and Tim Peters, with other fixes from the Python Labs crew.
PEP 237: Unifying Long Integers and Integers
In recent versions, the distinction between regular integers, which are 32-bit values on most machines, and long integers, which can be of arbitrary size, was becoming an annoyance. For example, on platforms that support files larger than 2**32 bytes, the tell() method of file objects has to return a long integer. However, there were various bits of Python that expected plain integers and would raise an error if a long integer was provided instead. For example, in Python 1.5, only regular integers could be used as a slice index, and 'abc'[1L:] would raise a TypeError exception with the message 'slice index must be int'.
Python 2.2 will shift values from short to long integers as required. The 'L' suffix is no longer needed to indicate a long integer literal, as the compiler will now choose the appropriate type. (Using the 'L' suffix will be discouraged in future 2.x versions of Python, triggering a warning in Python 2.4, and probably dropped in Python 3.0.) Many operations that used to raise an OverflowError will now return a long integer as their result. For example:
>>> 1234567890123
1234567890123L
>>> 2 ** 64
18446744073709551616L
In most cases, integers and long integers will now be treated identically. You can still distinguish them with the type() built-in function, but that's rarely needed.
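For instance:

>>> type(123)
<type 'int'>
>>> type(2 ** 64)
<type 'long'>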
See also
- PEP 237 - Unifying Long Integers and Integers
Written by Moshe Zadka and Guido van Rossum; implemented mostly by Guido van Rossum.
PEP 238: Changing the Division Operator
The most controversial change in Python 2.2 heralds the start of an effort to fix an old design flaw that's been in Python from the beginning. Currently Python's division operator, /, behaves like C's division operator when presented with two integer arguments: it returns an integer result that's truncated down when there would be a fractional part. For example, 3/2 is 1, not 1.5, and (-1)/2 is -1, not -0.5. This means that the results of division can vary unexpectedly depending on the type of the two operands and because Python is dynamically typed, it can be difficult to determine the possible types of the operands.
(The controversy is over whether this is really a design flaw, and whether it's worth breaking existing code to fix this. It's caused endless discussions on python-dev, and in July 2001 erupted into a storm of acidly sarcastic postings on comp.lang.python. I won't argue for either side here and will stick to describing what's implemented in 2.2. Read PEP 238 for a summary of arguments and counter-arguments.)
Because this change might break code, it's being introduced very gradually. Python 2.2 begins the transition, but the switch won't be complete until Python 3.0.
First, I'll borrow some terminology from PEP 238. "True division" is the division that most non-programmers are familiar with: 3/2 is 1.5, 1/4 is 0.25, and so forth. "Floor division" is what Python's / operator currently does when given integer operands; the result is the floor of the value returned by true division. "Classic division" is the current mixed behaviour of /; it returns the result of floor division when the operands are integers, and returns the result of true division when one of the operands is a floating-point number.
Here are the changes 2.2 introduces:
- A new operator, //, is the floor division operator. (Yes, we know it looks like C++'s comment symbol.) // always performs floor division no matter what the types of its operands are, so 1 // 2 is 0 and 1.0 // 2.0 is also 0.0. // is always available in Python 2.2; you don't need to enable it using a __future__ statement. (A small sketch follows this list.)
- By including a from __future__ import division in a module, the / operator will be changed to return the result of true division, so 1/2 is 0.5. Without the __future__ statement, / still means classic division. The default meaning of / will not change until Python 3.0.
- Classes can define methods called __truediv__() and __floordiv__() to overload the two division operators. At the C level, there are also slots in the PyNumberMethods structure so extension types can define the two operators.
- Python 2.2 supports some command-line arguments for testing whether code will work with the changed division semantics. Running python with -Q warn will cause a warning to be issued whenever division is applied to two integers. You can use this to find code that's affected by the change and fix it. By default, Python 2.2 will simply perform classic division without a warning; the warning will be turned on by default in Python 2.3.
See also
- PEP 238 - Changing the Division Operator
Written by Moshe Zadka and Guido van Rossum; implemented by Guido van Rossum.
Unicode Changes
Python's Unicode support has been enhanced a bit in 2.2. Unicode strings are usually stored as UCS-2, as 16-bit unsigned integers. Python 2.2 can also be compiled to use UCS-4, 32-bit unsigned integers, as its internal encoding by supplying --enable-unicode=ucs4 to the configure script. (It's also possible to specify --disable-unicode to completely disable Unicode support.)
When built to use UCS-4 (a "wide Python"), the interpreter can natively handle Unicode characters from U+000000 to U+110000, so the range of legal values for the unichr() function is expanded accordingly. Using an interpreter compiled to use UCS-2 (a "narrow Python"), values greater than 65535 will still cause unichr() to raise a ValueError exception. This is all described in PEP 261, "Support for 'wide' Unicode characters"; consult it for further details.
Another change is simpler to explain. Since their introduction, Unicode strings have supported an encode() method to convert the string to a selected encoding such as UTF-8 or Latin-1. A symmetric decode([encoding]) method has been added to 8-bit strings (though not to Unicode strings) in 2.2. decode() assumes that the string is in the specified encoding and decodes it, returning whatever is returned by the codec.
Using this new feature, codecs have been added for tasks not directly related to Unicode. For example, codecs have been added for uu-encoding, MIME's base64 encoding, and compression with the zlib module:
>>> s = """Here is a lengthy piece of redundant, overly verbose,
... and repetitive text.
... """
>>> data = s.encode('zlib')
>>> data
'x\x9c\r\xc9\xc1\r\x80 \x10\x04\xc0?Ul...'
>>> data.decode('zlib')
'Here is a lengthy piece of redundant, overly verbose,\nand repetitive text.\n'
>>> print s.encode('uu')
begin 666 <data>
M2&5R92!I<R!A(&QE;F=T:'D@<&EE8V4@;V8@<F5D=6YD86YT+"!O=F5R;'D@
>=F5R8F]S92P*86YD(')E<&5T:71I=F4@=&5X="X*
end
>>> "sheesh".encode('rot-13')
'furrfu'
To convert a class instance to Unicode, a __unicode__() method can be defined by a class, analogous to __str__().
encode(), decode(), and __unicode__() were implemented by Marc-André Lemburg. The changes to support using UCS-4 internally were implemented by Fredrik Lundh and Martin von Löwis.
See also
- PEP 261 - Support for 'wide' Unicode characters
Written by Paul Prescod.
PEP 227: Nested Scopes
In Python 2.1, statically nested scopes were added as an optional feature, to be enabled by a from __future__ import nested_scopes directive. In 2.2 nested scopes no longer need to be specially enabled, and are now always present. The rest of this section is a copy of the description of nested scopes from my "What's New in Python 2.1" document; if you read it when 2.1 came out, you can skip the rest of this section.
The largest change introduced in Python 2.1, and made complete in 2.2, is to Python's scoping rules. In Python 2.0, at any given time there are at most three namespaces used to look up variable names: local, module-level, and the built-in namespace. This often surprised people because it didn't match their intuitive expectations. For example, a nested recursive function definition doesn't work:
def f():
    ...
    def g(value):
        ...
        return g(value-1) + 1
    ...
The function g() will always raise a NameError exception, because the binding of the name g isn't in either its local namespace or in the module-level namespace. This isn't much of a problem in practice (how often do you recursively define interior functions like this?), but this also made using the lambda expression clumsier, and this was a problem in practice.
In code which uses lambda you can often find local variables being copied by passing them as the default values of arguments.
def find(self, name):
    "Return list of any entries equal to 'name'"
    L = filter(lambda x, name=name: x == name,
               self.list_attribute)
    return L
The readability of Python code written in a strongly functional style suffers greatly as a result.
The most significant change to Python 2.2 is that static scoping has been added to the language to fix this problem. As a first effect, the name=name default argument is now unnecessary in the above example. Put simply, when a given variable name is not assigned a value within a function (by an assignment, or the def, class, or import statements), references to the variable will be looked up in the local namespace of the enclosing scope. A more detailed explanation of the rules, and a dissection of the implementation, can be found in the PEP.
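Under the new rules, the find() method shown earlier could be written without the default-argument trick; a sketch:

def find(self, name):
    "Return list of any entries equal to 'name'"
    # The lambda can now refer to 'name' from the enclosing scope directly.
    L = filter(lambda x: x == name,
               self.list_attribute)
    return L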
This change may cause some compatibility problems for code in which the same variable name is used both at the module level and as a local variable within a function that contains further function definitions. This seems rather unlikely though, since such code would have been pretty confusing to read in the first place.
One side effect of the change is that the from module import * and exec statements have been made illegal inside a function scope under certain conditions. The Python reference manual has said all along that from module import * is only legal at the top level of a module, but the CPython interpreter has never enforced this before. As part of the implementation of nested scopes, the compiler which turns Python source into bytecodes has to generate different code to access variables in a containing scope. from module import * and exec make it impossible for the compiler to figure this out, because they add names to the local namespace that are unknowable at compile time. Therefore, if a function contains function definitions or lambda expressions with free variables, the compiler will flag this by raising a SyntaxError exception.
To make the preceding explanation a bit clearer, here's an example:
x = 1
def f():
    # The next line is a syntax error
    exec 'x=2'
    def g():
        return x
Line 4 containing the exec statement is a syntax error, since exec would define a new local variable named x whose value should be accessed by g().
This shouldn't be much of a limitation, since exec is rarely used in most Python code (and when it is used, it's often a sign of a poor design anyway).
See also
- PEP 227 - Statically Nested Scopes
Written and implemented by Jeremy Hylton.
New and Improved Modules
- The xmlrpclib module was contributed to the standard library by Fredrik Lundh, providing support for writing XML-RPC clients. XML-RPC is a simple remote procedure call protocol built on top of HTTP and XML. For example, the following snippet retrieves a list of RSS channels from the O'Reilly Network, and then lists the recent headlines for one channel:

  import xmlrpclib
  s = xmlrpclib.Server(
        'http://www.oreillynet.com/meerkat/xml-rpc/server.php')
  channels = s.meerkat.getChannels()
  # channels is a list of dictionaries, like this:
  # [{'id': 4, 'title': 'Freshmeat Daily News'}
  #  {'id': 190, 'title': '32Bits Online'},
  #  {'id': 4549, 'title': '3DGamers'}, ... ]

  # Get the items for one channel
  items = s.meerkat.getItems( {'channel': 4} )

  # 'items' is another list of dictionaries, like this:
  # [{'link': 'http://freshmeat.net/releases/52719/',
  #  'description': 'A utility which converts HTML to XSL FO.',
  #  'title': 'html2fo 0.3 (Default)'}, ... ]

- The SimpleXMLRPCServer module makes it easy to create straightforward XML-RPC servers. See http://xmlrpc.scripting.com/ for more information about XML-RPC.
- The new hmac module implements the HMAC algorithm described by RFC 2104. (Contributed by Gerhard Häring.)
- Several functions that originally returned lengthy tuples now return pseudo-sequences that still behave like tuples but also have mnemonic attributes such as st_mtime or tm_year. The enhanced functions include stat(), fstat(), statvfs(), and fstatvfs() in the os module, and localtime(), gmtime(), and strptime() in the time module. For example, to obtain a file's size using the old tuples, you'd end up writing something like file_size = os.stat(filename)[stat.ST_SIZE], but now this can be written more clearly as file_size = os.stat(filename).st_size. The initial patch for this feature was contributed by Nick Mathewson.
- The Python profiler has been extensively reworked and various errors in its output have been corrected. (Contributed by Fred L. Drake, Jr. and Tim Peters.)
- The socket module can be compiled to support IPv6; specify the --enable-ipv6 option to Python's configure script. (Contributed by Jun-ichiro "itojun" Hagino.)
- Two new format characters were added to the struct module for 64-bit integers on platforms that support the C long long type. q is for a signed 64-bit integer, and Q is for an unsigned one. The value is returned in Python's long integer type. (Contributed by Tim Peters.)
- In the interpreter's interactive mode, there's a new built-in function help() that uses the pydoc module introduced in Python 2.1 to provide interactive help. help(object) displays any available help text about object. help() with no argument puts you in an online help utility, where you can enter the names of functions, classes, or modules to read their help text. (Contributed by Guido van Rossum, using Ka-Ping Yee's pydoc module.)
- Various bugfixes and performance improvements have been made to the SRE engine underlying the re module. For example, the re.sub() and re.split() functions have been rewritten in C. Another contributed patch speeds up certain Unicode character ranges by a factor of two, and a new finditer() method was added that returns an iterator over all the non-overlapping matches in a given string. (SRE is maintained by Fredrik Lundh. The BIGCHARSET patch was contributed by Martin von Löwis.)
- The smtplib module now supports RFC 2487, "Secure SMTP over TLS", so it's now possible to encrypt the SMTP traffic between a Python program and the mail transport agent being handed a message. smtplib also supports SMTP authentication. (Contributed by Gerhard Häring.)
- The imaplib module, maintained by Piers Lauder, has support for several new extensions: the NAMESPACE extension defined in RFC 2342, SORT, GETACL and SETACL. (Contributed by Anthony Baxter and Michel Pelletier.)
- The rfc822 module's parsing of email addresses is now compliant with RFC 2822, an update to RFC 822. (The module's name is not going to be changed to rfc2822.) A new package, email, has also been added for parsing and generating e-mail messages. (Contributed by Barry Warsaw, and arising out of his work on Mailman.)
- The difflib module now contains a new Differ class for producing human-readable lists of changes (a "delta") between two sequences of lines of text. There are also two generator functions, ndiff() and restore(), which respectively return a delta from two sequences, or one of the original sequences from a delta. (Grunt work contributed by David Goodger, from ndiff.py code by Tim Peters who then did the generatorization.)
- New constants ascii_letters, ascii_lowercase, and ascii_uppercase were added to the string module. There were several modules in the standard library that used string.letters to mean the ranges A-Za-z, but that assumption is incorrect when locales are in use, because string.letters varies depending on the set of legal characters defined by the current locale. The buggy modules have all been fixed to use ascii_letters instead. (Reported by an unknown person; fixed by Fred L. Drake, Jr.)
- The mimetypes module now makes it easier to use alternative MIME-type databases by the addition of a MimeTypes class, which takes a list of filenames to be parsed. (Contributed by Fred L. Drake, Jr.)
- A Timer class was added to the threading module that allows scheduling an activity to happen at some future time. (Contributed by Itamar Shtull-Trauring.)
Interpreter Changes and Fixes
Some of the changes only affect people who deal with the Python interpreter at the C level because they're writing Python extension modules, embedding the interpreter, or just hacking on the interpreter itself. If you only write Python code, none of the changes described here will affect you very much.
- Profiling and tracing functions can now be implemented in C, which can operate at much higher speeds than Python-based functions and should reduce the overhead of profiling and tracing. This will be of interest to authors of development environments for Python. Two new C functions were added to Python's API, PyEval_SetProfile() and PyEval_SetTrace(). The existing sys.setprofile() and sys.settrace() functions still exist, and have simply been changed to use the new C-level interface. (Contributed by Fred L. Drake, Jr.)
- Another low-level API, primarily of interest to implementers of Python debuggers and development tools, was added. PyInterpreterState_Head() and PyInterpreterState_Next() let a caller walk through all the existing interpreter objects; PyInterpreterState_ThreadHead() and PyThreadState_Next() allow looping over all the thread states for a given interpreter. (Contributed by David Beazley.)
- The C-level interface to the garbage collector has been changed to make it easier to write extension types that support garbage collection, and to debug misuses of the functions. Various functions have slightly different semantics, so a bunch of functions had to be renamed. Extensions that use the old API will still compile but will not participate in garbage collection, so updating them for 2.2 should be considered fairly high priority.

  To upgrade an extension module to the new API, perform the following steps:

  - Rename Py_TPFLAGS_GC to Py_TPFLAGS_HAVE_GC.
  - Use PyObject_GC_New() or PyObject_GC_NewVar() to allocate objects, and PyObject_GC_Del() to deallocate them.
  - Rename PyObject_GC_Init() to PyObject_GC_Track() and PyObject_GC_Fini() to PyObject_GC_UnTrack().
  - Remove PyGC_HEAD_SIZE from object size calculations.
  - Remove calls to PyObject_AS_GC() and PyObject_FROM_GC().

- A new et format sequence was added to PyArg_ParseTuple(); et takes both a parameter and an encoding name, and converts the parameter to the given encoding if the parameter turns out to be a Unicode string, or leaves it alone if it's an 8-bit string, assuming it to already be in the desired encoding. This differs from the es format character, which assumes that 8-bit strings are in Python's default ASCII encoding and converts them to the specified new encoding. (Contributed by M.-A. Lemburg, and used for the MBCS support on Windows described in the following section.)
- A different argument parsing function, PyArg_UnpackTuple(), has been added that's simpler and presumably faster. Instead of specifying a format string, the caller simply gives the minimum and maximum number of arguments expected, and a set of pointers to PyObject* variables that will be filled in with argument values.
- Two new flags METH_NOARGS and METH_O are available in method definition tables to simplify implementation of methods with no arguments or a single untyped argument. Calling such methods is more efficient than calling a corresponding method that uses METH_VARARGS. Also, the old METH_OLDARGS style of writing C methods is now officially deprecated.
- Two new wrapper functions, PyOS_snprintf() and PyOS_vsnprintf() were added to provide cross-platform implementations for the relatively new snprintf() and vsnprintf() C lib APIs. In contrast to the standard sprintf() and vsprintf() functions, the Python versions check the bounds of the buffer used to protect against buffer overruns. (Contributed by M.-A. Lemburg.)
- The _PyTuple_Resize() function has lost an unused parameter, so now it takes 2 parameters instead of 3. The third argument was never used, and can simply be discarded when porting code from earlier versions to Python 2.2.
Other Changes and Fixes
As usual there were a bunch of other improvements and bugfixes scattered throughout the source tree. A search through the CVS change logs finds there were 527 patches applied and 683 bugs fixed between Python 2.1 and 2.2; 2.2.1 applied 139 patches and fixed 143 bugs; 2.2.2 applied 106 patches and fixed 82 bugs. These figures are likely to be underestimates.
Some of the more notable changes are:
- The code for the MacOS port for Python, maintained by Jack Jansen, is now kept in the main Python CVS tree, and many changes have been made to support MacOS X.

  The most significant change is the ability to build Python as a framework, enabled by supplying the --enable-framework option to the configure script when compiling Python. According to Jack Jansen, "This installs a self-contained Python installation plus the OS X framework "glue" into /Library/Frameworks/Python.framework (or another location of choice). For now there is little immediate added benefit to this (actually, there is the disadvantage that you have to change your PATH to be able to find Python), but it is the basis for creating a full-blown Python application, porting the MacPython IDE, possibly using Python as a standard OSA scripting language and much more."

  Most of the MacPython toolbox modules, which interface to MacOS APIs such as windowing, QuickTime, scripting, etc. have been ported to OS X, but they've been left commented out in setup.py. People who want to experiment with these modules can uncomment them manually.

- Keyword arguments passed to built-in functions that don't take them now cause a TypeError exception to be raised, with the message "function takes no keyword arguments".
- Weak references, added in Python 2.1 as an extension module, are now part of the core because they're used in the implementation of new-style classes. The ReferenceError exception has therefore moved from the weakref module to become a built-in exception.
- A new script, Tools/scripts/cleanfuture.py by Tim Peters, automatically removes obsolete __future__ statements from Python source code.
- An additional flags argument has been added to the built-in function compile(), so the behaviour of __future__ statements can now be correctly observed in simulated shells, such as those presented by IDLE and other development environments. This is described in PEP 264. (Contributed by Michael Hudson.)
- The new license introduced with Python 1.6 wasn't GPL-compatible. This is fixed by some minor textual changes to the 2.2 license, so it's now legal to embed Python inside a GPLed program again. Note that Python itself is not GPLed, but instead is under a license that's essentially equivalent to the BSD license, same as it always was. The license changes were also applied to the Python 2.0.1 and 2.1.1 releases.
- When presented with a Unicode filename on Windows, Python will now convert it to an MBCS encoded string, as used by the Microsoft file APIs. As MBCS is explicitly used by the file APIs, Python's choice of ASCII as the default encoding turns out to be an annoyance. On Unix, the locale's character set is used if locale.nl_langinfo(CODESET) is available. (Windows support was contributed by Mark Hammond with assistance from Marc-André Lemburg. Unix support was added by Martin von Löwis.)
- Large file support is now enabled on Windows. (Contributed by Tim Peters.)
- The Tools/scripts/ftpmirror.py script now parses a .netrc file, if you have one. (Contributed by Mike Romberg.)
- Some features of the object returned by the xrange() function are now deprecated, and trigger warnings when they're accessed; they'll disappear in Python 2.3. xrange objects tried to pretend they were full sequence types by supporting slicing, sequence multiplication, and the in operator, but these features were rarely used and therefore buggy. The tolist() method and the start, stop, and step attributes are also being deprecated. At the C level, the fourth argument to the PyRange_New() function, repeat, has also been deprecated.
- There were a bunch of patches to the dictionary implementation, mostly to fix potential core dumps if a dictionary contains objects that sneakily changed their hash value, or mutated the dictionary they were contained in. For a while python-dev fell into a gentle rhythm of Michael Hudson finding a case that dumped core, Tim Peters fixing the bug, Michael finding another case, and round and round it went.
- On Windows, Python can now be compiled with Borland C thanks to a number of patches contributed by Stephen Hansen, though the result isn't fully functional yet. (But this is progress...)
- Another Windows enhancement: Wise Solutions generously offered PythonLabs use of their InstallerMaster 8.1 system. Earlier PythonLabs Windows installers used Wise 5.0a, which was beginning to show its age. (Packaged up by Tim Peters.)
- Files ending in .pyw can now be imported on Windows. .pyw is a Windows-only thing, used to indicate that a script needs to be run using PYTHONW.EXE instead of PYTHON.EXE in order to prevent a DOS console from popping up to display the output. This patch makes it possible to import such scripts, in case they're also usable as modules. (Implemented by David Bolen.)
- On platforms where Python uses the C dlopen() function to load extension modules, it's now possible to set the flags used by dlopen() using the sys.getdlopenflags() and sys.setdlopenflags() functions. (Contributed by Bram Stolk.)
- The pow() built-in function no longer supports 3 arguments when floating-point numbers are supplied. pow(x, y, z) returns (x**y) % z, but this is never useful for floating point numbers, and the final result varies unpredictably depending on the platform. A call such as pow(2.0, 8.0, 7.0) will now raise a TypeError exception.
Acknowledgements
The author would like to thank the following people for offering suggestions, corrections and assistance with various drafts of this article: Fred Bremmer, Keith Briggs, Andrew Dalke, Fred L. Drake, Jr., Carel Fellinger, David Goodger, Mark Hammond, Stephen Hansen, Michael Hudson, Jack Jansen, Marc-André Lemburg, Martin von Löwis, Fredrik Lundh, Michael McLay, Nick Mathewson, Paul Moore, Gustavo Niemeyer, Don O'Donnell, Joonas Paalasma, Tim Peters, Jens Quade, Tom Reinhardt, Neil Schemenauer, Guido van Rossum, Greg Ward, Edward Welbourne.