PyPy plans a GIL-free version, possibly released in parallel with the default build

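PyPy is a project created by Python developers to make Python easier to hack on. It is more flexible than CPython and easier to use and experiment with, so alternative implementations of a given feature for different situations can be worked out and put in place with little effort. The project's goal is to make PyPy easier than the C implementation of Python to adapt to individual projects and to trim down.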

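The Python community has long discussed removing the GIL (Global Interpreter Lock), and the various interpreters have tried different ways to solve the problem. Jython and IronPython removed it successfully with help from their underlying platforms, while efforts targeting CPython, such as gilectomy, have yet to bear fruit.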

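In a post published on August 14, the PyPy team said that it has been experimenting with GIL removal since its February sprint this year, hoping to reproduce what IronPython and Jython achieved (by comparison, they consider removing the GIL in CPython harder, because it also requires solving the problem of multi-threaded reference counting). So far they have a GIL-less PyPy that can run very simple multi-threaded, parallelized programs, although more complicated programs will probably crash. The remaining work will concentrate on that problem.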

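Because this work would complicate the PyPy code base and the team's day-to-day work, the PyPy team wants to gauge whether the community and commercial partners (not individual donors) are interested in the implementation. They estimate a cost of $50k for a good proof of concept. If they can get a $100k contract, they will deliver a fully working GIL-free PyPy interpreter, possibly released separately from the default PyPy version. The post then goes on to give the technical details.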

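The post sparked heated discussion among Python users. Some voiced support; others saw it as little more than a pitch for funding with no obvious commercial value; still others argued that the GIL can simply be ignored during development and that none of this effort is needed.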


Let's remove the Global Interpreter Lock

Hello everyone

The Python community has been discussing removing the Global Interpreter Lock for a long time. There have been various attempts at removing it: Jython or IronPython successfully removed it with the help of the underlying platform, and some have yet to bear fruit, like gilectomy. Since our February sprint in Leysin, we have experimented with the topic of GIL removal in the PyPy project. We believe that the work done in IronPython or Jython can be reproduced with only a bit more effort in PyPy. Compared to that, removing the GIL in CPython is a much harder topic, since it also requires tackling the problem of multi-threaded reference counting. See the section below for further details.

As we announced at EuroPython, what we have so far is a GIL-less PyPy which can run very simple, nicely parallelized multi-threaded programs. At the moment, more complicated programs probably segfault. The remaining 90% (and another 90%) of the work consists of putting locks in strategic places so PyPy does not segfault during concurrent accesses to data structures.
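As a rough illustration of what a "very simple, nicely parallelized" multi-threaded program might look like, here is a minimal sketch in which each thread does independent CPU-bound work and writes only to its own result slot. The function names (count_primes, run_parallel) are made up for this example and do not come from the PyPy post. On an interpreter with a GIL the threads take turns rather than running simultaneously; the result is the same either way.

```python
import threading


def count_primes(lo, hi):
    """Count primes in [lo, hi) by trial division (pure CPU work)."""
    total = 0
    for n in range(max(lo, 2), hi):
        is_prime = True
        d = 2
        while d * d <= n:
            if n % d == 0:
                is_prime = False
                break
            d += 1
        if is_prime:
            total += 1
    return total


def run_parallel(limit, workers):
    """Split [0, limit) into chunks and count primes with one thread per chunk."""
    results = [0] * workers
    step = (limit + workers - 1) // workers  # chunk size, rounded up

    def work(i):
        # Each thread writes only to its own slot, so no lock is needed here.
        results[i] = count_primes(i * step, min((i + 1) * step, limit))

    threads = [threading.Thread(target=work, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)
```

Because the threads share no mutable state beyond disjoint list slots, this kind of program avoids the concurrent data structure accesses that the post says still cause crashes.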

Since such work would complicate the PyPy code base and our day-to-day work, we would like to judge the interest of the community and the commercial partners to make it happen (we are not looking for individual donations at this point). We estimate a total cost of $50k, out of which we already have backing for about 1/3 (with a possible 1/3 extra from the STM money, see below). This would give us a good shot at delivering a good proof-of-concept working PyPy with no GIL. If we can get a $100k contract, we will deliver a fully working PyPy interpreter with no GIL as a release, possibly separate from the default PyPy release.

People asked several questions, so I'll try to answer the technical parts here.

What would the plan entail?

We've already done the work on the Garbage Collector to allow running multi-threaded programs in RPython. "All" that is left is adding locks on mutable data structures everywhere in the PyPy codebase. Since it would significantly complicate our workflow, we require real interest in that topic, backed up by commercial contracts in order to justify the added maintenance burden.
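The idea of "adding locks on mutable data structures" can be sketched at the application level. The example below is only an analogy written in ordinary Python: the real work would happen inside the interpreter's RPython source, which the post does not show. LockedDict, hammer and demo are hypothetical names invented for this illustration.

```python
import threading


class LockedDict:
    """Hypothetical dict wrapper: every read-modify-write holds a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def increment(self, key, amount=1):
        # The get-then-set pair must not be interleaved with another thread.
        with self._lock:
            self._data[key] = self._data.get(key, 0) + amount

    def snapshot(self):
        # Copy under the lock so callers never see a half-updated dict.
        with self._lock:
            return dict(self._data)


def hammer(counter, key, times):
    """Increment the same key repeatedly from one thread."""
    for _ in range(times):
        counter.increment(key)


def demo(num_threads=4, times=10000):
    """Run several threads against one LockedDict and return the final count."""
    counter = LockedDict()
    threads = [
        threading.Thread(target=hammer, args=(counter, "hits", times))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.snapshot()["hits"]
```

With the lock in place the final count always equals num_threads * times. The maintenance burden the post mentions comes from having to apply this discipline to every mutable structure in the interpreter, not just one.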

Why did the STM effort not work out?

STM was a research project that proved that the idea is possible. However, the amount of user effort that is required to make programs run in a parallelizable way is significant, and we never managed to develop tools that would help in doing so. At the moment we're not sure if more work spent on tooling would improve the situation or if the whole idea is really doomed. The approach also ended up adding significant overhead on single-threaded programs, so in the end it is very easy to make your programs slower. (We have some money left in the donation pot for STM which we are not using; according to the rules, we could declare the STM attempt failed and channel that money towards the present GIL removal proposal.)

Wouldn't subinterpreters be a better idea?

Python is a very mutable language - there is a ton of mutable state, and basic objects (classes, functions, ...) that are fixed at compile time in other languages are runtime objects and fully mutable in Python. In the end, sharing things between subinterpreters would be restricted to basic immutable data structures, which defeats the point. Subinterpreters suffer from the same problems as multiprocessing with no additional benefits. We believe that reducing mutability to implement subinterpreters is not viable without seriously impacting the semantics of the language (a conclusion which applies to many other approaches too).
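To make the point about runtime mutability concrete, here is a small sketch showing that a class and a function, which many languages fix at compile time, can be changed while the program runs. The names (Greeter, shout, demo) are invented for this example.

```python
def demo():
    """Show that classes and functions are ordinary mutable runtime objects."""

    class Greeter:
        def greet(self):
            return "hello"

    def shout(self):
        return "HELLO"

    g = Greeter()
    before = g.greet()

    # Replace a method on the class after instances already exist.
    Greeter.greet = shout
    after = g.greet()  # the existing instance sees the new method

    # Functions are objects too: attributes can be attached at any time.
    shout.call_hint = "loud"

    return before, after, shout.call_hint
```

If two subinterpreters shared Greeter, a change like this made in one would have to become visible, safely, in the other, which is why sharing tends to be limited to immutable data.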

Why is it easier to do in PyPy than CPython?

Removing the GIL in CPython has two problems:

how do we guard access to mutable data structures with locks and

what to do with reference counting that needs to be guarded.

PyPy only has the former problem; the latter doesn't exist, due to a different garbage collector approach. Of course the first problem is a mess too, but at least we are already half-way there. Compared to Jython or IronPython, PyPy lacks some data structures that are provided by JVM or .NET, which we would need to implement, hence the problem is a little harder than on an existing multithreaded platform. However, there is good research and we know how that problem can be solved.
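The reference counting problem can be illustrated with a toy model. This is not how CPython implements reference counts (those live in C inside each object); it is only a sketch, with hypothetical names, of why a count that several threads increment and decrement needs some form of guarding.

```python
import threading


class RefCounted:
    """Toy object with a manually managed reference count (illustrative only)."""

    def __init__(self):
        self.refcount = 1      # the creator holds one reference
        self.freed = False
        self._lock = threading.Lock()

    def incref(self):
        with self._lock:
            self.refcount += 1

    def decref(self):
        # Decrement and the zero check must happen as one atomic step,
        # otherwise two threads could both (or neither) free the object.
        with self._lock:
            self.refcount -= 1
            if self.refcount == 0:
                self.freed = True


def borrow_repeatedly(obj, times):
    """Take and release a reference many times, as threads passing an object around would."""
    for _ in range(times):
        obj.incref()
        obj.decref()


def demo(num_threads=4, times=10000):
    obj = RefCounted()
    threads = [
        threading.Thread(target=borrow_repeatedly, args=(obj, times))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    count_after_threads = obj.refcount
    obj.decref()  # the creator drops its reference
    return count_after_threads, obj.freed
```

Every reference taken or dropped pays for the lock in this model. A tracing garbage collector of the kind PyPy uses keeps no per-object count, so this particular cost does not arise there.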

Best regards,

Maciej Fijalkowski

