Multirelease effort: Moving to Python 3


Bohuslav Kabrda

18 Jul 2013

10:24 a.m.

Hi all, as a new Fedora Python maintainer, I have set myself a goal of moving Fedora to Python 3 as a default. This is going to be a multirelease effort that is going to affect lots of Fedora parts. Since we will need to switch default package manager from Yum to DNF (which is supposed to work with Python 3), we will need to wait for that. I've been told that DNF should be default in F22, so that's my target, too. That should also give everyone else plenty of time to work on other essential packages to make this happen.

Here is my analysis/proposal: Before switching, we need to make sure that everything "important" (*) is Python 3 compatible. There are three steps I see in this transition:

1) Getting rid of Python 2 in mock minimal buildroot.
2) Porting Anaconda to Python 3.
3) Making all livecd packages depend on Python 3 by default (and eventually getting rid of Python 2 from livecd) - this will also require switching from Yum to DNF as a default, which is supposed to support Python 3.
( 4) Making as much of the remaining packages Python 3 compatible )

In the past few days, I've been going through packages that are part of the above steps. I have reported numerous bugs asking upstream and/or Fedora maintainers for help with porting to Python 3. We have some spare cycles in our small Python packaging team that we will try to provide to whoever needs them most, but we're limited and we'll have to rely on the upstreams to do most of the work. I'm attaching a document with a list of packages that need porting, with some notes/links to opened bugs. Sometime soon, I'll open a tracking bug for this, so that everyone can quickly see where we are.

(*) I call these "important" packages (in terms of being important for the Python 3 switch)

...

From packaging point of view, this will probably require:

1) Renaming python package to python2
2) Renaming python3 package to python
3) Switching the %{?with_python3} conditionals in specfiles to %{?with_python2} (we will probably create a script to automate this, at least partially)
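As a rough illustration of the partial automation mentioned in the last item, a minimal Python sketch might look like the following. The function names and the sample spec fragment are hypothetical and are not taken from any existing Fedora tool. The sketch only covers the mechanical part (finding and renaming the macro); swapping which sections of a specfile are guarded would still need a human to review.

```python
import re

# Matches the literal macro %{?with_python3} as it appears in specfiles.
WITH_PY3 = re.compile(r"%\{\?with_python3\}")


def find_python3_conditionals(spec_text):
    """Return (line_number, line) pairs that reference %{?with_python3}."""
    return [
        (number, line)
        for number, line in enumerate(spec_text.splitlines(), start=1)
        if WITH_PY3.search(line)
    ]


def rename_conditionals(spec_text):
    """Rename %{?with_python3} to %{?with_python2}; nothing else is changed.

    The guarded sections themselves are left untouched, so the result
    still needs manual review before it is a correct specfile.
    """
    return WITH_PY3.sub("%{?with_python2}", spec_text)


# Hypothetical spec fragment, used only to exercise the helpers.
SAMPLE_SPEC = """\
%if 0%{?with_python3}
BuildRequires: python3-devel
%endif
"""
```

Running find_python3_conditionals() first gives a list of places to look at by hand, which is probably the more useful half of such a script.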

FAQ:

Q: Why do we need to switch to Python 3?
A: Because Python 2 is old, slower, less pythonic, doesn't get any more functionality and it won't be that long before the official upstream support ends [1]

Q: How do I port to Python 3?
A: There are tons of tutorials and howtos about porting and the differences in general. E.g. [2] (general), [3] (c-extensions)
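For a flavour of what porting usually involves, here is a small sketch of code written so that it runs unchanged on both Python 2.6+ and Python 3. It is only an illustration; the function names and sample data are made up for this example, and real ports typically need more than this (text versus bytes handling in particular).

```python
import sys


def describe_inventory(inventory):
    """Return one 'name: count' line per item, sorted by name."""
    # dict.items() exists on both Python 2 and 3; dict.iteritems() is Python 2 only.
    return ["{0}: {1}".format(name, count)
            for name, count in sorted(inventory.items())]


def parse_count(text):
    """Convert text to an int, returning None when it is not a number."""
    try:
        return int(text)
    except ValueError as error:
        # 'except ValueError as error' is valid on Python 2.6+ and 3;
        # the older 'except ValueError, error' form is a syntax error on 3.
        sys.stderr.write("ignoring bad value: {0}\n".format(error))
        return None


if __name__ == "__main__":
    for line in describe_inventory({"yum": 1, "dnf": 2}):
        # A single parenthesized argument prints the same on both versions.
        print(line)
```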

Q: What about Python 2?
A: We will maintain that at least as long as upstream supports it. After that, I'd prefer dropping it, but since I know there will be people wanting to keep it around, I'll gladly give the maintenance to someone else.

I'll be glad to answer all your questions and discuss the above points. Nothing is set in stone and I'd love to hear your ideas and comments. Thanks for reading this through!

Slavek.

--
Regards, Bohuslav "Slavek" Kabrda.

[1] http://www.python.org/dev/peps/pep-0373/
[2] http://docs.python.org/dev/howto/pyporting.html
[3] http://docs.python.org/3/howto/cporting.html

Attachments:

python3-fedora.txt (text/plain — 3.2 KB)


Toshio Kuratomi

18 Jul 2013

11:53 a.m.

On Thu, Jul 18, 2013 at 11:24:22AM -0400, Bohuslav Kabrda wrote:

...

Hi all, as a new Fedora Python maintainer, I have set myself a goal of moving Fedora to Python 3 as a default.

I'm not sure we want to make python3 default depending on what your definition of default is.

/usr/bin/python should refer to python2 -- http://www.python.org/dev/peps/pep-0394/ I'd be -1 to changing this

The python package itself should probably also remain python2 due to dependencies and expectations from other distros and documentation -- I think I'd be -1 to changing this

The Fedora live images contain only python3, not python2 -- I'd be heavily in favour of this. +1

...

This is going to be a multirelease effort that is going to affect lots of Fedora parts. Since we will need to switch default package manager from Yum to DNF (which is supposed to work with Python 3), we will need to wait for that. I've been told that DNF should be default in F22, so that's my target, too. That should also give everyone else plenty of time to work on other essential packages to make this happen.

Getting there at the same time as we get to DNF sounds like a good timeline. (But see my note on anaconda below). +1

...

Here is my analysis/proposal: Before switching, we need to make sure that everything "important" (*) is Python 3 compatible. There are three steps I see in this transition:

Getting rid of Python 2 in mock minimal buildroot.

I'm not sure about this one as it will cause a lot of package churn. It might be a necessary pain point or it might be a pain point we want to defer until later in our porting efforts. Have to think about it more.

...

Porting Anaconda to Python 3.

+1 -- unfortunately, this probably depends on DNF.... So we may need to push DNF in F22 and anaconda compatible with python3 in F23.

...

Making all livecd packages depend on Python 3 by default (and eventually getting rid of Python 2 from livecd) - this will also require switching from Yum to DNF as a default, that is supposed to support Python 3.

+1 -- this is what I see as the eventual goal (or perhaps, livecd python2 free followed by DVD python2 free followed by distro python2 free).

3.5) Switch tools that could target either python2 or python3 to target python3. Currently the packaging guidelines say to target python2 to control dep proliferation and because that's the most supported by the larger python ecosystem. We should switch the recommendation when our minimal environment must have python3 but does not need to have python2.

...

( 4) Making as much of the remaining packages Python 3 compatible )

We could talk quite a bit on this point -- How active do we want to be with the things that aren't in one of the essential buckets from further up. We could defer thinking about this until after we get the livecd python2-free, though.

...

In past few days, I've been going through packages that are part of the above steps. I have reported numerous bugs asking upstream and/or Fedora maintainers for help with porting to Python 3. We have some spare cycles in our small Python packaging team, that we will try to provide to whoever needs them most, but we're limited and we'll have to rely on the upstreams to do most of the work. I'm attaching a document with list of packages that need porting with some notes/links to opened bugs. Sometime soon, I'll open a tracking bug for this, so that everyone can see where we are quickly.

(*) I call these "important" packages (in terms of being important for the Python 3 switch)

Cool. A list of packages that are on the livecd is good. One thing to remember, though, is that the current Python Guidelines specify that we are not to ship python3 versions of packages if upstream is not going to support us in that effort: https://fedoraproject.org/wiki/Packaging:Python#Subpackages

We could change that but I'm not 100% behind the idea of changing it. As stated in the Guidelines:

" [...]doing this on our own in Fedora is essentially creating a fork. That has a large burden for maintaining the code, fixing bugs, porting when a new version of upstream's code appears, managing a release schedule, and other tasks normally handled by upstream. It's much better if we can cooperate with upstream to share this work than doing it all on our own. "

Luckily, in recent years I've only encountered a few upstreams that are unwilling to look at python3 patches. Most upstreams are amenable to taking patches that establish python3 compatibility. We just need to remain clear that we have to work upstream to get these python3 versions into fedora, not do it in our packages without upstream being on board.

...

From packaging point of view, this will probably require:

Renaming python package to python2

Renaming python3 package to python

-1: What are the benefits of this as the cost of this is very high in several ways:

* updating our dependencies
* divergence from other distros (I believe that arch is the only distro that has decided to ship python3 as "python". everyone else ships python3 as python3)
* updating our documentation
* divergence from other upstream/googlable documentation

I could see us renaming the python package to python2, keeping a Virtual Provide in the python2 package for python (and similar for all of the subpackages and python-doc package), and leaving python3 as it is. This might be a stepping stone to when the internet's memory has started associating "python" with python3 instead of python2.

...

Switching the %{?with_python3} conditionals in specfiles to %{?with_python2} (we will probably create a script to automate this, at least partially)

-1: This one doesn't make any sense to do. The third-party python library ecosystem is highly weighted for python2. There are only a handful of libraries that support python3 and not python2. There are a boatload of libraries that support python2 and not python3. We're starting from a base of existing python2 packages that may add support for python3. The conditionals are there to enable packaging of that situation.

...

FAQ: Q: Why do we need to switch to Python 3? A: Because Python 2 is old, slower, less pythonic, doesn't get any more functionality and it won't be that long before the official upstream support ends [1]

Although I agree with the need to switch to python3, I don't think the first three reasons are very compelling arguments (they're only half-truths) -- we should concentrate on the last reason and also on features that python3 has that python2 doesn't. Chained exceptions are a pretty nice thing, for instance.
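To make the chained-exceptions point concrete, here is a minimal sketch of the Python 3 "raise ... from ..." syntax. ConfigError, load_port and describe_failure are hypothetical names invented for this example.

```python
class ConfigError(Exception):
    """Hypothetical application-level error, used only for this illustration."""


def load_port(settings):
    """Return the configured port as an int, or raise ConfigError."""
    try:
        return int(settings["port"])
    except (KeyError, ValueError) as error:
        # 'raise ... from ...' stores the original exception in __cause__,
        # so the traceback shows both the low-level and high-level failure.
        raise ConfigError("invalid or missing port setting") from error


def describe_failure(settings):
    """Return the class name of the underlying cause, or None on success."""
    try:
        load_port(settings)
    except ConfigError as error:
        return type(error.__cause__).__name__
    return None
```

In Python 2 the original exception is lost once a new one is raised inside the except block, unless the code carries the traceback along by hand.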

...

Q: How do I port to Python 3? A: There are tons of tutorials and howtos about porting and the differences in general. E.g. [2] (general), [3] (c-extensions)

The best two tutorials for python3 porting are likely: https://wiki.ubuntu.com/Python/3 (Will be moving to the python.org wiki in the near future)

http://python3porting.com/

...

Q: What about Python 2? A: We will maintain that at least as long as upstream supports it. After that, I'd prefer dropping it, but since I know there will be people wanting to keep it around, I'll gladly give the maintenance to someone else.

<nod> 2015 is right around the corner... I think someone else will get stuck maintaining the package :-/

...

I'll be glad to answer all your questions and discuss the above points. Nothing is set into stone and I'd love to hear your ideas and comments.

I sent out a message earlier that we should have a python sig/python guidelines discussion at flock. I think that nick and I are the only two that can definitely attend that in person.

Can anyone else make this timeslot on IRC?

http://flock2013.sched.org/event/281138262885f34d97408cfe65cdf21b?iframe=yes...

Planning for python3 and any needed updates to the Guidelines surrounding this are one of the things I wanted us to discuss.

[..]

One thing it might be nice to see in the below list is what things we have some upstream control over already. I believe the gdb work is being driven by dmalcolm. anaconda and yum/dnf are things we are upstream for. etc. Knowing about this responsibility will help us to understand where we control our own destiny and where we're dependent on other upstreams.

In some cases where upstream isn't going to port (for instance, dead upstream), we may need to either port to a different upstream (potentially large one-time cost) or fork upstream (ongoing maintenance burden).

One specific note:

...

                - python-pycurl - TODO - https://github.com/p/pycurl/pull/28 (is this the official upstream?)

You probably need to find someone to take over upstream maintenance of python-pycurl. Over the past two or three years, various people have stepped up to take over upstream and never gotten more than a release out the door.

-Toshio

Bohuslav Kabrda

19 Jul 2013

1:41 a.m.

----- Original Message -----

...

On Thu, Jul 18, 2013 at 11:24:22AM -0400, Bohuslav Kabrda wrote:

...
Hi all, as a new Fedora Python maintainer, I have set myself a goal of moving Fedora to Python 3 as a default.

I'm not sure we want to make python3 default depending on what your definition of default is.

/usr/bin/python should refer to python2 -- http://www.python.org/dev/peps/pep-0394/ I'd be -1 to changing this

So, my definition of default is "all system tools use Python 3, it is the only Python that gets to minimal buildroot/minimal Fedora installation" - that means:

- livecd can still ship Python 2
- /usr/bin/python points to Python 3
- Please note that the pep you're referring to also states that "python should refer to the same target as python2 but may refer to python3 on some bleeding edge distributions", so this wouldn't really be going against the pep.

...

The python package itself should probably also remain python2 due to dependencies and expectations from other distros and documentation -- I think I'd be -1 to changing this

The Fedora live images contain only python3, not python2 -- I'd be heavily in favour of this. +1

...
This is going to be a multirelease effort that is going to affect lots of Fedora parts. Since we will need to switch default package manager from Yum to DNF (which is supposed to work with Python 3), we will need to wait for that. I've been told that DNF should be default in F22, so that's my target, too. That should also give everyone else plenty of time to work on other essential packages to make this happen.

Getting there at the same time as we get to DNF sounds like a good timeline. (But see my note on anaconda below). +1

...
Here is my analysis/proposal: Before switching, we need to make sure that everything "important" (*) is Python 3 compatible. There are three steps I see in this transition:

Getting rid of Python 2 in mock minimal buildroot.

I'm not sure about this one as it will cause a lot of package churn. It might be a necessary pain point or it might be a pain point we want to defer until later in our porting efforts. Have to think about it more.

If you look at the minimal mock buildroot for rawhide now, the only thing that is drawing in Python is gdb because of its Python bindings (if I'm not mistaken). So compiling GDB against Python 3, which should work with the newest gdb, will accomplish this AFAICS.

...

...

Porting Anaconda to Python 3.

+1 -- unfortunately, this probably depends on DNF.... So we may need to push DNF in F22 and anaconda compatible with python3 in F23.

DNF is a continuous effort. I believe that DNF will provide its Python 3 bindings sooner than F22, so Anaconda devels can port to Python 3 and to DNF at the same time. IMO this is a good thing, since they will just do one big rewrite instead of two smaller ones.

...

...

Making all livecd packages depend on Python 3 by default (and eventually getting rid of Python 2 from livecd) - this will also require switching from Yum to DNF as a default, that is supposed to support Python 3.

+1 -- this is what I see as the eventual goal (or perhaps, livecd python2 free followed by DVD python2 free followed by distro python2 free).

3.5) Switch tools that could target either python2 or python3 to target python3. Currently the packaging guidelines say to target python2 to control dep proliferation and because that's the most supported by the larger python ecosystem. We should switch the recommendation when our minimal environment must have python3 but does not need to have python2.

IMO we should switch this for F21, since livecd ships Python 3 anyway, so the switch doesn't have to happen in one point, but can be continuous.

...

...
( 4) Making as much of the remaining packages Python 3 compatible )

We could talk quite a bit on this point -- How active do we want to be with the things that aren't in one of the essential buckets from further up. We could defer thinking about this until after we get the livecd python2-free, though.

This is really the last step, which is somewhat tied to what you mentioned as a reaction to 3) - going through the rest of the packages on the DVD and then the whole distro. This will take a few more releases, I guess, but it is not as important as sorting out the livecd.

...

...
In past few days, I've been going through packages that are part of the above steps. I have reported numerous bugs asking upstream and/or Fedora maintainers for help with porting to Python 3. We have some spare cycles in our small Python packaging team, that we will try to provide to whoever needs them most, but we're limited and we'll have to rely on the upstreams to do most of the work. I'm attaching a document with list of packages that need porting with some notes/links to opened bugs. Sometime soon, I'll open a tracking bug for this, so that everyone can see where we are quickly.

(*) I call these "important" packages (in terms of being important for the Python 3 switch)

Cool. A list of packages that are on the livecd is good. One thing to remember, though, is that the current Python Guidelines specify that we are not to ship python3 versions of packages if upstream is not going to support us in that effort: https://fedoraproject.org/wiki/Packaging:Python#Subpackages

We could change that but I'm not 100% behind the idea of changing it. As stated in the Guidelines:

" [...]doing this on our own in Fedora is essentially creating a fork. That has a large burden for maintaining the code, fixing bugs, porting when a new version of upstream's code appears, managing a release schedule, and other tasks normally handled by upstream. It's much better if we can cooperate with upstream to share this work than doing it all on our own. "

Luckily, in recent years I've only encountered a few upstreams that are unwilling to look at python3 patches. Most upstreams are amenable to taking patches that establish python3 compatibility. We just need to remain clear that we have to work upstream to get these python3 versions into fedora, not do it in our packages without upstream being on board.

Yep. When I was opening the bugs, I tried to open them in the sense of "please work with upstream to port to Python 3". Only when I found out that upstream supports Python 3 did I ask the maintainer to add a python3- subpackage.

...

...
From packaging point of view, this will probably require:

Renaming python package to python2

Renaming python3 package to python

-1: What are the benefits of this as the cost of this is very high in several ways:

updating our dependencies

divergence from other distros (I believe that arch is the only distro that has decided to ship python3 as "python". everyone else ships python3 as python3)

updating our documentation

divergence from other upstream/googlable documentation

I could see us renaming the python package to python2, keeping a Virtual Provide in the python2 package for python (and similar for all of the subpackages and python-doc package), and leaving python3 as it is. This might be a stepping stone to when the internet's memory has started associating "python" with python3 instead of python2.

...

Switching the %{?with_python3} conditionals in specfiles to %{?with_python2} (we will probably create a script to automate this, at least partially)

-1: This one doesn't make any sense to do. The third-party python library ecosystem is highly weighted for python2. There are only a handful of libraries that support python3 and not python2. There are a boatload of libraries that support python2 and not python3. We're starting from a base of existing python2 packages that may add support for python3. The conditionals are there to enable packaging of that situation.

And this situation will be changing in the future. Right now, there are not so many Python packages in Fedora that only support Python 2 (I didn't count, but you don't see them too often these days). IMO Fedora should lead the way of making Python 3 "the Python" and Python 2 "the old compat version". This also makes sense in the traditional linux-distro "one version of package" that we should be trying to pursue.

...

...
FAQ: Q: Why do we need to switch to Python 3? A: Because Python 2 is old, slower, less pythonic, doesn't get any more functionality and it won't be that long before the official upstream support ends [1]

Although I agree with the need to switch to python3, I don't think the first three reasons are very compelling arguments (they're only half-truths) -- we should concentrate on the last reason and also on features that python3 has that python2 doesn't. Chained exceptions are a pretty nice thing, for instance.

So, the first three reasons:

- Python 2 is old - how is that a half-truth?
- Slower - yes, in the beginning, Python 3 was significantly slower because of nonoptimal code after the rewrite from Python 2. But with Python 3.3, for instance, you get tons of speed improvements - the decimal module, for instance, got a significant boost. Brett Cannon had a nice presentation about speed benchmarking [1]. Yes, Python 3 is slower in some areas, but mostly it's faster.
- Less Pythonic - where do I start with this? Python 3 got rid of tons of unnecessary syntactic constructs as well as builtin object methods. E.g. "print" vs. "print()"; exception raising syntax; dict.iteritems() removed and only dict.items() left; more consistent unicode handling etc. So in the sense of having only one way to do things, Python 3 is more pythonic than Python 2. If you read through the Zen of Python, you can find more arguments for this (e.g. making int and long one type - "Special cases aren't special enough to break the rules."; simplification/rewrite of parts of stdlib - "Simple is better than complex.", etc.)
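A few of the points above can be seen in a short Python 3 session. This is only a sketch; the package names and versions are placeholders, and render is a made-up helper.

```python
import io


def render(packages):
    """Format 'name-version' lines using print() as an ordinary function."""
    buffer = io.StringIO()
    for name, version in sorted(packages.items()):
        # Keyword arguments such as sep= and file= are possible because
        # print is a function in Python 3 rather than a statement.
        print(name, version, sep="-", file=buffer)
    return buffer.getvalue()


# Placeholder package names and versions, used only for illustration.
packages = {"alpha": "1.0"}
items_view = packages.items()   # a live view, not a copied list
packages["beta"] = "2.0"        # the view reflects this later insertion

big_number = 2 ** 64            # a plain int; Python 3 has no separate long type
```

In Python 2 the same dict.items() call would have returned a copied list, and 2 ** 64 would have been a long rather than an int.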

...

...
Q: How do I port to Python 3? A: There are tons of tutorials and howtos about porting and the differences in general. E.g. [2] (general), [3] (c-extensions)

The best two tutorials for python3 porting are likely: https://wiki.ubuntu.com/Python/3 (Will be moving to the python.org wiki in the near future)

http://python3porting.com/

Thanks for the links.

...

...
Q: What about Python 2? A: We will maintain that at least as long as upstream supports it. After that, I'd prefer dropping it, but since I know there will be people wanting to keep it around, I'll gladly give the maintenance to someone else.

<nod> 2015 is right around the corner... I think someone else will get stuck maintaining the package :-/

...
I'll be glad to answer all your questions and discuss the above points. Nothing is set into stone and I'd love to hear your ideas and comments.

I sent out a message earlier that we should have a python sig/python guidelines discussion at flock. I think that nick and I are the only two that can definitely attend that in person.

Can anyone else make this timeslot on IRC?

http://flock2013.sched.org/event/281138262885f34d97408cfe65cdf21b?iframe=yes...

I'll try.

...

Planning for python3 and any needed updates to the Guidelines surrounding this are one of the things I wanted us to discuss.

[..]

One thing it might be nice to see in the below list is what things we have some upstream control over already. I believe the gdb work is being driven by dmalcolm. anaconda and yum/dnf are things we are upstream for. etc. Knowing about this responsibility will help us to understand where we control our own destiny and where we're dependent on other upstreams.

In some cases where upstream isn't going to port (for instance, dead upstream), we may need to either port to a different upstream (potentially large one-time cost) or fork upstream (ongoing maintenance burden).

One specific note:

...
                - python-pycurl - TODO - https://github.com/p/pycurl/pull/28 (is this the official upstream?)

You probably need to find someone to take over upstream maintenance of python-pycurl. Over the past two or three years, various people have stepped up to take over upstream and never gotten more than a release out the door.

Yeah, I'll try to speak to Fedora's maintainer of pycurl to see if he has any updates on this and we'll see.

...

-Toshio

Thanks for your thoughts. Slavek.

--
Regards, Bohuslav "Slavek" Kabrda.

[1] https://speakerdeck.com/pyconslides/python-3-dot-3-trust-me-its-better-than-...

Toshio Kuratomi

3:11 p.m.

On Fri, Jul 19, 2013 at 02:41:23AM -0400, Bohuslav Kabrda wrote:

...

----- Original Message -----

...
On Thu, Jul 18, 2013 at 11:24:22AM -0400, Bohuslav Kabrda wrote:

...
Hi all, as a new Fedora Python maintainer, I have set myself a goal of moving Fedora to Python 3 as a default.

I'm not sure we want to make python3 default depending on what your definition of default is.

/usr/bin/python should refer to python2 -- http://www.python.org/dev/peps/pep-0394/ I'd be -1 to changing this

So, my definition of default is "all system tools use Python 3, it is the only Python that gets to minimal buildroot/minimal Fedora installation" - that means:

I'm okay with this portion of the definition. One note is I would be hesitant about the timing of python3 being the only python that is installed into the minimal buildroot. This should probably happen in rawhide right after a branching.

...

livecd can still ship Python 2

I would consider this to be the goal that we should shoot for, though. We are constantly fighting for space on the install images and we know that people who install Fedora would like to have the ability to slim down what is installed. Shooting for no-python2 on the livecds and after that, no python2 on the install dvds, and still later, no need for python2 in the packages in the repository seem like milestones that have actual real value for end users.

...

/usr/bin/python points to Python 3

I am firmly against this. more depth was in my reply to mmaslano although I'll reply to one thing here:

...

Please note, that the pep you're referring to also states that "python should refer to the same target as python2 but may refer to python3 on some bleeding edge distributions", so this wouldn't really be going against the pep.

This is a misinterpretation of the PEP. (This section is confusing, though: "python should refer to the same target as python2" is a recommendation to distributions. "but may refer to python3 on some bleeding edge distributions" is a statement of fact for end users to watch out for.) See the recommendation section:

"For the time being, it is recommended that python should refer to python2 (however, some distributions have already chosen otherwise; see the Rationale and Migration Notes below)."

and Future Changes Section: http://www.python.org/dev/peps/pep-0394/#future-changes-to-this-recommendati...

" It is anticipated that there will eventually come a time where the third party ecosystem surrounding Python 3 is sufficiently mature for this recommendation to be updated to suggest that the python symlink refer to python3 rather than python2.

This recommendation will be periodically reviewed over the next few years, and updated when the core development team judges it appropriate. "

The "may refer to python3" phrase is an acknowledgment that arch has moved to /usr/bin/python == python3 and isn't going to revert even though upstream thinks it's a... premature time to make that switch. (To be fair to arch, the discussion and PEP happened as a result of arch making that switch so they'd already committed to that before the consensus was formed that this would be a bad thing to do at this time. We don't have that excuse ;-)

If you'd like to read the discussions for yourself, there are three threads linked from the PEP. An even earlier one is at: http://mail.python.org/pipermail/python-dev/2010-November/105252.html
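To check how a particular system currently maps the python, python2 and python3 commands that the PEP talks about, something like the following sketch could be used. It assumes Python 3.3 or later (for shutil.which), and the function names are made up for this illustration.

```python
import os
import shutil


def resolve_command(name):
    """Return the real path a command name resolves to, or None if absent."""
    path = shutil.which(name)
    # realpath follows symlinks, e.g. /usr/bin/python pointing at another binary.
    return os.path.realpath(path) if path else None


def python_command_report(names=("python", "python2", "python3")):
    """Map each command name to the file it ultimately points at."""
    return {name: resolve_command(name) for name in names}


if __name__ == "__main__":
    for name, target in sorted(python_command_report().items()):
        print("{0} -> {1}".format(name, target or "not installed"))
```

Comparing the targets for python and python2 shows whether a system follows the recommendation quoted above.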

...

...
The python package itself should probably also remain python2 due to dependencies and expectations from other distros and documentation -- I think I'd be -1 to changing this

The Fedora live images contain only python3, not python2 -- I'd be heavily in favour of this. +1

...
This is going to be a multirelease effort that is going to affect lots of Fedora parts. Since we will need to switch default package manager from Yum to DNF (which is supposed to work with Python 3), we will need to wait for that. I've been told that DNF should be default in F22, so that's my target, too. That should also give everyone else plenty of time to work on other essential packages to make this happen.

Getting there at the same time as we get to DNF sounds like a good timeline. (But see my note on anaconda below). +1

...
Here is my analysis/proposal: Before switching, we need to make sure that everything "important" (*) is Python 3 compatible. There are three steps I see in this transition:

Getting rid of Python 2 in mock minimal buildroot.

I'm not sure about this one as it will cause a lot of package churn. It might be a necessary pain point or it might be a pain point we want to defer until later in our porting efforts. Have to think about it more.

If you look at the minimal mock buildroot for rawhide now, the only thing that is drawing in Python is gdb because of its Python bindings (if I'm not mistaken). So compiling GDB against Python 3, which should work with the newest gdb, will accomplish this AFAICS.

<nod> I thought that python might be one of the packages that showed up in the dep chain here: http://fedoraproject.org/wiki/Packaging:Guidelines#Exceptions_2

which would mean that packages might be leaving it out of BuildRequires right now. I wasn't able to find a dep chain leading back to python from one of those so I think that any package which isn't explicitly BuildRequiring python just has a bug. No objection here although as noted earlier, we should probably do this right after a rawhide branch so that any of these bugs are found and fixed in plenty of time.

...

...
...

Porting Anaconda to Python 3.

+1 -- unfortunately, this probably depends on DNF.... So we may need to push DNF in F22 and anaconda compatible with python3 in F23.

DNF is a continuous effort. I believe that DNF will provide its Python 3 bindings sooner than F22, so Anaconda devels can port to Python 3 and to DNF at the same time. IMO this is a good thing, since they will just do one big rewrite instead of two smaller ones.

Well....

1) If DNF lands their python3 bindings sooner, that's fine for the timeline. But if they don't, anaconda can't be finished until after. So this is something to note as a key piece of the switch.
2) Fedora anaconda experience (and general open source development experience as well... you've read the Joel on Software article about netscape, right?) would tend to show that big rewrites are worse than several smaller ones.

Sure, smaller ones mean that you touch the same code a few times before you're satisfied with it. But smaller ones mean that you can stop partway through (say, for instance, because we needed a few extra weeks to port to DNF and that doesn't leave us enough time to complete the python3 port [even assuming that that doesn't take longer than anticipated] in time for Fedora 22's release date) which is just anticipating that Murphy's Law is inevitably going to throw a wrench into your timeline for completion.

...

...
...

Making all livecd packages depend on Python 3 by default (and eventually getting rid of Python 2 from livecd) - this will also require switching from Yum to DNF as a default, that is supposed to support Python 3.

+1 -- this is what I see as the eventual goal (or perhaps, livecd python2 free followed by DVD python2 free followed by distro python2 free).

3.5) Switch tools that could target either python2 or python3 to target python3. Currently the packaging guidelines say to target python2 to control dep proliferation and because that's the most supported by the larger python ecosystem. We should switch the recommendation when our minimal environment must have python3 but does not need to have python2.

IMO we should switch this for F21, since livecd ships Python 3 anyway, so the switch doesn't have to happen in one point, but can be continuous.

Ehhh... I don't think the livecd having to ship python3 is a good measure for this. I think something considerably more minimal than the livecd would be better. (I talked to mattdm (since he's been working on minimal environments) and he suggested @core or @standard groups might be appropriate.) The idea is to avoid doubling the needed python stacks on minimal environments until necessary. Switching tools that have the option of running on either one or the other of the stacks to python3 prematurely means that we start doubling the python stacks needed before it's necessary.

...

...
...
( 4) Making as much of the remaining packages Python 3 compatible )

We could talk quite a bit on this point -- How active do we want to be with the things that aren't in one of the essential buckets from further up. We could defer thinking about this until after we get the livecd python2-free, though.

This is really the last step, that is somehow tied to what you mentioned as a reaction to 3) - going through the rest of packages on DVD and then whole distro. This will take a few more releases I guess, but it is not as important as sorting out livecd.

yeah, this strikes me as extending far into the future. I will note, though, that ideas about changing /usr/bin/python to point to python3 probably come in the latter stages of this step rather than before.

...

...
...

Switching the %{?with_python3} conditionals in specfiles to %{?with_python2} (we will probably create a script to automate this, at least partially)

-1: This one doesn't make any sense to do. The third-party python library ecosystem is highly weighted for python2. There are only a handful of libraries that support python3 and not python2. There are a boatload of libraries that support python2 and not python3. We're starting from a base of existing python2 packages that may add support for python3. The conditionals are there to enable packaging of that situation.

And this situation will be changing in the future. Right now, there are not so many Python packages in Fedora that only support Python 2 (I didn't count, but you don't see them too often these days).

Uh.... What's your methodology? This is a very, very, very bad estimate but I think it'll show that we need more than an anecdote to prove that statement:

$ repoquery -q 'python3-*' |wc -l
259
$ repoquery -q 'python-*' |wc -l
1099

Also, just to be clear, you do understand why switching the conditionals doesn't work for our existing packages, correct?

...

IMO Fedora should lead the way of making Python 3 "the Python" and Python 2 "the old compat version". This also makes sense in the traditional linux-distro "one version of package" that we should be trying to pursue.

-1. I'm serious, you've got the wrong conceptual model of the relationship between python2 and python3 stuck in your head and it's coloring what you're trying to achieve in bad ways. Python3 is not an upgrade to Python2. Python3 is a new language. It is compatible in many ways. If you can target recent enough versions (at least python-2.6 but python2.7 is better and python-3.3) you can set out to purposefully code things that work on both languages. But if you're writing general, working python2 code using idioms and thought processes that you've mastered over the last 10 years, chances are extremely high that not even your simple scripts are going to run without modification.

...

...
...
FAQ:

Q: Why do we need to switch to Python 3?

A: Because Python 2 is old, slower, less pythonic, doesn't get any more functionality and it won't be that long before the official upstream support ends [1]

Although I agree with the need to switch to python3, I don't think the first three reasons are very compelling arguments (they're only half-truths) -- we should concentrate on the last reason and also on features that python3 has that python2 doesn't. Chained exceptions are a pretty nice thing, for instance.
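
As a rough illustration of the chained exceptions mentioned above, here is a minimal Python 3 sketch. The function name `load_port` and the sample input are made up for the example; the only Python 3 features being shown are the `raise ... from ...` syntax and the `__cause__` attribute.

```python
def load_port(raw):
    """Parse a port number; hypothetical helper used only for illustration."""
    try:
        return int(raw)
    except ValueError as exc:
        # "raise ... from ..." records the original error as __cause__,
        # so the traceback shows both exceptions (Python 3 only syntax).
        raise RuntimeError("invalid port setting: %r" % (raw,)) from exc


try:
    load_port("eighty")
except RuntimeError as err:
    # Keep a reference: the name bound by "as" is cleared after the block.
    chained = err

# The low-level ValueError is still reachable from the high-level error.
print(type(chained.__cause__).__name__)
```

In python2 the original traceback would be lost unless the code saved and re-raised it by hand.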

So, the first three reasons:

Python 2 is old - how is that a half-truth?

C is older. Let's get rid of that first. Old is not a reason to switch or get rid of it.

...

Slower - yes, in the beginning, Python 3 was significantly slower because of nonoptimal code after the rewrite from Python 2. But with Python 3.3 for instance, you get tons of speed improvements - decimal module for instance got a significant boost. Brett Cannon had a nice presentation about speed benchmarking [1]. Yes, Python 3 is slower in some areas, but mostly it's faster.

* pypy is faster and mostly python2 compatible. Many shops that want speed are switching to that rather than python3. (Not saying this is a good idea for a distro to do. I'm just saying that making the speed argument is not compelling).

* people don't use python for raw speed of processing. They really just care if it's fast enough. People who write python code would be happy to take speed increases if they were free. But python2 to python3 requires porting code so it comes with a significant cost. Speed is a side effect of switching, not a reason to switch.

...

Less Pythonic - where do I start with this? Python 3 got rid of tons of unnecessary syntactic constructs as well as builtin object methods. E.g. "print" vs. "print()"; exception raising syntax; dict.iteritems() removed and dict.items() only left, more consistent unicode handling etc. So in the sense of having only one way to do things, Python 3 is more pythonic than Python 2. If you read through zen of python, you can find more arguments for this (e.g. making int and long one type - "Special cases aren't special enough to break the rules."; simplification/rewrite of parts of stdlib - "Simple is better than complex.", etc.)
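
For readers who have not seen the differences listed above side by side, a small Python 3 sketch follows. The `inventory` data is invented for the example; the constructs themselves (print as a function, `dict.items()` only, a single `int` type) are the ones named in the paragraph.

```python
inventory = {"apples": 3, "pears": 5}

# print is a function, not a statement.
print("items:", len(inventory))

# dict.items() returns a view; the separate iteritems() method no longer exists.
pairs = sorted(inventory.items())
has_iteritems = hasattr(inventory, "iteritems")

# int and long are one type: large values stay plain int.
big = 2 ** 100
print(pairs, has_iteritems, type(big).__name__)
```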

pythonic is a very vague statement and I wouldn't consider most of your list to be examples of those. Yes, python3 may be a *better* language (and I would include most of your list as "features of python3 that python2 does not have") but a more pythonic language... that's not something that you can readily measure. For instance, I can make the case that python3's unicode handling is less pythonic than python2 as it violates three rules of the zen of python:

Explicit is better than implicit. Errors should never pass silently. In the face of ambiguity, refuse the temptation to guess.

(To be fair, python2 violated some of these rules in its unicode handling as well, although "errors should never pass silently" would probably take some work to convince most people :-)

Anyhow, I stick to my assertion that we should be talking mostly about upstream support ending as the reason to switch and also the features that python3 provides that python2 does not [and as noted, I'd lump your "pythonic" list into this category.] Stating a compelling argument of why people should change isn't just about identifying all the things that the change will do. It's also about identifying the things that the change will do that are important to the person and resonate with their needs.

-Toshio

Nick Coghlan

21 Jul 21 Jul

7:15 p.m.


On 07/20/2013 06:11 AM, Toshio Kuratomi wrote:

...

pythonic is a very vague statement and I wouldn't consider most of your list to be examples of those. Yes, python3 may be a *better* language (and I would include most of your list as "features of python3 that python2 does not have") but a more pythonic language... that's not something that you can readily measure. For instance, I can make the case that python3's unicode handling is less pythonic than python2 as it violates three rules of the zen of python:

Explicit is better than implicit. Errors should never pass silently. In the face of ambiguity, refuse the temptation to guess.

(To be fair, python2 violated some of these rules in its unicode handling as well, although "errors should never pass silently" would probably take some work to convince most people :-)

The *only* reason Python 3 allows any Unicode errors to pass silently is because Python 2 tolerated broken system configurations (like non-UTF-8 filesystem metadata on nominally UTF-8 systems) by treating them as opaque 8-bit strings that were retrieved from OS interfaces and then passed back unmodified (see PEP 383 for details). If Python 3 didn't work on those systems, people would blame Python 3, not the already broken system configuration ("But Python 2 works, why is Python 3 so broken?"). os.listdir() -> open() is the canonical example of the kind of "round trip" activity that we felt we needed to support even for systems with improperly encoded metadata (including file names).

You can already force Python 3 into completely strict mode by doing:

import codecs; codecs.register_error("surrogateescape", codecs.strict_errors)

>>> b"\xe9".decode("ascii", errors="surrogateescape")
'\udce9'
>>> import codecs; codecs.register_error("surrogateescape", codecs.strict_errors)
>>> b"\xe9".decode("ascii", errors="surrogateescape")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)
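
The "round trip" that PEP 383 is meant to support can be sketched without touching the filesystem. This is only an illustration: `raw_name` is an invented stand-in for a badly encoded file name, and the explicit "utf-8" codec is an assumption about the locale.

```python
raw_name = b"abc\xff"  # not valid UTF-8; stands in for a badly encoded file name

# Decoding: the undecodable byte 0xff becomes the lone surrogate U+DCFF.
text_name = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text_name.encode("utf-8", errors="surrogateescape")

# Encoding with the default strict handler refuses the lone surrogate.
try:
    text_name.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(ascii(text_name), round_tripped == raw_name, strict_failed)
```

The strict failure in the last step is the "late notification" case: the data only raises an error when it is encoded without the surrogateescape handler.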

I'd eventually like Python 3 to support a string tainting model, but who knows when I'll find time to actually work on that :(

To clarify what I mean by "string tainting" (since it's more than the simple tainted-or-not model used in other languages), currently Python 3 unicode strings may exist in a state similar to 8-bit strings in Python 2: they're tainted by particular encoding assumptions, so combining them with arbitrary other pieces of "text" or encoding them to a different output format isn't a valid operation. Unfortunately, Python 3, like Python 2, currently allows you to combine strings tainted by such assumptions without complaint, unless/until you try to encode them again and then you *might* get an error, or you might just get invalid data. It's significantly less common in Python 3 than it was in Python 2 (as it requires that you decode data using an encoding that doesn't actually match the data contents, whereas in Python 2 it just required combining 8-bit data that used two different encodings), but the problem definitely isn't solved at this point, just mitigated.

Tainting would involve having the surrogateescape codec set an attribute on a string recording the encoding assumption if it had to embed any surrogates in the Private Use Area, as well as a keyword only "taint" argument to decode operations (e.g. to force tainting when using "latin-1" as a universal text codec). Various string operations would then be modified to use the following rules:

* Both input strings untainted? Output is untainted.
* One input tainted, one untainted? Output is tainted with the same assumption as the tainted input
* Both inputs tainted with the same assumption? Output is also tainted with that assumption.
* Inputs tainted with different assumptions? Immediate ValueError complaining about the taint mismatch

String encoding would be updated to trigger a ValueError when asked to encode tainted strings to an encoding other than the tainted one.

Strings would likely gain a "remove_taint" method (name TBD), that did the "encode using tainted encoding, redecode using correct encoding".

And yes, this could be used for traditional tainting as well - setting the taint assumption to something like "<untrusted>" (e.g. for user input) or "<sensitive>" (e.g. for not-yet-hashed user credentials) would be enough to prevent serialisation under this scheme.
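
To make the proposed rules concrete, here is a minimal sketch of how they might behave. Nothing like this exists in Python: the `Tainted` wrapper and the `decode`, `combine`, `encode` and `remove_taint` helpers are all hypothetical names, and a real design would live on `str` itself rather than in a wrapper.

```python
from typing import NamedTuple, Optional


class Tainted(NamedTuple):
    """Hypothetical wrapper: text plus the encoding assumption it carries."""
    text: str
    taint: Optional[str] = None  # None means untainted


def decode(raw: bytes, encoding: str, taint: bool = False) -> Tainted:
    """Decode with surrogateescape; taint if escapes were needed or forced."""
    text = raw.decode(encoding, errors="surrogateescape")
    # surrogateescape maps undecodable bytes to U+DC80..U+DCFF.
    escaped = any("\udc80" <= ch <= "\udcff" for ch in text)
    return Tainted(text, encoding if (taint or escaped) else None)


def combine(a: Tainted, b: Tainted) -> Tainted:
    """Concatenate two values following the four rules listed above."""
    if a.taint is not None and b.taint is not None and a.taint != b.taint:
        raise ValueError("taint mismatch: %s vs %s" % (a.taint, b.taint))
    return Tainted(a.text + b.text, a.taint if a.taint is not None else b.taint)


def encode(value: Tainted, encoding: str) -> bytes:
    """Refuse to encode tainted text to anything but its own encoding."""
    if value.taint is not None and value.taint != encoding:
        raise ValueError("text tainted as %s, asked for %s" % (value.taint, encoding))
    return value.text.encode(encoding, errors="surrogateescape")


def remove_taint(value: Tainted, correct_encoding: str) -> Tainted:
    """Encode using the tainted encoding, then redecode using the correct one."""
    if value.taint is None:
        return value
    raw = value.text.encode(value.taint, errors="surrogateescape")
    return Tainted(raw.decode(correct_encoding), None)


mislabelled = decode(b"caf\xe9", "utf-8")   # really latin-1 data
clean = decode(b"menu: ", "utf-8")
joined = combine(clean, mislabelled)        # tainted with "utf-8"
fixed = remove_taint(mislabelled, "latin-1")
```

The sketch compares taint labels as plain strings, so "utf-8" and "UTF8" would count as different assumptions; a real implementation would presumably normalise codec names.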

However, I have my hands full with packaging issues at the moment (see PEP 426), and then I still want to fix the model for embedding CPython (see PEP 432). So if it's left to me, there's no way this idea could become reality before Python 3.5. I may at least lob it in the direction of python-ideas, though, to see if someone else is prepared to run with it...

Cheers, Nick.

-- Nick Coghlan Red Hat Infrastructure Engineering & Development, Brisbane

Testing Solutions Team Lead Beaker Development Lead (http://beaker-project.org/)


Toshio Kuratomi

22 Jul 22 Jul

12:25 a.m.

On Mon, Jul 22, 2013 at 10:15:31AM +1000, Nick Coghlan wrote:

...


The *only* reason Python 3 allows any Unicode errors to pass silently is because Python 2 tolerated broken system configurations (like non-UTF-8 filesystem metadata on nominally UTF-8 systems) by treating them as opaque 8-bit strings that were retrieved from OS interfaces and then passed back unmodified (see PEP 383 for details). If Python 3 didn't work on those systems, people would blame Python 3, not the already broken system configuration ("But Python 2 works, why is Python 3 so broken?"). os.listdir() -> open() is the canonical example of the kind of "round trip" activity that we felt we needed to support even for systems with improperly encoded metadata (including file names).

Actually, surrogateescape is a *great* improvement over the previous python3 behaviour of silently dropping data that it did not understand.

If python3 could just finally fix outputting text with surrogateescaped bytes then it would finally clean up the last portion of this and I would be able to stop pointing out the various ways that python3's unicode handling is just as broken as python2's -- just in different ways. :-)

...

Tainting would involve having the surrogateescape codec set an attribute on a string recording the encoding assumption if it had to embed any surrogates in the Private Use Area, as well as a keyword only "taint" argument to decode operations (e.g. to force tainting when using "latin-1" as a universal text codec). Various string operations would then be modified to use the following rules:

* Both input strings untainted? Output is untainted.
* One input tainted, one untainted? Output is tainted with the same assumption as the tainted input
* Both inputs tainted with the same assumption? Output is also tainted with that assumption.

This sounds like it might be nice. The one thing I'm a little unsure about is that it sounds like code is going to have to handle this explicitly. Judging from the way all but a select few people handle Text vs encoded bytes right now, that seems like it won't achieve very much. OTOH, I could see this as being an additional bit of information that's entirely optional whether people use it. I think that could be helpful in some cases of debugging. (OTOH, often when encoding vs text issues arise it's because the coder and program have no way to know the correct encoding. When that happens, the extra information might not be that useful for the majority of cases anyway).

...

* Inputs tainted with different assumptions? Immediate ValueError complaining about the taint mismatch

String encoding would be updated to trigger a ValueError when asked to encode tainted strings to an encoding other than the tainted one.

I'm a little leery of these. The reason is that after using both python2 and the early versions of python3 I became a firm believer that the problem with python2's unicode handling wasn't that it threw exceptions, rather the problem was that the same bit of code was too prone to passing through certain data without error and throwing an error with other data. Programmers who tested their code with only ascii data or only data encoded in their locale's encoding, or only when their locale was a utf-8 encoding were unable to replicate or understand the errors that their users got when they ran them in the crazy real-world environments that users inevitably have. These rules that throw an Exception suffer from the same reliance on the specific data and environment and will lead to similar tracebacks that programmers won't be able to easily replicate.

-Toshio

Nick Coghlan

2:15 a.m.


On 07/22/2013 03:25 PM, Toshio Kuratomi wrote:

...

On Mon, Jul 22, 2013 at 10:15:31AM +1000, Nick Coghlan wrote:

...

Actually, surrogateescape is a *great* improvement over the previous python3 behaviour of silently dropping data that it did not understand.

That behaviour only existed in 3.1. It's one of the reasons nobody really used 3.1 for anything ;)

...

If python3 could just finally fix outputting text with surrogateescaped bytes then it would finally clean up the last portion of this and I would be able to stop pointing out the various ways that python3's unicode handling is just as broken as python2's -- just in different ways. :-)

Attempting to encode data containing surrogate escapes without setting "errors=surrogateescape" is a sign that tainted data has escaped somewhere. So it's late notification of an error, but still indicative of an error somewhere. We'll never silence it by default.

...

...

I'm a little leery of these. The reason is that after using both python2 and the early versions of python3 I became a firm believer that the problem with python2's unicode handling wasn't that it threw exceptions, rather the problem was that the same bit of code was too prone to passing through certain data without error and throwing an error with other data. Programmers who tested their code with only ascii data or only data encoded in their locale's encoding, or only when their locale was a utf-8 encoding were unable to replicate or understand the errors that their users got when they ran them in the crazy real-world environments that users inevitably have. These rules that throw an Exception suffer from the same reliance on the specific data and environment and will lead to similar tracebacks that programmers won't be able to easily replicate.

You can't get away from this problem: decoding with the wrong encoding corrupts your data. The only thing we can control are the consequences of that data corruption. The aim of Python 3 is that the *default* behaviour will be to produce an exception (hopefully with enough info to let you debug the situation), but provide various tools to let you say "I prefer the risk of silent output corruption, thanks."

Once the PEP 432 initialisation model is in place, I'd also like to provide a "force UTF-8" setting, so people can more easily ensure cross-platform consistency, even if some of the systems they use aren't configured to use UTF-8 as the environmental encoding.

Cheers, Nick.

-- Nick Coghlan Red Hat Infrastructure Engineering & Development, Brisbane

Testing Solutions Team Lead Beaker Development Lead (http://beaker-project.org/)


Toshio Kuratomi

9:42 a.m.

On Mon, Jul 22, 2013 at 05:15:50PM +1000, Nick Coghlan wrote:

...


On 07/22/2013 03:25 PM, Toshio Kuratomi wrote:

...
If python3 could just finally fix outputting text with surrogateescaped bytes then it would finally clean up the last portion of this and I would be able to stop pointing out the various ways that python3's unicode handling is just as broken as python2's -- just in different ways. :-)

Attempting to encode data containing surrogate escapes without setting "errors=surrogateescape" is a sign that tainted data has escaped somewhere. So it's late notification of an error, but still indicative of an error somewhere. We'll never silence it by default.

That's a bit simplified from what python3's direction on this is unless Victor Stinner's work is only intended to be temporary.

$ export LC_ALL=en_US.utf-8
$ mkdir abc$'\xff'
$ python3.3
>>> import os
>>> se_dirlisting = os.listdir('.')
>>> # surrogateescape in a text string:
>>> repr(se_dirlisting[0])
"'abc\udcff'"
>>> # This doesn't traceback and it has to encode se_dirlisting when passing
>>> # it out of python:
>>> os.listdir(se_dirlisting[0])
[]
>>> # Works with other modules as well:
>>> import subprocess
>>> subprocess.call(['ls', se_dirlisting[0]])
0

AFAIK, the justification is that the surrogateescape'd strings are both coming from and going to the OS. They're crossing outside of the line that python3 draws around itself and there's an implicit encoding and decoding there. This seems fine to me as a strategy. The problems are just that there are places where python3 doesn't yet use surrogateescape when crossing this boundary. The one I was specifically thinking of when I wrote this was the print() function:

>>> print(se_dirlisting[0])
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcff' in position 3: surrogates not allowed

(When I mentioned this at pycon you brought up: http://bugs.python.org/issue15216 which looked promising but seems to have stalled ;-)
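
Until print() copes with such strings, a program has to pick its own way of getting them out. The sketch below shows two options under the assumption of a UTF-8 terminal; `display_form` and `write_raw` are hypothetical helper names, not stdlib functions.

```python
import io
import sys


def display_form(name, encoding="utf-8"):
    """Return a printable version of a possibly surrogateescaped string."""
    # backslashreplace turns each lone surrogate into a visible \udcXX escape
    # instead of raising UnicodeEncodeError.
    return name.encode(encoding, errors="backslashreplace").decode(encoding)


def write_raw(name, stream=None, encoding="utf-8"):
    """Write the original bytes of the name, bypassing print()'s strict encoder."""
    stream = stream if stream is not None else sys.stdout.buffer
    stream.write(name.encode(encoding, errors="surrogateescape") + b"\n")


se_name = "abc\udcff"           # the kind of string os.listdir() returned above
print(display_form(se_name))    # safe to print, shows the escape visibly

buffer = io.BytesIO()           # stands in for sys.stdout.buffer here
write_raw(se_name, stream=buffer)
```

The first option is for humans reading a log; the second hands the OS back exactly the bytes it supplied, which is what a tool like ls would do.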

...

...
...


You can't get away from this problem: decoding with the wrong encoding corrupts your data.

Incorrect. Decoding with the wrong encoding and then attempting to operate on the decoded data in certain ways will corrupt your data. Operating on it in certain other ways will not. surrogateescape was designed because people realized that "round tripping" the data was a desirable feature. Your thoughts on tainting still allow for manipulating strings with other strings that are tainted in the same manner.

...

The only thing we can control are the consequences of that data corruption. The aim of Python 3 is that the *default* behaviour will be to produce an exception (hopefully with enough info to let you debug the situation), but provide various tools to let you say "I prefer the risk of silent output corruption, thanks."

Uh no. In the discussion about unicode handling in the python3.0 timeframe on python-dev, I brought up this problem and suggested that we were learning the wrong lesson from python2. ie: at that time we were learning that throwing an error was the wrong thing to do even though there was plainly a problem between the data and the programmer's assumptions. My theory was that it wasn't the error that was a python2 wart, it was the fact that we threw the error too late, too infrequently, and just in general, in a way that made it harder for the programmer to debug the errors the user would see on their real-world systems. The response at that time was that errors were, in fact, the wart that people wanted to remove.

MvL then produced the surrogateescape handler which was supposed to address these things by allowing people to round trip undecodable bytes into text strings and back out again. This was good in that we no longer *silently* threw data away just because we couldn't decode it. But it was bad as it reintroduced the python2 problem of having a valid text string that portions of the standard python3 framework (mostly stdlib functions) could not handle. This meant that the programmer could test print(os.listdir()) on their system where all filenames were utf-8 and things would work. But a user could then run the same code on a system where the locale did not match with the encoding of all the filenames and would get a traceback. This was a portion of the python2 wart resurfaced.

Victor Stinner's work since then to integrate surrogateescape into more stdlib functions has helped immensely to craft a unified strategy for undecodable bytes. print() still throws a traceback but most other places where we take in and send out undecodable bytes just work.

Now, I'm not saying that the idea of annotating surrogateescaped data with information about the encoding it was created using is bad. I am *leery* (not yet convinced that the proposal is safe but also not yet convinced that it is harmful... just knowing that there's a potential bad interaction there) of throwing an exception as a result of examining that data as it brings back a portion of the python2 wart of throwing an exception late in the process rather than early, when the data is entering the system. But, just like surrogateescape itself, I can see that the idea is that the error cases are supposed to shrink with this iteration -- at some point the hope would be that the error conditions are small enough that the programmer no longer has to care about them and perhaps this is that tipping point.

Some comments though -- if you're going to throw an error, don't throw a ValueError. It's immensely useful in python2 to get something (UnicodeError) that is only thrown by an attempt to transform the data wrongly for two reasons:

* Lazy people can just catch UnicodeError and be done with it.
* People debugging issues can immediately see that the bug falls into a (relatively) narrow space of issues.
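
A sketch of what that suggestion could look like, assuming a taint check existed at all: `TaintMismatchError` and `check_taints` are hypothetical names. Since UnicodeError is itself a subclass of ValueError, code that already catches ValueError would keep working.

```python
class TaintMismatchError(UnicodeError):
    """Hypothetical error for combining text with conflicting encoding taints."""


def check_taints(left, right):
    """Raise if both taints are set and differ; names are illustrative only."""
    if left is not None and right is not None and left != right:
        raise TaintMismatchError(
            "cannot combine %s text with %s text" % (left, right))


try:
    check_taints("utf-8", "latin-1")
except UnicodeError as err:
    # One handler covers decode errors, encode errors and taint mismatches.
    caught = err

print(type(caught).__name__, isinstance(caught, ValueError))
```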

If someone creates their own surrogateescaped string: se_str = 'abc\udcff' the taint-checking machinery should allow that to be combined with strings from any other sources. It likely means that the programmer is working around an API they don't control that takes text strings but should take a bytes string.

-Toshio

Nick Coghlan

23 Jul 23 Jul

12:30 a.m.


On 07/23/2013 12:42 AM, Toshio Kuratomi wrote:

...

On Mon, Jul 22, 2013 at 05:15:50PM +1000, Nick Coghlan wrote:

...

That's a bit simplified from what python3's direction on this is unless Victor Stinner's work is only intended to be temporary.

    $ export LC_ALL=en_US.utf-8
    $ mkdir abc$'\xff'
    $ python3.3
    >>> import os
    >>> se_dirlisting = os.listdir('.')
    >>> # surrogateescape in a text string:
    >>> repr(se_dirlisting[0])
    "'abc\udcff'"
    >>> # This doesn't traceback and it has to encode se_dirlisting when passing
    >>> # it out of python:
    >>> os.listdir(se_dirlisting[0])
    []
    >>> # Works with other modules as well:
    >>> import subprocess
    >>> subprocess.call(['ls', se_dirlisting[0]])
    0

For some APIs we set surrogateescape on the user's behalf. It's still getting set, though :)

...

AFAIK, the justification is that the surrogateescape'd strings are both coming from and going to the OS. They're crossing outside of the line that python3 draws around itself and there's an implicit encoding and decoding there. This seems fine to me as a strategy. The problems are just that there are places where python3 doesn't yet use surrogateescape when crossing this boundary.

Strictly speaking, there are a bunch of interfaces that we declare as operating based on "os.fsencode" and "os.fsdecode". The fact we're not especially clear on which encoding/decoding strategy we use for particular APIs is the docs gap Armin was talking about in http://lucumr.pocoo.org/2013/7/2/the-updated-guide-to-unicode/

The following (if I recall correctly) all use os.fsencode/decode:

* os.environ
* os.listdir
* sys.argv
* os.exec* functions
* subprocess environ
* subprocess arguments

We *should* use os.fsdecode to ensure file name attributes are always unicode, but currently do not (I just filed a bug for that: http://bugs.python.org/issue18534).
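A small sketch of what "operating based on os.fsencode and os.fsdecode" means for a caller. The function name roundtrip_filename is invented for this example, and it assumes a POSIX system, where the filesystem error handler is surrogateescape.

```python
import os


def roundtrip_filename(raw_name):
    """Decode a bytes filename the way the OS-facing APIs do, then encode it back.

    Assumption: POSIX, where os.fsdecode/os.fsencode use the filesystem
    encoding with the surrogateescape error handler.
    """
    text_name = os.fsdecode(raw_name)   # bytes -> str, undecodable bytes escaped
    raw_again = os.fsencode(text_name)  # str -> bytes, escapes turned back into bytes
    return text_name, raw_again
```

Under that assumption, the bytes that come back are identical to the bytes that went in, even when the name is not valid in the locale's encoding.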

...

The one I was specifically thinking of when I wrote this was the print() function:

    >>> print(se_dirlisting[0])
    Traceback (most recent call last):
      File "<stdin>", line 3, in <module>
    UnicodeEncodeError: 'utf-8' codec can't encode character '\udcff' in position 3: surrogates not allowed

Unfortunately, the standard streams *don't* currently use the same scheme as we assume for other operating system APIs. The reason we don't is because we're reluctant to assume that all data received over those streams will be in an ASCII or UTF-8 compatible encoding (e.g. our feedback from Japan is that there are still plenty of systems there using non-ASCII compatible encodings).
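Until the standard streams behave differently, one possible workaround on the application side is to make escaped bytes visible before printing. The sketch below is only an illustration; the helper name printable is made up, and backslashreplace is just one of several error handlers that could be used here.

```python
def printable(text, encoding="utf-8"):
    """Return a version of text that a strict stream in `encoding` can print.

    Hypothetical helper: text that already encodes cleanly is returned
    unchanged; otherwise unencodable code points (such as the lone surrogates
    left by surrogateescape) are shown as backslash escapes.
    """
    try:
        text.encode(encoding)
        return text
    except UnicodeEncodeError:
        return text.encode(encoding, errors="backslashreplace").decode(encoding)
```

With this, print(printable(name)) does not raise for a name like 'abc\udcff', at the cost of the output no longer being the original bytes.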

...

(When I mentioned this at pycon you brought up: http://bugs.python.org/issue15216 which looked promising but seems to have stalled ;-)

Alas, there's currently no champion to drive it. I'm interested, but don't have enough time (improving the Unicode handling is third on the list of "big problems in Python" that I currently care about, after packaging and the initialisation code). Several of the other core devs are sufficiently dubious of the notion of allowing mojibake to be created silently that they're conflicted on offering the feature at all, and thus not motivated to work on it :(

Since absolutely nobody in the world cares enough about the CPython upstream to pay *anyone* to work on it full time (not even the Linux distros or the members of the OpenStack foundation), this situation is unlikely to change any time soon :(

...

...
...
...

Inputs tainted with different assumptions? Immediate ValueError complaining about the taint mismatch

String encoding would be updated to trigger a ValueError when asked to encode tainted strings to an encoding other than the tainted one.

I'm a little leery of these. The reason is that after using both python2 and the early versions of python3 I became a firm believer that the problem with python2's unicode handling wasn't that it threw exceptions, rather the problem was that the same bit of code was too prone to passing through certain data without error and throwing an error with other data. Programmers who tested their code with only ascii data or only data encoded in their locale's encoding, or only when their locale was a utf-8 encoding were unable to replicate or understand the errors that their users got when they ran them in the crazy real-world environments that users inevitably have. These rules that throw an Exception suffer from the same reliance on the specific data and environment and will lead to similar tracebacks that programmers won't be able to easily replicate.

You can't get away from this problem: decoding with the wrong encoding corrupts your data.

Incorrect. Decoding with the wrong encoding and then attempting to operate on the decoded data in certain ways will corrupt your data. Operating on it in certain other ways will not. surrogateescape was designed because people realized that "round tripping" the data was a desirable feature. Your thoughts on tainting still allow for manipulating strings with other strings that are tainted in the same manner.
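To make the round-tripping point concrete, here is a hedged sketch. The function names are invented for this example; the only real APIs used are bytes.decode and str.encode with the surrogateescape and replace error handlers.

```python
def decode_lossless(raw, encoding="utf-8"):
    """Decode bytes, keeping undecodable bytes as lone surrogates."""
    return raw.decode(encoding, errors="surrogateescape")


def encode_lossless(text, encoding="utf-8"):
    """Encode text, turning escaped surrogates back into the original bytes."""
    return text.encode(encoding, errors="surrogateescape")


latin1_name = b"caf\xe9"                    # Latin-1 bytes, not valid UTF-8
as_text = decode_lossless(latin1_name)      # decoded with the "wrong" encoding
backup_name = encode_lossless(as_text + ".bak")

# For contrast: the "replace" handler throws the original byte away.
lossy_name = latin1_name.decode("utf-8", errors="replace").encode("utf-8")
```

Appending an ASCII suffix is one of the operations that leaves the original bytes intact, while decoding with errors="replace" is one that does not.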

Agreed, there should have been a "potentially" in there, since it depends on what you do with it.

...

MvL then produced the surrogateescape handler which was supposed to address these things by allowing people to round trip undecodable bytes into text strings and back out again. This was good in that we no longer *silently* threw data away just because we couldn't decode it. But it was bad as it reintroduced the python2 problem of having a valid text string that portions of the standard python3 framework (mostly stdlib functions) could not handle. This meant that the programmer could test print(os.listdir()) on their system where all filenames were utf-8 and things would work. But a user could then run the same code on a system where the locale did not match with the encoding of all the filenames and would get a traceback. This was a portion of the python2 wart resurfaced.

Right, but the data driven aspect is largely unavoidable at this point -- until we get data that doesn't match the system configuration as we understand it, then as far as Python knows, the system configuration is correct.

...

Victor Stinner's work since then to integrate surrogateescape into more stdlib functions has helped immensely to craft a unified strategy for undecodable bytes. print() still throws a traceback but most other places where we take in and send out undecodable bytes just work.

Yes, I agree printing to stdout is still a problem. That's why I wrote http://python-notes.curiousefficiency.org/en/latest/python3/text_file_proces... and am in favour of making it easy to change the encoding and error settings of an existing stream.
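For readers on newer interpreters: later Python versions (3.7 and up, so not the 3.3 discussed in this thread) grew a reconfigure() method on text streams that does roughly what is being asked for here. The sketch below uses an in-memory stream instead of sys.stdout so that it is self-contained; the two function names are made up for illustration.

```python
import io


def open_strict_stream():
    """Create an in-memory text stream that, like stdout, encodes strictly."""
    return io.TextIOWrapper(io.BytesIO(), encoding="utf-8", errors="strict")


def write_with_escapes(stream, text):
    """Switch an existing stream to surrogateescape and write the text.

    Assumption: Python 3.7+, where TextIOWrapper.reconfigure() exists.
    """
    stream.reconfigure(errors="surrogateescape")
    stream.write(text)
    stream.flush()
    return stream.buffer.getvalue()
```

The same call on sys.stdout would be sys.stdout.reconfigure(errors="surrogateescape"), with the caveat raised elsewhere in this thread that it lets undecodable bytes through silently.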

...

Now, I'm not saying that the idea of annotating surrogateescaped data with information about the encoding it was created using is bad. I am *leery* (not yet convinced that the proposal is safe but also not yet convinced that it is harmful... just knowing that there's a potential bad interaction there) of throwing an exception as a result of examining that data as it brings back a portion of the python2 wart of throwing an exception late in the process rather than early, when the data is entering the system. But, just like surrogateescape itself, I can see that the idea is that the error cases are supposed to shrink with this iteration -- at some point the hope would be that the error conditions are small enough that the programmer no longer has to care about them and perhaps this is that tipping point.

Right, the situation we genuinely care about is that if a system is properly configured to use UTF-8 for everything, then it should all be fine and wonderful. Everything beyond that is a best effort attempt to better cope with what we consider to be legacy system configurations.

...

Some comments though -- if you're going to throw an error, don't throw a ValueError. It's immensely useful in python2 to get something (UnicodeError) that is only thrown by an attempt to transform the data wrongly for two reasons: Lazy people can just catch UnicodeError and be done with it. People debugging issues can immediately see that the bug falls into a (relatively) narrow space of issues.

Yeah, UnicodeError or a new TaintError would be better.
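A sketch of why the exception type matters. Nothing below exists in Python: TaintError and encode_tainted are purely hypothetical names, and the taint is passed in by hand because no tainting machinery has been implemented. The point is only that a subclass of UnicodeError is still caught by code that handles UnicodeError, and that UnicodeError itself is a ValueError.

```python
import codecs


class TaintError(UnicodeError):
    """Hypothetical error for encoding tainted text with a mismatched codec."""


def encode_tainted(text, tainted_with, target):
    """Encode text only if the target codec matches the codec it was tainted with.

    Hypothetical illustration of the proposed rule; the caller supplies the
    taint explicitly since str objects carry no such annotation.
    """
    if codecs.lookup(tainted_with).name != codecs.lookup(target).name:
        raise TaintError(
            "text tainted as %s cannot be encoded as %s" % (tainted_with, target)
        )
    return text.encode(target, errors="surrogateescape")


def encode_or_none(text, tainted_with, target):
    """Show that a plain `except UnicodeError` also covers TaintError."""
    try:
        return encode_tainted(text, tainted_with, target)
    except UnicodeError:
        return None
```

Existing "catch UnicodeError and be done with it" code would keep working with such a class, which would not be true of an unrelated exception type.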

...

If someone creates their own surrogateescaped string: se_str = 'abc\udcff' the taint-checking machinery should allow that to be combined with strings from any other sources. It likely means that the programmer is working around an API they don't control that takes text strings but should take a bytes string.

Sure, I think tainting should be a separate mechanism. It's also purely hypothetical vapourware at this point, unless/until someone else finds it interesting enough to try to implement it.

Cheers, Nick.

-- Nick Coghlan Red Hat Infrastructure Engineering & Development, Brisbane

Testing Solutions Team Lead Beaker Development Lead (http://beaker-project.org/)

Bohuslav Kabrda

2:13 a.m.

----- Original Message -----

...

I'm okay with this portion of the definition. One note is I would be hesitant about the timing of python3 being the only python that is installed into the minimal buildroot. This should probably happen in rawhide right after a branching.

To be more specific, I meant "installed into the minimal buildroot by default", which only means building GDB with Python 3, as I've written elsewhere. IMO it shouldn't cause any problems, because all the packages that use python 2 during build should BR: python2-devel anyway, right? (maybe some packages with build scripts written in python 2 may have minor issues, but these should be solvable by simply adding BR: python or rewriting to python 3)

...

...

livecd can still ship Python 2

I would consider this to be the goal that we should shoot for, though. We are constantly fighting for space on the install images and we know that people who install Fedora would like to have the ability to slim down what is installed. Shooting for no-python2 on the livecds and after that, no python2 on the install dvds, and still later, no need for python2 in the packages in the repository seem like milestones that have actual real value for end users.

Ok, +1

...

...

/usr/bin/python points to Python 3

I am firmly against this. More depth was in my reply to mmaslano although I'll reply to one thing here:

...

Please note, that the pep you're referring to also states that "python should refer to the same target as python2 but may refer to python3 on some bleeding edge distributions", so this wouldn't really be going against the pep.

This is a misinterpretation of the PEP. (This section is confusing, though: "python should refer to the same target as python2" is a recommendation to distributions. "but may refer to python3 on some bleeding edge distributions" is a statement of fact for end users to watch out for) See the recommendation section:

"For the time being, it is recommended that python should refer to python2 (however, some distributions have already chosen otherwise; see the Rationale and Migration Notes below)."

and Future Changes Section: http://www.python.org/dev/peps/pep-0394/#future-changes-to-this-recommendati...

" It is anticipated that there will eventually come a time where the third party ecosystem surrounding Python 3 is sufficiently mature for this recommendation to be updated to suggest that the python symlink refer to python3 rather than python2.

This recommendation will be periodically reviewed over the next few years, and updated when the core development team judges it appropriate. "

The "may refer to python3" phrase is an acknowledgment that arch has moved to /usr/bin/python == python3 and isn't going to revert even though upstream thinks it's a... premature time to make that switch. (To be fair to arch, the discussion and PEP happened as a result of arch making that switch so they'd already committed to that before the consensus was formed that this would be a bad thing to do at this time. We don't have that excuse ;-)

If you'd like to read the discussions for yourself, there are three threads linked from the PEP. An even earlier one is at: http://mail.python.org/pipermail/python-dev/2010-November/105252.html

One way or the other, the PEP states that people should be careful about what /usr/bin/python points to, since it may be python 2 as well as python 3. This gives us the freedom to do the switch when we want and not break the upstream expectations (yes, I know that huge number of people expect /usr/bin/python to point to python 3, but this pep basically says that it's a bad idea to assume that).

...

...
...
The python package itself should probably also remain python2 due to dependencies and expectations from other distros and documentation -- I think I'd be -1 to changing this

The Fedora live images contain only python3, not python2 -- I'd be heavily in favour of this. +1

...
This is going to be a multirelease effort that is going to affect lots of Fedora parts. Since we will need to switch default package manager from Yum to DNF (which is supposed to work with Python 3), we will need to wait for that. I've been told that DNF should be default in F22, so that's my target, too. That should also give everyone else plenty of time to work on other essential packages to make this happen.

Getting there at the same time as we get to DNF sounds like a good timeline. (But see my note on anaconda below). +1

...
Here is my analysis/proposal: Before switching, we need to make sure that everything "important" (*) is Python 3 compatible. There are three steps I see in this transition:

Getting rid of Python 2 in mock minimal buildroot.

I'm not sure about this one as it will cause a lot of package churn. It might be a necessary pain point or it might be a pain point we want to defer until later in our porting efforts. Have to think about it more.

If you look at the minimal mock buildroot for rawhide now, the only thing that is drawing in Python is gdb because of its Python bindings (if I'm not mistaken). So compiling GDB against Python 3, which should work with newest gdb, will accomplish this AFAICS.

<nod> I thought that python might be one of the packages that showed up in the dep chain here: http://fedoraproject.org/wiki/Packaging:Guidelines#Exceptions_2

which would mean that packages might be leaving it out of BuildRequires right now. I wasn't able to find a dep chain leading back to python from one of those so I think that any package which isn't explicitly BuildRequiring python just has a bug. No objection here although as noted earlier, we should probably do this right after a rawhide branch so that any of these bugs are found and fixed in plenty of time.

...
...
...

Porting Anaconda to Python 3.

+1 -- unfortunately, this probably depends on DNF.... So we may need to push DNF in F22 and anaconda compatible with python3 in F23.

DNF is a continuous effort. I believe that DNF will provide its Python 3 bindings sooner than in F22, so Anaconda devels can simultaneously do porting to Python 3 as well as to DNF. IMO this is a good thing, since they will just do one big rewrite instead of two smaller ones.

Well.... 1) If DNF lands their python3 bindings sooner, that's fine for the timeline. But if they don't, anaconda can't be finished until after. So this is something to note as a key piece of the switch. 2) Fedora anaconda experience (and general open source development experience as well... you've read the Joel on Software article about netscape, right?) would tend to show that big rewrites are worse than several smaller ones.

Sure, smaller ones mean that you touch the same code a few times before you're satisfied with it. But smaller ones mean that you can stop partway through (say, for instance, because we needed a few extra weeks to port to DNF and that doesn't leave us enough time to complete the python3 port [even assuming that that doesn't take longer than anticipated] in time for Fedora 22's release date) which is just anticipating that Murphy's Law is inevitably going to throw a wrench into your timeline for completion.

And still, they can write Python 3 code that is compatible with both Python 2 and Python 3, so if DNF fails to provide Python 3 bindings in time, they can run on Python 2.
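As a rough idea of what "compatible with both Python 2 and Python 3" looks like in code, here is a minimal sketch. The names PY2, text_type, binary_type and ensure_text are illustrative only and are not taken from Anaconda or DNF; real projects often get the same helpers from a compatibility library.

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode  # noqa: F821 -- this name only exists on Python 2
    binary_type = str
else:
    text_type = str
    binary_type = bytes


def ensure_text(value, encoding="utf-8"):
    """Return value as text on either major version, decoding bytes if needed."""
    if isinstance(value, binary_type):
        return value.decode(encoding)
    if isinstance(value, text_type):
        return value
    raise TypeError("expected text or bytes, got %r" % type(value))
```

Code written this way can run on Python 2 while the bindings it depends on are still Python 2 only, and move to Python 3 without a second rewrite.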

...

...
...
...

Making all livecd packages depend on Python 3 by default (and eventually getting rid of Python 2 from livecd) - this will also require switching from Yum to DNF as a default, that is supposed to support Python 3.

+1 -- this is what I see as the eventual goal (or perhaps, livecd python2 free followed by DVD python2 free followed by distro python2 free).

3.5) Switch tools that could target either python2 or python3 to target python3. Currently the packaging guidelines say to target python2 to control dep proliferation and because that's the most supported by the larger python ecosystem. We should switch the recommendation when our minimal environment must have python3 but does not need to have python2.

IMO we should switch this for F21, since livecd ships Python 3 anyway, so the switch doesn't have to happen in one point, but can be continuous.

Ehhh... I don't think the livecd having to ship python3 is a good measure for this. I think something considerably more minimal than the livecd would be better. (I talked to mattdm (since he's been working on minimal environments) and he suggested @core or @standard groups might be appropriate.) The idea is to avoid doubling the needed python stacks on minimal environments until necessary. Switching tools that have the option of running on either one or the other of the stacks to python3 prematurely means that we start doubling the python stacks needed before it's necessary.

As I replied to mattdm, I'm not against switching this at a single time (although doing more little steps is better than doing a single huge step ;)).

...

...
...
...
( 4) Making as much of the remaining packages Python 3 compatible )

We could talk quite a bit on this point -- How active do we want to be with the things that aren't in one of the essential buckets from further up. We could defer thinking about this until after we get the livecd python2-free, though.

This is really the last step, that is somehow tied to what you mentioned as a reaction to 3) - going through the rest of packages on DVD and then whole distro. This will take a few more releases I guess, but it is not as important as sorting out livecd.

Yeah, this strikes me as extending far into the future. I will note, though, that ideas about changing /usr/bin/python to point to python3 probably come in the latter stages of this step rather than before.

...
...
...

Switching the %{?with_python3} conditionals in specfiles to %{?with_python2} (we will probably create a script to automate this, at least partially)

-1: This one doesn't make any sense to do. The third-party python library ecosystem is highly weighted for python2. There are only a handful of libraries that support python3 and not python2. There are a boatload of libraries that support python2 and not python3. We're starting from a base of existing python2 packages that may add support for python3. The conditionals are there to enable packaging of that situation.

And this situation will be changing in the future. Right now, there are not so many Python packages in Fedora that only support Python 2 (I didn't count, but you don't see them too often these days).

Uh.... What's your methodology? This is a very, very, very bad estimate but I think it'll show that we need more than an anecdote to prove that statement:

    $ repoquery -q 'python3-*' |wc -l
    259
    $ repoquery -q 'python-*' |wc -l
    1099

Heh, true :) I based this on packages that I review (out of which likely 90 % or so have python3- subpackage) and I didn't really count. Well, it is high time we start working on these 700 :)

...

Also, just to be clear, you do understand why switching the conditionals doesn't work for our existing packages, correct?

Please be more specific. What do you mean "doesn't work"?

...

...
IMO Fedora should lead the way of making Python 3 "the Python" and Python 2 "the old compat version". This also makes sense in the traditional linux-distro "one version of package" that we should be trying to pursue.

-1. I'm serious, you've got the wrong conceptual model of the relationship between python2 and python3 stuck in your head and it's coloring what you're trying to achieve in bad ways. Python3 is not an upgrade to Python2. Python3 is a new language. It is compatible in many ways. If you can target recent enough versions (at least python-2.6 but python2.7 is better and python-3.3) you can set out to purposefully code things that work on both languages. But if you're writing general, working python2 code using idioms and thought processes that you've mastered over the last 10 years, chances are extremely high that not even your simple scripts are going to run without modification.

And again, I'm saying that your conceptual model is not necessarily the correct one.

...

...
...
...
FAQ: Q: Why do we need to switch to Python 3? A: Because Python 2 is old, slower, less pythonic, doesn't get any more functionality and it won't be that long before the official upstream support ends [1]

Although I agree with the need to switch to python3, I don't think the first three reasons are very compelling arguments (they're only half-truths) -- we should concentrate on the last reason and also on features that python3 has that python2 doesn't. Chained exceptions are a pretty nice thing, for instance.
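For anyone who has not used chained exceptions, a short sketch of the Python 3 feature follows. ConfigError and parse_port are made-up names for the example; the "raise ... from ..." syntax and the __cause__ attribute are the actual language feature.

```python
class ConfigError(Exception):
    """Illustrative application-level error."""


def parse_port(raw):
    """Convert a configuration value to an int, keeping the original error."""
    try:
        return int(raw)
    except ValueError as exc:
        # "from exc" records the low-level error as __cause__, so the
        # traceback shows both the original failure and the new one.
        raise ConfigError("invalid port: %r" % (raw,)) from exc
```

When parse_port("http") fails, the traceback shows the ValueError from int() followed by the ConfigError, which Python 2 cannot express without extra work.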

So, the first three reasons:

Python 2 is old - how is that a half-truth?

C is older. Let's get rid of that first. Old is not a reason to switch or get rid of it.

Python 2 has Python 3 as a successor, C does not (yes, I know you'll say C++, but this is really not the case).

...

...

Slower - yes, in the beginning, Python 3 was significantly slower because of nonoptimal code after the rewrite from Python 2. But with Python 3.3 for instance, you get tons of speed improvements - decimal module for instance got a significant boost. Brett Cannon had a nice presentation about speed benchmarking [1]. Yes, Python 3 is slower in some areas, but mostly it's faster.

pypy is faster and mostly python2 compatible. Many shops that want speed are switching to that rather than python3. (Not saying this is a good idea for a distro to do. I'm just saying that making the speed argument is not compelling).

People don't use python for raw speed of processing. They really just care if it's fast enough. People who write python code would be happy to take speed increases if they were free. But python2 to python3 requires porting code so it comes with a significant cost. Speed is a side effect of switching, not a reason to switch.

It is one of the reasons to switch for me.

...

...

Less Pythonic - where do I start with this? Python 3 got rid of tons of unnecessary syntactic constructs as well as builtin object methods. E.g. "print" vs. "print()"; exception raising syntax; dict.iteritems() removed and dict.items() only left, more consistent unicode handling etc. So in the sense of having only one way to do things, Python 3 is more pythonic than Python 2. If you read through zen of python, you can find more arguments for this (e.g. making int and long one type - "Special cases aren't special enough to break the rules."; simplification/rewrite of parts of stdlib - "Simple is better than complex.", etc.)
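A few of the changes listed above can be observed directly on a Python 3 interpreter. This is only a sketch; the function name python3_behaviours and the sample dictionary are invented for the example.

```python
def python3_behaviours():
    """Collect a few observable Python 3 behaviours mentioned in the list above."""
    packages = {"yum": 2, "dnf": 3}
    items = packages.items()   # a live view on Python 3, not a list copy
    packages["anaconda"] = 2   # the view created earlier reflects this change
    return {
        "items_is_list": isinstance(items, list),
        "view_sees_update": ("anaconda", 2) in items,
        "has_iteritems": hasattr(packages, "iteritems"),
        "big_int_type": type(2 ** 64).__name__,  # no separate long type
    }
```

On Python 2 the same function would report a list copy for items(), an existing iteritems() method, and a separate long type for the large integer.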

Pythonic is a very vague statement and I wouldn't consider most of your list to be examples of those. Yes, python3 may be a *better* language (and I would include most of your list as "features of python3 that python2 does not have") but a more pythonic language... that's not something that you can readily measure. For instance, I can make the case that python3's unicode handling is less pythonic than python2 as it violates three rules of the zen of python:

Explicit is better than implicit. Errors should never pass silently. In the face of ambiguity, refuse the temptation to guess.

Eh, I really don't see where Python 3 unicode handling violates these. Could you be more specific?

...

(To be fair, python2 violated some of these rules in its unicode handling as well, although errors should never pass silently would probably take some work to convince most people :-)

Anyhow, I stick to my assertion that we should be talking mostly about upstream support ending as the reason to switch and also the features that python3 provides that python2 does not [and as noted, I'd lump your "pythonic" list into this category.] Stating a compelling argument of why people should change isn't just about identifying all the things that the change will do. It's also about identifying the things that the change will do that are important to the person and resonate with their needs.

-Toshio

-- Regards, Bohuslav "Slavek" Kabrda.

Nick Coghlan

3:18 a.m.

On 07/23/2013 05:13 PM, Bohuslav Kabrda wrote:

...

One way or the other, the PEP states that people should be careful about what /usr/bin/python points to, since it may be python 2 as well as python 3. This gives us the freedom to do the switch when we want and not break the upstream expectations (yes, I know that huge number of people expect /usr/bin/python to point to python 3, but this pep basically says that it's a bad idea to assume that).

As the migration notes in PEP 394 say: "More conservative distributions that are less willing to tolerate breakage of third party scripts continue to alias it to python2. Until the conventions described in this PEP are more widely adopted, having python invoke python2 will remain the recommended option."

So I guess it depends if Fedora sees itself as being in the "More conservative" camp or not :)

...

...
Also, just to be clear, you do understand why switching the conditionals doesn't work for our existing packages, correct?

Please be more specific. What do you mean "doesn't work"?

...
...
IMO Fedora should lead the way of making Python 3 "the Python" and Python 2 "the old compat version". This also makes sense in the traditional linux-distro "one version of package" that we should be trying to pursue.

-1. I'm serious, you've got the wrong conceptual model of the relationship between python2 and python3 stuck in your head and it's coloring what you're trying to achieve in bad ways. Python3 is not an upgrade to Python2. Python3 is a new language. It is compatible in many ways. If you can target recent enough versions (at least python-2.6 but python2.7 is better and python-3.3) you can set out to purposefully code things that work on both languages. But if you're writing general, working python2 code using idioms and thought processes that you've mastered over the last 10 years, chances are extremely high that not even your simple scripts are going to run without modification.

And again, I'm saying that your conceptual model is not necessarily the correct one.

For simple scripts, I think Toshio is correct. The basic syntax changes, like print becoming a function, or the way you bind a caught exception to a name, are easy to handle for applications and libraries with something like python-modernize, but that's not going to happen for the vast majority of sysadmin and general user scripts out there.
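To illustrate the two syntax changes mentioned here, the sketch below asks the running Python 3 interpreter to compile a small Python 2 style script and its Python 3 equivalent. The script contents and the helper name compiles are made up for the example; compile() is the standard builtin.

```python
PY2_STYLE_SCRIPT = '''
try:
    print "hello"
except IOError, err:
    print err
'''

PY3_STYLE_SCRIPT = '''
try:
    print("hello")
except IOError as err:
    print(err)
'''


def compiles(source):
    """Return True if the running interpreter accepts the source as valid syntax."""
    try:
        compile(source, "<script>", "exec")
    except SyntaxError:
        return False
    return True
```

On Python 3 the first script is rejected before a single line runs, which is exactly what a user with an old script and a switched /usr/bin/python would see.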

Consider the flak Canonical got when they switched /bin/sh from bash to dash, and exposed all the scripts people had written that used bash extensions, but still had /bin/sh in the shebang line. The risk in switching the python symlink to python3 isn't really in breaking *applications*, it's in breaking *scripts*.

There are *lots* of ways for things to break if Python 2 code that isn't expecting it is run under Python 3. The "Common Stumbling Blocks" (http://docs.python.org/dev/whatsnew/3.0.html#common-stumbling-blocks) in the release notes for 3.0 cover some highlights. Most migration guides gloss over these, since they're looking at things from the perspective of development projects that are taking steps to get ready for the change, rather than an end user who has just run "fedup" and finds that all of their homegrown Python scripts are now throwing syntax errors and other strange things.
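Not every breakage is a syntax error. As a sketch of the quieter kind, the function below (an invented name, for illustration only) evaluates a few expressions that are valid on both major versions but give different answers.

```python
def silent_differences():
    """Evaluate expressions that parse on Python 2 and 3 but behave differently."""
    return {
        # True division on Python 3; integer division on Python 2.
        "1 / 2": 1 / 2,
        # bytes and text never compare equal on Python 3; on Python 2 a
        # plain 'a' is a byte string, so the comparison is True there.
        "b'a' == 'a'": b"a" == "a",
        # map() returns an iterator on Python 3 and a list on Python 2.
        "map gives list": isinstance(map(str, [1]), list),
    }
```

A script relying on the Python 2 answers keeps running under Python 3 and simply produces different results, which is harder to notice than a traceback.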

That's why I actually advocate for distros (including Fedora) to design all their packages to not care, and always use a name qualified with the major language version. If a sysadmin changes /bin/python to point to something else, Fedora itself shouldn't be affected at all, only user scripts.

As the migration notes in PEP 394 say:

* It is suggested that even distribution-specific packages follow the python2/python3 convention, even in code that is not intended to operate on other distributions. This will reduce problems if the distribution later decides to change the version of the Python interpreter that the python command invokes, or if a sysadmin installs a custom python command with a different major version than the distribution default. Distributions can test whether they are fully following this convention by changing the python interpreter on a test box and checking to see if anything breaks.
* If the above point is adhered to and sysadmins are permitted to change the python command, then the python command should always be implemented as a link to the interpreter binary (or a link to a link) and not vice versa. That way, if a sysadmin does decide to replace the installed python file, they can do so without inadvertently deleting the previously installed binary.
* If the Python 2 interpreter becomes uncommon, scripts should nevertheless continue to use the python3 convention rather than just python. This will ease transition in the event that yet another major version of Python is released.
* If these conventions are adhered to, it will become the case that the python command is only executed in an interactive manner as a user convenience, or to run scripts that are source compatible with both Python 2 and Python 3.
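One way a distribution could check how closely it follows the python2/python3 convention is to look for shebang lines that still call plain "python". The sketch below is an illustration only: the helper name and the regular expression are assumptions made for this example, not an existing Fedora tool, and the pattern only covers the common "/path/python" and "env python" forms.

```python
import re

# Matches "#!/usr/bin/python", "#!/usr/bin/env python" and "#!/usr/bin/python -tt",
# but not "python2", "python3" or "python2.7".
_UNQUALIFIED_PYTHON = re.compile(r"^#!\s*(?:\S*/env\s+)?(?:\S*/)?python(?:\s|$)")


def uses_unqualified_python(first_line):
    """Return True if a script's first line invokes python without a major version."""
    return _UNQUALIFIED_PYTHON.match(first_line) is not None
```

Running such a check over the first line of every packaged script would give a list of files that change behaviour if the python command is repointed.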

It's important to read *all* of the migration notes in PEP 394, not just the parts that agree with your preferred plan - back when we wrote it, we did put a fair bit of thought into how a distribution could migrate their own software to Python 3 without breaking end user scripts that are still expecting Python 2.

It may be that in trying to be diplomatic about what Arch did, we softened the wording too much so it can be interpreted as "what the Arch folks did wasn't clearly denounced, so we can copy them rather than doing what the PEP suggests!" :(

...

...
...
So, the first three reasons:

Python 2 is old - how is that a half-truth?

C is older. Let's get rid of that first. Old is not a reason to switch or get rid of it.

Python 2 has Python 3 as a successor, C does not (yes, I know you'll say C++, but this is really not the case).

A more accurate comparison would be the switch from K&R C to ANSI C (and that took a long time).

Python 2.7 is a stable, well liked language. While the core development team plans to stop making new binary releases of 2.7 in 2015, it will still be receiving upstream source only security releases for some time after that.

...

...
...

Slower - yes, in the beginning, Python 3 was significantly slower because of nonoptimal code after the rewrite from Python 2. But with Python 3.3 for instance, you get tons of speed improvements - decimal module for instance got a significant boost. Brett Cannon had a nice presentation about speed benchmarking [1]. Yes, Python 3 is slower in some areas, but mostly it's faster.

pypy is faster and mostly python2 compatible. Many shops that want speed are switching to that rather than python3. (Not saying this is a good idea for a distro to do. I'm just saying that making the speed argument is not compelling).

People don't use python for raw processing speed. They really just care if it's fast enough. People who write python code would be happy to take speed increases if they were free. But moving from python2 to python3 requires porting code, so it comes with a significant cost. Speed is a side effect of switching, not a reason to switch.

It is one of the reasons to switch for me.

One key advantage of Python 3.3+ is drastically reduced memory usage for applications that deal almost entirely in unicode strings (courtesy of PEP 393).
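A rough way to see the PEP 393 effect is to compare the in-memory size of strings whose widest character needs 1, 2, or 4 bytes. This is only a sketch: it assumes CPython 3.3 or later, the sample characters are arbitrary choices, and the exact byte counts vary between versions and platforms.

```python
import sys

# Under PEP 393, CPython picks the narrowest storage that fits the
# widest character in each string.
ascii_text = "a" * 1000             # 1 byte per character
bmp_text = "\u0159" * 1000          # 2 bytes per character
astral_text = "\U0001F40D" * 1000   # 4 bytes per character

for label, text in [("ascii", ascii_text),
                    ("bmp", bmp_text),
                    ("astral", astral_text)]:
    print(label, sys.getsizeof(text))
```

All three strings have the same length; only the width of the stored characters differs, which is where the savings for mostly-ASCII text come from.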

Cheers, Nick.

-- Nick Coghlan Red Hat Infrastructure Engineering & Development, Brisbane Testing Solutions Team Lead Beaker Development Lead (http://beaker-project.org/)

Andrew McNabb

18 Jul

10:56 p.m.

On Thu, Jul 18, 2013 at 11:24:22AM -0400, Bohuslav Kabrda wrote:

...

From packaging point of view, this will probably require:

Renaming python package to python2

Renaming python3 package to python

Switching the %{?with_python3} conditionals in specfiles to %{?with_python2} (we will probably create a script to automate this, at least partially)
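As a rough idea of what the simplest part of such a script might look like, the sketch below only renames the with_python3 macro to with_python2 in spec text. The function name flip_conditionals is hypothetical, and the sketch does not touch what the conditional blocks contain, so every converted spec would still need manual review.

```python
import re

# Match the macro name only as a whole word, so longer names that merely
# start with "with_python3" are left untouched.
_WITH_PY3 = re.compile(r"\bwith_python3\b")


def flip_conditionals(spec_text):
    """Rename with_python3 to with_python2; return (new_text, replacement_count)."""
    return _WITH_PY3.subn("with_python2", spec_text)


sample = "%if 0%{?with_python3}\n# subpackage build steps go here\n%endif\n"
converted, count = flip_conditionals(sample)
print(count)
print(converted)
```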

Renaming the python package to python2 kind of makes sense, but renaming the python3 package to python seems needlessly confusing. Wouldn't it make sense to just keep python2 and python3 side by side without ambiguity until some long future date when python2 disappears?

-- Andrew McNabb http://www.mcnabbs.org/andrew/ PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868

Nick Coghlan

19 Jul

3:50 a.m.

On 07/19/2013 01:56 PM, Andrew McNabb wrote:

Renaming the python package to python2 kind of makes sense, but renaming the python3 package to python seems needlessly confusing. Wouldn't it make sense to just keep python2 and python3 side by side without ambiguity until some long future date when python2 disappears?

I wrote PEP 394 after Arch forced the issue (by switching the python symlink to Python 3), and my preferred/suggested approach is to actually declare "/usr/bin/python" the domain of the user/sysadmin, and have all system packages use the qualified python3 naming.
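One way to audit whether scripts already use the qualified naming is to classify their shebang lines. The sketch below is an assumption-laden illustration: the helper name classify_shebang and the regular expression are made up for this example and only cover the common "#!/path/python" and "#!/usr/bin/env python" forms.

```python
import re

# Capture an optional version suffix such as "3" or "2.7" after "python".
_SHEBANG = re.compile(
    r"^#!\s*(?:\S*/)?(?:env\s+)?python(\d(?:\.\d+)?)?(?:\s|$)"
)


def classify_shebang(first_line):
    """Return 'qualified', 'unqualified', or 'not-python' for a script's first line."""
    match = _SHEBANG.match(first_line)
    if match is None:
        return "not-python"
    return "qualified" if match.group(1) else "unqualified"


for line in ("#!/usr/bin/python3", "#!/usr/bin/env python", "#!/bin/sh"):
    print(line, "->", classify_shebang(line))
```

Scripts reported as "unqualified" are the ones that would change behaviour if a user or sysadmin repointed /usr/bin/python.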

Although, if PEP 432 comes to fruition, then we may be able to have a shiny new pysystem (or some other name) that has all the defaults flipped to lock things down (i.e. ignoring user settings) by the time Fedora gets to Python 3 by default.

Also (switching hats back to the one in my sig): if the default installation client changes, that could mean some fun for Beaker (although I guess we already support alternate installation tools on the older RHEL releases...). Manageable, but I'm glad I'm not finding out about this when someone files a bug complaining that they can't install a new Fedora release in Beaker :)

Cheers, Nick.

-- Nick Coghlan Red Hat Infrastructure Engineering & Development, Brisbane Testing Solutions Team Lead Beaker Development Lead (http://beaker-project.org/)

Nick Coghlan

21 Jul

6:36 p.m.

On 07/19/2013 06:50 PM, Nick Coghlan wrote:

I wrote PEP 394 after Arch forced the issue (by switching the python symlink to Python 3),

Oops, credit where it's due: Kerrick wrote the initial version, then I altered it quite a bit during the subsequent discussions on python-dev :)

Cheers, Nick.

-- Nick Coghlan Red Hat Infrastructure Engineering & Development, Brisbane Testing Solutions Team Lead Beaker Development Lead (http://beaker-project.org/)

python-devel@lists.fedoraproject.org

participants (4)

Andrew McNabb
Bohuslav Kabrda
Nick Coghlan
Toshio Kuratomi