mirror of
https://github.com/jart/cosmopolitan.git
synced 2025-05-31 09:42:27 +00:00
python-3.6.zip added from Github
README.cosmo contains the necessary links.
parent
75fc601ff5
commit
0c4c56ff39
4219 changed files with 1968626 additions and 0 deletions
765
third_party/python/Doc/howto/argparse.rst
vendored
Normal file
|
@ -0,0 +1,765 @@
|
|||
*****************
|
||||
Argparse Tutorial
|
||||
*****************
|
||||
|
||||
:author: Tshepang Lekhonkhobe
|
||||
|
||||
.. _argparse-tutorial:
|
||||
|
||||
This tutorial is intended to be a gentle introduction to :mod:`argparse`, the
|
||||
recommended command-line parsing module in the Python standard library.
|
||||
|
||||
.. note::
|
||||
|
||||
There are two other modules that fulfill the same task, namely
|
||||
:mod:`getopt` (an equivalent for :c:func:`getopt` from the C
|
||||
language) and the deprecated :mod:`optparse`.
|
||||
Note also that :mod:`argparse` is based on :mod:`optparse`,
|
||||
and therefore very similar in terms of usage.
|
||||
|
||||
|
||||
Concepts
|
||||
========
|
||||
|
||||
Let's show the sort of functionality that we are going to explore in this
|
||||
introductory tutorial by making use of the :command:`ls` command:
|
||||
|
||||
.. code-block:: shell-session
|
||||
|
||||
$ ls
|
||||
cpython devguide prog.py pypy rm-unused-function.patch
|
||||
$ ls pypy
|
||||
ctypes_configure demo dotviewer include lib_pypy lib-python ...
|
||||
$ ls -l
|
||||
total 20
|
||||
drwxr-xr-x 19 wena wena 4096 Feb 18 18:51 cpython
|
||||
drwxr-xr-x 4 wena wena 4096 Feb 8 12:04 devguide
|
||||
-rwxr-xr-x 1 wena wena 535 Feb 19 00:05 prog.py
|
||||
drwxr-xr-x 14 wena wena 4096 Feb 7 00:59 pypy
|
||||
-rw-r--r-- 1 wena wena 741 Feb 18 01:01 rm-unused-function.patch
|
||||
$ ls --help
|
||||
Usage: ls [OPTION]... [FILE]...
|
||||
List information about the FILEs (the current directory by default).
|
||||
Sort entries alphabetically if none of -cftuvSUX nor --sort is specified.
|
||||
...
|
||||
|
||||
A few concepts we can learn from the four commands:
|
||||
|
||||
* The :command:`ls` command is useful when run without any options at all. It defaults
|
||||
to displaying the contents of the current directory.
|
||||
|
||||
* If we want more than what it provides by default, we tell it a bit more. In
|
||||
this case, we want it to display a different directory, ``pypy``.
|
||||
What we did is specify what is known as a positional argument. It's named so
|
||||
because the program should know what to do with the value, solely based on
|
||||
where it appears on the command line. This concept is more relevant
|
||||
to a command like :command:`cp`, whose most basic usage is ``cp SRC DEST``.
|
||||
The first position is *what you want copied,* and the second
|
||||
position is *where you want it copied to*.
|
||||
|
||||
* Now, say we want to change the behaviour of the program. In our example,
|
||||
we display more info for each file instead of just showing the file names.
|
||||
The ``-l`` in that case is known as an optional argument.
|
||||
|
||||
* That's a snippet of the help text. It's very useful in that you can
|
||||
come across a program you have never used before, and can figure out
|
||||
how it works simply by reading its help text.
|
||||
|
||||
|
||||
The basics
|
||||
==========
|
||||
|
||||
Let us start with a very simple example which does (almost) nothing::
|
||||
|
||||
import argparse
|
||||
parser = argparse.ArgumentParser()
|
||||
parser.parse_args()
|
||||
|
||||
Following is a result of running the code:
|
||||
|
||||
.. code-block:: shell-session
|
||||
|
||||
$ python3 prog.py
|
||||
$ python3 prog.py --help
|
||||
usage: prog.py [-h]
|
||||
|
||||
optional arguments:
|
||||
-h, --help show this help message and exit
|
||||
$ python3 prog.py --verbose
|
||||
usage: prog.py [-h]
|
||||
prog.py: error: unrecognized arguments: --verbose
|
||||
$ python3 prog.py foo
|
||||
usage: prog.py [-h]
|
||||
prog.py: error: unrecognized arguments: foo
|
||||
|
||||
Here is what is happening:
|
||||
|
||||
* Running the script without any options results in nothing displayed to
|
||||
stdout. Not so useful.
|
||||
|
||||
* The second one starts to display the usefulness of the :mod:`argparse`
|
||||
module. We have done almost nothing, but already we get a nice help message.
|
||||
|
||||
* The ``--help`` option, which can also be shortened to ``-h``, is the only
|
||||
option we get for free (i.e. no need to specify it). Specifying anything
|
||||
else results in an error. But even then, we do get a useful usage message,
|
||||
also for free.
|
||||
|
||||
|
||||
Introducing Positional arguments
|
||||
================================
|
||||
|
||||
An example::
|
||||
|
||||
import argparse
|
||||
parser = argparse.ArgumentParser()
|
||||
parser.add_argument("echo")
|
||||
args = parser.parse_args()
|
||||
print(args.echo)
|
||||
|
||||
And running the code:
|
||||
|
||||
.. code-block:: shell-session
|
||||
|
||||
$ python3 prog.py
|
||||
usage: prog.py [-h] echo
|
||||
prog.py: error: the following arguments are required: echo
|
||||
$ python3 prog.py --help
|
||||
usage: prog.py [-h] echo
|
||||
|
||||
positional arguments:
|
||||
echo
|
||||
|
||||
optional arguments:
|
||||
-h, --help show this help message and exit
|
||||
$ python3 prog.py foo
|
||||
foo
|
||||
|
||||
Here is what's happening:
|
||||
|
||||
* We've added the :meth:`add_argument` method, which is what we use to specify
|
||||
which command-line options the program is willing to accept. In this case,
|
||||
I've named it ``echo`` so that it's in line with its function.
|
||||
|
||||
* Calling our program now requires us to specify an option.
|
||||
|
||||
* The :meth:`parse_args` method actually returns some data from the
|
||||
options specified, in this case, ``echo``.
|
||||
|
||||
* The variable is some form of 'magic' that :mod:`argparse` performs for free
|
||||
(i.e. no need to specify which variable that value is stored in).
|
||||
You will also notice that its name matches the string argument given
|
||||
to the method, ``echo``.
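The points above can be checked without a shell. The following is a minimal sketch, assuming we hand :meth:`parse_args` an explicit list (here ``["foo"]``, an illustrative stand-in for running ``python3 prog.py foo``) instead of letting it read the real command line:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("echo")

# An explicit list replaces the real command line for this demonstration.
args = parser.parse_args(["foo"])

# The parsed value is stored as an attribute named after the argument.
print(args.echo)

# vars() exposes the same data as a plain dictionary.
print(vars(args))
```

The object returned by :meth:`parse_args` is an ``argparse.Namespace``; the attribute name ``echo`` comes directly from the string passed to :meth:`add_argument`.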
|
||||
|
||||
Note however that, although the help display looks nice and all, it currently
|
||||
is not as helpful as it can be. For example we see that we got ``echo`` as a
|
||||
positional argument, but we don't know what it does, other than by guessing or
|
||||
by reading the source code. So, let's make it a bit more useful::
|
||||
|
||||
import argparse
|
||||
parser = argparse.ArgumentParser()
|
||||
parser.add_argument("echo", help="echo the string you use here")
|
||||
args = parser.parse_args()
|
||||
print(args.echo)
|
||||
|
||||
And we get:
|
||||
|
||||
.. code-block:: shell-session
|
||||
|
||||
$ python3 prog.py -h
|
||||
usage: prog.py [-h] echo
|
||||
|
||||
positional arguments:
|
||||
echo echo the string you use here
|
||||
|
||||
optional arguments:
|
||||
-h, --help show this help message and exit
|
||||
|
||||
Now, how about doing something even more useful::
|
||||
|
||||
import argparse
|
||||
parser = argparse.ArgumentParser()
|
||||
parser.add_argument("square", help="display a square of a given number")
|
||||
args = parser.parse_args()
|
||||
print(args.square**2)
|
||||
|
||||
Following is a result of running the code:
|
||||
|
||||
.. code-block:: shell-session
|
||||
|
||||
$ python3 prog.py 4
|
||||
Traceback (most recent call last):
|
||||
File "prog.py", line 5, in <module>
|
||||
print(args.square**2)
|
||||
TypeError: unsupported operand type(s) for ** or pow(): 'str' and 'int'
|
||||
|
||||
That didn't go so well. That's because :mod:`argparse` treats the options we
|
||||
give it as strings, unless we tell it otherwise. So, let's tell
|
||||
:mod:`argparse` to treat that input as an integer::
|
||||
|
||||
import argparse
|
||||
parser = argparse.ArgumentParser()
|
||||
parser.add_argument("square", help="display a square of a given number",
|
||||
type=int)
|
||||
args = parser.parse_args()
|
||||
print(args.square**2)
|
||||
|
||||
Following is a result of running the code:
|
||||
|
||||
.. code-block:: shell-session
|
||||
|
||||
$ python3 prog.py 4
|
||||
16
|
||||
$ python3 prog.py four
|
||||
usage: prog.py [-h] square
|
||||
prog.py: error: argument square: invalid int value: 'four'
|
||||
|
||||
That went well. The program now even helpfully quits on illegal input
|
||||
before proceeding.
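To see what "quits" means in practice, here is a small sketch, assuming explicit argument lists in place of the real command line. It relies on the documented behaviour that :mod:`argparse` reports a parse error by printing a usage message to stderr and exiting with status 2, which surfaces in Python code as :exc:`SystemExit`:

```python
import argparse

parser = argparse.ArgumentParser(prog="prog.py")
parser.add_argument("square", type=int,
                    help="display a square of a given number")

# Valid input: the string "4" is converted by int() before we see it.
args = parser.parse_args(["4"])
print(args.square**2)

# Invalid input: argparse prints the usage text and exits.
try:
    parser.parse_args(["four"])
except SystemExit as exc:
    status = exc.code
    print("parser exited with status", status)
```

Catching :exc:`SystemExit` like this is mainly useful for experiments and tests; a normal script would simply let the program exit.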
|
||||
|
||||
|
||||
Introducing Optional arguments
|
||||
==============================
|
||||
|
||||
So far we have been playing with positional arguments. Let us
|
||||
have a look at how to add optional ones::
|
||||
|
||||
   import argparse
   parser = argparse.ArgumentParser()
   parser.add_argument("--verbosity", help="increase output verbosity")
   args = parser.parse_args()
   if args.verbosity:
       print("verbosity turned on")
|
||||
|
||||
And the output:
|
||||
|
||||
.. code-block:: shell-session
|
||||
|
||||
$ python3 prog.py --verbosity 1
|
||||
verbosity turned on
|
||||
$ python3 prog.py
|
||||
$ python3 prog.py --help
|
||||
usage: prog.py [-h] [--verbosity VERBOSITY]
|
||||
|
||||
optional arguments:
|
||||
-h, --help show this help message and exit
|
||||
--verbosity VERBOSITY
|
||||
increase output verbosity
|
||||
$ python3 prog.py --verbosity
|
||||
usage: prog.py [-h] [--verbosity VERBOSITY]
|
||||
prog.py: error: argument --verbosity: expected one argument
|
||||
|
||||
Here is what is happening:
|
||||
|
||||
* The program is written so as to display something when ``--verbosity`` is
|
||||
specified and display nothing when not.
|
||||
|
||||
* To show that the option is actually optional, there is no error when running
|
||||
the program without it. Note that by default, if an optional argument isn't
|
||||
used, the relevant variable, in this case :attr:`args.verbosity`, is
|
||||
given ``None`` as a value, which is the reason it fails the truth
|
||||
test of the :keyword:`if` statement.
|
||||
|
||||
* The help message is a bit different.
|
||||
|
||||
* When using the ``--verbosity`` option, one must also specify some value,
|
||||
any value.
|
||||
|
||||
The above example accepts arbitrary values for ``--verbosity``, but for
|
||||
our simple program, only two values are actually useful, ``True`` or ``False``.
|
||||
Let's modify the code accordingly::
|
||||
|
||||
   import argparse
   parser = argparse.ArgumentParser()
   parser.add_argument("--verbose", help="increase output verbosity",
                       action="store_true")
   args = parser.parse_args()
   if args.verbose:
       print("verbosity turned on")
|
||||
|
||||
And the output:
|
||||
|
||||
.. code-block:: shell-session
|
||||
|
||||
$ python3 prog.py --verbose
|
||||
verbosity turned on
|
||||
$ python3 prog.py --verbose 1
|
||||
usage: prog.py [-h] [--verbose]
|
||||
prog.py: error: unrecognized arguments: 1
|
||||
$ python3 prog.py --help
|
||||
usage: prog.py [-h] [--verbose]
|
||||
|
||||
optional arguments:
|
||||
-h, --help show this help message and exit
|
||||
--verbose increase output verbosity
|
||||
|
||||
Here is what is happening:
|
||||
|
||||
* The option is now more of a flag than something that requires a value.
|
||||
We even changed the name of the option to match that idea.
|
||||
Note that we now specify a new keyword, ``action``, and give it the value
|
||||
``"store_true"``. This means that, if the option is specified,
|
||||
assign the value ``True`` to :data:`args.verbose`.
|
||||
Not specifying it implies ``False``.
|
||||
|
||||
* It complains when you specify a value, in true spirit of what flags
|
||||
actually are.
|
||||
|
||||
* Notice the different help text.
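A short sketch of the ``store_true`` behaviour described above, assuming explicit argument lists stand in for the command line (the names ``with_flag`` and ``without_flag`` are only illustrative):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--verbose", help="increase output verbosity",
                    action="store_true")

# Flag present: the attribute is True.
with_flag = parser.parse_args(["--verbose"])
print(with_flag.verbose)

# Flag absent: the attribute is False rather than None.
without_flag = parser.parse_args([])
print(without_flag.verbose)
```

Because the attribute is always a real boolean, the ``if args.verbose:`` test needs no special handling for a missing option.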
|
||||
|
||||
|
||||
Short options
|
||||
-------------
|
||||
|
||||
If you are familiar with command line usage,
|
||||
you will notice that I haven't yet touched on the topic of short
|
||||
versions of the options. It's quite simple::
|
||||
|
||||
   import argparse
   parser = argparse.ArgumentParser()
   parser.add_argument("-v", "--verbose", help="increase output verbosity",
                       action="store_true")
   args = parser.parse_args()
   if args.verbose:
       print("verbosity turned on")
|
||||
|
||||
And here goes:
|
||||
|
||||
.. code-block:: shell-session
|
||||
|
||||
$ python3 prog.py -v
|
||||
verbosity turned on
|
||||
$ python3 prog.py --help
|
||||
usage: prog.py [-h] [-v]
|
||||
|
||||
optional arguments:
|
||||
-h, --help show this help message and exit
|
||||
-v, --verbose increase output verbosity
|
||||
|
||||
Note that the new ability is also reflected in the help text.
|
||||
|
||||
|
||||
Combining Positional and Optional arguments
|
||||
===========================================
|
||||
|
||||
Our program keeps growing in complexity::
|
||||
|
||||
   import argparse
   parser = argparse.ArgumentParser()
   parser.add_argument("square", type=int,
                       help="display a square of a given number")
   parser.add_argument("-v", "--verbose", action="store_true",
                       help="increase output verbosity")
   args = parser.parse_args()
   answer = args.square**2
   if args.verbose:
       print("the square of {} equals {}".format(args.square, answer))
   else:
       print(answer)
|
||||
|
||||
And now the output:
|
||||
|
||||
.. code-block:: shell-session
|
||||
|
||||
$ python3 prog.py
|
||||
usage: prog.py [-h] [-v] square
|
||||
prog.py: error: the following arguments are required: square
|
||||
$ python3 prog.py 4
|
||||
16
|
||||
$ python3 prog.py 4 --verbose
|
||||
the square of 4 equals 16
|
||||
$ python3 prog.py --verbose 4
|
||||
the square of 4 equals 16
|
||||
|
||||
* We've brought back a positional argument, hence the complaint.
|
||||
|
||||
* Note that the order does not matter.
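The claim about ordering can be verified directly. This is a minimal sketch, assuming explicit argument lists; it compares the two resulting ``Namespace`` objects, which compare equal when they hold the same attributes:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("square", type=int,
                    help="display a square of a given number")
parser.add_argument("-v", "--verbose", action="store_true",
                    help="increase output verbosity")

# Same arguments, two different orders.
positional_first = parser.parse_args(["4", "--verbose"])
option_first = parser.parse_args(["--verbose", "4"])

print(positional_first == option_first)
```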
|
||||
|
||||
How about we give this program of ours back the ability to have
|
||||
multiple verbosity values, and actually get to use them::
|
||||
|
||||
   import argparse
   parser = argparse.ArgumentParser()
   parser.add_argument("square", type=int,
                       help="display a square of a given number")
   parser.add_argument("-v", "--verbosity", type=int,
                       help="increase output verbosity")
   args = parser.parse_args()
   answer = args.square**2
   if args.verbosity == 2:
       print("the square of {} equals {}".format(args.square, answer))
   elif args.verbosity == 1:
       print("{}^2 == {}".format(args.square, answer))
   else:
       print(answer)
|
||||
|
||||
And the output:
|
||||
|
||||
.. code-block:: shell-session
|
||||
|
||||
$ python3 prog.py 4
|
||||
16
|
||||
$ python3 prog.py 4 -v
|
||||
usage: prog.py [-h] [-v VERBOSITY] square
|
||||
prog.py: error: argument -v/--verbosity: expected one argument
|
||||
$ python3 prog.py 4 -v 1
|
||||
4^2 == 16
|
||||
$ python3 prog.py 4 -v 2
|
||||
the square of 4 equals 16
|
||||
$ python3 prog.py 4 -v 3
|
||||
16
|
||||
|
||||
These all look good except the last one, which exposes a bug in our program.
|
||||
Let's fix it by restricting the values the ``--verbosity`` option can accept::
|
||||
|
||||
   import argparse
   parser = argparse.ArgumentParser()
   parser.add_argument("square", type=int,
                       help="display a square of a given number")
   parser.add_argument("-v", "--verbosity", type=int, choices=[0, 1, 2],
                       help="increase output verbosity")
   args = parser.parse_args()
   answer = args.square**2
   if args.verbosity == 2:
       print("the square of {} equals {}".format(args.square, answer))
   elif args.verbosity == 1:
       print("{}^2 == {}".format(args.square, answer))
   else:
       print(answer)
|
||||
|
||||
And the output:
|
||||
|
||||
.. code-block:: shell-session
|
||||
|
||||
$ python3 prog.py 4 -v 3
|
||||
usage: prog.py [-h] [-v {0,1,2}] square
|
||||
prog.py: error: argument -v/--verbosity: invalid choice: 3 (choose from 0, 1, 2)
|
||||
$ python3 prog.py 4 -h
|
||||
usage: prog.py [-h] [-v {0,1,2}] square
|
||||
|
||||
positional arguments:
|
||||
square display a square of a given number
|
||||
|
||||
optional arguments:
|
||||
-h, --help show this help message and exit
|
||||
-v {0,1,2}, --verbosity {0,1,2}
|
||||
increase output verbosity
|
||||
|
||||
Note that the change is reflected in the error message as well as the
|
||||
help string.
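As a quick check of ``choices``, here is a sketch that assumes explicit argument lists and catches the :exc:`SystemExit` that :mod:`argparse` raises after printing its error message (the names ``accepted`` and ``rejected_status`` are illustrative):

```python
import argparse

parser = argparse.ArgumentParser(prog="prog.py")
parser.add_argument("square", type=int,
                    help="display a square of a given number")
parser.add_argument("-v", "--verbosity", type=int, choices=[0, 1, 2],
                    help="increase output verbosity")

# A value inside the allowed set is converted and stored as usual.
accepted = parser.parse_args(["4", "-v", "2"])
print(accepted.verbosity)

# A value outside the set is rejected before our own code runs.
try:
    parser.parse_args(["4", "-v", "3"])
except SystemExit as exc:
    rejected_status = exc.code
    print("rejected with exit status", rejected_status)
```

Note that the membership test happens after the ``type=int`` conversion, so the choices are listed as integers rather than strings.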
|
||||
|
||||
Now, let's use a different approach of playing with verbosity, which is pretty
|
||||
common. It also matches the way the CPython executable handles its own
|
||||
verbosity argument (check the output of ``python --help``)::
|
||||
|
||||
   import argparse
   parser = argparse.ArgumentParser()
   parser.add_argument("square", type=int,
                       help="display a square of a given number")
   parser.add_argument("-v", "--verbosity", action="count",
                       help="increase output verbosity")
   args = parser.parse_args()
   answer = args.square**2
   if args.verbosity == 2:
       print("the square of {} equals {}".format(args.square, answer))
   elif args.verbosity == 1:
       print("{}^2 == {}".format(args.square, answer))
   else:
       print(answer)
|
||||
|
||||
We have introduced another action, "count",
|
||||
to count the number of occurrences of a specific optional argument:
|
||||
|
||||
.. code-block:: shell-session
|
||||
|
||||
$ python3 prog.py 4
|
||||
16
|
||||
$ python3 prog.py 4 -v
|
||||
4^2 == 16
|
||||
$ python3 prog.py 4 -vv
|
||||
the square of 4 equals 16
|
||||
$ python3 prog.py 4 --verbosity --verbosity
|
||||
the square of 4 equals 16
|
||||
$ python3 prog.py 4 -v 1
|
||||
usage: prog.py [-h] [-v] square
|
||||
prog.py: error: unrecognized arguments: 1
|
||||
$ python3 prog.py 4 -h
|
||||
usage: prog.py [-h] [-v] square
|
||||
|
||||
positional arguments:
|
||||
square display a square of a given number
|
||||
|
||||
optional arguments:
|
||||
-h, --help show this help message and exit
|
||||
-v, --verbosity increase output verbosity
|
||||
$ python3 prog.py 4 -vvv
|
||||
16
|
||||
|
||||
* Yes, it's now more of a flag (similar to ``action="store_true"`` in the
  previous version of our script). That should explain the complaint.
|
||||
|
||||
* It also behaves similarly to the ``"store_true"`` action.
|
||||
|
||||
* Now here's a demonstration of what the "count" action gives. You've probably
|
||||
seen this sort of usage before.
|
||||
|
||||
* And if you don't specify the ``-v`` flag, that flag is considered to have
|
||||
``None`` value.
|
||||
|
||||
* As should be expected, specifying the long form of the flag, we should get
|
||||
the same output.
|
||||
|
||||
* Sadly, our help output isn't very informative on the new ability our script
|
||||
has acquired, but that can always be fixed by improving the documentation for
|
||||
our script (e.g. via the ``help`` keyword argument).
|
||||
|
||||
* That last output exposes a bug in our program.
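The values that the ``count`` action produces can be listed in one go. The following is a sketch under the assumption that explicit argument lists replace the command line; the dictionary ``counts`` is only an illustrative way to collect the results:

```python
import argparse

parser = argparse.ArgumentParser(prog="prog.py")
parser.add_argument("square", type=int,
                    help="display a square of a given number")
parser.add_argument("-v", "--verbosity", action="count",
                    help="increase output verbosity")

counts = {}
for extra in ([], ["-v"], ["-vv"], ["-vvv"], ["--verbosity", "--verbosity"]):
    args = parser.parse_args(["4"] + extra)
    # Key each result by the options that were passed.
    counts[" ".join(extra)] = args.verbosity

for options, value in counts.items():
    print(repr(options), "->", value)
```

With no ``-v`` at all the stored value is ``None``, not ``0``, and ``-vvv`` yields ``3``, a value that the ``==`` comparisons in the script do not match.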
|
||||
|
||||
|
||||
Let's fix it::
|
||||
|
||||
   import argparse
   parser = argparse.ArgumentParser()
   parser.add_argument("square", type=int,
                       help="display a square of a given number")
   parser.add_argument("-v", "--verbosity", action="count",
                       help="increase output verbosity")
   args = parser.parse_args()
   answer = args.square**2

   # bugfix: replace == with >=
   if args.verbosity >= 2:
       print("the square of {} equals {}".format(args.square, answer))
   elif args.verbosity >= 1:
       print("{}^2 == {}".format(args.square, answer))
   else:
       print(answer)
|
||||
|
||||
And this is what it gives:
|
||||
|
||||
.. code-block:: shell-session
|
||||
|
||||
$ python3 prog.py 4 -vvv
|
||||
the square of 4 equals 16
|
||||
$ python3 prog.py 4 -vvvv
|
||||
the square of 4 equals 16
|
||||
$ python3 prog.py 4
|
||||
Traceback (most recent call last):
|
||||
File "prog.py", line 11, in <module>
|
||||
if args.verbosity >= 2:
|
||||
TypeError: '>=' not supported between instances of 'NoneType' and 'int'
|
||||
|
||||
|
||||
* First output went well, and fixes the bug we had before.
|
||||
That is, we want any value >= 2 to be as verbose as possible.
|
||||
|
||||
* Third output not so good.
|
||||
|
||||
Let's fix that bug::
|
||||
|
||||
   import argparse
   parser = argparse.ArgumentParser()
   parser.add_argument("square", type=int,
                       help="display a square of a given number")
   parser.add_argument("-v", "--verbosity", action="count", default=0,
                       help="increase output verbosity")
   args = parser.parse_args()
   answer = args.square**2
   if args.verbosity >= 2:
       print("the square of {} equals {}".format(args.square, answer))
   elif args.verbosity >= 1:
       print("{}^2 == {}".format(args.square, answer))
   else:
       print(answer)
|
||||
|
||||
We've just introduced yet another keyword, ``default``.
|
||||
We've set it to ``0`` in order to make it comparable to the other int values.
|
||||
Remember that by default,
|
||||
if an optional argument isn't specified,
|
||||
it gets the ``None`` value, and that cannot be compared to an int value
|
||||
(hence the :exc:`TypeError` exception).
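The effect of ``default`` can be illustrated side by side. This is a small sketch, assuming two otherwise identical parsers built by a hypothetical helper ``build_parser`` and explicit argument lists:

```python
import argparse

def build_parser(**count_options):
    """Hypothetical helper: build the tutorial parser with extra -v options."""
    parser = argparse.ArgumentParser(prog="prog.py")
    parser.add_argument("square", type=int,
                        help="display a square of a given number")
    parser.add_argument("-v", "--verbosity", action="count",
                        help="increase output verbosity", **count_options)
    return parser

# Without a default, an omitted -v leaves None, which cannot be ordered.
no_default = build_parser().parse_args(["4"])
try:
    no_default.verbosity >= 2
    comparison_failed = False
except TypeError:
    comparison_failed = True

# With default=0, the same comparison is an ordinary integer test.
with_default = build_parser(default=0).parse_args(["4"])
print(comparison_failed, with_default.verbosity >= 2)
```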
|
||||
|
||||
And:
|
||||
|
||||
.. code-block:: shell-session
|
||||
|
||||
$ python3 prog.py 4
|
||||
16
|
||||
|
||||
You can go quite far just with what we've learned so far,
|
||||
and we have only scratched the surface.
|
||||
The :mod:`argparse` module is very powerful,
|
||||
and we'll explore a bit more of it before we end this tutorial.
|
||||
|
||||
|
||||
Getting a little more advanced
|
||||
==============================
|
||||
|
||||
What if we wanted to expand our tiny program to perform other powers,
|
||||
not just squares::
|
||||
|
||||
   import argparse
   parser = argparse.ArgumentParser()
   parser.add_argument("x", type=int, help="the base")
   parser.add_argument("y", type=int, help="the exponent")
   parser.add_argument("-v", "--verbosity", action="count", default=0)
   args = parser.parse_args()
   answer = args.x**args.y
   if args.verbosity >= 2:
       print("{} to the power {} equals {}".format(args.x, args.y, answer))
   elif args.verbosity >= 1:
       print("{}^{} == {}".format(args.x, args.y, answer))
   else:
       print(answer)
|
||||
|
||||
Output:
|
||||
|
||||
.. code-block:: shell-session
|
||||
|
||||
$ python3 prog.py
|
||||
usage: prog.py [-h] [-v] x y
|
||||
prog.py: error: the following arguments are required: x, y
|
||||
$ python3 prog.py -h
|
||||
usage: prog.py [-h] [-v] x y
|
||||
|
||||
positional arguments:
|
||||
x the base
|
||||
y the exponent
|
||||
|
||||
optional arguments:
|
||||
-h, --help show this help message and exit
|
||||
-v, --verbosity
|
||||
$ python3 prog.py 4 2 -v
|
||||
4^2 == 16
|
||||
|
||||
|
||||
Notice that so far we've been using verbosity level to *change* the text
|
||||
that gets displayed. The following example uses verbosity level
|
||||
to display *more* text instead::
|
||||
|
||||
   import argparse
   parser = argparse.ArgumentParser()
   parser.add_argument("x", type=int, help="the base")
   parser.add_argument("y", type=int, help="the exponent")
   parser.add_argument("-v", "--verbosity", action="count", default=0)
   args = parser.parse_args()
   answer = args.x**args.y
   if args.verbosity >= 2:
       print("Running '{}'".format(__file__))
   if args.verbosity >= 1:
       print("{}^{} == ".format(args.x, args.y), end="")
   print(answer)
|
||||
|
||||
Output:
|
||||
|
||||
.. code-block:: shell-session
|
||||
|
||||
$ python3 prog.py 4 2
|
||||
16
|
||||
$ python3 prog.py 4 2 -v
|
||||
4^2 == 16
|
||||
$ python3 prog.py 4 2 -vv
|
||||
Running 'prog.py'
|
||||
4^2 == 16
|
||||
|
||||
|
||||
Conflicting options
|
||||
-------------------
|
||||
|
||||
So far, we have been working with two methods of an
|
||||
:class:`argparse.ArgumentParser` instance. Let's introduce a third one,
|
||||
:meth:`add_mutually_exclusive_group`. It allows us to specify options that
|
||||
conflict with each other. Let's also change the rest of the program so that
|
||||
the new functionality makes more sense:
|
||||
we'll introduce the ``--quiet`` option,
|
||||
which will be the opposite of the ``--verbose`` one::
|
||||
|
||||
   import argparse

   parser = argparse.ArgumentParser()
   group = parser.add_mutually_exclusive_group()
   group.add_argument("-v", "--verbose", action="store_true")
   group.add_argument("-q", "--quiet", action="store_true")
   parser.add_argument("x", type=int, help="the base")
   parser.add_argument("y", type=int, help="the exponent")
   args = parser.parse_args()
   answer = args.x**args.y

   if args.quiet:
       print(answer)
   elif args.verbose:
       print("{} to the power {} equals {}".format(args.x, args.y, answer))
   else:
       print("{}^{} == {}".format(args.x, args.y, answer))
|
||||
|
||||
Our program is now simpler, and we've lost some functionality for the sake of
|
||||
demonstration. Anyways, here's the output:
|
||||
|
||||
.. code-block:: shell-session
|
||||
|
||||
$ python3 prog.py 4 2
|
||||
4^2 == 16
|
||||
$ python3 prog.py 4 2 -q
|
||||
16
|
||||
$ python3 prog.py 4 2 -v
|
||||
4 to the power 2 equals 16
|
||||
$ python3 prog.py 4 2 -vq
|
||||
usage: prog.py [-h] [-v | -q] x y
|
||||
prog.py: error: argument -q/--quiet: not allowed with argument -v/--verbose
|
||||
$ python3 prog.py 4 2 -v --quiet
|
||||
usage: prog.py [-h] [-v | -q] x y
|
||||
prog.py: error: argument -q/--quiet: not allowed with argument -v/--verbose
|
||||
|
||||
That should be easy to follow. I've added that last output so you can see the
|
||||
sort of flexibility you get, i.e. mixing long form options with short form
|
||||
ones.
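The conflict check can also be exercised from Python. Here is a minimal sketch, assuming explicit argument lists and catching the :exc:`SystemExit` that follows the error message (``quiet_run`` and ``conflict_status`` are illustrative names):

```python
import argparse

parser = argparse.ArgumentParser(prog="prog.py")
group = parser.add_mutually_exclusive_group()
group.add_argument("-v", "--verbose", action="store_true")
group.add_argument("-q", "--quiet", action="store_true")
parser.add_argument("x", type=int, help="the base")
parser.add_argument("y", type=int, help="the exponent")

# One option from the group is fine.
quiet_run = parser.parse_args(["4", "2", "-q"])
print(quiet_run.quiet, quiet_run.verbose)

# Two options from the same group are rejected during parsing.
try:
    parser.parse_args(["4", "2", "-v", "--quiet"])
except SystemExit as exc:
    conflict_status = exc.code
    print("conflict rejected with exit status", conflict_status)
```

The group only constrains the options added through ``group.add_argument``; the positional arguments ``x`` and ``y`` are added to the parser itself and are unaffected.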
|
||||
|
||||
Before we conclude, you probably want to tell your users the main purpose of
|
||||
your program, just in case they don't know::
|
||||
|
||||
   import argparse

   parser = argparse.ArgumentParser(description="calculate X to the power of Y")
   group = parser.add_mutually_exclusive_group()
   group.add_argument("-v", "--verbose", action="store_true")
   group.add_argument("-q", "--quiet", action="store_true")
   parser.add_argument("x", type=int, help="the base")
   parser.add_argument("y", type=int, help="the exponent")
   args = parser.parse_args()
   answer = args.x**args.y

   if args.quiet:
       print(answer)
   elif args.verbose:
       print("{} to the power {} equals {}".format(args.x, args.y, answer))
   else:
       print("{}^{} == {}".format(args.x, args.y, answer))
|
||||
|
||||
Note the slight difference in the usage text. Note the ``[-v | -q]``,
|
||||
which tells us that we can either use ``-v`` or ``-q``,
|
||||
but not both at the same time:
|
||||
|
||||
.. code-block:: shell-session
|
||||
|
||||
$ python3 prog.py --help
|
||||
usage: prog.py [-h] [-v | -q] x y
|
||||
|
||||
calculate X to the power of Y
|
||||
|
||||
positional arguments:
|
||||
x the base
|
||||
y the exponent
|
||||
|
||||
optional arguments:
|
||||
-h, --help show this help message and exit
|
||||
-v, --verbose
|
||||
-q, --quiet
|
||||
|
||||
|
||||
Conclusion
|
||||
==========
|
||||
|
||||
The :mod:`argparse` module offers a lot more than shown here.
|
||||
Its docs are quite detailed and thorough, and full of examples.
|
||||
Having gone through this tutorial, you should easily digest them
|
||||
without feeling overwhelmed.
|
1734
third_party/python/Doc/howto/clinic.rst
vendored
Normal file
File diff suppressed because it is too large
257
third_party/python/Doc/howto/cporting.rst
vendored
Normal file
|
@ -0,0 +1,257 @@
|
|||
.. highlightlang:: c
|
||||
|
||||
.. _cporting-howto:
|
||||
|
||||
*************************************
|
||||
Porting Extension Modules to Python 3
|
||||
*************************************
|
||||
|
||||
:author: Benjamin Peterson
|
||||
|
||||
|
||||
.. topic:: Abstract
|
||||
|
||||
Although changing the C-API was not one of Python 3's objectives,
|
||||
the many Python-level changes made leaving Python 2's API intact
|
||||
impossible. In fact, some changes such as :func:`int` and
|
||||
:func:`long` unification are more obvious on the C level. This
|
||||
document endeavors to document incompatibilities and how they can
|
||||
be worked around.
|
||||
|
||||
|
||||
Conditional compilation
|
||||
=======================
|
||||
|
||||
The easiest way to compile only some code for Python 3 is to check
|
||||
if :c:macro:`PY_MAJOR_VERSION` is greater than or equal to 3. ::
|
||||
|
||||
#if PY_MAJOR_VERSION >= 3
|
||||
#define IS_PY3K
|
||||
#endif
|
||||
|
||||
API functions that are not present can be aliased to their equivalents within
|
||||
conditional blocks.
|
||||
|
||||
|
||||
Changes to Object APIs
|
||||
======================
|
||||
|
||||
Python 3 merged together some types with similar functions while cleanly
|
||||
separating others.
|
||||
|
||||
|
||||
str/unicode Unification
|
||||
-----------------------
|
||||
|
||||
Python 3's :func:`str` type is equivalent to Python 2's :func:`unicode`; the C
|
||||
functions are called ``PyUnicode_*`` for both. The old 8-bit string type has become
|
||||
:func:`bytes`, with C functions called ``PyBytes_*``. Python 2.6 and later provide a compatibility header,
|
||||
:file:`bytesobject.h`, mapping ``PyBytes`` names to ``PyString`` ones. For best
|
||||
compatibility with Python 3, :c:type:`PyUnicode` should be used for textual data and
|
||||
:c:type:`PyBytes` for binary data. It's also important to remember that
|
||||
:c:type:`PyBytes` and :c:type:`PyUnicode` in Python 3 are not interchangeable like
|
||||
:c:type:`PyString` and :c:type:`PyUnicode` are in Python 2. The following example
|
||||
shows best practices with regard to :c:type:`PyUnicode`, :c:type:`PyString`,
|
||||
and :c:type:`PyBytes`. ::
|
||||
|
||||
#include "stdlib.h"
|
||||
#include "Python.h"
|
||||
#include "bytesobject.h"
|
||||
|
||||
/* text example */
|
||||
static PyObject *
|
||||
say_hello(PyObject *self, PyObject *args) {
|
||||
PyObject *name, *result;
|
||||
|
||||
if (!PyArg_ParseTuple(args, "U:say_hello", &name))
|
||||
return NULL;
|
||||
|
||||
result = PyUnicode_FromFormat("Hello, %S!", name);
|
||||
return result;
|
||||
}
|
||||
|
||||
/* just a forward */
|
||||
static char * do_encode(PyObject *);
|
||||
|
||||
/* bytes example */
|
||||
static PyObject *
|
||||
encode_object(PyObject *self, PyObject *args) {
|
||||
char *encoded;
|
||||
PyObject *result, *myobj;
|
||||
|
||||
if (!PyArg_ParseTuple(args, "O:encode_object", &myobj))
|
||||
return NULL;
|
||||
|
||||
encoded = do_encode(myobj);
|
||||
if (encoded == NULL)
|
||||
return NULL;
|
||||
result = PyBytes_FromString(encoded);
|
||||
free(encoded);
|
||||
return result;
|
||||
}
|
||||
|
||||
|
||||
long/int Unification
|
||||
--------------------
|
||||
|
||||
Python 3 has only one integer type, :func:`int`. But it actually
|
||||
corresponds to Python 2's :func:`long` type—the :func:`int` type
|
||||
used in Python 2 was removed. In the C-API, ``PyInt_*`` functions
|
||||
are replaced by their ``PyLong_*`` equivalents.
|
||||
|
||||
|
||||
Module initialization and state
|
||||
===============================
|
||||
|
||||
Python 3 has a revamped extension module initialization system. (See
|
||||
:pep:`3121`.) Instead of storing module state in globals, it should
|
||||
be stored in an interpreter specific structure. Creating modules that
|
||||
act correctly in both Python 2 and Python 3 is tricky. The following
|
||||
simple example demonstrates how. ::
|
||||
|
||||
    #include "Python.h"

    struct module_state {
        PyObject *error;
    };

    #if PY_MAJOR_VERSION >= 3
    #define GETSTATE(m) ((struct module_state*)PyModule_GetState(m))
    #else
    #define GETSTATE(m) (&_state)
    static struct module_state _state;
    #endif

    static PyObject *
    error_out(PyObject *m) {
        struct module_state *st = GETSTATE(m);
        PyErr_SetString(st->error, "something bad happened");
        return NULL;
    }

    static PyMethodDef myextension_methods[] = {
        {"error_out", (PyCFunction)error_out, METH_NOARGS, NULL},
        {NULL, NULL}
    };

    #if PY_MAJOR_VERSION >= 3

    static int myextension_traverse(PyObject *m, visitproc visit, void *arg) {
        Py_VISIT(GETSTATE(m)->error);
        return 0;
    }

    static int myextension_clear(PyObject *m) {
        Py_CLEAR(GETSTATE(m)->error);
        return 0;
    }


    static struct PyModuleDef moduledef = {
        PyModuleDef_HEAD_INIT,
        "myextension",
        NULL,
        sizeof(struct module_state),
        myextension_methods,
        NULL,
        myextension_traverse,
        myextension_clear,
        NULL
    };

    #define INITERROR return NULL

    PyMODINIT_FUNC
    PyInit_myextension(void)

    #else
    #define INITERROR return

    void
    initmyextension(void)
    #endif
    {
    #if PY_MAJOR_VERSION >= 3
        PyObject *module = PyModule_Create(&moduledef);
    #else
        PyObject *module = Py_InitModule("myextension", myextension_methods);
    #endif

        if (module == NULL)
            INITERROR;
        struct module_state *st = GETSTATE(module);

        st->error = PyErr_NewException("myextension.Error", NULL, NULL);
        if (st->error == NULL) {
            Py_DECREF(module);
            INITERROR;
        }

    #if PY_MAJOR_VERSION >= 3
        return module;
    #endif
    }
|
||||
|
||||
|
||||
CObject replaced with Capsule
|
||||
=============================
|
||||
|
||||
The :c:type:`Capsule` object was introduced in Python 3.1 and 2.7 to replace
|
||||
:c:type:`CObject`. CObjects were useful,
|
||||
but the :c:type:`CObject` API was problematic: it didn't permit distinguishing
|
||||
between valid CObjects, which allowed mismatched CObjects to crash the
|
||||
interpreter, and some of its APIs relied on undefined behavior in C.
|
||||
(For further reading on the rationale behind Capsules, please see :issue:`5630`.)
|
||||
|
||||
If you're currently using CObjects, and you want to migrate to 3.1 or newer,
|
||||
you'll need to switch to Capsules.
|
||||
:c:type:`CObject` was deprecated in 3.1 and 2.7 and completely removed in
|
||||
Python 3.2. If you only support 2.7, or 3.1 and above, you
|
||||
can simply switch to :c:type:`Capsule`. If you need to support Python 3.0,
|
||||
or versions of Python earlier than 2.7,
|
||||
you'll have to support both CObjects and Capsules.
|
||||
(Note that Python 3.0 is no longer supported, and it is not recommended
|
||||
for production use.)
|
||||
|
||||
The following example header file :file:`capsulethunk.h` may
|
||||
solve the problem for you. Simply write your code against the
|
||||
:c:type:`Capsule` API and include this header file after
|
||||
:file:`Python.h`. Your code will automatically use Capsules
|
||||
in versions of Python with Capsules, and switch to CObjects
|
||||
when Capsules are unavailable.
|
||||
|
||||
:file:`capsulethunk.h` simulates Capsules using CObjects. However,
|
||||
:c:type:`CObject` provides no place to store the capsule's "name". As a
|
||||
result the simulated :c:type:`Capsule` objects created by :file:`capsulethunk.h`
|
||||
behave slightly differently from real Capsules. Specifically:
|
||||
|
||||
* The name parameter passed in to :c:func:`PyCapsule_New` is ignored.
|
||||
|
||||
* The name parameter passed in to :c:func:`PyCapsule_IsValid` and
|
||||
:c:func:`PyCapsule_GetPointer` is ignored, and no error checking
|
||||
of the name is performed.
|
||||
|
||||
* :c:func:`PyCapsule_GetName` always returns NULL.
|
||||
|
||||
* :c:func:`PyCapsule_SetName` always raises an exception and
|
||||
returns failure. (Since there's no way to store a name
|
||||
in a CObject, noisy failure of :c:func:`PyCapsule_SetName`
|
||||
was deemed preferable to silent failure here. If this is
|
||||
inconvenient, feel free to modify your local
|
||||
copy as you see fit.)
|
||||
|
||||
You can find :file:`capsulethunk.h` in the Python source distribution
|
||||
as :source:`Doc/includes/capsulethunk.h`. We also include it here for
|
||||
your convenience:
|
||||
|
||||
.. literalinclude:: ../includes/capsulethunk.h
|
||||
|
||||
|
||||
|
||||
Other options
|
||||
=============
|
||||
|
||||
If you are writing a new extension module, you might consider `Cython
|
||||
<http://cython.org/>`_. It translates a Python-like language to C. The
|
||||
extension modules it creates are compatible with Python 3 and Python 2.
|
||||
|
552
third_party/python/Doc/howto/curses.rst
vendored
Normal file
|
@ -0,0 +1,552 @@
|
|||
.. _curses-howto:
|
||||
|
||||
**********************************
|
||||
Curses Programming with Python
|
||||
**********************************
|
||||
|
||||
:Author: A.M. Kuchling, Eric S. Raymond
|
||||
:Release: 2.04
|
||||
|
||||
|
||||
.. topic:: Abstract
|
||||
|
||||
This document describes how to use the :mod:`curses` extension
|
||||
module to control text-mode displays.
|
||||
|
||||
|
||||
What is curses?
|
||||
===============
|
||||
|
||||
The curses library supplies a terminal-independent screen-painting and
|
||||
keyboard-handling facility for text-based terminals; such terminals
|
||||
include VT100s, the Linux console, and the simulated terminal provided
|
||||
by various programs. Display terminals support various control codes
|
||||
to perform common operations such as moving the cursor, scrolling the
|
||||
screen, and erasing areas. Different terminals use widely differing
|
||||
codes, and often have their own minor quirks.
|
||||
|
||||
In a world of graphical displays, one might ask "why bother"? It's
|
||||
true that character-cell display terminals are an obsolete technology,
|
||||
but there are niches in which being able to do fancy things with them
|
||||
is still valuable. One niche is on small-footprint or embedded
|
||||
Unixes that don't run an X server. Another is tools such as OS
|
||||
installers and kernel configurators that may have to run before any
|
||||
graphical support is available.
|
||||
|
||||
The curses library provides fairly basic functionality, providing the
|
||||
programmer with an abstraction of a display containing multiple
|
||||
non-overlapping windows of text. The contents of a window can be
|
||||
changed in various ways---adding text, erasing it, changing its
|
||||
appearance---and the curses library will figure out what control codes
|
||||
need to be sent to the terminal to produce the right output. curses
|
||||
doesn't provide many user-interface concepts such as buttons, checkboxes,
|
||||
or dialogs; if you need such features, consider a user interface library such as
|
||||
`Urwid <https://pypi.org/project/urwid/>`_.
|
||||
|
||||
The curses library was originally written for BSD Unix; the later System V
|
||||
versions of Unix from AT&T added many enhancements and new functions. BSD curses
|
||||
is no longer maintained, having been replaced by ncurses, which is an
|
||||
open-source implementation of the AT&T interface. If you're using an
|
||||
open-source Unix such as Linux or FreeBSD, your system almost certainly uses
|
||||
ncurses. Since most current commercial Unix versions are based on System V
|
||||
code, all the functions described here will probably be available. The older
|
||||
versions of curses carried by some proprietary Unixes may not support
|
||||
everything, though.
|
||||
|
||||
The Windows version of Python doesn't include the :mod:`curses`
|
||||
module. A ported version called `UniCurses
|
||||
<https://pypi.org/project/UniCurses>`_ is available. You could
|
||||
also try `the Console module <http://effbot.org/zone/console-index.htm>`_
|
||||
written by Fredrik Lundh, which doesn't
|
||||
use the same API as curses but provides cursor-addressable text output
|
||||
and full support for mouse and keyboard input.
|
||||
|
||||
|
||||
The Python curses module
|
||||
------------------------
|
||||
|
||||
The Python module is a fairly simple wrapper over the C functions provided by
|
||||
curses; if you're already familiar with curses programming in C, it's really
|
||||
easy to transfer that knowledge to Python. The biggest difference is that the
|
||||
Python interface makes things simpler by merging different C functions such as
|
||||
:c:func:`addstr`, :c:func:`mvaddstr`, and :c:func:`mvwaddstr` into a single
|
||||
:meth:`~curses.window.addstr` method. You'll see this covered in more
|
||||
detail later.
|
||||
|
||||
This HOWTO is an introduction to writing text-mode programs with curses
|
||||
and Python. It doesn't attempt to be a complete guide to the curses API; for
|
||||
that, see the Python library guide's section on ncurses, and the C manual pages
|
||||
for ncurses. It will, however, give you the basic ideas.
|
||||
|
||||
|
||||
Starting and ending a curses application
|
||||
========================================
|
||||
|
||||
Before doing anything, curses must be initialized. This is done by
|
||||
calling the :func:`~curses.initscr` function, which will determine the
|
||||
terminal type, send any required setup codes to the terminal, and
|
||||
create various internal data structures. If successful,
|
||||
:func:`initscr` returns a window object representing the entire
|
||||
screen; this is usually called ``stdscr`` after the name of the
|
||||
corresponding C variable. ::
|
||||
|
||||
   import curses
   stdscr = curses.initscr()
|
||||
|
||||
Usually curses applications turn off automatic echoing of keys to the
|
||||
screen, in order to be able to read keys and only display them under
|
||||
certain circumstances. This requires calling the
|
||||
:func:`~curses.noecho` function. ::
|
||||
|
||||
   curses.noecho()
|
||||
|
||||
Applications will also commonly need to react to keys instantly,
|
||||
without requiring the Enter key to be pressed; this is called cbreak
|
||||
mode, as opposed to the usual buffered input mode. ::
|
||||
|
||||
   curses.cbreak()
|
||||
|
||||
Terminals usually return special keys, such as the cursor keys or navigation
|
||||
keys such as Page Up and Home, as a multibyte escape sequence. While you could
|
||||
write your application to expect such sequences and process them accordingly,
|
||||
curses can do it for you, returning a special value such as
|
||||
:const:`curses.KEY_LEFT`. To get curses to do the job, you'll have to enable
|
||||
keypad mode. ::
|
||||
|
||||
   stdscr.keypad(True)
|
||||
|
||||
Terminating a curses application is much easier than starting one. You'll need
|
||||
to call::
|
||||
|
||||
   curses.nocbreak()
   stdscr.keypad(False)
   curses.echo()
|
||||
|
||||
to reverse the curses-friendly terminal settings. Then call the
|
||||
:func:`~curses.endwin` function to restore the terminal to its original
|
||||
operating mode. ::
|
||||
|
||||
   curses.endwin()
|
||||
|
||||
A common problem when debugging a curses application is to get your terminal
|
||||
messed up when the application dies without restoring the terminal to its
|
||||
previous state. In Python this commonly happens when your code is buggy and
|
||||
raises an uncaught exception. Keys are no longer echoed to the screen when
|
||||
you type them, for example, which makes using the shell difficult.
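If you manage the terminal yourself, one way to guard against this is a
``try``...``finally`` block around your application code. The following is a
minimal sketch that uses only the calls introduced above; ``run_with_cleanup``
is a hypothetical helper name chosen for illustration and is not part of the
:mod:`curses` module.

```python
import curses


def run_with_cleanup(body):
    """Initialize curses, call body(stdscr), and always restore the terminal.

    Hypothetical helper for illustration; not part of the curses module.
    """
    stdscr = curses.initscr()
    try:
        curses.noecho()
        curses.cbreak()
        stdscr.keypad(True)
        return body(stdscr)
    finally:
        # Runs whether body() returned or raised, so the shell stays usable.
        stdscr.keypad(False)
        curses.nocbreak()
        curses.echo()
        curses.endwin()
```

Because the cleanup lives in the ``finally`` clause, an uncaught exception in
``body`` still propagates, but only after the terminal settings have been
reversed.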
|
||||
|
||||
In Python you can avoid these complications and make debugging much easier by
|
||||
importing the :func:`curses.wrapper` function and using it like this::
|
||||
|
||||
   from curses import wrapper

   def main(stdscr):
       # Clear screen
       stdscr.clear()

       # This raises ZeroDivisionError when i == 10.
       for i in range(0, 11):
           v = i-10
           stdscr.addstr(i, 0, '10 divided by {} is {}'.format(v, 10/v))

       stdscr.refresh()
       stdscr.getkey()

   wrapper(main)
|
||||
|
||||
The :func:`~curses.wrapper` function takes a callable object and does the
|
||||
initializations described above, also initializing colors if color
|
||||
support is present. :func:`wrapper` then runs your provided callable.
|
||||
Once the callable returns, :func:`wrapper` will restore the original
|
||||
state of the terminal. The callable is called inside a
|
||||
:keyword:`try`...\ :keyword:`except` that catches exceptions, restores
|
||||
the state of the terminal, and then re-raises the exception. Therefore
|
||||
your terminal won't be left in a funny state on exception and you'll be
|
||||
able to read the exception's message and traceback.
|
||||
|
||||
|
||||
Windows and Pads
|
||||
================
|
||||
|
||||
Windows are the basic abstraction in curses. A window object represents a
|
||||
rectangular area of the screen, and supports methods to display text,
|
||||
erase it, allow the user to input strings, and so forth.
|
||||
|
||||
The ``stdscr`` object returned by the :func:`~curses.initscr` function is a
|
||||
window object that covers the entire screen. Many programs may need
|
||||
only this single window, but you might wish to divide the screen into
|
||||
smaller windows, in order to redraw or clear them separately. The
|
||||
:func:`~curses.newwin` function creates a new window of a given size,
|
||||
returning the new window object. ::
|
||||
|
||||
   begin_x = 20; begin_y = 7
   height = 5; width = 40
   win = curses.newwin(height, width, begin_y, begin_x)
|
||||
|
||||
Note that the coordinate system used in curses is unusual.
|
||||
Coordinates are always passed in the order *y,x*, and the top-left
|
||||
corner of a window is coordinate (0,0). This breaks the normal
|
||||
convention for handling coordinates where the *x* coordinate comes
|
||||
first. This is an unfortunate difference from most other computer
|
||||
applications, but it's been part of curses since it was first written,
|
||||
and it's too late to change things now.
|
||||
|
||||
Your application can determine the size of the screen by using the
|
||||
:data:`curses.LINES` and :data:`curses.COLS` variables to obtain the *y* and
|
||||
*x* sizes. Legal coordinates will then extend from ``(0,0)`` to
|
||||
``(curses.LINES - 1, curses.COLS - 1)``.
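As an illustration of the *y,x* ordering, here is a small sketch that centers
a window on the screen. The helper names ``centered_origin`` and
``make_centered_window`` are assumptions made for this example. The arithmetic
is kept in a separate function so it can be checked without a terminal.

```python
import curses


def centered_origin(screen_lines, screen_cols, height, width):
    """Return (begin_y, begin_x) that centers a height x width window."""
    if height > screen_lines or width > screen_cols:
        raise ValueError("window does not fit on the screen")
    # Note the order: the y (row) value comes first, then x (column).
    return (screen_lines - height) // 2, (screen_cols - width) // 2


def make_centered_window(height, width):
    """Create a centered window; call only after curses.initscr()."""
    # curses.LINES and curses.COLS are only set once initscr() has run.
    begin_y, begin_x = centered_origin(curses.LINES, curses.COLS,
                                       height, width)
    return curses.newwin(height, width, begin_y, begin_x)
```

On a 24-line, 80-column screen, a 5 by 40 window would start at row 9,
column 20.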
|
||||
|
||||
When you call a method to display or erase text, the effect doesn't
|
||||
immediately show up on the display. Instead you must call the
|
||||
:meth:`~curses.window.refresh` method of window objects to update the
|
||||
screen.
|
||||
|
||||
This is because curses was originally written with slow 300-baud
|
||||
terminal connections in mind; with these terminals, minimizing the
|
||||
time required to redraw the screen was very important. Instead curses
|
||||
accumulates changes to the screen and displays them in the most
|
||||
efficient manner when you call :meth:`refresh`. For example, if your
|
||||
program displays some text in a window and then clears the window,
|
||||
there's no need to send the original text because it's never
|
||||
visible.
|
||||
|
||||
In practice, explicitly telling curses to redraw a window doesn't
|
||||
really complicate programming with curses much. Most programs go into a flurry
|
||||
of activity, and then pause waiting for a keypress or some other action on the
|
||||
part of the user. All you have to do is to be sure that the screen has been
|
||||
redrawn before pausing to wait for user input, by first calling
|
||||
``stdscr.refresh()`` or the :meth:`refresh` method of some other relevant
|
||||
window.
|
||||
|
||||
A pad is a special case of a window; it can be larger than the actual display
|
||||
screen, and only a portion of the pad is displayed at a time. Creating a pad
|
||||
requires the pad's height and width, while refreshing a pad requires giving the
|
||||
coordinates of the on-screen area where a subsection of the pad will be
|
||||
displayed. ::
|
||||
|
||||
   pad = curses.newpad(100, 100)
   # These loops fill the pad with letters; addch() is
   # explained in the next section
   for y in range(0, 99):
       for x in range(0, 99):
           pad.addch(y,x, ord('a') + (x*x+y*y) % 26)

   # Displays a section of the pad in the middle of the screen.
   # (0,0) : coordinate of upper-left corner of pad area to display.
   # (5,5) : coordinate of upper-left corner of window area to be filled
   #         with pad content.
   # (20, 75) : coordinate of lower-right corner of window area to be
   #          : filled with pad content.
   pad.refresh( 0,0, 5,5, 20,75)
|
||||
|
||||
The :meth:`refresh` call displays a section of the pad in the rectangle
|
||||
extending from coordinate (5,5) to coordinate (20,75) on the screen; the upper
|
||||
left corner of the displayed section is coordinate (0,0) on the pad. Beyond
|
||||
that difference, pads are exactly like ordinary windows and support the same
|
||||
methods.
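Since a pad is usually larger than the area it is shown in, programs often
scroll it by changing the first two arguments to :meth:`refresh`. The sketch
below is one possible way to keep the visible region inside the pad; the
helper names and the ``screen_box`` tuple are hypothetical, and the pad size
is passed in explicitly rather than queried.

```python
def clamp_pad_origin(pad_rows, pad_cols, view_rows, view_cols, top, left):
    """Clamp (top, left) so the visible region stays inside the pad."""
    max_top = max(0, pad_rows - view_rows)
    max_left = max(0, pad_cols - view_cols)
    return min(max(top, 0), max_top), min(max(left, 0), max_left)


def show_pad(pad, pad_rows, pad_cols, top, left, screen_box):
    """Show part of pad in screen_box = (sminrow, smincol, smaxrow, smaxcol).

    Returns the (top, left) pad coordinate that was actually used.
    """
    sminrow, smincol, smaxrow, smaxcol = screen_box
    # Both screen corners are inclusive, hence the + 1.
    view_rows = smaxrow - sminrow + 1
    view_cols = smaxcol - smincol + 1
    top, left = clamp_pad_origin(pad_rows, pad_cols,
                                 view_rows, view_cols, top, left)
    pad.refresh(top, left, sminrow, smincol, smaxrow, smaxcol)
    return top, left
```

A key handler could then call ``show_pad`` with ``top + 1`` or ``top - 1`` to
scroll by one line without running off either end of the pad.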
|
||||
|
||||
If you have multiple windows and pads on screen there is a more
|
||||
efficient way to update the screen and prevent annoying screen flicker
|
||||
as each part of the screen gets updated. :meth:`refresh` actually
|
||||
does two things:
|
||||
|
||||
1) Calls the :meth:`~curses.window.noutrefresh` method of each window
|
||||
to update an underlying data structure representing the desired
|
||||
state of the screen.
|
||||
2) Calls the :func:`~curses.doupdate` function to change the
|
||||
physical screen to match the desired state recorded in the data structure.
|
||||
|
||||
Instead you can call :meth:`noutrefresh` on a number of windows to
|
||||
update the data structure, and then call :func:`doupdate` to update
|
||||
the screen.
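The two-step update described above can be sketched as follows.
``refresh_all`` is a hypothetical helper name. It is meant for ordinary
windows only, because a pad's :meth:`noutrefresh` needs the same six
coordinates as its :meth:`refresh`.

```python
import curses


def refresh_all(windows):
    """Stage every window, then repaint the physical screen once."""
    for win in windows:
        # Updates curses' internal picture of the screen; nothing is
        # sent to the terminal yet.
        win.noutrefresh()
    # A single pass over the terminal for all staged changes.
    curses.doupdate()
```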
|
||||
|
||||
|
||||
Displaying Text
|
||||
===============
|
||||
|
||||
From a C programmer's point of view, curses may sometimes look like a
|
||||
twisty maze of functions, all subtly different. For example,
|
||||
:c:func:`addstr` displays a string at the current cursor location in
|
||||
the ``stdscr`` window, while :c:func:`mvaddstr` moves to a given y,x
|
||||
coordinate first before displaying the string. :c:func:`waddstr` is just
|
||||
like :c:func:`addstr`, but allows specifying a window to use instead of
|
||||
using ``stdscr`` by default. :c:func:`mvwaddstr` allows specifying both
|
||||
a window and a coordinate.
|
||||
|
||||
Fortunately the Python interface hides all these details. ``stdscr``
|
||||
is a window object like any other, and methods such as
|
||||
:meth:`~curses.window.addstr` accept multiple argument forms. Usually there
|
||||
are four different forms.
|
||||
|
||||
+---------------------------------+-----------------------------------------------+
| Form                            | Description                                   |
+=================================+===============================================+
| *str* or *ch*                   | Display the string *str* or character *ch* at |
|                                 | the current position                          |
+---------------------------------+-----------------------------------------------+
| *str* or *ch*, *attr*           | Display the string *str* or character *ch*,   |
|                                 | using attribute *attr* at the current         |
|                                 | position                                      |
+---------------------------------+-----------------------------------------------+
| *y*, *x*, *str* or *ch*         | Move to position *y,x* within the window, and |
|                                 | display *str* or *ch*                         |
+---------------------------------+-----------------------------------------------+
| *y*, *x*, *str* or *ch*, *attr* | Move to position *y,x* within the window, and |
|                                 | display *str* or *ch*, using attribute *attr* |
+---------------------------------+-----------------------------------------------+
|
||||
|
||||
Attributes allow displaying text in highlighted forms such as boldface,
|
||||
underline, reverse video, or in color. They'll be explained in more detail in
|
||||
the next subsection.
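To make the four argument forms concrete, here is a short sketch that uses
each of them once with :meth:`~curses.window.addstr`. The function name
``show_forms`` and the strings are made up for this example; ``win`` can be
``stdscr`` or any other window that is tall and wide enough.

```python
import curses


def show_forms(win):
    """Call addstr() once with each of the four argument forms."""
    # Form 1: string only, written at the current cursor position.
    win.addstr("plain text at the cursor")
    # Form 2: string plus attribute, still at the cursor position.
    win.addstr(" bold text at the cursor", curses.A_BOLD)
    # Form 3: move to row 2, column 0 first, then write the string.
    win.addstr(2, 0, "text at row 2, column 0")
    # Form 4: coordinates, string, and attribute together.
    win.addstr(3, 0, "underlined text at row 3", curses.A_UNDERLINE)
    win.refresh()
```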
|
||||
|
||||
|
||||
The :meth:`~curses.window.addstr` method takes a Python string or
|
||||
bytestring as the value to be displayed. The contents of bytestrings
|
||||
are sent to the terminal as-is. Strings are encoded to bytes using
|
||||
the value of the window's :attr:`encoding` attribute; this defaults to
|
||||
the default system encoding as returned by
|
||||
:func:`locale.getpreferredencoding`.
|
||||
|
||||
The :meth:`~curses.window.addch` method takes a character, which can be
|
||||
either a string of length 1, a bytestring of length 1, or an integer.
|
||||
|
||||
Constants are provided for extension characters; these constants are
|
||||
integers greater than 255. For example, :const:`ACS_PLMINUS` is a +/-
|
||||
symbol, and :const:`ACS_ULCORNER` is the upper left corner of a box
|
||||
(handy for drawing borders). You can also use the appropriate Unicode
|
||||
character.
|
||||
|
||||
Windows remember where the cursor was left after the last operation, so if you
|
||||
leave out the *y,x* coordinates, the string or character will be displayed
|
||||
wherever the last operation left off. You can also move the cursor with the
|
||||
``move(y,x)`` method. Because some terminals always display a flashing cursor,
|
||||
you may want to ensure that the cursor is positioned in some location where it
|
||||
won't be distracting; it can be confusing to have the cursor blinking at some
|
||||
apparently random location.
|
||||
|
||||
If your application doesn't need a blinking cursor at all, you can
|
||||
call ``curs_set(False)`` to make it invisible. For compatibility
|
||||
with older curses versions, there's a ``leaveok(bool)`` function
|
||||
that's a synonym for :func:`~curses.curs_set`. When *bool* is true, the
|
||||
curses library will attempt to suppress the flashing cursor, and you
|
||||
won't need to worry about leaving it in odd locations.
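Not every terminal can change cursor visibility, in which case
:func:`~curses.curs_set` raises :exc:`curses.error`. A defensive sketch,
using the hypothetical helper name ``hide_cursor``, might look like this:

```python
import curses


def hide_cursor():
    """Try to hide the cursor.

    Returns the previous visibility setting, or None if the terminal
    does not support changing it.
    """
    try:
        return curses.curs_set(0)
    except curses.error:
        # The terminal cannot hide the cursor; carry on regardless.
        return None
```

The returned value can be passed back to ``curses.curs_set()`` later to
restore the cursor, provided it is not ``None``.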
|
||||
|
||||
|
||||
Attributes and Color
|
||||
--------------------
|
||||
|
||||
Characters can be displayed in different ways. Status lines in a text-based
|
||||
application are commonly shown in reverse video, or a text viewer may need to
|
||||
highlight certain words. curses supports this by allowing you to specify an
|
||||
attribute for each cell on the screen.
|
||||
|
||||
An attribute is an integer, each bit representing a different
|
||||
attribute. You can try to display text with multiple attribute bits
|
||||
set, but curses doesn't guarantee that all the possible combinations
|
||||
are available, or that they're all visually distinct. That depends on
|
||||
the ability of the terminal being used, so it's safest to stick to the
|
||||
most commonly available attributes, listed here.
|
||||
|
||||
+----------------------+--------------------------------------+
| Attribute            | Description                          |
+======================+======================================+
| :const:`A_BLINK`     | Blinking text                        |
+----------------------+--------------------------------------+
| :const:`A_BOLD`      | Extra bright or bold text            |
+----------------------+--------------------------------------+
| :const:`A_DIM`       | Half bright text                     |
+----------------------+--------------------------------------+
| :const:`A_REVERSE`   | Reverse-video text                   |
+----------------------+--------------------------------------+
| :const:`A_STANDOUT`  | The best highlighting mode available |
+----------------------+--------------------------------------+
| :const:`A_UNDERLINE` | Underlined text                      |
+----------------------+--------------------------------------+
|
||||
|
||||
So, to display a reverse-video status line on the top line of the screen, you
|
||||
could code::
|
||||
|
||||
   stdscr.addstr(0, 0, "Current mode: Typing mode",
                 curses.A_REVERSE)
   stdscr.refresh()
|
||||
|
||||
The curses library also supports color on those terminals that provide it. The
|
||||
most common such terminal is probably the Linux console, followed by color
|
||||
xterms.
|
||||
|
||||
To use color, you must call the :func:`~curses.start_color` function soon
|
||||
after calling :func:`~curses.initscr`, to initialize the default color set
|
||||
(the :func:`curses.wrapper` function does this automatically). Once that's
|
||||
done, the :func:`~curses.has_colors` function returns ``True`` if the terminal
in use can actually display color. (Note: curses uses the American spelling 'color',
|
||||
instead of the Canadian/British spelling 'colour'. If you're used to the
|
||||
British spelling, you'll have to resign yourself to misspelling it for the sake
|
||||
of these functions.)
|
||||
|
||||
The curses library maintains a finite number of color pairs, containing a
|
||||
foreground (or text) color and a background color. You can get the attribute
|
||||
value corresponding to a color pair with the :func:`~curses.color_pair`
|
||||
function; this can be bitwise-OR'ed with other attributes such as
|
||||
:const:`A_REVERSE`, but again, such combinations are not guaranteed to work
|
||||
on all terminals.
|
||||
|
||||
An example, which displays a line of text using color pair 1::
|
||||
|
||||
   stdscr.addstr("Pretty text", curses.color_pair(1))
   stdscr.refresh()
|
||||
|
||||
As I said before, a color pair consists of a foreground and background color.
|
||||
The ``init_pair(n, f, b)`` function changes the definition of color pair *n* to
foreground color *f* and background color *b*. Color pair 0 is hard-wired to white
|
||||
on black, and cannot be changed.
|
||||
|
||||
Colors are numbered, and :func:`start_color` initializes 8 basic
|
||||
colors when it activates color mode. They are: 0:black, 1:red,
|
||||
2:green, 3:yellow, 4:blue, 5:magenta, 6:cyan, and 7:white. The :mod:`curses`
|
||||
module defines named constants for each of these colors:
|
||||
:const:`curses.COLOR_BLACK`, :const:`curses.COLOR_RED`, and so forth.
|
||||
|
||||
Let's put all this together. To change color pair 1 to red text on a white
|
||||
background, you would call::
|
||||
|
||||
   curses.init_pair(1, curses.COLOR_RED, curses.COLOR_WHITE)
|
||||
|
||||
When you change a color pair, any text already displayed using that color pair
|
||||
will change to the new colors. You can also display new text in this color
|
||||
with::
|
||||
|
||||
   stdscr.addstr(0,0, "RED ALERT!", curses.color_pair(1))
|
||||
|
||||
Very fancy terminals can change the definitions of the actual colors to a given
|
||||
RGB value. This lets you change color 1, which is usually red, to purple or
|
||||
blue or any other color you like. Unfortunately, the Linux console doesn't
|
||||
support this, so I'm unable to try it out, and can't provide any examples. You
|
||||
can check if your terminal can do this by calling
|
||||
:func:`~curses.can_change_color`, which returns ``True`` if the capability is
|
||||
there. If you're lucky enough to have such a talented terminal, consult your
|
||||
system's man pages for more information.
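The pieces of this section can be combined into a small helper that picks an
attribute for alert text and falls back to reverse video on monochrome
terminals. This is only a sketch: ``ALERT_PAIR`` and ``alert_attr`` are names
invented for the example, and the choice of red on white with bold is
arbitrary. As noted above, combinations of attributes are not guaranteed to
work on every terminal.

```python
import curses

ALERT_PAIR = 1  # arbitrary color pair number chosen for this sketch


def alert_attr():
    """Return an attribute for alert text, with a monochrome fallback.

    Call this only after initscr() and start_color() have run
    (curses.wrapper() takes care of both).
    """
    if not curses.has_colors():
        return curses.A_REVERSE
    curses.init_pair(ALERT_PAIR, curses.COLOR_RED, curses.COLOR_WHITE)
    # A color pair is an attribute value, so it can be OR'ed with others.
    return curses.color_pair(ALERT_PAIR) | curses.A_BOLD
```

The result can be passed as the *attr* argument, for example
``stdscr.addstr(0, 0, "RED ALERT!", alert_attr())``.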
|
||||
|
||||
|
||||
User Input
|
||||
==========
|
||||
|
||||
The C curses library offers only very simple input mechanisms. Python's
|
||||
:mod:`curses` module adds a basic text-input widget. (Other libraries
|
||||
such as `Urwid <https://pypi.org/project/urwid/>`_ have more extensive
|
||||
collections of widgets.)
|
||||
|
||||
There are two methods for getting input from a window:
|
||||
|
||||
* :meth:`~curses.window.getch` refreshes the screen and then waits for
|
||||
the user to hit a key, displaying the key if :func:`~curses.echo` has been
|
||||
called earlier. You can optionally specify a coordinate to which
|
||||
the cursor should be moved before pausing.
|
||||
|
||||
* :meth:`~curses.window.getkey` does the same thing but converts the
|
||||
integer to a string. Individual characters are returned as
|
||||
1-character strings, and special keys such as function keys return
|
||||
longer strings containing a key name such as ``KEY_UP`` or ``^G``.
|
||||
|
||||
It's possible to not wait for the user using the
|
||||
:meth:`~curses.window.nodelay` window method. After ``nodelay(True)``,
|
||||
:meth:`getch` and :meth:`getkey` for the window become
|
||||
non-blocking. To signal that no input is ready, :meth:`getch` returns
|
||||
``curses.ERR`` (a value of -1) and :meth:`getkey` raises an exception.
|
||||
There's also a :func:`~curses.halfdelay` function, which can be used to (in
|
||||
effect) set a timer on each :meth:`getch`; if no input becomes
|
||||
available within a specified delay (measured in tenths of a second),
|
||||
curses raises an exception.
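A non-blocking read with :meth:`getch` can be wrapped so that callers get
``None`` instead of having to compare against ``curses.ERR`` themselves. The
helper name ``poll_key`` is hypothetical, and this is only one way to
structure it.

```python
import curses


def poll_key(win):
    """Return a key code if one is waiting, else None, without blocking."""
    win.nodelay(True)
    key = win.getch()
    # In no-delay mode getch() returns curses.ERR (-1) when nothing is ready.
    return None if key == curses.ERR else key
```

A program that animates the screen could call ``poll_key(stdscr)`` once per
frame and keep drawing whenever it returns ``None``.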
|
||||
|
||||
The :meth:`getch` method returns an integer; if it's between 0 and 255, it
|
||||
represents the ASCII code of the key pressed. Values greater than 255 are
|
||||
special keys such as Page Up, Home, or the cursor keys. You can compare the
|
||||
value returned to constants such as :const:`curses.KEY_PPAGE`,
|
||||
:const:`curses.KEY_HOME`, or :const:`curses.KEY_LEFT`. The main loop of
|
||||
your program may look something like this::
|
||||
|
||||
   while True:
       c = stdscr.getch()
       if c == ord('p'):
           PrintDocument()
       elif c == ord('q'):
           break  # Exit the while loop
       elif c == curses.KEY_HOME:
           x = y = 0
|
||||
|
||||
The :mod:`curses.ascii` module supplies ASCII class membership functions that
|
||||
take either integer or 1-character string arguments; these may be useful in
|
||||
writing more readable tests for such loops. It also supplies
|
||||
conversion functions that take either integer or 1-character-string arguments
|
||||
and return the same type. For example, :func:`curses.ascii.ctrl` returns the
|
||||
control character corresponding to its argument.
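As a sketch of how these functions can make key handling more readable, the
following classifies the integer returned by :meth:`getch`. The function name
``describe_key`` and the labels it returns are invented for this example; the
:mod:`curses.ascii` calls themselves work without an initialized screen.

```python
import curses.ascii


def describe_key(c):
    """Classify an integer key code as returned by getch()."""
    # ctrl() maps 'g' (103) to the Ctrl-G control code (7).
    if c == curses.ascii.ctrl(ord('g')):
        return "send"
    if curses.ascii.isprint(c):
        return "printable: " + chr(c)
    if curses.ascii.iscntrl(c):
        return "control"
    # Anything else is outside ASCII, e.g. curses.KEY_HOME.
    return "special"
```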
|
||||
|
||||
There's also a method to retrieve an entire string,
|
||||
:meth:`~curses.window.getstr`. It isn't used very often, because its
|
||||
functionality is quite limited; the only editing keys available are
|
||||
the backspace key and the Enter key, which terminates the string. It
|
||||
can optionally be limited to a fixed number of characters. ::
|
||||
|
||||
   curses.echo()            # Enable echoing of characters

   # Get a 15-character string, with the cursor on the top line
   s = stdscr.getstr(0,0, 15)
|
||||
|
||||
The :mod:`curses.textpad` module supplies a text box that supports an
|
||||
Emacs-like set of keybindings. Various methods of the
|
||||
:class:`~curses.textpad.Textbox` class support editing with input
|
||||
validation and gathering the edit results either with or without
|
||||
trailing spaces. Here's an example::
|
||||
|
||||
   import curses
   from curses.textpad import Textbox, rectangle

   def main(stdscr):
       stdscr.addstr(0, 0, "Enter IM message: (hit Ctrl-G to send)")

       editwin = curses.newwin(5,30, 2,1)
       rectangle(stdscr, 1,0, 1+5+1, 1+30+1)
       stdscr.refresh()

       box = Textbox(editwin)

       # Let the user edit until Ctrl-G is struck.
       box.edit()

       # Get resulting contents
       message = box.gather()
|
||||
|
||||
See the library documentation on :mod:`curses.textpad` for more details.
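The input validation mentioned above works by passing a function to
``Textbox.edit()``; it is called with each keystroke and its return value is
what the text box acts on. The sketch below remaps Enter to Ctrl-G so that
Enter finishes editing. The names ``enter_terminates`` and ``read_message``
are hypothetical, and stripping the gathered text is a choice made for this
example.

```python
import curses
import curses.ascii
from curses.textpad import Textbox


def enter_terminates(ch):
    """Validator: treat Enter as Ctrl-G so that it ends editing."""
    if ch in (curses.ascii.NL, curses.ascii.CR):
        return curses.ascii.BEL  # BEL is the Ctrl-G control code
    return ch


def read_message(editwin):
    """Edit text in editwin until Enter or Ctrl-G; return the contents."""
    box = Textbox(editwin)
    box.edit(enter_terminates)
    return box.gather().strip()
```

With this validator the user can no longer insert a line break, so it suits
single-message prompts better than free-form multi-line editing.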
|
||||
|
||||
|
||||
For More Information
|
||||
====================
|
||||
|
||||
This HOWTO doesn't cover some advanced topics, such as reading the
|
||||
contents of the screen or capturing mouse events from an xterm
|
||||
instance, but the Python library page for the :mod:`curses` module is now
|
||||
reasonably complete. You should browse it next.
|
||||
|
||||
If you're in doubt about the detailed behavior of the curses
|
||||
functions, consult the manual pages for your curses implementation,
|
||||
whether it's ncurses or a proprietary Unix vendor's. The manual pages
|
||||
will document any quirks, and provide complete lists of all the
|
||||
functions, attributes, and :const:`ACS_\*` characters available to
|
||||
you.
|
||||
|
||||
Because the curses API is so large, some functions aren't supported in
|
||||
the Python interface. Often this isn't because they're difficult to
|
||||
implement, but because no one has needed them yet. Also, Python
|
||||
doesn't yet support the menu library associated with ncurses.
|
||||
Patches adding support for these would be welcome; see
|
||||
`the Python Developer's Guide <https://devguide.python.org/>`_ to
|
||||
learn more about submitting patches to Python.
|
||||
|
||||
* `Writing Programs with NCURSES <http://invisible-island.net/ncurses/ncurses-intro.html>`_:
|
||||
a lengthy tutorial for C programmers.
|
||||
* `The ncurses man page <http://linux.die.net/man/3/ncurses>`_
|
||||
* `The ncurses FAQ <http://invisible-island.net/ncurses/ncurses.faq.html>`_
|
||||
* `"Use curses... don't swear" <https://www.youtube.com/watch?v=eN1eZtjLEnU>`_:
|
||||
video of a PyCon 2013 talk on controlling terminals using curses or Urwid.
|
||||
* `"Console Applications with Urwid" <http://www.pyvideo.org/video/1568/console-applications-with-urwid>`_:
|
||||
video of a PyCon CA 2012 talk demonstrating some applications written using
|
||||
Urwid.
|
443
third_party/python/Doc/howto/descriptor.rst
vendored
Normal file
|
@ -0,0 +1,443 @@
|
|||
======================
|
||||
Descriptor HowTo Guide
|
||||
======================
|
||||
|
||||
:Author: Raymond Hettinger
|
||||
:Contact: <python at rcn dot com>
|
||||
|
||||
.. Contents::
|
||||
|
||||
Abstract
|
||||
--------
|
||||
|
||||
Defines descriptors, summarizes the protocol, and shows how descriptors are
|
||||
called.  Examines a custom descriptor and several built-in Python descriptors
|
||||
including functions, properties, static methods, and class methods. Shows how
|
||||
each works by giving a pure Python equivalent and a sample application.
|
||||
|
||||
Learning about descriptors not only provides access to a larger toolset, it
|
||||
creates a deeper understanding of how Python works and an appreciation for the
|
||||
elegance of its design.
|
||||
|
||||
|
||||
Definition and Introduction
|
||||
---------------------------
|
||||
|
||||
In general, a descriptor is an object attribute with "binding behavior", one
|
||||
whose attribute access has been overridden by methods in the descriptor
|
||||
protocol. Those methods are :meth:`__get__`, :meth:`__set__`, and
|
||||
:meth:`__delete__`. If any of those methods are defined for an object, it is
|
||||
said to be a descriptor.
|
||||
|
||||
The default behavior for attribute access is to get, set, or delete the
|
||||
attribute from an object's dictionary. For instance, ``a.x`` has a lookup chain
|
||||
starting with ``a.__dict__['x']``, then ``type(a).__dict__['x']``, and
|
||||
continuing through the base classes of ``type(a)`` excluding metaclasses. If the
|
||||
looked-up value is an object defining one of the descriptor methods, then Python
|
||||
may override the default behavior and invoke the descriptor method instead.
|
||||
Where this occurs in the precedence chain depends on which descriptor methods
|
||||
were defined.
|
||||
|
||||
Descriptors are a powerful, general purpose protocol. They are the mechanism
|
||||
behind properties, methods, static methods, class methods, and :func:`super()`.
|
||||
They are used throughout Python itself to implement the new style classes
|
||||
introduced in version 2.2. Descriptors simplify the underlying C-code and offer
|
||||
a flexible set of new tools for everyday Python programs.
|
||||
|
||||
|
||||
Descriptor Protocol
|
||||
-------------------
|
||||
|
||||
``descr.__get__(self, obj, type=None) --> value``
|
||||
|
||||
``descr.__set__(self, obj, value) --> None``
|
||||
|
||||
``descr.__delete__(self, obj) --> None``
|
||||
|
||||
That is all there is to it. Define any of these methods and an object is
|
||||
considered a descriptor and can override default behavior upon being looked up
|
||||
as an attribute.
|
||||
|
||||
If an object defines both :meth:`__get__` and :meth:`__set__`, it is considered
|
||||
a data descriptor. Descriptors that only define :meth:`__get__` are called
|
||||
non-data descriptors (they are typically used for methods but other uses are
|
||||
possible).
|
||||
|
||||
Data and non-data descriptors differ in how overrides are calculated with
|
||||
respect to entries in an instance's dictionary. If an instance's dictionary
|
||||
has an entry with the same name as a data descriptor, the data descriptor
|
||||
takes precedence. If an instance's dictionary has an entry with the same
|
||||
name as a non-data descriptor, the dictionary entry takes precedence.
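
The precedence rule can be checked with a small experiment. The following is
only a sketch: the class names ``DataDesc``, ``NonDataDesc`` and ``Demo`` are
made up for this illustration, and the instance dictionary is written to
directly so that :meth:`__set__` is bypassed:

```python
class DataDesc:
    """Data descriptor: defines both __get__ and __set__."""

    def __get__(self, obj, objtype=None):
        return "from data descriptor"

    def __set__(self, obj, value):
        # Ignore assignments; merely defining __set__ is what makes
        # this a data descriptor.
        pass


class NonDataDesc:
    """Non-data descriptor: defines only __get__."""

    def __get__(self, obj, objtype=None):
        return "from non-data descriptor"


class Demo:
    d = DataDesc()
    n = NonDataDesc()


demo = Demo()
# Write straight into the instance dictionary, bypassing __set__.
demo.__dict__["d"] = "from instance dict"
demo.__dict__["n"] = "from instance dict"

data_result = demo.d        # the data descriptor takes precedence
non_data_result = demo.n    # the instance dictionary entry takes precedence
print(data_result)
print(non_data_result)
```

Both names exist in the instance dictionary, yet only the lookup of ``n``
returns the dictionary entry.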
|
||||
|
||||
To make a read-only data descriptor, define both :meth:`__get__` and
|
||||
:meth:`__set__` with the :meth:`__set__` raising an :exc:`AttributeError` when
|
||||
called. Defining the :meth:`__set__` method with an exception raising
|
||||
placeholder is enough to make it a data descriptor.
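
As a minimal sketch of that recipe (``ReadOnly`` and ``Circle`` are
hypothetical names, not part of the standard library), a read-only data
descriptor might look like this:

```python
class ReadOnly:
    """Hypothetical read-only data descriptor holding a fixed value."""

    def __init__(self, value):
        self.value = value

    def __get__(self, obj, objtype=None):
        return self.value

    def __set__(self, obj, value):
        # Defining __set__ makes this a data descriptor;
        # raising here makes the attribute read-only.
        raise AttributeError("attribute is read-only")


class Circle:
    sides = ReadOnly(0)


c = Circle()
try:
    c.sides = 4
except AttributeError as exc:
    error = str(exc)
else:
    error = None
print(c.sides, error)
```

The failed assignment leaves the instance dictionary untouched, so later
reads still go through :meth:`__get__`.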
|
||||
|
||||
|
||||
Invoking Descriptors
|
||||
--------------------
|
||||
|
||||
A descriptor can be called directly by its method name. For example,
|
||||
``d.__get__(obj)``.
|
||||
|
||||
Alternatively, it is more common for a descriptor to be invoked automatically
|
||||
upon attribute access. For example, ``obj.d`` looks up ``d`` in the dictionary
|
||||
of ``obj``. If ``d`` defines the method :meth:`__get__`, then ``d.__get__(obj)``
|
||||
is invoked according to the precedence rules listed below.
|
||||
|
||||
The details of invocation depend on whether ``obj`` is an object or a class.
|
||||
|
||||
For objects, the machinery is in :meth:`object.__getattribute__` which
|
||||
transforms ``b.x`` into ``type(b).__dict__['x'].__get__(b, type(b))``. The
|
||||
implementation works through a precedence chain that gives data descriptors
|
||||
priority over instance variables, instance variables priority over non-data
|
||||
descriptors, and assigns lowest priority to :meth:`__getattr__` if provided.
|
||||
The full C implementation can be found in :c:func:`PyObject_GenericGetAttr()` in
|
||||
:source:`Objects/object.c`.
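
The transformation for objects can be spelled out by hand. This is an
illustrative sketch only; ``Ten`` and ``B`` are made-up names:

```python
class Ten:
    """Non-data descriptor that always returns 10."""

    def __get__(self, obj, objtype=None):
        return 10


class B:
    x = Ten()


b = B()
via_dot = b.x
# The same lookup written out explicitly.
via_protocol = type(b).__dict__["x"].__get__(b, type(b))
print(via_dot, via_protocol)
```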
|
||||
|
||||
For classes, the machinery is in :meth:`type.__getattribute__` which transforms
|
||||
``B.x`` into ``B.__dict__['x'].__get__(None, B)``. In pure Python, it looks
|
||||
like::
|
||||
|
||||
    def __getattribute__(self, key):
        "Emulate type_getattro() in Objects/typeobject.c"
        v = object.__getattribute__(self, key)
        if hasattr(v, '__get__'):
            return v.__get__(None, self)
        return v
|
||||
|
||||
The important points to remember are:
|
||||
|
||||
* descriptors are invoked by the :meth:`__getattribute__` method
|
||||
* overriding :meth:`__getattribute__` prevents automatic descriptor calls
|
||||
* :meth:`object.__getattribute__` and :meth:`type.__getattribute__` make
|
||||
different calls to :meth:`__get__`.
|
||||
* data descriptors always override instance dictionaries.
|
||||
* non-data descriptors may be overridden by instance dictionaries.
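
The second point can be demonstrated with a deliberately naive override. In
this sketch (the classes ``Ten``, ``Normal`` and ``Raw`` are invented for the
example), ``Raw.__getattribute__`` returns whatever is stored in the class
dictionary without calling :meth:`__get__` on it:

```python
class Ten:
    """Non-data descriptor that always returns 10."""

    def __get__(self, obj, objtype=None):
        return 10


class Normal:
    x = Ten()


class Raw:
    x = Ten()

    def __getattribute__(self, name):
        # Return the class dictionary entry as-is; __get__ is never called.
        return type(self).__dict__[name]


normal_value = Normal().x   # descriptor invoked by object.__getattribute__
raw_value = Raw().x         # the descriptor object itself comes back
print(normal_value, type(raw_value).__name__)
```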
|
||||
|
||||
The object returned by ``super()`` also has a custom :meth:`__getattribute__`
|
||||
method for invoking descriptors. The call ``super(B, obj).m()`` searches
|
||||
``obj.__class__.__mro__`` for the base class ``A`` immediately following ``B``
|
||||
and then returns ``A.__dict__['m'].__get__(obj, B)``. If not a descriptor,
|
||||
``m`` is returned unchanged. If not in the dictionary, ``m`` reverts to a
|
||||
search using :meth:`object.__getattribute__`.
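
A short check of that equivalence, assuming a simple two-class hierarchy
(``A`` and ``B`` here are placeholders matching the names used in the
paragraph above):

```python
class A:
    def m(self):
        return "A.m"


class B(A):
    def m(self):
        return "B.m"


obj = B()
via_super = super(B, obj).m
# The lookup that super() performs, written out by hand.
via_protocol = A.__dict__["m"].__get__(obj, B)
print(via_super(), via_protocol())
```

Both expressions produce a method bound to ``obj`` whose underlying function
is the one defined in ``A``.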
|
||||
|
||||
The implementation details are in :c:func:`super_getattro()` in
|
||||
:source:`Objects/typeobject.c`, and a pure Python equivalent can be found in
|
||||
`Guido's Tutorial`_.
|
||||
|
||||
.. _`Guido's Tutorial`: https://www.python.org/download/releases/2.2.3/descrintro/#cooperation
|
||||
|
||||
The details above show that the mechanism for descriptors is embedded in the
|
||||
:meth:`__getattribute__()` methods for :class:`object`, :class:`type`, and
|
||||
:func:`super`. Classes inherit this machinery when they derive from
|
||||
:class:`object` or if they have a meta-class providing similar functionality.
|
||||
Likewise, classes can turn off descriptor invocation by overriding
|
||||
:meth:`__getattribute__()`.
|
||||
|
||||
|
||||
Descriptor Example
|
||||
------------------
|
||||
|
||||
The following code creates a class whose objects are data descriptors which
|
||||
print a message for each get or set. Overriding :meth:`__getattribute__` is
|
||||
an alternate approach that could do this for every attribute.  However, this
|
||||
descriptor is useful for monitoring just a few chosen attributes::
|
||||
|
||||
    class RevealAccess(object):
        """A data descriptor that sets and returns values
           normally and prints a message logging their access.
        """

        def __init__(self, initval=None, name='var'):
            self.val = initval
            self.name = name

        def __get__(self, obj, objtype):
            print('Retrieving', self.name)
            return self.val

        def __set__(self, obj, val):
            print('Updating', self.name)
            self.val = val

    >>> class MyClass(object):
    ...     x = RevealAccess(10, 'var "x"')
    ...     y = 5
    ...
    >>> m = MyClass()
    >>> m.x
    Retrieving var "x"
    10
    >>> m.x = 20
    Updating var "x"
    >>> m.x
    Retrieving var "x"
    20
    >>> m.y
    5
|
||||
|
||||
The protocol is simple and offers exciting possibilities. Several use cases are
|
||||
so common that they have been packaged into individual function calls.
|
||||
Properties, bound methods, static methods, and class methods are all
|
||||
based on the descriptor protocol.
|
||||
|
||||
|
||||
Properties
|
||||
----------
|
||||
|
||||
Calling :func:`property` is a succinct way of building a data descriptor that
|
||||
triggers function calls upon access to an attribute. Its signature is::
|
||||
|
||||
property(fget=None, fset=None, fdel=None, doc=None) -> property attribute
|
||||
|
||||
The documentation shows a typical use to define a managed attribute ``x``::
|
||||
|
||||
    class C(object):
        def getx(self): return self.__x
        def setx(self, value): self.__x = value
        def delx(self): del self.__x
        x = property(getx, setx, delx, "I'm the 'x' property.")
|
||||
|
||||
To see how :func:`property` is implemented in terms of the descriptor protocol,
|
||||
here is a pure Python equivalent::
|
||||
|
||||
    class Property(object):
        "Emulate PyProperty_Type() in Objects/descrobject.c"

        def __init__(self, fget=None, fset=None, fdel=None, doc=None):
            self.fget = fget
            self.fset = fset
            self.fdel = fdel
            if doc is None and fget is not None:
                doc = fget.__doc__
            self.__doc__ = doc

        def __get__(self, obj, objtype=None):
            if obj is None:
                return self
            if self.fget is None:
                raise AttributeError("unreadable attribute")
            return self.fget(obj)

        def __set__(self, obj, value):
            if self.fset is None:
                raise AttributeError("can't set attribute")
            self.fset(obj, value)

        def __delete__(self, obj):
            if self.fdel is None:
                raise AttributeError("can't delete attribute")
            self.fdel(obj)

        def getter(self, fget):
            return type(self)(fget, self.fset, self.fdel, self.__doc__)

        def setter(self, fset):
            return type(self)(self.fget, fset, self.fdel, self.__doc__)

        def deleter(self, fdel):
            return type(self)(self.fget, self.fset, fdel, self.__doc__)
|
||||
|
||||
The :func:`property` builtin helps whenever a user interface has granted
|
||||
attribute access and then subsequent changes require the intervention of a
|
||||
method.
|
||||
|
||||
For instance, a spreadsheet class may grant access to a cell value through
|
||||
``Cell('b10').value``. Subsequent improvements to the program require the cell
|
||||
to be recalculated on every access; however, the programmer does not want to
|
||||
affect existing client code accessing the attribute directly. The solution is
|
||||
to wrap access to the value attribute in a property data descriptor::
|
||||
|
||||
    class Cell(object):
        . . .
        def getvalue(self):
            "Recalculate the cell before returning value"
            self.recalc()
            return self._value
        value = property(getvalue)
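
To make the spreadsheet example concrete, here is one possible runnable
version. It is a sketch under stated assumptions: the ``formula`` argument
(a zero-argument callable) and the ``recalc_count`` counter are inventions
for this illustration and are not part of the original example:

```python
class Cell:
    """Hypothetical spreadsheet cell whose value comes from a formula."""

    def __init__(self, formula):
        self.formula = formula      # assumed: a zero-argument callable
        self.recalc_count = 0       # assumed: counts recalculations
        self._value = None

    def recalc(self):
        self.recalc_count += 1
        self._value = self.formula()

    def getvalue(self):
        "Recalculate the cell before returning value"
        self.recalc()
        return self._value

    value = property(getvalue)


cell = Cell(lambda: 2 + 3)
first = cell.value      # client code still uses plain attribute access
second = cell.value
print(first, second, cell.recalc_count)
```

Client code keeps reading ``cell.value`` as before, while every read now
triggers a recalculation. Because no setter is supplied, assigning to
``cell.value`` raises :exc:`AttributeError`.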
|
||||
|
||||
|
||||
Functions and Methods
|
||||
---------------------
|
||||
|
||||
Python's object oriented features are built upon a function based environment.
|
||||
Using non-data descriptors, the two are merged seamlessly.
|
||||
|
||||
Class dictionaries store methods as functions. In a class definition, methods
|
||||
are written using :keyword:`def` or :keyword:`lambda`, the usual tools for
|
||||
creating functions. Methods only differ from regular functions in that the
|
||||
first argument is reserved for the object instance. By Python convention, the
|
||||
instance reference is called *self* but may be called *this* or any other
|
||||
variable name.
|
||||
|
||||
To support method calls, functions include the :meth:`__get__` method for
|
||||
binding methods during attribute access. This means that all functions are
|
||||
non-data descriptors which return bound methods when they are invoked from an
|
||||
object.  In pure Python, it works like this::
|
||||
|
||||
    class Function(object):
        . . .
        def __get__(self, obj, objtype=None):
            "Simulate func_descr_get() in Objects/funcobject.c"
            if obj is None:
                return self
            return types.MethodType(self, obj)
|
||||
|
||||
Running the interpreter shows how the function descriptor works in practice::
|
||||
|
||||
    >>> class D(object):
    ...     def f(self, x):
    ...         return x
    ...
    >>> d = D()

    # Access through the class dictionary does not invoke __get__.
    # It just returns the underlying function object.
    >>> D.__dict__['f']
    <function D.f at 0x00C45070>

    # Dotted access from a class calls __get__() which just returns
    # the underlying function unchanged.
    >>> D.f
    <function D.f at 0x00C45070>

    # The function has a __qualname__ attribute to support introspection
    >>> D.f.__qualname__
    'D.f'

    # Dotted access from an instance calls __get__() which returns the
    # function wrapped in a bound method object
    >>> d.f
    <bound method D.f of <__main__.D object at 0x00B18C90>>

    # Internally, the bound method stores the underlying function,
    # the bound instance, and the class of the bound instance.
    >>> d.f.__func__
    <function D.f at 0x1012e5ae8>
    >>> d.f.__self__
    <__main__.D object at 0x1012e1f98>
    >>> d.f.__class__
    <class 'method'>
|
||||
|
||||
|
||||
Static Methods and Class Methods
|
||||
--------------------------------
|
||||
|
||||
Non-data descriptors provide a simple mechanism for variations on the usual
|
||||
patterns of binding functions into methods.
|
||||
|
||||
To recap, functions have a :meth:`__get__` method so that they can be converted
|
||||
to a method when accessed as attributes. The non-data descriptor transforms an
|
||||
``obj.f(*args)`` call into ``f(obj, *args)``. Calling ``klass.f(*args)``
|
||||
becomes ``f(*args)``.
|
||||
|
||||
This chart summarizes the binding and its two most useful variants:
|
||||
|
||||
      +-----------------+----------------------+------------------+
      | Transformation  | Called from an       | Called from a    |
      |                 | Object               | Class            |
      +=================+======================+==================+
      | function        | f(obj, \*args)       | f(\*args)        |
      +-----------------+----------------------+------------------+
      | staticmethod    | f(\*args)            | f(\*args)        |
      +-----------------+----------------------+------------------+
      | classmethod     | f(type(obj), \*args) | f(klass, \*args) |
      +-----------------+----------------------+------------------+
|
||||
|
||||
Static methods return the underlying function without changes. Calling either
|
||||
``c.f`` or ``C.f`` is the equivalent of a direct lookup into
|
||||
``object.__getattribute__(c, "f")`` or ``object.__getattribute__(C, "f")``. As a
|
||||
result, the function becomes identically accessible from either an object or a
|
||||
class.
|
||||
|
||||
Good candidates for static methods are methods that do not reference the
|
||||
``self`` variable.
|
||||
|
||||
For instance, a statistics package may include a container class for
|
||||
experimental data. The class provides normal methods for computing the average,
|
||||
mean, median, and other descriptive statistics that depend on the data. However,
|
||||
there may be useful functions which are conceptually related but do not depend
|
||||
on the data.  For instance, ``erf(x)`` is a handy conversion routine that comes up
|
||||
in statistical work but does not directly depend on a particular dataset.
|
||||
It can be called either from an object or the class: ``s.erf(1.5) --> .9332`` or
|
||||
``Sample.erf(1.5) --> .9332``.
|
||||
|
||||
Since staticmethods return the underlying function with no changes, the example
|
||||
calls are unexciting::
|
||||
|
||||
    >>> class E(object):
    ...     def f(x):
    ...         print(x)
    ...     f = staticmethod(f)
    ...
    >>> E.f(3)
    3
    >>> E().f(3)
    3
|
||||
|
||||
Using the non-data descriptor protocol, a pure Python version of
|
||||
:func:`staticmethod` would look like this::
|
||||
|
||||
    class StaticMethod(object):
        "Emulate PyStaticMethod_Type() in Objects/funcobject.c"

        def __init__(self, f):
            self.f = f

        def __get__(self, obj, objtype=None):
            return self.f
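
The emulation can be tried out directly. In this sketch the ``StaticMethod``
class is repeated so the example is self-contained; ``Sample`` and
``double`` are hypothetical names chosen for the illustration:

```python
class StaticMethod:
    "Pure Python emulation of staticmethod, as described in the text"

    def __init__(self, f):
        self.f = f

    def __get__(self, obj, objtype=None):
        # Return the wrapped function unchanged, with no binding.
        return self.f


class Sample:
    def double(x):
        return 2 * x
    double = StaticMethod(double)


from_class = Sample.double(21)
from_instance = Sample().double(21)
print(from_class, from_instance)
```

Access through the class and through an instance yields the very same
function object, so no instance is ever passed as a first argument.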
|
||||
|
||||
Unlike static methods, class methods prepend the class reference to the
|
||||
argument list before calling the function. This format is the same
|
||||
whether the caller is an object or a class::
|
||||
|
||||
    >>> class E(object):
    ...     def f(klass, x):
    ...         return klass.__name__, x
    ...     f = classmethod(f)
    ...
    >>> print(E.f(3))
    ('E', 3)
    >>> print(E().f(3))
    ('E', 3)
|
||||
|
||||
|
||||
This behavior is useful whenever the function only needs to have a class
|
||||
reference and does not care about any underlying data. One use for classmethods
|
||||
is to create alternate class constructors. In Python 2.3, the classmethod
|
||||
:func:`dict.fromkeys` creates a new dictionary from a list of keys. The pure
|
||||
Python equivalent is::
|
||||
|
||||
    class Dict(object):
        . . .
        def fromkeys(klass, iterable, value=None):
            "Emulate dict_fromkeys() in Objects/dictobject.c"
            d = klass()
            for key in iterable:
                d[key] = value
            return d
        fromkeys = classmethod(fromkeys)
|
||||
|
||||
Now a new dictionary of unique keys can be constructed like this::
|
||||
|
||||
    >>> Dict.fromkeys('abracadabra')
    {'a': None, 'b': None, 'r': None, 'c': None, 'd': None}
|
||||
|
||||
Using the non-data descriptor protocol, a pure Python version of
|
||||
:func:`classmethod` would look like this::
|
||||
|
||||
    class ClassMethod(object):
        "Emulate PyClassMethod_Type() in Objects/funcobject.c"

        def __init__(self, f):
            self.f = f

        def __get__(self, obj, klass=None):
            if klass is None:
                klass = type(obj)
            def newfunc(*args):
                return self.f(klass, *args)
            return newfunc
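
As a rough usage sketch, the emulation can serve as an alternate
constructor. ``ClassMethod`` is repeated here so the example runs on its
own; ``Point``, ``SubPoint`` and ``from_pair`` are hypothetical names:

```python
class ClassMethod:
    "Pure Python emulation of classmethod, as described in the text"

    def __init__(self, f):
        self.f = f

    def __get__(self, obj, klass=None):
        if klass is None:
            klass = type(obj)
        def newfunc(*args):
            # Prepend the class to the argument list.
            return self.f(klass, *args)
        return newfunc


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def from_pair(klass, pair):
        # Alternate constructor: klass is the class the call came through.
        return klass(*pair)
    from_pair = ClassMethod(from_pair)


class SubPoint(Point):
    pass


p = Point.from_pair((1, 2))                 # called from a class
q = SubPoint(0, 0).from_pair((3, 4))        # called from an instance
print(type(p).__name__, type(q).__name__)
```

Because the class reference is supplied at lookup time, calling through a
subclass or one of its instances constructs an instance of that subclass.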
|
||||
|
1261
third_party/python/Doc/howto/functional.rst
vendored
Normal file
File diff suppressed because it is too large
32
third_party/python/Doc/howto/index.rst
vendored
Normal file
|
@ -0,0 +1,32 @@
|
|||
***************
|
||||
Python HOWTOs
|
||||
***************
|
||||
|
||||
Python HOWTOs are documents that cover a single, specific topic,
|
||||
and attempt to cover it fairly completely. Modelled on the Linux
|
||||
Documentation Project's HOWTO collection, this collection is an
|
||||
effort to foster documentation that's more detailed than the
|
||||
Python Library Reference.
|
||||
|
||||
Currently, the HOWTOs are:
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
pyporting.rst
|
||||
cporting.rst
|
||||
curses.rst
|
||||
descriptor.rst
|
||||
functional.rst
|
||||
logging.rst
|
||||
logging-cookbook.rst
|
||||
regex.rst
|
||||
sockets.rst
|
||||
sorting.rst
|
||||
unicode.rst
|
||||
urllib2.rst
|
||||
argparse.rst
|
||||
ipaddress.rst
|
||||
clinic.rst
|
||||
instrumentation.rst
|
||||
|
412
third_party/python/Doc/howto/instrumentation.rst
vendored
Normal file
|
@ -0,0 +1,412 @@
|
|||
.. highlight:: shell-session
|
||||
|
||||
.. _instrumentation:
|
||||
|
||||
===============================================
|
||||
Instrumenting CPython with DTrace and SystemTap
|
||||
===============================================
|
||||
|
||||
:author: David Malcolm
|
||||
:author: Łukasz Langa
|
||||
|
||||
DTrace and SystemTap are monitoring tools, each providing a way to inspect
|
||||
what the processes on a computer system are doing. They both use
|
||||
domain-specific languages allowing a user to write scripts which:
|
||||
|
||||
- filter which processes are to be observed
|
||||
- gather data from the processes of interest
|
||||
- generate reports on the data
|
||||
|
||||
As of Python 3.6, CPython can be built with embedded "markers", also
|
||||
known as "probes", that can be observed by a DTrace or SystemTap script,
|
||||
making it easier to monitor what the CPython processes on a system are
|
||||
doing.
|
||||
|
||||
.. impl-detail::
|
||||
|
||||
DTrace markers are implementation details of the CPython interpreter.
|
||||
No guarantees are made about probe compatibility between versions of
|
||||
CPython. DTrace scripts can stop working or work incorrectly without
|
||||
warning when changing CPython versions.
|
||||
|
||||
|
||||
Enabling the static markers
|
||||
---------------------------
|
||||
|
||||
macOS comes with built-in support for DTrace. On Linux, in order to
|
||||
build CPython with the embedded markers for SystemTap, the SystemTap
|
||||
development tools must be installed.
|
||||
|
||||
On a Linux machine, this can be done via::
|
||||
|
||||
$ yum install systemtap-sdt-devel
|
||||
|
||||
or::
|
||||
|
||||
$ sudo apt-get install systemtap-sdt-dev
|
||||
|
||||
|
||||
CPython must then be configured with the ``--with-dtrace`` option; the
``configure`` output should then include:

.. code-block:: none

   checking for --with-dtrace... yes
|
||||
|
||||
On macOS, you can list available DTrace probes by running a Python
|
||||
process in the background and listing all probes made available by the
|
||||
Python provider::
|
||||
|
||||
$ python3.6 -q &
|
||||
$ sudo dtrace -l -P python$! # or: dtrace -l -m python3.6
|
||||
|
||||
ID PROVIDER MODULE FUNCTION NAME
|
||||
29564 python18035 python3.6 _PyEval_EvalFrameDefault function-entry
|
||||
29565 python18035 python3.6 dtrace_function_entry function-entry
|
||||
29566 python18035 python3.6 _PyEval_EvalFrameDefault function-return
|
||||
29567 python18035 python3.6 dtrace_function_return function-return
|
||||
29568 python18035 python3.6 collect gc-done
|
||||
29569 python18035 python3.6 collect gc-start
|
||||
29570 python18035 python3.6 _PyEval_EvalFrameDefault line
|
||||
29571 python18035 python3.6 maybe_dtrace_line line
|
||||
|
||||
On Linux, you can verify if the SystemTap static markers are present in
|
||||
the built binary by seeing if it contains a ".note.stapsdt" section.
|
||||
|
||||
::
|
||||
|
||||
$ readelf -S ./python | grep .note.stapsdt
|
||||
[30] .note.stapsdt NOTE 0000000000000000 00308d78
|
||||
|
||||
If you've built Python as a shared library (with ``--enable-shared``), you
|
||||
need to look instead within the shared library. For example::
|
||||
|
||||
$ readelf -S libpython3.3dm.so.1.0 | grep .note.stapsdt
|
||||
[29] .note.stapsdt NOTE 0000000000000000 00365b68
|
||||
|
||||
Sufficiently modern readelf can print the metadata::
|
||||
|
||||
$ readelf -n ./python
|
||||
|
||||
Displaying notes found at file offset 0x00000254 with length 0x00000020:
|
||||
Owner Data size Description
|
||||
GNU 0x00000010 NT_GNU_ABI_TAG (ABI version tag)
|
||||
OS: Linux, ABI: 2.6.32
|
||||
|
||||
Displaying notes found at file offset 0x00000274 with length 0x00000024:
|
||||
Owner Data size Description
|
||||
GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
|
||||
Build ID: df924a2b08a7e89f6e11251d4602022977af2670
|
||||
|
||||
Displaying notes found at file offset 0x002d6c30 with length 0x00000144:
|
||||
Owner Data size Description
|
||||
stapsdt 0x00000031 NT_STAPSDT (SystemTap probe descriptors)
|
||||
Provider: python
|
||||
Name: gc__start
|
||||
Location: 0x00000000004371c3, Base: 0x0000000000630ce2, Semaphore: 0x00000000008d6bf6
|
||||
Arguments: -4@%ebx
|
||||
stapsdt 0x00000030 NT_STAPSDT (SystemTap probe descriptors)
|
||||
Provider: python
|
||||
Name: gc__done
|
||||
Location: 0x00000000004374e1, Base: 0x0000000000630ce2, Semaphore: 0x00000000008d6bf8
|
||||
Arguments: -8@%rax
|
||||
stapsdt 0x00000045 NT_STAPSDT (SystemTap probe descriptors)
|
||||
Provider: python
|
||||
Name: function__entry
|
||||
Location: 0x000000000053db6c, Base: 0x0000000000630ce2, Semaphore: 0x00000000008d6be8
|
||||
Arguments: 8@%rbp 8@%r12 -4@%eax
|
||||
stapsdt 0x00000046 NT_STAPSDT (SystemTap probe descriptors)
|
||||
Provider: python
|
||||
Name: function__return
|
||||
Location: 0x000000000053dba8, Base: 0x0000000000630ce2, Semaphore: 0x00000000008d6bea
|
||||
Arguments: 8@%rbp 8@%r12 -4@%eax
|
||||
|
||||
The above metadata contains information for SystemTap describing how it
|
||||
can patch strategically-placed machine code instructions to enable the
|
||||
tracing hooks used by a SystemTap script.
|
||||
|
||||
|
||||
Static DTrace probes
|
||||
--------------------
|
||||
|
||||
The following example DTrace script can be used to show the call/return
|
||||
hierarchy of a Python script, only tracing within the invocation of
|
||||
a function called "start". In other words, import-time function
|
||||
invocations are not going to be listed:
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
    self int indent;

    python$target:::function-entry
    /copyinstr(arg1) == "start"/
    {
            self->trace = 1;
    }

    python$target:::function-entry
    /self->trace/
    {
            printf("%d\t%*s:", timestamp, 15, probename);
            printf("%*s", self->indent, "");
            printf("%s:%s:%d\n", basename(copyinstr(arg0)), copyinstr(arg1), arg2);
            self->indent++;
    }

    python$target:::function-return
    /self->trace/
    {
            self->indent--;
            printf("%d\t%*s:", timestamp, 15, probename);
            printf("%*s", self->indent, "");
            printf("%s:%s:%d\n", basename(copyinstr(arg0)), copyinstr(arg1), arg2);
    }

    python$target:::function-return
    /copyinstr(arg1) == "start"/
    {
            self->trace = 0;
    }
|
||||
|
||||
It can be invoked like this::
|
||||
|
||||
$ sudo dtrace -q -s call_stack.d -c "python3.6 script.py"
|
||||
|
||||
The output looks like this:
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
156641360502280 function-entry:call_stack.py:start:23
|
||||
156641360518804 function-entry: call_stack.py:function_1:1
|
||||
156641360532797 function-entry: call_stack.py:function_3:9
|
||||
156641360546807 function-return: call_stack.py:function_3:10
|
||||
156641360563367 function-return: call_stack.py:function_1:2
|
||||
156641360578365 function-entry: call_stack.py:function_2:5
|
||||
156641360591757 function-entry: call_stack.py:function_1:1
|
||||
156641360605556 function-entry: call_stack.py:function_3:9
|
||||
156641360617482 function-return: call_stack.py:function_3:10
|
||||
156641360629814 function-return: call_stack.py:function_1:2
|
||||
156641360642285 function-return: call_stack.py:function_2:6
|
||||
156641360656770 function-entry: call_stack.py:function_3:9
|
||||
156641360669707 function-return: call_stack.py:function_3:10
|
||||
156641360687853 function-entry: call_stack.py:function_4:13
|
||||
156641360700719 function-return: call_stack.py:function_4:14
|
||||
156641360719640 function-entry: call_stack.py:function_5:18
|
||||
156641360732567 function-return: call_stack.py:function_5:21
|
||||
156641360747370 function-return:call_stack.py:start:28
|
||||
|
||||
|
||||
Static SystemTap markers
|
||||
------------------------
|
||||
|
||||
The low-level way to use the SystemTap integration is to use the static
|
||||
markers directly. This requires you to explicitly state the binary file
|
||||
containing them.
|
||||
|
||||
For example, this SystemTap script can be used to show the call/return
|
||||
hierarchy of a Python script:
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
   probe process("python").mark("function__entry") {
        filename = user_string($arg1);
        funcname = user_string($arg2);
        lineno = $arg3;

        printf("%s => %s in %s:%d\n",
               thread_indent(1), funcname, filename, lineno);
   }

   probe process("python").mark("function__return") {
       filename = user_string($arg1);
       funcname = user_string($arg2);
       lineno = $arg3;

       printf("%s <= %s in %s:%d\n",
              thread_indent(-1), funcname, filename, lineno);
   }
|
||||
|
||||
It can be invoked like this::
|
||||
|
||||
$ stap \
|
||||
show-call-hierarchy.stp \
|
||||
-c "./python test.py"
|
||||
|
||||
The output looks like this:
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
11408 python(8274): => __contains__ in Lib/_abcoll.py:362
|
||||
11414 python(8274): => __getitem__ in Lib/os.py:425
|
||||
11418 python(8274): => encode in Lib/os.py:490
|
||||
11424 python(8274): <= encode in Lib/os.py:493
|
||||
11428 python(8274): <= __getitem__ in Lib/os.py:426
|
||||
11433 python(8274): <= __contains__ in Lib/_abcoll.py:366
|
||||
|
||||
where the columns are:
|
||||
|
||||
- time in microseconds since start of script
|
||||
|
||||
- name of executable
|
||||
|
||||
- PID of process
|
||||
|
||||
and the remainder indicates the call/return hierarchy as the script executes.
|
||||
|
||||
For a ``--enable-shared`` build of CPython, the markers are contained within the
|
||||
libpython shared library, and the probe's dotted path needs to reflect this. For
|
||||
example, this line from the above example:
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
probe process("python").mark("function__entry") {
|
||||
|
||||
should instead read:
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
probe process("python").library("libpython3.6dm.so.1.0").mark("function__entry") {
|
||||
|
||||
(assuming a debug build of CPython 3.6)
|
||||
|
||||
|
||||
Available static markers
|
||||
------------------------
|
||||
|
||||
.. I'm reusing the "c:function" type for markers
|
||||
|
||||
.. c:function:: function__entry(str filename, str funcname, int lineno)
|
||||
|
||||
This marker indicates that execution of a Python function has begun.
|
||||
It is only triggered for pure-Python (bytecode) functions.
|
||||
|
||||
The filename, function name, and line number are provided back to the
|
||||
tracing script as positional arguments, which must be accessed using
|
||||
``$arg1``, ``$arg2``, ``$arg3``:
|
||||
|
||||
* ``$arg1`` : ``(const char *)`` filename, accessible using ``user_string($arg1)``
|
||||
|
||||
* ``$arg2`` : ``(const char *)`` function name, accessible using
|
||||
``user_string($arg2)``
|
||||
|
||||
* ``$arg3`` : ``int`` line number
|
||||
|
||||
.. c:function:: function__return(str filename, str funcname, int lineno)
|
||||
|
||||
This marker is the converse of :c:func:`function__entry`, and indicates that
|
||||
execution of a Python function has ended (either via ``return``, or via an
|
||||
exception). It is only triggered for pure-Python (bytecode) functions.
|
||||
|
||||
   The arguments are the same as for :c:func:`function__entry`.
|
||||
|
||||
.. c:function:: line(str filename, str funcname, int lineno)
|
||||
|
||||
This marker indicates a Python line is about to be executed. It is
|
||||
the equivalent of line-by-line tracing with a Python profiler. It is
|
||||
not triggered within C functions.
|
||||
|
||||
The arguments are the same as for :c:func:`function__entry`.
|
||||
|
||||
.. c:function:: gc__start(int generation)
|
||||
|
||||
Fires when the Python interpreter starts a garbage collection cycle.
|
||||
``arg0`` is the generation to scan, like :func:`gc.collect()`.
|
||||
|
||||
.. c:function:: gc__done(long collected)
|
||||
|
||||
Fires when the Python interpreter finishes a garbage collection
|
||||
cycle. ``arg0`` is the number of collected objects.
|
||||
|
||||
|
||||
SystemTap Tapsets
|
||||
-----------------
|
||||
|
||||
The higher-level way to use the SystemTap integration is to use a "tapset":
|
||||
SystemTap's equivalent of a library, which hides some of the lower-level
|
||||
details of the static markers.
|
||||
|
||||
Here is a tapset file, based on a non-shared build of CPython:
|
||||
|
||||
.. code-block:: none

   /*
      Provide a higher-level wrapping around the function__entry and
      function__return markers:
    */
   probe python.function.entry = process("python").mark("function__entry")
   {
       filename = user_string($arg1);
       funcname = user_string($arg2);
       lineno = $arg3;
       frameptr = $arg4
   }
   probe python.function.return = process("python").mark("function__return")
   {
       filename = user_string($arg1);
       funcname = user_string($arg2);
       lineno = $arg3;
       frameptr = $arg4
   }
|
||||
|
||||
If this file is installed in SystemTap's tapset directory (e.g.
|
||||
``/usr/share/systemtap/tapset``), then these additional probepoints become
|
||||
available:
|
||||
|
||||
.. c:function:: python.function.entry(str filename, str funcname, int lineno, frameptr)
|
||||
|
||||
   This probe point indicates that execution of a Python function has begun.
   It is only triggered for pure-Python (bytecode) functions.
|
||||
|
||||
.. c:function:: python.function.return(str filename, str funcname, int lineno, frameptr)
|
||||
|
||||
   This probe point is the converse of :c:func:`python.function.entry`, and
   indicates that execution of a Python function has ended (either via
   ``return``, or via an exception). It is only triggered for pure-Python
   (bytecode) functions.
|
||||
|
||||
|
||||
Examples
|
||||
--------
|
||||
This SystemTap script uses the tapset above to more cleanly implement the
|
||||
example given above of tracing the Python function-call hierarchy, without
|
||||
needing to directly name the static markers:
|
||||
|
||||
.. code-block:: none

   probe python.function.entry
   {
     printf("%s => %s in %s:%d\n",
            thread_indent(1), funcname, filename, lineno);
   }

   probe python.function.return
   {
     printf("%s <= %s in %s:%d\n",
            thread_indent(-1), funcname, filename, lineno);
   }
|
||||
|
||||
|
||||
The following script uses the tapset above to provide a top-like view of all
|
||||
running CPython code, showing the top 20 most frequently-entered bytecode
|
||||
frames, each second, across the whole system:
|
||||
|
||||
.. code-block:: none

   global fn_calls;

   probe python.function.entry
   {
       fn_calls[pid(), filename, funcname, lineno] += 1;
   }

   probe timer.ms(1000) {
       printf("\033[2J\033[1;1H") /* clear screen */
       printf("%6s %80s %6s %30s %6s\n",
              "PID", "FILENAME", "LINE", "FUNCTION", "CALLS")
       foreach ([pid, filename, funcname, lineno] in fn_calls- limit 20) {
           printf("%6d %80s %6d %30s %6d\n",
               pid, filename, lineno, funcname,
               fn_calls[pid, filename, funcname, lineno]);
       }
       delete fn_calls;
   }
|
||||
|
340
third_party/python/Doc/howto/ipaddress.rst
vendored
Normal file
|
@ -0,0 +1,340 @@
|
|||
.. testsetup::
|
||||
|
||||
import ipaddress
|
||||
|
||||
.. _ipaddress-howto:
|
||||
|
||||
***************************************
|
||||
An introduction to the ipaddress module
|
||||
***************************************
|
||||
|
||||
:author: Peter Moody
|
||||
:author: Nick Coghlan
|
||||
|
||||
.. topic:: Overview
|
||||
|
||||
This document aims to provide a gentle introduction to the
|
||||
:mod:`ipaddress` module. It is aimed primarily at users that aren't
|
||||
already familiar with IP networking terminology, but may also be useful
|
||||
to network engineers wanting an overview of how :mod:`ipaddress`
|
||||
represents IP network addressing concepts.
|
||||
|
||||
|
||||
Creating Address/Network/Interface objects
|
||||
==========================================
|
||||
|
||||
Since :mod:`ipaddress` is a module for inspecting and manipulating IP addresses,
|
||||
the first thing you'll want to do is create some objects. You can use
|
||||
:mod:`ipaddress` to create objects from strings and integers.
|
||||
|
||||
|
||||
A Note on IP Versions
|
||||
---------------------
|
||||
|
||||
For readers that aren't particularly familiar with IP addressing, it's
|
||||
important to know that the Internet Protocol is currently in the process
|
||||
of moving from version 4 of the protocol to version 6. This transition is
|
||||
occurring largely because version 4 of the protocol doesn't provide enough
|
||||
addresses to handle the needs of the whole world, especially given the
|
||||
increasing number of devices with direct connections to the internet.
|
||||
|
||||
Explaining the details of the differences between the two versions of the
|
||||
protocol is beyond the scope of this introduction, but readers need to at
|
||||
least be aware that these two versions exist, and it will sometimes be
|
||||
necessary to force the use of one version or the other.
|
||||
|
||||
|
||||
IP Host Addresses
|
||||
-----------------
|
||||
|
||||
Addresses, often referred to as "host addresses", are the most basic unit
|
||||
when working with IP addressing. The simplest way to create addresses is
|
||||
to use the :func:`ipaddress.ip_address` factory function, which automatically
|
||||
determines whether to create an IPv4 or IPv6 address based on the passed in
|
||||
value:
|
||||
|
||||
>>> ipaddress.ip_address('192.0.2.1')
|
||||
IPv4Address('192.0.2.1')
|
||||
>>> ipaddress.ip_address('2001:DB8::1')
|
||||
IPv6Address('2001:db8::1')
|
||||
|
||||
Addresses can also be created directly from integers. Values that will
|
||||
fit within 32 bits are assumed to be IPv4 addresses::
|
||||
|
||||
>>> ipaddress.ip_address(3221225985)
|
||||
IPv4Address('192.0.2.1')
|
||||
>>> ipaddress.ip_address(42540766411282592856903984951653826561)
|
||||
IPv6Address('2001:db8::1')
|
||||
|
||||
To force the use of IPv4 or IPv6 addresses, the relevant classes can be
|
||||
invoked directly. This is particularly useful to force creation of IPv6
|
||||
addresses for small integers::
|
||||
|
||||
>>> ipaddress.ip_address(1)
|
||||
IPv4Address('0.0.0.1')
|
||||
>>> ipaddress.IPv4Address(1)
|
||||
IPv4Address('0.0.0.1')
|
||||
>>> ipaddress.IPv6Address(1)
|
||||
IPv6Address('::1')
|
||||
|
||||
|
||||
Defining Networks
|
||||
-----------------
|
||||
|
||||
Host addresses are usually grouped together into IP networks, so
|
||||
:mod:`ipaddress` provides a way to create, inspect and manipulate network
|
||||
definitions. IP network objects are constructed from strings that define the
|
||||
range of host addresses that are part of that network. The simplest form
|
||||
for that information is a "network address/network prefix" pair, where the
|
||||
prefix defines the number of leading bits that are compared to determine
|
||||
whether or not an address is part of the network and the network address
|
||||
defines the expected value of those bits.
|
||||
|
||||
As for addresses, a factory function is provided that determines the correct
|
||||
IP version automatically::
|
||||
|
||||
>>> ipaddress.ip_network('192.0.2.0/24')
|
||||
IPv4Network('192.0.2.0/24')
|
||||
>>> ipaddress.ip_network('2001:db8::0/96')
|
||||
IPv6Network('2001:db8::/96')
|
||||
|
||||
Network objects cannot have any host bits set. The practical effect of this
|
||||
is that ``192.0.2.1/24`` does not describe a network. Such definitions are
referred to as interface objects, since the ip-on-a-network notation is
commonly used to describe the network interfaces of a computer on a given
network; they are described further in the next section.
|
||||
|
||||
By default, attempting to create a network object with host bits set will
|
||||
result in :exc:`ValueError` being raised. To request that the
|
||||
additional bits instead be coerced to zero, the flag ``strict=False`` can
|
||||
be passed to the constructor::
|
||||
|
||||
>>> ipaddress.ip_network('192.0.2.1/24')
|
||||
Traceback (most recent call last):
|
||||
...
|
||||
ValueError: 192.0.2.1/24 has host bits set
|
||||
>>> ipaddress.ip_network('192.0.2.1/24', strict=False)
|
||||
IPv4Network('192.0.2.0/24')
|
||||
|
||||
While the string form offers significantly more flexibility, networks can
|
||||
also be defined with integers, just like host addresses. In this case, the
|
||||
network is considered to contain only the single address identified by the
|
||||
integer, so the network prefix includes the entire network address::
|
||||
|
||||
>>> ipaddress.ip_network(3221225984)
|
||||
IPv4Network('192.0.2.0/32')
|
||||
>>> ipaddress.ip_network(42540766411282592856903984951653826560)
|
||||
IPv6Network('2001:db8::/128')
|
||||
|
||||
As with addresses, creation of a particular kind of network can be forced
|
||||
by calling the class constructor directly instead of using the factory
|
||||
function.
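
As a brief illustration of forcing the network type, the sketch below calls the
classes directly. The variable names are arbitrary; the small-integer case is
chosen because that is where the factory function and the IPv6 class disagree.

.. code-block:: python

   import ipaddress

   # Calling the class directly fixes the IP version up front.
   net4 = ipaddress.IPv4Network('192.0.2.0/24')
   net6 = ipaddress.IPv6Network('2001:db8::/96')

   # The factory function treats a small integer as an IPv4 address,
   # while the IPv6 class interprets the same integer as 128 bits wide.
   small_default = ipaddress.ip_network(1)
   small_forced = ipaddress.IPv6Network(1)

   print(small_default)  # 0.0.0.1/32
   print(small_forced)   # ::1/128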
|
||||
|
||||
|
||||
Host Interfaces
|
||||
---------------
|
||||
|
||||
As mentioned just above, if you need to describe an address on a particular
|
||||
network, neither the address nor the network classes are sufficient.
|
||||
Notation like ``192.0.2.1/24`` is commonly used by network engineers and the
|
||||
people who write tools for firewalls and routers as shorthand for "the host
|
||||
``192.0.2.1`` on the network ``192.0.2.0/24``". Accordingly, :mod:`ipaddress`
|
||||
provides a set of hybrid classes that associate an address with a particular
|
||||
network. The interface for creation is identical to that for defining network
|
||||
objects, except that the address portion isn't constrained to being a network
|
||||
address.
|
||||
|
||||
>>> ipaddress.ip_interface('192.0.2.1/24')
|
||||
IPv4Interface('192.0.2.1/24')
|
||||
>>> ipaddress.ip_interface('2001:db8::1/96')
|
||||
IPv6Interface('2001:db8::1/96')
|
||||
|
||||
Integer inputs are accepted (as with networks), and use of a particular IP
|
||||
version can be forced by calling the relevant constructor directly.
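
A short sketch of both points follows. As with networks, an integer input
identifies a single address, so the resulting interface is assumed here to
carry the full-length prefix; the variable names are arbitrary.

.. code-block:: python

   import ipaddress

   # An integer is taken as one address with a full-length prefix.
   iface = ipaddress.ip_interface(3221225985)
   print(iface)          # 192.0.2.1/32
   print(iface.ip)       # 192.0.2.1
   print(iface.network)  # 192.0.2.1/32

   # Calling the class directly forces IPv6 even for a small integer.
   forced = ipaddress.IPv6Interface(1)
   print(forced)  # ::1/128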
|
||||
|
||||
|
||||
Inspecting Address/Network/Interface Objects
|
||||
============================================
|
||||
|
||||
You've gone to the trouble of creating an IPv(4|6)(Address|Network|Interface)
|
||||
object, so you probably want to get information about it. :mod:`ipaddress`
|
||||
tries to make doing this easy and intuitive.
|
||||
|
||||
Extracting the IP version::
|
||||
|
||||
>>> addr4 = ipaddress.ip_address('192.0.2.1')
|
||||
>>> addr6 = ipaddress.ip_address('2001:db8::1')
|
||||
>>> addr6.version
|
||||
6
|
||||
>>> addr4.version
|
||||
4
|
||||
|
||||
Obtaining the network from an interface::
|
||||
|
||||
>>> host4 = ipaddress.ip_interface('192.0.2.1/24')
|
||||
>>> host4.network
|
||||
IPv4Network('192.0.2.0/24')
|
||||
>>> host6 = ipaddress.ip_interface('2001:db8::1/96')
|
||||
>>> host6.network
|
||||
IPv6Network('2001:db8::/96')
|
||||
|
||||
Finding out how many individual addresses are in a network::
|
||||
|
||||
>>> net4 = ipaddress.ip_network('192.0.2.0/24')
|
||||
>>> net4.num_addresses
|
||||
256
|
||||
>>> net6 = ipaddress.ip_network('2001:db8::0/96')
|
||||
>>> net6.num_addresses
|
||||
4294967296
|
||||
|
||||
Iterating through the "usable" addresses on a network::
|
||||
|
||||
>>> net4 = ipaddress.ip_network('192.0.2.0/24')
|
||||
>>> for x in net4.hosts():
|
||||
... print(x) # doctest: +ELLIPSIS
|
||||
192.0.2.1
|
||||
192.0.2.2
|
||||
192.0.2.3
|
||||
192.0.2.4
|
||||
...
|
||||
192.0.2.252
|
||||
192.0.2.253
|
||||
192.0.2.254
|
||||
|
||||
|
||||
Obtaining the netmask (i.e. set bits corresponding to the network prefix) or
|
||||
the hostmask (any bits that are not part of the netmask):
|
||||
|
||||
>>> net4 = ipaddress.ip_network('192.0.2.0/24')
|
||||
>>> net4.netmask
|
||||
IPv4Address('255.255.255.0')
|
||||
>>> net4.hostmask
|
||||
IPv4Address('0.0.0.255')
|
||||
>>> net6 = ipaddress.ip_network('2001:db8::0/96')
|
||||
>>> net6.netmask
|
||||
IPv6Address('ffff:ffff:ffff:ffff:ffff:ffff::')
|
||||
>>> net6.hostmask
|
||||
IPv6Address('::ffff:ffff')
|
||||
|
||||
|
||||
Exploding or compressing the address::
|
||||
|
||||
>>> addr6.exploded
|
||||
'2001:0db8:0000:0000:0000:0000:0000:0001'
|
||||
>>> addr6.compressed
|
||||
'2001:db8::1'
|
||||
>>> net6.exploded
|
||||
'2001:0db8:0000:0000:0000:0000:0000:0000/96'
|
||||
>>> net6.compressed
|
||||
'2001:db8::/96'
|
||||
|
||||
While IPv4 doesn't support explosion or compression, the associated objects
|
||||
still provide the relevant properties so that version neutral code can
|
||||
easily ensure the most concise or most verbose form is used for IPv6
|
||||
addresses while still correctly handling IPv4 addresses.
|
||||
|
||||
|
||||
Networks as lists of Addresses
|
||||
==============================
|
||||
|
||||
It's sometimes useful to treat networks as lists. This means it is possible
|
||||
to index them like this::
|
||||
|
||||
>>> net4[1]
|
||||
IPv4Address('192.0.2.1')
|
||||
>>> net4[-1]
|
||||
IPv4Address('192.0.2.255')
|
||||
>>> net6[1]
|
||||
IPv6Address('2001:db8::1')
|
||||
>>> net6[-1]
|
||||
IPv6Address('2001:db8::ffff:ffff')
|
||||
|
||||
|
||||
It also means that network objects lend themselves to using the list
|
||||
membership test syntax like this::
|
||||
|
||||
if address in network:
|
||||
# do something
|
||||
|
||||
Containment testing is done efficiently based on the network prefix::
|
||||
|
||||
>>> addr4 = ipaddress.ip_address('192.0.2.1')
|
||||
>>> addr4 in ipaddress.ip_network('192.0.2.0/24')
|
||||
True
|
||||
>>> addr4 in ipaddress.ip_network('192.0.3.0/24')
|
||||
False
|
||||
|
||||
|
||||
Comparisons
|
||||
===========
|
||||
|
||||
:mod:`ipaddress` provides some simple, hopefully intuitive ways to compare
|
||||
objects, where it makes sense::
|
||||
|
||||
>>> ipaddress.ip_address('192.0.2.1') < ipaddress.ip_address('192.0.2.2')
|
||||
True
|
||||
|
||||
A :exc:`TypeError` exception is raised if you try to compare objects of
|
||||
different versions or different types.
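
The sketch below shows the ordering comparisons that are refused. The
``try_less_than`` helper is hypothetical and exists only to keep the example
compact; note that it exercises ``<``, which is where the :exc:`TypeError`
arises.

.. code-block:: python

   import ipaddress

   addr4 = ipaddress.ip_address('192.0.2.1')
   addr6 = ipaddress.ip_address('2001:db8::1')
   net4 = ipaddress.ip_network('192.0.2.0/24')


   def try_less_than(left, right):
       """Return ``left < right``, or the string 'TypeError' if ordering is refused."""
       try:
           return left < right
       except TypeError:
           return 'TypeError'


   print(try_less_than(addr4, addr6))  # different IP versions
   print(try_less_than(addr4, net4))   # an address versus a network
   print(try_less_than(ipaddress.ip_address('192.0.2.0'), addr4))  # same kind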
|
||||
|
||||
|
||||
Using IP Addresses with other modules
|
||||
=====================================
|
||||
|
||||
Other modules that use IP addresses (such as :mod:`socket`) usually won't
|
||||
accept objects from this module directly. Instead, they must be coerced to
|
||||
an integer or string that the other module will accept::
|
||||
|
||||
>>> addr4 = ipaddress.ip_address('192.0.2.1')
|
||||
>>> str(addr4)
|
||||
'192.0.2.1'
|
||||
>>> int(addr4)
|
||||
3221225985
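
As one possible illustration with :mod:`socket`, the address can be passed as
a string to :func:`socket.inet_aton`. This is a sketch only; which form
another module expects depends on that module.

.. code-block:: python

   import ipaddress
   import socket

   addr4 = ipaddress.ip_address('192.0.2.1')

   # socket functions take the dotted-quad string, not the address object.
   packed = socket.inet_aton(str(addr4))

   # The packed bytes match what the address object itself reports.
   print(packed == addr4.packed)

   # Going the other way: build an address object from the socket-level string.
   round_trip = ipaddress.ip_address(socket.inet_ntoa(packed))
   print(round_trip == addr4)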
|
||||
|
||||
|
||||
Getting more detail when instance creation fails
|
||||
================================================
|
||||
|
||||
When creating address/network/interface objects using the version-agnostic
|
||||
factory functions, any errors will be reported as :exc:`ValueError` with
|
||||
a generic error message that simply says the passed in value was not
|
||||
recognized as an object of that type. The lack of a specific error is
|
||||
because it's necessary to know whether the value is *supposed* to be IPv4
|
||||
or IPv6 in order to provide more detail on why it has been rejected.
|
||||
|
||||
To support use cases where it is useful to have access to this additional
|
||||
detail, the individual class constructors actually raise the
|
||||
:exc:`ValueError` subclasses :exc:`ipaddress.AddressValueError` and
|
||||
:exc:`ipaddress.NetmaskValueError` to indicate exactly which part of
|
||||
the definition failed to parse correctly.
|
||||
|
||||
The error messages are significantly more detailed when using the
|
||||
class constructors directly. For example::
|
||||
|
||||
>>> ipaddress.ip_address("192.168.0.256")
|
||||
Traceback (most recent call last):
|
||||
...
|
||||
ValueError: '192.168.0.256' does not appear to be an IPv4 or IPv6 address
|
||||
>>> ipaddress.IPv4Address("192.168.0.256")
|
||||
Traceback (most recent call last):
|
||||
...
|
||||
ipaddress.AddressValueError: Octet 256 (> 255) not permitted in '192.168.0.256'
|
||||
|
||||
>>> ipaddress.ip_network("192.168.0.1/64")
|
||||
Traceback (most recent call last):
|
||||
...
|
||||
ValueError: '192.168.0.1/64' does not appear to be an IPv4 or IPv6 network
|
||||
>>> ipaddress.IPv4Network("192.168.0.1/64")
|
||||
Traceback (most recent call last):
|
||||
...
|
||||
ipaddress.NetmaskValueError: '64' is not a valid netmask
|
||||
|
||||
However, both of the module specific exceptions have :exc:`ValueError` as their
|
||||
parent class, so if you're not concerned with the particular type of error,
|
||||
you can still write code like the following::
|
||||
|
||||
try:
|
||||
network = ipaddress.IPv4Network(address)
|
||||
except ValueError:
|
||||
print('address/netmask is invalid for IPv4:', address)
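
Conversely, code that does care about the particular failure can catch the
subclasses first. The following is a sketch; the ``describe_ipv4_network``
helper and its message prefixes are hypothetical.

.. code-block:: python

   import ipaddress


   def describe_ipv4_network(text):
       """Return the normalized network, or a short note on why parsing failed."""
       try:
           return str(ipaddress.IPv4Network(text))
       except ipaddress.AddressValueError as exc:
           return 'bad address: {}'.format(exc)
       except ipaddress.NetmaskValueError as exc:
           return 'bad netmask: {}'.format(exc)
       except ValueError as exc:
           # Any other problem, such as host bits being set.
           return 'invalid network: {}'.format(exc)


   for candidate in ('192.0.2.0/24', '192.168.0.256/24',
                     '192.168.0.1/64', '192.0.2.1/24'):
       print(describe_ipv4_network(candidate))

The more specific ``except`` clauses must come before the ``ValueError`` one,
since both module-specific exceptions are subclasses of it.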
|
||||
|
2551
third_party/python/Doc/howto/logging-cookbook.rst
vendored
Normal file
File diff suppressed because it is too large
1103
third_party/python/Doc/howto/logging.rst
vendored
Normal file
File diff suppressed because it is too large
BIN
third_party/python/Doc/howto/logging_flow.png
vendored
Executable file
Binary file not shown.
After Width: | Height: | Size: 48 KiB |
452
third_party/python/Doc/howto/pyporting.rst
vendored
Normal file
|
@ -0,0 +1,452 @@
|
|||
.. _pyporting-howto:
|
||||
|
||||
*********************************
|
||||
Porting Python 2 Code to Python 3
|
||||
*********************************
|
||||
|
||||
:author: Brett Cannon
|
||||
|
||||
.. topic:: Abstract
|
||||
|
||||
With Python 3 being the future of Python while Python 2 is still in active
|
||||
use, it is good to have your project available for both major releases of
|
||||
Python. This guide is meant to help you figure out how best to support both
|
||||
Python 2 & 3 simultaneously.
|
||||
|
||||
If you are looking to port an extension module instead of pure Python code,
|
||||
please see :ref:`cporting-howto`.
|
||||
|
||||
If you would like to read one core Python developer's take on why Python 3
|
||||
came into existence, you can read Nick Coghlan's `Python 3 Q & A`_ or
|
||||
Brett Cannon's `Why Python 3 exists`_.
|
||||
|
||||
For help with porting, you can email the python-porting_ mailing list with
|
||||
questions.
|
||||
|
||||
The Short Explanation
|
||||
=====================
|
||||
|
||||
To make your project be single-source Python 2/3 compatible, the basic steps
|
||||
are:
|
||||
|
||||
#. Only worry about supporting Python 2.7
|
||||
#. Make sure you have good test coverage (coverage.py_ can help;
|
||||
``pip install coverage``)
|
||||
#. Learn the differences between Python 2 & 3
|
||||
#. Use Futurize_ (or Modernize_) to update your code (e.g. ``pip install future``)
|
||||
#. Use Pylint_ to help make sure you don't regress on your Python 3 support
|
||||
(``pip install pylint``)
|
||||
#. Use caniusepython3_ to find out which of your dependencies are blocking your
|
||||
use of Python 3 (``pip install caniusepython3``)
|
||||
#. Once your dependencies are no longer blocking you, use continuous integration
|
||||
to make sure you stay compatible with Python 2 & 3 (tox_ can help test
|
||||
against multiple versions of Python; ``pip install tox``)
|
||||
#. Consider using optional static type checking to make sure your type usage
|
||||
works in both Python 2 & 3 (e.g. use mypy_ to check your typing under both
|
||||
Python 2 & Python 3).
|
||||
|
||||
|
||||
Details
|
||||
=======
|
||||
|
||||
A key point about supporting Python 2 & 3 simultaneously is that you can start
|
||||
**today**! Even if your dependencies are not supporting Python 3 yet that does
|
||||
not mean you can't modernize your code **now** to support Python 3. Most changes
|
||||
required to support Python 3 lead to cleaner code using newer practices even in
|
||||
Python 2 code.
|
||||
|
||||
Another key point is that modernizing your Python 2 code to also support
|
||||
Python 3 is largely automated for you. While you might have to make some API
|
||||
decisions thanks to Python 3 clarifying text data versus binary data, the
|
||||
lower-level work is now mostly done for you and thus you can at least benefit from
|
||||
the automated changes immediately.
|
||||
|
||||
Keep those key points in mind while you read on about the details of porting
|
||||
your code to support Python 2 & 3 simultaneously.
|
||||
|
||||
|
||||
Drop support for Python 2.6 and older
|
||||
-------------------------------------
|
||||
|
||||
While you can make Python 2.5 work with Python 3, it is **much** easier if you
|
||||
only have to work with Python 2.7. If dropping Python 2.5 is not an
|
||||
option then the six_ project can help you support Python 2.5 & 3 simultaneously
|
||||
(``pip install six``). Do realize, though, that nearly all the projects listed
|
||||
in this HOWTO will not be available to you.
|
||||
|
||||
If you are able to skip Python 2.5 and older, then the required changes
|
||||
to your code should continue to look and feel like idiomatic Python code. At
|
||||
worst you will have to use a function instead of a method in some instances or
|
||||
have to import a function instead of using a built-in one, but otherwise the
|
||||
overall transformation should not feel foreign to you.
|
||||
|
||||
But you should aim for only supporting Python 2.7. Python 2.6 is no longer
|
||||
freely supported and thus is not receiving bugfixes. This means **you** will have
|
||||
to work around any issues you come across with Python 2.6. There are also some
|
||||
tools mentioned in this HOWTO which do not support Python 2.6 (e.g., Pylint_),
|
||||
and this will become more commonplace as time goes on. It will simply be easier
|
||||
for you if you only support the versions of Python that you have to support.
|
||||
|
||||
|
||||
Make sure you specify the proper version support in your ``setup.py`` file
|
||||
--------------------------------------------------------------------------
|
||||
|
||||
In your ``setup.py`` file you should have the proper `trove classifier`_
|
||||
specifying what versions of Python you support. As your project does not support
|
||||
Python 3 yet you should at least have
|
||||
``Programming Language :: Python :: 2 :: Only`` specified. Ideally you should
|
||||
also specify each major/minor version of Python that you do support, e.g.
|
||||
``Programming Language :: Python :: 2.7``.
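
As an illustration, the classifiers for a project that currently runs only on
Python 2.7 might be declared as follows. This is a sketch under assumptions:
the ``CLASSIFIERS`` name is hypothetical and the rest of the ``setup()`` call
is omitted.

.. code-block:: python

   # Hypothetical excerpt from a setup.py; only the classifiers are shown.
   CLASSIFIERS = [
       # The project does not run on Python 3 yet.
       'Programming Language :: Python :: 2 :: Only',
       # Each major/minor version that is actually supported.
       'Programming Language :: Python :: 2.7',
   ]

   # The list would then be passed along as setup(..., classifiers=CLASSIFIERS).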
|
||||
|
||||
|
||||
Have good test coverage
|
||||
-----------------------
|
||||
|
||||
Once you have your code supporting the oldest version of Python 2 you want it
|
||||
to, you will want to make sure your test suite has good coverage. A good rule of
|
||||
thumb is that you want to be confident enough in your test suite that any
|
||||
failures that appear after having tools rewrite your code are actual bugs in the
|
||||
tools and not in your code. If you want a number to aim for, try to get over 80%
|
||||
coverage (and don't feel bad if you find it hard to get better than 90%
|
||||
coverage). If you don't already have a tool to measure test coverage then
|
||||
coverage.py_ is recommended.
|
||||
|
||||
|
||||
Learn the differences between Python 2 & 3
|
||||
-------------------------------------------
|
||||
|
||||
Once you have your code well-tested you are ready to begin porting your code to
|
||||
Python 3! But to fully understand how your code is going to change and what
|
||||
you want to look out for while you code, you will want to learn what changes
|
||||
Python 3 makes relative to Python 2. Typically the two best ways of doing that
are reading the `"What's New"`_ doc for each release of Python 3 and the
|
||||
`Porting to Python 3`_ book (which is free online). There is also a handy
|
||||
`cheat sheet`_ from the Python-Future project.
|
||||
|
||||
|
||||
Update your code
|
||||
----------------
|
||||
|
||||
Once you feel like you know what is different in Python 3 compared to Python 2,
|
||||
it's time to update your code! You have a choice between two tools in porting
|
||||
your code automatically: Futurize_ and Modernize_. Which tool you choose will
|
||||
depend on how much like Python 3 you want your code to be. Futurize_ does its
|
||||
best to make Python 3 idioms and practices exist in Python 2, e.g. backporting
|
||||
the ``bytes`` type from Python 3 so that you have semantic parity between the
|
||||
major versions of Python. Modernize_,
|
||||
on the other hand, is more conservative and targets a Python 2/3 subset of
|
||||
Python, directly relying on six_ to help provide compatibility. As Python 3 is
|
||||
the future, it might be best to consider Futurize to begin adjusting to any new
|
||||
practices that Python 3 introduces which you are not accustomed to yet.
|
||||
|
||||
Regardless of which tool you choose, they will update your code to run under
|
||||
Python 3 while staying compatible with the version of Python 2 you started with.
|
||||
Depending on how conservative you want to be, you may want to run the tool over
|
||||
your test suite first and visually inspect the diff to make sure the
|
||||
transformation is accurate. After you have transformed your test suite and
|
||||
verified that all the tests still pass as expected, then you can transform your
|
||||
application code knowing that any test which fails is a translation failure.
|
||||
|
||||
Unfortunately the tools can't automate everything to make your code work under
|
||||
Python 3 and so there are a handful of things you will need to update manually
|
||||
to get full Python 3 support (which of these steps are necessary varies between
|
||||
the tools). Read the documentation for the tool you choose to use to see what it
|
||||
fixes by default and what it can do optionally to know what will (not) be fixed
|
||||
for you and what you may have to fix on your own (e.g. using ``io.open()`` over
|
||||
the built-in ``open()`` function is off by default in Modernize). Luckily,
|
||||
though, there are only a couple of things to watch out for which can be
|
||||
considered large issues that may be hard to debug if not watched for.
|
||||
|
||||
|
||||
Division
|
||||
++++++++
|
||||
|
||||
In Python 3, ``5 / 2 == 2.5`` and not ``2``; all division between ``int`` values
|
||||
results in a ``float``. This change has actually been planned since Python 2.2,
which was released in 2001. Since then users have been encouraged to add
|
||||
``from __future__ import division`` to any and all files which use the ``/`` and
|
||||
``//`` operators or to be running the interpreter with the ``-Q`` flag. If you
|
||||
have not been doing this then you will need to go through your code and do two
|
||||
things:
|
||||
|
||||
#. Add ``from __future__ import division`` to your files
|
||||
#. Update any division operator as necessary to either use ``//`` to use floor
|
||||
division or continue using ``/`` and expect a float
|
||||
|
||||
The reason that ``/`` isn't simply translated to ``//`` automatically is that if
|
||||
an object defines a ``__truediv__`` method but not ``__floordiv__`` then your
|
||||
code would begin to fail (e.g. a user-defined class that uses ``/`` to
|
||||
signify some operation but not ``//`` for the same thing or at all).
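
The two points above can be sketched as follows. The ``Ratio`` class is
hypothetical and stands in for any user-defined type that gives ``/`` a
meaning; the printed results assume Python 3, or Python 2 with the
``__future__`` import in place.

.. code-block:: python

   # On Python 2, start the file with ``from __future__ import division``
   # to get the results shown here; on Python 3 they are the default.

   print(5 / 2)    # 2.5
   print(5 // 2)   # 2
   print(-5 // 2)  # -3, floor division rounds toward negative infinity


   class Ratio(object):
       """Hypothetical class that gives "/" a meaning but leaves "//" undefined."""

       def __init__(self, text):
           self.text = text

       def __truediv__(self, other):
           return Ratio(self.text + '/' + other.text)


   combined = Ratio('a') / Ratio('b')
   print(combined.text)  # a/b

   try:
       Ratio('a') // Ratio('b')
       floor_supported = True
   except TypeError:
       # A blanket rewrite of "/" to "//" would break this class.
       floor_supported = False

   print(floor_supported)  # False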
|
||||
|
||||
|
||||
Text versus binary data
|
||||
+++++++++++++++++++++++
|
||||
|
||||
In Python 2 you could use the ``str`` type for both text and binary data.
|
||||
Unfortunately this confluence of two different concepts could lead to brittle
|
||||
code which sometimes worked for either kind of data, sometimes not. It also
|
||||
could lead to confusing APIs if people didn't explicitly state that something
|
||||
that accepted ``str`` accepted either text or binary data instead of one
|
||||
specific type. This complicated the situation especially for anyone supporting
|
||||
multiple languages as APIs wouldn't bother explicitly supporting ``unicode``
|
||||
when they claimed text data support.
|
||||
|
||||
To make the distinction between text and binary data clearer and more
|
||||
pronounced, Python 3 did what most languages created in the age of the internet
|
||||
have done and made text and binary data distinct types that cannot blindly be
|
||||
mixed together (Python predates widespread access to the internet). For any code
|
||||
that deals only with text or only binary data, this separation doesn't pose an
|
||||
issue. But for code that has to deal with both, it does mean you might have to
|
||||
now care about when you are using text compared to binary data, which is why
|
||||
this cannot be entirely automated.
|
||||
|
||||
To start, you will need to decide which APIs take text and which take binary
|
||||
(it is **highly** recommended you don't design APIs that can take both due to
|
||||
the difficulty of keeping the code working; as stated earlier it is difficult to
|
||||
do well). In Python 2 this means making sure the APIs that take text can work
|
||||
with ``unicode`` and those that work with binary data work with the
|
||||
``bytes`` type from Python 3 (whose API is a subset of ``str`` in Python 2, where
``bytes`` is simply an alias for ``str``). Usually the biggest issue is
|
||||
realizing which methods exist on which types in Python 2 & 3 simultaneously
|
||||
(for text that's ``unicode`` in Python 2 and ``str`` in Python 3, for binary
|
||||
that's ``str``/``bytes`` in Python 2 and ``bytes`` in Python 3). The following
|
||||
table lists the **unique** methods of each data type across Python 2 & 3
|
||||
(e.g., the ``decode()`` method is usable on the equivalent binary data type in
|
||||
either Python 2 or 3, but it can't be used by the textual data type consistently
|
||||
between Python 2 and 3 because ``str`` in Python 3 doesn't have the method). Do
|
||||
note that as of Python 3.5 the ``__mod__`` method was added to the bytes type.
|
||||
|
||||
======================== =====================
**Text data**            **Binary data**
------------------------ ---------------------
\                        decode
------------------------ ---------------------
encode
------------------------ ---------------------
format
------------------------ ---------------------
isdecimal
------------------------ ---------------------
isnumeric
======================== =====================
|
||||
|
||||
Making the distinction easier to handle can be accomplished by encoding and
|
||||
decoding between binary data and text at the edge of your code. This means that
|
||||
when you receive text in binary data, you should immediately decode it. And if
|
||||
your code needs to send text as binary data then encode it as late as possible.
|
||||
This allows your code to work with only text internally and thus eliminates
|
||||
having to keep track of what type of data you are working with.
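
A minimal sketch of the "decode early, encode late" idea follows. The
``shout`` function and its UTF-8 default are assumptions made up for
illustration; they are not part of any library::

```python
def shout(raw_request, encoding="utf-8"):
    """Hypothetical handler: bytes in, bytes out, text in between."""
    text = raw_request.decode(encoding)   # decode as soon as the data arrives
    reply = text.strip().upper() + u"!"   # all internal work happens on text
    return reply.encode(encoding)         # encode only when sending data out


result = shout(b"  caf\xc3\xa9 ")
```

Only the first and last lines of the function touch binary data, so the rest of
the code never has to ask which type it is holding.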
|
||||
|
||||
The next issue is making sure you know whether the string literals in your code
|
||||
represent text or binary data. You should add a ``b`` prefix to any
|
||||
literal that represents binary data. For text you should add a ``u`` prefix to
the text literal. (There is a :mod:`__future__` import to force all unspecified
literals to be Unicode, but usage has shown it isn't as effective as adding a
``b`` or ``u`` prefix to all literals explicitly.)
|
||||
|
||||
As part of this dichotomy you also need to be careful about opening files.
|
||||
Unless you have been working on Windows, there is a chance you have not always
|
||||
bothered to add the ``b`` mode when opening a binary file (e.g., ``rb`` for
|
||||
binary reading). Under Python 3, binary files and text files are clearly
|
||||
distinct and mutually incompatible; see the :mod:`io` module for details.
|
||||
Therefore, you **must** make a decision of whether a file will be used for
|
||||
binary access (allowing binary data to be read and/or written) or textual access
|
||||
(allowing text data to be read and/or written). You should also use :func:`io.open`
|
||||
for opening files instead of the built-in :func:`open` function as the :mod:`io`
|
||||
module is consistent from Python 2 to 3 while the built-in :func:`open` function
|
||||
is not (in Python 3 it's actually :func:`io.open`). Do not bother with the
|
||||
outdated practice of using :func:`codecs.open` as that's only necessary for
|
||||
keeping compatibility with Python 2.5.
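
As a rough illustration of choosing the access mode explicitly, the sketch
below writes a file as text and reads it back both ways. The temporary path and
the UTF-8 encoding are assumptions chosen for the example::

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Textual access: you write text and name the encoding explicitly.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"na\u00efve\n")

# Binary access: the same file yields raw, undecoded bytes.
with io.open(path, "rb") as f:
    raw = f.read()

# Textual access again: the bytes are decoded for you.
with io.open(path, "r", encoding="utf-8") as f:
    text = f.read()
```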
|
||||
|
||||
The constructors of both ``str`` and ``bytes`` have different semantics for the
|
||||
same arguments between Python 2 & 3. Passing an integer to ``bytes`` in Python 2
|
||||
will give you the string representation of the integer: ``bytes(3) == '3'``.
|
||||
But in Python 3, an integer argument to ``bytes`` will give you a bytes object
|
||||
as long as the integer specified, filled with null bytes:
|
||||
``bytes(3) == b'\x00\x00\x00'``. A similar worry is necessary when passing a
|
||||
bytes object to ``str``. In Python 2 you just get the bytes object back:
|
||||
``str(b'3') == b'3'``. But in Python 3 you get the string representation of the
|
||||
bytes object: ``str(b'3') == "b'3'"``.
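
The snippet below shows the Python 3 side of these constructor differences,
next to explicit spellings that should give the same answer under both
versions. It is meant to be run under Python 3; the variable names are
illustrative only::

```python
zeros = bytes(3)                 # three null bytes under Python 3, not b'3'
digit = str(3).encode("ascii")   # an explicit way to get b'3'
as_repr = str(b"3")              # the repr of the bytes object under Python 3
as_text = b"3".decode("ascii")   # an explicit way to get the text '3'
```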
|
||||
|
||||
Finally, the indexing of binary data requires careful handling (slicing does
|
||||
**not** require any special handling). In Python 2,
|
||||
``b'123'[1] == b'2'`` while in Python 3 ``b'123'[1] == 50``. Because binary data
|
||||
is simply a collection of binary numbers, Python 3 returns the integer value for
|
||||
the byte you index on. But in Python 2 because ``bytes == str``, indexing
|
||||
returns a one-item slice of bytes. The six_ project has a function
|
||||
named ``six.indexbytes()`` which will return an integer like in Python 3:
|
||||
``six.indexbytes(b'123', 1)``.
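
If you would rather not depend on six_ for this, one possible alternative
(an assumption of this example, not something the text above prescribes) is to
slice and then call ``ord()``, which behaves the same way in both versions::

```python
data = b"123"

by_index = data[1]        # an integer under Python 3
by_slice = data[1:2]      # a length-1 bytes object under Python 2 and 3
as_int = ord(data[1:2])   # an integer under Python 2 and 3
```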
|
||||
|
||||
To summarize:
|
||||
|
||||
#. Decide which of your APIs take text and which take binary data
|
||||
#. Make sure that your code that works with text also works with ``unicode`` and
|
||||
code for binary data works with ``bytes`` in Python 2 (see the table above
|
||||
for what methods you cannot use for each type)
|
||||
#. Mark all binary literals with a ``b`` prefix, textual literals with a ``u``
|
||||
prefix
|
||||
#. Decode binary data to text as soon as possible, encode text as binary data as
|
||||
late as possible
|
||||
#. Open files using :func:`io.open` and make sure to specify the ``b`` mode when
|
||||
appropriate
|
||||
#. Be careful when indexing into binary data
|
||||
|
||||
|
||||
Use feature detection instead of version detection
|
||||
++++++++++++++++++++++++++++++++++++++++++++++++++
|
||||
|
||||
Inevitably you will have code that has to choose what to do based on what
|
||||
version of Python is running. The best way to do this is with feature detection
|
||||
of whether the version of Python you're running under supports what you need.
|
||||
If for some reason that doesn't work then you should make the version check be
|
||||
against Python 2 and not Python 3. To help explain this, let's look at an
|
||||
example.
|
||||
|
||||
Let's pretend that you need access to a feature of importlib_ that
|
||||
is available in Python's standard library since Python 3.3 and available for
|
||||
Python 2 through importlib2_ on PyPI. You might be tempted to write code to
|
||||
access e.g. the ``importlib.abc`` module by doing the following::
|
||||
|
||||
   import sys

   if sys.version_info[0] == 3:
       from importlib import abc
   else:
       from importlib2 import abc
|
||||
|
||||
The problem with this code is: what happens when Python 4 comes out? It would
|
||||
be better to treat Python 2 as the exceptional case instead of Python 3 and
|
||||
assume that future Python versions will be more compatible with Python 3 than
|
||||
Python 2::
|
||||
|
||||
   import sys

   if sys.version_info[0] > 2:
       from importlib import abc
   else:
       from importlib2 import abc
|
||||
|
||||
The best solution, though, is to do no version detection at all and instead rely
|
||||
on feature detection. That avoids any potential issues of getting the version
|
||||
detection wrong and helps keep you future-compatible::
|
||||
|
||||
   try:
       from importlib import abc
   except ImportError:
       from importlib2 import abc
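
The same pattern works with standard-library names that moved between
versions. The choice of ``quote`` here is just an example picked for this
sketch because it needs nothing from PyPI::

```python
try:
    # Location in Python 3.3 and newer.
    from shlex import quote
except ImportError:
    # Fallback location in Python 2.
    from pipes import quote

quoted = quote("two words")
```

Nothing in this block mentions a version number, so it keeps working for as
long as either import location exists.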
|
||||
|
||||
|
||||
Prevent compatibility regressions
|
||||
---------------------------------
|
||||
|
||||
Once you have fully translated your code to be compatible with Python 3, you
|
||||
will want to make sure your code doesn't regress and stop working under
|
||||
Python 3. This is especially true if you have a dependency which is blocking you
|
||||
from actually running under Python 3 at the moment.
|
||||
|
||||
To help with staying compatible, any new modules you create should have
at least the following block of code at the top::
|
||||
|
||||
from __future__ import absolute_import
|
||||
from __future__ import division
|
||||
from __future__ import print_function
|
||||
|
||||
You can also run Python 2 with the ``-3`` flag to be warned about various
|
||||
compatibility issues your code triggers during execution. If you turn warnings
|
||||
into errors with ``-Werror`` then you can make sure that you don't accidentally
|
||||
miss a warning.
|
||||
|
||||
You can also use the Pylint_ project and its ``--py3k`` flag to lint your code
|
||||
to receive warnings when your code begins to deviate from Python 3
|
||||
compatibility. This also prevents you from having to run Modernize_ or Futurize_
|
||||
over your code regularly to catch compatibility regressions. This does require
|
||||
you only support Python 2.7 and Python 3.4 or newer as that is Pylint's
|
||||
minimum Python version support.
|
||||
|
||||
|
||||
Check which dependencies block your transition
|
||||
----------------------------------------------
|
||||
|
||||
**After** you have made your code compatible with Python 3 you should begin to
|
||||
care about whether your dependencies have also been ported. The caniusepython3_
|
||||
project was created to help you determine which projects
|
||||
-- directly or indirectly -- are blocking you from supporting Python 3. There
|
||||
is both a command-line tool and a web interface at
|
||||
https://caniusepython3.com.
|
||||
|
||||
The project also provides code which you can integrate into your test suite so
|
||||
that you will have a failing test when you no longer have dependencies blocking
|
||||
you from using Python 3. This allows you to avoid having to manually check your
|
||||
dependencies and to be notified quickly when you can start running on Python 3.
|
||||
|
||||
|
||||
Update your ``setup.py`` file to denote Python 3 compatibility
|
||||
--------------------------------------------------------------
|
||||
|
||||
Once your code works under Python 3, you should update the classifiers in
|
||||
your ``setup.py`` to contain ``Programming Language :: Python :: 3`` and to no
longer claim Python 2-only support. This will tell anyone using your code that you
|
||||
support Python 2 **and** 3. Ideally you will also want to add classifiers for
|
||||
each major/minor version of Python you now support.
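
As a sketch, the classifier list might look something like the following. The
specific minor versions are assumptions for illustration; list the ones you
actually test against::

```python
# Hypothetical classifier list for a project supporting Python 2 and 3.
CLASSIFIERS = [
    "Programming Language :: Python :: 2",
    "Programming Language :: Python :: 2.7",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.5",
    "Programming Language :: Python :: 3.6",
]

# In setup.py this list would be passed as setup(..., classifiers=CLASSIFIERS).
```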
|
||||
|
||||
|
||||
Use continuous integration to stay compatible
|
||||
---------------------------------------------
|
||||
|
||||
Once you are able to fully run under Python 3 you will want to make sure your
|
||||
code always works under both Python 2 & 3. Probably the best tool for running
|
||||
your tests under multiple Python interpreters is tox_. You can then integrate
|
||||
tox with your continuous integration system so that you never accidentally break
|
||||
Python 2 or 3 support.
|
||||
|
||||
You may also want to use the ``-bb`` flag with the Python 3 interpreter to
|
||||
trigger an exception when you are comparing bytes to strings or bytes to an int
|
||||
(the latter is available starting in Python 3.5). By default type-differing
|
||||
comparisons simply return ``False``, but if you made a mistake in your
|
||||
separation of text/binary data handling or indexing on bytes you wouldn't easily
|
||||
find the mistake. This flag will raise an exception when these kinds of
|
||||
comparisons occur, making the mistake much easier to track down.
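
To see the kind of silent mistake this flag catches, consider the sketch below,
run under Python 3 without ``-b`` or ``-bb``. The variable names are
illustrative only::

```python
import sys

index_result = b"123"[1]         # 50 under Python 3, not b"2"
mixed = (index_result == b"2")   # int compared to bytes: quietly False

# 0 normally, 1 when run with -b, 2 when run with -bb.
bytes_warning_level = sys.flags.bytes_warning
```

Without the flag the comparison just evaluates to ``False`` and the bug goes
unnoticed; with ``-bb`` the same line raises an exception instead.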
|
||||
|
||||
And that's mostly it! At this point your code base is compatible with both
|
||||
Python 2 and 3 simultaneously. Your testing will also be set up so that you
|
||||
don't accidentally break Python 2 or 3 compatibility regardless of which version
|
||||
you typically run your tests under while developing.
|
||||
|
||||
|
||||
Consider using optional static type checking
|
||||
--------------------------------------------
|
||||
|
||||
Another way to help port your code is to use a static type checker like
|
||||
mypy_ or pytype_ on your code. These tools can be used to analyze your code as
|
||||
if it's being run under Python 2, then you can run the tool a second time as if
|
||||
your code is running under Python 3. By running a static type checker twice like
|
||||
this you can discover if you're e.g. misusing a binary data type in one version
|
||||
of Python compared to another. If you add optional type hints to your code you
|
||||
can also explicitly state whether your APIs use textual or binary data, helping
|
||||
to make sure everything functions as expected in both versions of Python.
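
A small sketch of what such hints could look like is shown below. It uses
comment-style hints and ``typing.Text`` so the same source can be checked in
both modes; the two functions are hypothetical, and on Python 2 the ``typing``
module would have to be installed separately::

```python
from typing import Text


def encode_name(name):
    # type: (Text) -> bytes
    """Hypothetical API: takes text, returns UTF-8 encoded binary data."""
    return name.encode("utf-8")


def decode_name(raw):
    # type: (bytes) -> Text
    """Hypothetical API: takes UTF-8 binary data, returns text."""
    return raw.decode("utf-8")
```

With hints like these, a checker can flag a caller that passes binary data to
``encode_name()`` before the code ever runs.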
|
||||
|
||||
|
||||
.. _2to3: https://docs.python.org/3/library/2to3.html
|
||||
.. _caniusepython3: https://pypi.org/project/caniusepython3
|
||||
.. _cheat sheet: http://python-future.org/compatible_idioms.html
|
||||
.. _coverage.py: https://pypi.org/project/coverage
|
||||
.. _Futurize: http://python-future.org/automatic_conversion.html
|
||||
.. _importlib: https://docs.python.org/3/library/importlib.html#module-importlib
|
||||
.. _importlib2: https://pypi.org/project/importlib2
|
||||
.. _Modernize: https://python-modernize.readthedocs.org/en/latest/
|
||||
.. _mypy: http://mypy-lang.org/
|
||||
.. _Porting to Python 3: http://python3porting.com/
|
||||
.. _Pylint: https://pypi.org/project/pylint
|
||||
|
||||
.. _Python 3 Q & A: https://ncoghlan-devs-python-notes.readthedocs.org/en/latest/python3/questions_and_answers.html
|
||||
|
||||
.. _pytype: https://github.com/google/pytype
|
||||
.. _python-future: http://python-future.org/
|
||||
.. _python-porting: https://mail.python.org/mailman/listinfo/python-porting
|
||||
.. _six: https://pypi.org/project/six
|
||||
.. _tox: https://pypi.org/project/tox
|
||||
.. _trove classifier: https://pypi.org/classifiers
|
||||
|
||||
.. _"What's New": https://docs.python.org/3/whatsnew/index.html
|
||||
|
||||
.. _Why Python 3 exists: http://www.snarky.ca/why-python-3-exists
|
1385
third_party/python/Doc/howto/regex.rst
vendored
Normal file
File diff suppressed because it is too large
383
third_party/python/Doc/howto/sockets.rst
vendored
Normal file
|
@@ -0,0 +1,383 @@
|
|||
.. _socket-howto:
|
||||
|
||||
****************************
|
||||
Socket Programming HOWTO
|
||||
****************************
|
||||
|
||||
:Author: Gordon McMillan
|
||||
|
||||
|
||||
.. topic:: Abstract
|
||||
|
||||
Sockets are used nearly everywhere, but are one of the most severely
|
||||
misunderstood technologies around. This is a 10,000 foot overview of sockets.
|
||||
It's not really a tutorial - you'll still have work to do in getting things
|
||||
operational. It doesn't cover the fine points (and there are a lot of them), but
|
||||
I hope it will give you enough background to begin using them decently.
|
||||
|
||||
|
||||
Sockets
|
||||
=======
|
||||
|
||||
I'm only going to talk about INET (i.e. IPv4) sockets, but they account for at least 99% of
|
||||
the sockets in use. And I'll only talk about STREAM (i.e. TCP) sockets - unless you really
|
||||
know what you're doing (in which case this HOWTO isn't for you!), you'll get
|
||||
better behavior and performance from a STREAM socket than anything else. I will
|
||||
try to clear up the mystery of what a socket is, as well as some hints on how to
|
||||
work with blocking and non-blocking sockets. But I'll start by talking about
|
||||
blocking sockets. You'll need to know how they work before dealing with
|
||||
non-blocking sockets.
|
||||
|
||||
Part of the trouble with understanding these things is that "socket" can mean a
|
||||
number of subtly different things, depending on context. So first, let's make a
|
||||
distinction between a "client" socket - an endpoint of a conversation, and a
|
||||
"server" socket, which is more like a switchboard operator. The client
|
||||
application (your browser, for example) uses "client" sockets exclusively; the
|
||||
web server it's talking to uses both "server" sockets and "client" sockets.
|
||||
|
||||
|
||||
History
|
||||
-------
|
||||
|
||||
Of the various forms of :abbr:`IPC (Inter Process Communication)`,
|
||||
sockets are by far the most popular. On any given platform, there are
|
||||
likely to be other forms of IPC that are faster, but for
|
||||
cross-platform communication, sockets are about the only game in town.
|
||||
|
||||
They were invented in Berkeley as part of the BSD flavor of Unix. They spread
|
||||
like wildfire with the Internet. With good reason --- the combination of sockets
|
||||
with INET makes talking to arbitrary machines around the world unbelievably easy
|
||||
(at least compared to other schemes).
|
||||
|
||||
|
||||
Creating a Socket
|
||||
=================
|
||||
|
||||
Roughly speaking, when you clicked on the link that brought you to this page,
|
||||
your browser did something like the following::
|
||||
|
||||
# create an INET, STREAMing socket
|
||||
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
|
||||
# now connect to the web server on port 80 - the normal http port
|
||||
s.connect(("www.python.org", 80))
|
||||
|
||||
When the ``connect`` completes, the socket ``s`` can be used to send
|
||||
in a request for the text of the page. The same socket will read the
|
||||
reply, and then be destroyed. That's right, destroyed. Client sockets
|
||||
are normally only used for one exchange (or a small set of sequential
|
||||
exchanges).
|
||||
|
||||
What happens in the web server is a bit more complex. First, the web server
|
||||
creates a "server socket"::
|
||||
|
||||
# create an INET, STREAMing socket
|
||||
serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
|
||||
# bind the socket to a public host, and a well-known port
|
||||
serversocket.bind((socket.gethostname(), 80))
|
||||
# become a server socket
|
||||
serversocket.listen(5)
|
||||
|
||||
A couple of things to notice: we used ``socket.gethostname()`` so that the socket
would be visible to the outside world. If we had used
``serversocket.bind(('localhost', 80))`` or
``serversocket.bind(('127.0.0.1', 80))`` we would still have a "server" socket,
but one that was only visible within the same machine.
``serversocket.bind(('', 80))`` specifies that the socket is reachable by any
address the machine happens to have.
|
||||
|
||||
A second thing to note: low number ports are usually reserved for "well known"
|
||||
services (HTTP, SNMP etc). If you're playing around, use a nice high number (4
|
||||
digits).
|
||||
|
||||
Finally, the argument to ``listen`` tells the socket library that we want it to
|
||||
queue up as many as 5 connect requests (the normal max) before refusing outside
|
||||
connections. If the rest of the code is written properly, that should be plenty.
|
||||
|
||||
Now that we have a "server" socket, listening on port 80, we can enter the
|
||||
mainloop of the web server::
|
||||
|
||||
   while True:
       # accept connections from outside
       (clientsocket, address) = serversocket.accept()
       # now do something with the clientsocket
       # in this case, we'll pretend this is a threaded server
       ct = client_thread(clientsocket)
       ct.run()
|
||||
|
||||
There are actually three general ways in which this loop could work: dispatching a
thread to handle ``clientsocket``, creating a new process to handle
``clientsocket``, or restructuring this app to use non-blocking sockets and
multiplexing between our "server" socket and any active ``clientsocket``\ s using
``select``. More about that later. The important thing to understand now is
|
||||
this: this is *all* a "server" socket does. It doesn't send any data. It doesn't
|
||||
receive any data. It just produces "client" sockets. Each ``clientsocket`` is
|
||||
created in response to some *other* "client" socket doing a ``connect()`` to the
|
||||
host and port we're bound to. As soon as we've created that ``clientsocket``, we
|
||||
go back to listening for more connections. The two "clients" are free to chat it
|
||||
up - they are using some dynamically allocated port which will be recycled when
|
||||
the conversation ends.
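
Here's a rough, self-contained sketch of the threaded variant. Everything in it
beyond the socket calls already shown is an assumption for the sake of a
runnable demo: the ``handle_client`` and ``serve`` names, binding to
``127.0.0.1`` on port 0 (so the OS picks a free port), and serving exactly one
client before shutting down::

```python
import socket
import threading


def handle_client(clientsocket):
    """Hypothetical handler: echo one small message back upper-cased."""
    with clientsocket:
        # A single recv is enough here only because the message is tiny.
        data = clientsocket.recv(1024)
        clientsocket.sendall(data.upper())


def serve(serversocket, max_clients):
    """Accept max_clients connections, handing each one to its own thread."""
    workers = []
    for _ in range(max_clients):
        clientsocket, address = serversocket.accept()
        worker = threading.Thread(target=handle_client, args=(clientsocket,))
        worker.start()   # start() runs the handler in a new thread
        workers.append(worker)
    for worker in workers:
        worker.join()


serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
serversocket.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
serversocket.listen(5)
port = serversocket.getsockname()[1]

server = threading.Thread(target=serve, args=(serversocket, 1))
server.start()

with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"hello")
    reply = b""
    while len(reply) < 5:
        chunk = client.recv(1024)
        if chunk == b"":
            break
        reply += chunk

server.join()
serversocket.close()
```

Notice that the "server" socket itself never sends or receives a byte; it only
hands out the ``clientsocket`` that does the talking.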
|
||||
|
||||
|
||||
IPC
|
||||
---
|
||||
|
||||
If you need fast IPC between two processes on one machine, you should look into
|
||||
pipes or shared memory. If you do decide to use AF_INET sockets, bind the
|
||||
"server" socket to ``'localhost'``. On most platforms, this will take a
|
||||
shortcut around a couple of layers of network code and be quite a bit faster.
|
||||
|
||||
.. seealso::
|
||||
The :mod:`multiprocessing` module integrates cross-platform IPC into a higher-level
|
||||
API.
|
||||
|
||||
|
||||
Using a Socket
|
||||
==============
|
||||
|
||||
The first thing to note is that the web browser's "client" socket and the web
|
||||
server's "client" socket are identical beasts. That is, this is a "peer to peer"
|
||||
conversation. Or to put it another way, *as the designer, you will have to
|
||||
decide what the rules of etiquette are for a conversation*. Normally, the
|
||||
``connect``\ ing socket starts the conversation, by sending in a request, or
|
||||
perhaps a signon. But that's a design decision - it's not a rule of sockets.
|
||||
|
||||
Now there are two sets of verbs to use for communication. You can use ``send``
|
||||
and ``recv``, or you can transform your client socket into a file-like beast and
|
||||
use ``read`` and ``write``. The latter is the way Java presents its sockets.
|
||||
I'm not going to talk about it here, except to warn you that you need to use
|
||||
``flush`` on sockets. These are buffered "files", and a common mistake is to
|
||||
``write`` something, and then ``read`` for a reply. Without a ``flush`` in
|
||||
there, you may wait forever for the reply, because the request may still be in
|
||||
your output buffer.
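
A small sketch of the ``flush`` pitfall, for the curious. It uses
``socket.socketpair()`` purely as a convenient way to get two connected sockets
in one process; that choice, and the variable names, are assumptions of this
example::

```python
import socket

left, right = socket.socketpair()   # two already-connected sockets

writer = left.makefile("wb")        # buffered file-like wrapper for writing
reader = right.makefile("rb")       # buffered file-like wrapper for reading

writer.write(b"ping\n")
writer.flush()                      # without this, the data may sit in the buffer
line = reader.readline()

writer.close()
reader.close()
left.close()
right.close()
```

Take out the ``flush()`` and the ``readline()`` can wait forever, which is
exactly the mistake described above.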
|
||||
|
||||
Now we come to the major stumbling block of sockets - ``send`` and ``recv`` operate
|
||||
on the network buffers. They do not necessarily handle all the bytes you hand
|
||||
them (or expect from them), because their major focus is handling the network
|
||||
buffers. In general, they return when the associated network buffers have been
|
||||
filled (``send``) or emptied (``recv``). They then tell you how many bytes they
|
||||
handled. It is *your* responsibility to call them again until your message has
|
||||
been completely dealt with.
|
||||
|
||||
When a ``recv`` returns 0 bytes, it means the other side has closed (or is in
|
||||
the process of closing) the connection. You will not receive any more data on
|
||||
this connection. Ever. You may be able to send data successfully; I'll talk
|
||||
more about this later.
|
||||
|
||||
A protocol like HTTP uses a socket for only one transfer. The client sends a
|
||||
request, then reads a reply. That's it. The socket is discarded. This means that
|
||||
a client can detect the end of the reply by receiving 0 bytes.
|
||||
|
||||
But if you plan to reuse your socket for further transfers, you need to realize
|
||||
that *there is no* :abbr:`EOT (End of Transfer)` *on a socket.* I repeat: if a socket
|
||||
``send`` or ``recv`` returns after handling 0 bytes, the connection has been
|
||||
broken. If the connection has *not* been broken, you may wait on a ``recv``
|
||||
forever, because the socket will *not* tell you that there's nothing more to
|
||||
read (for now). Now if you think about that a bit, you'll come to realize a
|
||||
fundamental truth of sockets: *messages must either be fixed length* (yuck), *or
|
||||
be delimited* (shrug), *or indicate how long they are* (much better), *or end by
|
||||
shutting down the connection*. The choice is entirely yours (but some ways are
|
||||
righter than others).
|
||||
|
||||
Assuming you don't want to end the connection, the simplest solution is a fixed
|
||||
length message::
|
||||
|
||||
   import socket

   MSGLEN = 4096  # fixed message length; the value is an assumption for this demo


   class MySocket:
       """demonstration class only
         - coded for clarity, not efficiency
       """

       def __init__(self, sock=None):
           if sock is None:
               self.sock = socket.socket(
                               socket.AF_INET, socket.SOCK_STREAM)
           else:
               self.sock = sock

       def connect(self, host, port):
           self.sock.connect((host, port))

       def mysend(self, msg):
           # msg is expected to be exactly MSGLEN bytes long
           totalsent = 0
           while totalsent < MSGLEN:
               sent = self.sock.send(msg[totalsent:])
               if sent == 0:
                   raise RuntimeError("socket connection broken")
               totalsent = totalsent + sent

       def myreceive(self):
           chunks = []
           bytes_recd = 0
           while bytes_recd < MSGLEN:
               chunk = self.sock.recv(min(MSGLEN - bytes_recd, 2048))
               if chunk == b'':
                   raise RuntimeError("socket connection broken")
               chunks.append(chunk)
               bytes_recd = bytes_recd + len(chunk)
           return b''.join(chunks)
|
||||
|
||||
The sending code here is usable for almost any messaging scheme - in Python you
send bytes objects, and you can use ``len()`` to determine their length (even if
they have embedded ``\0`` bytes). It's mostly the receiving code that gets more
|
||||
complex. (And in C, it's not much worse, except you can't use ``strlen`` if the
|
||||
message has embedded ``\0``\ s.)
|
||||
|
||||
The easiest enhancement is to make the first character of the message an
|
||||
indicator of message type, and have the type determine the length. Now you have
|
||||
two ``recv``\ s - the first to get (at least) that first character so you can
|
||||
look up the length, and the second in a loop to get the rest. If you decide to
|
||||
go the delimited route, you'll be receiving in some arbitrary chunk size, (4096
|
||||
or 8192 is frequently a good match for network buffer sizes), and scanning what
|
||||
you've received for a delimiter.
|
||||
|
||||
One complication to be aware of: if your conversational protocol allows multiple
|
||||
messages to be sent back to back (without some kind of reply), and you pass
|
||||
``recv`` an arbitrary chunk size, you may end up reading the start of a
|
||||
following message. You'll need to put that aside and hold onto it, until it's
|
||||
needed.
|
||||
|
||||
Prefixing the message with its length (say, as 5 numeric characters) gets more
|
||||
complex, because (believe it or not), you may not get all 5 characters in one
|
||||
``recv``. In playing around, you'll get away with it; but in high network loads,
|
||||
your code will very quickly break unless you use two ``recv`` loops - the first
|
||||
to determine the length, the second to get the data part of the message. Nasty.
|
||||
This is also when you'll discover that ``send`` does not always manage to get
|
||||
rid of everything in one pass. And despite having read this, you will eventually
|
||||
get bitten by it!
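
For what it's worth, here is one possible sketch of the two-loop approach. It
swaps the 5 numeric characters for a 4-byte binary length header packed with
:mod:`struct`, which is an assumption of this example rather than anything the
text requires; the helper names and the use of ``socket.socketpair()`` for the
demo are made up as well::

```python
import socket
import struct

HEADER = struct.Struct("!I")   # 4-byte unsigned length, network byte order


def recv_exactly(sock, count):
    """Loop on recv until exactly count bytes have arrived."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = sock.recv(min(remaining, 4096))
        if chunk == b"":
            raise RuntimeError("socket connection broken")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_message(sock, payload):
    """Send the length header followed by the payload."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_message(sock):
    """The first loop reads the header, the second loop reads the body."""
    (length,) = HEADER.unpack(recv_exactly(sock, HEADER.size))
    return recv_exactly(sock, length)


a, b = socket.socketpair()
send_message(a, b"first")
send_message(a, b"second message")
messages = [recv_message(b), recv_message(b)]
a.close()
b.close()
```

Because ``recv_exactly`` never asks for more than the current message needs,
two messages sent back to back don't bleed into each other.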
|
||||
|
||||
In the interests of space, building your character, (and preserving my
|
||||
competitive position), these enhancements are left as an exercise for the
|
||||
reader. Let's move on to cleaning up.
|
||||
|
||||
|
||||
Binary Data
|
||||
-----------
|
||||
|
||||
It is perfectly possible to send binary data over a socket. The major problem is
|
||||
that not all machines use the same formats for binary data. For example, a
|
||||
Motorola chip will represent a 16 bit integer with the value 1 as the two hex
|
||||
bytes 00 01. Intel and DEC, however, are byte-reversed - that same 1 is 01 00.
|
||||
Socket libraries have calls for converting 16 and 32 bit integers - ``ntohl,
|
||||
htonl, ntohs, htons`` where "n" means *network* and "h" means *host*, "s" means
|
||||
*short* and "l" means *long*. Where network order is host order, these do
|
||||
nothing, but where the machine is byte-reversed, these swap the bytes around
|
||||
appropriately.
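
The byte orders described above are easy to see from Python. This sketch uses
:mod:`struct` alongside the ``socket`` conversion calls; the variable names are
illustrative only::

```python
import socket
import struct

big = struct.pack(">H", 1)       # big-endian: most significant byte first
little = struct.pack("<H", 1)    # little-endian: byte-reversed
network = struct.pack("!H", 1)   # "!" always means network byte order

# Converting to network order and back returns the original value.
round_trip = socket.ntohs(socket.htons(1))
```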
|
||||
|
||||
In these days of 32-bit machines, the ASCII representation of binary data is
|
||||
frequently smaller than the binary representation. That's because a surprising
|
||||
amount of the time, all those longs have the value 0, or maybe 1. The string "0"
|
||||
would be two bytes, while binary is four. Of course, this doesn't fit well with
|
||||
fixed-length messages. Decisions, decisions.
|
||||
|
||||
|
||||
Disconnecting
|
||||
=============
|
||||
|
||||
Strictly speaking, you're supposed to use ``shutdown`` on a socket before you
|
||||
``close`` it. The ``shutdown`` is an advisory to the socket at the other end.
|
||||
Depending on the argument you pass it, it can mean "I'm not going to send
|
||||
anymore, but I'll still listen", or "I'm not listening, good riddance!". Most
|
||||
socket libraries, however, are so used to programmers neglecting to use this
|
||||
piece of etiquette that normally a ``close`` is the same as ``shutdown();
|
||||
close()``. So in most situations, an explicit ``shutdown`` is not needed.
|
||||
|
||||
One way to use ``shutdown`` effectively is in an HTTP-like exchange. The client
|
||||
sends a request and then does a ``shutdown(1)``. This tells the server "This
|
||||
client is done sending, but can still receive." The server can detect "EOF" by
|
||||
a receive of 0 bytes. It can assume it has the complete request. The server
|
||||
sends a reply. If the ``send`` completes successfully then, indeed, the client
|
||||
was still receiving.
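
A rough sketch of that exchange, with both ends living in one process for the
sake of a runnable demo. The use of ``socket.socketpair()``, the request text
and the variable names are all assumptions of this example;
``socket.SHUT_WR`` is the named constant for "done sending"::

```python
import socket

client, server = socket.socketpair()

client.sendall(b"GET /thing")
client.shutdown(socket.SHUT_WR)   # done sending, but still able to receive

# Server side: read until recv returns 0 bytes, which marks the end of the request.
request = b""
while True:
    chunk = server.recv(4096)
    if chunk == b"":
        break
    request += chunk

server.sendall(b"reply to " + request)
server.close()

# Client side: the receiving half is still open, so the reply can be read to the end.
reply = b""
while True:
    chunk = client.recv(4096)
    if chunk == b"":
        break
    reply += chunk
client.close()
```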
|
||||
|
||||
Python takes the automatic shutdown a step further, and says that when a socket
|
||||
is garbage collected, it will automatically do a ``close`` if it's needed. But
|
||||
relying on this is a very bad habit. If your socket just disappears without
|
||||
doing a ``close``, the socket at the other end may hang indefinitely, thinking
|
||||
you're just being slow. *Please* ``close`` your sockets when you're done.
|
||||
|
||||
|
||||
When Sockets Die
|
||||
----------------
|
||||
|
||||
Probably the worst thing about using blocking sockets is what happens when the
|
||||
other side comes down hard (without doing a ``close``). Your socket is likely to
|
||||
hang. TCP is a reliable protocol, and it will wait a long, long time
|
||||
before giving up on a connection. If you're using threads, the entire thread is
|
||||
essentially dead. There's not much you can do about it. As long as you aren't
|
||||
doing something dumb, like holding a lock while doing a blocking read, the
|
||||
thread isn't really consuming much in the way of resources. Do *not* try to kill
|
||||
the thread - part of the reason that threads are more efficient than processes
|
||||
is that they avoid the overhead associated with the automatic recycling of
|
||||
resources. In other words, if you do manage to kill the thread, your whole
|
||||
process is likely to be screwed up.
|
||||
|
||||
|
||||
Non-blocking Sockets
|
||||
====================
|
||||
|
||||
If you've understood the preceding, you already know most of what you need to
|
||||
know about the mechanics of using sockets. You'll still use the same calls, in
|
||||
much the same ways. It's just that, if you do it right, your app will be almost
|
||||
inside-out.
|
||||
|
||||
In Python, you use ``socket.setblocking(0)`` to make it non-blocking. In C, it's
|
||||
more complex (for one thing, you'll need to choose between the POSIX flag
``O_NONBLOCK`` and the almost indistinguishable older flag ``O_NDELAY``, which
|
||||
is completely different from ``TCP_NODELAY``), but it's the exact same idea. You
|
||||
do this after creating the socket, but before using it. (Actually, if you're
|
||||
nuts, you can switch back and forth.)
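
To get a feel for what "non-blocking" means in practice, here's a tiny sketch.
The ``socket.socketpair()`` call and the ``outcome`` variable are assumptions
made for the demo::

```python
import socket

a, b = socket.socketpair()
a.setblocking(False)

try:
    a.recv(1024)              # nothing has been sent yet
    outcome = "got data"
except BlockingIOError:
    outcome = "would block"   # recv came back at once, with nothing to report

a.close()
b.close()
```

A blocking socket would have sat in ``recv`` indefinitely; the non-blocking one
comes straight back and leaves the decision to you.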
|
||||
|
||||
The major mechanical difference is that ``send``, ``recv``, ``connect`` and
|
||||
``accept`` can return without having done anything. You have (of course) a
|
||||
number of choices. You can check return code and error codes and generally drive
|
||||
yourself crazy. If you don't believe me, try it sometime. Your app will grow
|
||||
large, buggy and suck CPU. So let's skip the brain-dead solutions and do it
|
||||
right.
|
||||
|
||||
Use ``select``.
|
||||
|
||||
In C, coding ``select`` is fairly complex. In Python, it's a piece of cake, but
|
||||
it's close enough to the C version that if you understand ``select`` in Python,
|
||||
you'll have little trouble with it in C::
|
||||
|
||||
   ready_to_read, ready_to_write, in_error = select.select(
       potential_readers,
       potential_writers,
       potential_errs,
       timeout)
|
||||
|
||||
You pass ``select`` three lists: the first contains all sockets that you might
|
||||
want to try reading; the second all the sockets you might want to try writing
|
||||
to, and the last (normally left empty) those that you want to check for errors.
|
||||
You should note that a socket can go into more than one list. The ``select``
|
||||
call is blocking, but you can give it a timeout. This is generally a sensible
|
||||
thing to do - give it a nice long timeout (say a minute) unless you have good
|
||||
reason to do otherwise.
|
||||
|
||||
In return, you will get three lists. They contain the sockets that are actually
|
||||
readable, writable and in error. Each of these lists is a subset (possibly
|
||||
empty) of the corresponding list you passed in.
|
||||
|
||||
If a socket is in the output readable list, you can be
|
||||
as-close-to-certain-as-we-ever-get-in-this-business that a ``recv`` on that
|
||||
socket will return *something*. Same idea for the writable list. You'll be able
|
||||
to send *something*. Maybe not all you want to, but *something* is better than
|
||||
nothing. (Actually, any reasonably healthy socket will return as writable - it
|
||||
just means outbound network buffer space is available.)
|
||||
|
||||
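To make the call above concrete, here is a minimal sketch. It assumes
``socket.socketpair()`` as a stand-in for a real connection, so that it runs
without a network; the names ``a`` and ``b`` and the one-second timeout are
illustrative only.

```python
import select
import socket

# A connected pair of sockets; real code would use accepted or
# connected sockets instead.
a, b = socket.socketpair()
a.sendall(b"ping")

# Ask which sockets are readable and writable, waiting at most one second.
ready_to_read, ready_to_write, in_error = select.select([b], [a, b], [], 1.0)

# Only call recv() on sockets that select reported as readable.
data = b.recv(1024) if b in ready_to_read else b""
print(data)

a.close()
b.close()
```

Here ``b`` comes back in the readable list because ``a`` has already sent
something, and both sockets normally come back as writable because their
outbound buffers are empty.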
If you have a "server" socket, put it in the potential_readers list. If it comes
|
||||
out in the readable list, your ``accept`` will (almost certainly) work. If you
|
||||
have created a new socket to ``connect`` to someone else, put it in the
|
||||
potential_writers list. If it shows up in the writable list, you have a decent
|
||||
chance that it has connected.
|
||||
|
||||
Actually, ``select`` can be handy even with blocking sockets. It's one way of
|
||||
determining whether you will block - the socket returns as readable when there's
|
||||
something in the buffers. However, this still doesn't help with the problem of
|
||||
determining whether the other end is done, or just busy with something else.
|
||||
|
||||
**Portability alert**: On Unix, ``select`` works with both sockets and
files. Don't try this on Windows. On Windows, ``select`` works with sockets
|
||||
only. Also note that in C, many of the more advanced socket options are done
|
||||
differently on Windows. In fact, on Windows I usually use threads (which work
|
||||
very, very well) with my sockets.
|
||||
|
||||
|
293
third_party/python/Doc/howto/sorting.rst
vendored
Normal file
|
@ -0,0 +1,293 @@
|
|||
.. _sortinghowto:
|
||||
|
||||
Sorting HOW TO
|
||||
**************
|
||||
|
||||
:Author: Andrew Dalke and Raymond Hettinger
|
||||
:Release: 0.1
|
||||
|
||||
|
||||
Python lists have a built-in :meth:`list.sort` method that modifies the list
|
||||
in-place. There is also a :func:`sorted` built-in function that builds a new
|
||||
sorted list from an iterable.
|
||||
|
||||
In this document, we explore the various techniques for sorting data using Python.
|
||||
|
||||
|
||||
Sorting Basics
|
||||
==============
|
||||
|
||||
A simple ascending sort is very easy: just call the :func:`sorted` function. It
|
||||
returns a new sorted list::
|
||||
|
||||
>>> sorted([5, 2, 3, 1, 4])
|
||||
[1, 2, 3, 4, 5]
|
||||
|
||||
You can also use the :meth:`list.sort` method. It modifies the list
|
||||
in-place (and returns ``None`` to avoid confusion). Usually it's less convenient
|
||||
than :func:`sorted` - but if you don't need the original list, it's slightly
|
||||
more efficient.
|
||||
|
||||
>>> a = [5, 2, 3, 1, 4]
|
||||
>>> a.sort()
|
||||
>>> a
|
||||
[1, 2, 3, 4, 5]
|
||||
|
||||
Another difference is that the :meth:`list.sort` method is only defined for
|
||||
lists. In contrast, the :func:`sorted` function accepts any iterable.
|
||||
|
||||
>>> sorted({1: 'D', 2: 'B', 3: 'B', 4: 'E', 5: 'A'})
|
||||
[1, 2, 3, 4, 5]
|
||||
|
||||
Key Functions
|
||||
=============
|
||||
|
||||
Both :meth:`list.sort` and :func:`sorted` have a *key* parameter to specify a
|
||||
function to be called on each list element prior to making comparisons.
|
||||
|
||||
For example, here's a case-insensitive string comparison:
|
||||
|
||||
>>> sorted("This is a test string from Andrew".split(), key=str.lower)
|
||||
['a', 'Andrew', 'from', 'is', 'string', 'test', 'This']
|
||||
|
||||
The value of the *key* parameter should be a function that takes a single argument
|
||||
and returns a key to use for sorting purposes. This technique is fast because
|
||||
the key function is called exactly once for each input record.
|
||||
|
||||
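The "called exactly once" claim can be checked with a small sketch. The helper
``lowered`` and the ``calls`` list are hypothetical names introduced here only
to count invocations.

```python
calls = []

def lowered(word):
    """Key function that records every word it is asked about."""
    calls.append(word)
    return word.lower()

words = "This is a test string from Andrew".split()
result = sorted(words, key=lowered)

print(result)
# One key computation per input element, however many comparisons are made.
print(len(calls) == len(words))
```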
A common pattern is to sort complex objects using some of the object's indices
|
||||
as keys. For example:
|
||||
|
||||
>>> student_tuples = [
|
||||
... ('john', 'A', 15),
|
||||
... ('jane', 'B', 12),
|
||||
... ('dave', 'B', 10),
|
||||
... ]
|
||||
>>> sorted(student_tuples, key=lambda student: student[2]) # sort by age
|
||||
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
|
||||
|
||||
The same technique works for objects with named attributes. For example:
|
||||
|
||||
>>> class Student:
...     def __init__(self, name, grade, age):
...         self.name = name
...         self.grade = grade
...         self.age = age
...     def __repr__(self):
...         return repr((self.name, self.grade, self.age))
|
||||
|
||||
>>> student_objects = [
|
||||
... Student('john', 'A', 15),
|
||||
... Student('jane', 'B', 12),
|
||||
... Student('dave', 'B', 10),
|
||||
... ]
|
||||
>>> sorted(student_objects, key=lambda student: student.age) # sort by age
|
||||
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
|
||||
|
||||
Operator Module Functions
|
||||
=========================
|
||||
|
||||
The key-function patterns shown above are very common, so Python provides
|
||||
convenience functions to make accessor functions easier and faster. The
|
||||
:mod:`operator` module has :func:`~operator.itemgetter`,
|
||||
:func:`~operator.attrgetter`, and a :func:`~operator.methodcaller` function.
|
||||
|
||||
Using those functions, the above examples become simpler and faster:
|
||||
|
||||
>>> from operator import itemgetter, attrgetter
|
||||
|
||||
>>> sorted(student_tuples, key=itemgetter(2))
|
||||
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
|
||||
|
||||
>>> sorted(student_objects, key=attrgetter('age'))
|
||||
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
|
||||
|
||||
The operator module functions allow multiple levels of sorting. For example, to
|
||||
sort by *grade* then by *age*:
|
||||
|
||||
>>> sorted(student_tuples, key=itemgetter(1,2))
|
||||
[('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]
|
||||
|
||||
>>> sorted(student_objects, key=attrgetter('grade', 'age'))
|
||||
[('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]
|
||||
|
||||
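The section mentions :func:`~operator.methodcaller` without showing it. As a
small sketch, it can build a key function that calls a method with fixed
arguments on each element; the ``messages`` list below is made-up sample data.

```python
from operator import methodcaller

# Hypothetical sample data: more '!' characters means more urgent.
messages = ['critical!!!', 'hurry!', 'standby', 'immediate!!']

# methodcaller('count', '!') behaves like: lambda s: s.count('!')
by_urgency = sorted(messages, key=methodcaller('count', '!'))
print(by_urgency)
```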
Ascending and Descending
|
||||
========================
|
||||
|
||||
Both :meth:`list.sort` and :func:`sorted` accept a *reverse* parameter with a
|
||||
boolean value. This is used to flag descending sorts. For example, to get the
|
||||
student data in reverse *age* order:
|
||||
|
||||
>>> sorted(student_tuples, key=itemgetter(2), reverse=True)
|
||||
[('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
|
||||
|
||||
>>> sorted(student_objects, key=attrgetter('age'), reverse=True)
|
||||
[('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
|
||||
|
||||
Sort Stability and Complex Sorts
|
||||
================================
|
||||
|
||||
Sorts are guaranteed to be `stable
|
||||
<https://en.wikipedia.org/wiki/Sorting_algorithm#Stability>`_\. That means that
|
||||
when multiple records have the same key, their original order is preserved.
|
||||
|
||||
>>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]
|
||||
>>> sorted(data, key=itemgetter(0))
|
||||
[('blue', 1), ('blue', 2), ('red', 1), ('red', 2)]
|
||||
|
||||
Notice how the two records for *blue* retain their original order so that
|
||||
``('blue', 1)`` is guaranteed to precede ``('blue', 2)``.
|
||||
|
||||
This wonderful property lets you build complex sorts in a series of sorting
|
||||
steps. For example, to sort the student data by descending *grade* and then
|
||||
ascending *age*, do the *age* sort first and then sort again using *grade*:
|
||||
|
||||
>>> s = sorted(student_objects, key=attrgetter('age')) # sort on secondary key
|
||||
>>> sorted(s, key=attrgetter('grade'), reverse=True) # now sort on primary key, descending
|
||||
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
|
||||
|
||||
The `Timsort <https://en.wikipedia.org/wiki/Timsort>`_ algorithm used in Python
|
||||
does multiple sorts efficiently because it can take advantage of any ordering
|
||||
already present in a dataset.
|
||||
|
||||
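The multi-pass technique can be wrapped in a helper. This is only a sketch:
``multisort`` and the ``Record`` named tuple are hypothetical names, and the
sample data mirrors the student records used above.

```python
from collections import namedtuple
from operator import attrgetter

Record = namedtuple('Record', 'name grade age')
records = [
    Record('john', 'A', 15),
    Record('jane', 'B', 12),
    Record('dave', 'B', 10),
]

def multisort(items, specs):
    """Sort by several (attribute, reverse) pairs, primary key first."""
    result = list(items)
    # Apply the passes from the least significant key to the most
    # significant one; stability preserves the earlier orderings.
    for attr, reverse in reversed(specs):
        result.sort(key=attrgetter(attr), reverse=reverse)
    return result

ranked = multisort(records, [('grade', True), ('age', False)])
print([r.name for r in ranked])
```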
The Old Way Using Decorate-Sort-Undecorate
|
||||
==========================================
|
||||
|
||||
This idiom is called Decorate-Sort-Undecorate after its three steps:
|
||||
|
||||
* First, the initial list is decorated with new values that control the sort order.
|
||||
|
||||
* Second, the decorated list is sorted.
|
||||
|
||||
* Finally, the decorations are removed, creating a list that contains only the
|
||||
initial values in the new order.
|
||||
|
||||
For example, to sort the student data by *grade* using the DSU approach:
|
||||
|
||||
>>> decorated = [(student.grade, i, student) for i, student in enumerate(student_objects)]
|
||||
>>> decorated.sort()
|
||||
>>> [student for grade, i, student in decorated] # undecorate
|
||||
[('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
|
||||
|
||||
This idiom works because tuples are compared lexicographically; the first items
|
||||
are compared; if they are the same then the second items are compared, and so
|
||||
on.
|
||||
|
||||
It is not strictly necessary in all cases to include the index *i* in the
|
||||
decorated list, but including it gives two benefits:
|
||||
|
||||
* The sort is stable -- if two items have the same key, their order will be
|
||||
preserved in the sorted list.
|
||||
|
||||
* The original items do not have to be comparable because the ordering of the
|
||||
decorated tuples will be determined by at most the first two items. So for
|
||||
example the original list could contain complex numbers which cannot be sorted
|
||||
directly.
|
||||
|
||||
Another name for this idiom is
|
||||
`Schwartzian transform <https://en.wikipedia.org/wiki/Schwartzian_transform>`_\,
|
||||
after Randal L. Schwartz, who popularized it among Perl programmers.
|
||||
|
||||
Now that Python sorting provides key-functions, this technique is not often needed.
|
||||
|
||||
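To illustrate the second benefit of including the index, the sketch below
sorts complex numbers by magnitude. The sample ``values`` are made up; two of
them share the same magnitude, so without the index the sort would fall back
to comparing the complex numbers themselves and fail.

```python
values = [3+4j, 1+1j, 0+5j, 2+0j]

# Complex numbers have no ordering, so sorting them directly fails.
try:
    sorted(values)
    directly_sortable = True
except TypeError:
    directly_sortable = False

# Decorate with (magnitude, index); ties on magnitude are settled by the
# index, so the complex values themselves are never compared.
decorated = [(abs(z), i, z) for i, z in enumerate(values)]
decorated.sort()
by_magnitude = [z for _, _, z in decorated]  # undecorate
print(by_magnitude)
```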
|
||||
The Old Way Using the *cmp* Parameter
|
||||
=====================================
|
||||
|
||||
Many constructs given in this HOWTO assume Python 2.4 or later. Before that,
|
||||
there was no :func:`sorted` builtin and :meth:`list.sort` took no keyword
|
||||
arguments. Instead, all of the Py2.x versions supported a *cmp* parameter to
|
||||
handle user specified comparison functions.
|
||||
|
||||
In Py3.0, the *cmp* parameter was removed entirely (as part of a larger effort to
|
||||
simplify and unify the language, eliminating the conflict between rich
|
||||
comparisons and the :meth:`__cmp__` magic method).
|
||||
|
||||
In Py2.x, sort allowed an optional function which could be called for doing the
|
||||
comparisons. That function should take two arguments to be compared and then
|
||||
return a negative value for less-than, return zero if they are equal, or return
|
||||
a positive value for greater-than. For example, we can do:
|
||||
|
||||
>>> def numeric_compare(x, y):
...     return x - y
|
||||
>>> sorted([5, 2, 4, 1, 3], cmp=numeric_compare) # doctest: +SKIP
|
||||
[1, 2, 3, 4, 5]
|
||||
|
||||
Or you can reverse the order of comparison with:
|
||||
|
||||
>>> def reverse_numeric(x, y):
...     return y - x
|
||||
>>> sorted([5, 2, 4, 1, 3], cmp=reverse_numeric) # doctest: +SKIP
|
||||
[5, 4, 3, 2, 1]
|
||||
|
||||
When porting code from Python 2.x to 3.x, the situation can arise when you have
|
||||
the user supplying a comparison function and you need to convert that to a key
|
||||
function. The following wrapper makes that easy to do::
|
||||
|
||||
    def cmp_to_key(mycmp):
        'Convert a cmp= function into a key= function'
        class K:
            def __init__(self, obj, *args):
                self.obj = obj
            def __lt__(self, other):
                return mycmp(self.obj, other.obj) < 0
            def __gt__(self, other):
                return mycmp(self.obj, other.obj) > 0
            def __eq__(self, other):
                return mycmp(self.obj, other.obj) == 0
            def __le__(self, other):
                return mycmp(self.obj, other.obj) <= 0
            def __ge__(self, other):
                return mycmp(self.obj, other.obj) >= 0
            def __ne__(self, other):
                return mycmp(self.obj, other.obj) != 0
        return K
|
||||
|
||||
To convert to a key function, just wrap the old comparison function:
|
||||
|
||||
.. testsetup::
|
||||
|
||||
   from functools import cmp_to_key
|
||||
|
||||
.. doctest::
|
||||
|
||||
   >>> sorted([5, 2, 4, 1, 3], key=cmp_to_key(reverse_numeric))
   [5, 4, 3, 2, 1]
|
||||
|
||||
In Python 3.2, the :func:`functools.cmp_to_key` function was added to the
|
||||
:mod:`functools` module in the standard library.
|
||||
|
||||
Odds and Ends
=============
|
||||
|
||||
* For locale aware sorting, use :func:`locale.strxfrm` for a key function or
|
||||
:func:`locale.strcoll` for a comparison function.
|
||||
|
||||
* The *reverse* parameter still maintains sort stability (so that records with
|
||||
equal keys retain the original order). Interestingly, that effect can be
|
||||
simulated without the parameter by using the builtin :func:`reversed` function
|
||||
twice:
|
||||
|
||||
>>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]
|
||||
>>> standard_way = sorted(data, key=itemgetter(0), reverse=True)
|
||||
>>> double_reversed = list(reversed(sorted(reversed(data), key=itemgetter(0))))
|
||||
>>> assert standard_way == double_reversed
|
||||
>>> standard_way
|
||||
[('red', 1), ('red', 2), ('blue', 1), ('blue', 2)]
|
||||
|
||||
* The sort routines are guaranteed to use :meth:`__lt__` when making comparisons
|
||||
between two objects. So, it is easy to add a standard sort order to a class by
|
||||
defining an :meth:`__lt__` method::
|
||||
|
||||
>>> Student.__lt__ = lambda self, other: self.age < other.age
|
||||
>>> sorted(student_objects)
|
||||
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
|
||||
|
||||
* Key functions need not depend directly on the objects being sorted. A key
|
||||
function can also access external resources. For instance, if the student grades
|
||||
are stored in a dictionary, they can be used to sort a separate list of student
|
||||
names:
|
||||
|
||||
>>> students = ['dave', 'john', 'jane']
|
||||
>>> newgrades = {'john': 'F', 'jane':'A', 'dave': 'C'}
|
||||
>>> sorted(students, key=newgrades.__getitem__)
|
||||
['jane', 'dave', 'john']
|
733
third_party/python/Doc/howto/unicode.rst
vendored
Normal file
|
@ -0,0 +1,733 @@
|
|||
.. _unicode-howto:
|
||||
|
||||
*****************
|
||||
Unicode HOWTO
|
||||
*****************
|
||||
|
||||
:Release: 1.12
|
||||
|
||||
This HOWTO discusses Python support for Unicode, and explains
|
||||
various problems that people commonly encounter when trying to work
|
||||
with Unicode.
|
||||
|
||||
Introduction to Unicode
|
||||
=======================
|
||||
|
||||
History of Character Codes
|
||||
--------------------------
|
||||
|
||||
In 1968, the American Standard Code for Information Interchange, better known by
|
||||
its acronym ASCII, was standardized. ASCII defined numeric codes for various
|
||||
characters, with the numeric values running from 0 to 127. For example, the
|
||||
lowercase letter 'a' is assigned 97 as its code value.
|
||||
|
||||
ASCII was an American-developed standard, so it only defined unaccented
|
||||
characters. There was an 'e', but no 'é' or 'Í'. This meant that languages
|
||||
which required accented characters couldn't be faithfully represented in ASCII.
|
||||
(Actually the missing accents matter for English, too, which contains words such
|
||||
as 'naïve' and 'café', and some publications have house styles which require
|
||||
spellings such as 'coöperate'.)
|
||||
|
||||
For a while people just wrote programs that didn't display accents.
|
||||
In the mid-1980s an Apple II BASIC program written by a French speaker
|
||||
might have lines like these:
|
||||
|
||||
.. code-block:: basic
|
||||
|
||||
   PRINT "MISE A JOUR TERMINEE"
   PRINT "PARAMETRES ENREGISTRES"
|
||||
|
||||
Those messages should contain accents (terminée, paramètre, enregistrés) and
|
||||
they just look wrong to someone who can read French.
|
||||
|
||||
In the 1980s, almost all personal computers were 8-bit, meaning that bytes could
|
||||
hold values ranging from 0 to 255. ASCII codes only went up to 127, so some
|
||||
machines assigned values between 128 and 255 to accented characters. Different
|
||||
machines had different codes, however, which led to problems exchanging files.
|
||||
Eventually various commonly used sets of values for the 128--255 range emerged.
|
||||
Some were true standards, defined by the International Organization for
|
||||
Standardization, and some were *de facto* conventions that were invented by one
|
||||
company or another and managed to catch on.
|
||||
|
||||
255 characters aren't very many. For example, you can't fit both the accented
|
||||
characters used in Western Europe and the Cyrillic alphabet used for Russian
|
||||
into the 128--255 range because there are more than 128 such characters.
|
||||
|
||||
You could write files using different codes (all your Russian files in a coding
|
||||
system called KOI8, all your French files in a different coding system called
|
||||
Latin1), but what if you wanted to write a French document that quotes some
|
||||
Russian text? In the 1980s people began to want to solve this problem, and the
|
||||
Unicode standardization effort began.
|
||||
|
||||
Unicode started out using 16-bit characters instead of 8-bit characters. 16
|
||||
bits means you have 2^16 = 65,536 distinct values available, making it possible
|
||||
to represent many different characters from many different alphabets; an initial
|
||||
goal was to have Unicode contain the alphabets for every single human language.
|
||||
It turns out that even 16 bits isn't enough to meet that goal, and the modern
|
||||
Unicode specification uses a wider range of codes, 0 through 1,114,111
(``0x10FFFF`` in base 16).
|
||||
|
||||
There's a related ISO standard, ISO 10646. Unicode and ISO 10646 were
|
||||
originally separate efforts, but the specifications were merged with the 1.1
|
||||
revision of Unicode.
|
||||
|
||||
(This discussion of Unicode's history is highly simplified. The
|
||||
precise historical details aren't necessary for understanding how to
|
||||
use Unicode effectively, but if you're curious, consult the Unicode
|
||||
consortium site listed in the References or
|
||||
the `Wikipedia entry for Unicode <https://en.wikipedia.org/wiki/Unicode#History>`_
|
||||
for more information.)
|
||||
|
||||
|
||||
Definitions
|
||||
-----------
|
||||
|
||||
A **character** is the smallest possible component of a text. 'A', 'B', 'C',
|
||||
etc., are all different characters. So are 'È' and 'Í'. Characters are
|
||||
abstractions, and vary depending on the language or context you're talking
|
||||
about. For example, the symbol for ohms (Ω) is usually drawn much like the
|
||||
capital letter omega (Ω) in the Greek alphabet (they may even be the same in
|
||||
some fonts), but these are two different characters that have different
|
||||
meanings.
|
||||
|
||||
The Unicode standard describes how characters are represented by **code
|
||||
points**. A code point is an integer value, usually denoted in base 16. In the
|
||||
standard, a code point is written using the notation ``U+12CA`` to mean the
|
||||
character with value ``0x12ca`` (4,810 decimal). The Unicode standard contains
|
||||
a lot of tables listing characters and their corresponding code points:
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
   0061    'a'; LATIN SMALL LETTER A
   0062    'b'; LATIN SMALL LETTER B
   0063    'c'; LATIN SMALL LETTER C
   ...
   007B    '{'; LEFT CURLY BRACKET
|
||||
|
||||
Strictly, these definitions imply that it's meaningless to say 'this is
|
||||
character ``U+12CA``'. ``U+12CA`` is a code point, which represents some particular
|
||||
character; in this case, it represents the character 'ETHIOPIC SYLLABLE WI'. In
|
||||
informal contexts, this distinction between code points and characters will
|
||||
sometimes be forgotten.
|
||||
|
||||
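The relationship between a code point and the character it represents can be
explored from Python. The following is a small sketch using the standard
:mod:`unicodedata` module; the variable names are illustrative.

```python
import unicodedata

code_point = 0x12CA              # the integer value behind U+12CA
char = chr(code_point)           # the one-character string for that code point

print(f"U+{code_point:04X}")     # conventional notation for the code point
print(ord(char))                 # back to the integer, in decimal
print(unicodedata.name(char))    # the character's official name
```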
A character is represented on a screen or on paper by a set of graphical
|
||||
elements that's called a **glyph**. The glyph for an uppercase A, for example,
|
||||
is two diagonal strokes and a horizontal stroke, though the exact details will
|
||||
depend on the font being used. Most Python code doesn't need to worry about
|
||||
glyphs; figuring out the correct glyph to display is generally the job of a GUI
|
||||
toolkit or a terminal's font renderer.
|
||||
|
||||
|
||||
Encodings
|
||||
---------
|
||||
|
||||
To summarize the previous section: a Unicode string is a sequence of code
|
||||
points, which are numbers from 0 through ``0x10FFFF`` (1,114,111 decimal). This
|
||||
sequence needs to be represented as a set of bytes (meaning, values
|
||||
from 0 through 255) in memory. The rules for translating a Unicode string
|
||||
into a sequence of bytes are called an **encoding**.
|
||||
|
||||
The first encoding you might think of is an array of 32-bit integers. In this
|
||||
representation, the string "Python" would look like this:
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
      P           y           t           h           o           n
   0x50 00 00 00 79 00 00 00 74 00 00 00 68 00 00 00 6f 00 00 00 6e 00 00 00
      0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
|
||||
|
||||
This representation is straightforward but using it presents a number of
|
||||
problems.
|
||||
|
||||
1. It's not portable; different processors order the bytes differently.
|
||||
|
||||
2. It's very wasteful of space. In most texts, the majority of the code points
are less than 128, or less than 256, so a lot of space is occupied by ``0x00``
|
||||
bytes. The above string takes 24 bytes compared to the 6 bytes needed for an
|
||||
ASCII representation. Increased RAM usage doesn't matter too much (desktop
|
||||
computers have gigabytes of RAM, and strings aren't usually that large), but
|
||||
expanding our usage of disk and network bandwidth by a factor of 4 is
|
||||
intolerable.
|
||||
|
||||
3. It's not compatible with existing C functions such as ``strlen()``, so a new
|
||||
family of wide string functions would need to be used.
|
||||
|
||||
4. Many Internet standards are defined in terms of textual data, and can't
|
||||
handle content with embedded zero bytes.
|
||||
|
||||
Generally people don't use this encoding, instead choosing other
|
||||
encodings that are more efficient and convenient. UTF-8 is probably
|
||||
the most commonly supported encoding; it will be discussed below.
|
||||
|
||||
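The first two problems can be seen by encoding "Python" with the ``utf-32-le``
and ``utf-32-be`` codecs, which are assumed here as the closest built-in
equivalents of the array of 32-bit integers described above.

```python
text = "Python"

little = text.encode("utf-32-le")   # least significant byte first
big = text.encode("utf-32-be")      # most significant byte first

print(len(little))                  # four bytes per code point
print(little[:8].hex())             # first two code points, little-endian
print(big[:8].hex())                # same code points, bytes in the other order
print(little.count(0))              # number of zero bytes in the result
```

The two byte strings hold the same text but differ in byte order, and three
out of every four bytes are zero.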
Encodings don't have to handle every possible Unicode character, and most
|
||||
encodings don't. The rules for converting a Unicode string into the ASCII
|
||||
encoding, for example, are simple; for each code point:
|
||||
|
||||
1. If the code point is < 128, each byte is the same as the value of the code
|
||||
point.
|
||||
|
||||
2. If the code point is 128 or greater, the Unicode string can't be represented
|
||||
in this encoding. (Python raises a :exc:`UnicodeEncodeError` exception in this
|
||||
case.)
|
||||
|
||||
Latin-1, also known as ISO-8859-1, is a similar encoding. Unicode code points
|
||||
0--255 are identical to the Latin-1 values, so converting to this encoding simply
|
||||
requires converting code points to byte values; if a code point larger than 255
|
||||
is encountered, the string can't be encoded into Latin-1.
|
||||
|
||||
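The ASCII and Latin-1 rules above can be tried directly. In this sketch,
``can_encode`` is a hypothetical helper written for illustration, and the
sample strings are spelled with escapes to keep the snippet ASCII-only.

```python
def can_encode(text, encoding):
    """Return True if every code point in text fits the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

cafe = "caf\u00e9"    # 'cafe' with an acute accent, code point 233
euro = "\u20ac"       # EURO SIGN, code point 8364

print(cafe.encode("latin-1"))        # code point 233 becomes byte 0xe9
print(can_encode(cafe, "ascii"))     # 233 is not below 128
print(can_encode(cafe, "latin-1"))   # 233 is within 0--255
print(can_encode(euro, "latin-1"))   # 8364 is larger than 255
```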
Encodings don't have to be simple one-to-one mappings like Latin-1. Consider
|
||||
IBM's EBCDIC, which was used on IBM mainframes. Letter values weren't in one
|
||||
block: 'a' through 'i' had values from 129 to 137, but 'j' through 'r' were 145
|
||||
through 153. If you wanted to use EBCDIC as an encoding, you'd probably use
|
||||
some sort of lookup table to perform the conversion, but this is largely an
|
||||
internal detail.
|
||||
|
||||
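Python ships codecs for several EBCDIC code pages. The sketch below assumes
``cp037``, one such variant that the text above does not name, to show the gap
between the two blocks of letters.

```python
letters = "aijr"

# cp037 is one EBCDIC code page available in the standard library;
# choosing it here is an assumption made for illustration.
values = list(letters.encode("cp037"))
print(values)

# Decoding with the same codec recovers the original text.
round_trip = bytes(values).decode("cp037")
print(round_trip)
```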
UTF-8 is one of the most commonly used encodings. UTF stands for "Unicode
|
||||
Transformation Format", and the '8' means that 8-bit numbers are used in the
|
||||
encoding. (There are also UTF-16 and UTF-32 encodings, but they are less
|
||||
frequently used than UTF-8.) UTF-8 uses the following rules:
|
||||
|
||||
1. If the code point is < 128, it's represented by the corresponding byte value.
|
||||
2. If the code point is >= 128, it's turned into a sequence of two, three, or
|
||||
four bytes, where each byte of the sequence is between 128 and 255.
|
||||
|
||||
UTF-8 has several convenient properties:
|
||||
|
||||
1. It can handle any Unicode code point.
|
||||
2. A Unicode string is turned into a sequence of bytes containing no embedded zero
|
||||
bytes. This avoids byte-ordering issues, and means UTF-8 strings can be
|
||||
processed by C functions such as ``strcpy()`` and sent through protocols that
|
||||
can't handle zero bytes.
|
||||
3. A string of ASCII text is also valid UTF-8 text.
|
||||
4. UTF-8 is fairly compact; the majority of commonly used characters can be
|
||||
represented with one or two bytes.
|
||||
5. If bytes are corrupted or lost, it's possible to determine the start of the
|
||||
next UTF-8-encoded code point and resynchronize. It's also unlikely that
|
||||
random 8-bit data will look like valid UTF-8.
|
||||
|
||||
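Several of the rules and properties listed above can be checked in a few
lines. This is a sketch with arbitrarily chosen sample characters, written
with escapes so the snippet itself stays ASCII-only.

```python
# One sample each needing one, two, three and four bytes in UTF-8.
samples = ["a", "\u00e9", "\u20ac", "\U0001f600"]
lengths = [len(ch.encode("utf-8")) for ch in samples]
print(lengths)

# Every byte of a multi-byte sequence is in the range 128--255.
euro_bytes = "\u20ac".encode("utf-8")
all_high = all(byte >= 128 for byte in euro_bytes)
print(euro_bytes, all_high)

# ASCII text is already valid UTF-8, byte for byte.
ascii_text = "plain ASCII"
same = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print(same)

# No zero bytes appear unless the text itself contains U+0000.
no_zero = 0 not in "".join(samples).encode("utf-8")
print(no_zero)
```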
|
||||
|
||||
References
|
||||
----------
|
||||
|
||||
The `Unicode Consortium site <http://www.unicode.org>`_ has character charts, a
|
||||
glossary, and PDF versions of the Unicode specification. Be prepared for some
|
||||
difficult reading. `A chronology <http://www.unicode.org/history/>`_ of the
|
||||
origin and development of Unicode is also available on the site.
|
||||
|
||||
To help understand the standard, Jukka Korpela has written `an introductory
|
||||
guide <https://www.cs.tut.fi/~jkorpela/unicode/guide.html>`_ to reading the
|
||||
Unicode character tables.
|
||||
|
||||
Another `good introductory article <http://www.joelonsoftware.com/articles/Unicode.html>`_
|
||||
was written by Joel Spolsky.
|
||||
If this introduction didn't make things clear to you, you should try
|
||||
reading this alternate article before continuing.
|
||||
|
||||
Wikipedia entries are often helpful; see the entries for "`character encoding
|
||||
<https://en.wikipedia.org/wiki/Character_encoding>`_" and `UTF-8
|
||||
<https://en.wikipedia.org/wiki/UTF-8>`_, for example.
|
||||
|
||||
|
||||
Python's Unicode Support
|
||||
========================
|
||||
|
||||
Now that you've learned the rudiments of Unicode, we can look at Python's
|
||||
Unicode features.
|
||||
|
||||
The String Type
|
||||
---------------
|
||||
|
||||
Since Python 3.0, the language features a :class:`str` type that contains Unicode
|
||||
characters, meaning any string created using ``"unicode rocks!"``, ``'unicode
|
||||
rocks!'``, or the triple-quoted string syntax is stored as Unicode.
|
||||
|
||||
The default encoding for Python source code is UTF-8, so you can simply
|
||||
include a Unicode character in a string literal::
|
||||
|
||||
    try:
        with open('/tmp/input.txt', 'r') as f:
            ...
    except OSError:
        # 'File not found' error message.
        print("Fichier non trouvé")
|
||||
|
||||
You can use a different encoding from UTF-8 by putting a specially-formatted
|
||||
comment as the first or second line of the source code::
|
||||
|
||||
# -*- coding: <encoding name> -*-
|
||||
|
||||
Side note: Python 3 also supports using Unicode characters in identifiers::
|
||||
|
||||
    répertoire = "/tmp/records.log"
    with open(répertoire, "w") as f:
        f.write("test\n")
|
||||
|
||||
If you can't enter a particular character in your editor or want to
|
||||
keep the source code ASCII-only for some reason, you can also use
|
||||
escape sequences in string literals. (Depending on your system,
|
||||
you may see the actual capital-delta glyph instead of a \u escape.) ::
|
||||
|
||||
>>> "\N{GREEK CAPITAL LETTER DELTA}" # Using the character name
|
||||
'\u0394'
|
||||
>>> "\u0394" # Using a 16-bit hex value
|
||||
'\u0394'
|
||||
>>> "\U00000394" # Using a 32-bit hex value
|
||||
'\u0394'
|
||||
|
||||
In addition, one can create a string using the :meth:`~bytes.decode` method of
|
||||
:class:`bytes`. This method takes an *encoding* argument, such as ``UTF-8``,
|
||||
and optionally an *errors* argument.
|
||||
|
||||
The *errors* argument specifies the response when the input string can't be
|
||||
converted according to the encoding's rules. Legal values for this argument are
|
||||
``'strict'`` (raise a :exc:`UnicodeDecodeError` exception), ``'replace'`` (use
|
||||
``U+FFFD``, ``REPLACEMENT CHARACTER``), ``'ignore'`` (just leave the
|
||||
character out of the Unicode result), or ``'backslashreplace'`` (inserts a
|
||||
``\xNN`` escape sequence).
|
||||
The following examples show the differences::
|
||||
|
||||
>>> b'\x80abc'.decode("utf-8", "strict") #doctest: +NORMALIZE_WHITESPACE
|
||||
Traceback (most recent call last):
|
||||
...
|
||||
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0:
|
||||
invalid start byte
|
||||
>>> b'\x80abc'.decode("utf-8", "replace")
|
||||
'\ufffdabc'
|
||||
>>> b'\x80abc'.decode("utf-8", "backslashreplace")
|
||||
'\\x80abc'
|
||||
>>> b'\x80abc'.decode("utf-8", "ignore")
|
||||
'abc'
|
||||
|
||||
Encodings are specified as strings containing the encoding's name. Python 3.2
|
||||
comes with roughly 100 different encodings; see the Python Library Reference at
|
||||
:ref:`standard-encodings` for a list. Some encodings have multiple names; for
|
||||
example, ``'latin-1'``, ``'iso_8859_1'`` and ``'8859'`` are all synonyms for
|
||||
the same encoding.
|
||||
|
||||
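Whether two names refer to the same encoding can be checked with
:func:`codecs.lookup`, which resolves aliases. The following is a brief sketch
using the three names mentioned above.

```python
import codecs

names = ["latin-1", "iso_8859_1", "8859"]

# Each lookup returns a CodecInfo object; aliases share one canonical name.
canonical = {codecs.lookup(name).name for name in names}
print(canonical)

# All three names therefore produce identical bytes.
encoded = {"\u00e9".encode(name) for name in names}
print(encoded)
```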
One-character Unicode strings can also be created with the :func:`chr`
|
||||
built-in function, which takes integers and returns a Unicode string of length 1
|
||||
that contains the corresponding code point. The reverse operation is the
|
||||
built-in :func:`ord` function that takes a one-character Unicode string and
|
||||
returns the code point value::
|
||||
|
||||
>>> chr(57344)
|
||||
'\ue000'
|
||||
>>> ord('\ue000')
|
||||
57344
|
||||
|
||||
Converting to Bytes
|
||||
-------------------
|
||||
|
||||
The opposite method of :meth:`bytes.decode` is :meth:`str.encode`,
|
||||
which returns a :class:`bytes` representation of the Unicode string, encoded in the
|
||||
requested *encoding*.
|
||||
|
||||
The *errors* parameter is the same as the parameter of the
|
||||
:meth:`~bytes.decode` method but supports a few more possible handlers. As well as
|
||||
``'strict'``, ``'ignore'``, and ``'replace'`` (which in this case
|
||||
inserts a question mark instead of the unencodable character), there is
|
||||
also ``'xmlcharrefreplace'`` (inserts an XML character reference),
|
||||
``backslashreplace`` (inserts a ``\uNNNN`` escape sequence) and
|
||||
``namereplace`` (inserts a ``\N{...}`` escape sequence).
|
||||
|
||||
The following example shows the different results::
|
||||
|
||||
>>> u = chr(40960) + 'abcd' + chr(1972)
|
||||
>>> u.encode('utf-8')
|
||||
b'\xea\x80\x80abcd\xde\xb4'
|
||||
>>> u.encode('ascii') #doctest: +NORMALIZE_WHITESPACE
|
||||
Traceback (most recent call last):
|
||||
...
|
||||
UnicodeEncodeError: 'ascii' codec can't encode character '\ua000' in
|
||||
position 0: ordinal not in range(128)
|
||||
>>> u.encode('ascii', 'ignore')
|
||||
b'abcd'
|
||||
>>> u.encode('ascii', 'replace')
|
||||
b'?abcd?'
|
||||
>>> u.encode('ascii', 'xmlcharrefreplace')
|
||||
b'ꀀabcd޴'
|
||||
>>> u.encode('ascii', 'backslashreplace')
|
||||
b'\\ua000abcd\\u07b4'
|
||||
>>> u.encode('ascii', 'namereplace')
|
||||
b'\\N{YI SYLLABLE IT}abcd\\u07b4'
|
||||
|
||||
The low-level routines for registering and accessing the available
|
||||
encodings are found in the :mod:`codecs` module. Implementing new
|
||||
encodings also requires understanding the :mod:`codecs` module.
|
||||
However, the encoding and decoding functions returned by this module
|
||||
are usually more low-level than is comfortable, and writing new encodings
|
||||
is a specialized task, so the module won't be covered in this HOWTO.
|
||||
|
||||
|
||||
Unicode Literals in Python Source Code
|
||||
--------------------------------------
|
||||
|
||||
In Python source code, specific Unicode code points can be written using the
|
||||
``\u`` escape sequence, which is followed by four hex digits giving the code
|
||||
point. The ``\U`` escape sequence is similar, but expects eight hex digits,
|
||||
not four::
|
||||

   >>> s = "a\xac\u1234\u20ac\U00008000"
   ... #     ^^^^ two-digit hex escape
   ... #         ^^^^^^ four-digit Unicode escape
   ... #                     ^^^^^^^^^^ eight-digit Unicode escape
   >>> [ord(c) for c in s]
   [97, 172, 4660, 8364, 32768]
|
||||
|
||||
Using escape sequences for code points greater than 127 is fine in small doses,
|
||||
but becomes an annoyance if you're using many accented characters, as you would
|
||||
in a program with messages in French or some other accent-using language. You
|
||||
can also assemble strings using the :func:`chr` built-in function, but this is
|
||||
even more tedious.
|
||||
|
||||
Ideally, you'd want to be able to write literals in your language's natural
|
||||
encoding. You could then edit Python source code with your favorite editor
|
||||
which would display the accented characters naturally, and have the right
|
||||
characters used at runtime.
|
||||
|
||||
Python supports writing source code in UTF-8 by default, but you can use almost
|
||||
any encoding if you declare the encoding being used. This is done by including
|
||||
a special comment as either the first or second line of the source file::
|
||||
|
||||
#!/usr/bin/env python
|
||||
# -*- coding: latin-1 -*-
|
||||
|
||||
u = 'abcdé'
|
||||
print(ord(u[-1]))
|
||||
|
||||
The syntax is inspired by Emacs's notation for specifying variables local to a
|
||||
file. Emacs supports many different variables, but Python only supports
|
||||
'coding'. The ``-*-`` symbols indicate to Emacs that the comment is special;
|
||||
they have no significance to Python but are a convention. Python looks for
|
||||
``coding: name`` or ``coding=name`` in the comment.
|
||||
|
||||
If you don't include such a comment, the default encoding used will be UTF-8 as
|
||||
already mentioned. See also :pep:`263` for more information.
|
||||
|
||||
|
||||
Unicode Properties
|
||||
------------------
|
||||
|
||||
The Unicode specification includes a database of information about code points.
|
||||
For each defined code point, the information includes the character's
|
||||
name, its category, the numeric value if applicable (Unicode has characters
|
||||
representing the Roman numerals and fractions such as one-third and
|
||||
four-fifths). There are also properties related to the code point's use in
|
||||
bidirectional text and other display-related properties.
|
||||
|
||||
The following program displays some information about several characters, and
|
||||
prints the numeric value of one particular character::
|
||||
|
||||
   import unicodedata

   u = chr(233) + chr(0x0bf2) + chr(3972) + chr(6000) + chr(13231)

   for i, c in enumerate(u):
       print(i, '%04x' % ord(c), unicodedata.category(c), end=" ")
       print(unicodedata.name(c))

   # Get numeric value of second character
   print(unicodedata.numeric(u[1]))
|
||||
|
||||
When run, this prints:
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
0 00e9 Ll LATIN SMALL LETTER E WITH ACUTE
|
||||
1 0bf2 No TAMIL NUMBER ONE THOUSAND
|
||||
2 0f84 Mn TIBETAN MARK HALANTA
|
||||
3 1770 Lo TAGBANWA LETTER SA
|
||||
4 33af So SQUARE RAD OVER S SQUARED
|
||||
1000.0
|
||||
|
||||
The category codes are abbreviations describing the nature of the character.
|
||||
These are grouped into categories such as "Letter", "Number", "Punctuation", or
|
||||
"Symbol", which in turn are broken up into subcategories. To take the codes
|
||||
from the above output, ``'Ll'`` means "Letter, lowercase", ``'No'`` means
|
||||
"Number, other", ``'Mn'`` is "Mark, nonspacing", and ``'So'`` is "Symbol,
|
||||
other". See
|
||||
`the General Category Values section of the Unicode Character Database documentation <http://www.unicode.org/reports/tr44/#General_Category_Values>`_ for a
|
||||
list of category codes.
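As a small illustration of how the two-letter codes are structured, the following sketch groups a few characters by the first letter of their category, which identifies the major class. The sample characters and the ``major_class`` helper are assumptions made for this example only.

```python
import unicodedata

def major_class(ch):
    """Return the first letter of the general category, e.g. 'L' for letters."""
    return unicodedata.category(ch)[0]

samples = ['a', 'A', '5', ' ', '!', '\u0bf2']
for ch in samples:
    # Prints the character, its full category code, and the major class.
    print(repr(ch), unicodedata.category(ch), major_class(ch))
```

The second letter of each code is the subcategory, so ``'Lu'`` and ``'Ll'`` share the major class ``'L'``.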
|
||||
|
||||
|
||||
Unicode Regular Expressions
|
||||
---------------------------
|
||||
|
||||
The regular expressions supported by the :mod:`re` module can be provided
|
||||
either as bytes or strings. Some of the special character sequences such as
|
||||
``\d`` and ``\w`` have different meanings depending on whether
|
||||
the pattern is supplied as bytes or a string. For example,
|
||||
``\d`` will match the characters ``[0-9]`` in bytes but
|
||||
in strings will match any character that's in the ``'Nd'`` category.
|
||||
|
||||
The string in this example has the number 57 written in both Thai and
|
||||
Arabic numerals::
|
||||
|
||||
import re
|
||||
p = re.compile(r'\d+')
|
||||
|
||||
s = "Over \u0e55\u0e57 57 flavours"
|
||||
m = p.search(s)
|
||||
print(repr(m.group()))
|
||||
|
||||
When executed, ``\d+`` will match the Thai numerals and print them
|
||||
out. If you supply the :const:`re.ASCII` flag to
|
||||
:func:`~re.compile`, ``\d+`` will match the substring "57" instead.
|
||||
|
||||
Similarly, ``\w`` matches a wide variety of Unicode characters but
|
||||
only ``[a-zA-Z0-9_]`` in bytes or if :const:`re.ASCII` is supplied,
|
||||
and ``\s`` will match either Unicode whitespace characters or
|
||||
``[ \t\n\r\f\v]``.
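The difference between the three cases can be checked directly. This is a minimal sketch that reuses the sample string from the example above; the variable names are chosen for illustration.

```python
import re

s = "Over \u0e55\u0e57 57 flavours"

unicode_digits = re.compile(r'\d+')           # str pattern: any 'Nd' character
ascii_digits = re.compile(r'\d+', re.ASCII)   # str pattern restricted to [0-9]
byte_digits = re.compile(rb'\d+')             # bytes pattern: always [0-9]

thai = unicode_digits.search(s).group()
ascii_only = ascii_digits.search(s).group()
from_bytes = byte_digits.search(s.encode('utf-8')).group()

print(repr(thai), repr(ascii_only), repr(from_bytes))
```

Note that a bytes pattern can only be applied to bytes, so the string has to be encoded first.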
|
||||
|
||||
|
||||
References
|
||||
----------
|
||||
|
||||
.. comment should these be mentioned earlier, e.g. at the start of the "introduction to Unicode" first section?
|
||||
|
||||
Some good alternative discussions of Python's Unicode support are:
|
||||
|
||||
* `Processing Text Files in Python 3 <http://python-notes.curiousefficiency.org/en/latest/python3/text_file_processing.html>`_, by Nick Coghlan.
|
||||
* `Pragmatic Unicode <http://nedbatchelder.com/text/unipain.html>`_, a PyCon 2012 presentation by Ned Batchelder.
|
||||
|
||||
The :class:`str` type is described in the Python library reference at
|
||||
:ref:`textseq`.
|
||||
|
||||
The documentation for the :mod:`unicodedata` module.
|
||||
|
||||
The documentation for the :mod:`codecs` module.
|
||||
|
||||
Marc-André Lemburg gave `a presentation titled "Python and Unicode" (PDF slides)
|
||||
<https://downloads.egenix.com/python/Unicode-EPC2002-Talk.pdf>`_ at
|
||||
EuroPython 2002. The slides are an excellent overview of the design of Python
|
||||
2's Unicode features (where the Unicode string type is called ``unicode`` and
|
||||
literals start with ``u``).
|
||||
|
||||
|
||||
Reading and Writing Unicode Data
|
||||
================================
|
||||
|
||||
Once you've written some code that works with Unicode data, the next problem is
|
||||
input/output. How do you get Unicode strings into your program, and how do you
|
||||
convert Unicode into a form suitable for storage or transmission?
|
||||
|
||||
It's possible that you may not need to do anything depending on your input
|
||||
sources and output destinations; you should check whether the libraries used in
|
||||
your application support Unicode natively. XML parsers often return Unicode
|
||||
data, for example. Many relational databases also support Unicode-valued
|
||||
columns and can return Unicode values from an SQL query.
|
||||
|
||||
Unicode data is usually converted to a particular encoding before it gets
|
||||
written to disk or sent over a socket. It's possible to do all the work
|
||||
yourself: open a file, read an 8-bit bytes object from it, and convert the bytes
|
||||
with ``bytes.decode(encoding)``. However, the manual approach is not recommended.
|
||||
|
||||
One problem is the multi-byte nature of encodings; one Unicode character can be
|
||||
represented by several bytes. If you want to read the file in arbitrary-sized
|
||||
chunks (say, 1024 or 4096 bytes), you need to write error-handling code to catch the case
|
||||
where only part of the bytes encoding a single Unicode character are read at the
|
||||
end of a chunk. One solution would be to read the entire file into memory and
|
||||
then perform the decoding, but that prevents you from working with files that
|
||||
are extremely large; if you need to read a 2 GiB file, you need 2 GiB of RAM.
|
||||
(More, really, since for at least a moment you'd need to have both the encoded
|
||||
string and its Unicode version in memory.)
|
||||
|
||||
The solution would be to use the low-level decoding interface to catch the case
|
||||
of partial coding sequences. The work of implementing this has already been
|
||||
done for you: the built-in :func:`open` function can return a file-like object
|
||||
that assumes the file's contents are in a specified encoding and accepts Unicode
|
||||
parameters for methods such as :meth:`~io.TextIOBase.read` and
|
||||
:meth:`~io.TextIOBase.write`. This works through :func:`open`\'s *encoding* and
|
||||
*errors* parameters which are interpreted just like those in :meth:`str.encode`
|
||||
and :meth:`bytes.decode`.
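To see what :func:`open` takes care of, here is a rough sketch of the chunked-decoding problem handled by hand with an incremental decoder from the :mod:`codecs` module. The sample text and the deliberately tiny 2-byte chunk size are assumptions chosen so that chunk boundaries fall in the middle of multi-byte sequences.

```python
import codecs

encoded = 'caf\u00e9 \u20ac'.encode('utf-8')   # contains 2- and 3-byte sequences

decoder = codecs.getincrementaldecoder('utf-8')()
pieces = []
for i in range(0, len(encoded), 2):
    chunk = encoded[i:i + 2]
    # The decoder buffers an incomplete trailing sequence until the
    # remaining bytes arrive in a later chunk.
    pieces.append(decoder.decode(chunk))
# Signal the end of input; this raises if a sequence is left unfinished.
pieces.append(decoder.decode(b'', final=True))

text = ''.join(pieces)
print(repr(text))
```

Calling ``chunk.decode('utf-8')`` on each chunk separately would fail here, because some chunks end partway through a character.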
|
||||
|
||||
Reading Unicode from a file is therefore simple::
|
||||
|
||||
   with open('unicode.txt', encoding='utf-8') as f:
       for line in f:
           print(repr(line))
|
||||
|
||||
It's also possible to open files in update mode, allowing both reading and
|
||||
writing::
|
||||
|
||||
   with open('test', encoding='utf-8', mode='w+') as f:
       f.write('\u4500 blah blah blah\n')
       f.seek(0)
       print(repr(f.readline()[:1]))
|
||||
|
||||
The Unicode character ``U+FEFF`` is used as a byte-order mark (BOM), and is often
|
||||
written as the first character of a file in order to assist with autodetection
|
||||
of the file's byte ordering. Some encodings, such as UTF-16, expect a BOM to be
|
||||
present at the start of a file; when such an encoding is used, the BOM will be
|
||||
automatically written as the first character and will be silently dropped when
|
||||
the file is read. There are variants of these encodings, such as 'utf-16-le'
|
||||
and 'utf-16-be' for little-endian and big-endian encodings, that specify one
|
||||
particular byte ordering and don't skip the BOM.
|
||||
|
||||
In some areas, it is also convention to use a "BOM" at the start of UTF-8
|
||||
encoded files; the name is misleading since UTF-8 is not byte-order dependent.
|
||||
The mark simply announces that the file is encoded in UTF-8. Use the
|
||||
'utf-8-sig' codec to automatically skip the mark if present for reading such
|
||||
files.
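The BOM behaviour described above can be observed without touching the filesystem. This is a short sketch using the codecs on in-memory data; the sample text is arbitrary.

```python
import codecs

text = 'abc'

utf16 = text.encode('utf-16')            # BOM is written automatically
has_bom = utf16[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
roundtrip = utf16.decode('utf-16')       # BOM is dropped on decoding

utf16le = text.encode('utf-16-le')       # fixed byte order, no BOM written

with_sig = codecs.BOM_UTF8 + text.encode('utf-8')
kept = with_sig.decode('utf-8')          # mark survives as '\ufeff'
skipped = with_sig.decode('utf-8-sig')   # mark is skipped

print(has_bom, repr(roundtrip), utf16le, repr(kept), repr(skipped))
```

The same codec names can be passed as the *encoding* argument of :func:`open`.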
|
||||
|
||||
|
||||
Unicode filenames
|
||||
-----------------
|
||||
|
||||
Most of the operating systems in common use today support filenames that contain
|
||||
arbitrary Unicode characters. Usually this is implemented by converting the
|
||||
Unicode string into some encoding that varies depending on the system. For
|
||||
example, Mac OS X uses UTF-8 while Windows uses a configurable encoding; on
|
||||
Windows, Python uses the name "mbcs" to refer to whatever the currently
|
||||
configured encoding is. On Unix systems, there will only be a filesystem
|
||||
encoding if you've set the ``LANG`` or ``LC_CTYPE`` environment variables; if
|
||||
you haven't, the default encoding is UTF-8.
|
||||
|
||||
The :func:`sys.getfilesystemencoding` function returns the encoding to use on
|
||||
your current system, in case you want to do the encoding manually, but there's
|
||||
not much reason to bother. When opening a file for reading or writing, you can
|
||||
usually just provide the Unicode string as the filename, and it will be
|
||||
automatically converted to the right encoding for you::
|
||||
|
||||
   filename = 'filename\u4500abc'
   with open(filename, 'w') as f:
       f.write('blah\n')
|
||||
|
||||
Functions in the :mod:`os` module such as :func:`os.stat` will also accept Unicode
|
||||
filenames.
|
||||
|
||||
The :func:`os.listdir` function returns filenames and raises an issue: should it return
|
||||
the Unicode version of filenames, or should it return bytes containing
|
||||
the encoded versions? :func:`os.listdir` will do both, depending on whether you
|
||||
provided the directory path as bytes or a Unicode string. If you pass a
|
||||
Unicode string as the path, filenames will be decoded using the filesystem's
|
||||
encoding and a list of Unicode strings will be returned, while passing a byte
|
||||
path will return the filenames as bytes. For example,
|
||||
assuming the default filesystem encoding is UTF-8, running the following
|
||||
program::
|
||||
|
||||
fn = 'filename\u4500abc'
|
||||
f = open(fn, 'w')
|
||||
f.close()
|
||||
|
||||
import os
|
||||
print(os.listdir(b'.'))
|
||||
print(os.listdir('.'))
|
||||
|
||||
will produce the following output:
|
||||
|
||||
.. code-block:: shell-session
|
||||
|
||||
amk:~$ python t.py
|
||||
[b'filename\xe4\x94\x80abc', ...]
|
||||
['filename\u4500abc', ...]
|
||||
|
||||
The first list contains UTF-8-encoded filenames, and the second list contains
|
||||
the Unicode versions.
|
||||
|
||||
Note that on most occasions, the Unicode APIs should be used. The bytes APIs
|
||||
should only be used on systems where undecodable file names can be present,
|
||||
i.e. Unix systems.
|
||||
|
||||
|
||||
Tips for Writing Unicode-aware Programs
|
||||
---------------------------------------
|
||||
|
||||
This section provides some suggestions on writing software that deals with
|
||||
Unicode.
|
||||
|
||||
The most important tip is:
|
||||
|
||||
Software should only work with Unicode strings internally, decoding the input
|
||||
data as soon as possible and encoding the output only at the end.
|
||||
|
||||
If you attempt to write processing functions that accept both Unicode and byte
|
||||
strings, you will find your program vulnerable to bugs wherever you combine the
|
||||
two different kinds of strings. There is no automatic encoding or decoding: if
|
||||
you do e.g. ``str + bytes``, a :exc:`TypeError` will be raised.
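A minimal sketch of this behaviour, with the explicit decode that avoids it; the sample values are arbitrary.

```python
text = 'caf\u00e9'
raw = text.encode('utf-8')

try:
    combined = text + raw            # mixing str and bytes
except TypeError as exc:
    combined = None
    print('TypeError:', exc)

# Decode explicitly at the boundary, then work with str only.
combined_ok = text + raw.decode('utf-8')
print(combined_ok)
```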
|
||||
|
||||
When using data coming from a web browser or some other untrusted source, a
|
||||
common technique is to check for illegal characters in a string before using the
|
||||
string in a generated command line or storing it in a database. If you're doing
|
||||
this, be careful to check the decoded string, not the encoded bytes data;
|
||||
some encodings may have interesting properties, such as not being bijective
|
||||
or not being fully ASCII-compatible. This is especially true if the input
|
||||
data also specifies the encoding, since the attacker can then choose a
|
||||
clever way to hide malicious text in the encoded bytestream.
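As an illustration of why the check belongs on the decoded string, consider UTF-7, which is not fully ASCII-compatible. The payload and the rejected character below are hypothetical; the sketch assumes the input names its own encoding.

```python
payload = b'+ADw-script+AD4-'     # hypothetical attacker-supplied bytes
declared_encoding = 'utf-7'       # encoding named by the input itself

# Checking the raw bytes finds nothing suspicious ...
found_in_bytes = b'<' in payload

# ... but the decoded text contains the character we wanted to reject.
decoded = payload.decode(declared_encoding)
found_in_text = '<' in decoded

print(found_in_bytes, found_in_text, decoded)
```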
|
||||
|
||||
|
||||
Converting Between File Encodings
|
||||
'''''''''''''''''''''''''''''''''
|
||||
|
||||
The :class:`~codecs.StreamRecoder` class can transparently convert between
|
||||
encodings, taking a stream that returns data in encoding #1
|
||||
and behaving like a stream returning data in encoding #2.
|
||||
|
||||
For example, if you have an input file *f* that's in Latin-1, you
|
||||
can wrap it with a :class:`~codecs.StreamRecoder` to return bytes encoded in
|
||||
UTF-8::
|
||||
|
||||
   import codecs

   # f is assumed to be a file object opened in binary mode.
   new_f = codecs.StreamRecoder(f,
       # en/decoder: used by read() to encode its results and
       # by write() to decode its input.
       codecs.getencoder('utf-8'), codecs.getdecoder('utf-8'),

       # reader/writer: used to read and write to the stream.
       codecs.getreader('latin-1'), codecs.getwriter('latin-1'))
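For a version that can be run as-is, the sketch below substitutes an in-memory :class:`io.BytesIO` object for the Latin-1 input file; that stand-in and the sample text are assumptions made for the example.

```python
import codecs
import io

# Stand-in for a binary file containing Latin-1 data.
f = io.BytesIO('caf\u00e9'.encode('latin-1'))

new_f = codecs.StreamRecoder(
    f,
    codecs.getencoder('utf-8'), codecs.getdecoder('utf-8'),
    codecs.getreader('latin-1'), codecs.getwriter('latin-1'))

# read() decodes the Latin-1 bytes and hands them back encoded as UTF-8.
utf8_bytes = new_f.read()
print(utf8_bytes)
```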
|
||||
|
||||
|
||||
Files in an Unknown Encoding
|
||||
''''''''''''''''''''''''''''
|
||||
|
||||
What can you do if you need to make a change to a file, but don't know
|
||||
the file's encoding? If you know the encoding is ASCII-compatible and
|
||||
only want to examine or modify the ASCII parts, you can open the file
|
||||
with the ``surrogateescape`` error handler::
|
||||
|
||||
   with open(fname, 'r', encoding="ascii", errors="surrogateescape") as f:
       data = f.read()

   # make changes to the string 'data'

   with open(fname + '.new', 'w',
             encoding="ascii", errors="surrogateescape") as f:
       f.write(data)
|
||||
|
||||
The ``surrogateescape`` error handler will decode any non-ASCII bytes
as code points in a special range running from U+DC80 to U+DCFF.
(These are lone low-surrogate code points, not part of a Private Use
Area.) These code points will then be turned back into the
same bytes when the ``surrogateescape`` error handler is used to
encode the data and write it back out.
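The round trip can be demonstrated on a bytes object directly. In this sketch the input bytes and the edit applied to them are made up for illustration.

```python
raw = b'name=caf\xe9\n'     # bytes in an unknown, ASCII-compatible encoding

text = raw.decode('ascii', errors='surrogateescape')
print(ascii(text))          # the byte 0xE9 shows up as the code point U+DCE9

text = text.replace('name', 'title')    # edit only the ASCII part

out = text.encode('ascii', errors='surrogateescape')
print(out)                  # the undecodable byte is restored unchanged
```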
|
||||
|
||||
|
||||
References
|
||||
----------
|
||||
|
||||
One section of `Mastering Python 3 Input/Output
|
||||
<http://pyvideo.org/video/289/pycon-2010--mastering-python-3-i-o>`_,
|
||||
a PyCon 2010 talk by David Beazley, discusses text processing and binary data handling.
|
||||
|
||||
The `PDF slides for Marc-André Lemburg's presentation "Writing Unicode-aware
|
||||
Applications in Python"
|
||||
<https://downloads.egenix.com/python/LSM2005-Developing-Unicode-aware-applications-in-Python.pdf>`_
|
||||
discuss questions of character encodings as well as how to internationalize
|
||||
and localize an application. These slides cover Python 2.x only.
|
||||
|
||||
`The Guts of Unicode in Python
|
||||
<http://pyvideo.org/video/1768/the-guts-of-unicode-in-python>`_
|
||||
is a PyCon 2013 talk by Benjamin Peterson that discusses the internal Unicode
|
||||
representation in Python 3.3.
|
||||
|
||||
|
||||
Acknowledgements
|
||||
================
|
||||
|
||||
The initial draft of this document was written by Andrew Kuchling.
|
||||
It has since been revised further by Alexander Belopolsky, Georg Brandl,
|
||||
Andrew Kuchling, and Ezio Melotti.
|
||||
|
||||
Thanks to the following people who have noted errors or offered
|
||||
suggestions on this article: Éric Araujo, Nicholas Bastin, Nick
|
||||
Coghlan, Marius Gedminas, Kent Johnson, Ken Krugler, Marc-André
|
||||
Lemburg, Martin von Löwis, Terry J. Reedy, Chad Whitacre.
|
605
third_party/python/Doc/howto/urllib2.rst
vendored
Normal file
|
@ -0,0 +1,605 @@
|
|||
.. _urllib-howto:
|
||||
|
||||
***********************************************************
|
||||
HOWTO Fetch Internet Resources Using The urllib Package
|
||||
***********************************************************
|
||||
|
||||
:Author: `Michael Foord <http://www.voidspace.org.uk/python/index.shtml>`_
|
||||
|
||||
.. note::
|
||||
|
||||
There is a French translation of an earlier revision of this
|
||||
HOWTO, available at `urllib2 - Le Manuel manquant
|
||||
<http://www.voidspace.org.uk/python/articles/urllib2_francais.shtml>`_.
|
||||
|
||||
|
||||
|
||||
Introduction
|
||||
============
|
||||
|
||||
.. sidebar:: Related Articles
|
||||
|
||||
You may also find useful the following article on fetching web resources
|
||||
with Python:
|
||||
|
||||
* `Basic Authentication <http://www.voidspace.org.uk/python/articles/authentication.shtml>`_
|
||||
|
||||
A tutorial on *Basic Authentication*, with examples in Python.
|
||||
|
||||
**urllib.request** is a Python module for fetching URLs
|
||||
(Uniform Resource Locators). It offers a very simple interface, in the form of
|
||||
the *urlopen* function. This is capable of fetching URLs using a variety of
|
||||
different protocols. It also offers a slightly more complex interface for
|
||||
handling common situations - like basic authentication, cookies, proxies and so
|
||||
on. These are provided by objects called handlers and openers.
|
||||
|
||||
urllib.request supports fetching URLs for many "URL schemes" (identified by the string
|
||||
before the ``":"`` in the URL - for example ``"ftp"`` is the URL scheme of
|
||||
``"ftp://python.org/"``) using their associated network protocols (e.g. FTP, HTTP).
|
||||
This tutorial focuses on the most common case, HTTP.
|
||||
|
||||
For straightforward situations *urlopen* is very easy to use. But as soon as you
|
||||
encounter errors or non-trivial cases when opening HTTP URLs, you will need some
|
||||
understanding of the HyperText Transfer Protocol. The most comprehensive and
|
||||
authoritative reference to HTTP is :rfc:`2616`. This is a technical document and
|
||||
not intended to be easy to read. This HOWTO aims to illustrate using *urllib*,
|
||||
with enough detail about HTTP to help you through. It is not intended to replace
|
||||
the :mod:`urllib.request` docs, but is supplementary to them.
|
||||
|
||||
|
||||
Fetching URLs
|
||||
=============
|
||||
|
||||
The simplest way to use urllib.request is as follows::
|
||||
|
||||
   import urllib.request
   with urllib.request.urlopen('http://python.org/') as response:
       html = response.read()
|
||||
|
||||
If you wish to retrieve a resource via URL and store it in a temporary
|
||||
location, you can do so via the :func:`shutil.copyfileobj` and
|
||||
:func:`tempfile.NamedTemporaryFile` functions::
|
||||
|
||||
   import shutil
   import tempfile
   import urllib.request

   with urllib.request.urlopen('http://python.org/') as response:
       with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
           shutil.copyfileobj(response, tmp_file)

   with open(tmp_file.name) as html:
       pass
|
||||
|
||||
Many uses of urllib will be that simple (note that instead of an 'http:' URL we
|
||||
could have used a URL starting with 'ftp:', 'file:', etc.). However, it's the
|
||||
purpose of this tutorial to explain the more complicated cases, concentrating on
|
||||
HTTP.
|
||||
|
||||
HTTP is based on requests and responses - the client makes requests and servers
|
||||
send responses. urllib.request mirrors this with a ``Request`` object which represents
|
||||
the HTTP request you are making. In its simplest form you create a Request
|
||||
object that specifies the URL you want to fetch. Calling ``urlopen`` with this
|
||||
Request object returns a response object for the URL requested. This response is
|
||||
a file-like object, which means you can for example call ``.read()`` on the
|
||||
response::
|
||||
|
||||
   import urllib.request

   req = urllib.request.Request('http://www.voidspace.org.uk')
   with urllib.request.urlopen(req) as response:
       the_page = response.read()
|
||||
|
||||
Note that urllib.request makes use of the same Request interface to handle all URL
|
||||
schemes. For example, you can make an FTP request like so::
|
||||
|
||||
req = urllib.request.Request('ftp://example.com/')
|
||||
|
||||
In the case of HTTP, there are two extra things that Request objects allow you
|
||||
to do: First, you can pass data to be sent to the server. Second, you can pass
|
||||
extra information ("metadata") *about* the data or about the request itself, to
|
||||
the server - this information is sent as HTTP "headers". Let's look at each of
|
||||
these in turn.
|
||||
|
||||
Data
|
||||
----
|
||||
|
||||
Sometimes you want to send data to a URL (often the URL will refer to a CGI
|
||||
(Common Gateway Interface) script or other web application). With HTTP,
|
||||
this is often done using what's known as a **POST** request. This is often what
|
||||
your browser does when you submit an HTML form that you filled in on the web. Not
|
||||
all POSTs have to come from forms: you can use a POST to transmit arbitrary data
|
||||
to your own application. In the common case of HTML forms, the data needs to be
|
||||
encoded in a standard way, and then passed to the Request object as the ``data``
|
||||
argument. The encoding is done using a function from the :mod:`urllib.parse`
|
||||
library. ::
|
||||
|
||||
   import urllib.parse
   import urllib.request

   url = 'http://www.someserver.com/cgi-bin/register.cgi'
   values = {'name' : 'Michael Foord',
             'location' : 'Northampton',
             'language' : 'Python' }

   data = urllib.parse.urlencode(values)
   data = data.encode('ascii') # data should be bytes
   req = urllib.request.Request(url, data)
   with urllib.request.urlopen(req) as response:
       the_page = response.read()
|
||||
|
||||
Note that other encodings are sometimes required (e.g. for file upload from HTML
|
||||
forms - see `HTML Specification, Form Submission
|
||||
<https://www.w3.org/TR/REC-html40/interact/forms.html#h-17.13>`_ for more
|
||||
details).
|
||||
|
||||
If you do not pass the ``data`` argument, urllib uses a **GET** request. One
|
||||
way in which GET and POST requests differ is that POST requests often have
|
||||
"side-effects": they change the state of the system in some way (for example by
|
||||
placing an order with the website for a hundredweight of tinned spam to be
|
||||
delivered to your door). Though the HTTP standard makes it clear that POSTs are
|
||||
intended to *always* cause side-effects, and GET requests *never* to cause
|
||||
side-effects, nothing prevents a GET request from having side-effects, nor a
|
||||
POST request from having no side-effects. Data can also be passed in an HTTP
|
||||
GET request by encoding it in the URL itself.
|
||||
|
||||
This is done as follows::
|
||||
|
||||
>>> import urllib.request
|
||||
>>> import urllib.parse
|
||||
>>> data = {}
|
||||
>>> data['name'] = 'Somebody Here'
|
||||
>>> data['location'] = 'Northampton'
|
||||
>>> data['language'] = 'Python'
|
||||
>>> url_values = urllib.parse.urlencode(data)
|
||||
>>> print(url_values) # The order may differ from below. #doctest: +SKIP
|
||||
name=Somebody+Here&language=Python&location=Northampton
|
||||
>>> url = 'http://www.example.com/example.cgi'
|
||||
>>> full_url = url + '?' + url_values
|
||||
>>> data = urllib.request.urlopen(full_url)
|
||||
|
||||
Notice that the full URL is created by adding a ``?`` to the URL, followed by
|
||||
the encoded values.
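Whether a request will be sent as GET or POST can be checked on the ``Request`` object without contacting a server. The following sketch reuses the example URL from above; the variable names are chosen for illustration.

```python
import urllib.parse
import urllib.request

url = 'http://www.example.com/example.cgi'
query = urllib.parse.urlencode({'name': 'Somebody Here'})

# No data argument: the values travel in the URL and the method is GET.
get_req = urllib.request.Request(url + '?' + query)

# With a bytes data argument the method becomes POST.
post_req = urllib.request.Request(url, data=query.encode('ascii'))

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.data)
```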
|
||||
|
||||
Headers
|
||||
-------
|
||||
|
||||
We'll discuss here one particular HTTP header, to illustrate how to add headers
|
||||
to your HTTP request.
|
||||
|
||||
Some websites [#]_ dislike being browsed by programs, or send different versions
|
||||
to different browsers [#]_. By default urllib identifies itself as
``Python-urllib/x.y`` (where ``x`` and ``y`` are the major and minor version
numbers of the Python release,
e.g. ``Python-urllib/3.6``), which may confuse the site, or the request may
simply not work. The way a browser identifies itself is through the
|
||||
``User-Agent`` header [#]_. When you create a Request object you can
|
||||
pass a dictionary of headers in. The following example makes the same
|
||||
request as above, but identifies itself as a version of Internet
|
||||
Explorer [#]_. ::
|
||||
|
||||
   import urllib.parse
   import urllib.request

   url = 'http://www.someserver.com/cgi-bin/register.cgi'
   user_agent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'
   values = {'name': 'Michael Foord',
             'location': 'Northampton',
             'language': 'Python' }
   headers = {'User-Agent': user_agent}

   data = urllib.parse.urlencode(values)
   data = data.encode('ascii')
   req = urllib.request.Request(url, data, headers)
   with urllib.request.urlopen(req) as response:
       the_page = response.read()
|
||||
|
||||
The response also has two useful methods. See the section on `info and geturl`_
|
||||
which comes after we have a look at what happens when things go wrong.
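Returning briefly to the ``headers`` argument: headers can also be added after the ``Request`` has been created, and the result can be inspected before anything is sent. This is a small sketch; the ``Accept-Language`` header is an extra chosen for illustration.

```python
import urllib.request

req = urllib.request.Request(
    'http://www.someserver.com/cgi-bin/register.cgi',
    headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'})
req.add_header('Accept-Language', 'en')

# Header names are stored capitalized ('User-agent'), so inspect the
# stored items rather than relying on the spelling you passed in.
stored = dict(req.header_items())
for name, value in sorted(stored.items()):
    print(name, '=', value)
```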
|
||||
|
||||
|
||||
Handling Exceptions
|
||||
===================
|
||||
|
||||
*urlopen* raises :exc:`URLError` when it cannot handle a response (though as
|
||||
usual with Python APIs, built-in exceptions such as :exc:`ValueError`,
|
||||
:exc:`TypeError` etc. may also be raised).
|
||||
|
||||
:exc:`HTTPError` is the subclass of :exc:`URLError` raised in the specific case of
|
||||
HTTP URLs.
|
||||
|
||||
The exception classes are exported from the :mod:`urllib.error` module.
|
||||
|
||||
URLError
|
||||
--------
|
||||
|
||||
Often, URLError is raised because there is no network connection (no route to
|
||||
the specified server), or the specified server doesn't exist. In this case, the
|
||||
exception raised will have a 'reason' attribute, which is a tuple containing an
|
||||
error code and a text error message.
|
||||
|
||||
e.g. ::
|
||||

   >>> req = urllib.request.Request('http://www.pretend_server.org')
   >>> try: urllib.request.urlopen(req)
   ... except urllib.error.URLError as e:
   ...     print(e.reason)      #doctest: +SKIP
   ...
   (4, 'getaddrinfo failed')
|
||||
|
||||
|
||||
HTTPError
|
||||
---------
|
||||
|
||||
Every HTTP response from the server contains a numeric "status code". Sometimes
|
||||
the status code indicates that the server is unable to fulfil the request. The
|
||||
default handlers will handle some of these responses for you (for example, if
|
||||
the response is a "redirection" that requests the client fetch the document from
|
||||
a different URL, urllib will handle that for you). For those it can't handle,
|
||||
urlopen will raise an :exc:`HTTPError`. Typical errors include '404' (page not
|
||||
found), '403' (request forbidden), and '401' (authentication required).
|
||||
|
||||
See section 10 of :rfc:`2616` for a reference on all the HTTP error codes.
|
||||
|
||||
The :exc:`HTTPError` instance raised will have an integer 'code' attribute, which
|
||||
corresponds to the error sent by the server.
|
||||
|
||||
Error Codes
|
||||
~~~~~~~~~~~
|
||||
|
||||
Because the default handlers handle redirects (codes in the 300 range), and
|
||||
codes in the 100--299 range indicate success, you will usually only see error
|
||||
codes in the 400--599 range.
|
||||
|
||||
:attr:`http.server.BaseHTTPRequestHandler.responses` is a useful dictionary of
|
||||
response codes that shows all the response codes used by :rfc:`2616`. The
|
||||
dictionary is reproduced here for convenience ::
|
||||
|
||||
# Table mapping response codes to messages; entries have the
|
||||
# form {code: (shortmessage, longmessage)}.
|
||||
responses = {
|
||||
100: ('Continue', 'Request received, please continue'),
|
||||
101: ('Switching Protocols',
|
||||
'Switching to new protocol; obey Upgrade header'),
|
||||
|
||||
200: ('OK', 'Request fulfilled, document follows'),
|
||||
201: ('Created', 'Document created, URL follows'),
|
||||
202: ('Accepted',
|
||||
'Request accepted, processing continues off-line'),
|
||||
203: ('Non-Authoritative Information', 'Request fulfilled from cache'),
|
||||
204: ('No Content', 'Request fulfilled, nothing follows'),
|
||||
205: ('Reset Content', 'Clear input form for further input.'),
|
||||
206: ('Partial Content', 'Partial content follows.'),
|
||||
|
||||
300: ('Multiple Choices',
|
||||
'Object has several resources -- see URI list'),
|
||||
301: ('Moved Permanently', 'Object moved permanently -- see URI list'),
|
||||
302: ('Found', 'Object moved temporarily -- see URI list'),
|
||||
303: ('See Other', 'Object moved -- see Method and URL list'),
|
||||
304: ('Not Modified',
|
||||
'Document has not changed since given time'),
|
||||
305: ('Use Proxy',
|
||||
'You must use proxy specified in Location to access this '
|
||||
'resource.'),
|
||||
307: ('Temporary Redirect',
|
||||
'Object moved temporarily -- see URI list'),
|
||||
|
||||
400: ('Bad Request',
|
||||
'Bad request syntax or unsupported method'),
|
||||
401: ('Unauthorized',
|
||||
'No permission -- see authorization schemes'),
|
||||
402: ('Payment Required',
|
||||
'No payment -- see charging schemes'),
|
||||
403: ('Forbidden',
|
||||
'Request forbidden -- authorization will not help'),
|
||||
404: ('Not Found', 'Nothing matches the given URI'),
|
||||
405: ('Method Not Allowed',
|
||||
'Specified method is invalid for this server.'),
|
||||
406: ('Not Acceptable', 'URI not available in preferred format.'),
|
||||
407: ('Proxy Authentication Required', 'You must authenticate with '
|
||||
'this proxy before proceeding.'),
|
||||
408: ('Request Timeout', 'Request timed out; try again later.'),
|
||||
409: ('Conflict', 'Request conflict.'),
|
||||
410: ('Gone',
|
||||
'URI no longer exists and has been permanently removed.'),
|
||||
411: ('Length Required', 'Client must specify Content-Length.'),
|
||||
412: ('Precondition Failed', 'Precondition in headers is false.'),
|
||||
413: ('Request Entity Too Large', 'Entity is too large.'),
|
||||
414: ('Request-URI Too Long', 'URI is too long.'),
|
||||
415: ('Unsupported Media Type', 'Entity body in unsupported format.'),
|
||||
416: ('Requested Range Not Satisfiable',
|
||||
'Cannot satisfy request range.'),
|
||||
417: ('Expectation Failed',
|
||||
'Expect condition could not be satisfied.'),
|
||||
|
||||
500: ('Internal Server Error', 'Server got itself in trouble'),
|
||||
501: ('Not Implemented',
|
||||
'Server does not support this operation'),
|
||||
502: ('Bad Gateway', 'Invalid responses from another server/proxy.'),
|
||||
503: ('Service Unavailable',
|
||||
'The server cannot process the request due to a high load'),
|
||||
504: ('Gateway Timeout',
|
||||
'The gateway server did not receive a timely response'),
|
||||
505: ('HTTP Version Not Supported', 'Cannot fulfill request.'),
|
||||
}
|
||||
|
||||
When an error is raised the server responds by returning an HTTP error code
*and* an error page. You can treat the :exc:`HTTPError` instance as the response
for the page returned. This means that as well as the ``code`` attribute, it
also has the ``read``, ``geturl``, and ``info`` methods of the response objects
defined in the ``urllib.response`` module::
|
||||
|
||||
    >>> req = urllib.request.Request('http://www.python.org/fish.html')
    >>> try:
    ...     urllib.request.urlopen(req)
    ... except urllib.error.HTTPError as e:
    ...     print(e.code)
    ...     print(e.read())  #doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
    ...
    404
    b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html
      ...
      <title>Page Not Found</title>\n
      ...
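
Because the exception doubles as a response, the same attributes can be
explored without a network connection. The following is only a sketch: it
builds an :exc:`HTTPError` by hand, and the URL, headers and page body are
made-up values chosen for illustration::

    import io
    from email.message import Message
    from urllib.error import HTTPError

    # Hypothetical headers and body standing in for a server's error page.
    headers = Message()
    headers['Content-Type'] = 'text/html'
    page = io.BytesIO(b'<title>Page Not Found</title>')

    try:
        raise HTTPError('http://www.example.com/fish.html', 404,
                        'Not Found', headers, page)
    except HTTPError as e:
        status = e.code                          # numeric HTTP status
        final_url = e.geturl()                   # URL the error refers to
        content_type = e.info()['Content-Type']  # headers of the error page
        body = e.read()                          # the error page itself

    print(status, final_url, content_type, body)
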
|
||||
|
||||
Wrapping it Up
|
||||
--------------
|
||||
|
||||
So if you want to be prepared for :exc:`HTTPError` *or* :exc:`URLError` there are two
|
||||
basic approaches. I prefer the second approach.
|
||||
|
||||
Number 1
|
||||
~~~~~~~~
|
||||
|
||||
::
|
||||
|
||||
|
||||
    from urllib.request import Request, urlopen
    from urllib.error import URLError, HTTPError

    # someurl is a placeholder for the URL you want to fetch
    req = Request(someurl)
    try:
        response = urlopen(req)
    except HTTPError as e:
        print('The server couldn\'t fulfill the request.')
        print('Error code: ', e.code)
    except URLError as e:
        print('We failed to reach a server.')
        print('Reason: ', e.reason)
    else:
        # everything is fine
        pass
|
||||
|
||||
|
||||
.. note::

    The ``except HTTPError`` *must* come first, otherwise ``except URLError``
    will *also* catch an :exc:`HTTPError`.
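
The ordering matters because :exc:`HTTPError` is a subclass of
:exc:`URLError`. As a quick check (the variable name is only illustrative)::

    from urllib.error import HTTPError, URLError

    # An except clause for the parent class also matches the subclass.
    is_subclass = issubclass(HTTPError, URLError)
    print(is_subclass)
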
|
||||
|
||||
Number 2
|
||||
~~~~~~~~
|
||||
|
||||
::
|
||||
|
||||
    from urllib.request import Request, urlopen
    from urllib.error import URLError

    # someurl is a placeholder for the URL you want to fetch
    req = Request(someurl)
    try:
        response = urlopen(req)
    except URLError as e:
        # Test for 'code' first: an HTTPError also has a 'reason' attribute.
        if hasattr(e, 'code'):
            print('The server couldn\'t fulfill the request.')
            print('Error code: ', e.code)
        elif hasattr(e, 'reason'):
            print('We failed to reach a server.')
            print('Reason: ', e.reason)
    else:
        # everything is fine
        pass
|
||||
|
||||
|
||||
info and geturl
|
||||
===============
|
||||
|
||||
The response returned by urlopen (or the :exc:`HTTPError` instance) has two
useful methods, :meth:`info` and :meth:`geturl`, and is defined in the module
:mod:`urllib.response`.
|
||||
|
||||
**geturl** - this returns the real URL of the page fetched. This is useful
|
||||
because ``urlopen`` (or the opener object used) may have followed a
|
||||
redirect. The URL of the page fetched may not be the same as the URL requested.
|
||||
|
||||
**info** - this returns a dictionary-like object that describes the page
|
||||
fetched, particularly the headers sent by the server. It is currently an
|
||||
:class:`http.client.HTTPMessage` instance.
|
||||
|
||||
Typical headers include 'Content-length', 'Content-type', and so on. See the
|
||||
`Quick Reference to HTTP Headers <https://www.cs.tut.fi/~jkorpela/http.html>`_
|
||||
for a useful listing of HTTP headers with brief explanations of their meaning
|
||||
and use.
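
A short sketch of both methods follows. It uses a ``data:`` URL so that it runs
without network access; that URL is an assumption made for the example, and
with an ``http`` URL the headers would be the ones sent by the server::

    from urllib.request import urlopen

    # Made-up data: URL; nothing is fetched over the network.
    url = 'data:text/plain;charset=utf-8,hello'

    with urlopen(url) as response:
        final_url = response.geturl()   # URL actually fetched
        headers = response.info()       # dictionary-like header object
        body = response.read()

    print(final_url)
    print(headers['Content-type'])
    print(body)
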
|
||||
|
||||
|
||||
Openers and Handlers
|
||||
====================
|
||||
|
||||
When you fetch a URL you use an opener (an instance of the perhaps
|
||||
confusingly-named :class:`urllib.request.OpenerDirector`). Normally we have been using
|
||||
the default opener - via ``urlopen`` - but you can create custom
|
||||
openers. Openers use handlers. All the "heavy lifting" is done by the
|
||||
handlers. Each handler knows how to open URLs for a particular URL scheme (http,
|
||||
ftp, etc.), or how to handle an aspect of URL opening, for example HTTP
|
||||
redirections or HTTP cookies.
|
||||
|
||||
You will want to create openers if you want to fetch URLs with specific handlers
|
||||
installed, for example to get an opener that handles cookies, or to get an
|
||||
opener that does not handle redirections.
|
||||
|
||||
To create an opener, instantiate an ``OpenerDirector``, and then call
|
||||
``.add_handler(some_handler_instance)`` repeatedly.
|
||||
|
||||
Alternatively, you can use ``build_opener``, which is a convenience function for
|
||||
creating opener objects with a single function call. ``build_opener`` adds
|
||||
several handlers by default, but provides a quick way to add more and/or
|
||||
override the default handlers.
|
||||
|
||||
Other sorts of handlers you might want to use can handle proxies,
authentication, and other common but slightly specialised situations.
|
||||
|
||||
``install_opener`` can be used to make an ``opener`` object the (global) default
|
||||
opener. This means that calls to ``urlopen`` will use the opener you have
|
||||
installed.
|
||||
|
||||
Opener objects have an ``open`` method, which can be called directly to fetch
URLs in the same way as the ``urlopen`` function: there's no need to call
``install_opener``, except as a convenience.
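
Both ways of creating an opener can be sketched as follows. The ``data:`` URL
and the variable names are assumptions chosen so that the example runs without
network access; ``DataHandler`` is simply the one handler this URL needs::

    import urllib.request

    # Made-up data: URL, used so the sketch needs no network access.
    url = 'data:text/plain,hello'

    # Build an opener by hand and add only the handler we need.
    manual_opener = urllib.request.OpenerDirector()
    manual_opener.add_handler(urllib.request.DataHandler())
    with manual_opener.open(url) as response:
        manual_body = response.read()

    # build_opener() returns an opener with the default handlers installed.
    default_opener = urllib.request.build_opener()
    with default_opener.open(url) as response:
        default_body = response.read()

    print(manual_body, default_body)

Neither opener is installed globally here, so ``urlopen`` is unaffected.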
|
||||
|
||||
|
||||
Basic Authentication
|
||||
====================
|
||||
|
||||
To illustrate creating and installing a handler we will use the
``HTTPBasicAuthHandler``. For a more detailed discussion of this subject --
including an explanation of how Basic Authentication works -- see the `Basic
Authentication Tutorial
<http://www.voidspace.org.uk/python/articles/authentication.shtml>`_.
|
||||
|
||||
When authentication is required, the server sends a header (as well as the 401
|
||||
error code) requesting authentication. This specifies the authentication scheme
|
||||
and a 'realm'. The header looks like: ``WWW-Authenticate: SCHEME
|
||||
realm="REALM"``.
|
||||
|
||||
e.g.
|
||||
|
||||
.. code-block:: none

    WWW-Authenticate: Basic realm="cPanel Users"
|
||||
|
||||
|
||||
The client should then retry the request with the appropriate name and password
|
||||
for the realm included as a header in the request. This is 'basic
|
||||
authentication'. In order to simplify this process we can create an instance of
|
||||
``HTTPBasicAuthHandler`` and an opener to use this handler.
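
To make the retry step concrete, here is a sketch of how the value of the
``Authorization`` header is formed for basic authentication: the name and
password are joined with a colon and base64-encoded. The credentials below are
hypothetical, and the handler normally does this for you::

    import base64

    # Hypothetical credentials, for illustration only.
    username = 'joe'
    password = 'secret'

    credentials = '{}:{}'.format(username, password).encode('utf-8')
    header_value = 'Basic ' + base64.b64encode(credentials).decode('ascii')
    print('Authorization:', header_value)

Note that base64 is an encoding, not encryption, so the password is easily
recovered by anyone who can read the request.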
|
||||
|
||||
The ``HTTPBasicAuthHandler`` uses an object called a password manager to handle
|
||||
the mapping of URLs and realms to passwords and usernames. If you know what the
|
||||
realm is (from the authentication header sent by the server), then you can use a
|
||||
``HTTPPasswordMgr``. Frequently one doesn't care what the realm is. In that
|
||||
case, it is convenient to use ``HTTPPasswordMgrWithDefaultRealm``. This allows
|
||||
you to specify a default username and password for a URL. This will be supplied
|
||||
in the absence of you providing an alternative combination for a specific
|
||||
realm. We indicate this by providing ``None`` as the realm argument to the
|
||||
``add_password`` method.
|
||||
|
||||
The top-level URL is the first URL that requires authentication. URLs "deeper"
than the URL you pass to ``.add_password()`` will also match. ::
|
||||
|
||||
    import urllib.request

    # username, password and a_url are placeholders for your own values

    # create a password manager
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()

    # Add the username and password.
    # If we knew the realm, we could use it instead of None.
    top_level_url = "http://example.com/foo/"
    password_mgr.add_password(None, top_level_url, username, password)

    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)

    # create "opener" (OpenerDirector instance)
    opener = urllib.request.build_opener(handler)

    # use the opener to fetch a URL
    opener.open(a_url)

    # Install the opener.
    # Now all calls to urllib.request.urlopen use our opener.
    urllib.request.install_opener(opener)
|
||||
|
||||
.. note::

    In the above example we only supplied our ``HTTPBasicAuthHandler`` to
    ``build_opener``. By default openers have the handlers for normal situations
    -- ``ProxyHandler`` (if a proxy setting such as an :envvar:`http_proxy`
    environment variable is set), ``UnknownHandler``, ``HTTPHandler``,
    ``HTTPDefaultErrorHandler``, ``HTTPRedirectHandler``, ``FTPHandler``,
    ``FileHandler``, ``DataHandler``, ``HTTPErrorProcessor``.
|
||||
|
||||
``top_level_url`` is in fact *either* a full URL (including the 'http:' scheme
|
||||
component and the hostname and optionally the port number)
|
||||
e.g. ``"http://example.com/"`` *or* an "authority" (i.e. the hostname,
|
||||
optionally including the port number) e.g. ``"example.com"`` or ``"example.com:8080"``
|
||||
(the latter example includes a port number). The authority, if present, must
|
||||
NOT contain the "userinfo" component - for example ``"joe:password@example.com"`` is
|
||||
not correct.
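
The matching rules can be tried out directly on a password manager, without
contacting a server. This is a sketch only: the credentials, realm name and
URLs are invented for the example::

    from urllib.request import HTTPPasswordMgrWithDefaultRealm

    password_mgr = HTTPPasswordMgrWithDefaultRealm()
    # None as the realm makes this the default for any realm at this URL.
    password_mgr.add_password(None, 'http://example.com/foo/', 'joe', 'secret')

    # A URL "deeper" than the top-level URL matches, whatever the realm.
    deeper = password_mgr.find_user_password(
        'Some Realm', 'http://example.com/foo/bar/page.html')

    # A URL outside the top-level URL does not match.
    outside = password_mgr.find_user_password(
        'Some Realm', 'http://example.com/other/')

    print(deeper)
    print(outside)
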
|
||||
|
||||
|
||||
Proxies
|
||||
=======
|
||||
|
||||
**urllib** will auto-detect your proxy settings and use those. This is through
the ``ProxyHandler``, which is part of the normal handler chain when a proxy
setting is detected. Normally that's a good thing, but there are occasions
when it may not be helpful [#]_. One way to turn it off is to set up our own
``ProxyHandler``, with no proxies defined. This is done using similar steps to
setting up a `Basic Authentication`_ handler::
|
||||
|
||||
    >>> proxy_support = urllib.request.ProxyHandler({})
    >>> opener = urllib.request.build_opener(proxy_support)
    >>> urllib.request.install_opener(opener)
|
||||
|
||||
.. note::

    Currently ``urllib.request`` *does not* support fetching of ``https`` locations
    through a proxy. However, this can be enabled by extending urllib.request as
    shown in the recipe [#]_.
|
||||
|
||||
.. note::

    ``HTTP_PROXY`` will be ignored if a variable ``REQUEST_METHOD`` is set; see
    the documentation on :func:`~urllib.request.getproxies`.
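
To see which proxy settings would be picked up on your machine, you can call
:func:`~urllib.request.getproxies` yourself. This is a small sketch; the result
depends entirely on your environment and may well be empty::

    import urllib.request

    # Mapping of URL scheme to proxy URL; empty when no proxy is configured.
    proxies = urllib.request.getproxies()
    for scheme, proxy_url in sorted(proxies.items()):
        print(scheme, '->', proxy_url)
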
|
||||
|
||||
|
||||
Sockets and Layers
|
||||
==================
|
||||
|
||||
The Python support for fetching resources from the web is layered. urllib uses
|
||||
the :mod:`http.client` library, which in turn uses the socket library.
|
||||
|
||||
As of Python 2.3 you can specify how long a socket should wait for a response
before timing out. This can be useful in applications which have to fetch web
pages. By default the socket module has *no timeout* and can hang.
``urlopen`` accepts an optional ``timeout`` argument (in seconds) for a single
call. Alternatively, you can set the default timeout globally for all sockets
using ::
|
||||
|
||||
    import socket
    import urllib.request

    # timeout in seconds
    timeout = 10
    socket.setdefaulttimeout(timeout)

    # this call to urllib.request.urlopen now uses the default timeout
    # we have set in the socket module
    req = urllib.request.Request('http://www.voidspace.org.uk')
    response = urllib.request.urlopen(req)
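
Because the default timeout is global, it affects every socket created
afterwards in the same process. One cautious pattern, sketched here with
illustrative variable names, is to remember the previous value and restore it
once you are done::

    import socket

    # None means "no timeout", which is the default.
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(10)
    try:
        # Sockets created here, including those used by urlopen,
        # time out after 10 seconds.
        current = socket.getdefaulttimeout()
    finally:
        # Put the old setting back for the rest of the program.
        socket.setdefaulttimeout(previous)

    print(current)
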
|
||||
|
||||
|
||||
-------
|
||||
|
||||
|
||||
Footnotes
|
||||
=========
|
||||
|
||||
This document was reviewed and revised by John Lee.
|
||||
|
||||
.. [#] Google for example.
.. [#] Browser sniffing is a very bad practice for website design - building
       sites using web standards is much more sensible. Unfortunately a lot of
       sites still send different versions to different browsers.
.. [#] The user agent for MSIE 6 is
       *'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)'*
.. [#] For details of more HTTP request headers, see
       `Quick Reference to HTTP Headers`_.
.. [#] In my case I have to use a proxy to access the internet at work. If you
       attempt to fetch *localhost* URLs through this proxy it blocks them. IE
       is set to use the proxy, which urllib picks up on. In order to test
       scripts with a localhost server, I have to prevent urllib from using
       the proxy.
.. [#] urllib opener for SSL proxy (CONNECT method): `ASPN Cookbook Recipe
       <https://code.activestate.com/recipes/456195/>`_.
|
||||
|