Internationalized Domain Names in Applications (IDNA)
=====================================================

Support for the Internationalized Domain Names in
Applications (IDNA) protocol as specified in `RFC 5891
<https://tools.ietf.org/html/rfc5891>`_. This is the latest version of
the protocol and is sometimes referred to as “IDNA 2008”.

This library also provides support for Unicode Technical
Standard 46, `Unicode IDNA Compatibility Processing
<https://unicode.org/reports/tr46/>`_.

This library is a suitable replacement for the ``encodings.idna``
module that comes with the Python standard library, which
only supports the older, superseded IDNA specification (`RFC 3490
<https://tools.ietf.org/html/rfc3490>`_).

Basic functions are simply executed:

.. code-block:: pycon

    >>> import idna
    >>> idna.encode('ドメイン.テスト')
    b'xn--eckwd4c7c.xn--zckzah'
    >>> print(idna.decode('xn--eckwd4c7c.xn--zckzah'))
    ドメイン.テスト

Installation
------------

This package is available for installation from PyPI:

.. code-block:: bash

    $ python3 -m pip install idna

Usage
-----

For typical usage, the ``encode`` and ``decode`` functions will take a
domain name argument and perform a conversion to A-labels or U-labels
respectively.

.. code-block:: pycon

    >>> import idna
    >>> idna.encode('ドメイン.テスト')
    b'xn--eckwd4c7c.xn--zckzah'
    >>> print(idna.decode('xn--eckwd4c7c.xn--zckzah'))
    ドメイン.テスト

You may use the codec encoding and decoding methods using the
``idna.codec`` module:

.. code-block:: pycon

    >>> import idna.codec
    >>> print('домен.испытание'.encode('idna2008'))
    b'xn--d1acufc.xn--80akhbyknj4f'
    >>> print(b'xn--d1acufc.xn--80akhbyknj4f'.decode('idna2008'))
    домен.испытание

Conversions can be applied on a per-label basis using the ``ulabel`` or
``alabel`` functions if necessary:

.. code-block:: pycon

    >>> idna.alabel('测试')
    b'xn--0zwm56d'

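
The reverse per-label conversion can be sketched in the same way. This
is a minimal illustration that assumes ``idna.ulabel`` accepts the
A-label as ``bytes`` and returns a ``str``; the variable names are
illustrative only.

.. code-block:: python

    import idna

    # Convert a single U-label to its A-label form (bytes).
    a_label = idna.alabel('测试')

    # Convert the A-label back to its U-label form (str).
    u_label = idna.ulabel(a_label)

    print(a_label)
    print(u_label)
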
Compatibility Mapping (UTS #46)
+++++++++++++++++++++++++++++++

As described in `RFC 5895 <https://tools.ietf.org/html/rfc5895>`_, the
IDNA specification does not normalize input from different potential
ways a user may input a domain name. This functionality, known as
a “mapping”, is considered by the specification to be a local
user-interface issue distinct from IDNA conversion functionality.

This library provides one such mapping that was developed by the
Unicode Consortium. Known as `Unicode IDNA Compatibility Processing
<https://unicode.org/reports/tr46/>`_, it provides for both a regular
mapping for typical applications, as well as a transitional mapping to
help migrate from older IDNA 2003 applications. Strings are
preprocessed according to Section 4.4 “Preprocessing for IDNA2008”
prior to the IDNA operations.

For example, “Königsgäßchen” is not a permissible label as *LATIN
CAPITAL LETTER K* is not allowed (nor are capital letters in general).
UTS 46 will convert this into lower case prior to applying the IDNA
conversion.

.. code-block:: pycon

    >>> import idna
    >>> idna.encode('Königsgäßchen')
    ...
    idna.core.InvalidCodepoint: Codepoint U+004B at position 1 of 'Königsgäßchen' not allowed
    >>> idna.encode('Königsgäßchen', uts46=True)
    b'xn--knigsgchen-b4a3dun'
    >>> print(idna.decode('xn--knigsgchen-b4a3dun'))
    königsgäßchen

Transitional processing provides conversions to help transition from
the older 2003 standard to the current standard. For example, in the
original IDNA specification, the *LATIN SMALL LETTER SHARP S* (ß) was
converted into two *LATIN SMALL LETTER S* (ss), whereas in the current
IDNA specification this conversion is not performed.

.. code-block:: pycon

    >>> idna.encode('Königsgäßchen', uts46=True, transitional=True)
    b'xn--knigsgsschen-lcb0w'

Implementers should use transitional processing with caution, only in
rare cases where conversion from legacy labels to current labels must be
performed (i.e. IDNA implementations that pre-date 2008). For typical
applications that just need to convert labels, transitional processing
is unlikely to be beneficial and could produce unexpected incompatible
results.

``encodings.idna`` Compatibility
++++++++++++++++++++++++++++++++

Function calls from the Python built-in ``encodings.idna`` module are
mapped to their IDNA 2008 equivalents using the ``idna.compat`` module.
Simply substitute the ``import`` clause in your code to refer to the new
module name.

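
As a rough sketch of that substitution, the snippet below assumes
``idna.compat`` exposes ``ToASCII`` and ``ToUnicode`` under the same
names used by ``encodings.idna``; the variable names are illustrative
only.

.. code-block:: python

    # Previously: from encodings.idna import ToASCII, ToUnicode
    from idna.compat import ToASCII, ToUnicode

    # Encode a label using IDNA 2008 rules (returns bytes).
    ascii_label = ToASCII('测试')

    # Decode the encoded label back into text (returns str).
    unicode_label = ToUnicode(ascii_label)

    print(ascii_label)
    print(unicode_label)
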
Exceptions
----------

All errors encountered during a conversion that follows the
specification should raise an exception derived from the
``idna.IDNAError`` base class.

More specific exceptions that may be raised are ``idna.IDNABidiError``,
when the error reflects an illegal combination of left-to-right and
right-to-left characters in a label; ``idna.InvalidCodepoint``, when
a specific codepoint is an illegal character in an IDN label (i.e.
INVALID); and ``idna.InvalidCodepointContext``, when the codepoint is
illegal based on its positional context (i.e. it is CONTEXTO or CONTEXTJ
but the contextual requirements are not satisfied).

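
One way to handle these exceptions is sketched below. The helper
``try_encode`` is hypothetical and not part of this library; it catches
the more specific ``idna.InvalidCodepoint`` first and falls back to the
``idna.IDNAError`` base class for every other conversion error.

.. code-block:: python

    import idna

    def try_encode(domain):
        """Hypothetical helper: return the encoded domain, or None on failure."""
        try:
            return idna.encode(domain)
        except idna.InvalidCodepoint as exc:
            # A specific codepoint is not permitted in an IDN label.
            print(f'invalid codepoint: {exc}')
        except idna.IDNAError as exc:
            # Any other conversion error derives from the base class.
            print(f'IDNA error: {exc}')
        return None

    # The capital K is not allowed, so this prints a message and returns None.
    try_encode('Königsgäßchen')
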
Building and Diagnostics
------------------------

The IDNA and UTS 46 functionality relies upon pre-calculated lookup
tables for performance. These tables are derived by evaluating Unicode
codepoint data against the eligibility criteria in the respective
standards. They are computed using the command-line script
``tools/idna-data``.

This tool will fetch relevant codepoint data from the Unicode repository
and perform the required calculations to identify eligibility. There are
three main modes:

* ``idna-data make-libdata``. Generates ``idnadata.py`` and
  ``uts46data.py``, the pre-calculated lookup tables used for IDNA and
  UTS 46 conversions. Implementers who wish to track this library against
  a different Unicode version may use this tool to manually generate a
  different version of the ``idnadata.py`` and ``uts46data.py`` files.

* ``idna-data make-table``. Generates a table of the IDNA disposition
  (e.g. PVALID, CONTEXTJ, CONTEXTO) in the format found in Appendix
  B.1 of RFC 5892 and the pre-computed tables published by `IANA
  <https://www.iana.org/>`_.

* ``idna-data U+0061``. Prints debugging output on the various
  properties associated with an individual Unicode codepoint (in this
  case, U+0061) that are used to assess the IDNA and UTS 46 status of a
  codepoint. This is helpful in debugging or analysis.

The tool accepts a number of arguments, described using ``idna-data
-h``. Most notably, the ``--version`` argument allows the specification
of the version of Unicode to be used in computing the table data. For
example, ``idna-data --version 9.0.0 make-libdata`` will generate
library data against Unicode 9.0.0.

Additional Notes
----------------

* **Packages**. The latest tagged release version is published in the
  `Python Package Index <https://pypi.org/project/idna/>`_.

* **Version support**. This library supports Python 3.6 and higher.
  As this library serves as a low-level toolkit for a variety of
  applications, many of which strive for broad compatibility with older
  Python versions, there is no rush to remove older interpreter support.
  Removing support for older versions should be well justified in that the
  maintenance burden has become too high.

* **Python 2**. Python 2 is supported by version 2.x of this library.
  Use ``idna<3`` in your requirements file if you need this library for
  a Python 2 application. Be advised that these versions are no longer
  actively developed.

* **Testing**. The library has a test suite based on each rule of the
  IDNA specification, as well as tests that are provided as part of the
  Unicode Technical Standard 46, `Unicode IDNA Compatibility Processing
  <https://unicode.org/reports/tr46/>`_.

* **Emoji**. Support for emoji domains in this library is an occasional
  request. Encoding of symbols like emoji is expressly prohibited by
  the technical standard IDNA 2008, and emoji domains are broadly phased
  out across the domain industry due to associated security risks. For
  now, applications that need to support these non-compliant labels
  may wish to consider trying the encode/decode operation in this library
  first, and then falling back to using ``encodings.idna``. See `the GitHub
  project <https://github.com/kjd/idna/issues/18>`_ for more discussion.

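
The fallback described in the emoji note might look like the sketch
below. The function ``encode_with_fallback`` is hypothetical and not
part of this library; it assumes the standard library ``idna`` codec
(``encodings.idna``, which implements IDNA 2003) is an acceptable last
resort for labels that IDNA 2008 rejects.

.. code-block:: python

    import idna

    def encode_with_fallback(domain):
        """Hypothetical helper: try IDNA 2008 first, then the stdlib codec."""
        try:
            return idna.encode(domain)
        except idna.IDNAError:
            # Fall back to the standard library's IDNA 2003 codec.
            return domain.encode('idna')

    print(encode_with_fallback('ドメイン.テスト'))

Because the fallback accepts labels that the current standard prohibits,
it is probably best limited to applications that genuinely need such
labels.
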