Internationalized Domain Names in Applications (IDNA)
=====================================================

Support for the Internationalized Domain Names in
Applications (IDNA) protocol as specified in `RFC 5891
<https://tools.ietf.org/html/rfc5891>`_. This is the latest version of
the protocol and is sometimes referred to as “IDNA 2008”.

This library also provides support for Unicode Technical
Standard 46, `Unicode IDNA Compatibility Processing
<https://unicode.org/reports/tr46/>`_.

This acts as a suitable replacement for the “encodings.idna”
module that comes with the Python standard library, but which
only supports the older superseded IDNA specification (`RFC 3490
<https://tools.ietf.org/html/rfc3490>`_).

Basic functions are simply executed:

.. code-block:: pycon

    >>> import idna
    >>> idna.encode('ドメイン.テスト')
    b'xn--eckwd4c7c.xn--zckzah'
    >>> print(idna.decode('xn--eckwd4c7c.xn--zckzah'))
    ドメイン.テスト

Installation
------------

This package is available for installation from PyPI:

.. code-block:: bash

    $ python3 -m pip install idna

Usage
-----

For typical usage, the ``encode`` and ``decode`` functions will take a
domain name argument and perform a conversion to A-labels or U-labels
respectively.

.. code-block:: pycon

    >>> import idna
    >>> idna.encode('ドメイン.テスト')
    b'xn--eckwd4c7c.xn--zckzah'
    >>> print(idna.decode('xn--eckwd4c7c.xn--zckzah'))
    ドメイン.テスト

You may also use the codec encoding and decoding methods by importing
the ``idna.codec`` module:

.. code-block:: pycon

    >>> import idna.codec
    >>> print('домен.испытание'.encode('idna2008'))
    b'xn--d1acufc.xn--80akhbyknj4f'
    >>> print(b'xn--d1acufc.xn--80akhbyknj4f'.decode('idna2008'))
    домен.испытание

Conversions can be applied on a per-label basis using the ``ulabel`` or
``alabel`` functions if necessary:

.. code-block:: pycon

    >>> idna.alabel('测试')
    b'xn--0zwm56d'

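The per-label functions can also be combined to process a name one label
at a time. The following is a minimal sketch rather than part of the
library: the helper name ``to_alabels`` is hypothetical, and it assumes
that splitting on ``.`` is adequate for the names being handled. For
whole domain names, ``encode`` is usually the simpler choice.

.. code-block:: python

    import idna


    def to_alabels(domain):
        """Convert each dot-separated label of a domain to its A-label."""
        # idna.alabel returns bytes for a single label.
        return [idna.alabel(label) for label in domain.split('.')]


    labels = to_alabels('ドメイン.テスト')
    print(labels)
    # Converting each A-label back with idna.ulabel recovers the input.
    print('.'.join(idna.ulabel(label) for label in labels))
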
Compatibility Mapping (UTS #46)
+++++++++++++++++++++++++++++++

As described in `RFC 5895 <https://tools.ietf.org/html/rfc5895>`_, the
IDNA specification does not normalize input from different potential
ways a user may input a domain name. This functionality, known as
a “mapping”, is considered by the specification to be a local
user-interface issue distinct from IDNA conversion functionality.

This library provides one such mapping that was developed by the
Unicode Consortium. Known as `Unicode IDNA Compatibility Processing
<https://unicode.org/reports/tr46/>`_, it provides for both a regular
mapping for typical applications, as well as a transitional mapping to
help migrate from older IDNA 2003 applications.

For example, “Königsgäßchen” is not a permissible label as *LATIN
CAPITAL LETTER K* is not allowed (nor are capital letters in general).
UTS 46 will convert this into lower case prior to applying the IDNA
conversion.

.. code-block:: pycon

    >>> import idna
    >>> idna.encode('Königsgäßchen')
    ...
    idna.core.InvalidCodepoint: Codepoint U+004B at position 1 of 'Königsgäßchen' not allowed
    >>> idna.encode('Königsgäßchen', uts46=True)
    b'xn--knigsgchen-b4a3dun'
    >>> print(idna.decode('xn--knigsgchen-b4a3dun'))
    königsgäßchen

Transitional processing provides conversions to help transition from
the older 2003 standard to the current standard. For example, in the
original IDNA specification, the *LATIN SMALL LETTER SHARP S* (ß) was
converted into two *LATIN SMALL LETTER S* (ss), whereas in the current
IDNA specification this conversion is not performed.

.. code-block:: pycon

    >>> idna.encode('Königsgäßchen', uts46=True, transitional=True)
    b'xn--knigsgsschen-lcb0w'

Implementers should use transitional processing with caution, only in
rare cases where conversion from legacy labels to current labels must be
performed (i.e. IDNA implementations that pre-date 2008). For typical
applications that just need to convert labels, transitional processing
is unlikely to be beneficial and could produce unexpected incompatible
results.

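One way an application might apply the mapping only when it is needed is
to attempt a strict conversion first and retry with ``uts46=True`` on
failure. This is an illustrative sketch; ``encode_lenient`` is a
hypothetical helper name, not a function provided by this library.

.. code-block:: python

    import idna


    def encode_lenient(domain):
        """Encode strictly; retry with the UTS 46 mapping if that fails."""
        try:
            return idna.encode(domain)
        except idna.IDNAError:
            # The mapping lower-cases the input (among other changes)
            # before the IDNA conversion is applied.
            return idna.encode(domain, uts46=True)


    print(encode_lenient('Königsgäßchen'))

Whether a silent retry is appropriate depends on the application; some
callers may prefer to surface the strict failure to the user instead.
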
``encodings.idna`` Compatibility
++++++++++++++++++++++++++++++++

Function calls from the Python built-in ``encodings.idna`` module are
mapped to their IDNA 2008 equivalents using the ``idna.compat`` module.
Simply substitute the ``import`` clause in your code to refer to the new
module name.

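As a sketch of that substitution, assuming the calling code uses the
``ToASCII`` and ``ToUnicode`` functions (names taken from the standard
library module; check ``idna.compat`` in your installed version for the
functions it actually provides):

.. code-block:: python

    # Before: from encodings.idna import ToASCII, ToUnicode
    from idna.compat import ToASCII, ToUnicode

    # Convert a single label to its A-label and back again.
    alabel = ToASCII('ドメイン')
    print(alabel)
    print(ToUnicode(alabel))
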
Exceptions
----------

All errors raised during conversion in accordance with the specification
should raise an exception derived from the ``idna.IDNAError`` base
class.

More specific exceptions may also be raised: ``idna.IDNABidiError``
when the error reflects an illegal combination of left-to-right and
right-to-left characters in a label; ``idna.InvalidCodepoint`` when
a specific codepoint is an illegal character in an IDN label (i.e.
INVALID); and ``idna.InvalidCodepointContext`` when the codepoint is
illegal based on its positional context (i.e. it is CONTEXTO or CONTEXTJ
but the contextual requirements are not satisfied).

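The hierarchy above suggests catching the specific exception first and
the base class last. The following is a minimal sketch of that pattern;
``describe_failure`` is a hypothetical helper name used only for
illustration.

.. code-block:: python

    import idna


    def describe_failure(domain):
        """Return None on success, or a short description of the error."""
        try:
            idna.encode(domain)
        except idna.InvalidCodepoint as exc:
            # Subclass of IDNAError, so it has to be caught first.
            return 'invalid codepoint: {}'.format(exc)
        except idna.IDNAError as exc:
            # Any other IDNA conversion failure.
            return 'IDNA error: {}'.format(exc)
        return None


    print(describe_failure('Königsgäßchen'))
    print(describe_failure('ドメイン.テスト'))
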
Building and Diagnostics
------------------------

The IDNA and UTS 46 functionality relies upon pre-calculated lookup
tables for performance. These tables are derived by evaluating
codepoints against the eligibility criteria in the respective standards,
and are computed using the command-line script ``tools/idna-data``.

This tool will fetch relevant codepoint data from the Unicode repository
and perform the required calculations to identify eligibility. There are
three main modes:

* ``idna-data make-libdata``. Generates ``idnadata.py`` and
  ``uts46data.py``, the pre-calculated lookup tables used for IDNA and
  UTS 46 conversions. Implementers who wish to track this library against
  a different Unicode version may use this tool to manually generate a
  different version of the ``idnadata.py`` and ``uts46data.py`` files.

* ``idna-data make-table``. Generates a table of the IDNA disposition
  (e.g. PVALID, CONTEXTJ, CONTEXTO) in the format found in Appendix
  B.1 of RFC 5892 and the pre-computed tables published by `IANA
  <https://www.iana.org/>`_.

* ``idna-data U+0061``. Prints debugging output on the various
  properties associated with an individual Unicode codepoint (in this
  case, U+0061) that are used to assess the IDNA and UTS 46 status of a
  codepoint. This is helpful in debugging or analysis.

The tool accepts a number of arguments, described using ``idna-data
-h``. Most notably, the ``--version`` argument allows the specification
of the version of Unicode to be used in computing the table data. For
example, ``idna-data --version 9.0.0 make-libdata`` will generate
library data against Unicode 9.0.0.

Additional Notes
----------------

* **Packages**. The latest tagged release version is published in the
  `Python Package Index <https://pypi.org/project/idna/>`_.

* **Version support**. This library supports Python 3.5 and higher.
  As this library serves as a low-level toolkit for a variety of
  applications, many of which strive for broad compatibility with older
  Python versions, there is no rush to remove older interpreter support.
  Removing support for older versions should be well justified in that the
  maintenance burden has become too high.

* **Python 2**. Python 2 is supported by version 2.x of this library.
  While active development of the version 2.x series has ended, fixes for
  notable issues may be backported to 2.x. Use ``idna<3`` in your
  requirements file if you need this library for a Python 2 application.

* **Testing**. The library has a test suite based on each rule of the
  IDNA specification, as well as tests that are provided as part of the
  Unicode Technical Standard 46, `Unicode IDNA Compatibility Processing
  <https://unicode.org/reports/tr46/>`_.

* **Emoji**. It is an occasional request to support emoji domains in
  this library. Encoding of symbols like emoji is expressly prohibited by
  the technical standard IDNA 2008 and emoji domains are broadly phased
  out across the domain industry due to associated security risks. For
  now, applications that need to support these non-compliant labels
  may wish to consider trying the encode/decode operation in this library
  first, and then falling back to using ``encodings.idna``. See `the GitHub
  project <https://github.com/kjd/idna/issues/18>`_ for more discussion.

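The fallback described in the emoji note might look like the following
sketch. ``encode_with_legacy_fallback`` is a hypothetical helper name, and
the fallback relies on the standard library's ``idna`` codec, which
implements the older IDNA 2003 rules. Labels produced by the fallback
path are not valid under IDNA 2008.

.. code-block:: python

    import idna


    def encode_with_legacy_fallback(domain):
        """Try IDNA 2008 first; fall back to the stdlib IDNA 2003 codec."""
        try:
            return idna.encode(domain)
        except idna.IDNAError:
            # Non-compliant input such as symbols or emoji ends up here.
            return domain.encode('idna')


    print(encode_with_legacy_fallback('☃.com'))

Applications taking this approach may want to log or flag names that
needed the fallback, since they carry the security risks noted above.
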