This python module provides an API with data about languages/regions/scripts for use in the language-support categorization of the font families in the Google Fonts collection.

You can also directly access the raw **textproto** files on the `Lib/gflanguages/data` directory:
* [`languages`](https://github.com/googlefonts/lang/tree/main/Lib/gflanguages/data/languages)
* [`regions`](https://github.com/googlefonts/lang/tree/main/Lib/gflanguages/data/regions)
* [`scripts`](https://github.com/googlefonts/lang/tree/main/Lib/gflanguages/data/scripts)

Most of the code in this project was copied from the `gftools` repository (https://github.com/googlefonts/gftools/) so that language/region/script data can be easily available to all our tools without having to also get the large dependency tree of `gftools`. The most immediate user of this module is `Font Bakery`, which needs to validate language support on font binaries being checked. (see https://github.com/googlefonts/fontbakery/issues/3605)

The second obvious user of this `gflanguages` module is `gftools` itself.

Language/region/script definitions and the `gflanguages` modules are used as a subtree in the `google/fonts` repo, on its **lang/** directory (https://github.com/google/fonts/tree/main/lang).

This module is the main place to update these definitions, avoiding data duplication and guaranteeing uniformity across tools.

To learn more about how *lang* metadata affects downstream, see [gf-guide/lang](https://googlefonts.github.io/gf-guide/lang).

## Sample text rules

If there is a `sample_text` field for a language, it should contain all of the following fields:

* `masthead_full`: show off four glyphs
* `masthead_partial`: show off two glyphs
* `styles`: a phrase of 40-60 characters
* `tester`: a phrase of 60-90 characters
* `poster_sm`: a word or phrase of 10-17 characters
* `poster_md`: a word or phrase of 6-12 characters
* `poster_lg`: a word or phrase of 3-8 characters
* `specimen_48`: a sentence of 50-80 characters
* `specimen_36`: a paragraph of 100-120 characters
* `specimen_32`: a paragraph of 140-180 characters
* `specimen_21`: one or more paragraphs totalling 300-500 characters
* `specimen_16`: one or more paragraphs totalling 550-750 characters

Generally the sample text should be taken from the UN Declaration of Human Rights; if using Eric Muller's XML translations, `snippets/lang_sample_text.py` will convert the XML into textproto.

If the UDHR is not available in the language, the sample text should be a "neutral" text (not political or religious) - folk tales are generally good sources. (We recognise that for some liturgical languages, religious texts may be the only extant samples.) In these cases, please add a `note:` field with the source of the sample text.