2023-10-20
2022-10-30
2021-12-16
Unicode 14 support (#233).
Support GNUInstallDirs in CMake build (#159).
cmake build now installs pkg-config file (#224).
Various build and portability improvements.
2020-12-15
Bugfix in utf8proc_grapheme_break_stateful for NULL state argument, which also broke utf8proc_grapheme_break.
2020-11-23
New utf8proc_islower and utf8proc_isupper functions (#196).
Bugfix for manual calls to grapheme_break_extended for initial characters (#205).
Various build and portability improvements.
2020-03-27
Unicode 13 support (#179).
No longer report zero width for category Sk (#167).
cmake support improvements (#173).
2019-05-10
Unicode 12.1 support (#156).
New -DUTF8PROC_INSTALL=No option for cmake builds to disable installation (#152).
Better make support for HP-UX (#154).
Fixed incorrect UTF8PROC_VERSION_MINOR version number in header and bumped shared-library version.
2019-03-30
Unicode 12 support (#148).
New function utf8proc_unicode_version
to return the supported Unicode version (#151).
Simpler character-width computation that no longer uses GNU Unifont metrics: East-Asian wide characters have width 2, and all other printable characters have width 1 (#150).
Fix CHARBOUND option for utf8proc_map to preserve U+FFFE and U+FFFF non-characters (#149).
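The simplified width rule in the Unicode 12 entry above (wide characters take 2 columns, other printable characters take 1) can be sketched as below. This is not utf8proc's implementation: sketch_charwidth is a hypothetical name, the wide ranges are a small illustrative subset of the East Asian Wide/Fullwidth data rather than the full table, and treating only C0/C1 control characters as non-printable is an assumption made to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

/* Inclusive codepoint range. */
typedef struct { uint32_t first, last; } cp_range;

/* ASSUMPTION: illustrative subset only; real data would come from the
 * Unicode EastAsianWidth.txt file (Wide and Fullwidth classes). */
static const cp_range wide_ranges[] = {
    {0x1100, 0x115F},  /* Hangul Jamo initial consonants */
    {0x3041, 0x3096},  /* Hiragana letters */
    {0x4E00, 0x9FFF},  /* CJK Unified Ideographs */
    {0xAC00, 0xD7A3},  /* Hangul Syllables */
    {0xFF01, 0xFF60},  /* Fullwidth forms of ASCII */
};

static int is_wide(uint32_t c)
{
    size_t i;
    for (i = 0; i < sizeof wide_ranges / sizeof wide_ranges[0]; i++) {
        if (c >= wide_ranges[i].first && c <= wide_ranges[i].last)
            return 1;
    }
    return 0;
}

/* Hypothetical width function: 0 for C0/C1 controls (a stand-in for
 * "non-printable"), 2 for wide characters, 1 for everything else. */
int sketch_charwidth(uint32_t c)
{
    if (c < 0x20 || (c >= 0x7F && c < 0xA0))
        return 0;
    return is_wide(c) ? 2 : 1;
}
```

Under this sketch U+00AD (soft hyphen) also gets width 1, since it is neither a control character nor in a wide range.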
2018-07-24
utf8proc_NFKC_Casefold convenience function for NFKC_Casefold normalization (#133).
UTF8PROC_STRIPNA option to strip unassigned codepoints (#133).
Support building static libraries on Windows (callers need to #define UTF8PROC_STATIC) (#123).
cmake fix to avoid defining UTF8PROC_EXPORTS globally (#121).
toupper of ß (U+00DF) now yields ẞ (U+1E9E) (#134), similar to musl; case-folding still yields the standard "ss" mapping.
utf8proc_charwidth now returns 1 for U+00AD (soft hyphen) and for unassigned/PUA codepoints (#135).
2018-04-27
2016-12-26:
New functions utf8proc_map_custom and utf8proc_decompose_custom to allow user-supplied transformations of codepoints, in conjunction with other transformations (#89).
New function utf8proc_normalize_utf32 to apply normalizations directly to UTF-32 data (not just UTF-8) (#88).
Fixed stack overflow that could occur due to incorrect definition of UINT16_MAX with some compilers (#84).
Fixed conflict with stdbool.h in Visual Studio (#90).
Updated font metrics to use Unifont 9.0.04.
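The idea behind the custom-transformation entry above (a user callback applied to each codepoint alongside the library's own transformations) can be illustrated with the self-contained sketch below. Every name here (custom_fn, map_custom_sketch, ascii_lower, replace_with_underscore) is hypothetical and does not reproduce utf8proc's API; ASCII-only lowercasing stands in for a real library transformation, and running the callback before the built-in step is a choice made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: receives one codepoint plus a user
 * pointer and returns the (possibly replaced) codepoint. */
typedef int32_t (*custom_fn)(int32_t codepoint, void *data);

/* Stand-in for a built-in transformation: ASCII-only lowercasing. */
static int32_t ascii_lower(int32_t c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Applies the user callback (if any) and then the built-in step to
 * each codepoint of a UTF-32 buffer, in place. */
void map_custom_sketch(int32_t *buf, size_t len, custom_fn fn, void *data)
{
    size_t i;
    for (i = 0; i < len; i++) {
        int32_t c = fn ? fn(buf[i], data) : buf[i];
        buf[i] = ascii_lower(c);
    }
}

/* Example callback: replaces the codepoint pointed to by `data`
 * with '_' and leaves every other codepoint unchanged. */
int32_t replace_with_underscore(int32_t c, void *data)
{
    int32_t target = *(const int32_t *)data;
    return c == target ? '_' : c;
}
```

For the input A, -, B with target '-', the callback turns '-' into '_' and the built-in step then lowercases the letters, giving a, _, b. Passing the target through the void pointer keeps the callback free of global state.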
2016-07-27:
Move -Wmissing-prototypes warning flag from Makefile to .travis.yml, since MSVC does not understand this flag and it is occasionally useful to build using MSVC through the Makefile (#79).
Use a different variable name for a nested loop in bench/bench.c, and declare it in a C89 way rather than inside the for statement, to avoid "error: 'for' loop initial declarations are only allowed in C99 mode" (#80).
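The C89-compatible loop style described in the bench/bench.c entry above looks roughly like this. sum_of_products is a hypothetical function for illustration, not the benchmark code itself.

```c
/* C89-compatible nested loop: both counters are declared at the top of
 * the block rather than inside the for statements, and the inner loop
 * uses a different variable name from the outer loop. */
long sum_of_products(int n)
{
    int i, j;
    long total = 0;
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            total += (long)i * j;
        }
    }
    return total;
}
```

Declaring i and j before the loops keeps older compilers in C89 mode from rejecting the code, and distinct names avoid the inner loop clobbering the outer counter.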
2016-07-13:
Bug fix in utf8proc_grapheme_break_stateful (#77).
Tests now use versioned Unicode files, so they will no longer break when a new version of Unicode is released (#78).
2016-07-13:
Updated for Unicode 9.0 (#70).
New utf8proc_grapheme_break_stateful to handle the complicated grapheme-breaking rules in Unicode 9. The old utf8proc_grapheme_break is still provided, but may incorrectly identify grapheme breaks in some Unicode-9 sequences.
Smaller Unicode tables (#62, #68). This required changes in the utf8proc_property_t structure, which breaks backward compatibility if you access this struct directly. The functions in the API remain backward-compatible, however.
Buffer overrun fix (#66).
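One reason a pairwise function cannot follow the Unicode 9 rules mentioned above is regional indicators: two regional-indicator symbols form one flag, so whether a break falls between two of them depends on how many came before. The sketch below covers only that rule (UAX #29 rules GB12/GB13) and treats every other pair as a break. ri_break_stateful and its int counter are hypothetical and are not meant to match utf8proc's signature or implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Regional indicator symbols occupy U+1F1E6..U+1F1FF. */
static bool is_regional_indicator(int32_t c)
{
    return c >= 0x1F1E6 && c <= 0x1F1FF;
}

/* Hypothetical stateful break test for the regional-indicator rule only.
 * `*state` counts the regional indicators in the current run up to and
 * including c1; the caller initializes it to 0 before the first call.
 * Between two regional indicators, a break is allowed only when c1
 * completes a pair (even count). All other pairs are treated as breaks,
 * which is a deliberate simplification. */
bool ri_break_stateful(int32_t c1, int32_t c2, int *state)
{
    if (is_regional_indicator(c1))
        *state += 1;
    else
        *state = 0;

    if (is_regional_indicator(c1) && is_regional_indicator(c2))
        return *state % 2 == 0;
    return true;
}
```

For four regional indicators in a row, the calls report no break, break, no break: two flags. A stateless (c1, c2) function sees the same pair of regional indicators at every position and cannot tell these cases apart.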
2015-11-02:
Do not export symbol for internal function unsafe_encode_char() (#55).
Install relative symbolic links for shared libraries (#58).
Add missing files to make clean (#58).
2015-07-06:
Updated for Unicode 8.0 (#45).
New utf8proc_tolower and utf8proc_toupper functions, portable replacements for towlower and towupper in the C library (#40).
Don't treat Unicode "non-characters" as invalid, and improved validity checking in general (#35).
Prefix all typedefs with utf8proc_, e.g. utf8proc_int32_t, to avoid collisions with other libraries (#32).
Rename DLLEXPORT to UTF8PROC_DLLEXPORT to prevent collisions.
Fix build breakage in the benchmark routines.
More fine-grained Makefile variables (PICFLAG etcetera), so that compilation flags can be selectively overridden, and in particular so that CFLAGS can be changed without accidentally eliminating necessary flags like -fPIC and -std=c99 (#43).
Updated character-width tables based on Unifont 8.0.01 (#51) and the Unicode 8 character categories (#47).
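The entry above on non-characters does not spell out the new validity rule. A common definition, and an assumption here, is that a codepoint is valid when it lies in 0..U+10FFFF and is not a UTF-16 surrogate; under that rule non-characters such as U+FFFE and U+FFFF are valid. codepoint_valid_sketch is a hypothetical name, not utf8proc's function.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: "valid" means a Unicode scalar value. Only negative
 * values, values above U+10FFFF, and surrogates (U+D800..U+DFFF) are
 * rejected; non-characters such as U+FFFE and U+FFFF are accepted. */
bool codepoint_valid_sketch(int32_t c)
{
    if (c < 0 || c > 0x10FFFF)
        return false;
    if (c >= 0xD800 && c <= 0xDFFF)
        return false;
    return true;
}
```

Rejecting surrogates while accepting non-characters reflects the distinction the entry draws: surrogates cannot be encoded in well-formed UTF-8, whereas non-characters are ordinary scalar values that applications may use internally.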
2015-03-28:
Updated for Unicode 7.0 (#6).
New function utf8proc_grapheme_break(c1,c2) that returns whether there is a grapheme break between c1 and c2 (#20).
New function utf8proc_charwidth(c) that returns the number of column positions required for c; essentially a portable replacement for wcwidth(c) (#27).
New function utf8proc_category(c) that returns the Unicode category of c (as one of the constants UTF8PROC_CATEGORY_xx).
Also, a function utf8proc_category_string(c) that returns the Unicode category of c as a two-character string.
cmake script CMakeLists.txt, in addition to Makefile, for easier compilation on Windows (#28).
Various Makefile improvements: a make check target to perform tests (#13), make install, a rule to automate updating the Unicode tables, etcetera.
The shared library is now versioned (e.g. has a soname on GNU/Linux) (#24).
C++/MSVC compatibility (#17).
Most #defined constants are now enums (#29).
New preprocessor constants UTF8PROC_VERSION_MAJOR, UTF8PROC_VERSION_MINOR, and UTF8PROC_VERSION_PATCH for compile-time detection of the API version.
Doxygen-formatted documentation (#29).
The Ruby and PostgreSQL plugins have been removed due to lack of testing (#22).
2013-11-27:
c language name)
2009-08-20:
Use RSTRING_PTR() and RSTRING_LEN() instead of RSTRING()->ptr and RSTRING()->len for ruby1.9 compatibility (and #define them if they do not exist)
2009-10-02:
2009-10-08:
2009-10-16:
2009-06-14:
2009-08-19:
README file
2008-10-04:
utf8proc_version returning a string containing the version number of the library.
libutf8proc.dylib for MacOSX.
2009-05-01:
SET_VARSIZE macro)
2007-07-25:
2007-06-25:
unistrip, which behaves like unifold, but also removes all character marks (e.g. accents).
2007-07-22:
utf8proc_codepoint_valid to the C library.
Makefile from -g -O0 to -O2
utf8proc_data.c file, is now included in the distribution.
2007-03-16:
String#utf8chars).
2006-09-21:
Integer#utf8, which raises an exception if the given code point is invalid because it is too high (this check was previously missing)
2006-12-26:
2006-09-20:
Release of version 1.0.1
2006-09-17:
LUMP option, which lumps certain characters together (see lump.md) (also used for the PostgreSQL unifold function)
STRIPMARK option, which strips marking characters (or marks of composed characters)
String#char_ary in favour of String#utf8chars
2006-07-18:
2006-08-04:
CHARBOUND)
String#chars, which returns an array of UTF-8 encoded grapheme clusters
NLF2LF transformation in postgresql unifold function
DECOMPOSE option; if you use neither COMPOSE nor DECOMPOSE, no normalization will be performed (different from previous versions)
2006-06-05:
2006-06-20:
2006-06-02: initial release of version 0.1