  1. README file for PCRE (Perl-compatible regular expression library)
  2. -----------------------------------------------------------------
  3. NOTE: This set of files relates to PCRE releases that use the original API,
  4. with library names libpcre, libpcre16, and libpcre32. January 2015 saw the
  5. first release of a new API, known as PCRE2, with release numbers starting at
  6. 10.00 and library names libpcre2-8, libpcre2-16, and libpcre2-32. The old
  7. libraries (now called PCRE1) are now at end of life, and 8.45 is the final
  8. release. New projects are advised to use the new PCRE2 libraries.
  9. The latest release of PCRE1 is always available in three alternative formats
  10. from:
  11. https://ftp.pcre.org/pub/pcre/pcre-x.xx.tar.gz
  12. https://ftp.pcre.org/pub/pcre/pcre-x.xx.tar.bz2
  13. https://ftp.pcre.org/pub/pcre/pcre-x.xx.tar.zip
  14. There is a mailing list for discussion about the development of PCRE at
  15. pcre-dev@exim.org. You can access the archives and subscribe or manage your
  16. subscription here:
  17. https://lists.exim.org/mailman/listinfo/pcre-dev
  18. Please read the NEWS file if you are upgrading from a previous release.
  19. The contents of this README file are:
  20. The PCRE APIs
  21. Documentation for PCRE
  22. Contributions by users of PCRE
  23. Building PCRE on non-Unix-like systems
  24. Building PCRE without using autotools
  25. Building PCRE using autotools
  26. Retrieving configuration information
  27. Shared libraries
  28. Cross-compiling using autotools
  29. Using HP's ANSI C++ compiler (aCC)
  30. Compiling in Tru64 using native compilers
  31. Using Sun's compilers for Solaris
  32. Using PCRE from MySQL
  33. Making new tarballs
  34. Testing PCRE
  35. Character tables
  36. File manifest
  37. The PCRE APIs
  38. -------------
  39. PCRE is written in C, and it has its own API. There are three sets of
  40. functions, one for the 8-bit library, which processes strings of bytes, one for
  41. the 16-bit library, which processes strings of 16-bit values, and one for the
  42. 32-bit library, which processes strings of 32-bit values. The distribution also
  43. includes a set of C++ wrapper functions (see the pcrecpp man page for details),
  44. courtesy of Google Inc., which can be used to call the 8-bit PCRE library from
  45. C++. Other C++ wrappers have been created from time to time. See, for example:
  46. https://github.com/YasserAsmi/regexp, which aims to be simple and similar in
  47. style to the C API.
  48. The distribution also contains a set of C wrapper functions (again, just for
  49. the 8-bit library) that are based on the POSIX regular expression API (see the
  50. pcreposix man page). These end up in the library called libpcreposix. Note that
  51. this just provides a POSIX calling interface to PCRE; the regular expressions
  52. themselves still follow Perl syntax and semantics. The POSIX API is restricted,
  53. and does not give full access to all of PCRE's facilities.
  54. The header file for the POSIX-style functions is called pcreposix.h. The
  55. official POSIX name is regex.h, but I did not want to risk possible problems
  56. with existing files of that name by distributing it that way. To use PCRE with
  57. an existing program that uses the POSIX API, pcreposix.h will have to be
  58. renamed or pointed at by a link.
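One way of pointing a link at the header without touching the system's own regex.h is to put the link in a private include directory. The following is a minimal sketch for a POSIX shell; the function name and both directory arguments are illustrative and not part of PCRE.

```shell
# Create <dest_dir>/regex.h as a symbolic link to PCRE's pcreposix.h.
# $1 = directory containing pcreposix.h, $2 = private include directory.
# Both paths are hypothetical and chosen by the caller.
link_pcreposix_header() {
    # Resolve to an absolute path so that the link works from any directory.
    src_dir=$(cd "$1" 2>/dev/null && pwd) || src_dir=
    dest_dir=$2
    if [ -z "$src_dir" ] || [ ! -f "$src_dir/pcreposix.h" ]; then
        echo "pcreposix.h not found in $1" >&2
        return 1
    fi
    mkdir -p "$dest_dir" || return 1
    # -f replaces a regex.h left in dest_dir by an earlier run.
    ln -sf "$src_dir/pcreposix.h" "$dest_dir/regex.h"
}
```

A program that includes regex.h would then be compiled with that private directory on its include path (for example with -I), so that the PCRE header is found first.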
  59. If you are using the POSIX interface to PCRE and there is already a POSIX regex
  60. library installed on your system, as well as worrying about the regex.h header
  61. file (as mentioned above), you must also take care when linking programs to
  62. ensure that they link with PCRE's libpcreposix library. Otherwise they may pick
  63. up the POSIX functions of the same name from the other library.
  64. One way of avoiding this confusion is to compile PCRE with the addition of
  65. -Dregcomp=PCREregcomp (and similarly for the other POSIX functions) to the
  66. compiler flags (CFLAGS if you are using "configure" -- see below). This has the
  67. effect of renaming the functions so that the names no longer clash. Of course,
  68. you have to do the same thing for your applications, or write them using the
  69. new names.
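As an illustration of the renaming approach, the sketch below prints the -D options for the four POSIX entry points (regcomp, regexec, regerror, and regfree), following the -Dregcomp=PCREregcomp pattern. The function name is illustrative; the PCRE prefix for the new names is simply the one used in the example above.

```shell
# Print one -D option per POSIX function, renaming each to PCRE<name>.
pcre_posix_rename_flags() {
    flags=
    for fn in regcomp regexec regerror regfree; do
        # ${flags:+ } adds a separating space only after the first item.
        flags="$flags${flags:+ }-D$fn=PCRE$fn"
    done
    printf '%s\n' "$flags"
}

# Show the resulting configure invocation without running it.
echo "CFLAGS=\"$(pcre_posix_rename_flags)\" ./configure"
```

The same options would have to be given when compiling the application. Note that setting CFLAGS replaces the default compiler flags, so any optimization options that are wanted need to be added as well.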
  70. Documentation for PCRE
  71. ----------------------
  72. If you install PCRE in the normal way on a Unix-like system, you will end up
  73. with a set of man pages whose names all start with "pcre". The one that is just
  74. called "pcre" lists all the others. In addition to these man pages, the PCRE
  75. documentation is supplied in two other forms:
  76. 1. There are files called doc/pcre.txt, doc/pcregrep.txt, and
  77. doc/pcretest.txt in the source distribution. The first of these is a
  78. concatenation of the text forms of all the section 3 man pages except
  79. the listing of pcredemo.c and those that summarize individual functions.
  80. The other two are the text forms of the section 1 man pages for the
  81. pcregrep and pcretest commands. These text forms are provided for ease of
  82. scanning with text editors or similar tools. They are installed in
  83. <prefix>/share/doc/pcre, where <prefix> is the installation prefix
  84. (defaulting to /usr/local).
  85. 2. A set of files containing all the documentation in HTML form, hyperlinked
  86. in various ways, and rooted in a file called index.html, is distributed in
  87. doc/html and installed in <prefix>/share/doc/pcre/html.
  88. Users of PCRE have contributed files containing the documentation for various
  89. releases in CHM format. These can be found in the Contrib directory of the FTP
  90. site (see next section).
  91. Contributions by users of PCRE
  92. ------------------------------
  93. You can find contributions from PCRE users in the directory
  94. ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/Contrib
  95. There is a README file giving brief descriptions of what they are. Some are
  96. complete in themselves; others are pointers to URLs containing relevant files.
  97. Some of this material is likely to be well out-of-date. Several of the earlier
  98. contributions provided support for compiling PCRE on various flavours of
  99. Windows (I myself do not use Windows). Nowadays there is more Windows support
  100. in the standard distribution, so these contributions have been archived.
  101. A PCRE user maintains downloadable Windows binaries of the pcregrep and
  102. pcretest programs here:
  103. http://www.rexegg.com/pcregrep-pcretest.html
  104. Building PCRE on non-Unix-like systems
  105. --------------------------------------
  106. For a non-Unix-like system, please read the comments in the file
  107. NON-AUTOTOOLS-BUILD, though if your system supports the use of "configure" and
  108. "make" you may be able to build PCRE using autotools in the same way as for
  109. many Unix-like systems.
  110. PCRE can also be configured using the GUI facility provided by CMake's
  111. cmake-gui command. This creates Makefiles, solution files, etc. The file
  112. NON-AUTOTOOLS-BUILD has information about CMake.
  113. PCRE has been compiled on many different operating systems. It should be
  114. straightforward to build PCRE on any system that has a Standard C compiler and
  115. library, because it uses only Standard C functions.
  116. Building PCRE without using autotools
  117. -------------------------------------
  118. The use of autotools (in particular, libtool) is problematic in some
  119. environments, even some that are Unix or Unix-like. See the NON-AUTOTOOLS-BUILD
  120. file for ways of building PCRE without using autotools.
  121. Building PCRE using autotools
  122. -----------------------------
  123. If you are using HP's ANSI C++ compiler (aCC), please see the special note
  124. in the section entitled "Using HP's ANSI C++ compiler (aCC)" below.
  125. The following instructions assume the use of the widely used "configure; make;
  126. make install" (autotools) process.
  127. To build PCRE on a system that supports autotools, first run the "configure"
  128. command from the PCRE distribution directory, with your current directory set
  129. to the directory where you want the files to be created. This command is a
  130. standard GNU "autoconf" configuration script, for which generic instructions
  131. are supplied in the file INSTALL.
  132. Most commonly, people build PCRE within its own distribution directory, and in
  133. this case, on many systems, just running "./configure" is sufficient. However,
  134. the usual methods of changing standard defaults are available. For example:
  135. CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local
  136. This command specifies that the C compiler should be run with the flags '-O2
  137. -Wall' instead of the default, and that "make install" should install PCRE
  138. under /opt/local instead of the default /usr/local.
  139. If you want to build in a different directory, just run "configure" with that
  140. directory as current. For example, suppose you have unpacked the PCRE source
  141. into /source/pcre/pcre-xxx, but you want to build it in /build/pcre/pcre-xxx:
  142. cd /build/pcre/pcre-xxx
  143. /source/pcre/pcre-xxx/configure
  144. PCRE is written in C and is normally compiled as a C library. However, it is
  145. possible to build it as a C++ library, though the provided building apparatus
  146. does not have any features to support this.
  147. There are some optional features that can be included or omitted from the PCRE
  148. library. They are also documented in the pcrebuild man page.
  149. . By default, both shared and static libraries are built. You can change this
  150. by adding one of these options to the "configure" command:
  151. --disable-shared
  152. --disable-static
  153. (See also "Shared libraries" below.)
  154. . By default, only the 8-bit library is built. If you add --enable-pcre16 to
  155. the "configure" command, the 16-bit library is also built. If you add
  156. --enable-pcre32 to the "configure" command, the 32-bit library is also built.
  157. If you want only the 16-bit or 32-bit library, use --disable-pcre8 to disable
  158. building the 8-bit library.
  159. . If you are building the 8-bit library and want to suppress the building of
  160. the C++ wrapper library, you can add --disable-cpp to the "configure"
  161. command. Otherwise, when "configure" is run without --disable-pcre8, it will
  162. try to find a C++ compiler and C++ header files, and if it succeeds, it will
  163. try to build the C++ wrapper.
  164. . If you want to include support for just-in-time compiling, which can give
  165. large performance improvements on certain platforms, add --enable-jit to the
  166. "configure" command. This support is available only for certain hardware
  167. architectures. If you try to enable it on an unsupported architecture, there
  168. will be a compile time error.
  169. . When JIT support is enabled, pcregrep automatically makes use of it, unless
  170. you add --disable-pcregrep-jit to the "configure" command.
  171. . If you want to make use of the support for UTF-8 Unicode character strings in
  172. the 8-bit library, or UTF-16 Unicode character strings in the 16-bit library,
  173. or UTF-32 Unicode character strings in the 32-bit library, you must add
  174. --enable-utf to the "configure" command. Without it, the code for handling
  175. UTF-8, UTF-16 and UTF-32 is not included in the relevant library. Even
  176. when --enable-utf is included, the use of a UTF encoding still has to be
  177. enabled by an option at run time. When PCRE is compiled with this option, its
  178. input can be only ASCII or UTF-8/16/32, even when running on EBCDIC
  179. platforms. It is not possible to use both --enable-utf and --enable-ebcdic at
  180. the same time.
  181. . There are no separate options for enabling UTF-8, UTF-16 and UTF-32
  182. independently because that would allow ridiculous settings such as requesting
  183. UTF-16 support while building only the 8-bit library. However, the option
  184. --enable-utf8 is retained for backwards compatibility with earlier releases
  185. that did not support 16-bit or 32-bit character strings. It is synonymous with
  186. --enable-utf. It is not possible to configure one library with UTF support
  187. and the other without in the same configuration.
  188. . If, in addition to support for UTF-8/16/32 character strings, you want to
  189. include support for the \P, \p, and \X sequences that recognize Unicode
  190. character properties, you must add --enable-unicode-properties to the
  191. "configure" command. This adds about 30K to the size of the library (in the
  192. form of a property table); only the basic two-letter properties such as Lu
  193. are supported.
  194. . You can build PCRE to recognize either CR or LF or the sequence CRLF or any
  195. of the preceding, or any of the Unicode newline sequences as indicating the
  196. end of a line. Whatever you specify at build time is the default; the caller
  197. of PCRE can change the selection at run time. The default newline indicator
  198. is a single LF character (the Unix standard). You can specify the default
  199. newline indicator by adding --enable-newline-is-cr or --enable-newline-is-lf
  200. or --enable-newline-is-crlf or --enable-newline-is-anycrlf or
  201. --enable-newline-is-any to the "configure" command, respectively.
  202. If you specify --enable-newline-is-cr or --enable-newline-is-crlf, some of
  203. the standard tests will fail, because the lines in the test files end with
  204. LF. Even if the files are edited to change the line endings, there are likely
  205. to be some failures. With --enable-newline-is-anycrlf or
  206. --enable-newline-is-any, many tests should succeed, but there may be some
  207. failures.
  208. . By default, the sequence \R in a pattern matches any Unicode line ending
  209. sequence. This is independent of the option specifying what PCRE considers to
  210. be the end of a line (see above). However, the caller of PCRE can restrict \R
  211. to match only CR, LF, or CRLF. You can make this the default by adding
  212. --enable-bsr-anycrlf to the "configure" command (bsr = "backslash R").
  213. . When called via the POSIX interface, PCRE uses malloc() to get additional
  214. storage for processing capturing parentheses if there are more than 10 of
  215. them in a pattern. You can increase this threshold by setting, for example,
  216. --with-posix-malloc-threshold=20
  217. on the "configure" command.
  218. . PCRE has a counter that limits the depth of nesting of parentheses in a
  219. pattern. This limits the amount of system stack that a pattern uses when it
  220. is compiled. The default is 250, but you can change it by setting, for
  221. example,
  222. --with-parens-nest-limit=500
  223. . PCRE has a counter that can be set to limit the amount of resources it uses
  224. when matching a pattern. If the limit is exceeded during a match, the match
  225. fails. The default is ten million. You can change the default by setting, for
  226. example,
  227. --with-match-limit=500000
  228. on the "configure" command. This is just the default; individual calls to
  229. pcre_exec() can supply their own value. There is more discussion on the
  230. pcreapi man page.
  231. . There is a separate counter that limits the depth of recursive function calls
  232. during a matching process. This also has a default of ten million, which is
  233. essentially "unlimited". You can change the default by setting, for example,
  234. --with-match-limit-recursion=500000
  235. Recursive function calls use up the runtime stack; running out of stack can
  236. cause programs to crash in strange ways. There is a discussion about stack
  237. sizes in the pcrestack man page.
  238. . The default maximum compiled pattern size is around 64K. You can increase
  239. this by adding --with-link-size=3 to the "configure" command. In the 8-bit
  240. library, PCRE then uses three bytes instead of two for offsets to different
  241. parts of the compiled pattern. In the 16-bit library, --with-link-size=3 is
  242. the same as --with-link-size=4, which (in both libraries) uses four-byte
  243. offsets. Increasing the internal link size reduces performance. In the 32-bit
  244. library, the only supported link size is 4.
  245. . You can build PCRE so that its internal match() function that is called from
  246. pcre_exec() does not call itself recursively. Instead, it uses memory blocks
  247. obtained from the heap via the special functions pcre_stack_malloc() and
  248. pcre_stack_free() to save data that would otherwise be saved on the stack. To
  249. build PCRE like this, use
  250. --disable-stack-for-recursion
  251. on the "configure" command. PCRE runs more slowly in this mode, but it may be
  252. necessary in environments with limited stack sizes. This applies only to the
  253. normal execution of the pcre_exec() function; if JIT support is being
  254. successfully used, it is not relevant. Equally, it does not apply to
  255. pcre_dfa_exec(), which does not use deeply nested recursion. There is a
  256. discussion about stack sizes in the pcrestack man page.
  257. . For speed, PCRE uses four tables for manipulating and identifying characters
  258. whose code point values are less than 256. By default, it uses a set of
  259. tables for ASCII encoding that is part of the distribution. If you specify
  260. --enable-rebuild-chartables
  261. a program called dftables is compiled and run in the default C locale when
  262. you obey "make". It builds a source file called pcre_chartables.c. If you do
  263. not specify this option, pcre_chartables.c is created as a copy of
  264. pcre_chartables.c.dist. See "Character tables" below for further information.
  265. . It is possible to compile PCRE for use on systems that use EBCDIC as their
  266. character code (as opposed to ASCII/Unicode) by specifying
  267. --enable-ebcdic
  268. This automatically implies --enable-rebuild-chartables (see above). However,
  269. when PCRE is built this way, it always operates in EBCDIC. It cannot support
  270. both EBCDIC and UTF-8/16/32. There is a second option, --enable-ebcdic-nl25,
  271. which specifies that the code value for the EBCDIC NL character is 0x25
  272. instead of the default 0x15.
  273. . In environments where valgrind is installed, if you specify
  274. --enable-valgrind
  275. PCRE will use valgrind annotations to mark certain memory regions as
  276. unaddressable. This allows it to detect invalid memory accesses, and is
  277. mostly useful for debugging PCRE itself.
  278. . In environments where the gcc compiler is used and lcov version 1.6 or above
  279. is installed, if you specify
  280. --enable-coverage
  281. the build process implements a code coverage report for the test suite. The
  282. report is generated by running "make coverage". If ccache is installed on
  283. your system, it must be disabled when building PCRE for coverage reporting.
  284. You can do this by setting the environment variable CCACHE_DISABLE=1 before
  285. running "make" to build PCRE. There is more information about coverage
  286. reporting in the "pcrebuild" documentation.
  287. . The pcregrep program currently supports only 8-bit data files, and so
  288. requires the 8-bit PCRE library. It is possible to compile pcregrep to use
  289. libz and/or libbz2, in order to read .gz and .bz2 files (respectively), by
  290. specifying one or both of
  291. --enable-pcregrep-libz
  292. --enable-pcregrep-libbz2
  293. Of course, the relevant libraries must be installed on your system.
  294. . The default size (in bytes) of the internal buffer used by pcregrep can be
  295. set by, for example:
  296. --with-pcregrep-bufsize=51200
  297. The value must be a plain integer. The default is 20480.
  298. . It is possible to compile pcretest so that it links with the libreadline
  299. or libedit libraries, by specifying, respectively,
  300. --enable-pcretest-libreadline or --enable-pcretest-libedit
  301. If this is done, when pcretest's input is from a terminal, it reads it using
  302. the readline() function. This provides line-editing and history facilities.
  303. Note that libreadline is GPL-licenced, so if you distribute a binary of
  304. pcretest linked in this way, there may be licensing issues. These can be
  305. avoided by linking with libedit (which has a BSD licence) instead.
  306. Enabling libreadline causes the -lreadline option to be added to the pcretest
  307. build. In many operating environments with a system-installed readline
  308. library this is sufficient. However, in some environments (e.g. if an
  309. unmodified distribution version of readline is in use), it may be necessary
  310. to specify something like LIBS="-lncurses" as well. This is because, to quote
  311. the readline INSTALL, "Readline uses the termcap functions, but does not link
  312. with the termcap or curses library itself, allowing applications which link
  313. with readline the to choose an appropriate library." If you get error
  314. messages about missing functions tgetstr, tgetent, tputs, tgetflag, or tgoto,
  315. this is the problem, and linking with the ncurses library should fix it.
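Several of the options above are often combined. The following sketch assembles one such combination: 8-bit and 16-bit libraries, JIT, UTF and Unicode property support, a selectable default newline, and reduced match limits. Every option name comes from the list above; the function name, the particular selection, and the limit values are illustrative only.

```shell
# Print "configure" options, one per line.
# $1 = default newline convention: cr, lf, crlf, anycrlf or any (default lf).
pcre_configure_args() {
    newline=${1:-lf}
    case $newline in
        cr|lf|crlf|anycrlf|any) ;;
        *) echo "unknown newline setting: $newline" >&2; return 1 ;;
    esac
    printf '%s\n' \
        --enable-pcre16 \
        --enable-jit \
        --enable-utf \
        --enable-unicode-properties \
        "--enable-newline-is-$newline" \
        --with-match-limit=500000 \
        --with-match-limit-recursion=500000
}

# Possible use (left unquoted so that each option becomes a separate word):
#   ./configure $(pcre_configure_args anycrlf)
```

Because --enable-utf is in the list, --enable-ebcdic must not be added to it.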
  316. The "configure" script builds the following files for the basic C library:
  317. . Makefile             the makefile that builds the library
  318. . config.h             build-time configuration options for the library
  319. . pcre.h               the public PCRE header file
  320. . pcre-config          script that shows the building settings such as CFLAGS
  321.                          that were set for "configure"
  322. . libpcre.pc         ) data for the pkg-config command
  323. . libpcre16.pc       )
  324. . libpcre32.pc       )
  325. . libpcreposix.pc    )
  326. . libtool              script that builds shared and/or static libraries
  327. Versions of config.h and pcre.h are distributed in the PCRE tarballs under the
  328. names config.h.generic and pcre.h.generic. These are provided for those who
  329. have to build PCRE without using "configure" or CMake. If you use "configure"
  330. or CMake, the .generic versions are not used.
  331. When building the 8-bit library, if a C++ compiler is found, the following
  332. files are also built:
  333. . libpcrecpp.pc        data for the pkg-config command
  334. . pcrecpparg.h         header file for calling PCRE via the C++ wrapper
  335. . pcre_stringpiece.h   header for the C++ "stringpiece" functions
  336. The "configure" script also creates config.status, which is an executable
  337. script that can be run to recreate the configuration, and config.log, which
  338. contains compiler output from tests that "configure" runs.
  339. Once "configure" has run, you can run "make". This builds the libraries
  340. libpcre, libpcre16 and/or libpcre32, and a test program called pcretest. If you
  341. enabled JIT support with --enable-jit, a test program called pcre_jit_test is
  342. built as well.
  343. If the 8-bit library is built, libpcreposix and the pcregrep command are also
  344. built, and if a C++ compiler was found on your system, and you did not disable
  345. it with --disable-cpp, "make" builds the C++ wrapper library, which is called
  346. libpcrecpp, as well as some test programs called pcrecpp_unittest,
  347. pcre_scanner_unittest, and pcre_stringpiece_unittest.
  348. The command "make check" runs all the appropriate tests. Details of the PCRE
  349. tests are given below in a separate section of this document.
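The configure, make, and make check steps can be run as one sequence that stops at the first failure. This is a minimal sketch, assuming it is run from the directory in which PCRE is to be built; the function name and the DRY_RUN variable are illustrative and not part of the PCRE distribution.

```shell
# Run "configure" (with any arguments given), then "make" and "make check".
# With DRY_RUN=1 the commands are printed instead of being executed.
pcre_build_and_check() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        run=echo
    else
        run=
    fi
    # Each step runs only if the previous one succeeded.
    $run ./configure "$@" &&
    $run make &&
    $run make check
}

# Possible use:
#   pcre_build_and_check --enable-jit --enable-utf
```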
  350. You can use "make install" to install PCRE into live directories on your
  351. system. The following are installed (file names are all relative to the
  352. <prefix> that is set when "configure" is run):
  353. Commands (bin):
  354. pcretest
  355. pcregrep (if 8-bit support is enabled)
  356. pcre-config
  357. Libraries (lib):
  358. libpcre16 (if 16-bit support is enabled)
  359. libpcre32 (if 32-bit support is enabled)
  360. libpcre (if 8-bit support is enabled)
  361. libpcreposix (if 8-bit support is enabled)
  362. libpcrecpp (if 8-bit and C++ support is enabled)
  363. Configuration information (lib/pkgconfig):
  364. libpcre16.pc
  365. libpcre32.pc
  366. libpcre.pc
  367. libpcreposix.pc
  368. libpcrecpp.pc (if C++ support is enabled)
  369. Header files (include):
  370. pcre.h
  371. pcreposix.h
  372. pcre_scanner.h     )
  373. pcre_stringpiece.h ) if C++ support is enabled
  374. pcrecpp.h          )
  375. pcrecpparg.h       )
  376. Man pages (share/man/man{1,3}):
  377. pcregrep.1
  378. pcretest.1
  379. pcre-config.1
  380. pcre.3
  381. pcre*.3 (lots more pages, all starting "pcre")
  382. HTML documentation (share/doc/pcre/html):
  383. index.html
  384. *.html (lots more pages, hyperlinked from index.html)
  385. Text file documentation (share/doc/pcre):
  386. AUTHORS
  387. COPYING
  388. ChangeLog
  389. LICENCE
  390. NEWS
  391. README
  392. pcre.txt (a concatenation of the man(3) pages)
  393. pcretest.txt the pcretest man page
  394. pcregrep.txt the pcregrep man page
  395. pcre-config.txt the pcre-config man page
  396. If you want to remove PCRE from your system, you can run "make uninstall".
  397. This removes all the files that "make install" installed. However, it does not
  398. remove any directories, because these are often shared with other programs.
  399. Retrieving configuration information
  400. ------------------------------------
  401. Running "make install" installs the command pcre-config, which can be used to
  402. recall information about the PCRE configuration and installation. For example:
  403. pcre-config --version
  404. prints the version number, and
  405. pcre-config --libs
  406. outputs information about where the library is installed. This command can be
  407. included in makefiles for programs that use PCRE, saving the programmer from
  408. having to remember too many details.
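As a sketch of how pcre-config might be used when compiling a program against the 8-bit library, the function below builds a compile command from the output of "pcre-config --cflags" and "pcre-config --libs". The function name, the PCRE_CONFIG variable (which allows a different copy of the script to be named), and the use of "cc" as the compiler are illustrative assumptions.

```shell
# Print a compile command for a single-file program that uses libpcre.
# $1 = source file, $2 = output file.
pcre_compile_line() {
    tool=${PCRE_CONFIG:-pcre-config}
    src=$1
    out=$2
    cflags=$("$tool" --cflags) || return 1
    libs=$("$tool" --libs) || return 1
    # --cflags may print nothing when PCRE is in a standard location.
    printf '%s\n' "cc${cflags:+ $cflags} -o $out $src $libs"
}

# Possible use (the file names are hypothetical):
#   pcre_compile_line demo.c demo
```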
  409. The pkg-config command is another system for saving and retrieving information
  410. about installed libraries. Instead of separate commands for each library, a
  411. single command is used. For example:
  412. pkg-config --cflags libpcre
  413. The data is held in *.pc files that are installed in a directory called
  414. <prefix>/lib/pkgconfig.
  415. Shared libraries
  416. ----------------
  417. The default distribution builds PCRE as shared libraries and static libraries,
  418. as long as the operating system supports shared libraries. Shared library
  419. support relies on the "libtool" script which is built as part of the
  420. "configure" process.
  421. The libtool script is used to compile and link both shared and static
  422. libraries. They are placed in a subdirectory called .libs when they are newly
  423. built. The programs pcretest and pcregrep are built to use these uninstalled
  424. libraries (by means of wrapper scripts in the case of shared libraries). When
  425. you use "make install" to install shared libraries, pcregrep and pcretest are
  426. automatically re-built to use the newly installed shared libraries before being
  427. installed themselves. However, the versions left in the build directory still
  428. use the uninstalled libraries.
  429. To build PCRE using static libraries only you must use --disable-shared when
  430. configuring it. For example:
  431. ./configure --prefix=/usr/gnu --disable-shared
  432. Then run "make" in the usual way. Similarly, you can use --disable-static to
  433. build only shared libraries.
  434. Cross-compiling using autotools
  435. -------------------------------
  436. You can specify CC and CFLAGS in the normal way to the "configure" command, in
  437. order to cross-compile PCRE for some other host. However, you should NOT
  438. specify --enable-rebuild-chartables, because if you do, the dftables.c source
  439. file is compiled and run on the local host, in order to generate the inbuilt
  440. character tables (the pcre_chartables.c file). This will probably not work,
  441. because dftables.c needs to be compiled with the local compiler, not the cross
  442. compiler.
  443. When --enable-rebuild-chartables is not specified, pcre_chartables.c is created
  444. by making a copy of pcre_chartables.c.dist, which is a default set of tables
  445. that assumes ASCII code. Cross-compiling with the default tables should not be
  446. a problem.
  447. If you need to modify the character tables when cross-compiling, you should
  448. move pcre_chartables.c.dist out of the way, then compile dftables.c by hand and
  449. run it on the local host to make a new version of pcre_chartables.c.dist.
  450. Then when you cross-compile PCRE this new version of the tables will be used.
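The steps for regenerating the tables might look like the sketch below. It assumes that it is run from the top of the PCRE source tree after "configure" has created config.h and pcre.h, that the native compiler is available as "cc", and that dftables takes the name of the output file as its argument (the NON-AUTOTOOLS-BUILD file is the place to confirm this). The function name, the NATIVE_CC and DRY_RUN variables, and the ".orig" suffix are illustrative.

```shell
# Replace pcre_chartables.c.dist with tables generated on the local host.
# With DRY_RUN=1 the commands are printed instead of being executed.
rebuild_chartables_dist() {
    native_cc=${NATIVE_CC:-cc}
    if [ "${DRY_RUN:-0}" = 1 ]; then
        run=echo
    else
        run=
    fi
    # Move the distributed tables out of the way, keeping a copy.
    $run mv pcre_chartables.c.dist pcre_chartables.c.dist.orig &&
    # Build dftables with the local compiler, not the cross compiler.
    $run "$native_cc" -DHAVE_CONFIG_H -I. -o dftables dftables.c &&
    # Generate the new tables under the .dist name.
    $run ./dftables pcre_chartables.c.dist
}
```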

Using HP's ANSI C++ compiler (aCC)
----------------------------------

Unless C++ support is disabled by specifying the "--disable-cpp" option of the
"configure" script, you must include the "-AA" option in the CXXFLAGS
environment variable in order for the C++ components to compile correctly.

Also, note that the aCC compiler on PA-RISC platforms may have a defect whereby
needed libraries fail to get included when specifying the "-AA" compiler
option. If you experience unresolved symbols when linking the C++ programs,
use the workaround of specifying the following environment variable prior to
running the "configure" script:

  CXXLDFLAGS="-lstd_v2 -lCsup_v2"

Compiling in Tru64 using native compilers
-----------------------------------------

The following error may occur when compiling with native compilers in the Tru64
operating system:

  CXX    libpcrecpp_la-pcrecpp.lo
  cxx: Error: /usr/lib/cmplrs/cxx/V7.1-006/include/cxx/iosfwd, line 58: #error
          directive: "cannot include iosfwd -- define __USE_STD_IOSTREAM to
          override default - see section 7.1.2 of the C++ Using Guide"
  #error "cannot include iosfwd -- define __USE_STD_IOSTREAM to override default
  - see section 7.1.2 of the C++ Using Guide"

This may be followed by other errors, complaining that 'namespace "std" has no
member'. The solution to this is to add the line

  #define __USE_STD_IOSTREAM 1

to the config.h file.

Using Sun's compilers for Solaris
---------------------------------

A user reports that the following configurations work on Solaris 9 sparcv9 and
Solaris 9 x86 (32-bit):

  Solaris 9 sparcv9: ./configure --disable-cpp CC=/bin/cc CFLAGS="-m64 -g"
  Solaris 9 x86:     ./configure --disable-cpp CC=/bin/cc CFLAGS="-g"

Using PCRE from MySQL
---------------------

On systems where both PCRE and MySQL are installed, it is possible to make use
of PCRE from within MySQL, as an alternative to the built-in pattern matching.
There is a web page that tells you how to do this:

  http://www.mysqludf.org/lib_mysqludf_preg/index.php

Making new tarballs
-------------------

The command "make dist" creates three PCRE tarballs, in tar.gz, tar.bz2, and
zip formats. The command "make distcheck" does the same, but then does a trial
build of the new distribution to ensure that it works.

If you have modified any of the man page sources in the doc directory, you
should first run the PrepareRelease script before making a distribution. This
script creates the .txt and HTML forms of the documentation from the man pages.

Testing PCRE
------------

To test the basic PCRE library on a Unix-like system, run the RunTest script.
There is another script called RunGrepTest that tests the options of the
pcregrep command. If the C++ wrapper library is built, three test programs
called pcrecpp_unittest, pcre_scanner_unittest, and pcre_stringpiece_unittest
are also built. When JIT support is enabled, another test program called
pcre_jit_test is built.

Both the scripts and all the program tests are run when you invoke "make check"
or "make test". For other environments, see the instructions in
NON-AUTOTOOLS-BUILD.

The RunTest script runs the pcretest test program (which is documented in its
own man page) on each of the relevant testinput files in the testdata
directory, and compares the output with the contents of the corresponding
testoutput files. RunTest uses a file called testtry to hold the main output
from pcretest. Other files whose names begin with "test" are used as working
files in some tests.

Some tests are relevant only when certain build-time options were selected. For
example, the tests for UTF-8/16/32 support are run only if --enable-utf was
used. RunTest outputs a comment when it skips a test.

Many of the tests that are not skipped are run up to three times. The second
run forces pcre_study() to be called for all patterns except for a few in some
tests that are marked "never study" (see the pcretest program for how this is
done). If JIT support is available, the non-DFA tests are run a third time,
this time with a forced pcre_study() with the PCRE_STUDY_JIT_COMPILE option.
This testing can be suppressed by putting "nojit" on the RunTest command line.

The entire set of tests is run once for each of the 8-bit, 16-bit and 32-bit
libraries that are enabled. If you want to run just one set of tests, call
RunTest with either the -8, -16 or -32 option.

If valgrind is installed, you can run the tests under it by putting "valgrind"
on the RunTest command line. To run pcretest on just one or more specific test
files, give their numbers as arguments to RunTest, for example:

  RunTest 2 7 11

You can also specify ranges of tests such as 3-6 or 3- (meaning 3 to the
end), or a number preceded by ~ to exclude a test. For example:

  RunTest 3-15 ~10

This runs tests 3 to 15, excluding test 10. Giving just ~13 runs all the tests
except test 13. Whatever order the arguments are in, the tests are always run
in numerical order.
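
The selection rules just described (single numbers, ranges, open-ended ranges,
and ~ exclusions) can be modelled in a few lines. The following is only an
illustrative sketch in C, not RunTest's own logic (RunTest is a shell script).
The names select_tests and parse_number are hypothetical, and the upper bound
of 26 is an assumption taken from the list of tests in this file.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TEST 26  /* assumption: tests are numbered 1..26 */

/* Parse a decimal test number in 1..MAX_TEST starting at s. On success store
   it in *out, set *end to the first unparsed character, and return 0. */
static int parse_number(const char *s, const char **end, int *out)
{
    char *stop;
    long v;
    if (*s < '0' || *s > '9') return -1;    /* reject signs and whitespace */
    v = strtol(s, &stop, 10);
    if (v < 1 || v > MAX_TEST) return -1;
    *out = (int)v;
    *end = stop;
    return 0;
}

/* Hypothetical model of the argument rules: sets selected[n] to 1 for every
   test n that would run. Returns 0 on success, -1 on a malformed argument. */
int select_tests(int argc, const char *argv[],
                 unsigned char selected[MAX_TEST + 1])
{
    unsigned char include[MAX_TEST + 1], exclude[MAX_TEST + 1];
    int any_include = 0, i, n;

    assert(selected != NULL);
    memset(include, 0, sizeof include);
    memset(exclude, 0, sizeof exclude);

    for (i = 0; i < argc; i++) {
        const char *s = argv[i], *end;
        int first, last;

        if (*s == '~') {                    /* "~N": exclude one test */
            if (parse_number(s + 1, &end, &first) != 0 || *end != '\0')
                return -1;
            exclude[first] = 1;
            continue;
        }
        if (parse_number(s, &end, &first) != 0) return -1;
        last = first;                       /* "N": a single test */
        if (*end == '-') {
            if (end[1] == '\0')
                last = MAX_TEST;            /* "N-": N to the end */
            else if (parse_number(end + 1, &end, &last) != 0 || *end != '\0')
                return -1;                  /* "N-M" with a bad M */
        } else if (*end != '\0') {
            return -1;
        }
        if (last < first) return -1;
        for (n = first; n <= last; n++) include[n] = 1;
        any_include = 1;
    }

    /* With no positive selection, everything runs except the exclusions. */
    for (n = 1; n <= MAX_TEST; n++)
        selected[n] = (unsigned char)((any_include ? include[n] : 1) &&
                                      !exclude[n]);
    selected[0] = 0;
    return 0;
}
```

Walking selected[] from 1 upwards visits the chosen tests in numerical order,
whatever order the arguments were given in.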

You can also call RunTest with the single argument "list" to cause it to output
a list of tests.

The first test file can be fed directly into the perltest.pl script to check
that Perl gives the same results. The only difference you should see is in the
first few lines, where the Perl version is given instead of the PCRE version.

The second set of tests checks pcre_fullinfo(), pcre_study(),
pcre_copy_substring(), pcre_get_substring(), pcre_get_substring_list(), error
detection, and run-time flags that are specific to PCRE, as well as the POSIX
wrapper API. It also uses the debugging flags to check some of the internals of
pcre_compile().

If you build PCRE with a locale setting that is not the standard C locale, the
character tables may be different (see next paragraph). In some cases, this may
cause failures in the second set of tests. For example, in a locale where the
isprint() function yields TRUE for characters in the range 128-255, the use of
[:isascii:] inside a character class defines a different set of characters, and
this shows up in this test as a difference in the compiled code, which is being
listed for checking. Where the comparison test output contains [\x00-\x7f] the
test will contain [\x00-\xff], and similarly in some other cases. This is not a
bug in PCRE.

The third set of tests checks pcre_maketables(), the facility for building a
set of character tables for a specific locale and using them instead of the
default tables. The tests make use of the "fr_FR" (French) locale. Before
running the test, the script checks for the presence of this locale by running
the "locale" command. If that command fails, or if it doesn't include "fr_FR"
in the list of available locales, the third test cannot be run, and a comment
is output to say why. If running this test produces instances of the error

  ** Failed to set locale "fr_FR"

in the comparison output, it means that locale is not available on your system,
despite being listed by "locale". This does not mean that PCRE is broken.

[If you are trying to run this test on Windows, you may be able to get it to
work by changing "fr_FR" to "french" everywhere it occurs. Alternatively, use
RunTest.bat. The version of RunTest.bat included with PCRE 7.4 and above uses
Windows versions of test 2. More info on using RunTest.bat is included in the
document entitled NON-UNIX-USE.]

The fourth and fifth tests check the UTF-8/16/32 support and error handling and
internal UTF features of PCRE that are not relevant to Perl, respectively. The
sixth and seventh tests do the same for Unicode character properties support.

The eighth, ninth, and tenth tests check the pcre_dfa_exec() alternative
matching function, in non-UTF-8/16/32 mode, UTF-8/16/32 mode, and UTF-8/16/32
mode with Unicode property support, respectively.

The eleventh test checks some internal offsets and code size features; it is
run only when the default "link size" of 2 is set (in other cases the sizes
change) and when Unicode property support is enabled.

The twelfth test is run only when JIT support is available, and the thirteenth
test is run only when JIT support is not available. They test some JIT-specific
features such as information output from pcretest about JIT compilation.

The fourteenth, fifteenth, and sixteenth tests are run only in 8-bit mode, and
the seventeenth, eighteenth, and nineteenth tests are run only in 16/32-bit
mode. These are tests that generate different output in the two modes. They are
for general cases, UTF-8/16/32 support, and Unicode property support,
respectively.

The twentieth test is run only in 16/32-bit mode. It tests some specific
16/32-bit features of the DFA matching engine.

The twenty-first and twenty-second tests are run only in 16/32-bit mode, when
the link size is set to 2 for the 16-bit library. They test reloading
pre-compiled patterns.

The twenty-third and twenty-fourth tests are run only in 16-bit mode. They are
for general cases, and UTF-16 support, respectively.

The twenty-fifth and twenty-sixth tests are run only in 32-bit mode. They are
for general cases, and UTF-32 support, respectively.

Character tables
----------------

For speed, PCRE uses four tables for manipulating and identifying characters
whose code point values are less than 256. The final argument of the
pcre_compile() function is a pointer to a block of memory containing the
concatenated tables. A call to pcre_maketables() can be used to generate a set
of tables in the current locale. If the final argument for pcre_compile() is
passed as NULL, a set of default tables that is built into the binary is used.

The source file called pcre_chartables.c contains the default set of tables. By
default, this is created as a copy of pcre_chartables.c.dist, which contains
tables for ASCII coding. However, if --enable-rebuild-chartables is specified
for ./configure, a different version of pcre_chartables.c is built by the
program dftables (compiled from dftables.c), which uses the ANSI C character
handling functions such as isalnum(), isalpha(), isupper(), islower(), etc. to
build the table sources. This means that the default C locale which is set for
your system will control the contents of these default tables. You can change
the default tables by editing pcre_chartables.c and then re-building PCRE. If
you do this, you should take care to ensure that the file does not get
automatically re-generated. The best way to do this is to move
pcre_chartables.c.dist out of the way and replace it with your customized
tables.

When the dftables program is run as a result of --enable-rebuild-chartables,
it uses the default C locale that is set on your system. It does not pay
attention to the LC_xxx environment variables. In other words, it uses the
system's default locale rather than whatever the compiling user happens to have
set. If you really do want to build a source set of character tables in a
locale that is specified by the LC_xxx variables, you can run the dftables
program by hand with the -L option. For example:

  ./dftables -L pcre_chartables.c.special
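
The difference between ignoring and honouring the LC_xxx variables can be
illustrated with standard C alone. The following is a minimal sketch, not code
taken from dftables.c. It relies only on the ISO C rule that a program starts
in the "C" locale until it calls setlocale(), and the helper names are
hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: name of the locale currently used for character
   classification. A pure query (NULL argument) does not change the locale. */
const char *current_ctype_locale(void)
{
    const char *name = setlocale(LC_CTYPE, NULL);
    assert(name != NULL);
    return name;
}

/* Hypothetical helper: switch to whatever the LC_xxx and LANG environment
   variables request. Returns NULL if the request cannot be honoured, in
   which case the locale is left unchanged. */
const char *adopt_environment_locale(void)
{
    return setlocale(LC_ALL, "");
}

/* Count the code points 128..255 that isalpha() accepts in the current
   locale. On an ASCII-based system this is 0 in the "C" locale; in a
   single-byte locale that defines accented letters it is usually not. */
int count_high_letters(void)
{
    int c, n = 0;
    for (c = 128; c < 256; c++)
        if (isalpha(c)) n++;
    return n;
}
```

Calling count_high_letters() before and after adopt_environment_locale() shows
whether the user's locale would alter tables that are built from the ctype
functions. The result after the switch depends on the locales installed on the
system.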

The first two 256-byte tables provide lower casing and case flipping functions,
respectively. The next table consists of three 32-byte bit maps which identify
digits, "word" characters, and white space, respectively. These are used when
building 32-byte bit maps that represent character classes for code points less
than 256.
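
As a rough illustration of what such tables contain, the sketch below builds a
lower-casing table, a case-flipping table, and three 32-byte bit maps from the
standard ctype functions. It is not PCRE's own code: the sketch_tables layout,
the function names, and the bit order within each byte (least significant bit
first) are all assumptions made for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define BITMAP_BYTES 32   /* 256 bits, one per code point below 256 */

/* Hypothetical layout for this sketch; not PCRE's own declarations. */
typedef struct {
    unsigned char lower[256];               /* lower-casing table */
    unsigned char flip[256];                /* case-flipping table */
    unsigned char digit_map[BITMAP_BYTES];  /* bit set for each digit */
    unsigned char word_map[BITMAP_BYTES];   /* bit set for each "word" char */
    unsigned char space_map[BITMAP_BYTES];  /* bit set for each white space */
} sketch_tables;

/* Set the bit for code point c (0..255) in a 32-byte map. */
void bitmap_set(unsigned char *map, int c)
{
    map[c / 8] |= (unsigned char)(1u << (c % 8));
}

/* Return 1 if the bit for code point c (0..255) is set, otherwise 0. */
int bitmap_test(const unsigned char *map, int c)
{
    return (map[c / 8] >> (c % 8)) & 1;
}

/* Fill the tables from the ctype functions of the current locale. */
void build_sketch_tables(sketch_tables *t)
{
    int c;
    assert(t != NULL);
    memset(t, 0, sizeof *t);
    for (c = 0; c < 256; c++) {
        t->lower[c] = (unsigned char)tolower(c);
        t->flip[c] = (unsigned char)(islower(c) ? toupper(c) : tolower(c));
        if (isdigit(c)) bitmap_set(t->digit_map, c);
        if (isalnum(c) || c == '_') bitmap_set(t->word_map, c);
        if (isspace(c)) bitmap_set(t->space_map, c);
    }
}
```

A character class for code points below 256 can then be formed by OR-ing or
inverting these 32-byte maps, which is why a membership test costs a single
index and mask.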

The final 256-byte table has bits indicating various character types, as
follows:

    1   white space character
    2   letter
    4   decimal digit
    8   hexadecimal digit
   16   alphanumeric or '_'
  128   regular expression metacharacter or binary zero

You should not alter the set of characters that contain the 128 bit, as that
will cause PCRE to malfunction.
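
The bit values listed above combine per character. The following sketch shows
one way such a table could be filled in. The TYPE_* names and the function
name are hypothetical, and the metacharacter list is an assumption made for
illustration, since this file does not enumerate the set.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Bit values as listed above; the names are made up for this sketch. */
enum {
    TYPE_SPACE  = 1,
    TYPE_LETTER = 2,
    TYPE_DIGIT  = 4,
    TYPE_XDIGIT = 8,
    TYPE_WORD   = 16,
    TYPE_META   = 128
};

/* Fill a 256-entry type table from the ctype functions of the current
   locale. */
void build_type_table(unsigned char types[256])
{
    /* Assumed metacharacter set, for illustration only. */
    static const char meta[] = "\\*+?{^.$|()[";
    int c;

    assert(types != NULL);
    for (c = 0; c < 256; c++) {
        unsigned char bits = 0;
        if (isspace(c))  bits |= TYPE_SPACE;
        if (isalpha(c))  bits |= TYPE_LETTER;
        if (isdigit(c))  bits |= TYPE_DIGIT;
        if (isxdigit(c)) bits |= TYPE_XDIGIT;
        if (isalnum(c) || c == '_') bits |= TYPE_WORD;
        /* Searching sizeof meta bytes includes the terminating NUL, so
           binary zero is flagged as well. */
        if (memchr(meta, c, sizeof meta) != NULL) bits |= TYPE_META;
        types[c] = bits;
    }
}
```

In the "C" locale on an ASCII system this gives 'a' the value 26 (letter,
hexadecimal digit, and word character), while '*' and binary zero each get
128.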

File manifest
-------------

The distribution should contain the files listed below. Where a file name is
given as pcre[16|32]_xxx it means that there are three files, one with the name
pcre_xxx, one with the name pcre16_xxx, and a third with the name pcre32_xxx.

(A) Source files of the PCRE library functions and their headers:

  dftables.c              auxiliary program for building pcre_chartables.c
                            when --enable-rebuild-chartables is specified

  pcre_chartables.c.dist  a default set of character tables that assume ASCII
                            coding; used, unless --enable-rebuild-chartables is
                            specified, by copying to pcre[16]_chartables.c

  pcreposix.c                )
  pcre[16|32]_byte_order.c   )
  pcre[16|32]_compile.c      )
  pcre[16|32]_config.c       )
  pcre[16|32]_dfa_exec.c     )
  pcre[16|32]_exec.c         )
  pcre[16|32]_fullinfo.c     )
  pcre[16|32]_get.c          ) sources for the functions in the library,
  pcre[16|32]_globals.c      )   and some internal functions that they use
  pcre[16|32]_jit_compile.c  )
  pcre[16|32]_maketables.c   )
  pcre[16|32]_newline.c      )
  pcre[16|32]_refcount.c     )
  pcre[16|32]_string_utils.c )
  pcre[16|32]_study.c        )
  pcre[16|32]_tables.c       )
  pcre[16|32]_ucd.c          )
  pcre[16|32]_version.c      )
  pcre[16|32]_xclass.c       )
  pcre_ord2utf8.c            )
  pcre_valid_utf8.c          )
  pcre16_ord2utf16.c         )
  pcre16_utf16_utils.c       )
  pcre16_valid_utf16.c       )
  pcre32_utf32_utils.c       )
  pcre32_valid_utf32.c       )

  pcre[16|32]_printint.c     ) debugging function that is used by pcretest,
                             )   and can also be #included in pcre_compile()

  pcre.h.in               template for pcre.h when built by "configure"
  pcreposix.h             header for the external POSIX wrapper API
  pcre_internal.h         header for internal use
  sljit/*                 16 files that make up the JIT compiler
  ucp.h                   header for Unicode property handling

  config.h.in             template for config.h, which is built by "configure"

  pcrecpp.h               public header file for the C++ wrapper
  pcrecpparg.h.in         template for another C++ header file
  pcre_scanner.h          public header file for C++ scanner functions
  pcrecpp.cc              )
  pcre_scanner.cc         ) source for the C++ wrapper library

  pcre_stringpiece.h.in   template for pcre_stringpiece.h, the header for the
                            C++ stringpiece functions
  pcre_stringpiece.cc     source for the C++ stringpiece functions

(B) Source files for programs that use PCRE:

  pcredemo.c              simple demonstration of coding calls to PCRE
  pcregrep.c              source of a grep utility that uses PCRE
  pcretest.c              comprehensive test program

(C) Auxiliary files:

  132html                 script to turn "man" pages into HTML
  AUTHORS                 information about the author of PCRE
  ChangeLog               log of changes to the code
  CleanTxt                script to clean nroff output for txt man pages
  Detrail                 script to remove trailing spaces
  HACKING                 some notes about the internals of PCRE
  INSTALL                 generic installation instructions
  LICENCE                 conditions for the use of PCRE
  COPYING                 the same, using GNU's standard name
  Makefile.in             ) template for Unix Makefile, which is built by
                          )   "configure"
  Makefile.am             ) the automake input that was used to create
                          )   Makefile.in
  NEWS                    important changes in this release
  NON-UNIX-USE            the previous name for NON-AUTOTOOLS-BUILD
  NON-AUTOTOOLS-BUILD     notes on building PCRE without using autotools
  PrepareRelease          script to make preparations for "make dist"
  README                  this file
  RunTest                 a Unix shell script for running tests
  RunGrepTest             a Unix shell script for pcregrep tests
  aclocal.m4              m4 macros (generated by "aclocal")
  config.guess            ) files used by libtool,
  config.sub              )   used only when building a shared library
  configure               a configuring shell script (built by autoconf)
  configure.ac            ) the autoconf input that was used to build
                          )   "configure" and config.h
  depcomp                 ) script to find program dependencies, generated by
                          )   automake
  doc/*.3                 man page sources for PCRE
  doc/*.1                 man page sources for pcregrep and pcretest
  doc/index.html.src      the base HTML page
  doc/html/*              HTML documentation
  doc/pcre.txt            plain text version of the man pages
  doc/pcretest.txt        plain text documentation of test program
  doc/perltest.txt        plain text documentation of Perl test program
  install-sh              a shell script for installing files
  libpcre16.pc.in         template for libpcre16.pc for pkg-config
  libpcre32.pc.in         template for libpcre32.pc for pkg-config
  libpcre.pc.in           template for libpcre.pc for pkg-config
  libpcreposix.pc.in      template for libpcreposix.pc for pkg-config
  libpcrecpp.pc.in        template for libpcrecpp.pc for pkg-config
  ltmain.sh               file used to build a libtool script
  missing                 ) common stub for a few missing GNU programs while
                          )   installing, generated by automake
  mkinstalldirs           script for making install directories
  perltest.pl             Perl test program
  pcre-config.in          source of script which retains PCRE information
  pcre_jit_test.c         test program for the JIT compiler
  pcrecpp_unittest.cc          )
  pcre_scanner_unittest.cc     ) test programs for the C++ wrapper
  pcre_stringpiece_unittest.cc )
  testdata/testinput*     test data for main library tests
  testdata/testoutput*    expected test results
  testdata/grep*          input and output for pcregrep tests
  testdata/*              other supporting test files

(D) Auxiliary files for cmake support

  cmake/COPYING-CMAKE-SCRIPTS
  cmake/FindPackageHandleStandardArgs.cmake
  cmake/FindEditline.cmake
  cmake/FindReadline.cmake
  CMakeLists.txt
  config-cmake.h.in

(E) Auxiliary files for VPASCAL

  makevp.bat
  makevp_c.txt
  makevp_l.txt
  pcregexp.pas

(F) Auxiliary files for building PCRE "by hand"

  pcre.h.generic          ) a version of the public PCRE header file
                          )   for use in non-"configure" environments
  config.h.generic        ) a version of config.h for use in non-"configure"
                          )   environments

(G) Miscellaneous

  RunTest.bat             a script for running tests under Windows

Philip Hazel
Email local part: Philip.Hazel
Email domain: gmail.com
Last updated: 15 June 2021