Requirements:

- automake, autoconf, libtool
	(not needed when compiling a release)
- pkg-config (http://www.freedesktop.org/wiki/Software/pkg-config)
	(not needed when compiling a release using the included isl and pet)
- gmp (http://gmplib.org/)
- libyaml (http://pyyaml.org/wiki/LibYAML)
	(only needed if you want to compile the pet executable)
- LLVM/clang libraries, 2.9 or higher (http://clang.llvm.org/get_started.html)
	Unless you have some other reasons for wanting to use the svn version,
	it is best to install the latest release (3.9).
	For more details, see pet/README.

If you are installing on Ubuntu, then you can install the following packages:

	automake autoconf libtool pkg-config libgmp3-dev libyaml-dev libclang-dev llvm

Note that you need at least version 3.2 of libclang-dev (Ubuntu Raring).
Older versions of this package did not include the required libraries.
If you are using an older version of Ubuntu, then you need to compile and
install LLVM/clang from source.


Preparing:

Grab the latest release and extract it or get the source from
the git repository as follows. This process requires autoconf,
automake, libtool and pkg-config.

	git clone git://repo.or.cz/ppcg.git
	cd ppcg
	./get_submodules.sh
	./autogen.sh


Compilation:

	./configure
	make
	make check

If you have installed any of the required libraries in a non-standard
location, then you may need to use the --with-gmp-prefix,
--with-libyaml-prefix and/or --with-clang-prefix options
when calling "./configure".


Using PPCG to generate CUDA or OpenCL code

To convert a fragment of a C program to CUDA, insert a line containing

	#pragma scop

before the fragment and add a line containing

	#pragma endscop

after the fragment. To generate CUDA code, run

	ppcg --target=cuda file.c

where file.c is the file containing the fragment. The generated
code is stored in file_host.cu and file_kernel.cu.

To generate OpenCL code, run

	ppcg --target=opencl file.c

where file.c is the file containing the fragment. The generated code
is stored in file_host.c and file_kernel.cl.
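
As an illustration, a minimal input file might look like the sketch below. The function name vector_add and the size N are hypothetical; only the two pragmas come from the instructions above. An ordinary C compiler ignores the pragmas (possibly with an "unknown pragma" warning), so the file stays valid sequential C.

```c
/* Hypothetical input file (file.c).  ppcg analyzes only the region
 * between the two pragmas; a regular C compiler ignores them. */
#define N 1024

void vector_add(float a[N], float b[N], float c[N])
{
    int i;
#pragma scop
    for (i = 0; i < N; ++i)
        a[i] = b[i] + c[i];   /* independent iterations: easy to parallelize */
#pragma endscop
}
```

Running ppcg --target=cuda file.c on such a file would then produce file_host.cu and file_kernel.cu as described above.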


Specifying tile, grid and block sizes

The iteration space tile size, grid size and block size can
be specified using the --sizes option. The argument is a union map
in isl notation mapping kernels identified by their sequence number
in a "kernel" space to singleton sets in the "tile", "grid" and "block"
spaces. The sizes are specified outermost to innermost.

The dimension of the "tile" space indicates the (maximal) number of loop
dimensions to tile. The elements of the single integer tuple
specify the tile sizes in each dimension.
In the case of hybrid tiling, the first element is half the size of
the tile in the time (sequential) dimension. The second element
specifies the number of elements in the base of the hexagon.
The remaining elements specify the tile sizes in the remaining space
dimensions.

The dimension of the "grid" space indicates the (maximal) number of block
dimensions in the grid. The elements of the single integer tuple
specify the number of blocks in each dimension.

The dimension of the "block" space indicates the (maximal) number of thread
dimensions in a block. The elements of the single integer tuple
specify the number of threads in each dimension.
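
To make the grid and block sizes more concrete, the following sketch emulates in sequential C a one-dimensional CUDA-style launch of grid blocks with block threads each. It is only a conceptual model, assuming a simple cyclic distribution of iterations over threads; it is not the code PPCG generates, and the helper name emulate_launch is hypothetical.

```c
/* Conceptual model of a 1-D launch with `grid` blocks of `block`
 * threads.  Thread t of block b handles iterations b * block + t,
 * then strides by the total thread count, so each i in [0, n) is
 * visited exactly once even if n > grid * block.
 * Hypothetical helper, not PPCG output. */
void emulate_launch(int n, int grid, int block, int visits[n])
{
    for (int b = 0; b < grid; ++b)          /* role of blockIdx.x  */
        for (int t = 0; t < block; ++t)     /* role of threadIdx.x */
            for (int i = b * block + t; i < n; i += grid * block)
                visits[i] += 1;
}
```

With grid 4 and block 16, for instance, 64 emulated threads together cover a 100-iteration loop, most of them handling two iterations.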

For example,

	{ kernel[0] -> tile[64,64]; kernel[i] -> block[16] : i != 4 }

specifies that in kernel 0, two loops should be tiled with a tile
size of 64 in both dimensions and that all kernels except kernel 4
should be run using a block of 16 threads.
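
The loop structure that tile[64,64] refers to can be pictured with the hand-written sketch of rectangular 64 x 64 tiling below. It assumes a simple rectangular iteration domain and only illustrates what "two loops tiled with size 64" means; PPCG's actual tiled code, and how it maps tiles onto blocks and threads, will differ. The function names are hypothetical.

```c
#define TILE 64   /* tile size in both dimensions, as in tile[64,64] */

static int min_int(int a, int b) { return a < b ? a : b; }

/* C = A + B over an n x m domain, visited tile by tile.  The outer two
 * loops enumerate tiles; the inner two enumerate points within a tile,
 * clamped at the boundary so n and m need not be multiples of TILE. */
void tiled_add(int n, int m, double C[n][m], double A[n][m], double B[n][m])
{
    for (int ti = 0; ti < n; ti += TILE)
        for (int tj = 0; tj < m; tj += TILE)
            for (int i = ti; i < min_int(ti + TILE, n); ++i)
                for (int j = tj; j < min_int(tj + TILE, m); ++j)
                    C[i][j] = A[i][j] + B[i][j];
}
```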

Since PPCG performs some scheduling, it can be difficult to predict
what exactly will end up in a kernel. If you want to specify
tile, grid or block sizes, you may want to run PPCG first with the defaults,
examine the kernels and then run PPCG again with the desired sizes.
Instead of examining the kernels, you can also specify the option
--dump-sizes on the first run to obtain the effectively used default sizes.


Compiling the generated CUDA code with nvcc

To get optimal performance from nvcc, it is important to choose --arch
according to your target GPU. Specifically, use the flag "--arch sm_20"
for Fermi, "--arch sm_30" for GK10x Kepler and "--arch sm_35" for
GK110 Kepler. We discourage the use of older cards as we have seen
correctness issues with compilation for older architectures.
Note that in the absence of any --arch flag, nvcc defaults to
"--arch sm_13". This will not only be slower, but can also cause
correctness issues.

If you want to obtain results that are identical to those obtained
by the original code, then you may need to disable some optimizations
by passing the "--fmad=false" option.


Compiling the generated OpenCL code with gcc

To compile the host code, you need to link against the file
ocl_utilities.c, which contains utility functions used by the generated
OpenCL host code. To compile the host code with gcc, run

	gcc -std=c99 file_host.c ocl_utilities.c -lOpenCL

Note that we have experienced the generated OpenCL code freezing
on some inputs (e.g., the PolyBench symm benchmark) when using
at least some versions of the NVIDIA OpenCL library, while the
corresponding CUDA code runs fine.
We have experienced no such freezes when using AMD, ARM or Intel
OpenCL libraries.

By default, the compiled executable will need the _kernel.cl file at
run time. Alternatively, the option --opencl-embed-kernel-code may be
given to place the kernel code in a string literal. The kernel code is
then compiled into the host binary, such that the _kernel.cl file is no
longer needed at run time. Any kernel include files, in particular
those supplied using --opencl-include-file, will still be required at
run time.


Function calls

Function calls inside the analyzed fragment are reproduced
in the CUDA or OpenCL code, but for now it is left to the user
to make sure that the functions that are being called are
available from the generated kernels.

In the case of OpenCL code, the --opencl-include-file option
may be used to specify one or more files to be #include'd
from the generated code. These files may then contain
the definitions of the functions being called from the
program fragment. If the pathnames of the included files
are relative to the current directory, then you may need
to additionally specify --opencl-compiler-options=-I.
to make sure that the files can be found by the OpenCL compiler.
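
For illustration, a hypothetical include file my_funcs.cl (both the file name and the function are ours) could define a helper that the analyzed fragment calls. It sticks to features shared by C99 and OpenCL C. It would be passed with something like ppcg --target=opencl --opencl-include-file=my_funcs.cl --opencl-compiler-options=-I. file.c; check ppcg --help for the exact option syntax of your version.

```c
/* my_funcs.cl (hypothetical): helper called from the #pragma scop
 * region.  Uses only features common to C99 and OpenCL C and avoids
 * the name clamp(), which is an OpenCL built-in. */
float my_clamp(float x, float lo, float hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}
```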

The included files may contain definitions of types used by the
generated kernels. By default, PPCG generates definitions for
types as needed, but these definitions may collide with those in
the included files, as PPCG does not consider the contents of the
included files. The --no-opencl-print-kernel-types option prevents
PPCG from generating type definitions.


GNU extensions

By default, PPCG may print out macro definitions that involve
GNU extensions such as __typeof__ and statement expressions.
Some compilers may not support these extensions.
In particular, OpenCL 1.2 beignet 1.1.1 (git-6de6918)
has been reported not to support __typeof__.
The use of these extensions can be turned off with the
--no-allow-gnu-extensions option.
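
For reference, the hypothetical min macros below (not copied from PPCG's output) show what these two extensions are for. __typeof__ declares temporaries with the argument types, and the ({ ... }) statement expression lets a macro use those temporaries while still yielding a value, so each argument is evaluated exactly once. The portable version needs no extensions but may evaluate an argument twice.

```c
/* GNU C: __typeof__ plus a statement expression; each argument is
 * evaluated once.  Accepted by GCC and Clang, not by every compiler. */
#define MIN_GNU(x, y) ({         \
    __typeof__(x) _x = (x);      \
    __typeof__(y) _y = (y);      \
    _x < _y ? _x : _y; })

/* Portable C: no extensions, but the selected argument is evaluated
 * a second time, so side effects such as i++ may be repeated. */
#define MIN_PLAIN(x, y) ((x) < (y) ? (x) : (y))
```

For example, MIN_GNU(i++, 10) increments i once, whereas MIN_PLAIN(i++, 10) increments it twice when i is the smaller value.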


Processing PolyBench

When processing a PolyBench/C 3.2 benchmark, you should always specify
-DPOLYBENCH_USE_C99_PROTO on the ppcg command line. Otherwise, the source
files are inconsistent, having fixed-size arrays but parametrically
bounded loops iterating over them.
However, you should not specify this define when compiling
the PPCG-generated code using nvcc, since CUDA does not support VLAs.
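
The inconsistency can be pictured with a simplified, hypothetical imitation of PolyBench's array-declaration macros. The ARRAY_2D macro, the function name scale and the size 64 are ours, not PolyBench's. Without the define, the parameter has the compile-time size N while the loops run to the run-time bound n. With it, the parameter becomes a C99 variable length array whose size matches the loop bounds.

```c
/* Simplified imitation of PolyBench's array macros (hypothetical).
 * Compile with -DPOLYBENCH_USE_C99_PROTO to get the VLA form. */
#define N 64                                  /* compile-time size */

#ifdef POLYBENCH_USE_C99_PROTO
#  define ARRAY_2D(var, dim1, dim2, ddim1, ddim2) var[ddim1][ddim2]
#else
#  define ARRAY_2D(var, dim1, dim2, ddim1, ddim2) var[dim1][dim2]
#endif

/* Without the define: double A[N][N], loops bounded by n (mismatch).
 * With the define:    double A[n][n], consistent with the loops.  */
void scale(int n, double ARRAY_2D(A, N, N, n, n), double alpha)
{
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            A[i][j] *= alpha;
}
```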


CUDA and function overloading

While CUDA supports function overloading based on the argument types,
no such function overloading exists in the input language C. Since PPCG
simply prints out the same function name as in the original code, this
may result in a different function being called based on the types
of the arguments. For example, if the original code contains a call
to the function sqrt() with a float argument, then the argument will
be promoted to a double and the sqrt() function will be called.
In the transformed (CUDA) code, however, overloading will cause the
function sqrtf() to be called. Until this issue has been resolved in PPCG,
we recommend that users either explicitly call the function sqrtf() or
explicitly cast the argument to double in the input code.


Contact

For bug reports, feature requests and questions,
contact http://groups.google.com/group/isl-development

Whenever you report a bug, please mention the exact version of PPCG
that you are using (output of "./ppcg --version"). If you are unable
to compile PPCG, then report the git version (output of "git describe")
or the version number included in the name of the tarball.


Citing PPCG

If you use PPCG for your research, you are invited to cite
the following paper.

@article{Verdoolaege2013PPCG,
 author = {Verdoolaege, Sven and Juega, Juan Carlos and Cohen, Albert and
	G\'{o}mez, Jos{\'e} Ignacio and Tenllado, Christian and
	Catthoor, Francky},
 title = {Polyhedral parallel code generation for CUDA},
 journal = {ACM Trans. Archit. Code Optim.},
 issue_date = {January 2013},
 volume = {9},
 number = {4},
 month = jan,
 year = {2013},
 issn = {1544-3566},
 pages = {54:1--54:23},
 doi = {10.1145/2400682.2400713},
 acmid = {2400713},
 publisher = {ACM},
 address = {New York, NY, USA},
}
  180. publisher = {ACM},
  181. address = {New York, NY, USA},
  182. }