Requirements:

- automake, autoconf, libtool
  (not needed when compiling a release)
- pkg-config (http://www.freedesktop.org/wiki/Software/pkg-config)
  (not needed when compiling a release using the included isl and pet)
- gmp (http://gmplib.org/)
- libyaml (http://pyyaml.org/wiki/LibYAML)
  (only needed if you want to compile the pet executable)
- LLVM/clang libraries, 2.9 or higher (http://clang.llvm.org/get_started.html)
  Unless you have some other reason for wanting to use the svn version,
  it is best to install the latest release (3.9).
  For more details, see pet/README.

If you are installing on Ubuntu, then you can install the following packages:

    automake autoconf libtool pkg-config libgmp3-dev libyaml-dev libclang-dev llvm

Note that you need at least version 3.2 of libclang-dev (Ubuntu Raring).
Older versions of this package did not include the required libraries.
If you are using an older version of Ubuntu, then you need to compile and
install LLVM/clang from source.
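
For example, on a recent Ubuntu the packages above can be installed with
apt-get (exact package names may differ between releases):

    sudo apt-get install automake autoconf libtool pkg-config \
        libgmp3-dev libyaml-dev libclang-dev llvm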

Preparing:

Grab the latest release and extract it, or get the source from
the git repository as follows. The latter requires autoconf,
automake, libtool and pkg-config.

    git clone git://repo.or.cz/ppcg.git
    cd ppcg
    ./get_submodules.sh
    ./autogen.sh

Compilation:

    ./configure
    make
    make check

If you have installed any of the required libraries in a non-standard
location, then you may need to use the --with-gmp-prefix,
--with-libyaml-prefix and/or --with-clang-prefix options
when calling "./configure".
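
For example, if GMP and LLVM/clang were installed under /opt (hypothetical
paths; adjust them to your installation):

    ./configure --with-gmp-prefix=/opt/gmp --with-clang-prefix=/opt/llvm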

Using PPCG to generate CUDA or OpenCL code

To convert a fragment of a C program to CUDA, insert a line containing

    #pragma scop

before the fragment and add a line containing

    #pragma endscop

after the fragment. To generate CUDA code run

    ppcg --target=cuda file.c

where file.c is the file containing the fragment. The generated
code is stored in file_host.cu and file_kernel.cu.

To generate OpenCL code run

    ppcg --target=opencl file.c

where file.c is the file containing the fragment. The generated code
is stored in file_host.c and file_kernel.cl.
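
For instance, a simple loop nest (a made-up example) can be marked
as follows:

    void scale(int n, float A[n][n], float alpha)
    {
    #pragma scop
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                A[i][j] *= alpha;
    #pragma endscop
    }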

Specifying tile, grid and block sizes

The iteration space tile size, grid size and block size can
be specified using the --sizes option. The argument is a union map
in isl notation mapping kernels identified by their sequence number
in a "kernel" space to singleton sets in the "tile", "grid" and "block"
spaces. The sizes are specified outermost to innermost.

The dimension of the "tile" space indicates the (maximal) number of loop
dimensions to tile. The elements of the single integer tuple
specify the tile sizes in each dimension.
In case of hybrid tiling, the first element is half the size of
the tile in the time (sequential) dimension. The second element
specifies the number of elements in the base of the hexagon.
The remaining elements specify the tile sizes in the remaining space
dimensions.

The dimension of the "grid" space indicates the (maximal) number of block
dimensions in the grid. The elements of the single integer tuple
specify the number of blocks in each dimension.

The dimension of the "block" space indicates the (maximal) number of thread
dimensions in the block. The elements of the single integer tuple
specify the number of threads in each dimension.

For example,

    { kernel[0] -> tile[64,64]; kernel[i] -> block[16] : i != 4 }

specifies that in kernel 0, two loops should be tiled with a tile
size of 64 in both dimensions and that all kernels except kernel 4
should be run using a block of 16 threads.

Since PPCG performs some scheduling, it can be difficult to predict
what exactly will end up in a kernel. If you want to specify
tile, grid or block sizes, you may want to run PPCG first with the defaults,
examine the kernels and then run PPCG again with the desired sizes.
Instead of examining the kernels, you can also specify the option
--dump-sizes on the first run to obtain the effectively used default sizes.
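
A complete invocation that fixes the sizes for all kernels could then
look as follows (the specific values are purely illustrative):

    ppcg --target=cuda \
        --sizes='{ kernel[i] -> tile[32,32]; kernel[i] -> grid[256]; kernel[i] -> block[16,16] }' \
        file.c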

Compiling the generated CUDA code with nvcc

To get optimal performance from nvcc, it is important to choose --arch
according to your target GPU. Specifically, use the flag "--arch sm_20"
for Fermi, "--arch sm_30" for GK10x Kepler and "--arch sm_35" for
GK110 Kepler. We discourage the use of older cards as we have seen
correctness issues with compilation for older architectures.
Note that in the absence of any --arch flag, nvcc defaults to
"--arch sm_13". This will not only be slower, but can also cause
correctness issues.
If you want to obtain results that are identical to those obtained
by the original code, then you may need to disable some optimizations
by passing the "--fmad=false" option.
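
For example, to compile for a GK110 Kepler card while keeping
floating-point results identical to those of the original code
(a sketch; adapt --arch to your GPU and choose any output name):

    nvcc --arch sm_35 --fmad=false file_host.cu file_kernel.cu -o file_gpu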

Compiling the generated OpenCL code with gcc

To compile the host code you need to link against the file
ocl_utilities.c, which contains utility functions used by the generated
OpenCL host code. To compile the host code with gcc, run

    gcc -std=c99 file_host.c ocl_utilities.c -lOpenCL

Note that we have experienced the generated OpenCL code freezing
on some inputs (e.g., the PolyBench symm benchmark) when using
at least some versions of the Nvidia OpenCL library, while the
corresponding CUDA code runs fine.
We have experienced no such freezes when using AMD, ARM or Intel
OpenCL libraries.

By default, the compiled executable will need the _kernel.cl file at
run time. Alternatively, the option --opencl-embed-kernel-code may be
given to place the kernel code in a string literal. The kernel code is
then compiled into the host binary, so that the _kernel.cl file is no
longer needed at run time. Any kernel include files, in particular
those supplied using --opencl-include-file, will still be required at
run time.
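
For example, to produce a binary that does not need file_kernel.cl at
run time (a sketch combining the options described above):

    ppcg --target=opencl --opencl-embed-kernel-code file.c
    gcc -std=c99 file_host.c ocl_utilities.c -lOpenCL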

Function calls

Function calls inside the analyzed fragment are reproduced
in the CUDA or OpenCL code, but for now it is left to the user
to make sure that the functions that are being called are
available from the generated kernels.

In the case of OpenCL code, the --opencl-include-file option
may be used to specify one or more files to be #include'd
from the generated code. These files may then contain
the definitions of the functions being called from the
program fragment. If the pathnames of the included files
are relative to the current directory, then you may need
to additionally specify the option --opencl-compiler-options=-I.
to make sure that the files can be found by the OpenCL compiler.

The included files may contain definitions of types used by the
generated kernels. By default, PPCG generates definitions for
types as needed, but these definitions may collide with those in
the included files, as PPCG does not consider the contents of the
included files. The --no-opencl-print-kernel-types option will
prevent PPCG from generating type definitions.
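
Putting these options together, with functions.cl standing in for a
file that defines the functions called from the fragment:

    ppcg --target=opencl --opencl-include-file=functions.cl \
        --opencl-compiler-options=-I. file.c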

GNU extensions

By default, PPCG may print out macro definitions that involve
GNU extensions such as __typeof__ and statement expressions.
Some compilers may not support these extensions.
In particular, OpenCL 1.2 beignet 1.1.1 (git-6de6918)
has been reported not to support __typeof__.
The use of these extensions can be turned off with the
--no-allow-gnu-extensions option.
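
For reference, a macro relying on both extensions has roughly the
following shape (an illustration of the constructs, not necessarily
the exact output of PPCG):

    #define ppcg_min(x, y)                                  \
        ({ __typeof__(x) _x = (x); __typeof__(y) _y = (y);  \
           _x < _y ? _x : _y; })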

Processing PolyBench

When processing a PolyBench/C 3.2 benchmark, you should always specify
-DPOLYBENCH_USE_C99_PROTO on the ppcg command line. Otherwise, the source
files are inconsistent, having fixed size arrays but parametrically
bounded loops iterating over them.
However, you should not specify this define when compiling
the PPCG generated code using nvcc, since CUDA does not support VLAs.
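
For example (gemm.c standing in for any PolyBench/C 3.2 source file;
you may additionally need -I options pointing at the PolyBench headers):

    ppcg --target=cuda -DPOLYBENCH_USE_C99_PROTO gemm.c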

CUDA and function overloading

While CUDA supports function overloading based on the argument types,
no such function overloading exists in the input language C. Since PPCG
simply prints out the same function name as in the original code, this
may result in a different function being called based on the types
of the arguments. For example, if the original code contains a call
to the function sqrt() with a float argument, then the argument will
be promoted to a double and the sqrt() function will be called.
In the transformed (CUDA) code, however, overloading will cause the
function sqrtf() to be called. Until this issue has been resolved in PPCG,
we recommend that users either explicitly call the function sqrtf() or
explicitly cast the argument to double in the input code.
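
In code, the two suggested workarounds look as follows (a minimal
illustration):

    float f = 0.5f;
    float r1 = sqrtf(f);          /* explicitly call the float version */
    double r2 = sqrt((double) f); /* or force the double version */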

Contact

For bug reports, feature requests and questions,
contact http://groups.google.com/group/isl-development

Whenever you report a bug, please mention the exact version of PPCG
that you are using (output of "./ppcg --version"). If you are unable
to compile PPCG, then report the git version (output of "git describe")
or the version number included in the name of the tarball.

Citing PPCG

If you use PPCG for your research, you are invited to cite
the following paper:

@article{Verdoolaege2013PPCG,
    author = {Verdoolaege, Sven and Juega, Juan Carlos and Cohen, Albert and
        G\'{o}mez, Jos{\'e} Ignacio and Tenllado, Christian and
        Catthoor, Francky},
    title = {Polyhedral parallel code generation for CUDA},
    journal = {ACM Trans. Archit. Code Optim.},
    issue_date = {January 2013},
    volume = {9},
    number = {4},
    month = jan,
    year = {2013},
    issn = {1544-3566},
    pages = {54:1--54:23},
    doi = {10.1145/2400682.2400713},
    acmid = {2400713},
    publisher = {ACM},
    address = {New York, NY, USA},
}