    The official guide to swscale for confused developers.
   ========================================================

Current (simplified) Architecture:
----------------------------------
                        Input
                          v
                   _______OR_________
                 /                   \
               /                       \
       special converter     [Input to YUV converter]
              |                         |
              |          (8-bit YUV 4:4:4 / 4:2:2 / 4:2:0 / 4:0:0 )
              |                         |
              |                         v
              |                  Horizontal scaler
              |                         |
              |     (15-bit YUV 4:4:4 / 4:2:2 / 4:2:0 / 4:1:1 / 4:0:0 )
              |                         |
              |                         v
              |          Vertical scaler and output converter
              |                         |
              v                         v
                        output


Swscale has 2 scaler paths. Each path must be capable of handling
slices, that is, consecutive non-overlapping rectangles spanning
(0, slice_top) - (picture_width, slice_bottom).
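
As an illustration of the slice definition above, the following is a minimal
sketch that cuts a picture into consecutive non-overlapping slices. The Slice
type and split_into_slices() are hypothetical names used only here, not part
of swscale, and the sketch assumes slice_bottom is exclusive (one past the
last line of the slice).

```c
#include <assert.h>

/* Hypothetical type for illustration; not a swscale structure. */
typedef struct Slice {
    int top;    /* first picture line covered by the slice */
    int bottom; /* one past the last covered line (assumed convention) */
} Slice;

/*
 * Split a picture of pic_height lines into slices of at most slice_height
 * lines each. Every slice spans the full picture width, so only the vertical
 * extent is stored. Returns the number of slices written to out.
 */
static int split_into_slices(int pic_height, int slice_height,
                             Slice *out, int max_slices)
{
    int n = 0;

    assert(pic_height > 0 && slice_height > 0);
    for (int top = 0; top < pic_height && n < max_slices; top += slice_height) {
        int bottom = top + slice_height;
        if (bottom > pic_height)
            bottom = pic_height; /* the last slice may be shorter */
        out[n].top    = top;
        out[n].bottom = bottom;
        n++;
    }
    return n;
}
```

Each slice starts where the previous one ended, so a caller can feed them to
either path in order without gaps or overlap.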

special converter
    These are generally unscaled converters between common formats,
    like YUV 4:2:0/4:2:2 -> RGB12/15/16/24/32. In principle, this path
    could also contain scalers optimized for specific common cases.

Main path
    The main path is used when no special converter can be used. The code
    is designed as a destination line pull architecture. That is, for each
    output line the vertical scaler pulls lines from a ring buffer. When
    the ring buffer does not contain the wanted line, it is pulled from
    the input slice through the input converter and horizontal scaler.
    The result is also stored in the ring buffer to serve future vertical
    scaler requests.
    When no more output can be generated because lines from a future slice
    would be needed, all remaining lines in the current slice are
    converted, horizontally scaled and put in the ring buffer.
    [This is done for luma and chroma, each with possibly different numbers
     of lines per picture.]
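
The destination line pull scheme described above can be sketched roughly as
follows. This is a toy model under stated assumptions, not the swscale code:
a single int stands in for a whole scaled line, the vertical filter is a
fixed 2-tap average, and Puller, convert_and_hscale() and feed_slice() are
hypothetical names.

```c
#include <assert.h>

#define RING_SIZE 4 /* must be at least the number of vertical taps */
#define TAPS      2 /* hypothetical 2-tap vertical filter */

typedef struct Puller {
    int ring[RING_SIZE]; /* one int stands in for one scaled line */
    int last_in_ring;    /* highest source line stored so far, -1 if none */
    int next_out;        /* next output line to produce */
} Puller;

static void puller_init(Puller *p)
{
    p->last_in_ring = -1;
    p->next_out     = 0;
}

/* Stand-in for "input converter + horizontal scaler" on one line. */
static int convert_and_hscale(int line_value)
{
    return line_value * 2;
}

/* Convert source line 'line' and store it in the ring buffer. */
static void pull_line(Puller *p, const int *src, int line)
{
    p->ring[line % RING_SIZE] = convert_and_hscale(src[line]);
    p->last_in_ring = line;
}

/*
 * Feed the slice covering source lines [top, bottom). Output lines that can
 * be completed are written to dst. Returns the number of lines produced.
 */
static int feed_slice(Puller *p, const int *src, int top, int bottom,
                      int src_h, int dst_h, int *dst)
{
    int produced = 0;

    assert(top == p->last_in_ring + 1); /* slices must be consecutive */
    while (p->next_out < dst_h) {
        int first = p->next_out * src_h / dst_h;
        int last  = first + TAPS - 1;
        int sum   = 0;

        if (last > src_h - 1)
            last = src_h - 1;
        if (last >= bottom)
            break; /* a line from a future slice would be needed */
        while (p->last_in_ring < last) /* pull only what is missing */
            pull_line(p, src, p->last_in_ring + 1);
        for (int t = 0; t < TAPS; t++) {
            int line = first + t > src_h - 1 ? src_h - 1 : first + t;
            sum += p->ring[line % RING_SIZE];
        }
        dst[p->next_out++] = sum / TAPS;
        produced++;
    }
    /* No more output possible: buffer the rest of the current slice. */
    while (p->last_in_ring < bottom - 1)
        pull_line(p, src, p->last_in_ring + 1);
    return produced;
}
```

Feeding an 8-line picture as two 4-line slices produces 3 output lines from
the first call and the remaining 5 from the second, because the fourth
output line needs a source line that only arrives with the second slice.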

Input to YUV Converter
    When the input to the main path is not planar 8 bits per component YUV or
    8-bit gray, it is converted to planar 8-bit YUV. Two sets of converters
    currently exist for this: one downscales chroma horizontally by 2
    before the conversion, the other keeps the full chroma resolution
    but is slightly slower. The scaler will try to preserve full chroma
    when the output uses it. It is possible to force full chroma with
    SWS_FULL_CHR_H_INP even in cases where the scaler thinks it is useless.
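
To make the difference between the two converter sets concrete, here is a
minimal sketch for the U plane of a packed RGB24 line. It is not the swscale
implementation: the function names are hypothetical, and the coefficients are
the commonly used integer approximation of BT.601, which is an assumption
and may differ from what swscale uses.

```c
#include <assert.h>
#include <stdint.h>

/* Common integer BT.601 approximation (assumed, not taken from swscale). */
static uint8_t rgb_to_u(int r, int g, int b)
{
    /* The (128 << 8) offset keeps the sum positive before the shift. */
    return (uint8_t)((-38 * r - 74 * g + 112 * b + 128 + (128 << 8)) >> 8);
}

/* Full chroma: one U sample per input pixel. */
static void chroma_u_full(const uint8_t *rgb24, int width, uint8_t *u)
{
    for (int x = 0; x < width; x++)
        u[x] = rgb_to_u(rgb24[3 * x], rgb24[3 * x + 1], rgb24[3 * x + 2]);
}

/* Half chroma: average each pixel pair first, then convert once. */
static void chroma_u_half(const uint8_t *rgb24, int width, uint8_t *u)
{
    assert(width % 2 == 0);
    for (int x = 0; x < width / 2; x++) {
        const uint8_t *p = rgb24 + 6 * x;
        int r = (p[0] + p[3] + 1) >> 1;
        int g = (p[1] + p[4] + 1) >> 1;
        int b = (p[2] + p[5] + 1) >> 1;
        u[x] = rgb_to_u(r, g, b);
    }
}
```

The half-resolution variant performs one conversion per pixel pair, which is
why it is cheaper, but it discards horizontal chroma detail that a 4:4:4 or
RGB output could otherwise have used.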

Horizontal scaler
    There are several horizontal scalers. A special case worth mentioning is
    the fast bilinear scaler that is made of runtime-generated MMXEXT code
    using specially tuned pshufw instructions.
    The remaining scalers are specially tuned for various filter lengths.
    They scale 8-bit unsigned planar data to 16-bit signed planar data
    (holding the 15-bit values shown in the diagram above).
    Supporting inputs with more than 8 bits per component will require
    a new horizontal scaler that preserves the input precision.
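
A generic horizontal scaler of the kind described above can be sketched as
below. The sketch assumes coefficients with 1.0 at 1 << 14 (see the filter
coefficients section) and a per-output start position table; the function and
parameter names are illustrative rather than the exact swscale interface.

```c
#include <assert.h>
#include <stdint.h>

/*
 * 8-bit unsigned input, 15-bit output stored in int16_t.
 * filter holds filter_size coefficients per output sample (1.0 == 1 << 14);
 * filter_pos[i] is the first input sample used for output sample i.
 */
static void hscale_8_to_15(int16_t *dst, int dst_w, const uint8_t *src,
                           const int16_t *filter, const int32_t *filter_pos,
                           int filter_size)
{
    assert(filter_size > 0);
    for (int i = 0; i < dst_w; i++) {
        int val = 0;

        for (int j = 0; j < filter_size; j++)
            val += src[filter_pos[i] + j] * filter[filter_size * i + j];
        /*
         * 8-bit sample * 14-bit coefficient gives 22 bits; dropping 7 bits
         * leaves 15. Negative sums rely on an arithmetic right shift, which
         * common compilers provide.
         */
        val >>= 7;
        dst[i] = (int16_t)(val > 32767 ? 32767 : val);
    }
}
```

With two taps of 8192 each (0.5 + 0.5), input samples 100 and 200 give
150 << 7 = 19200, that is, the 8-bit average carried with 7 extra
fractional bits.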

Vertical scaler and output converter
    There is a large number of combined vertical scalers + output converters.
    Some are:
    * unscaled output converters
    * unscaled output converters that average 2 chroma lines
    * bilinear converters                (C, MMX and accurate MMX)
    * arbitrary filter length converters (C, MMX and accurate MMX)
    And
    * Plain C  8-bit 4:2:2 YUV -> RGB converters using LUTs
    * Plain C 17-bit 4:4:4 YUV -> RGB converters using multiplies
    * MMX     11-bit 4:2:2 YUV -> RGB converters
    * Plain C 16-bit Y -> 16-bit gray
      ...

    RGB with less than 8 bits per component uses dither to improve the
    subjective quality and low-frequency accuracy.
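
The vertical scaling step of this stage, for the simple case of planar 8-bit
output, can be sketched as follows. It assumes 15-bit intermediate lines and
vertical coefficients with 1.0 at 1 << 12 (see the filter coefficients
section); the names are hypothetical and no dithering is applied.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Combine filter_size intermediate lines into one 8-bit output line.
 * src_lines[j] points to the j-th 15-bit line; filter[j] is its weight
 * (1.0 == 1 << 12).
 */
static void vscale_15_to_8(const int16_t *const *src_lines,
                           const int16_t *filter, int filter_size,
                           uint8_t *dst, int width)
{
    assert(filter_size > 0);
    for (int x = 0; x < width; x++) {
        int val = 1 << 18; /* rounding term: half of 1 << 19 */

        for (int j = 0; j < filter_size; j++)
            val += src_lines[j][x] * filter[j];
        if (val < 0)
            val = 0; /* clamp before shifting; the result would clip to 0 */
        /* 7 fractional sample bits + 12 coefficient bits = shift by 19 */
        val >>= 19;
        dst[x] = (uint8_t)(val > 255 ? 255 : val);
    }
}
```

Two lines holding 100 << 7 and 200 << 7, weighted 2048 each (0.5 + 0.5),
produce the 8-bit value 150.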


Filter coefficients:
--------------------
There are several different scalers (bilinear, bicubic, lanczos, area,
sinc, ...). Their coefficients are calculated in initFilter().
Horizontal filter coefficients have a 1.0 point at 1 << 14, vertical ones at
1 << 12. The 1.0 points have been chosen to maximize precision while leaving
a little headroom for convolutional filters like sharpening filters and
minimizing the number of SIMD instructions needed to apply them.
It would be trivial to use a different 1.0 point if some specific scaler
would benefit from it.
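
As a rough illustration of what a 1.0 point means, the sketch below converts
floating-point filter weights to fixed point so that they sum exactly to the
chosen 1.0 value. It is not initFilter(): quantize_filter() is a hypothetical
helper, and carrying the rounding error forward to the next coefficient is
one simple way to keep the sum exact, assumed here for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Quantize n weights so that they sum to 'one' (for example 1 << 14 for
 * horizontal or 1 << 12 for vertical filters). The rounding error of each
 * coefficient is carried into the next one.
 */
static void quantize_filter(const double *coef, int n, int one, int16_t *out)
{
    double sum = 0.0;
    double err = 0.0;

    for (int i = 0; i < n; i++)
        sum += coef[i];
    assert(sum != 0.0);

    for (int i = 0; i < n; i++) {
        double v = coef[i] * one / sum + err;
        /* round to nearest without libm; weights may be negative */
        int q = v >= 0.0 ? (int)(v + 0.5) : -(int)(-v + 0.5);

        out[i] = (int16_t)q;
        err    = v - q;
    }
}
```

Assuming coefficients are stored as 16-bit signed values, a 1.0 point of
1 << 14 leaves room for weights up to just under 2.0, which is the kind of
headroom a sharpening filter with overshooting taps needs.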

Also, as already hinted at, initFilter() accepts an optional convolutional
filter as input that can be used for contrast, saturation, blur, sharpening,
shift, chroma vs. luma shift, ...