Bernhard Rosenkränzer
|
ba9fb5da3a
Fix PIC compilation, some defines were under #ifdef !PIC but used
|
18 years ago |
Michael Niedermayer
|
d72bc32389
unused variable
|
18 years ago |
Michael Niedermayer
|
ebd624b662
optimize sign decoding code in decode_residual()
|
18 years ago |
Jindřich Makovička
|
a0f2c6ba38
Kill a warning with MSVC
|
18 years ago |
Michael Niedermayer
|
eb73bf723d
x86 asm version of the decode significance loop (not 8x8) of decode_residual(), 5% faster decode_residual() on P3
|
18 years ago |
Michael Niedermayer
|
4041a495a8
cosmetic (%%eax->%0)
|
18 years ago |
Diego Biurrun
|
8dda3e796b
Fix crash with illegal instruction, cmov is available on 686 and later only.
|
18 years ago |
Diego Biurrun
|
e962604f1c
Expand some #endif comments.
|
18 years ago |
Michael Niedermayer
|
165c5f0909
fix !CMOV_IS_FAST case (I am not really happy with the fix, but I didn't come up with a better one quickly)
|
18 years ago |
Michael Niedermayer
|
1d7c111856
10l
|
18 years ago |
Michael Niedermayer
|
faff3a7ad0
this code will not work with PIC as it needs 7 registers and gcc doesn't support that in PIC
|
18 years ago |
Michael Niedermayer
|
f24a515931
shift CABACContext.range right; this reduces the number of shifts needed in get_cabac() and is slightly faster on P3 (and should be much faster on P4, as the P4, except the more recent variants, lacks an integer shifter, so shifts have ~10 times longer latency than simple operations like adds)
|
18 years ago |
Michael Niedermayer
|
68a205edef
dehack *ps_state indexing in the branchless decoder
|
18 years ago |
Michael Niedermayer
|
12ff5b0f3b
add "memory" to the clobber list: we change memory, so we need it; this also fixes some problems with gcc svn
|
18 years ago |
Michael Niedermayer
|
851ded8918
prevent "mb level" get_cabac() calls from being inlined (3% faster decode_mb_cabac() on P3)
|
18 years ago |
Guillaume Poirier
|
a0490b324a
adds some useful comments after some of the #else, #elseif,
|
18 years ago |
Diego Biurrun
|
c26abfa541
Rename ABS macro to FFABS.
|
18 years ago |
Michael Niedermayer
|
1f4d5e9f69
slightly faster on P3, slightly slower on Athlon and probably faster on P4
|
18 years ago |
Michael Niedermayer
|
2b5269b51c
moving lps state transition code a little up in the branched asm code (1% faster on P3)
|
18 years ago |
Michael Niedermayer
|
b99f3cabed
write cabac low and range variables as early as possible to prevent stalls from reading them before they were written; the P4 is said to dislike that a lot, on P3 it's 2% faster (START/STOP_TIMER over decode_residual)
|
18 years ago |
Michael Niedermayer
|
d17faef011
use ecx instead of cl (no speed change on P3, but might avoid partial register stalls on some CPUs)
|
18 years ago |
Michael Niedermayer
|
d61c4e731e
make state transition tables global as they are constant and the code is slightly faster that way
|
18 years ago |
Michael Niedermayer
|
5f3eca121e
10l
|
18 years ago |
Michael Niedermayer
|
0fa352c7e6
make lps_range a global table, it's constant anyway (saves 1 addition for accessing it)
|
18 years ago |
Michael Niedermayer
|
3650b43959
enable CMOV_IS_FAST as it's faster or equal speed on every CPU (Duron, Athlon, PM, P3) from which I've seen benchmarks; it might be slower on P4, but no one has posted benchmarks ...
|
18 years ago |
Diego Biurrun
|
0bc2e7f081
BRANCHLESS_CABAD --> BRANCHLESS_CABAC_DECODER
|
18 years ago |
Michael Niedermayer
|
9ed92c65f1
moving another bit&1 out; this is as fast as with it in there, but it makes more sense with it outside of the loop
|
18 years ago |
Michael Niedermayer
|
f1b37db48d
move the &1 out of the asm so gcc can optimize it away in inlined cases (yes this is slightly faster)
|
18 years ago |
Michael Niedermayer
|
ab0151d163
replace a few and/sub/... by cmov
|
18 years ago |
Michael Niedermayer
|
a6672acf45
reading 8-bit mem into an 8-bit register needs 2 uops on P4; 8-bit->32-bit with zero extension needs just 1
|
18 years ago |
Michael Niedermayer
|
2d3df05ca0
on the P4, inc needs twice as much time as add
|
18 years ago |
Michael Niedermayer
|
2ee9dc65be
10l
|
18 years ago |
Michael Niedermayer
|
7822e1c1ff
revert remainder of the failed attempt to optimize *state=c->mps_state[s]
|
18 years ago |
Michael Niedermayer
|
ef0090a998
x86 branchless cabac decoder
|
18 years ago |
Michael Niedermayer
|
2e1aee80f4
optimize branchless C CABAC decoder
|
18 years ago |
Michael Niedermayer
|
1c2a417f6a
move commented-out START/STOP_TIMER to a hopefully better place for benchmarking ...
|
18 years ago |
Michael Niedermayer
|
30dc5f56ad
drop failed attempt to optimize *state= c->mps_state[s];
|
18 years ago |
Michael Niedermayer
|
c56d23dacf
10l bugfix for some disabled code
|
18 years ago |
Michael Niedermayer
|
f7d0b68361
first try of a handwritten get_cabac() for x86; this is 10-20% faster on P3, depending on whether you try to subtract the START/STOP_TIMER overhead
|
18 years ago |
Michael Niedermayer
|
5bbe2a5292
remove bytestream_end checks; seems to work fine without them, and the bitstream reader doesn't check for the end either
|
18 years ago |
Michael Niedermayer
|
c010d69a75
decrease ff_h264_norm_shift[] size
|
18 years ago |
Michael Niedermayer
|
6ff042699f
cleanup
|
18 years ago |
Michael Niedermayer
|
260ceb6322
branchless renormalization (1% faster get_cabac); old branchless renormalization wasn't faster because gcc was scared of the shift variable (misusing bit variable now)
|
18 years ago |
Michael Niedermayer
|
99ce10873d
5% faster get_cabac()
|
18 years ago |
Michael Niedermayer
|
400d0f8e47
disable benchmarking code
|
18 years ago |
Michael Niedermayer
|
4310580db5
renorm_cabac_decoder_once START/STOP_TIMER scores for athlon
|
18 years ago |
Michael Niedermayer
|
5659b509c7
refill cabac variables in 16-bit steps, 3% faster get_cabac()
|
18 years ago |
Diego Biurrun
|
b78e7197a8
Change license headers to say 'FFmpeg' instead of 'this program/this library'
|
18 years ago |
Michael Niedermayer
|
2ae7569dc8
() 10l
|
18 years ago |
Michael Niedermayer
|
ec8f483ab5
several x86 renorm_cabac_decoder_once optimizations
|
18 years ago |