x86: synth filter float: implement SSE2 version
Timings for Arrandale:
          C    SSE
win32: 2108    334
win64: 1152    322
Factoring the inner loop out behind a call/jmp costs more than 15 cycles,
even with the jmp destination aligned.
Unrolling for ARCH_X86_64 gains 20 cycles.
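
For illustration only, a minimal C sketch of the trade-off described above: the multiply-accumulate body is written out inline twice per iteration instead of being shared through a call or jump. The helper name `mac_unrolled`, its signature, and the use of intrinsics are assumptions made for this sketch; the actual change is hand-written assembly in libavcodec/x86/dcadsp.asm and may be structured differently.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper, not taken from dcadsp.asm:
 * acc[i] += win[i] * src[i] for i in [0, n).
 * n is assumed to be a multiple of 8. */
static void mac_unrolled(float *acc, const float *win,
                         const float *src, size_t n)
{
#if defined(__SSE__)
    for (size_t i = 0; i < n; i += 8) {
        /* Two 4-float steps written inline, so no call/jmp to a shared body. */
        __m128 a0 = _mm_loadu_ps(acc + i);
        __m128 a1 = _mm_loadu_ps(acc + i + 4);
        a0 = _mm_add_ps(a0, _mm_mul_ps(_mm_loadu_ps(win + i),
                                       _mm_loadu_ps(src + i)));
        a1 = _mm_add_ps(a1, _mm_mul_ps(_mm_loadu_ps(win + i + 4),
                                       _mm_loadu_ps(src + i + 4)));
        _mm_storeu_ps(acc + i, a0);
        _mm_storeu_ps(acc + i + 4, a1);
    }
#else
    /* Scalar fallback for targets without SSE. */
    for (size_t i = 0; i < n; i++)
        acc[i] += win[i] * src[i];
#endif
}
```

The inline form duplicates code but avoids the branch to a shared body; whether that pays off on a given CPU has to be measured, as the timings above were.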
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
Showing
- libavcodec/synth_filter.c: 1 addition, 0 deletions
- libavcodec/synth_filter.h: 1 addition, 0 deletions
- libavcodec/x86/dcadsp.asm: 152 additions, 0 deletions
- libavcodec/x86/dcadsp_init.c: 28 additions, 0 deletions