Posts: 503 | Thanked: 267 times | Joined on Jul 2006 @ Helsinki
#289
Originally Posted by uris
About SIMD benefits, I understood that new OMAP 2420 has SIMD support?
Some benchmarks done for XVID with and without SSE (SIMD of Intel Pentium family):

http://list.xvid.org/pipermail/xvid-...ry/004815.html

SSE SIMD brings big benefits (200-300%) for XVID decoding. I'm not familiar with how well the OMAP SIMD implementation compares with Intel SSE, but it would probably make sense to put SIMD to use. I suppose this requires coding some routines in ARM assembler, though.
ARMv5TE (Nokia 770) has instructions for fast single-clock 16-bit multiplication (and the IDCT used in video decoding contains a lot of such multiplications). Having these instructions speeds up video decoding, because a generic 32-bit multiplication takes more time.
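
To make the 16-bit multiply point concrete, here is a rough sketch in plain C of the kind of operation an IDCT does over and over: a 16-bit sample times a 16-bit fixed-point coefficient. The function names and the Q14 format are my own choices for illustration, not taken from any decoder. A compiler targeting ARMv5TE may be able to map the multiply onto one of the SMULxy-class instructions, but that depends on the compiler and flags.

```c
#include <stdint.h>

/* 16-bit x 16-bit -> 32-bit signed multiply. The result always fits in
 * 32 bits, so no overflow handling is needed here. */
int32_t mul16x16(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

/* Scale a sample by a Q14 fixed-point coefficient (16384 == 1.0),
 * adding half an LSB before the shift so the result is rounded.
 * Hypothetical helper; note that right-shifting a negative value is
 * implementation-defined in C (an arithmetic shift on common compilers). */
int16_t scale_q14(int16_t sample, int16_t coeff_q14)
{
    int32_t product = mul16x16(sample, coeff_q14);
    return (int16_t)((product + (1 << 13)) >> 14);
}
```

For example, scale_q14(100, 8192) multiplies 100 by 0.5, and a coefficient of 11585 is roughly cos(pi/4) in this format.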

ARMv6 (Nokia N800) has SIMD instructions that treat a 32-bit register as a pair of 16-bit values and perform arithmetic on both halves at once, which allows two 16-bit multiplies per clock.
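
To show what "a pair of 16-bit values in a 32-bit register" means, here is a small portable C emulation. It is only a sketch of the data layout and the arithmetic, not real SIMD code: on ARMv6 each of the two operations below would be a single instruction (SADD16 and SMUAD have this shape), while this version spells the lanes out by hand. All function names are made up for the example.

```c
#include <stdint.h>

/* Pack two signed 16-bit values into one 32-bit word:
 * lo goes to bits 0-15, hi goes to bits 16-31. */
uint32_t pack16(int16_t lo, int16_t hi)
{
    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/* Extract the lanes again (assumes two's complement conversion,
 * which holds on all common compilers). */
int16_t low16(uint32_t w)  { return (int16_t)(uint16_t)(w & 0xFFFFu); }
int16_t high16(uint32_t w) { return (int16_t)(uint16_t)(w >> 16); }

/* Lane-wise add: each 16-bit half wraps on its own and no carry
 * crosses from the low lane into the high lane. */
uint32_t add16x2(uint32_t a, uint32_t b)
{
    uint32_t lo = (a + b) & 0xFFFFu;
    uint32_t hi = ((a >> 16) + (b >> 16)) & 0xFFFFu;
    return lo | (hi << 16);
}

/* Two 16-bit multiplies in one step, with the products summed:
 * low(a)*low(b) + high(a)*high(b). The sum is done in unsigned
 * arithmetic so the single overflow case (all lanes -32768) wraps
 * instead of being undefined behaviour. */
int32_t dual_mul_add(uint32_t a, uint32_t b)
{
    int32_t p_lo = (int32_t)low16(a) * low16(b);
    int32_t p_hi = (int32_t)high16(a) * high16(b);
    return (int32_t)((uint32_t)p_lo + (uint32_t)p_hi);
}
```

For example, dual_mul_add(pack16(3, -4), pack16(5, 6)) computes 3*5 + (-4)*6 in one call, which is the kind of step that shows up in the inner loop of an IDCT.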

Intel MMX uses 64-bit registers for SIMD, so it can operate on four 16-bit values at once. SSE2 has 128-bit registers and can operate on eight 16-bit values at once.

So SIMD on ARMv6 can't boost performance as much as on x86, but the improvement should still be quite noticeable.

And you are right about assembly optimizations: they are needed and are already used (though they can still be improved), even for ARMv5TE.