EMULib Source with Maemo Support
Hello, All!
I have just released the updated source code for EMULib, a library of emulation and service routines including image processing and audio synthesis. The new version includes Maemo support, with joystick emulation, direct frame buffer access, and assembler-optimized scaling routines.
You can get the EMULib sources from http://fms.komkon.org/EMUL8/
To see how EMULib can be used, check out the recently updated ColEm source code: http://fms.komkon.org/ColEm
Re: EMULib Source with Maemo Support
Some comments:
1. ioctl(FBFD,OMAPFB_VSYNC); is useless and does nothing (and if it actually waited for VSYNC, that would be bad for performance)
2. You don't need to use OMAPFB_FORMAT_FLAG_FORCE_VSYNC flag (you may actually screw up tearing synchronization using it), just OMAPFB_FORMAT_FLAG_TEARSYNC is enough
3. And of course the license choice is bad ;)
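To make comment 2 concrete, here is a minimal sketch of building the update format word with only the tearsync flag set. The `SKETCH_*` names and `sketch_update_format` are hypothetical, and the numeric values are mirrored from the kernel's omapfb.h from memory, so treat them as assumptions and use the real header in actual code. As far as I recall, the resulting word goes into the `format` field of the update window structure.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: bit values mirrored from the kernel's omapfb.h from memory.
   Real code should include that header instead of redefining them. */
enum {
    SKETCH_FORMAT_MASK      = 0x00ff, /* low bits hold the colour mode */
    SKETCH_FLAG_TEARSYNC    = 0x0200, /* stands in for OMAPFB_FORMAT_FLAG_TEARSYNC */
    SKETCH_FLAG_FORCE_VSYNC = 0x0400  /* stands in for OMAPFB_FORMAT_FLAG_FORCE_VSYNC */
};

/* Build the format word for a window update: the colour mode plus the
   tearsync flag, deliberately leaving the force-vsync flag clear. */
uint32_t sketch_update_format(uint32_t color_mode)
{
    /* The colour mode must fit in the mask and carry no flag bits. */
    assert((color_mode & ~(uint32_t)SKETCH_FORMAT_MASK) == 0);
    return color_mode | (uint32_t)SKETCH_FLAG_TEARSYNC;
}
```

Calling `sketch_update_format(0)` gives a word with only the tearsync bit set on top of colour mode 0.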
Re: EMULib Source with Maemo Support
By the way, your assembly code is not well suited to ARM11. For example, LibARM.s contains lots of chunks of code like this:
Code:
mov r14,r5,lsr #16
Code:
mov r14,r5,lsr #16
To make life easier and to check that you have scheduled instructions properly without missing anything, you can use oprofile and collect CYCLES_DATA_STALL events. Because of the pipeline properties, they do not point exactly to the poorly scheduled instruction but are reported with some delay. So if you are looking at 'opannotate' output and see spikes of CYCLES_DATA_STALL samples, the offending code is usually a few lines above. Checking the ARM11 TRM helps to understand exactly why you got the pipeline stall.
Optimizations that improve memory access performance are also important. ARM processors usually don't allocate a cache line on a write miss, but use a write buffer to store data to memory. This means that special care needs to be taken with writes to memory, as they may become a bottleneck. For OMAP1710 (Nokia 770) and OMAP2420 (Nokia N800/810), it happens that 16-byte-aligned stores of exactly 4 registers with the STM instruction are able to make use of burst transfers, and performance is much better (roughly twice as fast). So for example, somewhat counterintuitively, instead of
Code:
LDM {set of 8 registers}
Code:
LDM {set of 8 registers}
I'm not sure if the same burst write optimization is useful for other ARM processors though, because it may be too platform/microarchitecture specific.
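The burst-write advice above (16-byte-aligned stores of exactly four registers) can be sketched in C as a copy loop that first brings the destination to a 16-byte boundary and then writes in groups of four words. This only illustrates the access pattern: `sketch_copy_words` is a hypothetical helper, and a C compiler is free to emit whatever store instructions it likes, so getting real four-register STM bursts would most likely still need hand-written assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy `count` 32-bit words so that the bulk of the stores happen in
   16-byte-aligned groups of four words. */
void sketch_copy_words(uint32_t *dst, const uint32_t *src, size_t count)
{
    assert(((uintptr_t)dst & 3u) == 0); /* dst must be word aligned */

    /* Head: single stores until dst reaches a 16-byte boundary. */
    while (count > 0 && ((uintptr_t)dst & 15u) != 0) {
        *dst++ = *src++;
        count--;
    }

    /* Body: load eight words, then store them as two groups of four,
       each group starting on a 16-byte boundary. */
    while (count >= 8) {
        uint32_t a0 = src[0], a1 = src[1], a2 = src[2], a3 = src[3];
        uint32_t b0 = src[4], b1 = src[5], b2 = src[6], b3 = src[7];
        dst[0] = a0; dst[1] = a1; dst[2] = a2; dst[3] = a3;
        dst[4] = b0; dst[5] = b1; dst[6] = b2; dst[7] = b3;
        src += 8;
        dst += 8;
        count -= 8;
    }

    /* Tail: whatever is left after the last full group of eight. */
    while (count > 0) {
        *dst++ = *src++;
        count--;
    }
}
```

The head and tail loops are there only so the sketch works for any word-aligned destination and any length; a scaler writing whole frame buffer lines could probably arrange its buffers so that only the body loop runs.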