Originally Posted by gidzzz:
I used the flags that you advised, no change. However...

I tried with GCC 4.6.2 and it worked flawlessly. What are the results? Size of the executable decreased from 2814 KiB to 2232 KiB, but there was no significant FPS gain in my test cases. At 850 MHz it looks approximately like this:

[Scene]: [GCC 4.2.1] -> [GCC 4.6.2 Thumb]
Tutorial: 30.5 FPS -> 31 FPS
Medium battle: 5.5-7.5 FPS -> 6.5-8 FPS
Large battle: 3.2 FPS -> 3.5 FPS


I have begun porting the game to unwrapped GLES, to see if it works any faster, but it did not bring any great improvements in terms of FPS so far (but it's still far from complete). Nevertheless, it wasn't a waste of time, as I fixed two graphical bugs on the way. I'm especially happy that ion beams don't look so lame anymore.

I updated the first post so that it links to the new version. The changes are:
  • Prettier ion cannons
  • Fix for engine trails sometimes not appearing
  • Framerate is written to the terminal
And there's the Thumb executable too!
Could you share (or point me to) the source code? I want to look at the build scripts. Also, Linaro GCC 4.7.2 is much better than 4.6.1 at optimizing ARM code. I suspect your results are a combination of wrong compiler arch flags and GCC 4.6.1.

EDIT:
Your compiler flags are definitely wrong for some reason. I tested a bit, and there is no need to call the kernel for atomic 64-bit operations when GCC compiles for armv7-a:

Code:
echo "void f(){volatile long long a=123;__sync_val_compare_and_swap(&a,3,4);}" | gcc -O2 -mthumb -mfloat-abi=softfp -x c -Wall -dA -S - -o -
results in:

Code:
        .syntax unified
        .arch armv7-a
        .eabi_attribute 27, 3   @ Tag_ABI_HardFP_use
        .fpu neon
        .eabi_attribute 20, 1   @ Tag_ABI_FP_denormal
        .eabi_attribute 21, 1   @ Tag_ABI_FP_exceptions
        .eabi_attribute 23, 3   @ Tag_ABI_FP_number_model
        .eabi_attribute 24, 1   @ Tag_ABI_align8_needed
        .eabi_attribute 25, 1   @ Tag_ABI_align8_preserved
        .eabi_attribute 26, 2   @ Tag_ABI_enum_size
        .eabi_attribute 30, 2   @ Tag_ABI_optimization_goals
        .eabi_attribute 34, 1   @ Tag_CPU_unaligned_access
        .eabi_attribute 18, 4   @ Tag_ABI_PCS_wchar_t
        .file   ""
        .text
        .align  2
        .global f
        .thumb
        .thumb_func
        .type   f, %function
f:
        @ args = 0, pretend = 0, frame = 8
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
@ BLOCK 2 freq:10000 seq:0
@ PRED: ENTRY [100.0%]  (fallthru)
        push    {r4, r5}
        fldd    d16, .L5        @ int
        sub     sp, sp, #8
        add     r4, sp, #8
        fstmdbd r4!, {d16}      @ int
        dmb     sy
        movs    r2, #4
@ SUCC: 3 [100.0%]  (fallthru,can_fallthru)
        movs    r3, #0
@ BLOCK 3 freq:10000 seq:1
@ PRED: 2 [100.0%]  (fallthru,can_fallthru) 4 [1.0%]  (dfs_back,can_fallthru)
.L2:
        ldrexd  r0, r1, [r4]
        cmp     r1, #0
        it eq
        cmpeq   r0, #3
@ SUCC: 5 [1.0%]  (can_fallthru,loop_exit) 4 [99.0%]  (fallthru,can_fallthru)
        bne     .L3
@ BLOCK 4 freq:9901 seq:2
@ PRED: 3 [99.0%]  (fallthru,can_fallthru)
        strexd  r5, r2, r3, [r4]
        cmp     r5, #0
@ SUCC: 3 [1.0%]  (dfs_back,can_fallthru) 5 [99.0%]  (fallthru,can_fallthru,loop_exit)
        bne     .L2
@ BLOCK 5 freq:9902 seq:3
@ PRED: 3 [1.0%]  (can_fallthru,loop_exit) 4 [99.0%]  (fallthru,can_fallthru,loop_exit)
.L3:
        dmb     sy
@ SUCC: EXIT [100.0%]
        add     sp, sp, #8
        pop     {r4, r5}
        bx      lr
.L6:
        .align  3
.L5:
        .word   123
        .word   0
        .size   f, .-f
        .ident  "GCC: (Linaro GCC 4.7-2012.07) 4.7.2 20120701 (prerelease)"
        .section        .note.GNU-stack,"",%progbits
One can clearly see that ldrexd/strexd instructions are emitted inline and that there are no calls to libgcc wrappers.
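
For anyone who wants to check what the builtin does rather than how it is lowered, here is a minimal sketch. The names cas64 and cas64_selfcheck are made up for illustration and are not from the game's source. It repeats the one-liner above (a starts at 123, and we try to swap 3 for 4) and then performs a second call that actually succeeds.

```c
#include <assert.h>

/* Hypothetical wrapper (not from the game's source): tries to change *p
 * from `expected` to `desired` atomically and returns the value *p held
 * before the attempt. The store happened only if that value == expected. */
static long long cas64(volatile long long *p, long long expected,
                       long long desired)
{
    return __sync_val_compare_and_swap(p, expected, desired);
}

/* Self-check mirroring the one-liner above: a starts at 123. */
static void cas64_selfcheck(void)
{
    volatile long long a = 123;

    /* 123 != 3, so nothing is stored and the old value is returned. */
    assert(cas64(&a, 3, 4) == 123);
    assert(a == 123);

    /* Now the expected value matches, so 4 is stored. */
    assert(cas64(&a, 123, 4) == 123);
    assert(a == 4);
}
```

Calling cas64_selfcheck() from main should pass silently on any target. Whether the compiler expands the builtin inline (as in the listing above) or falls back to a helper call depends on the arch flags it was given, which is the point of this post.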

I ran readelf -a on your Thumb binary and it looks OK, except that the FPU used is VFPv3 instead of NEON:

Code:
Attribute Section: aeabi
File Attributes
  Tag_CPU_name: "CORTEX-A8"
  Tag_CPU_arch: v7
  Tag_CPU_arch_profile: Application
  Tag_ARM_ISA_use: Yes
  Tag_THUMB_ISA_use: Thumb-2
  Tag_FP_arch: VFPv3
  Tag_ABI_PCS_wchar_t: 4
  Tag_ABI_FP_denormal: Needed
  Tag_ABI_FP_exceptions: Needed
  Tag_ABI_FP_number_model: IEEE 754
  Tag_ABI_align_needed: 8-byte
  Tag_ABI_enum_size: int
  Tag_ABI_HardFP_use: SP and DP
  Tag_Virtualization_use: TrustZone
So it seems there is a bug in some of the low-level routines (in the source code or the Makefile) that fails to detect or use the correct arch when compiled with GCC 4.7.2.
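
I have not seen the source yet, so this is only a guess at how such detection might be checked. It is a small sketch that reports which target the compiler thinks it is building for, using GCC's predefined macros. The function name describe_target is hypothetical; the macros themselves (__ARM_ARCH_7A__, __thumb2__, __ARM_NEON__, __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8) are standard GCC predefines.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes a one-line summary of the compile target
 * into buf. Returns 1 if GCC advertises an inline 8-byte
 * compare-and-swap for this target, 0 otherwise. */
static int describe_target(char *buf, size_t len)
{
    int armv7a = 0, thumb2 = 0, neon = 0, cas8 = 0;

#ifdef __ARM_ARCH_7A__
    armv7a = 1;   /* set by GCC when targeting armv7-a */
#endif
#ifdef __thumb2__
    thumb2 = 1;   /* set when generating Thumb-2 code */
#endif
#ifdef __ARM_NEON__
    neon = 1;     /* set when NEON is the selected FPU */
#endif
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
    cas8 = 1;     /* 64-bit __sync CAS can be expanded inline */
#endif

    snprintf(buf, len, "armv7-a=%d thumb2=%d neon=%d cas8=%d",
             armv7a, thumb2, neon, cas8);
    return cas8;
}
```

Building this with the same flags as the game and printing the string would show whether the flags really reach the compiler. The same information is available without any code from gcc -dM -E -x c /dev/null, run with the flags in question.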
Last edited by freemangordon; 2012-12-02 at 17:19.
 
