I've forward-ported the linear gradient algorithm from the stock Maemo pixman, which uses integers (instead of the floats used upstream). The reason is that the upstream gradient was 2x-3x slower than stock, albeit smoother. Could you do some profiling to see which function is slowing down the benchmarks? There may be some performance tweaks/patches in Mozilla's pixman that could be used.
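For anyone unfamiliar with the integer approach, here is a rough sketch of the general idea. It is not the actual Maemo or upstream pixman code: `fill_gradient_fixed` is a made-up helper, and the 16.16 fixed-point layout is my assumption. It steps one 8-bit colour channel across a scanline using only integer adds and shifts in the loop.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the pixman API.
 * Fills one scanline of a single 8-bit channel with a linear ramp from
 * `start` to `end`, using 16.16 fixed point instead of floats.
 * Assumes 2 <= width <= 32768 so the accumulated truncation error
 * stays below half a colour step. */
static void fill_gradient_fixed(uint8_t *dst, int width,
                                uint8_t start, uint8_t end)
{
    assert(width >= 2 && width <= 32768);

    /* Current value in 16.16 format, biased by 0.5 so that the
     * shift below rounds to nearest instead of truncating. */
    int32_t value = (int32_t)start * 65536 + 0x8000;

    /* Per-pixel increment in 16.16 format; the one division happens
     * once per scanline, not once per pixel. */
    int32_t step = (((int32_t)end - (int32_t)start) * 65536) / (width - 1);

    for (int i = 0; i < width; i++) {
        dst[i] = (uint8_t)(value >> 16); /* keep the integer part */
        value += step;
    }
}
```

The inner loop has no float conversions, which is presumably where the speed comes from, but the step is truncated to 16 fractional bits, so some precision is lost compared with a float version.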
I find the before/after and left/right a little confusing...