I guess you tried starting the copy earlier to narrow the gap? It only needs to write to framebuffer data that has already been transferred, and it must not outrun the DMA in progress. Maybe a timestamp of when the DMA transfer started could help with timing this?
But this would need a modified kernel anyway (unless the timestamp can be figured out somehow), so we could also preallocate a second framebuffer in such a kernel (though this eats memory). Is the timing worth the effort?
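To make the timing idea concrete, here is a rough sketch of what I mean by staying behind the DMA. Everything in it is an assumption on my side: the start timestamp `start_us`, the constant transfer rate `bytes_per_us`, the safety `margin`, and the helper names `dma_bytes_done` and `copy_behind_dma` are all hypothetical, and nothing like this is exposed by a stock kernel as far as I know.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: estimate how many framebuffer bytes the DMA has
 * already read, from the time elapsed since the transfer started.
 * Assumes a constant transfer rate, which real hardware may not have. */
static size_t dma_bytes_done(uint64_t start_us, uint64_t now_us,
                             uint64_t bytes_per_us, size_t fb_size)
{
    assert(bytes_per_us > 0);
    if (now_us <= start_us)
        return 0;
    uint64_t elapsed = now_us - start_us;
    /* Clamp before multiplying so the product cannot overflow. */
    if (elapsed > fb_size / bytes_per_us)
        return fb_size;
    return (size_t)(elapsed * bytes_per_us);
}

/* Hypothetical helper: copy the next frame into the framebuffer, but only
 * into the region the DMA has already passed, keeping `margin` bytes of
 * distance. Returns the new copy cursor; call it repeatedly until the
 * cursor reaches the framebuffer size. */
static size_t copy_behind_dma(uint8_t *fb, const uint8_t *src, size_t cursor,
                              size_t dma_done, size_t margin)
{
    size_t safe = dma_done > margin ? dma_done - margin : 0;
    if (safe > cursor) {
        memcpy(fb + cursor, src + cursor, safe - cursor);
        cursor = safe;
    }
    return cursor;
}
```

The copy loop would read the clock, call `dma_bytes_done`, then `copy_behind_dma`, and repeat. The margin is there because the estimate is only as good as the timestamp and the assumed rate.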