So for such playback you must send only one rectangle, i.e. the whole video frame (and then you are limited by the bandwidth of the link).
Also, the overhead of starting and stopping each transfer may be bigger than the cost of simply sending one bigger rectangle.
And BTW, OMAP is a system on chip; it is not clear that every part (DSP, MPU, IVA, 3D accelerator, each being a separate CPU with its own caches, private SRAM, even private MMU units...) can directly access every other part.
Yes, and such a step causes delay: you must stop drawing until the frame is transferred.
Anyway, as for complexity, feel free to study the omapfb, rfbi, dispc and blizzard drivers (and lcd_mipid.c, but that one does not add any complexity) in the Linux sources (in drivers/video/omap/), each handling a different part of the hardware puzzle.
I don't understand it completely, but seeing the code of those drivers in the kernel is good (or bad) enough for me to feel pity for anyone who must touch that code and may be tasked with throwing 3D acceleration into the mix :-)