As you say, it depends on the paradigm you are going for. Personally, I see the episode as an "object", so it makes sense to have a stacked window that is part of the back/forward navigation. That window can show more information about the object (which could include file size, duration, etc. within the scrollable shownotes area), with the available actions below.
Another idea, which I mentioned in the report at https://bugs.maemo.org/11501, is that once an episode is represented as a window rather than a dialogue, extra functions become possible. For example, a swipe left/right (and/or the hardware keyboard's left/right cursor keys) within the window could move to the previous/next episode in the list, which would be useful for flicking through episodes to find a particular one. This isn't possible with the dialogue implementation.
The stacked window approach is the one I chose for my own application, cuteTube, and I don't believe it has any performance implications.