You need at least 12 to 15 frames per second for motion to look anywhere close to realistic. Historically, the standard video rates were 24, 25, and 30 frames per second (nowadays there are ridiculous frame rates for gamers with expensive GPUs and super-high-refresh-rate curved ultra-wide screens, lol).
Like Tom said, creating a queue of messages with 'send tMessage in X milliseconds' is a great way to do any sort of sequence with fairly precise timing. It can be tricky if you need to do a lot of operations in between messages, though. If you need events to arrive at steady intervals (like a drum beat, for example), it's best to subtract the processing time of the script operations from the 'send in' time for the next message (if that makes sense).
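To make the "subtract the processing time" idea concrete, here is a rough sketch in Python rather than xTalk. The names next_delay_ms and run_beats are made up for illustration, and time.sleep stands in for the 'send in' delay; treat it as one possible way to do it, not how any engine actually schedules messages.

```python
import time

def next_delay_ms(interval_ms, work_ms):
    """Delay to request for the next message so events stay interval_ms apart."""
    # If the work overran the interval, fire right away instead of
    # asking for a negative delay.
    return max(0.0, interval_ms - work_ms)

def run_beats(count, interval_ms, do_work):
    """Call do_work() count times, aiming for one call every interval_ms."""
    starts = []
    for _ in range(count):
        start = time.monotonic()
        starts.append(start)
        do_work()  # the script operations between messages
        work_ms = (time.monotonic() - start) * 1000.0
        # Stand-in for 'send ... in X milliseconds' (an assumption).
        time.sleep(next_delay_ms(interval_ms, work_ms) / 1000.0)
    return starts
```

For example, with a 500 ms beat and 120 ms of work, the next message would be sent in 380 ms. This only compensates one step at a time, so small errors can still add up over a long sequence; scheduling each event against an absolute start time is probably steadier if that matters.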
But a convincing pseudo-3D flip effect can be done as just image distortion, like what you can do with the free-distort tools in 2D image editors such as Photoshop (if you're into something else, like flexo printing on stretchy curved plastics, you might want other distortions like arched curves, reversed fish-eye lens distortion, etc.).
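As a rough illustration of the distortion approach, the sketch below (Python, not xTalk) works out where the four corners of a card would land at a given flip angle, which is the kind of quadrilateral you would feed to a free-distort style operation. The function name flip_corners, the simple perspective model, and the default viewer distance are all assumptions for the example, not anything an engine provides.

```python
import math

def flip_corners(width, height, angle_deg, distance=None):
    """Corners (TL, TR, BR, BL) of a card flipped about its vertical center axis."""
    if distance is None:
        # Hypothetical viewer distance; it must stay larger than width / 2.
        distance = 2.0 * width
    a = math.radians(angle_deg)
    cx, cy = width / 2.0, height / 2.0
    corners = []
    for sx, sy in ((-1, -1), (1, -1), (1, 1), (-1, 1)):
        x = sx * cx * math.cos(a)  # edge moves toward the center as the card turns
        z = sx * cx * math.sin(a)  # positive z is farther from the viewer
        scale = distance / (distance + z)  # farther edge is drawn smaller
        corners.append((cx + x * scale, cy + sy * cy * scale))
    return corners
```

At 0 degrees this returns the plain rectangle, and at 90 degrees the card is edge-on. Stepping the angle from 0 to 90, swapping the image, then continuing on to 180 should give the flip; a smaller distance value exaggerates the perspective.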
Sometimes you don't need a 3D engine with GPU acceleration. Isometric pseudo-2.5D can be cool too. Those 1980s/90s computers did some 3D(-ish) stuff without a GPU, and some of it looked really good (OK, maybe terrible by today's standards).
It would be nice to have a library collecting ideas for easy and/or interesting visual effects or visual cue methods (like animated focus borders). Some of the engine's transition effects disappeared along with QuickTime. Our XT script does have the ability to manipulate pixels directly (I suppose if you use the Netpbm ASCII text format, https://netpbm.sourceforge.net/, all xTalks can too).
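To show how little is involved in the ASCII route, here is a small Python sketch that builds a plain PPM ("P3") image as text and inverts its pixels. The helper names to_ppm_p3 and invert are made up for the example; the same string-building should be doable in any xTalk, though I haven't tried it in each one.

```python
def to_ppm_p3(pixels, maxval=255):
    """Serialize rows of (r, g, b) tuples as plain (ASCII) PPM text."""
    height = len(pixels)
    width = len(pixels[0])
    # Header: magic number, then width and height, then the maximum sample value.
    lines = ["P3", f"{width} {height}", str(maxval)]
    for row in pixels:
        for r, g, b in row:
            # One pixel per line keeps every line well under 70 characters.
            lines.append(f"{r} {g} {b}")
    return "\n".join(lines) + "\n"

def invert(pixels, maxval=255):
    """Direct pixel manipulation: flip every sample around maxval."""
    return [[(maxval - r, maxval - g, maxval - b) for r, g, b in row]
            for row in pixels]
```

A 2x1 image with one red and one blue pixel comes out as five short lines of text that can be saved with a .ppm extension and opened in a viewer that understands Netpbm files.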