What's out there currently is whatever is on midi.org, which I believe has what you want: the Clip File specification, accompanied by a not-yet-ready Container File specification (estimated release in 2024).
That said, even implementing MIDI 1 for sequencing is a huge step in terms of possibilities. The cost of a dynamic soundtrack of the iMUSE/Monkey Island 2 sort, where the entire soundtrack can transition smoothly between pre-written clips, is mostly borne in the asset creation process. Either you have to program generative sequences, or you're looking at a composer spending hundreds of hours writing transition cues.
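To make the asset-cost point concrete, here's a minimal sketch of the scheduling side of clip-to-clip transitions. It is not how iMUSE actually works internally; `Clip`, `plan_transition`, the exit points and the transition table are all hypothetical. The code is the easy part. The expensive part is the `transitions` table, since every (from, to) pair that should sound smooth needs a composed cue, and the number of pairs grows roughly with the square of the number of clips.

```python
from dataclasses import dataclass, field


@dataclass
class Clip:
    name: str
    length_beats: int
    # Hypothetical: beats (within the clip) where a jump is musically allowed.
    exit_points: list[int] = field(default_factory=list)


def plan_transition(clips, transitions, current, position, target):
    """Return [(start_beat, clip_name), ...] for moving from `current` to `target`.

    `position` is beats elapsed since `current` started looping; start beats
    in the result use the same timeline.
    """
    clip = clips[current]
    pos_in_clip = position % clip.length_beats
    loop_start = position - pos_in_clip
    # Wait for the next allowed exit point; the clip's end is always allowed.
    exits = sorted(p for p in clip.exit_points if p >= pos_in_clip)
    t = loop_start + (exits[0] if exits else clip.length_beats)
    schedule = []
    cue = transitions.get((current, target))  # hand-composed bridge, if any
    if cue is not None:
        schedule.append((t, cue))
        t += clips[cue].length_beats
    schedule.append((t, target))
    return schedule


# Example data (hypothetical clip names and lengths).
clips = {
    "explore": Clip("explore", 16, [4, 8, 12]),
    "combat": Clip("combat", 8, [4]),
    "explore_to_combat": Clip("explore_to_combat", 2),
}
transitions = {("explore", "combat"): "explore_to_combat"}

print(plan_transition(clips, transitions, "explore", 21, "combat"))
print(plan_transition(clips, transitions, "combat", 6, "explore"))
```

The first call waits for beat 24 (the next exit in the second loop of "explore"), plays the two-beat bridge, then starts "combat". The second has no bridge composed for that direction, so it simply cuts at the end of the clip.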
What MIDI 2 adds for this task is mostly on the side of recording expression at higher resolution, which means making higher-fidelity sequencer assets and programming higher-fidelity synth patches. So the asset cost may go up even further if you want to actually make use of that stuff.
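One way to see why the resolution only pays off if the assets are authored for it: MIDI 1 note velocity and controllers are 7-bit, while MIDI 2 note velocity is 16-bit and controllers are 32-bit. The sketch below is a simple min-center-max upscale (0, center and max map exactly), written as an assumption for illustration; the MIDI 2.0 translation rules define their own normative scheme, so check the spec before relying on exact low bits. Upscaled MIDI 1 data still contains only 128 distinct values.

```python
def upscale(value: int, src_bits: int, dst_bits: int) -> int:
    """Map an unsigned src_bits value onto dst_bits, keeping min, center and max fixed."""
    src_center = 1 << (src_bits - 1)
    dst_center = 1 << (dst_bits - 1)
    src_max = (1 << src_bits) - 1
    dst_max = (1 << dst_bits) - 1
    if value <= src_center:
        # At or below center, a plain left shift maps 0 -> 0 and center -> center.
        return value << (dst_bits - src_bits)
    # Above center, stretch linearly so src_max lands exactly on dst_max.
    return dst_center + (value - src_center) * (dst_max - dst_center) // (src_max - src_center)


# A 7-bit velocity upscaled to 16 bits: still only 128 distinct levels.
print(len({upscale(v, 7, 16) for v in range(128)}))  # 128
```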
If I were exploring that space again, which I've done in the past, I would aim for one of:
- Sequenced sample playback with a focus on mixing and arranging longer samples and multisampled instruments generatively, which is a mostly standard approach
- Discarding the MIDI interface and writing to my own synth engine directly, using a format like ABC to compose.
- Applying generative AI to create a "part player" that embellishes a MIDI sequence (there are some demonstrations of this, but it isn't doable in real time... yet)
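For the ABC route, the composing format only needs to be turned into note events your own engine understands. Below is a rough sketch that handles just a small subset of ABC note syntax: note letters, `^`/`_`/`=` accidentals, `'` and `,` octave marks, lengths like `2`, `/`, `3/2`, and `z` rests. It ignores key signatures, bar-scoped accidentals, chords, ties, tuplets and headers. `parse_abc_notes`, `to_synth_events` and the `unit_seconds` parameter are hypothetical names; in real ABC the unit length comes from the `L:` header.

```python
import re
from fractions import Fraction

# Semitone offsets of the natural notes within an octave, C = 0.
_STEPS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
# accidental, note letter (or z = rest), octave marks, length numerator, slashes, denominator
_TOKEN = re.compile(r"(\^{1,2}|_{1,2}|=)?([A-Ga-gz])([',]*)(\d*)(/*)(\d*)")


def parse_abc_notes(body: str):
    """Return [(midi_note or None for a rest, length in unit notes), ...]."""
    events = []
    for acc, letter, octave, num, slashes, den in _TOKEN.findall(body):
        length = Fraction(int(num) if num else 1)
        if slashes:
            # "C/" halves, "C//" quarters, "C3/2" is one and a half units.
            length /= int(den) if den else 2 ** len(slashes)
        if letter == "z":
            events.append((None, length))
            continue
        pitch = 60 + _STEPS[letter.upper()]  # uppercase C is middle C (MIDI 60)
        if letter.islower():
            pitch += 12  # lowercase letters are one octave up
        pitch += 12 * octave.count("'") - 12 * octave.count(",")
        pitch += {"^": 1, "^^": 2, "_": -1, "__": -2}.get(acc, 0)  # "=" or none -> 0
        events.append((pitch, length))
    return events


def to_synth_events(events, unit_seconds=0.25):
    """Convert to (start_s, duration_s, frequency_hz) tuples for a custom synth."""
    t, out = 0.0, []
    for pitch, length in events:
        dur = float(length) * unit_seconds
        if pitch is not None:
            out.append((t, dur, 440.0 * 2 ** ((pitch - 69) / 12)))  # A4 = 440 Hz
        t += dur
    return out


print(parse_abc_notes("C D E F | G2 z c/ c'/"))
print(to_synth_events(parse_abc_notes("A z A"), unit_seconds=0.5))
```

Skipping MIDI here means the event tuple can carry whatever your engine wants (frequency, per-note parameters) instead of being squeezed through note numbers and controllers.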