I gave my game engine the capability of handling software synthesizers, and those synthesizers can also receive MIDI 2.0 commands. My engine already has a MIDI 1.0 sequencer, but it needs some workarounds.
Also, if I could spec my own format, I would add support for adaptive soundtracks, which were a major driving force behind the decision to write software synths instead of going with the more conventional method of playing back audio.
What’s out there currently is whatever is on midi.org, which I believe has what you want: the Clip File specification, accompanied by a not-yet-ready Container File specification (estimated release in 2024).
That said, even implementing MIDI 1 for sequencing is a huge step in terms of possibilities; the cost of a dynamic soundtrack of the iMUSE/Monkey Island 2 sort, where you can make the entire soundtrack transition smoothly between pre-written clips, is mostly borne in the asset creation process. Either you have to program generative sequences, or you’re looking at a composer spending hundreds of hours making transition cues.
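To make the "transition smoothly between pre-written clips" part concrete: one common approach (not necessarily what iMUSE itself did, which relied on composer-placed markers) is to defer a requested clip change to the next bar line so the switch lands on the beat. Below is a minimal sketch assuming a fixed tempo grid; `ClipPlayer`, `next_bar_tick`, and the clip names are hypothetical and not part of any engine or MIDI spec.

```python
from dataclasses import dataclass
from typing import Optional


def next_bar_tick(current_tick: int, ticks_per_beat: int, beats_per_bar: int) -> int:
    """Return the first bar line strictly after current_tick (hypothetical helper)."""
    ticks_per_bar = ticks_per_beat * beats_per_bar
    return (current_tick // ticks_per_bar + 1) * ticks_per_bar


@dataclass
class ClipPlayer:
    """Hypothetical player that switches clips only on bar boundaries."""
    ticks_per_beat: int = 480   # assumed sequencer resolution (PPQN)
    beats_per_bar: int = 4      # assumed 4/4 time, no meter changes
    current_clip: str = "explore"
    pending_clip: Optional[str] = None
    switch_tick: Optional[int] = None

    def request(self, clip: str, now: int) -> None:
        # Queue the change; it takes effect at the next bar line, not immediately.
        self.pending_clip = clip
        self.switch_tick = next_bar_tick(now, self.ticks_per_beat, self.beats_per_bar)

    def update(self, now: int) -> str:
        # Called every tick (or frame); performs the queued switch once we reach the bar line.
        if self.pending_clip is not None and self.switch_tick is not None and now >= self.switch_tick:
            self.current_clip = self.pending_clip
            self.pending_clip = None
            self.switch_tick = None
        return self.current_clip


player = ClipPlayer()
player.request("combat", now=500)  # switch is scheduled for tick 1920
print(player.update(1000))  # still "explore"
print(player.update(1920))  # now "combat"
```

This only handles the timing; the expensive part described above, writing clips (and transition cues) that actually sound good when cut together at those points, still sits with the composer.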
What MIDI 2 adds for this task is mostly on the side of recording expression at higher resolution, which means making higher-fidelity sequence assets and programming higher-fidelity synth patches. So the asset cost may go up even further if you want to actually make use of that.
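As one example of that resolution gap: MIDI 1.0 note velocity is 7 bits, while MIDI 2.0 note-on velocity is 16 bits. Existing MIDI 1 assets don't gain expression by being converted; they just have to be upscaled. A plain left shift maps 127 to 65024 instead of the full-scale 65535, so the sketch below uses a center-preserving bit-repeat scheme, modeled on my understanding of the min-center-max scaling described in the MIDI 2.0 UMP material. Treat it as an illustration rather than a reference implementation; `scale_up` is a hypothetical name.

```python
def scale_up(src_val: int, src_bits: int, dst_bits: int) -> int:
    """Upscale src_val from src_bits to dst_bits, keeping min, center and max exact.

    Values at or below the center are simply shifted; values above it have their
    lower bits repeated into the new low-order bits so the maximum maps to all ones.
    """
    scale_bits = dst_bits - src_bits
    shifted = src_val << scale_bits
    src_center = 1 << (src_bits - 1)
    if src_val <= src_center:
        return shifted  # 0 -> 0, center -> exact center of the wider range

    # Repeat the bits below the top bit to fill the newly created low bits.
    repeat_bits = src_bits - 1
    repeat_mask = (1 << repeat_bits) - 1
    repeat_value = src_val & repeat_mask
    if scale_bits > repeat_bits:
        repeat_value <<= scale_bits - repeat_bits
    else:
        repeat_value >>= repeat_bits - scale_bits
    while repeat_value != 0:
        shifted |= repeat_value
        repeat_value >>= repeat_bits
    return shifted


for v in (0, 64, 100, 127):
    print(v, scale_up(v, 7, 16))  # MIDI 1 velocity -> 16-bit MIDI 2 velocity
```

The real payoff only comes from content authored at the higher resolution (and patches that respond to it), which is exactly where the extra asset cost lands.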
If I were exploring that space again, which I’ve done in the past, I would aim for one of: