It's a little confusing. Nextion makes "HMI displays": integrated modules that run their own software, draw the UI, process events, etc. The display is a black box that just reports back to the host processor "button 3 on page 1 has been pressed". You design the interface with that ugly Windows app and upload it to the display, but there is no direct access to the screen.
To make use of the Nextion display, you need something connected to it, and that's where the ESP32 comes in. It receives those "button 3 pressed" events and handles them, but crucially, it does not have raw access to the screen, so you can't just draw your own widgets on it like you'd be able to do on an ordinary display.
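To make that concrete, here's a rough sketch of what the host side sees. It assumes the display runs a standard Nextion project with "Send Component ID" enabled on the button, in which case (as I understand the Nextion instruction set) a touch arrives over serial as a 7-byte frame: 0x65, page ID, component ID, 0x01 for press or 0x00 for release, then three 0xFF bytes. The names `TouchEvent`, `parse_touch_event` and `describe` are made up for illustration, and it's in Python for readability rather than being actual ESP32 firmware.

```python
from dataclasses import dataclass
from typing import Optional

TOUCH_EVENT = 0x65             # header byte of a touch event frame (assumed per Nextion instruction set)
TERMINATOR = b"\xff\xff\xff"   # Nextion messages end with three 0xFF bytes


@dataclass
class TouchEvent:
    page: int
    component: int
    pressed: bool  # True on press, False on release


def parse_touch_event(frame: bytes) -> Optional[TouchEvent]:
    """Decode one complete frame; return None if it is not a touch event."""
    if len(frame) != 7 or frame[0] != TOUCH_EVENT or frame[4:] != TERMINATOR:
        return None
    return TouchEvent(page=frame[1], component=frame[2], pressed=frame[3] == 0x01)


def describe(event: TouchEvent) -> str:
    """Render the event the way the display 'reports' it to the host."""
    action = "pressed" if event.pressed else "released"
    return f"component {event.component} on page {event.page} has been {action}"


if __name__ == "__main__":
    # Hypothetical frame: component 3 on page 1, press event.
    frame = bytes([0x65, 0x01, 0x03, 0x01, 0xFF, 0xFF, 0xFF])
    event = parse_touch_event(frame)
    if event is not None:
        print(describe(event))
```

Note that nothing in the frame is pixel data or coordinates into a framebuffer: the host only learns which predesigned component was touched, which is the whole limitation.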
There are other projects to build your own controller with a touch screen and a microcontroller; the appeal of the NSPanel is that it's basically an ESP32 and a Nextion display conveniently prebuilt, with decent hardware and aesthetics, and it isn't hard to reflash with ESPHome. But replacing the Sonoff firmware on the ESP32 doesn't change the limitations of the Nextion display.
This post is basically what the Lemmy community has become, in a nutshell. I thought there would be a mass exodus from Reddit, but it seems like the only people who came here and stayed are far out on the fringe. Between this kind of stuff and “I refuse to own a car because the infotainment system is not open source!”, I find myself gravitating more and more back to Reddit for some normality, which is a hell of a thing to say.