I would define the core capability as “takes audio, extracts meaning, matches it to an intent, executes that intent”. Everything else is an implementation of a specific intent or action. Some of those are likely fragile because they depend on integration APIs that may change or go away.
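To make that split concrete, here is a minimal sketch of the pipeline as I picture it, not how any real assistant is built. Every name in it (`transcribe`, `match_intent`, `handle`, `KEYWORDS`, `HANDLERS`) is hypothetical, and the "speech recognition" step is a stand-in that just decodes bytes as text.

```python
from typing import Callable, Dict, Optional

FALLBACK = "Sorry, I can't help with that."


def transcribe(audio: bytes) -> str:
    # Hypothetical stand-in for speech-to-text: a real system would run
    # an ASR model here. This sketch treats the bytes as UTF-8 text.
    return audio.decode("utf-8")


def match_intent(text: str, keywords: Dict[str, str]) -> Optional[str]:
    # Naive keyword matching, assumed here only for illustration.
    lowered = text.lower()
    for keyword, intent in keywords.items():
        if keyword in lowered:
            return intent
    return None


def handle(
    audio: bytes,
    keywords: Dict[str, str],
    handlers: Dict[str, Callable[[str], str]],
) -> str:
    # Core capability: audio -> text -> intent -> action.
    text = transcribe(audio)
    intent = match_intent(text, keywords)
    if intent is None or intent not in handlers:
        # The intent is unknown, or its integration has been removed.
        return FALLBACK
    return handlers[intent](text)


# Intents the matcher recognises.
KEYWORDS = {"timer": "set_timer", "weather": "get_weather"}

# Per-intent implementations. "get_weather" is deliberately missing to
# mimic an integration whose backing API went away.
HANDLERS = {"set_timer": lambda text: "Timer set."}
```

With this layout, `handle(b"Set a timer for 5 minutes", KEYWORDS, HANDLERS)` reaches the timer handler, while a weather request falls through to the fallback message: the core loop is untouched even though one integration is gone.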
Ultimately, yes: it is corporate-speak that sugarcoats what currently looks like a net negative for users with nebulous claims of a better future.
This is excellent, and it really helps show how it was going engines-first at the end.