Efficient models run on-device or on small edge clusters; cloud inference demand plateaus; the AI application layer grows rapidly; software captures the value.