I wasn't aware there was any "processing" being done on the PCH. At least in a way that was somewhat transparent to an OS.
Yes in the way microcontrollers do. So even Quark might have been a big step up. Seems to be a natural progression. Everything starts needing dedicated cores for it.
Crestmont is possible, and I will be more optimistic if that is the case. Idling during streaming video won’t be much of a big deal, but for interactive tasks they’d want to be careful not to compromise UX, and Crestmont would be fine for that. Indeed, the migration policy and heuristics for QoS are where Microsoft also comes into play. I wonder what the timescales are for powering the compute tile on/off with Foveros and migrating a thread. I suppose it depends on how this is implemented.
Note that current power management is very sophisticated.
-Speedstep, which is the P-state mechanism: it dynamically switches frequency and voltage depending on load. If the task is bursty enough, it may not even reach peak frequency.
-Turbo Boost 2.0 (Sandy Bridge): Puts clocks above base to take advantage of the lag between power use and thermal heating. Governed by load, power, and temperature.
-C-states, which are the sleep-specific states. C1 to C10 are available. C8 and deeper weren't implemented properly until a little under 5 years ago, due to the complexity of implementing them and the blistering rate at which new computers are introduced. C10 was first introduced with Haswell in 2013!
-S0ix, brought by Haswell, which gives S0-level responsiveness at S3 levels of power. It is a requirement for going deeper than C7.
-cTDP, which allows the system to adjust TDP on the fly depending on the usage (tablet vs clamshell vs gaming, etc.), and now even based on load.
-Speedshift (or the more technical term, EARTH), brought by Skylake, which moves the control to the CPU. Its decisions also take platform-level power use into account, where previously it was just the CPU.
It's said that Speedshift can mostly replace the role of Speedstep, not just because of that but also because of other features introduced alongside it, such as more C states, faster transitions, and a dedicated PMU inside the core.
If you watch how the transitions work, sleep state transitions are significant. The deepest C states take anywhere from seconds to over a minute to reach! Remember the CPU is trying to get the ENTIRE system to sleep, which includes the SSD, touchpad, mouse, chipset, IO ports like USB and PCI Express, the webcam, fingerprint reader, etc. A single misbehaving feature on a single device can throw all the power management features out of whack.
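If you want to watch this yourself on Linux, here is a minimal sketch that reads the per-core idle state counters the kernel exposes through the cpuidle sysfs directory. Assumptions: a Linux machine with cpuidle enabled, and the function name `read_cpuidle_states` and its `sysfs_root` parameter are just mine for illustration. Note these are the core-level idle states as the OS sees them, not the package-level states (the deep "whole system" ones discussed above), so treat it as a rough view only.

```python
from pathlib import Path


def read_cpuidle_states(cpu=0, sysfs_root="/sys/devices/system/cpu"):
    """Return the idle states the Linux cpuidle driver reports for one CPU.

    Each entry holds the state's name, its exit latency in microseconds,
    how many times it was entered, and the total time spent in it.
    """
    base = Path(sysfs_root) / f"cpu{cpu}" / "cpuidle"
    # Sort numerically so that state10 comes after state2, not after state1.
    state_dirs = sorted(base.glob("state[0-9]*"), key=lambda p: int(p.name[5:]))

    states = []
    for state_dir in state_dirs:
        def read(attr, d=state_dir):
            return (d / attr).read_text().strip()

        states.append({
            "name": read("name"),
            "exit_latency_us": int(read("latency")),
            "entries": int(read("usage")),
            "residency_us": int(read("time")),
        })
    return states
```

Calling `read_cpuidle_states(0)` twice a few seconds apart and diffing `residency_us` shows which states the core actually spent its idle time in during that window.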
Ever since I was 14, when I really started reading into computers, they were talking about battery life increases every generation, and it never panned out as claimed. Then when I started reading about Haswell and the real problem, and the effort Intel was putting into the ecosystem, I knew we had something special coming. It had to be an ecosystem effort anyways. The Haswell jump was equal to more than a decade's worth of previous efforts.
So I don't worry too much about this making things more complicated, if in practice it works like Speedshift replacing Speedstep for the most part, or like C3/4/5 being skipped so that the CPU now goes straight from C2 to C6. Maybe it takes an extra 50ms, but if that allows it to reach C7 and deeper in that time period, it'll be a huge win vs today. While reaching those states is theoretically possible now, it takes closing tasks and applications beyond what the average user has a clue about.
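To put rough numbers on that tradeoff, here is a back-of-the-envelope sketch of when a deeper state with a longer entry time still comes out ahead. Everything in it is an assumption for illustration: the `IdleState` class, the function names, and the simple model (a fixed transition power while settling, then the state's own power) are mine, and any power figures you plug in are hypothetical rather than measured. Only the "extra 50ms" comes from the point above.

```python
from dataclasses import dataclass


@dataclass
class IdleState:
    name: str
    power_mw: float  # power drawn while resident in the state
    entry_ms: float  # time spent getting into the state


def idle_energy_mj(state, idle_ms, transition_mw):
    """Energy in millijoules used over one idle window of idle_ms."""
    # While settling, the platform burns transition_mw; after that it
    # draws the state's own power for the rest of the window.
    settle_ms = min(idle_ms, state.entry_ms)
    resident_ms = idle_ms - settle_ms
    return (transition_mw * settle_ms + state.power_mw * resident_ms) / 1000.0


def break_even_ms(shallow, deep, transition_mw):
    """Idle duration at which both states cost the same energy.

    Only meaningful when the result is at least deep.entry_ms, i.e. the
    window is long enough for the deep state to be reached at all.
    """
    if deep.power_mw >= shallow.power_mw:
        raise ValueError("deep state must draw less power than shallow state")
    numerator = (
        transition_mw * (deep.entry_ms - shallow.entry_ms)
        + shallow.power_mw * shallow.entry_ms
        - deep.power_mw * deep.entry_ms
    )
    return numerator / (shallow.power_mw - deep.power_mw)
```

For example, with made-up values like `IdleState("shallow", 500.0, 1.0)` and `IdleState("deep", 50.0, 51.0)` (the extra 50ms) and a 1000 mW transition cost, `break_even_ms` gives the idle length beyond which the deeper state wins. The longer the machine really stays idle, the less that extra entry time matters.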
It's a further point in its favor if it makes implementation easier for the laptop manufacturers, since that difficulty is the reason C10 took so long to be adopted. Getting all the hardware and the firmware to work together was a monumental task, and that's why even big vendors like Lenovo didn't write C10-capable firmware even well after Haswell was introduced. Haswell was still a huge gain, but it took Broadwell to get the 2x originally planned. I think Icelake was the first with mass C10 adoption.