AI compute performance has grown exponentially over the past decade, improving 2~3x every year and outpacing Moore's Law scaling. This growth has enabled breakthroughs in large language models (LLMs) with trillions of parameters, trained on tens of thousands of AI GPUs running in parallel within a single compute cluster. In turn, the explosive growth of LLM-enabled applications has created enormous demand for computing power and driven the transition from traditional data centers to AI factories populated by AI GPUs. GPU AI compute technology has also evolved rapidly to enable this improvement. Full-stack advances in GPU computing within the AI factory will enable new applications and, in turn, demand further revolutions in AI factory computing. IC technologies must also be co-optimized to sustain this era of rapid AI compute revolution.
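To illustrate the scale of the gap, a minimal sketch of the compound growth implied by the figures above: 2~3x per year for AI compute over a decade, versus Moore's Law scaling taken here as a doubling roughly every two years (an assumption for comparison, not a figure from this text).

```python
# Illustrative arithmetic only: compound a yearly growth factor over a decade
# and compare against Moore's Law, assumed here as doubling every ~2 years.
years = 10

ai_low = 2 ** years          # 2x per year over 10 years
ai_high = 3 ** years         # 3x per year over 10 years
moore = 2 ** (years // 2)    # one doubling every 2 years over 10 years

print(f"AI compute gain over {years} years: {ai_low}x to {ai_high}x")
print(f"Moore's Law scaling over {years} years: ~{moore}x")
```

Even the conservative 2x/year case yields roughly a 1000x improvement over ten years, against about 32x from Moore's Law alone, which is why full-stack and IC-level co-optimization carry the remaining gap.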