Silent data errors are a major problem for cloud operators. As the training of large language models drives higher demand for cloud computing, silent data errors will create greater complications for AI in the cloud. This paper examines the underlying issue of chip failures that are not caught during test, along with the causes of those failures. It proposes an alternative approach to better understand the causes of chip failures and to prevent failures in the field.