My dad's been bitching about ASCI Q for over 10 years. The article is oddly present-tense; experiences with ASCI Q and whatever the Livermore machine was basically pushed the labs out of supercomputers and into distributed computing. Not mentioned in the article is LANL's solution: kill any processor that errors out. Throughout its history, ASCI Q usually ran with about 20% of its chips offline. Effectively, they turned their giant box into a network. It has also been suggested that LANL had to be more pragmatic than Livermore because, at 7,200 feet, it had a lot more cosmic rays to deal with.
Fail-fast is great but depends on detecting failures in a timely manner. :) Some errors will affect the output (e.g. the result is off) even though the process itself hasn't failed. Re: distributed computing, it's a bit fuzzy. Your multi-core laptop is, in practical fact, a "distributed system", but it comes in a nice monolithic envelope.
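To make the "result is off but nothing crashed" case concrete, here is a minimal sketch of one common way to catch it: run the same work on several replicas and vote on the answer. This is only an illustration, not what LANL actually did; `checked_run`, `healthy`, `flaky`, and `SilentErrorDetected` are hypothetical names, and the "bit flip" is simulated.

```python
from collections import Counter


class SilentErrorDetected(Exception):
    """Raised when the replicas disagree and no strict majority exists."""


def checked_run(replicas, x):
    """Run x through every replica and return (majority_value, suspects).

    replicas: list of callables standing in for independent processors
    (an assumption made for this sketch).
    """
    results = [replica(x) for replica in replicas]
    value, votes = Counter(results).most_common(1)[0]
    if votes <= len(results) // 2:
        raise SilentErrorDetected(f"no majority among {results}")
    # Indices of replicas whose answer differs from the majority:
    # these are the candidates to take offline.
    suspects = [i for i, r in enumerate(results) if r != value]
    return value, suspects


def healthy(x):
    return x * x


def flaky(x):
    # Simulated bit flip: completes without any error, but the value is wrong.
    return (x * x) ^ 1


value, suspects = checked_run([healthy, flaky, healthy], 12)
print(value, suspects)  # 144 [1]
```

Note the cost: catching a silent error this way takes at least two replicas to notice it and three to know which one to blame, which is part of why timely detection is the hard half of fail-fast.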