Right around the time Windows 7 RTM was posted to TechNet and MSDN (August 6), reports began to surface that Windows 7 would bluescreen when users ran the CHKDSK utility with the /r switch, which locates bad sectors and recovers readable data from them. This 8/5 Gizmodo item is a good example of the kind of reporting that swirled around the Web after the first reports appeared: Windows 7 Has an Obscure OS-Crashing Memory Bug. For more, search Bing or Google for "Windows 7 chkdsk /r crash". As you might imagine, furor and hysteria soon followed. Later statements from MS did little to quell the uproar. They confirmed that the high memory use was a deliberate design choice, meant to discourage continued use of the affected PC while the scan ran (who wants to use a machine with a bum disk until it's repaired anyway?). They also said the crash could not be reproduced.
Today, just over three weeks later, the hubbub has died down. That's why I found MS VP Steven Sinofsky's detailed post about this item on the Engineering Windows 7 blog so fascinating. The post is signed only "Steven," so I assume it's him, though I'm happy to be corrected if I'm wrong. It appeared on 8/10, but I only stumbled across it this morning. It's entitled "What do we do with a bug report?" and makes for compelling reading. He digs into the details of this specific Windows 7 problem. He also reflects on how MS handles bug reports in general, especially those that mention bluescreens or other system crashes.
What I found most interesting in this posting were the following revelations:
- MS has access to a lot of data about crashes and errors, thanks to the many users who turn on error reporting during Windows installation and choose to share that information with Microsoft. His discussion of telemetry and the kinds of information it can provide is terrific. It also helps explain why you might want to accept those information-sharing and improved-experience requests the next time you install an MS product.
- MS uses some interesting test approaches to reproduce and understand crashes. Sinofsky describes how the team uses configuration data to set up test environments. He also explains how it reaches out to the tens of thousands of in-house users in its own community to gather more data than would otherwise be available. We're talking hundreds upon hundreds of test runs in a pretty short period of time.
- I'm sure some of his rhetoric and discussion could easily be read, and perhaps dismissed, as "executive damage control." Even so, I gotta say I was impressed by how clearly and concisely he described the overall situation and placed it in the broader context of how Microsoft reports and resolves bugs. I've already noticed that my own Windows 7 error reports tend to bring more updates and eventual solutions through the Action Center than reports from Windows Vista's error reporting facility ever did. Now I'm starting to understand why that might be.
Check out this blog post. I'm sure you'll find it both interesting and informative.