Tuesday, October 13, 2009

Cloud Computing Snafu Deletes Microsoft Sidekick T-Mobile Data

The following message thread occurred in the old CSix Cloud Computing Yahoo Group starting 10-13-9.  The text was transferred in case anyone wants to continue the discussion here.



Cloud computing snafu deletes Microsoft Sidekick T-Mobile data

Posted by: "jmcd0205"   jmcd0205

Tue Oct 13, 2009 8:42 am (PDT)

FYI, Blog posting from sfgate.com by Yobie Benjamin. Companies need to remember some basics, like backing up data, maybe 2-3x...?

Eight hundred thousand T-Mobile subscribers who use Microsoft's Danger Sidekick smart phones suffered the worst possible failure that can occur to anyone's personal data. All the customers' data - address books, calendars, to-do lists and photos - was wiped out... kaput... gone... destroyed... forever and ever.

This computing nightmare affects not only Sidekick owners but everyone who owns a smart phone and now has to question the integrity of their own device.

T-Mobile, the operator of the Sidekick's data service, and Microsoft fumble to explain how this massive clusterfoot happened.

T-Mobile's web site explains:

Regrettably, based on Microsoft/Danger's latest recovery assessment of their systems, we must now inform you that personal information stored on your device -- such as contacts, calendar entries, to-do lists or photos -- that is no longer on your Sidekick almost certainly has been lost as a result of a server failure at Microsoft/Danger. That said, our teams continue to work around-the-clock in hopes of discovering some way to recover this information. However, the likelihood of a successful outcome is extremely low... Sidekick users "NOT reset their device by removing the battery or letting their battery drain completely, as any personal content that currently resides on your device will be lost."

The implication of the statement is that Microsoft's servers suffered a catastrophic failure and that there was no backup. How the f&^% is it possible there was not a single backup at an enterprise so critical to the personal lives of tens of thousands of T-Mobile/Microsoft Sidekick users?

This isn't the first time a Web service has crashed and left its users without access to data stored "in the cloud." Google's Gmail has had multiple outages but it has very quickly recovered with no data loss.

This is the FIRST TIME a MAJOR cloud-computing vendor didn't have any backups. It is a total failure of systems from Microsoft's server operating systems, storage systems, processes, procedures and everything that shoulda, woulda, coulda happened.

Folks, this is Microsoft we're talking about here. It's the same company who wants us to upgrade to Windows 7. It's the company who wants to be your cloud computing company of choice. It's the same company who sells server operating systems.

The blame is also being shared with Hitachi Data Systems, who provided the failed backup systems. It is being reported that Microsoft contracted Hitachi Data Systems (HDS) to do remedial work on the server infrastructure and that, during the work, the server infrastructure failed. There were no backups or replicated data sets, and so the data was lost forever.

Further, this involves a major telecommunications company - T-Mobile, not some small rural mobile phone provider. We need to understand what happened so consumers know what to avoid in the future.

In every company I have ever worked in or consulted for, backing up is part of Information Technology 101. I always advocate not only one backup but sometimes double and triple backups and minimum 30-day archives. How can Microsoft, Hitachi and T-Mobile allow this massive failure?

Microsoft, Hitachi and T-Mobile must come clean and explain where and how the failures occurred, lest they suffer the consequence of losing the trust of the public and of corporate enterprises. For now, until they explain, their names are all in the mud.

This is a complete failure. Today would be a tough day to sell Microsoft server operating systems software and an even harder one to sell Hitachi Data's backup systems... unless Microsoft, Hitachi and T-Mobile come clean and explain how this massive failure could happen. If I were a company chief information officer or chief information security officer, I would have to do a complete double-take before I commit to a server operating system or backup solution that can suffer such catastrophic failure.

The worst part of this (having owned a Sidekick) is there is NO EASY WAY to back up your Sidekick. It's supposed to do it for you. This absolutely sucks for T-Mobile subscribers who use the Microsoft Sidekick dumb phone.

It's small comfort that T-Mobile suspended the further sales of Microsoft's Sidekick smart phone.




Re: Cloud computing snafu deletes Microsoft Sidekick T-Mobile data

Posted by: "Bob Sutterfield"  bsut2002

Sat Oct 17, 2009 11:16 am (PDT)

Here's how I understand the situation, from other sources:

Microsoft bought Danger some time ago. This acquisition included their
products, staff, physical assets (like servers and software), and their
revenue contracts (like that with T-Mobile). Microsoft is partway through
ingesting their acquisition, and came to the point of technical
integration. The Danger services were singly homed, without redundancy,
which is why they wanted to move onto Microsoft's cloud. They were
re-installing the OS, and preparing to later install the cloud management
software and Danger-specific service software, onto the Danger servers.
This task was being performed by a Hitachi contractor working for
Microsoft.

The project plan assumed (it turns out mistakenly) that the servers and
services and databases were already configured in a manner compliant with
the Microsoft cloud infrastructure spec, so the technician was simply
re-installing everything. If that assumption had been accurate, the
services and databases would have been in service redundantly someplace
else, and removing the Danger infrastructure from service would have
triggered a fail-over to those other instances. (This is why I always
design services with tripled infrastructure: one I can put under
maintenance, one in service, and another backup in case the primary fails
while I'm doing maintenance. Plus I never use cold backups, I always use
hot-hot load balancing between multiple primary active instances.)
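
The "tripled infrastructure" rule above can be expressed as a guard that a maintenance tool runs before pulling a server. What follows is only a minimal sketch under stated assumptions: the names Instance, MaintenanceRefused and take_out_for_maintenance are hypothetical, and nothing here describes the tooling Microsoft or Danger actually used. The idea is that a server may leave service only if enough healthy peers stay behind to carry the load and to cover a second failure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    """One copy of the service (hypothetical model, not a real API)."""
    name: str
    healthy: bool = True
    in_service: bool = True


class MaintenanceRefused(Exception):
    """Raised when pulling an instance would leave too little redundancy."""


def take_out_for_maintenance(pool: List[Instance], name: str,
                             min_remaining: int = 2) -> List[Instance]:
    """Take `name` out of service only if at least `min_remaining`
    other healthy instances remain in service.

    min_remaining=2 mirrors the rule in the post: one instance keeps
    serving, and one more stands by in case that one fails.
    """
    target = next((i for i in pool if i.name == name), None)
    if target is None:
        raise KeyError(f"unknown instance: {name}")

    remaining = [i for i in pool
                 if i is not target and i.healthy and i.in_service]
    if len(remaining) < min_remaining:
        raise MaintenanceRefused(
            f"only {len(remaining)} healthy peer(s) would remain; "
            f"need {min_remaining}"
        )

    target.in_service = False
    return remaining
```

With a pool of three, the first instance can be pulled and the other two keep serving. A singly homed service, which is how the Danger services are described above, would be refused outright because no peers remain.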

I don't know why the integration project plan didn't include an assessment
of the existing infrastructure and dependencies. I don't know why the plan
didn't include the task of backing up the Danger user databases before
FDISKing the servers. I hope the technician didn't lose her job - the
outage was the project manager's fault.
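
The two missing plan items named above, an assessment of what actually exists and a backup before the wipe, amount to a pre-flight check. Here is a rough sketch of such a check, assuming a hypothetical BackupRecord type and wipe_blockers function; the 24-hour age limit is an arbitrary illustration, not a figure from any report on the outage. The function returns every reason the destructive step should not proceed, so an empty list means "go".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class BackupRecord:
    """Metadata about the most recent backup (hypothetical model)."""
    taken_at: datetime
    restore_verified: bool  # True only if a test restore succeeded


def wipe_blockers(backup: Optional[BackupRecord],
                  replica_count: int,
                  now: datetime,
                  max_age: timedelta = timedelta(hours=24)) -> List[str]:
    """List the reasons a server wipe must not go ahead.

    Checks the assumptions the project plan took for granted:
    a backup exists, it is recent, it can be restored, and the data
    is being served redundantly somewhere else.
    """
    blockers: List[str] = []

    if backup is None:
        blockers.append("no backup of the user databases exists")
    else:
        if now - backup.taken_at > max_age:
            blockers.append("latest backup is older than the allowed age")
        if not backup.restore_verified:
            blockers.append("latest backup has never been test-restored")

    if replica_count < 1:
        blockers.append("no live replica is serving the data elsewhere")

    return blockers
```

Putting the check in the tool rather than in the plan means the technician gets a refusal with reasons instead of relying on a project assumption that may be wrong.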


Re: Cloud computing snafu deletes Microsoft Sidekick T-Mobile data

Posted by: "Bob Sutterfield"  bsut2002

Sun Oct 18, 2009 4:45 pm (PDT)

I forgot to mention: This is getting publicity as a Cloud Computing
Failure. It's SAAS (in the sense that we can now label anything that's
hosted "someplace else" as SAAS to replace the old-fashioned ASP label) and
therefore a Cloud application, but it's not running on a Cloud
infrastructure (elastic pools of virtualized resources). They intended to
move it from a traditional hosting infrastructure onto a Cloud, and that
move plan is where the mistakes happened.