Archive for April, 2011

Amazon Failure Caused by Weak Change Management

April 29, 2011

Amazon just released their summary of last week’s web services outage.  It is an incredibly long explanation ending with an apology.  The bottom line for this outage is contained in the following statement: “The trigger for this event was a network configuration change.”

When I first became an “EDP Auditor” (showing my age), the first project I did was a program change control review.  Change control is the most basic of all IT controls.  So how does a juggernaut like Amazon allow their service to be crippled by a basic network configuration change?  Don’t they have redundancy and failover for critical services?  Nothing is as simple as it seems these days.

The system administrator who attempted to make the change made a mistake:

“During the change, one of the standard steps is to shift traffic off of one of the redundant routers in the primary EBS network to allow the upgrade to happen. The traffic shift was executed incorrectly and rather than routing the traffic to the other router on the primary network, the traffic was routed onto the lower capacity redundant EBS network.”

So, yes, they had redundancy.  Unfortunately, the redundancy is what led to the failure.  The redundant network couldn’t handle the load, and certain devices could not find their redundant pair for data mirroring.  As a result, they consumed all the available local resources, which overloaded the management layer.  The failing management layer began shutting down entire segments and the Application Programming Interfaces (APIs).  The mirroring architecture put everything on hold until the devices could find a new “partner” to mirror with.
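The capacity mismatch at the heart of this cascade can be illustrated with a minimal sketch.  The capacities, names, and routing function below are invented for illustration only; they are not Amazon’s actual EBS implementation.

```python
# Hypothetical sketch of the mis-executed traffic shift described above.
# All numbers and names are illustrative assumptions, not AWS internals.

def route_traffic(load, network_capacity):
    """Return (handled, stranded) units of traffic for a given network."""
    handled = min(load, network_capacity)
    return handled, load - handled

# The redundant EBS network was sized for overflow/control traffic,
# not the full data load of the primary network.
PRIMARY_CAPACITY = 100
REDUNDANT_CAPACITY = 10

# Intended shift: traffic moves to the OTHER router on the primary network.
handled, stranded = route_traffic(100, PRIMARY_CAPACITY)
print(f"correct shift:  handled={handled}, stranded={stranded}")

# Mistaken shift: all traffic lands on the low-capacity redundant network.
handled, stranded = route_traffic(100, REDUNDANT_CAPACITY)
print(f"mistaken shift: handled={handled}, stranded={stranded}")

# The stranded volumes lose contact with their mirror partners and begin
# searching for new ones, consuming the remaining local resources -- the
# "re-mirroring storm" that overloaded the management layer.
```

The point of the sketch is that the redundant path was never provisioned to absorb the primary load, so a single routing mistake instantly stranded most of the traffic and triggered the mirroring behavior described above.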

In much the same way that the Fukushima catastrophe was the result of multiple failures of controls and redundancy, so was Amazon’s failure.  Sometimes our systems are so complex that we can’t foresee the true impact of a simple error.  Amazon has taken several steps to prevent this type of failure from happening again, but I suspect we will continue to see unprecedented and unforeseen errors in the cloud as our systems continue to grow in complexity.


SAS 70 is Dead

April 28, 2011

If you haven’t heard already, SAS 70 is no more.  As of June 15, 2011, the AICPA is laying Statement on Auditing Standards Number 70 to rest permanently.  Why?  The official reason has to do with convergence with International Accounting Standards.  The unofficial reason includes trying to regain control over what has become a de facto “certification” based on a misused audit standard (SAS 70).

The original intent of SAS 70 was to allow auditors of an organization’s financial statements to gain an understanding of the controls over the services being provided by a third-party service organization.  The audit standard (SAS 70) was only applicable if the transactions processed were significant to the user organization’s financial statements.  In fact, the SAS 70 standard clearly states that it was NOT relevant in situations where “The services provided are limited to executing client organization transactions that are specifically authorized by the client…”

Note that the premise of the audit was for service organizations that process transactions on behalf of a user organization.  So why do so many co-location facilities and data centers that don’t process any transactions on behalf of their customers claim to be “SAS 70 Certified”?  Because the customer is always right.

Do a Google image search for “SAS 70 Certified” and you will get thousands of hits: logos of all shapes and sizes on the websites of service providers claiming to be “SAS 70 Certified”.  The truth of the matter is that there is no such certification.  The logos are merely the creations of marketers promoting the fact that their organization has undergone a SAS 70 audit.

To figure out how this all got out of control, you have to go back to 2004 and the beginning of the SOX era.  SOX changed the auditing process significantly.  Auditors suddenly had to actually understand the controls in place around financial statements.  Outsourcing of IT, payroll, and benefits administration had been going on for a while, but now the auditor had to have some way of understanding what really went on in those third-party service organizations.  Enter SAS 70, a little-known and even less understood audit standard that seemed to be the silver bullet.

The Big 4 auditors began asking for a SAS 70 report for all the third-party service providers their clients were using.  It became a checklist item in the yearly financial statement audits: “Is this business process performed by a third party?  If so, request a copy of the SAS 70.”  Never mind that most of the junior auditors who filled out the checklist had no clue what SAS 70 was or how it should be used.

Since the auditors needed it, it had to be important.  Eventually, the procurement groups at big corporations began requiring a SAS 70 report from any service provider that did business with the company.  Somewhere along the way the vernacular changed, and the question became “Are you SAS 70 certified?”  To do business with larger clients, service organizations had to have a SAS 70 audit performed, whether it made sense or not.  Why?  Because the customer is always right.

In my next blog, I will discuss the new AICPA standards for service organization controls (SOC) audits and why they won’t fix the problem.  Stay tuned.

Categories: SOC Audits, Uncategorized