**All systems are back online.** Thank you so much for your patience.
There was no downtime in data/analytics collection or in most other services, including Related Posts. The Share API was affected, however, as was Shareaholic.com. We worked with Amazon through the night on the issue, and we thank them for their help.
The downtime was caused by a bug in MySQL and Amazon’s hosted database service. A key database server and its hot backup both failed because of the bug: a perfect storm. The hot backup is replicated and hosted across multiple data centers, so we have redundancy even if a whole data center goes down.
To recover, we had to restore from a backup to the exact time the failure occurred so that no data was lost, which took a while to get through given our volume.
We had followed all of Amazon’s best practices, and we still had downtime. This was very nearly the worst-case scenario, short of a nuke hitting Amazon’s data centers.
We’re working on a plan with Amazon’s team to ensure this doesn’t happen again. Separately, we’re making the Share API resilient to complete database unavailability.
Again, thank you for your patience. We work hard to earn your trust and know how important this is.