Last week, several organizations across the world were impacted by an Office 365 outage. The Exchange Online service was not fully available for several hours. Some users couldn’t access their mailboxes at all; for others, mail delivery performance (sending and receiving) was simply poor.
Whoomp! There it is…
The consequences are obvious: lost productivity, a bad end-user experience, growing end-user frustration, reduced business speed and a loss of trust. And those are just a few of the many possible business-critical impacts.
Interestingly enough, the case under which this incident was logged (EX172491) has since been removed by Microsoft.
We all know that there is a certain risk whenever cloud services are used. One key question always remains, though: how well are you prepared if something like this happens to your organization? Or in other words, what is your backup plan?
This is a fundamental question for the many end users, administrators and businesses who rely on stable, high-performance cloud offerings on a daily basis.
Take a Walk on the Safe Side
Monitoring your Office 365 installation is a critical first step in getting the information you need on your enterprise applications in real time. You can’t effectively manage a vitally important part of your application infrastructure unless you know how it’s performing. Early insights on availability will help you prepare for outages.
Knowing who is impacted is important for managing the issue (for example, when notifying your end users). The impact may be limited to a group of people or a subset of users (if the Multi-Geo capabilities of Office 365 are in use), or it may extend to the entire organization using the cloud tenant.
With OfficeExpert we offer a solution that helps you to identify the magnitude of the possible impact.
Furthermore, with the Mail Flow Simulation Sensor by OfficeExpert, organizations could have seen that the service was only partially restored: accessing the mailbox worked again, but the underlying service for sending and receiving mail was still impaired by the incident. The following screenshot shows a steady increase in mail delivery time between January 23rd and 26th.
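To illustrate the idea of spotting degraded mail delivery from measured delivery times, here is a minimal sketch. It is not how OfficeExpert works internally; the function name `delivery_degraded`, the thresholds and the sample values are all hypothetical and chosen only for illustration.

```python
from statistics import median
from typing import Sequence


def delivery_degraded(
    delays_s: Sequence[float],
    baseline_n: int = 5,   # assumption: first N samples represent normal service
    recent_n: int = 3,     # assumption: last N samples represent the current state
    factor: float = 2.0,   # assumption: "degraded" means more than 2x the baseline
) -> bool:
    """Return True if every recent delivery time exceeds factor * baseline median."""
    if len(delays_s) < baseline_n + recent_n:
        return False  # not enough measurements to judge
    baseline = median(delays_s[:baseline_n])
    recent = delays_s[-recent_n:]
    return all(d > factor * baseline for d in recent)


# Made-up delivery times in seconds, oldest first (not real measurements).
samples = [10, 11, 9, 10, 12, 30, 45, 60]
print(delivery_degraded(samples))
```

Requiring all recent samples to exceed the threshold, rather than just one, is a simple way to avoid alerting on a single slow message.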
Ensure Solid Business Continuity for Your End-Users
This transparency helps you know that a particular service is not fully restored. It also helps you understand how you can plan and communicate accordingly. At the end of the day, this naturally benefits the end user too.
Monitoring notifications ensure that you are the first to find out that an issue exists, even before Microsoft tweets about it hours later. Knowing which services are affected allows you to work proactively by notifying your users and applying contingency plans before you are inundated with user tickets.
Try the OfficeExpert sandbox to explore the advantages this product offers, so that you too can access your key to business continuity.
UPDATE: Further Outage on January 29th!
Another major outage occurred on January 29th, 2019, during which users were unable to authenticate and access Office 365 services. Azure was also affected by this incident. The root cause communicated by Microsoft was a DNS issue at CenturyLink, a DNS provider used by Microsoft.
The following screenshot shows how OfficeExpert detected and measured this outage. The Skype for Business service was down for almost 3 hours. Other services, such as Exchange Online, were impacted for around 1 hour. The failure indicator (the error message in the screenshot) states that a certain fully qualified domain name could not be resolved. This matches the root cause statement by Microsoft.
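A failure of this kind, where a fully qualified domain name cannot be resolved, can be detected with a very simple probe. The following is a minimal sketch using only the Python standard library; it is not OfficeExpert's implementation, and the names `ProbeResult` and `probe_dns` as well as the hostname in the usage line are assumptions made for illustration.

```python
import socket
import time
from typing import Callable, NamedTuple, Optional


class ProbeResult(NamedTuple):
    fqdn: str
    resolved: bool
    elapsed_ms: float
    error: Optional[str]


def probe_dns(
    fqdn: str,
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> ProbeResult:
    """Try to resolve fqdn and report success, duration and any error text."""
    start = time.monotonic()
    try:
        resolver(fqdn)
        return ProbeResult(fqdn, True, (time.monotonic() - start) * 1000, None)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return ProbeResult(fqdn, False, (time.monotonic() - start) * 1000, str(exc))


if __name__ == "__main__":
    # Illustrative hostname only; substitute the endpoints your tenant depends on.
    print(probe_dns("outlook.office365.com"))
```

The resolver is passed in as a parameter so that the probe can be exercised without network access, for example with a stub that raises `socket.gaierror`.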