One of the hardest and most frustrating parts of supporting users who are experiencing bad calls is getting all the right information together. So many moving parts can influence what users experience during a call: authentication, the device and peripheral equipment, the local network and ISP, and the Microsoft cloud itself. All of them can play a role in the quality of a call. Multiply that by the number of participants on the call, because the problem doesn’t have to originate with the participants who are experiencing it.

We often see how difficult it is to tackle bad calls, and the example below is no exception. Let us take you through how complex it can be to figure out what is going on, and how User Experience Monitoring can help isolate the cause.

The situation

One of our customers planned a large Teams Town Hall meeting with almost 500 attendees.
Attendees were joining from different locations and time zones, both from home and from the office.
The organizers were all in person at the same office location.

The problem

The meeting started and all was going well until the organizer wanted to unmute an attendee to bring them into the discussion on screen. The unmute didn’t seem to work and instead, all of a sudden, audio quality started to break down. For 12-15 minutes, participants complained that they had trouble hearing or understanding what was going on. Multiply that by roughly 500 attendees, and you are looking at on the order of 100 hours of potential productivity loss! Understandably, this caused a lot of uproar, even drawing C-level attention. As a consequence, the IT department was tasked with investigating what happened.
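The productivity figure is simple arithmetic: attendees multiplied by minutes of disruption. A minimal sketch, assuming every attendee was affected for the whole window; `lost_hours` is a hypothetical helper used only for illustration:

```python
def lost_hours(attendees: int, disruption_minutes: float) -> float:
    """Total attendee-hours affected, assuming everyone is affected throughout."""
    return attendees * disruption_minutes / 60


if __name__ == "__main__":
    # The 12-15 minute window reported above, applied to 500 attendees.
    for minutes in (12, 15):
        print(f"{minutes} min x 500 attendees = {lost_hours(500, minutes):.0f} hours")
```

This prints 100 hours for 12 minutes and 125 hours for 15 minutes. Treat it as an upper bound, since not every attendee necessarily lost the full window.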

With the aid of OfficeExpert TrueDEM, we will look at this disastrous Teams Town Hall call to see what happened and what could be done to prevent it from happening again.

Why good audio matters

Audio quality is crucial in Microsoft Teams calls and meetings, because audio issues tend to have a much bigger impact than issues with screen sharing or video. Blurry video, annoying as it is, can be tolerated for a while; poor audio quality can halt communication in its tracks. After all, if I can’t see you but can still hear you, we can communicate. If I can’t hear you, or your audio is garbled, communication becomes impossible, unless perhaps we both know sign language, which most of us don’t.

Figuring out what happened

Back to our analysis. To understand and analyze a Microsoft Teams call, you need to understand the technology: the way Microsoft Teams transmits audio, video and other data.

To transmit a call, Microsoft Teams splits the various inputs (audio, video, screen sharing, applications, etc.) into separate streams. Each of those is split again into an inbound stream (what you receive from the cloud) and an outbound stream (what you send to the cloud). Depending on the type of content, each stream is then further split into tiny packets (frames), each representing a small amount of audio, video, etc. That way, every individual packet being transmitted is small, which in turn means that if packets get lost, the impact is minimized.
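To make the splitting concrete, here is a hypothetical Python model, not Teams’ actual implementation: each media type gets separate inbound and outbound streams, and each stream is cut into fixed-length frames. The 20 ms frame length comes from the next paragraph; the `Media`, `Direction`, `StreamKey` and `packetize` names are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Media(Enum):
    AUDIO = "audio"
    VIDEO = "video"
    SCREEN_SHARING = "screen sharing"


class Direction(Enum):
    INBOUND = "inbound"    # received from the cloud
    OUTBOUND = "outbound"  # sent to the cloud


@dataclass(frozen=True)
class StreamKey:
    media: Media
    direction: Direction


def packetize(duration_ms: int, frame_ms: int) -> list[tuple[int, int]]:
    """Split a stream of duration_ms into (start_ms, end_ms) frames."""
    return [(start, min(start + frame_ms, duration_ms))
            for start in range(0, duration_ms, frame_ms)]


if __name__ == "__main__":
    # Every media type gets its own inbound and outbound stream.
    streams = [StreamKey(m, d) for m in Media for d in Direction]
    print(len(streams), "streams")  # 3 media types x 2 directions = 6

    # One second of audio in 20 ms frames: losing one packet costs only 20 ms.
    frames = packetize(1000, 20)
    lost = frames[10]
    print(len(frames), "frames; one lost packet =", lost[1] - lost[0], "ms")
```

The point of the small frames shows up in the last line: a single lost packet removes 20 ms of audio, not a whole sentence.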

Microsoft Teams transmits each second of audio as a sequence of 50 frames, each containing up to 20 ms of speech content. With OfficeExpert TrueDEM, we track these Real-time Transport Protocol (RTP) packets (frames) in both directions on a 30-second interval. For an actively speaking attendee, we therefore expect to see around 1,500 audio packets (30 seconds × 50 frames) transmitted for every 30-second interval. This count is independent of video and other packets, which are sent separately.
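The expected count follows directly from the frame rate and the reporting interval. A minimal sketch, assuming one packet per 20 ms frame as described above; the function names and the observed counts in the usage lines are hypothetical:

```python
FRAMES_PER_SECOND = 50   # one audio frame every 20 ms
SAMPLE_INTERVAL_S = 30   # reporting interval described above


def expected_audio_packets(interval_s: int = SAMPLE_INTERVAL_S,
                           frames_per_second: int = FRAMES_PER_SECOND) -> int:
    """Packets an actively speaking attendee should send per interval."""
    return interval_s * frames_per_second


def send_ratio(observed: int, interval_s: int = SAMPLE_INTERVAL_S) -> float:
    """Fraction of the expected audio packets that were actually sent."""
    return observed / expected_audio_packets(interval_s)


if __name__ == "__main__":
    print(expected_audio_packets())   # 1500
    print(f"{send_ratio(1470):.0%}")  # 98%
    print(f"{send_ratio(60):.0%}")    # 4%
```

Expressing counts as a ratio of the expected value makes intervals of different lengths comparable.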

Starting the investigations

As this was a Teams Town Hall meeting and the audio went bad, it makes sense to start by looking at the main speaker. She was, after all, the person sending most of the audio that others had trouble hearing.

The following picture shows that around 7 p.m. (the start of the meeting), the number of audio packets sent from her device rose to the expected level and stayed there until approximately 7:51:30 p.m. At that point, we see a massive drop in the number of packets being sent. This explains why others couldn’t hear her: her audio wasn’t getting through to the cloud, and therefore not to the other participants either.

When we look at the raw data for this user, we see the exact points in time when the issue begins and ends. In between, only a few audio packets make it through the system.

This continued for nearly 12 minutes, which matches what the other attendees reported.
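Spotting the drop by eye works, but the same check can be automated. A minimal sketch, assuming the per-interval counts of sent audio packets have been exported as (seconds after 7:00 p.m., packets) pairs; the 50% threshold, the `find_drop_windows` and `clock` helpers and the sample data are illustrative assumptions, not TrueDEM’s method or the real incident data:

```python
def find_drop_windows(samples, expected=1500, threshold=0.5, interval_s=30):
    """Return (start, end) pairs, in seconds, where sent audio packets stay low.

    samples: (seconds_after_start, packets_sent) tuples sorted by time, one per
    reporting interval. A sample counts as "low" when it is below
    threshold * expected; a window ends at the first healthy sample.
    """
    windows, start = [], None
    for t, packets in samples:
        low = packets < expected * threshold
        if low and start is None:
            start = t                    # drop begins
        elif not low and start is not None:
            windows.append((start, t))   # first healthy sample closes it
            start = None
    if start is not None:                # still low at the last sample
        windows.append((start, samples[-1][0] + interval_s))
    return windows


def clock(seconds_after_7pm):
    """Format seconds after 7:00 p.m. as H:MM:SS for readability."""
    h, rest = divmod(seconds_after_7pm, 3600)
    m, s = divmod(rest, 60)
    return f"{7 + h}:{m:02d}:{s:02d}"


if __name__ == "__main__":
    # Hypothetical counts shaped like the incident, not the real data:
    # healthy until 7:51:30, then near-silence until about 8:03.
    samples = [(t, 1490) for t in range(2940, 3090, 30)]
    samples += [(t, 40) for t in range(3090, 3780, 30)]
    samples += [(t, 1490) for t in range(3780, 3900, 30)]
    for start, end in find_drop_windows(samples):
        print(f"drop {clock(start)} -> {clock(end)} ({(end - start) / 60:.1f} min)")
```

With this sample data it reports a single window, `drop 7:51:30 -> 8:03:00 (11.5 min)`. The threshold is a judgment call: a silent but healthy attendee may also send fewer packets, so this only makes sense for someone who is known to be speaking.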

The Network

Now that we know exactly when this happened, can we figure out what caused it?

Let’s first look at networking, as bad network connections often cause bad calls. None of the organizers reported any network problems, but as you know, Wi-Fi connections are notorious for being unstable, so it makes sense to start there.
Looking at the data, we can see that the outbound connection speed was stable over time and did not decrease at the moment the problems started.

What is noticeable is that around the time the audio issue occurred, the device received a large amount of incoming data (see first graph below).

This added inbound traffic seems to be mainly due to incoming video streams, as evidenced by the video packets that start coming in at that moment (see below).

About 2 Mbit/s of inbound video traffic is normal when receiving video streams, but it is interesting to see that the 2 Mbit/s lasted only 2-3 minutes and then fell to almost zero. This could indicate that the user who was sending the video simply turned off their camera, or that video streams were no longer passing through the system either.
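A quick way to reason about such a burst is to convert per-interval byte counts into megabits per second and measure how long the rate stays above a threshold. A minimal sketch, assuming inbound video byte counts per 30-second interval are available; the 1 Mbit/s threshold, the helper names and the sample counts are hypothetical:

```python
def to_mbps(bytes_in_interval: int, interval_s: int = 30) -> float:
    """Convert bytes received in one interval to an average rate in Mbit/s."""
    return bytes_in_interval * 8 / interval_s / 1_000_000


def burst_duration_s(byte_counts, threshold_mbps=1.0, interval_s=30):
    """Length in seconds of the longest run of intervals above threshold_mbps."""
    longest = current = 0
    for b in byte_counts:
        if to_mbps(b, interval_s) > threshold_mbps:
            current += interval_s
            longest = max(longest, current)
        else:
            current = 0
    return longest


if __name__ == "__main__":
    # Hypothetical per-interval byte counts: ~2 Mbit/s is 7.5 MB per 30 s.
    counts = [0, 7_500_000, 7_400_000, 7_600_000, 7_500_000, 7_500_000, 50_000, 0]
    print(f"{to_mbps(7_500_000):.1f} Mbit/s")    # 2.0 Mbit/s
    print(burst_duration_s(counts) / 60, "min")  # 2.5 min
```

A short burst that ends abruptly, like this one, is what you would expect when a sender turns off their camera; telling that apart from streams failing requires looking at the sender’s side, as the next step does.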

Looking at who was sending video, we quickly find that only one other user was: the attendee who had just been unmuted. Their data shows that they simply turned off their camera after two minutes. So the inbound video peak was as expected, and it was short because the other user stopped their camera.

So if it wasn’t the network, was it something happening on the device?

Closing in on the device

The next step is to take a closer look at what was happening on the main speaker’s device, the one where we saw the drop in packets being sent.

Looking at the TrueDEM data, we can see that this user was running only Teams and had closed all other programs. CPU & RAM utilization were well within what would be a normal range and showed nothing that would explain 12 minutes of bad audio. In fact, CPU and RAM didn’t show any noticeable peaks around the time the problems occurred.
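The “no noticeable peaks” judgment can be made repeatable with a simple baseline comparison. A minimal sketch, assuming utilization samples were exported for the minutes before and during the incident; the three-standard-deviation rule, the `has_peak` name and the sample values are illustrative choices, not how TrueDEM evaluates peaks:

```python
from statistics import mean, pstdev


def has_peak(baseline, incident, k=3.0):
    """True if any incident sample exceeds baseline mean + k standard deviations.

    baseline, incident: utilization percentages (e.g. CPU or RAM), one value
    per sampling interval before and during the problem window.
    """
    limit = mean(baseline) + k * pstdev(baseline)
    return any(value > limit for value in incident)


if __name__ == "__main__":
    # Hypothetical CPU % samples, not the customer's data.
    cpu_before = [22, 25, 24, 23, 26, 24]
    cpu_during = [25, 23, 26, 24]
    print(has_peak(cpu_before, cpu_during))          # False: no outlier
    print(has_peak(cpu_before, cpu_during + [85]))   # True: one clear spike
```

The same check applies unchanged to RAM, or to outbound connection speed when looking for a dip instead of a peak (by comparing against mean minus k standard deviations).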

Could peripheral input & output devices have played a role?

Next, we look at which capture and rendering devices were in use. The following table shows that the user was leading the call with a peripheral AV production system called AV Bridge MatrixMIX, a hardware tool that helps run large meetings like a Teams Town Hall by managing audio, video and other things.

We see that around the time the organizer of the Teams Town Hall meeting was trying to unmute the attendee, inbound video rendering starts. We already know this is because the unmuted attendee activated their camera, so this in itself is nothing unexpected.

What is suspicious, however, is that this is also the exact moment the audio problems begin.

The debugging

As anyone in this situation likely would, the speaker then tried several things to solve the problem, including, it seems, stopping the camera and switching audio devices at around 8:02 p.m. From that moment on, the system used the internal audio device (AMD High Definition Audio Device) instead of the AV Bridge audio. This finally seems to have solved the problem: audio packets started going out again and the audio problems subsided.

So, did switching away from the AV Bridge audio solve it, or was turning off the camera also a factor?

We see (in the table above) that several minutes later (~8:07 p.m.), the camera was reactivated using the AV Bridge system, without affecting the number of audio packets. It is therefore very likely that the main speaker turning off her video was not a factor in this case.
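The reasoning here, checking whether audio packet counts change around each configuration change, can be sketched as a before/after comparison. A minimal sketch, assuming device-change timestamps and per-interval packet counts are available in seconds after 7:00 p.m.; the `before_after` helper, the 90-second window and the sample data are hypothetical:

```python
def _average(values):
    """Mean of values, or NaN when no samples fall inside the window."""
    return sum(values) / len(values) if values else float("nan")


def before_after(samples, event_t, window_s=90):
    """Average packets sent in the window_s seconds before and after event_t.

    samples: (seconds_after_start, packets_sent) tuples, one per interval.
    """
    before = [p for t, p in samples if event_t - window_s <= t < event_t]
    after = [p for t, p in samples if event_t <= t < event_t + window_s]
    return _average(before), _average(after)


if __name__ == "__main__":
    # Hypothetical data: low until the audio switch at 8:02 (3720 s after
    # 7:00 p.m.), healthy afterwards; camera re-enabled at 8:07 (4020 s).
    samples = [(t, 40 if t < 3720 else 1490) for t in range(3600, 4200, 30)]
    events = [(3720, "audio switched to internal device"),
              (4020, "camera re-enabled via AV Bridge")]
    for event_t, label in events:
        before, after = before_after(samples, event_t)
        print(f"{label}: {before:.0f} -> {after:.0f} packets per interval")
```

A before/after window cannot separate two changes made at the same moment, such as stopping the camera and switching audio around 8:02 p.m. That is why the later camera reactivation, with no change in audio packets, is such a useful second data point.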

Bottom Line

The following can be deduced from the facts OfficeExpert TrueDEM brought to light:

The Teams Town Hall meeting worked fine until the moment the unmuted participant was brought in.

At that moment, the AV production system the main speaker used for video and audio seems to have started having problems sending audio (audio packets dropped from about 1,500 per 30-second interval to nearly 0 within a minute). Was it the unmuting of the participant, or the sudden incoming video data when that participant activated their camera? Or was it perhaps, unlikely as it seems, pure coincidence that it happened at that same moment? It is hard to say.

What is clear is that the problem originated with the main speaker (no outgoing audio packets), most likely in her external AV production system, since switching from the external AV Bridge audio to the internal device audio seemed to fix the issue.

Recommendations

The call had been rehearsed, and nothing the organizers did, including unmuting a participant, was wrong. However, knowing that the problem most likely originated in the AV production system and that incoming video likely triggered it, we can make some general recommendations, which should be verified before future use:

  • Test whether the situation described above can be reproduced with the specific AV production system. If so, ask the vendor to help debug the problem in the AV production system.
  • Use wired connections for any audio and/or video equipment you use, as volatile Wi-Fi or Bluetooth connections can wreak havoc on a Teams call, especially a Teams Town Hall meeting with hundreds of participants.
  • Make sure that any peripheral equipment you use is certified for the version of Microsoft Teams you use.
  • Make sure you use the latest drivers for all devices and equipment.

Are you curious to learn more about how User Experience Monitoring can help you address problems and issues with Microsoft Teams calls in your organization?