Considerations When Using Remote Simultaneous Interpreting During the Pandemic

November 14, 2020

When we talk about conference interpreting, the first thing that comes to mind is probably a group of interpreters working while watching the speaker through the glass of their booths in a conference room with a large audience. Today, however, I am going to discuss a different scenario: remote simultaneous interpreting (RSI), and specifically cloud-based RSI (CRSI), which relies on the Internet, Wi-Fi and cloud computing technologies.

Over the years, more and more cloud-based remote interpreting platforms have emerged in the market, as if interpreters would soon no longer need to travel for work. With Covid-19 having led to a global work-from-home experiment, will clients and interpreters be more willing to work with CRSI in the near future?

1. What is Remote/Distance Simultaneous Interpreting (RSI or DSI)?

As the name implies, interpreters do not have to be physically present at the conference, and they may even be working from another time zone. First off, the most basic thing about simultaneous interpreting (SI) is that it happens in real time. Unlike consecutive interpreting (CI), SI does not interrupt the speaker and saves a huge amount of time, giving listeners who speak various languages a fairly smooth output. This real-time delivery is what RSI has to be able to guarantee; if it cannot, it is simply not simultaneous interpreting anymore.

There are two main types of RSI, and the key difference is the equipment involved. The first is the typical setup: when a large international sports event requires RSI, for example, dozens of interpreters work from a central location while the games take place in various stadiums and venues. This kind of RSI, with interpreters still sitting inside booths with headsets on, allows centralized management of interpreters and a reduction in supporting staff. To bring transmission latency down to near real time, satellite communication or dedicated leased lines are deployed.

In other words, the audience would not experience any delay in the interpretation while watching the game. The downside? This solution can burn a big hole in the budget.

The other type of RSI is what is normally called cloud-based remote interpreting (CRSI). On a laptop, interpreters run a software interface, hosted on cloud-based servers, that mimics a traditional physical console (volume control, microphone on/off, relay button, etc.), together with a video feed and a text chat box (for talking to a technician who is probably on the other side of the world, and possibly to a partner interpreter who is also working from home). By connecting the laptop to the Internet through a network cable or Wi-Fi, interpreters are able to see and hear the conference from a remote location and interpret at the same time. The interpretation is sent through the cloud server to the audience, who listen either on traditional infrared or radio-frequency receivers or on a smartphone app (with Internet access).

This technology is no different from dialing into two video conferences (such as Skype, Webex, or Zoom) and adding a software interface that looks like an SI console (update: Zoom has recently added a simple console interface). Without any booths or equipment, CRSI is highly cost-efficient. However, does it allow interpreters to deliver the same quality of work? And what is the user experience like?

Figure: A Typical Remote Interpreting Scenario (RSI)
Figure: A Cloud-based Remote Interpreting Scenario (CRSI)

2. Latency Issue of Remote Interpreting Platforms

For any remote interpreting solution, the audience should be able to receive the interpretation instantly and synchronously; this is what makes such a solution feasible in the first place. A latency of more than one second would make no sense. One second may sound like nothing, but please bear in mind that interpreters, as humans, inevitably introduce latency of their own. When budget is not a concern, the practical way to achieve this with RSI is satellite or leased-line communication (the first type of RSI discussed above). How about CRSI, which is much more cost-efficient? How does it differ from the typical RSI?

1) Current Internet latency over Ethernet-based communication is more than good enough to deliver near real-time communication (in milliseconds), provided the right infrastructure and configuration are used. However, the magic of Ethernet communication mostly works only at the backbone network level, until the traffic hits its bottleneck: the termination point. After that, the transmission has to rely on terminal infrastructure (routers, switches and hubs).

On the other side, the audience receives the interpretation through the Wi-Fi provided at the conference, or through a 4G connection to the cloud server, which means latency could increase further in the last mile of transmission. WebRTC, arguably the best Internet multimedia transmission technology available, can theoretically reduce the latency to 200-300 ms, excluding the last-mile transmission, which depends on the conference Wi-Fi conditions and on any traffic congestion handling (QoS configuration) in place.

Achieving the lowest theoretical latency (200-300 ms) requires a number of considerations and some human intervention. For example, a cloud server needs to be allocated near the conference venue, and the interpreters working remotely cannot be physically too far from the conference venue (that is, the cloud server location): the further away they are, the higher the latency. The distance between remote interpreters and the conference location (assuming a server is nearby) therefore makes a bigger difference than people might expect.
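To get a rough feel for why distance matters, the propagation delay alone can be estimated from the length of the path. The sketch below is illustrative and not taken from any CRSI platform: it assumes signals travel through optical fiber at roughly 200,000 km/s (about two-thirds of the speed of light in a vacuum). The function names and the `route_factor` parameter (a hypothetical allowance for cable routes being longer than the straight-line distance) are assumptions. Encoding, buffering and last-mile delays are ignored and come on top.

```python
# Rough propagation-delay estimate (illustrative sketch, not a measurement).
# Assumption: light in optical fiber covers about 200 km per millisecond.
FIBER_KM_PER_MS = 200.0


def one_way_delay_ms(distance_km: float, route_factor: float = 1.0) -> float:
    """Propagation delay in ms over a fiber path of the given length.

    route_factor (hypothetical) scales the straight-line distance to account
    for cables that do not follow the shortest path.
    """
    if distance_km < 0 or route_factor < 1.0:
        raise ValueError("distance must be >= 0 and route_factor >= 1.0")
    return distance_km * route_factor / FIBER_KM_PER_MS


def round_trip_ms(distance_km: float, route_factor: float = 1.0) -> float:
    """Round-trip propagation delay: there and back over the same path."""
    return 2 * one_way_delay_ms(distance_km, route_factor)


if __name__ == "__main__":
    for km in (100, 1_000, 10_000):
        print(f"{km:>6} km: one way {one_way_delay_ms(km):.1f} ms, "
              f"round trip {round_trip_ms(km):.1f} ms")
```

Under these assumptions, an interpreter 10,000 km from the server adds about 50 ms in each direction before any processing takes place. Since the source audio travels out to the interpreter and the interpretation travels back, that is a noticeable share of a 200-300 ms target.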

The author has tested a few CRSI platforms now available in the market. Only a couple of them can achieve the theoretical latency most of the time under ideal conditions, and then only with an audience of a few dozen; most cannot meet the ISO standard, i.e. below 500 ms. It is important to note that none of the platforms can sustain the ideal latency all the time, as Internet conditions usually fluctuate throughout the day, which can cause a significant increase in latency.
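One simple way to summarize such a test is to record latency repeatedly over the day and report what share of the measurements stays under the target. The following is a minimal sketch, not the author's test procedure: the function name is an assumption, the 500 ms default mirrors the figure quoted above, and the sample values are made up purely for demonstration.

```python
# Share of latency measurements under a target threshold (illustrative).
# The 500 ms default mirrors the figure quoted in the text; the sample
# values used below are made up for demonstration.

def share_within(samples_ms, threshold_ms=500.0):
    """Return the fraction of samples strictly below threshold_ms."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ok = sum(1 for s in samples_ms if s < threshold_ms)
    return ok / len(samples_ms)


if __name__ == "__main__":
    hypothetical = [280, 310, 295, 650, 300, 1200, 290, 305]
    print(f"{share_within(hypothetical):.0%} of samples below 500 ms")
    print(f"worst sample: {max(hypothetical)} ms")
```

Reporting the worst sample alongside the share is deliberate: a platform that is fast most of the time can still be unusable if its occasional spikes exceed a second.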

2) Will 5G technology be able to improve the latency at some point in the future? As mentioned, the latency is caused by bottlenecks in the physical infrastructure at the terminal end rather than in the backbone network. 5G could indeed increase the bandwidth at the terminal points, but because any base station is currently shared by a number of users, the bandwidth per user is unstable and fluctuates over time. Besides bandwidth, what is more critical to transmission and voice/video quality is the speed and jitter of the network. Unfortunately, mobile wireless networks still lag well behind fixed networks, and even Wi-Fi, in this respect. On the interpreter side, a 5G connection brings no improvement and would most likely do the opposite, because the computer is more stable when connected through a fixed network, as most CRSI platforms require, and that fixed connection is itself part of the bottlenecks mentioned above.
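The difference between average speed and jitter can be made concrete with a small calculation. The sketch below uses a deliberately simple definition of jitter, the mean absolute difference between consecutive latency samples; this is an assumption chosen for clarity (RTP, as specified in RFC 3550, uses a smoothed running estimator instead), and the two sample series are invented.

```python
# Simple jitter estimate from a series of latency samples (illustrative).
# Jitter here is the mean absolute difference between consecutive samples,
# a simplification; RTP (RFC 3550) uses a smoothed running estimator.

def mean_jitter_ms(latencies_ms):
    """Mean absolute change between consecutive latency samples, in ms."""
    if len(latencies_ms) < 2:
        raise ValueError("need at least two samples")
    diffs = [abs(b - a) for a, b in zip(latencies_ms, latencies_ms[1:])]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    # Two made-up links with the same average latency (50 ms).
    steady = [50, 52, 48, 50]
    bursty = [20, 80, 30, 70]
    for name, series in (("steady", steady), ("bursty", bursty)):
        avg = sum(series) / len(series)
        print(f"{name}: average {avg:.0f} ms, jitter {mean_jitter_ms(series):.1f} ms")
```

Both series average 50 ms, yet the second one swings by tens of milliseconds between samples. It is this instability, rather than the average, that forces larger audio buffers and produces choppy sound.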

3) Video resolution and clarity are also critical for remote interpreters to perform effectively. However, increasing video clarity and resolution increases the latency accordingly. For conferences with a large number of online participants (as is happening during the Covid-19 pandemic), the demand for high-quality video transmission is tremendous, and most RSI platforms today are unable to meet it. This cannot be resolved simply by increasing bandwidth and spending; it needs a complete rework of the entire platform architecture. Live streaming into Mainland China is another major issue for most RSI platforms, as platforms based outside China have no servers allocated in China at all.

Given the technical constraints above, CRSI is more suitable for purely online meetings (such as webinars and webcasts), especially non-interactive ones where the audience only listens and does not speak (a chat box can be enabled), as listeners are unlikely to notice the rather significant delay in this scenario.

3. From the Client’s Perspective

Latency is also something clients need to consider when they decide to work with an RSI platform. If the audience is in the conference room, an interpretation that is badly out of sync with the speaker’s lips is most likely due to technical shortcomings rather than to the interpreters. If traditional SI receivers can be used for the last-mile transmission, the latency issue can be mitigated. A BYOD solution (audience members using their own smartphones as receivers) has a lower operational cost but comes with increased latency (see the explanation in Section 2 above).

Additionally, because the conference and the simultaneous interpretation are live-streamed on a CRSI platform, the content is recorded and stored on the cloud server by default. It is therefore extremely important to keep in mind data privacy, interpreters’ copyright, and cybersecurity issues, which, without proper management and legal arrangements, may lead to undesirable consequences. If a cyber attack hits any of the allocated cloud servers during the conference, the content may be exposed to even more serious leaks of confidential data.

4. Interpreter Usability Issues

The interpreter’s user experience is too important to ignore, because it is critical to delivering good interpretation in such a high-pressure working environment.

  1. A high-quality sound feed (more than 48bit, with a wide frequency response) is absolutely necessary, as interpreters first need to hear and recognize the source audio clearly to be able to do their work. However, normal phone lines and Internet-based teleconference systems such as Skype or Webex offer only 24bit, narrow-frequency-response sound (sometimes slightly better), which is sufficient for a conversation but not for interpreting, a cognitively demanding activity, because the missing high-pitched sound details create ambiguity. Very often, when interpreting simultaneously through CRSI, interpreters have to struggle with crackling sounds, indistinguishable voiced/voiceless sounds, audio packet loss (due to a poor Internet connection) and intermittent audio. As a result, interpreters feel much more exhausted after a CRSI job than after a normal SI job using traditional equipment.
  2. Video cues are equally important. On top of a live camera feed that gives the interpreters an idea of what is going on in the conference room, a synchronized screen (slides) feed is a must, keeping the interpreters up to date with what the speakers are talking about. On CRSI platforms, there is only a low-quality camera view of the stage (most of the time from a skewed angle), and no slides or screen view is shared at all (the software interface does not allow it either). Adding a screen/slides view might be possible, and higher-resolution video is also possible, but only with higher bandwidth, which in turn generates more latency. In addition, the camera feed, presentation slides, and operating console interface all share the same screen, exhausting interpreters with too many sources of visual input on top of the audio input and the computer operation required at the same time.
  3. As you all know, simultaneous interpreters juggle many tasks at the same time, including listening, deciphering/reasoning/composing, speaking, noting down numbers, and sometimes referring to scripts (sight interpretation with a live feed) and slides. It would be unreasonable to make interpreters split their attention further to operate software on the same or another computer (and, to make things worse, perhaps even to chat online with a technician for troubleshooting, or with a partner for assistance and handover).

An “anti-humane” computer interface design brings more trouble to the interpreters: even adjusting the volume demands more attention, which distracts them and lowers the quality and efficiency of their delivery. The design of a traditional physical interpreting console has essentially been perfected over decades by generations of SI equipment manufacturers and interpreters. The so-called “convenient” software interface, operated on a touch screen or with a mouse, may still have a long way to go before it meets people’s expectations.

5. If interpreters are required to work from home on a CRSI platform, the following considerations deserve careful attention:

1) Simultaneous interpreters usually work in turns and as a team, for example jotting down numbers and notes to help each other out. When they sit next to each other, this happens almost without anyone noticing. CRSI is quite another story, as switching involves informing your partner(s) through online chat, which depends on the Internet connection and its speed. If technical difficulties do occur, interpreters still need to type in a chat box to inform a remote technician about the situation, while the speaker is still talking and the audience is waiting for the interpretation on the other end.

“Booth” partners on a CRSI platform who did not know each other beforehand remain strangers even after working together a dozen times, with little communication (they are given neither the chance nor the time), let alone exchanging glossaries in advance or helping each other prepare and carry out the interpretation together. The handover mechanism is also still inadequate: a software button and a countdown display do not necessarily ensure a successful handover, and instead stress and distract the interpreters every time a handover happens.

2) Although working from home in your pajamas seems appealing (travel time, traffic jams and transportation costs are all saved, and you work in a comfortable and familiar environment), it is no easy feat to build a home SI studio without a certain amount of investment. The costs include (but are not limited to):

  1. Besides the computer you usually use, you need an extra computer to run the CRSI interface as your “console”, since you will need your regular computer to refer to the prepared slides and search for information while you work. Some CRSI platforms require interpreters to install remote desktop control software on their computers; if you are indeed considering working with such platforms, do invest in an extra computer for that, given the data privacy exposure on your personal (regularly used) computer. For compliance you also need a backup computer, in case the one you are using experiences technical difficulties during the meeting.
  2. Most CRSI platforms require interpreters to purchase a high-end USB microphone or all-in-one headset as a compliance measure; the 3.5mm headsets that interpreters usually use are not advised (as some CRSI platforms lack analog-to-digital audio processing capabilities).
  3. A room prepared as your “booth” (soundproofing is also a must if you do not wish to disturb your family).
  4. A fiber-optic fixed network connection to the home (with gigabit routers, switches, and network cards, as most household routers and network cards are 10M/100M-300M ones), plus strong, wide 5GHz Wi-Fi coverage.
  5. A backup UPS (uninterruptible power supply) in case of a power outage at home.

On top of all this, interpreters also need to be savvy with IT infrastructure such as networking and audio processing, and be prepared to troubleshoot the home studio at any time, wearing the hat of an “IT specialist”. Compared with a traditional SI equipment setup with booths, using a CRSI system does not save much; rather, the cost of building the home studio (CAPEX) and operating it (OPEX) is transferred entirely to the interpreters.

Besides the investment, interpreters in this scenario have to take on legal liabilities and risks that are partly equivalent to those of SI equipment providers. The effort required in exchange for the “flexibility and comfort” of working from home is most probably greater than expected.

Based on the aforementioned requirements, it is never advisable for an interpreter to connect to just any Wi-Fi network and start interpreting right away. Anyone brave enough to have tried it a few times has most likely experienced a sudden crash of certain functions, or even a complete stop of the meeting. With the risks and investment mounting, would interpreters be willing to carry out such a mission-critical task in a home studio? “Teleworking” is tempting, but interpreters need to ascertain whether the “convenience” is real.

If this becomes a practical reality in the future, the author believes that interpreters should seriously consider charging infrastructure rental costs to the CRSI platform operators (since the traditional SI equipment costs are being transferred to the interpreters), including but not limited to the rental of a quiet/soundproof working space, the use of computers, networking equipment, backup power supply, high-speed Internet service, utility fees and, don’t forget, a risk premium for infrastructure failure. Interpreters can certainly also ask the CRSI platforms or the clients to provide the working space, equipment, networking infrastructure, etc. After all, an interpreter’s job scope has never included providing the equipment or technology. Colleagues should stay alert at all times to avoid being played for a “sucker” by CRSI operators.

It is not impossible to improve the user experience further, but that would ultimately mean introducing external hardware into the solution, and more investment in hardware would defeat the whole purpose of cost reduction and be not much different from rebuilding the hardware console.

Some CRSI operators have taken interpreters’ usability, technical support and legal liabilities into consideration and have built “hubs” with language service providers, installing soundproof booths in their offices and providing large displays, computers, USB microphones, external peripherals, and even backup power supplies. Such costs and investments are in fact transferred to the language service partners. However, compared with a traditional onsite SI equipment setup, the hub model does not offer the end client much of a cost advantage. It may only be cost-effective if the end client is a subscription user with frequent webinar needs, or for very large events where interpreters’ travel costs can be cut.

3) From a legal perspective, there is no supervision of the CRSI market and its operators. Although AIIC is actively working on incorporating distance interpreting standards into the ISO standards (ISO/PAS 24019:2020), none of the CRSI platforms in the market is fully compliant yet, and there is no supervisory authority to ensure compliance. If an accident occurs in such a setup at an event, who would be liable for it? When traditional SI equipment is replaced by CRSI platforms, especially when a home studio is used, the liabilities are hard to define. Based on the technical analysis earlier in this article, CRSI platforms would run into problems more frequently and on a larger scale, and their risk of failure would be much higher, than traditional SI equipment.

4) For any interpreter, protecting one’s hearing is of the utmost importance and should never be underestimated. Many choose to use their own headsets or earpieces in an SI booth for productivity and comfort. Professional interpreter consoles have built-in acoustic shock protection, which moderates the volume whenever the input volume suddenly shoots up. In such a scenario, the SI equipment provider or the party engaging the interpreters is responsible for the interpreters’ workplace safety and health (WSH). When working from home on a CRSI platform, interpreters are the sole responsible party, yet they have no control over sudden variations in input volume and may suffer an acoustic shock. If an interpreter’s hearing is impaired because of technical problems with the sound input of a CRSI platform, who would be responsible?
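To illustrate what "moderating the volume" means in its crudest form, here is a minimal hard-limiter sketch. It is not how any particular console or CRSI platform works: real acoustic shock protection typically reacts more smoothly than a plain clamp. The function name, the sample format (floats between -1.0 and 1.0) and the `ceiling` parameter are assumptions made for illustration.

```python
# Minimal hard limiter sketch (illustrative; not how any specific console works).
# Samples are floats in the range -1.0..1.0; `ceiling` is a hypothetical
# maximum absolute level allowed through to the headset.

def limit_samples(samples, ceiling=0.5):
    """Clamp every sample into the range [-ceiling, +ceiling]."""
    if not 0.0 < ceiling <= 1.0:
        raise ValueError("ceiling must be in (0, 1]")
    return [max(-ceiling, min(ceiling, s)) for s in samples]


if __name__ == "__main__":
    # A quiet passage followed by a sudden loud burst (made-up values).
    incoming = [0.1, -0.2, 0.15, 0.95, -1.0, 0.9, 0.1]
    print(limit_samples(incoming))
```

Quiet samples pass through unchanged while the burst is held at the ceiling. The point of the sketch is only that such protection has to sit somewhere in the audio chain; on a home setup, nothing guarantees that it does.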

6. Applicable Scenarios for CRSI

We have talked about constraints and limitations, so in which scenarios can interpreters work with CRSI while minimizing the aforementioned usability issues?

  1. When conferences are held online (teleconferences, webinars, webcasts, remote attendance) and the audience is not physically present, CRSI is a good solution for reducing interpreters’ travel costs. This type of online conference is usually short and sweet, so a simultaneous interpreter may consider doing it solo and avoid the trouble of switching with a partner.
  2. When cost saving is the top priority (and the client/audience is willing to accept delayed interpretation), for example an office lunchtime training session held every day over a period of two weeks, CRSI is probably the best solution.

7. If fellow interpreters are going to take up jobs through CRSI in the two scenarios mentioned above, here is what I believe they need to consider:

Business Models of the Current CRSI Platforms and Conflicts of Interest

As SaaS (Software as a Service) platforms, some CRSI operators may claim they are all about technology, but they have in fact set up partner interpreting companies to source interpreters directly. This creates a direct conflict of interest with, and has a negative impact on, the partnering language service providers (senior interpreters, translation companies and SI equipment providers), because CRSI platforms are no doubt appealing to clients with their significant cost savings. With CRSI platforms being promoted more and more, is technological advancement the real driver, or is it the lower cost achieved by sacrificing the interests of interpreters and LSPs? This is the question we need to ask ourselves as interpreters for the sake of our industry’s future development.

With all this in mind, the perfect situation, in which the audience can listen to simultaneous interpretation wherever they want while interpreters deliver efficiently and successfully, is still far from reality. A lot remains to be done to break through the technology bottlenecks of CRSI. However, if and when that is achieved, the world would become flat and remote interpreting would in effect be a “teleporting” technology. That may not be best for everyone: market rates would inevitably be brought down to the same level across the world, and for markets that currently enjoy higher rates, such as mainland China and Japan, it could be a total game changer. Offshore markets (say, the Mandarin interpreting market outside mainland China) are usually not big, and they would continue to shrink, as jobs would be assigned to the location with the larger pool of interpreters: the domestic market.

Having said so much: with the global Covid-19 pandemic going on, remote online conferences are unavoidable. Is there any way to continue our SI work through a third-party technical platform without worrying about losing clients or falling into the traps of translation agencies hiding behind the existing CRSI platforms? Please contact me if you wish to know the details of the solution.

Biography: Dr. Bernard Song is a consultant conference interpreter (English-Mandarin Chinese) based in Singapore, born and raised in Mainland China. With more than 18 years of interpreting experience globally, Dr. Song also organizes multi-language teams of top interpreters for large events and conferences. He has also previously served as an interpreter trainer for universities in Singapore.

With a Ph.D. in Computer Engineering, he has more than 12 years of research and development experience as a scientist and engineer in Artificial Intelligence, Virtual Reality and Human Language Processing. With this IT background, Dr. Song has hands-on experience and in-depth knowledge of SI equipment, transmitters/receivers, CCUs and radiators. He is also an avid fan of using the latest technologies to transform the conference interpreting equipment landscape.

Currently, he is offering a pro bono consultancy service to clients as a Remote/Distance Simultaneous Interpreting (RSI/DSI) technology expert during the Covid-19 pandemic.

Contact us

If you would like me to provide interpretation/translation services between English and Chinese (Mandarin), or a quotation for conference interpreting services in any language, simply contact me by phone or email.

Dr. Bernard Song