Thoughts About Remote Simultaneous Interpreting
When we talk about conference interpreting, the first thing that comes to mind is probably a group of interpreters working in booths, watching the speaker through the glass in a conference room with a large audience. Today, however, I am going to discuss a different scenario: remote simultaneous interpreting (RSI), which relies on the internet, Wi-Fi and cloud computing technologies.
Over the years, more and more cloud-based remote interpreting platforms have emerged in the market. These platforms allow interpreters to provide interpretation from their own location instead of having to travel from destination to destination.
But what exactly is RSI and how will it impact interpretation? Read on to discover more.
What is RSI?
There are two main types of RSI, and the key difference is the type of equipment involved. The first is typical of any large international sports event, where dozens of interpreters are placed in one location while the games take place in various stadiums and venues. This kind of RSI, with interpreters still sitting inside booths with headsets on, allows centralized management of interpreters and a reduction in supporting staff. To bring transmission latency down to near real time, satellite communication or dedicated leased lines are deployed.
In other words, the audience would not experience any delay in the interpretation while watching the game.
On the other hand, this solution could burn a big hole in the budget.
The other type of RSI is what is normally called cloud-based remote interpreting (CRSI). Interpreters work on a laptop running a software interface that mimics a traditional physical console (volume control, microphone on/off, relay button, etc.), together with a video feed and a text chat box, all served through cloud-based servers. By connecting the laptop to the Internet through a network cable, interpreters are able to see and hear the conference from a remote location and interpret at the same time. The interpretation is sent through the cloud server to the audience, either via traditional infrared or radio frequency receivers, or via a smartphone app (with Internet access). Without any booths or equipment, CRSI is highly cost efficient. However, does it allow interpreters to deliver the same quality of work? And what is the user experience like?
A Typical Remote Interpreting Scenario (RSI)
A Cloud-based Remote Interpreting Scenario (CRSI)
Latency Issue of Remote Interpreting Platforms
For any remote interpreting solution, the audience should be able to receive the interpretation instantly and synchronously; this is what makes such a solution feasible in the first place. It would make no sense if the transmission latency were more than one second. That may sound like nothing, but please bear in mind that interpreters, as humans, inevitably introduce latency of their own. Therefore, to ensure the best audience experience, transmission latency should be reduced to close to zero. With RSI, the practical way to achieve that is satellite or leased-line communication (the first type of RSI discussed above), when budget is not a concern. How about CRSI, which is much more cost efficient? How does it compare with typical RSI?
- Today's fiber-optic internet backbone is more than fast enough to deliver near real-time communication (latency in milliseconds). However, the magic of fiber-optic transmission only works until the traffic hits the termination point; after that, the transmission has to rely on local infrastructure (routers, switches and hubs). Therefore, if the audio/video is transmitted from the source (venue) to the interpreter (in a remote location) via a cloud server on the Internet, then back to the cloud server, then to the venue, and then broadcast, the whole round trip would in practice cause 5-10 seconds of delay. The author has done a remote simultaneous interpreting job in location A for Cisco, for an event in location B, while the cloud server was at location C and the audience was spread all over the world. Based on Cisco’s real-time statistics, the average latency between the source voice in venue B and the interpretation arriving back at venue B (excluding interpreter-introduced latency) was 8-10 seconds.
- Would 5G technology be able to improve the latency at some point in the future? As mentioned, the latency is caused by bottlenecks in the local physical infrastructure rather than in the backbone network. 5G could indeed increase the bandwidth at the terminal points, but as every base station is currently shared by a number of users, the bandwidth per user is unstable and fluctuates over time.
- Would it help to remove all the bottlenecks in the infrastructure? Yes, of course, but that would be equivalent to deploying a leased-line solution, which comes at a much higher cost.
- Video resolution and clarity are also critical for remote interpreters to perform effectively. However, increasing video clarity and resolution would increase the latency accordingly.
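The round trip described in the points above can be sketched as a simple latency budget. This is only an illustrative sketch: the one-second limit and the 8-10 second measured range come from the discussion above, while the `Leg` type, the function names and the per-leg figures of 2 seconds each are hypothetical placeholders chosen so that the total falls inside the measured range. They are not measurements.

```python
from dataclasses import dataclass

# Limit taken from the text: more than one second of transmission
# latency makes a remote interpreting solution impractical.
MAX_ACCEPTABLE_LATENCY_S = 1.0


@dataclass
class Leg:
    """One hop of the audio/video round trip (hypothetical helper type)."""
    name: str
    latency_s: float  # delay contributed by this hop, in seconds


def total_latency(legs):
    """Sum the delay of every hop, excluding the interpreter's own lag."""
    return sum(leg.latency_s for leg in legs)


def is_acceptable(legs, limit_s=MAX_ACCEPTABLE_LATENCY_S):
    """True if the whole round trip stays within the latency limit."""
    return total_latency(legs) <= limit_s


# ASSUMPTION: per-leg values are placeholders, not measured data.
crsi_path = [
    Leg("venue -> cloud server", 2.0),
    Leg("cloud server -> interpreter", 2.0),
    Leg("interpreter -> cloud server", 2.0),
    Leg("cloud server -> venue broadcast", 2.0),
]

if __name__ == "__main__":
    print(f"Total transmission latency: {total_latency(crsi_path):.1f} s")
    print(f"Within the {MAX_ACCEPTABLE_LATENCY_S:.0f} s limit: {is_acceptable(crsi_path)}")
```

Whatever the real per-leg numbers are, the point of the budget is that every hop adds to the total, so a cloud round trip with four hops would need each hop to stay well under a quarter of a second to meet the one-second limit.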
Given the technical constraints above, CRSI is more suitable for situations where the audience is not physically at the conference, as they will not notice the rather significant delay.
Interpreter Usability Issues
The interpreter’s user experience should not be ignored, because it is critical to delivering good interpretation in such a high-pressure working environment.
- A high-quality sound feed (more than 48K) is paramount, as interpreters first need to hear and recognize the source audio clearly to be able to do their work. However, normal phone lines and internet-based teleconference systems, such as Skype or WebEx, offer only 24K sound (sometimes slightly better), which is enough for a conversation but not for interpreting, a cognitively demanding activity. Very often, interpreters working through CRSI have to struggle with crackling sounds, audio packet loss (due to a poor Internet connection) and intermittent audio while interpreting simultaneously.
- Video cues are equally important. On top of a live camera feed, which gives the interpreters an idea of what is going on in the conference room, a synchronized screen (slides) feed is a must to keep the interpreters up to date with what the speakers are talking about. On CRSI platforms, there is only a low-quality camera view of the stage (most of the time from a skewed angle), and no slides or screen view is shared at all (the software interface does not allow for it either). Adding the screen/slides view might be possible, and higher-resolution video is also possible, but only with high bandwidth and at the cost of greater latency.
- As you all know, simultaneous interpreters juggle many tasks at the same time, including listening, deciphering/reasoning/composing, speaking, taking notes of numbers, and sometimes referring to scripts (sight interpretation with live feed) and slides. It would be unreasonable to have the interpreters split their attention further to operate software on the same computer (and maybe even chat online with a technician for troubleshooting).
- Simultaneous interpreters usually work in turns and as a team, for example jotting down numbers and notes to help each other out. When they sit next to each other, this can be done almost without noticing. CRSI is quite another story, as switching involves informing your partner(s) through online chat, which relies on the internet connection and its speed. If and when technical difficulties do occur, interpreters still need to type in a chat box to inform a remote technician about the situation, while the speaker is still talking and the audience is waiting for the interpretation on the other end.
That said, it is not impossible to mitigate some of these usability issues, but the improved solution would end up much closer to the traditional interpreting console used in a traditional onsite setting. “Disruption” for the sake of disruption does not necessarily produce something better, and may produce something worse.
Applicable Scenarios for CRSI
We have talked about constraints and limitations, so what are the scenarios where interpreters can work with CRSI while minimizing the aforementioned usability issues?
- When conferences are held online (teleconferences, webinars, webcasts, remote attendance) and the audience is not physically present, CRSI would be a good solution to reduce the cost of interpreters' travel. Usually this type of online conference is short and sweet (within a reasonably short period), so one simultaneous interpreter may consider doing it solo and avoid the trouble of switching with a partner.
- When cost saving is the top priority (and the client/audience is willing to accept delayed interpretation), for example an office lunch training held every day over a period of two weeks, CRSI is probably the best solution.
If fellow interpreters are going to take up CRSI jobs in the two scenarios mentioned above, here is what I believe they need to consider:
- Cloud-based interpreting platforms claim that they save the end client the cost of hardware rental by providing a SaaS solution. Where did the cost of hardware go? It did not just disappear; it was transferred to each individual interpreter, in the form of a quiet working space (vs. booth), our own computers (vs. consoles), USB headsets and microphones (vs. analog 3.5mm headset/earpiece), electricity and network usage. These should be provided by the end client. Either we can request to work in the client’s office with all of these provided, or we can charge a fee for the infrastructure cost.
- It is important to manage your client’s expectations from the beginning because, as discussed, you are likely to underperform given the usability issues.
With all this in mind, the perfect situation, where the audience can listen to simultaneous interpretation wherever they want while interpreters deliver efficiently and successfully, is still far from reality. There is still a lot to be done to break through the technology bottlenecks of CRSI. However, if and when that is achieved, the world would become flat and remote interpreting would in effect be a “teleporting” technology. That may not be the best outcome for everyone, as market rates would inevitably be brought down to the same level across the world, and for markets that currently enjoy higher rates, such as mainland China and Japan, it could be a total game changer.
Biography: Dr. Bernard Song is a consultant conference interpreter (English-Mandarin Chinese) based in Singapore, born and raised in mainland China. With more than 18 years of interpreting experience globally, Dr. Song also organizes multi-language teams of top interpreters for large events and conferences. He has also previously worked as an interpreter trainer for universities in Singapore.
With a Ph.D. in Computer Engineering, he has more than 12 years of research and development experience as a scientist and engineer in Artificial Intelligence, Virtual Reality and Human Language Processing. With this IT background, Dr. Song has hands-on experience and in-depth knowledge of SI equipment, transmitters/receivers, CCUs and radiators. He is also an avid fan of utilizing the latest technologies to transform the conference interpreting equipment landscape.