Why WebRTC Fails Teleoperation, and Why Adamo Wins

Ken Dixson
8 minute read

WebRTC was built for video calls, not teleoperation. The gap shows up the moment a robot moves.

TL;DR: WebRTC carries everything that is not audio or video down a single data channel, so control commands end up sharing one pipe with battery telemetry, logging, heartbeat, and location. Its drop order protects audio first and control last, its congestion control assumes a peer-to-peer call and falls apart over distance, and its negotiation stalls for seconds whenever you switch between robots. Worst of all, when a firewall blocks UDP it falls back to TCP, where head-of-line blocking can freeze a moving machine at exactly the wrong moment. Adamo takes a different approach end to end. The Adamo protocol, built specifically for robotics, uses a pub/sub scheme over a client-server network topology, prioritises video and control data ahead of audio, and uses forward error correction to ride through packet loss without waiting on retransmission. WebRTC speaks the language of conference calls. Adamo speaks the language of robotics.

If you operate robots remotely, you have probably reached for WebRTC at some point. The trouble begins when you ask it to do the one job it was never designed for, which is closing a control loop on a machine that can move, lift, and crash. Almost every failure that follows traces back to the same root: WebRTC assumes it is carrying a conversation between two people, and a robot is nothing of the sort. Watch how that single wrong assumption ripples outward.

It begins with a single pipe

WebRTC gives you audio and video as first-class citizens, and for everything else it gives you a single data channel. That sounds reasonable until you map a robot onto it. Your control commands go into the data channel, and so do your robot messaging, your battery telemetry, your logging, and your heartbeat, all sharing the same lane. To tell them apart on the other end, you have to stamp every message with a type tag, which is the first sign that you are working against the transport rather than with it.

The bigger problem is volume. A single high-frequency source can saturate that one channel and starve everything behind it, and geolocation is a perfect example because GeoJSON is verbose enough to flood the pipe on its own. When that happens, your control commands queue up behind a location blob, so the most important message you send waits on one of the least important. A robot does not have a single stream of importance; it has many, and forcing them all through one channel guarantees the wrong one wins under load.
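The queueing effect is easy to see in a toy model. The sketch below is not WebRTC code: it assumes a strict first-in, first-out channel and a 1 Mbit/s link, and the `tag` and `drain_times` helpers and the message shapes are made up for illustration.

```python
import json
from collections import deque

LINK_BYTES_PER_SEC = 125_000  # assumed 1 Mbit/s link, illustrative only


def tag(kind, payload):
    """Wrap a payload in a type tag, as a single shared channel forces you to."""
    return json.dumps({"type": kind, "data": payload}).encode()


def drain_times(queue, bytes_per_sec=LINK_BYTES_PER_SEC):
    """Return (type, seconds until fully sent) for each message in FIFO order."""
    elapsed = 0.0
    out = []
    for msg in queue:
        elapsed += len(msg) / bytes_per_sec
        out.append((json.loads(msg)["type"], elapsed))
    return out


# A verbose GeoJSON track is enqueued just before a stop command.
track = {
    "type": "Feature",
    "geometry": {
        "type": "LineString",
        "coordinates": [[-0.1 + i * 1e-5, 51.5 + i * 1e-5] for i in range(5000)],
    },
    "properties": {"robot": "demo"},
}
channel = deque([tag("location", track), tag("control", {"cmd": "stop"})])

for kind, t in drain_times(channel):
    print(f"{kind:8s} sent after {t * 1000:.1f} ms")
```

The stop command is a few dozen bytes, yet in this model it cannot leave until the whole location blob ahead of it has drained.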

And that pipe drops the wrong things first

Crowding aside, the real test of any pipe is what happens when it cannot carry everything, and here WebRTC makes a choice that suits a phone call and ruins a robot. When bandwidth runs short it protects traffic in a fixed order: audio comes first, then video, and control data is the last thing it tries to protect. For two people talking, that ordering is correct, since dropped audio is the real failure and a frozen video tile is merely annoying. For a machine it is exactly backwards, because a stale picture is survivable and a lost command is not.

Adamo orders things the way teleoperation actually needs. We prioritise video and control, and we carry no audio data at all, so when bandwidth gets scarce the picture degrades gracefully while the robot keeps responding. The operator holds control right up to the edge of the connection rather than losing it first.
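A robot-first ordering can be sketched as a priority table applied whenever a send budget is exceeded. This is an illustration of the idea only, not Adamo's implementation: the `PRIORITY` table, the choice to rank control above video, the `shed` function, and the packet sizes are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical priority table: the lower the number, the earlier it is protected.
PRIORITY = {"control": 0, "video": 1, "telemetry": 2, "logging": 3}


@dataclass
class Packet:
    topic: str
    size: int  # bytes


def shed(packets, budget_bytes):
    """Keep packets in priority order until the budget is spent; drop the rest."""
    kept, dropped = [], []
    remaining = budget_bytes
    # sorted() is stable, so packets within a topic keep their original order.
    for pkt in sorted(packets, key=lambda p: PRIORITY[p.topic]):
        if pkt.size <= remaining:
            kept.append(pkt)
            remaining -= pkt.size
        else:
            dropped.append(pkt)
    return kept, dropped


tick = [
    Packet("logging", 400),
    Packet("video", 1200),
    Packet("telemetry", 300),
    Packet("video", 1200),
    Packet("control", 64),
]
kept, dropped = shed(tick, budget_bytes=1500)
print("kept:   ", [p.topic for p in kept])
print("dropped:", [p.topic for p in dropped])
```

With a 1500-byte budget the control packet and one video packet survive, while the second video packet, telemetry, and logging are shed.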

The congestion control underneath assumes the wrong network

That drop order does not act on its own. It is steered by WebRTC's congestion control, and every algorithm inside it assumes a peer-to-peer connection, with two endpoints talking more or less directly and backing off politely the moment they sense loss. That assumption rarely holds in teleoperation, where the operator and the robot can sit on different continents, behind different carriers, separated by a long and variable path. Peer-to-peer logic reads that distance as a problem and throttles your throughput at the precise moment you need it most.

We route Adamo through a client-server network instead, built deliberately for the long haul rather than a direct browser-to-browser link. That lets our proprietary congestion handling react precisely to each link in the chain, rather than to crude end-to-end network measurements.

Before any of that, just connecting costs you seconds

The network problems above only matter once you have a connection at all, and getting one is its own delay. Before WebRTC can carry a single frame it has to negotiate the link, and that ICE handshake takes time. Switching from one robot to the next can cost you somewhere around five seconds of dead air, which feels like an eternity in an operations centre. An operator overseeing a fleet should be able to glance from one machine to another and take control immediately, yet instead they sit watching a spinner while the next robot idles. Spread that delay across a full shift and a full fleet and the lost time adds up quickly.
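To put a number on that, here is a back-of-the-envelope calculation. The five-second switch delay is the figure quoted above; the switch rate, shift length, and operator count are invented inputs, so substitute your own.

```python
def dead_air_minutes(switch_delay_s, switches_per_hour, shift_hours, operators):
    """Total time spent waiting on renegotiation across one shift, in minutes."""
    total_s = switch_delay_s * switches_per_hour * shift_hours * operators
    return total_s / 60


# ~5 s per switch comes from the text above; the other figures are made up.
minutes = dead_air_minutes(
    switch_delay_s=5, switches_per_hour=30, shift_hours=8, operators=10
)
print(f"{minutes:.0f} minutes of dead air per shift")
```

With these hypothetical inputs the waiting adds up to 200 minutes per shift across the team.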

Adamo keeps a persistent connection through its client-server fabric, so moving between robots does not mean renegotiating from scratch every time you switch focus.

And when a firewall intervenes, the fallback can crash a robot

Every issue so far assumes the connection is at least running over UDP. When a firewall blocks UDP, WebRTC quietly falls back to TCP, and that fallback sounds like a safety net while behaving like a trap. TCP guarantees ordered delivery, and it enforces that order with head-of-line blocking, which means a single missing packet holds up every packet behind it until the lost one is resent. For a file transfer that is invisible, but for a robot in motion it is dangerous.

Picture a forklift driving into a Faraday cage. The signal drops, packets go missing, and TCP holds the entire queue while it waits to deliver everything in order. The operator's stop command is sitting in that queue, blocked behind a packet being retransmitted, and by the time it finally arrives the moment has already passed and the forklift has done the thing the operator was trying to prevent. In real time control, late is indistinguishable from wrong, and late is how machines crash. Adamo refuses that bargain entirely, because we built the transport to keep moving when packets go missing rather than freezing the line to wait for them.
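The forklift scenario can be modelled in a few lines. The sketch below simulates in-order delivery in general rather than any real TCP stack, and the arrival times and sequence numbers are illustrative assumptions.

```python
def ordered_delivery(arrivals):
    """Delivery times under in-order delivery: each packet waits for every
    earlier sequence number, which is what head-of-line blocking means."""
    delivered, latest = {}, 0.0
    for seq in sorted(arrivals):
        latest = max(latest, arrivals[seq])
        delivered[seq] = latest
    return delivered


# Arrival times in ms, keyed by sequence number. Packet 2 was lost and its
# retransmission lands at 320 ms; packet 4 carries the stop command.
# All figures are illustrative.
arrivals = {1: 10.0, 2: 320.0, 3: 30.0, 4: 40.0}
STOP_SEQ = 4

in_order = ordered_delivery(arrivals)
print(f"stop reaches the receiver at:   {arrivals[STOP_SEQ]:.0f} ms")
print(f"stop delivered when in-order:   {in_order[STOP_SEQ]:.0f} ms")
print(f"stop delivered when unordered:  {arrivals[STOP_SEQ]:.0f} ms")
```

In this model the stop command is sitting at the receiver at 40 ms, but an in-order transport does not hand it to the application until the retransmitted packet lands at 320 ms.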

Why Adamo holds where WebRTC slips

Run those failures back to back and a pattern emerges. None of them are bugs, and none can be tuned away, because each one is WebRTC doing exactly what a video call needs. The fix is not a faster codec or a closer server; it is a transport that starts from the robot instead of the call, and that is what we built.

Adamo runs on a pub/sub model of publishers and subscribers, topics and messages, which is the native shape of robot software. Your battery is a topic, your control is a topic, and your video is a topic, and each one flows on its own terms and carries its own priority instead of being crammed into a single shared channel. WebRTC asks you to bend your robot data to fit a conferencing pipe, whereas Adamo already thinks in the structure your robots use, which is what we mean when we say we speak the language of robotics.
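For readers who have not worked with the pattern, here is a toy in-process version of topics with per-topic priority. It is not the Adamo SDK: the `Broker` class, its methods, the topic names, and the priority values are all hypothetical and exist only to show the shape of the model.

```python
import heapq
from collections import defaultdict
from itertools import count


class Broker:
    """Toy in-process broker: per-topic subscribers and per-topic priority."""

    def __init__(self):
        self._subs = defaultdict(list)
        self._prio = {}
        self._queue = []
        self._order = count()  # tie-breaker keeps FIFO order within a priority

    def topic(self, name, priority):
        self._prio[name] = priority

    def subscribe(self, name, callback):
        self._subs[name].append(callback)

    def publish(self, name, message):
        entry = (self._prio[name], next(self._order), name, message)
        heapq.heappush(self._queue, entry)

    def flush(self):
        """Deliver queued messages, lowest priority number first."""
        while self._queue:
            _, _, name, message = heapq.heappop(self._queue)
            for callback in self._subs[name]:
                callback(message)


broker = Broker()
broker.topic("robot/control", priority=0)
broker.topic("robot/video", priority=1)
broker.topic("robot/battery", priority=2)

received = []
for name in ("robot/control", "robot/video", "robot/battery"):
    broker.subscribe(name, lambda msg, n=name: received.append((n, msg)))

# Published in the "wrong" order on purpose.
broker.publish("robot/battery", {"percent": 81})
broker.publish("robot/video", b"<frame>")
broker.publish("robot/control", {"cmd": "stop"})
broker.flush()
print([name for name, _ in received])
```

Because each topic carries its own priority, the control message is delivered first even though it was published last, and no type tag is needed to tell the streams apart.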

That same robotics-first thinking changes how we handle loss. Packet loss is a fact of remote operation, so the only real question is how you respond to it. WebRTC waits and asks for the missing packets again, which means a round trip across the world before the stream can recover. Adamo uses adaptive forward error correction, sending enough redundancy that the receiver reconstructs lost packets on its own without ever asking for them back. There is no retransmission stalling the queue and no recovery round trip, so the stream stays smooth and the control loop stays closed even while the network is dropping packets underneath it.
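The principle behind forward error correction can be shown with the simplest possible code, a single XOR parity packet per group. This is a stand-in chosen for clarity, not a description of Adamo's adaptive scheme, which the article does not detail. The function names and packet contents are illustrative, and packets are assumed to be of equal length.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(packets):
    """Append one XOR parity packet to a group of equal-length packets."""
    return list(packets) + [reduce(xor_bytes, packets)]


def recover(received):
    """Rebuild a single missing packet (marked None) from the survivors."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("single-parity FEC can repair only one loss per group")
    if missing:
        survivors = [p for p in received if p is not None]
        received = list(received)
        received[missing[0]] = reduce(xor_bytes, survivors)
    return received[:-1]  # strip the parity packet


group = [b"cmd:fwd ", b"cmd:left", b"cmd:stop"]
sent = encode(group)
sent[2] = None  # the stop command is lost in transit
restored = recover(sent)
print(restored)
```

The receiver rebuilds the lost packet from what did arrive, with no request sent back to the sender. The cost is one extra packet per group, and a single parity packet can repair only one loss per group.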

The bottom line

WebRTC is a brilliant answer to a different question, and for browser video calls with zero configuration and smooth audio you should absolutely use it. Teleoperation asks something else entirely, because it puts a human in a loop with a machine that can hurt someone if it lags. The single shared data channel, the inverted drop order, the peer-to-peer congestion control, the slow negotiation, and the TCP fallback that freezes a moving robot all flow from the same source. They are the design working as intended, just for a problem that is not yours.

Adamo is that design done right for robots: a pub/sub model that speaks robotics, client-server routing built for distance, video and control prioritised ahead of audio, and forward error correction that carries the stream through loss. See it for yourself at adamohq.com.

FAQs

Can Adamo replace WebRTC platforms like LiveKit or Transitive?

Yes. LiveKit and Transitive both carry WebRTC as their underlying media transport, so they inherit its drop order, congestion control, and TCP fallback regardless of how the rest of the stack is built. Adamo does not run on WebRTC at all, which is why it stays fastest as conditions worsen rather than only on a clean single region path. Teams typically integrate Adamo through a Python, Rust, or C SDK and operate through a hosted client with gamepad, VR teleop, recording, and replay built in.


Does Adamo work over long distances?

Yes. Adamo routes through a client-server network built for the long haul rather than a direct browser-to-browser connection, which is where WebRTC's peer-to-peer congestion control tends to break down. Combined with forward error correction that recovers lost packets without a recovery round trip, this lets Adamo hold low latency across long, variable paths instead of throttling throughput the moment distance introduces loss.

What latency does teleoperation need?

Teleoperation generally needs glass-to-glass latency in the tens of milliseconds. Above roughly 100 ms operators begin to overcorrect and trust in the system degrades, and above 200 ms precise manipulation becomes unreliable. Tasks like a robot arm placing a part, a humanoid balancing, or a vehicle navigating a tight space need delay kept low and, just as importantly, kept stable when the network gets rough rather than only when it is idle.

