Tesla, Waymo, And Autonomous Driving Via Imitation Learning

About: Tesla, Inc. (TSLA)
by: Trent Eady

Imitation learning may hold the key to autonomous driving.

Large-scale training data holds the key to imitation learning.

Tesla has large-scale training data.

Waymo does not.

Autonomous driving - if it can be done - is one of the biggest business opportunities of our time. In the U.S., vehicles drive 3.22 trillion miles per year. If all of those miles became autonomous, and autonomous transport service providers earned a 10 cent profit on each one, the U.S. market would generate $322 billion in annual profit. Based on this sort of logic, valuations for Waymo (GOOG, GOOGL) - formerly the Google self-driving car project - reach as high as $250 billion.
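The back-of-the-envelope math here is simple to check. A quick sketch using the figures from the paragraph (the 10-cents-per-mile margin is an assumption for illustration, not a known industry number):

```python
# Back-of-the-envelope U.S. market sizing for autonomous driving.
# ~3.22 trillion vehicle miles per year in the U.S., with an assumed
# profit of $0.10 per autonomous mile.
us_miles_per_year = 3.22e12
profit_per_mile = 0.10  # assumed margin, in dollars

annual_profit = us_miles_per_year * profit_per_mile
print(f"${annual_profit / 1e9:.0f} billion per year")  # → $322 billion per year
```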

To me, one of the most fascinating pieces of information to come out about autonomous driving lately is that Waymo is using imitation learning. Imitation learning is a machine learning technique in which a neural network learns to map certain kinds of actions to certain kinds of environment states based on observing what humans do. By training on many examples of human action, the neural network learns "If you see this, do that." Such as, "If you see a stop sign in front of you, stop." Or, "If you see a parked car blocking your way, nudge around it like so."
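In machine-learning terms, the simplest form of imitation learning (behavioral cloning) is just supervised learning on (state, action) examples recorded from humans. Here is a minimal sketch of that idea using a toy nearest-neighbor "policy" instead of a real neural network, with made-up feature encodings:

```python
# Toy behavioral cloning: learn "if you see this, do that" from human examples.
# States are tiny feature vectors (a hypothetical encoding); actions are strings.
# A real system would train a neural network; 1-nearest-neighbor is the simplest
# stand-in that shows a state -> action mapping being learned from demonstrations.

human_demonstrations = [
    # (state: [stop_sign_ahead, parked_car_blocking, light_is_red], action)
    ([1, 0, 0], "stop"),
    ([0, 1, 0], "nudge_around"),
    ([0, 0, 1], "stop"),
    ([0, 0, 0], "continue"),
]

def policy(state):
    """Pick the action a human took in the most similar recorded state."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, action = min(human_demonstrations, key=lambda pair: distance(pair[0], state))
    return action

print(policy([1, 0, 0]))  # sees a stop sign → "stop"
```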

Drago Anguelov, the lead of Waymo’s research team, recently gave a talk at MIT where he went deep into this topic:

For types of situations where Waymo can collect a lot of data, it uses imitation learning. But Anguelov says for the long tail of human driving behavior - rare situations - there aren’t enough training examples in Waymo’s dataset to do imitation learning. In these cases, it has to rely on hand-coded algorithms, which Anguelov believes should be replaced with machine learning wherever possible.

Extrapolating linearly from the past, Waymo has driven somewhere around 15 million miles. Imagine a situation that arises every 30 million miles on average. Waymo might not have encountered a single example. With a situation that occurs every 1 million miles, it might have only 15 examples. I don't know what's true for imitation learning, but for neural networks that do image classification the rule of thumb is you want at least 1,000 examples per image category (e.g. great white shark). There are lots of rare situations Waymo has never seen, or has seen too few times.
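The arithmetic behind those estimates is just linear scaling: expected encounters with a rare situation grow in proportion to miles driven. A quick sketch (the 1,000-example figure is the image-classification rule of thumb mentioned above, not an established imitation-learning number):

```python
# Expected number of encounters with a rare driving situation,
# assuming encounters scale linearly with miles driven.
def expected_examples(miles_driven, miles_per_occurrence):
    return miles_driven / miles_per_occurrence

waymo_miles = 15e6  # ~15 million miles, per the rough extrapolation above

print(expected_examples(waymo_miles, 30e6))  # occurs every 30M miles → 0.5
print(expected_examples(waymo_miles, 1e6))   # occurs every 1M miles  → 15.0

# Miles needed to reach the ~1,000-examples-per-category rule of thumb
# borrowed from image classification, for a once-per-million-miles event:
print(expected_examples(1e9, 1e6))  # 1 billion miles → 1000.0 examples
```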

While Anguelov would prefer to do imitation learning across all human driving behavior - including the long tail - Waymo just doesn’t have the data to do it. Well, who does have the data?

Tesla (TSLA). It’s estimated to have more than 400,000 cars with the latest generation of autonomy hardware, driving more than 13 million miles per day. When the fleet grows to a little over 1 million vehicles, it will be driving 1 billion miles per month. As the fleet grows, mileage will grow.
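The fleet-mileage figures scale straightforwardly with fleet size. A sketch using the numbers from the paragraph (the per-car daily mileage is implied by those estimates, not something Tesla has stated):

```python
# Fleet data collection scales linearly with fleet size.
fleet_size = 400_000        # estimated cars with the latest autonomy hardware
fleet_miles_per_day = 13e6  # estimated current fleet mileage

miles_per_car_per_day = fleet_miles_per_day / fleet_size  # 32.5

future_fleet = 1_000_000
future_miles_per_month = future_fleet * miles_per_car_per_day * 30
print(f"{future_miles_per_month / 1e9:.3f} billion miles/month")  # → 0.975
```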

Amir Efrati, a reporter at The Information, has written that Tesla is leveraging this mileage for imitation learning, citing at least one unnamed source who has worked in Tesla’s Autopilot division:

Tesla’s cars collect so much camera and other sensor data as they drive around, even when Autopilot isn’t turned on, that the Autopilot team can examine what traditional human driving looks like in various driving scenarios and mimic it, said the person familiar with the system. …​ Tesla’s engineers believe that by putting enough data from good human driving through a neural network, that network can learn how to directly predict the correct steering, braking and acceleration in most situations. “You don’t need anything else” to teach the system how to drive autonomously, said a person who has been involved with the team.

Tesla hasn’t confirmed this, but CEO Elon Musk made some comments in a recent interview with ARK Invest that could be interpreted as describing imitation learning. One quote from the interview:

The advantage that we have that I think is very difficult to overcome is that we have just a vast amount of data on interventions. So, effectively, the customers are training the system on how to drive. And there are millions of corner cases that are so obscure and weird you wouldn't believe it...

Another quote:

Every time somebody intervenes - takes over from Autopilot - it saves that information and uploads it to our system ... And we’re really starting to get quite good at not even requiring human labeling. Basically the person, say, drives the intersection and is thereby training Autopilot what to do.

These comments are ambiguous and there are multiple possible interpretations. But, to me, imitation learning fits most closely with what Musk said.

To do imitation learning, Tesla wouldn't need to upload any raw sensor data like videos. Instead of raw sensor data, it would upload the perception neural network's judgments about what it sees. The technical term for this is the mid-level representation. The easiest way to understand this concept is to visualize it. A Tesla hacker who goes by the name greentheonly created this visualization of the perceptual judgments made by the neural network running in a Tesla:

The mid-level representation includes the information visualized by the 3D bounding boxes around vehicles, the text labels stating vehicle type and distance, and the “green carpet” showing driveable roadway.
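One way to picture the mid-level representation is as a compact, structured summary of the scene rather than raw pixels. A hypothetical sketch of what such a structure might contain (field names are illustrative only, not Tesla's actual schema):

```python
# Hypothetical mid-level representation: the perception network's judgments
# about the scene, not raw camera frames. All field names are illustrative.
from dataclasses import dataclass, field

@dataclass
class DetectedVehicle:
    vehicle_type: str  # e.g. "car", "truck"
    distance_m: float  # distance from the ego vehicle, in meters
    bbox_3d: tuple     # corners of the 3D bounding box (placeholder)

@dataclass
class MidLevelRepresentation:
    vehicles: list = field(default_factory=list)
    drivable_area: list = field(default_factory=list)  # the "green carpet" polygon

scene = MidLevelRepresentation(
    vehicles=[DetectedVehicle("car", 14.2, ((0, 0, 0), (2, 2, 5)))],
    drivable_area=[(0, 0), (3.5, 0), (3.5, 50), (0, 50)],
)
print(len(scene.vehicles))  # 1
```

A structure like this is orders of magnitude smaller than the video it summarizes, which is what makes uploading it from a consumer fleet plausible.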

To do imitation learning, the mid-level representation would be paired with data about what the human driver did with the steering wheel and pedals. These state-action pairs, as they’re called, don't need human annotation, which is a costly and slow process. They just need to be uploaded to Tesla’s servers, and then they’re ready to train a neural network.
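A state-action pair, then, is just the perception output at some instant paired with what the driver was doing with the controls at that instant; no human labeler is involved. A minimal sketch (field names are hypothetical):

```python
# Hypothetical state-action pair: perception output plus driver controls.
# No human annotation needed - both halves are logged automatically.
from dataclasses import dataclass

@dataclass
class StateActionPair:
    mid_level_state: dict  # the perception network's scene summary at time t
    steering_angle: float  # degrees - what the human driver did
    brake: float           # pedal position, 0.0 to 1.0
    accelerator: float     # pedal position, 0.0 to 1.0

pair = StateActionPair(
    mid_level_state={"lead_vehicle_distance_m": 12.0, "light": "red"},
    steering_angle=0.0,
    brake=0.6,
    accelerator=0.0,
)
# Ready to upload and train on as-is: supervised learning on (state, action).
print(pair.brake)  # 0.6
```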

Besides Waymo’s endorsement, why should we believe imitation learning can train a neural network to execute complex tasks? To me, the most compelling example of imitation learning’s success is AlphaStar, a neural network created by DeepMind (an Alphabet subsidiary). DeepMind used imitation learning to train AlphaStar on millions of StarCraft games played by humans. StarCraft is a complex game requiring long-term planning, high-level strategy, and real-time tactical control of military units, which makes it a tough challenge for AI. Yet using imitation learning alone, AlphaStar achieved performance that DeepMind estimates is equivalent to a player around the middle of StarCraft’s competitive rankings - in other words, roughly median human skill.

The equivalent for autonomous driving would be if Tesla uploaded state-action pairs from millions of drives with humans in control, and used this data to train a neural network to reach roughly median human performance on driving. Driving is complex, but so is StarCraft, and it’s not clear to me why imitation learning should work any less well for driving than for StarCraft. There may be good reasons, but if there are I don’t know them. (If you’re aware of a good reason, please let me know.)

An important part of making imitation learning work is getting perception right. If Tesla’s perception neural network makes errors, the system will misread the true state of the environment and record the wrong state-action pair. To use a toy example, if it misclassifies red traffic lights as green and observes humans stopping at those lights, it will learn that it should stop at green lights. And even once it has learned the correct responses, it still needs to detect the cues: a system that knows to go at green lights and stop at red ones must classify those lights accurately to respond correctly. The system needs to see accurately in order to learn what to do, and to see accurately in order to apply those lessons.
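The toy example can be made concrete: a systematic perception error flips the state half of every affected state-action pair, so the learner faithfully imitates the wrong lesson. A sketch:

```python
# Toy illustration: a perception bug corrupts the state half of each
# state-action pair, so imitation learning draws the wrong conclusion.

def perceive(true_light, misclassify_red_as_green=False):
    """Return the perceived light color, optionally with a systematic bug."""
    if misclassify_red_as_green and true_light == "red":
        return "green"
    return true_light

# Humans correctly stop at red lights and go at green ones:
logged_drives = [("red", "stop"), ("red", "stop"), ("green", "go")]

# What the learner sees through a broken perception stack:
training_data = [(perceive(light, misclassify_red_as_green=True), action)
                 for light, action in logged_drives]
print(training_data)
# → [('green', 'stop'), ('green', 'stop'), ('green', 'go')]
# The learner now "observes" humans mostly stopping at green lights.
```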

One step toward solving perception is Tesla’s new neural network computer, known as Hardware 3. Tesla’s Director of AI, Andrej Karpathy, says that Tesla has “trained large neural networks that work very well” but is “not able to deploy them to the fleet due to computational constraints.” With Hardware 3, Tesla will be able to run these larger, more accurate perception neural networks.

For companies like Waymo that don’t have access to large quantities of production fleet data, it's hard to see a compelling path forward. With insufficient training examples to do imitation learning for the totality of the driving task, Waymo is forced to rely on hand-coded algorithms. In computer vision, hand-coded algorithms have been made completely obsolete by neural networks. The same is true for machine translation. With games like StarCraft, no hand-coded bot comes close to the performance of the best neural networks like AlphaStar. A neural network approach to autonomous driving seems more promising than a hand-coded algorithm approach.

To make imitation learning work for driving the way it worked for StarCraft (assuming this is even possible), Waymo would probably need to scale up its training fleet by multiple orders of magnitude. One idea would be for Waymo to sell a driver assistance system to automakers and collect data through it, much like Tesla. This would require Waymo to build a new product that offers a much narrower scope of functionality. The system would have to make do without LIDAR, or at least without the long-range, high-resolution LIDAR typically used on autonomous cars, since these units carry a cost that is prohibitive for consumer vehicles. Automakers may be reluctant to go in for the deal if it means Waymo gets all the data, so Waymo may need to sweeten the pot somehow. Perhaps it could agree to share future autonomous ride-hailing revenue with its automotive partners. Or perhaps a partnering automaker could buy a large equity stake in Waymo. Maybe that could work.

For companies like GM (GM) and Ford (F), the temptation to vertically integrate has precluded a partnership of this sort. Why partner with Waymo when you can buy your own Waymo for $1 billion? At least, I imagine that’s the rationale. Given the abundance of self-driving car startups, this could put Waymo in a tough negotiating position. At the same time, these automakers don’t appear to be using their large production fleets to collect training data.

Waymo is in a tight spot. Tesla is in a sweet spot. For that reason, I think Tesla is more likely to deserve the $100 billion-plus valuations assigned to Waymo. It’s possible that full autonomy will never happen and so is worth nothing. But if it does happen, it could be worth trillions. As far as I can tell, Tesla is in a better position than Waymo (and than all other companies) to use imitation learning and cut a slice of that big, big pie.

Tesla and Waymo, side by side

Photo by Steve Jurvetson.

Experts: please help me learn

Are you an expert on machine learning or autonomous vehicles? Is there a factual error in this article, or something you disagree with? Important missing information? Something I haven't considered? Please let me know! I would so appreciate your feedback. You can contact me using this form.

Disclaimer: This article is not investment advice. Please learn about the risks and consider consulting a licensed financial advisor before making investment decisions.

Disclosure: I am/we are long TSLA. I wrote this article myself, and it expresses my own opinions. I am not receiving compensation for it (other than from Seeking Alpha). I have no business relationship with any company whose stock is mentioned in this article.