Date: 2017-09-12

When Tesla (TSLA) CEO Elon Musk took the stage at TED this April, he said something that has been playing over and over in my mind ever since:

The whole road system is meant to be navigated with passive optical, or cameras, and so once you solve cameras or vision, then autonomy is solved. If you don't solve vision, it's not solved. So that's why our focus is so heavily on having a vision neural net that's very effective for road conditions. ... You can absolutely be superhuman with just cameras. Like, you can probably do it ten times better than humans would, just cameras.

Musk’s backdrop was this demo video of a Tesla Model X — equipped with the same production hardware in all new Teslas — driving itself through Palo Alto, California using only cameras and GPS:



Musk also said that he thought full self-driving would be ready to launch in “about two years,” so in “about” 2019.

Since Musk made these remarks, I’ve been wondering how he could be so confident that cameras alone are sufficient for full self-driving at a level of safety significantly above the average for human drivers. I’ve been wondering this for the past four months. And then a few days ago, in the shower, it hit me.

Computer vision with cameras already outperforms humans!

Let me demonstrate. Can you see the third vehicle up ahead in this shot? It’s surrounded by a yellow box. To the human eye, it looks like there’s nothing there.

Source: NVIDIA.

Here’s that vehicle up close. It’s a semi truck.

To be fair, with different input — the high-resolution color vision of the human eyeball — humans might be able to spot the semi truck. But given the same input, computer vision outperforms humans.

Machine learning performs more than 2x better than humans at image recognition: a 2% error rate for the top AI vs. a 5% error rate for the top humans.

Computer vision is very slightly better than humans at classifying close-up photos of traffic signs, which are expressly designed to be easily recognizable by humans. In one study, the top AI outperformed the human average by a small margin: a 0.54% error rate for the AI vs. a 0.78% error rate for humans. That study was conducted back in 2011. Since then, computer vision has improved ten-fold.
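The ratios behind these comparisons are easy to check. Here is a minimal Python sketch that uses only the error rates quoted above; the function name error_ratio is purely illustrative and not from any study.

```python
def error_ratio(human_error: float, machine_error: float) -> float:
    """Return how many times larger the human error rate is than the machine's."""
    if machine_error <= 0:
        raise ValueError("machine_error must be positive")
    return human_error / machine_error


# Error rates quoted in the text, expressed as percentages.
image_recognition = error_ratio(human_error=5.0, machine_error=2.0)
traffic_signs = error_ratio(human_error=0.78, machine_error=0.54)

print(f"Image recognition: {image_recognition:.2f}x")
print(f"Traffic signs (2011 study): {traffic_signs:.2f}x")
```

On these figures, the gap on general image recognition is about 2.5x, while the gap on traffic signs is closer to 1.4x.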

What’s more, computer vision will continue to improve. As more driving data is collected, it will improve specifically for objects, events, and environments encountered on the road. By 2018, Tesla’s Hardware 2 cars will have driven 1 billion miles. Data from some fraction of those miles will be uploaded to Tesla in order to train its vision neural network.

Here’s what many observers fail to appreciate. Tesla’s full self-driving hardware and software don’t need to be perfect or ideal. They just need to be:

1. safer than the human average

2. available before LIDAR-based solutions

Bringing self-driving that improves safety to market as soon as possible is an ethical imperative. It is also of immense financial and competitive importance.

If Tesla can launch full self-driving significantly ahead of any competitor, it will create virtually unlimited demand for its cars and generate billions in profit from the Tesla Network. This will allow the company to aggressively ramp vehicle production and entrench its competitive position.

Affordable LIDAR with adequate specs for full self-driving has not yet arrived. By betting on cameras, Tesla could gain a significant lead over its competitors, which are all pursuing LIDAR-based solutions.

Tesla’s self-driving system can later be augmented with LIDAR if it improves performance. Any number of sensors could be added over the very long term.

The LIDAR debate

There are several moving parts to the debate about LIDAR. The first and least controversial question is whether LIDAR is better than no LIDAR, all else being equal. Everyone agrees that it is. If cheap, compact LIDAR existed today that didn't impact a car's appearance or add significant cost, my guess is that Tesla would include it in its cars. My guess is also that once cheap, compact LIDAR is available, Tesla will add it to its cars.

The second, more controversial question is whether self-driving that's better than human driving fundamentally requires LIDAR. Musk's argument against LIDAR is that it is blind in weather conditions that cameras (and human eyes) can see in: heavy rain, fog, snow, or dust. While some companies like Ford (F) have been trying to fix this problem in software, it's not clear that such efforts have been more than partially successful. It defeats the purpose of LIDAR if abstracting away raindrops or snowflakes in software is a harder computer vision problem than making camera-based computer vision better than human vision for driving purposes.

If LIDAR can't be made to work in the conditions of rain, fog, or snow that humans safely drive in, then self-driving cars that use LIDAR will either (1) not be able to drive in these weather conditions or (2) fall back to cameras and radar. Maybe we would accept (1) as part of the trade-off for self-driving cars. However, a car that could autonomously drive in rain, fog, or snow without compromising safety would be more attractive for consumers in many climates. If (2) is possible without compromising safety, then LIDAR isn't necessary.

The third question, and the most subtle, is whether forgoing LIDAR makes software development so much longer and harder that it delays the deployment of self-driving longer than waiting for affordable LIDAR does. This is a very hard question to answer. It depends on the degree to which hours of work by scarce engineering talent are a bottleneck for self-driving software development. Put another way, it depends on the degree to which real world driving data accelerates self-driving development.

Based on the most conservative possible interpretation of Tesla's production targets, if everything goes according to plan, Tesla's fleet of cars with Hardware 2 and future iterations of its full self-driving hardware will drive 1 billion miles by early 2018, 5 billion miles by early 2019, and 10 billion miles by late 2019. This real world driving data can only be collected because Tesla isn't waiting around for affordable LIDAR to be available.

If the acceleration caused by this data is greater than the delay caused by working on a harder software problem, then the net effect of forgoing LIDAR will be to accelerate self-driving development.
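That comparison can be written as a one-line calculation. The sketch below is a toy illustration only: the function name and both input values are hypothetical, chosen to show the sign of the result rather than to estimate anything about Tesla's actual schedule.

```python
def net_schedule_shift_months(software_delay: float, data_speedup: float) -> float:
    """Net change in launch timing from forgoing LIDAR, in months.

    Positive: the harder software problem dominates and launch slips.
    Negative: the fleet data dominates and launch moves earlier.
    """
    return software_delay - data_speedup


# Hypothetical inputs, for illustration only (not estimates from the article).
shift = net_schedule_shift_months(software_delay=18.0, data_speedup=24.0)
print(f"Net shift: {shift:+.1f} months")
```

With these made-up inputs the result is negative, which is the case where the camera-only bet pays off. Swap the two numbers and the sign flips.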

The financial impact

Tesla needs to crack self-driving at around the same time as larger automakers or before just to stay afloat. Self-driving is an existential risk for any car company that might fall behind the pack, Tesla included. Tesla has the additional disadvantage of a relatively small annual production volume, which limits fleet data collection capabilities and network effects in local autonomous ride-hailing markets.

If Tesla is one of the first companies to bring self-driving to market, the rewards will be huge. If Tesla launches its autonomous ride-hailing service, the Tesla Network, at the beginning of 2020, I calculate that by the fourth quarter of that year it could be pulling in annualized earnings of $12.7 billion. That's without accounting for any other source of revenue, such as freight transportation or solar and energy storage products. For comparison, Alphabet (GOOG, GOOGL), the fifth most profitable U.S. company in 2016, earned $19.5 billion that year.
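To put the two figures side by side, here is a short Python sketch. It assumes "annualized" means the fourth-quarter figure multiplied by four, which is the usual convention but is an assumption here; the variable names are illustrative.

```python
# Figures quoted in the text, in billions of U.S. dollars.
TESLA_NETWORK_ANNUALIZED_BN = 12.7  # projected annualized earnings, Q4 2020
ALPHABET_2016_EARNINGS_BN = 19.5    # Alphabet's 2016 earnings

# Assumption: annualized = quarterly earnings x 4, so divide to recover the quarter.
implied_quarterly_bn = TESLA_NETWORK_ANNUALIZED_BN / 4

# Projected Tesla Network earnings as a share of Alphabet's 2016 earnings.
share_of_alphabet = TESLA_NETWORK_ANNUALIZED_BN / ALPHABET_2016_EARNINGS_BN

print(f"Implied Q4 2020 earnings: ${implied_quarterly_bn:.3f} billion")
print(f"Share of Alphabet's 2016 earnings: {share_of_alphabet:.0%}")
```

Under that assumption, the projection implies a bit over $3 billion in a single quarter, or roughly two thirds of what Alphabet earned in all of 2016.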

Conclusion

Tesla is betting on computer vision using cameras to be better than human vision. Computer vision already is better than humans at camera image recognition. While no robust public data yet exists on more complex, dynamic tasks like pedestrian or cyclist detection, anecdotal evidence like the semi truck example above is encouraging. Most importantly, computer vision is subject to a rapid trajectory of improvement.

For those reasons, I’m optimistic about Tesla’s bet on cameras.