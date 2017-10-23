Google's car as it enters an intersection.

I have often written on how Tesla (TSLA) lags in the self-driving race. I did this by highlighting such obvious things as testing data. Or by showing how the future demonstrations of prowess Tesla promises are much easier than what Google was regularly delivering 1.5 years ago.

On top of this, I’ve written on how LIDAR (plus cameras and radar) provides a significant advantage vs. the vision-only approach (or vision plus radar) followed by Tesla.

However, even those detailed arguments don’t cover the entire self-driving development scene and how far behind Tesla is in it. So today I'm going to provide yet more self-driving information and how it makes Tesla's approach look.

A Bit Of Background

I comment a lot on my own articles. In those comments, I often put forward data and arguments which would be worthy of entire articles themselves. When it came to self-driving, I was particularly enthusiastic in doing so.

For instance, I put forth a framework derived from existing technology and public information which explained how a company can presently cover most of the ground toward delivering a self-driving car. This was my comment:

This makes the path toward producing a self-driving car rather obvious (and it's also rather obvious that this is what Google (NASDAQ:GOOG) (NASDAQ:GOOGL) is doing):

1) Build a car/system that can drive in any environment without hitting anything, static or moving. The car/system can still be hit, and it wouldn't obey any rule of our real world. The car has awareness of all objects and their speeds and trajectories in the world.

2) Add location awareness to that car/system. The car now knows precisely where it sits in the world (and would still not hit anything, even if it didn't know any rules. It would basically be very conservative vs. anything it could hit, especially moving objects).

3) Add path awareness to that car/system. The car is now able to follow a precise path (along a road, though it being a road is a mere "coincidence"). Its location abilities are so precise that it can even follow a path within just part (a lane) of that road. The car still won't hit anything, but neither would it obey any rules, so it could run a red light and get hit (something, though, that it would also try to avoid if it could see the object/car on a collision course). The car also knows its own future path, so it can better predict collisions, and can be less conservative vs. possible collisions that don't intersect its non-linear position in the future.

4) Add object recognition to that car/system. The car can now recognize many, many objects: signs, traffic lights, persons, cyclists, other cars, etc. It would still not hit anything.

5) Add expected behaviors to identified objects. The car now "knows" that some objects (cars, cyclists, etc) will follow a path defined by a road just like it does. The car now knows that if other cars are presented with a red light they will tend to stop (though it can still monitor their actual behavior to see if it fits with the expected behavior, and try to avoid someone not stopping). The car now knows that if a person looks to be on a collision course toward the road, that person won't necessarily keep on that collision course. The car can be less conservative versus other cars, persons, etc., which would otherwise look to be on a collision course. The car can still react if those objects break their expected behavior. Moreover, the car still won't hit anything, whether it recognizes it or not, because if it doesn't recognize an object it can still assume the worst, just like it did initially. This is important: object recognition will work very reliably (90%, 95%, 99%-plus) but it will still fail too often for the base system to rely on it. It's there to provide better, more natural self-driving, but not as a centerpiece of it.

6) And so on and so forth.

Also notice, a lot of what's above will actually have human input. The car cannot afford to run a red light, for instance. As a result, it's likely that all traffic lights in a given zone will be hard coded so that the car knows precisely that there's a traffic light in a given place, and which traffic light applies to its own path. This is clearly what Google is doing (there is proof showing as much). The same goes for which lanes lead where. There is a measure of automatism and a measure of hard coding. The car can still behave under circumstances where no hard coding is present, but it will simply turn a lot more conservative (defaulting to the behavior where it's following a path, but avoiding any chance of collision). Etc, etc.

It's easy to see that following this strategy, one can arrive at a self-driving car for well-mapped areas. It's also easy to see that NOT following this strategy will lead to trouble achieving the necessary self-driving threshold. Now, NOT following this strategy can STILL produce incredible demos. But they're likely to stay demos, and not hit the necessary safety threshold (which is incredibly high).
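To make the layering in the comment above concrete, here is a minimal sketch of how such a decision stack might be ordered. It is purely illustrative: the names `TrackedObject` and `choose_action`, the time-to-collision thresholds and the doubled margin for unrecognized objects are all assumptions of mine, not anything Google has published.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TrackedObject:
    distance_m: float            # gap to the object along our planned path
    closing_speed_mps: float     # positive when the gap is shrinking
    label: Optional[str] = None  # None when object recognition failed


def time_to_collision(obj: TrackedObject) -> float:
    """Seconds until contact if nothing changes; infinite if the gap is not closing."""
    if obj.closing_speed_mps <= 0:
        return float("inf")
    return obj.distance_m / obj.closing_speed_mps


def choose_action(objects: List[TrackedObject],
                  light_state: Optional[str],
                  min_ttc_s: float = 3.0) -> str:
    # Base layer: never hit anything. It works on raw detections, so an
    # unrecognized object is still avoided, only with a wider (assumed) margin.
    for obj in objects:
        margin = min_ttc_s if obj.label is not None else 2 * min_ttc_s
        if time_to_collision(obj) < margin:
            return "brake"
    # Rules layer: the mapped light must be positively read as green.
    # light_state is None when the mapped light could not be read.
    if light_state != "green":
        return "stop_at_line"
    return "proceed"


print(choose_action([TrackedObject(20.0, 5.0, "car")], "green"))
print(choose_action([TrackedObject(20.0, 5.0, None)], "green"))
```

The point of the ordering is that recognition only ever relaxes the behavior. When recognition fails, the car simply falls back to the more conservative margin.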

If you follow the reasoning above, you’ll start understanding how a self-driving car comes about, at least one able to work within a defined area (Level 4 SAE).

What’s interesting is that subsequently (though not entirely as a surprise), Google confirmed the entire framework. It did so in its recent safety report titled “On The Road To Fully Self-Driving.” Pages 8-9 describe the system pretty succinctly. Pages 13-18 add further detail.

This framework as explained by me and now confirmed by Google brings us to today’s argument …

Two Self-Driving Cars Walk Into A Bar...

Or, more exactly, two self-driving cars approach an intersection. One of them is a Tesla, the other is a Google/Waymo or General Motors’ (NYSE:GM) car (since it follows the same framework).

What happens next?

The Google car knows deterministically where it should stop (in the event there isn’t a car ahead of it). It knows where the traffic lights are (in 3D and to within inches at most, so it knows exactly where in the camera output to look for them). It knows what traffic light pertains to what exit lane from the intersection. And where each of the lanes from the intersection leads to.

It also knows, from my previous description, where each object in the scene is (not just relative to itself, but also absolutely in the world). Where each is heading and at what speed. For most of these objects, it will also know what they are (thus, being able to ascribe them probable behaviors).

Having this much knowledge about the world means making this self-driving car drive through the intersection safely is almost child’s play (it’s not, but it’s a solved problem even facing tremendously varying “object populations” on that same intersection).

You can:

Trust that the Waymo car will handle the intersection consistently.

You also can trust that even if the Waymo car is faced with unknown objects or changes, it won’t hit anything. The worst which can happen is that the car might fail to find a solution to navigate the intersection toward its destination, if there are blocked roads, accidents, etc. Sometimes it will be possible to just re-route on the fly in light of such developments, sometimes possibly not.

What about the Tesla?

When the Tesla car, using its present design philosophy, reaches the intersection it basically knows nothing about the intersection. What it has to work with is:

A video and radar feed mostly on the conditions straight ahead (the other cameras aren’t relevant for this). Tesla has to attempt to transform this video feed into the equivalent of what Google has (regarding the detection of objects, even before their recognition), at the same level of reliability - which is unlikely.

A rough map of the roads into such intersection, which isn’t directly matched to the video+radar feed. This is in contrast with the Google car, which doesn't just have an exact 3D map of the environment, but also has the ability to correlate it to what it’s seeing by using LIDAR to precisely calibrate all sensors to the existing 3D map.

At this point, while the Google car has perfect situational awareness, the Tesla car still needs to interpret the whole scene to a level state-of-the-art scientific research hasn’t yet attained. Remember, we know it isn't attained because Google and others also carry the same sensors as Tesla, on top of their additional capabilities. Were the problem solved, the excess sensors wouldn't be needed. It’s thus like we’re comparing a solved problem (Google) to something needing uncertain scientific advance to be attained (Tesla). Scientific advances do not come on a schedule.

This has tremendous implications. The Tesla car faces a massive number of opportunities to fail in its task. Take a simple traffic light, which is just one of the many problems the cars will face in such an intersection:

The Google car knows exactly where the traffic lights are, in 3D. It knows where to expect these traffic lights in its video feed. It doesn’t need to scan the whole scene to recognize traffic lights. It knows precisely where they are, and it knows precisely if it fails to find and read them. It also knows precisely what traffic light it needs to monitor, even if there are several of them. This is so because its map is deterministic in connecting each traffic light to each path, including the path the car knows it will need to take. As a result, the Google car can be expected to not miss or misinterpret a single traffic light.
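As a rough illustration of what "knowing where to expect the light in the video feed" can mean, the sketch below projects a mapped 3D position into the image with a standard pinhole camera model and derives a small search window around it. It assumes the light's position has already been transformed into the camera frame using the car's localized pose. The intrinsics (`fx`, `fy`, `cx`, `cy`) and the window size are made-up values, and this is not a description of Google's actual pipeline.

```python
from typing import Optional, Tuple

Point3D = Tuple[float, float, float]


def project_to_pixel(point_cam: Point3D, fx: float, fy: float,
                     cx: float, cy: float) -> Optional[Tuple[float, float]]:
    """Pinhole projection of a point given in the camera frame
    (x right, y down, z forward, in meters) to pixel coordinates."""
    x, y, z = point_cam
    if z <= 0:
        return None  # behind the camera, cannot appear in the image
    return (fx * x / z + cx, fy * y / z + cy)


def search_window(point_cam: Point3D, fx: float, fy: float, cx: float,
                  cy: float, half_size_px: float = 40.0):
    """Pixel box (left, top, right, bottom) in which to look for the light."""
    center = project_to_pixel(point_cam, fx, fy, cx, cy)
    if center is None:
        return None
    u, v = center
    return (u - half_size_px, v - half_size_px,
            u + half_size_px, v + half_size_px)


# Hypothetical light 40 m ahead, 2 m to the right, 3 m above the camera.
print(search_window((2.0, -3.0, 40.0), fx=1000.0, fy=1000.0, cx=640.0, cy=360.0))
```

If nothing that looks like a traffic light is found inside that window, the car knows it has failed to read a light that the map says exists, and it can react accordingly.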

The Tesla car doesn’t know where the traffic lights are. Indeed, it doesn’t even know if there are traffic lights, so if it fails to recognize them in whole-scene object recognition, it won’t know it missed them. You already have several possible points of failure at this point, as image recognition does not work 100% of the time. False positives and false negatives mean the car can either stop for a traffic light that doesn’t exist, or run through one it didn’t see. Moreover, even if the Tesla does see the traffic lights, it won’t immediately know which of those it’s seeing applies to its own path. Again, this can be solved to an extent, but provides another possible point of failure. The points of failure, for something as simple as a traffic light, will compound. That one can go through tens or hundreds of traffic lights quickly means even 99% reliability leads to frequent and important mistakes.
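The compounding is easy to put numbers on. The sketch below assumes, for simplicity, that every traffic light is an independent trial with the same per-light reliability. Real failures are likely correlated, so treat this as an illustration of the arithmetic rather than a prediction.

```python
def p_at_least_one_error(per_light_reliability: float, n_lights: int) -> float:
    """Probability of mishandling at least one of n_lights traffic lights,
    assuming independent trials with identical reliability (a simplification)."""
    return 1.0 - per_light_reliability ** n_lights


for reliability in (0.99, 0.999, 0.9999):
    for n in (10, 100, 1000):
        p = p_at_least_one_error(reliability, n)
        print(f"reliability={reliability}  lights={n:>4}  P(at least one error)={p:.3f}")
```

At 99% per-light reliability, the chance of at least one mistake over 100 traffic lights is about 63%. Even at 99.9%, the same roughly 63% is reached after 1,000 lights.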

That’s just traffic lights. The same applies to nearly every object on the scene. The Tesla won’t be nearly as aware of where everything is and where it’s heading, simply because detecting objects and their positions is much harder using just video plus radar, than relying on LIDAR. And the odd thing here is that while using video, Tesla doesn’t even rely on stereoscopic cameras … making the problem harder even while relying solely on video.

It gets worse. Since the Tesla does not rely on high resolution, human-curated maps, it can’t know for certain where every lane is or where it heads, or which direction traffic flows on each of them. The Google car has this deterministically – so the chance of failure approaches zero. The Tesla car faces a high chance of failure on every decision … on deciding where the lanes are, on deciding where they lead to and how traffic flows on them.

There’s no way to over-emphasize just how different the two realities above are. The Google approach is resilient to edge cases. It’s intrinsically safe in its ability to not hit anything or miss any (hard coded) traffic sign or light indicating others have priority. It also can recognize its own limitations (due to not recognizing objects, expected traffic signage, odd situations like accidents creating blockages, etc), giving it the ability to fall back to its most conservative layer and not hit anything in the face of anything unexpected (an edge case).

The Tesla car will have trouble dealing with the most basic of realities. It will have trouble working even without edge cases, never mind when they strike. This trouble comes from the fact that it will often be relying on things which fall far short of 100% reliability, versus Google’s approach which, at the base, can provide such reliability (ok, 99.99999(9)%).

You might question whether Tesla isn’t doing the same, or can’t do the same. Well, we know for a fact presently it isn’t. Here’s why:

It lacks LIDAR, and thus lacks the deterministic detection and ranging capabilities of a car equipped with it. As a result, it can never be certain of detecting objects which pose a danger.

In lacking LIDAR, it also lacks the ability to precisely know positions, directions, speeds, even for the objects it can detect. These are connected. If it detects an object, it will still only have rough measures for direction and speed – not accurate enough to serve the purpose of predicting that object’s position in the future.
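To see why rough speed and direction estimates matter, consider a simple constant-velocity extrapolation. The sketch below compares where an object really ends up with where an estimate carrying a small speed and heading error says it will be. The error magnitudes used are illustrative assumptions of mine, not measured figures for any sensor.

```python
import math


def predicted_position(speed_mps: float, heading_rad: float, t_s: float):
    """Constant-velocity extrapolation from the origin."""
    return (speed_mps * math.cos(heading_rad) * t_s,
            speed_mps * math.sin(heading_rad) * t_s)


def prediction_error_m(speed_mps: float, speed_err_mps: float,
                       heading_err_rad: float, t_s: float) -> float:
    """Distance between the true future position (heading 0) and the one
    extrapolated from a speed/heading estimate carrying the given errors."""
    true_x, true_y = predicted_position(speed_mps, 0.0, t_s)
    est_x, est_y = predicted_position(speed_mps + speed_err_mps,
                                      heading_err_rad, t_s)
    return math.hypot(est_x - true_x, est_y - true_y)


# Hypothetical object at 15 m/s, estimate off by 1 m/s and 3 degrees.
for t in (1.0, 2.0, 4.0):
    err = prediction_error_m(15.0, 1.0, math.radians(3.0), t)
    print(f"t={t:.0f}s  position error={err:.2f} m")
```

Under this model the position error grows linearly with the prediction horizon, so an estimate that looks acceptable at one second can be off by more than a lane width a few seconds out.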

Worse still, in lacking LIDAR it also doesn’t have the ability to position itself exactly in the world. It’s hard to predict how your own trajectory through the world will evolve, if you can’t even establish your initial position with precision.

It gets worse. We know Tesla is not using high definition, human-curated maps. We know it because Tesla is selling its cars worldwide with the FSD (Full Self-Driving) promise. Yet, if the approach described above were being used by Tesla, any deployment would be slow and on a city-by-city basis – as reliable mapping was completed for each operational zone. It would make no sense to promise FSD worldwide when, from the start of such a process, it would take years to reach all places where such cars were sold.

Moreover, the HD human-curated maps need LIDAR to be exploited fully, as only then can you match them with the necessary precision to the outputs from all the sensors (LIDAR, radar, video). Sure, there are still ways to improve driver assistance features using those maps in the absence of LIDAR, as GM is showing, but those are then restricted to simpler driving environments and situations like driving on the highway, under human supervision.

What About AI, What About All Those Miles?

One thing you’ll notice from the discussion above is that AI (Artificial Intelligence, machine learning) isn’t as central to self-driving as one would think, going from the media.

The whole base layer which gives the Google cars such resilience to hitting other stuff (the 3D HD mapping, the precise positioning, the deterministic object detection via LIDAR, the knowledge of traffic signage, the path following) isn’t reliant on AI. Google cars fall back to this basic ability to not hit stuff upon uncertainty regarding the environment they face (they also don’t drive outside of their operational design domains, where they have all the necessary data).

Where AI makes an entrance is regarding object recognition or predicted object behavior. But again, the system is allowed to fail in these tasks, since it has a deterministic layer to fall to when such happens. As a result, it can afford to not recognize an object (it still knows the object is there) or its behavior (it still knows if and how it’s moving). Moreover, there’s no reason to think Google is anything but the leader when it comes to applying machine learning to these tasks. However, General Motors’ rapid ascent in its self-driving capabilities shows that not being the clear leader in this AI-dominated area is not a major obstacle.

Also, as you’ll notice, the tasks where AI is used are sub-tasks. There is no AI holistic driving going on. There is no AI to “teach how to drive."

This throws cold water on the Tesla self-driving myth. The myth relies on Tesla somehow becoming more proficient than others at machine learning, and also on Tesla having “more miles."

Having more miles, though, is nearly irrelevant. It is irrelevant because the underlying layer on a Google car intrinsically behaves as having driven an infinite number of miles. It just considers the physical world and doesn’t hit anything. One billion miles on top would not improve on this. An increase in LIDAR range, resolution and refresh rate would. That’s got nothing to do with miles driven, though.

As for solving the tasks performed by machine learning, both Google and GM seem to have had enough miles driven for those. The exceptions which remain require a human to notice them, and are not directly linked to making AI perform even better at the tasks it already performs (though that’s always desirable).

An Aside On Traffic Lights And Demos

Self driving has an important characteristic which can deceive investors. That characteristic is that it's easy to build convincing self-driving demos, but hard to deliver actual, reliable, safe-enough, self-driving systems.

The traffic light problem described above affords us a good example. If you have a system able to correctly detect and recognize those traffic lights and implications 95% of the time, you'll be able to demo it working correctly 95 times out of 100. If it works 99% of the time, you can demo it working correctly 99 times out of 100. If it works 99.9% of the time, you can demo it working correctly 999 times out of 1,000. Yet, you cannot rely on a system which only handles traffic lights correctly 99.9% of the time.

Indeed, even if you had a system which only handled the traffic lights correctly 1 out of 100 times, you could still easily post a demo of it handling traffic lights. You just wouldn't be able to allow third parties to experience it. Which is funny, because no third parties have been able to experience Tesla's FSD development cars, but some have been able to experience GM's, and many more able to experience Google's.

You might even think that running a red light now and then, dangerous as it is, could be acceptable. That the average driver will run a red light one out of 1,000 times or 1 out of 10,000 times or something like that. But here's the problem: if a self-driving car fails at respecting a red light, it won't necessarily do it right after it turns red. It will fail at any time. And it's quite different to run a red light right after it turns red vs. running it at any other time.

Hence, handling traffic lights correctly 100% of the time is a necessity. It's also a necessity which the Google approach can provide and Tesla's can't. The Google car can provide it even in the event that it can't see the traffic light, because it intrinsically knows it's there (it's in the human-curated map), so it can act conservatively if it can't find it. The Tesla, on the other hand, can fail to recognize the traffic light (because object recognition is far from perfect, especially having to recognize the traffic light in a whole scene instead of just in a very specific place like Google does), and won't know it failed to do so (which makes the situation more dangerous).
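The difference between a detectable miss and a silent miss can be sketched in a few lines. Both functions below are hypothetical simplifications of mine: the first assumes a map that names the one light controlling the car's path, and the second assumes nothing but a list of whatever light states whole-scene recognition returned.

```python
from typing import Dict, List


def map_guided_decision(mapped_light_id: str, readings: Dict[str, str]) -> str:
    """The map says a specific light controls our path, so failing to
    read that light is itself a signal and triggers conservative behavior."""
    state = readings.get(mapped_light_id)
    if state is None:
        return "stop_conservatively"  # known miss: the light exists but was not read
    return "proceed" if state == "green" else "stop"


def vision_only_decision(detected_states: List[str]) -> str:
    """No map prior: an empty detection list may mean 'no light here'
    or 'light missed', and the two cannot be told apart."""
    if not detected_states:
        return "proceed"  # silent miss is possible here
    return "proceed" if all(s == "green" for s in detected_states) else "stop"


# Same perception failure in both cases: a red light went undetected.
print(map_guided_decision("light_17", {}))
print(vision_only_decision([]))
```

With identical perception failures, the map-guided version stops and the vision-only version drives on, which is the asymmetry described above.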

Anyway, what does this mean? It means that both Google and Tesla can theoretically show perfect handling of the same intersection/traffic light situation. But Google's approach will work 100% of the time, intrinsically, and Tesla's won't. So while a demo can be similar, Google's approach can likely go all the way towards delivering a final product and Tesla's can't, no matter how convincing the Tesla demo might be (not that we have seen any, lately).

Conclusion

When it comes to self driving, it’s almost not fair. Not only does Tesla lag the leaders (Google/Waymo, GM, others) by a large margin, but it’s also trying to solve a much more difficult problem. The conclusion is necessarily that Tesla will either fail to deliver FSD based on its current approach, or be tremendously late in delivering it.

Of course, Tesla can still pivot on its approach. If Tesla does this, you’ll see a much larger emphasis on adding LIDAR (solid-state LIDAR) to its cars, along with more emphasis on Level 4 and human-curated high-resolution mapping. This would in turn imply a phased FSD rollout per geographic zone (since FSD using this approach can only be deployed on fully-mapped, human-curated areas). If this happens, though, Tesla also will create a large liability toward all those who thought Tesla’s previous approach was viable and paid for it ahead of delivery.