Date: 2021-06-30

The market for Artificial Intelligence (AI) software is large and growing rapidly, but unlike many enterprise software markets, it remains fragmented and no clear leaders have been established. This presents companies like C3.ai (NYSE:AI) with a large opportunity, if they can build a sustainable competitive advantage. It is unclear what the source of competitive advantage in AI software will be, though, as many of the higher value parts of the data science pipeline have been open-sourced. Most vendors are focused on initiatives that increase ease-of-use and productivity, like low-code / no-code, MLOps and autoML. The rapidly evolving AI technology landscape and the lack of clear market leaders make this a difficult, although potentially rewarding, market for investors.

Market

The importance of AI to most businesses continues to increase, driven by ever-growing volumes of data, improving algorithms and software that lowers the barriers to adoption. It is estimated that by 2022, 65% of CIOs will enable front-line workers with AI and that by 2025, 80% of CIOs alongside lines-of-business will implement intelligent capabilities to predict changing customer behaviors. According to a McKinsey study, AI could potentially increase global GDP growth by approximately 1.2% annually over the next decade, an impact comparable to that of other general-purpose technologies throughout history. Despite the large potential benefits, many organizations still find it difficult to implement AI projects effectively due to a lack of expertise.

Figure 1: Biggest Bottlenecks to AI Initiatives

(source: Created by author using data from Appen)

As a result, there is a growing market for end-to-end machine learning platforms that lower the barriers to adopting AI. A typical data science pipeline has many stages beyond building a model, ranging across data ingestion, data preparation, model training and model deployment.

Figure 2: Data Science Pipeline

(source: Created by author)
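To make the four stages (data ingestion, data preparation, model training and model deployment) concrete, the sketch below chains them as plain Python functions. It is a toy illustration under stated assumptions, not how C3.ai or any other platform implements a pipeline: the in-memory CSV `RAW_CSV`, the function names and the simple least-squares model are all hypothetical stand-ins for enterprise data sources, preparation jobs, training frameworks and serving infrastructure.

```python
import csv
import io
from typing import Callable

# Hypothetical raw data standing in for an enterprise source system.
# The third record is deliberately incomplete.
RAW_CSV = """hours,output
1,2.1
2,3.9
3,
4,8.2
5,9.8
"""


def ingest(raw: str) -> list[dict[str, str]]:
    """Data ingestion: read records from a source (here, an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(raw)))


def prepare(rows: list[dict[str, str]]) -> list[tuple[float, float]]:
    """Data preparation: drop incomplete records and convert types."""
    return [
        (float(r["hours"]), float(r["output"]))
        for r in rows
        if r["hours"] and r["output"]
    ]


def train(points: list[tuple[float, float]]) -> tuple[float, float]:
    """Model training: ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def deploy(model: tuple[float, float]) -> Callable[[float], float]:
    """Model deployment: expose the trained model as a callable 'service'."""
    slope, intercept = model
    return lambda x: slope * x + intercept


# Each stage consumes the previous stage's output.
predict = deploy(train(prepare(ingest(RAW_CSV))))
print(round(predict(6.0), 2))
```

In a real deployment each of these functions would be replaced by separate systems (connectors, ETL jobs, training clusters, model servers), which is where the integration effort described in this section comes from.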

An end-to-end machine learning platform must be capable of performing all of these steps and typically requires:

Data Integration - AI at industrial scale requires a unified, federated image of all the data contained in enterprise information systems. This can be addressed by the machine learning platform, although modern data lakes / data warehouses like Snowflake have effectively also solved this problem.

Data Persistence - Large and heterogeneous datasets (hundreds of petabytes or even exabytes) may require various databases (relational databases, key-value stores, graph databases, distributed file systems, etc.) depending on the application.

Platform Services - Including access control, encryption, ETL, queuing, autoscaling, multi-tenancy, cybersecurity, normalization, data privacy and compliance.

Analytics Processing - The volume and velocity of data acquisition requires a range of processing services like continuous analytics processing, MapReduce, batch processing, stream processing and recursive processing.

Machine Learning Services - Enable data scientists to develop and deploy machine learning models. This includes languages like Python, R and Scala and libraries like TensorFlow, Caffe, Torch, Amazon Machine Learning (AMZN) and AzureML (MSFT).

Data Visualization Tools - Including Tableau (CRM), Qlik, Spotfire and Oracle BI (ORCL).

Developer Tools and UI Frameworks - Application development frameworks and user interface development tools including the Eclipse IDE, Visual Studio, React, Angular, R Studio and Jupyter.

Machine learning platforms must also be open and extensible to remain relevant in a rapidly evolving technology landscape. Tools that are state-of-the-art now are likely to be obsolete in 5 to 10 years' time and unless machine learning platforms allow for the modular replacement of obsolete components, they too will become obsolete.

A common approach for many companies has been to build their own pipeline using a combination of point solutions and open-source tools. Using structured programming to build applications by integrating various open-source components and cloud services can be slow, costly and ineffective though. Due to daunting technical requirements, among other reasons, a recent study has shown that 84% of enterprise AI deployments have not scaled.

According to a McKinsey survey, only 55% of organizations believe their automation programs have been successful. Slightly more than half of respondents also stated that their automation programs were more difficult to implement than expected. This difficulty is often due to common problems encountered when developing AI applications, including:

Complexity – When data science pipelines are constructed using structured programming, a large number of APIs are needed, creating a complex system of connectors. C3.ai believes that the number of programmers capable of dealing with this level of complexity is small.

Brittleness – The large amount of inter-dependency between modular point-solutions can make the system brittle.

Data Integration - Integrating data using structured programming and an API-driven architecture can be time consuming. C3.ai believes this is often the primary reason that large amounts of money and time are spent on AI projects with little progress achieved.

Legacy big data architectures like Apache Hadoop were designed to overcome some of the issues of implementing data analytics at scale. Apache Hadoop is a collection of open-source software utilities that facilitate solving big data problems using distributed resources. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Hadoop allows data to be stored in its native format and can be used as a data lake.
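The MapReduce programming model that Hadoop is built around can be illustrated with the classic word-count example. The sketch below is a single-process simulation of the model's map, shuffle and reduce phases written in plain Python; it does not use Hadoop's actual Java API, and the function names are hypothetical. In Hadoop, the map and reduce tasks would run in parallel across many nodes, with the framework handling the shuffle over the network.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all intermediate values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(splits: list[str]) -> dict[str, int]:
    # In Hadoop each split would be mapped on a different node; here the
    # phases run sequentially in one process purely to show the data flow.
    intermediate = (pair for split in splits for pair in map_phase(split))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


counts = map_reduce(["big data big models", "data lakes store data"])
print(counts)
```

Even this trivial aggregation requires the developer to think explicitly about how data is split, keyed and recombined, which hints at the skills barrier discussed next.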

The problem with Hadoop, though, is that its native query mechanism is MapReduce code, which makes it incompatible with the massive ecosystem of products and skills built around SQL. Many users were likely disappointed with the outcomes of their Hadoop projects due to a shortage of practitioners with the necessary skills, slow processing speeds and high costs when used with insufficiently large data sets. The popularity of Hadoop is now declining as a variety of cloud-deployed solutions have launched offering the same functionality in a form that is more efficient and easier to administer and secure.
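For contrast, the same kind of aggregation (counting word occurrences) is a single declarative statement in SQL, the language most analysts and BI tools already speak. The sketch below uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in for any SQL engine; the table name `words` and the sample text are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for any SQL engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT)")

text = "big data big models data lakes store data"
conn.executemany("INSERT INTO words (word) VALUES (?)", [(w,) for w in text.split()])

# One GROUP BY query does the grouping and counting that MapReduce code
# must spell out as separate map, shuffle and reduce steps.
rows = conn.execute(
    "SELECT word, COUNT(*) FROM words GROUP BY word ORDER BY word"
).fetchall()
print(rows)
conn.close()
```

This gap in accessibility is a large part of why SQL-on-Hadoop layers and, later, cloud data warehouses displaced hand-written MapReduce for most analytics workloads.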

AI is a broad category that increasingly touches every part of the business landscape; as such, the market for AI software is large but also difficult to categorize accurately. C3.ai have grouped their market opportunity under the following categories:

Enterprise AI Software

Enterprise Infrastructure Software - Application Development, Infrastructure, and Middleware; Data Integration and Quality Tools; and Master Data Management Products.

Enterprise Applications - Analytics, Business Intelligence and Customer Relationship Management.

Enterprise AI software is their core market, and while the infrastructure software and applications markets are large, C3.ai will likely find it difficult to gain significant share in them. C3.ai estimates that their addressable market is currently 174 billion USD, compared to a global IT market of over 2.3 trillion USD. I believe this reflects an overly broad categorization of their market opportunity that is not indicative of their realistic revenue potential. It is difficult to imagine a niche software platform like C3.ai's competing for over 7% of all IT spending.
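Dividing C3.ai's addressable market estimate by the size of the global IT market gives the implied share of all IT spending. Because the IT market is stated as "over" 2.3 trillion USD, this figure is an upper bound:

```latex
\frac{174\ \text{billion USD}}{2{,}300\ \text{billion USD}} \approx 0.076 = 7.6\%
```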

Table 1: C3.ai Total Addressable Market

(source: Created by author using data from C3.ai)

In comparison, Alteryx (AYX) estimate that their total addressable market is approximately 49 billion USD, a much smaller figure that probably more realistically indicates the opportunity for a data science platform. Snowflake (SNOW) estimate that the addressable market opportunity for their Cloud Data Platform is approximately 90 billion USD, indicating that there is also a large opportunity in adjacent infrastructure markets.

While the market for AI software is large and growing rapidly, it is often considered non-critical during times of difficulty. For many organizations the pandemic appears to have accelerated the urgency of AI initiatives, but for many others it has delayed the deployment of AI projects. In a survey conducted by Appen, 41% of participants said that the pandemic has accelerated their AI strategy while for 31% it has caused delays. 36% of survey participants said that the pandemic has accelerated AI deployment, while for 35% it has caused delays.

Figure 3: Impact of COVID-19 on AI Initiatives

(Source: Appen)

The impact of the pandemic can be seen in the hiring data for jobs requiring machine learning competency. Hiring was relatively weak throughout 2020 but has surged since the rollout of vaccines in 2021.

Figure 4: Hiring Trends for Machine Learning

(Source: Created by author using data from Revealera.com)

Competitors

C3.ai believes its primary competition is from customers internally developing AI applications using a combination of open-source and proprietary software. Internally developing AI capabilities tends to be a costly and complex software engineering undertaking which runs the risk of failure and may take a significant time period to realize economic returns. Deploying exclusively or almost exclusively to a third-party platform is not the ideal model for every enterprise though. Many companies will be reluctant to create that type of vendor lock-in for their data and data analytics. There are also data sovereignty laws which need to be considered for cloud deployments.

C3.ai does not believe there are currently any end-to-end enterprise AI development platforms that are directly competitive with its products. AI software is a crowded and competitive market, though, so it is not clear whether C3.ai believes this is because existing solutions do not cover the entire AI lifecycle, do not meet C3.ai's definition of enterprise, or do not integrate capabilities as tightly as C3.ai's platform.

To some extent this is also because most of C3.ai's platform is not focused on AI functionality. The organizations and applications that C3.ai mentions as competitors or peers are indicative of the true nature of their platform. They primarily compare themselves to analytic databases and IoT platforms, which indicates their focus is on AI infrastructure and operationalizing AI. C3.ai include the following as enterprise AI applications:

Cassandra – open-source, distributed, wide-column store, NoSQL database management system designed to handle large amounts of data.

DataStax – database management vendor providing Apache Cassandra as a service.

Hadoop – collection of open-source big data software.

Cloudera (CLDR) – enterprise data cloud company that provides data management and machine learning based on Apache Hadoop.

AWS IoT – provides cloud services that connect IoT devices to other devices and AWS cloud services.

C3.ai references GE Predix (GE) and IBM Watson (IBM) as functionally equivalent competitors, but also notes that it no longer encounters them in competitive situations. GE Predix is an industrial IoT platform from GE Digital that provides edge-to-cloud data connectivity, processing, analytics and services. IBM Watson is a question-answering system initially developed by IBM to answer questions on the quiz show Jeopardy!. The technology has since found commercial application helping organizations predict future outcomes, automate complex processes and improve employee efficiency. Watson includes tools for building models, pre-built applications to accelerate time to value and access to an ecosystem of partners.

Machine Learning Platforms

There are a number of popular end-to-end machine learning platforms, ranging from those targeting experts to those aimed at citizen data scientists. The more popular data science platforms tend to be legacy solutions and as such are probably not direct competitors to C3.ai. However, a number of newer platforms are growing rapidly and potentially present a more significant threat to C3.ai, including Databricks, DataRobot, H2O.ai and Dataiku. Databricks in particular has performed well recently and raised funding at a 28 billion USD valuation in 2021.

Analytic Databases

Analytic database vendors like Snowflake represent a serious threat to downstream companies focused on data analytics. Database vendors are in a strong competitive position due to data gravity and those companies that choose to forward integrate could potentially capture significant value if they can create tightly integrated data management and data analytics solutions.

Applications

Software application vendors like Salesforce are not necessarily direct competitors to C3.ai, but they have the potential to significantly reduce the size of C3.ai's market by introducing AI functionality into their own applications to maximize the value of their services. Applications have access to proprietary data and can potentially create value by tightly integrating AI functionality with their core services. Salesforce has already made significant headway in this direction with its Einstein service and acquisition of Tableau.

AI Point Solutions

There are a large number of software point solutions focused on various use cases in the data science pipeline. C3.ai is trying to capitalize on the fact that many organizations find it difficult to create an end-to-end pipeline at scale from these point solutions. These solutions are likely to continue improving over time though and interoperability between solutions is likely to improve. In addition, many machine learning platforms are currently active acquirers of point solutions to build out their own capabilities. It is unlikely that C3.ai can compete with all of these point solutions on performance, so they will need to ensure their platform remains compatible with popular tools.

Hyperscale Cloud Computing Vendors

AWS, Azure, IBM and Google (GOOG) all offer elastic cloud computing platforms and a growing library of microservices that can be used for data aggregation, ETL, queuing, data streaming, MapReduce, continuous analytics processing, machine learning services and data visualization. Their deep expertise in machine learning has generally been developed internally to support the functionality of their own businesses and in recent years has come to involve everything from developing custom hardware, data management solutions and machine learning libraries to launching commercial services based on these capabilities. These companies may be happy for the AI market to be dominated by companies like Snowflake and C3.ai as long as the workloads remain in their clouds, but this seems unlikely to me. As the AI market continues to grow, I believe they will continue committing resources to developing AI solutions to differentiate their cloud platforms and capture as much of these higher value activities as possible.

Edge Networks

Edge computing companies like Cloudflare (NET) are enabling applications to migrate towards the edge with their programmable edge networks. As the capabilities and popularity of their services increase, it creates the potential for AI to migrate towards the edge as well (edge of the network rather than edge devices). It is estimated that by 2025, 75% of enterprise-generated data will be processed at the edge. After adding programmability and data storage, Cloudflare has more recently extended their Workers platform to include support for NVIDIA GPUs and TensorFlow, enabling AI at the edge. It is currently unclear how this will play out, but it is another area where a significant amount of AI may be implemented without the involvement of platforms like C3.ai.

IoT Vendors

The IoT landscape is relatively immature and its final form will depend on how technology evolves going forward. A number of hardware companies are developing low-power processors to enable on-device machine learning. In the case of neuromorphic computing, this would include training as well as inference. Neuromorphic chips implement AI using the configuration of the hardware rather than software. As such, AI implemented on-device using neuromorphic chips would likely occur separately from a platform like C3.ai. While it is unclear at this stage how widely neuromorphic chips will be adopted, and they are likely to be applicable only in a subset of use cases, they represent another threat to the size of C3.ai's addressable market.

Conclusion

The combination of a rapidly evolving AI landscape and C3.ai's unclear competitive position makes predicting the company's future difficult. I would not be surprised to see C3.ai develop into a leading cloud software company with a market capitalization many times its current value. I also would not be surprised if few people remember C3.ai in a decade’s time. This makes investing in C3.ai a particularly speculative undertaking.