Investors should look carefully at Cloudera's long-term prospects and take a good look at what competitors large and small have to offer.

Cloudera had a $4.1 billion valuation in 2014 after Intel invested $740 million in the company. It's worth questioning whether it is worth as much today.

Cloudera, a tech company best known for its products built around the open source big data crunching software Hadoop, filed for a $200 million IPO on March 31, 2017.

Cloudera (Pending:CLDR), which creates software and tools to help companies process and perform analytics on big data, filed its long-awaited IPO last Friday. In its S-1 prospectus summary, Cloudera calls itself a "leading modern platform for data management, machine learning and advanced analytics". In press releases issued in and before December 2016, Cloudera identified itself as a "global provider of the fastest, easiest, and most secure data management and analytics platform built on the latest open source technologies." In 2015, Cloudera described itself as "the leader in enterprise analytic data management powered by Apache Hadoop."

(Hadoop is short for Apache Hadoop, an open source data processing framework; it is housed at the Apache Software Foundation and uses simple programming models to distribute processing of large data sets across clusters of commodity computers. The Apache License allows anyone to freely use, modify, and distribute any Apache licensed product.)

Bait and Switch?

The disparate ways in which Cloudera has identified itself could be dismissed as a matter of semantics, but companies generally take great care in how they label themselves, especially around an IPO.

So it's worth noting that Cloudera no longer wants, or can no longer afford, to self-identify with "Hadoop". For a long time, the company made a big deal of the fact that Hadoop co-founder Doug Cutting was among its employees, and it claimed that its engineers had contributed more code to Apache Hadoop related projects than those of its competitors Hortonworks (NASDAQ:HDP) and MapR.

There is a reason for this, of course. It could be that the hype around Hadoop hit its peak in 2014. At that time, analysts had predicted a $50.2 billion valuation for the market by 2020. Hortonworks went public that year; its shares soared to $26.48 after opening at $16, giving it a unicorn-like status.

Cloudera, which had just received a $900M influx of cash ($740M of it from Intel (NASDAQ:INTC)), didn't need to raise funds at the time, missing the opportunity to pull the IPO trigger when public investors, who believed that big-data-crusher Hadoop would be a big money maker, were eager.

They are not as eager anymore. Hortonworks' stock has nose-dived since then, closing at $10.77 yesterday.

So in selling investors on its IPO, Cloudera has little choice but to look at what else it can offer and rebrand.

The company is actively ridding itself of its identity as a Hadoop vendor. Last month, just before announcing its IPO plans, it changed the name of the user conference it co-sponsors with publisher O'Reilly from Strata+Hadoop World to Strata Data and added machine learning, among other topics, to the agenda.

Smoke and Mirrors?

Cloudera also dropped the term "open source" from its self-description, though the term appears more than 190 times in its S-1, which clearly states that the company's platform leverages 26 different open source projects.

In the S-1, Cloudera writes, "we committed to the open source community. We create and contribute to projects and we work with the global Apache Software community".

It also warns potential investors of the inherent risks involved in building and betting on a business based on open source:

"If the open source data management committees and contributors fail to adequately further develop and enhance open source technologies … then we would have to rely on other parties, or we would need to expend additional resources, to develop and enhance our platform." "Our solutions depend upon the successful operation of open source software in conjunction with our solutions; any undetected errors or defects in this open source software could prevent the deployment or impair the functionality of our solutions."

Cloudera also states that if something goes wrong with one of these open source projects, its engineers may not have the expertise to fix it.

Competition: Apache Hadoop Platform Providers

Cloudera's closest competitor before its January machine-learning pivot was Hortonworks.

Hortonworks went public in 2014 when the market and customers were high on Hadoop. At that time, the Hadoop market was expected to rise to a staggering $50.2 billion valuation by 2020, so it's no wonder that Hortonworks' shares soared to $26.48 after opening at $16.

Hortonworks' stock closed at $10.77 yesterday.

Cloudera claims $261 million in revenue for 2016, outpacing Hortonworks, which brought in $184.5 million. But the companies aren't of equal age. Cloudera was founded in 2008; Hortonworks was founded in 2011. Cloudera has therefore had a three-year head start in building not only its pipeline but also its revenue stream, since it sells its products by subscription.

When it comes to market valuation, Hortonworks trades at 3.3x its full-year 2016 revenue of $184.5M. Applying the same multiple to Cloudera's 2016 revenue would imply a valuation of approximately $866M.

For Cloudera to justify the $4.1 billion valuation it had in 2014, it would need a trailing price-to-revenue multiple of 15.7x.
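The arithmetic behind these two comparisons can be sketched in a few lines of Python. This is only an illustration using the rounded figures quoted above; the function and constant names are made up for the example. With the rounded 3.3x multiple, the implied valuation comes out near $861M, slightly below the approximately $866M cited, a gap that presumably reflects an unrounded multiple.

```python
# Figures quoted in the article, in USD millions.
# All names here are illustrative, not taken from any filing.
CLOUDERA_REVENUE_2016 = 261.0
HORTONWORKS_PRICE_TO_REVENUE = 3.3   # rounded multiple quoted in the article
CLOUDERA_VALUATION_2014 = 4100.0     # the $4.1 billion valuation from 2014


def implied_valuation(revenue: float, multiple: float) -> float:
    """Valuation implied by applying a price-to-revenue multiple to revenue."""
    return revenue * multiple


def required_multiple(valuation: float, revenue: float) -> float:
    """Price-to-revenue multiple needed to support a given valuation."""
    return valuation / revenue


if __name__ == "__main__":
    implied = implied_valuation(CLOUDERA_REVENUE_2016, HORTONWORKS_PRICE_TO_REVENUE)
    needed = required_multiple(CLOUDERA_VALUATION_2014, CLOUDERA_REVENUE_2016)
    print(f"Implied valuation at Hortonworks' multiple: ${implied:,.0f}M")
    print(f"Multiple needed for a $4.1B valuation: {needed:.1f}x")
```

The gap between the two multiples, 3.3x versus roughly 15.7x, is the core of the valuation question raised here.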

Investors might also want to consider that many vendors in the Hadoop ecosystem subscribe to the Open Data Platform Initiative (ODPi), an agreement on what Hadoop is and isn't, and build their products around that definition. Cloudera is not one of them. There is a belief, though it may be anecdotal, that some Hadoop buyers will only want to buy ODPi-compliant solutions, which could rule Cloudera out as a contender and tip the scales toward Hortonworks.

Competition: Apache Spark and Machine Learning Platforms

On the machine learning front, Cloudera would compete with the likes of Alphabet's (NASDAQ:GOOG) (NASDAQ:GOOGL) Google, Microsoft (NASDAQ:MSFT), IBM (NYSE:IBM) and others, which have had armies of researchers working in labs for years. And they have produced consumable products. Microsoft, for example, hosts a machine learning studio on its Azure cloud. Google offers its Cloud Prediction API, which provides a RESTful API to build machine learning models. IBM launched a machine learning product in February.

It's worth noting here that while Cloudera credits itself as an early adopter of, and expert in, an open source technology called Apache Spark, Spark was created by the principals of Databricks, who offer products and services around it. In addition, in 2015, IBM committed 3,500 researchers to Spark development and education. Cloudera doesn't even have that many employees.

Risky Business?

There are a number of risk factors investors should consider:

Why is Cloudera choosing now to IPO? If machine learning is its future, wouldn't it be better to take some time to establish a foothold in that market first?

Do you believe that enterprise technology executives will adopt Hadoop at a more aggressive rate than they have until now?

Does Cloudera know something about Apache Spark and Machine Learning that its larger and better funded competitors don't?

Is Cloudera selling itself as a machine learning expert now vs. later, when its portfolio is more mature, to boost its valuation?

Disclosure: I/we have no positions in any stocks mentioned, and no plans to initiate any positions within the next 72 hours.

I wrote this article myself, and it expresses my own opinions. I am not receiving compensation for it (other than from Seeking Alpha). I have no business relationship with any company whose stock is mentioned in this article.