Starburst and Presto: with Stellar Velocity
For a while now, we kept hearing the name Presto. As we spoke to engineering communities at Facebook, Uber, Airbnb, Twitter, Comcast, and Nike, this open-source project always came up. All of the tech cognoscenti seemed to be using Presto. When something reaches this kind of adoption, we know we have to dig deeper, so we busied ourselves with research to learn more.
Back in the early years of this decade, the conventional wisdom was that Hadoop would become the “data lake” for all data-intensive businesses. As the name implies, a data lake is a large repository of data, but, in order to derive business insights from it, one has to ask questions of that data. Often, these are recursive questions. The term of art for that process of asking questions is “interactive queries.” The most common language used for these queries today is SQL. Hadoop’s native SQL interactive query solution is an open-source project called Hive. Not surprisingly, Facebook has one of the largest deployments of Hadoop in the world. The problem for Facebook back in 2011 was that Hive was dreadfully slow. They wanted to be able to run a much larger number of queries much more quickly. So, three brilliant Facebook engineers set out to develop a new, high-performance, distributed SQL query engine for big data. These engineers were Dain Sundstrom, Martin Traverso, and Dave Phillips, and in 2012 Presto was born.
While Presto was developed for querying Hadoop, the three developers built it to be independent of the data lake or the data store that it ran on. This meant that Presto was performant not just on Hadoop, but on a variety of data sources including AWS S3, Postgres, MySQL, Cassandra, Kafka, MongoDB, and many others. In 2013, Presto was open-sourced and it quickly gained adoption. In fact, by 2014, Netflix disclosed that it was using Presto on 10 petabytes of data stored in S3.
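To make the idea of an engine that is independent of its data store more concrete, here is a minimal toy sketch in Python. It is not Presto's code or API: the `Connector` interface, `InMemoryConnector`, `ToyEngine`, and the catalog and table names are all hypothetical, invented for illustration. The point is only that the engine talks to every source through one narrow interface, so a single query can combine data that lives in different systems.

```python
from typing import Dict, Iterable, List, Protocol

Row = Dict[str, object]


class Connector(Protocol):
    """Hypothetical interface: anything that can yield rows for a table name."""

    def scan(self, table: str) -> Iterable[Row]: ...


class InMemoryConnector:
    """Stand-in for a real source such as S3 or MySQL (illustrative only)."""

    def __init__(self, tables: Dict[str, List[Row]]) -> None:
        self._tables = tables

    def scan(self, table: str) -> Iterable[Row]:
        return iter(self._tables[table])


class ToyEngine:
    """Toy query engine that only knows about connectors, not storage systems."""

    def __init__(self) -> None:
        self._catalogs: Dict[str, Connector] = {}

    def register(self, catalog: str, connector: Connector) -> None:
        self._catalogs[catalog] = connector

    def join(self, left: str, right: str, key: str) -> List[Row]:
        """Hash-join two 'catalog.table' names on a shared key column."""
        left_catalog, left_table = left.split(".", 1)
        right_catalog, right_table = right.split(".", 1)

        # Build a lookup from key value to matching rows on the right side.
        index: Dict[object, List[Row]] = {}
        for row in self._catalogs[right_catalog].scan(right_table):
            index.setdefault(row[key], []).append(row)

        # Stream the left side and emit one merged row per match.
        joined: List[Row] = []
        for row in self._catalogs[left_catalog].scan(left_table):
            for match in index.get(row[key], []):
                joined.append({**match, **row})
        return joined


engine = ToyEngine()
engine.register("lake", InMemoryConnector({
    "clicks": [{"user_id": 1, "url": "/home"}, {"user_id": 2, "url": "/cart"}],
}))
engine.register("db", InMemoryConnector({
    "users": [{"user_id": 1, "name": "Ada"}],
}))

result = engine.join("lake.clicks", "db.users", "user_id")
print(result)
```

Adding a new data source in this sketch means writing one more class with a `scan` method; the engine itself does not change.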
As successful open-source projects do, Presto continued to percolate and find eager adopters among the best and brightest in the industry. Its reputation as the top query engine, as well as that of its three authors, continued to grow quietly but steadily over the next half decade.
Enter Starburst
Amongst the early enthusiasts of Presto was a team of entrepreneurs led by Justin Borgman. Justin had sold his first company, Hadapt, to Teradata. The acquisition was an ill-fated attempt by Teradata to breathe big data life into its legacy data warehouse. Justin and his team were trying to find a new mission inside of Teradata when they stumbled upon Presto and quickly realized its potential. First, within the confines of Teradata, the group began to commercialize Presto by adding enterprise features to a code base that had been designed principally for use inside Facebook. After encouraging early success, Justin and his team of leading Presto developers decided that they would hang out their own shingle again, and Starburst Data was born.
Open Source Challenges
At Index, we have invested in many successful open-source companies - MySQL, Elastic, Confluent, Hortonworks and Kong to name a few. It has not always been smooth sailing - we’ve learned a lot of dos and don’ts about these open-source businesses. One of the most significant lessons came from the Cloudera-Hortonworks conflict that developed over the years. Cloudera and Hortonworks were companies founded from the same gene pool of talent and common software ancestry. Both companies were founded by the core teams at Yahoo that developed and nurtured Hadoop. Because of a variety of strange circumstances, two companies were formed to commercialize Hadoop - each of which had its own Hadoop software distribution. The software was, in fact, largely similar. Not surprisingly, the two companies became bitter rivals in the market and competed for, while never monopolizing, the moral authority over Hadoop. Worse, the two companies continued to splinter a codebase that deserved to be united.
The end result was that both companies suffered. Wars of words and wars of price eventually dragged down both companies. Cloudera and Hortonworks did merge eventually, but at that point, the damage was done and newer data warehouse technologies had become the “shiny new objects.”
Lessons Learned
The Starburst crew and the Presto trio knew each other well and had great respect for each other. But, despite the enticements from Starburst to join their team, the gravitational pull of Facebook kept Dain, Martin, and Dave in Menlo Park developing Presto deep within that organization.
The timing on our end was perfect, as we were looking to invest in a Presto company. A good Index friend, and world-class software architect, Henning Schmiedehausen at Zuora, introduced us to Dain & Co while they were still at Facebook. We tried several times to encourage them to leave Facebook, to no avail. They just didn’t seem too interested in the idea of starting a new company. Meanwhile, Justin kept plugging away - nearly bootstrapped - at building a book of business for Presto. In fact, Starburst’s customer logo slide was starting to look increasingly impressive.
Much as we were tempted to pursue an investment in Starburst, we also knew that without the core founders of the Presto project all in one company, we risked a replay of the Cloudera-Hortonworks rivalry. So, the status quo held for months.
Power of Networks
One day, early this summer, one of our rockstar founder-CEOs, Jay Kreps of Confluent, texted to let us know that Dain, Martin, and Dave were finally seriously considering joining forces with Starburst. Jay had huge respect for the trio and said something along the lines of, “you should invest in anything they do.” These were meaningful words from a person of Jay’s caliber.
So, we went about trying to help bring about a successful union. The task at hand was to have the Presto founders join forces with the Starburst founders. Our great fortune was that Justin and his team are immensely wise and able to put aside egos and short-term personal gain. We were excited when they came to terms with Dain, Martin, and Dave. The end result was a reborn Starburst - a company made up of the entrepreneurs who seized the commercial opportunity of Presto and the genius founders who invented it in the first place.
Presto Forward
Fast forward to 2019, and Presto’s momentum is clear and present. The software is already used in prominent companies like Facebook, Airbnb, Netflix, Uber, Twitter, Atlassian, Nasdaq, Nike, Capital One, Comcast, and many more. At Facebook, Presto runs 30,000 interactive analytic queries a day, which together process one petabyte of data daily against a 300PB data warehouse. And, it can run on both relational and non-relational data stores. In a world where businesses of all types are finding competitive advantage through the intelligent use of data, Presto is an extraordinary asset.
However, software alone does not become a business without great entrepreneurship. Justin and the exceptional team of engineers he has brought with him are on their second rodeo together and this time they have some serious raw materials to work with. They are joined by a small but powerful team that they have cultivated over the last two years. Dain, Martin, and Dave have finally landed in a venture that will spread the gospel of Presto to the world.
We, at Index, are privileged to be their partners in this adventure. Starburst is in its early days. As with any other company, we expect that there will be many highs and lows, but we are confident that we have invested in one of the great data companies of our time.
Additional Reading:
TechCrunch: Starburst raises $22M to modernize data analytics with Presto
Published — Nov. 20, 2019