Startup Snowflake Launches Cloud-Based Data Warehouse Service on AWS

Snowflake Computing, a startup headed by former Microsoft executive Bob Muglia, has launched a cloud-based data warehouse service on Amazon Web Services Inc. (AWS) that it claims will significantly expand the capabilities of traditional analytics platforms.
Muglia described the Snowflake Elastic Data Warehouse as a Big Data platform that was built from scratch and specifically designed to run in the public cloud, the latest entrant in the data warehousing market.
Muglia joined Snowflake in 2014; the company itself was founded in 2012 by a team that included former Oracle database architects. Altimeter Capital led a $45 million Series C investment in the company, bringing its total funding to $71 million.
Snowflake claims its offering differs from other high-performance data warehouse technology because it was built from scratch to run in the public cloud. Officials claim that cloud storage makes the service more flexible and cost-effective. Muglia said Snowflake has encountered many customers who capture machine-generated, semi-structured data in Hadoop clusters but then have to transform that data into a format a traditional warehouse can handle before a business analyst can connect to it with common BI tools such as Excel or Tableau.
Muglia said Snowflake can load semi-structured data directly, eliminating those transformation steps; business analysts can immediately run queries against the data with the tools they already know, a significant savings in time and complexity. Traditional warehouses are also subject to data loss during transformation, with some data simply unavailable because it was never transformed. In addition, traditional warehouses can hold only a small portion of the data. One customer had only one week of data loaded into its data warehouse; with Snowflake, it could store all of its historical data and run queries on data as old as three months.
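The transformation step Snowflake claims to eliminate — flattening semi-structured records into the fixed columns a traditional warehouse expects — can be sketched as follows. This is an illustrative example only (the field names and record shape are hypothetical), not Snowflake's or Hadoop's actual tooling:

```python
import json

def flatten(record, prefix=""):
    """Recursively flatten a nested JSON-style record into a single
    flat dict -- the kind of fixed-column row a traditional warehouse
    schema expects. Nested keys are joined with dots."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat

# A machine-generated, semi-structured event (hypothetical shape).
event = json.loads("""
{
  "device": {"id": "sensor-7", "firmware": "1.2"},
  "reading": {"temp_c": 21.5, "humidity": 0.43},
  "ts": "2015-06-23T10:00:00Z"
}
""")

# Produces flat columns such as "device.id" and "reading.temp_c",
# ready to load into a rigid relational schema.
row = flatten(event)
print(row)
```

Snowflake's pitch is that this preprocessing pipeline can be skipped entirely: the service ingests the semi-structured records as-is, and analysts query them directly.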
Muglia said that is not a problem, thanks to the effectively unlimited storage available in Snowflake's back-end repository, Amazon S3. Snowflake customers are already keeping hundreds of terabytes of data in S3, and Muglia said the service can easily scale to multiple petabytes.
Muglia said there is no limit on how much data you can store, though a query that required scanning a full petabyte of data would be expensive and time-consuming. While the laws of physics still apply, the key is that the query processor can prune the data based on how it is stored and the information gathered about it. If you had five years' worth of data, say 5 petabytes, and wanted to query any one week of it, the service could select just the data it needed. If that week amounted to only half a terabyte, the query could chew through it and return results fairly quickly, even with 5 petabytes stored.
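The pruning Muglia describes — skipping storage units whose recorded metadata shows they cannot match the query — can be illustrated with a minimal zone-map-style sketch. This is a simplified model under assumed structures, not Snowflake's actual implementation:

```python
from datetime import date

# Each storage partition carries min/max metadata about the dates it
# holds; the planner consults only this metadata, never the row data,
# when deciding which partitions to scan.
partitions = [
    {"name": "p1", "min_date": date(2015, 1, 1),  "max_date": date(2015, 1, 7)},
    {"name": "p2", "min_date": date(2015, 1, 8),  "max_date": date(2015, 1, 14)},
    {"name": "p3", "min_date": date(2015, 1, 15), "max_date": date(2015, 1, 21)},
]

def prune(parts, start, end):
    """Keep only partitions whose [min_date, max_date] range overlaps
    the queried interval; every other partition is skipped unread."""
    return [p for p in parts
            if p["max_date"] >= start and p["min_date"] <= end]

# Query one week of data: only the overlapping partition is scanned.
to_scan = prune(partitions, date(2015, 1, 8), date(2015, 1, 14))
print([p["name"] for p in to_scan])  # only "p2" survives pruning
```

Scaled up, the same principle is what lets a week-sized query against a multi-petabyte store touch only the half-terabyte that matters.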
Asked whether Snowflake would use AWS's Glacier archive to store data for customers with petabytes of it, Muglia said S3 is probably the most cost-effective place to keep the data already: it is far cheaper than enterprise-class storage, and costs much less than storing the data on compute nodes, which is what you would do with Hadoop.