AWS Lake Formation simplifies, automates Data Lakes for Analytics

Amazon Web Services Inc. (AWS) has made AWS Lake Formation available to all organizations, simplifying and automating the creation and management of data lakes.
Data lakes are part of the Big Data analytics movement. They allow you to store data in a variety of formats and types, both structured and unstructured. This allows you to use them for business-driven analytics that is increasingly supported by machine learning.
However, creating and managing a data lake usually involves many manual steps. AWS Lake Formation is designed to handle tasks such as cleaning and cataloging data while also making it available for analytics.
AWS stated in a press release that AWS Lake Formation “significantly simplifies the process and removes all the heavy lifting involved in setting up a data lake.” AWS Lake Formation automates manual and time-consuming steps such as provisioning and configuring storage, crawling the data to extract schema and metadata tags, optimizing the partitioning of data, and transforming data into formats like Apache Parquet or ORC that are well suited to analytics. AWS Lake Formation also uses machine learning to remove duplicates and improve data quality and consistency.
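To make the automated steps more concrete, here is a minimal sketch in plain Python of three of them done by hand: inferring a schema from raw records, dropping duplicates, and grouping rows under partition paths. It is an illustration only, not AWS Lake Formation's API. The sample data, column names, function names, and type labels (bigint, double, string) are all hypothetical, and the exact-match deduplication is a far simpler stand-in for the machine-learning-based matching the service is described as using.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data; the column names are made up for illustration.
RAW_CSV = """order_id,region,order_date,amount
1,us-east,2019-08-01,19.99
2,eu-west,2019-08-01,5.00
1,us-east,2019-08-01,19.99
3,us-east,2019-08-02,42.50
"""


def infer_type(values):
    """Return the narrowest type label that fits every value in a column."""
    for type_name, caster in (("bigint", int), ("double", float)):
        try:
            for value in values:
                caster(value)
            return type_name
        except ValueError:
            continue
    return "string"


def infer_schema(rows):
    """Map each column name to an inferred type label."""
    columns = rows[0].keys()
    return {col: infer_type([row[col] for row in rows]) for col in columns}


def deduplicate(rows):
    """Drop exact duplicate rows while preserving first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.items())
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def partition_by(rows, *keys):
    """Group rows under Hive-style partition paths such as region=us-east."""
    partitions = defaultdict(list)
    for row in rows:
        path = "/".join(f"{key}={row[key]}" for key in keys)
        partitions[path].append(row)
    return dict(partitions)


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
schema = infer_schema(rows)
unique_rows = deduplicate(rows)
partitions = partition_by(unique_rows, "region", "order_date")
```

Even this toy version has to decide how to type each column, what counts as a duplicate, and which columns to partition on. These are the kinds of choices the service is meant to take off the user's hands.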
AWS Lake Formation (source: AWS)
The new service, which had been available in preview, can be used with many other AWS services for analytics or other tasks. It uses Amazon S3 buckets for storage and works with, for example, Amazon Redshift (data warehouse), Amazon Athena (“serverless interactive query service”) and AWS Glue (“extract, transform and load [ETL] service”). Support for Apache Spark analytics with Amazon EMR and for Amazon QuickSight is expected over the next few months.
AWS Lake Formation is available in the US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), and Asia Pacific (Tokyo) regions. There are no additional charges for AWS Lake Formation.
A blog post explains how to set up a data lake using the new service. More information can be found on the “Data Lakes and Analytics on AWS” site, in the “What Is a Data Lake?” article, in a “Data Lake Foundation on AWS” quick start, and on the AWS Lake Formation website, which includes a FAQ.