Insights from the Girard School of Business and the School of Science and Engineering
Many modern businesses have mastered the art of collecting data. Their focus is now on putting that data to use in support of business strategy and goals. Many find themselves drowning in data, much of it held in vast data lakes. Putting that data to use ranks among the chief goals of many professionals who work in data science and business analytics, and companies need their expertise to make use of it.
Business leaders understand the value of data lakes. A 2018 Eckerson Group survey of 500 business users across many industries found that while the value of data lakes is evident, they must be coupled with expertise, analytics, and business intelligence tools.
What Is a Data Lake?
A data lake serves as a massive repository for all data an organization gathers. This includes structured and unstructured data. In other words, data goes into a data lake “as is.”
Having enormous amounts of raw data is an asset. In theory, results from the analysis of this data could produce better business decisions. And businesses that extract value from data lakes “will outperform peers,” according to Amazon Web Services (AWS).
AWS cites a 2017 study from Aberdeen showing that businesses that implemented data lakes outperformed similar companies by 9% in organic revenue growth.
However, there are issues in transforming this raw data into useful information.
Challenges With Data Lakes
A data lake should not be confused with a data warehouse. Data in a warehouse is collected using a pre-set structure and schema. This data is usually related to business operations, such as information collected from transactional systems.
A data lake imposes no structure up front. It often contains information not found anywhere else, and much of that data is not directly related to business operations. This includes emails, chat logs, images and videos, social media content, and data from the Internet of Things.
Applying structure to that amount of data takes time and effort, but it’s necessary to produce trustworthy information that even people without coding skills can understand.
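As a rough illustration of what applying structure can look like, the sketch below maps a few raw items, stored "as is," onto one common record shape. The sample items, the field names, and the Record type are all hypothetical and chosen only for this example; a real data lake would involve far larger volumes and dedicated tooling.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """A common shape applied to raw items when they are read (hypothetical)."""
    source: str
    author: Optional[str]
    text: str


# Hypothetical raw items, stored "as is" with inconsistent field names.
RAW_ITEMS = [
    '{"type": "email", "from": "ana@example.com", "body": "Invoice attached"}',
    '{"type": "chat", "user": "ben", "message": "Is the order shipped?"}',
    '{"type": "sensor", "device_id": "t-17", "reading": 21.5}',
]


def to_record(raw: str) -> Record:
    """Map one raw JSON item onto the common Record shape."""
    item = json.loads(raw)
    kind = item.get("type", "unknown")
    if kind == "email":
        return Record("email", item.get("from"), item.get("body", ""))
    if kind == "chat":
        return Record("chat", item.get("user"), item.get("message", ""))
    # Fallback: keep the remaining fields as text so nothing is lost.
    rest = {k: v for k, v in item.items() if k != "type"}
    return Record(kind, None, json.dumps(rest, sort_keys=True))


records = [to_record(raw) for raw in RAW_ITEMS]
for record in records:
    print(record)
```

Once every item shares the same fields, a reporting or dashboard tool can filter and count records without the user writing any parsing code.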
Possible Solutions for Data Lakes
As the Eckerson survey shows, applying the right analytics tools to data lakes helps users extract value. In that survey, 61% of respondents said the right analytics tools allowed those without coding skills to “author and edit reports and dashboards.”
Data scientists and people with high-level degrees in business analytics can help create these systems. They can also provide guidance on another key issue involving data lakes: data governance.
Organizations often have no high-level strategy in place for how to use data. This can lead to people outside of the IT department accessing data for different uses with no oversight (a situation known as “Shadow IT”). Data governance sets sound policies on how data is collected and used.
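One way to picture a governance policy is as an explicit rule table that every data request is checked against. The sketch below is a simplified assumption, not a description of any particular governance product: the dataset names, roles, and purposes are invented for illustration.

```python
# Hypothetical policy: which roles may use which datasets, and for what purpose.
POLICY = {
    "customer_emails": {
        "roles": {"support", "compliance"},
        "purposes": {"service"},
    },
    "sensor_readings": {
        "roles": {"analytics", "operations"},
        "purposes": {"reporting", "maintenance"},
    },
}


def is_allowed(role: str, dataset: str, purpose: str) -> bool:
    """Return True only if the policy explicitly permits this use."""
    rule = POLICY.get(dataset)
    if rule is None:
        return False  # Deny by default: datasets without a rule are off limits.
    return role in rule["roles"] and purpose in rule["purposes"]


print(is_allowed("analytics", "sensor_readings", "reporting"))
print(is_allowed("marketing", "customer_emails", "service"))
```

Denying any request that has no matching rule is a deliberate choice here: it means new datasets cannot be used until someone decides who may use them and why.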
Another option is to create machine learning programs that find value in vast data lakes. If programmed with the ability to search for trends and relationships between diverse sets of data, machines can work much faster than humans in extracting value from data lakes.
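To make the idea of searching for relationships between data sets concrete, here is a minimal sketch that scans every pair of numeric series and reports the strongly correlated ones. It uses a plain Pearson correlation rather than a trained machine learning model, which is a much simpler stand-in, and the series names, values, and 0.9 threshold are all made up for illustration.

```python
from itertools import combinations
from math import sqrt

# Hypothetical weekly figures pulled from different parts of a data lake.
DATA = {
    "web_visits": [120, 150, 170, 200, 230],
    "support_chats": [30, 28, 35, 31, 29],
    "orders": [12, 15, 18, 20, 24],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def strongest_pairs(data, threshold=0.9):
    """Check every pair of series and keep those with |r| >= threshold."""
    found = []
    for a, b in combinations(data, 2):
        r = pearson(data[a], data[b])
        if abs(r) >= threshold:
            found.append((a, b, round(r, 3)))
    # Strongest relationships first.
    return sorted(found, key=lambda pair: -abs(pair[2]))


for a, b, r in strongest_pairs(DATA):
    print(f"{a} and {b} move together (r = {r})")
```

A program like this can check thousands of pairs in the time a person would take to chart one, though a strong correlation is only a lead for an analyst to investigate, not proof that one thing causes the other.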
Data lakes present both opportunities and challenges for businesses. Data scientists and business analytics professionals are central to solving those challenges and capitalizing on the opportunities.