Table schemas in Spark data pipelines: How to handle large, nested & growing ones
In this post, we describe how we built a pipeline for this kind of “incoming data” situation and how we eventually arrived at a good solution.
Hadoop legacy
Hadoop vs data lake implementation problems.
Navigating data lakes using Atlas
Lineage of data in Apache Atlas
Diving in the data lake
The rapid growth of unstructured data is a serious business challenge for organizations.