![Using Shapeless for Data Cleaning in Apache Spark](/images/post/2018-03-02-generic-derivation-for-spark-data-cleaning_huf204ca923e9c13678a0936e032a01eeb_111446_1110x0_resize_q95_box.jpg)
# Using Shapeless for Data Cleaning in Apache Spark
When it comes to importing data into a big data infrastructure like Hadoop, Apache Spark is one of the most widely used tools for ETL jobs. Because input data – in this case CSV – often contains invalid values, a data cleaning layer is needed. Most data cleaning tasks are very specific and therefore need to be implemented depending on your data, but some tasks can be generalized…
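As a minimal sketch of the generalization idea, the following Shapeless-based type class derives a "cleaner" for arbitrary case classes, applying a field-level rule (here: trimming whitespace from strings) to every field. The `Cleaner` type class, its instances, and the `Person` example are hypothetical illustrations, not taken from the article itself:

```scala
import shapeless._

// Type class: knows how to clean a value of type A
trait Cleaner[A] { def clean(a: A): A }

object Cleaner {
  def apply[A](implicit c: Cleaner[A]): Cleaner[A] = c
  def instance[A](f: A => A): Cleaner[A] =
    new Cleaner[A] { def clean(a: A): A = f(a) }

  // Field-level rules: trim strings, leave other primitives untouched
  implicit val stringCleaner: Cleaner[String] = instance(_.trim)
  implicit val intCleaner: Cleaner[Int]       = instance(identity)

  // Structural derivation over HLists
  implicit val hnilCleaner: Cleaner[HNil] = instance(identity)
  implicit def hconsCleaner[H, T <: HList](
      implicit head: Lazy[Cleaner[H]],
      tail: Cleaner[T]): Cleaner[H :: T] =
    instance { case h :: t => head.value.clean(h) :: tail.clean(t) }

  // Generic derivation: map a case class to its HList representation and back
  implicit def genericCleaner[A, R](
      implicit gen: Generic.Aux[A, R],
      repr: Lazy[Cleaner[R]]): Cleaner[A] =
    instance(a => gen.from(repr.value.clean(gen.to(a))))
}

case class Person(name: String, age: Int)
```

With such a derivation in place, a cleaner for any case class whose fields have instances comes for free, e.g. `Cleaner[Person].clean(Person("  Alice ", 30))` yields `Person("Alice", 30)`; in a Spark job the same function could be applied inside a `Dataset[Person].map`.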