FrankNeff.com

Showing posts from spark

Using Shapeless for Data Cleaning in Apache Spark
  • Frank Neff
  • 02 Mar, 2018
    • spark
    • typelevel
    • shapeless

When importing data into a big data infrastructure such as Hadoop, Apache Spark is one of the most widely used tools for ETL jobs. Because input data (CSV, in this case) often contains invalid values, a data cleaning layer is needed. Most data cleaning tasks are specific to the data at hand and have to be implemented accordingly, but some tasks can be generalized…

All rights reserved – Copyright © by Frank Neff