Mindful Machines Original Series, Big Data: Batch Storage

S3? HDFS? Druid? Cassandra? MySQL? How do they and others compare for storing your batch data? Find out in this first part of the Mindful Machines series on Big Data.

Peapod: A Scala and Spark Data Pipeline and Dependency Manager

Peapod is a new dependency and data pipeline management framework for Spark and Scala. The goal is to provide a framework that is simple to use, automatically saves and loads the output of tasks, and supports versioning.

Data Pipeline and Task Management: The Unsolvable Problem?

There are probably more well-known data pipeline dependency management and scheduling frameworks than you can name in one breath. Is there a reason for that beyond mere not-invented-here syndrome?

In the Spirit of Thanksgiving

We don't take ourselves too seriously, but we are two curious folks passionate about applying Machine Learning and Deep Learning to industry and sharing that knowledge with the broader engineering community.