Wikipedia Data in Apache Spark and Scala (Updated)

More than you possibly ever wanted to know about parsing various Wikipedia data sources in Spark and Scala.

Read More

The Flavors of Data Science and Engineering

Data Science means something different to everyone and is more of a marketing terms than a job description nowadays. That said, certain definition for it and the related disciplines are starting to emerge from what I've seen so I wanted to write down my perceptions.

Read More

ONC Patient Matching Challenge: Part 2

This is the second of a two part series on tackling the ONC Patient Matching Challenge In the first part we went over the background and high level approach while in this part we cover the matching engine that was built (and how to use it).

Read More

ONC Patient Matching Challenge: Part 1

This is the first of a two part series on tackling the ONC Patient Matching Challenge. The first part reviews background and high level topics while the second part covers the matching engine that was built (and how to use it). 

Read More

Wikipedia Data in Spark and Scala

More than you possibly ever wanted to know about parsing various Wikipedia data sources in Spark and Scala.

Read More

Spark EC2 Setup and Workflow

So how do we run and deploy code using Scala and Spark in EC2?

Read More

The Limitations of Apache Spark

... all that's Spark doesn't shine.

Read More

In the Spirit of Thanksgiving

We don't take ourselves seriously but are two curious folks passionate about applying Machine Learning and Deep Learning to industry and sharing that knowledge with the broader engineering community.

Read More