Peapod is a new dependency and data pipeline management framework for Spark and Scala. The goals is to provide a framework that is simple to use, automatically saves/loads the output of tasks, and provides support for versioning.Read More
I feel terrible dragging LadyDi into another controversy. But this is not so much a poor reflection on her as it is me. You see, I have spoken endlessly about the merits of Scala and Functional Programming. For a time, I was even the most viewed writer on Scala on Quora because I wrote so much on the topic!
Unfortunately I have come to realize I have been a lot of talk and very little action. Whoops.
Recently, I have come to take a good hard look at my code as well as libraries I have released into the open source community- and you know what? Most of it is not functional. In most places, I just use (abuse really) the functional properties somewhat enforced in Scala to make my OOP more stable. But that's really just about it. In reality I have written very little functional code, and no strictly functional program. As evident from my passionate remarks on the topic, I am obviously really excited about FP, and even understand how to go about it; but somehow, when push comes to shove, I remain stuck in my un-functional ways.
Upon speaking to other fellow engineers at work and elsewhere, I have come to see that there are many strong, experienced, creative, innovative engineers who have trouble picking up Scala as a functional programming language. But I had trouble understanding why it was so, considering that many concepts in Functional Programming have very long, deep roots in the history of computer science was well as analytical philosophy.
Let's take a look at one of the most critical concepts in Functional Programming: referential transparency. Its origins date back to Bertrand Russell's Principia Mathematica. Sure, at first glance you may dismiss this as just another old part of "mathematica", but William Van Orman Quine adopted this concept in his Word and Object published in 1960 where he coined the term "referential transparency".
RT was then thrusted into the spotlight of computer programming just 7 years later in Christopher Strachey's monumental piece Fundamental Concepts in Programming Languages (1967).
And that's just RT. I have not and will not even begin to talk of lambda calculus - doing so would derail this post completely.
So let me reel this post back in. If FP has such footing in both Mathematics and Computer Science, why do most engineers have such a hard time adopting it in how they think about code?
Well, to start, all the above are mainly theoretical concepts. Software Engineering is by nature an applied field so most software engineers are actually employed in industry rather than academia. And so most folks usually don't go beyond a Bachelors or Masters degree (which is a lot given that many of the most dominant and influential figures in the Tech Industry are college dropouts). A core part of almost any undergraduate CS curriculum is a subject typically titled "Data Structures and Algorithms". It is quite the fixture actually. Many of the most prestigious employers in technology (Google, FB, Amazon, ... all the way to Slack, etc) outright inform candidates - regardless of seniority - to expect some questions directly from "Data Structures and Algorithms". In fact Google is notorious for turning renown experts in fields it was hiring for because they failed the DSA part of this rigid interview process.
So what does a typical DSA course consist of and how is this pertinent to our topic here?
It's actually most made up of search and sorting algorithms, problem-solving strategies, and space and time complexity problems. Here is the root of our headache. It is not really the content of the course, but how fundamental the course has become to landing a job in industry and having a career.
Let's take just two minor topics that feature prominently in any DSA course:
- hash tables
- the QuickSort algorithm (or MergeSort or any D&C algorithm really)
From a functional outlook, any change to a hash table would require creating a copy of the original hash table. This is obviously very inefficient and fundamentally goes against the underlying premise of a course like DSA - solve problems and efficiently so, even if you are going to trade-off reusability and safety. As far as sorting is considered, again we are faced with the matter of efficiency - as in place modifications are axiomatically impossible from a functional perspective.
So from a very impressionable point in our time as software engineers we are repeatedly told to think about programs almost strictly in terms of efficiency gains and trade-offs; so much so that "good code" in some places has become synonymous with one that provides "efficient" solutions. We are not given the chance to think, and even meditate on other avenues that contribute to writing "good" code. Functional programming is almost never held in the same regard as imperative programming in most CS programs.
So yeah, it's hard to code functionally, and even more importantly, to think functionally when the dominant culture is one that forbids a multi-faceted and a rich discussion on what is important to solving problems.