Guest post by Bahul Jain, a colleague of mine, and a very talented software engineer. Bahul is a Scala enthusiast and a passionate advocate for functional programming. He is our in-house shapeless guru.
In the previous posts, we discussed how types are essential to us, and how they can be utilized without worrying about performance implications. Now let’s discuss a concept that compilers can leverage to provide a robust type system, improve type safety, ability to compose and model complex data and boost developer productivity. Of course, Scala offers this feature out-of-the-box.
Algebraic Data Types (ADT)
It may sound intimidating or highly mathematical, but it’s a simple concept. In numeric algebra, a variable is represented by either the sum or the product of other variables. Types in programming languages borrow the same concept. Let’s break this down.
Sum Type (OR)
Sum types define a value that can be one of many different yet finite set of types.
type Dog
type Cat
type Rabbit
// Animal type can be represented by either Dog OR Cat OR Rabbit
type Animal = Dog | Cat | Rabbit
In Scala 2, a sum type can be defined using a sealed trait
or a sealed abstract class
.
sealed trait Animal
final case object Dog extends Animal
final case object Cat extends Animal
final case object Rabbit extends Animal
The sealed
keyword ensures that all subclasses of the Animal
trait must be defined within the same file. This restriction allows the compiler to treat Animal
as a true sum type, as it has complete knowledge of all possible sub-types. In contrast, a non-sealed trait can have sub-classes scattered across multiple files or libraries, making it difficult for the compiler and developers to determine the complete set of types that constitute the Animal
type.
Product Type (AND)
Product
types are defined by doing a product of many different yet finite set of types.
type FirstName
type LastName
type BirthDate
type Gender
type Person = FirstName & LastName & BirthDate & Gender
A Person
type is defined using FirstName
AND LastName
AND BirthDate
AND Gender
.
Notice that the Gender
type can be represented as either Male
type OR Female
type OR something else. Similarly, BirthDate
can be defined using the Day
type AND Month
type AND Year
type. So you can have an ADT (e.g., Person
) composed using SUM or PRODUCT of other ADTs (such as Gender
and BirthDate
).
In Scala 2, you can define a product type using a case class
.
final case class Person(
firstName: String,
lastName: String,
birthDate: BirthDate,
gender: Gender
)
final case class BirthDate(day: Int, month: Int, year: Int)
The final
keyword ensures that subclasses cannot be created for the Person
or BirthDate
classes. While a case class
by itself can represent a Product
type, it does allow for unintended subclassing, which can cause confusion for both the developer and compiler when defining rules. Defining a final case class
allows for more aggressive optimizations on the JVM.
When deconstructing a case class
using pattern matching, the compiler knows exactly the constituent types of this product type and the order in which they are defined, thereby eliminating any guesswork on the developer’s part and ensuring a robust and productive development experience.
Value Type
These types usually represent a constant or a singular value such as Integer
, String
, Boolean
, Double
, etc. Essentially, they are types that stand on their own and not as a composition of other types.
Scala 2 supports value types using case object
, primitive types (Int, Boolean, etc.), or opaque types.
// value type defined as a case object
final case object Male extends Gender
// value type defined as a primitive class
Int, String, Boolean, etc.
// value type defined using AnyVal (opaque type)
final case class FirstName(value: String) extends AnyVal
// NewType using supertagged library (opaque type)
object FirstName extends supertagged.NewType[String]
type FirstName = FirstName.Type
Why use Algebraic Data Types in compilers?
Better Data Modeling Ability
As mentioned above, with ADTs, you can compose various data types to model complex domain data.
Prevent Illegal State Representation
With ADTs, one can only combine a fixed set of types using a fixed set of operations to derive the expected type.
type BirthDate = Day (Int) & Month (Int) & Year (Int)
In the above example, a BirthDate
value has to be composed using (Int, Int, and Int). If you use (String, String, and String), the compiler will complain. It’s a simple and obvious yet crucial benefit of ADTs.
Exhaustive Compiler Checking
When decomposing into its various constituent types, the compiler ensures all cases are handled.
type ForgetfulBoolean = Yes | No | DoNotKnow
def toBoolean(b: ForgetfulBoolean): Option[Boolean] =
b match {
case ForgetfulBoolean.Yes => Some(true)
case ForgetfulBoolean.No => Some(false)
case ForgetfulBoolean.DoNotKnow => None
}
Suppose we forget to add the case match for DoNotKnow
in the above example. The compiler will complain that the pattern matching is not exhaustive. Furthermore, it can also suggest which case has not been handled because it knows all possible types ForgetfulBoolean
can be (since it’s a sum type).
Improved Refactoring Support
With the compiler’s precise knowledge of type definitions, it becomes significantly easier to identify issues when the model undergoes refactoring. This capability can save developers time substantially, especially when working with large codebases. It eliminates the need to manually trace where changes might disrupt business logic. Moreover, since these checks are enforced at compile time, the likelihood of encountering runtime errors is greatly reduced.
Simplicity
ADTs are all about data, not functionality. This means they are lightweight and don’t require many dependencies. Their usage is fixed towards domain data modeling, nothing more.
Integration with Functional Programming
Every language may not want this, but noting this benefit of ADTs is worthwhile.
Functional programming favors pure functions that always return the same output for the same input and have no side effects. ADTs make this easy by:
- Capturing state in finite, known types
- Forcing handling of all possible cases (exhaustive matching)
Functional programming also emphasizes expressing programs through data, precisely using ADTs. Its compositional abilities align with mathematical set theory (hence “algebraic”) and lets you model complex domains cleanly.
Conclusion
Algebraic Data Types (ADTs) are a cornerstone of robust type systems, enabling developers to model complex domains precisely and clearly. By leveraging the compositional nature of sum and product types, ADTs provide a powerful mechanism for simplifying data modeling. Their integration with functional programming principles further enhances their utility, making them an indispensable tool for building reliable and maintainable software. Embracing ADTs improves code quality and boosts developer productivity by catching errors early in the development cycle. As we continue exploring advanced type system features, ADTs serve as a solid foundation for understanding and utilizing the full potential of modern programming languages.
In the next post, we will explore some cool features, such as type classes and ad-hoc polymorphism. These features will boost the type safety of your language multifold and allow you to reimagine coding in a whole new way. With these tools in our belt, we can further push the limits of what a compiler can accomplish in achieving type safety, improving developer productivity, and even deriving code, thereby eliminating boilerplate code.
Hero image courtesy by Steve Johnson