A world without types - Part 3

Bahul Jain Bahul Jain
Views
Article hero image

Guest post by Bahul Jain, a colleague of mine, and a very talented software engineer. Bahul is a Scala enthusiast and a passionate advocate for functional programming. He is our in-house shapeless guru.

In the previous posts, we discussed how types are essential to us, and how they can be utilized without worrying about performance implications. Now let’s discuss a concept that compilers can leverage to provide a robust type system, improve type safety, ability to compose and model complex data and boost developer productivity. Of course, Scala offers this feature out-of-the-box.

Algebraic Data Types (ADT)

It may sound intimidating or highly mathematical, but it’s a simple concept. In numeric algebra, a variable is represented by either the sum or the product of other variables. Types in programming languages borrow the same concept. Let’s break this down.

Sum Type (OR)

Sum types define a value that can be one of many different yet finite set of types.

type Dog
type Cat
type Rabbit

// Animal type can be represented by either Dog OR Cat OR Rabbit
type Animal = Dog | Cat | Rabbit

In Scala 2, a sum type can be defined using a sealed trait or a sealed abstract class.

sealed trait Animal
final case object Dog extends Animal
final case object Cat extends Animal
final case object Rabbit extends Animal

The sealed keyword ensures that all subclasses of the Animal trait must be defined within the same file. This restriction allows the compiler to treat Animal as a true sum type, as it has complete knowledge of all possible sub-types. In contrast, a non-sealed trait can have sub-classes scattered across multiple files or libraries, making it difficult for the compiler and developers to determine the complete set of types that constitute the Animal type.

Product Type (AND)

Product types are defined by doing a product of many different yet finite set of types.

type FirstName
type LastName
type BirthDate
type Gender

type Person = FirstName & LastName & BirthDate & Gender

A Person type is defined using FirstName AND LastName AND BirthDate AND Gender.

Notice that the Gender type can be represented as either Male type OR Female type OR something else. Similarly, BirthDate can be defined using the Day type AND Month type AND Year type. So you can have an ADT (e.g., Person) composed using SUM or PRODUCT of other ADTs (such as Gender and BirthDate).

In Scala 2, you can define a product type using a case class.

final case class Person(
  firstName: String,
  lastName: String,
  birthDate: BirthDate,
  gender: Gender
)

final case class BirthDate(day: Int, month: Int, year: Int)

The final keyword ensures that subclasses cannot be created for the Person or BirthDate classes. While a case class by itself can represent a Product type, it does allow for unintended subclassing, which can cause confusion for both the developer and compiler when defining rules. Defining a final case class allows for more aggressive optimizations on the JVM.

When deconstructing a case class using pattern matching, the compiler knows exactly the constituent types of this product type and the order in which they are defined, thereby eliminating any guesswork on the developer’s part and ensuring a robust and productive development experience.

Value Type

These types usually represent a constant or a singular value such as Integer, String, Boolean, Double, etc. Essentially, they are types that stand on their own and not as a composition of other types.

Scala 2 supports value types using case object, primitive types (Int, Boolean, etc.), or opaque types.

// value type defined as a case object
final case object Male extends Gender

// value type defined as a primitive class
Int, String, Boolean, etc.

// value type defined using AnyVal (opaque type)
final case class FirstName(value: String) extends AnyVal

// NewType using supertagged library (opaque type)
object FirstName extends supertagged.NewType[String]
type FirstName = FirstName.Type

Why use Algebraic Data Types in compilers?

Better Data Modeling Ability

As mentioned above, with ADTs, you can compose various data types to model complex domain data.

Prevent Illegal State Representation

With ADTs, one can only combine a fixed set of types using a fixed set of operations to derive the expected type.

type BirthDate = Day (Int) & Month (Int) & Year (Int)

In the above example, a BirthDate value has to be composed using (Int, Int, and Int). If you use (String, String, and String), the compiler will complain. It’s a simple and obvious yet crucial benefit of ADTs.

Exhaustive Compiler Checking

When decomposing into its various constituent types, the compiler ensures all cases are handled.

type ForgetfulBoolean = Yes | No | DoNotKnow

def toBoolean(b: ForgetfulBoolean): Option[Boolean] =
  b match {
    case ForgetfulBoolean.Yes => Some(true)
    case ForgetfulBoolean.No => Some(false)
    case ForgetfulBoolean.DoNotKnow => None
  }

Suppose we forget to add the case match for DoNotKnow in the above example. The compiler will complain that the pattern matching is not exhaustive. Furthermore, it can also suggest which case has not been handled because it knows all possible types ForgetfulBoolean can be (since it’s a sum type).

Improved Refactoring Support

With the compiler’s precise knowledge of type definitions, it becomes significantly easier to identify issues when the model undergoes refactoring. This capability can save developers time substantially, especially when working with large codebases. It eliminates the need to manually trace where changes might disrupt business logic. Moreover, since these checks are enforced at compile time, the likelihood of encountering runtime errors is greatly reduced.

Simplicity

ADTs are all about data, not functionality. This means they are lightweight and don’t require many dependencies. Their usage is fixed towards domain data modeling, nothing more.

Integration with Functional Programming

Every language may not want this, but noting this benefit of ADTs is worthwhile.

Functional programming favors pure functions that always return the same output for the same input and have no side effects. ADTs make this easy by:

  • Capturing state in finite, known types
  • Forcing handling of all possible cases (exhaustive matching)

Functional programming also emphasizes expressing programs through data, precisely using ADTs. Its compositional abilities align with mathematical set theory (hence “algebraic”) and lets you model complex domains cleanly.

Conclusion

Algebraic Data Types (ADTs) are a cornerstone of robust type systems, enabling developers to model complex domains precisely and clearly. By leveraging the compositional nature of sum and product types, ADTs provide a powerful mechanism for simplifying data modeling. Their integration with functional programming principles further enhances their utility, making them an indispensable tool for building reliable and maintainable software. Embracing ADTs improves code quality and boosts developer productivity by catching errors early in the development cycle. As we continue exploring advanced type system features, ADTs serve as a solid foundation for understanding and utilizing the full potential of modern programming languages.

In the next post, we will explore some cool features, such as type classes and ad-hoc polymorphism. These features will boost the type safety of your language multifold and allow you to reimagine coding in a whole new way. With these tools in our belt, we can further push the limits of what a compiler can accomplish in achieving type safety, improving developer productivity, and even deriving code, thereby eliminating boilerplate code.

Hero image courtesy by Steve Johnson

scala series series-types types