Parsing from scratch - Part 4

Alright, let’s dive into the implementation of a parser for configuration key-value pairs. Here is a sample configuration:

host = localhost
port = 8080
debug = true
max_retries = 3
app_name = my_cool_app

Let us start by defining the data types that represent a parsed configuration: first the value of a configuration entry, then the entry itself.

import Parser.*

enum ConfigValue:
  case CNumber(value: Int)
  case CBool(value: Boolean)
  case CString(value: String)

case class ConfigEntry(key: String, value: ConfigValue):
  override def toString: String = value match
    case ConfigValue.CNumber(n) => s"$key = $n (number)"
    case ConfigValue.CBool(b)   => s"$key = $b (bool)"
    case ConfigValue.CString(s) => s"$key = $s (string)"

The entire configuration is represented by a list of configuration entries.

case class Config(entries: List[ConfigEntry])
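To get a feel for the data model before any parsing happens, here is a minimal sketch that builds a configuration by hand and looks up a key. The types are repeated inside a wrapper object so the snippet compiles on its own; `ConfigModelSketch` and the `get` helper are hypothetical names introduced for illustration and are not part of the code in this article.

```scala
object ConfigModelSketch:
  enum ConfigValue:
    case CNumber(value: Int)
    case CBool(value: Boolean)
    case CString(value: String)

  case class ConfigEntry(key: String, value: ConfigValue)

  case class Config(entries: List[ConfigEntry]):
    // Hypothetical helper: value of the first entry with the given key, if any.
    def get(key: String): Option[ConfigValue] =
      entries.find(_.key == key).map(_.value)

@main def configModelDemo(): Unit =
  import ConfigModelSketch.*
  val cfg = Config(List(
    ConfigEntry("host", ConfigValue.CString("localhost")),
    ConfigEntry("port", ConfigValue.CNumber(8080))
  ))
  println(cfg.get("port"))    // Some(CNumber(8080))
  println(cfg.get("missing")) // None
```

Keeping the entries in a `List` preserves the order in which they appear in the file; a lookup like `get` is then a linear scan, which is usually fine for small configurations.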

Now, let us define parsers for parsing the different types of configuration values.

val numberValue: Parser[ConfigValue] =
  oneOrMore(digit).map(chars => ConfigValue.CNumber(chars.mkString.toInt))
  
val boolValue: Parser[ConfigValue] =
  string("true")
    .map(_ => ConfigValue.CBool(true))
    .orElse(string("false").map(_ => ConfigValue.CBool(false)))

val stringValue: Parser[ConfigValue] =
  oneOrMore(charWhere(c => c != '\n' && c != '\r', "value character"))
    .map(chars => ConfigValue.CString(chars.mkString.trim))
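One caveat with `numberValue` above: `String.toInt` throws a `NumberFormatException` when the digits do not fit in an `Int`, so a very long number would surface as an exception rather than as a parse failure. A small sketch of the standard library's non-throwing alternative, `toIntOption`, is shown below; `safeInt` is a hypothetical helper name, and how you would wire it into the parser depends on the combinators your `Parser` offers.

```scala
// Hypothetical helper: convert a run of digits to Int without throwing.
def safeInt(digits: String): Option[Int] = digits.toIntOption

@main def toIntDemo(): Unit =
  println(safeInt("8080"))        // Some(8080)
  println(safeInt("99999999999")) // None: the value does not fit in an Int
```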

With the above discrete parsers, we can define a parser for parsing any configuration value.

val valueParser: Parser[ConfigValue] =
  numberValue.orElse(boolValue).orElse(stringValue)
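The order of the alternatives matters, because the first one that succeeds wins. The sketch below illustrates this with a tiny stand-in parser type, so that it compiles on its own. `MiniParser`, `takeWhile1`, `miniNumber`, `miniString` and `numberFirst` are all assumptions made for this illustration; they are not the `Parser` built in the earlier parts, and the real `orElse` may behave differently.

```scala
// Stand-in for the series' Parser type (an assumption for illustration only).
// A parser takes the input and returns the parsed value plus the remaining input.
final case class MiniParser[A](run: String => Option[(A, String)]):
  def map[B](f: A => B): MiniParser[B] =
    MiniParser(in => run(in).map((a, rest) => (f(a), rest)))

  // Try `this`; only if it fails, try `other` on the same input.
  def orElse(other: => MiniParser[A]): MiniParser[A] =
    MiniParser(in => run(in).orElse(other.run(in)))

// One or more leading characters satisfying `p`.
def takeWhile1(p: Char => Boolean): MiniParser[String] =
  MiniParser { in =>
    val (taken, rest) = in.span(p)
    if taken.isEmpty then None else Some((taken, rest))
  }

val miniNumber: MiniParser[String] = takeWhile1(_.isDigit).map(d => s"number($d)")
val miniString: MiniParser[String] = takeWhile1(_ != '\n').map(s => s"string($s)")

// Same ordering as valueParser: number first, string as the fallback.
val numberFirst: MiniParser[String] = miniNumber.orElse(miniString)

@main def orderingDemo(): Unit =
  // All digits: the number alternative consumes the whole value.
  println(numberFirst.run("8080"))
  // Starts with digits: the number alternative succeeds early and leaves "ms" unparsed.
  println(numberFirst.run("3000ms"))
  // No leading digit: falls through to the string alternative.
  println(numberFirst.run("localhost"))
```

If the real `orElse` works the same way, a value such as `3000ms` would be read as the number `3000`, and the leftover `ms` would then make the enclosing entry fail rather than fall back to a string. The sample configuration in this article does not contain such values, so the simple ordering is enough here.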

A configuration entry is a key-value pair. The key and the value are separated by an equals sign, which may be surrounded by optional whitespace.

<key>\s*=\s*<value>
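As a quick sanity check of that shape, the same pattern can be written as a regular expression with the standard library's `Regex`. This is only a sketch for comparison, not how the parser in this article works; `EntryShape` is a hypothetical name, and `\w` covers only ASCII letters, digits and underscores, which is narrower than `isLetterOrDigit`.

```scala
// Hypothetical regex mirroring <key>\s*=\s*<value>; used only to illustrate the shape.
val EntryShape = """(\w+)\s*=\s*(.*)""".r

@main def entryShapeDemo(): Unit =
  for line <- List("port = 8080", "max_retries=3", "not an entry") do
    line match
      case EntryShape(k, v) => println(s"key=$k value=$v")
      case _                => println(s"no match: $line")
```

A regular expression can recognise the shape of a line, but it hands back plain strings. The combinator approach in this article produces typed `ConfigValue`s directly.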

We have already covered the value parser. Now, let us write the key parser, which accepts letters, digits, and underscores:

val key: Parser[String] =
  oneOrMore(charWhere(c => c.isLetterOrDigit || c == '_', "key character"))
    .map(_.mkString)

So far we know how to parse the key and the value. Before writing a parser for a single config entry, we need a few helper parsers:

val newline: Parser[Char] = char('\n')

val lineBreaks: Parser[Unit] =
  zeroOrMore(newline).map(_ => ())
    
val spaces: Parser[Unit] =
  zeroOrMore(charWhere(c => c == ' ' || c == '\t', "whitespace"))
    .map(_ => ())
    
// The separator between key and value: optional spaces, '=', optional spaces
val separator: Parser[Unit] =
  for
    _ <- spaces
    _ <- char('=')
    _ <- spaces
  yield ()

Now, let us write the parser for a single config entry:

// A single entry: key = value (with optional trailing whitespace)
val entry: Parser[ConfigEntry] =
  for
    k <- key
    _ <- separator
    v <- valueParser
    _ <- spaces // consume any trailing spaces before newline
  yield ConfigEntry(k, v)

Since we can parse a single entry, let us write a parser for a list of entries, that is, the entire configuration. It is exposed as Config.parser, which is how the main method refers to it:

object Config:
  // A full config: multiple entries separated by newlines
  val parser: Parser[Config] =
    for
      _     <- lineBreaks                   // skip leading blank lines
      first <- entry                        // at least one entry
      rest  <- zeroOrMore(newline ~> entry) // more entries after newlines
      _     <- lineBreaks                   // skip trailing blank lines
      _     <- eof
    yield Config(first :: rest)

That completes the parser. You can try it out with this handy main method:

@main def runConfigParser(): Unit =
  val input =
    """|host = localhost
       |port = 8080
       |debug = true
       |max_retries = 3
       |app_name = my_cool_app""".stripMargin

  println(s"Input:\n$input\n")

  Config.parser.parse(input) match
    case ParseResult.Success(config, _) =>
      println(s"Parsed ${config.entries.size} entries:")
      for entry <- config.entries do println(s"  $entry")
    case ParseResult.Failure(msg) =>
      println(s"Parse error: $msg")

You can read the entire code in ConfigParser.scala.
