In this post, we are going to build a parser for semantic versions. Let us consider the following format for semver:
<major>.<minor>.<patch>[-<build>]
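As you can see, the build part is optional, which is what makes it tricky. Note that this is a simplified format: everything after the hyphen, including any + suffix, is treated as a single build part.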
Here are some examples of valid semantic versions:
"1.0.0"
"1.0.0-beta.1+build"
"1.0.0-beta.1+build.1"
"1.0.0-beta.1+build.1.2"
"1.0.0-beta.1+build.1.2.3"
First, let us define a Semver case class that represents the parse result:
case class Semver(
  major: Int,
  minor: Int,
  patch: Int,
  build: Option[String] = None
):
  override def toString: String =
    val buildStr = build.map(b => s"-$b").getOrElse("")
    s"$major.$minor.$patch$buildStr"
Next, let us define the parser:
object Semver:
  import Parser.*
  def parser: Parser[Semver] =
    for
      major <- digits()
      _ <- char('.')
      minor <- digits()
      _ <- char('.')
      patch <- digits()
      build <- optional(char('-') ~> buildPart)
    yield Semver(major, minor, patch, build)
  private def buildPart: Parser[String] =
    oneOrMore(charWhere(c => c.isLetterOrDigit || c == '.' || c == '+' || c == '_', "build character"))
      .map(_.mkString)
As I mentioned already, parsing the build part is the tricky part. First, we create the parser for the build part, which can contain letters, digits, dots, plus signs, and underscores. We use the oneOrMore combinator to match one or more of these characters and join them into the build part string. The next bit is to parse the build part optionally: build <- optional(char('-') ~> buildPart). The ~> operator runs char('-') and then buildPart, keeping only the result of buildPart, so the hyphen itself does not end up in the build string. Wrapping the whole thing in optional gives us an Option[String], which is None when there is no build part.
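To make the behaviour of optional and ~> concrete, here is a minimal, self-contained sketch. The Parser library itself is not shown in this post, so MiniParser, Result, and the combinator bodies below are hypothetical stand-ins written for illustration; the real definitions may well differ in detail.

```scala
import scala.annotation.tailrec

// Hypothetical stand-in for ParseResult: either a value plus the unconsumed input, or an error.
enum Result[+A]:
  case Success(value: A, rest: String)
  case Failure(msg: String)

// Hypothetical stand-in for Parser: a function from input to a Result.
final case class MiniParser[A](parse: String => Result[A]):
  def map[B](f: A => B): MiniParser[B] = MiniParser[B] { in =>
    parse(in) match
      case Result.Success(a, rest) => Result.Success(f(a), rest)
      case Result.Failure(msg)     => Result.Failure(msg)
  }
  def flatMap[B](f: A => MiniParser[B]): MiniParser[B] = MiniParser[B] { in =>
    parse(in) match
      case Result.Success(a, rest) => f(a).parse(rest)
      case Result.Failure(msg)     => Result.Failure(msg)
  }
  // Run this parser, then `next`; keep only the result of `next`.
  def ~>[B](next: MiniParser[B]): MiniParser[B] = flatMap(_ => next)

// Consume one character if it satisfies the predicate.
def charWhere(p: Char => Boolean, label: String): MiniParser[Char] = MiniParser[Char] { in =>
  if in.nonEmpty && p(in.head) then Result.Success(in.head, in.tail)
  else Result.Failure(s"expected $label")
}

def char(c: Char): MiniParser[Char] = charWhere(_ == c, s"'$c'")

// Apply `p` repeatedly; fail if it does not match at least once.
def oneOrMore[A](p: MiniParser[A]): MiniParser[List[A]] = MiniParser[List[A]] { in =>
  @tailrec
  def loop(rest: String, acc: List[A]): (List[A], String) =
    p.parse(rest) match
      case Result.Success(a, next) => loop(next, a :: acc)
      case Result.Failure(_)       => (acc.reverse, rest)
  loop(in, Nil) match
    case (Nil, _)      => Result.Failure("expected at least one match")
    case (items, rest) => Result.Success(items, rest)
}

// Never fails: on failure of `p`, succeed with None and leave the input untouched.
def optional[A](p: MiniParser[A]): MiniParser[Option[A]] = MiniParser[Option[A]] { in =>
  p.parse(in) match
    case Result.Success(a, rest) => Result.Success(Some(a), rest)
    case Result.Failure(_)       => Result.Success(None, in)
}

val buildPart: MiniParser[String] =
  oneOrMore(charWhere(c => c.isLetterOrDigit || c == '.' || c == '+' || c == '_', "build character"))
    .map(_.mkString)

val optionalBuild: MiniParser[Option[String]] = optional(char('-') ~> buildPart)

@main def optionalBuildDemo(): Unit =
  println(optionalBuild.parse("-beta.1+build")) // value Some("beta.1+build"), nothing left over
  println(optionalBuild.parse(""))              // value None, nothing left over
  println(optionalBuild.parse("+build"))        // value None, "+build" left unconsumed
  println(optionalBuild.parse("-"))             // value None, "-" left unconsumed
```

In this sketch, optional restores the original input when the inner parser fails, which is why a lone "-" is left unconsumed rather than half-eaten. It also shows why an input that starts with + and has no hyphen still succeeds: the build part is simply None and the rest of the input is left over.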
Here is a handy main method to test the parser:
@main def runSemverParser(): Unit =
  List(
    "abcd",
    "1.0.0",
    "1.0.0+build",
    "1.0.0-beta.1",
    "1.0.0-beta.1+build",
    "1.0.0-beta.1+build.1",
    "1.0.0-beta.1+build.1.2",
    "1.0.0-beta.1+build.1.2.3"
  ).foreach { input =>
    Semver.parser.parse(input) match
      case ParseResult.Success(semver, _) => println(s" OK: \"$input\" -> $semver ${if input == semver.toString then "√" else "X"}")
      case ParseResult.Failure(msg) => println(s"FAIL: \"$input\" -> $msg")
  }
Next time, let us look at a more involved parser: a parser for config key-value pairs.