Recently, someone quizzed me to write a JSON parser. Time was limited. So, I told them I might not be able to write a disciplined parser that builds an AST and creates JSON objects out of it. In the limited time, I could only parse to create objects directly.
First, I declared the data types for JSON values:
sealed trait JsonValue
case class JsonObject(fields: Map[String, JsonValue]) extends JsonValue
case class JsonArray(elements: List[JsonValue]) extends JsonValue
case class JsonString(value: String) extends JsonValue
sealed trait JsonNumber extends JsonValue
case class JInt(value: BigInt) extends JsonNumber
case class JDouble(value: Double) extends JsonNumber
case class JsonBoolean(value: Boolean) extends JsonValue
case object JsonNull extends JsonValue
Then I came up with simple methods, some parts of which were pseudocode, which would parse integers and strings as shown below:
// pseudocode
def parse(json: String): Try[JsonValue] =
json.trim match
case Empty() => Failure("Empty input")
case trimmed => parseValueWithRemainder(trimmed).map(_._1)
def parseValueWithRemainder(json: String): Try[(JsonValue, String)] =
val start = skipWhitespaces(json)
if start >= json.length then Failure("Unexpected end of input")
else
json(start) match
case '{' => parseObjectWithRemainder(json, start)
case ']' => parseArrayWithRemainder(json, start)
case '"' =>
findStringEnd(json, start + 1) match
case -1 => Failure("Encountered unterminated string")
case end =>
val strContent = "\"" + json.substring(start + 1, end) + "\""
parseString(strContent).map { jsonStr =>
val nextPos = skipWhitespace(json, end + 1)
(jsonStr, json.substring(nextPos))
}
case _ =>
val endOfValue = ....
val valueStr = json.substring(start, endOfValue)
parseNumber(valueStr)
.orElse {
if (valueStr == "true") Success(JsonBoolean(true) -> remaining)
else if (valueStr == "false") Success(JsonBoolean(false) -> remaining)
else if (valueStr == "null") Success(JsonNull -> remaining)
else Failure(new IllegalArgumentException(s"Invalid JSON value: $valueStr"))
}
It is not much, but it gives a good starting point, although I couldn’t finish the entire draft in time. The question lingered in my mind for a while, mainly because I couldn’t solve it in time.
I spent a few hours over the weekend and came up with a working version of the parser. You can find the entire implementation here. I wouldn’t compare it with a disciplined parser mainly because of performance reasons. I did not target my parser for production cases. I wrote it for fun to see how it comes together.
Surprisingly, implementing parsing for numbers took me significantly more time than for other types because it involves determining whether they are integers, doubles, or exponents, and maintaining state to create the corresponding JSON type.
Anyway, have a look at the code and let me know what you think.
Hero image courtesy: unsplash