Forge configuration parser

June 27, 2015

An overview of how I wrote a configuration file format and parser.

I recently finished the initial work on a project, forge, which is a configuration file syntax and parser written in go. I had been working on another project where I was trying to decide which configuration language to use, and whether I tested out YAML, JSON, or ini, nothing really felt right. What I really wanted was a format similar to nginx, but I couldn’t find any existing packages for go that supported this syntax. A-ha, I smell an opportunity.

I have always been interested in programming languages, in their design and implementation. I have always wanted to write my own programming language, but since I have never had any formal education in the subject I have always gone about it on my own. I bring it up because this project has some similarities. You have a defined syntax that gets parsed into some sort of intermediate format. The part that is missing is where the intermediate format is then translated into machine code or bytecode and actually executed. Since this is just a configuration language, that is not necessary.

Project overview

You can see the repository for forge for current usage and documentation.

A forge file is made up of directives. There are 3 kinds of directives:

settings: Which are in the form <KEY> = <VALUE>
sections: Which are used to group more directives <SECTION-NAME> { <DIRECTIVES> }
includes: Used to pull in settings from other forge config files include <FILENAME/GLOB>

Forge also supports various types of setting values:

string: key = "some value";
bool: key = true;
integer: key = 5;
float: key = 5.5;
null: key = null;
reference: key = some_section.key;

Most of these setting types are probably fairly self-explanatory, except for reference. A reference in forge is a way to have the value of one setting be a pointer to another setting. For example:

global = "value";
some_section {
  key = "some_section.value";
  global_ref = global;
  local_ref = .key;
  ref_key = ref_section.ref_key;
}
ref_section {
  ref_key = "hello";
}

In this example we see 3 kinds of references. A reference value is either a single identifier (global) or multiple identifiers separated by periods (ref_section.ref_key), and it can also begin with a period (.key). Every reference which is not prefixed with a period is resolved from the global section (the outermost level). So in this example a reference to global will point to the value "value" and ref_section.ref_key will point to the value "hello". A local reference is one which is prefixed with a period; those are resolved starting from the section that the setting is defined in. So in this case, local_ref will point to the value "some_section.value".
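The resolution rules above can be sketched in a few lines of go. This is only an illustration of the idea and not how forge implements it: the Section type and the Resolve function are hypothetical names, and sections are assumed to be plain nested maps.

```go
package main

import (
	"fmt"
	"strings"
)

// Section is a hypothetical stand-in for a parsed forge section:
// each key maps either to a plain value or to a nested Section.
type Section map[string]interface{}

// Resolve looks up a reference such as "global", "ref_section.ref_key" or ".key".
// A reference with a leading period starts from current (the section the
// setting is defined in); every other reference starts from root.
func Resolve(ref string, root, current Section) (interface{}, error) {
	start := root
	if strings.HasPrefix(ref, ".") {
		start = current
		ref = strings.TrimPrefix(ref, ".")
	}

	// Walk one identifier at a time, descending into nested sections.
	var value interface{} = start
	for _, part := range strings.Split(ref, ".") {
		section, ok := value.(Section)
		if !ok {
			return nil, fmt.Errorf("cannot look up %q: parent is not a section", part)
		}
		value, ok = section[part]
		if !ok {
			return nil, fmt.Errorf("unknown key %q", part)
		}
	}
	return value, nil
}
```

With the example file above loaded into such maps, Resolve("ref_section.ref_key", root, someSection) would walk from the root, while Resolve(".key", root, someSection) would start inside some_section.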

That is a rough idea of how forge files are defined, so let’s see a quick example of how you can use it from go.

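package main

import (
    "fmt"

    "github.com/brettlangdon/forge"
)

func main() {
    // Parse the config file; stop early if it cannot be read or parsed
    settings, err := forge.ParseFile("example.cfg")
    if err != nil {
        panic(err)
    }

    // Read a setting only if it is defined
    if settings.Exists("global") {
        value, _ := settings.GetString("global")
        fmt.Println(value)
    }

    // Add a new setting at runtime
    settings.SetString("new_key", "new_value")

    // Convert the settings to a map
    settingsMap := settings.ToMap()
    fmt.Println(settingsMap["new_key"])

    // Or dump them as JSON
    jsonBytes, _ := settings.ToJSON()
    fmt.Println(string(jsonBytes))
}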

How it works

Let’s dive in and take a quick look at the parts that make forge work.

Example config file:

# Top comment
global = "value";
section {
  a_float = 50.67;
  sub_section {
    a_null = null;
    a_bool = true;
    a_reference = section.a_float;  # Gets replaced with `50.67`
  }
}

Basically, forge takes a configuration file in the defined format and parses it into what is essentially a map[string]interface{}. The code itself is composed of two main parts: the tokenizer (or scanner) and the parser. The tokenizer turns the raw source code (like above) into a stream of tokens. If you printed the token representation of the code above, it could look like:

(COMMENT, "Top comment")
(IDENTIFIER, "global")
(EQUAL, "=")
(STRING, "value")
(SEMICOLON, ";")
(IDENTIFIER, "section")
(LBRACKET, "{")
(IDENTIFIER, "a_float")
(EQUAL, "=")
(FLOAT, "50.67")
(SEMICOLON, ";")
....
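
To make the tokenizing step a little more concrete, here is a minimal hand-written scanner that produces a token stream like the one above. It is a simplified sketch and not forge's actual scanner: the Token type and the Tokenize function are names made up for this example, the token kinds are plain strings, and references and string escape sequences are left out.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Token pairs a token kind with the literal text it was scanned from.
type Token struct {
	Kind    string
	Literal string
}

// Tokenize is a hypothetical, simplified scanner for forge-like input.
func Tokenize(src string) ([]Token, error) {
	var tokens []Token
	runes := []rune(src)

	// takeWhile returns the index just past the run of runes accepted by ok.
	takeWhile := func(i int, ok func(rune) bool) int {
		for i < len(runes) && ok(runes[i]) {
			i++
		}
		return i
	}
	isIdent := func(r rune) bool { return r == '_' || unicode.IsLetter(r) || unicode.IsDigit(r) }
	isNumber := func(r rune) bool { return r == '.' || unicode.IsDigit(r) }
	punct := map[rune]string{'=': "EQUAL", ';': "SEMICOLON", '{': "LBRACKET", '}': "RBRACKET"}

	for i := 0; i < len(runes); {
		r := runes[i]
		switch {
		case unicode.IsSpace(r):
			i++
		case r == '#': // a comment runs to the end of the line
			end := takeWhile(i+1, func(c rune) bool { return c != '\n' })
			tokens = append(tokens, Token{"COMMENT", strings.TrimSpace(string(runes[i+1 : end]))})
			i = end
		case r == '"': // a string runs to the closing quote
			end := takeWhile(i+1, func(c rune) bool { return c != '"' })
			if end == len(runes) {
				return nil, fmt.Errorf("unterminated string")
			}
			tokens = append(tokens, Token{"STRING", string(runes[i+1 : end])})
			i = end + 1
		case unicode.IsDigit(r): // a number is a FLOAT if it contains a period
			end := takeWhile(i, isNumber)
			lit := string(runes[i:end])
			kind := "INTEGER"
			if strings.Contains(lit, ".") {
				kind = "FLOAT"
			}
			tokens = append(tokens, Token{kind, lit})
			i = end
		case r == '_' || unicode.IsLetter(r): // identifier or keyword
			end := takeWhile(i, isIdent)
			lit := string(runes[i:end])
			kind := "IDENTIFIER"
			switch lit {
			case "true", "false":
				kind = "BOOL"
			case "null":
				kind = "NULL"
			}
			tokens = append(tokens, Token{kind, lit})
			i = end
		default: // single-character punctuation
			kind, ok := punct[r]
			if !ok {
				return nil, fmt.Errorf("unexpected character %q", r)
			}
			tokens = append(tokens, Token{kind, string(r)})
			i++
		}
	}
	return tokens, nil
}
```

The whole thing is one loop that looks at the current character, decides which kind of token starts there, and consumes characters until that token ends.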

Then the parser takes in this stream of tokens and tries to parse them based on a known grammar. For example, a setting directive is in the form <IDENTIFIER> <EQUAL> <VALUE> <SEMICOLON> (where <VALUE> can be <STRING>, <BOOL>, <INTEGER>, <FLOAT>, <NULL>, or <REFERENCE>). When the parser sees an <IDENTIFIER> it’ll look ahead to the next token to try and match it to this rule. If it matches, then it knows to add this setting to the internal map[string]interface{} under that identifier. If it doesn’t match any rule, then it has found a syntax error and reports it.
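
Here is a rough sketch of what that look-ahead can look like in go. Again, this is an illustration under my own assumptions and not forge's real parser: the Token and Parser types and the ParseSection method are hypothetical, token kinds are plain strings, and <REFERENCE> values are not handled.

```go
package main

import (
	"fmt"
	"strconv"
)

// Token pairs a token kind with the literal text it was scanned from.
type Token struct {
	Kind    string
	Literal string
}

// Parser is a hypothetical recursive-descent parser over a slice of tokens.
type Parser struct {
	tokens []Token
	pos    int
}

// next returns the current token and advances, or an EOF token at the end.
func (p *Parser) next() Token {
	if p.pos >= len(p.tokens) {
		return Token{Kind: "EOF"}
	}
	tok := p.tokens[p.pos]
	p.pos++
	return tok
}

// ParseSection parses directives until it sees the closing token kind:
// "EOF" for the top level, "RBRACKET" for a nested section.
func (p *Parser) ParseSection(closing string) (map[string]interface{}, error) {
	section := map[string]interface{}{}
	for {
		tok := p.next()
		switch tok.Kind {
		case closing:
			return section, nil
		case "COMMENT":
			continue
		case "IDENTIFIER":
			// Look at the token after the identifier to pick the rule.
			switch after := p.next(); after.Kind {
			case "EQUAL": // <IDENTIFIER> <EQUAL> <VALUE> <SEMICOLON>
				value, err := p.parseValue()
				if err != nil {
					return nil, err
				}
				section[tok.Literal] = value
			case "LBRACKET": // <IDENTIFIER> { <DIRECTIVES> }
				sub, err := p.ParseSection("RBRACKET")
				if err != nil {
					return nil, err
				}
				section[tok.Literal] = sub
			default:
				return nil, fmt.Errorf("syntax error: unexpected %s after %q", after.Kind, tok.Literal)
			}
		default:
			return nil, fmt.Errorf("syntax error: unexpected %s", tok.Kind)
		}
	}
}

// parseValue converts the value token to a go value and expects a semicolon after it.
func (p *Parser) parseValue() (interface{}, error) {
	tok := p.next()
	var value interface{}
	var err error
	switch tok.Kind {
	case "STRING":
		value = tok.Literal
	case "BOOL":
		value, err = strconv.ParseBool(tok.Literal)
	case "INTEGER":
		value, err = strconv.ParseInt(tok.Literal, 10, 64)
	case "FLOAT":
		value, err = strconv.ParseFloat(tok.Literal, 64)
	case "NULL":
		value = nil
	default:
		err = fmt.Errorf("syntax error: expected a value, got %s", tok.Kind)
	}
	if err != nil {
		return nil, err
	}
	if semi := p.next(); semi.Kind != "SEMICOLON" {
		return nil, fmt.Errorf("syntax error: expected SEMICOLON, got %s", semi.Kind)
	}
	return value, nil
}
```

Calling ParseSection("EOF") on the tokens of a whole file would give back the nested map, with each section stored as its own map under the section name.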

The part that I think is interesting is that I opted to just write the tokenizer and parser by hand rather than using a tool that generates them from a language grammar (like flex/bison). I have done this before and was inspired to do so after learning that this is how the go programming language is written; you can see here parser.go (not a light read at 2500 lines). Forge's scanner.go and parser.go might prove to be slightly easier reads for those who are interested.

Conclusion

That is just a brief overview of the project and a slight dip into its inner workings. I am extremely interested in continuing to learn as much as I can about programming languages and parsers/compilers. I am going to put together a series of blog posts that walk through what I have learned so far and which might help guide the reader through creating something similar to forge.

Enjoy.
