Faxe - Fingerfriendly Alternative to Xml Editing

I have been using markup languages almost as long as I've used computers: First DWScript (the MS-DOS clone of Script, which begat GML which in turn eventually morphed into SGML), then LaTeX. Then came the Web and I started to use HTML and XML and some markup languages based on them such as RXML. And in the last few years I couldn't avoid using Wikis.

I switched from DWScript to LaTeX mostly because of the much more beautiful output and because LaTeX ran on just about any platform (most importantly, Unix) and DWScript only on MS-DOS. But that was at the end of the 1980s and I really don't remember much about the pros and cons of DWScript.

LaTeX has some nice properties: The markup was rather unobtrusive, the amount of whitespace didn't matter, so you could indent your documents properly to show the structure, and you could define your own tags. It also has some bad properties: Quite a lot of “normal” ASCII characters had a special meaning, which was not a problem if you were writing English text or Pascal, but not so nice if you were documenting C code. Changing the style was hard.

HTML uses only very few special characters: In normal running text only < and & need to be escaped. That's nice (unless you document HTML code). It also has rather simple rules, which is also nice and minimizes surprises. But it is very markup-heavy, especially if you want to use some semantic markup for categories which the authors of HTML haven't thought of: compare <span class="filename">/etc/passwd</span> with \filename{/etc/passwd}. XML lets you define your own elements, but it is even heavier in the markup because there are no optional tags. Marking up code is a sore point: Either you use <pre>...</pre>, in which case you can't indent properly, or you get a tag soup.

Wikis. I don't like wikis. They make simple things simpler and hard things harder. Actually, let me rephrase that: One Wiki makes simple things simpler. If you use several Wikis (and I guess most of us do) even simple things are not so simple because they all have subtle syntax differences and you (well, me in any case) always have to keep remembering how to make a list in this wiki. So instead of having to learn one markup language you have to learn half a dozen of them. I already know HTML, so most of the time I spend editing wikis is mentally translating HTML to wiki-syntax so that the wiki can translate it back into HTML. And unlike LaTeX, HTML or most programming languages, most wikis put a lot of significance into whitespace, especially leading whitespace, so I can't format the text the way I want. And “nesting” is a concept which is quite foreign to most wikis.

Ranting about the sorry state of the art is nice, but it doesn't make things better. So I finally decided to do something about it and write the umpteenth markup system: Faxe.

The name was inspired by a colleague, who once exclaimed:

Wiki! Wiki! Everybody wants a Wickie! I want an Ylvi!

Well, I couldn't shoehorn Ylvi into an acronym, but Faxe was easy :-).

Goals

Light markup
The markup should be visually light. Normal text should be readable with only minimal disruption by markup. The duplication of tags in HTML has to go. Instead start and end of an element must be indicated by some kind of brackets.
Special characters must not clash with content
We need special characters to signify markup, but documents may also contain arbitrary characters. There are no characters which can be safely assumed to never occur in a document, but if there are few characters and they are rare, we can at least minimize conflicts.
Special characters must be easy to type
That rules out using some exotic Unicode characters. While it is probable that just about any system could be configured to make entry easy, we cannot depend on the user being able (or willing!) to do this.
Straightforward translation from HTML
If you know (X)HTML you should be able to use Faxe within a few minutes. Every valid HTML document should have an equivalent Faxe document.
Equivalent to XML
It should be possible to translate any XML document to Faxe and back.
All Faxe documents are UTF-8
No need to mess around with different encodings.

The Language

Each Faxe document starts with a shebang line:

#!faxe ※

Instead of just faxe, a full path to a faxe executable (e.g., /usr/local/bin/faxe) may be used. The path should contain the string faxe and it MUST NOT contain any spaces. After the space there is a single Unicode character. This character is the markup character. In this document we use “※”, but any character can be used. Every Faxe document can use a different character, and one should use a character which doesn't naturally occur in the document and which is easy to type (I've temporarily mapped ※ (Ctrl-K : X in vim) to ü on my German keyboard. Since I'm writing this in English, I don't expect to need that letter very often).
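To make the shebang rule concrete, here is a minimal sketch in Python of how a reader might extract the markup character. The function name parse_shebang is hypothetical, the check for the string faxe treats the "should" above as a hard requirement, and "a single Unicode character" is taken to mean a single code point; all three are assumptions, not part of the description above.

```python
def parse_shebang(line: str) -> str:
    """Return the markup character declared on a Faxe shebang line.

    Hypothetical helper: the strictness about 'faxe' appearing in the
    path and the one-code-point rule are assumptions of this sketch.
    """
    line = line.rstrip("\r\n")
    if not line.startswith("#!"):
        raise ValueError("not a shebang line")
    # The interpreter path must not contain spaces, so the first space
    # separates the path from the markup character.
    path, sep, markup = line[2:].partition(" ")
    if not sep or "faxe" not in path:
        raise ValueError("expected '#!<path containing faxe> <character>'")
    if len(markup) != 1:
        raise ValueError("expected exactly one markup character")
    return markup
```

Called on the first line of this document, parse_shebang("#!faxe ※") would return "※".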

A name (as defined in XML) immediately followed by the markup character starts an element. It is followed by an open bracketing character. The corresponding closing bracketing character marks the end of the element. Currently the following pairs of characters are recognized as brackets: (), <>, [], {}, «». Others may be added in the future.

p※{This is a paragraph with an em※{emphasized} word.}

Brackets can nest, so the p element ends after word., not after emphasized.

p※{This paragraph mentions code※(}).}

The matching of brackets is always done at the most local scope, so in this example the } within code※(}) cannot be mistaken for the end of the paragraph.
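The bracket rule can be sketched as a small recursive parser in Python that turns the two examples above into XML. This is an illustration under assumptions, not the Faxe implementation: the names to_xml and _parse are hypothetical, XML names are approximated by a simple regular expression, and a bare closing bracket of the element's own kind always ends the element (plain brackets in running text are not balanced). Parameters are not handled here.

```python
import re
from xml.sax.saxutils import escape

# Bracket pairs listed in the text above.
BRACKETS = {"(": ")", "<": ">", "[": "]", "{": "}", "«": "»"}
# Rough approximation of an XML name at the end of the pending text.
NAME_TAIL = re.compile(r"[A-Za-z_][\w.\-]*\Z")


def _parse(text, pos, markup, closer):
    """Parse until `closer` (or end of input); return (xml, next_pos)."""
    out = []  # finished XML fragments
    buf = []  # pending plain-text characters
    while pos < len(text):
        ch = text[pos]
        if ch == closer:
            # Only the innermost element's own closing bracket ends it.
            out.append(escape("".join(buf)))
            return "".join(out), pos + 1
        nxt = text[pos + 1] if pos + 1 < len(text) else ""
        if ch == markup and nxt in BRACKETS:
            pending = "".join(buf)
            m = NAME_TAIL.search(pending)
            if m:
                # name + markup character + opening bracket: a new element.
                name = m.group()
                out.append(escape(pending[:m.start()]))
                buf = []
                inner, pos = _parse(text, pos + 2, markup, BRACKETS[nxt])
                out.append(f"<{name}>{inner}</{name}>")
                continue
        buf.append(ch)
        pos += 1
    if closer is not None:
        raise ValueError(f"missing closing {closer!r}")
    out.append(escape("".join(buf)))
    return "".join(out), pos


def to_xml(text, markup="※"):
    """Convert a Faxe fragment (without parameters) to XML text."""
    return _parse(text, 0, markup, None)[0]
```

Because the parser inside code※( only looks for ), the } in the second example is ordinary text and the enclosing paragraph is closed by the final }.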

If the first non-blank character after the opening bracket is the markup character, parameters follow within another set of brackets.

a※{※{href="http://www.example.com" class=foo}a link}

Parameters have the same syntax as in XML, except that if a value consists only of name characters, the quotes can be omitted.
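A parameter list of this form could be read with a sketch like the following Python function. The name parse_params and the regular expression are assumptions for illustration: names and unquoted values are approximated as runs of letters, digits, "_", "." and "-", and entity references inside quoted values are left untouched.

```python
import re

# name = "double quoted" | 'single quoted' | bare-name-characters
ATTR = re.compile(
    r"""\s*([A-Za-z_][\w.\-]*)\s*=\s*(?:"([^"]*)"|'([^']*)'|([\w.\-]+))"""
)


def parse_params(src: str) -> dict:
    """Parse a Faxe parameter list into a dict (hypothetical helper)."""
    params = {}
    pos = 0
    while pos < len(src):
        m = ATTR.match(src, pos)
        if not m:
            # Anything other than trailing whitespace is a syntax error.
            if src[pos:].strip():
                raise ValueError(f"bad parameter syntax at offset {pos}")
            break
        name, dq, sq, bare = m.groups()
        # Exactly one of the three value groups matched.
        params[name] = next(v for v in (dq, sq, bare) if v is not None)
        pos = m.end()
    return params
```

A value that contains anything other than name characters, such as a URL, has to be quoted; in this sketch an unquoted URL is rejected with a ValueError.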

There is one special element, E, which is converted to a character entity:

The Euro sign can be written as E※{20AC}, but normally should be written as €.
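As a small illustration of the E element, the following Python sketch maps its content to an XML numeric character reference. It assumes the content is a hexadecimal Unicode code point, which is inferred from the 20AC example; the function name e_to_entity is hypothetical.

```python
def e_to_entity(hex_code: str) -> str:
    """Turn the content of an E element into a numeric character reference.

    Assumption: the content is a hexadecimal Unicode code point.
    """
    code_point = int(hex_code, 16)
    chr(code_point)  # raises ValueError if outside the Unicode range
    return f"&#x{code_point:X};"
```

With this reading, E※{20AC} would become the reference for U+20AC, the Euro sign.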
Other special elements may be added in the future. Some ideas:
$Date: 2010-10-03 23:05:24 +0200 (Sun, 03 Oct 2010) $