writing about things, sometimes.

Parsing text/gemini with Clojure revisited

Written by Omar Polo on 16 April 2021 while listening to How to Disappear Completely” by Radiohead.

Some time ago I wrote a text/gemini parser for Clojure. More than a real parser was a hack, an extremely verbose one, just to play with this transducer thing.

Well, I got tired of that and rewrote it as a standalone library that is now available on clojars.

There’s still a transducer at the heart of the library, but it’s a more reasonable one this time. It allows to build up pipelines where parsing text/gemini is only one of the steps, or to parse streaming text/gemini, which is a cool thing (even tho not widespread.)

Since this is a library and not a hack inside this blog codebase, ‘parse’ is now a multimethod (Clojure own generic functions if you came from CL-land) with default implementations for strings, sequence of strings and java Reader.

Since the parser is built using nothing more than the clojure stdlib, I thought “why not” and called the file ‘core.cljc’, so it’s available also for ClojureScript. (The beforementioned multimethod is available also there, with a default implementation for vectors, lists and strings.)

The library emits “almost usual” hiccup:

user=> (gemtext/parse "some\nlines\nof\ntext")
[[:text "some"] [:text "lines"] [:text "of"] [:text "text"]]

user=> (gemtext/parse (repeat 3 "* test"))
[[:item "test"] [:item "test"] [:item "test"]]

and is able to turn ’em back to strings:

user=> (gemtext/unparse [[:link "/foo" "A link"]])
"=> /foo A link\n"

but also to return “HTML” hiccup

user=> (gemtext/to-hiccup [[:header-1 "text/gemini"] [:text "..."]])
[[:h1 "text/gemini"] [:p "..."]]

so you can use it with other Clojure/script libraries, and to convert text/gemini to HTML.

It was fun: I use clojure a lot, but never actually wrote a library, so this was a chance to play with different things. First of, the (small) documentation is available also on cljdoc, and second I played with ‘seancorfield/readme’, a Clojure library that transforms your README to a REPL session!

A final note: the design is done, but in the following weeks I may slighly change something here and there (for instance, only now I realize that you can parse text/gemini on the fly, but not convert it to HTML one bit at a time, i.e. “convert text/gemini to html streamingly” (?) which can be useful, or unparse into anything other than a string.)

(P.S. I took the chance to also to restyle the capsule. I removed the ASCII banners and followed the subscription spec, yay!)