HTTP Handlers

Sun May 3, 2015

I've had occasion to work with these relatively often lately, and in a variety of contexts, so I'm going to do a quick survey just to get some things straight in my mind.

There are two basic approaches to doing routing. The table-oriented one, and the handler-oriented one.

Table Oriented

Is probably the most widely known at this point. You have some mechanism for defining handlers, which is entirely separated from the routing, and a central routing table somewhere that contains all the bindings of routes to handlers. You'll see this approach in Python's Tornado

...
urls = [(r"/", Index),
        (r"/show-directory", ShowDirectory),
        (r"/play", Play),
        (r"/command", Command),
        (r"/status", ServerStatus),
        (r".*", Index)]
...

as well as Clojure's compojure (snippet from thephoeron.com, rather than one of my projects)

...
(defroutes app-routes
  (GET "/" [req] (res/splash req))
  (GET "/quantum-computing" [req] (res/quantum-computing req))
  (GET "/physics" [req] (res/physics req))
  (GET "/programming" [req] (res/programming req))
  (GET "/linguistics" [req] (res/linguistics req))
  (GET "/philosophy" [req] (res/philosophy req))
  (GET "/music" [req] (res/music req))
  (GET "/art" [req] (res/art req))
  (GET "/sci-fi" [req] (res/sci-fi req))
  (GET "/impressum" [req] (res/impressum req))
  (route/resources "/static")
  (route/not-found "Not Found"))
...

These two have a few minor differences, of course. The compojure version routes to functions and calls them explicitly, while the Tornado version names classes and makes the call implicitly (the actual method call is generated for you). The compojure version is explicit about the HTTP method each handler accepts, while the Tornado version handles that at the class level: route targets are expected to define .get/.post/.put/etc. methods, which are called according to the client request. Finally, the compojure version explicitly gives you a req argument to pass to your target, which Tornado handles behind the scenes. The core concept is the same, though: a centralized table of URIs to handlers.
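To make the shared idea concrete, here's a minimal sketch of a table-oriented router in plain Python. Nothing in it is Tornado's or compojure's actual API; ROUTES, dispatch and the two handler functions are hypothetical names made up for illustration, and a "request" is assumed to be a plain dict.

```python
import re

# Handlers know nothing about routing; they are plain functions from
# a request (here just a dict) to a response body.
def index(req):
    return "index"

def play(req):
    return "playing " + req.get("file", "nothing")

# The routing table: every binding of route to handler, in one place.
# This sketch lets the first matching pattern win, so the catch-all
# goes last, as it does in the Tornado table above.
ROUTES = [
    (r"/", index),
    (r"/play", play),
    (r".*", index),
]

def dispatch(routes, path, req):
    """Walk the table in order and call the first handler whose pattern matches."""
    for pattern, handler in routes:
        if re.fullmatch(pattern, path):
            return handler(req)
    raise LookupError("no route for " + path)
```

Reading ROUTES top to bottom tells you everything the server will respond to, which is the whole appeal.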

Handler Oriented

Involves putting the routing and handler definition machinery together. This is the approach taken by the Go server, house and hunchentoot, shown below in that order.

...
http.HandleFunc("/edit/", ShowEdit(wiki))
...
func ShowEdit (wiki *Wiki) func (http.ResponseWriter, *http.Request) {
        t := template.Must(template.ParseFiles("static/templates/edit.html", "static/templates/base.html"))
        return func (w http.ResponseWriter, r *http.Request) {
                pg, err := wiki.GetPage(r.URL.Path[len("/edit"):])
                if err == nil { t.ExecuteTemplate(w, "base", pg) }
        }
}
...

...
(define-handler (article) ((name :string))
  (aif (for-all `(and (?id :file ,name) (?id :title ?title) (?id :body ?body))
                :in *base*
                :collect (page ((str ?title) :section "blog")
                           (str ?body)
                           (:hr)
                           (prev+next-links ?id)))
       (first it)
       (page ((fmt "Not found: ~s" name) :section "blog"))))
...

...
(define-easy-handler (llthw-reference :uri "/reference/") (ref-page)
  (let ((the-ref-page (format nil "reference/~(~A~).md" (cl-who:escape-string-all ref-page))))
    (if (probe-file the-ref-page)
        (reference-basic-page ()
          (cl-who:with-html-output (hunchentoot::*standard-output*)
            (str (3bmd:parse-and-print-to-stream the-ref-page hunchentoot::*standard-output* :format :html))))
        ;else
        (reference-basic-page ()
          (cl-who:with-html-output (hunchentoot::*standard-output*)
            (:h4 "Error 404: Not Found"))))))
...

Again, minor differences. house lets you annotate your parameters and handles validation, hunchentoot lets you specify a URI that's different from the procedure name, and the Go server asks for a function rather than giving you a piece of syntax to define it in-line. What they share is that there isn't a table being defined in one fell swoop. It's implicit, and incrementally added to by each handler you define in your codebase.
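The same idea can be sketched in Python with a registering decorator. This is an illustration under assumptions, not how any of the three servers above are implemented; HANDLERS, handler and dispatch are hypothetical names, and parameters are assumed to arrive as a dict.

```python
# The table exists, but nobody ever writes it out; each handler
# definition adds its own entry as a side effect.
HANDLERS = {}

def handler(uri):
    """Decorator that registers the decorated function under `uri`."""
    def register(fn):
        HANDLERS[uri] = fn  # silently replaces any earlier handler for `uri`
        return fn
    return register

@handler("/article")
def article(params):
    return "article: " + params["name"]

@handler("/reference/")
def reference(params):
    return "reference: " + params["ref-page"]

def dispatch(uri, params):
    """Look the handler up in the implicitly built table and call it."""
    return HANDLERS[uri](params)
```

The only way to see the whole table here is to load every module that defines a handler and then inspect HANDLERS.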

Comparing...

First off, they're mechanically equivalent. Both of them produce some sort of lookup structure that later gets used in the decision of what response needs to be sent back to a particular client. Which means that the final output of both approaches is ultimately something like Map URI (Params -> Response). The difference is how they get there, and what the implications are for you as the reader of the program.

The Table-Oriented approach keeps all handlers in one place. After having read through that table, you can be reasonably sure that there aren't any handlers sitting around that you've missed. Because it's centralized, this approach:

- Lends itself to functional handler composition. You don't need side effects to compose this table, because you're doing it all at once (so it can be a declaration), and you can imagine writing functions that transform handler tables without breaking abstraction.
- Is less flexible regarding runtime handler definition. Once you're running a server, defining a new handler involves a side effect. In languages that are uppity about side effects, such as Clojure, Haskell or ML, this means it's somewhat more difficult and needs to be explicitly planned for; the table-oriented approach doesn't allow it out of the box. (Although, to be fair, runtime handler definition is something you only really want while you're writing the program, and almost never something you want to be part of your deployed application. It's very useful while you're writing, but depending on how you set up your environment, you might not actually end up needing it.)
- Implicitly produces no state clashes. This is actually a detriment of the handler-oriented approach. The Go variant doesn't suffer from this, but the Lisp versions do. Imagine what would happen if you used the handler-oriented approach to write two separate micro-service projects, for instance. They'd both be defining handlers into some global table, and if any routes clashed, one would end up clobbering the other. Unless you took some pains to plan for the eventuality, you'd sometimes get a handler silently stomping on another one.
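Both the composition point and the clash point can be sketched in a few lines of Python, assuming a table is just a dict from route to handler. mount and merge are hypothetical helpers made up for this example, not part of any of the servers mentioned here.

```python
def mount(prefix, table):
    """Return a new table with every route moved under `prefix`."""
    return {prefix + route: handler for route, handler in table.items()}

def merge(*tables):
    """Combine tables into a new one, refusing to let one route clobber another."""
    combined = {}
    for table in tables:
        clashes = combined.keys() & table.keys()
        if clashes:
            raise ValueError("route clash: " + ", ".join(sorted(clashes)))
        combined.update(table)
    return combined

# Two independently written "services" that both claim /status.
blog = {"/status": lambda params: "blog ok"}
wiki = {"/status": lambda params: "wiki ok"}

# Mounting each under its own prefix removes the clash, and neither
# input table is mutated along the way.
app = merge(mount("/blog", blog), mount("/wiki", wiki))
```

Because tables are ordinary values, a clash is something you can detect and report at composition time rather than something that happens silently at load time.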

The handler-oriented approach is more or less the inverse. Handlers can be scattered about anywhere, so the only real way to be sure you've seen all of them is by loading up your server and inspecting the final table. The advantage you get out of this is that it's more convenient for incremental development, since you can modify one handler definition without touching the rest, and can do so without restarting any servers. Additionally, this approach groups the parameter validation/parsing structure (where that structure exists) with the handler body. That second one is the main win because, as I'll discuss in the next section, it presents an obvious path to removing a level of repetition otherwise found in handler definitions.

It's enough to make me wonder whether you could build a hybrid system that had all the advantages and mitigated all of the disadvantages without a large increase in complexity. I'll leave that for another time.

Incoming context shift.

The Validation Structure

Having worked with a few other web frameworks and servers lately, the main piece of house that I end up missing is the automated parameter validation and extraction. And as I've been saying in various real-life conversations, that's a piece that can be abstracted from any particular server. For demonstration purposes, the article handler above is actually a bad example

...
(define-handler (article) ((name :string))
  (aif (for-all `(and (?id :file ,name) (?id :title ?title) (?id :body ?body))
                :in *base*
                :collect (page ((str ?title) :section "blog")
                           (str ?body)
                           (:hr)
                           (prev+next-links ?id)))
       (first it)
       (page ((fmt "Not found: ~s" name) :section "blog"))))
...

...because it has a single string parameter, which means we don't need to do any conversion or validation. Here's a somewhat contrived, but more illustrative example:

(define-json-handler (v0/api/add) ((a :integer) (b :integer))
  (+ a b))

We're expecting a and b to be integers here. And, we're expecting the return value to be automatically JSON-encoded before it's sent back. If you wanted this, you'd normally have to write something along the lines of

(lambda (a b)
  (let ((a (parse-integer a))
        (b (parse-integer b)))
    (json:encode-to-string (+ a b))))

And if you had done that, you would have introduced a subtle bug: a failing parse on either a or b goes unhandled. What you'd really want is something closer to

(lambda (a b)
  (handler-case
      (let ((a (parse-integer a))
            (b (parse-integer b)))
        (http-200-response (json:encode-to-string (+ a b))))
    (parse-error ()
      (http-400-error
       "Invalid argument. Expected two numbers, got (~a ~a)"
       a b))))

And suddenly, a program that should only span seven characters is complex enough that you need to exert non-trivial effort to understand it. And that's without even considering the routing mechanism and actual parameter lookup. As a rule, I like to avoid this level of incidental complexity where I can. And this is a place where I can, because the machinery to automatically do this work is extensive but regular and fairly simple. The current version of house has a built-in solution, but it's bound to the handler-oriented style, and is specialized to work with the house server. Maybe that's not such a bad thing, but let's think about what we're doing and how we'd generalize, just for fun.

Thinking About It

Basically, a handler is a function of some number of parameters to a response. Which doesn't sound hard at all. The problem is that those parameters:

- don't necessarily arrive in the right order, since they're usually either x-www-form-urlencoded, part of the URI or some combination
- arrive in string format, even if they represent values of other types
- don't originate from trusted sources, so each parameter might contain invalid data given its expected type

Additionally, in many languages it's possible for the handler function to fail in some way unrelated to the input parameters.

Bottom line, we'd like to be able to write a function that naively takes parameters and naively operates on them, and that isn't expected to deal with its own exceptional conditions. Which means that we need some way to take a Map String String (our parameters), pull out the relevant parameters, parse them, validate them, pass them to our "core handler" in the proper order, then take the result and send it back to our client using some encoder. An error during parsing or validation should immediately trigger a 400 error, while an error during the execution of the "core handler" should cause a 500 error to be sent.
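That pipeline is small enough to sketch in Python without tying it to any particular server. Everything named here is an assumption for illustration rather than house's implementation: make_handler is a hypothetical helper, responses are assumed to be (status, body) pairs, and json.dumps stands in for whatever encoder you'd actually use.

```python
import json

def make_handler(spec, core, encode=json.dumps):
    """Wrap `core` so it can be fed a raw dict of string parameters.

    `spec` is an ordered list of (name, parser) pairs; parsers raise
    ValueError on bad input. Returns a function from params to a
    (status, body) pair.
    """
    def wrapped(params):
        try:
            # Pull out, parse and validate, in the order `core` expects.
            args = [parser(params[name]) for name, parser in spec]
        except (KeyError, ValueError) as err:
            # A missing or malformed parameter is the client's problem.
            return 400, "Invalid argument: {!r}".format(err)
        try:
            return 200, encode(core(*args))
        except Exception:
            # Anything that fails past validation is the server's problem.
            return 500, "Internal server error"
    return wrapped

# The core handlers stay naive: real integers in, a plain value out.
add = make_handler([("a", int), ("b", int)], lambda a, b: a + b)
div = make_handler([("a", int), ("b", int)], lambda a, b: a // b)
```

With this in place, add({"a": "1", "b": "2"}) comes back as a 200 with a JSON body, a non-numeric or missing parameter comes back as a 400, and div with b set to "0" comes back as a 500, all without the core lambdas knowing anything about HTTP.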

My first instinct for a Haskell implementation is clunky, since I can't think of a good way to type functions with arbitrary numbers of parameters. Though it does still gain something; if we do it right, we effectively get the type system for free, and so no longer need to explicitly annotate handlers. Something like

module Handlers where

import Control.Monad

type Params = [(String, String)]

class Read a => Param a where
    parse :: String -> Maybe a
    parse str = case reads str of
                  [(thing, "")] -> Just thing
                  _ -> Nothing

instance Param Int
instance Param a => Param [a]

lookP :: Param a => Params -> String -> Maybe a
lookP ps k = lookup k ps >>= parse

withParam :: (Param a) => String -> (a -> r) -> (Params -> Maybe r)
withParam arg fn = \ps -> liftM fn $ lookP ps arg

withParam2 :: (Param a, Param b) => (String, String) -> (a -> b -> r) -> (Params -> Maybe r)
withParam2 (arg, arg') fn = wrapped
    where wrapped ps = liftM2 fn (lookP ps arg) (lookP ps arg')

We need parse to be a typeclass method because we don't want to lock users into using the reads approach to decoding their custom types. We'd also need a bunch more individual withParam<n> declarations, but with those in place we could write something like

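-- The table needs a signature, and sum needs pinning to lists; otherwise
-- the Param constraints leave the parameter types ambiguous.
handlers :: [(String, Params -> Maybe Int)]
handlers = [ ("/v0/add/<a>/<b>", withParam2 ("a", "b") (+))
           , ("/v0/sub/<a>/<b>", withParam2 ("a", "b") (-))
           , ("/v0/sum/<nums>", withParam "nums" (sum :: [Int] -> Int)) ]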

Which isn't the prettiest thing I've ever seen, but spares us the trouble of validation boilerplate every damned time. I'll need to give this some more thought before putting together something more concrete. I'll let you know how it goes.