Draft - 3/10/04
When I tell programmers that my favorite programming language is Scheme, they tend to react as though I'd told them my favorite snack were packing styrofoam or my favorite music were junior high orchestra recordings: those who can even bring themselves believe me mostly wonder what on Earth I could be thinking. Some wonder if I've ever programmed in a real language like C or C++ (I have), if I'm one of those academic nuts who cares more about proving theorems about programs than actually using them for real work (I'm not), or if all those parentheses have rotten my brain (probably). So what could make me like this weird language with no for loop and so many parentheses it makes you crosseyed?
In this article, I'll tell you what, and by the time I'm finished I hope I'll have convinced you that you might want to give it a shot yourself. I'm going to assume that you're a working programmer who comes from a C, C++, Java, or similar background, so I'm not going to go into the details of how Scheme differs from other more-or-less exotic languages like Common Lisp, Standard ML, or Haskell, as interesting and worthy of discussion as those languages are. Suffice it to say that many of the ideas I present here as features of Scheme are also present in some other languages.
That disclaimer out of the way, let's get on to the good stuff.
Scheme head-first
I'm not going to try to teach you how to program in Scheme, but instead to give an impression of how a Scheme programmer tackles problems. So, rather than showing some boring pedagogical example like "Hello world" or a factorial function, I figured I'd show something I've actually had occasion to use in real work: a multithreaded port scanner. Here it is, written basically the way I'd really write it if I needed to:
; scan : string[hostname] int int -> listof (list int string)
; produces a list of all open TCP ports on the given host and their
; associated well-known service names, listed in order from low to high
(define (scan host low high)
(build/threaded port (from low to high)
(if (can-connect? host port)
(list port (port->name port))
#f)))
See, that's not so bad. The first three lines are comments describing the function; naturally they're not necessary for the program to run. The next defines a function called scan which takes three arguments: host, low, and high. Conceptually we'd like scan to return an ordered list of all the ports that are open on the given host, where all the decisions about each particular port were made in parallel with each other. That sounds complicated, so with our Scheme hats on we'll just make up some new syntax called build/threaded that does it for us and worry about how that syntax works later. We'll say that you give it the name of a new variable (port in this case), specify a range using a for ... to ... construct, and give it a body of code to execute, and it executes the body in parallel for each number in the range and returns an ordered list of all the results. And just because we want to make our lives really simple, we'll say that output list only includes a result if the body didn't return #f (false).
Assuming we've got build/threaded, implementing scan is easy. The body of the loop just needs to test if the given port is open (we'll write a function can-connect? to check that) — if it is we'll return a list containing the port number and the well-known service name (we'll need to write port->name to do that), and if it's closed we return false (in Scheme, the body of a function is an expression, not a statement, and whatever that expression evaluates to is implicitly considered the function's return value).
That's the whole overall program. All that's left is to fill in the details for can-connect?, port->name, and build/threaded.
The details
The can-connect? function isn't bad at all:
; can-connect? : string[url] number -> bool
; determines if the host is listening on the given port
(define (can-connect? host port)
(with-handlers ([exn:i/o:tcp? (lambda (e) #f)])
(let-values ([(ip op) (tcp-connect host port)])
#t)))
It's just a small wrapper around the tcp-connect library function, which returns two values, an input port and an output port, if it succeeds at connecting, and raises an exception if it doesn't. PLT Scheme has a built-in exception mechanism very similar to Java's or C++'s in that any code can raise an exception, and that exception propagates up the stack until it reaches an appropriate handler (specified by try/catch in C++ or Java, with-handlers in PLT Scheme). The can-connect? function calls tcp-connect and returns #t (true) if it succeeds. If tcp-connect raises an i/o:tcp exception (which it will do if it can't establish a connection), with-handlers catches that and returns #f (false).
The port->name function is a little more complicated. If I only wanted my program to run on Unixish systems I might just call the getservbyport C function through PLT Scheme's foreign function interface (it only takes a few lines, actually), but I'd actually like it to run on Windows and Macs and various other platforms that don't have access to getservbyport (or even the /etc/services file it uses). Fortunately, the Internet Assigned Numbers Authority has an online list of ports in the same format as /etc/services, so we'll make our program fetch that if a local file isn't available. Here's the code:
(define (port->name p) (hash-table-get NAMES p (lambda () "unknown")))
(define NAMES
(let ([ip (if (file-exists? "/etc/services")
(open-input-file "/etc/services")
(get-pure-port (string->url "http://www.iana.org/assignments/port-numbers")))]
[nametable (make-hash-table)])
(while m (regexp-match "([^ \t\r\n])[ \t]+([0-9]+)/tcp[ \t]+([^\r\n])" ip)
(hash-table-put! nametable (string->number (list-ref m 2)) (list-ref m 1)))
nametable))
First we define the global variable NAMES to be a hash table that maps numbers to strings. To do that, we get a new input port that's connected to either /etc/services (if it exists) or to the IANA's web page, and we also make an empty hash table. Then we loop over the input port getting each port-number/name combination (as determined by a regular expression) and add each to the hash table, and finally we return the hash table which becomes the value of NAMES. Unfortunately Scheme has no while loop, but since it's such a nice fit for this problem we'll use it anyway and remember to define it later.
Once we've got NAMES defined port->name is just a hash-table lookup with the string "unknown" returned if the port we're looking for isn't in the table. That completes the function, so all we have left to do is define the loops we used.
Macros and concurrency
We can't put it off any longer: we've got to make definitions for while and build/threaded. The former is easier, so we'll do it first.
The key to making new syntax forms is a very unusual facility called macros (only a cousin of C macros, the well-known black sheep of the family) that take a piece of a parse tree as input and produce another one as output, allowing programmers to automatically rewrite code as they see fit. Here's the macro for while:
(define-syntax (while stx)
(syntax-case stx ()
[(_ var test body)
(identifier? #'var)
#'(let loop ((var test))
(when var body (loop test)))]))
It's scary-looking, but don't fear, it's really pretty simple once you get used to the way macros are written. There's some junk at the top that says that this is a macro and sets us up to do some work, but the important stuff begins on the third line, which specifies what shape the syntax tree should take. It should be
(while [some-variable] [some-test] [some-body])
where the variable has to be a literal identifier but the test and the body can be any old Scheme expression. It also gives each piece a name (var, test, and body, respectively). Those names are important for the rest of the macro: starting with the #'(let ...) line, the whole rest of the code is actually a template for what the output syntax should be, and it's returned literally except that every time the name of one of the blanks appears, whatever actually appeared in that blank is substituted in instead.
Once you get the hang of it, macros like these are pretty simple. For example, our use of while in defining the NAMES table turns into the following after Scheme expands it (code from the original scan in bold, code introduced by the macro in normal font weight):
(let loop ((m (regexp-match "([^ \t\r\n])[ t]+([0-9]+)/tcp[ \t]+([^\r\n])" ip)))
(when m
(hash-table-put! nametable (string->number (list-ref m 2)) (list-ref m 1))
(loop (regexp-match "([^ \t\r\n])[ t]+([0-9]+)/tcp[ \t]+([^\r\n])" ip))))
Of course we won't actually ever see this code or type it in — Scheme generates it for us behind the scenes when it runs our program. What the expanded code actually does is use one of Scheme's built-in loops, the so-called "named let", to run the test expression and bind it to the variable m. If m isn't false (in Scheme anything that isn't #f is true), it evaluates the body with m still bound to the test's result and then loops, checking the test again. It's a simple pattern, and one that would be hard to write down in other languages. People from imperative backgrounds often criticize Scheme for making you use recursion to express looping, but that's missing the point: recursion is the "assembly language" for loops in Scheme, but (in addition to a large library of prebuilt loops) the language gives you the tools you need to easily build up the most powerful flow-control abstractions you can think of.
The last loop we need to implement to complete our program demonstrates that point nicely. Remember we needed to make build/threaded, a flow construct that lets you specify a range and a body and builds a list of all the non-false evaluations of that body on the given range in parallel. As it happens, for another project I once implemented a nice library function called threaded-for-each that takes a function and a list and calls the function on every element of the list in parallel. It looks like this:
(define (threaded-for-each f lst)
(let ((chan (make-channel)))
(for-each
(lambda (x) (thread (lambda () (f x) (channel-put chan 'done))))
lst)
(for-each (lambda (ignored) (channel-get chan)) lst)))
You may notice that the thread interface is unusual. Rather than the traditional semaphore/mutex lock system, PLT Scheme uses unusual concurrency mechanisms taken from Concurrent ML to create and synchronize threads. Rather than locks on shared memory, the CML primitives allow threads to send values to each other over channels and block until those values are transmitted. In this example, threaded-for-each creates a new channel, spawns a thread for each item in the input list that runs the input function on it and then sends a message over the channel indicating that it is done. Once all threads are spawned, the function waits for a message from each of the spawned threads and then returns.
We can use threaded-for-each to implement build/threaded. Here's an implementation:
(define-syntax (build/threaded stx)
(syntax-case stx (from to)
[(_ x (from start to end) body)
(identifier? #'x)
#'(let ([v (make-vector (- hi lo) #f)]
[i start]
[j end])
(threaded-for-each
(lambda (x)
(let ((ans body))
(when ans (vector-set! v (- x i) ans))))
(range i j))
(filter (lambda (y) y) (vector->list v)))]))
This is the biggest chunk of Scheme code in the program, and it's a macro too, so you know it's got to do something cool. Here's how it works: we create a vector that has a slot for every element in the target range, all initialized to #f. Then we make a list of all the numbers in our range and use threaded-for-each with a function that runs the macro's body on each number in that list, updating the appropriate cell in the vector when it's finished. Finally, once the threaded-for-each is over with (and therefore all of its threads have completed) we turn the vector into a list, strip out all the #fs, and return it.
Some conclusions
That's the whole port scanner. The interesting thing about this solution isn't its size or efficiency, but how at nearly every stage of the process Scheme gave me the ability to say exactly what I meant, so I could make the language do things my way rather than it making me do things its way. Schemers take full advantage of that ability, and its hard to find a group of programmers more stubborn in their pursuit of writing down exactly the concept they are trying to convey rather than having to clutter their programs with extraneous concepts.
| < Curses! Foiled again! | BBC White season: 'Rivers of Blood' > |

Post to Twitter
