(Also: attention hulver infidel: I think there's a bug in the <code> tag implementation. If you have a line of white space between your tags, everything after that line gets rendered as regular text rather than code. If you put a nonbreaking space on the line the formatter does the right thing, though.)
Why I Like Scheme
Draft - 3/11/04
When I tell programmers that my favorite programming language is Scheme, they tend to react as though I'd told them my favorite snack were packing styrofoam or my favorite music were junior high orchestra recordings: those who can even bring themselves believe me mostly wonder what on Earth I could be thinking. Some wonder if I've ever programmed in a real language like C or C++ (I have), if I'm one of those academic nuts who cares more about proving theorems about programs than actually using them for real work (I'm not), or if all those parentheses have rotted my brain (probably). So what could make me like this weird language with no for loop and so many parentheses it makes you crosseyed?
In this article, I'll tell you what. By the time I'm finished I hope I'll also have convinced you that you might want to give it a shot yourself.
Some words of warning
I'm going to assume that you're a working programmer who comes from a C, C++, Java, or similar background, so I'm not going to go into the details of how Scheme differs from other more-or-less exotic languages like Common Lisp, Standard ML, or Haskell, as interesting and worthy of discussion as those languages are. Suffice it to say that many of the ideas I present here as features of Scheme are also present in some other languages.
Furthermore, there are a large variety of Scheme systems out there, and they tend to do some important things in different ways. For the code in this article I've chosen to use PLT Scheme; it will almost certainly not run in other Schemes without some modification. Most serious Scheme systems have equivalents or near-equivalents to everything I use here, but the details differ.
Those disclaimers out of the way, let's get on to the good stuff.
I'm not going to try to teach you how to program in Scheme, but instead to give an impression of how a Scheme programmer tackles problems. So, rather than showing some boring pedagogical example like "Hello world" or a factorial function, I figured I'd show something I've actually had occasion to use in real work: a multithreaded port scanner. Here it is, written basically the way I'd really write it if I needed to:
; scan : string[hostname] (listof int) -> listof (list int string)
; gives the number and well-known service name of each port in the given
; list that is open on the given host
(define (scan host ports)
(lambda (p) (list p (port->name p)))
(open-ports host ports)))
So what does this do? I've defined a function called scan that takes two arguments, a string called host and a list of numbers called ports. It gets every open port using the function open-ports we'll define later and returns each one paired with its service name. To understand how it does it, we're going to need to make a brief diversion into looping.
A brief diversion into looping
Scheme doesn't do looping the way mainstream languages do. That leads a lot of people to think that Scheme doesn't do looping at all, which isn't true — Scheme actually has a very sophisticated and flexible way of writing loops that just takes a while to get used to (no pun intended).
The "assembly language" of loops in Scheme is recursion. Recursion in Scheme is not at all inefficient and does not waste memory, and can frequently lead to nicely clear and elegant solutions to problems. For instance, here's an implementation of a range function (similar to features of Python and Perl) that takes two numbers and builds a list of all the numbers between them in order, written recursively:
(define (range low high)
[(> low high) null]
[else (cons low (range (+ low 1) high))]))
This might look alien at first, but there's a very compelling logic to what the function says. A range where the low number is higher than the high number is just an empty list (null represents an empty list in Scheme) and any other range is just a list consisting of the low number followed by the range with the same high point but a low point that's 1 bigger (cons is a function that takes an element and a list and returns a new list consisting of the given element followed by the given list). It's a logically elegant way of thinking of the problem that also happens to be an executable program.
I say recursion is the assembly language of loops because while it's the fundamental construct that allows you to get looping to happen in Scheme, real Scheme programmers frequently use constructs built on it rather than using it directly. Owing to the fact that functions can be declared anonymously and used as values (if you're familiar with anonymous inner classes in Java, Scheme functions are like those except that they all have exactly one method that gets run when they're applied and they're much less of pain to write down syntactically), you can write a function that captures the essence of a particular style of loop and takes another function as input to decide what to do at each iteration.
The function map I used the scan function is one example of such a loop: it accepts a function and a list as input, applies the function to each individual element of the list, and returns a new list containing the results. So for instance (map add1 (list 0 1 2 3)) would yield (list 1 2 3 4).
With this in mind, we can get back to our port scanner.
On with the show
So what's going on with the scan function? It's mapping a function (that funny lambda thing) that takes a port number and produces a list of port number and service name over the list of all open ports (the result of calling open-ports). What we'll get back is a list consisting of port-number/service-name pairs, which is exactly what we want.
Next we need to implement open-ports. Here it is:
; open-ports : string[hostname] (listof int) -> (listof int)
; returns the sublist of numbers that represent open ports on the
; given host, performing all checks concurrently
(define (open-ports host ports)
(lambda (port) (can-connect? host port))
The open-ports function checks every port in the list it's provided with in parallel to see if that port is open, and when it's done returns the sublist of those ports that are actually open. It does this using with two new functions: threaded-filter and can-connect?. Since can-connect? is easier, I'll explain it first.
can-connect?: exceptions and sockets
The can-connect? function isn't bad at all:
; can-connect? : string[url] number -> bool
; determines if the host is listening on the given port
(define (can-connect? host port)
(with-handlers ([exn:i/o:tcp? (lambda (e) #f)])
(let-values ([(ip op) (tcp-connect host port)])
(close-input-port ip) (close-output-port op) #t)))
It's just a small wrapper around the tcp-connect library function, which returns two values, an input port and an output port, if it succeeds at connecting, and raises an exception if it doesn't. PLT Scheme has a built-in exception mechanism very similar to Java's or C++'s in that any code can raise an exception, and that exception propagates up the stack until it reaches an appropriate handler (specified by try/catch in C++ or Java, with-handlers in PLT Scheme). The can-connect? function calls tcp-connect and returns #t (true) if it succeeds. If tcp-connect raises an i/o:tcp exception (which it will do if it can't establish a connection), with-handlers catches that and returns #f (false).
threaded-filter: more parallel loops than you can shake a stick at
The threaded-filter function takes a little more explanation. I'll give you the whole code at once:
; threaded-for-each : (X -> Y) * (listof X) -> void
; applies the function to each element of the list in parallel
(define (threaded-for-each f l)
(let ((c (make-channel)))
(for-each (lambda (x) (thread (lambda () (f x) (channel-put c #t)))) l)
(for-each (lambda (x) (channel-get c)) l)))
; threaded-map : (X -> Y) * (listof X) -> (listof Y)
; maps the given function over the given list with each computation
; done in parallel
(define (threaded-map f l)
(let ((boxes (map box l)))
(threaded-for-each (lambda (b) (set-box! b (f (unbox b)))) boxes)
(map unbox boxes)))
; threaded-filter : (X -> bool) * (listof X) -> (listof X)
; returns the sublist of the given list for which the given predicate
; returns true, with each computation done in parallel
(define (threaded-filter pred l)
(let ((filters (threaded-map pred l)))
(let loop ((items l) (filters filters))
[(null? items) '()]
[(car filters) (cons (car items) (loop (cdr items) (cdr filters)))]
[else (loop (cdr items) (cdr filters))]))))
That's a lot of code to digest, but fortunately the work we've done to make all our functions generic has paid off: these three functions can be library code from now on and we've got three useful new loops we can use over and over and over (again, no pun intended).
First take a look at threaded-for-each. You may notice that the thread interface it uses is unusual. PLT Scheme uses unusual concurrency mechanisms taken from Concurrent ML to create and synchronize threads. Rather than locks on shared memory, the CML primitives allow threads to send values to each other over channels and block until those values are transmitted. In this example, threaded-for-each creates a new channel, spawns a thread for each item in the input list that runs the input function on it and then sends a message over the channel indicating that it is done. Once all threads are spawned, the function waits for a message from each of the spawned threads and then returns.
We use that to build threaded-map by building a list with a bunch of places that need to be destructively updated and then using threaded-for-each to update them all concurrently. We can then use threaded-map to implement threaded-filter, the function we needed that removes elements from a given list if they fail to pass a test, by concurrently mapping that test function over the given input list to produce a list of booleans, and then using that list of booleans along with the original list to build up the result.
With that done, we've got everything we need to detect open ports, and all that's left is to look up service names.
port->name: libraries, HTTP communication and regular expressions
There's a little complication in implementing port->name. If I only wanted my program to run on Unixish systems I might just call the getservbyport C function through PLT Scheme's foreign function interface (it only takes a few lines, actually), but I'd actually like it to run on Windows and Macs and various other platforms that don't have access to getservbyport (or even the /etc/services file it uses). Fortunately, the Internet Assigned Numbers Authority has an online list of ports in the same format as /etc/services, so we'll make our program fetch that if a local file isn't available. Here's the code:
(require (lib "url.ss" "net"))
(let ([ip (if (file-exists? "/etc/services")
(get-pure-port (string->url "http://www.iana.org/assignments/port-numbers")))]
(while m (regexp-match "([^ \t\r\n])[ \t]+([0-9]+)/tcp[ \t]+([^\r\n])" ip)
(hash-table-put! nametable (string->number (list-ref m 2)) (list-ref m 1)))
(define (port->name p) (hash-table-get NAMES p (lambda () "unknown")))
After loading in a library that defines some useful HTTP-related functions, we define the global variable NAMES to be a hash table that maps numbers to strings. To do that, we get a new input port that's connected to either /etc/services (if it exists) or to the IANA's web page, and we also make an empty hash table. Then we loop over the input port getting each port-number/name combination (as determined by a regular expression) and add each to the hash table, and finally we return the hash table which becomes the value of NAMES. Unfortunately Scheme has no while loop, but since it's such a nice fit for this problem we'll have to define it ourselves.
while: the power of macros
We've implemented lots of loops so far, but while is especially tricky because of the way we want to use it. In our previous loops we required users to pass in an anonymous function, and furthermore none of our loops introduced their own variable names. We could probably make a variant of the while loop that conformed to those expectations, but I'm stubborn and I'd rather have Scheme conform to me than to have me conform to it. If I were programming in Java or C, I'd just be out of luck, but Scheme lets me have my way. Scheme has a very unusual facility called macros (only a cousin of C macros, the well-known black sheep of the family) that take a piece of a parse tree as input and produce another one as output, allowing programmers to automatically rewrite code as they see fit. We can use that facility to make a while loop that will be have just like I want it to. Here's the macro:
(define-syntax (while stx)
(syntax-case stx ()
[(_ var test body)
#'(let loop ((var test))
(when var body (loop test)))]))
It's scary-looking, but don't fear, it's really pretty simple once you get used to the way macros are written. There's some junk at the top that says that this is a macro and sets us up to do some work, but the important stuff begins on the third line, which specifies a pattern for what a use of while should look like. It should be
(while [some-variable] [some-test] [some-body])
where the variable has to literally be a legal variable name but the test and the body can be any old Scheme expression. It also gives each of these pieces a name (var, test, and body, respectively). Those names are important for the rest of the macro: starting with the #'(let ...) line, the whole rest of the code isn't actually code, but rather a template for what the output syntax should be, and it's returned literally except that every time var, test, or body appears in the template, whatever actually appeared in that blank is substituted in instead.
Once you get the hang of it, macros like these are pretty simple. For example, our use of while in defining the NAMES table turns into the following after Scheme expands it (code from the original scan in bold, code introduced by the macro in normal font weight):
(let loop ((m (regexp-match "([^ \t\r\n])[ t]+([0-9]+)/tcp[ \t]+([^\r\n])" ip)))
(hash-table-put! nametable (string->number (list-ref m 2)) (list-ref m 1))
(loop (regexp-match "([^ \t\r\n])[ t]+([0-9]+)/tcp[ \t]+([^\r\n])" ip))))
Of course we won't actually ever see this code or type it in — Scheme generates it for us behind the scenes when it runs our program. What the expanded code actually does is use one of Scheme's built-in loops, the so-called "named let", to run the test expression and bind it to the variable m. If m isn't false (in Scheme anything that isn't #f is true), it evaluates the body with m still bound to the test's result and then loops, checking the test again. It's a simple pattern, and one that would be hard to write down in other languages.
Since that loop was the last thing we needed to write, we're done and we've got ourselves a complete working port scanner. Hooray!
Wait a minute, where's the I/O?
I don't need any. Scheme, like Python, is designed around an interactive read-eval-print loop. I can open DrScheme, type in all the definitions, and then just use my function however I want. Here's an example run:
> (scan "localhost" (range 1 100))
((22 "ssh") (25 "smtp"))
(lambda (entry) `(port (number ,(number->string (car entry))) (name ,(cadr entry))))
(scan "localhost" (range 1 100))))))
So there's the whole port scanner. The interesting thing about this solution isn't its size or efficiency, but how at nearly every stage of the process Scheme gave me the ability to say exactly what I meant, so I could make the language do things my way rather than it making me do things its way. Schemers take full advantage of that ability, and its hard to find a group of programmers more stubborn in their pursuit of writing down exactly the concept they are trying to convey rather than having to clutter their programs with extraneous concepts.
If you actually want to execute this code, install DrScheme (when it asks what language level to use, set it to "textual" or "graphical"), open it, and copy all the code from this article into the definitions window. The order you put them in doesn't matter except that the while macro needs to be at the top. Hit the "execute" button and you're off.
|< What's that sound | BBC White season: 'Rivers of Blood' >|