And the bells were ringing out for Listless Day
Diary
By yicky yacky (Wed Jun 07, 2006 at 08:06:28 AM EST). Tags: World Cup, code

Inside:

  • World Cup Prediction Challenge: Put your trouser where your mouth is.
  • The most irritating indentation style evar [Don't lie to me; I know some of you use it].
  • Regular Expression Fight: Perl versus C (libpcre).


World Cup Mystic Meg

So: as cannot have escaped anyone's notice, The World Cup Of All Things Football is finally upon us. The newspapers and magazines are full of players, pundits and pop stars vomiting forth their ill-informed opinions on who will win what and how each team will fare.

Well, we're just as good as their kind of scum, so there's nothing preventing us doing it, too. Post your predictions below, so that, in five weeks' time, we can all have a good chuckle about how ludicrously off-the-ball we all were. Predictions can range over anything you like (commentators having coronaries, tourist attacks, most disgraceful fans etc.), but should at least include a guess at who the overall winners will be, plus an estimate of how your home nation will fare (Rogerborg and gpig caveat: if they qualified).

Nobody likes sticking their neck out, so I'll get this ball rolling:

  • Winners: The Argies. Hardly a controversial pick, but at least it's not the Brazilians. Why? They've got just as much skill as the Brazilians, a more rigorous defensive mentality, the will to cheat and some newly developed inner grit; most of them play in Europe, they've a great team mentality - and Carlos Tevez. And Messi. And Riquelme. And Crespo. And Sorin. Etc.
  • England's prospects: Much like the rest of the country, I think this is hard to predict as it depends on the form and fitness of the Rune-child. If Rooney doesn't play (and the team is largely the same as that which walloped Jamaica at the weekend) there's a large part of me which genuinely thinks they won't get out of the group. However: If they get out, they'll either die against the Argies in the quarters or, if they avoid the Argies, get to the semis.
  • Crazy arbitrary prediction number one: The Czechs won't do shit. Yes: They're ranked 2nd in the world, but they'll die on their arse this time. Nedved's been underperforming, Baros is heinously over-rated and their defence looks creaky. Even the US (5th) might beat them in the group stage.
  • Batshit arbitrary prediction number two: The Aussies will come second in Group F. Not Japan. Not Croatia. The colonial crims.

 


 

The indentation chestnut

Coders often love to argue about indentation and brace styles - indeed, some people here may have a bone to pick with the way I formatted some of the code in the next section, especially the calls to functions with many parameters.

Code style is one of those things everyone has a preference about but, in reality, most hackers can happily read most styles and so don't really care that much. An indentation style is nothing more than a convention for laying out code so that it reflects scope and structure. It is expressed mostly through the positions of the braces ('{' and '}') and the indentation.

If given the choice, I tend to use what's known as K&R style (or its close variant, the One True Brace Style). It looks like this:

if (conditional) {
    code
    code
    code etc.
}

A lot of the code I see uses the so-called Allman style (aka BSD style), like so:

if (conditional)
{
    code
    code
    code etc.
}

In truth, there isn't a vast amount of difference between these two styles - just the location of the opening brace - and you can often see people using combinations of the two. For instance, in what I call 'Stroustrup style', people use K&R for conditionals and loops, but Allman style for functions and methods.

Some of the more esoteric ones I've come across include Whitesmiths style:

if (conditional)
    {
    code
    code
    code etc.
    }

and GNU style, which is a kind of half-way house between Allman and Whitesmiths:

if (conditional)
  {
    code
    code
    code etc.
  }

People justify their choice of style on various grounds: conserving vertical space, making scope easy to see, the semantic connotations of where code is placed, and so on. Until this morning, I had never come across the following style in my day-to-day work, but now I have:

if (conditional) {
    code
    code
    code etc.
    } // end brace goes here! Why, sweet Jesus, would you ever fucking do such a thing?

According to Wikipedia, it's called 'Banner style', maybe because Bruce came up with it when he was cross. It's a travesty. Yeah, it looks innocuous enough there, but imagine how it looks with nested loops and conditionals. Even something not very complicated gives completely the wrong impression:

for(i = 0; i < foo; ++i) {
    for(j = 0; j < bar; ++j) {
        if(my_monkey->eats_nuts) {
            while(the_tree->has_nuts) {
                eat(my_monkey, the_tree->nuts);
                }
            }
        }
    }

Completely. Bloody. Useless.

 


 

Perls in the C

Just how do you check for the existence of needle and retrieve it from "Haystackhaystackneedlehaystackhaystack" (where needle may itself be a regular expression and hence not amenable to a straight substring search)?

 

Doing it with Perl

my $subject = "Haystackhaystackneedlehaystackhaystack";
my $tok;
if ($subject =~ /\w*(needle)\w*/)  {
    $tok = $1;
}

 

Doing it with C (libpcre), in about the tersest legal way possible (avoiding explicit dynamic allocation and all things malloc, although pcre_compile still allocates the compiled pattern internally):

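#include <stdio.h>
#include <string.h>
#include <pcre.h>

/*
* The ovector size must be a multiple of 3: the first two-thirds hold
* start/end offset pairs, the last third is PCRE workspace. 9 leaves room
* for the whole match plus up to two capture groups.
*/
#define OVEC_SIZE 9

/* Placeholder return codes */
#define SOME_KIND_OF_SUCCESS 0
#define SOME_KIND_OF_FAILURE (-1)

static const char haystack[] = "Haystackhaystackneedlehaystackhaystack";
static const char reg_exp[] = "\\w*(needle)\\w*";

/* Points into haystack and is NOT NUL-terminated at the end of the match: use toklen */
const char *tok;
int toklen;

int find_needle(void) {

    pcre *re;
    int re_res;
    int ovec[OVEC_SIZE];
    const char *re_error;
    int re_erroffset;
    const int options = PCRE_MULTILINE;

    re = pcre_compile(
        reg_exp,
        options,
        &re_error,
        &re_erroffset,
        NULL              /* default character tables */
    );

    /* Essentially a debug check - it means regexp compilation failed */
    if(re == NULL) {
        fprintf(stderr, "regexp compilation failed at offset %d: %s\n",
                re_erroffset, re_error);
        return SOME_KIND_OF_FAILURE;
    }

    re_res = pcre_exec(
        re,
        NULL,             /* no pcre_study() data */
        haystack,
        (int) strlen(haystack),
        0,                /* start offset */
        0,                /* exec options */
        ovec,
        OVEC_SIZE
    );

    pcre_free(re);
    re = NULL;

    /*
    * On success, pcre_exec returns one more than the highest-numbered capture
    * group that was set; on failure it returns a negative code (for example
    * PCRE_ERROR_NOMATCH). A result of 2 therefore means both the entire match
    * (group 0) and the parenthesised capture (group 1) were set.
    */
    if(re_res < 2) {
        /* No match found (or the capture group was not set) ... */
        /* Examine re_res if need be ... */
        return SOME_KIND_OF_FAILURE;
    }

    /* ovec[2] and ovec[3] are the start and end offsets of capture group 1 */
    tok = haystack + ovec[2];
    toklen = ovec[3] - ovec[2];

    return SOME_KIND_OF_SUCCESS;
}

int main(void) {
    if(find_needle() == SOME_KIND_OF_SUCCESS) {
        printf("%.*s\n", toklen, tok);
    }
    return 0;
}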

/*
* Throw in proper error checking and run-time dynamic allocation and this gets
* a lot, lot longer.
*/

Obvious lesson for today: If you can use Perl for text processing, do use it.

Full discussion: http://www.hulver.com/scoop/story/2006/6/7/8628/12497