Jonathan Wilkes <jancsika@???> writes:
> I cannot for the life of me understand the quote from djb starting,
> "Don't parse." What is it he doesn't like, and how does his text0
> format keep him from doing what he doesn't like?

I can only speculate about the reasons, but 'parsing', especially in the
sense used here, namely not 'parsing' proper (interpreting a sequence of
tokens according to the rules of some grammar) but 'lexical analysis of
some text', seems deceptively simple in C. It actually isn't, and the
standard C string support routines are virtually useless for it. Because
it seems so simple, people tend to write code which handles all input
they considered to be valid but fails, often in serious ways, ie, by
causing invalid memory accesses, when encountering 'crafted' invalid
input. If you want to write a lexical analyser in C, it will have to
become an often complicated[*] finite state machine analysing the input
character by character, and writing one is (IMHO) a very tedious
business, as one proceeds in tiny steps towards a distant goal.
Considering that "doing it right" is a lot of work and shortcuts tend to
cause disaster, avoiding the problem completely seems like a smart move.

OTOH, 'seems', because one purpose of a parser is to detect and reject
invalid input. This means "don't parse" implies "don't do input
validation", and while that's surely popular :->, it's usually not an
option. But "avoid writing parsers where feasible" is IMHO a sound piece
of advice. Eg, in case some sort of 'config file format' is needed, it's
often possible to get by by writing a set of

        variable=value

statements in Bourne shell syntax and replacing the 'start the program'
command with a shell script which sources the config file and starts the
real program with command-line arguments corresponding to the values
from the config file. The shell already has a parser, and since typical
shells were written to perform adequately on computers with far less
horsepower than a current-day smartphone, it's even going to be a 'fast'
parser. This, of course, means one first has to get over "OMG!!1 Fork
and exec!!2", which still serves as justification for writing a few
hundred thousand lines of C code in the quest for 'performance'. But
the already mentioned, relatively puny, large computers could fork and
exec all day without their usability being seriously impaired, so that's
IMNSHO a red herring.
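
As a rough sketch of the wrapper-script approach described above: this
is not taken from any real program, and the variable names (listen_port,
log_dir), the option names and the real_program stand-in are all made up
for illustration. The sample config file is created on the fly only so
that the example is self-contained.

```shell
#!/bin/sh
# Hypothetical wrapper script; every name in here is illustrative only.

# Write a sample config file so the sketch can be run as-is.
conf=$(mktemp) || exit 1
cat > "$conf" <<'EOF'
# Comments, quoting and whitespace are all handled by the shell.
listen_port=8080
log_dir="/var/log/my app"
EOF

# Defaults, overridden by whatever the config file assigns.
listen_port=25
log_dir=/tmp

# This is the whole 'parser': the shell reads and evaluates the file.
. "$conf"
rm -f "$conf"

# Sourcing does not validate values, so check what needs checking.
case $listen_port in
    ''|*[!0-9]*)
        echo "invalid listen_port: $listen_port" >&2
        exit 1
        ;;
esac

# Stand-in for the real program: prints each argument on its own line.
real_program() {
    printf '%s\n' "$@"
}

# Hand the settings over as ordinary command-line arguments.
real_program --port "$listen_port" --log-dir "$log_dir"
```

In an actual wrapper, the last line would presumably be an exec of the
real binary with the same arguments. Note that sourcing executes
whatever the file contains, so this only suits config files written by
someone who could run commands as that user anyway.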

[*] I once wrote a parser for SMTP headers which actually required an
additional state-stack in order to be able to 'go back to where we were
before encountering this'.