Author: Rainer Weikusat
Date:  
To: dng
Old-Topics: Re: [DNG] Systemd Shims
Subject: [DNG] C string handling (was: Systemd Shims)
Roger Leigh <rleigh@???> writes:
> On 20/08/2015 11:27, Rainer Weikusat wrote:
>> Roger Leigh <rleigh@???> writes:
>>> On 19/08/2015 17:39, Rainer Weikusat wrote:


[...]

>>>>     p_len = strlen(IFACES_PATH);
>>>>     e_len = strlen(essid);
>>>>     path = alloca(p_len + e_len + 2);

>>>>     
>>>>     strcpy(path, IFACES_PATH);
>>>>     path[p_len] = '/';
>>>>     strcpy(path + p_len + 1, essid);


[...]

> The rationale for the use of the constants is fine. But consider that
> the code does not document where those numbers come from, and the fact
> that the code to calculate the buffer size and the code to copy data
> into the buffer are separate steps.


There is no good reason to document them because (as I tried to explain
in the earlier mail) they're immediately obvious when considering what
the code does (concatenate a string, a char and another string) and how
it does that (by employing the C standard library strlen and strcpy
functions to manipulate strings represented as 0-terminated sequences
of characters).
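
To spell this out (the IFACES_PATH value and the essid below are
made-up stand-ins for whatever the real program uses, and the comments
only exist to show where the numbers come from):

#include <alloca.h>
#include <stdio.h>
#include <string.h>

#define IFACES_PATH "/run/ifaces"         /* placeholder value */

int main(void)
{
        const char *essid = "example";    /* placeholder value */
        size_t p_len, e_len;
        char *path;

        p_len = strlen(IFACES_PATH);      /* chars before the '/' */
        e_len = strlen(essid);            /* chars after the '/' */
        path = alloca(p_len + e_len + 2); /* + 1 for '/', + 1 for the 0 */

        strcpy(path, IFACES_PATH);        /* p_len chars plus a 0 */
        path[p_len] = '/';                /* the 0 becomes the '/' */
        strcpy(path + p_len + 1, essid);  /* e_len chars plus the final 0 */

        puts(path);
        return 0;
}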

The underlying convention and its implications are not immediately
obvious in themselves but that's "you are expected to understand this"
stuff for anyone using C. They're arguably arbitrary but this is true
for any such convention. As a simple and probably almost beaten to death
example, what's the meaning of

"-1" + "-1"

?

In C and C++, this is an error; in Java, the result is "-1-1"; and in
Perl, it's -2 (the number).
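
Spelled out in C terms (this snippet is mine, purely for
illustration): the two literals decay to pointers, pointers cannot be
added to each other, and getting the Java result takes explicit
copying:

#include <stdio.h>
#include <string.h>

int main(void)
{
        char buf[5];            /* "-1-1" plus the terminating 0 */

        /* char *p = "-1" + "-1";  constraint violation: you cannot
           add two pointers */
        strcpy(buf, "-1");
        strcat(buf, "-1");      /* explicit concatenation: "-1-1" */

        printf("%s\n", buf);    /* prints -1-1 */
        return 0;
}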

> This is where problems can occur. Maybe not right now, after all you
> got it right when you wrote it, one would hope. But when it comes to
> future modifications, you must update all the size calculations and
> constants in line with any changes to how the buffer is filled, and
> this is a prime candidate for mistakes and consequent crashes and/or
> buffer overflows. When it comes to making modifications, you, or
> whoever is making the change, needs to work out exactly what the
> intent of the original code was--i.e. re-derive all the constants and
> re-compute them correctly to match the new behaviour. This is often
> non-trivial depending on the nature of the string manipulation.


It's already non-trivial for the given case but simple enough to
understand if the necessary background knowledge is available. I also
generally agree that this code is much too complicated considering what
it does (but this is very much a matter of perspective) and that the
complexity of the C implementation of even fairly simple string
operations limits the possible complexity of (abstract) string
manipulation humans can be expected to handle reliably. Or, put into
other terms, C string handling is a PITA, the more so the more
complicated it gets.

But it doesn't always get complicated, and in this situation it has
another desirable property: There's no rabbit hidden somewhere in it
which could jump out at an inconvenient moment[*]. The correctness of
this simple algorithm is easy to assert. Pulling in a few tens of
thousands of lines of highly abstracted C++ library code would make
that much more difficult.

[*] In theory. In practice, people working on glibc are "mad
scientist"-style x86 machine code hackers and the actual implementation
of something like strcpy might (and likely will) be anything but
straight-forward.

https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/strcpy.S;h=23231088fdc6ab7e2ff423a13ed32eff3884c3c0;hb=HEAD
(Partial) explanation of this (only of the interesting part): adding a
same-sized anything to a number constructed like 0xfefefeff causes each
1 bit (bit 0 of each byte) of the sum to be the opposite of what it was
in the original anything, provided all bytes of the anything were
non-zero, ie each per-byte addition caused a carry into the next higher
byte. The final increment will only result in zero if all 1 bits are
set after the xor with the original value (the or with the magic
constant sets all other bits). There are four cases here (or two with
two subcases):

Assuming the original byte was non-zero,

1) The 1 bit of the original byte was set

1a) An overflow happened when adding the previous byte. The 1-bit is now
    clear and the xor will set it back to 1.


1b) No overflow. 1 still set, xor will clear it.

2) 1 bit not set

2a) overflow, 1 bit now set, xor will leave it alone

2b) no overflow, still zero, will remain zero.

If there was no zero byte, either 1a or 2a will have happened for every
byte, hence, all 1 bits are now set and the increment results in a
0.
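
Expressed in C rather than assembly (my sketch of the test as
described above, not glibc's actual code, and covering only a single
32-bit word):

#include <stdint.h>

/* returns non-zero if any of the four bytes of w is zero */
static int word_has_zero_byte(uint32_t w)
{
        uint32_t magic = 0xfefefeffu;
        uint32_t sum = w + magic;

        /* no carry out of the topmost byte: either the topmost byte
           was zero or a lower byte already broke the carry chain */
        if (sum >= magic)
                return 1;

        /* lower three bytes: the xor recovers the per-byte carries in
           the 1 bits, the or sets all other bits, and the increment
           wraps to zero only if every carry happened, ie every lower
           byte was non-zero */
        return ((sum ^ w) | magic) + 1 != 0;
}

The real strcpy.S additionally has to determine which of the four
bytes the zero was in before copying the final partial word, which is
where much of the remaining complication comes from.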

I sort-of appreciate this as a puzzle and (as I verified at some time
in the past) the algorithm is either optimal or at least better than
any simpler variant I could come up with. However, I can see no
particular value in it beyond it surely being a complicated card trick,
and I don't expect to come into a situation where the performance of
char *strcpy(char *d, char *s)
{
        char *r;

        r = d;
        while (*r = *s) ++r, ++s; /* no cutesy increments */

        return d;
}


would be a problem for a string one would realistically consider
copying.