Author: Adam Borowski
Date:  
To: dng
Subject: Re: [DNG] [OT] [Re: Studying C as told. (For help)
On Tue, Jun 21, 2016 at 04:38:50PM +0200, Irrwahn wrote:
> On Tue, 21 Jun 2016 16:00:53 +0200, Adam Borowski wrote:
> > On Tue, Jun 21, 2016 at 03:13:21PM +0200, Irrwahn wrote:
> >> On Tue, 21 Jun 2016 14:42:46 +0200, Edward Bartolo wrote:
> >> [...]
> >>>     if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) {
> >> [...]
> >> You should *never* assume that the latin letters occur in 
> >> the execution character set in ascending consecutive order.
> >> (Though similar *is* guaranteed for the digits '0' to '9'!)

> >
> > Not really -- assuming ASCII is like assuming 8-bit bytes[1]. Both could be
> > false at the dawn of time, but today trying to support that is a waste of
> > time. Too much code you rely on makes that assumption.
>
> Neither is an excuse for not using isalpha() et al. instead
> of above abomination. I'm not even going to mention the notion
> of "character representing an alphabetic letter" differs between
> locales. WTF, I just did, nevermind!


It's isalpha() that's an abomination, exactly because its behaviour varies
between locales. If you want consistency, you need either:
* said c>='a'&&c<='z'||c>='A'&&c<='Z', which does ASCII
* iswalpha(), which does Unicode
The latter is somewhat buggy in glibc: if the locale is unset (i.e., you
don't call setlocale()) or is "C", the character tables are not loaded. On the
other hand, it's OK in any other locale, including ancient ones like
*.ISO-8859-1 (where it also handles characters not in ISO-8859-1). Other libcs
I tested do it right in all cases.
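
To make the two options concrete, here is a minimal sketch. The function
names is_ascii_alpha() and is_unicode_alpha() are made up for illustration;
only iswalpha() is a real library call. The first function assumes an
ASCII-compatible execution character set.

```c
#include <stdbool.h>
#include <wctype.h>

/* ASCII-only letter test: gives the same answer in every locale.
 * Assumes 'a'..'z' and 'A'..'Z' are contiguous, as in ASCII
 * (this does not hold for EBCDIC). */
bool is_ascii_alpha(int c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* Wide-character letter test via iswalpha().
 * For non-ASCII letters on glibc, the caller is assumed to have
 * selected a locale other than "C" beforehand, for example with
 * setlocale(LC_CTYPE, ""). */
bool is_unicode_alpha(wint_t wc)
{
    return iswalpha(wc) != 0;
}
```

With these, is_ascii_alpha('[') is false even though '[' sits between 'Z' and
'a' in ASCII, since the two ranges are tested separately.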

> And "there's so much broken code already you rely on" should
> never be an excuse to deliberately produce even more broken code.


I wouldn't call code that assumes 8-bit bytes "broken". I'd call it "sane".
You'd need to really go out of your way to be able to test it, and the code
would be complex... for what gain?
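
If you want the assumption written down rather than implicit, one possible
sketch (assuming a C11 compiler; bits_in() is a hypothetical helper, not
part of any library) is a compile-time check:

```c
#include <limits.h>

/* Refuse to compile on a platform where a byte is not 8 bits wide. */
_Static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");

/* Hypothetical helper: bit count of an object occupying nbytes bytes.
 * The constant 8 is safe only because of the check above. */
unsigned long bits_in(unsigned long nbytes)
{
    return nbytes * 8UL;
}
```

This costs one line and documents the assumption, without any attempt to
support the exotic case.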

Same for EBCDIC. If you have to maintain code inherited from '60s IBM
mainframes, I pity you, but it's a fact that the rest of the world agreed on
ASCII. It's the same principle as when every town had a different cubit length.

--
An imaginary friend squared is a real enemy.