:: Re: [DNG] netman GIT project
Author: Rainer Weikusat
Date:  
To: dng
Subject: Re: [DNG] netman GIT project
"tilt!" <tilt@???> writes:
> On 08/25/2015 02:09 PM, Rainer Weikusat wrote:
>> Considering that this enforces some kind of 'bastard URL-encoding'
>> (using + as prefix instead of %) for all other bytes, it's also going
>> to make people who believe that UTF-8 would be a well supported way to
>> represent non-ASCII characters very unhappy.
>
> 1. This encoding is not about URLs but filenames.
>
>    Your wording "bastard URL-encoding" is unclear to me, apart from
>    that i would much prefer it if you could restrain yourself
>    from using pejoratives when doing code reviews.


'URL encoding' is part of an internet standard. Since you're basically
using the same method (possibly unknowingly) but with a +-prefix instead
of the usual %-prefix, that classifies as "bastard URL encoding". AFAIK,
'bastard' means 'illegitimate child'. I don't know what else it means or
what else it can be construed to mean.

> 2. It is not safe to assume that SSIDs contain UTF-8.
>
>    The relevant IEEE standard is botched.
>
>    https://en.wikipedia.org/wiki/Service_set_%28802.11_network%29
>
>    "Note that the 2012 version of the 802.11 standard defines a
>    primitive SSIDEncoding, an Enumeration of UNSPECIFIED and UTF-8,
>    indicating how the array of octets can be interpreted."
>
>    Imagining how many service sets still operate using the pre-2012
>    standard (and/or are botched implementations themselves that fail
>    to recognize the issue), i think it is safe to assume that the
>    character encoding of an SSID is "UNSPECIFIED" in the general case.
>
>    Therefore, it is handled encoding-agnostic on a byte-per-byte basis,
>    and this is what the code accomplishes.


The code replaces every byte which is neither an ASCII letter nor a
digit nor - with a three byte escape sequence composed of + followed by
the two-digit hexadecimal representation of the byte value. This implies
that every byte of a non-ASCII letter gets escaped, so no such letter,
UTF-8-encoded or otherwise, survives in readable form.
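
To make the effect concrete, here is a minimal sketch of the scheme as
described above. It is not the netman code: the name escape_ssid and the
use of uppercase hex digits are assumptions made for illustration only.
The last lines compare it with standard percent-encoding from Python's
urllib.parse.quote.

```python
from urllib.parse import quote

# Bytes passed through unchanged: ASCII letters, digits and '-'.
SAFE = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-"
)


def escape_ssid(ssid: bytes) -> str:
    """Hypothetical helper: escape every other byte as '+' plus two hex digits.

    Uppercase hex is an assumption; the description does not say which
    case the real code emits.
    """
    parts = []
    for byte in ssid:
        if byte in SAFE:
            parts.append(chr(byte))
        else:
            parts.append("+%02X" % byte)
    return "".join(parts)


utf8_name = "Café".encode("utf-8")      # b'Caf\xc3\xa9'
latin1_name = "Café".encode("latin-1")  # b'Caf\xe9'

print(escape_ssid(utf8_name))    # Caf+C3+A9
print(escape_ssid(latin1_name))  # Caf+E9

# Percent-encoding of the same bytes differs only in the prefix here.
print(quote(utf8_name, safe=""))  # Caf%C3%A9
```

Either way, the é is gone from the resulting file name, whatever encoding
the SSID actually used. Note that quote() also leaves '.', '_' and '~'
alone, which the scheme described above does not.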