"tilt!" <tilt@???> writes:
> On 08/25/2015 02:09 PM, Rainer Weikusat wrote:
>> Considering that this enforces some kind of 'bastard URL-encoding'
>> (using + as prefix instead of %) for all other bytes, it's also going to
>> make people who believe that UTF-8 would be a well supported way to
>> represent non-ASCII characters very unhappy.
>
> 1. This encoding is not about URLs but filenames.
>
> Your wording "bastard URL-encoding" is unclear to me, apart from
> that I would much prefer it if you could restrain yourself
> from using pejoratives when doing code reviews.
'URL encoding' is part of an internet standard. Since you're basically
using the same method (possibly unknowingly) but with a +-prefix instead
of the usual %-prefix, that classifies as "bastard URL encoding". AFAIK,
'bastard' means 'illegitimate child'. I don't know what else it means or
what else it can be construed to mean.
> 2. It is not safe to assume that SSIDs contain UTF-8.
>
> The relevant IEEE standard is botched.
>
> https://en.wikipedia.org/wiki/Service_set_%28802.11_network%29
>
> "Note that the 2012 version of the 802.11 standard defines a
> primitive SSIDEncoding, an Enumeration of UNSPECIFIED and UTF-8,
> indicating how the array of octets can be interpreted."
>
> Imagining how many service sets still operate using the pre-2012
> standard (and/or are botched implementations themselves that fail
> to recognize the issue), I think it is safe to assume that the
> character encoding of an SSID is "UNSPECIFIED" in the general case.
>
> Therefore, it is handled encoding-agnostic on a byte-per-byte basis,
> and this is what the code accomplishes.
The code replaces every byte which is neither an ASCII letter nor a
digit nor - with a three-byte escape sequence composed of + followed by
the two-digit hexadecimal representation of the byte value. This implies
that no non-ASCII letter, UTF-8 or otherwise, will survive in readable
form: each of its bytes ends up as an escape sequence.
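
To make the consequence concrete, here is a minimal sketch of the
escaping as described above, written in Python rather than in the
language of the code under review. The name escape_ssid, the use of
uppercase hex digits and the example SSIDs are assumptions of mine and
not taken from the actual code.

```python
import string
from urllib.parse import quote

# Bytes passed through unchanged: ASCII letters, digits and '-'.
SAFE = frozenset((string.ascii_letters + string.digits + "-").encode("ascii"))


def escape_ssid(ssid: bytes) -> str:
    """Hypothetical helper: escape every byte outside SAFE as '+' plus
    two hex digits (uppercase is an assumption)."""
    return "".join(chr(b) if b in SAFE else "+%02X" % b for b in ssid)


if __name__ == "__main__":
    utf8 = "Café".encode("utf-8")      # b'Caf\xc3\xa9'
    latin1 = "Café".encode("latin-1")  # b'Caf\xe9'
    print(escape_ssid(utf8))           # Caf+C3+A9
    print(escape_ssid(latin1))         # Caf+E9
    # Standard URL encoding of the same bytes, for comparison.
    print(quote(utf8, safe=""))        # Caf%C3%A9
```

For these inputs the result differs from standard URL encoding only in
the prefix character. (URL encoding would additionally leave _ . and ~
unescaped.) Either way, the é is gone from the resulting name
regardless of which encoding the SSID used.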