On Tue, 25 Aug 2015 13:49:39 +0100, Rainer Weikusat wrote:
> "tilt!" <tilt@???> writes:
>> On 08/25/2015 02:09 PM, Rainer Weikusat wrote:
>>> Considering that this enforces some kind of 'bastard URL-encoding'
>>> (using + as prefix instead of %) for all other bytes, it's also going
>>> make people who believe that UTF-8 would be a well supported way to
>>> represent non-ASCII characters very unhappy.
>>
>> 1. This encoding is not about URLs but filenames.
<snip>
>> 2. It is not safe to assume that SSIDs contain UTF-8.
>>
>> The relevant IEEE standard is botched.
>>
>> https://en.wikipedia.org/wiki/Service_set_%28802.11_network%29
>>
>> "Note that the 2012 version of the 802.11 standard defines a
>> primitive SSIDEncoding, an Enumeration of UNSPECIFIED and UTF-8,
>> indicating how the array of octets can be interpreted."
>>
>> Imagining how many service sets still operate using the pre-2012
>> standard (and/or are botched implementations themselves that fail
>> to recognize the issue), i think it is safe to assume that the
>> character encoding of an SSID is "UNSPECIFIED" in the general case.
>>
>> Therefore, it is handled encoding-agnostic on a byte-per-byte basis,
>> and this is what the code accomplishes.
>
> The code replaces everything which is neither an ASCII letter nor a
> digit nor - with a three byte escape sequence composed of + followed by
> the hexadecimal representation of the byte value. This implies that it
> will eliminate any use of non-ASCII letters both UTF-8 and otherwise.

Since the encoding is solely used to construct names for configuration
files (one per SSID), the only inconvenience I can think of is that you
might end up with completely unintelligible names for those files, and only
in extreme cases. AIUI these files are not intended to be maintained by a
user or administrator but only to be created, manipulated or destroyed by
the software.
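
For illustration, the escaping described in the quoted text can be sketched
roughly as below. This is not the code under discussion: the function name
ssid_escape, its signature and the upper-case hex digits are assumptions made
up for this example.

```c
#include <stddef.h>

/* Hypothetical helper, NOT taken from the code under discussion.
 * Escapes `len` octets of `ssid` into `out` (capacity `outsz`, including
 * the terminating NUL). ASCII letters, digits and '-' are copied as-is;
 * every other octet becomes '+' followed by two hex digits.
 * Returns 0 on success, -1 if `out` is too small. */
int ssid_escape(const unsigned char *ssid, size_t len,
                char *out, size_t outsz)
{
    static const char hex[] = "0123456789ABCDEF"; /* digit case: assumption */
    size_t o = 0;

    if (outsz == 0)
        return -1;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = ssid[i];
        /* Explicit ranges instead of isalnum(), which depends on the locale. */
        int plain = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                    (c >= '0' && c <= '9') || c == '-';
        size_t need = plain ? 1 : 3;

        if (outsz - o <= need)      /* always keep room for the NUL */
            return -1;

        if (plain) {
            out[o++] = (char)c;
        } else {
            out[o++] = '+';
            out[o++] = hex[c >> 4];
            out[o++] = hex[c & 0x0F];
        }
    }
    out[o] = '\0';
    return 0;
}
```

With this sketch, the five octets of "café" in UTF-8 come out as "caf+C3+A9".
Since '+' itself is escaped (as "+2B"), two different SSIDs never map to the
same name.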

Unless you are manually debugging the software in an environment that is
crowded with wireless stations named "ééééé", "ééééá", "ééééç" and the like,
you shouldn't worry too much about it. As a user, you shouldn't care at all:
the software could just as well use a sensible hashing algorithm, or some
database, or black magic. Or just go with hex encoding from the get-go,
since an SSID is just a sequence of octets. "\x00\x00\x00\x00" (in C string
literal notation) would make a perfectly fine SSID, composed of five (sic!)
null bytes, but it is not a sensible code sequence in any character set I am
aware of.
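
A minimal sketch of the plain hex variant, assuming the SSID is passed around
as a pointer plus an explicit length. The name ssid_hex and the lower-case
digits are made up for this example.

```c
#include <stddef.h>

/* Hypothetical helper: writes 2*len hex digits plus a terminating NUL
 * to `out`. Zero octets need no special treatment because the length is
 * passed explicitly. Returns 0 on success, -1 if `outsz` is too small. */
int ssid_hex(const unsigned char *ssid, size_t len,
             char *out, size_t outsz)
{
    static const char hex[] = "0123456789abcdef"; /* digit case: assumption */

    if (outsz < 2 * len + 1)
        return -1;

    for (size_t i = 0; i < len; i++) {
        out[2 * i]     = hex[ssid[i] >> 4];
        out[2 * i + 1] = hex[ssid[i] & 0x0F];
    }
    out[2 * len] = '\0';
    return 0;
}
```

An SSID is at most 32 octets long, so a 65-byte output buffer is always
enough. An SSID of four zero octets yields "00000000".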

It is totally sensible to reduce the character set to something that is
more or less guaranteed to be valid for building names in any file system
currently in use on this planet. That said, I'm not sure how the dash
(minus) ended up in the allowed character set, as this allows for names
starting with '-', which is not something I would consider good style, but
others' mileage may vary.

As mentioned above: if there is any real issue with the code at all, it is
that null characters (zero bytes) are not handled correctly. But that's a
trait it has in common with many consumer WiFi appliance configuration
utilities (and a pile of professional tools too, I suspect).
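
One common way zero bytes get mishandled, assumed here purely for
illustration and not taken from the code under discussion, is treating the
SSID as a NUL-terminated C string. The struct and both function names below
are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical representation: the octets plus an explicit length. */
struct ssid {
    unsigned char octets[32];   /* an SSID is at most 32 octets long */
    size_t len;
};

/* Length-aware comparison: zero octets are compared like any other. */
int ssid_equal(const struct ssid *a, const struct ssid *b)
{
    return a->len == b->len && memcmp(a->octets, b->octets, a->len) == 0;
}

/* C-string comparison, shown for contrast only: it stops at the first
 * zero octet, so everything after it is silently ignored. */
int ssid_equal_cstr(const struct ssid *a, const struct ssid *b)
{
    return strncmp((const char *)a->octets, (const char *)b->octets,
                   sizeof a->octets) == 0;
}
```

For the five-octet SSIDs "ab\0cd" and "ab\0xy", ssid_equal() reports a
difference, while ssid_equal_cstr() considers them identical.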
--
Irrwahn