:: Re: [DNG] netman GIT project
Author: Rainer Weikusat
Date:  
To: dng
Subject: Re: [DNG] netman GIT project
Irrwahn <irrwahn@???> writes:
> On Tue, 25 Aug 2015 13:49:39 +0100, Rainer Weikusat wrote:
>> "tilt!" <tilt@???> writes:
>>> On 08/25/2015 02:09 PM, Rainer Weikusat wrote:
>>>> Considering that this enforces some kind of 'bastard URL-encoding'
>>>> (using + as prefix instead of %) for all other bytes, it's also going
>>>> to make people who believe that UTF-8 would be a well supported way to
>>>> represent non-ASCII characters very unhappy.
>>>
>>> 1. This encoding is not about URLs but filenames.
>
> <snip>
>
>>> 2. It is not safe to assume that SSIDs contain UTF-8.
>>>
>>>    The relevant IEEE standard is botched.
>>>
>>>    https://en.wikipedia.org/wiki/Service_set_%28802.11_network%29
>>>
>>>    "Note that the 2012 version of the 802.11 standard defines a
>>>    primitive SSIDEncoding, an Enumeration of UNSPECIFIED and UTF-8,
>>>    indicating how the array of octets can be interpreted."
>>>
>>>    Imagining how many service sets still operate using the pre-2012
>>>    standard (and/or are botched implementations themselves that fail
>>>    to recognize the issue), i think it is safe to assume that the
>>>    character encoding of an SSID is "UNSPECIFIED" in the general case.
>>>
>>>    Therefore, it is handled encoding-agnostic on a byte-per-byte basis,
>>>    and this is what the code accomplishes.
>>
>> The code replaces everything which is neither an ASCII letter nor a
>> digit nor - with a three byte escape sequence composed of + followed by
>> the hexadecimal representation of the byte value. This implies that it
>> will eliminate any use of non-ASCII letters both UTF-8 and otherwise.
>
> Since the encoding is solely used to construct names for configuration
> files (one per SSID), the only inconvenience I can think of is you might
> end up with completely unintelligible names for those files, and only in
> extreme cases. AIUI these files are not intended to be maintained by a user
> or administrator but rather only be created, manipulated or destroyed by
> the software.
>
> Unless you are manually debugging the software in an environment which is
> crowded with wireless stations "ééééé", "ééééá", "ééééç" and the like, you
> shouldn't worry too much about it. As a user, you shouldn't care at all -
> could as well use a sensible hashing algorithm, or some database, or black
> magic. Or just go with hex encoding from the get go, since an SSID is just
> a sequence of octets. "\x00\x00\x00\x00" (in C string literal notation)
> would make a perfectly fine SSID, composed of five (sic!) null bytes, but
> it is not a sensible code sequence in any character set I am aware of.
>
> It is totally sensible to break down the character set to something that
> is more or less guaranteed to be valid for building names in any file
> system currently in use on this planet.


This targets Linux with no chance of being (and no intention to be)
portable to anything else, as it's basically a wrapper around a bunch
of Linux (and even Debian) commands. There are exactly two bytes/chars
which must not appear in a filename under these circumstances, '\0' and
'/'. Anything else is valid. Encoding other non-printable characters
makes some sense in case these files are intended to be perused by
humans; however, if this is not intended, it can as well be skipped, as
software doesn't need to 'look' at graphemes to distinguish byte
sequences. Blindly extending this to "anything with the eighth (high)
bit set" means it will replace all non-ASCII characters with "something
completely unintelligible to humans" even though the machine still
doesn't care. And that's not only the odd 'national characters' which
appear in Western European alphabets but also, potentially, completely
independent ones like Greek or Cyrillic.
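
To make the difference concrete, here is a rough sketch of the two
approaches as I understand them from this thread. It is not the netman
code: the function names are made up, the uppercase hex digits are an
assumption, and escaping '+' itself in the second variant is my own
addition so that the result stays unambiguous.

```python
import string

# Bytes left untouched by the scheme described above:
# ASCII letters, digits and '-'.
SAFE = frozenset((string.ascii_letters + string.digits + "-").encode("ascii"))


def escape_all(ssid: bytes) -> str:
    """Replace every byte outside SAFE with '+' and two hex digits.

    Hypothetical re-implementation; uppercase hex is an assumption.
    """
    return "".join(chr(b) if b in SAFE else "+%02X" % b for b in ssid)


def escape_minimal(ssid: bytes) -> bytes:
    """Escape only what a Linux filename cannot contain ('\\0' and '/').

    '+' is escaped as well (an assumption) so the mapping is reversible.
    """
    out = bytearray()
    for b in ssid:
        if b in (0x00, 0x2F, 0x2B):
            out += b"+%02X" % b
        else:
            out.append(b)
    return bytes(out)


if __name__ == "__main__":
    name = "café".encode("utf-8")
    print(escape_all(name))      # every non-ASCII byte becomes an escape
    print(escape_minimal(name))  # UTF-8 bytes pass through unchanged
```

With the first function a UTF-8 "café" comes out as "caf+C3+A9", while
the second leaves the bytes alone and only touches '\0', '/' and '+'.
A real implementation of the second variant would still have to deal
with SSIDs that are empty, "." or "..", which are not usable as
filenames either.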