Author: Irrwahn
Date:
To: dng
Subject: Re: [DNG] grep handles ISO-8859 encoded text file as binary file.
On Thu, 28 Apr 2016 13:16:53 -0400, Hendrik Boom wrote:
> On Thu, Apr 28, 2016 at 06:53:35AM +0000, Noel Torres wrote:
>> Hughe Chung <janpenguin@???> wrote:
[...]
>>> $ grep tesselate dome_math.c
>>> Binary file dome_math.c matches
[...]
>> If I were to bet, I would say that the file dome_math.c is not
>> correctly formatted, or has an incorrect BOM at start, or so.
>
> I've occasionally had a program that accepted UTF-8 reject a file
> because it *had* a valid BOM at the start.
[...]
That would be because the notion of a BOM does not make much
sense at all for UTF-8. There is no byte order issue with
UTF-8, yet some brilliant mind thought it would be a good
idea to define and allow one (EF BB BF) anyway. And, pray
tell, other brilliant minds decided to use it as a way to
tell UTF-8 from traditional single-byte encodings. This is
absurd, as it is just as bad as any other heuristic one
may come up with to deduce text file character encoding.
To add insult to injury, some poorly written text editing
tools insert a BOM without any need or even being asked to,
deliberately breaking otherwise perfectly fine 7-bit ASCII
files and rendering them incompatible with legacy software.
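
To illustrate the point, here is a minimal Python sketch (the helper
names has_utf8_bom, strip_utf8_bom and is_7bit_ascii are made up for
this example, not taken from any tool mentioned in this thread). It
shows how the three bytes EF BB BF turn an otherwise pure 7-bit ASCII
file into one that is no longer ASCII, and how they can be stripped
again:

```python
import codecs

# The UTF-8 "BOM" is the three-byte sequence EF BB BF.
UTF8_BOM = codecs.BOM_UTF8


def has_utf8_bom(data: bytes) -> bool:
    """Return True if data starts with the UTF-8 signature EF BB BF."""
    return data.startswith(UTF8_BOM)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM; other input is unchanged."""
    return data[len(UTF8_BOM):] if has_utf8_bom(data) else data


def is_7bit_ascii(data: bytes) -> bool:
    """Return True if every byte is below 0x80."""
    return all(b < 0x80 for b in data)


if __name__ == "__main__":
    plain = b"int main(void) { return 0; }\n"
    with_bom = UTF8_BOM + plain

    print(is_7bit_ascii(plain))               # True
    print(is_7bit_ascii(with_bom))            # False: the BOM bytes are >= 0x80
    print(strip_utf8_bom(with_bom) == plain)  # True
```

Note that this only checks for the signature bytes; it does not tell
you what encoding the rest of the file is actually in.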