[Gardeners] Unicode support among different CL implementations (Re: XML Parser Comparison doc)
Marco Antoniotti
marcoxa at cs.nyu.edu
Tue Jan 17 12:43:18 CST 2006
Hi
Following this thread, it seems to me that a reasoned comparison of
how the different CL implementations (including the commercial ones)
support Unicode, relative to the ANSI standard, would be extremely
valuable.
Now: don't look at me for actually doing this. I have no time. I just
think it is a good idea.
Cheers
--
Marco
On Jan 17, 2006, at 1:12 PM, David Lichteblau wrote:
> Quoting Peter K.Lee (saint at corenova.com):
>>> * What does the "char-sets" column mean? It says "UTF-8 w/o Unicode"
>>> for cxml; I can't make sense of that.
>> Me neither. :) But that is how it is reported in the cxml page.
>
> I take that to mean that the CXML documentation is not elaborate enough
> on this. Do you have a suggestion where in the documentation to write
> more about it? What kind of information would you have liked to see?
>
>> Other parsers make cursory notes about the character sets they support as
>> well. I'd be happy to update the column to make it more sane if
>> someone can shed some light on what it really means...
>
> Well, partly I was asking what the column was meant to be about.
>
> UTF-8 is not a character set, it's an encoding.
>
> * The "character set" XML parsers use is, by definition, Unicode.
> Every XML parser must deal with Unicode.
>
> * A different question is which "encodings" a parser supports. Now,
> every parser is required by the spec to support both UTF-8 and UTF-16.
> If it doesn't, that's a topic for a bugs section, not so much for a
> feature comparison. In a feature comparison, it would be interesting
> to know which *other* encodings a parser supports.
>
> For example, CXML seems to support iso-8859-n and koi8-r (hmm,
> whatever that is :-)) in addition to UTF-8 and UTF-16.
>
> (Ideally, an XML parser in Lisp [on a Unicode-aware implementation]
> would support all external formats supported by the host Lisp, but
> that can be a portability issue.)
>
> * Yet another question is which encodings the serializer supports.
>
> For example, CXML has built-in support for UTF-8 serialization (even
> on non-Unicode-aware Lisps) and leaves all other encodings to the host
> Lisp. (Prepend your own XML declaration and use a character stream
> sink with the external-format you need.)
>
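To make the character set versus encoding distinction concrete, here is a
minimal sketch. It is written in Python purely for illustration; it is not
from the thread and says nothing about how CXML or any Lisp implements this.
The helper names encode_in and serialize_as are hypothetical. The first
helper shows one Unicode character turning into different bytes (or failing
to encode) depending on the encoding; the second shows a serializer emitting
a declared non-UTF-8 encoding.

```python
import xml.etree.ElementTree as ET


def encode_in(text, encodings):
    """Map each encoding name to the bytes for `text`.

    The value is None when the encoding has no representation for
    some character in `text`.
    """
    result = {}
    for name in encodings:
        try:
            result[name] = text.encode(name)
        except UnicodeEncodeError:
            result[name] = None
    return result


def serialize_as(text, encoding):
    """Serialize <a>text</a> to bytes in `encoding`, with an XML declaration."""
    element = ET.Element("a")
    element.text = text
    return ET.tostring(element, encoding=encoding, xml_declaration=True)


if __name__ == "__main__":
    # Same character (U+00E9), four encodings, different byte sequences.
    for name, data in encode_in("\u00e9", ["utf-8", "utf-16-be",
                                           "iso-8859-1", "koi8-r"]).items():
        print(name, data)
    # The declaration names the encoding that the bytes are actually in.
    print(serialize_as("\u00e9", "iso-8859-1"))
```

In both cases the character itself stays the same Unicode code point; only
the byte representation changes, and the XML declaration is what tells a
parser which representation to expect.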
>>> * Somehow I'd like a column "Makes an effort to conform to the
>>> standards". AFAIK only CL-XML and CXML qualify for a "yes" there.
>>
>> I'm not exactly sure how to quantify "making an effort to conform to
>> the standards". It appears that XML syntax is a particular standard
>> that all the XML parsing libraries conform to, and the rest of the
>
> Well, there is indeed a standard for XML 1.0
> http://www.w3.org/TR/REC-xml/
> and there is a very good test suite for that standard
> http://www.w3.org/XML/Test/
>
>> "techniques" of parsing vary widely. If the XML parser does not do
>> validation,
>
> No, there are validating and non-validating parsers. The XML test
> suite has tests for both of them. It's fine for a parser to state that
> it doesn't support validation; it is still a conforming non-validating
> parser.
>
>> or provide the W3C DOM API, does that mean it is not
>> making an effort to conform to the standards?
>
> An XML parser does not have to implement DOM by any means. It is
> definitely an optional feature. If it does claim to implement it, it
> should pass the DOM test suite, however.
>
> Same for XML namespaces. That is also an optional, separate
> specification and covered by specially tagged tests in the XML
> conformance test suite.
>
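The validating versus non-validating distinction above can be sketched as
follows. This again uses Python only as an illustration, relying on the
standard library's ElementTree, which sits on the non-validating expat
parser. The function name is_well_formed and both sample documents are made
up for this example.

```python
import xml.etree.ElementTree as ET

# Well-formed, but invalid against its own DTD: <note> is declared to
# contain exactly one <to>, yet the document contains <from> instead.
WELL_FORMED_BUT_INVALID = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to)>
  <!ELEMENT to (#PCDATA)>
]>
<note><from>Peter</from></note>"""

# Not well-formed: <to> is never closed before </note>.
NOT_WELL_FORMED = "<note><to>David</note>"


def is_well_formed(document):
    """Return True if a non-validating parser accepts `document`."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed(WELL_FORMED_BUT_INVALID))
    print(is_well_formed(NOT_WELL_FORMED))
```

A non-validating parser must still reject the second document, because
well-formedness is required of every XML parser; only the DTD check on the
first document is optional.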
>
>> -Peter
>
> Thanks,
> David
> _______________________________________________
> Gardeners mailing list
> Gardeners at lispniks.com
> http://www.lispniks.com/mailman/listinfo/gardeners
>
--
Marco Antoniotti http://bioinformatics.nyu.edu/~marcoxa
NYU Courant Bioinformatics Group tel. +1 - 212 - 998 3488
715 Broadway 10th FL fax. +1 - 212 - 998 3484
New York, NY, 10003, U.S.A.