[vwnc] BUG: XML Parser fails in parsing XHTML

David Siegel
The W3C is now rejecting requests for the XHTML DTD.

See
http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic

It seems like, within the last few days, they've begun rejecting requests for
http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
that originate from programs other than the major browsers.

Many XML parsers, including the VW parser, are having problems.

See reports:
http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=525089
http://weblogs.java.net/blog/archive/2009/08/11/spoonful-scala

VW's XMLParser fetches DTDs even when it is asked not to validate.
Since the W3C now returns
503 Service Unavailable due to Unknown abuse from requesting IP
when http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd is fetched,
the XMLParser fails.

You can see this failure by trying to view an XHTML file in the FileBrowser,
using the "XML Tree" tab.

I can work around the problem by modifying XMLParser>>dtdFile:

I replaced:

    input == nil ifTrue: [input := (uriList at: 2) asURI resource].

with

    input == nil ifTrue: [
        self isValidating
            ifTrue: [input := (uriList at: 2) asURI resource]
            ifFalse: ["Not validating: use an empty DTD instead of fetching it"
                input := InputSource uri: nil encoding: nil stream: '' readStream]
    ].

The patch avoids fetching DTD contents when the parser is not validating, returning an empty InputSource instead.
Of course, validating parses still fetch the DTD, so they still fail.

A better solution would cache the DTD locally. This approach would also have the benefit
of permitting validating parses even when VW cannot connect to the Internet.
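A minimal sketch of the caching idea, written as plain workspace Smalltalk rather than against VW's XMLParser internals: a Dictionary keyed by the DTD's system identifier, filled only on the first request. The names cache, fetchBlock and dtdFor are hypothetical, and fetchBlock merely simulates the network fetch (it counts calls and answers placeholder text); wiring this into XMLParser>>dtdFile: and persisting the cache to disk are not shown.

```smalltalk
"Hypothetical sketch: cache DTD text keyed by system identifier.
 fetchBlock only simulates a network fetch; it counts calls and
 answers placeholder text instead of contacting the W3C."
| cache fetchCount fetchBlock dtdFor xhtmlID first second |
cache := Dictionary new.
fetchCount := 0.
fetchBlock := [:uri |
    fetchCount := fetchCount + 1.
    '<!-- placeholder DTD text for ', uri, ' -->'].
"Answer the cached text, calling fetchBlock only on the first request for uri."
dtdFor := [:uri | cache at: uri ifAbsentPut: [fetchBlock value: uri]].
xhtmlID := 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'.
first := dtdFor value: xhtmlID.
second := dtdFor value: xhtmlID.
"The second lookup is served from the cache, so only one fetch happened."
Transcript show: 'fetches: ', fetchCount printString; cr.
```

If the cache were seeded ahead of time from local copies, no fetch would reach the network at all. Note that xhtml1-transitional.dtd itself references entity files (xhtml-lat1.ent, xhtml-symbol.ent, xhtml-special.ent), so a real cache would need to cover those as well.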

Thanks,
-dms


_______________________________________________
vwnc mailing list
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc