[jdom-interest] special characters breaking parse??
Matthew MacKenzie
matt at xmlglobal.com
Fri Jan 26 00:15:40 PST 2001
Jason,
I am using SAXBuilder; maybe I am using it wrong? My parse code is:
Document d = new SAXBuilder().build(inStream);
Am I doing something wrong?
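The parse call itself seems fine; the trouble is the encoding. As a sketch of one caller-side workaround when the feed can't be fixed, assuming you know the bytes really are ISO-8859-1: decode them yourself and hand the parser characters instead of bytes, so its UTF-8 default never comes into play. This uses the JDK's built-in SAX parser (javax.xml.parsers) as a stand-in for SAXBuilder, and `Latin1Parse`/`firstTitle` are hypothetical names for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class Latin1Parse {
    // Hypothetical helper: decodes the raw bytes as ISO-8859-1 ourselves, so the
    // parser receives characters and never guesses (or defaults) an encoding.
    // Returns the text of the first TITLE element, just to show the result.
    static String firstTitle(InputStream in) throws Exception {
        Reader reader = new InputStreamReader(in, StandardCharsets.ISO_8859_1);
        StringBuilder title = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inTitle;
            private boolean done;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (!done && qName.equals("TITLE")) inTitle = true;
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (inTitle && qName.equals("TITLE")) {
                    inTitle = false;
                    done = true;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inTitle) title.append(ch, start, length); // text may arrive in chunks
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(reader), handler);
        return title.toString();
    }

    public static void main(String[] args) throws Exception {
        // Simulate the feed: accented text stored as ISO-8859-1 bytes, no declaration.
        String xml = "<TRACK><TITLE>Tannh\u00e4user / Deriv\u00e8</TITLE></TRACK>";
        byte[] feed = xml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(firstTitle(new ByteArrayInputStream(feed)));
    }
}
```

Per the SAX InputSource contract, a parser given a character stream disregards any encoding declaration in it, so this only makes sense when the real encoding is known out-of-band.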
The XML is coming to me from emusic.com, and it obviously has problems: I
already have to call inStream.skip(1) before passing the stream to
SAXBuilder because there is a '\n' before the XML declaration :-P It seems
I have found a good case for always declaring your encoding when authoring
XML :-)
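The XML declaration has to be the very first thing in the document, which is why the stray '\n' trips the parser. A fixed skip(1) assumes exactly one such byte; a sketch of a slightly more forgiving version, assuming the feed is in an ASCII-compatible byte encoding (it would be wrong for UTF-16), skips every leading whitespace byte and pushes the first real one back. `skipLeadingWhitespace` is a hypothetical helper, not part of JDOM.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

public class SkipLeadingWhitespace {
    // Hypothetical helper: drops any XML whitespace bytes (space, tab, CR, LF)
    // ahead of the first significant byte, instead of assuming exactly one '\n'.
    static InputStream skipLeadingWhitespace(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, 1);
        int b;
        do {
            b = pb.read();
        } while (b == ' ' || b == '\t' || b == '\r' || b == '\n');
        if (b != -1) {
            pb.unread(b); // put the first significant byte ('<' here) back
        }
        return pb;
    }

    public static void main(String[] args) throws IOException {
        byte[] feed = "\n<?xml version=\"1.0\"?><TRACKS/>".getBytes(StandardCharsets.US_ASCII);
        InputStream cleaned = skipLeadingWhitespace(new ByteArrayInputStream(feed));
        System.out.println((char) cleaned.read()); // prints "<"
    }
}
```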
Thanks for the info.
-matt
<<| message from: Jason Hunter <jhunter at collab.net> |>>
> It works OK if you specify the encoding in the XML declaration:
>
> <?xml version="1.0" encoding="ISO-8859-1"?>
>
> When files look like ASCII, I believe the parser defaults to UTF-8 unless
> an encoding declaration says otherwise. See
> http://www.w3.org/TR/REC-xml#sec-guessing.
>
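To see the effect described above, here is a sketch using the JDK's built-in SAX parser as a stand-in for SAXBuilder: the same ISO-8859-1 bytes are parsed once without a declaration (read as UTF-8, so the 'ä' byte starts a malformed sequence) and once with encoding="ISO-8859-1". `EncodingDeclDemo` and `parses` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EncodingDeclDemo {
    // Returns true if the JDK's built-in SAX parser accepts the raw bytes.
    static boolean parses(byte[] xml) throws ParserConfigurationException {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml), new DefaultHandler());
            return true;
        } catch (SAXException | IOException e) {
            // Malformed UTF-8 surfaces as a SAXParseException or, in some
            // parser versions, as an IOException subclass.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String body = "<TITLE>Tannh\u00e4user / Deriv\u00e8</TITLE>";
        // The same ISO-8859-1 bytes, once without and once with an encoding declaration.
        byte[] bare = body.getBytes(StandardCharsets.ISO_8859_1);
        byte[] declared = ("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" + body)
            .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parses(bare));     // false: bytes read as UTF-8
        System.out.println(parses(declared)); // true
    }
}
```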
> For the record, I saw the same error with DOMBuilder (why are you using
> DOMBuilder?). In SAXBuilder you get a better description:
>
> org.jdom.JDOMException: Error on line 3: An invalid XML character
> (Unicode: 0x84) was found in the element content of the document.
> at org.jdom.input.SAXBuilder.build(SAXBuilder.java:348)
>
> BTW, make sure you call outputter.setEncoding("ISO-8859-1") on output.
>
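The JDOM outputter call isn't reproduced here. As a JDK-only sketch of the same idea, assuming the identity Transformer from javax.xml.transform as a stand-in for XMLOutputter, set the output encoding so the declaration that gets written matches the bytes that get written. `WriteLatin1` and `serialize` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class WriteLatin1 {
    // Serializes XML text to bytes, keeping the declared encoding and the
    // actual byte encoding in agreement.
    static byte[] serialize(String xml, String encoding) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer(); // identity transform
        t.setOutputProperty(OutputKeys.ENCODING, encoding);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = serialize("<TITLE>Tannh\u00e4user</TITLE>", "ISO-8859-1");
        // Decode with the same charset we asked the serializer to use.
        System.out.println(new String(bytes, StandardCharsets.ISO_8859_1));
    }
}
```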
> -jh-
>
>
> Matthew MacKenzie wrote:
> >
> > Hello,
> >
> > I am parsing an XML file, and when characters with accents and such are
> > encountered, the following exception is thrown. I tried changing the
> > encoding to UTF-8, but that didn't work.
> >
> > Has anyone else had this problem?
> >
> > <stackTrace>
> >
> > org.jdom.JDOMException: The element type "TITLE" must be terminated by the
> > matching end-tag "</TITLE>".: Error on line 180: The element type "TITLE"
> > must be terminated by the matching end-tag "</TITLE>".
> > at org.jdom.input.SAXBuilder.build(SAXBuilder.java:315)
> > at org.jdom.input.SAXBuilder.build(SAXBuilder.java:337)
> > </stackTrace>
> >
> > Relevant Data:
> >
> > 169 <TRACK>
> > 170 <TRACKID>41676</TRACKID>
> > 171 <TITLE>Tannhäuser / Derivè</TITLE>
> > 172 <ALBUM>The Shape Of Punk To Come</ALBUM>
> > 173 <ARTIST>Refused</ARTIST>
> > 174 <GENRE></GENRE>
> > 175
> > 176 <FILENAME>Refused-The_Shape_Of_Punk_To_Come-11-Tannhäuser_Derivè.mp3</FILENAME>
> > 177 <SIZE>7797864</SIZE>
> > 178 <FORMAT>.mp3</FORMAT>
> > 179 <QUALITY>128000</QUALITY>
> > 180 <CHANNELS>2</CHANNELS>
> > 181 <DURATION>489</DURATION>
> > 182 </TRACK>
> >
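The reported error (line 180, a TITLE end-tag) doesn't point at the likely culprit, the accented bytes on line 171. A small sketch for locating the first line whose bytes aren't valid UTF-8, using java.nio's CharsetDecoder with error reporting switched on; `FindBadBytes` and `firstBadUtf8Line` are hypothetical names, and this checks only UTF-8 validity, not XML well-formedness.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class FindBadBytes {
    // Returns the 1-based line number of the first byte sequence that is not
    // valid UTF-8, or -1 if the whole buffer decodes cleanly.
    static int firstBadUtf8Line(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        // UTF-8 never decodes to more chars than it has bytes, so this cannot overflow.
        CharBuffer out = CharBuffer.allocate(data.length);
        CoderResult result = decoder.decode(in, out, true);
        if (!result.isError()) {
            return -1;
        }
        // On an error, the input position marks the start of the bad bytes.
        int line = 1;
        for (int i = 0; i < in.position(); i++) {
            if (data[i] == '\n') line++;
        }
        return line;
    }

    public static void main(String[] args) {
        String track = "<TRACK>\n<TITLE>Tannh\u00e4user / Deriv\u00e8</TITLE>\n</TRACK>\n";
        byte[] latin1 = track.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(firstBadUtf8Line(latin1)); // prints 2
    }
}
```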
> > --
> > Matthew MacKenzie
> >
>
>
<<| end message from Jason Hunter <jhunter at collab.net> |>>
--
Matthew MacKenzie
VP Research & Development, Founder
XML Global Technologies, Inc.