[jdom-interest] Location-based parsing

Rolf Lear jdom at tuis.net
Wed Feb 29 05:26:30 PST 2012


(Most) SAX parsers create a 'Locator' instance for each parse. The 
parser updates the Locator just before it fires each SAX event on the 
SAX handler.

The protocol is that, if the SAX parser supports the Locator system, 
it calls setDocumentLocator(Locator) on the handler *before* calling the 
startDocument() method.
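
A minimal sketch of that call order, using only the SAX classes bundled 
with the JDK. The class name LocatorOrderDemo and the method 
recordCallbacks are made up for this illustration; nothing here is JDOM 
code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records the order of the first SAX callbacks.
final class LocatorOrderDemo {

    /** Parses the XML text and returns the callback names in the order they fired. */
    static List<String> recordCallbacks(String xml) {
        List<String> calls = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void setDocumentLocator(Locator locator) {
                // Only called when the parser supplies a Locator.
                calls.add("setDocumentLocator");
            }

            @Override
            public void startDocument() {
                calls.add("startDocument");
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                calls.add("startElement:" + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(recordCallbacks("<root>text</root>"));
    }
}
```

With a parser that does not supply a Locator, the setDocumentLocator 
entry would simply be absent from the list.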

JDOM has 'always' listened for the Locator call because the Locator also 
supplies the publicID and systemID values.

The change I committed yesterday 'just' extends the current usage of the 
Locator so that, in each SAX event, the SAXHandler passes the Locator 
coordinates on to the new 'location aware' methods on the JDOMFactory.

There are no 'call-backs' to do this.

The location information is tracked for every SAX event (every 
startElement, processingInstruction, etc.), including nested content.

Thus, in essence, every call to the JDOMFactory contains the coordinates 
for the item.
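
The idea can be sketched with plain SAX as below. This is an 
illustration of the pattern only: NodeFactory, ForwardingHandler and 
build are hypothetical names invented here, not JDOM's JDOMFactory or 
SAXHandler, and returning -1 when no Locator was supplied is a choice 
made for this sketch.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class LocationForwardingSketch {

    /** Hypothetical location-aware factory; a stand-in, not JDOM's interface. */
    interface NodeFactory {
        String element(int line, int col, String name);
        String text(int line, int col, String text);
    }

    /** Hypothetical handler that passes the Locator coordinates on every event. */
    static final class ForwardingHandler extends DefaultHandler {
        private final NodeFactory factory;
        private final List<String> built = new ArrayList<>();
        private Locator locator; // stays null if the parser supplies none

        ForwardingHandler(NodeFactory factory) {
            this.factory = factory;
        }

        @Override
        public void setDocumentLocator(Locator locator) {
            this.locator = locator;
        }

        // Sketch-only convention: -1 means "no location available".
        private int line() { return locator == null ? -1 : locator.getLineNumber(); }
        private int col()  { return locator == null ? -1 : locator.getColumnNumber(); }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            built.add(factory.element(line(), col(), qName));
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            built.add(factory.text(line(), col(), new String(ch, start, length)));
        }

        List<String> built() { return built; }
    }

    /** Parses the XML and returns one description per factory call. */
    static List<String> build(String xml) {
        ForwardingHandler handler = new ForwardingHandler(new NodeFactory() {
            @Override
            public String element(int line, int col, String name) {
                return "element " + name + " @" + line + ":" + col;
            }

            @Override
            public String text(int line, int col, String text) {
                return "text " + text + " @" + line + ":" + col;
            }
        });
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.built();
    }

    public static void main(String[] args) {
        build("<a><b>hi</b></a>").forEach(System.out::println);
    }
}
```

Because the handler reads the Locator inside each event method, nested 
content gets coordinates in the same way as the root element, with no 
separate call-back mechanism.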

In addition to changing the SAXHandler and JDOMFactory to track and 
receive the location information, I have also included a new JDOMFactory 
(the LocatedJDOMFactory) which actually uses the location data when it 
creates special 'Located' JDOM Content (like LocatedText, LocatedCDATA, 
LocatedElement, etc.).

Thus, if you give the SAXBuilder a 'LocatedJDOMFactory' instance, you 
will get a document back which consists of 'Located' Content, and you 
can say:
Element emt = doc.getRootElement();
Located lemt = (Located)emt;
System.out.printf("Root element has a SAX location of line %d column %d\n",
    lemt.getLine(), lemt.getColumn());

The thing to remember (and what confused me for a bit) is that the SAX 
specification has an odd idea of what the event 'location' is. It is 
defined as: "Return the column number where the current document event ends".

So, if you have the root document:
<root>text</root>

then the location of the 'text' Text JDOM content will be: line 1, 
column 11 (the char after 'text'), and not the more 'obvious' line 1, 
column 7.
Additionally, the location of the 'root' element is line 1, column 7 
(the char after '<root>').
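
A small sketch for checking these coordinates against whatever SAX 
parser is on hand. EventEndColumns and locate are hypothetical names 
for this example, and the exact numbers printed depend on the parser, 
so none are hard-coded here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: captures the Locator position seen in each event.
final class EventEndColumns {

    /** Returns event label -> {line, column} as reported by the Locator. */
    static Map<String, int[]> locate(String xml) {
        Map<String, int[]> seen = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                this.locator = locator;
            }

            // Stores the current position; {-1, -1} if no Locator was supplied.
            private void record(String label) {
                seen.put(label, locator == null
                        ? new int[] {-1, -1}
                        : new int[] {locator.getLineNumber(), locator.getColumnNumber()});
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                record("startElement " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // If the parser splits the text, the last chunk wins.
                record("characters");
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                record("endElement " + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        locate("<root>text</root>").forEach((event, pos) ->
                System.out.printf("%s -> line %d, column %d%n", event, pos[0], pos[1]));
    }
}
```

If the parser follows the 'event ends' definition, the startElement 
column should land after the '<root>' tag rather than on its '<', and 
the characters column should not be smaller than that.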

I decided that it was better for me to keep the behaviour of the 
'source' system (in this case, SAX), and have a 'general' 
LocatedJDOMFactory, than to try to calculate actual start positions of 
content (which would be very challenging).

What it means is that the details of 'Located' JDOM content are dependent 
on the system that parses the document. Additionally, the SAX 
specification is 'loose' about the location too, with the documentation: 
"The return value is an approximation of the line number in the document 
entity or external parsed entity where the markup triggering the event 
appears".

Rolf

On 29/02/2012 2:19 AM, Paul Libbrecht wrote:
> Interesting Rolf,
>
> I had to use call-backs to produce the same functionality long ago.
> Does it include locations inside the elements (optional I suppose)?
>
> paul
>
>
> Le 29 févr. 2012 à 05:56, Rolf Lear a écrit :
>
>> Hi all.
>>
>> I have moved the 'Line-Number' based SAXParser code to core. It is not a straight move, but rather a rewrite to be more general, and to fit into the JDOM2 way of doing things in the SAX parsing area.
>>
>> A big side-effect of this move is that I have extended the JDOMFactory interface. The changes all 'extend' the interface, and they are fully compatible with JDOM 1.x ... unless you have created your own custom JDOMFactory.
>>
>> If you have your own factory then you will need to implement a number of new methods. If you extend the DefaultJDOMFactory (like most people will have, I imagine) then you will find that you are overriding final methods (which fails to compile), but the fix is easy. For example, if you overrode the 'text(String)' method (which returns a Text instance), you will have something like:
>>
>> public Text text(String text) {
>>   return new Text(text);
>> }
>>
>> You will need to change this to:
>>
>> public Text text(int line, int col, String text) {
>>   return new Text(text);
>> }
>>
>> and you will again have your full functionality.
>>
>> Rolf