<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML xmlns="http://www.w3.org/TR/REC-html40" xmlns:v =
"urn:schemas-microsoft-com:vml" xmlns:o =
"urn:schemas-microsoft-com:office:office" xmlns:w =
"urn:schemas-microsoft-com:office:word"><HEAD>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<META content="MSHTML 6.00.6000.16608" name=GENERATOR><!--[if !mso]>
<STYLE>v\:* {
        BEHAVIOR: url(#default#VML)
}
o\:* {
        BEHAVIOR: url(#default#VML)
}
w\:* {
        BEHAVIOR: url(#default#VML)
}
.shape {
        BEHAVIOR: url(#default#VML)
}
</STYLE>
<![endif]-->
<STYLE>@font-face {
        font-family: Courier;
}
@font-face {
        font-family: Tahoma;
}
@font-face {
        font-family: Trebuchet MS;
}
@page Section1 {size: 8.5in 11.0in; margin: 1.0in 1.25in 1.0in 1.25in; }
P.MsoNormal {
        FONT-SIZE: 12pt; MARGIN: 0in 0in 0pt; FONT-FAMILY: "Times New Roman"
}
LI.MsoNormal {
        FONT-SIZE: 12pt; MARGIN: 0in 0in 0pt; FONT-FAMILY: "Times New Roman"
}
DIV.MsoNormal {
        FONT-SIZE: 12pt; MARGIN: 0in 0in 0pt; FONT-FAMILY: "Times New Roman"
}
A:link {
        COLOR: blue; TEXT-DECORATION: underline
}
SPAN.MsoHyperlink {
        COLOR: blue; TEXT-DECORATION: underline
}
A:visited {
        COLOR: blue; TEXT-DECORATION: underline
}
SPAN.MsoHyperlinkFollowed {
        COLOR: blue; TEXT-DECORATION: underline
}
P {
        FONT-SIZE: 12pt; MARGIN-LEFT: 0in; MARGIN-RIGHT: 0in; FONT-FAMILY: "Times New Roman"; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto
}
SPAN.EmailStyle18 {
        COLOR: navy; FONT-FAMILY: Arial; mso-style-type: personal-reply
}
DIV.Section1 {
        page: Section1
}
</STYLE>
</HEAD>
<BODY lang=EN-US
style="WORD-WRAP: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space"
vLink=blue link=blue>
<DIV dir=ltr align=left><SPAN class=347101115-18032008><FONT face=Arial
color=#0000ff size=2>First you need to define what you mean by duplicate, i.e.
which variations you want to tolerate (e.g. order of attributes, choice of
namespace prefixes, insignificant whitespace, whether to ignore comments, use or
non-use of entities, Unicode normalization).</FONT></SPAN></DIV>
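<DIV dir=ltr align=left><SPAN class=347101115-18032008><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=347101115-18032008><FONT face=Arial
color=#0000ff size=2>As a rough illustration of making that definition explicit,
here is a minimal Java sketch over plain W3C DOM trees. The class DuplicatePolicy,
its two flags and the sample data are hypothetical names invented for this example;
it only covers comments, whitespace-only text and attribute order, and does not
address prefixes, entities or Unicode normalization.</FONT></SPAN></DIV>
<PRE>
```java
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

/** Hypothetical helper: DOM equality under an explicitly chosen tolerance policy. */
class DuplicatePolicy {
    private final boolean ignoreComments;   // tolerate added or removed comments
    private final boolean ignoreBlankText;  // tolerate whitespace-only text nodes

    DuplicatePolicy(boolean ignoreComments, boolean ignoreBlankText) {
        this.ignoreComments = ignoreComments;
        this.ignoreBlankText = ignoreBlankText;
    }

    boolean same(Node a, Node b) {
        if (a.getNodeType() != b.getNodeType()) return false;
        // Qualified names are compared as written, so differing prefixes are NOT tolerated.
        if (!a.getNodeName().equals(b.getNodeName())) return false;
        if (!Objects.equals(a.getNodeValue(), b.getNodeValue())) return false;
        if (a.getNodeType() == Node.ELEMENT_NODE && !sameAttributes(a, b)) return false;
        return sameChildren(a, b);
    }

    // Attributes are looked up by name, so their order never matters.
    private boolean sameAttributes(Node a, Node b) {
        NamedNodeMap x = a.getAttributes();
        NamedNodeMap y = b.getAttributes();
        if (x.getLength() != y.getLength()) return false;
        for (int i = 0; i < x.getLength(); i++) {
            Node other = y.getNamedItem(x.item(i).getNodeName());
            if (other == null || !x.item(i).getNodeValue().equals(other.getNodeValue())) return false;
        }
        return true;
    }

    // Children are compared pairwise in document order, skipping ignorable nodes.
    private boolean sameChildren(Node a, Node b) {
        Node x = skip(a.getFirstChild());
        Node y = skip(b.getFirstChild());
        while (x != null && y != null) {
            if (!same(x, y)) return false;
            x = skip(x.getNextSibling());
            y = skip(y.getNextSibling());
        }
        return x == null && y == null;
    }

    private Node skip(Node n) {
        while (n != null && ignorable(n)) n = n.getNextSibling();
        return n;
    }

    private boolean ignorable(Node n) {
        if (n.getNodeType() == Node.COMMENT_NODE) return ignoreComments;
        if (n.getNodeType() == Node.TEXT_NODE) return ignoreBlankText && n.getNodeValue().trim().isEmpty();
        return false;
    }

    /** Sample tree; the variant differs only in attribute order, a comment and indentation. */
    static Element sample(boolean variant) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element order = doc.createElement("order");
        doc.appendChild(order);
        if (variant) {
            order.setAttribute("status", "new");
            order.setAttribute("id", "7");
            order.appendChild(doc.createComment("generated"));
            order.appendChild(doc.createTextNode("\n  "));
        } else {
            order.setAttribute("id", "7");
            order.setAttribute("status", "new");
        }
        Element item = doc.createElement("item");
        item.appendChild(doc.createTextNode("pen"));
        order.appendChild(item);
        return order;
    }

    public static void main(String[] args) {
        Element a = sample(false);
        Element b = sample(true);
        System.out.println(new DuplicatePolicy(true, true).same(a, b));   // tolerant policy
        System.out.println(new DuplicatePolicy(false, false).same(a, b)); // strict policy
    }
}
```
</PRE>
<DIV dir=ltr align=left><SPAN class=347101115-18032008><FONT face=Arial
color=#0000ff size=2>The same two trees count as duplicates under the tolerant
policy and as different under the strict one, which is why the definition has to
come first.</FONT></SPAN></DIV>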
<DIV dir=ltr align=left><SPAN class=347101115-18032008><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=347101115-18032008><FONT face=Arial
color=#0000ff size=2>For many purposes a good approach is to convert both
documents into XML Canonical Form and then compare them
lexically.</FONT></SPAN></DIV>
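<DIV dir=ltr align=left><SPAN class=347101115-18032008><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=347101115-18032008><FONT face=Arial
color=#0000ff size=2>The shape of that approach (normalize each document to a
single string, then compare the strings) can be sketched in Java as follows. Note
the assumption: NormalForm is a hypothetical, simplified stand-in written for this
example and is NOT an implementation of W3C Canonical XML; it emits its own bracket
notation, sorts attributes, merges adjacent text and drops comments and processing
instructions. For real use you would substitute a proper canonicalizer.</FONT></SPAN></DIV>
<PRE>
```java
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

/** Hypothetical stand-in for a canonicalizer; NOT an implementation of W3C Canonical XML. */
class NormalForm {
    /** Renders an element as one deterministic string. */
    static String render(Element e) {
        StringBuilder out = new StringBuilder();
        write(e, out);
        return out.toString();
    }

    private static void write(Element e, StringBuilder out) {
        out.append('(').append(e.getNodeName());
        // Sort attributes so that their original order cannot affect the result.
        NamedNodeMap attrs = e.getAttributes();
        String[] pairs = new String[attrs.getLength()];
        for (int i = 0; i < pairs.length; i++) {
            pairs[i] = attrs.item(i).getNodeName() + "=" + quote(attrs.item(i).getNodeValue());
        }
        Arrays.sort(pairs);
        for (String pair : pairs) out.append(' ').append(pair);
        // Merge adjacent text and CDATA; skip comments and processing instructions.
        StringBuilder text = new StringBuilder();
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            short type = c.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                text.append(c.getNodeValue());
            } else if (type == Node.ELEMENT_NODE) {
                flush(text, out);
                write((Element) c, out);
            }
        }
        flush(text, out);
        out.append(')');
    }

    private static void flush(StringBuilder text, StringBuilder out) {
        if (text.length() > 0) {
            out.append(quote(text.toString()));
            text.setLength(0);
        }
    }

    // Quote and escape values so that text can never be mistaken for structure.
    private static String quote(String s) {
        return '"' + s.replace("\\", "\\\\").replace("\"", "\\\"") + '"';
    }

    /** Sample tree; the variant differs only in attribute order and an extra comment. */
    static Element sample(String itemText, boolean variant) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element order = doc.createElement("order");
        doc.appendChild(order);
        if (variant) {
            order.setAttribute("status", "new");
            order.setAttribute("id", "7");
            order.appendChild(doc.createComment("generated"));
        } else {
            order.setAttribute("id", "7");
            order.setAttribute("status", "new");
        }
        Element item = doc.createElement("item");
        item.appendChild(doc.createTextNode(itemText));
        order.appendChild(item);
        return order;
    }

    public static void main(String[] args) {
        String a = render(sample("pen", false));
        String b = render(sample("pen", true));
        System.out.println(a);
        System.out.println(a.equals(b)); // lexical comparison of the two normal forms
    }
}
```
</PRE>
<DIV dir=ltr align=left><SPAN class=347101115-18032008><FONT face=Arial
color=#0000ff size=2>Once both sides are reduced to strings, finding duplicates
across many files becomes a matter of comparing or hashing those strings.</FONT></SPAN></DIV>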
<DIV dir=ltr align=left><SPAN class=347101115-18032008><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=347101115-18032008><FONT face=Arial
color=#0000ff size=2>There is also the deep-equal() function in XPath 2.0,
which defines one particular set of rules, and Saxon has a
saxon:deep-equal() function that gives you more choice over the rules you want
to apply.</FONT></SPAN></DIV>
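<DIV dir=ltr align=left><SPAN class=347101115-18032008><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=347101115-18032008><FONT face=Arial
color=#0000ff size=2>The XPath engine built into the JDK implements XPath 1.0, which
has no deep-equal(), so the following Java sketch swaps in a different fixed rule
set to show what "one particular set of rules" means in practice: Node.isEqualNode
from DOM Level 3. Its rules are not those of deep-equal(). The class name
FixedRules, the namespace URI and the sample data are made up for the
example.</FONT></SPAN></DIV>
<PRE>
```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Demonstrates the fixed comparison rules of DOM Level 3 Node.isEqualNode. */
class FixedRules {
    static final String NS = "urn:example:orders"; // made-up namespace for the demo

    /** Builds the same logical tree with a chosen prefix, attribute order and optional comment. */
    static Element sample(String prefix, boolean reversedAttributes, boolean withComment) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element order = doc.createElementNS(NS, prefix + ":order");
        doc.appendChild(order);
        if (reversedAttributes) {
            order.setAttribute("status", "new");
            order.setAttribute("id", "7");
        } else {
            order.setAttribute("id", "7");
            order.setAttribute("status", "new");
        }
        if (withComment) {
            order.appendChild(doc.createComment("generated"));
        }
        Element item = doc.createElementNS(NS, prefix + ":item");
        item.appendChild(doc.createTextNode("pen"));
        order.appendChild(item);
        return order;
    }

    public static void main(String[] args) {
        Element base = sample("p", false, false);
        System.out.println(base.isEqualNode(sample("p", true, false)));  // attribute order differs
        System.out.println(base.isEqualNode(sample("p", false, true)));  // extra comment child
        System.out.println(base.isEqualNode(sample("q", false, false))); // other prefix, same namespace
    }
}
```
</PRE>
<DIV dir=ltr align=left><SPAN class=347101115-18032008><FONT face=Arial
color=#0000ff size=2>With isEqualNode, attribute order is tolerated, while an extra
comment or a different prefix makes the trees unequal. If those rules do not match
your definition of duplicate, you need a function that lets you choose
them.</FONT></SPAN></DIV>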
<DIV dir=ltr align=left><SPAN class=347101115-18032008><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=347101115-18032008><FONT face=Arial
color=#0000ff size=2>Michael Kay</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=347101115-18032008><FONT face=Arial
color=#0000ff size=2><A
href="http://www.saxonica.com/">http://www.saxonica.com/</A></FONT></SPAN></DIV><BR>
<BLOCKQUOTE dir=ltr
style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #0000ff 2px solid; MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> jdom-interest-bounces@jdom.org
[mailto:jdom-interest-bounces@jdom.org] <B>On Behalf Of
</B>vijayaraghavan.subramaniam@wipro.com<BR><B>Sent:</B> 18 March 2008
13:59<BR><B>To:</B> paul@activemath.org; frode@fritid.as;
jdom-interest@jdom.org<BR><B>Subject:</B> RE: [jdom-interest] Comparing two
XML files using JDOM<BR></FONT><BR></DIV>
<DIV></DIV>
<DIV class=Section1>
<P class=MsoNormal><FONT face=Arial color=navy size=2><SPAN
style="FONT-SIZE: 10pt; COLOR: navy; FONT-FAMILY: Arial">My objective is to find
duplicate XML files or elements and generate a report
(PDF).<o:p></o:p></SPAN></FONT></P>
<P class=MsoNormal><FONT face=Arial color=navy size=2><SPAN
style="FONT-SIZE: 10pt; COLOR: navy; FONT-FAMILY: Arial"><o:p> </o:p></SPAN></FONT></P>
<DIV>
<DIV class=MsoNormal style="TEXT-ALIGN: center" align=center><FONT
face="Times New Roman" size=3><SPAN style="FONT-SIZE: 12pt">
<HR tabIndex=-1 align=center width="100%" SIZE=2>
</SPAN></FONT></DIV>
<P class=MsoNormal><B><FONT face=Tahoma size=2><SPAN
style="FONT-WEIGHT: bold; FONT-SIZE: 10pt; FONT-FAMILY: Tahoma">From:</SPAN></FONT></B><FONT
face=Tahoma size=2><SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Tahoma">
jdom-interest-bounces@jdom.org [mailto:jdom-interest-bounces@jdom.org]
<B><SPAN style="FONT-WEIGHT: bold">On Behalf Of </SPAN></B>Paul
Libbrecht<BR><B><SPAN style="FONT-WEIGHT: bold">Sent:</SPAN></B> Tuesday,
March 18, 2008 6:31 PM<BR><B><SPAN style="FONT-WEIGHT: bold">To:</SPAN></B>
frode@fritid.as; jdom-interest@jdom.org interest<BR><B><SPAN
style="FONT-WEIGHT: bold">Subject:</SPAN></B> Re: [jdom-interest] Comparing
two XML files using JDOM</SPAN></FONT><o:p></o:p></P></DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt"><o:p> </o:p></SPAN></FONT></P>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt">Oh but if it's just about equality, and not reporting
differences, then I have already implemented
one.<o:p></o:p></SPAN></FONT></P></DIV>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt"><o:p> </o:p></SPAN></FONT></P></DIV>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt">Vijay, what was your
objective?<o:p></o:p></SPAN></FONT></P></DIV>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt"><o:p> </o:p></SPAN></FONT></P></DIV>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt">paul<o:p></o:p></SPAN></FONT></P></DIV>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt"><o:p> </o:p></SPAN></FONT></P></DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt"><o:p> </o:p></SPAN></FONT></P>
<DIV>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt">On 18 March 2008 at 13:10, <A
href="mailto:frode@fritid.as">frode@fritid.as</A> wrote:<o:p></o:p></SPAN></FONT></P></DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt"><BR><BR><o:p></o:p></SPAN></FONT></P>
<P><FONT face="Times New Roman" size=3><SPAN style="FONT-SIZE: 12pt">XMLUnit
does comparisons, and you can plug in difference listeners that can 'simulate'
equal values even if they are not equal.<o:p></o:p></SPAN></FONT></P>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt"> <o:p></o:p></SPAN></FONT></P></DIV>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt"> <o:p></o:p></SPAN></FONT></P></DIV>
<P style="MARGIN-BOTTOM: 12pt"><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt"><BR><BR></SPAN></FONT><B><FONT size=1><SPAN
style="FONT-WEIGHT: bold; FONT-SIZE: 9pt">Paul Libbrecht &lt;<A
href="mailto:paul@activemath.org">paul@activemath.org</A>&gt;</SPAN></FONT></B><BR><FONT
size=1><SPAN style="FONT-SIZE: 9pt">Sent by: <A
href="mailto:jdom-interest-bounces@jdom.org">jdom-interest-bounces@jdom.org</A></SPAN></FONT><BR><FONT
size=1><SPAN style="FONT-SIZE: 9pt">03/18/2008 11:09
AM</SPAN></FONT><BR><BR> <FONT size=1><SPAN
style="FONT-SIZE: 9pt">To</SPAN></FONT> <FONT size=1><SPAN
style="FONT-SIZE: 9pt">&lt;<A
href="mailto:vijayaraghavan.subramaniam@wipro.com">vijayaraghavan.subramaniam@wipro.com</A>&gt;</SPAN></FONT><BR> <FONT
size=1><SPAN style="FONT-SIZE: 9pt">cc</SPAN></FONT> <FONT size=1><SPAN
style="FONT-SIZE: 9pt"><A
href="mailto:jdom-interest@jdom.org">jdom-interest@jdom.org</A></SPAN></FONT><BR> <FONT
size=1><SPAN style="FONT-SIZE: 9pt">bcc</SPAN></FONT> <BR> <FONT
size=1><SPAN style="FONT-SIZE: 9pt">Subject</SPAN></FONT> <FONT
size=1><SPAN style="FONT-SIZE: 9pt">Re: [jdom-interest] Comparing two XML
files using JDOM</SPAN></FONT><BR> <o:p></o:p></P>
<P><FONT face=Courier size=3><SPAN
style="FONT-SIZE: 12pt; FONT-FAMILY: Courier">Mmmh... I have my doubts there: it's
all loaded in memory, just as with JDOM. Both seem to be fairly lightweight,
though (compared to monsters such as Xerces' DOM).</SPAN></FONT><BR><FONT
face=Courier><SPAN style="FONT-FAMILY: Courier">You'd need quite an elaborate
diff engine to compare documents without loading them in memory; SAX or even StAX
parsing is not enough there, since you may need to go
backwards.</SPAN></FONT><BR><BR><FONT face=Courier><SPAN
style="FONT-FAMILY: Courier">I have made no tests with big
files.</SPAN></FONT><BR><FONT face=Courier><SPAN
style="FONT-FAMILY: Courier">Do you have a solution to
propose?</SPAN></FONT><BR><FONT face=Courier><SPAN
style="FONT-FAMILY: Courier">Do others?</SPAN></FONT><BR><BR><FONT
face=Courier><SPAN
style="FONT-FAMILY: Courier">paul</SPAN></FONT><BR><BR><FONT
face=Courier><SPAN style="FONT-FAMILY: Courier">On 18 March 2008 at 09:47,
&lt;<U><FONT color=blue><SPAN style="COLOR: blue"><A
href="mailto:vijayaraghavan.subramaniam@wipro.com">vijayaraghavan.subramaniam@wipro.com</A></SPAN></FONT></U><FONT
color=black><SPAN style="COLOR: black">&gt;
wrote:</SPAN></FONT></SPAN></FONT><BR><BR><FONT face="Trebuchet MS" size=1><SPAN
style="FONT-SIZE: 8pt; FONT-FAMILY: 'Trebuchet MS'">Paul,</SPAN></FONT><BR><FONT
face=Courier><SPAN style="FONT-FAMILY: Courier">How is the performance
when parsing/comparing two large XML documents with the 3dm
tool?</SPAN></FONT><BR><FONT face=Courier><SPAN
style="FONT-FAMILY: Courier">Vijay</SPAN></FONT><o:p></o:p></P>
<DIV class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt">
<HR align=left width="99%" SIZE=4>
</SPAN></FONT></DIV>
<P class=MsoNormal><FONT face="Times New Roman" color=black size=3><SPAN
style="FONT-SIZE: 12pt; COLOR: black"><BR></SPAN></FONT><B><FONT face=Tahoma
size=1><SPAN
style="FONT-WEIGHT: bold; FONT-SIZE: 8pt; FONT-FAMILY: Tahoma">From:</SPAN></FONT></B><FONT
face=Tahoma size=1><SPAN style="FONT-SIZE: 8pt; FONT-FAMILY: Tahoma"> Paul
Libbrecht [<U><FONT color=blue><SPAN style="COLOR: blue"><A
href="mailto:paul@activemath.org">mailto:paul@activemath.org</A></SPAN></FONT></U><FONT
color=black><SPAN
style="COLOR: black">]</SPAN></FONT></SPAN></FONT><BR><B><FONT face=Tahoma
size=1><SPAN
style="FONT-WEIGHT: bold; FONT-SIZE: 8pt; FONT-FAMILY: Tahoma">Sent:</SPAN></FONT></B><FONT
face=Tahoma size=1><SPAN style="FONT-SIZE: 8pt; FONT-FAMILY: Tahoma"> Tue
3/18/2008 2:08 PM</SPAN></FONT><BR><B><FONT face=Tahoma size=1><SPAN
style="FONT-WEIGHT: bold; FONT-SIZE: 8pt; FONT-FAMILY: Tahoma">To:</SPAN></FONT></B><FONT
face=Tahoma size=1><SPAN style="FONT-SIZE: 8pt; FONT-FAMILY: Tahoma">
Vijayaraghavan Subramaniam (WT01 -
Technology,Media,Transportation&Services)</SPAN></FONT><BR><B><FONT
face=Tahoma size=1><SPAN
style="FONT-WEIGHT: bold; FONT-SIZE: 8pt; FONT-FAMILY: Tahoma">Cc:</SPAN></FONT></B><FONT
face=Tahoma size=1><SPAN style="FONT-SIZE: 8pt; FONT-FAMILY: Tahoma"> <U><FONT
color=blue><SPAN style="COLOR: blue"><A
href="mailto:jdom-interest@jdom.org">jdom-interest@jdom.org</A></SPAN></FONT></U></SPAN></FONT><FONT
color=black><SPAN style="COLOR: black"><BR></SPAN></FONT><B><FONT face=Tahoma
size=1><SPAN
style="FONT-WEIGHT: bold; FONT-SIZE: 8pt; FONT-FAMILY: Tahoma">Subject:</SPAN></FONT></B><FONT
face=Tahoma size=1><SPAN style="FONT-SIZE: 8pt; FONT-FAMILY: Tahoma"> Re:
[jdom-interest] Comparing two XML files using JDOM</SPAN></FONT><BR><BR><FONT
face=Courier><SPAN style="FONT-FAMILY: Courier">I've never found such a class, but 3dm
is a tool that can do XML diffing, and with a few lines of code you can feed the
JDOM document through a SAX stream into 3dm.</SPAN></FONT><BR><BR><FONT
face=Courier><SPAN
style="FONT-FAMILY: Courier">paul</SPAN></FONT><BR><BR><BR><FONT
face=Courier><SPAN style="FONT-FAMILY: Courier">On 18 March 2008 at 07:50,
&lt;<U><FONT color=blue><SPAN style="COLOR: blue"><A
href="mailto:vijayaraghavan.subramaniam@wipro.com">vijayaraghavan.subramaniam@wipro.com</A></SPAN></FONT></U><FONT
color=black><SPAN style="COLOR: black">&gt;
wrote:</SPAN></FONT></SPAN></FONT><BR><BR><FONT face="Trebuchet MS" size=1><SPAN
style="FONT-SIZE: 8pt; FONT-FAMILY: 'Trebuchet MS'">Hi
All,</SPAN></FONT><BR><FONT face="Trebuchet MS" size=1><SPAN
style="FONT-SIZE: 8pt; FONT-FAMILY: 'Trebuchet MS'">Is there any utility class
in JDOM to compare two XML files?</SPAN></FONT><BR><FONT
face="Trebuchet MS" size=1><SPAN
style="FONT-SIZE: 8pt; FONT-FAMILY: 'Trebuchet MS'">Regards,</SPAN></FONT><BR><FONT
face="Trebuchet MS" size=1><SPAN
style="FONT-SIZE: 8pt; FONT-FAMILY: 'Trebuchet MS'">Vijay</SPAN></FONT><BR><BR><FONT
face=Courier><SPAN style="FONT-FAMILY: Courier">The information contained in
this electronic message and any attachments to this message are intended for
the exclusive use of the addressee(s) and may contain proprietary,
confidential or privileged information. If you are not the intended recipient,
you should not disseminate, distribute or copy this e-mail. Please notify the
sender immediately and destroy all copies of this message and any
attachments.</SPAN></FONT><BR><BR><FONT face=Courier><SPAN
style="FONT-FAMILY: Courier">WARNING: Computer viruses can be transmitted via
email. The recipient should check this email and any attachments for the
presence of viruses. The company accepts no liability for any damage caused by
any virus transmitted by this email.</SPAN></FONT><BR><BR><FONT
face=Courier><SPAN
style="FONT-FAMILY: Courier">www.wipro.com</SPAN></FONT><BR><FONT
face=Courier><SPAN
style="FONT-FAMILY: Courier">_______________________________________________</SPAN></FONT><BR><FONT
face=Courier><SPAN style="FONT-FAMILY: Courier">To control your jdom-interest
membership:</SPAN></FONT><BR><U><FONT face=Courier color=blue><SPAN
style="COLOR: blue; FONT-FAMILY: Courier"><A
href="http://www.jdom.org/mailman/options/jdom-interest/youraddr@yourhost.com">http://www.jdom.org/mailman/options/jdom-interest/youraddr@yourhost.com</A></SPAN></FONT></U><FONT
color=black><SPAN style="COLOR: black"><BR><BR></SPAN></FONT><o:p></o:p></P>
<DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt"><o:p></o:p></SPAN></FONT></P></DIV></DIV>
<P class=MsoNormal><FONT face="Times New Roman" size=3><SPAN
style="FONT-SIZE: 12pt"><o:p> </o:p></SPAN></FONT></P></DIV>
</BLOCKQUOTE></BODY></HTML>