Omission in near() when used in mixed content
Stefan Majewski stefan.majewski at univie.ac.at
Wed Jan 21 14:03:26 PST 2009
Dear all,

we are currently seeing problems with near() when it is used with words that span element boundaries. We have a full-text index with content="mixed" defined for the collection. We know that the index as such works, because near() behaves as expected with single words, even when they overlap element tags. Nevertheless, when searching for a sequence of several words, the search fails if at least one of the words is split by an element.

Assume the following XQuery:

---
declare namespace tei = "http://www.tei-c.org/ns/1.0";
let $q := "mixed test"
return //tei:u[near(., $q)]
---

and this sample document:

---
<?xml version="1.0" encoding="utf-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <!-- snipped header -->
  <text>
    <body>
      <div>
        <u xml:id="u1"> this the first mi<seg type="overlap">xed test </seg> </u>
        <u xml:id="u2"> this the second mi<anchor/>xed test </u>
        <u xml:id="u3"> this is the third <seg type="overlap"> mixed </seg> test </u>
        <u xml:id="u4"> this is last <seg type="overlap"> mixed test </seg> </u>
      </div>
    </body>
  </text>
</TEI>
---

Several searches yield very different results, even though IMHO they should be equal:

1) $q="mixed" returns the tei:u elements with ids u1, u2, u3, u4
2) $q="mixed test" only returns the tei:u elements with ids u3, u4

Does anybody see a different behaviour? I may have misinterpreted something in the docs, in which case my assumption that the second search should return the same four tei:u elements is wrong. Or there may be a bug in near() or in the full-text index causing this issue. Either way, I would be very glad to get some hints on how to work around it, as I am currently implementing searches over highly segmented texts.

Cheers,
Stefan
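To make the expectation concrete, here is a rough cross-check written in Python rather than XQuery, so it runs outside eXist. It is only a sketch under stated assumptions: it takes each tei:u's concatenated text (roughly its XPath string value), splits it on whitespace, and looks for the query words as consecutive tokens. This is not eXist's analyzer and not the actual near() semantics; the names `SAMPLE` and `phrase_matches` are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The sample document from above, header omitted.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
<text><body><div>
<u xml:id="u1"> this the first mi<seg type="overlap">xed test </seg> </u>
<u xml:id="u2"> this the second mi<anchor/>xed test </u>
<u xml:id="u3"> this is the third <seg type="overlap"> mixed </seg> test </u>
<u xml:id="u4"> this is last <seg type="overlap"> mixed test </seg> </u>
</div></body></text>
</TEI>"""


def phrase_matches(xml_text, phrase):
    """Hypothetical helper: return the xml:ids of tei:u elements whose
    concatenated text contains the phrase words as consecutive tokens.
    Assumes plain whitespace tokenization (not eXist's analyzer)."""
    root = ET.fromstring(xml_text)
    words = phrase.split()
    n = len(words)
    hits = []
    for u in root.iter(f"{{{TEI_NS}}}u"):
        # itertext() yields u's text plus all descendant text and tails;
        # joining with "" glues "mi" + "xed" back into "mixed".
        tokens = "".join(u.itertext()).split()
        if any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1)):
            hits.append(u.get(XML_ID))
    return hits


print(phrase_matches(SAMPLE, "mixed"))
print(phrase_matches(SAMPLE, "mixed test"))
```

Under this string-value model, both queries select u1 through u4, which is the result I would expect from near() with content="mixed".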