Howard Katz <howardk@fatdog.com>
June 27, 2004
XsRQL is a query language for RDF that derives much of its syntax and style from XQuery, hence its name: an XQuery-style RDF Query Language. The idea is to reuse many of the useful and innovative features and metaphors the XML Query Working Group spent so many thousands of hours developing, while omitting the more complex parts of the XQuery specification that are specific to XML and not required in an RDF environment.
The "style" qualifier in the name is important: XsRQL sits on top of an RDF data model and knows nothing about XML or the complexities of the XQuery data model; on the other hand, it borrows happily and unashamedly from the XQuery surface syntax, its concept of an underlying, formal data model, its functional programming metaphor, and a number of the other innovations pioneered by the XML Query working group that are described below.
The basic idea is to reuse some of the fruits of the tens of thousands of hours of hard work the working group put into XQuery, arguably the W3C's most complex specification. In the end, RDF is far simpler than XML, and XsRQL is correspondingly far simpler than XQuery. It shamelessly steals, er, borrows, much of what's best about XQuery and ignores the rest.
I've tried to keep the amount of blue-skying in the following to a minimum (though I might not have always succeeded; some of the sample code below has yet to see silicon, and it's hard not to occasionally wax rhapsodic, particularly with a cider or two in hand.) I've implemented much of the path language in prototype form and hope to be able to demo some live code at the face-to-face in San Diego.
The main objectives of XsRQL are:
This document looks briefly at the XsRQL feature set, most of which is drawn directly or indirectly from existing mechanisms in XQuery; provides numerous code snippets illustrating the path language that's central to XsRQL; works its way through a somewhat herky-jerky tutorial overview of the language in general; and finishes with several illustrative examples of XsRQL queries compared and contrasted with similar queries in RDQL and other existing RDF query languages.
As well, a working prototype of an early first cut at a JavaCC grammar is attached (if only to prove that the author isn't living entirely in cloud cuckoo land.)
For those who are impatient to get up to speed and don't derive as much pleasure listening to the author speak as he does himself, I'd suggest jumping right into working code: work your way through the numerous code snippets in the sections on the XsRQL path language and the Examples.
Lastly, thanks to Andy Seaborne of HP Labs whose Jena tutorial was helpful in bringing me up to speed on RDQL, and whose look and feel so impressed me with its straightforward simplicity (the document that is, not Andy) that I have, with his concurrence, adopted its style as my own.
XsRQL is a functional language in that the output resulting from evaluating each expression in the query tree becomes input to the expression above it. The sequence that ultimately emerges from the top of the query tree is the result of the query.
This makes it possible to cascade XsRQL functions together and is part of what makes the language composable. For example:
count( sorted( distinct( @* ) ) ) + 1
The output of distinct() (a graph-oriented, unique list of every predicate in the datastore) feeds into sorted() (which sorts it into alphabetic order if the implementation hasn't already done so in distinct()), which in turn feeds into count(). The addition operator then adds 1 to the resulting count.
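To make the cascading concrete, here is a minimal Python sketch. It is not XsRQL and not the prototype described in this document. It models the datastore as a list of (subject, predicate, object) tuples; the sample triples and the helper names predicates, distinct, sorted_seq, and count are assumptions made purely for illustration.

```python
# Hypothetical in-memory datastore: (subject, predicate, object) tuples.
TRIPLES = [
    ("ex:libby", "foaf:name", "Libby Miller"),
    ("ex:libby", "foaf:mbox", "mailto:libby.miller@bristol.ac.uk"),
    ("ex:dan", "foaf:name", "Dan"),
]


def predicates(triples):
    """Rough stand-in for the path expression @* (every predicate arc)."""
    return [p for (_, p, _) in triples]


def distinct(seq):
    """Drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(seq))


def sorted_seq(seq):
    """Stand-in for sorted(): alphabetic order."""
    return sorted(seq)


def count(seq):
    """Stand-in for count(): number of items in the sequence."""
    return len(seq)


# Each call consumes the sequence produced by the call nested inside it.
result = count(sorted_seq(distinct(predicates(TRIPLES)))) + 1
print(result)
```

Because every helper takes a sequence and returns a sequence (or a value), any of them can be dropped into or removed from the chain without touching the others, which is the composability property described above.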
The major difference from XQuery is that the XsRQL data model understands entities that are germane to RDF and not XML. At present, XsRQL knows about:
Let's walk through an almost trivial example of how queries are evaluated and result sequences created in XsRQL. Here's a very simple query:
"I like this language.", " So do I!", "+", 1
The parse tree for this query looks something like the following:
                commaOp
               /       \
"I like this language."  commaOp
                        /       \
                " So do I!"      commaOp
                                /       \
                              "+"        1
The comma operator concatenates its lefthand operand with its righthand operand. As each item is encountered as the query processor walks the query tree, the comma operator at each stage first evaluates its lefthand side. The result of evaluating a string expression (they're all strings in this example, except for the single integer "1" at the end of the sequence) is to create a singleton sequence containing the item. The operator then combines that singleton sequence with the result of evaluating its righthand side, which causes it to recursively call the next comma operator down the line.
At the end of this sequence of recursive evaluations, the following four-item heterogeneous sequence emerges from the top of the query:
"I like this language."-[str] " So do I!"-[str] "+"-[str] 1-[int]
Once query evaluation is complete, the results are serialized: String items are printed to the result stream as they're encountered in the result sequence, and the lexical string representation of the single integer value is likewise printed. The final result of this is:
I like this language. So do I! +1
It looks like a single string, but from the query processor's perspective, it's a sequence of four consecutive items in a heterogeneous result sequence. Result sequence is a more accurate term than result set, since order is often important and duplication is allowed.
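The evaluation just described can be sketched in a few lines of Python. This is an illustrative model only: the CommaOp class and the evaluate and serialize helpers are hypothetical names, and the serializer here simply joins items with no separator, which an actual implementation may well do differently.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class CommaOp:
    """A comma operator node with a lefthand and a righthand operand."""
    left: "Node"
    right: "Node"


Node = Union[str, int, CommaOp]


def evaluate(node):
    """Return the flat result sequence for a query-tree node."""
    if isinstance(node, CommaOp):
        # Evaluate the lefthand side first, then recurse into the righthand side.
        return evaluate(node.left) + evaluate(node.right)
    return [node]  # a literal becomes a singleton sequence


def serialize(sequence):
    """Emit each item's lexical form back to back (assumed: no separator)."""
    return "".join(str(item) for item in sequence)


tree = CommaOp("I like this language.",
               CommaOp(" So do I!", CommaOp("+", 1)))
sequence = evaluate(tree)
print(len(sequence))
print(serialize(sequence))
```

The sequence keeps its four items, including the integer, until serialization; only at that final step does it collapse into what looks like one string.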
let $libbysMailboxes := @foaf:mbox[ "mailto:libby.miller@bristol.ac.uk" ]/*
return
    if ( count( $libbysMailboxes ) = 0 )
        then "Libby doesn't have a mailbox"
    else if ( count( $libbysMailboxes ) > 1 ) or ( count( *[ $libbysMailboxes ] ) > 1 )
        then "Libby's mailbox isn't inverse functional!"
    else "Libby has a single @foaf:mbox as expected: ", $libbysMailboxes
XsRQL adopts a navigational style of maneuvering through an RDF graph that is very similar to the way XPath navigates through XML, with a few interesting differences. The main difference is that RDF is not XML, and the entities you specify in an XsRQL path are RDF entities, not XML ones. XsrPath (so-called) knows about such RDF concepts as subjects, predicates, and objects; about the various node types (uri-addressable resources, bnodes, and literals); and about triples and quads. It allows both an instances- and triples-based view of the datastore and a graph-based view, depending on the user's needs and preferences.
Paths can be of any length, from a single node or predicate on up. A "striped" style of alternating nodes and "@"-prefixed predicates makes it easy to orient yourself visually as you move down the path.
What might most surprise those familiar with XPath is that the "attributes" in XsrPath are not terminal leaves. In XQuery/XPath, attributes are leaves; they terminate a path. In XsRQL, they simply mark property arcs that are way stations on the way to somewhere else.
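A small Python sketch may help show why predicates are way stations rather than leaves. It assumes a toy datastore of (subject, predicate, object) tuples; the ex: resources and the follow helper are hypothetical and exist only for this illustration.

```python
# Hypothetical datastore: (subject, predicate, object) tuples.
TRIPLES = [
    ("ex:paper1", "dc:creator", "ex:libby"),
    ("ex:libby", "foaf:name", "Libby Miller"),
    ("ex:libby", "foaf:mbox", "mailto:libby.miller@bristol.ac.uk"),
]


def follow(nodes, predicate):
    """One stripe: from each node, cross the named predicate arc to its objects."""
    return [o for (s, p, o) in TRIPLES if s in nodes and p == predicate]


# Roughly the striped path ex:paper1/@dc:creator/*/@foaf:name/*
creators = follow(["ex:paper1"], "dc:creator")  # node -> @predicate -> node
names = follow(creators, "foaf:name")           # ...and onward from there
print(names)
```

Each call to follow consumes a node sequence and produces another node sequence, so the path can keep going for as many node/predicate stripes as the graph allows.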
Non-XPath 2.0 users are sometimes surprised to see such strange things in the path as function calls and constructed elements, such as in:
doc( "bib.xml" )/bib/books-with-editors( book )/editor
This XPath says "Call the user-defined function, books-with-editors(), passing in all <book> children of the <bib> root in the document "bib.xml", and return the <editor> children of those books that have <editor> children. Once that function returns its <book>-sequence result, dereference that and return the <editor>s themselves."
(Wonderful example tho this might be, it's a no-op: the same path without the inserted function would work just as well. This is called pedagogy.)
XsrPaths can likewise contain embedded functions and triple constructors.
XQuery supports some 150 built-in functions. XsRQL could easily cherry-pick a dozen or two of the most useful of these, adapting them to an RDF context where necessary. Interesting to note: by rough count, more than half of these are in place to support operations on XML Schema datatypes.
My current, very immature prototype of an XsRQL processor implements the following built-ins at this point:
The argument types above are part of a type-expression-language subset of the grammar that's yet to be worked out in its entirety (though I don't have any concerns this will cause any great difficulty).
It's my personal belief that some degree of typing and type-checking is a good thing. At the very least, we can use it to document what sort of operands need to be delivered to built-in and user-defined functions, as shown above, and enforce that in code if desired. I'm of the opinion that XsRQL should be "lightly typed," if for no other reason than to inform the user when he or she is doing something that's patently foolish or doesn't make sense.
If that perspective is adopted, it will be interesting for the working group to work out what to do when operators have type-mismatched operand types, something that occurs primarily in comparisons and arithmetic expressions. The possible choices seem to be:
The fact that one or both operands can also be either singleton or multi-valued sequences adds a further wrinkle. Because of the complexity caused by adding XML nodetypes to the mix, this caused the XQuery group no end of time and effort in determining how to handle all eventualities. I don't think however that it should be all that difficult to do with the much simpler RDF data model. I definitely think it's worth doing.
This query uses a full wildcard in the subject position, saying, "Get me all subjects of a dc:title predicate":
declare prefix dc: = <http://purl.org/dc/elements/1.1/>;
*[ @dc:title ]
This query uses a partially wildcarded predicate to say, "Get me all the Dublin Core predicates, and only the Dublin Core predicates, in the datastore":
declare prefix dc: = <http://purl.org/dc/elements/1.1/>;
@dc:*
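One plausible way to read the two wildcard forms above is sketched below in Python. The sample triples, the PREFIXES table, and both helper names are assumptions for illustration; the sketch treats a partially wildcarded predicate as a namespace-prefix match on the full predicate uri.

```python
# Prefix table as a query prolog might set it up.
PREFIXES = {"dc": "http://purl.org/dc/elements/1.1/"}

# Hypothetical datastore with full uris in the predicate position.
TRIPLES = [
    ("http://example.org/doc1", "http://purl.org/dc/elements/1.1/title", "A Title"),
    ("http://example.org/doc1", "http://purl.org/dc/elements/1.1/creator", "Someone"),
    ("http://example.org/doc1", "http://xmlns.com/foaf/0.1/maker", "http://example.org/me"),
]


def predicates_matching(prefix):
    """Like @dc:* : predicates whose uri starts with the prefix's namespace."""
    namespace = PREFIXES[prefix]
    return [p for (_, p, _) in TRIPLES if p.startswith(namespace)]


def subjects_with(predicate_uri):
    """Like *[ @dc:title ] : subjects having at least one such predicate arc."""
    return [s for (s, p, _) in TRIPLES if p == predicate_uri]


print(predicates_matching("dc"))
print(subjects_with(PREFIXES["dc"] + "title"))
```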
Coming soon ...
XsRQL has two mechanisms for inserting triples into the result sequence. The first, a triples constructor syntax, provides a fairly free-form method of generating new triples, either built completely from scratch or partially or fully seeded by existing values. The second mechanism, a built-in triples() function, returns triples that already exist in the datastore, seeding the function with a single argument.
The main distinction between the two is that triple constructors let you specify all three subject, predicate, and object positions using either constants or XsRQL path-language expressions. This triples-generating capability can be used both to return existing triples and to create new ones, and makes it possible to do XSLT-like transformations on existing graphs. The triples() function, by contrast, is a triples-finder: it allows only a single path-language expression as its solitary argument and returns only triples that already exist.
As an example, the triple constructor in the following code snippet, adapted from Example 6, is being used to transform a triple from an existing vocabulary into a new one:
declare prefix foaf: = <http://xmlns.com/foaf/0.1/>;
declare prefix newFoaf: = <http://some/foafish/vocab/>;
let $libby := *[ @foaf:mbox = "mailto:libby.miller@bristol.ac.uk" ]
return { $libby, @newFoaf:Name, $libby/@foaf:name/* }
Example 1 provides another example of triple constructor usage.
The triples() function in the following snippet, on the other hand:
declare prefix foaf: = <http://xmlns.com/foaf/0.1/>; triples( @foaf:mbox )
returns a sequence of all existing triples in the datastore that contain a foaf:mbox predicate. Any XsRQL path-language expression is allowed as the argument, so you can select triples based on whatever path-language constraints you wish. See the XsRQL path language for a fairly good sample of what those are.
A quads() function similarly generates quads where provenance is required.
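The finder/constructor distinction can be sketched in Python as follows. Both helper names are hypothetical, the datastore is a toy list of tuples, and the constructor's behaviour when a position holds several values (one new triple per subject/object pairing) is an assumption of this sketch, not something the text above specifies.

```python
# Hypothetical datastore: (subject, predicate, object) tuples.
TRIPLES = [
    ("ex:libby", "foaf:mbox", "mailto:libby.miller@bristol.ac.uk"),
    ("ex:libby", "foaf:name", "Libby Miller"),
]


def triples_with_predicate(predicate):
    """Finder: return only triples that already exist in the store."""
    return [t for t in TRIPLES if t[1] == predicate]


def construct(subjects, predicate, objects):
    """Constructor: build new triples (assumed: one per subject/object pair)."""
    return [(s, predicate, o) for s in subjects for o in objects]


# Finder, in the spirit of triples( @foaf:mbox )
found = triples_with_predicate("foaf:mbox")

# Constructor, in the spirit of { $libby, @newFoaf:Name, $libby/@foaf:name/* }
libby = [s for (s, p, o) in TRIPLES
         if p == "foaf:mbox" and o == "mailto:libby.miller@bristol.ac.uk"]
names = [o for (s, p, o) in TRIPLES if s in libby and p == "foaf:name"]
renamed = construct(libby, "newFoaf:Name", names)

print(found)
print(renamed)
```

The finder can never return anything that isn't already in TRIPLES, whereas the constructed triple uses a predicate that appears nowhere in the store.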
In XsRQL, the path language is everything. Here's a quick introductory walk-through. Some of the guiding principles are:
Here's a number of short snippets demonstrating the above principles:
Return all nodes (subjects and objects) in the datastore:
*
resource()
count( * )
*/@*
@*
*[ @* ]
subject()
@*/*
object()
@*/literal()
literal()
triples( sorted( literal() ))
count( object() ) = count( literal() ) + count( @*/resource() )
distinct( @* )
count( distinct( @* ) )
sorted( distinct( @*/* ))
declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>; count( distinct( @ciafb:* ) )
<http://www.odci.gov/cia/publications/factbook/af.html>
declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>; ciafb:af.html
declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>; triples( ciafb:af.html )
declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>; ciafb:af.html/@*
declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>; ciafb:af.html/@ciafb:GDP_per_capita
declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>; ciafb:af.html/@ciafb:GDP_per_capita/*
declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>; ciafb:af.html/@ciafb:GDP_per_capita/literal()
declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>; ciafb:af.html/@ciafb:GDP_per_capita/resource()
declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>; ciafb:af.html[ @ciafb:GDP_per_capita/resource() ]
declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>; ciafb:*[ @ciafb:Airports_with_paved_runways ]
declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>; *[ @ciafb:Airports_with_paved_runways ]
declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>; { *, @ciafb:Airports_with_paved_runways, * }
declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>; *[ @ciafb:Airports_with_paved_runways | @ciafb:Airports_with_unpaved_runways ]
declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>; *[ @ciafb:Airports_with_paved_runways | @ciafb:Airports_with_unpaved_runways ]/@ciafb:Name/*
declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>; distinct( *[ @ciafb:Airports_with_paved_runways | @ciafb:Airports_with_unpaved_runways ]/@* )
declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>; { *, @ciafb:Airports_with_paved_runways, * } | { *, @ciafb:Airports_with_unpaved_runways, * }
declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>; ciafb:af.html/@*/*
declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>; ciafb:af.html/@*/bnode()
declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>; ciafb:*
declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>; distinct( ciafb:* )
declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>; let $afghanistan := *[ @ciafb:Name = "Afghanistan" ] return { $afghanistan, $afghanistan/@*, $afghanistan/@*/* }
declare prefix = <http://www.w3.org/2001/02pd/rec54#>; declare prefix dc: = <http://purl.org/dc/elements/1.1/>; declare prefix contact: = <http://www.w3.org/2000/10/swap/pim/contact#>; declare datasource w3c = <http://www.w3.org/2002/01/tr-automation/tr.rdf>; <w3c>//*/dc:title/*[ @editor/*/@contact:fullName = "Steven Pemberton" ]
declare prefix = <http://www.w3.org/2001/02pd/rec54#>; declare prefix dc: = <http://purl.org/dc/elements/1.1/>; declare prefix contact: = <http://www.w3.org/2000/10/swap/pim/contact#>; declare datasource w3c = <http://www.w3.org/2002/01/tr-automation/tr.rdf>; <w3c>//@editor/*/[ @contact:fullName/endsWith( "Pemberton" ) ]/@contact:fullName
declare prefix dc: = <http://purl.org/dc/elements/1.1/>; datasource( "http://www.w3.org/2002/01/tr-automation/tr.rdf" )//*[ dc:date >= "1999-06-01" ]
Let's walk through a few simple XsRQL query examples to get a better idea of how things work.
Here's an example from the Jena Tutorial that says, "Give me every vcard: Full Name (Formal Name?) in the repository":
select ?x, ?fname where (?x <http://www.w3.org/2001/vcard-rdf/3.0#FN> ?fname)
x                                | fname
================================================
<http://somewhere/JohnSmith/>    | "John Smith"
<http://somewhere/RebeccaSmith/> | "Becky Smith"
<http://somewhere/SarahJones/>   | "Sarah Jones"
<http://somewhere/MattJones/>    | "Matt Jones"
If you were satisfied in seeing just the names, your query could be as simple as this in XsRQL:
@<http://www.w3.org/2001/vcard-rdf/3.0#FN>/*
The rightmost wildcard, which is what we're returning, represents any object that is downstream of a vcard:FN predicate. "@..." means predicate.
(The following output assumes that the query engine has an auto-linefeed option enabled and is emitting a linefeed after every item, but that's at the discretion of the implementation):
John Smith
Becky Smith
Sarah Jones
Matt Jones
If your query engine doesn't have an auto-linefeed feature or it's not enabled (how you set environment options is shown below), you'd have to use a for statement to isolate each individual person node in turn, and use the chr() function to insert your own linefeeds as follows:
for $person in @<http://www.w3.org/2001/vcard-rdf/3.0#FN>/*
return $person, chr(10)
chr() is a simple built-in function that takes an integer argument and returns the corresponding Unicode character. In this case, it's a linefeed, which is injected right into the result stream with expected, ahem, results. The comma (",") in the return clause is an expression-concatenating operator that takes two arguments, in this case a $person node on its left and a single-character string on its right, and concatenates the two together. The effect at emit time of serializing the two items in sequence is to produce the name of the person, followed by a linefeed as expected.
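For readers who think better in a general-purpose language, the for/return pairing with chr(10) behaves roughly like the following Python sketch. Python happens to have a chr() built-in with the same meaning. The NAMES list stands in for whatever the path expression would return, and for_return and serialize are hypothetical helper names.

```python
# Stand-in for the sequence the path expression would yield.
NAMES = ["John Smith", "Becky Smith", "Sarah Jones", "Matt Jones"]


def for_return(items):
    """Mimic: for $person in ... return $person, chr(10)"""
    sequence = []
    for person in items:
        # The comma contributes two items per iteration: the name, then a linefeed.
        sequence.extend([person, chr(10)])
    return sequence


def serialize(sequence):
    """Emit every item's string form back to back."""
    return "".join(str(item) for item in sequence)


report = serialize(for_return(NAMES))
print(report, end="")
```

Note that the result sequence holds eight items, not four: the linefeeds are ordinary string items that just happen to follow each name.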
Note that we've left off the heading to the report, which RDQL (or at least Jena?) produces automatically. We can make the same thing happen in XsRQL:
"x | fname\n",
"-------------------------------------------------\n",
for $person in @<http://www.w3.org/2001/vcard-rdf/3.0#FN>/*
return $person, chr(10)
Again, we're evaluating expressions and injecting them into the data model instance as they're encountered and evaluated in the query tree. Here we evaluate and embed two string items before encountering the for statement and evaluating that. Because they're string literals, we can embed an escaped "\n" linefeed character directly at the end of both strings without having to evaluate a chr() function.
Finally, the same query can be shortened by rewriting it using QName notation as in the following. The results would be identical. Using QNames doesn't save you much in this particular example; they're more useful when your queries become significantly longer than this one.
declare prefix foaf: = <http://www.w3.org/2001/vcard-rdf/3.0#>; @foaf:FN/*, chr(10)
This query also shows our first use of a query prolog to set up the namespace prefix. Our parser recognizes this statement as part of a prolog because:
Any number of prolog declarations can be strung together and used to:
QNames are also useful when you want both to create more readable results in the result sequence and to reduce bandwidth. The XsRQL:emitQNames declaration in the following query reports all distinct predicates in the datastore of interest. This query assumes a long list of results coming back. XsRQL:emitQNames reduces bandwidth by shipping a short result-sequence preamble that provides the QName-to-uri mapping the client needs, followed by the result-sequence proper. This query:
declare prefix foaf: = <http://www.w3.org/2001/vcard-rdf/3.0#>; XsRQL:emitQNames; distinct( @* )
might produce something like the following:
XsRQL:resultPreamble
{
    declare prefix foaf: = <http://www.w3.org/2001/vcard-rdf/3.0#>;
}
@foaf:accountName
@foaf:accountServiceHomePage
@foaf:aimChatID
@foaf:based_near
...
The client can easily parse the incoming result sequence to strip off the preamble, as well as noting the QName definition(s) needed to reconstitute the full uris.
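As a rough illustration of the client side, the Python sketch below strips a preamble and expands QNames back to full uris. The exact wire format of the preamble is not pinned down in this document, so the RESULT string, the regular expressions, and the split_preamble and expand helpers are all assumptions modelled on the sample output above.

```python
import re

# Assumed shape of an incoming result stream, modelled on the sample output.
RESULT = """XsRQL:resultPreamble {
declare prefix foaf: = <http://www.w3.org/2001/vcard-rdf/3.0#>;
}
@foaf:accountName
@foaf:aimChatID
"""


def split_preamble(text):
    """Return (prefix-to-uri map, list of remaining result items)."""
    match = re.match(r"\s*XsRQL:resultPreamble\s*\{(.*?)\}", text, re.DOTALL)
    if match is None:
        return {}, text.split()
    declarations = re.findall(r"declare prefix (\w+): = <([^>]*)>;", match.group(1))
    return dict(declarations), text[match.end():].split()


def expand(item, prefixes):
    """Turn '@foaf:accountName' into '@<full-uri>'; leave other items alone."""
    m = re.fullmatch(r"@(\w+):(\w+)", item)
    if m and m.group(1) in prefixes:
        return "@<" + prefixes[m.group(1)] + m.group(2) + ">"
    return item


prefixes, items = split_preamble(RESULT)
expanded = [expand(item, prefixes) for item in items]
print(expanded)
```

The preamble is read once and then every item is expanded with a dictionary lookup, which is why the per-item cost on the client stays small.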
Note: There's been some discussion in the DAWG about latency issues involved in shipping QNames; I've been able to implement a version that appears to have very little latency (yet to be tested). The key to making this work is that the prefix mappings must first be explicitly set in the query prolog by the user, as above.
XsRQL:emitQNames; declare prefix foaf: = <http://www.w3.org/2001/vcard-rdf/3.0#>; @foaf:FN/*, chr(10)
I mentioned earlier that some implementations might provide a user option to do auto-linefeeds. You would declare that option as follows:
XsRQL:autoLineFeed; declare prefix foaf: = <http://www.w3.org/2001/vcard-rdf/3.0#>; @foaf:FN/*
If you wanted something a bit closer to the RDQL result format, you'd need to be able to access each individual person using a for statement, saying something like:
for $person in @<http://www.w3.org/2001/vcard-rdf/3.0#FN> return $person/*, " ", \"$person/*\", chr(10)
with the following results:
<http://somewhere/JohnSmith/> "John Smith"
<http://somewhere/RebeccaSmith/> "Becky Smith"
<http://somewhere/SarahJones/> "Sarah Jones"
<http://somewhere/MattJones/> "Matt Jones"
$person is actually a sequence of @foaf:FN predicates and not subjects as you might expect. We do that in this case because we're dereferencing from the predicate to its downstream literal in the return clause.
Note the leading wildcard preceding the filter on @foaf:FN. This says we're grabbing subjects and not predicates.
If you were using this particular reporting style a lot, you might consider writing it up as a user-defined function and possibly making it external (neither capability is discussed in this version of the language spec).
In RDQL:
SELECT ?x, ?y
WHERE (?x :marriedTo ?y) (?x :age ?xAge) (?y :age ?yAge)
and ?xAge < ?yAge
In XsRQL:
for $x in *[ @<marriedTo> ]
for $y in $x/@<marriedTo>/*
where $x/@<age>/* < $y/@<age>/*
return { $x, @<marriedTo>, $y }
If we wanted to walk our way through this code at runtime, we would say:
Note the use of positional context to distinguish the subject partner in the first for statement from the object partner in the second for statement. The wildcarded resource being assigned to the $x variable in the first statement is a subject because (1) it immediately precedes a predicate, and (2) it's the rightmost, non-filtered item in the path. (It's also the only non-filtered item in the path.) The wildcarded resource being assigned to the $y variable in the second for statement is an object because it immediately follows a predicate. (The predicate could be embedded in a filter or directly inline as it is here; it wouldn't make any difference.)
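The nested for/where/return shape of this query maps fairly directly onto nested loops. The Python sketch below is only an analogy: the sample triples and helper names are hypothetical, and treating the where comparison as true when any pair of age values satisfies it is an assumption borrowed from XQuery-style general comparisons.

```python
# Hypothetical datastore: (subject, predicate, object) tuples.
TRIPLES = [
    ("ex:ann", "marriedTo", "ex:bob"),
    ("ex:bob", "marriedTo", "ex:ann"),
    ("ex:ann", "age", 31),
    ("ex:bob", "age", 35),
]


def objects(subject, predicate):
    """Objects reachable from subject across the given predicate arc."""
    return [o for (s, p, o) in TRIPLES if s == subject and p == predicate]


def younger_spouse_triples():
    results = []
    # for $x in *[ @<marriedTo> ]  (subjects of a marriedTo arc, de-duplicated)
    for x in dict.fromkeys(s for (s, p, _) in TRIPLES if p == "marriedTo"):
        # for $y in $x/@<marriedTo>/*  (objects of that arc)
        for y in objects(x, "marriedTo"):
            # where $x/@<age>/* < $y/@<age>/*  (assumed: any matching pair)
            if any(ax < ay for ax in objects(x, "age") for ay in objects(y, "age")):
                # return { $x, @<marriedTo>, $y }
                results.append((x, "marriedTo", y))
    return results


print(younger_spouse_triples())
```

The subject/object distinction discussed above shows up here as which tuple position each loop draws from: x comes from the first position of a marriedTo triple, y from the third.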
This is one of the examples from the "Query and Rule languages Use Cases and Examples" document at http://rdfstore.sourceforge.net/2002/06/24/rdf-query/query-use-cases.html.
The query ties together triples from the FOAF vocabulary and the RDF Interest Group's GEO vocabulary, which uses WGS84 (World Geodetic System 1984) longitudes and latitudes.
In RDQL:
select ?uri, ?name, ?lat, ?lon
from <http://foaf.asemantics.com/dirkx>
where (?person, <rdf:type>, <foaf:Person>),
      (?person, <foaf:name>, ?name),
      (?person, <foaf:based_near>, ?bn),
      (?person, <foaf:mbox>, ?uri),
      (?bn, <pos:lat>, ?lat),
      (?bn, <pos:long>, ?lon)
using rdfs FOR <http://www.w3.org/2000/01/rdf-schema#>,
      foaf FOR <http://xmlns.com/foaf/0.1/>,
      pos FOR <http://www.w3.org/2003/01/geo/wgs84_pos#>,
In XsRQL:
declare prefix pos: = <http://www.w3.org/2003/01/geo/wgs84_pos#>;
declare prefix foaf: = <http://xmlns.com/foaf/0.1/>;
declare datasource dirksFoafFile = <http://foaf.asemantics.com/dirkx>;
for $person in dirksFoafFile//*[ @rdf:type/foaf:Person ]
return
    $person/@foaf:mbox/*, ", ",
    $person/@foaf:name/*, ", ",
    $person/@foaf:based_near/*/@pos:lat/*, ", ",
    $person/@foaf:based_near/*/@pos:long/*, chr(10)
I've omitted the rdf: prefix declaration, since this is a well-known namespace prefix to XsRQL.
If you wanted to simplify the query slightly and improve performance a bit (since you wouldn't have to dereference down the path from $person quite as far), you could add a temporary $location variable and say:
declare prefix pos: = <http://www.w3.org/2003/01/geo/wgs84_pos#>;
declare prefix foaf: = <http://xmlns.com/foaf/0.1/>;
let $dirksFile := datasource( <http://foaf.asemantics.com/dirkx> )
for $person in $dirksFile//*[ @rdf:type/foaf:Person ]
let $location := $person/@foaf:based_near/*
return
    $person/@foaf:mbox/*, ", ",
    $person/@foaf:name/*, ", ",
    $location/@pos:lat/*, ", ",
    $location/@pos:long/*, chr(10)
This example comes from an IBM developerWorks article by Philip McCarthy titled "Introduction to Jena", which looks at using Jena with a WordNet ontology. The following RDQL query is used to find all the WordNet "hypernyms" of the words "panther" and "tiger":
SELECT ?wordform, ?definition
WHERE (?firstconcept, <wn:wordForm>, "panther"),
      (?secondconcept, <wn:wordForm>, "tiger"),
      (?firstconcept, <wn:hyponymOf>, ?hypernym),
      (?secondconcept, <wn:hyponymOf>, ?hypernym),
      (?hypernym, <wn:wordForm>, ?wordform),
      (?hypernym, <wn:glossaryEntry>, ?definition)
USING wn FOR <http://www.cogsci.princeton.edu/~wn/schema/>
The RDQL resultset is:
wordform  | definition
=====================================================================================
"big cat" | "any of several large cats typically able to roar and living in the wild"
"cat"     | "any of several large cats typically able to roar and living in the wild"
The equivalent query in XsRQL is:
declare prefix wn: = <http://www.cogsci.princeton.edu/~wn/schema/>;
"wordform | definition\n",
"=======================================================================\n",
for $concept in *[ @wn:wordForm = "panther" or @wn:wordForm = "tiger" ]
return $concept/@wn:wordForm/*, " | ", $concept/@wn:definition/*
select ?name, ?title, ?identifier
where (dc::title ?paper ?title)
      (dc::creator ?paper ?creator)
      (dc::identifier ?paper ?uri)
      (foaf::name ?creator ?name)
      (foaf::mbox ?creator mailto:libby.miller@bristol.ac.uk)
using dc for http://purl.org/dc/elements/1.1/
      foaf for http://xmlns.com/foaf/0.1/
The main thing to note, in attempting to move from the above triples formulation to a path-based one, is that the "?creator" person who's the subject owner of the foaf:mbox in statement #5 above is also the object "?creator" person who's created the paper in statement #2.
Using an ad hoc amalgam of XsrPath with an RDQL-style variable-binding notation, we can concatenate the two relationships into a single path describing who knows what about what and who does what to whom:
?paper/@dc:creator/?libbyPerson/@foaf:mbox/"mailto:libby.miller@bristol.ac.uk"
What we want to do is to isolate the "libbyPerson" in the middle of the path as follows:
declare prefix dc: = <http://purl.org/dc/elements/1.1/>;
declare prefix foaf: = <http://xmlns.com/foaf/0.1/>;
let $tab := chr(9),
    $lf := chr(10),
    $libbyPerson := *[ @foaf:mbox = "mailto:libby.miller@bristol.ac.uk" ],
    $libbyPapers := *[ @dc:creator/$libbyPerson ]
return (
    $libbyPerson/@foaf:name/*, " has written ", count( $libbyPapers ), " papers:", $lf,
    for $paper in $libbyPapers
    return ( $tab, $paper/@dc:identifier/*, ": ", $paper/@dc:title/*, $lf )
)
The key to understanding the two variable assignments in the middle of the query:
$libbyPerson := *[ @foaf:mbox = "mailto:libby.miller@bristol.ac.uk" ], $libbyPapers := *[ @dc:creator/$libbyPerson ]
is to note that any self-respecting implementation should first be able to readily find all foaf:mbox's with a value of "mailto:libby.miller@bristol.ac.uk" and from there be able to find the owner(s) of such a mailbox. Once that node has been found and assigned to $libbyPerson, the implementation should equally easily be able to examine all its dc:creator predicates to determine which one points to Libby, whether this is done by brute force, by doing joins on an SQL backend, or by following internal data pointers from predicate to object.
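The brute-force variant of this two-step lookup is easy to sketch in Python. The triples and the subjects_where helper are hypothetical; a real implementation would presumably use indexes or backend joins rather than scanning the whole list twice.

```python
# Hypothetical datastore: (subject, predicate, object) tuples.
TRIPLES = [
    ("ex:libby", "foaf:mbox", "mailto:libby.miller@bristol.ac.uk"),
    ("ex:libby", "foaf:name", "Libby Miller"),
    ("ex:paper1", "dc:creator", "ex:libby"),
    ("ex:paper1", "dc:title", "First Paper"),
    ("ex:paper2", "dc:creator", "ex:someoneElse"),
]


def subjects_where(predicate, value):
    """Subjects with an arc `predicate` whose object equals `value`."""
    return [s for (s, p, o) in TRIPLES if p == predicate and o == value]


# Step 1, like $libbyPerson := *[ @foaf:mbox = "mailto:..." ]
libby_person = subjects_where("foaf:mbox", "mailto:libby.miller@bristol.ac.uk")

# Step 2, like $libbyPapers := *[ @dc:creator/$libbyPerson ]
libby_papers = [s for (s, p, o) in TRIPLES
                if p == "dc:creator" and o in libby_person]

print(libby_person, libby_papers)
```

The second step filters on the object position using the result of the first step, which is the join the prose describes: the mailbox owner found as a subject is reused as the object of dc:creator.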
Note that we've added a few variable definitions to better document our use of tabs and linefeeds, as well as a shortcut for cascading let clauses that lets us use comma separators between clauses, instead of forcing us to repeat the word "let" over and over again.
Without knowing anything about the specifics of Libby's particular publishing history, the results might look something like the following:
Libby Miller has written 406 papers:
    1987-03-02-1: By Gun and Camera Through the Alimentary Canal
    1987-03-02-2: RDF: A History of Renal Dental Failure among the Flemish
    1987-04-10-1: My Fabulous Childhood. Life amongst the Gypsies in Paris, Rome, and Bratislawa
    1988-11-10-1: The Seduction of Technology
    ...
This last example looks at the usage of an if statement in XsRQL. We can use an if to check the validity of the datastore vis a vis the inverse-functional status of Libby's mailbox:
declare prefix foaf: = <http://xmlns.com/foaf/0.1/>;
let $libbysMailboxes := @foaf:mbox[ "mailto:libby.miller@bristol.ac.uk" ]/*
return
    if ( count( $libbysMailboxes ) = 0 )
        then "Libby doesn't have a mailbox"
    else if ( count( $libbysMailboxes ) > 1 ) or ( count( *[ $libbysMailboxes ] ) > 1 )
        then "Libby's mailbox isn't inverse functional!"
    else "Libby has a single @foaf:mbox as expected: ", $libbysMailboxes
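A simplified Python analogue of the three-way check is shown below. It reduces the test to one question, namely how many distinct subjects own the given mailbox value, which is an assumption of this sketch rather than a faithful translation of the query. The check_mailbox helper and the ex: resources are hypothetical.

```python
def check_mailbox(triples, mbox):
    """Classify a mailbox by how many distinct subjects own it."""
    owners = sorted({s for (s, p, o) in triples if p == "foaf:mbox" and o == mbox})
    if len(owners) == 0:
        return "no owner"
    if len(owners) > 1:
        return "not inverse functional"
    return "single owner: " + owners[0]


MBOX = "mailto:libby.miller@bristol.ac.uk"
good = [("ex:libby", "foaf:mbox", MBOX)]
bad = good + [("ex:impostor", "foaf:mbox", MBOX)]

print(check_mailbox([], MBOX))
print(check_mailbox(good, MBOX))
print(check_mailbox(bad, MBOX))
```

As in the query, the branches are tested in order, so the "single owner" message is reached only after both failure cases have been ruled out.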
To close with one final, short example, here's the use of an if expression, combined with the built-in function exists(), to return an optional result. The following query returns a constructed triple containing Libby's name, followed by a triple containing her mailbox if she has one:
declare prefix foaf: = <http://xmlns.com/foaf/0.1/>;
let $libby := *[ @foaf:mbox = "mailto:libby.miller@bristol.ac.uk" ]
return
    { $libby, @foaf:name, $libby/@foaf:name/* },
    if ( exists( $libby/@foaf:mbox )) then { $libby, @foaf:mbox, $libby/@foaf:mbox/* } else ()
The grammar below is a first cut and is still incomplete. Along with most of the XPath-style kindtests shown in the XsrPath snippets above, the grammar is most noticeably still missing productions to handle:
The grammar is short and sweet, certainly when compared to the XQuery BNF, its progenitor. The XQuery grammar by comparison is several hundred productions long and uses some twenty-five (25) lexical states to enable proper lexing. Debugging it was a huge amount of fun. Not. This one's a piece of cake by comparison.
getQueryAST        ::= mainModule
mainModule         ::= prolog ( queryBody )
prolog             ::= ( ( prefixDecl | dawgDecl ) <SemiColon> )*
prefixDecl         ::= <DeclareNamespace> <NCPrefixName> <AssignEquals> <Uriref>
dawgDecl           ::= <QName>
queryBody          ::= exprSequence
exprSequence       ::= expr ( <Comma> exprSequence )?
expr               ::= ifExpr
                     | orExpr
ifExpr             ::= ( <IfLpar> exprSequence <Rpar> <Then> expr <Else> expr )
orExpr             ::= andExpr ( <Or> andExpr )?
andExpr            ::= generalComparison ( <And> andExpr )?
generalComparison  ::= additiveExpr ( ( <Equals> | <NotEquals> | <Lt> | <LtEquals> | <Gt> | <GtEquals> ) )?
additiveExpr       ::= multiplicativeExpr ( ( <Plus> | <Minus> ) additiveExpr )?
multiplicativeExpr ::= unaryExpr ( <Multiply> unaryExpr )?
unaryExpr          ::= ( ( <UnaryMinus> ) | ( <UnaryPlus> ) )* unionExpr
unionExpr          ::= dawgPath ( "|" unionExpr )?
dawgPath           ::= sPath
                     | pPath
                     | oPath
sPath              ::= subjectStep ( filteredSubject )? ( <Slash> pPath )?
subjectStep        ::= primaryExpr
                     | qName
                     | wildcard
                     | uriRef
                     | anyLiteralTest
filteredSubject    ::= <Lbrack> pPath <Rbrack> ( filteredSubject )?
pPath              ::= predicateStep ( filteredPredicate )? ( <Slash> oPath )?
predicateStep      ::= ( <At> ( qName | wildcard | uriRef ) )
filteredPredicate  ::= <Lbrack> oPath <Rbrack> ( filteredPredicate )?
oPath              ::= sPath
                     | literal
wildcard           ::= <Star>
                     | <NCNameColonStar>
anyLiteralTest     ::= <AnyLiteralLpar> <RparForAnyLiteralTest>
primaryExpr        ::= literal
                     | functionCall
                     | variable
                     | parensExpr
                     | tripleCtor
variable           ::= <VariableIndicator> <VarName>
literal            ::= integerLiteral
                     | stringLiteral
parensExpr         ::= <Lpar> ( exprSequence )? <Rpar>
tripleCtor         ::= <Lbrace> sPath <Comma> pPath <Comma> oPath <Rbrace>
qName              ::= <QName>
integerLiteral     ::= <IntegerLiteral>
stringLiteral      ::= <StringLiteral>
functionCall       ::= <QName> <Lpar> ( exprSequence )? <Rpar>
uriRef             ::= <Uriref>