man XTM::Path () - Topic Map management, XPath like retrieval and construction facility

NAME

XTM::Path - Topic Map management, XPath like retrieval and construction facility

SYNOPSIS

  use XTM;
  use XTM::XML;
  my $tm = new XTM (tie => new XTM::XML (file => 'mymap.tm')); # binds variable to channel

  use XTM::Path;
  my $xtmp = new XTM::Path (default => $tm);

  # find particular topics and print topic id
  foreach my $t ($xtmp->find ('/topic[.//baseNameString/text() = "test"]')) {
    print $t->id;
  }

  # same using find twice
  foreach my $t ($xtmp->find ('/topic[.//baseNameString/text() = "test"]')) {
    print $xtmp->find ('@id', $t);
  }

  # create a topic
  $t = $xtmp->create ('topic[@id = "id0815"]');
  # same but with baseName
  $t = $xtmp->create ('topic[@id = "id0815"]/baseNameString[text() = "test"]');
  # associations are always cumbersome
  $a = $xtmp->create ('association[member
                                     [roleSpec/topicRef/@href = "#role1"]
                                     [topicRef/@href = "#player1"]]
                                  [member
                                     [roleSpec/topicRef/@href = "#role2"]
                                     [topicRef/@href = "#player2"]]');

DESCRIPTION

This class provides a simple way to drill down into the XTM data structures by following an XPath-like approach.

The XTM standard (http://www.topicmaps.org/xtm/) is used as the basis to formulate XTM-Path queries. To find a particular topic, for instance, you might use

  /topic[.//baseNameString = "some name"]

It is important to note that this package does NOT work on the original XTM document (which might not even exist if the map was created by other means); instead it uses the XTM::base data structure. This implies that all querying happens after merging and consolidation have been done.

Obviously, XTM::Path cannot be a complete query language, but it is useful in many development situations where drilling down into the data structure is a cumbersome exercise. Together with the intelligent add methods in XTM::Memory and XTM::generic, this should drastically simplify access to, and creation and manipulation of, XTM data structures.

Path Expressions

Axis:
While the syntax (see below) allows for child and descendant axes, both are ignored because the XTM structure is known a priori. This allows a considerable simplification compared to XPath. As a consequence, it makes no difference whether you write
  /topic//resourceData
or
  /topic/resourceData
In both cases the interpreter knows that a resourceData element can only occur within an occurrence. One caveat: the path expression
  /topic/instanceOf
addresses only the instanceOf elements directly below the topic node; it does not reach the instanceOf elements inside the occurrences.
Context:
Path expressions are always interpreted relative to a particular context. This may be a complete topic map object or any part of it. Thus the following expressions are equivalent:
  /topic
  ./topic
  //topic
  topic
Similarly for the '//' operator:
  //member
  .//member
  ...
Values:
As usual, the value of a Path is the text() addressed by it. In this sense
   /topic/baseName/baseNameString/text()
and
   /topic/baseName/text()
may have the same value (in XTM, #PCDATA is also allowed in other subelements of baseName).

Syntax

Currently expressions can have the following simple syntax (EBNF):

   path         --> { axis relativepath }

   axis         --> child | descendant

   child        --> './'  | '/'

   descendant   --> './/' | '//'

   relativepath --> ( XTM_element_name | '@' XTM_attribute_name | 'text()' ) { predicate }

   predicate    --> '[' expr ']'

   expr         --> simple_expr

   simple_expr  --> path | boolean_expr

   boolean_expr --> path compare_op value

   compare_op   --> '=' | '!=' | '<' | '>' | '<=' | '>='

   value        --> numeric | string | variable

   variable     --> '?' name
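
To make the grammar more concrete, the following is a minimal sketch of how a path could be broken into its axis/step pairs. It is not the parser used by XTM::Path; the function name split_steps is hypothetical, and the sketch assumes that string values inside predicates contain no '[' or ']' characters. The point it illustrates is that a '/' inside a predicate belongs to the nested path and must not end the current step.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of XTM::Path: split an expression into
# [axis, step] pairs. Slashes inside [...] predicates are left alone.
sub split_steps {
    my ($path) = @_;
    my @steps;
    my ($axis, $step, $depth) = ('', '', 0);
    my $pos = 0;
    my $len = length $path;
    while ($pos < $len) {
        my $rest = substr $path, $pos;
        # Outside of any predicate, './', './/', '/' or '//' starts a new step.
        if ($depth == 0 && $rest =~ m{^(\.?//?)}) {
            push @steps, [$axis, $step] if length $step;
            ($axis, $step) = ($1, '');
            $pos += length $1;
            next;
        }
        my $ch = substr $path, $pos, 1;
        $depth++ if $ch eq '[';     # entering a predicate
        $depth-- if $ch eq ']';     # leaving a predicate
        $step .= $ch;
        $pos++;
    }
    push @steps, [$axis, $step] if length $step;
    return @steps;
}

for my $s (split_steps('/topic[.//baseNameString = "test"]//baseName')) {
    print "axis '$s->[0]' step '$s->[1]'\n";
}
```

The first step keeps its whole predicate, including the nested './/', while the second step is the plain element name baseName with the descendant axis.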

Elements

The following XTM elements are not included: The XTM data structures are already completely merged, so this element would never appear. As the context is already a topic map object (or a part of one), such an element would never be found.

Attributes

The following attributes are included:
id:
This is only applicable to topic and association elements. When creating, the id attribute can only be used together with topic, not with association.
href:
This is only applicable to topicRef, subjectIndicatorRef and resourceRef elements.

Variables

See the hint about speed.

Examples

   # find a particular topic by id
   topic[@id = "sheryl_crow"]
   # find a topic by baseName
   topic[baseName/baseNameString = "If it Makes You Happy"]
   # equivalently
   topic[baseName = "If it Makes You Happy"]

   # find a particular association with a role
   association[member/roleSpec/topicRef/@href = "#artist"]
   # or a particular role player
   association[member/topicRef/@href = "#sheryl_crow"]
   # combine this
   association[member/roleSpec/topicRef/@href = "#artist"][member/topicRef/@href = "#sheryl_crow"]

Hints and Tips

Why are [0] and [position() = 2] not implemented?
The method find returns a Perl list. Once you have this list, you can easily slice and index it. Also, the order in the data structure is a rather flaky criterion to search for. It makes sense to reference an order in a document, but after merging topics no simple and robust definition of how a resulting topic is organized can be given.
Why is it not blindingly fast?
While I tried not to be too wasteful, there are some situations in which the code evaluates useless alternatives. This happens when it has to 'guess' parent nodes, as in
  topic/@href
The more hints you provide, the more biased the traversal will be. So, for instance, the above can be sped up with:
  topic/instanceOf/topicRef/@href
The XTM syntax allows #PCDATA inside a baseNameString. The baseName may also contain variants which, in turn, may contain further resourceData. So the above is itself not unambiguous. Use baseNameString[text() = something] instead.
How can I improve the speed?
Try to avoid parsing. The object maintains cached copies of already parsed expressions, so here the package tries to take care of this itself. If you always use a slightly different expression, you might want to use variables, as in
  foreach my $n (@names) {   # @names holds all names to look up
     $xtmp->find ('topic[baseNameString = ?n]', undef, { n => $n });
  }
That way the expression remains the same and can be cached.
It is still not fast. What else?
You should also avoid creating new objects too frequently. Every object needs a parser, which has to be instantiated, and this is an expensive operation. There is no reason (aside from slightly increased memory consumption) why you should not use one and the same object for various finds.
When creating data structures, are they automatically filled with defaults according to XTM?
No, you should use the add_default methods of XTM::topics and XTM::associations to control this explicitly once you are done with a particular create.
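
The effect of variables on caching, mentioned in the speed hint above, can be illustrated with a small stand-alone sketch. It does not use XTM::Path itself: parse_expression is a hypothetical stand-in for an expensive parser, and cached_parse only assumes that the cache is keyed on the expression string. Under that assumption, literal values produce one cache entry per value, while a variable keeps the expression string constant.

```perl
use strict;
use warnings;

my %cache;              # expression string => parsed form
my $parse_count = 0;    # how often the (pretend) parser actually ran

# Hypothetical stand-in for an expensive parser, NOT the XTM::Path parser.
sub parse_expression {
    my ($expr) = @_;
    $parse_count++;
    return { source => $expr };
}

# Parse only if this exact expression string has not been seen before.
sub cached_parse {
    my ($expr) = @_;
    return $cache{$expr} ||= parse_expression($expr);
}

my @names = ('first', 'second', 'third');

# Literal values: every name yields a different expression string.
cached_parse(qq{topic[baseNameString = "$_"]}) for @names;
my $literal_parses = $parse_count;

# Variable: the expression string is identical in every iteration.
$parse_count = 0;
cached_parse('topic[baseNameString = ?n]') for @names;
my $variable_parses = $parse_count;

print "literal: $literal_parses, variable: $variable_parses\n";
# prints "literal: 3, variable: 1"
```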

INTERFACE

Constructor

$xtmp = new XTM::Path (default => $tm)

The constructor returns a new XTM::Path object which is then used to perform queries. Optionally, you may pass any XTM object (a map or a component thereof). This object becomes the default context (à la XPath), which is used whenever no other context is given explicitly.

Example:

  $xtmp = new XTM::Path (default => $tm);

Methods

find
@nodelist = $xtmp->find ($path, [$context], [$value_hash])
find returns a unique list of subnodes of the context which conform to the XTM::Path specification provided as the first parameter. If the second parameter is missing, the XTM::Path expression is evaluated against the default context (see constructor). If there is no context (neither default nor explicit), an exception is raised. Examples:
  # get the first topic with 'test' as baseName
  ($t) = $xtmp->find ('/topic[.//baseNameString = "test"]');
  # retrieve all baseNames of this topic
  @basenames = $xtmp->find ('/baseName', $t);
  # same in one step
  @basenames = $xtmp->find ('/topic[.//baseNameString = "test"]//baseName');
  # find all topics, explicitly providing a context
  @topics = $xtmp->find ('/topic', $tm2);
Since version 0.06 the object caches already parsed expressions to avoid expensive parsing at every invocation of find. To increase the cache hit rate you should consider using variables (see Hints).
create
$node = $xtmp->create ($path)
create returns exactly one new node conforming to the XTM::Path expression provided as the first parameter. As the new data structure is built stand-alone, there is no need to pass in or use a context. If the path specification is not consistent with XTM, an exception is raised. If XTM::Path cannot find a UNIQUE path between two subsequent path steps, an exception is raised as well (as in 'topic/member' or 'topic/topicRef'). Examples:
  my $o = $xtmp->create ('topic[baseNameString = "xxxx"][@id = "x11"]');
The object caches successfully parsed expressions. You cannot use variables inside path expressions here.
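
Since both find and create signal problems by raising exceptions, callers may want to trap them. The following is a minimal sketch of one way to do that with eval. The helper try_create is hypothetical, and StubPath is only a stand-in so that the sketch runs without a topic map; it imitates the documented behaviour for 'topic/member' and 'topic/topicRef' but is not part of the XTM distribution. With a real XTM::Path object the wrapper would be called in the same way.

```perl
use strict;
use warnings;

# Hypothetical helper: call create and return (node, error) instead of dying.
sub try_create {
    my ($xtmp, $path) = @_;
    my $node = eval { $xtmp->create($path) };
    return (undef, $@ || 'unknown error') unless defined $node;
    return ($node, undef);
}

{
    package StubPath;   # stand-in for illustration only, NOT part of XTM
    sub new { return bless {}, shift }
    sub create {
        my ($self, $path) = @_;
        # Imitate the documented failure for ambiguous path steps.
        die "no unique path in '$path'\n"
            if $path =~ m{^topic/(member|topicRef)$};
        return { path => $path };
    }
}

my $xtmp = StubPath->new;
for my $path ('topic[@id = "x11"]', 'topic/member') {
    my ($node, $err) = try_create($xtmp, $path);
    if (defined $node) {
        print "created: $path\n";
    }
    else {
        print "failed: $path ($err)";   # $err already ends with a newline
    }
}
```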

SEE ALSO

XTM::base

AUTHOR INFORMATION

Copyright 2002, Robert Barta <rho@telecoma.net>, Jan Gylta <jgylta@online.no>, All rights reserved.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. http://www.perl.com/perl/misc/Artistic.html