Work Part 1: A logic for information retrieval
Work Part 1 - ``A logic for information retrieval'' is organized in the
following way.

A short description of WP1 as a whole follows.
- WP1 - Objectives
- The objective of WP1 ``A logic for
information retrieval'' is to develop a logic for representing and
reasoning about the structure and the content of documents and queries, in a
way that captures the notion of relevance of documents to users' requests
and that is consistent with the theories developed in Work Parts
2 and 3. Besides being expressive enough
for the task of document and query representation, this logic should be
computationally tractable, i.e., amenable to implementation by means of
an algorithm with low computational cost.
- WP1 - Approach
- The underlying idea of this approach is that
document retrieval can be described as the extraction, from a given
document base, of those documents $d$ that, given a query $q$, make the
formula $d \rightarrow q$ valid, where $d$ and $q$ are formulae of the
chosen logic and ``$\rightarrow$'' denotes the brand of logical implication
formalized by the logic in question. The logic we are seeking should result
from the integration of different features, possibly belonging to different
families of logics from the mathematical logic literature, each addressing
a single aspect of the problem. As noted in the previous section, all the
logics that are being considered for integration have a well-known,
Tarski-style denotational semantics, thus ensuring that integration will
indeed be possible.
- WP1 - Expected results
- The main result that is expected from WP1
is a logic that combines a number of representational features into a
coherent whole. These features should include the possibility of
representing the ``internal architecture'' (structure) of a
multimedia document, its physical appearance (layout), and what the
document actually deals with (semantic content). It is also
expected that this logic will embody a notion of ``relevance of a document
$d$ to a query $q$'' that accounts for the complexities and the dynamic
nature of real-life information retrieval: this means that the notion of
partial relevance of $d$ to $q$ should be expressible. The chosen
logic will be implemented in a prototypical theorem prover that will be
the object of evaluation in Work Part 5.
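As a purely illustrative aside, the retrieval-as-implication idea described above can be sketched in a few lines of Python. The sketch assumes classical propositional logic and material implication as a stand-in for the logic that WP1 is meant to deliver, which is not fixed at this stage; the tuple encoding of formulae and the names `is_valid` and `retrieve` are hypothetical and introduced only for this example.

```python
from itertools import product

# Formulae are nested tuples (an encoding assumed for illustration only):
#   ("atom", name), ("not", f), ("and", f, g), ("or", f, g), ("implies", f, g)

def atoms(f):
    """Collect the names of all atomic propositions occurring in f."""
    if f[0] == "atom":
        return {f[1]}
    return set().union(*(atoms(sub) for sub in f[1:]))

def evaluate(f, assignment):
    """Evaluate f under a truth assignment (dict: atom name -> bool)."""
    op = f[0]
    if op == "atom":
        return assignment[f[1]]
    if op == "not":
        return not evaluate(f[1], assignment)
    if op == "and":
        return evaluate(f[1], assignment) and evaluate(f[2], assignment)
    if op == "or":
        return evaluate(f[1], assignment) or evaluate(f[2], assignment)
    if op == "implies":
        return (not evaluate(f[1], assignment)) or evaluate(f[2], assignment)
    raise ValueError(f"unknown connective: {op!r}")

def is_valid(f):
    """True if f holds under every truth assignment to its atoms."""
    names = sorted(atoms(f))
    return all(
        evaluate(f, dict(zip(names, values)))
        for values in product([False, True], repeat=len(names))
    )

def retrieve(document_base, query):
    """Return the ids of the documents d for which (d implies query) is valid."""
    return [
        doc_id
        for doc_id, d in document_base.items()
        if is_valid(("implies", d, query))
    ]

# Toy document base: each document is described by a formula.
document_base = {
    "doc1": ("and", ("atom", "logic"), ("atom", "retrieval")),
    "doc2": ("atom", "logic"),
}

print(retrieve(document_base, ("atom", "retrieval")))
print(retrieve(document_base, ("or", ("atom", "logic"), ("atom", "retrieval"))))
```

Validity is checked here by brute-force enumeration of truth assignments, which is exponential in the number of atoms and therefore not an instance of the low-cost algorithm sought in WP1; the sketch only makes the definition of retrieval concrete. It also says nothing about partial relevance, which classical material implication cannot express.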
Work Part 1 is further structured into Tasks T11 to
T14. We now give a concise description of the objectives, approaches
taken, and results expected from each of these tasks.
- T11 - Objectives
- The objective of T11 ``Modelling the
structure of documents'' is to endow the sought logic with primitives
for representing, and reasoning about, the ``internal architecture''
(structure) of a multimedia document.
- T11 - Approach
- The approach taken in T11 will be to look at Terminological Logics (TLs)
as the family of logics that are most likely to provide a solution to the
problem of representing document structure.
Terminological Logics have a denotational, Tarski-style semantics which
guarantees smooth integration with other logics that might be deemed
interesting for answering different needs within Work Part 1.
Within this task, however, the most interesting feature of these logics is
their being specifically oriented to the representation of objects
endowed with a complex internal structure, which is of considerable
prospective interest given the extremely complex structure of documents
(especially once integration is sought with the results expected from WP3
and dealing with multimediality). The investigation will mainly take the
form of an empirical study of the structure exhibited by real documents,
coupled with a formal study concerning the suitability of the primitives
discussed in the TLs literature to this representational task. This study
will possibly lead to the definition of brand new primitives specifically
geared to this representational task.
- T11 - Expected results
- The result expected from T11 is the
identification of representational primitives, to be integrated in the
sought logic, that are suited to representing structural information
about multimedia documents. It is also expected that any ``brand new''
primitives identified in the course of this
investigation will be given a full formal (i.e. syntactic and semantic)
specification.
- T12 - Objectives
- The objective of T12 ``Modelling the content
of documents'' is to endow the sought logic with primitives for
representing, and reasoning about, what a document actually deals with
(semantic content). This is of considerable importance in order to allow
the modelling of ``retrieval-by-content''.
- T12 - Approach
- The approach taken in T12 will be, again, to look
at Terminological Logics as the family of logics that are most likely
to provide an answer to the representation of the semantic content of
documents. The rationale for this is that such logics seem to have the
right blend of expressive power to allow representing and reasoning about
``what documents deal with''. This is a considerable step forward with
respect to former keyword-based content representations, such as the ones
adopted in both the vector model and the boolean model. Besides, this
approach is interesting in that it promises to allow a smooth integration
of lexical, domain knowledge (used in traditional information retrieval
systems under a variety of different forms, including dictionaries and
thesauri) into the representation so that it can participate in the content
analysis in a principled way.
- T12 - Expected results
- The result expected from T12 is the
identification of representational primitives, to be integrated in the
sought logic, that are suited to representing the content of multimedia
documents. Again, it is expected that any primitives that have been
designed from scratch will be formally specified in full detail.
- T13 - Objectives
- The objective of T13 ``Modelling the
relevance of retrieval'' is to find, among the notions of implication
(``$\rightarrow$'') formalized by the different classes of logics, the one
that is most suitable for modelling the relation of relevance of a document
$d$ to a query $q$ in terms of the validity of the formula $d \rightarrow q$.
- T13 - Approach
- The approach taken in T13 will be to look at a
number of ``non-classical'' logics that formalize implication relations
prospectively more interesting than the ``material implication'' formalized
by classical logic. Among the classes of logics that will be considered,
Modal Logics will be investigated as a possible framework in which to
express the ``semantic distance'' (in terms of knowledge about the document
content, about the user query, or about the domain knowledge) that
separates the document under examination from the query (i.e. to express ``how
partial'' the partial relevance of a document $d$ to a query $q$ is). Fuzzy
versions of
modal logics will also be investigated as a means of making explicit
various sources of uncertainty, such as the uncertainty related to domain
knowledge, or the uncertainty related to document indexes, that have an
impact on the final evaluation of the implication relation. In parallel with
the investigation into modal and fuzzy logics, an investigation into
Relevance Logics will be pursued, as they show considerable promise of
embodying interesting features for the information retrieval endeavour (in
fact, relevance of premises to conclusion is the key concern underlying
relevance logics).
In this task too, attention will be paid only to logics endowed with a
denotational semantics, in order to ensure easy integration with the
results of other tasks within Work Part 1.
- T13 - Expected results
- The result expected from T13 is the
identification of a notion of implication (``$\rightarrow$''), possibly
borrowed from some non-classical logic, that is most suitable for modelling
the relation of relevance of a document $d$ to a query $q$ in terms of the
validity of the formula $d \rightarrow q$. It is also expected that the
integration of this notion into the sought logic will be given a full formal
(i.e. syntactic and semantic) specification.
- T14 - Objectives
- The objective of T14 ``Prototyping'' is
to build a prototypical implementation of inferential algorithms for
reasoning in the logic resulting from Tasks T11 to T13.
- T14 - Approach
- The approaches that will be followed in order to
fulfil the objective of T14 will largely depend on the classes of logics
that are chosen within T11 to T13 for contributing the various
representational features of the sought logic; in fact, each class of logics
has its own approaches to inference, usually well-established from the
related literature. At the current stage, the only approach that seems a
relatively definite choice is the use of constraint propagation
techniques for implementing the representational primitives that the MIR
logic will likely borrow from terminological logics, as envisaged in Tasks T11 and
T12.
- T14 - Expected results
- The result expected from T14 is a prototype
of the MIR logic that will be the subject of evaluation (in WP4) against a
multimedia document base of realistic size.
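To make the role of terminological logics in T11, T12 and T14 more concrete, the following minimal Python sketch represents a structured document and a query as concept descriptions built from atomic concepts, conjunction and existential role restrictions, and tests whether the query subsumes the document. It uses simple structural subsumption for this small fragment, not the constraint propagation techniques mentioned under T14, and the class `Concept`, the role `hasPart` and the concept names are hypothetical illustrations rather than primitives of the MIR logic.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Concept:
    """A conjunction of atomic concept names and existential restrictions.

    `names`  : atomic concepts, e.g. {"Document"}
    `exists` : pairs (role, filler), each read as "has some <role> that is <filler>"
    An empty Concept() plays the role of the universal concept.
    """
    names: frozenset = frozenset()
    exists: tuple = ()

def subsumes(general, specific):
    """True if every instance of `specific` is also an instance of `general`.

    Structural check for this fragment (no terminology, no negation):
    each conjunct of `general` must be matched by a conjunct of `specific`.
    """
    if not general.names <= specific.names:
        return False
    return all(
        any(
            role_s == role_g and subsumes(filler_g, filler_s)
            for role_s, filler_s in specific.exists
        )
        for role_g, filler_g in general.exists
    )

# A document whose structure is: Document with a part that is a Section,
# which in turn has a part that is an Image.
image = Concept(frozenset({"Image"}))
section_with_image = Concept(frozenset({"Section"}), (("hasPart", image),))
document = Concept(frozenset({"Document"}), (("hasPart", section_with_image),))

# Query: documents containing, at the second level of nesting, an Image.
query = Concept(
    frozenset({"Document"}),
    (("hasPart", Concept(exists=(("hasPart", image),))),),
)

print(subsumes(query, document))   # the document satisfies the query
print(subsumes(document, query))   # the converse does not hold
```

In this reading, a document is retrieved when its description is subsumed by the query, which is the terminological counterpart of the document implying the query. Richer constructors, such as negation or number restrictions, would require a different inference procedure.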
The participating (P) consortium members for each of the Tasks in
Work Part 1 are listed in the following table.
