Next: Work Part #uncertainty>: Up: 2.2 The Workplan Previous: 2.2 The Workplan

Work Part 1: A logic for information retrieval

Work Part 1 - ``A logic for information retrieval'' is organized in the following way.

A short description of WP1 as a whole follows.

WP1 - Objectives
The objective of WP1 ``A logic for information retrieval'' is to develop a logic for representing and reasoning on the structure and the content of documents and queries, in a way that captures the notion of relevance of documents to users' requests and that is consistent with the theories developed in Work Parts 2 and 3. Besides being expressively adequate for the task of document and query representation, this logic should be computationally tractable, i.e., amenable to being implemented by means of an algorithm with low computational cost.

WP1 - Approach
The underlying idea of this approach is that document retrieval can be described as the extraction, from a given document base, of those documents $d$ that, given a query $q$, make the formula $d \rightarrow q$ valid, where $d$ and $q$ are formulae of the chosen logic and ``$\rightarrow$'' denotes the brand of logical implication formalized by the logic in question. The logic we are seeking should result from the integration of different features, possibly belonging to different families of logics from the mathematical logic literature, each addressing a single aspect of the problem. As noted in the previous section, all the logics that are being considered for integration have a well-known, Tarski-style denotational semantics, thus ensuring that integration will indeed be possible.
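As an illustration only, the retrieval criterion can be sketched with classical propositional logic standing in for the logic that WP1 has yet to choose. The tuple encoding of formulae, the function names and the three sample documents are assumptions made for this sketch, and material implication is used here simply because it is the easiest notion of implication to check by enumerating truth assignments.

```python
from itertools import product

# Formulae are nested tuples (a hypothetical encoding for this sketch):
#   ("atom", name), ("not", f), ("and", f, g), ("or", f, g), ("implies", f, g)

def atoms(f):
    """Collect the atom names occurring in formula f."""
    if f[0] == "atom":
        return {f[1]}
    return set().union(*(atoms(sub) for sub in f[1:]))

def holds(f, v):
    """Evaluate formula f under the truth assignment v (a dict name -> bool)."""
    op = f[0]
    if op == "atom":
        return v[f[1]]
    if op == "not":
        return not holds(f[1], v)
    if op == "and":
        return holds(f[1], v) and holds(f[2], v)
    if op == "or":
        return holds(f[1], v) or holds(f[2], v)
    if op == "implies":
        return (not holds(f[1], v)) or holds(f[2], v)
    raise ValueError(f"unknown connective: {op}")

def valid(f):
    """A formula is valid if it holds under every truth assignment."""
    names = sorted(atoms(f))
    return all(
        holds(f, dict(zip(names, values)))
        for values in product([False, True], repeat=len(names))
    )

def retrieve(doc_base, q):
    """Return the names of the documents d for which d -> q is valid."""
    return [name for name, d in doc_base.items() if valid(("implies", d, q))]

# Illustrative document base: each document is a formula over index terms.
doc_base = {
    "d1": ("and", ("atom", "logic"), ("atom", "retrieval")),
    "d2": ("atom", "logic"),
    "d3": ("and", ("atom", "retrieval"), ("atom", "multimedia")),
}

about_retrieval = ("atom", "retrieval")
about_logic_or_multimedia = ("or", ("atom", "logic"), ("atom", "multimedia"))

print(retrieve(doc_base, about_retrieval))            # ['d1', 'd3']
print(retrieve(doc_base, about_logic_or_multimedia))  # ['d1', 'd2', 'd3']
```

Truth-table enumeration is exponential in the number of atoms, so this sketch says nothing about the tractability requirement stated in the objectives; it only makes the ``retrieval as validity of an implication'' reading concrete.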

WP1 - Expected results
The main result that is expected from WP1 is a logic that combines a number of representational features into a coherent whole. These features should include the possibility of representing the ``internal architecture'' (structure) of a multimedia document, its physical appearance (layout), and what the document actually deals with (semantic content). Also, it is expected that this logic will embody a notion of ``relevance of a document $d$ to a query $q$'' that accounts for the complexities and the dynamic nature of real-life information retrieval: this means that the notion of partial relevance of $d$ to $q$ should be expressible. The chosen logic will be implemented into a prototypical theorem prover that will be the object of evaluation in Work Part 5.

Work Part 1 is further structured into Tasks T11 to T14. We now give a concise description of the objectives, approaches taken, and results expected from each of these tasks.

T11 - Objectives
The objective of T11 ``Modelling the structure of documents'' is to endow the sought logic with primitives for representing and reasoning about the ``internal architecture'' (structure) of a multimedia document.

T11 - Approach
The approach taken in T11 will be to look at Terminological Logics (TLs) as the family of logics that are most likely to provide an answer to the representation of document structure. Terminological Logics have a denotational, Tarski-style semantics which guarantees smooth integration with other logics that might be deemed interesting for answering different needs within Work Part 1. Within this task, however, the most interesting feature of these logics is their being specifically oriented to the representation of objects endowed with a complex internal structure, which is of considerable prospective interest given the extremely complex structure of documents (especially once integration is sought with the results expected from WP3 and dealing with multimediality). The investigation will mainly take the form of an empirical study of the structure exhibited by real documents, coupled with a formal study concerning the suitability of the primitives discussed in the TLs literature to this representational task. This study will possibly lead to the definition of brand new primitives specifically geared to this representational task.

T11 - Expected results
The result expected from T11 is the identification of representational primitives, to be integrated in the sought logic, that are suited to representing structural information about multimedia documents. It is also expected that any ``brand new'' primitives identified as a consequence of this investigation will be given a full formal (i.e. syntactic and semantic) specification.

T12 - Objectives
The objective of T12 ``Modelling the content of documents'' is to endow the sought logic with primitives for representing and reasoning about what a document actually deals with (semantic content). This is of considerable importance for modelling ``retrieval-by-content''.

T12 - Approach
The approach taken in T12 will be, again, to look at Terminological Logics as the family of logics that are most likely to provide an answer to the representation of the semantic content of documents. The rationale for this is that such logics seem to have the right blend of expressive power to allow representing and reasoning about ``what documents deal with''. This is a considerable step forward with respect to former keyword-based content representations, such as the ones adopted in both the vector model and the boolean model. Besides, this approach is interesting in that it promises to allow a smooth integration of lexical and domain knowledge (used in traditional information retrieval systems under a variety of different forms, including dictionaries and thesauri) into the representation, so that it can participate in the content analysis in a principled way.
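The difference from keyword matching can be hinted at with a very reduced sketch. It keeps only one ingredient of Terminological Logics, namely subsumption between named concepts, and models thesaurus-like knowledge as an ``is-a'' taxonomy. The taxonomy, the document annotations and the function names are all hypothetical and chosen purely for illustration; real terminological logics offer far richer concept-forming operators.

```python
def subsumes(taxonomy, general, specific):
    """True if `general` equals `specific` or is one of its ancestors.

    `taxonomy` maps each concept to the list of its direct parents.
    """
    seen = set()
    stack = [specific]
    while stack:
        concept = stack.pop()
        if concept == general:
            return True
        if concept not in seen:
            seen.add(concept)
            stack.extend(taxonomy.get(concept, ()))
    return False

def retrieve_by_content(taxonomy, doc_concepts, query_concept):
    """Return documents annotated with a concept subsumed by the query concept."""
    return [
        name
        for name, concepts in doc_concepts.items()
        if any(subsumes(taxonomy, query_concept, c) for c in concepts)
    ]

# Hypothetical domain knowledge (child -> direct parents).
taxonomy = {
    "sonata": ["musical_work"],
    "musical_work": ["artwork"],
    "fresco": ["painting"],
    "painting": ["artwork"],
}

# Hypothetical content descriptions of two documents.
doc_concepts = {
    "d1": {"sonata"},
    "d2": {"fresco"},
}

print(retrieve_by_content(taxonomy, doc_concepts, "musical_work"))  # ['d1']
print(retrieve_by_content(taxonomy, doc_concepts, "artwork"))       # ['d1', 'd2']
```

A pure keyword match on ``artwork'' would return neither document, since the term does not occur in either description; here the domain knowledge takes part in the match.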

T12 - Expected results
The result expected from T12 is the identification of representational primitives, to be integrated in the sought logic, that are suited to representing the content of multimedia documents. Again, it is expected that any primitives designed from scratch will be formally specified in full detail.

T13 - Objectives
The objective of T13 ``Modelling the relevance of retrieval'' is finding, among the notions of implication (``$\rightarrow$'') formalized by the different classes of logics, the one that is most suitable for modelling the relation of relevance of a document $d$ to a query $q$ in terms of the validity of the formula $d \rightarrow q$.

T13 - Approach
The approach taken in T13 will be to look at a number of ``non-classical'' logics that formalize implication relations prospectively more interesting than the ``material implication'' formalized by classical logic. Among the classes of logics that will be considered, Modal Logics will be investigated as a possible framework in which to express the ``semantic distance'' (in terms of knowledge about the document content, about the user query, or about the domain knowledge) that separates the document under examination from the query (i.e. to express ``how partial'' the partial relevance of $d$ to $q$ is). Fuzzy versions of modal logics will also be investigated as a means of making explicit various sources of uncertainty, such as the uncertainty related to domain knowledge, or the uncertainty related to document indexes, that have an impact on the final evaluation of the implication relation. Parallel to the investigation into modal and fuzzy logics, an investigation into Relevance Logics will be pursued, as they show considerable promise of embodying interesting features for the information retrieval endeavour (in fact, relevance of premises to conclusion is the key concern underlying relevance logics). In this task too, attention will be paid only to logics endowed with a denotational semantics, in order to ensure easy integration with the results of other tasks within Work Part 1.
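To give a feel for what a graded notion of relevance might look like, the following sketch replaces the two-valued implication with the Lukasiewicz fuzzy implication, I(a, b) = min(1, 1 - a + b). This is only one of many possible choices and is not claimed to be the one T13 will adopt; it also leaves the modal dimension aside entirely. The weighted index, the weighted queries and the function names are assumptions of the sketch.

```python
def lukasiewicz_implies(a, b):
    """Lukasiewicz fuzzy implication on truth degrees in [0, 1]."""
    return min(1.0, 1.0 - a + b)

def relevance_degree(doc_index, query):
    """Degree to which every weighted query term is supported by the document.

    `doc_index` maps index terms to the degree (0..1) to which the document
    is about them; `query` maps terms to the degree to which they are required.
    An empty query is trivially satisfied (degree 1.0).
    """
    return min(
        (lukasiewicz_implies(required, doc_index.get(term, 0.0))
         for term, required in query.items()),
        default=1.0,
    )

# Hypothetical uncertain index of a single document.
doc_index = {"logic": 1.0, "retrieval": 0.5}

strict_query = {"logic": 1.0, "retrieval": 1.0}
lenient_query = {"retrieval": 0.5}
unrelated_query = {"logic": 1.0, "video": 1.0}

print(relevance_degree(doc_index, strict_query))     # 0.5
print(relevance_degree(doc_index, lenient_query))    # 1.0
print(relevance_degree(doc_index, unrelated_query))  # 0.0
```

The intermediate value 0.5 is the kind of answer a two-valued logic cannot give: the document is neither fully relevant nor irrelevant to the strict query, and the degree reflects the uncertainty attached to its index.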

T13 - Expected results
The result expected from T13 is the identification of a notion of implication (``$\rightarrow$''), possibly borrowed from some non-classical logic, that is most suitable for modelling the relation of relevance of a document $d$ to a query $q$ in terms of the validity of the formula $d \rightarrow q$. It is also expected that the integration of this notion into the sought logic will be given a full formal (i.e. syntactic and semantic) specification.

T14 - Objectives
The objective of T14 ``Prototyping'' is building a prototypical implementation of inferential algorithms for reasoning in the logic resulting from Tasks T11 to T13.

T14 - Approach
The approaches that will be followed in order to fulfil the objective of T14 will largely depend on the classes of logics that are chosen within T11 to T13 for contributing the various representational features of the sought logic; in fact, each class of logics has its own approaches to inference, usually well established in the related literature. At the current stage, the only approach that seems a relatively definite choice is the use of constraint propagation techniques for implementing the representational primitives that the MIR logic will likely borrow from terminological logics as a result of Tasks T11 and T12.

T14 - Expected results
The result expected from T14 is a prototype of the MIR logic that will be the subject of evaluation (in WP4) against a multimedia document base of realistic size.

The participating (P) consortium members for each of the Tasks in Work Part 1 are listed in the following table.


