Abstract
When evaluating and comparing Answer Extraction and Question Answering systems, one can distinguish between scenarios for different information needs, such as the ``Fact Finding'', ``Problem Solving'', and ``Generic Information'' scenarios. For each scenario, specific types of questions and specific types of texts have to be taken into account, each causing specific problems. We argue that comparative evaluations of such systems should not be limited to a single type of information need and one specific text type. Using the example of technical manuals and a working Answer Extraction system, ``ExtrAns'', we show that other, equally important, problems arise in such cases. We also argue that the quality of individual answers could be determined automatically through the parameters of correctness and succinctness, i.e. measures of recall and precision at the level of unifying predicates, against a (hand-crafted) gold standard of ``ideal answers''.
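The predicate-level scoring idea above can be sketched as simple set comparison: correctness as recall and succinctness as precision of an answer's predicates against the gold standard. The predicate representations and the function below are illustrative assumptions, not part of ExtrAns itself.

```python
# Hypothetical sketch: score an extracted answer against a hand-crafted
# gold standard of "ideal answer" predicates. Predicate strings stand in
# for whatever logical forms the system actually unifies.

def score_answer(answer_preds, gold_preds):
    """Return (correctness, succinctness) for one extracted answer.

    correctness  = recall:    |answer & gold| / |gold|
    succinctness = precision: |answer & gold| / |answer|
    """
    answer, gold = set(answer_preds), set(gold_preds)
    if not answer or not gold:
        return 0.0, 0.0
    matched = len(answer & gold)
    return matched / len(gold), matched / len(answer)

# Example: the extracted answer contains both gold predicates plus one
# extra, so correctness is perfect but succinctness is penalized.
gold = {"press(user, button)", "hold(user, button, 3s)"}
extracted = {"press(user, button)", "hold(user, button, 3s)", "see(user, light)"}
correctness, succinctness = score_answer(extracted, gold)
print(correctness, succinctness)  # 1.0 0.666...
```

In this scheme, an answer that matches all gold predicates but adds irrelevant material keeps full correctness while losing succinctness, which mirrors the recall/precision split described in the abstract.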