Unstructured Information Management (UIM) applications are software systems that analyze large volumes of unstructured information in order to discover knowledge that is relevant to an end user. An example UIM application might ingest plain text and identify entities, such as persons, places, and organizations, or relations, such as works-for or located-at.
This page summarizes the work of the Semantic Annotations for WSDL (SAWSDL) Working Group, which was started by the W3C in April 2006 and is still ongoing. The objective of the Working Group is to develop a mechanism to enable semantic annotation of Web services descriptions.
Researchers at Google annotated English-language Web pages from the ClueWeb09 and ClueWeb12 corpora. The annotation process was automatic, and hence imperfect. However, the annotations are of generally high quality, because the researchers aimed for high precision (and, by necessity, lower recall). For each entity recognized with high confidence, they provide the beginning and end byte offsets of the entity mention in the input text, its Freebase identifier (mid), and two confidence levels, each computed differently.
You might consider using this data in conjunction with the recently released Freebase annotations of several TREC query sets.
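As a minimal sketch of how such annotations might be consumed, the Python below parses one record and recovers the mention from the page bytes. The description above only states which values are provided (begin and end byte offsets, a Freebase mid, two confidence levels); the tab-separated field order, the exclusive end offset, the `/m/0example` identifier, and all function and field names here are assumptions made for illustration, not the released file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityAnnotation:
    begin: int            # byte offset where the mention starts
    end: int              # byte offset where the mention ends (assumed exclusive)
    mid: str              # Freebase identifier of the entity
    confidence_a: float   # first confidence level
    confidence_b: float   # second confidence level


def parse_annotation(line: str) -> EntityAnnotation:
    """Parse one record; the field order is hypothetical."""
    begin, end, conf_a, conf_b, mid = line.rstrip("\n").split("\t")
    return EntityAnnotation(int(begin), int(end), mid, float(conf_a), float(conf_b))


def mention_text(page: bytes, ann: EntityAnnotation, encoding: str = "utf-8") -> str:
    """Slice the raw page bytes, then decode; the offsets count bytes, not characters."""
    return page[ann.begin:ann.end].decode(encoding, errors="replace")


def filter_confident(annotations, threshold: float):
    """Keep annotations whose two confidence levels both reach the threshold."""
    return [a for a in annotations
            if min(a.confidence_a, a.confidence_b) >= threshold]


# Illustrative data: "ü" occupies two bytes in UTF-8, so the byte offset of
# "Switzerland" (14) differs from its character offset (13).
page = "Zürich is in Switzerland.".encode("utf-8")
ann = parse_annotation("14\t25\t0.98\t0.91\t/m/0example")
print(mention_text(page, ann))
```

Slicing the undecoded bytes matters whenever a page contains multi-byte characters, because applying the same offsets to a decoded string would cut the mention at the wrong position.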
Gromit-MPX is an on-screen annotation tool that works with any Unix desktop environment under X11 as well as Wayland. (GitHub: bk138/gromit-mpx)
T. Tran, N. Tran, A. Hadgu, and R. Jäschke. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 97--106. Association for Computational Linguistics, (September 2015)
R. Snow, B. O'Connor, D. Jurafsky, and A. Ng. Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pp. 254--263. Honolulu, Hawaii, Association for Computational Linguistics, (October 2008)
R. Jesus, D. Goncalves, A. Abrantes, and N. Correia. Computer Vision and Pattern Recognition Workshops, 2008. CVPR Workshops 2008. IEEE Computer Society Conference on, (June 2008)
P. Chirita, S. Costache, W. Nejdl, and S. Handschuh. WWW '07: Proceedings of the 16th International Conference on World Wide Web, pp. 845--854. New York, NY, USA, ACM, (2007)
R. Yan, A. Natsev, and M. Campbell. MS '07: Workshop on multimedia information retrieval on The many faces of multimedia semantics, pp. 13--20. New York, NY, USA, ACM, (2007)
J. Tang, M. Hong, J. Li, and B. Liang. International Semantic Web Conference, volume 4273 of Lecture Notes in Computer Science, pp. 640--653. Springer, (2006)
L. von Ahn and L. Dabbish. CHI '04: Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 319--326. New York, NY, USA, ACM, (2004)