Inproceedings

Intelligent GP fusion from multiple sources for text classification

B. Zhang, Y. Chen, W. Fan, E. Fox, M. Goncalves, M. Cristo, and P. Calado.
Proceedings of the 14th ACM international Conference on Information and Knowledge Management, Bremen, Germany, ACM Press, (October 2005)

Abstract

This paper shows how citation-based information and structural content (e.g., title, abstract) can be combined to improve classification of text documents into predefined categories. We evaluate different measures of similarity -- five derived from the citation information of the collection, and three derived from the structural content -- and determine how they can be fused to improve classification effectiveness. To discover the best fusion framework, we apply Genetic Programming (GP) techniques. Our experiments with the ACM Computing Classification Scheme, using documents from the ACM Digital Library, indicate that GP can discover similarity functions superior to those based solely on a single type of evidence. Effectiveness of the similarity functions discovered through simple majority voting is better than that of content-based as well as combination-based Support Vector Machine classifiers. Experiments also were conducted to compare the performance between GP techniques and other fusion techniques such as Genetic Algorithms (GA) and linear fusion. Empirical results show that GP was able to discover better similarity functions than GA or other fusion techniques.

BibTeX key: Zhang:2005:IGF
entry type: inproceedings
address: Bremen, Germany
booktitle: Proceedings of the 14th ACM international Conference on Information and Knowledge Management
year: 2005
month: October 31-November 5
publisher: ACM Press
organisation: ACM: Association for Computing Machinery SIGIR: ACM Special Interest Group on Information Retrieval
size: 8 pages
notes: CIKM'05
url: http://doi.acm.org/10.1145/1099554.1099688

Cite this publication

%0 Conference Paper
%1 Zhang:2005:IGF
%A Zhang, Baoping
%A Chen, Yuxin
%A Fan, Weiguo
%A Fox, Edward A.
%A Goncalves, Marcos
%A Cristo, Marco
%A Calado, Pavel
%B Proceedings of the 14th ACM international Conference on Information and Knowledge Management
%C Bremen, Germany
%D 2005
%I ACM Press
%K algorithms, genetic programming
%T Intelligent GP fusion from multiple sources for text classification
%U http://doi.acm.org/10.1145/1099554.1099688
%X This paper shows how citation-based information and structural content (e.g., title, abstract) can be combined to improve classification of text documents into predefined categories. We evaluate different measures of similarity -- five derived from the citation information of the collection, and three derived from the structural content -- and determine how they can be fused to improve classification effectiveness. To discover the best fusion framework, we apply Genetic Programming (GP) techniques. Our experiments with the ACM Computing Classification Scheme, using documents from the ACM Digital Library, indicate that GP can discover similarity functions superior to those based solely on a single type of evidence. Effectiveness of the similarity functions discovered through simple majority voting is better than that of content-based as well as combination-based Support Vector Machine classifiers. Experiments also were conducted to compare the performance between GP techniques and other fusion techniques such as Genetic Algorithms (GA) and linear fusion. Empirical results show that GP was able to discover better similarity functions than GA or other fusion techniques.

@inproceedings{Zhang:2005:IGF,
  abstract = {This paper shows how citation-based information and structural content (e.g., title, abstract) can be combined to improve classification of text documents into predefined categories. We evaluate different measures of similarity -- five derived from the citation information of the collection, and three derived from the structural content -- and determine how they can be fused to improve classification effectiveness. To discover the best fusion framework, we apply Genetic Programming (GP) techniques. Our experiments with the ACM Computing Classification Scheme, using documents from the ACM Digital Library, indicate that GP can discover similarity functions superior to those based solely on a single type of evidence. Effectiveness of the similarity functions discovered through simple majority voting is better than that of content-based as well as combination-based Support Vector Machine classifiers. Experiments also were conducted to compare the performance between GP techniques and other fusion techniques such as Genetic Algorithms (GA) and linear fusion. Empirical results show that GP was able to discover better similarity functions than GA or other fusion techniques.},
  added-at = {2008-06-19T17:35:00.000+0200},
  address = {Bremen, Germany},
  author = {Zhang, Baoping and Chen, Yuxin and Fan, Weiguo and Fox, Edward A. and Goncalves, Marcos and Cristo, Marco and Calado, Pavel},
  biburl = {https://www.bibsonomy.org/bibtex/29e008324062ef3b4b04668ee0c7235ce/brazovayeye},
  booktitle = {Proceedings of the 14th {ACM} international Conference on Information and Knowledge Management},
  interhash = {c7840d99311dee7dcde97ccacd9d087f},
  intrahash = {9e008324062ef3b4b04668ee0c7235ce},
  keywords = {algorithms, genetic programming},
  month = {October 31-November 5},
  notes = {CIKM'05},
  organisation = {ACM: Association for Computing Machinery SIGIR: ACM Special Interest Group on Information Retrieval},
  publisher = {ACM Press},
  size = {8 pages},
  timestamp = {2008-06-19T17:55:12.000+0200},
  title = {Intelligent {GP} fusion from multiple sources for text classification},
  url = {http://doi.acm.org/10.1145/1099554.1099688},
  year = 2005
}
