My primary area of research is Arabic Computational Linguistics. Specifically:
Stemming: Details about the stemmer I have developed for Arabic, with a link to the Java code.
Tagging: Details about the Part-Of-Speech (POS) tagger I am developing for Arabic.
Corpora: Details about the Arabic corpora I am using. I have manually tagged 50,000 words of Arabic newspaper text with the basic tags (noun, verb, particle). I have also tagged 1,700 words with more detailed tags (e.g. singular, masculine, definite common noun). Both tagged corpora are available for research purposes. Please e-mail me if you would like a copy.
Publications: A couple of my publications, which can be viewed or downloaded.
B. Gambäck, F. Olsson, A. Argaw, and L. Asker. Proceedings of the First Workshop on Language Technologies for African Languages, pages 104-111. Stroudsburg, PA, USA, Association for Computational Linguistics, (2009)
Z. Sheikh and F. Sánchez-Martínez. Proceedings of the First International Workshop on Free/Open-Source Rule-Based Machine Translation, pages 67-74. Alicante, Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, (2009)