DIMENSION REDUCTION FOR SCRIPT CLASSIFICATION- PRINTED INDIAN DOCUMENTS
H. C and C. K. International Journal of Advanced Information Technology (IJAIT), 7 (1/2/3): 1-12 (June 2017)
DOI: 10.5121/ijait.2017.7301
Abstract
Automatic identification of the script in a given document image facilitates many important applications, such as automatic archiving of multilingual documents, searching online archives of document images, and selecting a script-specific OCR engine in a multilingual environment. This paper presents a comparative study of three dimension reduction techniques, namely partial least squares (PLS), sliced inverse regression (SIR) and principal component analysis (PCA), and evaluates the relative performance of classification procedures that incorporate them. For each script we extracted Gray-Level Co-occurrence Matrix (GLCM) and Scale-Invariant Feature Transform (SIFT) features. The features are extracted globally from a given text block, which avoids the need for a complex and reliable segmentation of the document image into lines and characters. The extracted features are reduced using each dimension reduction technique, and the reduced features are fed into a Nearest Neighbor classifier. The proposed scheme is therefore efficient and can be used in practical applications that require processing large volumes of data. The scheme has been tested on 10 Indian scripts and found to be robust to the scanning process and relatively insensitive to changes in font size. The proposed system achieves good classification accuracy on a large test data set.
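The abstract describes a three-stage pipeline: global texture features from a text block, a dimension reduction step, and a nearest-neighbor classifier. The Python sketch below illustrates that flow under stated assumptions; it is not the authors' implementation. It uses NumPy only, computes four common GLCM statistics (contrast, energy, homogeneity, correlation) at three illustrative pixel offsets, omits SIFT, and uses PCA as the reducer (PLS and SIR, the other two techniques compared, are supervised and would replace `fit_pca`). The striped images are synthetic stand-ins for text blocks, and all function names are hypothetical.

```python
import numpy as np

# Pixel offsets (dy, dx) for the co-occurrence matrices: right, down, diagonal.
# These offsets and the four statistics below are illustrative choices,
# not necessarily the ones used in the paper.
OFFSETS = [(0, 1), (1, 0), (1, 1)]


def glcm_features(img, levels=8):
    """Contrast, energy, homogeneity and correlation of a symmetric,
    normalised GLCM for each offset in OFFSETS (4 * len(OFFSETS) values)."""
    q = img.astype(np.int64) * levels // 256  # quantise 0..255 to 0..levels-1
    h, w = q.shape
    i, j = np.indices((levels, levels))
    feats = []
    for dy, dx in OFFSETS:
        a = q[: h - dy, : w - dx].ravel()  # reference pixels
        b = q[dy:, dx:].ravel()            # neighbours at (+dy, +dx)
        glcm = np.zeros((levels, levels))
        np.add.at(glcm, (a, b), 1.0)       # count grey-level pairs
        glcm = glcm + glcm.T               # make the matrix symmetric
        p = glcm / glcm.sum()              # joint probabilities
        mu_i, mu_j = (i * p).sum(), (j * p).sum()
        sd_i = np.sqrt((p * (i - mu_i) ** 2).sum())
        sd_j = np.sqrt((p * (j - mu_j) ** 2).sum())
        corr = ((p * (i - mu_i) * (j - mu_j)).sum() / (sd_i * sd_j)
                if sd_i > 0 and sd_j > 0 else 1.0)
        feats += [
            (p * (i - j) ** 2).sum(),          # contrast
            np.sqrt((p ** 2).sum()),           # energy
            (p / (1.0 + (i - j) ** 2)).sum(),  # homogeneity
            corr,                              # correlation
        ]
    return np.array(feats)


def fit_pca(X, k):
    """Standardise features, then keep the top-k principal directions."""
    mean, std = X.mean(axis=0), X.std(axis=0)
    std[std == 0] = 1.0                    # guard against constant features
    Z = (X - mean) / std                   # centred and scaled
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return mean, std, vt[:k]


def project(X, mean, std, components):
    """Map feature vectors into the reduced k-dimensional space."""
    return ((X - mean) / std) @ components.T


def nearest_neighbour(train_Z, train_y, test_Z):
    """1-NN with squared Euclidean distance in the reduced space."""
    d = ((test_Z[:, None, :] - train_Z[None, :, :]) ** 2).sum(axis=2)
    return train_y[d.argmin(axis=1)]


def striped_block(rng, horizontal, size=32):
    """Synthetic stand-in for a text block: noisy stripes two pixels wide."""
    r, c = np.indices((size, size))
    band = (r if horizontal else c) // 2 % 2
    img = np.where(band == 0, 200.0, 50.0) + rng.uniform(-20, 20, (size, size))
    return np.clip(img, 0, 255).astype(np.uint8)


rng = np.random.default_rng(0)
labels = np.array([0, 1] * 20)  # two hypothetical "scripts"
X = np.stack([glcm_features(striped_block(rng, lab == 0)) for lab in labels])
train, test = slice(0, 30), slice(30, 40)
mean, std, comps = fit_pca(X[train], k=2)
pred = nearest_neighbour(project(X[train], mean, std, comps), labels[train],
                         project(X[test], mean, std, comps))
accuracy = (pred == labels[test]).mean()
print(f"1-NN accuracy on held-out blocks: {accuracy:.2f}")
```

Swapping in PLS or SIR would only change how the projection matrix is fitted; the feature extraction and the 1-NN step would stay the same, which is what makes the reducers directly comparable in a study of this kind.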
%0 Journal Article
%1 noauthororeditor
%A C, Hamsaveni L Pradeep
%A K, Chethan H
%D 2017
%J International Journal of Advanced Information Technology (IJAIT)
%K GLCM Nearest Neighbour PCA PLS SIFT SIR
%N 1/2/3
%P 1-12
%R 10.5121/ijait.2017.7301
%T DIMENSION REDUCTION FOR SCRIPT CLASSIFICATION- PRINTED INDIAN DOCUMENTS
%U http://aircconline.com/ijait/V7N3/7317ijait01.pdf
%V 7
%X Automatic identification of the script in a given document image facilitates many important applications, such as automatic archiving of multilingual documents, searching online archives of document images, and selecting a script-specific OCR engine in a multilingual environment. This paper presents a comparative study of three dimension reduction techniques, namely partial least squares (PLS), sliced inverse regression (SIR) and principal component analysis (PCA), and evaluates the relative performance of classification procedures that incorporate them. For each script we extracted Gray-Level Co-occurrence Matrix (GLCM) and Scale-Invariant Feature Transform (SIFT) features. The features are extracted globally from a given text block, which avoids the need for a complex and reliable segmentation of the document image into lines and characters. The extracted features are reduced using each dimension reduction technique, and the reduced features are fed into a Nearest Neighbor classifier. The proposed scheme is therefore efficient and can be used in practical applications that require processing large volumes of data. The scheme has been tested on 10 Indian scripts and found to be robust to the scanning process and relatively insensitive to changes in font size. The proposed system achieves good classification accuracy on a large test data set.
@article{noauthororeditor,
abstract = {Automatic identification of the script in a given document image facilitates many important applications, such as automatic archiving of multilingual documents, searching online archives of document images, and selecting a script-specific OCR engine in a multilingual environment. This paper presents a comparative study of three dimension reduction techniques, namely partial least squares (PLS), sliced inverse regression (SIR) and principal component analysis (PCA), and evaluates the relative performance of classification procedures that incorporate them. For each script we extracted Gray-Level Co-occurrence Matrix (GLCM) and Scale-Invariant Feature Transform (SIFT) features. The features are extracted globally from a given text block, which avoids the need for a complex and reliable segmentation of the document image into lines and characters. The extracted features are reduced using each dimension reduction technique, and the reduced features are fed into a Nearest Neighbor classifier. The proposed scheme is therefore efficient and can be used in practical applications that require processing large volumes of data. The scheme has been tested on 10 Indian scripts and found to be robust to the scanning process and relatively insensitive to changes in font size. The proposed system achieves good classification accuracy on a large test data set.},
added-at = {2018-03-31T06:52:31.000+0200},
author = {C, Hamsaveni L Pradeep and K, Chethan H},
biburl = {https://www.bibsonomy.org/bibtex/28e1bce8b0ad3245c46e7e3e35bd39a4e/ijaitislive},
doi = {10.5121/ijait.2017.7301},
interhash = {9beffc41254d61956e03d4af7b350996},
intrahash = {8e1bce8b0ad3245c46e7e3e35bd39a4e},
issn = {2231-1548},
journal = {International Journal of Advanced Information Technology (IJAIT)},
keywords = {GLCM Nearest Neighbour PCA PLS SIFT SIR},
language = {english},
month = jun,
number = {1/2/3},
pages = {1--12},
timestamp = {2018-03-31T06:52:31.000+0200},
title = {DIMENSION REDUCTION FOR SCRIPT CLASSIFICATION- PRINTED INDIAN DOCUMENTS},
url = {http://aircconline.com/ijait/V7N3/7317ijait01.pdf},
volume = 7,
year = 2017
}