Collective sampling and analysis of high order tensors for chatroom communications
E. Acar, S. Çamtepe, and B. Yener. In ISI 2006: IEEE International Conference on Intelligence and Security Informatics, pages 213--224. Springer, (2006)
Abstract
This work investigates the accuracy and efficiency tradeoffs between centralized and collective (distributed) algorithms for (i) sampling and (ii) n-way data analysis techniques in multidimensional stream data, such as Internet chatroom communications. Its contributions are threefold. First, we use the Kolmogorov-Smirnov goodness-of-fit test to show that the statistical differences between real data obtained by collective sampling in the time dimension from multiple servers and data obtained from a single server are insignificant. Second, we show using real data that collective data analysis of 3-way data arrays (users x keywords x time), known as high order tensors, is more efficient than centralized algorithms with respect to both space and computational cost. Furthermore, we show that this gain is obtained without loss of accuracy. Third, we examine the sensitivity of collective construction and analysis of high order data tensors to server selection and sampling window size. We construct 4-way tensors (users x keywords x time x servers) and analyze them to show the impact of server and window size selections on the results.
Description
CiteSeerX — Collective sampling and analysis of high order tensors for chatroom communications
%0 Conference Paper
%1 Acar06collectivesampling
%A Acar, Evrim
%A Çamtepe, Seyit A.
%A Yener, Bülent
%B ISI 2006: IEEE International Conference on Intelligence and Security Informatics
%D 2006
%I Springer
%K chatlog chatroom
%P 213--224
%T Collective sampling and analysis of high order tensors for chatroom communications
%U http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.114.2577
%X This work investigates the accuracy and efficiency tradeoffs between centralized and collective (distributed) algorithms for (i) sampling and (ii) n-way data analysis techniques in multidimensional stream data, such as Internet chatroom communications. Its contributions are threefold. First, we use the Kolmogorov-Smirnov goodness-of-fit test to show that the statistical differences between real data obtained by collective sampling in the time dimension from multiple servers and data obtained from a single server are insignificant. Second, we show using real data that collective data analysis of 3-way data arrays (users x keywords x time), known as high order tensors, is more efficient than centralized algorithms with respect to both space and computational cost. Furthermore, we show that this gain is obtained without loss of accuracy. Third, we examine the sensitivity of collective construction and analysis of high order data tensors to server selection and sampling window size. We construct 4-way tensors (users x keywords x time x servers) and analyze them to show the impact of server and window size selections on the results.
@inproceedings{Acar06collectivesampling,
abstract = {This work investigates the accuracy and efficiency tradeoffs between centralized and collective (distributed) algorithms for (i) sampling and (ii) n-way data analysis techniques in multidimensional stream data, such as Internet chatroom communications. Its contributions are threefold. First, we use the Kolmogorov-Smirnov goodness-of-fit test to show that the statistical differences between real data obtained by collective sampling in the time dimension from multiple servers and data obtained from a single server are insignificant. Second, we show using real data that collective data analysis of 3-way data arrays (users x keywords x time), known as high order tensors, is more efficient than centralized algorithms with respect to both space and computational cost. Furthermore, we show that this gain is obtained without loss of accuracy. Third, we examine the sensitivity of collective construction and analysis of high order data tensors to server selection and sampling window size. We construct 4-way tensors (users x keywords x time x servers) and analyze them to show the impact of server and window size selections on the results.},
added-at = {2010-03-24T03:55:56.000+0100},
author = {Acar, Evrim and Çamtepe, Seyit A. and Yener, Bülent},
biburl = {https://www.bibsonomy.org/bibtex/28063081a4b705083667b9d897f840292/zhenzhenx},
booktitle = {ISI 2006: IEEE International Conference on Intelligence and Security Informatics},
description = {CiteSeerX — Collective sampling and analysis of high order tensors for chatroom communications},
interhash = {8fefe72f531ffefd56c483943cc60eb6},
intrahash = {8063081a4b705083667b9d897f840292},
keywords = {chatlog chatroom},
pages = {213--224},
publisher = {Springer},
timestamp = {2010-06-16T11:05:42.000+0200},
title = {Collective sampling and analysis of high order tensors for chatroom communications},
url = {http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.114.2577},
year = 2006
}