Due to an explosion of data, there has been an increasing demand for scalable machine learning and data mining algorithms in many applications, such as social network analysis, information retrieval, recommendation systems, biology, multimedia, and e-commerce. The objective of this special issue is to connect academia and industry on the methods and experiences of large-scale data analysis. We look for scalable machine learning and data mining algorithms, implementations, frameworks, and case studies that target real, practical scenarios involving large datasets. The focus is to identify the real challenges in large-scale data mining and to investigate scalable methods and practical solutions for the core machine learning and data mining problems from both theoretical and experimental perspectives.
The M-tree is an index structure for efficiently resolving similarity queries on complex objects that are compared using an arbitrary metric.
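The pruning that makes metric indexes such as the M-tree efficient rests on the triangle inequality. The sketch below is a deliberately flattened, one-level illustration, not the M-tree itself: each hypothetical `RoutingEntry` stores a routing object, a covering radius, and its members, and a range query skips an entire ball when the query ball cannot intersect it. The names `RoutingEntry`, `make_entry`, and `range_query` are illustrative assumptions; the real M-tree organizes such balls into a balanced, multi-level tree.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List

Metric = Callable[[Any, Any], float]


@dataclass
class RoutingEntry:
    """Hypothetical one-level stand-in for an M-tree node: a routing object,
    the covering radius of the ball it represents, and the objects inside it."""
    center: Any
    radius: float
    objects: List[Any] = field(default_factory=list)


def make_entry(center, objects, dist: Metric) -> RoutingEntry:
    # Covering radius = distance from the routing object to its farthest member.
    radius = max((dist(center, o) for o in objects), default=0.0)
    return RoutingEntry(center, radius, list(objects))


def range_query(entries, query, r: float, dist: Metric) -> list:
    """Return every stored object o with dist(query, o) <= r."""
    hits = []
    for e in entries:
        # Triangle inequality: d(q, o) >= d(q, c) - d(c, o) >= d(q, c) - radius,
        # so if d(q, c) > radius + r, no object in this ball can qualify.
        if dist(query, e.center) > e.radius + r:
            continue
        hits.extend(o for o in e.objects if dist(query, o) <= r)
    return hits
```

For example, with `dist = lambda a, b: abs(a - b)` and two balls centered at 0 and 100, a query around 2.5 never computes distances to the members of the second ball. Any function satisfying the metric axioms can be passed as `dist`.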
BitC is a new systems programming language. It seeks to combine the flexibility, safety, and richness of Standard ML or Haskell with the low-level expressiveness of C.
In mathematics and physics, a small-world network is a type of mathematical graph in which most nodes are not neighbors of one another, but most nodes can be reached from every other node by a small number of hops or steps. A small-world network, where nodes represent people and edges connect people who know each other, captures the small-world phenomenon of strangers being linked by a mutual acquaintance.
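One common way to produce a small-world graph, and the one assumed here since the text names no particular model, is the Watts-Strogatz construction: start from a ring lattice where nodes only know their near neighbors, then rewire a small fraction of edges at random. The sketch below builds both graphs and measures the average shortest-path length with breadth-first search; the function names are illustrative.

```python
import random
from collections import deque


def watts_strogatz(n, k, p, seed=None):
    """Ring of n nodes, each linked to its k nearest neighbours (k even),
    with each lattice edge rewired to a random endpoint with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(1, k // 2 + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)
    for i in range(n):
        for j in range(1, k // 2 + 1):
            old = (i + j) % n
            if old not in adj[i] or rng.random() >= p:
                continue
            # Pick a new endpoint, avoiding self-loops and duplicate edges.
            choices = [w for w in range(n) if w != i and w not in adj[i]]
            if not choices:
                continue
            new = rng.choice(choices)
            adj[i].discard(old)
            adj[old].discard(i)
            adj[i].add(new)
            adj[new].add(i)
    return adj


def average_path_length(adj):
    """Mean BFS hop count over all ordered pairs of mutually reachable nodes."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


if __name__ == "__main__":
    lattice = watts_strogatz(200, 6, 0.0, seed=1)
    rewired = watts_strogatz(200, 6, 0.1, seed=1)
    print(average_path_length(lattice), average_path_length(rewired))
```

With no rewiring (`p = 0`) every node only reaches distant nodes through long chains of neighbors; rewiring even a few edges creates shortcuts that sharply reduce the typical number of hops, which is the "small world" effect described above.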
Project Euler is a series of challenging mathematical/computer programming problems that will require more than just mathematical insights to solve. Although mathematics will help you arrive at elegant and efficient methods, the use of a computer and programming skills will be required to solve most problems.
Producing Open Source Software is a book about the human side of open source development. It describes how successful projects operate, the expectations of users and developers, and the culture of free software. The book is released under an open copyright: it is available in bookstores and from the publisher (O'Reilly Media), or you can browse or download it here.
Welcome to OSDev.org, the largest online community of operating system developers. If you want to learn how to write your own OS we have all the information to get you started. Read our OS development wiki to learn where to start. The forums are a great place to discuss OS theory and ask for help when you get stuck. Don't forget to add a link on the OS List to your OS project once it gets going.
Netlib is a collection of mission-critical software components for linear algebra systems (i.e. working with vectors or matrices). Netlib libraries are written in C, Fortran or optimised assembly code. A Java translation has been provided by the F2J project but it does not take advantage of optimised system libraries.
In a recent piece called Strong Typing vs. Strong Testing, noted programmer and author Bruce Eckel makes an argument that dynamically typed languages such as Python are superior to statically typed languages such as Java and C++. I've done quite a bit of Python and Java programming, and even a little C++, so I can appreciate his position, but I think the conclusion goes too far. Whether Python is more productive than C++ or Java is one thing, whether static typing in general should be abandoned is quite another.
MegaMap is a Java implementation of a map (or hashtable) that can store an unbounded amount of data, limited only by the amount of disk space available. Objects stored in the map are persisted to disk, and an in-memory cache provides good performance. For all practical purposes, a MegaMap can be thought of as a map implementation with unlimited storage space.
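The disk-plus-cache design can be illustrated with a short sketch. This is not MegaMap's API (which is Java); the class `DiskBackedMap` is hypothetical and uses Python's `shelve` module as the persistent store, with an `OrderedDict` acting as a least-recently-used in-memory cache. Writes go through to disk, so the cache can evict entries freely.

```python
import os
import shelve
import tempfile
from collections import OrderedDict


class DiskBackedMap:
    """Hypothetical sketch of a MegaMap-style store: every entry is written
    through to a shelve file on disk, and an LRU OrderedDict keeps the most
    recently used entries in memory. Keys must be strings (a shelve rule)."""

    def __init__(self, path, cache_size=1000):
        self._store = shelve.open(path)   # persistent key/value file
        self._cache = OrderedDict()       # in-memory LRU cache
        self._cache_size = cache_size

    def _remember(self, key, value):
        self._cache[key] = value
        self._cache.move_to_end(key)
        if len(self._cache) > self._cache_size:
            self._cache.popitem(last=False)  # evict the least recently used entry

    def put(self, key, value):
        self._store[key] = value  # write-through: disk is the source of truth
        self._remember(key, value)

    def get(self, key, default=None):
        if key in self._cache:            # fast path: cache hit
            self._cache.move_to_end(key)
            return self._cache[key]
        if key in self._store:            # slow path: load from disk, then cache
            value = self._store[key]
            self._remember(key, value)
            return value
        return default

    def close(self):
        self._store.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        m = DiskBackedMap(os.path.join(tmp, "megamap"), cache_size=2)
        for i, key in enumerate(["a", "b", "c"]):
            m.put(key, i)
        print(m.get("a"))  # prints 0: "a" was evicted from the cache and reloaded from disk
        m.close()
```

Write-through keeps the sketch simple at the cost of a disk write per `put`; a write-back cache would batch writes but must flush dirty entries before eviction.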
MCL is short for the Markov Cluster Algorithm, a fast and scalable unsupervised clustering algorithm for graphs based on simulation of (stochastic) flow in graphs.
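The flow simulation alternates two operations on a column-stochastic matrix: expansion (matrix squaring, which spreads flow along longer paths) and inflation (raising entries to a power and renormalizing, which strengthens strong flows and weakens weak ones). The dense, pure-Python sketch below assumes an expansion power of 2, an inflation parameter of 2, and added self-loops; these are common defaults rather than values taken from the text, and a real implementation would use sparse matrices and pruning.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Markov clustering sketch on a dense 0/1 adjacency matrix.
    Returns a list of clusters (frozensets of node indices)."""
    n = len(adjacency)
    # Self-loops let flow "stay put", which stabilises the iteration.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: random walks of length 2
        inflated = normalize_columns([[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each attractor row with non-zero entries defines one cluster.
    clusters = []
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return clusters
```

On two triangles joined by a single bridging edge, the flow stays within each triangle and the bridge is cut, giving two clusters. Raising `inflation` generally yields finer clusters.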
LyX is a document processor that encourages an approach to writing based on the structure of your documents (WYSIWYM), and not simply their appearance (WYSIWYG).
LyX combines the power and flexibility of TeX/LaTeX with the ease of use of a graphical interface. This results in world-class support for creating mathematical content (via a fully integrated equation editor) and structured documents such as academic articles, theses, and books. In addition, staples of scientific authoring such as reference lists and index creation come standard. But you can also use LyX to create a letter, a novel, a theatre play, or a film script. A broad array of ready-made, well-designed document layouts is built in.
Résumé, Curriculum Vitae or simply CV is an important brief about your professional life. It is likely to be one of the first contacts with a prospective employer. Curriculum Vitae means course of life in Latin. So what exactly should a Résumé contain and how detailed should it be? There is no silver bullet answer. ...
Lzz makes ordinary C++ programming seem low-level. How many times have you neglected to update a header file after editing a source file? This is a silly mistake, yet we do it again and again. C++ forces you to type and maintain duplicate code. Why not let a program generate it for you?
This is a guide to the LaTeX markup language. It is intended that this can serve as a useful resource for everyone from new users who wish to learn, to old hands who need a quick reference.
JCublas provides Java bindings for the NVIDIA CUDA BLAS implementation, making the parallel processing power of modern graphics hardware available to Java programs.
The following article describes how to configure a CentOS 5.x-based or CentOS 6.x-based system to use the Fedora EPEL repos and the third-party Remi package repos. These package repositories are not officially supported by CentOS, but they provide much more current versions of popular applications such as PHP or MySQL.
EM has been shown to have favorable convergence properties, automatic satisfaction of constraints, and fast convergence. The next section explains the traditional approach to deriving the EM algorithm and proving its convergence property. Section 3.3 covers the interpretation of the EM algorithm as the maximization of two quantities: the entropy and the expectation of the complete-data likelihood. Then, the K-means algorithm and the EM algorithm are compared, and the conditions under which the EM algorithm reduces to K-means are explained. The discussion in Section 3.4 generalizes the EM algorithm described in Sections 3.2 and 3.3 to problems with partial data and hidden states. We refer to this new type of EM as doubly stochastic EM. Finally, Section 3.5 concludes the chapter.
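As a concrete instance of the traditional EM algorithm (not the doubly stochastic variant described for Section 3.4), the sketch below fits a one-dimensional Gaussian mixture. The E-step computes posterior responsibilities; the M-step re-estimates weights, means, and variances in closed form. It also records the data log-likelihood at each iteration, which EM guarantees never decreases. The function names and the toy data are illustrative assumptions.

```python
import math
import random


def normal_pdf(x, mu, var):
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(xs, mus, variances, weights, iters=50):
    """Fit a 1-D Gaussian mixture by EM. Returns (means, variances, weights,
    history), where history[t] is the log-likelihood of the data under the
    parameters at the start of iteration t. No variance floor is applied,
    so degenerate (collapsing) components are not guarded against."""
    mus, variances, weights = list(mus), list(variances), list(weights)
    k, n = len(mus), len(xs)
    history = []
    for _ in range(iters):
        # E-step: responsibilities r[i][j] = P(component j | x_i, current params)
        resp, loglik = [], 0.0
        for x in xs:
            p = [weights[j] * normal_pdf(x, mus[j], variances[j]) for j in range(k)]
            total = sum(p)
            loglik += math.log(total)
            resp.append([pj / total for pj in p])
        history.append(loglik)
        # M-step: closed-form maximisers of the expected complete-data log-likelihood
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
    return mus, variances, weights, history


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(0, 1) for _ in range(200)] + [rng.gauss(5, 1) for _ in range(200)]
    means, _, _, hist = em_gmm_1d(data, [1.0, 4.0], [1.0, 1.0], [0.5, 0.5])
    print(sorted(means))
```

The link to K-means is visible in the M-step: if each responsibility is replaced by a hard 0/1 assignment to the nearest mean and all components share a fixed variance, the mean update becomes the K-means centroid update.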
The goal of this book is to provide practical information on how to gain the largest possible benefit from your connection to the Internet. By applying the monitoring and optimisation techniques discussed here, the effectiveness of your network can be significantly improved.
We've all heard of 'six degrees of separation', the idea that everyone in the world can be connected in just a few steps. But what if those steps don't just relate to people but also to viruses, neurons, proteins and even to fashion trends? What if this 'six degrees of separation' allowed us an insight into something at the core of Nature?
In mathematical logic, Gödel's incompleteness theorems, proved by Kurt Gödel in 1931, are two theorems stating inherent limitations of all but the most trivial formal systems for arithmetic of mathematical interest. The theorems are of considerable importance to the philosophy of mathematics. They are widely regarded as showing that Hilbert's program to find a complete and consistent set of axioms for all of mathematics is impossible, thus giving a negative answer to Hilbert's second problem.
The GWT Window Manager provides a high-level windowing system for GWT applications. It offers a desktop component, dialog features, free-floating windows, and more. Try it yourself and feel free to use it; it's free!
This website provides tutorials and sample course content so CS students and educators can learn more about current computing technologies and paradigms. In particular, this content is Creative Commons licensed which makes it easy for CS educators to use in their own classes.
The Courses section contains tutorials, lecture slides, and problem sets for a variety of topic areas:
AJAX Programming
Algorithms
Distributed Systems
Web Security
Languages
In the Tools 101 section, you will find a set of introductions to some common tools used in Computer Science such as version control systems and databases.
The CS Curriculum Search will help you find teaching materials that have been published to the web by faculty from CS departments around the world. You can refine your search to display just lectures, assignments or reference materials for a set of courses.
The FOSS in Research and Student Innovation Miniconf brings together researchers and students with an active interest in Free and Open Source Software with the broader Linux.conf.au community to highlight exciting work taking place within the often esoteric world of academia and educational institutions.
The Miniconf is part of Linux.conf.au 2011, being held at the QUT Gardens Point Campus in Brisbane, Queensland in January.
Topics are split into two streams: FOSS in Research, which invites presentations on research relating to Free and Open Source Software; and Student Innovation, which explores new and exciting work in the FOSS world conducted by students. Presentations may be proposed in a 25-minute talk format (20 minutes talk + 5 minutes discussion).
fastutil extends the Java™ Collections Framework by providing type-specific maps, sets, lists and queues with a small memory footprint and fast access and insertion; it also includes a fast I/O API for binary and text files. It is free software distributed under the GNU Lesser General Public License.
Elefant (Efficient Learning, Large-scale Inference, and Optimisation Toolkit) is an open source library for machine learning licensed under the Mozilla Public License (MPL). We develop an open source machine learning toolkit which provides
Delta Debugging automates the scientific method of debugging. The Delta Debugging algorithm isolates failure causes automatically - by systematically narrowing down failure-inducing circumstances until a minimal set remains.
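The narrowing process can be sketched as a simplified version of the ddmin algorithm: split the failing input into chunks, keep any chunk (or any complement of a chunk) that still fails, and otherwise double the granularity until single elements are reached. This sketch assumes a caller-supplied oracle `fails(subset)` (a hypothetical name) and treats every non-failing outcome as a pass, ignoring the "unresolved" outcomes handled by the full algorithm.

```python
def split(elems, n):
    """Split elems into n contiguous, non-empty, roughly equal parts (n <= len(elems))."""
    parts, start = [], 0
    for i in range(n):
        stop = start + (len(elems) - start) // (n - i)
        parts.append(elems[start:stop])
        start = stop
    return parts


def ddmin(circumstances, fails):
    """Simplified ddmin: shrink a failing list of circumstances to a 1-minimal
    failing subset. `fails(subset)` is a caller-supplied test oracle that
    returns True when the failure still reproduces."""
    assert fails(circumstances), "the full set of circumstances must fail"
    n = 2  # current granularity: number of chunks
    while len(circumstances) >= 2:
        chunks = split(circumstances, n)
        reduced = False
        # 1) Try each chunk on its own.
        for chunk in chunks:
            if fails(chunk):
                circumstances, n, reduced = chunk, 2, True
                break
        # 2) Otherwise try removing each chunk (test its complement).
        if not reduced:
            for i in range(len(chunks)):
                complement = [x for j, c in enumerate(chunks) if j != i for x in c]
                if fails(complement):
                    circumstances, n, reduced = complement, max(n - 1, 2), True
                    break
        # 3) Otherwise refine the granularity, stopping at single elements.
        if not reduced:
            if n >= len(circumstances):
                break
            n = min(2 * n, len(circumstances))
    return circumstances
```

For example, if a failure needs both circumstances 3 and 7 to be present, `ddmin(list(range(10)), lambda c: 3 in c and 7 in c)` returns `[3, 7]`: removing either element makes the failure disappear, so the result is 1-minimal.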
Excellence of any sort (excellent dancing, excellent quarterbacking, excellent woodworking) has no waste. You fix wordy writing by doing the same job using fewer words.
This series of three talks will give a nontechnical, high level overview of geometric complexity theory (GCT), which is an approach to the P vs. NP problem via algebraic geometry, representation theory, and the theory of a new class of quantum groups, called nonstandard quantum groups, that arise in this approach.
Consensus clustering has emerged as an important elaboration of the classical clustering problem. Consensus clustering, also called aggregation of clusterings (or partitions), refers to the situation in which a number of different (input) clusterings have been obtained for a particular dataset and it is desired to find a single (consensus) clustering that fits the data better, in some sense, than the existing clusterings. Consensus clustering is thus the problem of reconciling clustering information about the same dataset coming from different sources or from different runs of the same algorithm. When cast as an optimization problem, consensus clustering is known as the median partition problem, which has been shown to be NP-complete.
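Because the exact median partition is NP-complete, a cheap baseline is to pick, among the input clusterings themselves, the one with the smallest total distance to all the others. The sketch below assumes a pair-counting distance (the number of object pairs that one labeling puts together and the other separates) and uses this "best of k" heuristic; it is an illustration, not an exact solver, and the function names are hypothetical.

```python
from itertools import combinations


def disagreement(a, b):
    """Number of object pairs that labeling `a` puts in the same cluster and
    labeling `b` separates, or vice versa. Label names do not matter."""
    return sum(
        (a[i] == a[j]) != (b[i] == b[j])
        for i, j in combinations(range(len(a)), 2)
    )


def best_of_k(clusterings):
    """Return the input clustering with the smallest total disagreement to all
    inputs: a simple heuristic for the median partition, not an exact solution."""
    return min(clusterings, key=lambda c: sum(disagreement(c, other) for other in clusterings))
```

For instance, given the labelings `[0, 0, 1, 1]`, `[0, 0, 1, 2]`, and `[1, 1, 0, 0]`, the first and third describe the same partition, so `best_of_k` returns `[0, 0, 1, 1]`. Local-search methods can then refine such a starting point by moving single objects between clusters.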
If two integers b and c have the property that their difference b - c is divisible by an integer m (i.e., (b - c)/m is an integer), then b and c are said to be "congruent modulo m," written b ≡ c (mod m).
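The definition translates directly into a divisibility check. In the small sketch below (the function name `congruent` is illustrative), Python's `%` operator returns a result with the sign of the divisor, so the test also behaves correctly for negative differences.

```python
def congruent(b: int, c: int, m: int) -> bool:
    """True when m divides b - c, i.e. b is congruent to c modulo m."""
    if m == 0:
        raise ValueError("modulus must be non-zero")
    return (b - c) % m == 0


if __name__ == "__main__":
    print(congruent(38, 14, 12))  # True: 38 - 14 = 24 = 2 * 12
    print(congruent(5, 3, 4))     # False: 5 - 3 = 2 is not a multiple of 4
```

An equivalent check is `b % m == c % m`, which says that b and c leave the same remainder when divided by m.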
M. Koolen, G. Kazai, and N. Craswell. WSDM '09: Proceedings of the Second ACM International Conference on Web Search and Data Mining, pages 44--53. New York, NY, USA, ACM, (2009)
A. Turpin and F. Scholer. Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, pages 11--18. New York, NY, USA, ACM, (2006)
C. Daskalakis, P. Goldberg, and C. Papadimitriou. STOC '06: Proceedings of the thirty-eighth annual ACM symposium on Theory of computing, pages 71--78. New York, NY, USA, ACM, (2006)