This page is devoted to learning methods that build on kernels, such as the support vector machine. It grew out of earlier pages at the Max Planck Institute for Biological Cybernetics and at GMD FIRST, snapshots of which can be found here and here. In those days, information about kernel methods was scarce and hard to find, and the kernel machines web site acted as a central repository for the field. It included a list of people working in the field and online preprints of most publications.
Nowadays, this no longer makes sense, partly because the field has become so popular that there are too many people and papers for such lists to be useful, and partly because search engines do the job much more conveniently. But what really forced us to do a major update of the site was that spammers discovered it: we could no longer operate a system built on the trust that people who submit an entry do so to improve the quality of the site.
The first few posts to this blog will be most coherent if they are read in chronological order. A new entry will be posted each Wednesday morning. To see the latest posts in blog format, click here. To get updates when new posts appear, you can subscribe via RSS, or follow me on Google+ or…
Building and operating large-scale information retrieval systems used by hundreds of millions of people around the world poses a number of interesting challenges. Designing such systems requires making complex tradeoffs along several dimensions, including (a) the number of user queries that must be handled per second and the response latency for these requests, (b) the number and size of the various corpora that are searched, (c) the latency and frequency with which documents are updated or added to the corpora, and (d) the quality and cost of the ranking algorithms that are used for retrieval. In this talk I'll discuss the evolution of Google's hardware infrastructure and information retrieval systems and some of the design challenges that arise from ever-increasing demands in all of these dimensions. I'll also describe how we use various pieces of distributed systems infrastructure when building these retrieval systems. Finally, I'll describe some future challenges and open research problems in this area.