Hadoop is a framework for running applications on large clusters of commodity hardware. The Hadoop framework transparently provides applications both reliability and data motion.
Once you have opened the index, you can start adding documents. Every object you want to insert into the index is a Document. Each document in the index has fields that contain information about it. For each field, you need to specify what the inde…
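To make the document/field model concrete, here is a minimal Python sketch of a toy inverted index in which each field declares whether it is indexed (searchable) and stored (returned with hits). It is not Lucene's API: the `TinyIndex` class, the `FIELD_OPTIONS` table and the whitespace tokenizer are hypothetical simplifications.

```python
from collections import defaultdict

# Hypothetical per-field options: "indexed" makes a field searchable,
# "stored" means its original value is returned with search hits.
FIELD_OPTIONS = {
    "title": {"indexed": True, "stored": True},
    "body": {"indexed": True, "stored": False},
    "path": {"indexed": False, "stored": True},
}


class TinyIndex:
    """Toy inverted index mapping each term to the (doc_id, field) pairs containing it."""

    def __init__(self, field_options):
        self.field_options = field_options
        self.postings = defaultdict(set)
        self.stored = {}  # doc_id -> stored field values

    def add_document(self, doc):
        doc_id = len(self.stored)
        kept = {}
        for field, value in doc.items():
            opts = self.field_options[field]
            if opts["indexed"]:
                # Naive analysis: lowercase and split on whitespace.
                for term in value.lower().split():
                    self.postings[term].add((doc_id, field))
            if opts["stored"]:
                kept[field] = value
        self.stored[doc_id] = kept
        return doc_id

    def search(self, term, field=None):
        """Return stored fields of documents containing `term` (optionally in one field)."""
        matches = {
            doc_id
            for doc_id, f in self.postings.get(term.lower(), ())
            if field is None or f == field
        }
        return [self.stored[doc_id] for doc_id in sorted(matches)]


index = TinyIndex(FIELD_OPTIONS)
index.add_document({"title": "Lucene Basics", "body": "Indexing and searching text",
                    "path": "/docs/a.txt"})
index.add_document({"title": "Hadoop Notes", "body": "Distributed indexing with MapReduce",
                    "path": "/docs/b.txt"})
print(index.search("indexing"))
```

Here `path` is stored but not indexed, so it comes back with hits yet cannot be searched; `body` is the reverse.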
Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, a web administration interface and many more features. It runs in a Java servlet container such as Tomcat.
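Faceted search reports, alongside the hits, how many of them fall under each value of a field. The Python sketch below only illustrates that counting step on an in-memory hit list; Solr computes facets on the server, and the `facet_counts` helper and the sample data are hypothetical.

```python
from collections import Counter


def facet_counts(hits, field):
    """Count how many hits carry each value of a multi-valued `field`."""
    counts = Counter()
    for doc in hits:
        for value in doc.get(field, []):
            counts[value] += 1
    # Highest counts first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical hit list, as if returned by a query.
hits = [
    {"id": "1", "subject": ["search", "java"]},
    {"id": "2", "subject": ["search", "solr"]},
    {"id": "3", "subject": ["hadoop", "java"]},
]
print(facet_counts(hits, "subject"))  # [('java', 2), ('search', 2), ('hadoop', 1), ('solr', 1)]
```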
Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. You can find the latest release on the download page. See the Getting Started guide for instructions on how to start using Tika.
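Format detection commonly starts from "magic" byte signatures at the beginning of a file. A minimal Python sketch of that idea follows. It is not Tika's API: the `detect_media_type` helper and its short signature table are illustrative assumptions, and real detectors, Tika included, know far more signatures and also use hints such as the file name. For example, a `.docx` file begins with the ZIP signature, so telling it apart from a plain ZIP archive requires looking inside the container.

```python
# Illustrative signature table (leading bytes -> media type); real detectors know many more.
MAGIC_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"PK\x03\x04", "application/zip"),
    (b"<?xml", "application/xml"),
]


def detect_media_type(data: bytes) -> str:
    """Guess a media type from the leading bytes of a document."""
    for signature, media_type in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return media_type
    # Generic fallback when no signature matches.
    return "application/octet-stream"


print(detect_media_type(b"%PDF-1.4\n"))  # application/pdf
```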
In library catalogues, "sorting results by relevance" is becoming increasingly important. The article describes various ways to optimize result ranking, using the Lucene-based OPAC of UB Heidelberg (Heidelberg University Library) as an example. To determine relevance, the contents of individual data fields can be analyzed and weighted, and criteria such as a title's popularity, availability or rating, as well as user profiles, can be taken into account. The article presents various weighting options and approaches for incorporating further criteria.
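The kinds of criteria listed above can be combined into a single score. The Python sketch below assumes hypothetical per-field weights, a log-dampened popularity boost based on loan counts, and a small boost for available items; the numbers, field names and the `rank` function are illustrative and not taken from the Heidelberg OPAC. Ratings and user profiles are not modelled.

```python
import math

# Hypothetical field weights: a hit in the title counts more than one in the notes.
FIELD_WEIGHTS = {"title": 3.0, "author": 2.0, "subject": 1.5, "notes": 0.5}


def text_score(record, terms):
    """Weighted count of query-term occurrences across the catalogue fields."""
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        tokens = record.get(field, "").lower().split()
        score += weight * sum(tokens.count(t) for t in terms)
    return score


def rank(records, query):
    terms = query.lower().split()
    scored = []
    for rec in records:
        base = text_score(rec, terms)
        if base == 0:
            continue  # no textual match: not a hit at all
        # Popularity from loan counts, log-dampened so bestsellers do not dominate.
        popularity = 1.0 + 0.1 * math.log1p(rec.get("loans", 0))
        # Small boost for titles that are currently available.
        availability = 1.2 if rec.get("available") else 1.0
        scored.append((base * popularity * availability, rec["id"]))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [rec_id for _, rec_id in scored]


records = [
    {"id": "a", "title": "Lucene in practice", "loans": 0, "available": False},
    {"id": "b", "title": "Search engines", "notes": "covers Lucene", "loans": 50, "available": True},
    {"id": "c", "title": "Cooking", "loans": 100, "available": True},
]
print(rank(records, "lucene"))  # ['a', 'b']
```

Multiplying the boosts onto the text score, rather than adding them, keeps records with no textual match out of the result list no matter how popular they are.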
Imagine you can see 160 years of history, all on one screen. You can zoom and pan, you can look at a particular day, you can even do a search. And when you do, the results come up not as a list, but as a heat map that shows where in history that topic appears, and how often.
Apache Lucene is a high-performance Java search engine library available at the Apache Software Foundation. Hibernate Annotations includes a package of annotations that allows you to mark any domain model object as indexable and have Hibernate maintain a
Compass is a first-class open source Java Search Engine Framework that brings the power of Search Engine semantics to your application stack declaratively. Built on top of the amazing Lucene Search Engine, Compass integrates seamlessly with popular development frameworks like Hibernate and Spring. It provides search capability for your application data model and synchronizes changes with the data source. With Compass: write less code, find data quicker.
InstaSearch is an Eclipse plug-in for fast text search in the workspace. The search is performed instantly as you type, and matching files are displayed in an Eclipse view. It is a lightweight plug-in based on the Apache Lucene search engine. Each file can be previewed through its few most relevant matching lines, and double-clicking a match opens the file at the matching line. Main features:
* Instantly shows search results
* Shows suggestions using auto-completion
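How InstaSearch implements its suggestions is not described here. As a generic illustration, auto-completion can be served from a sorted list of index terms, where all terms sharing a prefix sit next to each other. The `PrefixSuggester` class below is a hypothetical Python sketch of that idea using binary search.

```python
import bisect


class PrefixSuggester:
    """Suggest completions from a sorted, de-duplicated list of lowercase terms."""

    def __init__(self, terms):
        self.terms = sorted({t.lower() for t in terms})

    def suggest(self, prefix, limit=5):
        prefix = prefix.lower()
        # Binary search for the first term >= prefix; matches are contiguous from there.
        i = bisect.bisect_left(self.terms, prefix)
        results = []
        while i < len(self.terms) and len(results) < limit and self.terms[i].startswith(prefix):
            results.append(self.terms[i])
            i += 1
        return results


suggester = PrefixSuggester(["index", "IndexWriter", "indexer", "indices", "query"])
print(suggester.suggest("inde"))  # ['index', 'indexer', 'indexwriter']
```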
Solr is a standalone enterprise search server with a web-services-like API. You put documents into it (called "indexing") via XML over HTTP. You query it via HTTP GET and receive XML results.
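A minimal Python sketch of both halves of that exchange, assuming a Solr 1.x-style installation at `http://localhost:8983/solr` (host, port and path are assumptions): `build_add_xml` produces an XML `<add>` update message, and `build_query_url` builds a GET query against the `select` handler. Nothing is sent over the network; in a real setup the XML would be POSTed to the update handler and followed by a commit.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumption: a local Solr 1.x-style installation; adjust for your setup.
SOLR_BASE = "http://localhost:8983/solr"


def build_add_xml(docs):
    """Serialize documents into an XML <add> update message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)  # ElementTree escapes &, < and >
    return ET.tostring(add, encoding="unicode")


def build_query_url(query, rows=10):
    """Build a GET URL for the select handler with URL-encoded parameters."""
    params = urlencode({"q": query, "rows": rows})
    return f"{SOLR_BASE}/select?{params}"


print(build_add_xml([{"id": "1", "title": "Hello Solr"}]))
# <add><doc><field name="id">1</field><field name="title">Hello Solr</field></doc></add>
print(build_query_url("title:solr"))
# http://localhost:8983/solr/select?q=title%3Asolr&rows=10
```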
Katta is a scalable, fault-tolerant, distributed data store for real-time access.
Katta serves large, replicated indices as shards to handle high loads and very large data sets. These indices can be of different types; currently, implementations are available for Lucene and Hadoop mapfiles.
* Makes serving large or high load indices easy
* Serves very large Lucene or Hadoop Mapfile indices as index shards on many servers
* Replicates shards on different servers for performance and fault tolerance
* Supports pluggable network topologies
* Master fail-over
* Fast, lightweight, easy to integrate
* Plays well with Hadoop clusters
* Apache Version 2 License
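To illustrate how replicated shards give fault tolerance, the Python sketch below places each shard on `replication` distinct servers round-robin and then routes every shard to some live replica. The functions `assign_shards` and `route` and the round-robin placement are hypothetical; they are not Katta's API or its actual distribution policy.

```python
def assign_shards(shards, servers, replication=2):
    """Place each shard on `replication` distinct servers, round-robin."""
    if replication > len(servers):
        raise ValueError("replication factor exceeds number of servers")
    assignment = {server: [] for server in servers}
    for i, shard in enumerate(shards):
        for r in range(replication):
            assignment[servers[(i + r) % len(servers)]].append(shard)
    return assignment


def route(assignment, alive):
    """Pick one live server for every shard; fail only if all its replicas are down."""
    all_shards = sorted({shard for held in assignment.values() for shard in held})
    plan = {}
    for shard in all_shards:
        for server, held in assignment.items():
            if server in alive and shard in held:
                plan[shard] = server
                break
        else:
            raise RuntimeError(f"no live replica for {shard}")
    return plan


placement = assign_shards(["shard0", "shard1", "shard2"], ["node1", "node2", "node3"])
# With node2 down, every shard is still served by a surviving replica.
print(route(placement, alive={"node1", "node3"}))
# {'shard0': 'node1', 'shard1': 'node3', 'shard2': 'node1'}
```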
N. Ferro and D. Harman. Multilingual Information Access Evaluation I. Text Retrieval Experiments, volume 6241 of Lecture Notes in Computer Science. Springer, Berlin / Heidelberg, (2010)
D. Hiemstra and C. Hauff. Multilingual and Multimodal Information Access Evaluation, volume 6360 of Lecture Notes in Computer Science, pages 64-69. Springer Verlag, Berlin, (2010)