Voyant Tools is a web-based text reading and analysis environment. It is a scholarly project designed to facilitate reading and interpretive practices for digital humanities students and scholars as well as for the general public.
What you can do with Voyant:
Use it to learn how computer-assisted analysis works. Check out our examples that show you how to do real academic tasks with Voyant.
Use it to study texts that you find on the web or texts that you have carefully edited and have on your computer.
Use it to add functionality to your online collections, journals, blogs or web sites so that others can explore your texts with analytical tools.
Use it to add interactive evidence to essays you publish online: embed interactive panels directly in your research essays so that readers can reproduce your results.
Use it to develop your own tools using our functionality and code.
Every text search solution is only as powerful as the text analysis capabilities it offers. Lucene is one such open-source information retrieval library, offering many text analysis possibilities. In this post, we will cover some of the main text analysis features that Elasticsearch, which is built on Lucene, makes available to enrich your search content.
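In Elasticsearch, an analyzer passes text through zero or more character filters, then exactly one tokenizer, then zero or more token filters. The Python sketch below mimics that pipeline to show the idea. It is not Elasticsearch's API or its standard analyzer. The specific filters are illustrative assumptions: HTML-tag stripping, a regex word tokenizer, lowercasing, and a tiny hypothetical stop-word list.

```python
import re
from typing import Callable, Iterable, List

# Hypothetical, tiny stop-word list used only for illustration.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "is"}


def strip_html(text: str) -> str:
    """Character filter: replace HTML tags with spaces before tokenization."""
    return re.sub(r"<[^>]+>", " ", text)


def word_tokenizer(text: str) -> List[str]:
    """Tokenizer: split into runs of word characters (letters, digits, underscore) or apostrophes."""
    return re.findall(r"[\w']+", text)


def lowercase(tokens: Iterable[str]) -> List[str]:
    """Token filter: normalize case."""
    return [t.lower() for t in tokens]


def stop_filter(tokens: Iterable[str]) -> List[str]:
    """Token filter: drop common function words."""
    return [t for t in tokens if t not in STOP_WORDS]


def analyze(text: str,
            char_filters: List[Callable[[str], str]],
            tokenizer: Callable[[str], List[str]],
            token_filters: List[Callable[[Iterable[str]], List[str]]]) -> List[str]:
    """Apply character filters, then one tokenizer, then token filters, in order."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


if __name__ == "__main__":
    print(analyze("<p>The Quick Brown Fox and the Lazy Dog</p>",
                  [strip_html], word_tokenizer, [lowercase, stop_filter]))
    # → ['quick', 'brown', 'fox', 'lazy', 'dog']
```

The order of token filters matters. Lowercasing runs before the stop filter, so "The" is normalized to "the" and then removed.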
Welcome to NewsReader: “Building structured event indexes of large volumes of financial and economic data for decision making”
The volume of news data is enormous and expanding, covering billions of archived documents with millions of documents added daily. These documents are also getting more and more interconnected with knowledge from other sources such as biographies and company databases.
Professional decision makers who need to respond quickly to new developments, or to explain them in light of the past, face the problem that current solutions for consulting these archives no longer work. There are simply too many potentially relevant, partially overlapping documents, and decision makers still have to separate the correct from the wrong, the new from the old and the current from the out-of-date by reading the content and keeping track of it in memory. As a result, it becomes almost impossible to make well-informed decisions, and professionals risk being held liable for decisions based on incomplete, inaccurate and out-of-date information.
NewsReader will process news in four different languages as it comes in. It will extract what happened to whom, when and where, removing duplication, complementing information, registering inconsistencies and keeping track of the original sources. Any new information is integrated with the past, distinguishing the new from the old in an unfolding story line, similar to how people tend to remember the past and access knowledge and information. The difference is that NewsReader can provide access to all original sources and will not forget any details (like a “History Recorder”). We will develop a decision-support tool that allows professional decision makers to explore these story lines through visual interfaces and interactions that exploit their explanatory power and their systematic structural implications. Likewise, NewsReader can predict future events from the past, or explain new events and developments in terms of the past.
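The integration steps described above can be illustrated with a toy Python sketch. These steps are removing duplicates, complementing information, registering inconsistencies, keeping sources and building a story line. This is not NewsReader's actual data model. The class names, the choice to key events on (what, who, when), treating the location as the only mergeable attribute, and the example company "AcmeCorp" are all hypothetical simplifications.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class EventRecord:
    """One extracted mention (hypothetical schema): what, who, when, where, and its source."""
    what: str
    who: str
    when: str             # ISO date string such as "2013-05-01" (assumed format)
    where: Optional[str]  # may be missing in a given article
    source: str           # identifier of the original document


@dataclass
class StoredEvent:
    where: Optional[str]
    sources: Set[str] = field(default_factory=set)
    conflicts: List[Tuple[str, str]] = field(default_factory=list)  # (place, source)


class EventStore:
    """Hypothetical store merging mentions that share the same (what, who, when) key."""

    def __init__(self) -> None:
        self.events: Dict[Tuple[str, str, str], StoredEvent] = {}

    def add(self, rec: EventRecord) -> str:
        key = (rec.what, rec.who, rec.when)
        stored = self.events.get(key)
        if stored is None:
            self.events[key] = StoredEvent(rec.where, {rec.source})
            return "new"
        stored.sources.add(rec.source)  # always keep track of the original source
        if rec.where is None or rec.where == stored.where:
            return "duplicate"          # nothing new to add
        if stored.where is None:
            stored.where = rec.where    # complement missing information
            return "complemented"
        stored.conflicts.append((rec.where, rec.source))  # register the inconsistency
        return "inconsistent"

    def story_line(self, who: str) -> List[Tuple[str, str, Optional[str]]]:
        """(when, what, where) for events involving `who`, in date order (ISO strings sort chronologically)."""
        items = [(k[2], k[0], v.where) for k, v in self.events.items() if k[1] == who]
        return sorted(items)


if __name__ == "__main__":
    store = EventStore()
    print(store.add(EventRecord("acquire", "AcmeCorp", "2013-05-01", None, "doc1")))
    print(store.add(EventRecord("acquire", "AcmeCorp", "2013-05-01", "London", "doc2")))
    print(store.add(EventRecord("acquire", "AcmeCorp", "2013-05-01", "Paris", "doc3")))
    print(store.story_line("AcmeCorp"))
```

A real system would need cross-document coreference to decide that two mentions describe the same event. The exact-key match used here sidesteps that problem entirely.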
Natural Language Corpus Data: Beautiful Data
This directory contains code and data to accompany the chapter Natural Language Corpus Data from the book Beautiful Data (Segaran and Hammerbacher, 2009). If you like this, you may also like: How to Write a Spelling Corrector.
The BioScope corpus consists of medical and biological texts annotated for negation, speculation and their linguistic scope. The annotation was done to allow comparison between systems developed for negation/hedge detection and scope resolution. The corpus is publicly available for research purposes.
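To make the two tasks concrete, the Python sketch below implements a deliberately simple baseline. Cue detection looks words up in a list. Scope resolution takes a fixed window of following tokens and stops early at punctuation or "but". The cue lists, the window size and the boundary set are illustrative assumptions. The sketch does not follow BioScope's annotation guidelines and is not a method evaluated on the corpus.

```python
import re
from typing import List, Tuple

# Illustrative cue lists and boundaries (assumptions, not taken from BioScope).
NEGATION_CUES = {"no", "not", "without", "absence"}
SPECULATION_CUES = {"may", "might", "suggest", "suggests", "possibly", "likely"}
BOUNDARY = {".", ",", ";", ":", "but"}


def tokenize(sentence: str) -> List[str]:
    """Lowercased word tokens, with punctuation kept as separate tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence.lower())


def find_scopes(sentence: str, window: int = 5) -> List[Tuple[str, str, List[str]]]:
    """Return (cue type, cue, scope tokens) for each cue found in the sentence.

    The scope is up to `window` tokens after the cue, cut off at the first boundary token.
    """
    tokens = tokenize(sentence)
    results: List[Tuple[str, str, List[str]]] = []
    for i, tok in enumerate(tokens):
        if tok in NEGATION_CUES:
            kind = "negation"
        elif tok in SPECULATION_CUES:
            kind = "speculation"
        else:
            continue
        scope: List[str] = []
        for nxt in tokens[i + 1 : i + 1 + window]:
            if nxt in BOUNDARY:
                break
            scope.append(nxt)
        results.append((kind, tok, scope))
    return results


if __name__ == "__main__":
    print(find_scopes("No evidence of infection, but fever persists."))
    # → [('negation', 'no', ['evidence', 'of', 'infection'])]
```

Window heuristics like this one fail on long or nested scopes. That kind of failure is what scope-annotated corpora make measurable.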
eTBLAST is a unique search engine for biomedical literature. It lets you input an entire paragraph and returns MEDLINE abstracts that are similar to it.
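eTBLAST's own ranking method is not described here. As a generic illustration of using a whole paragraph as a query, the Python sketch below swaps in TF-IDF weighting with cosine similarity. This is a standard technique but a different one, and it ranks a small in-memory collection. The smoothed IDF formula and the decision to ignore query terms unseen in the collection are assumptions of the sketch.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokens(text: str) -> List[str]:
    """Lowercased alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: List[str]) -> Tuple[List[Dict[str, float]], Dict[str, float]]:
    """TF-IDF vector per document, plus the IDF table used to weight queries."""
    counts = [Counter(tokens(d)) for d in docs]
    n = len(docs)
    df = Counter(term for c in counts for term in c)  # document frequency per term
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}  # smoothed IDF variant
    vecs = [{t: tf * idf[t] for t, tf in c.items()} for c in counts]
    return vecs, idf


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def most_similar(query: str, docs: List[str], k: int = 3) -> List[Tuple[float, int]]:
    """Rank documents by similarity to a paragraph-length query; returns (score, doc index)."""
    vecs, idf = tfidf_vectors(docs)
    q_counts = Counter(tokens(query))
    # Query terms absent from the collection carry no weight (assumption of this sketch).
    q_vec = {t: tf * idf[t] for t, tf in q_counts.items() if t in idf}
    scored = [(cosine(q_vec, v), i) for i, v in enumerate(vecs)]
    return sorted(scored, reverse=True)[:k]


if __name__ == "__main__":
    abstracts = [
        "Insulin regulates blood glucose levels.",
        "The stock market fell sharply today.",
        "Glucose metabolism and insulin resistance in diabetes.",
    ]
    for score, idx in most_similar("insulin resistance and blood glucose", abstracts):
        print(f"{score:.3f}  {abstracts[idx]}")
```

A production system over MEDLINE would use an inverted index rather than scoring every document. The scoring idea stays the same.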
Wired Magazine, issue 16.07: Data Deluge. Topics: crop predictions, Quark, data mining, tracking news, watching the skies, scanning skeletons, airfares, voting, epidemics, Google events, terrorism, visualizing big data.
TAPoR will build a unique human and computing infrastructure for text analysis across Canada by establishing six regional centers that together form one national text analysis research portal.
The Software Environment for the Advancement of Scholarly Research (SEASR), funded by the Andrew W. Mellon Foundation, provides a research and development environment capable of powering leading-edge digital humanities initiatives.
A quick tutorial for the Boston Predictive Analytics Meetup demonstrating the use of R for text mining Twitter. We implement a very crude algorithm...
Extraction of structured knowledge from ancient sources for classical studies (eAQUA)
Funding programme “Interactions between the Natural Sciences and the Humanities”
This is an overview of open-source NLP and machine learning tools for text mining, information extraction, text classification, clustering, approximate string matching, language parsing and tagging, and more.
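Approximate string matching, one of the tasks listed above, is commonly built on edit distance. The Python sketch below computes the Levenshtein distance with the standard two-row dynamic program and uses it for a fuzzy vocabulary lookup. The helper name `fuzzy_lookup` and its `max_distance` threshold are illustrative. The sketch is not taken from any of the surveyed tools.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def fuzzy_lookup(word: str, vocabulary: List[str], max_distance: int = 2) -> List[Tuple[int, str]]:
    """Vocabulary entries within max_distance edits of word, closest first (hypothetical helper)."""
    scored = ((levenshtein(word, v), v) for v in vocabulary)
    return sorted(pair for pair in scored if pair[0] <= max_distance)


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # → 3
    print(fuzzy_lookup("colour", ["color", "colon", "cooler", "flavour"], max_distance=1))
    # → [(1, 'color')]
```

Keeping only two rows of the table reduces memory from O(len(a) · len(b)) to O(len(b)). The running time remains quadratic.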
Using the transcripts of Bill Gates' keynote at CES 2007 and Steve Jobs' keynote at Macworld 2007 (via Todd Bishop's Microsoft Blog), I created this relational tag cloud using Rhizome Navigation.
FullText.exe is freely available for academic usage. The program generates a word-occurrence matrix, a co-occurrence matrix, and a normalized co-occurrence matrix from a set of text files and a word list.
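The three FullText.exe outputs can be sketched in Python. The sketch rests on assumptions, since FullText.exe may count or normalize differently. Occurrence is binary (whether a word appears in a file). Co-occurrence is the number of files that contain both words. Normalization is cosine, so each count is divided by the square root of the two words' document frequencies. The function name is hypothetical.

```python
import re
from math import sqrt
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def occurrence_matrices(texts: Dict[str, str], words: List[str]) -> Tuple[Matrix, Matrix, Matrix]:
    """Return (occurrence, co-occurrence, normalized co-occurrence) matrices.

    occurrence[f][w] = 1.0 if word w appears in file f, else 0.0   (binary; assumption)
    cooc[i][j]       = number of files containing both word i and word j
    norm[i][j]       = cooc[i][j] / sqrt(cooc[i][i] * cooc[j][j])   (cosine; assumption)
    """
    vocab = [w.lower() for w in words]
    occ: Matrix = []
    for name in sorted(texts):  # one row per file, in sorted file-name order
        present = set(re.findall(r"[a-z0-9']+", texts[name].lower()))
        occ.append([1.0 if w in present else 0.0 for w in vocab])
    n = len(vocab)
    # Co-occurrence = (occurrence matrix transposed) x (occurrence matrix).
    cooc = [[sum(row[i] * row[j] for row in occ) for j in range(n)] for i in range(n)]
    norm = [[cooc[i][j] / sqrt(cooc[i][i] * cooc[j][j]) if cooc[i][i] and cooc[j][j] else 0.0
             for j in range(n)] for i in range(n)]
    return occ, cooc, norm


if __name__ == "__main__":
    files = {"a.txt": "gene protein cell", "b.txt": "gene cell", "c.txt": "protein"}
    occ, cooc, norm = occurrence_matrices(files, ["gene", "protein", "cell"])
    print(occ)
    print(cooc)
    print(norm)
```

The diagonal of the co-occurrence matrix is each word's document frequency. That is why it serves as the normalizer.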
Research Interests Comparator (RIC) is our fourth electronic text mining project. The goal of the RIC system is to dramatically improve the ability of biomedical researchers to find information that is relevant to their areas of study, and to provide them...
A powerful search engine designed for document management, competitive intelligence, press analysis and text mining, web mining, knowledge discovery, strategic watch, etc. Features include a report writer, a web spider, a publisher and more.
Stanford University course covering text mining, recommendation systems and collaborative filtering, web graph structure, PageRank and spam, social networking, and data structures such as Bloom filters ... Resources, links and more.
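Bloom filters, one of the data-structure topics listed for the course, answer "have I seen this item?" in fixed memory. They never give false negatives but can give false positives. A minimal Python sketch follows. The default sizes are illustrative choices, as is deriving the k hash functions by salting SHA-256 with the hash index. Neither is taken from the course material.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Probabilistic set membership: no false negatives, tunable false-positive rate."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # bit array packed into bytes

    def _positions(self, item: str) -> Iterator[int]:
        # Derive k bit positions by salting SHA-256 with the hash index (illustrative choice).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        """Set the k bits for this item."""
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        # False is always correct; True may be a false positive.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


if __name__ == "__main__":
    seen = BloomFilter()
    for url in ["example.org/a", "example.org/b"]:
        seen.add(url)
    print("example.org/a" in seen)  # → True
```

A web crawler might use such a filter to skip pages it has probably already visited. It accepts that an occasional page will be wrongly skipped in exchange for much smaller memory use than an exact set.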