jaj > tools corpus

bookmarks (6)

subset of govdocs1 corpus
A subset of the govdocs1 corpus for testing file-characterization tools.
10 years ago by @jaj
tags: corpus, digital_preservation, file_formats, tools

Digital Corpora » Govdocs1
A corpus of 1 million documents that are freely available for research and may be (to the best of our knowledge) freely redistributed. These documents were obtained by performing searches for words randomly chosen from the Unix dictionary, numbers randomly chosen between 1 and 1 million, and randomized combinations of the two, for documents of specified file types that resided on web servers in the .gov domain using the Yahoo and Google search engines.
10 years ago by @jaj
tags: corpus, digital_preservation, govdocs, tools

openplanets/format-corpus · GitHub
An openly-licensed corpus of small example files, covering a wide range of formats and creation tools.
10 years ago by @jaj
tags: corpus, digital_preservation, file_formats, tools

Phrases in English
PIE incorporates a database derived from the second or World Edition of the British National Corpus (BNC 2000). It aims to provide a simple yet powerful interface for studying words and phrases up to eight words long, appropriate for both experienced researchers and novice users.
12 years ago by @jaj
tags: corpus, tools, linguistics

MemeTracker: tracking news phrases over the web
MemeTracker builds maps of the daily news cycle by analyzing around 900,000 news stories and blog posts per day from 1 million online sources, ranging from mass media to personal blogs. We track the quotes and phrases that appear most frequently over time across this entire online news spectrum. This makes it possible to see how different stories compete for news and blog coverage each day, and how certain stories persist while others fade quickly.
12 years ago by @jaj
tags: corpus, news, tools

Google Books: American English (155 billion words)
The Google Books corpus of American English, 155 billion words in size. Access is limited to what you can do via the website at Brigham Young University. The easy thing to do is type in a word or phrase and see its frequency by decade, going back to the 1810s. The interface lets you look for collocates (words that go with other words) and view charts showing relative word frequency in the corpus by decade; it also handles parts of speech and gives you various limits and display options. Other kinds of analysis that might be done with text corpora can't be done through the interface.
12 years ago by @jaj
tags: corpora, corpus, reference, tools



BibSonomy
