Bow (or libbow) is a library of C code useful for writing statistical text analysis, language modeling and information retrieval programs. The current distribution includes the library, as well as front-ends for document classification (rainbow), document
Each year the JHEPS lists the books and articles on the history of probability and statistics that have appeared in the previous year. The list of 2007 publications is due to appear in February 2008. Of course, omissions can be made good at any time and, if you know of any in the list below, please contact me, John Aldrich
Twitter's retention rate lower than Facebook's or MySpace's
"Twitter has enjoyed a nice ride over the last few months, but it will not be able to sustain its meteoric rise without establishing a higher level of user loyalty. Frankly, if Oprah can’t accomplish that, I’m not sure who can."
This is the first of a three-part series called TFIDF In Libraries, where “relevancy ranking” will be introduced. In this part, term frequency/inverse document frequency (TFIDF), a common mathematical method of weighting texts for automatic classification and sorting search results, will be described.
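As a rough illustration of the weighting that blurb refers to, here is a minimal Python sketch of one common TFIDF variant: term frequency is the term's share of the tokens in a document, and inverse document frequency is the natural log of (number of documents / number of documents containing the term). The function name `tfidf`, the whitespace tokenizer, and this particular formula are assumptions made for illustration; the series itself may define the score differently.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: score} dict per document in `docs` (hypothetical helper)."""
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            # tf = count / total tokens; idf = ln(n_docs / df)
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


docs = ["the cat sat", "the dog sat", "the bird flew"]
scores = tfidf(docs)
```

With these three toy documents, a word that occurs in every document (such as "the") gets a score of zero, while a word unique to one document scores higher than a word shared by two. That is the property that makes the score useful for ranking search results.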
It has been a couple of years since I posted statistics from WorldCat, so here is a new spreadsheet based on an October 1, 2009 snapshot (see the earlier post for an explanation of the table). WorldCat has changed dramatically...