christiane rösinger; n.b. comments: gaga ist böse, weil mainstream/schein/nichts dahinter (erinnert an filme, wo leute in den mittleren 80ern tiefgefroren wurden und jetzt durch ein missgeschick aufgetaut wurden)
CATMA integrates three functional, interactive modules: a tagger, a query-builder and an analyzer. The analyzer module contains most of the text analytical functions known to users of TACT
Deutscher Wortschatz contains data generated from newspapers and web resources that are publicly available. The data were collected per language and encompass statistics about co-occurrences of words in randomly selected sentences.
Web 0.1: Wetter, Fuball-Live-Ticker, Fernsehprogramm: 16 Millionen Deutsche drcken jeden Tag auf die Fernbedienung, um an Informationen des Videotexts zu kommen. (tags: tv text)
Now you can easily create tables in plain text which can be copied into any text file. Multi-line cells' contents is supported as well as multirow and multicolumn spanning of cells.
Today, speech technology is only available for a small fraction of the thousands of languages spoken around the world because traditional systems need to be trained on large amounts of annotated speech audio with transcriptions. Obtaining that kind of data for every human language and dialect is almost impossible.
Wav2vec works around this limitation by requiring little to no transcribed data. The model uses self-supervision to push the boundaries by learning from unlabeled training data. This enables speech recognition systems for many more languages and dialects, such as Kyrgyz and Swahili, which don’t have a lot of transcribed speech audio. Self-supervision is the key to leveraging unannotated data and building better systems.
S. Ahire, G. Dere, M. Mekha, и A. S.Wagh. International Journal on Recent and Innovation Trends in Computing and Communication, 3 (1):
82--85(января 2015)