The Grid Corpus is a large multitalker audiovisual sentence corpus designed to support joint computational-behavioral studies in speech perception. In brief, the corpus consists of high-quality audio and video (facial) recordings of 1000 sentences spoken by each of 34 talkers (18 male, 16 female), for a total of 34,000 sentences. Sentences are of the form "put red at G9 now". audio_25k.zip contains the wav-format utterances at a 25 kHz sampling rate, in a separate directory per talker. alignments.zip provides word-level time alignments, again separated by talker. s1.zip, s2.zip, etc. contain .jpg videos for each talker (note that, due to an oversight, no video for talker t21 is available). The Grid Corpus is described in detail in the paper jasagrid.pdf, included in the dataset.
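As a minimal sketch of working with the word-level alignments described above, the snippet below parses one alignment into (start, end, word) triples. The exact file layout is an assumption, not stated in the description: it assumes a plain-text file with one "start end word" triple per line (HTK-label style), and the time units shown are made up for illustration.

```python
# Hypothetical sketch of reading a Grid word-level alignment.
# Assumption (not confirmed by the dataset description): each alignment
# is plain text with one "start end word" triple per line; the time
# values and units below are illustrative only.

def parse_alignment(lines):
    """Return a list of (start, end, word) tuples from alignment lines."""
    words = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        start, end, word = parts
        words.append((int(start), int(end), word))
    return words

# Example with made-up values in the assumed format:
sample = ["0 9000 put", "9000 15000 red", "15000 19000 at"]
print(parse_alignment(sample))
```

A parser like this would let the word timings be paired with the corresponding audio and video segments per talker.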
D. Galvin (2014). arXiv:1406.7872. Comment: Notes prepared to accompany a series of tutorial lectures given by the author at the 1st Lake Michigan Workshop on Combinatorics and Graph Theory, Western Michigan University, March 15--16, 2014.
Z. Wang and S. Ji (2018). arXiv:1808.08931. Comment: The original version was accepted by KDD 2018. Code is publicly available at https://github.com/divelab/dilated.
M. Finzi, K. Wang, and A. Wilson (2020). arXiv:2010.13581. Comment: NeurIPS 2020. Code available at https://github.com/mfinzi/constrained-hamiltonian-neural-networks.
T. Miyato, S. Maeda, M. Koyama, and S. Ishii (2017). arXiv:1704.03976. Comment: To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence.