Data Catalogue

A working index of the corpora, records, and annotation pipelines that Specific can license.


Voice & Speech Corpora

1.1 Call-center recordings
1.2 Human-aligned transcripts + audio
1.3 Language-learning app dialogue
1.4 Voice samples

Regions - South Asian · African · European · Accented English
Languages - Hindi · English · Hinglish · Tamil · Telugu · Swahili · Arabic

Internal Company Operations

Sources - Slack · Linear · Drive · Codebases

Transcription & Annotation

Licensing & General Inquiry

Not sure which dataset fits? Reach out and we'll point you to the right corpus.