OLAC Record: Comparing language-specific and cross-language acoustic models for low-resource phonetic forced alignment

OLAC Record
oai:scholarspace.manoa.hawaii.edu:10125/74817

Metadata

Title: Comparing language-specific and cross-language acoustic models for low-resource phonetic forced alignment

Bibliographic Citation: 2025-07; Article; Kaipuleohone University of Hawai'i Digital Language Archive; https://hdl.handle.net/10125/74817.

Date (W3CDTF): 2025-07

Description: Phonetic forced alignment can greatly expedite spoken language analysis by providing automatic time alignments at the word and phone levels. In the case of low-resource languages, it remains an open question whether phone-level forced alignment will be more successful with a small language-specific acoustic model or a high-resource cross-language acoustic model. The present study directly compared the forced alignment performance of language-specific and cross-language acoustic models using the Urum and Evenki datasets from the DoReCo Corpus. We evaluated six language-specific acoustic models trained with 5, 10, 15, 20, 25, or approximately 70 minutes of language-specific speech data against four English-based cross-language acoustic models that differed in size and accent homogeneity (large Global English or homogeneous American English of varying data amounts). Acoustic models were developed or obtained from the Montreal Forced Aligner and evaluated against held-out manually aligned phone boundaries. Overall, the Global English model and the larger language-specific acoustic models were competitive with one another and outperformed the homogeneous cross-language and smaller language-specific acoustic models. From this analysis, we recommend that researchers use a language-specific model with at least 25 minutes of actual speech (not just recording duration) or a large, diverse cross-language acoustic model for low-resource forced alignment.
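The abstract says the models were evaluated against held-out manually aligned phone boundaries, but the record does not state the exact metric. As a hedged illustration only, here is a minimal sketch of two measures commonly used for this kind of comparison: the mean and median absolute boundary error, and the proportion of boundaries that fall within a tolerance. The function names, the 20 ms default tolerance, and the assumption that reference and predicted boundaries pair one-to-one (true when both alignments share the same phone sequence) are illustrative assumptions, not the study's documented procedure.

```python
from statistics import median


def boundary_errors(reference, predicted):
    """Absolute differences (seconds) between paired phone boundaries.

    Assumes both lists contain boundaries for the same phone sequence,
    in the same order (hypothetical helper, not from the study).
    """
    if len(reference) != len(predicted):
        raise ValueError("boundary lists must be the same length")
    return [abs(r - p) for r, p in zip(reference, predicted)]


def summarize(reference, predicted, tolerance=0.020):
    """Summarize alignment accuracy; the 20 ms tolerance is an assumption."""
    errors = boundary_errors(reference, predicted)
    n = len(errors)
    return {
        "mean_abs_error": sum(errors) / n,
        "median_abs_error": median(errors),
        # Proportion of boundaries whose error is at or below the tolerance.
        "prop_within_tolerance": sum(e <= tolerance for e in errors) / n,
    }


if __name__ == "__main__":
    manual = [0.10, 0.25, 0.40]  # made-up manual boundaries (s)
    automatic = [0.11, 0.28, 0.40]  # made-up forced-alignment boundaries (s)
    print(summarize(manual, automatic))
```

Reporting both a central error and a within-tolerance proportion is a design choice: the first captures average drift, while the second shows how often an aligner lands close enough to be usable without manual correction.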

Description: National Foreign Language Resource Center

Format: Article

Format: 23

Identifier: Chodroff, Eleanor, Emily P. Ahn, Hossep Dolatian. 2025. Comparing language-specific and cross-language acoustic models for low-resource phonetic forced alignment. Language Documentation & Conservation 19: 201-223.

Identifier: 1934-5275

Identifier (URI): https://hdl.handle.net/10125/74817

Language: English

Language (ISO639): eng

Publisher: University of Hawaii Press

Table Of Contents: Chodroff_etal_2025.pdf

OLAC Info

Archive: Language Documentation and Conservation

Description: http://www.language-archives.org/archive/ldc.scholarspace.manoa.hawaii.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:scholarspace.manoa.hawaii.edu:10125/74817

DateStamp: 2025-08-01

GetRecord: OAI-PMH request for simple DC format
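The GetRecord entries above refer to standard OAI-PMH requests. As a minimal sketch, assuming a placeholder repository base URL (the actual OLAC endpoint is not given in this record), a GetRecord request is an HTTP GET carrying the protocol's `verb`, `identifier`, and `metadataPrefix` parameters. Here `olac` selects the OLAC format and `oai_dc` the simple Dublin Core format; the helper name `get_record_url` is hypothetical.

```python
from urllib.parse import urlencode


def get_record_url(base_url, identifier, metadata_prefix="olac"):
    """Build an OAI-PMH GetRecord request URL (hypothetical helper).

    base_url is a placeholder; substitute the repository's real endpoint.
    """
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,  # "olac" or "oai_dc"
    })
    return f"{base_url}?{query}"


if __name__ == "__main__":
    oai_id = "oai:scholarspace.manoa.hawaii.edu:10125/74817"
    # Placeholder endpoint, not the archive's real address.
    print(get_record_url("https://example.org/oai", oai_id, "oai_dc"))
```

`urlencode` percent-encodes the colons and slash in the OAI identifier, so the identifier survives intact as a single query parameter.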

Search Info
Citation: n.a. 2025. University of Hawaii Press.
Terms: area_Europe country_GB iso639_eng

http://www.language-archives.org/item.php/oai:scholarspace.manoa.hawaii.edu:10125/74817
Up-to-date as of: Thu Sep 25 0:33:47 EDT 2025