Machine learning · Computational biology · Natural language processing · Pashto speech technology · Low-resource language resources
Papers and preprints
Preprint · arXiv:2604.04598 · arXiv · 2026 · 2
Benchmarking Multilingual Speech Models on Pashto: Zero-Shot ASR, Script Failure, and Cross-Domain Evaluation
Hanif Rahman
A reproducible public benchmark for Pashto automatic speech recognition across Whisper, MMS, SeamlessM4T, and OmniASR models. The study separates word error rate from script fidelity and documents cross-domain failure modes in read-speech datasets.
Script collapse in multilingual ASR: A reference-free metric and 100-pair benchmark
Hanif Rahman
Introduces Script Fidelity Rate, a reference-free metric for detecting speech recognition output in the wrong writing system. The benchmark covers ten languages, six scripts, and ten multilingual speech models.
Fine-tuning Whisper for Pashto ASR: strategies and scale
Hanif Rahman
Compares full fine-tuning, LoRA, frozen encoders, and Urdu-to-Pashto transfer for Whisper on Pashto Common Voice. The paper reports practical scaling results and error patterns for Pashto-specific sounds.
Pashto Common Voice: Building the First Open Speech Corpus for a 60-Million-Speaker Low-Resource Language
Hanif Rahman, Shafeeq ur Rehman
Documents the growth of Pashto Common Voice from a small seed dataset into an openly licensed speech corpus with more than 100,000 clips. The work covers interface localisation, sentence collection, outreach, and baseline ASR results.
PashtoCorp: A 1.25-Billion-Word Corpus, Evaluation Suite, and Reproducible Pipeline for Low-Resource Language Development
Hanif Rahman
Presents a 1.25-billion-word Pashto corpus, evaluation suite, and reproducible pipeline. The release includes data, model training, and code for low-resource NLP experiments.
A comprehensive guide to IgA Nephropathy for patients, families, and anyone wanting to understand this common kidney disease. Covers diagnosis, progression, treatment options, and what to expect living with the condition.