Hanif Rahman

Research

Publications

Pashto speech and language resources, multilingual ASR evaluation, and patient-facing kidney disease education.

Scholar snapshot

Public Google Scholar profile for Hanif Rahman, listed as an independent researcher and verified at hanifrahman.com. Metrics were fetched on May 8, 2026.

Papers: 5
Citations: 2
h-index: 1
i10-index: 0
Machine learning, Computational biology, Natural language processing, Pashto speech technology, Low-resource language resources

Papers and preprints

Preprint, arXiv:2604.04598, arXiv, 2026, 2 citations

Benchmarking Multilingual Speech Models on Pashto: Zero-Shot ASR, Script Failure, and Cross-Domain Evaluation

Hanif Rahman

A reproducible public benchmark for Pashto automatic speech recognition across Whisper, MMS, SeamlessM4T, and OmniASR models. The study separates word error rate from script fidelity and documents cross-domain failure modes in read-speech datasets.

Pashto, speech recognition, ASR, benchmarks

Preprint, arXiv:2604.08786, arXiv, 2026

Script collapse in multilingual ASR: A reference-free metric and 100-pair benchmark

Hanif Rahman

Introduces Script Fidelity Rate, a reference-free metric for detecting speech recognition output in the wrong writing system. The benchmark covers ten languages, six scripts, and ten multilingual speech models.

speech recognition, evaluation, multilingual ASR

Preprint, arXiv:2604.06507, arXiv, 2026

Fine-tuning Whisper for Pashto ASR: strategies and scale

Hanif Rahman

Compares full fine-tuning, LoRA, frozen encoders, and Urdu-to-Pashto transfer for Whisper on Pashto Common Voice. The paper reports practical scaling results and error patterns for Pashto-specific sounds.

Pashto, Whisper, fine-tuning, ASR

Preprint, arXiv:2603.27021, arXiv, 2026

Pashto Common Voice: Building the First Open Speech Corpus for a 60-Million-Speaker Low-Resource Language

Hanif Rahman, Shafeeq ur Rehman

Documents the growth of Pashto Common Voice from a small seed dataset into an openly licensed speech corpus with more than 100,000 clips. The work covers interface localisation, sentence collection, outreach, and baseline ASR results.

Pashto, Common Voice, speech corpus, open data

Preprint, arXiv:2603.16354, arXiv, 2026

PashtoCorp: A 1.25-Billion-Word Corpus, Evaluation Suite, and Reproducible Pipeline for Low-Resource Language Development

Hanif Rahman

Presents a 1.25-billion-word Pashto corpus, evaluation suite, and reproducible pipeline. The release includes data, model training, and code for low-resource NLP experiments.

Pashto, corpus linguistics, NLP, low-resource languages

Books and guides

Book, 2024

IgA Nephropathy: A Patient and Family Guide

Hanif Rahman

A comprehensive guide to IgA nephropathy for patients, families, and anyone who wants to understand this common kidney disease. Covers diagnosis, progression, treatment options, and what to expect when living with the condition.

kidney disease, IgA nephropathy, patient education

Full citation list on Google Scholar