Research

Publications

Pashto speech and language resources, multilingual ASR evaluation, and patient-facing kidney disease education.

Scholar snapshot

Public Google Scholar profile for Hanif Rahman, listed as an independent researcher and verified at hanifrahman.com. Metrics were fetched on May 8, 2026.

Papers, Citations, h-index, i10-index

Machine learning, Computational biology, Natural language processing, Pashto speech technology, Low-resource language resources

Papers and preprints

Preprint, arXiv:2604.04598, arXiv, 2026, 2

Benchmarking Multilingual Speech Models on Pashto: Zero-Shot ASR, Script Failure, and Cross-Domain Evaluation

Hanif Rahman

A reproducible public benchmark for Pashto automatic speech recognition across Whisper, MMS, SeamlessM4T, and OmniASR models. The study separates word error rate from script fidelity and documents cross-domain failure modes in read-speech datasets.

Read paper, PDF, Scholar

Pashto, speech recognition, ASR, benchmarks
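
The word error rate mentioned in the entry above can be illustrated with a minimal sketch. It uses the standard word-level edit distance divided by the reference length; the function name `word_error_rate` and the whitespace tokenisation are assumptions for illustration, not the benchmark's own scoring code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative only: splits on whitespace and applies no text
    normalisation, which a real evaluation would likely need.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against four reference words.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

A score like this says nothing about which writing system the output uses, which is presumably why the study reports script fidelity separately.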

Preprint, arXiv:2604.08786, arXiv, 2026

Script collapse in multilingual ASR: A reference-free metric and 100-pair benchmark

Hanif Rahman

Introduces Script Fidelity Rate, a reference-free metric for detecting speech recognition output in the wrong writing system. The benchmark covers ten languages, six scripts, and ten multilingual speech models.

Read paper, PDF, Scholar

speech recognition, evaluation, multilingual ASR
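
The idea of a reference-free script check, as described in the entry above, can be sketched roughly as follows. This is not the paper's definition of Script Fidelity Rate: the function names `script_of` and `script_share` are hypothetical, and using the first word of each Unicode character name as a stand-in for its script is a simplifying assumption.

```python
import unicodedata


def script_of(ch: str):
    """Return the first word of the Unicode character name, e.g. 'ARABIC'.

    Assumption: the leading word of the name is a usable proxy for the
    script. Returns None for characters that have no name.
    """
    try:
        return unicodedata.name(ch).split()[0]
    except ValueError:
        return None


def script_share(text: str, expected_script: str) -> float:
    """Fraction of alphabetic characters that belong to the expected script.

    No reference transcript is needed; only the model output is inspected.
    Returns 0.0 when the text contains no alphabetic characters.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    hits = sum(1 for ch in letters if script_of(ch) == expected_script)
    return hits / len(letters)


# Output in Arabic script versus a Latin transliteration of the same word.
print(script_share("\u0633\u0644\u0627\u0645", "ARABIC"))  # 1.0
print(script_share("salam", "ARABIC"))                      # 0.0
```

A share near zero for a language written in Arabic script would flag output in the wrong writing system even when no ground-truth transcript is available.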

Preprint, arXiv:2604.06507, arXiv, 2026

Fine-tuning Whisper for Pashto ASR: strategies and scale

Hanif Rahman

Compares full fine-tuning, LoRA, frozen encoders, and Urdu-to-Pashto transfer for Whisper on Pashto Common Voice. The paper reports practical scaling results and error patterns for Pashto-specific sounds.

Read paper, PDF, Scholar

Pashto, Whisper, fine-tuning, ASR

Preprint, arXiv:2603.27021, arXiv, 2026

Pashto Common Voice: Building the First Open Speech Corpus for a 60-Million-Speaker Low-Resource Language

Hanif Rahman, Shafeeq ur Rehman

Documents the growth of Pashto Common Voice from a small seed dataset into an openly licensed speech corpus with more than 100,000 clips. The work covers interface localisation, sentence collection, outreach, and baseline ASR results.

Read paper, PDF, Scholar

Pashto, Common Voice, speech corpus, open data

Preprint, arXiv:2603.16354, arXiv, 2026

PashtoCorp: A 1.25-Billion-Word Corpus, Evaluation Suite, and Reproducible Pipeline for Low-Resource Language Development

Hanif Rahman

Presents a 1.25-billion-word Pashto corpus, evaluation suite, and reproducible pipeline. The release includes data, model training, and code for low-resource NLP experiments.

Read paper, PDF, Scholar, Dataset, Model, Code

Pashto, corpus linguistics, NLP, low-resource languages

Books and guides

Book, 2024

IgA Nephropathy: A Patient and Family Guide

Hanif Rahman

A comprehensive guide to IgA Nephropathy for patients, families, and anyone wanting to understand this common kidney disease. Covers diagnosis, progression, treatment options, and what to expect living with the condition.

Get the book

kidney disease, IgA nephropathy, patient education

Full citation list on Google Scholar