Reading: How test-time scaling unlocks hidden reasoning abilities in small language models (and allows them to outperform LLMs)