How do you scale down a TabPFN? And which hyperparameters matter most at different scales?
TabPFN is a transformer-based model that solves small tabular classification problems in under a second, but the pretrained model is large. This study investigates how to create smaller, more efficient versions (NanoTabPFN) while maintaining competitive performance.
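For context on the workflow being scaled down, here is a minimal usage sketch assuming the `tabpfn` package and its scikit-learn-style `TabPFNClassifier` (constructor arguments vary by package version):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = TabPFNClassifier()   # pretrained transformer; no task-specific training
clf.fit(X_train, y_train)  # "fit" stores the training data as in-context examples
preds = clf.predict(X_test)  # predictions come from a single forward pass
print(accuracy_score(y_test, preds))
```

The point of the design is that inference, not gradient-based training, does the work, which is why model size dominates the deployment cost.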
Using random search with no prior assumptions, we systematically explored how scaling strategy (width vs. depth vs. compound), model size, and optimizer choice affect both performance and hyperparameter sensitivity. Our evaluation used the TabArena benchmark alongside classic datasets (Iris, Wine, Breast Cancer).
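A minimal sketch of what such a search loop might look like; the hyperparameter names and ranges (`scaling`, `embedding_dim`, `num_layers`, `optimizer`, `learning_rate`) are hypothetical stand-ins for the actual NanoTabPFN configuration, not the study's exact search space:

```python
import math
import random

# Hypothetical search space; the real study's axes and ranges may differ.
SEARCH_SPACE = {
    "scaling":       ["width", "depth", "compound"],
    "embedding_dim": [32, 64, 96, 128, 192],   # width axis
    "num_layers":    [2, 4, 6, 8, 12],         # depth axis
    "optimizer":     ["adam", "adamw", "sgd"],
    "learning_rate": (1e-5, 1e-2),             # log-uniform range
}

def sample_config(rng: random.Random) -> dict:
    """Draw one configuration uniformly, i.e. with no prior assumptions."""
    lo, hi = SEARCH_SPACE["learning_rate"]
    return {
        "scaling":       rng.choice(SEARCH_SPACE["scaling"]),
        "embedding_dim": rng.choice(SEARCH_SPACE["embedding_dim"]),
        "num_layers":    rng.choice(SEARCH_SPACE["num_layers"]),
        "optimizer":     rng.choice(SEARCH_SPACE["optimizer"]),
        # Log-uniform: sample uniformly in log-space, then exponentiate.
        "learning_rate": 10 ** rng.uniform(math.log10(lo), math.log10(hi)),
    }

rng = random.Random(0)
configs = [sample_config(rng) for _ in range(100)]
```

Each sampled configuration would then be trained and scored on the benchmark, so that both average performance and the spread across samples (hyperparameter sensitivity) can be compared between scaling strategies.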
The key finding: smaller, deeper models can outperform larger, shallower configurations, challenging the assumption that more parameters always mean better performance.
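To make the parameter-count intuition concrete, a back-of-the-envelope estimate (ignoring embeddings, biases, and layer norms, and assuming the standard 4x MLP expansion) shows how a deeper, narrower transformer stack can still be the smaller model:

```python
def approx_transformer_params(d_model: int, n_layers: int, mlp_ratio: int = 4) -> int:
    """Rough per-block count: 4*d^2 for attention (Q, K, V, output)
    plus 2*mlp_ratio*d^2 for the two MLP projections."""
    per_block = 4 * d_model**2 + 2 * mlp_ratio * d_model**2
    return n_layers * per_block

# A wide, shallow configuration vs. a narrower but deeper one:
print(approx_transformer_params(256, 2))  # ~1.6M parameters, 2 layers
print(approx_transformer_params(128, 6))  # ~1.2M parameters, yet 3x the depth
```

Because per-layer cost grows quadratically with width but only linearly with depth, trading width for depth buys more layers per parameter, which is consistent with the finding above.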