
HPO4TabPFN

Scaling Down TabPFN: How Model Size Affects Optimal Training Configurations

Institution: University of Freiburg
Course: Deep Learning Lab (DLL25)
Supervisor: Johannes Hog
Tags: TabPFN, NePS, Muon, AdamW, Lion, SHAP, TabArena, PyTorch

Overview

How do you scale down a TabPFN? And which hyperparameters matter most at different scales?

TabPFN is a transformer-based model that solves small tabular classification problems in under a second—but it's large. This study investigates how to create smaller, more efficient versions (NanoTabPFN) while maintaining competitive performance.
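For context, TabPFN ships with a scikit-learn-style interface. The snippet below is a minimal sketch of that workflow on Breast Cancer, one of this study's evaluation datasets; it assumes the public tabpfn package and its standard fit/predict API, not any code from this project.

```python
# Minimal TabPFN usage sketch (assumes the public `tabpfn` package).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = TabPFNClassifier()          # pretrained transformer; no gradient training here
clf.fit(X_train, y_train)         # "fit" essentially stores the context set
print(clf.score(X_test, y_test))  # prediction is a single forward pass
```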

Using random search with no prior assumptions, we systematically explored how scaling strategy (width vs. depth vs. compound), model size, and optimizer choice affect both performance and hyperparameter sensitivity. Our evaluation used the TabArena benchmark alongside classic datasets (Iris, Wine, Breast Cancer).

The key finding: smaller and deeper models can outperform larger, shallower configurations, challenging the assumption that more parameters always mean better performance.

Model Scale Configurations

We tested four model scales with different width/depth tradeoffs:

Scale   Strategy  Layers  Embedding  MLP Hidden  Parameters
Big     Base      8       192        768         4.9M
Medium  Width     4       192        768         2.6M
Medium  Depth     8       140        560         2.6M
Medium  Compound  6       160        640         2.6M
Small   Width     2       192        768         1.35M
Small   Depth     8       100        400         1.35M
Small   Compound  4       140        560         1.35M
Mini    Compound  6       64         192         0.37M
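These parameter counts can be sanity-checked with a back-of-envelope estimate. The sketch below assumes TabPFN-style layers with two attention blocks (across features and across samples) plus a two-layer MLP, and ignores embeddings, encoders, and norms, so it lands slightly below the table's totals.

```python
# Back-of-envelope parameter count for one NanoTabPFN scale, assuming
# two attention blocks per layer (inter-feature and inter-sample),
# each with Q/K/V/O projections, plus a two-layer MLP. Embeddings,
# encoders/decoders, and norms are ignored.
def approx_params(layers: int, d_embed: int, d_mlp: int) -> int:
    attention = 8 * d_embed ** 2   # 2 blocks x 4 projections x d^2
    mlp = 2 * d_embed * d_mlp      # up- and down-projection
    return layers * (attention + mlp)

print(f"{approx_params(8, 192, 768) / 1e6:.2f}M")  # ~4.72M vs. 4.9M (Big)
print(f"{approx_params(6, 64, 192) / 1e6:.2f}M")   # ~0.34M vs. 0.37M (Mini)
```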

Key Findings

  • Compound Small (1.35M) outperformed larger models—the Pareto-optimal configuration wasn't the biggest one
  • Muon optimizer dominated—consistently beat AdamW, Lion, and AdamW Schedule-Free across all scales
  • Deeper > Wider—at equal parameter counts, deeper models slightly outperformed wider ones
  • Batch size correlates with scale—larger models benefited from larger batch sizes
  • Learning rate is critical—SHAP analysis showed LR and model scale as the most important hyperparameters
  • Smaller models need higher LR—inverse relationship between model size and optimal learning rate

Results

Performance was evaluated on the TabArena benchmark. The Compound Small configuration achieved the best accuracy-to-parameters ratio.

1st: Compound Small
2nd: Depth Small
3rd: Width Small
Mean real-data accuracy: 0.97

SHAP Hyperparameter Importance

SHAP analysis across all model configurations revealed the relative importance of each hyperparameter:

  • Learning Rate (LR): Highest importance scores (46.4 average)—the single most critical hyperparameter
  • Optimizer: Second most important (36.6 average)—Muon's superiority was consistent
  • Batch Size: Lower but scale-dependent importance (17.0 average)—matters more for larger models

Interestingly, smaller models showed higher sensitivity to learning rate, while larger models were more robust to LR variations but more sensitive to batch size choices.
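The poster does not detail the SHAP pipeline itself, but a common recipe for hyperparameter importance is to fit a surrogate model on the HPO trials and average absolute SHAP values per hyperparameter. The sketch below follows that recipe; the random-forest surrogate and the column names are illustrative assumptions, not taken from the study.

```python
# Hedged sketch: SHAP-based hyperparameter importance over HPO trials.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestRegressor

def shap_importance(configs: pd.DataFrame, scores: np.ndarray) -> pd.Series:
    """configs: one row per trial (e.g. log_lr, batch_size, optimizer_id,
    model_scale); scores: validation accuracy per trial."""
    surrogate = RandomForestRegressor(n_estimators=300, random_state=0)
    surrogate.fit(configs, scores)
    shap_values = shap.TreeExplainer(surrogate).shap_values(configs)
    # Mean |SHAP| per column = global importance of that hyperparameter.
    return pd.Series(np.abs(shap_values).mean(axis=0), index=configs.columns)
```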

Methodology

The experimental pipeline consisted of:

  • Search Space: Learning rate sampled log-uniformly from [1e-5, 1e-2], batch size {32, 64, 128, 256}, optimizer {AdamW, Muon, Lion, AdamW Schedule-Free}
  • HPO Framework: Random search via NePS with no prior assumptions, ensuring unbiased exploration (see the sketch after this list)
  • Validation: Synthetic validation dataset (1600 samples) for HPO, real datasets (Iris, Wine, Breast Cancer) for final evaluation
  • Benchmark: TabArena for standardized comparison across configurations
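As referenced in the list above, the search space can be declared with NePS's dict-based API. The sketch below uses argument names from recent NePS releases (they have changed between versions), and the evaluation function is a stub standing in for the actual NanoTabPFN training loop.

```python
# Hedged sketch of the HPO search space in NePS (argument names follow
# recent releases; older versions use e.g. neps.FloatParameter).
import neps

def evaluate_pipeline(learning_rate: float, batch_size: int, optimizer: str) -> float:
    # Train NanoTabPFN with this configuration on the synthetic
    # validation set and return the validation loss.
    raise NotImplementedError("plug in the NanoTabPFN training loop here")

pipeline_space = dict(
    learning_rate=neps.Float(lower=1e-5, upper=1e-2, log=True),
    batch_size=neps.Categorical(choices=[32, 64, 128, 256]),
    optimizer=neps.Categorical(
        choices=["adamw", "muon", "lion", "adamw_schedule_free"]
    ),
)

neps.run(
    evaluate_pipeline=evaluate_pipeline,
    pipeline_space=pipeline_space,
    root_directory="hpo_results",
    max_evaluations_total=100,  # budget is illustrative
)
# The study used random search; selecting the searcher explicitly is
# version-dependent in NePS and omitted here.
```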

Research Poster

DLL25 - Team 31 - HPO4TabPFN (poster PDF)

Team

Vladyslav Moroshan (Co-Author)
Nastaran Alipour (Co-Author)
Jude Mingay (Co-Author)
Johannes Hog (Supervisor)