AutoML Text Classifier

A Modular AutoML System with Optimizer Selection through Low-Budget Proxy-Search-Space Evaluations

Institution: University of Freiburg
Course: AutoML (SS25)
Team: Auto Mates
Tech: DEHB · NePS · PriorBand · LSTM · Transformer · HPOSuite · PyTorch

Overview

This project tackles a core challenge in automated machine learning: how do you efficiently explore vast hyperparameter spaces across diverse neural architectures when compute is limited?

Text classification models—LSTMs, Transformers, CNNs—each have unique hyperparameter landscapes. Traditional HPO methods waste significant compute evaluating poor configurations. Our approach introduces a proxy search space: a lightweight benchmark that mirrors the structure of the full optimization problem but evaluates in a fraction of the time.

By running optimizer selection on this proxy first, we identified DEHB as significantly outperforming PriorBand and RandomSearch for our task—before committing any compute to the actual optimization.
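The selection step above can be sketched as a generic loop: run each candidate optimizer on the cheap proxy objective and keep whichever reaches the best score. The proxy function and both optimizer stand-ins below are illustrative placeholders, not the project's actual DEHB/PriorBand/HPOSuite code.

```python
import random

def proxy_objective(config):
    # Cheap stand-in for the real training loss (lower is better); the
    # project's proxy is a fast HPOSuite benchmark, not this toy formula.
    return (config["log_lr"] + 3.0) ** 2 + abs(config["hidden"] - 128) / 128

def random_search(objective, n_evals, rng):
    best = float("inf")
    for _ in range(n_evals):
        cfg = {"log_lr": rng.uniform(-5, -1),
               "hidden": rng.choice([32, 64, 128, 256])}
        best = min(best, objective(cfg))
    return best

def local_refine(objective, n_evals, rng):
    # Hypothetical second optimizer: random start, then perturb the incumbent.
    incumbent = {"log_lr": rng.uniform(-5, -1), "hidden": 128}
    best = objective(incumbent)
    for _ in range(n_evals - 1):
        cand = {"log_lr": incumbent["log_lr"] + rng.gauss(0, 0.5),
                "hidden": incumbent["hidden"]}
        score = objective(cand)
        if score < best:
            best, incumbent = score, cand
    return best

def select_optimizer(optimizers, objective, n_evals=50, seed=0):
    # Run every candidate on the cheap proxy and keep the best performer,
    # before any real compute is spent on the target task.
    scores = {name: opt(objective, n_evals, random.Random(seed))
              for name, opt in optimizers.items()}
    return min(scores, key=scores.get), scores

winner, scores = select_optimizer(
    {"random_search": random_search, "local_refine": local_refine},
    proxy_objective)
```

The winning optimizer is then handed the full budget on the real search space.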

Key Contributions

  • DL-Architecture-Agnostic Interface: Unified system allowing seamless integration of LSTM, Transformer, and CNN architectures with their specific preprocessing pipelines
  • Proxy Search Space for Optimizer Selection: Custom HPOSuite benchmark that maintains structural similarity to the target task while being 10x faster to evaluate
  • Token Dropping Regularization: Novel hyperparameter controlling random token dropout (padding-prioritized) that improved LSTM and Transformer training efficiency
  • Trend-Based Early Stopping: EMA-filtered learning curve analysis to terminate underperforming runs when gradient trends fall below threshold
  • Multi-Fidelity Optimization: Training epochs as fidelity dimension combined with 40% data subsampling for efficient configuration evaluation
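The padding-prioritized token dropping listed above can be sketched as follows. This is an illustrative stand-in, not the project's implementation: given a keep fraction, it discards padding positions before real tokens, so regularization pressure only reaches content tokens once the padding is exhausted.

```python
import random

def drop_tokens(token_ids, keep_frac, pad_id=0, rng=random):
    # Sketch of padding-prioritized token dropping (names illustrative):
    # keep roughly keep_frac of positions, dropping padding first.
    n_keep = max(1, round(keep_frac * len(token_ids)))
    # Lower priority = dropped first; padding gets strictly lower priority
    # than any real token, real tokens are dropped uniformly at random.
    priority = [(rng.random() - (1.0 if t == pad_id else 0.0), i)
                for i, t in enumerate(token_ids)]
    # Keep the n_keep highest-priority positions, in their original order.
    kept = sorted(sorted(priority, reverse=True)[:n_keep], key=lambda p: p[1])
    return [token_ids[i] for _, i in kept]

seq = [5, 9, 3, 7, 0, 0, 0, 0]       # four real tokens, four pad tokens
drop_tokens(seq, keep_frac=0.5)      # → [5, 9, 3, 7]: only padding dropped
```

Because padding is removed first, moderate keep fractions mostly shorten sequences (a speed win), while aggressive ones start acting as dropout on real tokens.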

Results

DEHB found a strong configuration within 4 hours of a 12-hour budget, demonstrating fast convergence on the hierarchical search space. The best model (LSTM) achieved competitive accuracy with minimal computational resources.

  • Test Accuracy: 0.661
  • Precision (Micro): 0.662
  • F1 Score (Micro): 0.660
  • Total GPU Time: 14h

Hyperparameter Importance

fANOVA analysis revealed that learning rate dominated hyperparameter importance (R² = 0.88), followed by our custom token-related parameters:

  • Learning Rate: 0.065 ± 0.015 importance score—by far the most critical parameter
  • Keep Token %: 0.025 ± 0.013—our token dropping regularization had measurable impact
  • Token Length: 0.022 ± 0.010—sequence length optimization mattered more than expected
  • CNN Filters: 0.016 ± 0.009—architecture-specific parameters showed lower but non-trivial importance
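As a hedged illustration of what fANOVA measures: a hyperparameter's importance is the share of performance variance explained by its marginal effect. Real fANOVA fits a random-forest surrogate over observed configurations; the sketch below instead uses a toy analytic loss surface over a small grid (all names and values illustrative) so the decomposition can be computed directly.

```python
import itertools
import statistics

def toy_loss(log_lr, keep_frac):
    # Toy surface: strongly curved in log_lr, mildly in keep_frac,
    # mirroring the learning-rate dominance reported above.
    return (log_lr + 3.0) ** 2 + 0.1 * (keep_frac - 0.8) ** 2

grid = {"log_lr": [-5, -4, -3, -2, -1],
        "keep_frac": [0.5, 0.65, 0.8, 0.95]}

# Evaluate the full product grid once.
evals = {cfg: toy_loss(*cfg) for cfg in itertools.product(*grid.values())}
total_var = statistics.pvariance(evals.values())

def marginal_importance(axis):
    # Variance of the marginal mean along one axis, as a fraction of the
    # total variance (fANOVA's first-order term).
    idx = list(grid).index(axis)
    means = [statistics.mean(y for cfg, y in evals.items() if cfg[idx] == v)
             for v in grid[axis]]
    return statistics.pvariance(means) / total_var

importances = {axis: marginal_importance(axis) for axis in grid}
```

For this additive toy surface the two first-order terms sum to 1; on real data the residual is absorbed by interaction terms, which is why the reported scores above do not sum to 1.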

Research Poster

SS25 AutoML Exam Poster (PDF)

Team

  • Jan Sander, Co-Author
  • Ojaswin Khamkar, Co-Author
  • Jude Mingay, Co-Author