TRL Seminar


Info

The Table Representation Learning (TRL) Seminar hosts talks on recent research in representation learning and generative models for structured data. Topics include fundamental mechanisms for modeling structured data, retrieval from tabular data sources, multi-modal tabular learning, and applications ranging from data management to reasoning and prediction over tabular data.

Organization

The TRL Seminar is an initiative of the TRL Lab under the affiliated Table Representation Learning Research Theme in the ELLIS unit Amsterdam, and is organized by Madelon Hulsebos (CWI).

Logistics
  • When: every second Friday of the month, 4-5pm, with drinks afterwards.
  • Where: room L3.36, LAB42 (University of Amsterdam), Science Park.
  • How: Talks are given in person, and are streamed and recorded through Zoom (link TBC).

Upcoming talks


TBC

Friday 16 May, L3.36 at LAB42, Amsterdam Science Park, live through Zoom

Bio: TBC

Abstract: TBC


Past talks


Marine Le Morvan, Inria

Friday 11 April, 4-5pm, L3.36 at LAB42, Amsterdam Science Park, in-person talk and streamed through Zoom

Bio: Marine Le Morvan is an Inria research scientist in the SODA team in Paris-Saclay. Her research lies at the intersection of statistical learning and trustworthy AI, with a focus on:
- Tabular foundation models, which unlock new possibilities through large-scale pretraining.
- Model auditing, to enhance the trustworthiness and reliability of machine learning systems.
- Learning from incomplete data, a challenge pervasive in fields like healthcare and social sciences.


“TabICL: A Tabular Foundation Model for In-Context Learning on Large Data”

Abstract: The long-standing dominance of gradient-boosted decision trees on tabular data is currently challenged by tabular foundation models using In-Context Learning (ICL): setting the training data as context for the test data and predicting in a single forward pass without parameter updates. While the very recent TabPFNv2 foundation model (2025) excels on tables with up to 10K samples, its alternating column- and row-wise attentions make handling large training sets computationally prohibitive. So, can ICL be effectively scaled and deliver a benefit for larger tables? We introduce TabICL, a tabular foundation model pre-trained on datasets with up to 60K samples and handling 500K samples on affordable resources. This is enabled by a novel two-stage architecture: a column-then-row attention mechanism to build fixed-dimensional embeddings of rows, followed by a transformer for efficient ICL. On the TALENT benchmark with 200 datasets, TabICL is on par with TabPFNv2 while being systematically faster (up to 10 times), and significantly outperforms all other approaches. On the 56 datasets with over 10K samples, TabICL surpasses both TabPFNv2 and CatBoost, demonstrating the potential of ICL for large data.
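To make the two-stage idea in the abstract concrete, below is a minimal, illustrative PyTorch sketch (not the authors' implementation): attention across columns builds a fixed-dimensional embedding per row, and a second transformer then performs in-context learning over the concatenated training and test rows in a single forward pass, with no parameter updates. All module names, dimensions, and the label-injection scheme are assumptions made for illustration; refer to the TabICL paper for the actual architecture.

# Illustrative sketch only, not the TabICL implementation.
# Stage 1: column-then-row attention -> one fixed-dimensional embedding per row.
# Stage 2: a transformer over [train rows (with labels) + test rows] predicts
# test labels in a single forward pass (in-context learning, no gradient updates).
import torch
import torch.nn as nn


class RowEmbedder(nn.Module):
    """Stage 1 (assumed layout): embed each cell, attend across columns, pool per row."""

    def __init__(self, d_model: int = 64):
        super().__init__()
        self.cell_proj = nn.Linear(1, d_model)               # scalar cell -> d_model vector
        self.col_attn = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)

    def forward(self, X: torch.Tensor) -> torch.Tensor:
        # X: (n_rows, n_features) -> cells: (n_rows, n_features, d_model)
        cells = self.cell_proj(X.unsqueeze(-1))
        cells = self.col_attn(cells)                          # mix information across columns
        return cells.mean(dim=1)                              # (n_rows, d_model) row embeddings


class ICLHead(nn.Module):
    """Stage 2 (assumed layout): transformer over rows; train rows carry their labels."""

    def __init__(self, d_model: int = 64, n_classes: int = 2):
        super().__init__()
        self.label_emb = nn.Embedding(n_classes + 1, d_model)  # last index = "label unknown"
        self.row_attn = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.out = nn.Linear(d_model, n_classes)

    def forward(self, train_emb, train_y, test_emb):
        unknown = torch.full((test_emb.size(0),),
                             self.label_emb.num_embeddings - 1, dtype=torch.long)
        tokens = torch.cat([train_emb + self.label_emb(train_y),
                            test_emb + self.label_emb(unknown)], dim=0)
        tokens = self.row_attn(tokens.unsqueeze(0)).squeeze(0)  # attention across rows
        return self.out(tokens[train_emb.size(0):])             # logits for test rows only


# Toy usage: predictions come from one forward pass, without any parameter updates.
if __name__ == "__main__":
    X_train, y_train = torch.randn(100, 8), torch.randint(0, 2, (100,))
    X_test = torch.randn(20, 8)
    embed, head = RowEmbedder(), ICLHead()
    with torch.no_grad():
        logits = head(embed(X_train), y_train, embed(X_test))
    print(logits.argmax(dim=-1))  # (20,) predicted classes for the test rows

In this toy setup the models are untrained; in the approach described above, both stages would be pre-trained at scale on synthetic or real tables so that the in-context pass alone yields useful predictions.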

Join the seminar remotely via Zoom: https://cwi-nl-zoom.zoom.us/j/86928893058?pwd=0tFURmzfFWXtWyN4xqkx15urhoui7b.1