TRL Seminar
Info
The Table Representation Learning (TRL) Seminar hosts talks on recent research in representation learning and generative models for structured data. Topics range from fundamental mechanisms for modeling structured data, retrieval from tabular data sources, and multi-modal tabular learning, to applications in data management and reasoning and prediction over tabular data.
Organization
The TRL Seminar is an initiative of the TRL Lab, affiliated with the Table Representation Learning Research Theme of the ELLIS unit Amsterdam, and is organized by Madelon Hulsebos (CWI).
Logistics
- When: every second Friday of the month, 4-5pm, with drinks afterwards.
- Where: room L3.36, LAB42 (University of Amsterdam), Science Park.
- How: Talks are held in person and are streamed and recorded through Zoom (link TBC).
Upcoming talks
Past talks
Marine Le Morvan, Inria
Friday 11 April 4-5pm, L3.36 at LAB42, Amsterdam Science Park, in-person talk and streamed through Zoom
Bio
Marine Le Morvan is an Inria research scientist in the SODA team in Paris-Saclay. Her research lies at the intersection of statistical learning and trustworthy AI, with a focus on:
- Tabular foundation models, which unlock new possibilities through large-scale pretraining.
- Model auditing, to enhance the trustworthiness and reliability of machine learning systems.
- Learning from incomplete data, a challenge pervasive in fields like healthcare and social sciences.
“TabICL: A Tabular Foundation Model for In-Context Learning on Large Data”
Abstract: The long-standing dominance of gradient-boosted decision trees on tabular data is currently challenged by tabular foundation models using In-Context Learning (ICL): setting the training data as context for the test data and predicting in a single forward pass without parameter updates. While the very recent TabPFNv2 foundation model (2025) excels on tables with up to 10K samples, its alternating column- and row-wise attentions make handling large training sets computationally prohibitive. So, can ICL be effectively scaled and deliver a benefit for larger tables? We introduce TabICL, a tabular foundation model pre-trained on datasets with up to 60K samples and handling 500K samples on affordable resources. This is enabled by a novel two-stage architecture: a column-then-row attention mechanism to build fixed-dimensional embeddings of rows, followed by a transformer for efficient ICL. On the TALENT benchmark with 200 datasets, TabICL is on par with TabPFNv2 while being systematically faster (up to 10 times), and significantly outperforms all other approaches. On the 56 datasets with over 10K samples, TabICL surpasses both TabPFNv2 and CatBoost, demonstrating the potential of ICL for large data.
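To make the in-context learning setup concrete, here is a minimal PyTorch sketch of the two-stage idea the abstract describes: column-then-row attention compresses each row into a fixed-dimensional embedding, and a second transformer attends from unlabeled test rows to labeled training rows in a single forward pass, with no parameter updates at prediction time. All class names, layer sizes, and pooling choices below are illustrative assumptions, not TabICL's actual implementation.

```python
import torch
import torch.nn as nn

class RowEmbedder(nn.Module):
    """Stage 1 (sketch): attention over columns, then over rows,
    yielding one fixed-dimensional vector per table row."""
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.cell_proj = nn.Linear(1, d_model)  # embed each scalar cell
        self.col_attn = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.row_attn = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)

    def forward(self, X):                        # X: (n_rows, n_cols)
        cells = self.cell_proj(X.unsqueeze(-1))  # (n_rows, n_cols, d)
        cells = self.col_attn(cells)             # attention across columns
        rows = cells.mean(dim=1)                 # pool columns -> (n_rows, d)
        return self.row_attn(rows.unsqueeze(0)).squeeze(0)  # attention across rows

class ICLHead(nn.Module):
    """Stage 2 (sketch): a transformer over [train rows + test rows];
    train rows carry their labels as an additive embedding, test rows
    get an 'unknown label' embedding."""
    def __init__(self, d_model=64, n_heads=4, n_classes=2):
        super().__init__()
        self.label_emb = nn.Embedding(n_classes + 1, d_model)  # last index = unknown
        self.encoder = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.out = nn.Linear(d_model, n_classes)

    def forward(self, train_rows, y_train, test_rows):
        unk = torch.full((test_rows.shape[0],), self.label_emb.num_embeddings - 1)
        ctx = torch.cat([train_rows + self.label_emb(y_train),
                         test_rows + self.label_emb(unk)], dim=0)
        h = self.encoder(ctx.unsqueeze(0)).squeeze(0)
        return self.out(h[train_rows.shape[0]:])  # logits for test rows only

emb, head = RowEmbedder(), ICLHead()
X_train, y_train = torch.randn(100, 8), torch.randint(0, 2, (100,))
X_test = torch.randn(20, 8)
rows = emb(torch.cat([X_train, X_test]))        # one joint pass over all rows
logits = head(rows[:100], y_train, rows[100:])  # predictions without parameter updates
```

The point of the structure is the one made in the abstract: because columns are pooled into a fixed-size row embedding before the second stage, the expensive attention runs over rows only, which is what makes scaling to large training sets feasible.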
Join the seminar remotely via Zoom: https://cwi-nl-zoom.zoom.us/j/86928893058?pwd=0tFURmzfFWXtWyN4xqkx15urhoui7b.1
Vaishali Pal, University of Amsterdam
Thursday 22 May 4-5pm, L3.33 at LAB42, Amsterdam Science Park, live through Zoom
Bio
Vaishali is a final-year PhD candidate at the Information Retrieval Lab at the University of Amsterdam. Her research interests are in natural language processing and information retrieval, with a focus on semi-structured tables.
“Table Question Answering”
Abstract: In this talk, I discuss my research on question answering over semi-structured tables. Semi-structured tables are fact-heavy and pose significant challenges to language models aiming to effectively meet a user's information needs. To illustrate these challenges, I discuss tasks such as question answering and summarization over multiple tabular contexts, as well as low-resource table question answering. Finally, I briefly discuss information retrieval over tables.
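For readers unfamiliar with the task, here is a toy, self-contained illustration (my own construction, not from the talk) of what a table question answering instance looks like, together with a deliberately trivial heuristic baseline; real systems use language models and retrieval rather than the hard-coded rule shown here.

```python
# A toy table QA instance: a semi-structured table plus a natural language
# question; the answer must be grounded in the table's cells.
table = {
    "header": ["City", "Country", "Population (M)"],
    "rows": [
        ["Amsterdam", "Netherlands", 0.92],
        ["Paris", "France", 2.10],
        ["Berlin", "Germany", 3.65],
    ],
}
question = "Which city has the largest population?"

def answer_superlative(table, column, largest=True):
    """Trivial baseline for superlative questions: pick the row that
    maximizes (or minimizes) a numeric column and return its first cell."""
    idx = table["header"].index(column)
    pick = max if largest else min
    return pick(table["rows"], key=lambda row: row[idx])[0]

print(answer_superlative(table, "Population (M)"))  # -> Berlin
```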
Join the seminar remotely via Zoom: https://cwi-nl-zoom.zoom.us/j/86928893058?pwd=0tFURmzfFWXtWyN4xqkx15urhoui7b.1