✨ 2nd Edition

Infrastructure for Evaluating
Novel HW/OS Interfaces

A full-day workshop and tutorial on the principles, methodologies, and practical aspects of hardware/OS co-design for memory management

📅 Date: Monday, March 23, 2026 (Full Day)
📍 Location: Room Ft. Duquesne, ASPLOS 2026, Pittsburgh, PA

More information on the ASPLOS 2026 website.

About the Workshop & Tutorial

Building on the success of our first workshop at MICRO 2025, we are excited to bring the Virtuoso Workshop to ASPLOS 2026! This second edition will continue to explore the frontier of hardware/OS co-design for memory management, featuring new talks, updated demos, and fresh perspectives from leading researchers.

Traditional computing systems face significant challenges due to rigid interfaces between hardware and operating systems (OS). These interfaces struggle to meet the performance, efficiency, and security demands of modern applications. For example, the growth in data requirements has turned virtual memory (VM) into a major performance bottleneck.

This has led to a paradigm shift towards hardware/OS co-design, where hardware components and OS mechanisms are designed in tandem to optimize the system. This tutorial and workshop will provide a comprehensive introduction to this area, focusing on memory management.

A core component will be a hands-on exploration of Virtuoso, a simulation framework that enables rapid prototyping and evaluation of HW/OS co-designs. Published at ASPLOS 2025, Virtuoso gives attendees an environment in which to experiment with co-design strategies and build practical skills. The workshop is designed for students, engineers, and researchers in computer architecture and operating systems.

📺 Livestream

🔴 Can't attend in person? Join us live!

The workshop will be livestreamed on YouTube. A replay will also be available afterwards.


Organizers


Konstantinos Kanellopoulos

ETH Zürich

Konstantinos Kanellopoulos is a PhD candidate at ETH Zurich, advised by Prof. Onur Mutlu. His research interests are at the intersection of hardware, software, and operating systems, focusing on performance, programmability, and security. More info on his webpage.


Andreas Kosmas Kakolyris

ETH Zürich

Andreas Kosmas Kakolyris is a PhD student at ETH Zurich in the SAFARI Research Group, working with Prof. Onur Mutlu. His research focuses on hardware/software co-design solutions for high-performance and robust memory systems.


Prof. Onur Mutlu

ETH Zürich

Onur Mutlu is a Professor of Computer Science at ETH Zürich. His research focuses on designing fundamentally energy-efficient, high-performance, and robust computing systems, with an emphasis on computer architecture, hardware security, and memory systems. He is an ACM Fellow and an IEEE Fellow and has received numerous honors and awards. He is passionate about making research and education widely accessible. More info on his webpage.

Talks Schedule & Invited Speakers

Morning Session — Invited Talks
08:00 – 08:30 | Konstantinos Kanellopoulos & Andreas Kosmas Kakolyris (ETH Zürich) | Introduction & Welcome
08:30 – 09:00 | Nick Lindsay (Yale University) | Understanding Address Translation Scaling Behaviors Using Hardware Event Counters

To stress-test the virtual memory system, computer architecture researchers are combining standard benchmark programs with synthetically generated inputs that induce large memory footprints. However, for these workloads the relationship between memory footprint and address translation overhead is poorly understood. Using hardware event counters, we characterize the performance of the Intel Haswell microarchitecture MMU across a range of benchmarks, memory footprints, and page sizes. We analyze these measurements using our Walk Cycles Per Instruction (WCPI) framework, which attributes address translation overhead to the program, TLB, MMU cache, and hardware page table walker. We find that address translation overhead often scales with the logarithm of memory footprint, with large pages increasing the footprint at which overhead becomes significant, and that up to 57% of all page table walks are either aborted or lie on the wrong path of speculative execution.
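
The abstract does not spell out the WCPI formula. The Python below is a rough sketch that assumes WCPI is total page-walk cycles divided by retired instructions, with both values read from hardware event counters. It computes WCPI and then fits the reported logarithmic trend, WCPI ≈ a + b·log2(footprint), by ordinary least squares. The function names are illustrative. The talk's per-component attribution (program, TLB, MMU cache, walker) is not modeled.

```python
import math


def wcpi(walk_cycles: int, instructions: int) -> float:
    """Walk cycles per instruction (assumed definition): page-walk cycles / retired instructions."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return walk_cycles / instructions


def fit_log_footprint(footprints_bytes, wcpi_values):
    """Ordinary least-squares fit of WCPI ~ a + b * log2(footprint); returns (a, b)."""
    xs = [math.log2(f) for f in footprints_bytes]
    n = len(xs)
    if n < 2 or n != len(wcpi_values):
        raise ValueError("need at least two (footprint, WCPI) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(wcpi_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, wcpi_values))
    b = sxy / sxx  # WCPI change per doubling of the footprint
    a = mean_y - b * mean_x
    return a, b
```

Suppose you feed in WCPI values measured at footprints of 1 GiB, 2 GiB, 4 GiB, and so on. A positive `b` then estimates how much WCPI grows each time the footprint doubles.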

09:00 – 09:30 | Kaiyang Zhao (Carnegie Mellon University) | Scaling Memory in Data Centers: From Learned Page Tables to CXL-Attached Tiered Memory

Memory in data centers has run into a scaling problem. With the unrelenting increase in memory capacity, virtual memory does not scale and causes frequent processor stalls. At the same time, the cost of DRAM has skyrocketed as capacity increases, and DRAM now accounts for around half of server TCO in data centers. This talk presents LVM (MICRO '25), which introduces a learning-based approach to page tables that drastically reduces the cost of page walks. It then shares lessons and insights from ongoing work on CXL-attached memory, which enables tiered memory in multi-tenant and multi-tier environments and lowers the cost of DRAM in data centers.
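
To make the idea of a learned page table concrete, here is a toy learned-index lookup in Python. It is a generic sketch, not LVM's design. A linear model predicts where a virtual page number (VPN) sits in a sorted array of mappings. The model's worst-case error then bounds a short binary search, so a translation examines a small window instead of walking a multi-level radix tree. The class and method names are hypothetical.

```python
import bisect


class LearnedPageTable:
    """Toy learned index over VPN -> PFN mappings (hypothetical; not LVM's design)."""

    def __init__(self, mappings: dict):
        if not mappings:
            raise ValueError("need at least one mapping")
        self.vpns = sorted(mappings)
        self.pfns = [mappings[v] for v in self.vpns]
        n = len(self.vpns)
        # Fit position ~ slope * vpn + intercept by ordinary least squares.
        mean_x = sum(self.vpns) / n
        mean_y = (n - 1) / 2
        sxx = sum((v - mean_x) ** 2 for v in self.vpns)
        sxy = sum((v - mean_x) * (i - mean_y) for i, v in enumerate(self.vpns))
        self.slope = sxy / sxx if sxx else 0.0
        self.intercept = mean_y - self.slope * mean_x
        # The worst-case prediction error bounds every later search window.
        self.max_err = max(abs(self._predict(v) - i) for i, v in enumerate(self.vpns))

    def _predict(self, vpn: int) -> int:
        pos = round(self.slope * vpn + self.intercept)
        return min(max(pos, 0), len(self.vpns) - 1)

    def translate(self, vpn: int):
        """Return the PFN mapped at vpn, or None if unmapped (a page fault)."""
        guess = self._predict(vpn)
        lo = max(guess - self.max_err, 0)
        hi = min(guess + self.max_err + 1, len(self.vpns))
        i = bisect.bisect_left(self.vpns, vpn, lo, hi)
        if i < hi and self.vpns[i] == vpn:
            return self.pfns[i]
        return None
```

The search window shrinks as the VPN distribution becomes more regular. Skewed layouts inflate `max_err`, which is why practical learned indexes typically use several models rather than one.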

09:30 – 10:00 | Konstantinos Kanellopoulos (ETH Zürich) | Victima: Drastically Increasing Address Translation Reach by Leveraging Underutilized Cache Resources

Address translation is a performance bottleneck in data-intensive workloads due to large datasets and irregular access patterns that lead to frequent high-latency page table walks (PTWs). We present Victima, a new software-transparent mechanism that drastically increases the translation reach of the processor by leveraging the underutilized resources of the cache hierarchy. The key idea of Victima is to repurpose L2 cache blocks to store clusters of TLB entries, thereby providing an additional low-latency and high-capacity component that backs up the last-level TLB and thus reduces PTWs. Victima has two main components: a PTW cost predictor that identifies costly-to-translate addresses, and a TLB-aware cache replacement policy that prioritizes keeping TLB entries in the cache hierarchy. In native (virtualized) execution environments, Victima improves average end-to-end application performance by 7.4% (28.7%) over the baseline four-level radix-tree-based page table design, is completely transparent to application and system software, and incurs very small area and power overheads.
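
The TLB-aware replacement idea can be sketched for a single L2 set. The policy below is a deliberately simplified, hypothetical stand-in for Victima's actual policy. Each block carries a flag marking it as a cluster of TLB entries. Eviction prefers the least-recently-used data block, and a TLB block is evicted only when the set holds nothing else. The PTW cost predictor is reduced to a fixed latency threshold, which is an assumption for illustration.

```python
from collections import OrderedDict
from typing import Optional


class TLBAwareCacheSet:
    """One cache set holding data blocks and TLB-entry blocks (hypothetical policy)."""

    def __init__(self, ways: int):
        self.ways = ways
        # tag -> is_tlb_block, ordered from least to most recently used
        self.blocks = OrderedDict()

    def access(self, tag: int, is_tlb_block: bool = False) -> Optional[int]:
        """Touch a block, inserting it on a miss. Returns the evicted tag, if any."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)  # hit: promote to most recently used
            return None
        victim = None
        if len(self.blocks) >= self.ways:
            # Prefer the LRU data block; evict the LRU TLB block only if no data block exists.
            victim = next((t for t, tlb in self.blocks.items() if not tlb),
                          next(iter(self.blocks)))
            del self.blocks[victim]
        self.blocks[tag] = is_tlb_block
        return victim


def costly_to_translate(walk_latency_cycles: int, threshold: int = 100) -> bool:
    """Toy PTW cost predictor: a walk slower than `threshold` cycles counts as costly."""
    return walk_latency_cycles > threshold
```

A real design would also need to keep TLB blocks from monopolizing a set. Inserting TLB blocks only for addresses the predictor flags as costly is one plausible way to do that.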

10:00 – 10:30 | ☕ Coffee Break
10:30 – 11:00 | Deepanjali Mishra (Carnegie Mellon University) | I-Fuse: Profile-Guided Speculative Load Fusion for Memory-Intensive Data Centers

Modern processors make thousands of fine-grained decisions about memory every microsecond, guided by implicit, hardware-only models. For modern data center applications, these isolated hardware mechanisms leave significant performance on the table. Data center applications are heavily memory-intensive and frequently backend-bound, with a substantial fraction of stalls arising from load operations that access the same cache line but are executed independently. I-Fuse is a profile-guided and speculative load micro-op pair fusion mechanism that rethinks the traditional interface between software and microarchitecture. It combines Profile-Guided Fusion, which identifies high-confidence fusible load pairs using dynamic execution characteristics, with Speculative Fusion, which predicts and reserves resources for a partner micro-op early. Across backend-bound data center applications, I-Fuse achieves substantial IPC improvements and significantly outperforms prior hardware-only approaches.
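
The profile-guided half of the idea can be illustrated with a small Python pass over a load trace. This is a hypothetical heuristic, not I-Fuse's algorithm. It counts how often each pair of consecutive static loads (identified by PC) touches the same cache line, then keeps pairs whose same-line rate clears a confidence threshold. The 64-byte line size, the threshold, and the minimum sample count are assumptions.

```python
from collections import Counter

CACHE_LINE = 64  # bytes; assumed cache line size


def fusible_pairs(load_trace, threshold=0.9, min_samples=8):
    """Return {(pc_a, pc_b): confidence} for consecutive load pairs that usually
    access the same cache line. load_trace is a list of (pc, address) tuples in
    program order. Hypothetical profiling heuristic, not I-Fuse's algorithm."""
    seen = Counter()
    same_line = Counter()
    for (pc_a, addr_a), (pc_b, addr_b) in zip(load_trace, load_trace[1:]):
        pair = (pc_a, pc_b)
        seen[pair] += 1
        if addr_a // CACHE_LINE == addr_b // CACHE_LINE:
            same_line[pair] += 1
    return {
        pair: same_line[pair] / n
        for pair, n in seen.items()
        if n >= min_samples and same_line[pair] / n >= threshold
    }
```

Real micro-op fusion also depends on register dependences and decode constraints, which this sketch ignores.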

11:00 – 11:30 | Stratos Psomadakis (NTUA) | Elastic Translations: A Hardware-Tailored OS Framework for Efficient TLB Coalescing

As the ever-increasing memory footprints of modern workloads strain virtual memory hardware, address translation has become a critical performance bottleneck. Modern architectures, such as ARMv8-A and RISC-V, support OS-assisted TLB coalescing to help alleviate this overhead, yet OS support remains lacking. This talk presents Elastic Translations, a hardware-tailored OS memory management framework that enables the OS to effectively and efficiently drive TLB coalescing hardware to tame the address translation overhead.
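
Some background on what OS-assisted TLB coalescing asks of the OS. With a 4 KiB granule, both the ARMv8-A contiguous hint and RISC-V's Svnapot extension (at its 64 KiB size) let one TLB entry cover 16 pages. This works only if three conditions hold:

  • the virtual and physical ranges are aligned to 16 pages
  • the physical range is contiguous
  • every page in the range is mapped with identical attributes

The Python below checks that eligibility condition over a toy page table. It is not Elastic Translations' algorithm, whose job is to make such runs exist in the first place. The dictionary page-table representation is hypothetical.

```python
# 16 x 4 KiB = 64 KiB: the span of the ARMv8-A contiguous hint and of RISC-V
# Svnapot's 64 KiB NAPOT size, assuming a 4 KiB translation granule.
PAGES_PER_BLOCK = 16


def coalescible_blocks(page_table):
    """page_table: dict vpn -> (pfn, perms). Returns the sorted starting VPNs of
    aligned 16-page runs that could be mapped by a single coalesced TLB entry."""
    result = []
    for base in sorted(v for v in page_table if v % PAGES_PER_BLOCK == 0):
        pfn0, perms0 = page_table[base]
        if pfn0 % PAGES_PER_BLOCK != 0:
            continue  # the physical run must be aligned as well
        if all(page_table.get(base + i) == (pfn0 + i, perms0)
               for i in range(PAGES_PER_BLOCK)):
            result.append(base)
    return result
```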

11:30 – 12:00 | Ataberk Olgun (ETH Zürich) | MetaSys: A Practical Open-Source Metadata Management System to Implement and Evaluate Cross-Layer Optimizations

Hardware-software cooperative techniques are powerful approaches to improving the performance, quality of service, and security of general-purpose processors. However, they are typically challenging to implement and evaluate rapidly in real hardware, as they require full-stack changes to the hardware, OS, system software, and ISA. MetaSys is the first open-source FPGA-based infrastructure, with a prototype in a RISC-V core, to enable the rapid implementation and evaluation of a wide range of cross-layer techniques in real hardware. MetaSys implements a rich hardware-software interface and lightweight metadata support that can serve as a common basis for rapidly implementing and evaluating new cross-layer techniques. We demonstrate MetaSys's versatility by implementing three cross-layer techniques for prefetching, bounds checking, and return address protection, each requiring only ~100 lines of Chisel code.
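
To illustrate what lightweight metadata support enables, the sketch below models a metadata table in Python. Software tags address ranges with a value, and the access path looks up the tag to enforce bounds, which is one of the three techniques named above. The sketch is hypothetical throughout. The classes, the `register`/`lookup` methods, and `checked_load` are illustrative names, not MetaSys's actual hardware-software interface. Ranges are assumed not to overlap.

```python
import bisect


class BoundsViolation(Exception):
    """Raised when an access falls outside the range tagged for an object."""


class MetadataTable:
    """Toy range -> metadata store (hypothetical; not MetaSys's interface).

    Software registers non-overlapping [start, start + size) ranges with a
    metadata value; the access path looks the value up on every access.
    """

    def __init__(self):
        self.starts = []   # sorted range start addresses
        self.entries = []  # (start, end, meta), kept in the same order

    def register(self, start: int, size: int, meta) -> None:
        i = bisect.bisect_left(self.starts, start)
        self.starts.insert(i, start)
        self.entries.insert(i, (start, start + size, meta))

    def lookup(self, addr: int):
        """Return the metadata of the range containing addr, or None."""
        i = bisect.bisect_right(self.starts, addr) - 1
        if i >= 0:
            start, end, meta = self.entries[i]
            if start <= addr < end:
                return meta
        return None


def checked_load(table: MetadataTable, object_id: int, addr: int) -> None:
    """Bounds check: allow the access only if addr lies in a range tagged object_id."""
    if table.lookup(addr) != object_id:
        raise BoundsViolation(f"access to {addr:#x} is outside object {object_id}")
```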

12:00 – 13:30 | 🍽️ Lunch
Afternoon Session — Virtuoso Deep Dive
13:30 – 14:00 | Konstantinos Kanellopoulos & Konstantinos Sgouras (ETH Zürich) | Virtuoso Core Design
14:00 – 14:30 | Konstantinos Kanellopoulos & Konstantinos Sgouras (ETH Zürich) | New Features
14:30 – 15:00 | Andreas Kosmas Kakolyris & Nisa Bostanci (ETH Zürich) | Ramulator + Virtuoso
Hands-On Tutorial Block
15:00 – 15:30 | Konstantinos Kanellopoulos (ETH Zürich) | Tutorial Part 1: Setup & Overview
15:30 – 16:00 | Konstantinos Kanellopoulos (ETH Zürich) | Tutorial Part 2: Core Concepts
16:00 – 16:30 | Konstantinos Kanellopoulos (ETH Zürich) | Tutorial Part 3: Hands-On Walkthrough
16:30 – 17:00 | Konstantinos Kanellopoulos (ETH Zürich) | Tutorial Part 4: Advanced Features & Examples
17:00 – 17:30 | Konstantinos Kanellopoulos (ETH Zürich) | Tutorial Part 5: Q&A & Open Discussion

🎤 Invited Speakers


Nick Lindsay

Yale University

Talk: Understanding Address Translation Scaling Behaviors Using Hardware Event Counters

Bio: Nick Lindsay is a Ph.D. candidate in Computer Science at Yale University, advised by Professor Abhishek Bhattacharjee. His research interests lie at the intersection of microarchitecture, performance modeling, and formal methods, with a focus on the memory system. At Yale, he has worked on brain-computer interfaces, memory-intensive workload characterization, and on studying the memory management unit implementation on modern processors. Nick is joining AMD Research and Advanced Development in August. More info on his webpage.


Kaiyang Zhao

Carnegie Mellon University

Talk: Scaling Memory in Data Centers: From Learned Page Tables to CXL-Attached Tiered Memory

Bio: Kaiyang Zhao is a fifth-year PhD student in Computer Science at Carnegie Mellon University, advised by Professor Dimitrios Skarlatos. He works on the efficiency and scalability of memory in data centers, approaching these problems at the intersection of computer architecture and operating systems. His awards include an ISCA Best Paper Award, an IEEE MICRO Top Picks selection, and a Qualcomm Innovation Fellowship.


Deepanjali Mishra

Carnegie Mellon University

Talk: I-Fuse: Profile-Guided Speculative Load Fusion for Memory-Intensive Data Centers

Bio: Deepanjali Mishra is a Ph.D. student in the Electrical and Computer Engineering (ECE) Department at Carnegie Mellon University, advised by Prof. Akshitha Sriraman. Her research bridges computer architecture and software systems, demonstrating the importance of that bridge in enabling efficient and sustainable hyperscale data center systems via solutions that span the compute stack. Her research has been recognized with the ASPLOS 2026 WICArch Early-Career Researcher Fellowship, the 2024 Carnegie Institute of Technology Dean's Fellowship, and the 2022 ACM-W Scholarship.


Stratos Psomadakis

National Technical University of Athens (NTUA)

Talk: Elastic Translations: A Hardware-Tailored OS Framework for Efficient TLB Coalescing

Bio: Stratos Psomadakis is a postdoctoral researcher at the Computing Systems Laboratory (CSLab), National Technical University of Athens (NTUA). His research lies at the intersection of operating systems and computer architecture, with a focus on virtual memory, memory management, and efficient sandboxing of serverless applications. His work has received two best paper awards (ASPLOS'25, HotStorage'25). In a previous life, he was a Cloud and Site Reliability Engineer at GRNET, and a Gentoo Linux developer.


Ataberk Olgun

ETH Zürich

Talk: MetaSys: A Practical Open-Source Metadata Management System to Implement and Evaluate Cross-Layer Optimizations

Bio: Ataberk Olgun is a computer architecture researcher and a Ph.D. student in the SAFARI Research Group at ETH Zürich, led by Prof. Onur Mutlu. He obtained his BSc and MSc degrees from TOBB ETÜ under Prof. Oğuz Ergin's supervision. His research interests lie primarily in designing reliable, performance- and energy-efficient memory systems from the ground up. More info on his webpage.


Nisa Bostanci

ETH Zürich

Talk: Ramulator + Virtuoso

Bio: Nisa Bostanci is a Ph.D. student in the SAFARI Research Group at ETH Zürich, led by Prof. Onur Mutlu. She completed her BSc and MSc degrees in Computer Engineering at TOBB ETÜ under Prof. Oğuz Ergin's supervision. Her research focuses on memory systems, particularly their security, reliability, and safety (robustness), and on designing effective and efficient solutions to robustness issues. More info on her webpage.

Hands-On Demonstration Plan

The goal of the demo is to provide attendees with a practical, hands-on experience using Virtuoso to prototype and evaluate hardware/OS co-design techniques for memory management.

Part 1: Measuring Memory Management Overheads with eBPF

In this hands-on demo, attendees will use eBPF (extended Berkeley Packet Filter) to instrument and measure the overheads of memory management operations at the OS level. This software-based approach provides a lightweight, production-safe way to observe page faults and other memory management events in real time, without modifying the kernel. Participants will learn how to write and deploy eBPF programs to collect fine-grained performance data and identify bottlenecks in virtual memory management.
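
Before writing eBPF programs, it can help to sanity-check page-fault counts from user space. The Python below does not use eBPF. Instead it swaps in the POSIX `getrusage()` minor-fault counter (via the `resource` module, so it is Unix-only). It times the first write to each page of a fresh private anonymous mapping, which gives a coarse per-fault cost. The timing includes interpreter overhead, so it overstates the kernel's share. The function name and default page count are illustrative. An eBPF probe would observe the same faults from inside the kernel with much finer attribution.

```python
import mmap
import resource
import time


def measure_minor_faults(num_pages: int = 64):
    """Touch num_pages fresh anonymous pages; return (minor faults, seconds per fault).

    Uses getrusage() counters rather than eBPF; Unix-only.
    """
    size = num_pages * mmap.PAGESIZE
    # fileno=-1 requests an anonymous mapping; MAP_PRIVATE keeps it process-local.
    buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE)
    try:
        before = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
        start = time.perf_counter()
        for page in range(num_pages):
            buf[page * mmap.PAGESIZE] = 1  # first write to each page triggers a fault
        elapsed = time.perf_counter() - start
        faults = resource.getrusage(resource.RUSAGE_SELF).ru_minflt - before
    finally:
        buf.close()
    return faults, (elapsed / faults if faults else float("nan"))
```

On Linux, the fault count is often close to `num_pages`, since each untouched anonymous page typically faults on first write. Kernels that allocate larger folios on fault can report fewer.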

Part 2: Accelerating Page Faults in Hardware with Virtuoso

Building on the insights from Part 1, this demo will explore a hardware-accelerated approach to page fault handling. Attendees will use the Virtuoso simulation framework to prototype and evaluate a hardware mechanism that offloads page fault processing from the OS, significantly reducing the overhead identified in the first part. This hands-on session will demonstrate Virtuoso's full workflow, from defining a hardware/OS co-design to running simulations and analyzing performance results.
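
Amdahl's law offers a quick way to connect the two parts. Suppose Part 1 shows that page-fault handling takes a fraction f of runtime, and a hardware mechanism makes that handling s times faster. Assuming nothing else changes, the end-to-end speedup is then 1 / ((1 − f) + f / s). The Python below is a back-of-the-envelope helper under that assumption; it ignores any new overheads the hardware might introduce. The inputs in the example are hypothetical, not results from Virtuoso.

```python
def offload_speedup(fault_fraction: float, fault_speedup: float) -> float:
    """Amdahl's-law estimate of end-to-end speedup when only page-fault handling
    (fault_fraction of baseline runtime) becomes fault_speedup times faster."""
    if not 0.0 <= fault_fraction <= 1.0 or fault_speedup <= 0:
        raise ValueError("need 0 <= fault_fraction <= 1 and fault_speedup > 0")
    return 1.0 / ((1.0 - fault_fraction) + fault_fraction / fault_speedup)
```

With hypothetical inputs f = 0.3 and s = 10, `offload_speedup(0.3, 10)` evaluates to 1 / 0.73, about 1.37x. Even an infinitely fast fault path could not exceed 1 / (1 − f), about 1.43x.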

What to Expect

Based on our MICRO 2025 experience, attendees can expect:

  • Environment setup and quickstart for both eBPF tools and Virtuoso
  • Guided, step-by-step implementation of each demo
  • A complete journey from measuring real-system overheads (Part 1) to prototyping hardware solutions (Part 2)
  • Practical skills in eBPF instrumentation and hardware/OS co-design simulation with Virtuoso

Prerequisites for Attendees

Detailed prerequisites will be announced closer to the event. In general, attendees should have:

Basic Requirements

  • Laptop with admin/root access (macOS, Linux, or Windows with WSL2)
  • A terminal application and text editor (we recommend VSCode)
  • Basic familiarity with C/C++ and command-line tools
  • Interest in computer architecture and/or operating systems

Options for Participation

  • Cloud VMs: We will provide access to pre-configured VMs (details closer to the event)
  • Local Setup: At least 8GB RAM and 40GB free disk space, with Git and Docker installed
  • Remote Participation: Follow along via livestream if you cannot meet the requirements
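
For the local-setup option, a short Python script can pre-check the listed requirements. It is a rough sketch under a few assumptions:

  • "GB" is read as 10^9 bytes
  • RAM is read with `os.sysconf`, which works on Linux and typically macOS but not native Windows, so Windows users should run it inside WSL2
  • the check only confirms that `git` and `docker` are on the `PATH`, not that the Docker daemon is running

```python
import os
import shutil

MIN_RAM_BYTES = 8 * 10**9    # "at least 8GB RAM" (decimal GB assumed)
MIN_DISK_BYTES = 40 * 10**9  # "40GB free disk space" (decimal GB assumed)


def check_local_setup(path: str = ".") -> dict:
    """Rough pre-flight check for the local-setup option."""
    try:
        ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (ValueError, OSError, AttributeError):
        ram = None  # e.g. native Windows has no os.sysconf
    free = shutil.disk_usage(path).free
    return {
        "ram_ok": ram is not None and ram >= MIN_RAM_BYTES,
        "disk_ok": free >= MIN_DISK_BYTES,
        "git_found": shutil.which("git") is not None,
        "docker_found": shutil.which("docker") is not None,
    }


if __name__ == "__main__":
    for item, ok in check_local_setup().items():
        print(f"{item:13} {'yes' if ok else 'NO'}")
```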