Marcelo Orenes Vera

Computer Science PhD Candidate

Princeton University

Biography

I am a Ph.D. candidate in Computer Science at Princeton University, advised by Professor Margaret Martonosi and Professor David Wentzlaff.

My research focuses on Computer Architecture, from hardware RTL design and verification to software programming models for novel architectures. I have previously worked in the hardware industry at Arm, contributing to the design and verification of three GPU projects; at Cerebras Systems, creating High-Performance Computing kernels; and at AMD Research, working towards designing next-generation data centers optimized for large graph data structure traversal. At Princeton, I have contributed to two academic chip tapeouts that aim to improve the performance, power, and programmability of several emerging workflows in the broad areas of Machine Learning and Graph Analytics.

Interests

  • Computer Architecture
  • Heterogeneous Parallelism
  • Software-Defined Hardware
  • Domain-Specific Architectures
  • Data-Centric Architectures
  • Memory-Bound Workloads
  • Software-Hardware Co-design
  • Large Manycores
  • HPC
  • SoCs
  • Hardware Design & Verification
  • Formal Verification
  • Security

Education

  • PhD in Computer Science, Current

    Princeton University

    Dissertation: Navigating Heterogeneity and Scalability in Modern Chip Design

  • Master's in Computer Science, 2021

    Princeton University

    GPA: 3.95/4.0

  • BSc in Computer Science, 2017

    University of Murcia (Spain)

    GPA: 9.65/10. Ranked 1st of the class.

    Thesis: An Indoor Location and Guidance System with Automated User Trajectory Analysis.

  • International exchange. Computer Science, 2015-2016

    University of Hasselt (Belgium)

Publications

Conference Publications


Muchisim: A Simulation Framework for Design Exploration of Multi-Chip Manycore Systems.

In the International Symposium on Performance Analysis of Systems and Software (ISPASS), 2024

Using LLMs to Facilitate Formal Verification of RTL.

AutoCC: Automatic discovery of Covert Channels in Time-Shared Hardware

In the 56th International Symposium on Microarchitecture (MICRO). IEEE/ACM 2023

Tascade: Hardware Support for Atomic-free, Asynchronous and Efficient Reduction Trees.

DCRA: A Distributed Chiplet-based Reconfigurable Architecture for Irregular Applications.

Wafer-Scale Fast Fourier Transforms.

In the 37th International Conference on Supercomputing (ICS '23).

Dalorex: A Data-Local Program Execution and Architecture for Memory-bound Applications

In the 29th IEEE Symposium on High-Performance Computer Architecture (HPCA '23)

Cohort: Software-Oriented Acceleration for Heterogeneous SoCs

In Proc. of 28th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), ACM 2023

DECADES: A 67mm2, 1.46TOPS, 55 Giga Cache-Coherent 64-bit RISC-V Instructions per second, Heterogeneous Manycore SoC with 109 Tiles including Accelerators, Intelligent Storage, and eFPGA in 12nm FinFET

In Proc. of the Custom Integrated Circuits Conference (CICC), IEEE 2023

CIFER: A 12nm, 16mm2, 22-Core SoC with a 1541 LUT6/mm2, 1.92 MOPS/LUT, Fully Synthesizable, Cache-Coherent, Embedded FPGA

In Proc. of the Custom Integrated Circuits Conference (CICC), IEEE 2023

Tiny but Mighty: Designing and Realizing Scalable Latency Tolerance for Manycore SoCs

In the 49th Annual International Symposium on Computer Architecture (ISCA '22)
IEEE MICRO Top Picks honorable mention

AutoSVA: Democratizing Formal Verification of RTL Module Interactions

In the 58th ACM/IEEE Design Automation Conference (DAC '21)

MosaicSim: A Lightweight, Modular Simulator for Heterogeneous Systems

In the International Symposium on Performance Analysis of Systems and Software (ISPASS), 2020
Nominated for Best Paper Award

A Simulator and Compiler Framework for Agile Hardware-Software Co-design Evaluation

In International Conference on Computer Aided Design (ICCAD). ACM/IEEE 2020

Journal Publications


CIFER: A Cache-Coherent 12nm 16mm2 SoC With Four 64-Bit RISC-V Application Cores, 18 32-Bit RISC-V Compute Cores, and a 1541 LUT6/mm2 Synthesizable eFPGA

In Solid-State Circuits Letters. IEEE 2023

RECITE: A Framework for User Trajectory Analysis in Cultural Sites

In Journal of Ambient Intelligence and Smart Environments (JAISE). 2021

Experience

PhD candidate

Princeton University

Sep 2019 – Present, Princeton, New Jersey
  • Designing full-stack approaches to optimize the performance, power, and programmability of graph and other sparse analytics that are at the heart of many modern big data applications. Contributing to the development of a specialized, reconfigurable hardware platform for accelerating different software applications as part of DARPA’s Software Defined Hardware (SDH) program.

Research and Development

Advanced Micro Devices Inc.

January 2023 – August 2023, Austin, Texas
  • Massively parallelize kernels that traverse sparse data structures and analyze bottlenecks when scaling graph traversal to petabyte-scale graphs.
  • Design hardware accelerators for key functionality that is common within these kernels.
  • Characterize end-to-end applications that are key for the intelligence community, such as knowledge graphs, to inform the design of the next generation of AMD’s data-center systems.
  • Determine the granularity and integration of memory and compute for such systems.

Application kernel engineer

Cerebras Systems Inc.

May 2021 – June 2022, Sunnyvale, California
  • Analyze performance bottlenecks of different mappings of massively parallel applications.
  • Use state-of-the-art parallelization and partitioning techniques to automate generation, exploiting hand-written distributed kernels such as FFT (published at ICS), BFS/SSSP, and SpMV.
  • Evaluate the performance of different HPC applications on the Cerebras WSE. Employ and extend state-of-the-art program analysis methods such as the Integer Set Library.

Hardware engineer

Arm Ltd.

Jul 2017 – Aug 2019, Trondheim, Norway
  • Investigated and designed optimizations to the current implementation of several modules within the Texture Mapper of the Mali GPU, specifically the Texture Cache and its miss path. Developed multilevel cache systems and optimized replacement policies for performance and bandwidth savings. Also worked on the main data-access pipeline of an out-of-order execution architecture.
  • Formal verification at the unit level and bug-hunting approaches for liveness properties between units. Used UVM for the simulation testbench of the texturing unit.

Undergraduate Research Assistant

University of Murcia

Feb 2016 – Jul 2017, Murcia, Spain
  • Developed a platform for indoor location based on Bluetooth 4.0 LE technology and analyzed user trajectories using clustering techniques.

Contact