ABOUT ME

I am a fourth-year Ph.D. student in the Department of Electrical and Computer Engineering at the University of Wisconsin–Madison, advised by Prof. Tsung-Wei (TW) Huang. My research focuses on task graph partitioning, parallel computing, and design automation. I program mainly in C/C++ and CUDA C++. Recently, I have been working on memory-bound GPU acceleration problems, including Finite-Difference Time-Domain (FDTD) simulation and statistical static timing analysis (SSTA). With strong support from my research group and collaborators, I have published papers at top-tier conferences including DAC, ICCAD, and ISPD. Outside of research, I enjoy driving manual cars. Save the manuals!

Experience

Software Intern

2024.5 - 2024.8
Cadence
  • Worked on GPU-accelerated Statistical Static Timing Analysis.

Research Assistant

2023.9 - Present
University of Wisconsin-Madison
  • Designed PASTA, a fast task-graph partitioner for static timing analysis (STA), with both a parallel CPU version (C-PASTA, ISPD’24) and a GPU version (G-PASTA, DAC’24). PASTA reduces task-scheduling overhead by partitioning the original task graph into a smaller graph without sacrificing much of its parallelism.
  • Designed iTAP, an incremental task graph partitioner for STA built on top of PASTA (ASP-DAC’25). iTAP further reduces partitioning runtime by incrementally updating the partitioned task graph, avoiding a full repartition when only a small portion of the graph changes.
  • Implemented an efficient GPU kernel for Finite-Difference Time-Domain (FDTD) simulation that applies diamond tiling to exploit temporal data reuse in shared memory, achieving ~40% speedup over a state-of-the-art implementation on a 4M-cell problem.
  • Designed G-STAR, a GPU-accelerated statistical static timing analysis algorithm using level-by-level replication. G-STAR aims to enable efficient levelized data propagation for memory-bound workloads whose full level list cannot fit in GPU memory at once.

Research Assistant

2023.1 - 2023.8
University of Utah
  • Implemented various task graph partitioners (Repcut, Vivek’s, GDCA) to speed up the update timing process in OpenTimer.

Research Assistant

2021.6 - 2022.6
Rutgers University
  • Implemented an R-tree motion planning algorithm for 2-D space search in MATLAB.
  • Implemented the inference phase of a simple five-layer CNN, based on a systolic array architecture, on a PYNQ-Z1 FPGA board. The model is pre-trained on a PC with PyTorch and its data is loaded into DRAM through UART. The CNN computation is implemented in Verilog on the FPGA fabric of the board. Data transfer is controlled by the Zynq SoC processor, programmed in C.
  • Implemented a parser in Java with the RapidWright library to extract placement information from Xilinx’s Vivado Design Suite. The parser output can be used as input to academic placers to improve on Vivado’s placement results. The improved placement can then be fed back into Vivado to complete routing.

Teaching Assistant

2026.1 - 2026.5
University of Wisconsin-Madison

ECE 376 Electrical and Electronic Circuits.

  • Led the course discussion.

Teaching Assistant

2022.9 - 2022.12
Rutgers University

ECE 14:332:231 Digital Logic Design.

  • Designed the lab materials and policies.
  • Designed the final project (a simple RISC-V architecture written in SystemVerilog).

Projects

iTAP - Incremental Task Graph Partitioner for Task-parallel Static Timing Analysis
G-PASTA - GPU-Accelerated Partitioning Algorithm for Static Timing Analysis
C-PASTA - Parallel CPU Partitioning Algorithm for Static Timing Analysis

Selected Publications

Conference Papers

  • iTAP: An Incremental Task Graph Partitioner for Task-parallel Static Timing Analysis
    Boyang Zhang, Che Chang, Cheng-Hsiang Chiu, Dian-Lun Lin, Yang Sui, Chih-Chun Chang, Yi-Hua Chung, Wan-Luan Lee, Zizheng Guo, Yibo Lin, and Tsung-Wei Huang
    IEEE/ACM Asia and South Pacific Design Automation Conference (ASP-DAC), Tokyo, Japan, 2025
  • G-PASTA: GPU-Accelerated Partitioning Algorithm for Static Timing Analysis
    Boyang Zhang, Dian-Lun Lin, Che Chang, Cheng-Hsiang Chiu, Bojue Wang, Wan-Luan Lee, Chih-Chun Chang, Donghao Fang, and Tsung-Wei Huang
    ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, 2024
  • Parallel and Heterogeneous Timing Analysis: Partition, Algorithm, and System
    Tsung-Wei Huang, Boyang Zhang, Dian-Lun Lin, and Cheng-Hsiang Chiu
    ACM International Symposium on Physical Design (ISPD), Taipei, Taiwan, 2024
  • Global Placement Exploiting Soft 2D Regularity
    Donghao Fang, Boyang Zhang, Hailiang Hu, Wuxi Li, Bo Yuan, and Jiang Hu
    ACM International Symposium on Physical Design (ISPD), New York, NY, USA, 2022
  • Algorithm and Hardware Co-design for Deep Learning-powered Channel Decoder: A Case Study
    Boyang Zhang, Yang Sui, Lingyi Huang, Siyu Liao, Chunhua Deng, and Bo Yuan
    IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Munich, Germany, 2021