EE563 Programming Massively Parallel Processors
The architecture of Graphics Processing Units (GPUs) has evolved over the years from fixed-function graphics pipelines to arrays of unified programmable processors. This has allowed GPUs to be used for scientific computing. Equipped with hundreds or even thousands of cores, GPUs qualify as massively parallel processors and can provide significant performance improvements for parallel applications compared to conventional single-core or multi-core processors (CPUs). In this course, students will develop a thorough understanding of the architecture of recent GPUs and will learn how to program these processors efficiently through data-level parallelism using high-level programming languages such as CUDA. Topics covered will include the history of GPUs, the architecture of GPUs, principles of parallel programming, data-level parallelism, the memory hierarchy, performance considerations, numerical considerations, and parallel patterns such as map, reduction, scan, sort, histogram and matrix operations. Students completing this course will have a thorough understanding of the GPU programming model and will be able to design efficient parallel algorithms for GPUs.
By taking this course, the students will:
- Develop a thorough understanding of the GPU architecture and programming model;
- Complete four hands-on laboratories that will require significant programming in CUDA C;
- Complete a course project involving the parallel implementation of a relevant algorithm on a GPU in order to achieve a shorter execution time; and
- Review a state-of-the-art paper on GPU programming.
The course deliverables are:
- Four laboratory reports;
- Project report;
- Project presentation; and
- Written or oral critique of a research paper on GPU programming.
D. Kirk and W. Hwu. “Programming Massively Parallel Processors”, 3rd edition, 2016, 576 p.
The course will be organized in three components as follows:
Component 1 will consist of a series of online lectures supplemented by reading assignments, tutorials and laboratory work. The instructional material comes from the CS193G course taught at Stanford University and has been made freely available online for others to use in their curricula. The lectures can be downloaded using iTunes. The tutorials and lab instructions are available in the CS193G online repositories. The lab instructions are embedded in the starting code for each lab. The reading assignments are from the mandatory textbook (D. Kirk and W. Hwu, 2016). The readings supplement the online lectures and prepare the students for the laboratory work. The links are given below.
- Lectures (https://itunes.apple.com/us/itunes-u/programming-massively-parallel/id384233322)
- Tutorials (https://code.google.com/archive/p/stanford-cs193g-sp2010/wikis)
- Labs (https://github.com/jaredhoberock/stanford-cs193g-sp2010)
Component 2 will consist of completing a course project using parallel programming on GPUs. The project can be completed individually or in teams of two and must include a significant hands-on component. Typical projects consist of parallelizing a known algorithm on a GPU and measuring the speedup achieved. The students will be required to submit a project proposal midway through the term, to submit a project report at the end of the term, and to give an oral presentation during the last two weeks of the term.
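One common way to quantify the speedup of a project is the ratio of the execution time of the reference CPU implementation to that of the GPU implementation, measured on the same inputs. The symbols below are illustrative and are not prescribed by the course; in particular, whether the GPU time includes host-to-device and device-to-host transfers is an assumption that the project report should state explicitly.

```latex
\[
  S = \frac{T_{\mathrm{CPU}}}{T_{\mathrm{GPU}}}
\]
```

For example, if the reference implementation runs in 12 s and the GPU implementation runs in 1.5 s on the same input, the speedup is S = 12 / 1.5 = 8.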
Component 3 will consist of a review of a state-of-the-art paper on GPU programming. The review will be a single-spaced, one-page document that summarizes the content of the paper, critiques or comments on the work presented, and suggests future work on that topic.
With the exception of the first week, when the instructor introduces the course, and the last two weeks, when the students present their projects, there will be no formal lectures given by the instructor. All the material is available online and in the mandatory textbook. The students are expected to work autonomously and to submit their work on time throughout the semester. The instructor will identify a period during the week when he is available to answer questions.
For each laboratory, a full lab report must be submitted. This report must include an introduction, a high-level description of the implementation, a self-explanatory description of the tests and results, a discussion and a conclusion. The lab report must contain sufficient detail to demonstrate that the work was completed successfully.
Marks will be weighted as follows:
- Labs – 40%
- Project report – 20%
- Project presentation – 20%
- Paper review – 20%