Introduction to Parallel Computing and Scientific
Please send info about when you are NOT available to
firstname.lastname@example.org by Thu Jan ?? 11:59pm
Hall room 8220 Time: Thu 1:30-3:30pm
First Lecture: Thu Jan 17, 2019
Projects due: graduating students - May 14,
2019; non-graduating students - May 19, 2019
The objectives of this course are:
- to develop structural intuition of how the hardware and the
software work together, starting from simple systems to complex
shared resource architectures;
- to provide guidelines about how to write and document a
- to familiarize the audience with the main parallel programming
techniques and the common software packages/libraries.
New for 2019:
- Access to PSC computing infrastructure through XSEDE
- GPU Computing: added OpenACC
- Schedulers: Torque replaced by SLURM
- OpenMPI: v3 updates
- Slight reordering of the material
The course is intended to be self-contained; no prior computer
skills are required. However, familiarity with the C programming
language and the Unix command line should give the student more time
to concentrate on the core issues of the course, such as hardware
structure, operating system and networking insights, numerical
The main idea of the course is to give the student hands-on
experience in writing a simple software package that eventually
can be implemented on a parallel computer architecture. All the
steps and components of the process (defining the problem,
numerical algorithms, program design, coding, different levels of
documentation) are treated at a basic level. Everything is done in
the context of a structured vision of the computing environment.
The typical programming environment makes the computer hardware
and operating system transparent to the user. In contrast, each
program intended for efficient parallel execution must consider
the custom physical and logical communication topology of the
processors in a parallel system. The course gives an overview of
the full range of issues that a developer should consider
when designing a parallel algorithm, from principles to details.
The knowledge provided by the course should be enough to help the
audience decide which technique is most appropriate for
a problem on a given computer architecture. However, the
development of an efficient algorithm will require a lot of additional study, practice, and
The examples, exercises, and projects were determined by the
computers and software available for practice. The following were
preferred: the C language, the x86_64 hardware platform, and the
Linux operating system. However, the presentation will be kept at
a very general level such that the student is prepared for any
real parallel computing environment.
The course contains three parts:
The first part makes the connection between real life and the
The second part provides the background needed to understand how
computer systems work.
- Module 1 software package structure, design, development, and
- Module 2 parallel computing basic concepts and programming
techniques: SMP, MPI, domain/data decomposition, deadlocks.
- Module 3 Cluster management: remote access/key management,
- Module 4 how to transform a real life problem into a
sequential computer algorithm, with reference to basic numerical
The third part explores the high-performance computing world.
- Module 5 the layered model of the computer hardware basics.
- Module 6 a model of structural information
organization with applications to filesystems and storage.
- Module 7 a typical operating system, user interfaces, shell,
process communications, user level issues.
- Module 8 programming notions with applications to the C
language, libraries, compilers, debuggers.
- Module 9 describes computer networks, topology, and layered
- Module 10 principal parallel architectures and the
characteristics of the typical software packages; hybrid
- Module 11 how to take advantage of multiple cores (SMP)
through multi-threading and OpenMP.
- Module 12 the MPI standard, several common implementations,
additional library issues.
- Module 13 the PETSc library, an interesting application of MPI
for real life simulations.
- Module 14 GPU computing: CUDA and OpenACC.
- Module 15 modern developments: Big Data (Hadoop), Artificial
Intelligence (Decision Trees, Neural Networks/TensorFlow).
Grading is based on three components:
- Class attendance (30%)
- Merits of the final project (50%)
- Midterm take-home test and/or in-class short quizzes (20%)
Students will need to pick a final project no later than the third
week of the semester and to deliver milestones every other week. A
project consists of developing a (simple) software package or module
that has a defined practical purpose. More details and alternatives
are here. Students are welcome to
discuss with the instructor projects close to their scientific
interests, or pick one of the offered projects.
Merits of the final project considered for grading are:
- how well the program is structured, such that it will allow
easy further development, easy debugging (diagram of modules and
- the quality (and not the length) of the documentation
- how functional, efficient, and/or innovative the numerical
algorithm is (tested on the provided examples)
Homework will be assigned after most
lectures. Submitting solutions is not mandatory, but can offer
a bonus for the final grade. The purpose of the homework is to help
develop practical skills and get used to the computing
environment. The homework is recommended for students auditing the
lectures as well. Students are encouraged to submit solutions
containing interesting approaches or comments.
Students are welcome to participate for credit or for fun.
Unregistered students should express their interest by
e-mail to email@example.com
or in person (Wean Hall, Room 6218) at any time before the second day
of school of the Spring Semester.
I'm always available for consultations and for discussions
regarding the projects and the curriculum.