QMCPACK Release v3.16.0 - 2023-01-31

Notes

This release contains important bug fixes as well as feature improvements. It is a recommended release for all users. Thanks to everyone who reported an issue or suggested an improvement. See GitHub for the full list of merged pull requests and closed issues.

This release is expected to be the last to include the legacy CUDA implementation, i.e. the version built with QMC_CUDA=1. Users should transition now to the batched drivers, which support greater functionality as well as both CPU and GPU execution, and report any issues. The new drivers can be requested with the driver_version input parameter; see https://qmcpack.readthedocs.io/en/develop/performance_portable.html . In a subsequent release, the non-batched CPU drivers will also be removed, leaving only the performance portable batched drivers. This will result in a single implementation of most functionality, improving overall usability and maintainability.

Important bugfix to NLPP integration grid rotations and update to all relevant deterministic test values. See issue #4362 for full discussion and visualization. Found and corrected by @markdewing, this bug has existed since the earliest days of QMCPACK. The stochastic rotations used to randomly reorient the integration grids for the non-local pseudopotentials would not cover the full sphere unless the grids had many points and sufficient symmetry, as was the case for the QMCPACK default. However, calculations with custom integration grids with only a few points (small nrule) could show errors or excess statistical noise in the non-local part of the pseudopotential energy. Standard calculations and tests on carbon diamond, lithium hydride, and hydrocarbon molecules were not affected due to QMCPACK's conservative defaults. Tests updated in #4383.
NLPP grid randomization can be disabled for debugging and greater reproducibility #4394
Two-body Jastrow support for true 2D calculations #4289 (contributed by @Paul-St-Young)
Fix for very large calculations requesting too large grids in CUDA spline implementation #4421 (contributed by @pwang234)
Bugfix for memory errors in the batched OpenMP offload implementation when the number of splines is not a perfectly aligned size (multiple of 8 in single precision or 4 in double precision) #4408
Updates to test tolerances for many build types and platforms to improve reliability of deterministic tests. Goal: ctest -L deterministic should pass on all platforms. Please report any failures.
Improved CMake configuration including detecting use of parallel HDF5 in non-MPI builds #4420 and detection of missing OpenMP support #4422
Optimization of spinor wavefunctions with spin-orbit and pseudopotentials re-enabled #4418
QMCPACK output now indicates status of QMC_COMPLEX #4412
Initial work for eventual GPU offloading of Gaussian basis wavefunctions for molecules and solids #4407
Bugfix to support one-body Jastrow functions where only a subset of elements is given #4405
Electron coordinates are printed in case a NaN is detected #4401
To avoid support problems for complex reductions in OpenMP offload compilers, real builds no longer reference any complex reductions #4379
Enabled HIP as a language in CMake (requires CMake >= 3.21) #3646. When targeting AMD GPUs with HIP, builds that previously used HIP_ARCH to specify the GPU architectures should use CMAKE_HIP_ARCHITECTURES instead.
Refinements to SYCL usage, e.g., #4384, #4382, #4380
Many expanded tests including for NLPP parameter derivatives #4394, more boundary conditions in distance tables #4374, for reptation Monte Carlo observables #4327, and orbital rotations #4304
Many updates to HDF5 usage including adoption of HDF5 1.10 API #4352 and related cleanup, e.g. #4300
Initial Perlmutter CPU build recipe #4398
Initial ALCF Sunspot build recipe including offloading to Intel Ponte Vecchio/Xe HPC GPU #4391
Better support for FreeBSD #4416
Minimum supported Intel classic compiler version is 2021.1 #4389
Ongoing improvement to orbital optimization and rotation, e.g. #4288, #4402
Ongoing code cleanup, e.g. #4276, #4275, #4273
Updated bmpi3 MPI "wrapper"
Various other small bug fixes and quality of life improvements. See the full list of merged PRs on GitHub for details.
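
As an illustration of the NLPP grid rotation item above (#4362), the following is a minimal Python sketch of drawing rotations uniformly over all orientations and applying them to a small quadrature grid. It is not QMCPACK's implementation: the function names, the 6-point octahedral grid, and the quaternion-based sampling are assumptions chosen for illustration only.

```python
import math
import random


def random_rotation_matrix(rng):
    """Return a 3x3 rotation matrix drawn uniformly over all orientations.

    Assumption for this sketch: normalizing four independent Gaussian
    samples gives a uniformly distributed unit quaternion, which is then
    converted to a rotation matrix with the standard formula.
    """
    while True:
        q = [rng.gauss(0.0, 1.0) for _ in range(4)]
        norm = math.sqrt(sum(c * c for c in q))
        if norm > 1e-12:  # guard against a (vanishingly unlikely) zero vector
            break
    w, x, y, z = (c / norm for c in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def rotate(matrix, point):
    """Apply a 3x3 rotation matrix to a 3D point."""
    return tuple(sum(matrix[i][j] * point[j] for j in range(3)) for i in range(3))


# Hypothetical few-point grid: the six vertices of an octahedron.
OCTAHEDRON = [
    (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0), (0.0, -1.0, 0.0),
    (0.0, 0.0, 1.0), (0.0, 0.0, -1.0),
]


def rotated_grid(grid, rng):
    """Reorient every grid point with one freshly sampled rotation."""
    m = random_rotation_matrix(rng)
    return [rotate(m, p) for p in grid]


def mean_z_of_first_point(n_samples, seed=0):
    """Average z-coordinate of the first grid point over many rotations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += rotated_grid(OCTAHEDRON, rng)[0][2]
    return total / n_samples
```

With rotations sampled this way, each grid point can reach any part of the sphere, so the average z-coordinate of a rotated point should tend to zero as the number of samples grows. A sampler that covers only part of the sphere would generally not show this for a grid with few points.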

Known problems

When offload builds are compiled with CUDA toolkit versions above 11.2 (tested 11.3-11.8) using LLVM15, multideterminant tests and functionality will fail, seemingly due to an issue with the toolkit. This is discussed in llvm/llvm-project#54633 . All other functionality appears to work as expected. As a workaround, the CUDA toolkit 11.2 can be used. The actual NVIDIA drivers can be more recent.
CUDA toolkit version 12.0 is not compatible with LLVM OpenMP offload llvm/llvm-project#60296

NEXUS

Nexus: Support for use of templates for job submission scripts #4344
Nexus: twist_info.dat files now added to results directory for easier analysis of twist average quantities #4302
Nexus: Initial support for Polaris at ALCF #4354
Nexus: Initial support for Perlmutter at NERSC #4356
Nexus: Support for gpusharing keyword for legacy CUDA #4403
Nexus: Support for handling multiple pickle protocols #4385
Nexus: CPU/GPU flags for batched code #4341
Nexus: Jastrow factors can be read from existing files #4339
Nexus: Fix VASP POSCAR write #4331
Nexus: Better handling of VASP pseudopotentials #4330

Known problems

The new QE7.1 DFT+U input style is not yet supported #4100