This release contains important bug fixes as well as feature improvements. It is a recommended release for all users. Thanks to everyone who reported an issue or suggested an improvement. See GitHub for the full list of merged pull requests and closed issues.
This release is expected to be the last that includes the legacy CUDA implementation, i.e., the version built with QMC_CUDA=1. Users should transition to the batched drivers, which support greater functionality as well as both CPU and GPU execution; please adopt them now and report any issues. The batched drivers can be requested with the driver_version input parameter; see https://qmcpack.readthedocs.io/en/develop/performance_portable.html. In a subsequent release, the non-batched CPU drivers will also be removed, leaving only the performance-portable batched drivers. This will result in a single implementation of most functionality, improving overall usability and maintainability.
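As a sketch only (the exact parameter placement and allowed values should be checked against the performance-portable drivers documentation linked above for your QMCPACK version), the batched drivers are requested in the project block of the input file:

```xml
<!-- Illustrative fragment: selects the performance-portable batched drivers.
     The value "legacy" is believed to select the older non-batched drivers. -->
<project id="vmc_run" series="0">
  <parameter name="driver_version">batch</parameter>
</project>
```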
- Important bugfix to NLPP integration grid rotations, with updates to all relevant deterministic test values. See issue #4362 for full discussion and visualization. Found and corrected by @markdewing, this bug has existed since the earliest days of QMCPACK. The stochastic rotations used to randomly reorient the integration grids for the non-local pseudopotentials would not cover the full sphere unless the grids had many points and sufficient symmetry, as was the case for the QMCPACK default. However, calculations with custom integration grids with only a few points (small nrule) could show error or excess statistical noise in the non-local part of the pseudopotential energy. Standard calculations and tests on carbon diamond, lithium hydride, and hydrocarbon molecules were not affected due to QMCPACK's conservative defaults. Tests updated in #4383.
- NLPP grid randomization can be disabled for debugging and greater reproducibility #4394
- Two-body Jastrow support for true 2D calculations #4289 (contributed by @Paul-St-Young)
- Fix for very large calculations requesting too large grids in CUDA spline implementation #4421 (contributed by @pwang234)
- Bugfix for memory errors in the batched OpenMP offload implementation #4408 when the number of splines is not perfectly aligned (a multiple of 8 in single precision or 4 in double precision).
- Updates to test tolerances for many build types and platforms to improve reliability of deterministic tests. Goal: ctest -L deterministic should pass on all platforms. Please report any failures.
- Improved CMake configuration including detecting use of parallel HDF5 in non-MPI builds #4420 and detection of missing OpenMP support #4422
- Optimization of spinor wavefunctions with spin-orbit and pseudopotentials re-enabled #4418
- QMCPACK output now indicates status of QMC_COMPLEX #4412
- Initial work for eventual GPU offloading of Gaussian basis wavefunctions for molecules and solids #4407
- Bugfix to support one-body Jastrow functions where only a subset of elements is given #4405
- Electron coordinates are printed in case a NaN is detected #4401
- To avoid support problems with complex reductions in OpenMP offload compilers, real builds no longer reference any complex reductions #4379
- Enabled HIP as a language in CMake (requires CMake >= 3.21) #3646. When using HIP to target AMD GPUs, replace HIP_ARCH with CMAKE_HIP_ARCHITECTURES if HIP_ARCH was previously used to specify the GPU architectures.
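For example, a configure line might be migrated as in the following sketch (the build directory layout and the gfx90a target are placeholders for illustration; combine with your usual QMCPACK CMake options):

```shell
# Previously (older releases): cmake .. -DHIP_ARCH=gfx90a ...
# Now, with HIP enabled as a first-class CMake language:
cmake .. -DCMAKE_HIP_ARCHITECTURES=gfx90a
```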
- Refinements to SYCL usage, e.g., #4384, #4382, #4380
- Many expanded tests including for NLPP parameter derivatives #4394, more boundary conditions in distance tables #4374, for reptation Monte Carlo observables #4327, and orbital rotations #4304
- Many updates to HDF5 usage including adoption of HDF5 1.10 API #4352 and related cleanup, e.g. #4300
- Initial Perlmutter CPU build recipe #4398
- Initial ALCF Sunspot build recipe including offloading to Intel Ponte Vecchio/Xe HPC GPU #4391
- Better support for FreeBSD #4416
- Minimum supported Intel classic compiler version is 2021.1. #4389
- Ongoing improvements to orbital optimization and rotation, e.g. #4288, #4402
- Ongoing code cleanup, e.g. #4276, #4275, #4273
- Updated bmpi3 MPI "wrapper"
- Various other small bug fixes and quality of life improvements. See the full list of merged PRs on GitHub for details.
- When offload builds are compiled with CUDA toolkit versions newer than 11.2 (tested 11.3-11.8) using LLVM 15, multideterminant tests and functionality fail, seemingly due to an issue in the toolkit. This is discussed in llvm/llvm-project#54633. All other functionality appears to work as expected. As a workaround, CUDA toolkit 11.2 can be used; the installed NVIDIA driver can be more recent than the toolkit.
- CUDA toolkit version 12.0 is not compatible with LLVM OpenMP offload llvm/llvm-project#60296
- Nexus: Support for use of templates for job submission scripts #4344
- Nexus: twist_info.dat files now added to results directory for easier analysis of twist-averaged quantities #4302
- Nexus: Initial support for Polaris at ALCF #4354
- Nexus: Initial support for Perlmutter at NERSC #4356
- Nexus: Support for gpusharing keyword for legacy CUDA #4403
- Nexus: Support for handling multiple pickle protocols #4385
- Nexus: CPU/GPU flags for batched code #4341
- Nexus: Jastrow factors can be read from existing files #4339
- Nexus: Fix VASP POSCAR write #4331
- Nexus: Better handling of VASP pseudopotentials #4330
- The new QE7.1 DFT+U input style is not yet supported #4100