For XCDs, we don't show them if the GPU consists of a single
XCD, as that adds little value.
For matrix cores, we assume the count can be computed as
compute_units * simds_per_cu; this seems to hold for the CDNA3 and
RDNA3 GPUs I checked. I'm not sure what happens on older GPUs that
lack matrix cores, though.
Before we had AMD support, CMakeLists.txt tried to enable all backends
by default. Now that we have AMD support, that no longer makes much
sense, so instead only the backends specified by the user are enabled
(with the -DENABLE_XXX_BACKEND flags).
Before AMD support, the build.sh script was useful to simply
invoke cmake and let it figure out the backends, but after the
aforementioned change in CMakeLists.txt the script became somewhat
useless. Therefore, this commit allows users to specify an argument,
like:
./build.sh cuda
to select which backend(s) to enable, without having to manually
configure the build with the -DENABLE_XXX_BACKEND flags. Note that
multiple backends are also allowed, like:
./build.sh intel,hsa
which would enable both the Intel and HSA backends (which could make
sense, for example, on a system with an Intel iGPU and an AMD dGPU).
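The argument handling can be sketched as follows; the -DENABLE_XXX_BACKEND flag names are the ones from the commit, but the real script's internals may differ:

```shell
# Map a comma-separated backend list (e.g. "intel,hsa") onto the
# corresponding -DENABLE_XXX_BACKEND cmake flags.
backend_flags() {
  flags=""
  # Split on commas and map each backend to its cmake flag.
  for b in $(printf '%s' "$1" | tr ',' ' '); do
    case "$b" in
      cuda)  flags="$flags -DENABLE_CUDA_BACKEND=ON" ;;
      intel) flags="$flags -DENABLE_INTEL_BACKEND=ON" ;;
      hsa)   flags="$flags -DENABLE_HSA_BACKEND=ON" ;;
      *)     echo "unknown backend: $b" >&2; return 1 ;;
    esac
  done
  # Trim the leading space before printing.
  printf '%s\n' "${flags# }"
}

backend_flags intel,hsa
```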
We can use hsa_amd_agent_iterate_memory_pools to fetch info about the
GPU's memory pools. HSA_AMD_SEGMENT_GROUP seems to be LDS, and
HSA_AMD_SEGMENT_GLOBAL seems to be global memory.
However, the latter is reported multiple times (I don't know why). The
only solution I found for this is to check for the
HSA_AMD_MEMORY_POOL_GLOBAL_FLAG_EXTENDED_SCOPE_FINE_GRAINED flag, which
seems to be reported only once.
For bus width, we simply use HSA_AMD_AGENT_INFO_MEMORY_WIDTH.
If only HSA is enabled, we don't need pciutils, since AMD detection
does not rely on it. Therefore, we change CMakeLists.txt to build
pciutils only if required.
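The guard can be sketched like this; the option and directory names are guesses at the project's conventions, not the exact CMakeLists.txt contents:

```cmake
# Build the bundled pciutils only for backends that need PCI-based
# detection; HSA-only builds query the GPU through ROCm instead.
if(ENABLE_CUDA_BACKEND OR ENABLE_INTEL_BACKEND)
  add_subdirectory(pciutils)
endif()
```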
This commit has some side-effects:
1. We no longer build the Intel backend by default. In other words, no
   backend is built by default; the user must specify which backend
   to use.
2. There were some issues with includes and wrongly used defines and
variables. This commit fixes all that.
Similarly to NVIDIA and Intel GPUs, we now detect the
microarchitecture, along with the manufacturing process and the
specific chip name. We infer all of this from the gfx name (in the
code we use the term llvm_target), although it's not yet clear that
this method is completely reliable (see the comments for more
details). In the future we might want to replace it with a better
approach. Once we have the gfx name, we *should* be able to infer the
specific chip, and from the chip we can easily infer the
microarchitecture.
This commit also includes some refactorings and code improvements on
the HSA backend.
Adds very basic support for AMD (experimental). The only installation
requirement is ROCm. Unlike NVIDIA, we don't need the CUDA equivalent
(HIP) to make gpufetch work, which reduces the installation
requirements quite significantly.
Major changes:
* CMakeLists:
  - Don't compile CUDA by default (since we may now want to target
    AMD only)
  - Set build flags on the gpufetch cmake target instead of using
    "set(CMAKE_CXX_FLAGS ...)". This fixes a warning coming from ROCm.
  - Assume that the ROCm CMake files are installed (should be fixed
    later)
* hsa folder: AMD support is implemented via HSA (Heterogeneous System
  Architecture) calls. Therefore, HSA is added as a new backend to
  gpufetch. We only print basic information for now, so more work may
  be needed in the future to fully support AMD GPUs.
NOTE: This commit will probably break AUR packages since we used to
build CUDA by default, which is no longer the case. The AUR package
should be updated and use -DENABLE_CUDA_BACKEND or -DENABLE_HSA_BACKEND
as appropriate.