132 Commits

Author SHA1 Message Date
0f416b2da9 Patch cuda.cpp with cloudy's fix 2026-01-10 19:29:45 -05:00
Dr-Noob
5f619dc95a [v0.30] Add support for XCDs and matrix cores
For XCDs, we don't show them if the GPU is made of a single
XCD, as it adds little value.

For matrix cores, we assume the count can be computed as
compute_units * simds_per_cu. This seems to work for the CDNA3 and
RDNA3 GPUs I checked. Not sure what would happen for older GPUs that
do not have matrix cores though.
2025-10-26 10:51:27 +01:00
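The matrix-core estimate and the XCD display rule described in the commit above can be sketched as follows. This is a minimal illustration, not the gpufetch code: both function names are hypothetical, and the formula is only the assumption stated in the commit message (checked there on CDNA3 and RDNA3 GPUs).

```cpp
#include <cstdint>

// Hypothetical helper (name is illustrative): estimates the number of
// matrix cores as compute_units * simds_per_cu, per the commit's assumption.
std::uint32_t estimate_matrix_cores(std::uint32_t compute_units,
                                    std::uint32_t simds_per_cu) {
    return compute_units * simds_per_cu;
}

// Hypothetical helper: XCDs are only worth printing when the GPU is made
// of more than one of them.
bool should_print_xcds(std::uint32_t xcd_count) {
    return xcd_count > 1;
}
```

Under this assumption, a GPU with 304 compute units and 4 SIMDs per CU would be reported with 1216 matrix cores.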
Dr-Noob
98bb02e203 [v0.30] Allow users to select backend from build script
Before we had AMD support, CMakeLists.txt tried to enable all backends
by default. Now that we have AMD support, that does not make that much
sense so instead it will only enable the backend specified by the user
(with the -DENABLE_XXX_BACKEND flags)

Then, before AMD support, the build.sh script was useful to just
invoke cmake and let it figure out the backends, but the script was
a bit useless after the mentioned change in the CMakeLists.txt.

Therefore, this commit allows users to specify an argument, like:

./build.sh cuda

To specify what backend/s to enable, without the need to manually
configure the build with the -DENABLE_XXX_BACKEND flag. Note that
multiple backends are also allowed, like:

./build.sh intel,hsa

Would enable both Intel and HSA backends (which could make sense for
example in a system with an Intel iGPU and an AMD dGPU).
2025-10-24 22:29:45 +02:00
Dr-Noob
78d34e71f1 [v0.30][AMD] Add support to fetch bus width, global memory and LDS size
We can use hsa_amd_agent_iterate_memory_pools to fetch info about the
memory pools in the GPU. HSA_AMD_SEGMENT_GROUP seems to be LDS, and
HSA_AMD_SEGMENT_GLOBAL seems to be global memory.

However, the latter is reported multiple times (I don't know why). The
only solution I found for this is to check for the
HSA_AMD_MEMORY_POOL_GLOBAL_FLAG_EXTENDED_SCOPE_FINE_GRAINED flag, which
seems to be reported only once.

For bus width, we simply use HSA_AMD_AGENT_INFO_MEMORY_WIDTH.
2025-10-23 21:30:02 +02:00
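The pool-selection logic described in the commit above can be sketched without the HSA headers. This is only a model of the idea: `Segment`, `PoolInfo`, `MemorySummary`, `summarize_pools` and the flag constant are hypothetical stand-ins for values that the real code would obtain from the HSA runtime while iterating the memory pools, and the flag's bit value here is a placeholder.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-ins for the HSA segment and flag values (hypothetical names;
// the bit value below is a placeholder, not the real HSA constant).
enum class Segment { Global, Group, Other };
constexpr std::uint32_t kExtendedScopeFineGrained = 1u << 0;

// What we assume is known about each pool after querying it.
struct PoolInfo {
    Segment segment;
    std::uint32_t global_flags;
    std::size_t size_bytes;
};

struct MemorySummary {
    std::size_t global_bytes = 0;  // global memory size
    std::size_t lds_bytes = 0;     // LDS size (group segment)
};

// Global memory is reported by several pools, so only the pool carrying
// the extended-scope fine-grained flag is used for the global size.
MemorySummary summarize_pools(const std::vector<PoolInfo>& pools) {
    MemorySummary out;
    for (const PoolInfo& p : pools) {
        if (p.segment == Segment::Group) {
            out.lds_bytes = p.size_bytes;
        } else if (p.segment == Segment::Global &&
                   (p.global_flags & kExtendedScopeFineGrained) != 0) {
            out.global_bytes = p.size_bytes;
        }
    }
    return out;
}
```

Filtering on a flag rather than summing the pools avoids counting the same global memory more than once, at the cost of relying on that flag being reported exactly once.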
Dr-Noob
82ea16fc3d [v0.30] Fix warning in printer 2025-10-16 20:01:14 +02:00
Dr-Noob
6589de9717 [v0.30] Reorganize attributes in printer and add CUs attr for AMD 2025-10-16 19:53:48 +02:00
Dr-Noob
0950b97393 [v0.30] Build pciutils only if necessary
If only HSA is enabled we don't need pciutils since AMD detection does
not rely on it. Therefore we change CMakeLists.txt to build pciutils
only if required.

This commit has some side-effects:
1. We now don't build Intel backend by default. In other words, no
   backend is built by default, the user must specify which backend
   to use.
2. There were some issues with includes and wrongly used defines and
   variables. This commit fixes all that.
2025-10-16 08:26:42 +02:00
Dr-Noob
8794cd322d [v0.30] Add support for building on AMD where rocm-cmake is not installed 2025-10-16 07:24:45 +02:00
Dr-Noob
5df85aea2c [v0.30] Add uarch detection to AMD GPUs
Similarly to NVIDIA and Intel GPUs, we now detect microarchitecture,
also with manufacturing process and specific chip name. We infer all
of this from the gfx name (in the code we use the term llvm_target),
although it's not clear yet that this method is completely reliable (see
comments for more details). In the future we might want to replace that
with a better way. Once we have the gfx name, we *should* be able to
infer the specific chip, and from the chip we can easily infer the
microarchitecture.

This commit also includes some refactorings and code improvements on
the HSA backend.
2025-10-15 08:23:28 +02:00
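The two-step inference described in the commit above (gfx name to chip, then chip to microarchitecture) can be sketched as a pair of lookups. The enum, the function names and the three table entries are illustrative choices made for this sketch, not gpufetch's actual tables, and the sketch needs C++17.

```cpp
#include <optional>
#include <string_view>

// Illustrative subset of chips; not the list used by gpufetch.
enum class Chip { Aldebaran, Navi21, Navi31 };

// Step 1: infer the chip from the gfx name (the LLVM target).
// Returns std::nullopt when the name is not in this small table.
std::optional<Chip> chip_from_gfx(std::string_view gfx) {
    if (gfx == "gfx90a")  return Chip::Aldebaran;
    if (gfx == "gfx1030") return Chip::Navi21;
    if (gfx == "gfx1100") return Chip::Navi31;
    return std::nullopt;
}

// Step 2: infer the microarchitecture from the chip.
std::string_view uarch_from_chip(Chip chip) {
    switch (chip) {
        case Chip::Aldebaran: return "CDNA 2";
        case Chip::Navi21:    return "RDNA 2";
        case Chip::Navi31:    return "RDNA 3";
    }
    return "Unknown";
}
```

Keeping the two steps separate means that an unreliable gfx-to-chip mapping could later be replaced without touching the chip-to-microarchitecture table.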
Dr-Noob
b29b17d14f [v0.30] Add support for AMD GPUs
Adds very basic support for AMD (experimental). The only install
requirement is ROCm. Unlike NVIDIA, we don't need the CUDA equivalent
(HIP) to make gpufetch work, which reduces the installation
requirements quite significantly.

Major changes:

* CMakeLists:
  - Make CUDA not compiled by default (since we now may want to target
    AMD only)
  - Set build flags on gpufetch cmake target instead of doing
    "set(CMAKE_CXX_FLAGS". This fixes a warning coming from ROCm.
  - Assumes that the ROCm CMake files are installed (should be fixed
    later)

* hsa folder: AMD support is implemented via HSA (Heterogeneous System
  Architecture) calls. Therefore, HSA is added as a new backend to
  gpufetch. We only print basic stuff for now, so we may need more
  things in the future to give full support for AMD GPUs.

NOTE: This commit will probably break AUR packages since we used to
build CUDA by default, which is no longer the case. The AUR package
should be updated and use -DENABLE_CUDA_BACKEND or -DENABLE_HSA_BACKEND
as appropriate.
2025-10-12 12:34:56 +02:00
Dr-Noob
57caadf530 [v0.25] Add Intel Whiskey Lake SoC (#42) 2023-10-20 07:59:07 +01:00
Dr-Noob
ed35cb872b [v0.25] Leave cuda/intel backend to decide how to report PCI vendor failure 2023-03-31 16:16:46 +02:00
Dr-Noob
3d36852f9d [v0.25] Fix for PCI class 0302 can also be responsible for GPUs (like in AWS) 2023-03-31 16:12:22 +02:00
Dr-Noob
fb0109d327 [v0.25] PCI class 0302 can also be responsible for GPUs 2023-03-31 16:08:59 +02:00
Dr-Noob
68619aa03e [v0.25] Avoid segfault when the pci vendor is not found 2023-03-31 15:50:37 +02:00
Dr-Noob
a4006db616 [v0.25] Remove warning notice 2022-12-03 18:06:36 +01:00
Dr-Noob
774550307c [v0.25] Add option to print all GPUs as requested in #33 2022-12-03 18:04:50 +01:00
Dr-Noob
06dc50b6a5 [v0.25] Updated cuda_helper to support latest GPUs 2022-12-03 16:39:18 +00:00
Dr-Noob
9837236c7e [v0.25] Fixed some details in README and build.sh 2022-12-03 14:46:48 +00:00
Dr-Noob
a6f0c18fcb [v0.25] Add missing Ampere GPU chips and new uarchs: ada and hopper v0.25 2022-10-25 20:13:29 +02:00
Dr-Noob
94490b3f38 [v0.24] Fix typo in error message (thanks #22 and #28) 2022-10-25 19:41:46 +02:00
Dr-Noob
5faac7a756 [v0.24] Update PCI ids to pciutils/pciids@06c4c9a 2022-10-25 19:30:24 +02:00
Dr-Noob
8c62e9ebaf [v0.24] Added generic KBL UHD Graphics. Should fix #19 2022-07-13 13:27:22 +02:00
Dr-Noob
4d948eb80a [v0.24] Remove CUDA driver initialization message before printing any other message 2022-05-21 23:19:03 +02:00
Dr-Noob
cf96628385 [v0.24] Fix topology for currently supported ALD iGPUs 2022-05-14 20:25:08 +02:00
Dr-Noob
5bf35ee6d7 [v0.24] Make sure we have valid data before reporting peakperf in Intel 2022-05-14 13:12:19 +02:00
Dr-Noob
fea985d08c [v0.24] Add first support for Alder Lake iGPUs. Needs more work to check data properly 2022-05-14 13:01:34 +02:00
Dr-Noob
24f20d0901 [v0.24] Small fixes; improve PCI report when no GPU is found, speedup invalid GPU idx detection 2022-05-14 12:00:23 +02:00
Dr-Noob
c4ad2bd4f8 [v0.24] Merge bugfix branch 2022-04-17 14:04:19 +02:00
Dr-Noob
af52d2850c [v0.24] Remove cuda-samples dependency v0.24 2022-04-17 13:55:05 +02:00
Dr-Noob
6f196c1797 [v0.23] Fix FreeBSD compilation issues as reported by #13 2022-04-10 16:52:42 +01:00
Dr-Noob
312d78b7f1 [v0.23] Fix dummy warning in intel uarch 2022-04-10 16:11:59 +01:00
Dr-Noob
ebad29e044 [v0.23] Fix CMake to find CUDA Samples in CUDA >= 11.6 2022-03-12 11:04:09 +01:00
Dr-Noob
59df3e53ec [v0.23] Fix README text. It is written following a C style, but actually written in C++ because of CUDA 2022-01-23 10:57:02 +01:00
Dr-Noob
d120f9a1cd [v0.23] Add --logo-short/long. Closes #11 2022-01-23 10:55:26 +01:00
Dr-Noob
bd1158c139 [v0.23] Sort PCI devices; this makes the devices list to match CUDA driver ordering, which fixes a bug when there was more than one NVIDIA GPU v0.23 2022-01-22 13:25:22 +01:00
Dr-Noob
23586a18e9 [v0.22] Fix for previous commit (don't show tensor cores in TU116) 2022-01-20 22:57:19 +01:00
Dr-Noob
d3aaf7cfe5 [v0.22] Do not show tensor cores in TU116 2022-01-12 19:34:11 +01:00
Dr-Noob
49119ae7eb [v0.22] Disable pciutils hwdb compilation (useless for gpufetch) to avoid linking against udev 2022-01-12 19:14:56 +01:00
Dr-Noob
4cba0a7194 [v0.22] Round memory size to make output prettier 2022-01-12 18:29:49 +01:00
Dr-Noob
6d9985e5f7 [v0.22] Link against udev, which should fix the error reported by #9 2022-01-11 18:33:49 +01:00
Dr-Noob
0faa7caeee [v0.22] Add check to properly detect TigerLake GT2 80/96 EUs 2021-12-29 21:56:19 +01:00
Dr-Noob
7f7e70bc5d [v0.22] Add Gen11 and Gen12 Intel iGPUs (needs more work) 2021-12-28 18:34:56 +01:00
Dr-Noob
6f555f1b47 [v0.22] Small various fixes 2021-12-28 16:43:11 +01:00
Dr-Noob
98a70d5c9e [v0.21] Print only one error message when the GPU chip is not found in the LUT 2021-12-28 16:21:04 +01:00
Dr-Noob
7ed0e4a63d [v0.21] Small improvement to argument error reporting 2021-12-28 16:09:39 +01:00
Dr-Noob
9d2a07146a [v0.21] Check that topology is valid in Intel backend. Print informative message if no valid topology is found 2021-12-28 15:56:44 +01:00
Dr-Noob
8d2f50b398 [v0.21] Print GPU list even when no valid GPU is detected, to improve user understanding 2021-12-28 15:40:29 +01:00
Dr-Noob
8bfe88f9f6 [v0.21] Use MiB to show memory size and do not truncate (may cause problems, as reported in #8) 2021-12-28 13:44:53 +01:00
Dr-Noob
8fbf97c47a [v0.21] Add verbose option. Fix CUDA driver initialization message when verbose output is used 2021-12-27 22:37:51 +01:00