35 Commits

Author SHA1 Message Date
0f416b2da9 Patch cuda.cpp with cloudy's fix 2026-01-10 19:29:45 -05:00
Dr-Noob
5f619dc95a [v0.30] Add support for XCDs and matrix cores
For XCDs, we don't show them if the GPU is made of a single
XCD, as it adds little value.

For matrix cores, we assume the count can be computed as
compute_units * simds_per_cu. This seems to work for the CDNA3
and RDNA3 GPUs I checked. It is not clear what happens on older
GPUs that do not have matrix cores, though.
2025-10-26 10:51:27 +01:00
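The two display rules above can be sketched in C++. This sketch assumes a struct that mirrors the topology_h fields this series adds to gpu.hpp. The names estimate_matrix_cores and should_print_xcds are hypothetical. The one-matrix-core-per-SIMD rule is only the heuristic stated above, not a documented AMD formula.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the HSA topology struct added to gpu.hpp in this series.
struct topology_h {
  int32_t compute_units;
  int32_t num_shader_engines;
  int32_t simds_per_cu;
  int32_t num_xcc;
  int32_t matrix_cores;
};

// Heuristic from the commit message: one matrix core per SIMD.
// Returns -1 (unknown) if either input is missing.
int32_t estimate_matrix_cores(const topology_h& t) {
  if (t.compute_units <= 0 || t.simds_per_cu <= 0) return -1;
  return t.compute_units * t.simds_per_cu;
}

// A single XCD adds little information, so only report multi-XCD parts.
bool should_print_xcds(const topology_h& t) {
  return t.num_xcc > 1;
}
```

Two examples of the output:
- An agent reporting 304 CUs, 4 SIMDs per CU and 8 XCCs would show 1216 matrix cores and an XCD count.
- A single-XCD part with 96 CUs and 2 SIMDs per CU would show 192 matrix cores and no XCD line.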
Dr-Noob
98bb02e203 [v0.30] Allow users to select backend from build script
Before we had AMD support, CMakeLists.txt tried to enable all backends
by default. Now that we have AMD support, that makes less sense, so
CMakeLists.txt will only enable the backends specified by the user
(with the -DENABLE_XXX_BACKEND flags).

Before AMD support, build.sh was useful simply to invoke cmake and let
it figure out the backends, but after the CMakeLists.txt change
described above the script was of little use.

Therefore, this commit allows users to pass an argument, like:

./build.sh cuda

to specify which backend(s) to enable, without having to configure the
build manually with the -DENABLE_XXX_BACKEND flags. Multiple backends
are also allowed, like:

./build.sh intel,hsa

which enables both the Intel and HSA backends (this can make sense, for
example, on a system with an Intel iGPU and an AMD dGPU).
2025-10-24 22:29:45 +02:00
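For illustration, here is the argument-to-flag mapping that build.sh performs, restated in C++. The script itself does this in bash, using IFS=',' read and a case statement. backends_to_cmake_flags is a hypothetical helper, not part of gpufetch, and it ignores the optional build-type argument.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Split "<b1>,<b2>,..." on commas and map each backend to its CMake option.
// Returns false on an unknown backend (build.sh prints an error and exits)
// or when no backend was given at all.
bool backends_to_cmake_flags(const std::string& list, std::string& flags) {
  flags.clear();
  std::stringstream ss(list);
  std::string backend;
  while (std::getline(ss, backend, ',')) {
    if (backend == "hsa") flags += " -DENABLE_HSA_BACKEND=ON";
    else if (backend == "intel") flags += " -DENABLE_INTEL_BACKEND=ON";
    else if (backend == "cuda") flags += " -DENABLE_CUDA_BACKEND=ON";
    else return false;
  }
  return !flags.empty();
}
```

For example, "intel,hsa" yields " -DENABLE_INTEL_BACKEND=ON -DENABLE_HSA_BACKEND=ON". This matches the two flags the script appends before running cmake.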
Dr-Noob
78d34e71f1 [v0.30][AMD] Add support to fetch bus width, global memory and LDS size
We can use hsa_amd_agent_iterate_memory_pools to fetch information
about the GPU's memory pools. HSA_AMD_SEGMENT_GROUP seems to be the LDS,
and HSA_AMD_SEGMENT_GLOBAL seems to be global memory.

However, the latter is reported multiple times (I don't know why). The
only solution I found for this is to check for the
HSA_AMD_MEMORY_POOL_GLOBAL_FLAG_EXTENDED_SCOPE_FINE_GRAINED flag, which
seems to be reported only once.

For bus width, we simply use HSA_AMD_AGENT_INFO_MEMORY_WIDTH.
2025-10-23 21:30:02 +02:00
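Here is a minimal sketch of the pool-selection rule above. It deliberately avoids the HSA API. PoolInfo, Segment and summarize_pools are hypothetical stand-ins for the values one would query per pool: the segment, the flags and the size. In real code these would come from a call such as hsa_amd_memory_pool_get_info, made inside the hsa_amd_agent_iterate_memory_pools callback.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for what the HSA runtime reports per pool.
enum class Segment { Global, Group, Other };

struct PoolInfo {
  Segment segment;
  bool extended_scope_fine_grained;  // the flag the commit filters on
  std::size_t size;
};

struct MemSummary {
  std::size_t global_bytes = 0;
  std::size_t lds_bytes = 0;
};

// GROUP -> LDS. GLOBAL -> global memory, but only the pool carrying the
// extended-scope fine-grained flag, because GLOBAL is reported several times.
MemSummary summarize_pools(const std::vector<PoolInfo>& pools) {
  MemSummary s;
  for (const PoolInfo& p : pools) {
    if (p.segment == Segment::Group) {
      s.lds_bytes = p.size;
    } else if (p.segment == Segment::Global && p.extended_scope_fine_grained) {
      s.global_bytes = p.size;
    }
  }
  return s;
}
```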
Dr-Noob
82ea16fc3d [v0.30] Fix warning in printer 2025-10-16 20:01:14 +02:00
Dr-Noob
6589de9717 [v0.30] Reorganize attributes in printer and add CUs attr for AMD 2025-10-16 19:53:48 +02:00
Dr-Noob
0950b97393 [v0.30] Build pciutils only if necessary
If only HSA is enabled, we don't need pciutils, since AMD detection does
not rely on it. Therefore, we change CMakeLists.txt to build pciutils
only when required.

This commit has some side-effects:
1. We no longer build the Intel backend by default. In other words, no
   backend is built by default; the user must specify which backend(s)
   to use.
2. There were some issues with includes and misused defines and
   variables. This commit fixes all of them.
2025-10-16 08:26:42 +02:00
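The CMake condition that decides whether pciutils is needed is:

if(NOT(ENABLE_HSA_BACKEND AND NOT ENABLE_CUDA_BACKEND AND NOT ENABLE_INTEL_BACKEND))

It reduces to a small predicate. Here it is restated in C++, with needs_pciutils as a hypothetical name:

```cpp
#include <cassert>

// pciutils is needed unless HSA is the only enabled backend.
bool needs_pciutils(bool hsa, bool cuda, bool intel) {
  return !(hsa && !cuda && !intel);
}
```

needs_pciutils(false, false, false) returns true. CMakeLists.txt already aborts when the user enables no backend, so that case does not come from user input directly. It can still arise when only HSA was requested and ROCm was not found: the HSA search then resets ENABLE_HSA_BACKEND to false. That is how pciutils still gets built to list the available GPUs.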
Dr-Noob
8794cd322d [v0.30] Add support for building on AMD where rocm-cmake is not installed 2025-10-16 07:24:45 +02:00
Dr-Noob
5df85aea2c [v0.30] Add uarch detection to AMD GPUs
As with NVIDIA and Intel GPUs, we now detect the microarchitecture,
along with the manufacturing process and specific chip name. We infer
all of this from the gfx name (called llvm_target in the code),
although it is not yet clear that this method is completely reliable
(see the comments in the code for details). In the future we might
replace it with a better approach. Once we have the gfx name, we
*should* be able to infer the specific chip, and from the chip we can
easily infer the microarchitecture.

This commit also includes some refactorings and code improvements on
the HSA backend.
2025-10-15 08:23:28 +02:00
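The two-step inference described above can be sketched as a pair of lookup tables: gfx name to chip, then chip to microarchitecture. The entries below are a small hand-picked sample for illustration and are not gpufetch's actual tables. chip_from_gfx and uarch_from_chip are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>

// Step 1: gfx name (llvm_target) -> chip. Illustrative sample only.
const std::map<std::string, std::string> kGfxToChip = {
  {"gfx1100", "Navi 31"},
  {"gfx1101", "Navi 32"},
  {"gfx1102", "Navi 33"},
  {"gfx90a", "Aldebaran"},
  {"gfx942", "MI300"},
};

// Step 2: chip -> microarchitecture.
const std::map<std::string, std::string> kChipToUarch = {
  {"Navi 31", "RDNA3"},
  {"Navi 32", "RDNA3"},
  {"Navi 33", "RDNA3"},
  {"Aldebaran", "CDNA2"},
  {"MI300", "CDNA3"},
};

// Map lookup that falls back to "Unknown" for missing keys.
std::string lookup_or_unknown(const std::map<std::string, std::string>& m,
                              const std::string& key) {
  auto it = m.find(key);
  return it == m.end() ? std::string("Unknown") : it->second;
}

std::string chip_from_gfx(const std::string& gfx) {
  return lookup_or_unknown(kGfxToChip, gfx);
}

std::string uarch_from_chip(const std::string& chip) {
  return lookup_or_unknown(kChipToUarch, chip);
}
```

Splitting the lookup into two tables keeps the chip as an explicit intermediate result. The printer can then show it alongside the microarchitecture.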
Dr-Noob
b29b17d14f [v0.30] Add support for AMD GPUs
Adds very basic support for AMD (experimental). The only install
requirement is ROCm. Unlike NVIDIA, we don't need the CUDA equivalent
(HIP) to make gpufetch work, which reduces the installation
requirements quite significantly.

Major changes:

* CMakeLists:
  - Make CUDA not compiled by default (since we now may want to target
    AMD only)
  - Set build flags on the gpufetch CMake target instead of using
    set(CMAKE_CXX_FLAGS ...). This fixes a warning coming from ROCm.
  - Assumes that the ROCm CMake files are installed (should be fixed
    later)

* hsa folder: AMD support is implemented via HSA (Heterogeneous System
  Architecture) calls, so HSA is added as a new backend to gpufetch.
  We only print basic information for now; full AMD support will need
  more work in the future.

NOTE: This commit will probably break AUR packages, since we used to
build CUDA by default, which is no longer the case. The AUR package
should be updated to use -DENABLE_CUDA_BACKEND or -DENABLE_HSA_BACKEND
as appropriate.
2025-10-12 12:34:56 +02:00
Dr-Noob
57caadf530 [v0.25] Add Intel Whiskey Lake SoC (#42) 2023-10-20 07:59:07 +01:00
Dr-Noob
ed35cb872b [v0.25] Leave cuda/intel backend to decide how to report PCI vendor failure 2023-03-31 16:16:46 +02:00
Dr-Noob
3d36852f9d [v0.25] Fix for PCI class 0302 can also be responsible for GPUs (like in AWS) 2023-03-31 16:12:22 +02:00
Dr-Noob
fb0109d327 [v0.25] PCI class 0302 can also be responsible for GPUs 2023-03-31 16:08:59 +02:00
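PCI class codes pack the base class in the high byte and the subclass in the low byte. 0x0300 is a VGA-compatible controller and 0x0302 is a 3D controller; both belong to display-controller base class 0x03. Below is a hypothetical C++ filter that accepts both. It assumes the class is available as such a 16-bit value, as libpci's device_class field provides.

```cpp
#include <cassert>
#include <cstdint>

// (base class << 8) | subclass; base class 0x03 = display controller.
constexpr uint16_t kPciClassVga = 0x0300;  // VGA-compatible controller
constexpr uint16_t kPciClass3d = 0x0302;   // 3D controller

// Accept both subclasses, since some GPUs (e.g. on AWS, per the commits
// above) enumerate as 3D controllers rather than VGA controllers.
bool is_gpu_class(uint16_t device_class) {
  return device_class == kPciClassVga || device_class == kPciClass3d;
}
```

This sketch rejects other display subclasses, such as 0x0380.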
Dr-Noob
68619aa03e [v0.25] Avoid segfault when the pci vendor is not found 2023-03-31 15:50:37 +02:00
Dr-Noob
a4006db616 [v0.25] Remove warning notice 2022-12-03 18:06:36 +01:00
Dr-Noob
774550307c [v0.25] Add option to print all GPUs as requested in #33 2022-12-03 18:04:50 +01:00
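The parse_args() changes in the diff below appear to implement this option. --gpu accepts "a" to select all GPUs, which is stored as gpu_idx = -1, and it rejects negative indices. Below is a self-contained C++ sketch of that rule. parse_gpu_arg is a hypothetical helper that uses strtol where gpufetch uses its own getarg_int().

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <cstring>

// "a" selects all GPUs (sentinel -1); otherwise require a non-negative int.
bool parse_gpu_arg(const char* arg, int* gpu_idx) {
  if (std::strcmp(arg, "a") == 0) {
    *gpu_idx = -1;
    return true;
  }
  char* end = nullptr;
  errno = 0;
  long v = std::strtol(arg, &end, 10);
  // Reject overflow, empty input and trailing garbage such as "3z".
  if (errno != 0 || end == arg || *end != '\0') return false;
  // Negative indices are out of range, mirroring the new printErr() path.
  if (v < 0 || v > INT_MAX) return false;
  *gpu_idx = (int) v;
  return true;
}
```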
Dr-Noob
06dc50b6a5 [v0.25] Updated cuda_helper to support latest GPUs 2022-12-03 16:39:18 +00:00
Dr-Noob
9837236c7e [v0.25] Fixed some details in README and build.sh 2022-12-03 14:46:48 +00:00
Dr-Noob
a6f0c18fcb [v0.25] Add missing Ampere GPU chips and new uarchs: ada and hopper 2022-10-25 20:13:29 +02:00
Dr-Noob
94490b3f38 [v0.24] Fix typo in error message (thanks #22 and #28) 2022-10-25 19:41:46 +02:00
Dr-Noob
5faac7a756 [v0.24] Update PCI ids to pciutils/pciids@06c4c9a 2022-10-25 19:30:24 +02:00
Dr-Noob
8c62e9ebaf [v0.24] Added generic KBL UHD Graphics. Should fix #19 2022-07-13 13:27:22 +02:00
Dr-Noob
4d948eb80a [v0.24] Remove CUDA driver initialization message before printing any other message 2022-05-21 23:19:03 +02:00
Dr-Noob
cf96628385 [v0.24] Fix topology for currently supported ALD iGPUs 2022-05-14 20:25:08 +02:00
Dr-Noob
5bf35ee6d7 [v0.24] Make sure we have valid data before reporting peakperf in Intel 2022-05-14 13:12:19 +02:00
Dr-Noob
fea985d08c [v0.24] Add first support for Alder Lake iGPUs. Needs more work to check data properly 2022-05-14 13:01:34 +02:00
Dr-Noob
24f20d0901 [v0.24] Small fixes; improve PCI report when no GPU is found, speedup invalid GPU idx detection 2022-05-14 12:00:23 +02:00
Dr-Noob
c4ad2bd4f8 [v0.24] Merge bugfix branch 2022-04-17 14:04:19 +02:00
Dr-Noob
af52d2850c [v0.24] Remove cuda-samples dependency 2022-04-17 13:55:05 +02:00
Dr-Noob
6f196c1797 [v0.23] Fix FreeBSD compilation issues as reported by #13 2022-04-10 16:52:42 +01:00
Dr-Noob
312d78b7f1 [v0.23] Fix dummy warning in intel uarch 2022-04-10 16:11:59 +01:00
Dr-Noob
ebad29e044 [v0.23] Fix CMake to find CUDA Samples in CUDA >= 11.6 2022-03-12 11:04:09 +01:00
Dr-Noob
59df3e53ec [v0.23] Fix README text. It is written following a C style, but actually written in C++ because of CUDA 2022-01-23 10:57:02 +01:00
Dr-Noob
d120f9a1cd [v0.23] Add --logo-short/long. Closes #11 2022-01-23 10:55:26 +01:00
29 changed files with 1524 additions and 240 deletions

1
.gitignore vendored

@@ -1 +1,2 @@
 gpufetch
+build/


@@ -7,17 +7,19 @@ project(gpufetch CXX)
 set(SRC_DIR "src")
 set(COMMON_DIR "${SRC_DIR}/common")
 set(CUDA_DIR "${SRC_DIR}/cuda")
+set(HSA_DIR "${SRC_DIR}/hsa")
 set(INTEL_DIR "${SRC_DIR}/intel")
-if(NOT DEFINED ENABLE_INTEL_BACKEND)
-set(ENABLE_INTEL_BACKEND true)
+# Make sure that at least one backend is enabled.
+# It does not make sense that the user has not specified any backend.
+if(NOT ENABLE_INTEL_BACKEND AND NOT ENABLE_CUDA_BACKEND AND NOT ENABLE_HSA_BACKEND)
+message(FATAL_ERROR "No backend was enabled! Please enable at least one backend with -DENABLE_XXX_BACKEND")
 endif()
-if(NOT DEFINED ENABLE_CUDA_BACKEND OR ENABLE_CUDA_BACKEND)
+if(ENABLE_CUDA_BACKEND)
 check_language(CUDA)
 if(CMAKE_CUDA_COMPILER)
 enable_language(CUDA)
-set(ENABLE_CUDA_BACKEND true)
 # Must link_directories early so add_executable(gpufetch ...) gets the right directories
 link_directories(cuda_backend ${CMAKE_CUDA_COMPILER_TOOLKIT_ROOT}/targets/x86_64-linux/lib)
 else()
@@ -25,7 +27,70 @@ if(NOT DEFINED ENABLE_CUDA_BACKEND OR ENABLE_CUDA_BACKEND)
 endif()
 endif()
+if(ENABLE_HSA_BACKEND)
+find_package(ROCmCMakeBuildTools QUIET)
+if (ROCmCMakeBuildTools_FOUND)
+find_package(hsa-runtime64 1.0 REQUIRED)
+link_directories(hsa_backend hsa-runtime64::hsa-runtime64)
+# Find HSA headers
+# ROCm does not seem to provide this, which is quite frustrating.
+find_path(HSA_INCLUDE_DIR
+NAMES hsa/hsa.h
+HINTS
+$ENV{ROCM_PATH}/include # allow users override via env variable
+/opt/rocm/include # common default path
+/usr/include
+/usr/local/include
+)
+if(NOT HSA_INCLUDE_DIR)
+message(STATUS "${BoldYellow}HSA not found, disabling HSA backend${ColorReset}")
+set(ENABLE_HSA_BACKEND false)
+endif()
+else()
+# rocm-cmake is not installed, try to manually find neccesary files.
+message(STATUS "${BoldYellow}Could NOT find HSA automatically, running manual search...${ColorReset}")
+if (NOT DEFINED ROCM_PATH)
+set(ROCM_PATH "/opt/rocm" CACHE PATH "Path to ROCm")
+endif()
+find_path(HSA_INCLUDE_DIR hsa/hsa.h HINTS ${ROCM_PATH}/include)
+find_library(HSA_LIBRARY hsa-runtime64 HINTS ${ROCM_PATH}/lib ${ROCM_PATH}/lib64)
+if (HSA_INCLUDE_DIR AND HSA_LIBRARY)
+message(STATUS "${BoldYellow}HSA was found manually${ColorReset}")
+else()
+set(ENABLE_HSA_BACKEND false)
+message(STATUS "${BoldYellow}HSA was not found manually${ColorReset}")
+endif()
+endif()
+endif()
+set(GPUFECH_COMMON
+${COMMON_DIR}/main.cpp
+${COMMON_DIR}/args.cpp
+${COMMON_DIR}/gpu.cpp
+${COMMON_DIR}/global.cpp
+${COMMON_DIR}/printer.cpp
+${COMMON_DIR}/master.cpp
+${COMMON_DIR}/uarch.cpp
+)
+set(GPUFETCH_LINK_TARGETS z)
+if(NOT(ENABLE_HSA_BACKEND AND NOT ENABLE_CUDA_BACKEND AND NOT ENABLE_INTEL_BACKEND))
+# Look for pciutils only if not building HSA only.
+#
+# This has the (intented) secondary effect that if only HSA backend is enabled
+# by the user, but ROCm cannot be found, pciutils will still be compiled in
+# order to show the list of GPUs available on the system, so that the user will
+# get at least some feedback even if HSA is not found.
 list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_LIST_DIR}/cmake")
+list(APPEND GPUFECH_COMMON ${COMMON_DIR}/pci.cpp ${COMMON_DIR}/sort.cpp)
+list(APPEND GPUFETCH_LINK_TARGETS pci)
+set(CMAKE_ENABLE_PCIUTILS ON)
 find_package(PCIUTILS)
 if(NOT ${PCIUTILS_FOUND})
 message(STATUS "${BoldYellow}pciutils not found, downloading and building a local copy...${ColorReset}")
@@ -45,11 +110,19 @@ if(NOT ${PCIUTILS_FOUND})
 else()
 include_directories(${PCIUTILS_INCLUDE_DIR})
 link_libraries(${PCIUTILS_LIBRARIES})
+# Needed for linking libpci in FreeBSD
+link_directories(/usr/local/lib/)
+endif()
 endif()
-add_executable(gpufetch ${COMMON_DIR}/main.cpp ${COMMON_DIR}/args.cpp ${COMMON_DIR}/gpu.cpp ${COMMON_DIR}/pci.cpp ${COMMON_DIR}/sort.cpp ${COMMON_DIR}/global.cpp ${COMMON_DIR}/printer.cpp ${COMMON_DIR}/master.cpp ${COMMON_DIR}/uarch.cpp)
+add_executable(gpufetch ${GPUFECH_COMMON})
-set(SANITY_FLAGS "-Wfloat-equal -Wshadow -Wpointer-arith")
-set(CMAKE_CXX_FLAGS "${SANITY_FLAGS} -Wall -Wextra -pedantic -fstack-protector-all -pedantic -std=c++11")
+set(SANITY_FLAGS -Wfloat-equal -Wshadow -Wpointer-arith -Wall -Wextra -pedantic -fstack-protector-all -pedantic)
+target_compile_features(gpufetch PRIVATE cxx_std_11)
+target_compile_options(gpufetch PRIVATE ${SANITY_FLAGS})
+if (CMAKE_ENABLE_PCIUTILS)
+target_compile_definitions(gpufetch PUBLIC BACKEND_USE_PCI)
+endif()
 if(ENABLE_INTEL_BACKEND)
 target_compile_definitions(gpufetch PUBLIC BACKEND_INTEL)
@@ -68,8 +141,10 @@ if(ENABLE_CUDA_BACKEND)
 # https://en.wikipedia.org/w/index.php?title=CUDA&section=5#GPUs_supported
 # https://raw.githubusercontent.com/PointCloudLibrary/pcl/master/cmake/pcl_find_cuda.cmake
-if(${CMAKE_CUDA_COMPILER_VERSION} VERSION_GREATER_EQUAL "11.0")
+if(${CMAKE_CUDA_COMPILER_VERSION} VERSION_GREATER_EQUAL "11.1")
 set(CMAKE_CUDA_ARCHITECTURES 35 37 50 52 53 60 61 62 70 72 75 80 86)
+elseif(${CMAKE_CUDA_COMPILER_VERSION} EQUAL "11.0")
+set(CMAKE_CUDA_ARCHITECTURES 30 32 35 37 50 52 53 60 61 62 70 72 75 80)
 elseif(${CMAKE_CUDA_COMPILER_VERSION} VERSION_GREATER_EQUAL "10.0")
 set(CMAKE_CUDA_ARCHITECTURES 30 32 35 37 50 52 53 60 61 62 70 72 75)
 elseif(${CMAKE_CUDA_COMPILER_VERSION} VERSION_GREATER_EQUAL "9.0")
@@ -84,13 +159,33 @@ if(ENABLE_CUDA_BACKEND)
 add_dependencies(cuda_backend pciutils)
 endif()
-target_include_directories(cuda_backend PUBLIC ${CMAKE_CUDA_COMPILER_TOOLKIT_ROOT}/samples/common/inc ${CMAKE_CUDA_COMPILER_TOOLKIT_ROOT}/targets/x86_64-linux/include)
+target_include_directories(cuda_backend PUBLIC ${CMAKE_CUDA_COMPILER_TOOLKIT_ROOT}/targets/x86_64-linux/include)
 target_link_libraries(cuda_backend PRIVATE cudart)
 target_link_libraries(gpufetch cuda_backend)
 endif()
-target_link_libraries(gpufetch pci z)
+if(ENABLE_HSA_BACKEND)
+target_compile_definitions(gpufetch PUBLIC BACKEND_HSA)
+add_library(hsa_backend STATIC ${HSA_DIR}/hsa.cpp ${HSA_DIR}/uarch.cpp)
+if(NOT ${PCIUTILS_FOUND})
+add_dependencies(hsa_backend pciutils)
+endif()
+target_include_directories(hsa_backend PRIVATE "${HSA_INCLUDE_DIR}")
+if (HSA_LIBRARY)
+target_link_libraries(hsa_backend PRIVATE ${HSA_LIBRARY})
+else()
+target_link_libraries(hsa_backend PRIVATE hsa-runtime64::hsa-runtime64)
+endif()
+target_link_libraries(gpufetch hsa_backend)
+endif()
+target_link_libraries(gpufetch ${GPUFETCH_LINK_TARGETS})
 install(TARGETS gpufetch DESTINATION bin)
 if(NOT WIN32)
@@ -111,6 +206,11 @@ if(ENABLE_CUDA_BACKEND)
 else()
 message(STATUS "CUDA backend: ${BoldRed}OFF${ColorReset}")
 endif()
+if(ENABLE_HSA_BACKEND)
+message(STATUS "HSA backend: ${BoldGreen}ON${ColorReset}")
+else()
+message(STATUS "HSA backend: ${BoldRed}OFF${ColorReset}")
+endif()
 if(ENABLE_INTEL_BACKEND)
 message(STATUS "Intel backend: ${BoldGreen}ON${ColorReset}")
 else()


@@ -20,7 +20,7 @@
 <p align="center"> </p>
 <p align="center">
-gpufetch is a command-line tool written in C that displays the GPU information in a clean and beautiful way
+gpufetch is a command-line tool written in C++ that displays the GPU information in a clean and beautiful way
 </p>
 <p align="center">
@@ -33,6 +33,7 @@ gpufetch is a command-line tool written in C that displays the GPU information i
 <!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+- [Table of contents](#table-of-contents)
 - [1. Support](#1-support)
 - [2. Backends](#2-backends)
 - [2.1 CUDA backend is not enabled. Why?](#21-cuda-backend-is-not-enabled-why)
@@ -49,14 +50,16 @@ gpufetch is a command-line tool written in C that displays the GPU information i
 gpufetch supports the following GPUs:
 - **NVIDIA** GPUs (Compute Capability >= 2.0)
+- **AMD** GPUs (Experimental) (RDNA 3.0, CDNA 3.0)
 - **Intel** iGPUs (Generation >= Gen6)
 Only compilation under **Linux** is supported.
 ## 2. Backends
-gpufetch is made up of two backends:
+gpufetch is made up of three backends:
 - CUDA backend
+- HSA backend
 - Intel backend
 Backends are enabled and disabled at **compile time**. When compiling gpufetch, check the CMake output to see which backends are enabled.
@@ -85,6 +88,7 @@ If there is a NVIDIA GPU or Intel iGPU in the system and the appropiate backend
 You will need (mandatory):
 - C++ compiler (e.g, `g++`)
+- `zlib`
 - `cmake`
 - `make`
@@ -110,6 +114,7 @@ By default, `gpufetch` will print the GPU logo with the system color scheme. How
 By specifying a name, gpufetch will use the specific colors of each manufacture. Valid values are:
 - intel
+- amd
 - nvidia
 ```

102
build.sh

@@ -1,5 +1,24 @@
 #!/bin/bash
+print_help() {
+cat << EOF
+Usage: $0 <backends> [build_type]
+<backends> MANDATORY. Comma-separated list of
+backends to enable.
+Valid options: hsa, intel, cuda
+Example: hsa,cuda
+[build_type] OPTIONAL. Build type. Valid options:
+debug, release (default: release)
+Examples:
+$0 hsa,intel debug
+$0 cuda
+$0 hsa,intel,cuda release
+EOF
+}
 # gpufetch build script
 set -e
@@ -7,26 +26,97 @@ rm -rf build/ gpufetch
 mkdir build/
 cd build/
-if [ "$1" == "debug" ]
+if [ "$1" == "--help" ]
 then
-BUILD_TYPE="Debug"
-else
-BUILD_TYPE="Release"
+echo "gpufetch build script"
+echo
+print_help
+exit 0
 fi
+if [[ $# -lt 1 ]]; then
+echo "ERROR: At least one backend must be specified."
+echo
+print_help
+exit 1
+fi
+# Determine if last argument is build type
+LAST_ARG="${!#}"
+if [[ "$LAST_ARG" == "debug" || "$LAST_ARG" == "release" ]]; then
+BUILD_TYPE="$LAST_ARG"
+BACKEND_ARG="${1}"
+else
+BUILD_TYPE="release"
+BACKEND_ARG="${1}"
+fi
+# Split comma-separated backends into an array
+IFS=',' read -r -a BACKENDS <<< "$BACKEND_ARG"
+# Validate build type
+if [[ "$BUILD_TYPE" != "debug" && "$BUILD_TYPE" != "release" ]]
+then
+echo "Error: Invalid build type '$BUILD_TYPE'."
+echo "Valid options are: debug, release"
+exit 1
+fi
+# From lower to upper case
+CMAKE_FLAGS="-DCMAKE_BUILD_TYPE=${BUILD_TYPE^}"
+# Validate backends
+VALID_BACKENDS=("hsa" "intel" "cuda")
+for BACKEND in "${BACKENDS[@]}"; do
+case "$BACKEND" in
+hsa)
+CMAKE_FLAGS+=" -DENABLE_HSA_BACKEND=ON"
+;;
+intel)
+CMAKE_FLAGS+=" -DENABLE_INTEL_BACKEND=ON"
+;;
+cuda)
+CMAKE_FLAGS+=" -DENABLE_CUDA_BACKEND=ON"
+;;
+*)
+echo "ERROR: Invalid backend '$BACKEND'."
+echo "Valid options: ${VALID_BACKENDS[*]}"
+exit 1
+;;
+esac
+done
+# You can also manually specify the compilation flags.
+# If you need to, just run the cmake command directly
+# instead of using this script.
+#
+# Here you will find some help:
+#
 # In case you have CUDA installed but it is not detected,
 # - set CMAKE_CUDA_COMPILER to your nvcc binary:
 # - set CMAKE_CUDA_COMPILER_TOOLKIT_ROOT to the CUDA root dir
 # for example:
 # cmake -DCMAKE_BUILD_TYPE=$BUILD_TYPE -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc -DCMAKE_CUDA_COMPILER_TOOLKIT_ROOT=/usr/local/cuda/ ..
+#
 # In case you want to explicitely disable a backend, you can:
 # Disable CUDA backend:
 # cmake -DCMAKE_BUILD_TYPE=$BUILD_TYPE -DENABLE_CUDA_BACKEND=OFF ..
+# Disable HSA backend:
+# cmake -DCMAKE_BUILD_TYPE=$BUILD_TYPE -DENABLE_HSA_BACKEND=OFF ..
 # Disable Intel backend:
 # cmake -DCMAKE_BUILD_TYPE=$BUILD_TYPE -DENABLE_INTEL_BACKEND=OFF ..
-cmake -DCMAKE_BUILD_TYPE=$BUILD_TYPE ..
+echo "$0: Running cmake $CMAKE_FLAGS"
+echo
+cmake $CMAKE_FLAGS ..
+os=$(uname)
+if [ "$os" == 'Linux' ]; then
 make -j$(nproc)
+elif [ "$os" == 'FreeBSD' ]; then
+gmake -j4
+fi
 cd -
 ln -s build/gpufetch .


@@ -13,12 +13,14 @@
 #define NUM_COLORS 4
 #define COLOR_STR_NVIDIA "nvidia"
+#define COLOR_STR_AMD "amd"
 #define COLOR_STR_INTEL "intel"
 // +-----------------------+-----------------------+
 // | Color logo | Color text |
 // | Color 1 | Color 2 | Color 1 | Color 2 |
 #define COLOR_DEFAULT_NVIDIA "118,185,000:255,255,255:255,255,255:118,185,000"
+#define COLOR_DEFAULT_AMD "250,250,250:250,250,250:200,200,200:255,255,255"
 #define COLOR_DEFAULT_INTEL "015,125,194:230,230,230:040,150,220:230,230,230"
 struct args_struct {
@@ -26,6 +28,8 @@ struct args_struct {
 bool verbose_flag;
 bool version_flag;
 bool list_gpus;
+bool logo_long;
+bool logo_short;
 int gpu_idx;
 STYLE style;
 struct color** colors;
@@ -38,6 +42,8 @@ const char args_chr[] = {
 /* [ARG_COLOR] = */ 'c',
 /* [ARG_GPU] = */ 'g',
 /* [ARG_LIST] = */ 'l',
+/* [ARG_LOGO_LONG] = */ 1,
+/* [ARG_LOGO_SHORT] = */ 2,
 /* [ARG_HELP] = */ 'h',
 /* [ARG_VERBOSE] = */ 'v',
 /* [ARG_VERSION] = */ 'V',
@@ -47,6 +53,8 @@ const char *args_str[] = {
 /* [ARG_COLOR] = */ "color",
 /* [ARG_GPU] = */ "gpu",
 /* [ARG_LIST] = */ "list-gpus",
+/* [ARG_LOGO_LONG] = */ "logo-long",
+/* [ARG_LOGO_SHORT] = */ "logo-short",
 /* [ARG_HELP] = */ "help",
 /* [ARG_VERBOSE] = */ "verbose",
 /* [ARG_VERSION] = */ "version",
@@ -111,6 +119,14 @@ bool list_gpus() {
 return args.list_gpus;
 }
+bool show_logo_long() {
+return args.logo_long;
+}
+bool show_logo_short() {
+return args.logo_short;
+}
 bool show_version() {
 return args.version_flag;
 }
@@ -134,8 +150,9 @@ char* build_short_options() {
 char* str = (char *) emalloc(sizeof(char) * (len*2 + 1));
 memset(str, 0, sizeof(char) * (len*2 + 1));
-sprintf(str, "%c:%c:%c%c%c%c", c[ARG_GPU],
+sprintf(str, "%c:%c:%c%c%c%c%c%c", c[ARG_GPU],
 c[ARG_COLOR], c[ARG_HELP], c[ARG_LIST],
+c[ARG_LOGO_SHORT], c[ARG_LOGO_LONG],
 c[ARG_VERBOSE], c[ARG_VERSION]);
 return str;
@@ -153,6 +170,7 @@ bool parse_color(char* optarg_str, struct color*** cs) {
 bool free_ptr = true;
 if(strcmp(optarg_str, COLOR_STR_NVIDIA) == 0) color_to_copy = COLOR_DEFAULT_NVIDIA;
+else if(strcmp(optarg_str, COLOR_STR_AMD) == 0) color_to_copy = COLOR_DEFAULT_AMD;
 else if(strcmp(optarg_str, COLOR_STR_INTEL) == 0) color_to_copy = COLOR_DEFAULT_INTEL;
 else {
 str_to_parse = optarg_str;
@@ -203,6 +221,8 @@ bool parse_args(int argc, char* argv[]) {
 args.version_flag = false;
 args.help_flag = false;
 args.list_gpus = false;
+args.logo_long = false;
+args.logo_short = false;
 args.gpu_idx = 0;
 args.colors = NULL;
@@ -210,6 +230,8 @@ bool parse_args(int argc, char* argv[]) {
 {args_str[ARG_COLOR], required_argument, 0, args_chr[ARG_COLOR] },
 {args_str[ARG_GPU], required_argument, 0, args_chr[ARG_GPU] },
 {args_str[ARG_LIST], no_argument, 0, args_chr[ARG_LIST] },
+{args_str[ARG_LOGO_SHORT], no_argument, 0, args_chr[ARG_LOGO_SHORT] },
+{args_str[ARG_LOGO_LONG], no_argument, 0, args_chr[ARG_LOGO_LONG] },
 {args_str[ARG_HELP], no_argument, 0, args_chr[ARG_HELP] },
 {args_str[ARG_VERBOSE], no_argument, 0, args_chr[ARG_VERBOSE] },
 {args_str[ARG_VERSION], no_argument, 0, args_chr[ARG_VERSION] },
@@ -227,16 +249,33 @@ bool parse_args(int argc, char* argv[]) {
 }
 }
 else if(opt == args_chr[ARG_GPU]) {
+// Check for "a" option
+if(strcmp(optarg, "a") == 0) {
+args.gpu_idx = -1;
+}
+else {
 args.gpu_idx = getarg_int(optarg);
 if(errn != 0) {
 printErr("Option %s: %s", args_str[ARG_GPU], getarg_error());
 args.help_flag = true;
 return false;
 }
+if(args.gpu_idx < 0) {
+printErr("Specified GPU index is out of range: %d. ", args.gpu_idx);
+printf("Run gpufetch with the --%s option to check out valid GPU indexes\n", args_str[ARG_LIST]);
+return false;
+}
+}
 }
 else if(opt == args_chr[ARG_LIST]) {
 args.list_gpus = true;
 }
+else if(opt == args_chr[ARG_LOGO_SHORT]) {
+args.logo_short = true;
+}
+else if(opt == args_chr[ARG_LOGO_LONG]) {
+args.logo_long = true;
+}
 else if(opt == args_chr[ARG_HELP]) {
 args.help_flag = true;
 }
@@ -260,6 +299,12 @@ bool parse_args(int argc, char* argv[]) {
 args.help_flag = true;
 }
+if(args.logo_short && args.logo_long) {
+printWarn("%s and %s cannot be specified together", args_str[ARG_LOGO_SHORT], args_str[ARG_LOGO_LONG]);
+args.logo_short = false;
+args.logo_long = false;
+}
 if((args.help_flag + args.version_flag) > 1) {
 printWarn("You should specify just one option");
 args.help_flag = true;
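The COLOR_DEFAULT_* strings above hold four R,G,B triplets separated by ':'. Per the table in the comment, they are logo color 1, logo color 2, text color 1 and text color 2. Below is a C++ sketch of parsing that layout with sscanf. parse_color_string and rgb are hypothetical names, and this is not gpufetch's parse_color().

```cpp
#include <cassert>
#include <cstdio>

struct rgb {
  int R, G, B;
};

static bool in_byte_range(int v) { return v >= 0 && v <= 255; }

// Parses "R,G,B:R,G,B:R,G,B:R,G,B" into four colors.
// %d (not %i) keeps zero-padded values such as "015" in base 10.
bool parse_color_string(const char* s, rgb out[4]) {
  int consumed = 0;
  int n = std::sscanf(s, "%d,%d,%d:%d,%d,%d:%d,%d,%d:%d,%d,%d%n",
                      &out[0].R, &out[0].G, &out[0].B,
                      &out[1].R, &out[1].G, &out[1].B,
                      &out[2].R, &out[2].G, &out[2].B,
                      &out[3].R, &out[3].G, &out[3].B, &consumed);
  // All 12 numbers must be present and nothing may follow them.
  if (n != 12 || s[consumed] != '\0') return false;
  for (int i = 0; i < 4; i++) {
    if (!in_byte_range(out[i].R) || !in_byte_range(out[i].G) ||
        !in_byte_range(out[i].B))
      return false;
  }
  return true;
}
```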


@@ -1,6 +1,8 @@
 #ifndef __ARGS__
 #define __ARGS__
+#include <cstdint>
 struct color {
 int32_t R;
 int32_t G;
@@ -20,6 +22,8 @@ enum {
 ARG_COLOR,
 ARG_GPU,
 ARG_LIST,
+ARG_LOGO_LONG,
+ARG_LOGO_SHORT,
 ARG_HELP,
 ARG_VERBOSE,
 ARG_VERSION
@@ -34,6 +38,8 @@ int max_arg_str_length();
 bool parse_args(int argc, char* argv[]);
 bool show_help();
 bool list_gpus();
+bool show_logo_long();
+bool show_logo_short();
 bool show_version();
 bool verbose_enabled();
 void free_colors_struct(struct color** cs);


@@ -34,6 +34,23 @@ $C2## ## ## ## ## ## ## ## #: :# \
 $C2## ## ## ## ## ## ## ## ####### \
 $C2## ## ### ## ###### ## ## ## "
+#define ASCII_AMD \
+"$C2 '############### \
+$C2 ,############# \
+$C2 .#### \
+$C2 #. .#### \
+$C2 :##. .#### \
+$C2 :###. .#### \
+$C2 #########. :## \
+$C2 #######. ; \
+$C1 \
+$C1 ### ### ### ####### \
+$C1 ## ## ##### ##### ## ## \
+$C1 ## ## ### #### ### ## ## \
+$C1 ######### ### ## ### ## ## \
+$C1## ## ### ### ## ## \
+$C1## ## ### ### ####### "
 #define ASCII_INTEL \
 "$C1 .#################. \
 $C1 .#### ####. \
@@ -68,6 +85,27 @@ $C1 olcc::; ,:ccloMMMMMMMMM \
 $C1 :......oMMMMMMMMMMMMMMMMMMMMMM \
 $C1 :lllMMMMMMMMMMMMMMMMMMMMMMMMMM "
+#define ASCII_AMD_L \
+"$C1 \
+$C1 \
+$C1 \
+$C1 \
+$C1 \
+$C1 \
+$C1 @@@@ @@@ @@@ @@@@@@@@ $C2 ############ \
+$C1 @@@@@@ @@@@@ @@@@@ @@@ @@@ $C2 ########## \
+$C1 @@@ @@@ @@@@@@@@@@@@@ @@@ @@ $C2 # ##### \
+$C1 @@@ @@@ @@@ @@@ @@@ @@@ @@ $C2 ### ##### \
+$C1 @@@@@@@@@@@@ @@@ @@@ @@@ @@@ $C2######### ### \
+$C1 @@@ @@@ @@@ @@@ @@@@@@@@@ $C2######## ## \
+$C1 \
+$C1 \
+$C1 \
+$C1 \
+$C1 \
+$C1 \
+$C1 "
 #define ASCII_INTEL_L \
 "$C1 ###############@ \
 $C1 ######@ ######@ \
@@ -95,9 +133,11 @@ typedef struct ascii_logo asciiL;
 // | LOGO | W | H | REPLACE | COLORS LOGO | COLORS TEXT |
 // ------------------------------------------------------------------------------------------
 asciiL logo_nvidia = { ASCII_NVIDIA, 45, 19, false, {C_FG_GREEN, C_FG_WHITE}, {C_FG_WHITE, C_FG_GREEN} };
+asciiL logo_amd = { ASCII_AMD, 39, 15, false, {C_FG_WHITE, C_FG_GREEN}, {C_FG_WHITE, C_FG_GREEN} };
 asciiL logo_intel = { ASCII_INTEL, 48, 14, false, {C_FG_CYAN}, {C_FG_CYAN, C_FG_WHITE} };
 // Long variants | ---------------------------------------------------------------------------------------|
 asciiL logo_nvidia_l = { ASCII_NVIDIA_L, 50, 15, false, {C_FG_GREEN, C_FG_WHITE}, {C_FG_WHITE, C_FG_GREEN} };
+asciiL logo_amd_l = { ASCII_AMD_L, 62, 19, true, {C_BG_WHITE, C_BG_WHITE}, {C_FG_CYAN, C_FG_B_WHITE} };
 asciiL logo_intel_l = { ASCII_INTEL_L, 62, 19, true, {C_BG_CYAN, C_BG_WHITE}, {C_FG_CYAN, C_FG_WHITE} };
 asciiL logo_unknown = { NULL, 0, 0, false, {C_NONE}, {C_NONE, C_NONE} };

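The long logos set the REPLACE column to `true` and use background colors (`C_BG_*`), which `replace_bgbyfg_color` can swap for their foreground counterparts. The following is a minimal sketch of that mapping. It assumes the color macros expand to plain ANSI SGR sequences of the form `"\x1b[<n>m"`, which this diff does not show. The helper `bg_to_fg` is hypothetical. The real function compares against each `C_BG_*` constant one by one instead.

```cpp
#include <cassert>
#include <string>

// Assumption: colors are plain ANSI SGR sequences "\x1b[<n>m".
// Standard background codes are 40-47 and their foreground twins are 30-37,
// so the swap replaces the leading '4' of the code with '3'.
// Anything that does not match that exact shape is returned unchanged.
std::string bg_to_fg(const std::string& sgr) {
  const std::string prefix = "\x1b[";
  const std::size_t p = prefix.size();
  if (sgr.size() == p + 3 &&                 // prefix + two digits + 'm'
      sgr.compare(0, p, prefix) == 0 &&
      sgr[p] == '4' &&
      sgr[p + 1] >= '0' && sgr[p + 1] <= '7' &&
      sgr.back() == 'm') {
    std::string fg = sgr;
    fg[p] = '3';                             // 4x (background) -> 3x (foreground)
    return fg;
  }
  return sgr;
}
```

Under this assumption, `bg_to_fg("\x1b[47m")` (white background) yields `"\x1b[37m"` (white foreground).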
@@ -101,6 +101,17 @@ char* get_str_bus_width(struct gpu_info* gpu) {
return string; return string;
} }
char* get_str_lds_size(struct gpu_info* gpu) {
// TODO: Show XX KB (XX MB Total) like in cpufetch
uint32_t size = 3+1+3+1;
assert(strlen(STRING_UNKNOWN)+1 <= size);
char* string = (char *) ecalloc(size, sizeof(char));
sprintf(string, "%d KB", gpu->mem->lds_size / 1024);
return string;
}
char* get_str_memory_clock(struct gpu_info* gpu) { char* get_str_memory_clock(struct gpu_info* gpu) {
return get_freq_as_str_mhz(gpu->mem->freq); return get_freq_as_str_mhz(gpu->mem->freq);
} }

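`get_str_lds_size` formats the LDS size into a fixed 8-byte buffer (`3+1+3+1`) with `sprintf`. The following is a bounded variant sketched with `snprintf`, which never writes past the buffer and reports how many characters the full string would have needed. The helper name `format_lds_size` and the `"Unknown"` fallback are hypothetical stand-ins; the fallback plays the role of `STRING_UNKNOWN`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats an LDS size given in bytes as "<N> KB".
// snprintf truncates instead of overflowing; its return value tells us
// whether the full text actually fit in the buffer.
std::string format_lds_size(int32_t lds_bytes) {
  char buf[8];  // same 8-byte budget as the original allocation
  int needed = std::snprintf(buf, sizeof(buf), "%d KB",
                             static_cast<int>(lds_bytes / 1024));
  if (needed < 0 || static_cast<std::size_t>(needed) >= sizeof(buf)) {
    return "Unknown";  // did not fit (or encoding error)
  }
  return std::string(buf);
}
```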
@@ -3,12 +3,11 @@
#include <cstdint> #include <cstdint>
#include "../cuda/pci.hpp"
#define UNKNOWN_FREQ -1 #define UNKNOWN_FREQ -1
enum { enum {
GPU_VENDOR_NVIDIA, GPU_VENDOR_NVIDIA,
GPU_VENDOR_AMD,
GPU_VENDOR_INTEL GPU_VENDOR_INTEL
}; };
@@ -44,6 +43,15 @@ struct topology_c {
int32_t tensor_cores; int32_t tensor_cores;
}; };
// HSA topology
struct topology_h {
int32_t compute_units;
int32_t num_shader_engines;
int32_t simds_per_cu;
int32_t num_xcc;
int32_t matrix_cores;
};
// Intel topology // Intel topology
struct topology_i { struct topology_i {
int32_t slices; int32_t slices;
@@ -57,6 +65,7 @@ struct memory {
int32_t bus_width; int32_t bus_width;
int32_t freq; int32_t freq;
int32_t clk_mul; // clock multiplier int32_t clk_mul; // clock multiplier
int32_t lds_size; // HSA specific for now
}; };
struct gpu_info { struct gpu_info {
@@ -72,6 +81,8 @@ struct gpu_info {
struct memory* mem; struct memory* mem;
struct cache* cach; struct cache* cach;
struct topology_c* topo_c; struct topology_c* topo_c;
// HSA specific
struct topology_h* topo_h;
// Intel specific // Intel specific
struct topology_i* topo_i; struct topology_i* topo_i;
}; };
@@ -82,6 +93,7 @@ char* get_str_freq(struct gpu_info* gpu);
char* get_str_memory_size(struct gpu_info* gpu); char* get_str_memory_size(struct gpu_info* gpu);
char* get_str_memory_type(struct gpu_info* gpu); char* get_str_memory_type(struct gpu_info* gpu);
char* get_str_bus_width(struct gpu_info* gpu); char* get_str_bus_width(struct gpu_info* gpu);
char* get_str_lds_size(struct gpu_info* gpu);
char* get_str_memory_clock(struct gpu_info* gpu); char* get_str_memory_clock(struct gpu_info* gpu);
char* get_str_l2(struct gpu_info* gpu); char* get_str_l2(struct gpu_info* gpu);
char* get_str_peak_performance(struct gpu_info* gpu); char* get_str_peak_performance(struct gpu_info* gpu);

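The new `topology_h` carries `num_xcc` and `matrix_cores`. The v0.30 commit message describes two rules for them: matrix cores are assumed to equal `compute_units * simds_per_cu`, and XCDs are only shown when the GPU has more than one. The following is a small sketch of those two rules. The function names are hypothetical. According to the commit, the matrix-core heuristic was only checked on CDNA3 and RDNA3 parts, so older GPUs without matrix cores are not handled here.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Mirrors the HSA topology fields added in the diff.
struct topology_h {
  int32_t compute_units;
  int32_t num_shader_engines;
  int32_t simds_per_cu;
  int32_t num_xcc;
  int32_t matrix_cores;
};

// Heuristic from the commit message: one matrix core per SIMD.
int32_t estimate_matrix_cores(const topology_h& t) {
  return t.compute_units * t.simds_per_cu;
}

// A single-XCD GPU gets no XCD line, since the value adds little.
std::optional<int32_t> xcds_to_show(const topology_h& t) {
  if (t.num_xcc > 1) return t.num_xcc;
  return std::nullopt;
}
```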
@@ -8,7 +8,11 @@
#include "../cuda/cuda.hpp" #include "../cuda/cuda.hpp"
#include "../cuda/uarch.hpp" #include "../cuda/uarch.hpp"
static const char* VERSION = "0.23"; #ifdef BACKEND_USE_PCI
#include "pci.hpp"
#endif
static const char* VERSION = "0.30";
void print_help(char *argv[]) { void print_help(char *argv[]) {
const char **t = args_str; const char **t = args_str;
@@ -21,7 +25,9 @@ void print_help(char *argv[]) {
printf("Options: \n"); printf("Options: \n");
printf(" -%c, --%s %*s Set the color scheme (by default, gpufetch uses the system color scheme) See COLORS section for a more detailed explanation\n", c[ARG_COLOR], t[ARG_COLOR], (int) (max_len-strlen(t[ARG_COLOR])), ""); printf(" -%c, --%s %*s Set the color scheme (by default, gpufetch uses the system color scheme) See COLORS section for a more detailed explanation\n", c[ARG_COLOR], t[ARG_COLOR], (int) (max_len-strlen(t[ARG_COLOR])), "");
printf(" -%c, --%s %*s List the available GPUs in the system\n", c[ARG_LIST], t[ARG_LIST], (int) (max_len-strlen(t[ARG_LIST])), ""); printf(" -%c, --%s %*s List the available GPUs in the system\n", c[ARG_LIST], t[ARG_LIST], (int) (max_len-strlen(t[ARG_LIST])), "");
printf(" -%c, --%s %*s Select the GPU to use (default: 0)\n", c[ARG_GPU], t[ARG_GPU], (int) (max_len-strlen(t[ARG_GPU])), ""); printf(" -%c, --%s %*s Select the GPU to print (default: 0). Use 'a' to print all GPUs\n", c[ARG_GPU], t[ARG_GPU], (int) (max_len-strlen(t[ARG_GPU])), "");
printf(" --%s %*s Show the short version of the logo\n", t[ARG_LOGO_SHORT], (int) (max_len-strlen(t[ARG_LOGO_SHORT])), "");
printf(" --%s %*s Show the long version of the logo\n", t[ARG_LOGO_LONG], (int) (max_len-strlen(t[ARG_LOGO_LONG])), "");
printf(" -%c, --%s %*s Enable verbose output\n", c[ARG_VERBOSE], t[ARG_VERBOSE], (int) (max_len-strlen(t[ARG_VERBOSE])), ""); printf(" -%c, --%s %*s Enable verbose output\n", c[ARG_VERBOSE], t[ARG_VERBOSE], (int) (max_len-strlen(t[ARG_VERBOSE])), "");
printf(" -%c, --%s %*s Print this help and exit\n", c[ARG_HELP], t[ARG_HELP], (int) (max_len-strlen(t[ARG_HELP])), ""); printf(" -%c, --%s %*s Print this help and exit\n", c[ARG_HELP], t[ARG_HELP], (int) (max_len-strlen(t[ARG_HELP])), "");
printf(" -%c, --%s %*s Print gpufetch version and exit\n", c[ARG_VERSION], t[ARG_VERSION], (int) (max_len-strlen(t[ARG_VERSION])), ""); printf(" -%c, --%s %*s Print gpufetch version and exit\n", c[ARG_VERSION], t[ARG_VERSION], (int) (max_len-strlen(t[ARG_VERSION])), "");
@@ -69,14 +75,20 @@ int main(int argc, char* argv[]) {
set_log_level(verbose_enabled()); set_log_level(verbose_enabled());
int idx = get_gpu_idx();
struct gpu_list* list = get_gpu_list(); struct gpu_list* list = get_gpu_list();
if(list_gpus()) { if(list_gpus()) {
return print_gpus_list(list); return print_gpus_list(list);
} }
if(get_num_gpus_available(list) == 0) { if(get_num_gpus_available(list) == 0) {
#ifdef BACKEND_USE_PCI
printErr("No GPU was detected! Available GPUs are:"); printErr("No GPU was detected! Available GPUs are:");
print_gpus_list_pci(); print_gpus_list_pci();
#else
printErr("No GPU was detected!");
#endif
printf("Please make sure that the appropriate backend is enabled:\n"); printf("Please make sure that the appropriate backend is enabled:\n");
print_enabled_backends(); print_enabled_backends();
printf("Visit https://github.com/Dr-Noob/gpufetch#2-backends for more information\n"); printf("Visit https://github.com/Dr-Noob/gpufetch#2-backends for more information\n");
@@ -84,17 +96,27 @@ int main(int argc, char* argv[]) {
return EXIT_FAILURE; return EXIT_FAILURE;
} }
struct gpu_info* gpu = get_gpu_info(list, get_gpu_idx()); int first_idx, last_idx;
if(gpu == NULL) if(idx == -1) {
return EXIT_FAILURE; first_idx = 0;
last_idx = get_num_gpus_available(list);
}
else {
first_idx = idx;
last_idx = idx+1;
}
printf("[NOTE]: gpufetch is in beta. The provided information may be incomplete or wrong.\n\ struct gpu_info* gpu = NULL;
If you want to help to improve gpufetch, please compare the output of the program\n\ for(int gpu_idx = first_idx; gpu_idx < last_idx; gpu_idx++) {
with a reliable source which you know is right (e.g, techpowerup.com) and report\n\ gpu = get_gpu_info(list, gpu_idx);
any inconsistencies to https://github.com/Dr-Noob/gpufetch/issues\n"); if(gpu == NULL) {
if(print_gpufetch(gpu, get_style(), get_colors()))
return EXIT_SUCCESS;
else
return EXIT_FAILURE; return EXIT_FAILURE;
} }
if(!print_gpufetch(gpu, get_style(), get_colors())) {
return EXIT_FAILURE;
}
}
return EXIT_SUCCESS;
}

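The new `main` loop prints either one GPU or all of them. It builds the half-open range `[first_idx, last_idx)` from `idx`, where `-1` means "all". The following is a sketch of that range computation. It assumes, from the updated help text and the `idx == -1` check, that `get_gpu_idx()` returns `-1` when the user passes `a`. The helper name `gpu_range` is hypothetical. Out-of-range indexes are still rejected later by `get_gpu_info`.

```cpp
#include <cassert>
#include <utility>

// Half-open range [first, last) of GPU indexes to print.
// Assumption: idx == -1 encodes "print all GPUs".
std::pair<int, int> gpu_range(int idx, int num_gpus) {
  if (idx == -1) return {0, num_gpus};  // every detected GPU
  return {idx, idx + 1};                // only the selected GPU
}
```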
@@ -1,11 +1,16 @@
#include <cstdlib> #include <cstdlib>
#include <cstdio> #include <cstdio>
#ifdef BACKEND_USE_PCI
#include "pci.hpp" #include "pci.hpp"
#endif
#include "global.hpp" #include "global.hpp"
#include "colors.hpp" #include "colors.hpp"
#include "master.hpp" #include "master.hpp"
#include "args.hpp"
#include "../cuda/cuda.hpp" #include "../cuda/cuda.hpp"
#include "../hsa/hsa.hpp"
#include "../intel/intel.hpp" #include "../intel/intel.hpp"
#define MAX_GPUS 1000 #define MAX_GPUS 1000
@@ -17,7 +22,9 @@ struct gpu_list {
struct gpu_list* get_gpu_list() { struct gpu_list* get_gpu_list() {
int idx = 0; int idx = 0;
#ifdef BACKEND_USE_PCI
struct pci_dev *devices = get_pci_devices_from_pciutils(); struct pci_dev *devices = get_pci_devices_from_pciutils();
#endif
struct gpu_list* list = (struct gpu_list*) malloc(sizeof(struct gpu_list)); struct gpu_list* list = (struct gpu_list*) malloc(sizeof(struct gpu_list));
list->num_gpus = 0; list->num_gpus = 0;
list->gpus = (struct gpu_info**) malloc(sizeof(struct info*) * MAX_GPUS); list->gpus = (struct gpu_info**) malloc(sizeof(struct info*) * MAX_GPUS);
@@ -34,6 +41,18 @@ struct gpu_list* get_gpu_list() {
list->num_gpus += idx; list->num_gpus += idx;
#endif #endif
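#ifdef BACKEND_HSA
int hsa_first = idx; // slots already filled by previously enabled backends
bool valid = true;
while(valid) {
list->gpus[idx] = get_gpu_info_hsa(idx);
if(list->gpus[idx] != NULL) idx++;
else valid = false;
}
// Count only the GPUs added by this backend (idx also includes earlier backends)
list->num_gpus += idx - hsa_first;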
#endif
#ifdef BACKEND_INTEL #ifdef BACKEND_INTEL
list->gpus[idx] = get_gpu_info_intel(devices); list->gpus[idx] = get_gpu_info_intel(devices);
if(list->gpus[idx] != NULL) list->num_gpus++; if(list->gpus[idx] != NULL) list->num_gpus++;
@@ -50,6 +69,11 @@ bool print_gpus_list(struct gpu_list* list) {
print_gpu_cuda(list->gpus[i]); print_gpu_cuda(list->gpus[i]);
#endif #endif
} }
else if(list->gpus[i]->vendor == GPU_VENDOR_AMD) {
#ifdef BACKEND_HSA
print_gpu_hsa(list->gpus[i]);
#endif
}
else if(list->gpus[i]->vendor == GPU_VENDOR_INTEL) { else if(list->gpus[i]->vendor == GPU_VENDOR_INTEL) {
#ifdef BACKEND_INTEL #ifdef BACKEND_INTEL
print_gpu_intel(list->gpus[i]); print_gpu_intel(list->gpus[i]);
@@ -68,6 +92,13 @@ void print_enabled_backends() {
printf("%sOFF%s\n", C_FG_RED, C_RESET); printf("%sOFF%s\n", C_FG_RED, C_RESET);
#endif #endif
printf("- HSA backend: ");
#ifdef BACKEND_HSA
printf("%sON%s\n", C_FG_GREEN, C_RESET);
#else
printf("%sOFF%s\n", C_FG_RED, C_RESET);
#endif
printf("- Intel backend: "); printf("- Intel backend: ");
#ifdef BACKEND_INTEL #ifdef BACKEND_INTEL
printf("%sON%s\n", C_FG_GREEN, C_RESET); printf("%sON%s\n", C_FG_GREEN, C_RESET);
@@ -83,6 +114,7 @@ int get_num_gpus_available(struct gpu_list* list) {
struct gpu_info* get_gpu_info(struct gpu_list* list, int idx) { struct gpu_info* get_gpu_info(struct gpu_list* list, int idx) {
if(idx >= list->num_gpus || idx < 0) { if(idx >= list->num_gpus || idx < 0) {
printErr("Specified GPU index is out of range: %d", idx); printErr("Specified GPU index is out of range: %d", idx);
printf("Run gpufetch with the --%s option to check out valid GPU indexes\n", args_str[ARG_LIST]);
return NULL; return NULL;
} }
return list->gpus[idx]; return list->gpus[idx];

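The HSA block of `get_gpu_list` keeps asking the backend for the next GPU until it returns `NULL`, appending each result to the shared list. The following is a generic sketch of that enumerate-until-null pattern. It keeps the backend-local index passed to the query separate from the position in the shared list, so GPUs found by an earlier backend do not shift the numbering or get counted twice. `append_backend_gpus` and the stand-in `gpu_info` are hypothetical, and `std::vector` replaces the fixed `MAX_GPUS` array for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct gpu_info { int id; };  // stand-in for the real struct

// Pull GPUs from one backend until the query returns nullptr (or the list
// is full), appending each one to the shared list. The list size is the
// single source of truth for how many GPUs were found in total.
void append_backend_gpus(std::vector<gpu_info*>& list,
                         const std::function<gpu_info*(int)>& query,
                         std::size_t max_gpus) {
  for (int local = 0; list.size() < max_gpus; local++) {
    gpu_info* gpu = query(local);  // backend-local index, starts at 0
    if (gpu == nullptr) break;
    list.push_back(gpu);
  }
}
```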
@@ -7,9 +7,11 @@
#include <cstdio> #include <cstdio>
#include <cstddef> #include <cstddef>
// https://pci-ids.ucw.cz/read/PD
// TODO: Move AMD PCI id when possible // TODO: Move AMD PCI id when possible
#define PCI_VENDOR_ID_AMD 0x1002 #define PCI_VENDOR_ID_AMD 0x1002
#define CLASS_VGA_CONTROLLER 0x0300 #define CLASS_VGA_CONTROLLER 0x0300
#define CLASS_3D_CONTROLLER 0x0302
void debug_devices(struct pci_dev *devices) { void debug_devices(struct pci_dev *devices) {
int idx = 0; int idx = 0;
@@ -21,12 +23,11 @@ void debug_devices(struct pci_dev *devices) {
bool pciutils_is_vendor_id_present(struct pci_dev *devices, int id) { bool pciutils_is_vendor_id_present(struct pci_dev *devices, int id) {
for(struct pci_dev *dev=devices; dev != NULL; dev=dev->next) { for(struct pci_dev *dev=devices; dev != NULL; dev=dev->next) {
if(dev->vendor_id == id && dev->device_class == CLASS_VGA_CONTROLLER) { if(dev->vendor_id == id && (dev->device_class == CLASS_VGA_CONTROLLER || dev->device_class == CLASS_3D_CONTROLLER)) {
return true; return true;
} }
} }
printWarn("Unable to find a valid device for vendor id 0x%.4X using pciutils", id);
return false; return false;
} }
@@ -34,7 +35,7 @@ uint16_t pciutils_get_pci_device_id(struct pci_dev *devices, int id, int idx) {
int curr = 0; int curr = 0;
for(struct pci_dev *dev=devices; dev != NULL; dev=dev->next) { for(struct pci_dev *dev=devices; dev != NULL; dev=dev->next) {
if(dev->vendor_id == id && dev->device_class == CLASS_VGA_CONTROLLER) { if(dev->vendor_id == id && (dev->device_class == CLASS_VGA_CONTROLLER || dev->device_class == CLASS_3D_CONTROLLER)) {
if(curr == idx) { if(curr == idx) {
return dev->device_id; return dev->device_id;
} }
@@ -50,7 +51,7 @@ void pciutils_set_pci_bus(struct pci* pci, struct pci_dev *devices, int id) {
bool found = false; bool found = false;
for(struct pci_dev *dev=devices; dev != NULL; dev=dev->next) { for(struct pci_dev *dev=devices; dev != NULL; dev=dev->next) {
if(dev->vendor_id == id && dev->device_class == CLASS_VGA_CONTROLLER) { if(dev->vendor_id == id && (dev->device_class == CLASS_VGA_CONTROLLER || dev->device_class == CLASS_3D_CONTROLLER)) {
pci->domain = dev->domain; pci->domain = dev->domain;
pci->bus = dev->bus; pci->bus = dev->bus;
pci->dev = dev->dev; pci->dev = dev->dev;
@@ -99,8 +100,9 @@ void print_gpus_list_pci() {
struct pci_dev *devices = get_pci_devices_from_pciutils(); struct pci_dev *devices = get_pci_devices_from_pciutils();
for(struct pci_dev *dev=devices; dev != NULL; dev=dev->next) { for(struct pci_dev *dev=devices; dev != NULL; dev=dev->next) {
if(dev->device_class == CLASS_VGA_CONTROLLER) { if(dev->device_class == CLASS_VGA_CONTROLLER || dev->device_class == CLASS_3D_CONTROLLER) {
printf("- GPU %d: ", i); printf("- GPU %d:\n", i);
printf(" * Vendor: ");
if(dev->vendor_id == PCI_VENDOR_ID_NVIDIA) { if(dev->vendor_id == PCI_VENDOR_ID_NVIDIA) {
printf("NVIDIA"); printf("NVIDIA");
} }
@@ -110,7 +112,11 @@ void print_gpus_list_pci() {
else if(dev->vendor_id == PCI_VENDOR_ID_AMD) { else if(dev->vendor_id == PCI_VENDOR_ID_AMD) {
printf("AMD"); printf("AMD");
} }
printf("%.4x:%.4x\n", dev->vendor_id, dev->device_id); else {
printf("Unknown");
}
printf("\n * PCI id: %.4x:%.4x\n", dev->vendor_id, dev->device_id);
i++;
} }
} }
} }

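The pciutils helpers now accept both PCI class `0x0300` (VGA compatible controller) and `0x0302` (3D controller). Headless GPUs, for example many NVIDIA datacenter boards, are often listed as "3D controller" rather than as VGA devices. The following is a sketch of that class filter, written over a plain array instead of pciutils' linked list. `is_gpu_class`, `pci_dev_lite`, and `count_gpus_for_vendor` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// PCI class codes used in the diff (class 0x03 = display controller).
constexpr uint16_t CLASS_VGA_CONTROLLER = 0x0300;
constexpr uint16_t CLASS_3D_CONTROLLER  = 0x0302;

// A device is treated as a GPU when it is a VGA or a 3D controller.
bool is_gpu_class(uint16_t device_class) {
  return device_class == CLASS_VGA_CONTROLLER ||
         device_class == CLASS_3D_CONTROLLER;
}

// Simplified stand-in for pciutils' struct pci_dev.
struct pci_dev_lite { uint16_t vendor_id, device_id, device_class; };

// Counts the GPUs of one vendor, using the same filter as the helpers above.
int count_gpus_for_vendor(const pci_dev_lite* devs, std::size_t n, uint16_t vendor) {
  int count = 0;
  for (std::size_t i = 0; i < n; i++)
    if (devs[i].vendor_id == vendor && is_gpu_class(devs[i].device_class)) count++;
  return count;
}
```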
@@ -10,6 +10,8 @@
#include "../intel/uarch.hpp" #include "../intel/uarch.hpp"
#include "../intel/intel.hpp" #include "../intel/intel.hpp"
#include "../hsa/hsa.hpp"
#include "../hsa/uarch.hpp"
#include "../cuda/cuda.hpp" #include "../cuda/cuda.hpp"
#include "../cuda/uarch.hpp" #include "../cuda/uarch.hpp"
@@ -30,64 +32,60 @@
#define MAX_ATTRIBUTES 100 #define MAX_ATTRIBUTES 100
#define MAX_TERM_SIZE 1024 #define MAX_TERM_SIZE 1024
typedef struct {
int id;
const char *name;
const char *shortname;
} AttributeField;
// AttributeField IDs
// Used by
enum { enum {
ATTRIBUTE_NAME, ATTRIBUTE_NAME, // ALL
ATTRIBUTE_CHIP, ATTRIBUTE_CHIP, // ALL
ATTRIBUTE_UARCH, ATTRIBUTE_UARCH, // ALL
ATTRIBUTE_TECHNOLOGY, ATTRIBUTE_TECHNOLOGY, // ALL
ATTRIBUTE_GT, ATTRIBUTE_FREQUENCY, // ALL
ATTRIBUTE_FREQUENCY, ATTRIBUTE_PEAK, // ALL
ATTRIBUTE_STREAMINGMP, ATTRIBUTE_COMPUTE_UNITS, // HSA
ATTRIBUTE_CORESPERMP, ATTRIBUTE_MATRIX_CORES, // HSA
ATTRIBUTE_CUDA_CORES, ATTRIBUTE_XCDS, // HSA
ATTRIBUTE_TENSOR_CORES, ATTRIBUTE_LDS_SIZE, // HSA
ATTRIBUTE_EUS, ATTRIBUTE_STREAMINGMP, // CUDA
ATTRIBUTE_L2, ATTRIBUTE_CORESPERMP, // CUDA
ATTRIBUTE_MEMORY, ATTRIBUTE_CUDA_CORES, // CUDA
ATTRIBUTE_MEMORY_FREQ, ATTRIBUTE_TENSOR_CORES, // CUDA
ATTRIBUTE_BUS_WIDTH, ATTRIBUTE_L2, // CUDA
ATTRIBUTE_PEAK, ATTRIBUTE_MEMORY, // CUDA,HSA
ATTRIBUTE_PEAK_TENSOR, ATTRIBUTE_MEMORY_FREQ, // CUDA
ATTRIBUTE_BUS_WIDTH, // CUDA,HSA
ATTRIBUTE_PEAK_TENSOR, // CUDA
ATTRIBUTE_EUS, // Intel
ATTRIBUTE_GT, // Intel
}; };
static const char* ATTRIBUTE_FIELDS [] = { static const AttributeField ATTRIBUTE_INFO[] = {
"Name:", { ATTRIBUTE_NAME, "Name:", "Name:" },
"GPU processor:", { ATTRIBUTE_CHIP, "GPU processor:", "Processor:" },
"Microarchitecture:", { ATTRIBUTE_UARCH, "Microarchitecture:", "uArch:" },
"Technology:", { ATTRIBUTE_TECHNOLOGY, "Technology:", "Technology:" },
"Graphics Tier:", { ATTRIBUTE_FREQUENCY, "Max Frequency:", "Max Freq.:" },
"Max Frequency:", { ATTRIBUTE_PEAK, "Peak Performance:", "Peak Perf.:" },
"SMs:", { ATTRIBUTE_COMPUTE_UNITS, "Compute Units (CUs):", "CUs:" },
"Cores/SM:", { ATTRIBUTE_MATRIX_CORES, "Matrix Cores:", "Matrix Cores:" },
"CUDA Cores:", { ATTRIBUTE_XCDS, "XCDs:", "XCDs:" },
"Tensor Cores:", { ATTRIBUTE_LDS_SIZE, "LDS size:", "LDS:" },
"Execution Units:", { ATTRIBUTE_STREAMINGMP, "SMs:", "SMs:" },
"L2 Size:", { ATTRIBUTE_CORESPERMP, "Cores/SM:", "Cores/SM:" },
"Memory:", { ATTRIBUTE_CUDA_CORES, "CUDA Cores:", "CUDA Cores:" },
"Memory frequency:", { ATTRIBUTE_TENSOR_CORES, "Tensor Cores:", "Tensor Cores:" },
"Bus width:", { ATTRIBUTE_L2, "L2 Size:", "L2 Size:" },
"Peak Performance:", { ATTRIBUTE_MEMORY, "Memory:", "Memory:" },
"Peak Performance (MMA):", { ATTRIBUTE_MEMORY_FREQ, "Memory frequency:", "Memory freq.:" },
}; { ATTRIBUTE_BUS_WIDTH, "Bus width:", "Bus width:" },
{ ATTRIBUTE_PEAK_TENSOR, "Peak Performance (MMA):", "Peak Perf.(MMA):" },
static const char* ATTRIBUTE_FIELDS_SHORT [] = { { ATTRIBUTE_EUS, "Execution Units:", "EUs:" },
"Name:", { ATTRIBUTE_GT, "Graphics Tier:", "GT:" },
"Processor:",
"uArch:",
"Technology:",
"GT:",
"Max Freq.:",
"SMs:",
"Cores/SM:",
"CUDA Cores:",
"Tensor Cores:",
"EUs:",
"L2 Size:",
"Memory:",
"Memory freq.:",
"Bus width:",
"Peak Perf.:",
"Peak Perf.(MMA):",
}; };
struct terminal { struct terminal {
@@ -205,8 +203,6 @@ bool ascii_fits_screen(int termw, struct ascii_logo logo, int lf) {
void replace_bgbyfg_color(struct ascii_logo* logo) { void replace_bgbyfg_color(struct ascii_logo* logo) {
// Replace background by foreground color // Replace background by foreground color
for(int i=0; i < 2; i++) { for(int i=0; i < 2; i++) {
if(logo->color_ascii[i] == NULL) break;
if(strcmp(logo->color_ascii[i], C_BG_BLACK) == 0) strcpy(logo->color_ascii[i], C_FG_BLACK); if(strcmp(logo->color_ascii[i], C_BG_BLACK) == 0) strcpy(logo->color_ascii[i], C_FG_BLACK);
else if(strcmp(logo->color_ascii[i], C_BG_RED) == 0) strcpy(logo->color_ascii[i], C_FG_RED); else if(strcmp(logo->color_ascii[i], C_BG_RED) == 0) strcpy(logo->color_ascii[i], C_FG_RED);
else if(strcmp(logo->color_ascii[i], C_BG_GREEN) == 0) strcpy(logo->color_ascii[i], C_FG_GREEN); else if(strcmp(logo->color_ascii[i], C_BG_GREEN) == 0) strcpy(logo->color_ascii[i], C_FG_GREEN);
@@ -219,6 +215,8 @@ void replace_bgbyfg_color(struct ascii_logo* logo) {
} }
struct ascii_logo* choose_ascii_art_aux(struct ascii_logo* logo_long, struct ascii_logo* logo_short, struct terminal* term, int lf) { struct ascii_logo* choose_ascii_art_aux(struct ascii_logo* logo_long, struct ascii_logo* logo_short, struct terminal* term, int lf) {
if(show_logo_long()) return logo_long;
if(show_logo_short()) return logo_short;
if(ascii_fits_screen(term->w, *logo_long, lf)) { if(ascii_fits_screen(term->w, *logo_long, lf)) {
return logo_long; return logo_long;
} }
@@ -231,6 +229,9 @@ void choose_ascii_art(struct ascii* art, struct color** cs, struct terminal* ter
if(art->vendor == GPU_VENDOR_NVIDIA) { if(art->vendor == GPU_VENDOR_NVIDIA) {
art->art = choose_ascii_art_aux(&logo_nvidia_l, &logo_nvidia, term, lf); art->art = choose_ascii_art_aux(&logo_nvidia_l, &logo_nvidia, term, lf);
} }
else if(art->vendor == GPU_VENDOR_AMD) {
art->art = choose_ascii_art_aux(&logo_amd_l, &logo_amd, term, lf);
}
else if(art->vendor == GPU_VENDOR_INTEL) { else if(art->vendor == GPU_VENDOR_INTEL) {
art->art = choose_ascii_art_aux(&logo_intel_l, &logo_intel, term, lf); art->art = choose_ascii_art_aux(&logo_intel_l, &logo_intel, term, lf);
} }
@@ -269,13 +270,14 @@ void choose_ascii_art(struct ascii* art, struct color** cs, struct terminal* ter
} }
} }
uint32_t longest_attribute_length(struct ascii* art, const char** attribute_fields) { uint32_t longest_attribute_length(struct ascii* art, bool use_short) {
uint32_t max = 0; uint32_t max = 0;
uint64_t len = 0; uint64_t len = 0;
for(uint32_t i=0; i < art->n_attributes_set; i++) { for(uint32_t i=0; i < art->n_attributes_set; i++) {
if(art->attributes[i]->value != NULL) { if(art->attributes[i]->value != NULL) {
len = strlen(attribute_fields[art->attributes[i]->type]); const char* str = use_short ? ATTRIBUTE_INFO[art->attributes[i]->type].shortname : ATTRIBUTE_INFO[art->attributes[i]->type].name;
len = strlen(str);
if(len > max) max = len; if(len > max) max = len;
} }
} }
@@ -299,7 +301,7 @@ uint32_t longest_field_length(struct ascii* art, int la) {
return max; return max;
} }
void print_ascii_generic(struct ascii* art, uint32_t la, int32_t text_space, const char** attribute_fields) { void print_ascii_generic(struct ascii* art, uint32_t la, int32_t text_space, bool use_short) {
struct ascii_logo* logo = art->art; struct ascii_logo* logo = art->art;
int attr_to_print = 0; int attr_to_print = 0;
int attr_type; int attr_type;
@@ -343,11 +345,13 @@ void print_ascii_generic(struct ascii* art, uint32_t la, int32_t text_space, con
attr_value = art->attributes[attr_to_print]->value; attr_value = art->attributes[attr_to_print]->value;
attr_to_print++; attr_to_print++;
space_right = 1 + (la - strlen(attribute_fields[attr_type])); const char* attr_str = use_short ? ATTRIBUTE_INFO[attr_type].shortname : ATTRIBUTE_INFO[attr_type].name;
space_right = 1 + (la - strlen(attr_str));
current_space = max(0, text_space); current_space = max(0, text_space);
printf("%s%.*s%s", logo->color_text[0], current_space, attribute_fields[attr_type], art->reset); printf("%s%.*s%s", logo->color_text[0], current_space, attr_str, art->reset);
current_space = max(0, current_space - (int) strlen(attribute_fields[attr_type])); current_space = max(0, current_space - (int) strlen(attr_str));
printf("%*s", min(current_space, space_right), ""); printf("%*s", min(current_space, space_right), "");
current_space = max(0, current_space - min(current_space, space_right)); current_space = max(0, current_space - min(current_space, space_right));
printf("%s%.*s%s", logo->color_text[1], current_space, attr_value, art->reset); printf("%s%.*s%s", logo->color_text[1], current_space, attr_value, art->reset);
@@ -381,19 +385,19 @@ bool print_gpufetch_intel(struct gpu_info* gpu, STYLE s, struct color** cs, stru
setAttribute(art, ATTRIBUTE_EUS, eus); setAttribute(art, ATTRIBUTE_EUS, eus);
setAttribute(art, ATTRIBUTE_PEAK, pp); setAttribute(art, ATTRIBUTE_PEAK, pp);
const char** attribute_fields = ATTRIBUTE_FIELDS; bool use_short = false;
uint32_t longest_attribute = longest_attribute_length(art, attribute_fields); uint32_t longest_attribute = longest_attribute_length(art, use_short);
uint32_t longest_field = longest_field_length(art, longest_attribute); uint32_t longest_field = longest_field_length(art, longest_attribute);
choose_ascii_art(art, cs, term, longest_field); choose_ascii_art(art, cs, term, longest_field);
if(!ascii_fits_screen(term->w, *art->art, longest_field)) { if(!ascii_fits_screen(term->w, *art->art, longest_field)) {
// Despite choosing the smallest logo, the output does not fit // Despite choosing the smallest logo, the output does not fit
// Choose the shorter field names and recalculate the longest attr // Choose the shorter field names and recalculate the longest attr
attribute_fields = ATTRIBUTE_FIELDS_SHORT; use_short = true;
longest_attribute = longest_attribute_length(art, attribute_fields); longest_attribute = longest_attribute_length(art, use_short);
} }
print_ascii_generic(art, longest_attribute, term->w - art->art->width, attribute_fields); print_ascii_generic(art, longest_attribute, term->w - art->art->width, use_short);
return true; return true;
} }
@@ -450,19 +454,19 @@ bool print_gpufetch_cuda(struct gpu_info* gpu, STYLE s, struct color** cs, struc
setAttribute(art, ATTRIBUTE_PEAK_TENSOR, pp_tensor); setAttribute(art, ATTRIBUTE_PEAK_TENSOR, pp_tensor);
} }
const char** attribute_fields = ATTRIBUTE_FIELDS; bool use_short = false;
uint32_t longest_attribute = longest_attribute_length(art, attribute_fields); uint32_t longest_attribute = longest_attribute_length(art, use_short);
uint32_t longest_field = longest_field_length(art, longest_attribute); uint32_t longest_field = longest_field_length(art, longest_attribute);
choose_ascii_art(art, cs, term, longest_field); choose_ascii_art(art, cs, term, longest_field);
if(!ascii_fits_screen(term->w, *art->art, longest_field)) { if(!ascii_fits_screen(term->w, *art->art, longest_field)) {
// Despite choosing the smallest logo, the output does not fit // Despite choosing the smallest logo, the output does not fit
// Choose the shorter field names and recalculate the longest attr // Choose the shorter field names and recalculate the longest attr
attribute_fields = ATTRIBUTE_FIELDS_SHORT; use_short = true;
longest_attribute = longest_attribute_length(art, attribute_fields); longest_attribute = longest_attribute_length(art, use_short);
} }
print_ascii_generic(art, longest_attribute, term->w - art->art->width, attribute_fields); print_ascii_generic(art, longest_attribute, term->w - art->art->width, use_short);
free(manufacturing_process); free(manufacturing_process);
free(max_frequency); free(max_frequency);
@@ -476,6 +480,62 @@ bool print_gpufetch_cuda(struct gpu_info* gpu, STYLE s, struct color** cs, struc
} }
#endif #endif
#ifdef BACKEND_HSA
bool print_gpufetch_amd(struct gpu_info* gpu, STYLE s, struct color** cs, struct terminal* term) {
struct ascii* art = set_ascii(get_gpu_vendor(gpu), s);
if(art == NULL)
return false;
char* gpu_name = get_str_gpu_name(gpu);
char* gpu_chip = get_str_chip(gpu->arch);
char* uarch = get_str_uarch_hsa(gpu->arch);
char* manufacturing_process = get_str_process(gpu->arch);
char* cus = get_str_cu(gpu);
char* matrix_cores = get_str_matrix_cores(gpu);
char* xcds = get_str_xcds(gpu);
char* max_frequency = get_str_freq(gpu);
char* bus_width = get_str_bus_width(gpu);
char* mem_size = get_str_memory_size(gpu);
char* lds_size = get_str_lds_size(gpu);
setAttribute(art, ATTRIBUTE_NAME, gpu_name);
if (gpu_chip != NULL) {
setAttribute(art, ATTRIBUTE_CHIP, gpu_chip);
}
setAttribute(art, ATTRIBUTE_UARCH, uarch);
setAttribute(art, ATTRIBUTE_TECHNOLOGY, manufacturing_process);
setAttribute(art, ATTRIBUTE_FREQUENCY, max_frequency);
setAttribute(art, ATTRIBUTE_COMPUTE_UNITS, cus);
setAttribute(art, ATTRIBUTE_MATRIX_CORES, matrix_cores);
if (xcds != NULL) {
setAttribute(art, ATTRIBUTE_XCDS, xcds);
}
setAttribute(art, ATTRIBUTE_LDS_SIZE, lds_size);
setAttribute(art, ATTRIBUTE_MEMORY, mem_size);
setAttribute(art, ATTRIBUTE_BUS_WIDTH, bus_width);
bool use_short = false;
uint32_t longest_attribute = longest_attribute_length(art, use_short);
uint32_t longest_field = longest_field_length(art, longest_attribute);
choose_ascii_art(art, cs, term, longest_field);
if(!ascii_fits_screen(term->w, *art->art, longest_field)) {
// Despite choosing the smallest logo, the output does not fit
// Choose the shorter field names and recalculate the longest attr
use_short = true;
longest_attribute = longest_attribute_length(art, use_short);
}
print_ascii_generic(art, longest_attribute, term->w - art->art->width, use_short);
free(art->attributes);
free(art);
return true;
}
#endif
struct terminal* get_terminal_size() { struct terminal* get_terminal_size() {
struct terminal* term = (struct terminal*) emalloc(sizeof(struct terminal)); struct terminal* term = (struct terminal*) emalloc(sizeof(struct terminal));
@@ -515,11 +575,22 @@ bool print_gpufetch(struct gpu_info* gpu, STYLE s, struct color** cs) {
return false; return false;
#endif #endif
} }
else { else if(gpu->vendor == GPU_VENDOR_AMD) {
#ifdef BACKEND_HSA
return print_gpufetch_amd(gpu, s, cs, term);
#else
return false;
#endif
}
else if(gpu->vendor == GPU_VENDOR_INTEL) {
#ifdef BACKEND_INTEL #ifdef BACKEND_INTEL
return print_gpufetch_intel(gpu, s, cs, term); return print_gpufetch_intel(gpu, s, cs, term);
#else #else
return false; return false;
#endif #endif
} }
else {
printErr("Invalid GPU vendor: %d", gpu->vendor);
return false;
}
} }

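The printer replaces the two parallel string arrays with one `ATTRIBUTE_INFO` table of `{id, name, shortname}` rows. That table is still indexed directly by the enum value, so row `i` must describe attribute `i`. The following is a reduced sketch of the pattern that shows how the `id` column makes the ordering invariant checkable. It also shows the long/short label selection driven by `use_short`. The `ATTRIBUTE_COUNT` sentinel and the helper functions are hypothetical additions that are not in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Reduced enum; ATTRIBUTE_COUNT is a hypothetical sentinel.
enum { ATTRIBUTE_NAME, ATTRIBUTE_CHIP, ATTRIBUTE_UARCH, ATTRIBUTE_COUNT };

struct AttributeField {
  int id;
  const char* name;
  const char* shortname;
};

static const AttributeField ATTRIBUTE_INFO[] = {
  { ATTRIBUTE_NAME,  "Name:",              "Name:" },
  { ATTRIBUTE_CHIP,  "GPU processor:",     "Processor:" },
  { ATTRIBUTE_UARCH, "Microarchitecture:", "uArch:" },
};

// Catches a missing or extra row at compile time.
static_assert(sizeof(ATTRIBUTE_INFO) / sizeof(ATTRIBUTE_INFO[0]) ==
                  static_cast<std::size_t>(ATTRIBUTE_COUNT),
              "ATTRIBUTE_INFO needs exactly one row per attribute");

// Catches rows that exist but are out of order.
bool attribute_table_is_ordered() {
  for (int i = 0; i < ATTRIBUTE_COUNT; i++)
    if (ATTRIBUTE_INFO[i].id != i) return false;
  return true;
}

// Same choice the printer makes with its use_short flag.
const char* attribute_label(int type, bool use_short) {
  return use_short ? ATTRIBUTE_INFO[type].shortname : ATTRIBUTE_INFO[type].name;
}
```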
@@ -16,6 +16,9 @@ struct uarch {
int32_t cc_minor; int32_t cc_minor;
int32_t compute_capability; int32_t compute_capability;
// HSA specific
int32_t llvm_target;
// Intel specific // Intel specific
int32_t gt; int32_t gt;
int32_t eu; int32_t eu;

@@ -5,6 +5,10 @@ typedef uint32_t GPUCHIP;
enum { enum {
CHIP_UNKNOWN_CUDA, CHIP_UNKNOWN_CUDA,
CHIP_AD102,
CHIP_AD102GL,
CHIP_AD104,
CHIP_AD104GL,
CHIP_G80, CHIP_G80,
CHIP_G80GL, CHIP_G80GL,
CHIP_G84, CHIP_G84,
@@ -37,6 +41,9 @@ enum {
CHIP_GA100GL, CHIP_GA100GL,
CHIP_GA102, CHIP_GA102,
CHIP_GA102GL, CHIP_GA102GL,
CHIP_GA103,
CHIP_GA103GLM,
CHIP_GA103M,
CHIP_GA104, CHIP_GA104,
CHIP_GA104GL, CHIP_GA104GL,
CHIP_GA104GLM, CHIP_GA104GLM,
@@ -45,6 +52,7 @@ enum {
CHIP_GA106M, CHIP_GA106M,
CHIP_GA107, CHIP_GA107,
CHIP_GA107BM, CHIP_GA107BM,
CHIP_GA107GL,
CHIP_GA107GLM, CHIP_GA107GLM,
CHIP_GA107M, CHIP_GA107M,
CHIP_GF100, CHIP_GF100,
@@ -71,6 +79,7 @@ enum {
CHIP_GF117M, CHIP_GF117M,
CHIP_GF119, CHIP_GF119,
CHIP_GF119M, CHIP_GF119M,
CHIP_GH100,
CHIP_GK104, CHIP_GK104,
CHIP_GK104GL, CHIP_GK104GL,
CHIP_GK104GLM, CHIP_GK104GLM,
@@ -166,7 +175,7 @@ enum {
CHIP_TU117BM, CHIP_TU117BM,
CHIP_TU117GL, CHIP_TU117GL,
CHIP_TU117GLM, CHIP_TU117GLM,
CHIP_TU117M, CHIP_TU117M
}; };
#endif #endif

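In the `cuda.cpp` changes that follow, the patched code reads the memory clock with `cudaDeviceGetAttribute(..., cudaDevAttrMemoryClockRate, ...)`, which the CUDA Runtime documents as a value in kHz. `guess_clock_multipilier` then picks the multiplier whose scaled memory clock lands closest to the core clock. The following is a simplified sketch of both steps. Unlike the real code, it assumes every multiplier is possible (the original filters with `clkm_possible_for_uarch`). The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>

// cudaDevAttrMemoryClockRate is reported in kHz; convert to MHz.
int mem_clock_mhz_from_khz(int khz) { return khz / 1000; }

// Pick m in {8, 4, 2, 1} minimizing |mem_freq / m - gpu_freq|.
// Candidates are tried in the same order and with the same strict
// comparison as the original, so ties keep the larger multiplier.
int guess_clock_multiplier(int mem_freq_mhz, int gpu_freq_mhz) {
  const int candidates[] = {8, 4, 2, 1};
  int best_mul = 1;
  int best_diff = mem_freq_mhz;  // same starting bound as the original
  for (int m : candidates) {
    int diff = std::abs(mem_freq_mhz / m - gpu_freq_mhz);
    if (diff < best_diff) { best_mul = m; best_diff = diff; }
  }
  return best_mul;
}
```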
@@ -1,9 +1,15 @@
#include <helper_cuda.h>
// patched cuda.cpp for cuda13 by cloudy
#include <cuda_runtime.h> #include <cuda_runtime.h>
#include <cstring>
#include <cstdlib>
#include <cstdio>
#include "cuda.hpp" #include "cuda.hpp"
#include "uarch.hpp" #include "uarch.hpp"
#include "../common/pci.hpp" #include "pci.hpp"
#include "gpufetch_helper_cuda.hpp"
#include "../common/global.hpp" #include "../common/global.hpp"
#include "../common/uarch.hpp" #include "../common/uarch.hpp"
@@ -11,29 +17,22 @@ bool print_gpu_cuda(struct gpu_info* gpu) {
char* cc = get_str_cc(gpu->arch); char* cc = get_str_cc(gpu->arch);
printf("%s (Compute Capability %s)\n", gpu->name, cc); printf("%s (Compute Capability %s)\n", gpu->name, cc);
free(cc); free(cc);
return true; return true;
} }
struct cache* get_cache_info(cudaDeviceProp prop) { struct cache* get_cache_info(cudaDeviceProp prop) {
struct cache* cach = (struct cache*) emalloc(sizeof(struct cache)); struct cache* cach = (struct cache*) emalloc(sizeof(struct cache));
cach->L2 = (struct cach*) emalloc(sizeof(struct cach)); cach->L2 = (struct cach*) emalloc(sizeof(struct cach));
cach->L2->size = prop.l2CacheSize; cach->L2->size = prop.l2CacheSize;
cach->L2->num_caches = 1; cach->L2->num_caches = 1;
cach->L2->exists = true; cach->L2->exists = true;
return cach; return cach;
} }
int get_tensor_cores(struct uarch* arch, int sm, int major) { int get_tensor_cores(struct uarch* arch, int sm, int major) {
if(major == 7) { if(major == 7) {
// TU116 does not have tensor cores! if (is_chip_TU116(arch))
// https://www.anandtech.com/show/13973/nvidia-gtx-1660-ti-review-feat-evga-xc-gaming/2
if(arch->chip == CHIP_TU116 || arch->chip == CHIP_TU116BM ||
arch->chip == CHIP_TU116GL || arch->chip == CHIP_TU116M) {
return 0; return 0;
}
return sm * 8; return sm * 8;
} }
else if(major == 8) return sm * 4; else if(major == 8) return sm * 4;
@@ -42,57 +41,57 @@ int get_tensor_cores(struct uarch* arch, int sm, int major) {
 struct topology_c* get_topology_info(struct uarch* arch, cudaDeviceProp prop) {
 struct topology_c* topo = (struct topology_c*) emalloc(sizeof(struct topology_c));
 topo->streaming_mp = prop.multiProcessorCount;
 topo->cores_per_mp = _ConvertSMVer2Cores(prop.major, prop.minor);
 topo->cuda_cores = topo->streaming_mp * topo->cores_per_mp;
 topo->tensor_cores = get_tensor_cores(arch, topo->streaming_mp, prop.major);
 return topo;
 }
 int32_t guess_clock_multipilier(struct gpu_info* gpu, struct memory* mem) {
-// Guess clock multiplier
 int32_t clk_mul = 1;
 int32_t clk8 = abs((mem->freq/8) - gpu->freq);
 int32_t clk4 = abs((mem->freq/4) - gpu->freq);
 int32_t clk2 = abs((mem->freq/2) - gpu->freq);
 int32_t clk1 = abs((mem->freq/1) - gpu->freq);
 int32_t min = mem->freq;
 if(clkm_possible_for_uarch(8, gpu->arch) && min > clk8) { clk_mul = 8; min = clk8; }
 if(clkm_possible_for_uarch(4, gpu->arch) && min > clk4) { clk_mul = 4; min = clk4; }
 if(clkm_possible_for_uarch(2, gpu->arch) && min > clk2) { clk_mul = 2; min = clk2; }
 if(clkm_possible_for_uarch(1, gpu->arch) && min > clk1) { clk_mul = 1; min = clk1; }
 return clk_mul;
 }
 struct memory* get_memory_info(struct gpu_info* gpu, cudaDeviceProp prop) {
 struct memory* mem = (struct memory*) emalloc(sizeof(struct memory));
+int val = 0;
 mem->size_bytes = (unsigned long long) prop.totalGlobalMem;
-mem->freq = prop.memoryClockRate * 0.001f;
+if (cudaDeviceGetAttribute(&val, cudaDevAttrMemoryClockRate, gpu->idx) == cudaSuccess) {
+if (val > 1000000)
+mem->freq = (float)val / 1000000.0f;
+else
+mem->freq = (float)val * 0.001f;
+} else {
+mem->freq = 0.0f;
+}
 mem->bus_width = prop.memoryBusWidth;
 mem->clk_mul = guess_clock_multipilier(gpu, mem);
 mem->type = guess_memtype_from_cmul_and_uarch(mem->clk_mul, gpu->arch);
-// Fix frequency returned from CUDA to show real frequency
+if (mem->clk_mul > 0)
 mem->freq = mem->freq / mem->clk_mul;
 return mem;
 }
-// Compute peak performance when using CUDA cores
 int64_t get_peak_performance_cuda(struct gpu_info* gpu) {
 return gpu->freq * 1000000 * gpu->topo_c->cuda_cores * 2;
 }
-// Compute peak performance when using tensor cores
 int64_t get_peak_performance_tcu(cudaDeviceProp prop, struct gpu_info* gpu) {
-// Volta / Turing tensor cores performs 4x4x4 FP16 matrix multiplication
-// Ampere tensor cores performs 8x4x8 FP16 matrix multiplicacion
 if(prop.major == 7) return gpu->freq * 1000000 * 4 * 4 * 4 * 2 * gpu->topo_c->tensor_cores;
 else if(prop.major == 8) return gpu->freq * 1000000 * 8 * 4 * 8 * 2 * gpu->topo_c->tensor_cores;
 else return 0;
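`guess_clock_multipilier()` infers the memory's data-rate multiplier by dividing the reported memory clock by each candidate multiplier and keeping the one whose result lands closest to the core clock, skipping multipliers the microarchitecture does not allow. A minimal sketch of that heuristic with integer MHz values; `allowed` is a hypothetical stand-in for `clkm_possible_for_uarch()`:

```cpp
#include <cstdint>
#include <cstdlib>
#include <functional>
#include <initializer_list>

// Sketch of the clock-multiplier guess: try k = 8, 4, 2, 1 in that order and
// keep the allowed k for which |mem_freq / k - core_freq| is smallest.
// Frequencies are integer MHz. `allowed` replaces clkm_possible_for_uarch().
int32_t guess_multiplier_sketch(int32_t mem_freq_mhz, int32_t core_freq_mhz,
                                const std::function<bool(int)>& allowed) {
  int32_t best_k = 1;
  int32_t best_diff = mem_freq_mhz;  // same starting bound as the original
  for (int k : {8, 4, 2, 1}) {
    int32_t diff = std::abs(mem_freq_mhz / k - core_freq_mhz);
    // Strict '<' keeps the earlier (larger) k on ties, like 'min > clkN'.
    if (allowed(k) && diff < best_diff) {
      best_k = k;
      best_diff = diff;
    }
  }
  return best_k;
}
```

With the Ampere set {1, 4, 8}, a 9751 MHz memory clock and a 1695 MHz core clock select 8, which `guess_memtype_from_cmul_and_uarch()` maps to GDDR6X for Ampere.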
@@ -114,24 +113,24 @@ struct gpu_info* get_gpu_info_cuda(struct pci_dev *devices, int gpu_idx) {
 }
 int num_gpus = -1;
-cudaError_t err = cudaSuccess;
-if ((err = cudaGetDeviceCount(&num_gpus)) != cudaSuccess) {
-printErr("%s: %s", cudaGetErrorName(err), cudaGetErrorString(err));
-return NULL;
-}
+cudaError_t err = cudaGetDeviceCount(&num_gpus);
 if(gpu_idx == 0) {
-printf("\r");
+printf("\r%*c\r", (int) strlen(CUDA_DRIVER_START_WARNING), ' ');
 fflush(stdout);
 }
+if(err != cudaSuccess) {
+printErr("%s: %s", cudaGetErrorName(err), cudaGetErrorString(err));
+return NULL;
+}
 if(num_gpus <= 0) {
 printErr("No CUDA capable devices found!");
 return NULL;
 }
 if(gpu->idx+1 > num_gpus) {
-// Master is trying to query an invalid GPU
 return NULL;
 }
@@ -141,12 +140,25 @@ struct gpu_info* get_gpu_info_cuda(struct pci_dev *devices, int gpu_idx) {
 return NULL;
 }
-gpu->freq = deviceProp.clockRate * 1e-3f;
+int core_clk = 0;
+if (cudaDeviceGetAttribute(&core_clk, cudaDevAttrClockRate, gpu->idx) == cudaSuccess) {
+if (core_clk > 1000000)
+gpu->freq = core_clk / 1000000.0f;
+else
+gpu->freq = core_clk * 0.001f;
+} else {
+gpu->freq = 0.0f;
+}
 gpu->vendor = GPU_VENDOR_NVIDIA;
-gpu->name = (char *) emalloc(sizeof(char) * (strlen(deviceProp.name) + 1));
+gpu->name = (char *) emalloc(strlen(deviceProp.name) + 1);
 strcpy(gpu->name, deviceProp.name);
-gpu->pci = get_pci_from_pciutils(devices, PCI_VENDOR_ID_NVIDIA, gpu_idx);
+if((gpu->pci = get_pci_from_pciutils(devices, PCI_VENDOR_ID_NVIDIA, gpu_idx)) == NULL) {
+printErr("Unable to find a valid device for vendor id 0x%.4X using pciutils", PCI_VENDOR_ID_NVIDIA);
+return NULL;
+}
 gpu->arch = get_uarch_from_cuda(gpu);
 gpu->cach = get_cache_info(deviceProp);
 gpu->mem = get_memory_info(gpu, deviceProp);
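The core-clock query above, like the memory-clock query in `get_memory_info()`, goes through `cudaDeviceGetAttribute`. The CUDA Runtime API documents `cudaDevAttrClockRate` and `cudaDevAttrMemoryClockRate` in kilohertz; read that way, a raw value above 1,000,000 just means a clock above 1 GHz, and one division by 1000 gives MHz for every device. A minimal sketch of that conversion, written without CUDA headers so it compiles on its own (`khz_to_mhz` is a hypothetical helper, not part of gpufetch):

```cpp
// Hypothetical helper: converts a clock value reported in kilohertz (the
// documented unit of cudaDevAttrClockRate and cudaDevAttrMemoryClockRate)
// to megahertz. `ok` says whether the query succeeded; a failed query
// yields 0, matching the fallback in the code above.
//
// Typical call site (requires <cuda_runtime.h>):
//   int khz = 0;
//   bool ok = cudaDeviceGetAttribute(&khz, cudaDevAttrClockRate, dev) == cudaSuccess;
//   float mhz = khz_to_mhz(khz, ok);
float khz_to_mhz(int value_khz, bool ok) {
  return ok ? static_cast<float>(value_khz) / 1000.0f : 0.0f;
}
```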
@@ -157,19 +169,7 @@ struct gpu_info* get_gpu_info_cuda(struct pci_dev *devices, int gpu_idx) {
 return gpu;
 }
-char* get_str_sm(struct gpu_info* gpu) {
-return get_str_generic(gpu->topo_c->streaming_mp);
-}
-char* get_str_cores_sm(struct gpu_info* gpu) {
-return get_str_generic(gpu->topo_c->cores_per_mp);
-}
-char* get_str_cuda_cores(struct gpu_info* gpu) {
-return get_str_generic(gpu->topo_c->cuda_cores);
-}
-char* get_str_tensor_cores(struct gpu_info* gpu) {
-return get_str_generic(gpu->topo_c->tensor_cores);
-}
+char* get_str_sm(struct gpu_info* gpu) { return get_str_generic(gpu->topo_c->streaming_mp); }
+char* get_str_cores_sm(struct gpu_info* gpu) { return get_str_generic(gpu->topo_c->cores_per_mp); }
+char* get_str_cuda_cores(struct gpu_info* gpu) { return get_str_generic(gpu->topo_c->cuda_cores); }
+char* get_str_tensor_cores(struct gpu_info* gpu) { return get_str_generic(gpu->topo_c->tensor_cores); }
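The two `get_peak_performance_*` functions in this file multiply the core clock (in MHz, hence the factor 1000000) by the number of units and by 2 FLOPs per fused multiply-add; the tensor-core variant also multiplies by the MxNxK size of one matrix operation per clock (4x4x4 for 7.x, 8x4x8 for 8.x). A self-contained sketch of the same arithmetic, using plain integers in place of `gpu_info` and `cudaDeviceProp`:

```cpp
#include <cstdint>

// Peak FP throughput from CUDA cores: clock (MHz -> Hz) * cores * 2 FLOPs (FMA).
int64_t peak_cuda_flops(int64_t freq_mhz, int64_t cuda_cores) {
  return freq_mhz * 1000000 * cuda_cores * 2;
}

// Peak FP16 throughput from tensor cores: each tensor core is modelled as one
// MxNxK multiply-accumulate per clock (4x4x4 for major 7, 8x4x8 for major 8),
// with 2 FLOPs per multiply-add. Other majors return 0, as in the original.
int64_t peak_tensor_flops(int64_t freq_mhz, int major, int64_t tensor_cores) {
  int64_t mnk = 0;
  if (major == 7) mnk = 4 * 4 * 4;
  else if (major == 8) mnk = 8 * 4 * 8;
  return freq_mhz * 1000000 * mnk * 2 * tensor_cores;
}
```

For example, 1530 MHz with 5120 CUDA cores and 640 tensor cores (V100-like values) gives about 15.7 and 125 TFLOPS, and 1410 MHz with 432 tensor cores at compute capability 8.x (A100-like values) about 312 TFLOPS.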

@@ -0,0 +1,63 @@
#ifndef __GPUFETCH_HELPER_CUDA__
#define __GPUFETCH_HELPER_CUDA__
// gpufetch self contained helper_cuda.h
//
// Avoids relying on helper_cuda.h, which is
// often very hard to include properly, causing
// compilation issues.
//
// URL: https://github.com/NVIDIA/cuda-samples
// Commit: 8199209
inline int _ConvertSMVer2Cores(int major, int minor) {
// Defines for GPU architecture types (using the SM version to determine
// the number of cores per SM)
typedef struct {
int SM; // 0xMm (hexadecimal notation), M = SM Major version,
// and m = SM minor version
int Cores;
} sSMtoCores;
sSMtoCores nGpuArchCoresPerSM[] = {
{0x30, 192},
{0x32, 192},
{0x35, 192},
{0x37, 192},
{0x50, 128},
{0x52, 128},
{0x53, 128},
{0x60, 64},
{0x61, 128},
{0x62, 128},
{0x70, 64},
{0x72, 64},
{0x75, 64},
{0x80, 64},
{0x86, 128},
{0x87, 128},
// I added this one because it was missing in original cuda-samples...
{0x89, 128},
{0x90, 128},
{-1, -1}};
int index = 0;
while (nGpuArchCoresPerSM[index].SM != -1) {
if (nGpuArchCoresPerSM[index].SM == ((major << 4) + minor)) {
return nGpuArchCoresPerSM[index].Cores;
}
index++;
}
// If the SM version is not in the table, fall back to the last
// known entry so the program can still run.
printf(
"MapSMtoCores for SM %d.%d is undefined."
" Default to use %d Cores/SM\n",
major, minor, nGpuArchCoresPerSM[index - 1].Cores);
return nGpuArchCoresPerSM[index - 1].Cores;
}
#endif
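The table in `_ConvertSMVer2Cores()` keys each entry by the compute capability packed as `0xMm`, that is `(major << 4) + minor`, so 8.9 becomes `0x89`. A minimal sketch of the lookup using a subset of the same table; unlike the original, it returns -1 for an unknown version instead of falling back to the last entry, a deliberate variation for illustration:

```cpp
#include <utility>
#include <vector>

// Sketch of the SM-version lookup: pack major.minor as (major << 4) + minor
// and search a (packed version, cores per SM) table. The entries are a subset
// of nGpuArchCoresPerSM; unknown versions return -1 here.
int cores_per_sm_sketch(int major, int minor) {
  static const std::vector<std::pair<int, int>> table = {
      {0x70, 64}, {0x75, 64}, {0x80, 64}, {0x86, 128}, {0x89, 128}, {0x90, 128}};
  const int key = (major << 4) + minor;
  for (const auto& entry : table) {
    if (entry.first == key) return entry.second;
  }
  return -1;
}
```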

@@ -8,7 +8,7 @@
 #define CHECK_PCI_START if (false) {}
 #define CHECK_PCI(pci, id, chip) \
 else if (pci->device_id == id) return chip;
-#define CHECK_PCI_END else { printBug("Unkown CUDA device id: 0x%.4X", pci->device_id); return CHIP_UNKNOWN_CUDA; }
+#define CHECK_PCI_END else { printBug("Unknown CUDA device id: 0x%.4X", pci->device_id); return CHIP_UNKNOWN_CUDA; }
 /*
 * pci ids were retrieved using https://github.com/pciutils/pciids
@@ -21,61 +21,110 @@
 GPUCHIP get_chip_from_pci_cuda(struct pci* pci) {
 CHECK_PCI_START
+CHECK_PCI(pci, 0x27b8, CHIP_AD104GL)
+CHECK_PCI(pci, 0x2785, CHIP_AD104)
+CHECK_PCI(pci, 0x26b8, CHIP_AD102GL)
+CHECK_PCI(pci, 0x26b5, CHIP_AD102GL)
+CHECK_PCI(pci, 0x26b1, CHIP_AD102GL)
+CHECK_PCI(pci, 0x2684, CHIP_AD102)
+CHECK_PCI(pci, 0x25fa, CHIP_GA107)
+CHECK_PCI(pci, 0x25f9, CHIP_GA107)
 CHECK_PCI(pci, 0x25e5, CHIP_GA107BM)
 CHECK_PCI(pci, 0x25e2, CHIP_GA107BM)
 CHECK_PCI(pci, 0x25e0, CHIP_GA107BM)
+CHECK_PCI(pci, 0x25bb, CHIP_GA107GLM)
+CHECK_PCI(pci, 0x25ba, CHIP_GA107GLM)
+CHECK_PCI(pci, 0x25b9, CHIP_GA107GLM)
 CHECK_PCI(pci, 0x25b8, CHIP_GA107GLM)
+CHECK_PCI(pci, 0x25b6, CHIP_GA107GL)
 CHECK_PCI(pci, 0x25b5, CHIP_GA107GLM)
 CHECK_PCI(pci, 0x25af, CHIP_GA107)
+CHECK_PCI(pci, 0x25aa, CHIP_GA107M)
+CHECK_PCI(pci, 0x25a9, CHIP_GA107M)
+CHECK_PCI(pci, 0x25a7, CHIP_GA107M)
+CHECK_PCI(pci, 0x25a6, CHIP_GA107M)
 CHECK_PCI(pci, 0x25a5, CHIP_GA107M)
 CHECK_PCI(pci, 0x25a4, CHIP_GA107)
+CHECK_PCI(pci, 0x25a3, CHIP_GA107)
 CHECK_PCI(pci, 0x25a2, CHIP_GA107M)
 CHECK_PCI(pci, 0x25a0, CHIP_GA107M)
 CHECK_PCI(pci, 0x2583, CHIP_GA107)
+CHECK_PCI(pci, 0x2571, CHIP_GA106)
 CHECK_PCI(pci, 0x2563, CHIP_GA106M)
+CHECK_PCI(pci, 0x2561, CHIP_GA106M)
 CHECK_PCI(pci, 0x2560, CHIP_GA106M)
+CHECK_PCI(pci, 0x2544, CHIP_GA106)
+CHECK_PCI(pci, 0x2531, CHIP_GA106)
 CHECK_PCI(pci, 0x252f, CHIP_GA106)
 CHECK_PCI(pci, 0x2523, CHIP_GA106M)
+CHECK_PCI(pci, 0x2521, CHIP_GA106M)
 CHECK_PCI(pci, 0x2520, CHIP_GA106M)
+CHECK_PCI(pci, 0x2508, CHIP_GA106)
+CHECK_PCI(pci, 0x2507, CHIP_GA106)
 CHECK_PCI(pci, 0x2505, CHIP_GA106)
 CHECK_PCI(pci, 0x2504, CHIP_GA106)
 CHECK_PCI(pci, 0x2503, CHIP_GA106)
 CHECK_PCI(pci, 0x2501, CHIP_GA106)
+CHECK_PCI(pci, 0x24fa, CHIP_GA104)
+CHECK_PCI(pci, 0x24e0, CHIP_GA104M)
+CHECK_PCI(pci, 0x24df, CHIP_GA104M)
 CHECK_PCI(pci, 0x24dd, CHIP_GA104M)
 CHECK_PCI(pci, 0x24dc, CHIP_GA104M)
+CHECK_PCI(pci, 0x24c9, CHIP_GA104)
 CHECK_PCI(pci, 0x24bf, CHIP_GA104)
+CHECK_PCI(pci, 0x24bb, CHIP_GA104GLM)
+CHECK_PCI(pci, 0x24ba, CHIP_GA104GLM)
+CHECK_PCI(pci, 0x24b9, CHIP_GA104GLM)
 CHECK_PCI(pci, 0x24b8, CHIP_GA104GLM)
 CHECK_PCI(pci, 0x24b7, CHIP_GA104GLM)
 CHECK_PCI(pci, 0x24b6, CHIP_GA104GLM)
+CHECK_PCI(pci, 0x24b1, CHIP_GA104GL)
 CHECK_PCI(pci, 0x24b0, CHIP_GA104GL)
 CHECK_PCI(pci, 0x24af, CHIP_GA104)
 CHECK_PCI(pci, 0x24ad, CHIP_GA104)
 CHECK_PCI(pci, 0x24ac, CHIP_GA104)
+CHECK_PCI(pci, 0x24a0, CHIP_GA104)
 CHECK_PCI(pci, 0x249f, CHIP_GA104M)
 CHECK_PCI(pci, 0x249d, CHIP_GA104M)
 CHECK_PCI(pci, 0x249c, CHIP_GA104M)
 CHECK_PCI(pci, 0x248a, CHIP_GA104)
 CHECK_PCI(pci, 0x2489, CHIP_GA104)
 CHECK_PCI(pci, 0x2488, CHIP_GA104)
+CHECK_PCI(pci, 0x2487, CHIP_GA104)
 CHECK_PCI(pci, 0x2486, CHIP_GA104)
 CHECK_PCI(pci, 0x2484, CHIP_GA104)
 CHECK_PCI(pci, 0x2483, CHIP_GA104)
 CHECK_PCI(pci, 0x2482, CHIP_GA104)
+CHECK_PCI(pci, 0x2460, CHIP_GA103M)
+CHECK_PCI(pci, 0x2438, CHIP_GA103GLM)
+CHECK_PCI(pci, 0x2420, CHIP_GA103M)
+CHECK_PCI(pci, 0x2414, CHIP_GA103)
+CHECK_PCI(pci, 0x2336, CHIP_GH100)
+CHECK_PCI(pci, 0x2331, CHIP_GH100)
+CHECK_PCI(pci, 0x2321, CHIP_GH100)
+CHECK_PCI(pci, 0x2302, CHIP_GH100)
+CHECK_PCI(pci, 0x228e, CHIP_GA106)
 CHECK_PCI(pci, 0x228b, CHIP_GA104)
 CHECK_PCI(pci, 0x223f, CHIP_GA102GL)
+CHECK_PCI(pci, 0x2238, CHIP_GA102GL)
 CHECK_PCI(pci, 0x2237, CHIP_GA102GL)
 CHECK_PCI(pci, 0x2236, CHIP_GA102GL)
 CHECK_PCI(pci, 0x2235, CHIP_GA102GL)
+CHECK_PCI(pci, 0x2233, CHIP_GA102GL)
+CHECK_PCI(pci, 0x2232, CHIP_GA102GL)
 CHECK_PCI(pci, 0x2231, CHIP_GA102GL)
 CHECK_PCI(pci, 0x2230, CHIP_GA102GL)
 CHECK_PCI(pci, 0x222f, CHIP_GA102)
 CHECK_PCI(pci, 0x222b, CHIP_GA102)
 CHECK_PCI(pci, 0x2216, CHIP_GA102)
 CHECK_PCI(pci, 0x220d, CHIP_GA102)
+CHECK_PCI(pci, 0x220a, CHIP_GA102)
 CHECK_PCI(pci, 0x2208, CHIP_GA102)
+CHECK_PCI(pci, 0x2207, CHIP_GA102)
 CHECK_PCI(pci, 0x2206, CHIP_GA102)
 CHECK_PCI(pci, 0x2205, CHIP_GA102)
 CHECK_PCI(pci, 0x2204, CHIP_GA102)
+CHECK_PCI(pci, 0x2203, CHIP_GA102)
 CHECK_PCI(pci, 0x2200, CHIP_GA102)
 CHECK_PCI(pci, 0x21d1, CHIP_TU116BM)
 CHECK_PCI(pci, 0x21c4, CHIP_TU116)
@@ -90,27 +139,45 @@ GPUCHIP get_chip_from_pci_cuda(struct pci* pci) {
 CHECK_PCI(pci, 0x2184, CHIP_TU116)
 CHECK_PCI(pci, 0x2183, CHIP_TU116)
 CHECK_PCI(pci, 0x2182, CHIP_TU116)
+CHECK_PCI(pci, 0x20f6, CHIP_GA100)
+CHECK_PCI(pci, 0x20f5, CHIP_GA100)
+CHECK_PCI(pci, 0x20f2, CHIP_GA100)
 CHECK_PCI(pci, 0x20f1, CHIP_GA100)
+CHECK_PCI(pci, 0x20f0, CHIP_GA100)
+CHECK_PCI(pci, 0x20c2, CHIP_GA100)
 CHECK_PCI(pci, 0x20bf, CHIP_GA100)
 CHECK_PCI(pci, 0x20be, CHIP_GA100)
+CHECK_PCI(pci, 0x20bb, CHIP_GA100)
+CHECK_PCI(pci, 0x20b9, CHIP_GA100)
+CHECK_PCI(pci, 0x20b8, CHIP_GA100)
 CHECK_PCI(pci, 0x20b7, CHIP_GA100GL)
 CHECK_PCI(pci, 0x20b6, CHIP_GA100GL)
 CHECK_PCI(pci, 0x20b5, CHIP_GA100)
+CHECK_PCI(pci, 0x20b3, CHIP_GA100)
 CHECK_PCI(pci, 0x20b2, CHIP_GA100)
 CHECK_PCI(pci, 0x20b1, CHIP_GA100)
 CHECK_PCI(pci, 0x20b0, CHIP_GA100)
+CHECK_PCI(pci, 0x2082, CHIP_GA100)
 CHECK_PCI(pci, 0x1ff9, CHIP_TU117GLM)
+CHECK_PCI(pci, 0x1ff2, CHIP_TU117GL)
+CHECK_PCI(pci, 0x1ff0, CHIP_TU117GL)
 CHECK_PCI(pci, 0x1fdd, CHIP_TU117BM)
 CHECK_PCI(pci, 0x1fd9, CHIP_TU117BM)
 CHECK_PCI(pci, 0x1fbf, CHIP_TU117GL)
+CHECK_PCI(pci, 0x1fbc, CHIP_TU117GLM)
 CHECK_PCI(pci, 0x1fbb, CHIP_TU117GLM)
 CHECK_PCI(pci, 0x1fba, CHIP_TU117GLM)
 CHECK_PCI(pci, 0x1fb9, CHIP_TU117GLM)
 CHECK_PCI(pci, 0x1fb8, CHIP_TU117GLM)
+CHECK_PCI(pci, 0x1fb7, CHIP_TU117GLM)
+CHECK_PCI(pci, 0x1fb6, CHIP_TU117GLM)
 CHECK_PCI(pci, 0x1fb2, CHIP_TU117GLM)
 CHECK_PCI(pci, 0x1fb1, CHIP_TU117GL)
 CHECK_PCI(pci, 0x1fb0, CHIP_TU117GLM)
 CHECK_PCI(pci, 0x1fae, CHIP_TU117GL)
+CHECK_PCI(pci, 0x1fa1, CHIP_TU117M)
+CHECK_PCI(pci, 0x1fa0, CHIP_TU117M)
+CHECK_PCI(pci, 0x1f9f, CHIP_TU117M)
 CHECK_PCI(pci, 0x1f9d, CHIP_TU117M)
 CHECK_PCI(pci, 0x1f9c, CHIP_TU117M)
 CHECK_PCI(pci, 0x1f99, CHIP_TU117M)
@@ -121,6 +188,7 @@ GPUCHIP get_chip_from_pci_cuda(struct pci* pci) {
 CHECK_PCI(pci, 0x1f94, CHIP_TU117M)
 CHECK_PCI(pci, 0x1f92, CHIP_TU117M)
 CHECK_PCI(pci, 0x1f91, CHIP_TU117M)
+CHECK_PCI(pci, 0x1f83, CHIP_TU117)
 CHECK_PCI(pci, 0x1f82, CHIP_TU117)
 CHECK_PCI(pci, 0x1f81, CHIP_TU117)
 CHECK_PCI(pci, 0x1f76, CHIP_TU106GLM)
@@ -144,6 +212,7 @@ GPUCHIP get_chip_from_pci_cuda(struct pci* pci) {
 CHECK_PCI(pci, 0x1f07, CHIP_TU106)
 CHECK_PCI(pci, 0x1f06, CHIP_TU106)
 CHECK_PCI(pci, 0x1f04, CHIP_TU106)
+CHECK_PCI(pci, 0x1f03, CHIP_TU106)
 CHECK_PCI(pci, 0x1f02, CHIP_TU106)
 CHECK_PCI(pci, 0x1ef5, CHIP_TU104GLM)
 CHECK_PCI(pci, 0x1ed3, CHIP_TU104BM)
@@ -156,6 +225,7 @@ GPUCHIP get_chip_from_pci_cuda(struct pci* pci) {
 CHECK_PCI(pci, 0x1eb8, CHIP_TU104GL)
 CHECK_PCI(pci, 0x1eb6, CHIP_TU104GLM)
 CHECK_PCI(pci, 0x1eb5, CHIP_TU104GLM)
+CHECK_PCI(pci, 0x1eb4, CHIP_TU104GL)
 CHECK_PCI(pci, 0x1eb1, CHIP_TU104GL)
 CHECK_PCI(pci, 0x1eb0, CHIP_TU104GL)
 CHECK_PCI(pci, 0x1eae, CHIP_TU104M)
@@ -186,6 +256,7 @@ GPUCHIP get_chip_from_pci_cuda(struct pci* pci) {
 CHECK_PCI(pci, 0x1df5, CHIP_GV100GL)
 CHECK_PCI(pci, 0x1df2, CHIP_GV100GL)
 CHECK_PCI(pci, 0x1df0, CHIP_GV100GL)
+CHECK_PCI(pci, 0x1dbe, CHIP_GV100)
 CHECK_PCI(pci, 0x1dba, CHIP_GV100GL)
 CHECK_PCI(pci, 0x1db8, CHIP_GV100GL)
 CHECK_PCI(pci, 0x1db7, CHIP_GV100GL)
@@ -205,6 +276,7 @@ GPUCHIP get_chip_from_pci_cuda(struct pci* pci) {
 CHECK_PCI(pci, 0x1d12, CHIP_GP108M)
 CHECK_PCI(pci, 0x1d11, CHIP_GP108M)
 CHECK_PCI(pci, 0x1d10, CHIP_GP108M)
+CHECK_PCI(pci, 0x1d02, CHIP_GP108)
 CHECK_PCI(pci, 0x1d01, CHIP_GP108)
 CHECK_PCI(pci, 0x1cfb, CHIP_GP107GL)
 CHECK_PCI(pci, 0x1cfa, CHIP_GP107GL)
@@ -290,6 +362,7 @@ GPUCHIP get_chip_from_pci_cuda(struct pci* pci) {
 CHECK_PCI(pci, 0x1b02, CHIP_GP102)
 CHECK_PCI(pci, 0x1b01, CHIP_GP102)
 CHECK_PCI(pci, 0x1b00, CHIP_GP102)
+CHECK_PCI(pci, 0x1af1, CHIP_GA100)
 CHECK_PCI(pci, 0x1aef, CHIP_GA102)
 CHECK_PCI(pci, 0x1aed, CHIP_TU116)
 CHECK_PCI(pci, 0x1aec, CHIP_TU116)
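`get_chip_from_pci_cuda()` expands the `CHECK_PCI` macros into one long `if`/`else if` chain over `pci->device_id`. A sketch of the same mapping as a hash-table lookup, using a handful of IDs copied from the list above and plain strings instead of the `GPUCHIP` enum (both simplifications are assumptions of the sketch):

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Table-driven version of the CHECK_PCI chain: device id -> chip name.
// Only a few ids from the list above are included, as plain strings.
std::string chip_from_pci_sketch(uint16_t device_id) {
  static const std::unordered_map<uint16_t, std::string> ids = {
      {0x2684, "AD102"}, {0x2785, "AD104"}, {0x2336, "GH100"},
      {0x2204, "GA102"}, {0x2484, "GA104"}, {0x1f82, "TU117"}};
  auto it = ids.find(device_id);
  // Unknown ids map to a sentinel, like CHIP_UNKNOWN_CUDA in CHECK_PCI_END.
  return it != ids.end() ? it->second : "unknown";
}
```

A hash table makes each lookup constant time; with a few hundred IDs the macro chain is fast enough as well, and one ID per line keeps additions easy to review in a diff like this one.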

@@ -1,11 +1,14 @@
 #include <cuda_runtime.h>
-#include <helper_cuda.h>
+#include <cstdlib>
 #include <cstdint>
 #include <cstddef>
+#include <cstdio>
+#include <cstring>
 #include "../common/uarch.hpp"
 #include "../common/global.hpp"
 #include "../common/gpu.hpp"
+#include "pci.hpp"
 #include "chips.hpp"
 // Any clock multiplier
@@ -22,6 +25,8 @@ enum {
 UARCH_VOLTA,
 UARCH_TURING,
 UARCH_AMPERE,
+UARCH_ADA,
+UARCH_HOPPER
 };
 static const char *uarch_str[] = {
@@ -34,6 +39,8 @@ static const char *uarch_str[] = {
 /*[ARCH_VOLTA] = */ "Volta",
 /*[ARCH_TURING] = */ "Turing",
 /*[ARCH_AMPERE] = */ "Ampere",
+/*[ARCH_ADA] = */ "Ada Lovelace",
+/*[ARCH_HOPPER] = */ "Hopper"
 };
 #define CHECK_UARCH_START if (false) {}
@@ -216,6 +223,9 @@ void map_chip_to_uarch_cuda(struct uarch* arch) {
 CHECK_UARCH(arch, CHIP_GA100GL, "GA100", UARCH_AMPERE, 7)
 CHECK_UARCH(arch, CHIP_GA102, "GA102", UARCH_AMPERE, 8)
 CHECK_UARCH(arch, CHIP_GA102GL, "GA102", UARCH_AMPERE, 8)
+CHECK_UARCH(arch, CHIP_GA103, "GA103", UARCH_AMPERE, 8)
+CHECK_UARCH(arch, CHIP_GA103GLM, "GA103", UARCH_AMPERE, 8)
+CHECK_UARCH(arch, CHIP_GA103M, "GA103", UARCH_AMPERE, 8)
 CHECK_UARCH(arch, CHIP_GA104, "GA104", UARCH_AMPERE, 8)
 CHECK_UARCH(arch, CHIP_GA104GL, "GA104", UARCH_AMPERE, 8)
 CHECK_UARCH(arch, CHIP_GA104GLM, "GA104", UARCH_AMPERE, 8)
@@ -226,6 +236,13 @@ void map_chip_to_uarch_cuda(struct uarch* arch) {
 CHECK_UARCH(arch, CHIP_GA107BM, "GA107", UARCH_AMPERE, 8)
 CHECK_UARCH(arch, CHIP_GA107GLM, "GA107", UARCH_AMPERE, 8)
 CHECK_UARCH(arch, CHIP_GA107M, "GA107", UARCH_AMPERE, 8)
+// ADA LOVELACE (8.9)
+CHECK_UARCH(arch, CHIP_AD102, "AD102", UARCH_ADA, 4)
+CHECK_UARCH(arch, CHIP_AD102GL, "AD102", UARCH_ADA, 4)
+CHECK_UARCH(arch, CHIP_AD104, "AD104", UARCH_ADA, 4)
+CHECK_UARCH(arch, CHIP_AD104GL, "AD104", UARCH_ADA, 4)
+// HOPPER (9.0)
+CHECK_UARCH(arch, CHIP_GH100, "GH100", UARCH_HOPPER, 4)
 CHECK_UARCH_END
 }
@@ -264,6 +281,8 @@ bool clkm_possible_for_uarch(int clkm, struct uarch* arch) {
 case UARCH_VOLTA: return clkm == 1;
 case UARCH_TURING: return clkm == 2 || clkm == 4;
 case UARCH_AMPERE: return clkm == 1 || clkm == 4 || clkm == 8;
+case UARCH_ADA: return clkm == 8;
+case UARCH_HOPPER: return clkm == 1;
 }
 return false;
 }
@@ -315,6 +334,10 @@ MEMTYPE guess_memtype_from_cmul_and_uarch(int clkm, struct uarch* arch) {
 CHECK_MEMTYPE(arch, clkm, UARCH_AMPERE, 1, MEMTYPE_HBM2)
 CHECK_MEMTYPE(arch, clkm, UARCH_AMPERE, 4, MEMTYPE_GDDR6)
 CHECK_MEMTYPE(arch, clkm, UARCH_AMPERE, 8, MEMTYPE_GDDR6X)
+// ADA
+CHECK_MEMTYPE(arch, clkm, UARCH_ADA, 8, MEMTYPE_GDDR6X)
+// HOPPER
+CHECK_MEMTYPE(arch, clkm, UARCH_HOPPER, 1, MEMTYPE_HBM2)
 CHECK_MEMTYPE_END
 }
@@ -329,6 +352,7 @@ char* get_str_chip(struct uarch* arch) {
 return arch->chip_str;
 }
+// TODO: What about _ConvertSMVer2ArchName?
 const char* get_str_uarch_cuda(struct uarch* arch) {
 return uarch_str[arch->uarch];
 }
@@ -338,3 +362,8 @@ void free_uarch_struct(struct uarch* arch) {
 free(arch->chip_str);
 free(arch);
 }
+bool is_chip_TU116(struct uarch* arch) {
+return arch->chip == CHIP_TU116 || arch->chip == CHIP_TU116BM ||
+arch->chip == CHIP_TU116GL || arch->chip == CHIP_TU116M;
+}
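`clkm_possible_for_uarch()` and `guess_memtype_from_cmul_and_uarch()` together map a microarchitecture and a guessed clock multiplier to a memory type. A sketch of that mapping as a lookup table, limited to the Ampere, Ada and Hopper rows visible in this diff and using strings instead of the `UARCH_*` and `MEMTYPE_*` enums (an assumption made to keep the example self-contained):

```cpp
#include <map>
#include <string>
#include <utility>

// (microarchitecture, clock multiplier) -> memory type, mirroring the
// CHECK_MEMTYPE rows for Ampere, Ada and Hopper shown above.
std::string memtype_sketch(const std::string& uarch, int clkm) {
  static const std::map<std::pair<std::string, int>, std::string> table = {
      {{"Ampere", 1}, "HBM2"},   {{"Ampere", 4}, "GDDR6"},
      {{"Ampere", 8}, "GDDR6X"}, {{"Ada", 8}, "GDDR6X"},
      {{"Hopper", 1}, "HBM2"}};
  auto it = table.find({uarch, clkm});
  // Combinations with no row fall through to a sentinel, like CHECK_MEMTYPE_END.
  return it != table.end() ? it->second : "unknown";
}
```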

@@ -13,5 +13,6 @@ char* get_str_cc(struct uarch* arch);
 char* get_str_chip(struct uarch* arch);
 char* get_str_process(struct uarch* arch);
 void free_uarch_struct(struct uarch* arch);
+bool is_chip_TU116(struct uarch* arch);
 #endif

src/hsa/chips.hpp

@@ -0,0 +1,37 @@
#ifndef __HSA_GPUCHIPS__
#define __HSA_GPUCHIPS__
typedef uint32_t GPUCHIP;
enum {
CHIP_UNKNOWN_HSA,
// VEGA (TODO)
// ...
// RDNA
CHIP_NAVI_10,
CHIP_NAVI_12,
CHIP_NAVI_14,
// RDNA2
// There are way more (e.g. Oberon).
// Maybe we'll add them in the future.
CHIP_NAVI_21,
CHIP_NAVI_22,
CHIP_NAVI_23,
CHIP_NAVI_24,
// RDNA3
// There are way more as well.
// Supporting Navi only for now.
CHIP_NAVI_31,
CHIP_NAVI_32,
CHIP_NAVI_33,
// RDNA4
CHIP_NAVI_44,
CHIP_NAVI_48,
// CDNA
CHIP_ARCTURUS, // MI100 series
CHIP_ALDEBARAN, // MI200 series
CHIP_AQUA_VANJARAM, // MI300 series
CHIP_CDNA_NEXT // MI350 series
};
#endif

src/hsa/hsa.cpp

@@ -0,0 +1,242 @@
#include <iostream>
#include <iomanip>
#include <cstring>
#include <cstdlib>
#include <cstdio>
#include <hsa/hsa.h>
#include <hsa/hsa_ext_amd.h>
#include "hsa.hpp"
#include "uarch.hpp"
#include "../common/global.hpp"
#include "../common/uarch.hpp"
struct agent_info {
unsigned deviceId; // ID of the target GPU device
char gpu_name[64];
char vendor_name[64];
char device_mkt_name[64];
uint32_t max_clock_freq;
// Memory
uint32_t bus_width;
uint32_t lds_size;
uint64_t global_size;
// Topology
uint32_t compute_unit;
uint32_t num_shader_engines;
uint32_t simds_per_cu;
uint32_t num_xcc; // Accelerator Complex Dies (XCDs)
uint32_t matrix_cores; // Cores with WMMA/MFMA capabilities
};
#define RET_IF_HSA_ERR(err) { \
if ((err) != HSA_STATUS_SUCCESS) { \
char err_val[12]; \
char* err_str = NULL; \
if (hsa_status_string(err, \
(const char**)&err_str) != HSA_STATUS_SUCCESS) { \
snprintf(&(err_val[0]), sizeof(err_val), "%#x", (uint32_t)err); \
err_str = &(err_val[0]); \
} \
printErr("HSA failure at: %s:%d\n", __FILE__, __LINE__); \
printErr("Call returned %s\n", err_str); \
return (err); \
} \
}
hsa_status_t memory_pool_callback(hsa_amd_memory_pool_t pool, void* data) {
struct agent_info* info = reinterpret_cast<struct agent_info *>(data);
hsa_amd_segment_t segment;
hsa_status_t err = hsa_amd_memory_pool_get_info(pool, HSA_AMD_MEMORY_POOL_INFO_SEGMENT, &segment);
RET_IF_HSA_ERR(err);
if (segment == HSA_AMD_SEGMENT_GROUP) {
// LDS memory
// We want to make sure that this memory pool is not repeated.
if (info->lds_size != 0) {
printErr("Found HSA_AMD_SEGMENT_GROUP twice!");
return HSA_STATUS_ERROR;
}
// HSA_AMD_MEMORY_POOL_INFO_SIZE is a size_t, so query into a size_t.
size_t size = 0;
err = hsa_amd_memory_pool_get_info(pool, HSA_AMD_MEMORY_POOL_INFO_SIZE, &size);
RET_IF_HSA_ERR(err);
info->lds_size = (uint32_t) size;
}
else if (segment == HSA_AMD_SEGMENT_GLOBAL) {
// Global memory
uint32_t global_flags = 0;
err = hsa_amd_memory_pool_get_info(pool, HSA_AMD_MEMORY_POOL_INFO_GLOBAL_FLAGS, &global_flags);
RET_IF_HSA_ERR(err);
if (global_flags & HSA_AMD_MEMORY_POOL_GLOBAL_FLAG_EXTENDED_SCOPE_FINE_GRAINED) {
if (info->global_size != 0) {
printErr("Found HSA_AMD_MEMORY_POOL_GLOBAL_FLAG_EXTENDED_SCOPE_FINE_GRAINED twice!");
return HSA_STATUS_ERROR;
}
uint64_t size = 0;
err = hsa_amd_memory_pool_get_info(pool, HSA_AMD_MEMORY_POOL_INFO_SIZE, &size);
RET_IF_HSA_ERR(err);
info->global_size = size;
}
}
return HSA_STATUS_SUCCESS;
}
hsa_status_t agent_callback(hsa_agent_t agent, void *data) {
struct agent_info* info = reinterpret_cast<struct agent_info *>(data);
hsa_device_type_t type;
hsa_status_t err = hsa_agent_get_info(agent, HSA_AGENT_INFO_DEVICE, &type);
RET_IF_HSA_ERR(err);
if (type == HSA_DEVICE_TYPE_GPU) {
err = hsa_agent_get_info(agent, HSA_AGENT_INFO_NAME, info->gpu_name);
RET_IF_HSA_ERR(err);
err = hsa_agent_get_info(agent, HSA_AGENT_INFO_VENDOR_NAME, info->vendor_name);
RET_IF_HSA_ERR(err);
err = hsa_agent_get_info(agent, (hsa_agent_info_t) HSA_AMD_AGENT_INFO_PRODUCT_NAME, &info->device_mkt_name);
RET_IF_HSA_ERR(err);
err = hsa_agent_get_info(agent, (hsa_agent_info_t) HSA_AMD_AGENT_INFO_MAX_CLOCK_FREQUENCY, &info->max_clock_freq);
RET_IF_HSA_ERR(err);
err = hsa_agent_get_info(agent, (hsa_agent_info_t) HSA_AMD_AGENT_INFO_COMPUTE_UNIT_COUNT, &info->compute_unit);
RET_IF_HSA_ERR(err);
// According to the documentation, this is deprecated. But what should I be using then?
err = hsa_agent_get_info(agent, (hsa_agent_info_t) HSA_AMD_AGENT_INFO_MEMORY_WIDTH, &info->bus_width);
RET_IF_HSA_ERR(err);
err = hsa_agent_get_info(agent, (hsa_agent_info_t) HSA_AMD_AGENT_INFO_NUM_SHADER_ENGINES, &info->num_shader_engines);
RET_IF_HSA_ERR(err);
err = hsa_agent_get_info(agent, (hsa_agent_info_t) HSA_AMD_AGENT_INFO_NUM_SIMDS_PER_CU, &info->simds_per_cu);
RET_IF_HSA_ERR(err);
err = hsa_agent_get_info(agent, (hsa_agent_info_t) HSA_AMD_AGENT_INFO_NUM_XCC, &info->num_xcc);
RET_IF_HSA_ERR(err);
// We will check against zero to see if it was set beforehand.
info->global_size = 0;
info->lds_size = 0;
// This will fill global_size and lds_size.
err = hsa_amd_agent_iterate_memory_pools(agent, memory_pool_callback, data);
RET_IF_HSA_ERR(err);
}
return HSA_STATUS_SUCCESS;
}
struct topology_h* get_topology_info(struct agent_info info) {
struct topology_h* topo = (struct topology_h*) emalloc(sizeof(struct topology_h));
topo->compute_units = info.compute_unit;
topo->num_shader_engines = info.num_shader_engines; // not printed at the moment
topo->simds_per_cu = info.simds_per_cu; // not printed at the moment
topo->num_xcc = info.num_xcc;
// Old GPUs (GCN I guess) might not have matrix cores.
// Not sure what would happen here?
topo->matrix_cores = topo->compute_units * topo->simds_per_cu;
return topo;
}
struct memory* get_memory_info(struct gpu_info* gpu, struct agent_info info) {
struct memory* mem = (struct memory*) emalloc(sizeof(struct memory));
mem->bus_width = info.bus_width;
mem->lds_size = info.lds_size;
mem->size_bytes = info.global_size;
return mem;
}
struct gpu_info* get_gpu_info_hsa(int gpu_idx) {
struct gpu_info* gpu = (struct gpu_info*) emalloc(sizeof(struct gpu_info));
gpu->pci = NULL;
gpu->idx = gpu_idx;
if(gpu->idx < 0) {
printErr("GPU index must be equal or greater than zero");
return NULL;
}
if(gpu->idx > 0) {
// Currently we only support fetching GPU 0.
return NULL;
}
hsa_status_t err = hsa_init();
if (err != HSA_STATUS_SUCCESS) {
printErr("Failed to initialize HSA runtime");
return NULL;
}
struct agent_info info = {}; // zero-initialized: vendor_name stays empty if no GPU agent is found
info.deviceId = gpu_idx;
// Iterate over all agents in the system
err = hsa_iterate_agents(agent_callback, &info);
if (err != HSA_STATUS_SUCCESS) {
printErr("Failed to iterate HSA agents");
hsa_shut_down();
return NULL;
}
if (strcmp(info.vendor_name, "AMD") != 0) {
printErr("HSA vendor name is: '%s'. Only AMD is supported!", info.vendor_name);
hsa_shut_down();
return NULL;
}
gpu->vendor = GPU_VENDOR_AMD;
gpu->freq = info.max_clock_freq;
gpu->topo_h = get_topology_info(info);
gpu->name = (char *) emalloc(sizeof(char) * (strlen(info.device_mkt_name) + 1));
strcpy(gpu->name, info.device_mkt_name);
gpu->arch = get_uarch_from_hsa(gpu, info.gpu_name);
gpu->mem = get_memory_info(gpu, info);
if (gpu->arch == NULL) {
hsa_shut_down();
return NULL;
}
// Shut down the HSA runtime
err = hsa_shut_down();
if (err != HSA_STATUS_SUCCESS) {
printErr("Failed to shutdown HSA runtime");
return NULL;
}
return gpu;
}
char* get_str_cu(struct gpu_info* gpu) {
return get_str_generic(gpu->topo_h->compute_units);
}
char* get_str_xcds(struct gpu_info* gpu) {
// If there is a single XCD, we don't want to
// print it.
if (gpu->topo_h->num_xcc == 1) {
return NULL;
}
return get_str_generic(gpu->topo_h->num_xcc);
}
char* get_str_matrix_cores(struct gpu_info* gpu) {
// TODO: Show XX (WMMA/MFMA)
return get_str_generic(gpu->topo_h->matrix_cores);
}

src/hsa/hsa.hpp

@@ -0,0 +1,11 @@
#ifndef __HSA_GPU__
#define __HSA_GPU__
#include "../common/gpu.hpp"
struct gpu_info* get_gpu_info_hsa(int gpu_idx);
char* get_str_cu(struct gpu_info* gpu);
char* get_str_xcds(struct gpu_info* gpu);
char* get_str_matrix_cores(struct gpu_info* gpu);
#endif

src/hsa/uarch.cpp

@@ -0,0 +1,321 @@
#include <cstdlib>
#include <cstdint>
#include <cstring>
#include "../common/uarch.hpp"
#include "../common/global.hpp"
#include "../common/gpu.hpp"
#include "chips.hpp"
// MICROARCH values
enum {
UARCH_UNKNOWN,
// GCN (Graphics Core Next)
// Empty for now
// ...
// RDNA (Radeon DNA)
UARCH_RDNA,
UARCH_RDNA2,
UARCH_RDNA3,
UARCH_RDNA4,
// CDNA (Compute DNA)
UARCH_CDNA,
UARCH_CDNA2,
UARCH_CDNA3,
UARCH_CDNA4
};
static const char *uarch_str[] = {
/*[UARCH_UNKNOWN] = */ STRING_UNKNOWN,
/*[UARCH_RDNA] = */ "RDNA",
/*[UARCH_RDNA2] = */ "RDNA2",
/*[UARCH_RDNA3] = */ "RDNA3",
/*[UARCH_RDNA4] = */ "RDNA4",
/*[UARCH_CDNA] = */ "CDNA",
/*[UARCH_CDNA2] = */ "CDNA2",
/*[UARCH_CDNA3] = */ "CDNA3",
/*[UARCH_CDNA4] = */ "CDNA4",
};
// Sources:
// - https://rocm.docs.amd.com/en/latest/reference/gpu-arch-specs.html
// - https://www.techpowerup.com
//
// These gfx* values are sometimes referred to as the LLVM target,
// and also as the shader ISA.
//
// An LLVM target *usually* maps to a specific architecture. However, there
// are cases where this is not true:
// MI8 is GCN3.0 with LLVM target gfx803
// MI6 is GCN4.0 with LLVM target gfx803
// or
// Strix Point can be gfx1150 or gfx1151
//
// NOTE: GCN chips are stored for completeness, but they are
// not actively supported.
enum {
TARGET_UNKNOWN_HSA,
/// GCN (Graphics Core Next)
/// ------------------------
// GCN 1.0
TARGET_GFX600,
TARGET_GFX601,
TARGET_GFX602,
// GCN 2.0
TARGET_GFX700,
TARGET_GFX701,
TARGET_GFX702,
TARGET_GFX703,
TARGET_GFX704,
TARGET_GFX705,
// GCN 3.0 / 4.0
TARGET_GFX801,
TARGET_GFX802,
TARGET_GFX803,
TARGET_GFX805,
TARGET_GFX810,
// GCN 5.0
TARGET_GFX900,
TARGET_GFX902,
TARGET_GFX904,
// GCN 5.1
TARGET_GFX906,
// ???
TARGET_GFX909,
TARGET_GFX90C,
/// RDNA (Radeon DNA)
/// -----------------
// RDNA1
TARGET_GFX1010,
TARGET_GFX1011,
TARGET_GFX1012,
// RDNA2
TARGET_GFX1013, // Oberon
TARGET_GFX1030,
TARGET_GFX1031,
TARGET_GFX1032,
TARGET_GFX1033,
TARGET_GFX1034,
TARGET_GFX1035, // ??
TARGET_GFX1036, // ??
// RDNA3
TARGET_GFX1100,
TARGET_GFX1101,
TARGET_GFX1102,
TARGET_GFX1103, // ???
// RDNA3.5
TARGET_GFX1150, // Strix Point
TARGET_GFX1151, // Strix Halo / Strix Point
TARGET_GFX1152, // Krackan Point
TARGET_GFX1153, // ???
// RDNA4
TARGET_GFX1200,
TARGET_GFX1201,
TARGET_GFX1250, // ???
TARGET_GFX1251, // ???
/// CDNA (Compute DNA)
/// ------------------
// CDNA
TARGET_GFX908,
// CDNA2
TARGET_GFX90A,
// CDNA3
TARGET_GFX942,
// CDNA4
TARGET_GFX950
};
#define CHECK_UARCH_START if (false) {}
#define CHECK_UARCH(arch, chip_, str, uarch, process) \
else if (arch->chip == chip_) fill_uarch(arch, str, uarch, process);
#define CHECK_UARCH_END else { if(arch->chip != CHIP_UNKNOWN_HSA) printBug("map_chip_to_uarch_hsa: Unknown chip id: %d", arch->chip); fill_uarch(arch, STRING_UNKNOWN, UARCH_UNKNOWN, UNK); }
void fill_uarch(struct uarch* arch, char const *str, MICROARCH u, uint32_t process) {
arch->chip_str = (char *) emalloc(sizeof(char) * (strlen(str)+1));
strcpy(arch->chip_str, str);
arch->uarch = u;
arch->process = process;
}
// On chiplet-based chips (such as Navi 31 and Navi 32),
// there are two different processes: the MCD process and
// the process of the rest of the chip. They may differ, and
// here we only report one of them: the MCD process, for now.
//
// TODO: Should we differentiate?
void map_chip_to_uarch_hsa(struct uarch* arch) {
CHECK_UARCH_START
// RDNA
CHECK_UARCH(arch, CHIP_NAVI_10, "Navi 10", UARCH_RDNA, 7)
CHECK_UARCH(arch, CHIP_NAVI_12, "Navi 12", UARCH_RDNA, 7)
CHECK_UARCH(arch, CHIP_NAVI_14, "Navi 14", UARCH_RDNA, 7)
CHECK_UARCH(arch, CHIP_NAVI_21, "Navi 21", UARCH_RDNA2, 7)
CHECK_UARCH(arch, CHIP_NAVI_22, "Navi 22", UARCH_RDNA2, 7)
CHECK_UARCH(arch, CHIP_NAVI_23, "Navi 23", UARCH_RDNA2, 7)
CHECK_UARCH(arch, CHIP_NAVI_24, "Navi 24", UARCH_RDNA2, 6)
CHECK_UARCH(arch, CHIP_NAVI_31, "Navi 31", UARCH_RDNA3, 6)
CHECK_UARCH(arch, CHIP_NAVI_32, "Navi 32", UARCH_RDNA3, 6)
CHECK_UARCH(arch, CHIP_NAVI_33, "Navi 33", UARCH_RDNA3, 6)
CHECK_UARCH(arch, CHIP_NAVI_44, "Navi 44", UARCH_RDNA4, 4)
CHECK_UARCH(arch, CHIP_NAVI_48, "Navi 48", UARCH_RDNA4, 4)
// CDNA
// NOTE: We will not show chip name for CDNA, thus use empty str
CHECK_UARCH(arch, CHIP_ARCTURUS, "", UARCH_CDNA, 7)
CHECK_UARCH(arch, CHIP_ALDEBARAN, "", UARCH_CDNA2, 6)
CHECK_UARCH(arch, CHIP_AQUA_VANJARAM, "", UARCH_CDNA3, 6)
CHECK_UARCH(arch, CHIP_CDNA_NEXT, "", UARCH_CDNA4, 6) // big difference between MCD and rest of the chip process
CHECK_UARCH_END
}
#define CHECK_TGT_START if (false) {}
#define CHECK_TGT(target, llvm_target, chip) \
else if (target == llvm_target) return chip;
#define CHECK_TGT_END else { printBug("LLVM target '%d' has no matching chip", target); return CHIP_UNKNOWN_HSA; }
// We have at least 2 choices to infer the chip:
//
// - LLVM target (e.g., gfx1101 is Navi 32)
// - PCI ID (e.g., 0x7470 is Navi 32)
//
// For now we use the first approach, which has some issues,
// as noted in the comment above the TARGET_* enum.
// PCI detection is not perfect either, since PCI IDs for
// old hardware are quite hard to find.
GPUCHIP get_chip_from_target_hsa(int32_t target) {
CHECK_TGT_START
/// RDNA
/// -------------------------------------------
CHECK_TGT(target, TARGET_GFX1010, CHIP_NAVI_10)
CHECK_TGT(target, TARGET_GFX1011, CHIP_NAVI_12)
CHECK_TGT(target, TARGET_GFX1012, CHIP_NAVI_14)
// CHECK_TGT(target, TARGET_GFX1013, TODO)
/// RDNA2
/// -------------------------------------------
CHECK_TGT(target, TARGET_GFX1030, CHIP_NAVI_21)
CHECK_TGT(target, TARGET_GFX1031, CHIP_NAVI_22)
CHECK_TGT(target, TARGET_GFX1032, CHIP_NAVI_23)
// CHECK_TGT(target, TARGET_GFX1033, TODO) // gfx1033 is Van Gogh (APU), not Navi 21
CHECK_TGT(target, TARGET_GFX1034, CHIP_NAVI_24)
// CHECK_TGT(target, TARGET_GFX1035, TODO)
// CHECK_TGT(target, TARGET_GFX1036, TODO)
/// RDNA3
/// -------------------------------------------
CHECK_TGT(target, TARGET_GFX1100, CHIP_NAVI_31)
CHECK_TGT(target, TARGET_GFX1101, CHIP_NAVI_32)
CHECK_TGT(target, TARGET_GFX1102, CHIP_NAVI_33)
// CHECK_TGT(target, TARGET_GFX1103, TODO)
/// RDNA3.5
/// -------------------------------------------
// CHECK_TGT(target, TARGET_GFX1150, TODO)
// CHECK_TGT(target, TARGET_GFX1151, TODO)
// CHECK_TGT(target, TARGET_GFX1152, TODO)
// CHECK_TGT(target, TARGET_GFX1153, TODO)
/// RDNA4
/// -------------------------------------------
CHECK_TGT(target, TARGET_GFX1200, CHIP_NAVI_44)
CHECK_TGT(target, TARGET_GFX1201, CHIP_NAVI_48)
// CHECK_TGT(target, TARGET_GFX1250, TODO)
// CHECK_TGT(target, TARGET_GFX1251, TODO)
/// CDNA
/// -------------------------------------------
CHECK_TGT(target, TARGET_GFX908, CHIP_ARCTURUS)
/// CDNA2
/// -------------------------------------------
CHECK_TGT(target, TARGET_GFX90A, CHIP_ALDEBARAN)
/// CDNA3
/// -------------------------------------------
CHECK_TGT(target, TARGET_GFX942, CHIP_AQUA_VANJARAM)
/// CDNA4
/// -------------------------------------------
CHECK_TGT(target, TARGET_GFX950, CHIP_CDNA_NEXT)
CHECK_TGT_END
}
#define CHECK_TGT_STR_START if (false) {}
#define CHECK_TGT_STR(target, llvm_target, chip) \
else if (strcmp(target, llvm_target) == 0) return chip;
#define CHECK_TGT_STR_END else { return TARGET_UNKNOWN_HSA; }
// Maps the LLVM target string to the enum value
int32_t get_llvm_target_from_str(char* target) {
// TODO: Autogenerate this
// TODO: Add all, not only the ones we support in get_chip_from_target_hsa
CHECK_TGT_STR_START
CHECK_TGT_STR(target, "gfx1010", TARGET_GFX1010)
CHECK_TGT_STR(target, "gfx1011", TARGET_GFX1011)
CHECK_TGT_STR(target, "gfx1012", TARGET_GFX1012)
CHECK_TGT_STR(target, "gfx1013", TARGET_GFX1013)
CHECK_TGT_STR(target, "gfx1030", TARGET_GFX1030)
CHECK_TGT_STR(target, "gfx1031", TARGET_GFX1031)
CHECK_TGT_STR(target, "gfx1032", TARGET_GFX1032)
CHECK_TGT_STR(target, "gfx1033", TARGET_GFX1033)
CHECK_TGT_STR(target, "gfx1034", TARGET_GFX1034)
CHECK_TGT_STR(target, "gfx1035", TARGET_GFX1035)
CHECK_TGT_STR(target, "gfx1036", TARGET_GFX1036)
CHECK_TGT_STR(target, "gfx1100", TARGET_GFX1100)
CHECK_TGT_STR(target, "gfx1101", TARGET_GFX1101)
CHECK_TGT_STR(target, "gfx1102", TARGET_GFX1102)
CHECK_TGT_STR(target, "gfx1103", TARGET_GFX1103)
CHECK_TGT_STR(target, "gfx1200", TARGET_GFX1200)
CHECK_TGT_STR(target, "gfx1201", TARGET_GFX1201)
CHECK_TGT_STR(target, "gfx1250", TARGET_GFX1250)
CHECK_TGT_STR(target, "gfx1251", TARGET_GFX1251)
CHECK_TGT_STR(target, "gfx908", TARGET_GFX908)
CHECK_TGT_STR(target, "gfx90a", TARGET_GFX90A)
CHECK_TGT_STR(target, "gfx942", TARGET_GFX942)
CHECK_TGT_STR(target, "gfx950", TARGET_GFX950)
CHECK_TGT_STR_END
}
struct uarch* get_uarch_from_hsa(struct gpu_info* gpu, char* gpu_name) {
struct uarch* arch = (struct uarch*) emalloc(sizeof(struct uarch));
arch->llvm_target = get_llvm_target_from_str(gpu_name);
if (arch->llvm_target == TARGET_UNKNOWN_HSA) {
printErr("Unknown LLVM target: '%s'", gpu_name);
return NULL;
}
arch->chip_str = NULL;
arch->chip = get_chip_from_target_hsa(arch->llvm_target);
map_chip_to_uarch_hsa(arch);
return arch;
}
bool is_uarch_valid(struct uarch* arch) {
if (arch == NULL) {
printBug("Invalid uarch: arch is NULL");
return false;
}
if (arch->uarch >= UARCH_UNKNOWN && arch->uarch <= UARCH_CDNA4) {
return true;
}
else {
printBug("Invalid uarch: %d", arch->uarch);
return false;
}
}
bool is_cdna(struct uarch* arch) {
return arch->uarch == UARCH_CDNA ||
arch->uarch == UARCH_CDNA2 ||
arch->uarch == UARCH_CDNA3 ||
arch->uarch == UARCH_CDNA4;
}
char* get_str_chip(struct uarch* arch) {
// We don't show CDNA chip names because they add
// no value: each CDNA architecture maps one-to-one
// to a chip.
if (is_cdna(arch)) return NULL;
return arch->chip_str;
}
const char* get_str_uarch_hsa(struct uarch* arch) {
if (!is_uarch_valid(arch)) {
return NULL;
}
return uarch_str[arch->uarch];
}

src/hsa/uarch.hpp

@@ -0,0 +1,13 @@
#ifndef __HSA_UARCH__
#define __HSA_UARCH__
#include "../common/gpu.hpp"
struct uarch;
struct uarch* get_uarch_from_hsa(struct gpu_info* gpu, char* gpu_name);
const char* get_str_uarch_hsa(struct uarch* arch);
char* get_str_process(struct uarch* arch); // TODO: Shouldn't this be defined in the .cpp?
char* get_str_chip(struct uarch* arch);
#endif

@@ -59,13 +59,18 @@ enum {
 CHIP_HD_P630,
 CHIP_IRISP_640,
 CHIP_IRISP_650,
+CHIP_UHD_KBL_GT1,
+CHIP_UHD_KBL_GT2,
 // Gen11
 CHIP_UHD_G1,
 CHIP_IRISP_G4,
 CHIP_IRISP_G7,
 // Gen12
-CHIP_UHD_730,
+CHIP_UHD_710,
+CHIP_UHD_730_ALD,
+CHIP_UHD_730_RKL,
 CHIP_UHD_750,
+CHIP_UHD_770,
 CHIP_XE_G4,
 CHIP_XE_G7
 };

@@ -9,7 +9,13 @@
 #include "../common/global.hpp"
 int64_t get_peak_performance_intel(struct gpu_info* gpu) {
-if(gpu->topo_i->eu_subslice < 0 || gpu->topo_i->subslices < 0) return -1;
+// Check that we have valid data
+if(gpu->topo_i->eu_subslice < 0 ||
+gpu->topo_i->subslices < 0 ||
+gpu->freq <= 0)
+{
+return -1;
+}
 return gpu->freq * 1000000 * gpu->topo_i->eu_subslice * gpu->topo_i->subslices * 8 * 2;
 }
@@ -20,6 +26,7 @@ struct gpu_info* get_gpu_info_intel(struct pci_dev *devices) {
 if(gpu->pci == NULL) {
 // No Intel iGPU found in PCI, which means it is not present
+printWarn("Unable to find a valid device for vendor id 0x%.4X using pciutils", PCI_VENDOR_ID_INTEL);
 return NULL;
 }

@@ -8,12 +8,13 @@
 #define CHECK_PCI_START if (false) {}
 #define CHECK_PCI(pci, id, chip) \
 else if (pci->device_id == id) return chip;
-#define CHECK_PCI_END else { printBug("Unkown Intel device id: 0x%.4X", pci->device_id); return CHIP_UNKNOWN_INTEL; }
+#define CHECK_PCI_END else { printBug("Unknown Intel device id: 0x%.4X", pci->device_id); return CHIP_UNKNOWN_INTEL; }
 // TODO: Review wikipedia link to improve the LUT
 /*
 * https://en.wikipedia.org/wiki/List_of_Intel_graphics_processing_units
 * https://github.com/mesa3d/mesa/blob/main/include/pci_ids/iris_pci_ids.h
+* https://raw.githubusercontent.com/smxi/inxi/master/inxi
 */
 GPUCHIP get_chip_from_pci_intel(struct pci* pci) {
 CHECK_PCI_START
@@ -88,6 +89,7 @@ GPUCHIP get_chip_from_pci_intel(struct pci* pci) {
 CHECK_PCI(pci, 0x3185, CHIP_UHD_600)
 CHECK_PCI(pci, 0x3184, CHIP_UHD_605)
 CHECK_PCI(pci, 0x5917, CHIP_UHD_620)
+CHECK_PCI(pci, 0x3EA0, CHIP_UHD_620)
 CHECK_PCI(pci, 0x3E91, CHIP_UHD_630)
 CHECK_PCI(pci, 0x3E92, CHIP_UHD_630)
 CHECK_PCI(pci, 0x3E98, CHIP_UHD_630)
@@ -112,11 +114,16 @@ GPUCHIP get_chip_from_pci_intel(struct pci* pci) {
 CHECK_PCI(pci, 0x8A51, CHIP_IRISP_G7)
 CHECK_PCI(pci, 0x8A52, CHIP_IRISP_G7)
 CHECK_PCI(pci, 0x8A53, CHIP_IRISP_G7)
-// Gen12
-CHECK_PCI(pci, 0x4C8B, CHIP_UHD_730)
-CHECK_PCI(pci, 0x4C8B, CHIP_UHD_750)
+// Xe (Gen12)
+CHECK_PCI(pci, 0x4693, CHIP_UHD_710)
+CHECK_PCI(pci, 0x4692, CHIP_UHD_730_ALD)
+CHECK_PCI(pci, 0x4C8B, CHIP_UHD_730_RKL)
+CHECK_PCI(pci, 0x4C8A, CHIP_UHD_750)
+CHECK_PCI(pci, 0x4690, CHIP_UHD_770)
+CHECK_PCI(pci, 0x4680, CHIP_UHD_770)
 CHECK_PCI(pci, 0x9A78, CHIP_XE_G4)
 CHECK_PCI(pci, 0x9A40, CHIP_XE_G7) // G7 may have 80 or 96 EUs
 CHECK_PCI(pci, 0x9A49, CHIP_XE_G7) // Same for this G7
+// TODO: Add generic UHD Graphics and Iris Xe Graphics from Mobile
 CHECK_PCI_END
 }

@@ -27,6 +27,7 @@
 * Gen9.5: Kaby Lake
 * Gen11: Ice Lake (10th Gen)
 * Gen12: Rocket/Tiger Lake (11th Gen)
+* Gen12: Alder Lake (12th Gen)
 */
 enum {
 UARCH_UNKNOWN,
@@ -39,6 +40,7 @@ enum {
 UARCH_GEN11,
 UARCH_GEN12_RKL,
 UARCH_GEN12_TGL,
+UARCH_GEN12_ALD,
 };
 static const char *uarch_str[] = {
@@ -50,13 +52,15 @@ static const char *uarch_str[] = {
 /*[ARCH_GEN9] = */ "Gen9",
 /*[ARCH_GEN9_5] = */ "Gen9.5",
 /*[ARCH_GEN11] = */ "Gen11",
-/*[ARCH_GEN12_RKL] = */ "Gen12"
-/*[ARCH_GEN12_TGL] = */ "Gen12"
+/*[ARCH_GEN12_RKL] = */ "Xe",
+/*[ARCH_GEN12_TGL] = */ "Xe",
+/*[ARCH_GEN12_ALD] = */ "Xe",
 };
 // Graphic Tiers (GT)
 enum {
 GT_UNKNOWN,
+GT0_5, // Saw that 0.5 thing in iris_pci_ids.h
 GT1,
 GT1_4, // GT1 with 4 EUs
 GT1_5,
@@ -68,6 +72,7 @@ enum {
 static const char *gt_str[] = {
 /*[GT_UNKNOWN] = */ STRING_UNKNOWN,
+/*[GT0_5] = */ "GT0.5",
 /*[GT1] = */ "GT1",
 /*[GT1_4] = */ "GT1",
 /*[GT1_5] = */ "GT1.5",
@@ -85,6 +90,8 @@ static const char *gt_str[] = {
 #define CHECK_TOPO_START if (false) {}
 #define CHECK_TOPO(topo, arch, uarch_, gt_, eu_sub, sub, sli) \
 else if(arch->uarch == uarch_ && arch->gt == gt_) fill_topo(topo, eu_sub, sub, sli);
+#define CHECK_TOPO_CHIP(topo, arch, uarch_, chip_, eu_sub, sub, sli) \
+else if(arch->uarch == uarch_ && arch->chip == chip_) fill_topo(topo, eu_sub, sub, sli);
 #define CHECK_TOPO_END else { printBug("get_topology_info: Invalid uarch and gt combination: '%s' and '%s'", arch->chip_str, get_str_gt(arch)); fill_topo(topo, UNK, UNK, UNK); }
 void fill_topo(struct topology_i* topo_i, int32_t eu_sub, int32_t sub, int32_t sli) {
@@ -143,6 +150,8 @@ void map_chip_to_uarch_intel(struct uarch* arch) {
 CHECK_UARCH(arch, CHIP_UHD_605, "UHD Graphics 605", UARCH_GEN9_5, GT1_5, 14)
 CHECK_UARCH(arch, CHIP_UHD_620, "UHD Graphics 620", UARCH_GEN9_5, GT2, 14)
 CHECK_UARCH(arch, CHIP_UHD_630, "UHD Graphics 630", UARCH_GEN9_5, GT2, 14)
+CHECK_UARCH(arch, CHIP_UHD_KBL_GT1, "UHD Graphics", UARCH_GEN9_5, GT1, 14)
+CHECK_UARCH(arch, CHIP_UHD_KBL_GT2, "UHD Graphics", UARCH_GEN9_5, GT2, 14)
 CHECK_UARCH(arch, CHIP_HD_610, "HD Graphics 610", UARCH_GEN9_5, GT1, 14)
 CHECK_UARCH(arch, CHIP_HD_615, "HD Graphics 615", UARCH_GEN9_5, GT2, 14)
 CHECK_UARCH(arch, CHIP_HD_630, "HD Graphics 630", UARCH_GEN9_5, GT2, 14)
@@ -153,8 +162,11 @@ void map_chip_to_uarch_intel(struct uarch* arch) {
 CHECK_UARCH(arch, CHIP_UHD_G1, "UHD Graphics G1", UARCH_GEN11, GT1, 10)
 CHECK_UARCH(arch, CHIP_IRISP_G4, "Iris Plus Graphics G4", UARCH_GEN11, GT1_5, 10)
 CHECK_UARCH(arch, CHIP_IRISP_G7, "Iris Plus Graphics G7", UARCH_GEN11, GT2, 10)
-// Gen12
-CHECK_UARCH(arch, CHIP_UHD_730, "UHD Graphics 730", UARCH_GEN12_RKL, GT1, 14)
+// Xe (Gen12)
+CHECK_UARCH(arch, CHIP_UHD_710, "UHD Graphics 710", UARCH_GEN12_ALD, GT1, 10)
+CHECK_UARCH(arch, CHIP_UHD_730_ALD, "UHD Graphics 730", UARCH_GEN12_ALD, GT1, 10)
+CHECK_UARCH(arch, CHIP_UHD_770, "UHD Graphics 770", UARCH_GEN12_ALD, GT1, 10)
+CHECK_UARCH(arch, CHIP_UHD_730_RKL, "UHD Graphics 730", UARCH_GEN12_RKL, GT1, 14)
 CHECK_UARCH(arch, CHIP_UHD_750, "UHD Graphics 750", UARCH_GEN12_RKL, GT1, 14)
 CHECK_UARCH(arch, CHIP_XE_G4, "Iris Xe G4", UARCH_GEN12_TGL, GT2, 10)
 CHECK_UARCH(arch, CHIP_XE_G7, "Iris Xe G7", UARCH_GEN12_TGL, GT2, 10)
@@ -201,6 +213,8 @@ char* get_name_from_uarch(struct uarch* arch) {
 * Gen9.5: https://en.wikichip.org/wiki/intel/microarchitectures/gen9.5#Configuration
 * Also: https://www.techpowerup.com/gpu-specs/intel-rocket-lake-gt1.g993
+https://www.techpowerup.com/gpu-specs/?architecture=Generation%2012.1
+https://elixir.bootlin.com/linux/latest/source/include/drm/i915_pciids.h
 */
 struct topology_i* get_topology_info(struct uarch* arch) {
 struct topology_i* topo = (struct topology_i*) emalloc(sizeof(struct topology_i));
@@ -238,9 +252,13 @@ struct topology_i* get_topology_info(struct uarch* arch) {
 CHECK_TOPO(topo, arch, UARCH_GEN11, GT1, 8, 4, 1)
 CHECK_TOPO(topo, arch, UARCH_GEN11, GT1_5, 8, 6, 1)
 CHECK_TOPO(topo, arch, UARCH_GEN11, GT2, 8, 8, 1)
-// Gen12
-CHECK_TOPO(topo, arch, UARCH_GEN12_RKL, GT1, 16, 2, 1)
-else if(arch->uarch == UARCH_GEN12_TGL && arch->gt == GT2) {
+// Xe (Gen12)
+// NOTE: Instead of checking for uarch + graphics tier,
+// we have to check for uarch + exact chip
+CHECK_TOPO_CHIP(topo, arch, UARCH_GEN12_RKL, CHIP_UHD_730_RKL, 8, 3, 1)
+CHECK_TOPO_CHIP(topo, arch, UARCH_GEN12_RKL, CHIP_UHD_750, 8, 4, 1)
+CHECK_TOPO_CHIP(topo, arch, UARCH_GEN12_TGL, CHIP_XE_G4, 8, 6, 1)
+else if(arch->uarch == UARCH_GEN12_TGL && arch->chip == CHIP_XE_G7) {
 // Special case: TigerLake GT2 needs to check if is i5/i7 to know the exact topology
 if(is_corei5()) {
 fill_topo(topo, 10, 8, 1); // Should be 80 EUs, but not sure about the organization
@@ -249,6 +267,10 @@ struct topology_i* get_topology_info(struct uarch* arch) {
 fill_topo(topo, 16, 6, 1);
 }
 }
+CHECK_TOPO_CHIP(topo, arch, UARCH_GEN12_ALD, CHIP_UHD_710, 8, 2, 1)
+CHECK_TOPO_CHIP(topo, arch, UARCH_GEN12_ALD, CHIP_UHD_730_ALD, 8, 3, 1)
+CHECK_TOPO_CHIP(topo, arch, UARCH_GEN12_ALD, CHIP_UHD_770, 8, 4, 1)
+// TODO: Add ALD UHD Graphics/Xe Graphics
 CHECK_TOPO_END
 return topo;
 }