AMD - ROCm

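ROCm Execution Provider

The ROCm Execution Provider enables hardware-accelerated computation on AMD ROCm-enabled GPUs.

Note: Starting with ROCm 7.1, Microsoft will no longer provide support for the ROCm Execution Provider.

Please migrate your applications to the MIGraphX Execution Provider.

ROCm 7.0 is the last distribution of this provider officially supported by AMD; all later builds (ROCm 7.1 and onward) will have the ROCm EP removed.

For background, refer to this pull request.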
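One way to ease the migration is to build the provider list at runtime, preferring MIGraphX and falling back to ROCm and then CPU. The sketch below assumes the provider names reported by `onnxruntime.get_available_providers()` are `MIGraphXExecutionProvider`, `ROCMExecutionProvider`, and `CPUExecutionProvider`; the helpers `choose_providers` and `create_session` are hypothetical names introduced for illustration.

```python
# Priority order: migration target first, legacy ROCm EP second, CPU last.
PREFERRED = (
    "MIGraphXExecutionProvider",
    "ROCMExecutionProvider",
    "CPUExecutionProvider",
)


def choose_providers(available):
    """Return the preferred providers that the installed build offers, in priority order."""
    chosen = [name for name in PREFERRED if name in available]
    # Fall back to CPU if none of the preferred names were reported.
    return chosen or ["CPUExecutionProvider"]


def create_session(model_path):
    """Create a session with whichever preferred providers are available."""
    import onnxruntime as ort  # imported lazily so choose_providers has no dependencies

    providers = choose_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)
```

With this approach the same application code keeps working on a ROCm 7.0 build (ROCm EP present) and on later builds where only MIGraphX is available.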

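Install

Note: Make sure to install the proper version of PyTorch, as specified in PyTorch Version.

For nightly PyTorch builds, see the PyTorch home page and select ROCm as the compute platform.

Pre-built binaries of ONNX Runtime with the ROCm EP are published for most language bindings. Please refer to Install ORT.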

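Build from source

For build instructions, see the build page. Pre-built .whl files are listed in the Requirements section below and are hosted on repo.radeon.com. An Ubuntu-based Docker development environment is described in the Docker Support section. New wheels and Docker images are published with each ROCm release.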

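Requirements

Below is the matrix of supported ROCm versions corresponding to Ubuntu builds.

As of ROCm 6.0.2, links to pre-built Python wheels (.whl) are listed below, matched to the host OS Python version based on Ubuntu support. All links can be found on AMD's repo.radeon manylinux page for the corresponding ROCm release.

ROCm 7.0 is the last officially supported AMD release that includes the ROCm Execution Provider. Use the MIGraphX Execution Provider for your applications instead.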

ONNX Runtime Version	ROCm Version	Python 3.8	Python 3.9	Python 3.10	Python 3.12
---	---	---	---	---	---
1.22.1	7.0			3.10	3.12
1.21	6.4.4		3.9	3.10	3.12
1.21	6.4.3		3.9	3.10	3.12
1.21	6.4.2		3.9	3.10	3.12
1.21	6.4.1		3.9	3.10	3.12
1.21	6.4			3.10	3.12
1.19	6.3.1			3.10	3.12
1.19	6.3			3.10	3.12
1.18	6.2.4			3.10
1.18	6.2.3			3.10
1.18	6.2	3.8		3.10
1.17	6.1.3			3.10
1.17	6.1	3.8		3.10
1.17	6.0.2			3.10
1.17	6.0 5.7
1.16	5.6 5.5 5.4.2
1.15	5.4.2 5.4 5.3.2
1.14	5.4 5.3.2
1.13	5.4 5.3.2
1.12	5.2.3 5.2
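For scripted environment checks, the rows above that ship wheels (ROCm 6.0.2 and later) can be captured in a small lookup table. This is only an illustrative sketch: the mapping is copied from the matrix, and the names `ORT_VERSION_FOR_ROCM` and `ort_version_for` are hypothetical.

```python
# ROCm release -> ONNX Runtime version, copied from the matrix rows that list wheels.
ORT_VERSION_FOR_ROCM = {
    "7.0": "1.22.1",
    "6.4.4": "1.21", "6.4.3": "1.21", "6.4.2": "1.21", "6.4.1": "1.21", "6.4": "1.21",
    "6.3.1": "1.19", "6.3": "1.19",
    "6.2.4": "1.18", "6.2.3": "1.18", "6.2": "1.18",
    "6.1.3": "1.17", "6.1": "1.17", "6.0.2": "1.17",
}


def ort_version_for(rocm_version):
    """Return the matching ONNX Runtime version, or None if the release is not listed."""
    return ORT_VERSION_FOR_ROCM.get(rocm_version)
```

A release such as 7.1 returns `None`, which matches the note that the ROCm EP is not shipped after ROCm 7.0.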

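Docker Support

For simple workloads and prototyping, AMD builds Ubuntu-based Docker images using the latest ROCm release and a supported ROCm-PyTorch build. They can be found on ROCm Docker Hub.

The intent is to let users get started quickly with custom Python workloads by providing an environment with the pre-built ROCm, ONNX Runtime, and MIGraphX packages, so that building ONNX Runtime is not required.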

Configuration Options

The ROCm Execution Provider supports the following configuration options.

device_id

The device ID.

Default value: 0

tunable_op_enable

Set to use TunableOp.

Default value: false

tunable_op_tuning_enable

Set TunableOp to attempt online tuning.

Default value: false
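The three options above can be passed from Python as a provider-options dictionary. The sketch below is a minimal illustration, not a definitive recipe: the helpers `rocm_provider_options` and `create_session` are hypothetical, and encoding booleans as the strings "1" and "0" is an assumption about how the option parser reads them.

```python
def rocm_provider_options(device_id=0, tunable_op=False, online_tuning=False):
    """Build the options dict for the ROCm EP; all values are passed as strings."""
    return {
        "device_id": str(device_id),
        # Assumption: boolean options accept "1"/"0".
        "tunable_op_enable": "1" if tunable_op else "0",
        "tunable_op_tuning_enable": "1" if online_tuning else "0",
    }


def create_session(model_path, **kwargs):
    """Create a session on the ROCm EP with a CPU fallback."""
    import onnxruntime as ort  # imported lazily so the helper above has no dependencies

    providers = [
        ("ROCMExecutionProvider", rocm_provider_options(**kwargs)),
        "CPUExecutionProvider",
    ]
    return ort.InferenceSession(model_path, providers=providers)
```

For example, `create_session("my_model.onnx", device_id=1, tunable_op=True)` would request TunableOp on the second GPU without online tuning.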

user_compute_stream

Defines the compute stream for the inference to run on. It implicitly sets the has_user_compute_stream option. It cannot be set through UpdateROCMProviderOptions. This cannot be used in combination with an external allocator.

Example Python usage:

import onnxruntime as ort
import torch

# Pass the raw handle of PyTorch's current stream so ORT runs on the same stream.
providers = [("ROCMExecutionProvider", {"device_id": torch.cuda.current_device(),
                                        "user_compute_stream": str(torch.cuda.current_stream().cuda_stream)})]
sess_options = ort.SessionOptions()
sess = ort.InferenceSession("my_model.onnx", sess_options=sess_options, providers=providers)

To take advantage of a user compute stream, it is recommended to use I/O Binding to bind inputs and outputs to tensors on the device.
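A minimal sketch of such a binding follows. It assumes `x` and `y` are contiguous PyTorch tensors already allocated on the same GPU, and that ROCm builds accept the device type name "cuda" (an assumption carried over from the `torch.cuda` API used above). The helper name `bind_device_tensors` is hypothetical; `io_binding`, `bind_input`, and `bind_output` are the ONNX Runtime Python I/O Binding calls.

```python
def bind_device_tensors(session, input_name, x, output_name, y, element_type):
    """Bind GPU-resident tensors as model input and output, avoiding host copies."""
    # torch reports index None for a default device; treat that as device 0.
    device_id = x.device.index or 0
    binding = session.io_binding()
    binding.bind_input(
        name=input_name,
        device_type="cuda",  # assumption: ROCm builds reuse the "cuda" device name
        device_id=device_id,
        element_type=element_type,
        shape=tuple(x.shape),
        buffer_ptr=x.data_ptr(),
    )
    binding.bind_output(
        name=output_name,
        device_type="cuda",
        device_id=device_id,
        element_type=element_type,
        shape=tuple(y.shape),
        buffer_ptr=y.data_ptr(),
    )
    return binding
```

After binding, `session.run_with_iobinding(binding)` writes the result directly into `y`. For float32 tensors, pass `numpy.float32` as `element_type`.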

do_copy_in_default_stream

Whether to do copies in the default stream or use separate streams. The recommended setting is true. Setting it to false may improve performance, but it can introduce race conditions.

Default value: true

gpu_mem_limit

The size limit of the device memory arena in bytes. This limit applies only to the execution provider's arena; total device memory usage may be higher.

Default value: max value of C++ size_t type (effectively unlimited)

Note: Will be overridden by the contents of default_memory_arena_cfg (if specified)

arena_extend_strategy

The strategy for extending the device memory arena.

Value	Description
kNextPowerOfTwo (0)	subsequent extensions extend by larger amounts (multiplied by powers of two)
kSameAsRequested (1)	extend by the requested amount

Default value: kNextPowerOfTwo

Note: Will be overridden by the contents of default_memory_arena_cfg (if specified)
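As an illustration of the two arena options, the sketch below converts a limit in GiB to bytes and selects an extension strategy by name. The helper `arena_options` is hypothetical, and passing the strategy as its enum name string is an assumption about the accepted spelling.

```python
GIB = 1024 ** 3  # bytes per gibibyte


def arena_options(limit_gib, same_as_requested=False):
    """Build arena-related provider options; gpu_mem_limit is expressed in bytes."""
    strategy = "kSameAsRequested" if same_as_requested else "kNextPowerOfTwo"
    return {
        "gpu_mem_limit": str(int(limit_gib * GIB)),
        "arena_extend_strategy": strategy,
    }
```

For example, `("ROCMExecutionProvider", arena_options(2))` caps the provider's arena at 2 GiB while keeping the default growth strategy.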

gpu_external_[alloc|free|empty_cache]

gpu_external_* is used to pass external allocators. Example Python usage:

from onnxruntime.training.ortmodule.torch_cpp_extensions import torch_gpu_allocator

# Provider options are passed as strings, so each function address is converted with str().
provider_option_map = {}
provider_option_map["gpu_external_alloc"] = str(torch_gpu_allocator.gpu_caching_allocator_raw_alloc_address())
provider_option_map["gpu_external_free"] = str(torch_gpu_allocator.gpu_caching_allocator_raw_delete_address())
provider_option_map["gpu_external_empty_cache"] = str(torch_gpu_allocator.gpu_caching_allocator_empty_cache_address())

Default value: 0

Usage

C/C++

Ort::Env env = Ort::Env{ORT_LOGGING_LEVEL_ERROR, "Default"};
Ort::SessionOptions so;
int device_id = 0;
Ort::ThrowOnError(OrtSessionOptionsAppendExecutionProvider_ROCm(so, device_id));

The C API details are here.

Python

The Python API details are here.

Samples

Python

import onnxruntime as ort

model_path = '<path to model>'

providers = [
    'ROCMExecutionProvider',
    'CPUExecutionProvider',
]

session = ort.InferenceSession(model_path, providers=providers)
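To go one step further and run the model, a sketch like the following can be used. It assumes the model has a single input and that `array` is a NumPy array of the right shape and dtype. The helper name `run_once` is hypothetical; `get_providers`, `get_inputs`, and `run` are ONNX Runtime session methods.

```python
def run_once(session, array):
    """Feed one array to the model's first input and return all outputs."""
    active = session.get_providers()
    if "ROCMExecutionProvider" not in active:
        # The session fell back to another provider, typically CPU.
        print("warning: ROCm EP not active; running on", active)
    input_name = session.get_inputs()[0].name
    # Passing None as the output list requests every model output.
    return session.run(None, {input_name: array})
```

Checking `get_providers()` before running is a cheap way to notice that the ROCm EP was not registered in the installed build.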
