ONNX Runtime GenAI C++ API
Note: this API is in preview and is subject to change.
Overview
This document describes the C++ API for ONNX Runtime GenAI.
Below are the main classes and methods, with code snippets and descriptions for each.
OgaModel
Create
Creates a model from a model directory (the folder that contains genai_config.json), optionally with runtime settings, or from an OgaConfig object.
auto model = OgaModel::Create("path/to/model_dir");
auto model2 = OgaModel::Create("path/to/model_dir", *settings);
auto model3 = OgaModel::Create(*config);
GetType
Gets the type of the model.
auto type = model->GetType();
GetDeviceType
Gets the device type used by the model.
auto device_type = model->GetDeviceType();
OgaConfig
Create
Creates a configuration object from a model directory. The configuration can be modified before it is passed to OgaModel::Create.
auto config = OgaConfig::Create("path/to/model_dir");
ClearProviders
Clears all execution providers from the configuration.
config->ClearProviders();
AppendProvider
Appends an execution provider to the configuration.
config->AppendProvider("CUDAExecutionProvider");
SetProviderOption
Sets a provider option, given as a name/value pair, for the specified provider.
config->SetProviderOption("CUDAExecutionProvider", "device_id", "0");
Overlay
Overlays a JSON string onto the configuration; values in the JSON override the corresponding settings loaded from the model directory.
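A minimal sketch of building an overlay string in code rather than by hand, assuming the overlay follows the layout of the model's genai_config.json and that search settings live under a "search" object. The key names are assumptions to check against your own config, and MakeSearchOverlay is a hypothetical helper, not part of the API.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds a JSON overlay that changes two search settings.
// The "search", "max_length", and "do_sample" keys are assumptions about the
// genai_config.json layout; verify them against your model's config file.
std::string MakeSearchOverlay(int max_length, bool do_sample) {
  return std::string("{\"search\": {\"max_length\": ") + std::to_string(max_length) +
         ", \"do_sample\": " + (do_sample ? "true" : "false") + "}}";
}
```

Under those assumptions the call would be config->Overlay(MakeSearchOverlay(256, false).c_str()); generating the string avoids hand-escaping quotes.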
config->Overlay("{\"option\": \"value\"}");
OgaRuntimeSettings
Create
Creates a runtime settings object.
auto settings = OgaRuntimeSettings::Create();
SetHandle
Sets a named handle (an opaque pointer) in the runtime settings.
settings->SetHandle("custom_handle", handle_ptr);
OgaTokenizer
Create
Creates a tokenizer for the given model.
auto tokenizer = OgaTokenizer::Create(*model);
Encode
Encodes a string and adds the encoded sequence of tokens to the provided OgaSequences.
auto sequences = OgaSequences::Create();
tokenizer->Encode("Hello world", *sequences);
EncodeBatch
Encodes a batch of strings and returns the token IDs as a tensor.
const char* texts[] = {"Hello", "World"};
auto tensor = tokenizer->EncodeBatch(texts, 2);
ToTokenId
Converts a string to its corresponding token ID.
int32_t token_id = tokenizer->ToTokenId("Hello");
Decode
Decodes a sequence of tokens into a string.
auto str = tokenizer->Decode(tokens, token_count);
ApplyChatTemplate
Applies a chat template to a JSON string of messages (and, optionally, tools) and returns the formatted prompt. The final argument controls whether the generation prompt is appended.
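The messages argument is passed as a JSON string. As a sketch, assuming the chat template expects an array of objects with "role" and "content" fields (the common chat-message layout; confirm against your model's template), the hypothetical helpers JsonEscape and BuildMessagesJson assemble that string and escape quotes, backslashes, and control characters so that user text cannot break the JSON.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: escapes a UTF-8 string for use inside a JSON string literal.
std::string JsonEscape(const std::string& s) {
  std::string out;
  out.reserve(s.size() + 2);
  for (unsigned char c : s) {
    switch (c) {
      case '"':  out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n"; break;
      case '\r': out += "\\r"; break;
      case '\t': out += "\\t"; break;
      default:
        if (c < 0x20) {
          // Remaining control characters must use the \uXXXX form.
          char buf[7];
          std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
          out += buf;
        } else {
          out += static_cast<char>(c);  // multi-byte UTF-8 passes through unchanged
        }
    }
  }
  return out;
}

// Hypothetical helper: builds [{"role": ..., "content": ...}, ...] from (role, content) pairs.
std::string BuildMessagesJson(const std::vector<std::pair<std::string, std::string>>& messages) {
  std::string json = "[";
  for (size_t i = 0; i < messages.size(); ++i) {
    if (i > 0) json += ", ";
    json += "{\"role\": \"" + JsonEscape(messages[i].first) +
            "\", \"content\": \"" + JsonEscape(messages[i].second) + "\"}";
  }
  json += "]";
  return json;
}
```

The result would be passed as the messages argument, for example BuildMessagesJson(history).c_str().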
auto templated = tokenizer->ApplyChatTemplate(template_str, messages_json, tools_json, /*add_generation_prompt=*/true);
DecodeBatch
Decodes a batch of token sequences, such as the tensor returned by EncodeBatch, into strings.
auto decoded = tokenizer->DecodeBatch(*tensor);
OgaTokenizerStream
Create
Creates a tokenizer stream for incremental decoding.
auto stream = OgaTokenizerStream::Create(*tokenizer);
Decode
Decodes a single token in the stream and returns the text it completes, which may be empty. The caller concatenates the returned chunks to build the full output; each returned string is valid only until the next call to Decode.
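Decode does not necessarily return text for every token. One plausible reason is that a token can end partway through a multi-byte UTF-8 character, leaving nothing printable to emit yet. The sketch models only that case with a hypothetical ToyTokenizerStream over a made-up vocabulary of byte pieces; the real stream's internals are not described here and may handle more cases.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy stream: each token ID indexes a byte piece in `vocab`.
// Decode holds back an incomplete trailing UTF-8 character and returns only
// complete characters, so some calls return an empty chunk.
class ToyTokenizerStream {
 public:
  explicit ToyTokenizerStream(std::vector<std::string> vocab) : vocab_(std::move(vocab)) {}

  // Returns newly completed text (possibly empty); the caller concatenates chunks.
  std::string Decode(int32_t token) {
    pending_ += vocab_.at(static_cast<size_t>(token));
    size_t complete = CompletePrefixLength(pending_);
    std::string chunk = pending_.substr(0, complete);
    pending_.erase(0, complete);
    return chunk;
  }

 private:
  // Length of the longest prefix of `s` that ends on a UTF-8 character boundary.
  static size_t CompletePrefixLength(const std::string& s) {
    size_t i = 0;
    while (i < s.size()) {
      unsigned char lead = static_cast<unsigned char>(s[i]);
      size_t len = 1;                          // ASCII or invalid lead byte
      if ((lead >> 5) == 0x6) len = 2;         // 110xxxxx
      else if ((lead >> 4) == 0xE) len = 3;    // 1110xxxx
      else if ((lead >> 3) == 0x1E) len = 4;   // 11110xxx
      if (i + len > s.size()) break;           // character not finished yet
      i += len;
    }
    return i;
  }

  std::vector<std::string> vocab_;
  std::string pending_;
};
```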
const char* chunk = stream->Decode(token);
OgaSequences
Create
Creates an empty OgaSequences object.
auto sequences = OgaSequences::Create();
Count
Returns the number of sequences.
size_t n = sequences->Count();
SequenceCount
Returns the number of tokens in the sequence at the given index.
size_t token_count = sequences->SequenceCount(0);
SequenceData
Returns a pointer to the token data for the sequence at the given index.
const int32_t* data = sequences->SequenceData(0);
Append
Appends tokens. The first overload adds a new sequence of token_count tokens; the second appends a single token to the sequence at sequence_index.
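To make the two overloads concrete, here is a hypothetical ToySequences class that mirrors the documented calls on top of std::vector, assuming that Append(tokens, token_count) adds a new sequence and Append(token, sequence_index) extends an existing one. In this toy, the pointer returned by SequenceData is invalidated by a later Append, so it is safest not to hold the real pointer across Append calls either.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in mirroring the OgaSequences calls (not the real class).
class ToySequences {
 public:
  size_t Count() const { return seqs_.size(); }
  size_t SequenceCount(size_t index) const { return seqs_.at(index).size(); }
  const int32_t* SequenceData(size_t index) const { return seqs_.at(index).data(); }

  // Adds a new sequence built from a buffer of token_count tokens.
  void Append(const int32_t* tokens, size_t token_count) {
    seqs_.emplace_back(tokens, tokens + token_count);
  }
  // Extends the existing sequence at sequence_index by one token.
  void Append(int32_t token, size_t sequence_index) {
    seqs_.at(sequence_index).push_back(token);
  }

 private:
  std::vector<std::vector<int32_t>> seqs_;
};
```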
sequences->Append(tokens, token_count);    // add a new sequence
sequences->Append(token, sequence_index);  // extend an existing sequence
OgaGeneratorParams
Create
Creates generator parameters for the given model.
auto params = OgaGeneratorParams::Create(*model);
SetSearchOption
Sets a numeric search option, such as max_length.
params->SetSearchOption("max_length", 128);
SetSearchOptionBool
Sets a boolean search option, such as do_sample.
params->SetSearchOptionBool("do_sample", true);
SetModelInput
Sets an additional named model input from a tensor.
params->SetModelInput("input_name", *tensor);
SetInputs
Sets the model inputs from an OgaNamedTensors object, such as the one returned by OgaMultiModalProcessor.
params->SetInputs(*named_tensors);
SetGuidance
Sets guidance data, given as a guidance type and its data string.
params->SetGuidance("type", "data");
OgaGenerator
Create
Creates a generator from the given model and parameters.
auto generator = OgaGenerator::Create(*model, *params);
IsDone
Checks whether generation is complete.
bool done = generator->IsDone();
AppendTokenSequences
Appends token sequences, such as an encoded prompt, to the generator.
generator->AppendTokenSequences(*sequences);
AppendTokens
Appends token_count tokens from a buffer to the generator.
generator->AppendTokens(tokens, token_count);
IsSessionTerminated
Checks whether the session has been terminated, for example through the terminate_session runtime option.
bool terminated = generator->IsSessionTerminated();
GenerateNextToken
Runs one generation step, producing the next token.
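The usual pattern is a loop: call GenerateNextToken until IsDone returns true, and after each step read the newest token as the last element of the sequence (GetSequenceData(0)[GetSequenceCount(0) - 1]), for example to pass it to OgaTokenizerStream's Decode. So that the sketch runs without a model, ToyGenerator is a hypothetical stand-in that replays a fixed script of token IDs and, by assumption, stops at an end-of-sequence ID or at max_length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a generator (not the real OgaGenerator).
class ToyGenerator {
 public:
  ToyGenerator(std::vector<int32_t> script, int32_t eos_id, size_t max_length)
      : script_(std::move(script)), eos_id_(eos_id), max_length_(max_length) {}

  void AppendTokens(const int32_t* tokens, size_t count) {
    tokens_.insert(tokens_.end(), tokens, tokens + count);
  }
  // Done once max_length tokens exist or the newest generated token is EOS.
  bool IsDone() const {
    return tokens_.size() >= max_length_ || (generated_ > 0 && tokens_.back() == eos_id_);
  }
  // "Generates" the next scripted token and appends it to the sequence.
  void GenerateNextToken() { tokens_.push_back(script_.at(generated_++)); }
  size_t GetSequenceCount(size_t /*index*/) const { return tokens_.size(); }
  const int32_t* GetSequenceData(size_t /*index*/) const { return tokens_.data(); }

 private:
  std::vector<int32_t> script_;
  int32_t eos_id_;
  size_t max_length_;
  size_t generated_ = 0;
  std::vector<int32_t> tokens_;
};

// Generation loop: step until done, collecting the newest token after each step.
std::vector<int32_t> RunGenerationLoop(ToyGenerator& generator) {
  std::vector<int32_t> new_tokens;
  while (!generator.IsDone()) {
    generator.GenerateNextToken();
    size_t count = generator.GetSequenceCount(0);
    new_tokens.push_back(generator.GetSequenceData(0)[count - 1]);  // last = newest
  }
  return new_tokens;
}
```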
generator->GenerateNextToken();
RewindTo
Rewinds the generator's sequence to new_length tokens, discarding the tokens after that point.
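A common reason to rewind is reusing the part of the context that has not changed, such as a shared prompt prefix. The sketch computes how many leading tokens two sequences share with a hypothetical helper, CommonPrefixLength. The commented usage assumes RewindTo keeps exactly the first new_length tokens and that the model supports rewinding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: length of the longest common prefix of two token sequences.
size_t CommonPrefixLength(const std::vector<int32_t>& current, const std::vector<int32_t>& next) {
  size_t n = 0;
  while (n < current.size() && n < next.size() && current[n] == next[n]) ++n;
  return n;
}

// Assumed usage (not compiled here):
//   size_t keep = CommonPrefixLength(current_tokens, next_tokens);
//   generator->RewindTo(keep);
//   if (keep < next_tokens.size())
//     generator->AppendTokens(next_tokens.data() + keep, next_tokens.size() - keep);
```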
generator->RewindTo(new_length);
SetRuntimeOption
Sets a runtime option, given as a key/value string pair.
generator->SetRuntimeOption("terminate_session", "1");
GetSequenceCount
Returns the number of tokens in the sequence at the given index.
size_t count = generator->GetSequenceCount(0);
GetSequenceData
Returns a pointer to the token IDs of the sequence at the given index.
const int32_t* data = generator->GetSequenceData(0);
GetOutput
Gets a model output tensor by name.
auto tensor = generator->GetOutput("output_name");
GetLogits
Gets the logits tensor.
auto logits = generator->GetLogits();
SetLogits
Sets the logits tensor, replacing the logits that the next token is selected from.
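GetLogits and SetLogits allow a custom step between the forward pass and token selection. As a minimal sketch, assuming the logits for one sequence have already been copied out of the tensor into a std::vector<float> of vocabulary size (check Type() and Shape() on the real tensor first), the hypothetical MaskAndArgmax bans some token IDs by setting their logits to negative infinity and then selects greedily. The edited values would then be written back into a tensor and passed to SetLogits.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical logit-processing step: forbid the given token IDs, then return
// the index of the highest remaining logit (greedy selection).
size_t MaskAndArgmax(std::vector<float>& logits, const std::vector<size_t>& banned_ids) {
  assert(!logits.empty());
  for (size_t id : banned_ids) {
    if (id < logits.size()) logits[id] = -std::numeric_limits<float>::infinity();
  }
  size_t best = 0;
  for (size_t i = 1; i < logits.size(); ++i) {
    if (logits[i] > logits[best]) best = i;
  }
  return best;
}
```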
generator->SetLogits(*tensor);
SetActiveAdapter
Activates an adapter, previously loaded into an OgaAdapters object, for this generator.
generator->SetActiveAdapter(*adapters, "adapter_name");
OgaTensor
Create
Creates a tensor from a caller-owned buffer. The tensor does not take ownership of the buffer, so the buffer must remain valid for the tensor's lifetime.
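Create takes the shape as a pointer plus a dimension count (shape_dims_count), and the buffer must hold at least as many elements as the product of the dimensions. The hypothetical helpers ElementCount and Offset2D compute that count and, assuming a dense row-major layout (the usual ONNX Runtime convention), the flat offset of an element in a 2-D tensor such as a [batch, sequence_length] token tensor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of elements described by a shape; an empty shape (a scalar) has one.
size_t ElementCount(const std::vector<int64_t>& shape) {
  size_t count = 1;
  for (int64_t dim : shape) {
    assert(dim >= 0);
    count *= static_cast<size_t>(dim);
  }
  return count;
}

// Row-major offset of element (row, col) in a 2-D tensor of shape {rows, cols}.
size_t Offset2D(const std::vector<int64_t>& shape, size_t row, size_t col) {
  assert(shape.size() == 2);
  return row * static_cast<size_t>(shape[1]) + col;
}
```

With a std::vector<int64_t> shape, the pointer and count arguments would be shape.data() and shape.size().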
auto tensor = OgaTensor::Create(data, shape, shape_dims_count, element_type);
Type
Returns the element type of the tensor.
auto type = tensor->Type();
Shape
Returns the shape of the tensor.
auto shape = tensor->Shape();
Data
Returns a pointer to the tensor data.
void* data = tensor->Data();
OgaImages
Load
Loads images from file paths or memory buffers.
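For the memory-buffer overload of Load, the caller supplies parallel arrays of data pointers and byte sizes plus a count. A sketch of preparing them, assuming the overload takes pointers to the encoded file bytes (for example, PNG data) and their sizes; ReadFileBytes, BufferViews, and MakeBufferViews are hypothetical helpers. The same preparation would apply to OgaAudios.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Reads a whole file into memory; returns an empty vector if it cannot be opened.
std::vector<uint8_t> ReadFileBytes(const std::string& path) {
  std::ifstream file(path, std::ios::binary);
  if (!file) return {};
  return std::vector<uint8_t>(std::istreambuf_iterator<char>(file),
                              std::istreambuf_iterator<char>());
}

// Parallel arrays of data pointers and byte sizes over caller-owned buffers.
struct BufferViews {
  std::vector<const void*> data_ptrs;
  std::vector<size_t> sizes;
};

// The buffers must outlive the views (and any use of the pointers by the API).
BufferViews MakeBufferViews(const std::vector<std::vector<uint8_t>>& buffers) {
  BufferViews views;
  for (const auto& buf : buffers) {
    views.data_ptrs.push_back(buf.data());
    views.sizes.push_back(buf.size());
  }
  return views;
}
```

Under those assumptions the call would look like OgaImages::Load(views.data_ptrs.data(), views.sizes.data(), views.sizes.size()), with the byte buffers kept alive while the API may read them.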
std::vector<const char*> image_paths = {"img1.png", "img2.png"};
auto images = OgaImages::Load(image_paths);
auto images2 = OgaImages::Load(image_data_ptrs, image_sizes, count);
OgaAudios
Load
Loads audios from file paths or memory buffers.
std::vector<const char*> audio_paths = {"audio1.wav", "audio2.wav"};
auto audios = OgaAudios::Load(audio_paths);
auto audios2 = OgaAudios::Load(audio_data_ptrs, audio_sizes, count);
OgaNamedTensors
Create
Creates a named tensors object.
auto named_tensors = OgaNamedTensors::Create();
Get
Gets a tensor by name.
auto tensor = named_tensors->Get("input_name");
Set
Sets a tensor by name.
named_tensors->Set("input_name", *tensor);
Delete
Deletes a tensor by name.
named_tensors->Delete("input_name");
Count
Returns the number of named tensors.
size_t count = named_tensors->Count();
GetNames
Gets the names of all tensors.
auto names = named_tensors->GetNames();
OgaAdapters
Create
Creates an adapters manager for the given model.
auto adapters = OgaAdapters::Create(*model);
LoadAdapter
Loads an adapter from a file and registers it under the given name.
adapters->LoadAdapter("adapter_file_path", "adapter_name");
UnloadAdapter
Unloads an adapter by name.
adapters->UnloadAdapter("adapter_name");
OgaMultiModalProcessor
Create
Creates a multi-modal processor for the given model.
auto processor = OgaMultiModalProcessor::Create(*model);
ProcessImages
Processes a prompt together with images and returns the model inputs as named tensors.
auto named_tensors = processor->ProcessImages("prompt", images.get());
ProcessAudios
Processes audios and returns the model inputs as named tensors.
auto named_tensors = processor->ProcessAudios(audios.get());
ProcessImagesAndAudios
Processes a prompt together with both images and audios and returns named tensors.
auto named_tensors = processor->ProcessImagesAndAudios("prompt", images.get(), audios.get());
Decode
Decodes a sequence of tokens into a string.
auto str = processor->Decode(tokens, token_count);
OgaHandle
Constructor / Destructor
Initializes and shuts down the global Oga runtime. Declare one OgaHandle at the start of the program, before creating any other Oga object; the runtime is shut down when the handle is destroyed.
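Where the handle is declared matters because of C++ destruction order: local objects are destroyed in reverse order of construction, so a handle declared at the top of main, before any other Oga object, outlives every model, tokenizer, and generator created after it. The sketch demonstrates that ordering with hypothetical RuntimeGuard and Resource types that only record events; it does not call the Oga API.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Records destruction events for the demonstration.
std::vector<std::string> event_log;

// Hypothetical stand-in for a handle whose destructor shuts the runtime down.
struct RuntimeGuard {
  ~RuntimeGuard() { event_log.push_back("shutdown"); }
};

// Hypothetical stand-in for a model, tokenizer, or other runtime object.
struct Resource {
  explicit Resource(std::string name) : name_(std::move(name)) {}
  ~Resource() { event_log.push_back("release " + name_); }
  std::string name_;
};

// The guard is declared first, so it is destroyed last.
void RunProgram() {
  RuntimeGuard handle;
  Resource model("model");
  Resource tokenizer("tokenizer");
}
```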
OgaHandle handle;
Oga Utility Functions
SetLogBool
Sets a boolean logging option.
Oga::SetLogBool("option_name", true);
SetLogString
Sets a string logging option.
Oga::SetLogString("option_name", "value");
SetCurrentGpuDeviceId
Sets the current GPU device ID.
Oga::SetCurrentGpuDeviceId(0);
GetCurrentGpuDeviceId
Gets the current GPU device ID.
int id = Oga::GetCurrentGpuDeviceId();