Protocol Reference

Low-level technical documentation of the Dapr runtime protocols and internal mechanics for each building block.

This section provides a deep dive into the internal workings of the Dapr runtime. It is intended for maintainers, contributors, and anyone interested in the low-level implementation details of Dapr’s building blocks.

Unlike the user-facing API reference, these documents focus on:

  • How the runtime processes requests.
  • Internal state transitions.
  • Interaction with component interfaces.
  • Protocol-level details (gRPC and HTTP).

Building Blocks

Select a building block to explore its internal protocol and mechanics:

1 - Workflow Protocol

Low-level description of the Workflow building block internals.

This document specifies the Dapr Workflow protocol and runtime contract at a low level. It targets SDK authors building Workflow Workers and runtime maintainers evolving the Dapr sidecar’s Workflow Engine.

Overview

Dapr Workflow implements a sidecar-as-scheduler pattern: the Dapr runtime (sidecar) acts as the Workflow Engine, and the application SDK acts as the Workflow Worker. All control and execution traffic flows over gRPC.

There are two protocol surfaces:

  1. Management API (standard Dapr gRPC, accessible via the SDKs):
    • Start, terminate, pause, resume, re-run, purge, and query workflow instances.
  2. Execution API (Task Hub Protocol):
    • Worker-facing; used to receive orchestration/activity work items and to report completion (e.g., via TaskHubSidecarService).

Key Components

  • Workflow Engine (Dapr Sidecar)

    Manages workflow state transitions, history persistence, scheduling of orchestration and activity tasks, and reliable delivery semantics. By default, it leverages Dapr Actors as the backend for durable, partitioned execution.

  • Workflow Worker (Application SDK)

    Connects to the sidecar, polls for orchestration and activity work items, executes user-defined logic, and returns results, failures, and heartbeats to the engine. Orchestration logic must be deterministic; activity logic need not be.

  • Orchestration

    The deterministic coordinator that defines the workflow. The engine drives orchestrations via history replay to rebuild state and schedule outbound tasks (activities, sub-orchestrations, timers, external events).

  • Activity

    The atomic unit of work. Activities are executed at-least-once and report results or failures back to the engine. Idempotency is recommended and task execution identifiers are available on context to assist with this.

  • State Store & Backend

    Workflow history and state are durably persisted. The engine typically implements a task hub pattern over the chosen persistence and uses Dapr Actors as the default reliability substrate.

Execution Model

Dapr Workflow is based on the Durable Task Framework (DTFx) execution semantics:

Replay-based Execution

Orchestrators are replayed from their event history to rebuild deterministic state. All nondeterministic operations (time, random values, I/O) must be mediated by the engine (e.g., timers, activity calls, external events).

Deterministic Orchestrators

Orchestrator code must be side-effect free except via engine-mediated effects. Control flow must be reproducible during replay.

At-least-once Activities, Exactly-once State Commit

Activities may be delivered more than once. The engine ensures workflow state commits are idempotent and applied exactly once.

Sidecar-as-Scheduler

The sidecar owns scheduling and persists all history/events before dispatching work to workers. Workers are stateless executors from the engine’s perspective.

Protocol Surfaces

  1. Management API (Standard Dapr gRPC)
  • Start Workflow: Create and persist an initial history event; return instance metadata.
  • Terminate / Pause / Resume: Drive lifecycle transitions through persisted control events.
  • Query: Retrieve instance status, history, output, failure details, and custom metadata.
  • Re-run: Start a new workflow instance from a history event.
  • Purge: Proactively clear workflow history and state.

Note: See the Management API specification for exact RPC shapes, error codes, and semantics.

  2. Execution API (Task Hub Protocol)
  • Poll for Work: Workers fetch orchestration and activity work items.
  • Complete / Fail Work: Workers report completion results or failures; the engine appends these to history and advances orchestration progress.
  • Heartbeats / Leases: Optional mechanisms for long-running activities and cooperative rebalancing.
  • Timers & External Events: Delivered to orchestrations as history events to keep replay deterministic.

Note: See the Execution API specification for the TaskHubSidecarService contracts, payload schemas, and sequencing rules.

Request & Runtime Lifecycle

  1. Start Workflow
  • Client calls StartWorkflow via the Management API.
  • Engine persists the initial event (e.g., ExecutionStarted) and materializes an instance.
  2. Orchestrator Execution (Replay-driven)
  • Engine replays orchestration history to rehydrate state.
  • Orchestrator schedules effects (activities, sub-orchestrations, timers) by issuing commands, which the engine persists as new history events.
  3. Activity Dispatch & Execution
  • Engine dispatches activity work items to workers.
  • Worker runs the activity (may be retried and delivered at least once).
  • Worker responds with completion (result) or failure; engine appends to history.
  4. Timers & External Signals
  • Engine delivers timer-fired or external-event records as history entries.
  • Orchestrator consumes these deterministically on the next replay.
  5. Progress & Checkpointing
  • Each step appends to the history log and advances orchestration state.
  • The engine safeguards idempotence and exactly-once commit of orchestration state.
  6. Completion
  • Orchestration returns an output (success) or a failure (exception details).
  • Final state and output are persisted; status queries reflect the terminal state.

Protocol Principles

  • gRPC Transport: All worker/engine and client/engine communication is gRPC.
  • Replay-based Orchestration: Determinism enforced through history replay.
  • At-least-once Activity Delivery: Activities may re-execute; design for idempotency.
  • Engine-mediated Effects: All nondeterminism/time/IO flows through the engine to remain replay-safe.

Documentation Map

  1. Management API: Detailed Dapr gRPC control-plane operations and payloads.
  2. Execution API (Task Hub Protocol): TaskHubSidecarService worker protocol, work item contracts, result/failure reporting, and sequencing.
  3. Orchestration Lifecycle: Replay semantics, scheduling, external events, timers, and completion.
  4. Activity Lifecycle: Dispatch, retries, idempotency, heartbeat semantics, and failure handling.
  5. State & History: History schema, state snapshots, and persistence guarantees.
  6. Versioning: How Dapr handles multiple versions of the same workflow definition.

1.1 - Workflow Protocol - Management API

Low-level description of the Workflow building block internals.

Workflow Management API

The Workflow Management API allows Dapr clients to control the lifecycle of workflow instances. These APIs are exposed via the standard Dapr gRPC endpoint and are typically made available via the SDKs.

gRPC Service Definition

The management APIs are part of the Dapr service in dapr.proto.runtime.v1. While multiple versions (Alpha1, Beta1) may exist, the following describes the current implementation logic.

StartWorkflow

Starts a new instance of a workflow.

Request (StartWorkflowRequest):

  • instance_id (string): Optional. A unique identifier for the workflow instance. If not provided, Dapr will generate a random UUID.
  • workflow_component (string): The name of the workflow component to use. Currently, Dapr uses the built-in engine.
  • workflow_name (string): The name of the workflow definition to execute.
  • options (map<string, string>): Optional. Component-specific options.
  • input (bytes): Optional. Input data for the workflow instance, typically a JSON-serialized string.

Response (StartWorkflowResponse):

  • instance_id (string): The ID of the started workflow instance.

GetWorkflow

Retrieves the current status and metadata of a workflow instance.

Request (GetWorkflowRequest):

  • instance_id (string): The ID of the workflow instance to query.
  • workflow_component (string): The name of the workflow component.

Response (GetWorkflowResponse):

  • instance_id (string): The ID of the workflow instance.
  • workflow_name (string): The name of the workflow.
  • created_at (Timestamp): The time the instance was created.
  • last_updated_at (Timestamp): The time the instance was last updated.
  • runtime_status (string): The status (e.g., RUNNING, COMPLETED, FAILED, TERMINATED, PENDING).
  • properties (map<string, string>): Additional component-specific metadata.

TerminateWorkflow

Forcefully terminates a running workflow instance.

Request (TerminateWorkflowRequest):

  • instance_id (string): The ID of the workflow instance to terminate.
  • workflow_component (string): The name of the workflow component.

RaiseEventWorkflow

Sends an event to a running workflow instance.

Request (RaiseEventWorkflowRequest):

  • instance_id (string): The ID of the workflow instance.
  • workflow_component (string): The name of the workflow component.
  • event_name (string): The name of the event to raise.
  • event_data (bytes): The data associated with the event.

PauseWorkflow & ResumeWorkflow

Pauses or resumes a workflow instance.

Request (PauseWorkflowRequest / ResumeWorkflowRequest):

  • instance_id (string): The ID of the workflow instance.
  • workflow_component (string): The name of the workflow component.

PurgeWorkflow

Removes all state and history associated with a workflow instance. This can usually only be done for completed, failed, or terminated instances.

Request (PurgeWorkflowRequest):

  • instance_id (string): The ID of the workflow instance.
  • workflow_component (string): The name of the workflow component.

ListInstanceIDs (Task Hub Protocol Only)

Retrieves a list of workflow instance IDs, optionally filtered by status or name. This is currently part of the internal Task Hub protocol and used for pagination in management tools.

Request (ListInstanceIDsRequest):

  • page_size (int32): The maximum number of IDs to return.
  • continuation_token (string): An opaque token used to retrieve the next page of results.

Response (ListInstanceIDsResponse):

  • instance_ids (repeated string): The list of instance IDs.
  • continuation_token (string): A token for the next page of results.

Continuation Tokens and Pagination

The continuation_token is an opaque string generated by the Dapr runtime (and the underlying state store). Its purpose is to allow clients to reliably paginate through large sets of workflow instances without loading all IDs into memory at once.

SDK Requirements for Continuation Tokens:

  1. Opaqueness: The SDK MUST treat this token as a black box. It should not attempt to parse, modify, or construct its own tokens.
  2. State Management: When the SDK receives a ListInstanceIDsResponse, it should store the continuation_token if it intends to fetch more results.
  3. Request Propagation: To fetch the next page, the SDK MUST pass the exact continuation_token received from the previous response into the next ListInstanceIDsRequest.
  4. Termination: An empty or null continuation_token in the response indicates that there are no more pages to retrieve.

Runtime Behavior: The runtime derives this token from the state store’s KeysLike operation. Because it is tied to the underlying database’s pagination mechanism, the token may have an expiration or be tied to the specific query parameters (like page_size) used in the initial request.
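
The token-handling rules above can be sketched as a client-side pagination loop. This is an illustrative Go sketch: the struct shapes are simplified stand-ins for the generated protobuf types, and listPage fakes the gRPC call by paging over a fixed slice. Only the loop's behavior matters: treat the token as opaque, echo it back verbatim, and stop when it comes back empty.

```go
package main

import "fmt"

// Simplified stand-ins for the protocol messages described above.
type ListInstanceIDsRequest struct {
	PageSize          int32
	ContinuationToken string
}

type ListInstanceIDsResponse struct {
	InstanceIDs       []string
	ContinuationToken string
}

// listPage stands in for the real gRPC call; here it pages over a fixed slice,
// using an integer offset as a fake "opaque" token.
func listPage(all []string, req ListInstanceIDsRequest) ListInstanceIDsResponse {
	start := 0
	fmt.Sscanf(req.ContinuationToken, "%d", &start) // empty token -> start at 0
	end := start + int(req.PageSize)
	if end >= len(all) {
		// Empty continuation token signals the final page.
		return ListInstanceIDsResponse{InstanceIDs: all[start:]}
	}
	return ListInstanceIDsResponse{
		InstanceIDs:       all[start:end],
		ContinuationToken: fmt.Sprintf("%d", end),
	}
}

// listAllInstanceIDs shows the required client behavior from the rules above.
func listAllInstanceIDs(all []string, pageSize int32) []string {
	var out []string
	token := ""
	for {
		resp := listPage(all, ListInstanceIDsRequest{PageSize: pageSize, ContinuationToken: token})
		out = append(out, resp.InstanceIDs...)
		if resp.ContinuationToken == "" {
			return out // termination: no more pages
		}
		token = resp.ContinuationToken // propagate the exact token received
	}
}

func main() {
	ids := listAllInstanceIDs([]string{"a", "b", "c", "d", "e"}, 2)
	fmt.Println(ids) // all five IDs, fetched in pages of two
}
```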

Implementation Details

The sidecar receives these requests and translates them into operations on the underlying durabletask-go client. For example, StartWorkflow calls the backend to create a new orchestration instance and enqueue an ExecutionStarted event.

1.2 - Workflow Protocol - Execution API

Low-level description of the Workflow building block internals.

Workflow Execution API (Task Hub Protocol)

The Workflow Execution API is a low-level gRPC protocol used by Dapr Workflow SDKs to act as “Workers”. The SDK connects to the Dapr sidecar via this protocol to poll for work and report completion.

The service is named TaskHubSidecarService.

Service Definition (gRPC)

service TaskHubSidecarService {
  rpc GetWorkItems(GetWorkItemsRequest) returns (stream WorkItem);
  rpc CompleteOrchestratorTask(OrchestratorResponse) returns (CompleteBatchResponse);
  rpc CompleteActivityTask(ActivityResponse) returns (CompleteBatchResponse);
  // ... other management methods
}

Worker Lifecycle

  1. Connection: The SDK opens a long-running bidirectional stream to GetWorkItems.
  2. Polling: The SDK receives WorkItem messages from the stream.
  3. Execution:
    • If the work item is an Orchestration, the SDK retrieves and replays the history events to determine the next actions.
    • If the work item is an Activity, the SDK executes the activity logic.
  4. Completion:
    • For Orchestrations, the SDK calls CompleteOrchestratorTask with a list of actions to take.
    • For Activities, the SDK calls CompleteActivityTask with the result or failure information.
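
The lifecycle above can be sketched as a dispatch loop. This Go sketch uses simplified stand-in types for the work item messages (the real protobuf types differ); it only illustrates the branching between orchestration and activity items and the completion call each leads to.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the TaskHubSidecarService messages.
type WorkItem struct {
	OrchestratorItem *OrchestratorWorkItem
	ActivityItem     *ActivityWorkItem
}
type OrchestratorWorkItem struct{ InstanceID string }
type ActivityWorkItem struct {
	InstanceID      string
	Name            string
	CompletionToken string
}

// handleWorkItem dispatches a received item the way an SDK worker would:
// orchestrations are replayed, activities are executed directly. It returns a
// short description of the completion call the worker would make.
func handleWorkItem(wi WorkItem) string {
	switch {
	case wi.OrchestratorItem != nil:
		// Replay history, collect actions, then call CompleteOrchestratorTask.
		return "CompleteOrchestratorTask(" + wi.OrchestratorItem.InstanceID + ")"
	case wi.ActivityItem != nil:
		// Run the activity, then CompleteActivityTask, echoing the token back.
		return fmt.Sprintf("CompleteActivityTask(%s, token=%s)",
			wi.ActivityItem.Name, wi.ActivityItem.CompletionToken)
	default:
		return "ignored"
	}
}

func main() {
	// In a real worker these items arrive on the GetWorkItems server stream.
	items := []WorkItem{
		{OrchestratorItem: &OrchestratorWorkItem{InstanceID: "wf-1"}},
		{ActivityItem: &ActivityWorkItem{InstanceID: "wf-1", Name: "ChargeCard", CompletionToken: "t1"}},
	}
	for _, wi := range items {
		fmt.Println(handleWorkItem(wi))
	}
}
```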

gRPC Service: TaskHubSidecarService

GetWorkItems

Opens a stream to receive work items for orchestrations and activities.

Request (GetWorkItemsRequest): Typically empty or contains worker metadata.

Response (stream WorkItem): A WorkItem can be one of:

  • orchestrator_item: Contains history and new events for an orchestration.
  • activity_item: Contains details for a single activity task.

CompleteOrchestratorTask

Reports the results of an orchestration execution.

Request (OrchestratorResponse):

  • instance_id: The ID of the workflow instance.
  • actions: A list of OrchestratorAction messages.
  • custom_status: Optional string for user-defined status.

OrchestratorAction Types:

  • ScheduleTask: Schedule a new activity.
  • CreateTimer: Schedule a durable timer.
  • CreateSubOrchestration: Start a child workflow.
  • CompleteOrchestration: Mark the workflow as completed (success or failure).
  • TerminateOrchestration: Forcefully terminate the instance.
  • SendEvent: Send an event to another workflow.

CompleteActivityTask

Reports the result of an activity execution.

Request (ActivityResponse):

  • instance_id: The ID of the workflow instance.
  • task_id: The unique ID of the activity task.
  • completion_token: The opaque token received in the ActivityWorkItem.
  • result: The serialized output of the activity (if successful).
  • failure_details: Details about the error (if failed).

Data Models

HistoryEvent

Workflows in Dapr are event-sourced. The state of an orchestration is rebuilt by replaying a sequence of HistoryEvent messages.

Common event types:

  • ExecutionStarted: Initial event containing workflow name and input.
  • TaskScheduled: An activity was scheduled.
  • TaskCompleted: An activity finished successfully.
  • TaskFailed: An activity failed.
  • TimerCreated: A timer was scheduled.
  • TimerFired: A timer expired.
  • OrchestrationCompleted: The workflow finished.

FailureDetails

Used to report errors from activities or orchestrations.

  • error_type: A string identifying the type of error.
  • error_message: A human-readable error message.
  • stack_trace: Optional stack trace.
  • is_non_retriable: Boolean flag.

Protocol Nuances

  • Streaming: GetWorkItems is a server-to-client stream. Dapr pushes work to the SDK as it becomes available.
  • Sticky Sessions: Dapr attempts to send work items for the same instance to the same worker if possible, but the SDK must not rely on this for correctness.
  • Determinism: The SDK must ensure that the orchestration logic is deterministic. During replay, the SDK uses the history provided in the OrchestratorWorkItem to avoid re-executing actions that have already been recorded.

1.3 - Workflow Protocol - Orchestration Lifecycle

Low-level description of the Workflow building block internals.

Orchestration Lifecycle

This document describes the lifecycle of an orchestration at the protocol level, specifically how the Dapr engine and the SDK interact to execute workflow logic reliably.

Replay-based Execution

Dapr Workflows use event sourcing and replay to maintain state. Instead of saving the entire state of the worker process (stack, variables, etc.), Dapr saves a history of events that have occurred.

The Replay Loop

  1. Work Item Arrival: The Dapr engine sends an OrchestratorWorkItem to the SDK via the GetWorkItems stream. This work item contains the full history of the workflow instance plus any new events (e.g., an activity completion or an external event).
  2. Reconstruction: The SDK starts executing the orchestration function from the very beginning.
  3. Deterministic Execution: As the function executes, it encounters “tasks” (e.g., calling an activity, sleeping).
    • For each task, the SDK checks the provided History to see if that task has already completed.
    • If the task is in the history, the SDK returns the recorded result immediately without actually re-executing the task logic.
    • If the task is NOT in the history, the SDK records that this task needs to be scheduled and suspends execution of the orchestration function (typically by throwing a special exception or returning a pending promise).
  4. Reporting: Once the orchestration function is suspended or completes, the SDK sends a CompleteOrchestratorTask request to Dapr. This request contains a list of Actions (e.g., ScheduleTask, CreateTimer) that the engine should perform.
  5. State Commitment: The Dapr engine receives the actions, updates the workflow history in the state store, and schedules any requested tasks (e.g., by sending work to an activity worker).
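
The replay loop can be sketched in miniature. This Go sketch models history as a flat list of completed tasks and shows only the core decision from step 3: return the recorded result, or record a ScheduleTask action and suspend. Real SDKs match protobuf HistoryEvent sequences rather than strings.

```go
package main

import "fmt"

// A minimal model of replay: history is a list of (task, result) pairs in
// scheduling order. All names here are illustrative.
type historyEntry struct {
	Task   string
	Result string
	Done   bool
}

type replayContext struct {
	history []historyEntry
	cursor  int      // next history slot to match against
	actions []string // actions to send back via CompleteOrchestratorTask
}

// callActivity returns the recorded result if this task already completed;
// otherwise it records a ScheduleTask action and reports that the
// orchestration must suspend.
func (c *replayContext) callActivity(name string) (result string, pending bool) {
	if c.cursor < len(c.history) && c.history[c.cursor].Task == name && c.history[c.cursor].Done {
		entry := c.history[c.cursor]
		c.cursor++
		return entry.Result, false // replayed from history, not re-executed
	}
	c.actions = append(c.actions, "ScheduleTask("+name+")")
	return "", true // suspend; the engine runs the task and replays us later
}

func main() {
	// Second replay of "Activity A -> Activity B": A is in history, B is not.
	ctx := &replayContext{history: []historyEntry{{Task: "A", Result: "foo", Done: true}}}
	a, pending := ctx.callActivity("A")
	fmt.Println(a, pending) // foo false
	_, pending = ctx.callActivity("B")
	fmt.Println(pending)     // true: suspend here
	fmt.Println(ctx.actions) // [ScheduleTask(B)]
}
```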

Step-by-Step Example

Imagine a workflow: Activity A -> Activity B.

1. Workflow Start

  • Engine: Enqueues ExecutionStarted event.
  • SDK: Receives OrchestratorWorkItem with [ExecutionStarted].
  • SDK: Runs function. Function calls Activity A.
  • SDK: Checks history. Activity A is not there.
  • SDK: Suspends. Sends CompleteOrchestratorTask with [ScheduleTask(Activity A)].
  • Engine: Records TaskScheduled(Activity A) in history.

2. Activity A Completes

  • Engine: Records TaskCompleted(Activity A, result="foo") in history.
  • SDK: Receives OrchestratorWorkItem with [ExecutionStarted, TaskScheduled(A), TaskCompleted(A)].
  • SDK: Runs function from the start.
  • SDK: Function calls Activity A. SDK finds TaskCompleted(A) in history. Returns "foo".
  • SDK: Function calls Activity B.
  • SDK: Checks history. Activity B is not there.
  • SDK: Suspends. Sends CompleteOrchestratorTask with [ScheduleTask(Activity B)].
  • Engine: Records TaskScheduled(Activity B).

3. Workflow Completion

  • Activity B completes.
  • SDK: Receives history with both A and B completed.
  • SDK: Runs function. Both A and B return results from history.
  • SDK: Function finishes and returns a final result.
  • SDK: Sends CompleteOrchestratorTask with [CompleteOrchestration(result="final")].
  • Engine: Records OrchestrationCompleted and marks the instance as COMPLETED.

Critical Requirements for SDK Authors

1. Determinism

The orchestration function MUST be deterministic. It cannot use:

  • Random numbers.
  • Current date/time (must use a durable timer or a provided CurrentUtcDateTime).
  • Direct IO (must be done in activities).
  • Global state that can change between replays.

2. Patching (In-flight updates)

When a workflow is already running, you might need to update its logic. However, since workflows are replay-based, changing the logic directly would break determinism for in-flight instances.

Dapr provides a Patching mechanism (e.g., ctx.IsPatched("patch-id")) to safely introduce changes:

  • Logic Branching: The SDK provides an API to check if a specific “patch” is active for the current instance.
  • Patch Recording: When a patch check is encountered during execution, the result (true/false) is recorded in the workflow history.
  • Consistency: Once a patch is recorded as active (or inactive) for an instance, it remains so for the lifetime of that instance, even if the worker code changes or the instance is moved to another worker.
  • Safety: The Dapr engine validates that the sequence of patches encountered during replay exactly matches the sequence in history. If there’s a mismatch, the workflow enters a Stalled state to prevent data corruption.
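
A sketch of the patch check and its consistency validation, simplified to record only active patch IDs (the real protocol records the outcome of every check; ctx.IsPatched and all type names here are illustrative):

```go
package main

import "fmt"

// patchContext models the per-instance patch record.
type patchContext struct {
	recorded []string        // patch IDs recorded in this instance's history, in order
	cursor   int             // replay position within recorded
	active   map[string]bool // patches enabled in the currently deployed code
}

// isPatched replays a recorded decision if one exists; otherwise it records
// the current code's answer. A mismatch during replay means the code no
// longer matches history, so the instance must stall rather than corrupt state.
func (c *patchContext) isPatched(id string) (bool, error) {
	if c.cursor < len(c.recorded) {
		if c.recorded[c.cursor] != id {
			return false, fmt.Errorf("patch mismatch: history has %q, code asked for %q -> STALLED",
				c.recorded[c.cursor], id)
		}
		c.cursor++
		return true, nil // recorded as active for the lifetime of the instance
	}
	if c.active[id] {
		c.recorded = append(c.recorded, id) // first execution: record the decision
		return true, nil
	}
	return false, nil
}

func main() {
	ctx := &patchContext{recorded: []string{"patch-1"}, active: map[string]bool{"patch-1": true}}
	ok, err := ctx.isPatched("patch-1")
	fmt.Println(ok, err) // true <nil>: replayed consistently
	_, err = ctx.isPatched("patch-2")
	fmt.Println(err) // <nil>: patch-2 is new and inactive, so no error
}
```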

3. Named Versions (In-flight updates)

Dapr also provides a Named Versioning mechanism: the SDK maintains a registry of available named workflow versions. When it receives a request to initialize a new workflow by name, it consults the registry to determine whether that name maps to a different workflow version than the one specified, and it is responsible for redirecting the request to the intended “latest” version.

  • Logic Branching: The SDK provides an API to register different versions for a given workflow name.
  • Replay Consistency: The request to run a workflow may contain a property specifying a specific workflow name to execute. This ensures that in-flight workflows will always run using the same workflow version whereas new workflows will use the latest available version.

4. Stalled State

A workflow instance enters the STALLED state when the engine detects an unrecoverable condition that requires manual intervention or a code fix to proceed. Common reasons include:

  • Patch Mismatch: The current code’s patching logic contradicts the instance’s history.
  • Execution Errors: A fatal error occurred that cannot be handled by retries.

When stalled, the instance stops execution but remains in the system. Once the underlying issue is resolved (e.g., the correct code version is deployed), the instance can be resumed or will automatically resume on the next event.

5. History Management

The SDK must efficiently search the history. Typically, this is done by maintaining a counter of tasks encountered during execution and matching them against the sequence of events in the history.

6. Graceful Suspension

The SDK needs a mechanism to stop execution of the orchestration function when a task is scheduled but not yet completed, without losing the ability to restart it later.

1.4 - Workflow Protocol - Activity Lifecycle

Low-level description of the Workflow building block internals.

Activity Lifecycle

Activities are the basic units of work in a Dapr Workflow. Unlike orchestrations, activities are not replayed and do not need to be deterministic. Each task is scheduled once per workflow step, but delivery is at-least-once: retries and redelivery can cause the activity code to run more than once.

Execution Flow

  1. Scheduling: An orchestration requests an activity by sending a ScheduleTask action to the Dapr engine.
  2. Work Item Dispatch: The Dapr engine enqueues an activity task. When an activity worker (SDK) is available, the engine sends an ActivityWorkItem via the GetWorkItems stream.
  3. Execution: The SDK receives the ActivityWorkItem, which contains:
    • name: The name of the activity to execute.
    • input: The input data for the activity.
    • instance_id: The ID of the workflow instance that scheduled the activity.
    • task_id: A unique identifier for this specific activity execution.
    • task_execution_id: A unique identifier for the specific attempt of this activity. This is useful for implementing idempotency in activity logic.
    • completion_token: An opaque token used to correlate the response with this specific work item.
  4. Reporting: After the activity logic finishes, the SDK sends a CompleteActivityTask request back to Dapr.
    • Success: The SDK provides the serialized output in the result field.
    • Failure: The SDK provides failure_details (error message, type, stack trace).

Task Execution IDs

The task_execution_id (also known as the Task Execution Key) is a unique, runtime-generated string (typically a UUID) that identifies a specific attempt to execute an activity task.

Why it matters to the SDK

While the Workflow SDK is generally stateless between work items, the task_execution_id provides critical context for the Activity Worker:

  1. Distributed Idempotency: If an activity performs a side effect (e.g., charging a credit card), it should use the task_execution_id as an idempotency key.
  2. Distinguishing Retries: Unlike the task_id (which remains constant for a specific step in the workflow), the task_execution_id changes every time the engine retries the activity (e.g., due to a timeout or worker crash).
  3. Zombie Detection: If an activity worker takes too long and the engine times it out and retries on another worker, the original worker might eventually finish. By checking the task_execution_id against a persistent store or external API, the worker can determine if it is a “zombie” whose results are no longer wanted.

Implementation Guidelines for SDKs:

  • Expose to User: The SDK MUST expose the task_execution_id to the activity implementation logic (e.g., via an ActivityContext).
  • Do Not Cache: The SDK should not attempt to cache or reuse this ID across different work items.
  • Opaque Usage: The SDK should treat the value as an opaque string. It is generated by the Dapr sidecar when the activity is dispatched and is not something the SDK needs to create or parse.
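
A sketch of using the task_execution_id as an idempotency key, with a hypothetical in-memory store standing in for whatever persistent store or external API the activity talks to (chargeOnce and the store are illustrative, not Dapr APIs):

```go
package main

import "fmt"

// chargeStore stands in for a durable store keyed by task_execution_id.
type chargeStore struct{ done map[string]string }

// chargeOnce uses the task_execution_id as an idempotency key: if this exact
// attempt already committed its side effect, the recorded result is returned
// instead of charging again.
func (s *chargeStore) chargeOnce(taskExecutionID, card string, cents int) string {
	if receipt, ok := s.done[taskExecutionID]; ok {
		return receipt // duplicate delivery of the same attempt: no second charge
	}
	receipt := fmt.Sprintf("charged %s %d cents", card, cents)
	s.done[taskExecutionID] = receipt
	return receipt
}

func main() {
	store := &chargeStore{done: map[string]string{}}
	r1 := store.chargeOnce("exec-123", "card-9", 500)
	r2 := store.chargeOnce("exec-123", "card-9", 500) // redelivered work item
	fmt.Println(r1 == r2, len(store.done))            // true 1: charged exactly once
}
```

Note that a retry by the engine arrives with a new task_execution_id, so a genuinely new attempt is not suppressed by this check; only duplicate deliveries of the same attempt are.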

Completion Tokens

The completion_token is an opaque string generated by the Dapr runtime and delivered to the SDK as part of the ActivityWorkItem.

Purpose and Intent

  1. Response Correlation: The sidecar uses the completion_token to reliably match an ActivityResponse (from CompleteActivityTask) to the original task it dispatched.
  2. Stateless Tracking: It allows the sidecar to remain stateless or minimize state lookups when receiving a completion, as the token contains (or points to) the necessary context (instance ID, task ID, etc.).
  3. Zombie Prevention: If an activity times out and is retried, the new attempt will have a different completion_token. If the original “zombie” worker eventually responds with the old token, the sidecar can easily identify and ignore the late response.

SDK Implementation Guidelines

  • Capture: The SDK MUST capture the completion_token from the incoming ActivityWorkItem.
  • Propagation: The SDK MUST include the exact same completion_token in the ActivityResponse sent via CompleteActivityTask.
  • Opaqueness: The SDK MUST treat the token as a black box. It should not attempt to parse, modify, or construct its own tokens.
  • Storage: While the activity is executing, the SDK must keep this token in memory (e.g., in the ActivityContext).

Task Activity IDs

In the Dapr runtime (specifically when using the Actors backend), activities are represented as actors. Each activity execution has a unique Task Activity ID (also known as the Activity Actor ID).

The ID follows a specific pattern: {workflowInstanceID}::{taskID}::{generation}

  • workflowInstanceID: The unique ID of the workflow instance that scheduled the activity.
  • taskID: The sequence number of the task within the workflow execution (e.g., 0, 1, 2…).
  • generation: A counter that increments if the workflow is restarted or “continued as new”.

This unique ID ensures that activity executions are isolated and can be tracked reliably across retries and restarts.
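
The ID pattern can be composed and decomposed with small helpers (illustrative only; the runtime builds these IDs internally, and this naive parser assumes the instance ID itself contains no "::"):

```go
package main

import (
	"fmt"
	"strings"
)

// composeActivityActorID builds {workflowInstanceID}::{taskID}::{generation}.
func composeActivityActorID(instanceID string, taskID, generation int) string {
	return fmt.Sprintf("%s::%d::%d", instanceID, taskID, generation)
}

// parseActivityActorID splits the ID back into its three components.
func parseActivityActorID(id string) (instanceID string, taskID, generation int, err error) {
	parts := strings.Split(id, "::")
	if len(parts) != 3 {
		return "", 0, 0, fmt.Errorf("malformed activity actor ID: %q", id)
	}
	if _, err = fmt.Sscanf(parts[1], "%d", &taskID); err != nil {
		return "", 0, 0, err
	}
	if _, err = fmt.Sscanf(parts[2], "%d", &generation); err != nil {
		return "", 0, 0, err
	}
	return parts[0], taskID, generation, nil
}

func main() {
	id := composeActivityActorID("order-42", 3, 0)
	fmt.Println(id) // order-42::3::0
	inst, task, gen, _ := parseActivityActorID(id)
	fmt.Println(inst, task, gen) // order-42 3 0
}
```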

Retries

Dapr handles activity retries based on the policy defined in the orchestration (if the SDK supports defining retry policies in the ScheduleTask action). If an activity fails and a retry policy is in place, the engine will re-enqueue the activity task after the specified delay.

From the activity worker’s perspective, a retry is simply a new ActivityWorkItem with the same name and input, but potentially a different task_id (or the same, depending on the backend implementation).

Idempotency

Because activities might be executed more than once (e.g., if the worker crashes after execution but before reporting completion), it is recommended that activity logic be idempotent where possible.

Comparison with Workflows

Feature         | Orchestration                      | Activity
Execution Style | Replay-based (Deterministic)       | Direct execution
State           | Managed via History Events         | No internal workflow state
Side Effects    | Forbidden (must use activities)    | Allowed (IO, Database, etc.)
Lifetime        | Can be long-running (days/months)  | Usually short-lived
Connectivity    | Connected via GetWorkItems         | Connected via GetWorkItems

1.5 - Workflow Protocol - State & History

Low-level description of the Workflow building block internals.

State and History Management

Dapr Workflows are event-sourced, meaning the state of a workflow is derived from a sequence of events. This document describes how Dapr stores and manages this history and state.

Backend Storage: Dapr Actors

By default, the Dapr Workflow engine uses Dapr Actors as its storage backend. Each workflow instance is mapped to a unique actor instance. This provides:

  • Concurrency Control: Actors ensure that only one operation is happening on a workflow instance at a time.
  • Reliability: Actor state is persisted in the configured Dapr State Store.
  • Timers: Dapr Actors provide durable reminders which are used to implement workflow timers.

Workflow State Schema

The state of a workflow instance (actor) consists of several components:

1. Metadata

Stores high-level information about the instance:

  • instance_id: The unique ID of the workflow.
  • name: The name of the workflow.
  • status: The current runtime status (Running, Completed, Stalled, etc.).
  • version: The name of the workflow version and any active patches.
  • created_at: Creation timestamp.
  • last_updated_at: Last activity timestamp.
  • input: Original input data.
  • output: Final output data (if completed).

2. History

A sequence of HistoryEvent objects that record everything that has happened in the workflow. To optimize for large histories, Dapr often stores history events in chunks or as separate keys in the state store:

  • Key Format: wf-history-<instance_id>-<index>
  • Event Content: Serialized protobuf message containing event type, timestamp, and type-specific data (e.g., TaskScheduled, TaskCompleted).
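
A trivial helper showing the chunked-history key format (illustrative; the runtime constructs these keys internally):

```go
package main

import "fmt"

// historyKey builds the wf-history-<instance_id>-<index> key for one chunk
// of an instance's history in the state store.
func historyKey(instanceID string, index int) string {
	return fmt.Sprintf("wf-history-%s-%d", instanceID, index)
}

func main() {
	// Each chunk of an instance's history gets its own state-store key.
	fmt.Println(historyKey("order-42", 0)) // wf-history-order-42-0
	fmt.Println(historyKey("order-42", 1)) // wf-history-order-42-1
}
```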

3. Inbox (Pending Events)

A collection of events that have occurred but have not yet been processed by the orchestrator (replayed). This includes:

  • External events raised to the workflow.
  • Completed activity results.
  • Fired timers.

When the orchestrator next runs, it “drains” the inbox, moves those events into the history, and then replays the logic.
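
The drain step can be sketched as follows, modeling history and inbox as plain event lists (the real engine commits this move atomically against the state store before handing the worker the full history for replay):

```go
package main

import "fmt"

// instanceState models one workflow instance's persisted state.
type instanceState struct {
	history []string // committed HistoryEvents, in order
	inbox   []string // pending events: activity results, external events, timers
}

// drainInbox appends every pending event to the history and clears the inbox,
// returning the events the next replay must process.
func (s *instanceState) drainInbox() []string {
	drained := s.inbox
	s.history = append(s.history, drained...)
	s.inbox = nil
	return drained
}

func main() {
	s := &instanceState{
		history: []string{"ExecutionStarted", "TaskScheduled(A)"},
		inbox:   []string{"TaskCompleted(A)"},
	}
	newEvents := s.drainInbox()
	fmt.Println(newEvents)      // [TaskCompleted(A)]
	fmt.Println(len(s.history)) // 3: inbox events are now part of history
	fmt.Println(len(s.inbox))   // 0
}
```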

Replay and State Reconstruction

When a worker (SDK) receives a work item, Dapr provides the history events. The SDK reconstructs the internal state of the orchestration (e.g., local variables, current execution point) by replaying these events in order.

Determinism and History

The history is the “source of truth”. If the orchestration code changes in a non-deterministic way (e.g., adding a new activity call in the middle of existing code), the replay will fail because the code’s requests won’t match the recorded history.

Purging State

When a workflow is purged, Dapr deletes the metadata and all associated history event keys from the state store. This is typically done to clean up after finished workflows.

1.6 - Workflow Protocol - Versioning

Low-level description of the Workflow building block internals.

Workflow Versioning

Dapr Workflow supports versioning of workflow definitions, allowing you to update workflow logic while existing instances continue to run on their original logic.

Named Workflow Versioning

When registering a workflow with the Dapr engine, you can provide a version name. This allows multiple versions of the same workflow to coexist.

  • Default Version: One version of a workflow can be marked as the default. If a client starts a workflow by name without specifying a version, the default version is used.
  • Specific Version: Clients can request a specific version of a workflow when starting a new instance.

Registration API

SDKs register versioned workflows using the AddVersionedOrchestrator (or similar) method in their task registry.

// Example (Internal Registry API)
// Arguments (assumed): workflow name, version name, is-default flag, implementation.
registry.AddVersionedOrchestrator("MyWorkflow", "v2", true, MyWorkflowV2)
registry.AddVersionedOrchestrator("MyWorkflow", "v1", false, MyWorkflowV1)

Sidecar Versioning

The Dapr sidecar tracks which named version of a workflow an instance is running as well as the list of applied patches observed up to that point in the workflow execution. This information is stored in the workflow history within the OrchestratorStarted event’s version field.

message OrchestrationVersion {
  string name = 1;
  repeated string patches = 2;
}

When an instance is resumed (e.g., after an activity completes), the Dapr engine is responsible for recovering from stalls introduced by a version mismatch on the client. It does so by monitoring changes to the placement table, which indicate that a new SDK client has connected (potentially representing a newer replica of the app), and by dispatching a repeat of the last event to retry the operation. If the SDK is then able to complete the task successfully, the runtime changes the workflow status from Stalled back to Running.