Protocol Reference
Low-level technical documentation of the Dapr runtime protocols and internal mechanics for each building block.
This section provides a deep dive into the internal workings of the Dapr runtime. It is intended for maintainers,
contributors, and anyone interested in the low-level implementation details of Dapr’s building blocks.
Unlike the user-facing API reference, these documents focus on:
- How the runtime processes requests.
- Internal state transitions.
- Interaction with component interfaces.
- Protocol-level details (gRPC and HTTP).
Building Blocks
Select a building block to explore its internal protocol and mechanics:
1 - Workflow Protocol
Low-level description of the Workflow building block internals.
This document specifies the Dapr Workflow protocol and runtime contract at a low level. It targets SDK authors building
Workflow Workers and runtime maintainers evolving the Dapr sidecar’s Workflow Engine.
Overview
Dapr Workflow implements a sidecar-as-scheduler pattern: the Dapr runtime (sidecar) acts as the Workflow Engine, and
the application SDK acts as the Workflow Worker. All control and execution traffic flows over gRPC.
There are two protocol surfaces:
- Management API (standard Dapr gRPC accessible via SDK):
- Start, terminate, pause, resume, re-run, purge and query workflow instances.
- Execution API (Task Hub Protocol):
- Worker facing, used to receive orchestration/activity work items and to report completion (e.g., via
TaskHubSidecarService).
Key Components
Workflow Engine (Dapr Sidecar)
Manages workflow state transitions, history persistence, scheduling of orchestration and activity tasks, and
reliable delivery semantics. By default, it leverages Dapr Actors as the backend for durable, partitioned execution.
Workflow Worker (Application SDK)
Connects to the sidecar, polls for orchestration and activity work items, executes user-defined logic, and returns
results, failures, and heartbeats to the engine. Orchestration logic must be deterministic; activity logic need not
be.
Orchestration
The deterministic coordinator that defines the workflow. The engine drives orchestrations via history replay to
rebuild state and schedule outbound tasks (activities, sub-orchestrations, timers, external events).
Activity
The atomic unit of work. Activities are executed at-least-once and report results or failures back to the engine.
Idempotency is recommended and task execution identifiers are available on context to assist with this.
State Store & Backend
Workflow history and state are durably persisted. The engine typically implements a task hub pattern over the chosen
persistence and uses Dapr Actors as the default reliability substrate.
Execution Model
Dapr Workflow is based on the Durable Task Framework (DTFx) execution semantics:
Replay-based Execution
Orchestrators are replayed from their event history to rebuild deterministic state. All nondeterministic operations
(time, random values, I/O) must be mediated by the engine (e.g., timers, activity calls, external events).
Deterministic Orchestrators
Orchestrator code must be side-effect free except via engine-mediated effects. Control flow must be reproducible
during replay.
At-least-once Activities, Exactly-once State Commit
Activities may be delivered more than once. The engine ensures workflow state commits are idempotent and applied
exactly once.
Sidecar-as-Scheduler
The sidecar owns scheduling and persists all history/events before dispatching work to workers. Workers are stateless
executors from the engine’s perspective.
Protocol Surfaces
- Management API (Standard Dapr gRPC)
- Start Workflow: Create and persist an initial history event; return instance metadata
- Terminate / Pause / Resume: Drive lifecycle transitions through persisted control events.
- Query: Retrieve instance status, history, output, failure details, and custom metadata.
- Re-run: Start a new workflow instance from a history event.
- Purge: Proactively clear workflow history and state.
Note: See the Management API specification for exact RPC shapes, error codes, and semantics.
- Execution API (Task Hub Protocol)
- Poll for Work: Workers fetch orchestration and activity work items.
- Complete / Fail Work: Workers report completion results or failures; the engine appends these to history and advances
orchestration progress.
- Heartbeats / Leases: Optional mechanisms for long-running activities and cooperative rebalancing.
- Timers & External Events: Delivered to orchestrations as history events to keep replay deterministic.
Note: See the Execution API specification, which defines the TaskHubSidecarService contracts, payload schemas, and sequencing rules.
Request & Runtime Lifecycle
- Start Workflow
- Client calls StartWorkflow via the Management API.
- Engine persists the initial event (e.g., ExecutionStarted) and materializes an instance.
- Orchestrator Execution (Replay-driven)
- Engine replays orchestration history to rehydrate state.
- Orchestrator schedules effects (activities, sub-orchestrations, timers) by issuing commands, which the engine
persists as new history events.
- Activity Dispatch & Execution
- Engine dispatches activity work items to workers.
- Worker runs the activity (may be retried and delivered at least once).
- Worker responds with completion (result) or failure; engine appends to history.
- Timers & External Signals
- Engine delivers timer fired or external event records as history entries.
- Orchestrator consumes these deterministically on next replay.
- Progress & Checkpointing
- Each step appends to the history log and advances orchestration state.
- The engine safeguards idempotence and exactly-once commit of orchestration state.
- Completion
- Orchestration returns an output (success) or a failure (exception details).
- Final state and output are persisted; status queries reflect the terminal state.
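The lifecycle above can be condensed into a status derivation over the event history. The sketch below is illustrative only, not runtime code; the event names follow this document where possible, and ExecutionTerminated is an assumed name for the termination record.

```python
# Illustrative sketch: deriving an instance's runtime status from its
# append-only history log. ExecutionTerminated is an assumed event name.

def derive_status(history: list[str]) -> str:
    """Return the runtime status implied by the lifecycle events seen so far."""
    status = "PENDING"
    for event in history:
        if event == "ExecutionStarted":
            status = "RUNNING"
        elif event == "OrchestrationCompleted":
            status = "COMPLETED"
        elif event == "ExecutionTerminated":
            status = "TERMINATED"
    return status

history = ["ExecutionStarted", "TaskScheduled", "TaskCompleted", "OrchestrationCompleted"]
print(derive_status(history))  # COMPLETED
```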
Protocol Principles
- gRPC Transport: All worker/engine and client/engine communication is gRPC.
- Replay-based Orchestration: Determinism enforced through history replay.
- At-least-once Activity Delivery: Activities may re-execute; design for idempotency.
- Engine-mediated Effects: All nondeterminism/time/IO flows through the engine to remain replay-safe.
Documentation Map
- Management API
Detailed Dapr gRPC control-plane operations and payloads.
- Execution API (Task Hub Protocol)
TaskHubSidecarService worker protocol, work item contracts, result/failure reporting, and sequencing.
- Orchestration Lifecycle
Replay semantics, scheduling, external events, timers, and completion.
- Activity Lifecycle
Dispatch, retries, idempotency, heartbeat semantics, and failure handling.
- State & History
History schema, state snapshots, and persistence guarantees.
- Versioning
How Dapr handles multiple versions of the same workflow definition.
1.1 - Workflow Protocol - Management API
Low-level description of the Workflow building block internals.
Workflow Management API
The Workflow Management API allows Dapr clients to control the lifecycle of workflow instances. These APIs are exposed
via the standard Dapr gRPC endpoint and are typically made available via the SDKs.
gRPC Service Definition
The management APIs are part of the Dapr service in dapr.proto.runtime.v1. While multiple versions
(Alpha1, Beta1) may exist, the following describes the current implementation logic.
StartWorkflow
Starts a new instance of a workflow.
Request (StartWorkflowRequest):
| Field | Type | Description |
|---|---|---|
| instance_id | string | Optional. A unique identifier for the workflow instance. If not provided, Dapr will generate a random UUID. |
| workflow_component | string | The name of the workflow component to use. Currently, Dapr uses the built-in engine. |
| workflow_name | string | The name of the workflow definition to execute. |
| options | map<string, string> | Optional. Component-specific options. |
| input | bytes | Optional. Input data for the workflow instance, typically a JSON-serialized string. |
Response (StartWorkflowResponse):
| Field | Type | Description |
|---|---|---|
| instance_id | string | The ID of the started workflow instance. |
GetWorkflow
Retrieves the current status and metadata of a workflow instance.
Request (GetWorkflowRequest):
| Field | Type | Description |
|---|---|---|
| instance_id | string | The ID of the workflow instance to query. |
| workflow_component | string | The name of the workflow component. |
Response (GetWorkflowResponse):
| Field | Type | Description |
|---|---|---|
| instance_id | string | The ID of the workflow instance. |
| workflow_name | string | The name of the workflow. |
| created_at | Timestamp | The time the instance was created. |
| last_updated_at | Timestamp | The time the instance was last updated. |
| runtime_status | string | The status (e.g., RUNNING, COMPLETED, FAILED, TERMINATED, PENDING). |
| properties | map<string, string> | Additional component-specific metadata. |
TerminateWorkflow
Forcefully terminates a running workflow instance.
Request (TerminateWorkflowRequest):
| Field | Type | Description |
|---|---|---|
| instance_id | string | The ID of the workflow instance to terminate. |
| workflow_component | string | The name of the workflow component. |
RaiseEventWorkflow
Sends an event to a running workflow instance.
Request (RaiseEventWorkflowRequest):
| Field | Type | Description |
|---|---|---|
| instance_id | string | The ID of the workflow instance. |
| workflow_component | string | The name of the workflow component. |
| event_name | string | The name of the event to raise. |
| event_data | bytes | The data associated with the event. |
PauseWorkflow & ResumeWorkflow
Pauses or resumes a workflow instance.
Request (PauseWorkflowRequest / ResumeWorkflowRequest):
| Field | Type | Description |
|---|---|---|
| instance_id | string | The ID of the workflow instance. |
| workflow_component | string | The name of the workflow component. |
PurgeWorkflow
Removes all state and history associated with a workflow instance. This can usually only be done for completed,
failed, or terminated instances.
Request (PurgeWorkflowRequest):
| Field | Type | Description |
|---|---|---|
| instance_id | string | The ID of the workflow instance. |
| workflow_component | string | The name of the workflow component. |
ListInstanceIDs (Task Hub Protocol Only)
Retrieves a list of workflow instance IDs, optionally filtered by status or name. This is currently part of the
internal Task Hub protocol and used for pagination in management tools.
Request (ListInstanceIDsRequest):
| Field | Type | Description |
|---|---|---|
| page_size | int32 | The maximum number of IDs to return. |
| continuation_token | string | An opaque token used to retrieve the next page of results. |
Response (ListInstanceIDsResponse):
| Field | Type | Description |
|---|---|---|
| instance_ids | repeated string | The list of instance IDs. |
| continuation_token | string | A token for the next page of results. |
Continuation Tokens and Pagination
The continuation_token is an opaque string generated by the Dapr runtime (and the underlying state store).
Its purpose is to allow clients to reliably paginate through large sets of workflow instances without loading all
IDs into memory at once.
SDK Requirements for Continuation Tokens:
- Opaqueness: The SDK MUST treat this token as a black box. It should not attempt to parse, modify, or
construct its own tokens.
- State Management: When the SDK receives a ListInstanceIDsResponse, it should store the continuation_token if it intends to fetch more results.
- Request Propagation: To fetch the next page, the SDK MUST pass the exact continuation_token received from the previous response into the next ListInstanceIDsRequest.
- Termination: An empty or null continuation_token in the response indicates that there are no more pages to retrieve.
Runtime Behavior:
The runtime derives this token from the state store’s KeysLike operation. Because it is tied to the underlying
database’s pagination mechanism, the token may have an expiration or be tied to the specific query parameters
(like page_size) used in the initial request.
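The token rules above can be sketched as a client-side loop. Everything here is hypothetical: list_instance_ids stands in for the real ListInstanceIDs RPC, and its integer-string token exists only so the demo runs; real tokens are opaque and must never be constructed by the SDK.

```python
# Hypothetical sketch of the SDK-side pagination contract.

def list_instance_ids(page_size, continuation_token=None):
    # Stand-in for the RPC: pages over a fixed ID list.
    ids = [f"wf-{i}" for i in range(5)]
    start = int(continuation_token) if continuation_token else 0
    page = ids[start:start + page_size]
    next_start = start + page_size
    # An empty token signals the final page (Termination rule above).
    token = str(next_start) if next_start < len(ids) else ""
    return page, token

def all_instance_ids(page_size=2):
    results, token = [], None
    while True:
        page, token = list_instance_ids(page_size, token)
        results.extend(page)
        if not token:  # Termination: empty/null token means no more pages.
            return results

print(all_instance_ids())  # ['wf-0', 'wf-1', 'wf-2', 'wf-3', 'wf-4']
```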
Implementation Details
The sidecar receives these requests and translates them into operations on the underlying durabletask-go client.
For example, StartWorkflow calls the backend to create a new orchestration instance and enqueue an
ExecutionStarted event.
1.2 - Workflow Protocol - Execution API
Low-level description of the Workflow building block internals.
Workflow Execution API (Task Hub Protocol)
The Workflow Execution API is a low-level gRPC protocol used by Dapr Workflow SDKs to act as “Workers”. The SDK
connects to the Dapr sidecar via this protocol to poll for work and report completion.
The service is named TaskHubSidecarService.
Service Definition (gRPC)
service TaskHubSidecarService {
rpc GetWorkItems(GetWorkItemsRequest) returns (stream WorkItem);
rpc CompleteOrchestratorTask(OrchestratorResponse) returns (CompleteBatchResponse);
rpc CompleteActivityTask(ActivityResponse) returns (CompleteBatchResponse);
// ... other management methods
}
Worker Lifecycle
- Connection: The SDK opens a long-running server stream via GetWorkItems.
- Polling: The SDK receives WorkItem messages from the stream.
- Execution:
- If the work item is an Orchestration, the SDK retrieves and replays the history events to determine the next actions.
- If the work item is an Activity, the SDK executes the activity logic.
- Completion:
- For Orchestrations, the SDK calls CompleteOrchestratorTask with a list of actions to take.
- For Activities, the SDK calls CompleteActivityTask with the result or failure information.
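A minimal sketch of this lifecycle, with the gRPC stream and the two completion RPCs replaced by plain Python stand-ins; none of these names or shapes are the actual wire protocol.

```python
# Illustrative worker loop: dispatch each streamed work item to the right
# handler and collect the completion calls the SDK would make.

def run_worker(work_items, run_orchestrator, run_activity):
    completions = []
    for item in work_items:
        if item["kind"] == "orchestrator":
            # Replay history to decide the next actions (see Orchestration Lifecycle).
            actions = run_orchestrator(item["history"])
            completions.append(("CompleteOrchestratorTask", actions))
        elif item["kind"] == "activity":
            # Execute user activity logic directly.
            result = run_activity(item["name"], item["input"])
            completions.append(("CompleteActivityTask", result))
    return completions

stream = [
    {"kind": "activity", "name": "Add", "input": (2, 3)},
    {"kind": "orchestrator", "history": ["ExecutionStarted"]},
]
out = run_worker(
    stream,
    run_orchestrator=lambda history: ["ScheduleTask"],
    run_activity=lambda name, inp: sum(inp),
)
print(out)  # [('CompleteActivityTask', 5), ('CompleteOrchestratorTask', ['ScheduleTask'])]
```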
gRPC Service: TaskHubSidecarService
GetWorkItems
Opens a stream to receive work items for orchestrations and activities.
Request (GetWorkItemsRequest):
Typically empty or contains worker metadata.
Response (stream WorkItem):
A WorkItem can be one of:
- orchestrator_item: Contains history and new events for an orchestration.
- activity_item: Contains details for a single activity task.
CompleteOrchestratorTask
Reports the results of an orchestration execution.
Request (OrchestratorResponse):
- instance_id: The ID of the workflow instance.
- actions: A list of OrchestratorAction messages.
- custom_status: Optional string for user-defined status.
OrchestratorAction Types:
- ScheduleTask: Schedule a new activity.
- CreateTimer: Schedule a durable timer.
- CreateSubOrchestration: Start a child workflow.
- CompleteOrchestration: Mark the workflow as completed (success or failure).
- TerminateOrchestration: Forcefully terminate the instance.
- SendEvent: Send an event to another workflow.
CompleteActivityTask
Reports the result of an activity execution.
Request (ActivityResponse):
- instance_id: The ID of the workflow instance.
- task_id: The unique ID of the activity task.
- completion_token: The opaque token received in the ActivityWorkItem.
- result: The serialized output of the activity (if successful).
- failure_details: Details about the error (if failed).
Data Models
HistoryEvent
Workflows in Dapr are event-sourced. The state of an orchestration is rebuilt by replaying a sequence of
HistoryEvent messages.
Common event types:
- ExecutionStarted: Initial event containing workflow name and input.
- TaskScheduled: An activity was scheduled.
- TaskCompleted: An activity finished successfully.
- TaskFailed: An activity failed.
- TimerCreated: A timer was scheduled.
- TimerFired: A timer expired.
- OrchestrationCompleted: The workflow finished.
FailureDetails
Used to report errors from activities or orchestrations.
- error_type: A string identifying the type of error.
- error_message: A human-readable error message.
- stack_trace: Optional stack trace.
- is_non_retriable: Boolean flag indicating whether the failure should be treated as non-retriable.
Protocol Nuances
- Streaming: GetWorkItems is a server-to-client stream. Dapr pushes work to the SDK as it becomes available.
- Sticky Sessions: Dapr attempts to send work items for the same instance to the same worker if possible, but the SDK must not rely on this for correctness.
- Determinism: The SDK must ensure that the orchestration logic is deterministic. During replay, the SDK uses the history provided in the OrchestratorWorkItem to avoid re-executing actions that have already been recorded.
1.3 - Workflow Protocol - Orchestration Lifecycle
Low-level description of the Workflow building block internals.
Orchestration Lifecycle
This document describes the lifecycle of an orchestration at the protocol level, specifically how the Dapr engine and
the SDK interact to execute workflow logic reliably.
Replay-based Execution
Dapr Workflows use event sourcing and replay to maintain state. Instead of saving the entire state of the
worker process (stack, variables, etc.), Dapr saves a history of events that have occurred.
The Replay Loop
- Work Item Arrival: The Dapr engine sends an OrchestratorWorkItem to the SDK via the GetWorkItems stream. This work item contains the full history of the workflow instance plus any new events (e.g., an activity completion or an external event).
- Reconstruction: The SDK starts executing the orchestration function from the very beginning.
- Deterministic Execution: As the function executes, it encounters “tasks” (e.g., calling an activity, sleeping).
- For each task, the SDK checks the provided history to see if that task has already completed.
- If the task is in the history, the SDK returns the recorded result immediately without actually re-executing the task logic.
- If the task is NOT in the history, the SDK records that this task needs to be scheduled and suspends execution of the orchestration function (typically by throwing a special exception or returning a pending promise).
- Reporting: Once the orchestration function is suspended or completes, the SDK sends a CompleteOrchestratorTask request to Dapr. This request contains a list of Actions (e.g., ScheduleTask, CreateTimer) that the engine should perform.
- State Commitment: The Dapr engine receives the actions, updates the workflow history in the state store, and schedules any requested tasks (e.g., by sending work to an activity worker).
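The replay loop can be sketched as follows. This is an illustrative model, not SDK code: call_activity consults the history by task index, returns a recorded result if present, and otherwise raises a Suspend signal carrying the action the worker would report.

```python
# Minimal replay-based execution sketch. All names are illustrative.

class Suspend(Exception):
    """Raised to stop the orchestrator when a task has no recorded result yet."""
    def __init__(self, action):
        self.action = action

def make_context(history):
    counter = {"n": 0}
    def call_activity(name):
        idx = counter["n"]          # Position of this task in execution order.
        counter["n"] += 1
        if idx < len(history):      # Already completed: replay the recorded result.
            return history[idx]
        raise Suspend(("ScheduleTask", name))  # Not in history: schedule and suspend.
    return call_activity

def orchestrator(call_activity):
    a = call_activity("ActivityA")
    b = call_activity("ActivityB")
    return a + b

def replay(history):
    try:
        return ("CompleteOrchestration", orchestrator(make_context(history)))
    except Suspend as s:
        return s.action

print(replay([]))              # ('ScheduleTask', 'ActivityA')
print(replay(["foo"]))         # ('ScheduleTask', 'ActivityB')
print(replay(["foo", "bar"]))  # ('CompleteOrchestration', 'foobar')
```

Each call with a longer history replays further before suspending, which is exactly the progression the step-by-step example below walks through.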
Step-by-Step Example
Imagine a workflow: Activity A -> Activity B.
1. Workflow Start
- Engine: Enqueues ExecutionStarted event.
- SDK: Receives OrchestratorWorkItem with [ExecutionStarted].
- SDK: Runs function. Function calls Activity A.
- SDK: Checks history. Activity A is not there.
- SDK: Suspends. Sends CompleteOrchestratorTask with [ScheduleTask(Activity A)].
- Engine: Records TaskScheduled(Activity A) in history.
2. Activity A Completes
- Engine: Records TaskCompleted(Activity A, result="foo") in history.
- SDK: Receives OrchestratorWorkItem with [ExecutionStarted, TaskScheduled(A), TaskCompleted(A)].
- SDK: Runs function from the start.
- SDK: Function calls Activity A. SDK finds TaskCompleted(A) in history. Returns "foo".
- SDK: Function calls Activity B.
- SDK: Checks history. Activity B is not there.
- SDK: Suspends. Sends CompleteOrchestratorTask with [ScheduleTask(Activity B)].
- Engine: Records TaskScheduled(Activity B).
3. Workflow Completion
- Activity B completes.
- SDK: Receives history with both A and B completed.
- SDK: Runs function. Both A and B return results from history.
- SDK: Function finishes and returns a final result.
- SDK: Sends CompleteOrchestratorTask with [CompleteOrchestration(result="final")].
- Engine: Records OrchestrationCompleted and marks the instance as COMPLETED.
Critical Requirements for SDK Authors
1. Determinism
The orchestration function MUST be deterministic. It cannot use:
- Random numbers.
- Current date/time (must use a durable timer or a provided CurrentUtcDateTime).
- Direct IO (must be done in activities).
- Global state that can change between replays.
2. Patching (In-flight updates)
When a workflow is already running, you might need to update its logic. However, since workflows are replay-based,
changing the logic directly would break determinism for in-flight instances.
Dapr provides a Patching mechanism (e.g., ctx.IsPatched("patch-id")) to safely introduce changes:
- Logic Branching: The SDK provides an API to check if a specific “patch” is active for the current instance.
- Patch Recording: When a patch check is encountered during execution, the result (true/false) is recorded in
the workflow history.
- Consistency: Once a patch is recorded as active (or inactive) for an instance, it remains so for the lifetime
of that instance, even if the worker code changes or the instance is moved to another worker.
- Safety: The Dapr engine validates that the sequence of patches encountered during replay exactly matches the
sequence in history. If there’s a mismatch, the workflow enters a Stalled state to prevent data corruption.
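A sketch of this validation rule, assuming the recorded patch sequence must be a prefix of the patches encountered during replay (new patches may appear after the recorded ones); function and value names are illustrative.

```python
# Illustrative patch-consistency check: the replayed patch sequence must
# start with exactly the patches recorded in history, or the instance stalls.

def check_patches(encountered, recorded):
    """Return 'RUNNING' if replayed patch checks match history, else 'STALLED'."""
    if encountered[: len(recorded)] != recorded:
        return "STALLED"  # Mismatch: current code contradicts the instance's history.
    return "RUNNING"      # Recorded patches replayed in order; extras are newly recorded.

print(check_patches(["patch-1", "patch-2"], ["patch-1"]))  # RUNNING
print(check_patches(["patch-x"], ["patch-1"]))             # STALLED
```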
3. Named Versions (In-flight updates)
Dapr also provides a Named Versioning mechanism: the SDK maintains a registry of available named workflow versions. When it receives a request to initialize a new workflow by name, it consults the registry to determine whether that name maps to a different (newer) workflow version than the one specified, and it is responsible for redirecting the request to the intended “latest” version.
- Logic Branching: The SDK provides an API to register different versions for a given workflow name.
- Replay Consistency: The request to run a workflow may contain a property specifying a specific workflow name to
execute. This ensures that in-flight workflows will always run using the same workflow version whereas new workflows
will use the latest available version.
4. Stalled State
A workflow instance enters the STALLED state when the engine detects an unrecoverable condition that requires manual
intervention or a code fix to proceed. Common reasons include:
- Patch Mismatch: The current code’s patching logic contradicts the instance’s history.
- Execution Errors: A fatal error occurred that cannot be handled by retries.
When stalled, the instance stops execution but remains in the system. Once the underlying issue is resolved (e.g.,
the correct code version is deployed), the instance can be resumed or will automatically resume on the next event.
5. History Management
The SDK must efficiently search the history. Typically, this is done by maintaining a counter of tasks encountered
during execution and matching them against the sequence of events in the history.
6. Graceful Suspension
The SDK needs a mechanism to stop execution of the orchestration function when a task is scheduled but not yet
completed, without losing the ability to restart it later.
1.4 - Workflow Protocol - Activity Lifecycle
Low-level description of the Workflow building block internals.
Activity Lifecycle
Activities are the basic units of work in a Dapr Workflow. Unlike orchestrations, activities are not replayed and do not need to be deterministic. Each schedule produces one logical execution, but delivery is at-least-once, so retries may occur.
Execution Flow
- Scheduling: An orchestration requests an activity by sending a ScheduleTask action to the Dapr engine.
- Work Item Dispatch: The Dapr engine enqueues an activity task. When an activity worker (SDK) is available, the engine sends an ActivityWorkItem via the GetWorkItems stream.
- Execution: The SDK receives the ActivityWorkItem, which contains:
- name: The name of the activity to execute.
- input: The input data for the activity.
- instance_id: The ID of the workflow instance that scheduled the activity.
- task_id: A unique identifier for this specific activity execution.
- task_execution_id: A unique identifier for the specific attempt of this activity. This is useful for implementing idempotency in activity logic.
- completion_token: An opaque token used to correlate the response with this specific work item.
- Reporting: After the activity logic finishes, the SDK sends a CompleteActivityTask request back to Dapr.
- Success: The SDK provides the serialized output in the result field.
- Failure: The SDK provides failure_details (error message, type, stack trace).
Task Execution IDs
The task_execution_id (also known as the Task Execution Key) is a unique, runtime-generated string (typically a UUID)
that identifies a specific attempt to execute an activity task.
Why it matters to the SDK
While the Workflow SDK is generally stateless between work items, the task_execution_id provides critical context
for the Activity Worker:
- Distributed Idempotency: If an activity performs a side effect (e.g., charging a credit card), it should use the task_execution_id as an idempotency key.
- Distinguishing Retries: Unlike the task_id (which remains constant for a specific step in the workflow), the task_execution_id changes every time the engine retries the activity (e.g., due to a timeout or worker crash).
- Zombie Detection: If an activity worker takes too long and the engine times it out and retries on another worker, the original worker might eventually finish. By checking the task_execution_id against a persistent store or external API, the worker can determine if it is a “zombie” whose results are no longer wanted.
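A sketch of the idempotency pattern, using a plain dict as a stand-in for a persistent store; charge_card and ledger are hypothetical names, not Dapr APIs.

```python
# Illustrative idempotent side effect keyed by task_execution_id: redelivery
# of the same attempt must not apply the charge twice.

def charge_card(task_execution_id, amount, ledger):
    """Apply the charge at most once per execution attempt."""
    if task_execution_id in ledger:       # Duplicate delivery of the same attempt.
        return ledger[task_execution_id]
    ledger[task_execution_id] = amount    # Record the side effect under the attempt's key.
    return amount

ledger = {}
charge_card("exec-123", 42, ledger)
charge_card("exec-123", 42, ledger)       # Redelivery: no double charge.
print(sum(ledger.values()))  # 42
```

Note that because the task_execution_id changes on each engine-driven retry, this dedupes redelivery of a single attempt; deduping across attempts would need a business-level key instead.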
Implementation Guidelines for SDKs:
- Expose to User: The SDK MUST expose the task_execution_id to the activity implementation logic (e.g., via an ActivityContext).
- Do Not Cache: The SDK should not attempt to cache or reuse this ID across different work items.
- Opaque Usage: The SDK should treat the value as an opaque string. It is generated by the Dapr sidecar when the activity is dispatched and is not something the SDK needs to create or parse.
Completion Tokens
The completion_token is an opaque string generated by the Dapr runtime and delivered to the SDK as part of the
ActivityWorkItem.
Purpose and Intent
- Response Correlation: The sidecar uses the completion_token to reliably match an ActivityResponse (from CompleteActivityTask) to the original task it dispatched.
- Stateless Tracking: It allows the sidecar to remain stateless or minimize state lookups when receiving a completion, as the token contains (or points to) the necessary context (instance ID, task ID, etc.).
- Zombie Prevention: If an activity times out and is retried, the new attempt will have a different completion_token. If the original “zombie” worker eventually responds with the old token, the sidecar can easily identify and ignore the late response.
SDK Implementation Guidelines
- Capture: The SDK MUST capture the completion_token from the incoming ActivityWorkItem.
- Propagation: The SDK MUST include the exact same completion_token in the ActivityResponse sent via CompleteActivityTask.
- Opaqueness: The SDK MUST treat the token as a black box. It should not attempt to parse, modify, or construct its own tokens.
- Storage: While the activity is executing, the SDK must keep this token in memory (e.g., in the ActivityContext).
Task Activity IDs
In the Dapr runtime (specifically when using the Actors backend), activities are represented as actors. Each activity
execution has a unique Task Activity ID (also known as the Activity Actor ID).
The ID follows a specific pattern:
{workflowInstanceID}::{taskID}::{generation}
- workflowInstanceID: The unique ID of the workflow instance that scheduled the activity.
- taskID: The sequence number of the task within the workflow execution (e.g., 0, 1, 2…).
- generation: A counter that increments if the workflow is restarted or “continued as new”.
This unique ID ensures that activity executions are isolated and can be tracked reliably across retries and restarts.
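The ID pattern can be built and split mechanically; the helper names below are ours for illustration, not runtime APIs.

```python
# Illustrative helpers for the {workflowInstanceID}::{taskID}::{generation}
# Task Activity ID pattern described above.

def make_activity_actor_id(instance_id: str, task_id: int, generation: int) -> str:
    return f"{instance_id}::{task_id}::{generation}"

def parse_activity_actor_id(actor_id: str):
    # rsplit from the right so instance IDs containing "::" are still handled.
    instance_id, task_id, generation = actor_id.rsplit("::", 2)
    return instance_id, int(task_id), int(generation)

actor_id = make_activity_actor_id("order-wf-1", 2, 0)
print(actor_id)                           # order-wf-1::2::0
print(parse_activity_actor_id(actor_id))  # ('order-wf-1', 2, 0)
```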
Retries
Dapr handles activity retries based on the policy defined in the orchestration (if the SDK supports defining retry
policies in the ScheduleTask action). If an activity fails and a retry policy is in place, the engine will re-enqueue
the activity task after the specified delay.
From the activity worker’s perspective, a retry is simply a new ActivityWorkItem with the same name and input, but
potentially a different task_id (or the same, depending on the backend implementation).
Idempotency
Because activities might be executed more than once (e.g., if the worker crashes after execution but before reporting
completion), it is recommended that activity logic be idempotent where possible.
Comparison with Workflows
| Feature | Orchestration | Activity |
|---|---|---|
| Execution Style | Replay-based (Deterministic) | Direct execution |
| State | Managed via History Events | No internal workflow state |
| Side Effects | Forbidden (must use activities) | Allowed (IO, Database, etc.) |
| Lifetime | Can be long-running (days/months) | Usually short-lived |
| Connectivity | Connected via GetWorkItems | Connected via GetWorkItems |
1.5 - Workflow Protocol - State & History
Low-level description of the Workflow building block internals.
State and History Management
Dapr Workflows are event-sourced, meaning the state of a workflow is derived from a sequence of events. This document
describes how Dapr stores and manages this history and state.
Backend Storage: Dapr Actors
By default, the Dapr Workflow engine uses Dapr Actors as its storage backend. Each workflow instance is mapped to
a unique actor instance. This provides:
- Concurrency Control: Actors ensure that only one operation is happening on a workflow instance at a time.
- Reliability: Actor state is persisted in the configured Dapr State Store.
- Timers: Dapr Actors provide durable reminders which are used to implement workflow timers.
Workflow State Schema
The state of a workflow instance (actor) consists of several components:
1. Metadata
Stores high-level information about the instance:
- instance_id: The unique ID of the workflow.
- name: The name of the workflow.
- status: The current runtime status (Running, Completed, Stalled, etc.).
- version: The name of the workflow version and any active patches.
- created_at: Creation timestamp.
- last_updated_at: Last activity timestamp.
- input: Original input data.
- output: Final output data (if completed).
2. History
A sequence of HistoryEvent objects that record everything that has happened in the workflow.
To optimize for large histories, Dapr often stores history events in chunks or as separate keys in the state store:
- Key Format: wf-history-<instance_id>-<index>
- Event Content: Serialized protobuf message containing event type, timestamp, and type-specific data (e.g., TaskScheduled, TaskCompleted).
3. Inbox (Pending Events)
A collection of events that have occurred but have not yet been processed by the orchestrator (replayed). This includes:
- External events raised to the workflow.
- Completed activity results.
- Fired timers.
When the orchestrator next runs, it “drains” the inbox, moves those events into the history, and then replays the logic.
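The drain step can be sketched as a simple state transition; the dict layout below is illustrative, not the actual stored schema.

```python
# Illustrative inbox drain: pending events are appended to the history, the
# inbox is cleared, and the (now longer) history is what the next replay sees.

def drain_inbox(state):
    """Move pending events into history before the next replay."""
    state["history"].extend(state["inbox"])
    state["inbox"] = []
    return state["history"]

state = {
    "history": ["ExecutionStarted", "TaskScheduled"],
    "inbox": ["TaskCompleted", "TimerFired"],
}
print(drain_inbox(state))
# ['ExecutionStarted', 'TaskScheduled', 'TaskCompleted', 'TimerFired']
print(state["inbox"])  # []
```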
Replay and State Reconstruction
When a worker (SDK) receives a work item, Dapr provides the history events. The SDK reconstructs the internal state
of the orchestration (e.g., local variables, current execution point) by replaying these events in order.
Determinism and History
The history is the “source of truth”. If the orchestration code changes in a non-deterministic way (e.g., adding a new
activity call in the middle of existing code), the replay will fail because the code’s requests won’t match the
recorded history.
Purging State
When a workflow is purged, Dapr deletes the metadata and all associated history event keys from the state store. This
is typically done to clean up after finished workflows.
1.6 - Workflow Protocol - Versioning
Low-level description of the Workflow building block internals.
Workflow Versioning
Dapr Workflow supports versioning of workflow definitions, allowing you to update workflow logic while existing
instances continue to run on their original logic.
Named Workflow Versioning
When registering a workflow with the Dapr engine, you can provide a version name. This allows multiple versions of the
same workflow to coexist.
- Default Version: One version of a workflow can be marked as the default. If a client starts a workflow by name
without specifying a version, the default version is used.
- Specific Version: Clients can request a specific version of a workflow when starting a new instance.
Registration API
SDKs register versioned workflows using the AddVersionedOrchestrator (or similar) method in their task registry.
// Example (Internal Registry API)
registry.AddVersionedOrchestrator("MyWorkflow", "v2", true, MyWorkflowV2)
registry.AddVersionedOrchestrator("MyWorkflow", "v1", false, MyWorkflowV1)
Sidecar Versioning
The Dapr sidecar tracks which named version of a workflow an instance is running as well as the list of applied
patches observed up to that point in the workflow execution. This information is stored in the workflow
history within the OrchestratorStarted event’s version field.
message OrchestrationVersion {
string name = 1;
repeated string patches = 2;
}
When an instance is resumed (e.g., after an activity completes), the Dapr engine is responsible for recovering from stalls introduced by a version mismatch on the client. It does so by monitoring changes to the placement table, which indicate that a new SDK client has connected (potentially representing a newer replica of the app), and by dispatching a repeat of the last event to retry the operation. If the SDK is then able to complete the task successfully, the runtime changes the workflow status from Stalled back to Running.