June 27, 2025

SnowPro – Streams and Tasks

Streams 

  • Are used for Change Data Capture
  • To query a stream, a role must have the SELECT privilege on the underlying table
  • Stream objects record DML changes: INSERT, UPDATE and DELETE (see the CDC sketch after this list)
  • The table whose changes are recorded by a stream is called the SOURCE table
  • Any number of stream objects can be created on a single SOURCE table
  • Streams have no Fail-safe or Time Travel retention period
  • Metadata of streams CANNOT be recovered if the Stream is DROPPED
  • Standard streams cannot retrieve change data for geospatial data; use APPEND ONLY streams instead
  • Streams can be queried multiple times in the SAME transaction and return the same data
  • Streams can be cloned
  • Cloned Stream objects inherit the current offset
  • Stream objects CANNOT be SHARED like a table object
  • Creating an initial stream automatically enables change tracking on a table
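
A minimal sketch of the CDC pattern described above, using hypothetical objects (orders, orders_stream, orders_history and their columns are illustrative, not from these notes):

  -- Create a standard stream on the source table (also enables change tracking)
  CREATE OR REPLACE STREAM orders_stream ON TABLE orders;

  -- Querying the stream returns the rows changed since the current offset
  SELECT * FROM orders_stream;

  -- Consuming the stream in a DML statement advances the offset
  INSERT INTO orders_history (order_id, amount)
    SELECT order_id, amount FROM orders_stream;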

Streams Source Objects 

  • Permanent tables
  • Transient tables
  • Temporary tables
  • External tables
  • Directory tables
  • Views (DML changes to the underlying tables are tracked)

Types of Streams

  • Standard (Default)
  • Insert only 
  • Append only
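
A sketch of creating each type, again with hypothetical object names (orders is a standard table, orders_ext an external table):

  -- Standard (default): records INSERTs, UPDATEs and DELETEs
  CREATE STREAM orders_std ON TABLE orders;

  -- Append-only: records INSERTs only (standard tables and views)
  CREATE STREAM orders_append ON TABLE orders APPEND_ONLY = TRUE;

  -- Insert-only: records INSERTs only (external tables)
  CREATE STREAM orders_ext_ins ON EXTERNAL TABLE orders_ext INSERT_ONLY = TRUE;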

Stream columns

A stream adds 3 metadata columns to those of the source table (queried in the sketch after this list)

  • METADATA$ACTION – records the DML operation (INSERT or DELETE)
  • METADATA$ISUPDATE – records if the operation was part of an UPDATE
  • METADATA$ROW_ID – immutable ID for a row
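
In a standard stream, an UPDATE appears as a DELETE/INSERT pair with METADATA$ISUPDATE set to TRUE, so the metadata columns can separate true deletes from updates. A sketch against the hypothetical orders_stream:

  -- True deletes only: DELETE actions that were not part of an UPDATE
  SELECT order_id, METADATA$ACTION, METADATA$ISUPDATE, METADATA$ROW_ID
  FROM orders_stream
  WHERE METADATA$ACTION = 'DELETE' AND METADATA$ISUPDATE = FALSE;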

Shared Tables

  • Stream objects can be created on SHARED tables
  • Unlike for local tables, the retention period of a shared source table is not automatically extended to prevent streams on it from becoming stale

Streams on Views

  • Any stream on a given view breaks if the view or the underlying tables are dropped or recreated (see the sketch after this list)
  • Any stream on a Secure View adheres to the constraints of the Secure View
  • Streams based on views where the view uses non-deterministic functions can return non-deterministic results
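
A sketch of a stream on a view, with hypothetical names; change tracking must be enabled on the underlying tables (creating the stream attempts this, given sufficient privileges):

  -- Hypothetical view over the source table
  CREATE OR REPLACE VIEW recent_orders AS
    SELECT order_id, amount FROM orders WHERE amount > 0;

  -- DML changes to the underlying table surface through the stream
  CREATE STREAM recent_orders_stream ON VIEW recent_orders;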

Repeatable Read Isolation

  • In repeatable read mode, multiple SQL statements within the SAME transaction see the same set of records in a stream (illustrated below)
  • The stream position advances to the transaction start time if the transaction commits
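
A sketch of both behaviours in one transaction, using the hypothetical objects from above:

  BEGIN;
  -- Both reads see the same stream contents, even if the source table changes meanwhile
  SELECT COUNT(*) FROM orders_stream;
  SELECT COUNT(*) FROM orders_stream;
  -- Consuming the stream in DML, then committing, advances the offset
  INSERT INTO orders_history (order_id, amount)
    SELECT order_id, amount FROM orders_stream;
  COMMIT;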

Tasks 

  • Can execute a single SQL statement, a stored procedure call, or procedural logic using Snowflake Scripting
  • Accounts are currently limited to 10,000 resumed tasks
  • Can be combined with a table stream for continuous ELT workflows
  • Either Snowflake-managed compute or user-managed warehouses can be used
  • Omit the WAREHOUSE parameter while creating a task to use Snowflake-managed compute (see the sketch after this list)
  • Creating a task with Snowflake-managed compute requires the global EXECUTE MANAGED TASK privilege
  • The maximum size Snowflake will scale managed compute up to is equivalent to an XXLARGE warehouse
  • Snowflake-managed compute tasks (Serverless Tasks) cannot invoke UDFs containing Java or Python code
  • Serverless tasks also cannot invoke Stored Procedures written in Scala, or ones that call UDFs containing Java or Python code
  • Requires the CREATE TASK privilege on the schema where the task is being created
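
A sketch of both variants, tying a task to the hypothetical stream from earlier (all object names illustrative):

  -- Serverless task: WAREHOUSE is omitted, so Snowflake-managed compute is used
  CREATE TASK load_orders
    SCHEDULE = '5 MINUTE'
    WHEN SYSTEM$STREAM_HAS_DATA('ORDERS_STREAM')
    AS
      INSERT INTO orders_history (order_id, amount)
        SELECT order_id, amount FROM orders_stream;

  -- User-managed compute: name a warehouse instead
  CREATE TASK load_orders_wh
    WAREHOUSE = my_wh
    SCHEDULE = '5 MINUTE'
    AS CALL load_orders_proc();

  -- Newly created tasks start SUSPENDED and must be resumed
  ALTER TASK load_orders RESUME;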

DAG

  • A series of tasks anchored by a single root task
  • DAG flow is in a single direction; a task cannot trigger a task earlier in the series
  • A task runs only after all of its predecessors have run successfully to completion
  • Each task except for the ROOT task can have multiple predecessor tasks
  • Each task can have multiple child tasks
  • The ROOT task should have a defined schedule that initiates the run of a DAG (a minimal DAG is sketched after this list)
  • Each task other than the ROOT task should have at least one predecessor task
  • A DAG is limited to 1000 tasks in total, including the ROOT task
  • A single task can have a maximum of 100 predecessor tasks and a maximum of 100 child tasks
  • All tasks in a DAG must be in the same schema
  • The ROOT task must be suspended before any task is recreated
  • The ROOT task must be suspended before any new task can be added
  • Newly created tasks or cloned tasks are in the SUSPENDED state
  • If a different role clones a task, the task clone keeps the schedule but not the predecessor
  • Snowflake guarantees only one instance of a task with a schedule is running at a given time
  • Snowflake does not guarantee that only one instance of a task with a predecessor is running at a given time
  • When a DAG runs with one or more SUSPENDED tasks, those suspended tasks are ignored
  • A task with multiple predecessor tasks runs if at least one predecessor is in the RESUMED state and all resumed predecessor tasks run to successful completion
  • A brief lag occurs after a parent task finishes running and before a child task is executed
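
A minimal DAG sketch, assuming the hypothetical warehouse and procedures shown (my_wh, extract_proc, transform_proc):

  -- Root task carries the schedule that initiates each DAG run
  CREATE TASK root_task
    WAREHOUSE = my_wh
    SCHEDULE = 'USING CRON 0 2 * * * UTC'
    AS CALL extract_proc();

  -- Child task declares its predecessor with AFTER
  CREATE TASK child_task
    WAREHOUSE = my_wh
    AFTER root_task
    AS CALL transform_proc();

  -- Resume children before the root so the whole DAG is runnable
  ALTER TASK child_task RESUME;
  ALTER TASK root_task RESUME;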

Scheduling

  • A standalone task or the ROOT task in a DAG is generally run on a schedule
  • Snowflake ensures only one instance of a task with a schedule is running at a given time
  • If a task is running when the next scheduled execution time occurs, then that scheduled time is skipped
  • EXECUTE TASK manually triggers a single run of a scheduled task independent of the schedule defined on the task (example below)
  • Requires the global EXECUTE TASK privilege to manually execute a task
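
A sketch with the hypothetical load_orders task from earlier:

  -- Cron schedule: run daily at 06:00 UTC
  ALTER TASK load_orders SET SCHEDULE = 'USING CRON 0 6 * * * UTC';

  -- Trigger a single run now, regardless of the schedule
  EXECUTE TASK load_orders;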

Ownership

  • All tasks in a DAG must have the same owner
  • Transferring ownership of a task severs the dependencies between any predecessors or children
  • If the role that owns all tasks in a DAG is dropped using DROP ROLE, ownership of those tasks is transferred to the role that executes the DROP ROLE command
  • Executing GRANT OWNERSHIP on all tasks in a schema explicitly transfers all tasks in a DAG to another role (example below)
  • If any task in the DAG is cloned, the role that performs the clone becomes the owner of the clone
  • If the owner of an original task creates the clone, the task clone retains the link to the predecessor; the original predecessor task triggers both the original task and the cloned task
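
A sketch of the schema-wide transfer, with hypothetical database, schema and role names:

  -- Transferring all tasks at once keeps the DAG links intact
  GRANT OWNERSHIP ON ALL TASKS IN SCHEMA mydb.etl
    TO ROLE new_etl_role COPY CURRENT GRANTS;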

History

  • ACCOUNTADMIN, task owners, or roles with the global MONITOR EXECUTION privilege can view task history
  • TASK_HISTORY function to view history for a single task (example below)
  • CURRENT_TASK_GRAPHS function to view details of currently scheduled or running DAGs
  • COMPLETE_TASK_GRAPHS function to view history of DAGs that executed successfully, failed, or were cancelled in the last 60 minutes
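
A sketch using TASK_HISTORY (task name hypothetical):

  -- Runs of a single task over the last 24 hours
  SELECT name, state, scheduled_time, completed_time
  FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(
    SCHEDULED_TIME_RANGE_START => DATEADD('hour', -24, CURRENT_TIMESTAMP()),
    TASK_NAME => 'LOAD_ORDERS'));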