Topic Overview

Databricks offers several types of compute, and picking the right one for the job is a common exam topic. The three main categories are All-Purpose Clusters, Jobs Clusters, and SQL Warehouses. Each is designed for specific workloads, and choosing the wrong one either wastes money or delivers poor performance.

The exam will present scenarios and ask you to identify which compute type is most appropriate. The key is to match the workload type (interactive development, scheduled production jobs, SQL analytics) to the right compute resource. You should also know about Serverless compute, which is Databricks' managed option that removes the need to configure and manage clusters yourself.

This is one of those topics where understanding the trade-offs matters more than knowing deep technical details. Think about cost, startup time, who the user is, and what they are trying to do.
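The cost trade-off can be made concrete with a little arithmetic. The sketch below compares leaving an always-on cluster running versus paying only while a nightly job executes; the hourly rates are illustrative placeholders, not real Databricks DBU pricing, which varies by cloud, region, and instance type.

```python
# Hypothetical hourly rates -- placeholders for illustration only,
# NOT real Databricks pricing.
ALL_PURPOSE_RATE = 0.55  # $/hour, assumed
JOBS_RATE = 0.30         # $/hour, assumed

DAYS = 30
JOB_HOURS_PER_NIGHT = 1

# An always-on cluster is billed around the clock, even when idle.
always_on_cost = ALL_PURPOSE_RATE * 24 * DAYS

# A jobs cluster exists only while the nightly job runs.
on_demand_cost = JOBS_RATE * JOB_HOURS_PER_NIGHT * DAYS

print(f"Always-on for a month:  ${always_on_cost:.2f}")
print(f"On-demand for a month:  ${on_demand_cost:.2f}")
```

Even with made-up rates, the gap is dramatic: paying for 720 hours a month versus 30 hours is the core reason scheduled workloads belong on ephemeral compute.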


Key Concepts


Quick Comparison

| Compute Type | Best For | Lifecycle | Cost | Languages |
|---|---|---|---|---|
| All-Purpose Cluster | Interactive dev, notebooks, exploration | Manual start/stop | Higher (always on) | Python, SQL, Scala, R |
| Jobs Cluster | Production ETL, scheduled jobs | Auto-created/terminated | Lower (on demand) | Python, SQL, Scala, R |
| SQL Warehouse | SQL queries, dashboards, BI tools | Auto-scaling | Optimized for SQL | SQL only |
| Serverless | Any workload, zero management | Fully managed | Pay per use | Depends on type |

Common Exam Scenarios

Scenario 1: Choosing Compute for a Production Pipeline

A data engineer has developed an ETL pipeline in a notebook and wants to schedule it to run every night at midnight. The pipeline processes sales data and loads it into a gold table. Which compute type should they use?

The answer is a Jobs Cluster (or Serverless compute for jobs). Since this is a scheduled production workload, you want compute that spins up when the job starts and terminates when it finishes. Using an All-Purpose Cluster for a scheduled nightly job would mean either keeping it running 24/7 (expensive) or relying on someone to start it before each run. Jobs Clusters are created automatically by the scheduler and terminated immediately after the job completes.
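A setup like this can be sketched as a Databricks Jobs API 2.1 payload. The job name, notebook path, cron expression, node type, and worker count below are all placeholder values; the key point is the `new_cluster` block, which tells the scheduler to provision a fresh jobs cluster for each run and tear it down afterward.

```json
{
  "name": "example-nightly-sales-etl",
  "schedule": {
    "quartz_cron_expression": "0 0 0 * * ?",
    "timezone_id": "UTC",
    "pause_status": "UNPAUSED"
  },
  "tasks": [
    {
      "task_key": "load_gold_sales",
      "notebook_task": {
        "notebook_path": "/Repos/example/sales_pipeline"
      },
      "new_cluster": {
        "spark_version": "13.3.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": 2
      }
    }
  ]
}
```

If the task referenced an `existing_cluster_id` instead of `new_cluster`, it would run on an all-purpose cluster, which is exactly the pattern the scenario warns against for production schedules.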

Scenario 2: Powering a BI Dashboard