Databricks provides several built-in debugging tools that help data engineers understand, troubleshoot, and optimize their data pipelines. These tools give you visibility into every layer of your application, from cluster resource usage to individual task execution times. When something goes wrong, quick access to the right debugging information can save you hours of guesswork.
The exam expects you to know which tool to reach for in different scenarios. If you need to understand why a Spark job is slow, you'll want the Spark UI. If you need to see what a notebook cell printed out, you check the cell output. If you're debugging a cluster configuration issue, you look at the cluster event log. Understanding the purpose and access point for each tool is critical.
The good news is that most of these tools are already available in the Databricks interface. You don't need to install anything or write complicated logging code. They're just waiting for you to find them and use them.
# Using print() for simple output (shows in cell output and driver logs)
print("Processing started at", spark.sql("SELECT current_timestamp()").collect()[0][0])
# Using display() to render a DataFrame as an interactive table in the cell output
df = spark.read.table("bronze_customers")
display(df.limit(100))
# Debugging with print() and a condition
row_count = df.count()
print(f"Total rows in bronze_customers: {row_count}")
if row_count == 0:
    print("WARNING: Table is empty!")
# Using display() with SQL for interactive exploration
display(spark.sql("SELECT * FROM bronze_customers WHERE status = 'ACTIVE' LIMIT 50"))
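The empty-table check above can also be written with Python's standard logging module, which attaches a severity level to each message. The following is a minimal sketch, not a Databricks-specific API: the logger name `pipeline_debug` and the helper `check_row_count` are hypothetical names chosen for illustration, and whether and where the messages appear (cell output, driver logs) depends on how logging is configured in your workspace.

```python
import logging

# Hypothetical logger name, chosen for this example only.
logger = logging.getLogger("pipeline_debug")


def check_row_count(table_name: str, row_count: int) -> bool:
    """Log the row count and return True when the table has at least one row."""
    logger.info("Total rows in %s: %d", table_name, row_count)
    if row_count == 0:
        # Same condition as the print()-based check, but tagged as a warning.
        logger.warning("Table %s is empty!", table_name)
        return False
    return True
```

In a notebook, this could be called as `check_row_count("bronze_customers", df.count())`, reusing the `df` loaded earlier.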
-- Check cluster configurations (this table records configuration history, not live cluster state)
SELECT cluster_id, cluster_name, owned_by, worker_count, driver_node_type, worker_node_type, change_time
FROM system.compute.clusters
WHERE delete_time IS NULL
ORDER BY cluster_name, change_time DESC;
-- Audit log: who accessed which table and when (action names are camelCase, e.g. getTable)
SELECT event_time, user_identity.email AS user_email, service_name, action_name, request_params.full_name_arg AS table_name
FROM system.access.audit
WHERE action_name IN ('getTable', 'createTable', 'deleteTable')
ORDER BY event_time DESC
LIMIT 100;
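-- Find recent access to a specific table (full_name_arg holds catalog.schema.table)
SELECT event_time, user_identity.email AS user_email, action_name
FROM system.access.audit
WHERE action_name IN ('getTable', 'createTable', 'deleteTable')
  AND request_params.full_name_arg LIKE '%.silver_orders'
ORDER BY event_time DESC
LIMIT 50;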
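The audit queries can also be run from a Python notebook cell. The sketch below builds the query text with a named parameter marker so the table name is passed as a value instead of being pasted into the SQL string. It rests on several assumptions: the helper `build_table_audit_query` and the marker `:table_pattern` are hypothetical names, the columns `event_time`, `action_name`, and `request_params.full_name_arg` are taken from the documented `system.access.audit` schema, and the commented call assumes a runtime where `spark.sql` accepts `args` (PySpark 3.4 or later).

```python
def build_table_audit_query(limit: int = 50) -> str:
    """Return audit-log SQL that filters on a :table_pattern parameter marker."""
    # Reject bools explicitly, since bool is a subclass of int in Python.
    if isinstance(limit, bool) or not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT event_time, user_identity.email AS user_email, action_name "
        "FROM system.access.audit "
        "WHERE request_params.full_name_arg LIKE :table_pattern "
        "ORDER BY event_time DESC "
        f"LIMIT {limit}"
    )


# In a Databricks notebook (requires an active `spark` session):
# display(spark.sql(build_table_audit_query(50),
#                   args={"table_pattern": "%.silver_orders"}))
```

Only the validated integer limit is formatted into the string; the table pattern travels separately through `args`, which avoids quoting mistakes when the table name comes from user input.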