DDL and DML are the core languages for managing data in Databricks. DDL (Data Definition Language) creates and modifies table structures, while DML (Data Manipulation Language) handles data operations like inserts, updates, and deletes. What makes Databricks special is that Delta Lake extends both with powerful features like time travel, atomic transactions, and upsert operations that aren't available with traditional data warehouses.
For the Associate exam, you need to understand how to create and modify tables, query data at different versions, optimize table performance, and use MERGE for upsert patterns. The exam heavily tests your knowledge of Delta specific capabilities like time travel and OPTIMIZE/VACUUM commands.
These commands are used every day in production data pipelines. Understanding them well means you can build efficient, maintainable data solutions and debug issues when they arise.
MERGE INTO: The Classic Upsert Pattern
MERGE INTO customers t
USING new_customers s
ON t.customer_id = s.customer_id
WHEN MATCHED AND s.is_deleted = true THEN DELETE
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
This statement handles three cases: if a customer exists and is marked deleted, remove it; if they exist, update their record; if they don't exist, insert them. The * syntax matches all columns automatically.
Time Travel Queries