My team doubled in size this year, and with that kind of growth it’s easy for costs to get out of hand on usage-based platforms like Databricks. But despite more users and more workflows, we’ve kept our total cost of ownership low. We did it by being intentional: keeping everything self-contained inside Databricks, avoiding expensive third-party ETL tools, and leaning on open source where it makes sense. Here’s what’s worked.

Smarter Monitoring

We require all clusters and jobs to be tagged with project, environment, and owner. This lets us tie usage directly back to use cases and users. If you can’t see where the money is going, you can’t control it. We also set up budget monitoring, a fairly recent feature in public preview, which alerts us whenever monthly spend crosses a threshold we set.
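For illustration, here’s a rough sketch of what that tagging looks like when a job is created through the Jobs API; the job name, notebook path, node type, and tag values below are our conventions, not anything Databricks prescribes:

```python
import os

import requests

host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]

job_spec = {
    "name": "nightly_ingest",
    "tasks": [{
        "task_key": "ingest",
        "notebook_task": {"notebook_path": "/Pipelines/ingest"},
        "new_cluster": {
            "spark_version": "15.4.x-scala2.12",
            "node_type_id": "i3.xlarge",
            "num_workers": 2,
            # These tags flow through to the billing/usage data, which is
            # what lets us attribute spend to a project, environment, owner.
            "custom_tags": {
                "project": "customer_360",
                "environment": "prod",
                "owner": "data-eng",
            },
        },
    }],
}

resp = requests.post(
    f"{host}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=job_spec,
)
resp.raise_for_status()
print("created job", resp.json()["job_id"])
```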

Effective Cluster Management

All pipelines run as job clusters: the cluster spins up, does its work, and shuts down, leaving no lingering compute. For our all-purpose compute, we typically set auto-termination to two hours, long enough to come back to between meetings but short enough to spare us the cost of idle clusters in the off-hours.
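As a minimal sketch (assuming the databricks-sdk Python package; the cluster name, node type, and tags are illustrative), the auto-termination window is just one field on the cluster spec:

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # reads host and token from env or ~/.databrickscfg

w.clusters.create(
    cluster_name="shared-analytics-dev",  # illustrative name
    spark_version="15.4.x-scala2.12",
    node_type_id="i3.xlarge",
    num_workers=2,
    # Two hours: long enough to come back to between meetings, short
    # enough that a forgotten cluster doesn't run all night.
    autotermination_minutes=120,
    custom_tags={"project": "analytics", "environment": "dev", "owner": "data-eng"},
)
```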

In-House Workflows

We deliberately avoided external orchestration and ETL tools like Airflow or Fivetran. Everything, from ingestion to transformation to scheduling, happens within Databricks using workflows, notebooks, alerts, and queries. That keeps the architecture simple and avoids tool sprawl.
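To make that concrete, here’s a minimal sketch of a scheduled, two-task pipeline defined entirely in Workflows, expressed as a Jobs API 2.1 payload; the names, notebook paths, and cron expression are illustrative:

```python
# Shared job-cluster spec, reusing the tagging convention from earlier.
tagged_cluster = {
    "spark_version": "15.4.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 2,
    "custom_tags": {"project": "sales", "environment": "prod", "owner": "data-eng"},
}

# This dict is the payload for POST /api/2.1/jobs/create.
job_spec = {
    "name": "daily_sales_pipeline",
    "schedule": {
        # Runs at 02:00 daily; Workflows is the scheduler, so no
        # external orchestrator is needed.
        "quartz_cron_expression": "0 0 2 * * ?",
        "timezone_id": "UTC",
        "pause_status": "UNPAUSED",
    },
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Pipelines/ingest_raw"},
            "new_cluster": tagged_cluster,
        },
        {
            "task_key": "transform",
            # Task dependencies replace the DAG definitions you'd
            # otherwise maintain in an external tool.
            "depends_on": [{"task_key": "ingest"}],
            "notebook_task": {"notebook_path": "/Pipelines/transform_silver"},
            "new_cluster": tagged_cluster,
        },
    ],
}
```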

Embrace Open-Source

As our team grew, it got harder to manage all the SQL and Python logic buried in notebooks. We’ve started piloting dbt Core (the free, open-source version) to bring structure and CI/CD to our SQL pipelines. It runs directly inside Databricks via Workflows, giving us automated testing without a SaaS subscription or vendor lock-in.
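Here’s a rough sketch of how a dbt Core task can be wired into a Workflows job through the Jobs API; the repo URL, warehouse ID, and cluster settings are hypothetical placeholders:

```python
dbt_job_spec = {
    "name": "dbt_nightly_build",
    "git_source": {
        "git_url": "https://github.com/example-org/dbt-project",  # hypothetical repo
        "git_provider": "gitHub",
        "git_branch": "main",
    },
    "tasks": [{
        "task_key": "dbt_build",
        "new_cluster": {
            "spark_version": "15.4.x-scala2.12",
            "node_type_id": "i3.xlarge",
            "num_workers": 1,
            "custom_tags": {"project": "analytics", "environment": "prod", "owner": "data-eng"},
        },
        # dbt-databricks must be available on the cluster running the dbt CLI.
        "libraries": [{"pypi": {"package": "dbt-databricks"}}],
        "dbt_task": {
            # `dbt build` runs models and their tests in one pass; that's
            # where the automated testing comes from.
            "commands": ["dbt deps", "dbt build"],
            "warehouse_id": "<sql-warehouse-id>",
        },
    }],
}
```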

Audit and Improve

I review our data engineering workflows every month, tracking metrics such as job count, average runtime, and failure rate. The data is pulled from the Databricks Jobs API and collated into a Tableau report that refreshes nightly. Surfacing the ten longest-running jobs tells me where to prioritize our optimization efforts.
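As a simplified sketch of the scraping step (the Tableau side is out of scope here), something like this pulls recently completed runs and ranks jobs by average runtime; it fetches only a single page, so a real monthly report would paginate:

```python
import os
from collections import defaultdict

import requests

host = os.environ["DATABRICKS_HOST"]
headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

# Pull recently completed runs; one page for brevity.
resp = requests.get(
    f"{host}/api/2.1/jobs/runs/list",
    headers=headers,
    params={"completed_only": "true", "limit": 25},
)
resp.raise_for_status()
runs = resp.json().get("runs", [])

durations = defaultdict(list)  # job -> list of run durations in minutes
failures = defaultdict(int)    # job -> count of non-successful runs
for run in runs:
    key = run.get("run_name") or str(run["job_id"])
    durations[key].append((run["end_time"] - run["start_time"]) / 60_000)
    if run["state"].get("result_state") != "SUCCESS":
        failures[key] += 1

# Rank by average runtime; the longest-running jobs are the first
# candidates for optimization.
def avg_minutes(job):
    return sum(durations[job]) / len(durations[job])

for job in sorted(durations, key=avg_minutes, reverse=True)[:10]:
    print(f"{job}: {avg_minutes(job):.1f} min avg over "
          f"{len(durations[job])} runs, {failures[job]} failures")
```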


Scaling doesn’t have to mean spending more. When I compare notes with peers, our SaaS contracts are consistently among the smallest, yet (and I’m humble-bragging a bit here) our team is very successful at growing and maintaining our data footprint. Databricks gives you all the tools to run lean; you just have to use them intentionally. We’ve kept our stack tight, built everything natively, and brought in open-source tools like dbt Core only when the complexity called for it.

The result is a modern, scalable data platform that doesn’t blow the budget — even as our headcount keeps growing.