Future of the Metrics Layer with Drew Banin (dbt) and Nick Handel (Transform)

Hot takes on what we get wrong about the metrics layer and where it fits in the modern data stack

How would you explain the metrics layer to a beginner data analyst?

Since it’s a new concept, there’s a lot of confusion about what the metrics layer really is. Drew and Nick cut through the confusion with succinct definitions centered on creating a common source of truth for metrics.

“Define your metrics once and reference them everywhere so that if your metrics ever change, you get updated results everywhere you look at data.”

Nick Handel: “The way that I’ve explained it to family and people who are totally out of the space is just, businesses have data. They use that data to measure their operations. The point of this software is basically to make it really easy for the data analysts (the people who are responsible for measuring that data) to define those metrics, and make it easy for the rest of the business to consume that single correct way to measure that data.”
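To make "define once, reference everywhere" concrete, here is a minimal sketch in Python. It assumes a hypothetical in-memory registry; the names `Metric`, `REGISTRY`, and `compile_sql`, along with the `orders` table and its columns, are made up for illustration and are not the dbt or Transform API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Metric:
    """One shared definition of a business metric (hypothetical structure)."""
    name: str
    table: str        # source table; an assumed name for illustration
    expression: str   # SQL aggregate that defines the metric


# The single place where metrics are defined.
REGISTRY = {
    "revenue": Metric(name="revenue", table="orders", expression="SUM(amount)"),
}


def compile_sql(metric_name: str, group_by: Optional[str] = None) -> str:
    """Build a query from the shared definition so every consumer agrees."""
    metric = REGISTRY[metric_name]
    if group_by is None:
        return f"SELECT {metric.expression} AS {metric.name} FROM {metric.table}"
    return (
        f"SELECT {group_by}, {metric.expression} AS {metric.name} "
        f"FROM {metric.table} GROUP BY {group_by}"
    )


# A dashboard and a notebook both reference the same definition.
dashboard_query = compile_sql("revenue", group_by="region")
notebook_query = compile_sql("revenue")
```

If the definition of `revenue` changes in the registry, both queries pick up the change the next time they are compiled, which is the behavior the quotes above describe.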

What is the real problem the metrics layer is looking to solve?

Nick and Drew explained that the metrics layer is motivated by two key ideas: productivity and trust.

“It comes down to those two things: productivity and trust. Is it easy to produce the metric, and is it the right metric? And can you put it into whatever application you’re trying to serve?”

Drew: “That’s really good framing. I just look inwards at our organization. The very first metric we ever created was weekly active projects — how many dbt projects were run in the previous seven days? Now we’re about 250 people and we’re measuring so many things across the business with lots of new people around.

“We’re trying to make sure that when someone says ‘weekly active accounts’ or ‘MRR’ or ‘MRR split by managed versus self-service’, we all mean exactly the same thing.”
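Even a metric as simple as weekly active projects hides choices that people can make differently. The sketch below is one possible reading of "how many dbt projects were run in the previous seven days"; the function name, the `(project_id, run_date)` log format, and the half-open window are all assumptions for illustration, not how dbt Labs computes it.

```python
from datetime import date, timedelta


def weekly_active_projects(runs, as_of):
    """Count distinct projects with at least one run in the seven days before `as_of`.

    The window is [as_of - 7 days, as_of): the start day is included and
    `as_of` itself is excluded. That boundary is an assumption of this sketch.
    """
    window_start = as_of - timedelta(days=7)
    active = {
        project_id
        for project_id, run_date in runs
        if window_start <= run_date < as_of
    }
    return len(active)


# Hypothetical run log: (project_id, run_date) pairs.
runs = [
    ("project_a", date(2022, 3, 1)),
    ("project_a", date(2022, 3, 3)),   # same project, counted once
    ("project_b", date(2022, 3, 5)),
    ("project_c", date(2022, 2, 10)),  # outside the window
]

count = weekly_active_projects(runs, as_of=date(2022, 3, 7))
```

Whether the window includes today, or whether failed runs count, are the kinds of details that drift between teams unless the definition lives in one place.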

Drew and Nick also emphasized change management as both a major challenge and use case for the metrics layer.

How should we think about the metrics layer, and how should it interplay with other components of the modern data stack?

Nick broke the metrics layer down into four key components (semantics, performance, querying, and governance), while Drew focused on its role as a network connecting a diverse set of data tools.

“That’s why the idea that we call this the ‘metrics layer’ makes sense. It is a single abstraction layer that everything can interface with so that you can get precise and consistent definitions in every single tool.

“To me, that’s where metadata really shines. Like, this is the metric, this is how it’s defined, this is its provenance, here’s where it’s used. This isn’t actually the data itself. It’s attributes of the data. That’s the information that can synchronize all these different tools together around shared data definitions.”

What metadata should we be tracking about our metrics, and why?

Nick and Drew shared that metadata is key for understanding metrics because companies lose important tribal knowledge about data outages and anomalies over time as staff changes.

“Products change, tables change, everything changes. Even the definitions of these metrics evolve. But most businesses end up tracking the same North Star metrics from the very early days. If you can attach metadata to it, that is incredibly valuable.

“At Airbnb, we tracked nights booked. It was important from the very early days when BI was literally a printed-off graph that they put on the wall, and it’s still the most important metric that the company talks about in the public earnings calls. If we had been tracking important metadata through time of what was happening to that metric, there would be a wealth of knowledge that the company could use.”

“In practice, the people that have been around for the longest time have the most context and probably know more than any of our actual systems do.

“We had a period where we had a little bit of data loss for some events we were tracking. It looked like, I think it was, May 2021 was the worst month ever. But really it was just like, no, we didn’t collect the data.”
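One way to keep that kind of tribal knowledge from leaving with people is to attach it to the metric itself. Here is a minimal sketch, assuming hypothetical `MetricMetadata` and `Annotation` structures; the field names, the metric name, and the exact date range are illustrative (the source only mentions May 2021).

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Annotation:
    """A dated note explaining something unusual about a metric."""
    start: date
    end: date
    note: str


@dataclass
class MetricMetadata:
    """Attributes of a metric, not the data itself (hypothetical structure)."""
    name: str
    definition: str
    owner: str
    used_in: list = field(default_factory=list)
    annotations: list = field(default_factory=list)

    def notes_for(self, day):
        """Return the notes of every annotation whose range covers `day`."""
        return [a.note for a in self.annotations if a.start <= day <= a.end]


events = MetricMetadata(
    name="tracked_events",
    definition="COUNT(*) over the events table",
    owner="analytics",
    used_in=["growth dashboard"],
)
# Assumed date range for the data-loss period described in the quote.
events.annotations.append(
    Annotation(
        start=date(2021, 5, 1),
        end=date(2021, 5, 31),
        note="Partial data loss in event collection; low values are not real.",
    )
)

may_notes = events.notes_for(date(2021, 5, 15))
june_notes = events.notes_for(date(2021, 6, 15))
```

A BI tool that reads this metadata could show the note next to the dip, so a new analyst does not conclude that May was genuinely the worst month.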

What are the real use cases for a metrics layer?

Drew and Nick called out a lot of potential applications for the metrics layer — e.g. improving BI and analytics for early-stage data teams, helping business and data people use data models in the same way, and making valuable but time-consuming applications (like experimentation, forecasting, and anomaly detection) possible for all companies.

“Many companies out there are not at the data science and machine learning part of their journeys yet. Things that make business intelligence and reporting better (more precise and more consistent) cover 90% of the problems that they’re trying to solve with data.

“Casting our minds forward, I think that there could be a ton of benefits to leveraging metrics for data science use cases.

“With machine learning, you try and get as close to the raw data sets as possible. With analytical applications, you try and process that information into the clearest and best picture of the world.

“One of the applications that I always think about is experimentation. The reason we built a metrics repo initially was experimentation.

“Basically, we needed some programmatic way to go and construct metrics. It’s a hugely valuable application for companies that do it, but very few companies have the infrastructure or build the tooling to do this. I think that that’s really unfortunate. And it’s probably the thing that I’m most excited about with the metrics layer.

“If you think about every data application as having some cost and some benefit, then the more you can reduce the cost of pursuing that application, the clearer the justification becomes to pursue some new application.”

Let’s jump into some questions about the metrics layer and the modern data stack.

First, let’s talk bundling vs unbundling. Should the metrics layer even be a separate layer, or should it be part of an existing layer in the stack?

As with every debate in the data ecosystem, the answer ended up being “it depends.” Drew and Nick explained that how we solve this problem is ultimately more important than what we call the solution.

“The word ‘layer’ makes sense only insofar as it’s a layer of abstraction. But otherwise, I reject the terminology, although I can’t think of anything too much better than that.

“The last thing I’m going to say on bundling and unbundling… For this thing to work, it does need to be an intermediary between a very big network of different tools. Treating it as a boundary like that motivates which tools can build it and provide it. It’s not something you would see from a BI tool, because it’s not really in a BI tool’s interest to provide the layer to every other BI tool — which is like the thing that you want from this.”

“Basically, people have problems, and companies build technologies to solve problems. If people have problems and there is a valuable technology to build, then I think it’s worth taking a shot and trying to build that technology and voicing those opinions.

“Ultimately, I think that there are good points there of the connection to different organizational workflows. This is not something that I think we’ve done a good job of explaining, but I think that the metrics store and the metrics layer are two different concepts.”

For a traditional company that already has a data warehouse and BI layer, where does the metrics layer fit into their stack?

Again, the answer is that it depends — sigh. The metrics layer would live between the data warehouse and BI tool. However, every BI tool is different and some are friendlier to this integration than others.

If a company has already defined a ton of metrics within their BI tool, what should they do?

Nick and Drew explained that slow and steady wins the race when you aren’t starting from scratch. Instead of planning a huge overhaul, start with one team or tool, integrate a better metrics layer, and test how it works for your organization.

Can’t a metrics layer just be part of a feature store?

Since Nick has built multiple feature stores and metrics layers, he had a strong opinion on this topic: while the metrics layer and the feature store are similar, they are too fundamentally different to merge right now.

“Everyone wants features to be specific to their model. Nobody wants metrics to be specific to their team or their consumption. People want metrics to be consistent. People want features to be unique and whatever benefits their model.

“Real-time versus batch: this is a super challenging problem in the feature space. Organizational governance is way more important for the metrics layer. The technical definitions are often different. The level of granularity is different for features. You go way finer with features than you do with metrics.”

Do you believe a caching layer is critical for a metrics layer?

This was a resounding YES from both Drew and Nick. Caching makes the metrics layer fast, which is critical for ensuring that data practitioners actually use it. However, it’s important that this caching doesn’t replicate data.

“The difference between something taking a minute plus to come back and not coming back at all is negligible in a lot of cases. So, conceptually, I’m very aligned with the idea of caching metric data and being able to serve it up really quickly.

“I will just say — and I think we’ve been open about this in the past — we probably won’t do that for V1 of metrics within dbt. But conceptually, I’m pretty aligned with that being an important part of the system long-term.”
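As a rough illustration of what caching metric results could look like, here is a minimal sketch. It is not how dbt or Transform implement caching; `MetricCache`, `slow_warehouse_query`, and the cache-key shape are hypothetical. It stores only the aggregated result for each query shape, with a time-to-live, rather than copying the underlying data.

```python
import time


class MetricCache:
    """Cache aggregated metric results, keyed by query shape, with a time-to-live."""

    def __init__(self, compute, ttl_seconds=300.0, clock=time.monotonic):
        self._compute = compute    # function that runs the real (slow) query
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._store = {}           # key -> (expires_at, result)

    def get(self, metric, dimensions=(), grain="day"):
        dims = tuple(sorted(dimensions))   # same key regardless of dimension order
        key = (metric, dims, grain)
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]                  # fresh entry: skip the warehouse
        result = self._compute(metric, dims, grain)
        self._store[key] = (now + self._ttl, result)
        return result


calls = []


def slow_warehouse_query(metric, dimensions, grain):
    """Stand-in for a warehouse query; records each call for inspection."""
    calls.append((metric, dimensions, grain))
    return {"metric": metric, "rows": 3}


cache = MetricCache(slow_warehouse_query, ttl_seconds=60)
first = cache.get("weekly_active_projects", dimensions=["plan"])
second = cache.get("weekly_active_projects", dimensions=["plan"])
```

The second call is served from the cache, so the stand-in warehouse function runs only once. A real system would also need invalidation when the metric definition or the underlying data changes, which this sketch leaves out.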

Finally, if you were handed a megaphone and could blast out a message for the entire data world, what would you say?

Drew:

“There are a lot of problems in data that you can solve with technology, but some of the hardest and most important ones you must solve with conversations and people and alignment and sometimes whiteboards. Knowing which kind of problem you’re trying to solve at any given time is going to help you pick the right kind of solution.”

Nick:

“I think the metrics layer is basically a semantic layer with an additional concept of a metric, which is super important. So I would just say, the metrics layer should be backed by a general-purpose semantic layer. The spec and the definition of that semantic layer and the abstractions is so unbelievably important.”

Prukalpa

Co-founder of Atlan (atlan.com), the active metadata platform for modern data teams | Weekly newsletter for data leaders: metadataweekly.substack.com