Semantic Models for Data Translation

Late last week, Microsoft’s Christian Wade announced that “Datasets” are being renamed to “Semantic models” in Microsoft Fabric. Many business users and developers (myself included) likely discovered this unexpected change last Friday.

The name change helps clarify the purpose of what used to be known as the dataset, which is important for understanding how best to think about and use the product. This article discusses the announcement and explores what it means for analysts using Microsoft Fabric and Power BI.

Dataset vs. Semantic Model

Before we discuss the announcement, we need to briefly review the differences between a dataset and a semantic model. A dataset, as the name implies, is a set of data, often drawn from multiple sources, used primarily to store and manage data for analysis and reporting. A dataset represents the actual data in the repository. A semantic model, on the other hand, is a layer on top of your data that serves as a bridge between the technical details of the data and the business understanding of it.

To borrow an analogy from historical documents, you might think of the dataset as the original text and the semantic model as its English translation. In a good translation the content doesn’t change, but the way it’s presented and communicated does, making it accessible to new readers.

As the announcement makes clear, Microsoft is positioning the newly renamed semantic model as the key tool for data translation.

Too Generic

The main reason given for the change is that “In the age of Fabric, the term ‘dataset’ is too generic…” Wade is absolutely right. One of the key features of Fabric is that it consolidates multiple products that used to be available separately in Azure and Power BI into one platform. Many of these products (data warehouse, data lakehouse, data mart, etc.) are essentially datasets with unique names that reflect the nuanced roles they play in the overall ecosystem. The Power BI “dataset” was one of the holdovers that needed a more specific name to keep it from being lumped in with the many other products that could also be considered datasets.

Wade later states, “We acknowledge that this may cause some disruption, but it’s necessary for disambiguation from other Fabric items. Over time, this will make the product clearer and more usable.” Clarifying the distinction between the semantic model and other products helps highlight its unique purpose.

How to Use a Semantic Model

What is that purpose? How should an analyst think about a semantic model? What role should it play?

With the advent of Direct Lake in Fabric and OneLake, it is becoming increasingly unnecessary to store a duplicate copy of the data in a semantic model. Instead, one copy of the data lives in OneLake, and the semantic model provides the translation layer for presentation to the business user. The semantic model also extends the data with additional capabilities such as calculation groups, DAX queries, and more, some of which I discuss in a previous post.

This isn’t fundamentally different from how datasets have always operated, except that the data is now more likely to be stored elsewhere. The main difference lies in how we ought to think about the role of this data layer, given the name change. A semantic model is concerned with the language used to represent the data, language that will likely become increasingly important as AI tools and natural language queries begin to fill the reporting landscape.

What I’m suggesting is that we may need to get better at specifying metadata in the semantic model (renaming tables, columns, and measures, adding descriptions, etc.) in a way that is easy to understand for both ad-hoc business users and AI tools. What used to be renamed on the report page may need to be made explicit in the model itself. Complex data models may become less valuable than simple star-schema models with clear relationships. Reports that used to be built by Power BI developers may instead be built by AI copilots that draw on semantic models those same developers define. The work may be shifting from building the house, so to speak, to building an accurate and clear foundation for it.
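To make the idea of a “translation layer” more concrete, here is a minimal conceptual sketch in Python. It is not Power BI’s or Fabric’s API, and nothing here reflects how semantic models are actually implemented; the classes `ColumnMeta` and `SemanticLayer` and the column names `cust_id` and `sls_amt_usd` are hypothetical. The sketch assumes only that the raw rows live somewhere else and that the layer holds metadata (friendly names and descriptions) rather than a copy of the data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ColumnMeta:
    """Hypothetical business-facing metadata for one physical column."""
    friendly_name: str
    description: str


@dataclass
class SemanticLayer:
    """Hypothetical sketch of a semantic layer: it stores only metadata,
    never a copy of the rows, which are assumed to live elsewhere."""
    columns: dict[str, ColumnMeta] = field(default_factory=dict)

    def translate(self, row: dict) -> dict:
        # Rename known technical columns to business names; values are untouched,
        # and columns without metadata pass through unchanged.
        return {
            (self.columns[key].friendly_name if key in self.columns else key): value
            for key, value in row.items()
        }

    def glossary(self) -> list[str]:
        # Descriptions a business user (or an AI tool) could read to interpret each field.
        return [f"{meta.friendly_name}: {meta.description}" for meta in self.columns.values()]


# Hypothetical raw rows as they might be stored at the source, with technical names.
raw_rows = [
    {"cust_id": 101, "sls_amt_usd": 250.0},
    {"cust_id": 102, "sls_amt_usd": 75.5},
]

# Hypothetical metadata: the "translation" lives in the layer, not on a report page.
layer = SemanticLayer({
    "cust_id": ColumnMeta("Customer ID", "Unique identifier for a customer"),
    "sls_amt_usd": ColumnMeta("Sales Amount (USD)", "Invoice total in US dollars"),
})

if __name__ == "__main__":
    for row in raw_rows:
        print(layer.translate(row))
    print("\n".join(layer.glossary()))
```

The design mirrors the translation analogy: the underlying rows are never copied or altered, only the names and descriptions used to present them change, and those descriptions are defined once in the model rather than on each report.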

This name change is one more piece of evidence leading me to reconsider how analytics will shift in the age of AI and self-service. As we focus our efforts on creating semantic models that accurately translate data, we will (hopefully) better enable the quick insights that Power BI has always sought to provide to business users.

I’ll be closely following Christian Wade’s session on semantic models this week during Microsoft Ignite to see more details on where the product is heading and any further implications for analysts.

What do you think is the role of a semantic model? How can it best be used to improve the clarity of data for business users?
