Enabling data culture in organizations with purposeful documentation

How much documentation do we need to change a light bulb?

A lot – end-users may not always understand what you write in the first place.

Data end-users, especially non-technical business users, likely spend most of their time understanding what kind of data is presented to them in order to make sense of the insights that feed their next business plans. In other words, the biggest fear of a business analyst may be hearing this sentence from the BI developer team: “Sorry, you’ve used the wrong metric for that sales report, it’s not what you think it is.” That means a new business campaign benchmarked against the “wrong” insights might have to be reviewed.

The problem that non-technical end-users face is real – the inability to navigate, and little control over, the data dimensions presented to them. The root cause is how the management team initially views and handles data as an asset, which leads to insufficient purposeful documentation during the development phase and a lack of suitable tools for navigating and understanding the data once it is launched. What could have been done better?

Set documentation as a cornerstone for the data project

Documentation is often taken rather lightly, and it is rarely at the top of a project’s priorities, especially when there are tight deadlines to follow. Skipping documentation may help a team move quickly, but in the long term it creates a nuisance for end-users, who rely on it far more often than developers do. It takes a while to untangle technical logic along with the relevant business requirements, whereas documentation done properly saves a lot of cognitive load for users – and even for the developers who usually own the documentation themselves. The documentation process should continue throughout the lifetime of the data as changes arise, whether from business feedback or from changes in logic requirements, to keep the data relevant.

Setting the right mindset at the beginning of the project helps clear up potential inconveniences and sustainably creates a smoother user experience.

Documentation in dbt

Say you are using the most modern data stack. You have extracted and loaded all data sources into your data lake, you have applied dbt to transform the data and build a beautiful data model, and you have provided a clear, detailed, and comprehensible description of each column in dbt. Why not go a step further and push these details to your end-users, such as business analysts, who might have limited access to your dbt code but operate more often on the data warehouse platform UI, e.g. Snowflake, Google BigQuery, etc.?

Try using +persist_docs, either as a config block on a model or across your entire dbt project; it will likely improve the lives of a lot of BI end-users.

Before configuring persist_docs
After configuring persist_docs

Check out the dbt documentation to configure persist_docs on your dbt project.

# dbt_project.yml
models:
  <resource-path>:
    +persist_docs:
      relation: true
      columns: true

-- or per model, as a config block at the top of the model's .sql file
{{ config(
  persist_docs={"relation": true, "columns": true}
) }}

select ...
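
For context, the descriptions that persist_docs writes to the warehouse are the ones declared in your dbt properties (.yml) files. A minimal sketch, with a hypothetical orders model and column names:

# models/schema.yml (model and column names are hypothetical)
version: 2

models:
  - name: orders
    description: "One row per customer order, cleaned and deduplicated."
    columns:
      - name: order_id
        description: "Primary key of the order."
      - name: discount_code
        description: "Promotional code applied at checkout, if any."

With persist_docs enabled, these descriptions are persisted as table and column comments in the warehouse, so they become visible in the Snowflake or BigQuery UI where business users already browse.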

Documentation in Looker

Different strokes for different folks. Different end-users expect different depths of detail about how a metric is created, and they are best served with the proper tools or training to meet those varying expectations and needs. For example, Looker has become a strong modern data platform that empowers the entire organization to easily see and analyze their data, while also being the playing field for LookML developers with more advanced skills. A business user who wants to compare the performance of customers who use a specific discount code against those who don’t only needs to know where to find the correct dimension, e.g. discount_code, among many others. Meanwhile, a data analyst might want to check the relationships between datasets to verify whether they produce the correct result. The level of concern varies and calls for different approaches. Several tools available in Looker can be introduced to different users to elevate the data exploration experience: dimension definitions, the data dictionary, and the LookML diagram.

The most basic and most frequently used tool is the LookML dimension/measure description, a standard parameter you can declare in any Looker view. End-users can read the definition of a field before deciding which one to use. If you have a Looker Developer role, simply declare the description on the dimension or measure in the view and save the changes; the definition for that field will then appear in the Explore.
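
A minimal sketch of what this looks like in LookML – the view, table, and field names below are hypothetical:

# orders.view.lkml (hypothetical view and field names)
view: orders {
  sql_table_name: analytics.orders ;;

  dimension: order_id {
    primary_key: yes
    type: number
    sql: ${TABLE}.order_id ;;
    description: "Unique identifier for each customer order."
  }

  dimension: discount_code {
    type: string
    sql: ${TABLE}.discount_code ;;
    description: "Promotional code applied at checkout; empty if no discount was used."
  }
}

Once the change is saved and deployed, the description text shows up alongside the field in the Explore field picker.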

 

Adding description for order id, source: Looker documentation (1)

 

Meanwhile, the Looker Data Dictionary is a summary of all fields in the LookML project, with their definitions, data types, and how they were created. It is quite helpful for auditing, e.g. checking whether a field has a description or not, or as a collaboration medium between end-users and the data team (through comments). This tool only makes sense once sufficient descriptions have been provided in the Looker model, as in the step above. Read more on how to enable the Looker Data Dictionary on your LookML project and how to guide your users through it.

Dictionary Filtered to Fields that Contain the Word lifetime, source: Looker documentation (2)

 

More advanced users can take a look at the LookML diagram to check the relationships between datasets and ultimately ensure the precision of their data analysis. More details on setting up and using the LookML diagram can be found in the Looker documentation.

Elements in LookML diagram, source: Looker documentation (3)

Going beyond

As the business grows and a data culture emerges, more people become involved in data usage, more data sources come into play, and shared knowledge that ensures trusted, open access becomes more relevant. Data governance then outgrows simple documentation and demands a consolidated source of data knowledge built on a collaborative, interactive platform where information stays alive. The modern data stack will likely go hand in hand with more advanced metadata management products, e.g. data catalog tools. In this setting, end-users become the key drivers of data transformation and co-create to make data usable.

Vision for active metadata, source: Atlan Data Catalog Primer report (4)

From an IT department with IT personnel to a single data analyst and then a data team with diversified roles, a company’s vision of improving data availability and data democratization is a long but essential walk toward data-driven solutions. For that to be realized, start with small steps: minding your documentation to serve the most direct users will introduce these changes gradually and build a more holistic data culture in your organization.


Reference

Hanh Tran