#5 Curate Data and Make It Accessible for Self-Service

May 23

By Pacific Northwest Data Analytics Leadership Board members Thuy Le and Christopher Anderson with case study contributions from Thuy Le and Yvonne Yeung

What is Self-Service Data?

Let’s imagine data analyst Bob, hard at work preparing this month’s sales reports for company leadership. He’s nearly finished, but Nancy in Accounting pings him: “How many widgets did we sell in North Dakota in July? Who were the top five salespeople last quarter? What month saw the highest number of widgets sold in 2021? Need this by 3:00. Thanks.” Given his other responsibilities, this puts Bob in a difficult position. What if Nancy could answer these questions herself?

Self-service data sounds great in theory, but a lot of preparatory work needs to be done to achieve it. Before listing the prerequisites, let’s define the term. “Self-service data” means that non-data experts can leverage data sources to answer business questions without the assistance of a data professional (whether that is a data analyst, data scientist, business intelligence analyst, or similar role).
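To make the opening scenario concrete, here is a minimal sketch of Nancy's three questions expressed as reusable queries against a curated sales table. Everything in it is an assumption made for illustration: the `sales` table, its columns, the sample rows, and the function names are hypothetical. It also assumes that "July" means July 2021 and that "last quarter" means the third quarter of 2021. A real self-service tool would more likely expose these queries through a dashboard or a no-code interface.

```python
import sqlite3

# Hypothetical curated table; all names and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_date TEXT, state TEXT, salesperson TEXT, widgets INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("2021-07-03", "ND", "Avery", 40),
        ("2021-07-19", "ND", "Blake", 25),
        ("2021-07-21", "MN", "Casey", 60),
        ("2021-03-11", "ND", "Avery", 30),
        ("2021-11-02", "MN", "Blake", 90),
    ],
)


def widgets_sold(state, year_month):
    """Total widgets sold in one state during one month (format 'YYYY-MM')."""
    row = conn.execute(
        "SELECT COALESCE(SUM(widgets), 0) FROM sales "
        "WHERE state = ? AND strftime('%Y-%m', sale_date) = ?",
        (state, year_month),
    ).fetchone()
    return row[0]


def top_salespeople(start, end, limit=5):
    """Salespeople ranked by widgets sold between two ISO dates, inclusive."""
    return conn.execute(
        "SELECT salesperson, SUM(widgets) AS total FROM sales "
        "WHERE sale_date BETWEEN ? AND ? "
        "GROUP BY salesperson ORDER BY total DESC, salesperson LIMIT ?",
        (start, end, limit),
    ).fetchall()


def best_month(year):
    """Month ('YYYY-MM') with the most widgets sold in the given year."""
    row = conn.execute(
        "SELECT strftime('%Y-%m', sale_date) AS month, SUM(widgets) AS total "
        "FROM sales WHERE strftime('%Y', sale_date) = ? "
        "GROUP BY month ORDER BY total DESC LIMIT 1",
        (str(year),),
    ).fetchone()
    return row[0] if row else None


print(widgets_sold("ND", "2021-07"))
print(top_salespeople("2021-07-01", "2021-09-30"))
print(best_month(2021))
```

The point of the sketch is that each question becomes a small, parameterized lookup once the underlying table is curated and trusted. The hard part is the preparation of that table, not the query itself.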

Self-service data often makes analytics more efficient. A marketing or sales professional can get the data and insights they need without the company having to pay a data expert, who may be highly compensated, to perform the analysis. There is also less potential for the business question to be lost in translation, since a data analyst may lack the required background knowledge of the business. Importantly, the data analyst can focus on other responsibilities rather than spending time answering ad hoc questions.

The benefits of self-service data are not just realized at the end. Monte Carlo, a company that specializes in data reliability, emphasizes that the journey to self-service data includes milestones that are worthwhile in their own right: data documentation, literacy, discovery, and quality. Even if the self-service data experiment proves unsuccessful, the investment should see returns in the form of those other benefits.[1]

Members of the Pacific Northwest (PNW) Data Analytics Board identified four key requirements for successful self-service data implementation. They include:

1. Invest in Trusted Data

2. Leverage Cloud Infrastructure

3. Create a “Data Mindset” in the Organization

4. Choose the Right Business Intelligence Tool for the Job

Invest in Trusted Data

The first step may seem obvious, but it still bears mentioning: organizations where people place high trust in their data may be good candidates for self-service. Conversely, companies with challenges impacting their ability to build trust – e.g., data governance and data quality issues – should first develop a plan for resolving these issues before tackling self-service data. Documentation such as data dictionaries, data catalogs, clearly defined parameters, data filters, and calculation methods is a must so that data and reporting nuances are commonly understood and agreed upon and disagreements can be settled.
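One way to picture a documented, agreed-upon metric is as a single definition that carries its own description, source, filter, and calculation, so that every team computes the number the same way. The sketch below is illustrative only: `MetricDefinition`, `NET_WIDGET_SALES`, the table name `analytics.widget_sales`, and the sample rows are all hypothetical and do not describe any particular data catalog product.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MetricDefinition:
    """One data-dictionary entry: what the metric means and how to compute it."""
    name: str
    description: str
    source_table: str                     # the agreed source for this metric
    row_filter: Callable[[dict], bool]    # which rows count
    calculation: Callable[[list], float]  # how the kept rows are combined


# Hypothetical entry; the filter and calculation are assumptions for illustration.
NET_WIDGET_SALES = MetricDefinition(
    name="net_widget_sales",
    description="Sum of amount for widget orders that were not refunded.",
    source_table="analytics.widget_sales",
    row_filter=lambda row: not row["refunded"],
    calculation=lambda rows: sum(row["amount"] for row in rows),
)


def compute(metric, rows):
    """Apply the documented filter, then the documented calculation."""
    kept = [row for row in rows if metric.row_filter(row)]
    return metric.calculation(kept)


rows = [
    {"amount": 1200.0, "refunded": False},
    {"amount": 900.0, "refunded": True},
    {"amount": 1500.0, "refunded": False},
]

print(compute(NET_WIDGET_SALES, rows))
```

Because the filter lives in the definition rather than in each analyst's report, a team that forgot to exclude refunds would no longer produce a different total from the same rows.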

“Single source of truth” has become a cliché, but it rings true in this instance. It is difficult to trust your data or have non-data experts leverage it if you can’t even agree on which dataset or variables you need to use to answer the question.

Thuy Le, a Senior Manager in Operations and Analytics at Zillow, experienced similar challenges. Business stakeholders in operations, marketing, and sales needed more accurate insight on rental property transactions, the impact of promotions on sales, and tracking of rental cancelations. Operations, data science, and finance teams each frequently provided reports and insight, but each team used different data sources, leading to differing data, analysis, and interpretation. Beyond the disparate data sources, the parameters and assumptions used to develop reports and analytics lacked explanation, as providing it had become very time consuming. Executives and the stakeholders in operations, marketing, and sales who made decisions for the business had low trust in the insights they received.

To address the above challenges at Zillow, a task force was convened to drive alignment on solutions. The group created a single source of truth for the data that could be used across the operations, data science, and finance teams. The change led to much greater stakeholder trust in the data, which now agreed across reports. The company was able to understand its rental sales, marketing, and operational progress more clearly and to make decisions and changes with greater confidence. To learn more, visit the case study at Thuy Le Real Estate Case Study (pacnwdataanalytics.net).

Leverage Cloud Infrastructure

A company that has yet to migrate to the cloud should consider cloud transformation as another prerequisite, much like data quality. Trying to deliver self-service data with on-premises data architecture can be costly and ineffective due to the volume and complexity of modern data sets. Conversely, a company that has migrated its data to the cloud is that much closer to realizing the goal of providing self-service data. Self-service works best if data is available anytime, from anywhere, so that employees across business functions have easy, timely access. A cloud data warehouse offers the ability to store vast amounts of data efficiently and can be scaled as infrastructure requirements grow.[2]

Yvonne Yeung, Director of Data Services and Platform Engineering, experienced the challenges inherent in data being stored on-premises in her large retail organization with legacy infrastructure. This led to high costs, slow processing, difficulty scaling, and ultimately challenges delivering insights. She helped create a cloud-based data exchange that brought formerly siloed customer, retail, and employee data together into a single data lake platform with standardized formats. The data was much more easily served to analytic communities as many time-consuming, batch ingest processes were reduced. The cloud also enabled faster tuning of data science models that produced customer purchase recommendations. The cloud data platform drove business value with better insights from analytics and data science. To learn more, visit the case study at Yvonne Yeung Retail Case Study (pacnwdataanalytics.net).

Create a “Data Mindset” in the Organization

The organization needs to be far along with the elements of its data journey cited above, and its people need adequate training to analyze data themselves. It is up to the organization to decide whether the cost of training is worth the potential benefit of self-service data. Moreover, having a “data mindset” with critical thinking skills around data is paramount. This entails knowing what data is available to answer business questions, along with a basic understanding of how to combine and manipulate data.

According to another member of the PNW Data Analytics Board, it is also critical that the business stakeholders want self-service data, rather than it being pushed on them. Otherwise, self-service data will be a rarely used feature and the organization will not realize the benefits.

Matillion, a cloud data transformation company, writes that large enterprises can easily have hundreds or thousands of data sources across multiple systems, which could be structured or unstructured data. Employees at large organizations may have a higher bar to clear given the volume of data they need to navigate. Establishing a “data mindset” in support of the data self-service transformation will not happen overnight and takes time and investment.[3]

Choose the Right Business Intelligence Tool for the Job

When assembling a bookshelf, choosing a screwdriver over a saw is the obvious choice for tightening screws. You can buy a screwdriver for a couple of dollars, but for a larger sum you can buy a drill and make the job quick and easy. Or perhaps hiring a handyman is the right solution for you.

Likewise, choosing the right business intelligence tool can make a difficult job easy, and implementing self-service data is no exception. The tool needs to have the processing power to handle the organization’s volume of data and the queries being processed. It needs to be user-friendly enough to avoid having employees spend weeks in training. The tool needs to be able to answer the questions it is likely to be asked. Finally, some companies can afford to spend thousands of dollars on licenses for a wide set of employees while others cannot. The latter can consider low-cost or free options.

The organization needs to be thoughtful and do an honest self-assessment to select the right tool. Just because a competitor in the same industry is using Tableau doesn’t mean it is the right fit for the company making the selection.

The correct answer may be more than one tool. A one-tool-fits-all strategy could be a formula for low adoption or frustrated users. Mapping the users into different personas by skillset is a useful exercise. One potential outcome is a low-code/no-code tool for the bulk of users and a more code-heavy, sophisticated tool for data analysts, data scientists, and engineers.
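The persona-mapping exercise can be sketched in a few lines. This is a deliberately simplified illustration, not a recommended assessment method: the roster, the single `writes_code` flag, and the two persona labels are assumptions, and a real exercise would likely weigh several skills and use cases.

```python
from collections import Counter

# Hypothetical roster; names, roles, and skill flags are invented for illustration.
USERS = [
    {"name": "Nancy", "role": "accounting", "writes_code": False},
    {"name": "Bob", "role": "data analyst", "writes_code": True},
    {"name": "Priya", "role": "marketing", "writes_code": False},
    {"name": "Omar", "role": "data engineer", "writes_code": True},
    {"name": "Lee", "role": "sales", "writes_code": False},
]


def assign_persona(user):
    """Map a user to a tool persona based on one simplified skill flag."""
    return "code-heavy" if user["writes_code"] else "low-code/no-code"


def persona_counts(users):
    """Count how many users fall into each persona."""
    return Counter(assign_persona(user) for user in users)


for persona, count in persona_counts(USERS).most_common():
    print(persona, count)
```

Counting users per persona gives a rough sense of how many licenses each kind of tool would need, which feeds directly into the cost question raised above.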

Recap

For self-service data to be successful, organizations need to have high-quality, centrally stored data with an infrastructure ready to support broad use. Employees need a data mindset, the desire to answer business questions through their own data analysis, and the right business intelligence tools to do so. Many corporations describe themselves as “data-driven,” but the above characteristics separate the ones that purport to be data-driven from those that truly are.

Implementation of self-service data can start on a small scale as a proof of concept. After this experiment, leadership can decide whether to release full-scale self-service capabilities. Self-service data allows companies to answer questions quickly and efficiently as they come up and to make decisions based on data, rather than relying on intuition and business as usual or waiting weeks or months for an answer.

With self-service data empowering people to pull their own metrics for presentations, data analysts and engineers are freed up to focus on proactive projects that move their companies forward to the next stage of data maturity and business impact!

[1] https://www.montecarlodata.com/blog-is-self-service-datas-biggest-lie/

[2] https://www.matillion.com/resources/blog/data-self-service-5-steps-toward-implementing-it-plus-a-bonus-step

[3] https://www.matillion.com/resources/blog/data-self-service-5-steps-toward-implementing-it-plus-a-bonus-step
