Every data visualization project starts with requirements, and whether those requirements stem from a problem to solve or a decision to support, each project follows a recognizable process. Every project also needs data to visualize, and many factors must be weighed along the way to minimize risk and ensure success.

This article walks through these concepts, along with use cases that apply to specific types of businesses. Risk is a recurring theme: minimizing it drives decisions about which data to use and how best to represent it in a given chart type. Beyond risk, the team may also face limitations that have nothing to do with the data, such as the people and skills available, which can constrain the audience the visualization can be presented to.

When designing a data analysis project, we often wonder where to start. From data collection and cleansing through exploration, analysis, and visualization, a lot of work goes into producing actionable, profitable business insights.

Step 1: Understand the business problem

At the start of a project, make sure you clearly understand the overall scope of the effort, the business goals, the information stakeholders are seeking, the type of analytics they expect you to use, and the key deliverables. Defining these elements before the analysis begins leads to better insights, and clarity up front matters because there may not be another opportunity to ask questions until the project is complete.

Step 2: Understand the data set

This phase begins with initial data collection, followed by activities such as data quality checks and data exploration, with the goal of discovering initial insights or detecting interesting subsets that suggest hypotheses about hidden information. A variety of tools can help here: for a small data set, Excel may be enough, while more rigorous tools such as R, Python, Alteryx, Tableau Prep, or Tableau Desktop can explore and prepare larger data sets for further analysis.

The key is to identify the variables worth studying and to look for errors (missing data, values that don’t make logical sense, duplicate rows, even spelling errors) as well as any missing variables, so we know exactly what must be changed to clean the data properly.
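As a concrete illustration, here is a minimal quality-check sketch in pandas. The file name sales.csv and the columns amount and region are hypothetical stand-ins for whatever data set you are exploring.

```python
import pandas as pd

# Hypothetical data set; substitute your own file and columns.
df = pd.read_csv("sales.csv")

# Missing values per column: a quick profile of data completeness.
print(df.isna().sum())

# Fully duplicated rows that probably need to be dropped.
print("duplicate rows:", df.duplicated().sum())

# Values that don't make logical sense, e.g. a negative order amount.
print(df[df["amount"] < 0])

# Inconsistent spellings of categorical values tend to show up here.
print(df["region"].value_counts())
```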

It is important to note that when working in an enterprise environment, it helps to involve people with deep knowledge of the source systems, such as DBAs, who can assist in understanding and extracting the data.

Step 3: Data preparation

Once we have organized the data and identified all the key variables, we can start cleaning the data set: handling missing values (replacing them with the mean, deleting rows, or substituting the most logical value), creating new variables to help sort the data, and removing duplicates. Data preparation tasks may be performed multiple times and in no prescribed order. After this step, the final data set is ready to feed into a modeling tool for further analysis.
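A minimal cleaning sketch along these lines, again with hypothetical file and column names (sales.csv, amount, order_id, order_date):

```python
import pandas as pd

df = pd.read_csv("sales.csv")  # hypothetical data set

# Replace missing numeric values with the column mean...
df["amount"] = df["amount"].fillna(df["amount"].mean())

# ...or drop rows where a critical field is missing.
df = df.dropna(subset=["order_id"])

# Create a new variable to help sort and segment the data.
df["order_month"] = pd.to_datetime(df["order_date"]).dt.to_period("M")

# Remove duplicate rows.
df = df.drop_duplicates()
```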

From a business perspective, data preparation requires an understanding of data structures, content, relationships, and derivation rules throughout the process. You must verify that the data is in a usable state, that its defects are manageable, and that you understand what it takes to transform it into a data set useful for reporting and visualization. Data profiling helps here by exposing the actual content and relationships in enterprise source systems. Profiling can be as simple as writing a few SQL statements or as involved as using specialized tools: Tableau Prep works well for small projects, while for enterprises many ETL vendors offer tools you can choose based on business needs and budget.
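As an illustration of the simple-SQL end of that spectrum, here is a minimal profiling sketch using Python’s built-in sqlite3 module. The warehouse.db file and the orders and customers tables are hypothetical; the same queries would run on any SQL engine.

```python
import sqlite3

conn = sqlite3.connect("warehouse.db")  # hypothetical database

# Profile one column: row count, distinct values, and null count.
query = """
    SELECT COUNT(*)                                        AS n_rows,
           COUNT(DISTINCT region)                          AS n_regions,
           SUM(CASE WHEN region IS NULL THEN 1 ELSE 0 END) AS n_null_regions
    FROM orders
"""
print(conn.execute(query).fetchone())

# Check a relationship: orders whose customer_id has no match
# in the customers table (orphan keys).
orphans = conn.execute("""
    SELECT COUNT(*) FROM orders o
    LEFT JOIN customers c ON o.customer_id = c.id
    WHERE c.id IS NULL
""").fetchone()[0]
print("orphan orders:", orphans)
```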

Step 4: Modeling

In this step, we apply various modeling techniques to the data in search of answers to the stated goal. There are often several techniques for the same type of data mining problem, each with its own requirements on the form of the data. Common models include linear regression, decision trees, and stochastic models.
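A minimal modeling sketch with scikit-learn, fitting two of the model types named above to the same prepared data; the file prepared_sales.csv and its columns are hypothetical:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

df = pd.read_csv("prepared_sales.csv")  # hypothetical prepared data set
X = df[["price", "ad_spend"]]           # numeric predictor variables
y = df["units_sold"]                    # target variable

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit two common model types to the same data and compare fit quality.
for model in (LinearRegression(), DecisionTreeRegressor(max_depth=4)):
    model.fit(X_train, y_train)
    print(type(model).__name__, round(model.score(X_test, y_test), 3))
```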

Step 5: Verify

Once we have built the model (or models) and are ready for final deployment, the model must be thoroughly evaluated, and the steps taken to build it reviewed, to ensure it correctly meets the business goals. Is the model working properly? Does the data need more cleaning? Does it answer the questions the client asked? If not, you may need to repeat the previous steps.
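One way to make this evaluation systematic is cross-validation; the sketch below assumes the same hypothetical prepared_sales.csv as the modeling step:

```python
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

df = pd.read_csv("prepared_sales.csv")  # hypothetical data set
X, y = df[["price", "ad_spend"]], df["units_sold"]

# Cross-validation guards against judging the model on one lucky split.
scores = cross_val_score(DecisionTreeRegressor(max_depth=4), X, y, cv=5)
print("R^2 per fold:", scores.round(3))
print("mean:", scores.mean().round(3))

# A low or unstable score is a signal to revisit earlier steps:
# more cleaning, different variables, or a different model.
```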

In this step, the key is to identify issues, definitions, transformation rules, and data quality challenges, and to document them for future reference. From a business perspective, such documentation is valuable to future users. Maintaining an issue list and recording new issues encountered during validation can significantly improve project quality, expand the scope of future improvements, and help define the infrastructure needs of the business.

Step 6: Visualization

The creation of a model is usually not the end of the project. Even if the purpose of the model is to increase understanding of the data, the derived information needs to be organized and presented in a way that is useful to the customer. Depending on the requirements, this step can be as simple as generating a report or as complex as implementing repeatable data scoring (such as segment allocation) or data mining processes.

In many cases, data visualization is critical to communicating your findings. Not all clients are data-savvy; interactive tools like EasyV and Tableau are very useful for explaining conclusions, and telling a story with your data helps convey the value of your findings.
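For a static report, even a simple matplotlib chart can carry the story; this sketch assumes the same hypothetical data set as the earlier steps:

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("prepared_sales.csv")  # hypothetical data set
monthly = df.groupby("order_month")["units_sold"].sum()

fig, ax = plt.subplots(figsize=(8, 4))
monthly.plot(ax=ax, marker="o")
ax.set_title("Units sold per month")  # a clear title carries the story
ax.set_xlabel("Month")
ax.set_ylabel("Units sold")
fig.tight_layout()
fig.savefig("units_per_month.png")    # drop into the report or slide deck
```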

As with any other project, it is important to clearly identify the business goals. Breaking the process down into steps ensures that we deliver the best possible results to our customers.

Step 7: Documentation

An important complement to the steps in a data visualization project is documentation. As with projects completed in class, this document should briefly describe the project, the data sources, the data profile and quality, any data limitations or situations that arose during use, and the key transformations and models introduced, along with their impact or usefulness in improving the visualization. Finally, the document should also note problems encountered while processing data or creating specific visualizations that could be solved in the future.

Data Visualization Project Process Overview:

Before starting any project, the most important thing is to get the right participants involved, whether business owners who commission the data visualization project or key stakeholders who will actively use it. Involving business representatives matters most for identifying project requirements up front and reaching common ground on requirements and the definition of success. Their participation and collaboration greatly increase the odds that the resulting visuals address the business need. Similarly, your organization’s data users should be involved, especially when discussing the data they are responsible for managing. Creating data visualizations should be a highly iterative and dynamic process.

Looking for insight from data visualization:

Visualization can uncover patterns and insights that may be known and obvious, or new and unexpected. People should seek insights that can be used to tell a story, rather than expecting the visualization itself to illustrate one. An insight can mean different things, such as the beginning of a story or an error in the data, so the following repeatable steps help ensure that you find effective ways to draw insight from data and visualizations.

**1. Visualize the data.** Visualization allows a data set to be processed in unique ways and can take a number of different forms, such as charts, tables, maps, and graphics. The information conveyed should provide valuable insights that help viewers make business decisions. Jon Steel, a leader in account planning, has this to say about viewing and understanding data: “In the context of an advertising agency, the ability of a planner to look at the same information and see different things than everyone else is invaluable. They need to be able to take all sorts of information, arrange it randomly and rearrange it in new patterns until something interesting comes along.” Good data visualization not only conveys actionable information, but also helps you see things that others might not.

**2. Analyze and explain what you see.** In this step, ask yourself: What can I see in this picture? Is it what I expected? Are there any interesting patterns? What do they mean in the context of the data? These questions not only help you find meaning in the visualization, but can also reveal when a visualization that looks good actually tells you nothing about the data.

**3. Document insights and steps.** Recording can begin even before you look at the data. We usually hold expectations and assumptions about a data set before working with it, and we choose specific data for a reason. Writing these thoughts down lets us recognize our preconceptions and reduces the risk of misreading the data by seeing only what we expected to find. Documentation is the most critical step, and also the easiest to skip. It provides context for each chart, eliminating the confusion that can arise when reviewing multiple sets of charts. Questions worth recording include: Why did I create this chart? What did I do with the data to create it? What does this chart tell me?

**4. Transform the data set.** This step opens the door to more patterns and discoveries. Insights from the previous steps may raise additional questions about the data or findings that require further examination or analysis. This can be done through transformations such as aggregation (combining data points into groups), filtering, and outlier removal, as in the sketch below.
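A minimal pandas sketch of these three transformations, using the same hypothetical sales.csv columns as before:

```python
import pandas as pd

df = pd.read_csv("sales.csv")  # hypothetical data set

# Aggregate data points into groups: monthly totals per region.
monthly = df.groupby(["region", "order_month"])["amount"].sum().reset_index()

# Filter to a subset suggested by an earlier insight.
west = monthly[monthly["region"] == "West"]

# Remove outliers, here anything beyond 1.5 IQR from the quartiles.
q1, q3 = west["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
trimmed = west[west["amount"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]
print(trimmed.describe())
```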