Analyze Best Practices

Throughout this manual, we've covered the functionality of Analyze and how it helps end users identify potential issues in their datasets. To get the most out of the software, we recommend working through the following steps, which are discussed in more detail below:

Understanding your source database
Identifying what your organization considers data quality issues
Compiling your findings
Converting findings into actionable insights

Understanding your source database

When working with data, it's crucial to understand how your source database is configured. This doesn't mean knowing the exact data content, but rather understanding the overall design of the database and the purpose behind its tables. For example, in legacy databases some tables may be outdated or irrelevant to current business processes. While these could be considered data issues, they are likely already known within the organization. Time spent identifying issues in archived tables is time taken away from analyzing the more critical, active tables.

Spending some time at the start of a project to identify which tables and schemas are relevant helps complete the project more quickly and keeps the findings actionable. It's also important to avoid analyzing the entire organization’s data at once. Instead, break the project into manageable chunks. For example:

Integrating the Analyze step into projects that involve duplicating specific data can lead to smaller, more focused data requirements.
For larger projects, it’s important to break the source database into logical units. For example, if an Analyze project is designed to identify issues with accounts payable or financial data, it becomes easier to engage relevant stakeholders within the organization. These stakeholders can help provide data privacy officers with the necessary information and input on their requirements.
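
As an illustration of this scoping step, the following minimal sketch groups active tables by schema and leaves archived tables out, so that each group can become its own project. It is not part of Analyze. The table inventory, the schema names, and the archived flag are hypothetical, and in practice this information would come from your own database catalog or documentation.

```python
from collections import defaultdict

# Hypothetical table inventory: (schema, table, is_archived).
# These names are illustrative only; real values would come from
# your own database catalog.
TABLES = [
    ("finance", "ap_invoices", False),
    ("finance", "ap_payments", False),
    ("finance", "ap_invoices_2009_archive", True),
    ("crm", "customers", False),
    ("crm", "contacts_old", True),
]


def group_active_tables(tables):
    """Group non-archived tables by schema; each group is one candidate project."""
    units = defaultdict(list)
    for schema, table, is_archived in tables:
        if not is_archived:
            units[schema].append(table)
    return dict(units)


units = group_active_tables(TABLES)
for schema, names in sorted(units.items()):
    print(schema, names)
```

Each resulting group can then be reviewed with the stakeholders responsible for that part of the business.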

Identifying what your organization considers data quality issues

When using Analyze, the goal is to identify potential issues with the data. However, many organizations do not have a clear picture of which issues matter most to them. GDPR compliance is often cited as a key concern, and it is a good starting point for establishing concrete requirements. Beyond that, there may be data elements specific to the organization that are undesirable. Analyze can detect these as well.
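
One way to make such requirements concrete is to write each one down as a named rule with a pattern that can be tested against sample values. The sketch below shows the idea in plain Python and does not represent how Analyze profiles are defined. Both rules are assumptions: the email pattern is deliberately simple, and the EMP-123456 employee ID format is hypothetical.

```python
import re

# Hypothetical requirements list: each rule names an issue and a pattern.
RULES = {
    # Deliberately simple email pattern, for illustration only.
    "email_address": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # Hypothetical organization-specific identifier format.
    "internal_employee_id": re.compile(r"\bEMP-\d{6}\b"),
}


def find_issues(values):
    """Return, per rule, the number of values that contain a match."""
    counts = {name: 0 for name in RULES}
    for value in values:
        for name, pattern in RULES.items():
            if pattern.search(value):
                counts[name] += 1
    return counts


sample = [
    "Call back jane.doe@example.com on Monday",
    "Approved by EMP-004217",
    "No issues noted",
]
print(find_issues(sample))
```

Writing the rules down this way gives stakeholders a concrete list to review before any profiles are built.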

Converting findings into actionable insights

After reviewing the source database, defining a manageable project, and compiling a solid list of requirements, the next step is to develop profiles and run the Analyze project. However, the job doesn’t end with execution. Once the analysis is complete, you can generate an overview of the findings by selecting Project → Generate Analyze Report. This creates a file that outlines any potential issues detected during the analysis.

With the report in hand, it’s important to create a plan of action to address the identified issues. For example, in the previous scenario where vehicle model names were found in the description field, the next step would be to assess the description field. A specification must then be developed that determines which data is relevant for the duplicate dataset, and which data should be scrubbed or removed.
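
To illustrate what such a specification could look like once it is written down, the following minimal sketch replaces known vehicle model names in a description value with a placeholder. It is an assumption-laden example rather than an Analyze feature: the model name list, the placeholder text, and the function name scrub_description are all hypothetical.

```python
import re

# Hypothetical list of vehicle model names to scrub from free text.
MODEL_NAMES = ["Corolla", "Golf", "Model 3"]

# Match any listed name as a whole word, ignoring case.
MODEL_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(name) for name in MODEL_NAMES) + r")\b",
    re.IGNORECASE,
)


def scrub_description(text, placeholder="[REMOVED]"):
    """Replace every listed model name in the text with the placeholder."""
    return MODEL_PATTERN.sub(placeholder, text)


print(scrub_description("Customer traded in a golf for a Model 3."))
```

Matching on whole words keeps unrelated text such as "golfing" intact, which is the kind of decision the specification should record explicitly.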