
Analyze Best Practices

Throughout this manual, we've covered the features of Analyze and how they help end users identify potential issues in their datasets. To get the most out of the software, we recommend following a general framework of steps, discussed in more detail below:

  • Understanding your source database

  • Identifying what your organization considers data quality issues

  • Compiling your findings

  • Converting findings into actionable insights

Understanding your source database

When working with data, it's crucial to understand how your source database is configured. This doesn't mean knowing the exact data content, but rather understanding the overall design of the database and the purpose behind its tables. For example, when working with legacy databases, some tables may be outdated or irrelevant to the current business processes. While these could be considered data issues, it's likely that they are already known within the organization. Focusing on identifying issues with archived tables can hinder the analysis of more critical, active tables.

Spending some time at the start of a project to identify which tables and schemas are relevant helps complete the project more quickly and ensures the findings are actionable. It's also important to avoid analyzing the entire organization’s data at once. Instead, break the project into manageable chunks. For example:

  • Integrating the Analyze step into projects that involve duplicating specific data can lead to smaller, more focused data requirements.

  • For larger projects, it’s important to break the source database into logical units. For example, if an Analyze project is designed to identify issues with accounts payable or financial data, it becomes easier to engage relevant stakeholders within the organization. These stakeholders can help provide data privacy officers with the necessary information and input on their requirements.
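The scoping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not an Analyze feature: the table names, the domain labels, and the `tables_in_scope` helper are all hypothetical, and in practice the inventory would come from your own knowledge of the source database.

```python
# Hypothetical inventory: table name -> (business domain, is_archived).
# None of these names come from Analyze; they only illustrate the idea.
TABLES = {
    "AP_INVOICES": ("accounts_payable", False),
    "AP_VENDORS": ("accounts_payable", False),
    "AP_INVOICES_2009": ("accounts_payable", True),
    "HR_EMPLOYEES": ("human_resources", False),
}


def tables_in_scope(inventory, domain):
    """Return the active (non-archived) tables of one business domain, sorted by name."""
    return sorted(
        name
        for name, (table_domain, archived) in inventory.items()
        if table_domain == domain and not archived
    )


# Scope a single project to the active accounts payable tables.
scope = tables_in_scope(TABLES, "accounts_payable")
print(scope)
```

Keeping archived tables out of the scope list up front reflects the earlier point that known, outdated tables tend to distract from the analysis of active ones.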

Identifying what your organization considers data quality issues

When using Analyze, the goal is to identify potential issues with the data. However, many organizations do not have a clear understanding of which issues matter most to them. GDPR compliance is often cited as a key concern, and it is a great starting point for establishing concrete requirements. Beyond that, there may be other data elements specific to the organization that are undesirable. These additional issues can also be detected using Analyze.

Example: Car Manufacturer

Let’s consider the software testing department of a car manufacturer. The automotive industry is highly competitive, and new technologies and car models are often closely guarded secrets. If such information is leaked, it could significantly harm the company’s competitive position.

Suppose a duplicate dataset needs to be made available to a group of web developers working on a new website for the manufacturer. The business may decide to grant access to an anonymized version of the MODEL_INFORMATION table. A data privacy officer might then decide that the new model names must be scrubbed from this data, which could be done by manually updating the MODEL_NAME column.

However, during testing, it is discovered that some details about new models and technologies were inadvertently stored in a large description field within the same database, leading to a data leak.

This issue could have been avoided if the organization had clearly communicated its data concerns, defined unambiguous requirements, and developed profiles to detect such manual input errors.
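To make the idea of detecting such free-text leaks concrete, here is a minimal Python sketch. It does not represent how Analyze profiles are defined or executed; it only shows the kind of check a requirement like "no unreleased model names in any text field" translates to. The `DESCRIPTION` column name, the term "Falcon X", and the `find_confidential_terms` helper are assumptions made up for this example.

```python
import re


def find_confidential_terms(rows, columns, terms):
    """Return (row_index, column, term) for each confidential term found in the given columns."""
    # One case-insensitive, whole-word pattern per term.
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in terms
    }
    findings = []
    for index, row in enumerate(rows):
        for column in columns:
            text = row.get(column) or ""  # treat missing or NULL values as empty text
            for term, pattern in patterns.items():
                if pattern.search(text):
                    findings.append((index, column, term))
    return findings


# Hypothetical sample rows: MODEL_NAME was scrubbed, but a description still leaks a name.
rows = [
    {"MODEL_NAME": "REDACTED", "DESCRIPTION": "Standard hatchback trim."},
    {"MODEL_NAME": "REDACTED", "DESCRIPTION": "Shares its battery pack with the upcoming Falcon X."},
]

findings = find_confidential_terms(rows, ["DESCRIPTION"], ["Falcon X"])
print(findings)
```

A fixed term list only catches names that are already known to be sensitive, which is why agreeing on the list of concerns with stakeholders comes before building any detection.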

Converting findings into actionable insights

After analyzing the source database, defining a manageable project, and compiling a solid list of requirements, the next step is to develop profiles and run the Analyze project. However, the job doesn’t end with execution. Once the analysis is complete, you can generate an overview of the findings by selecting Project → Generate Analyze Report. This will create a file that outlines any potential issues detected during the analysis.

With this data in hand, it’s important to create a plan of action to address the identified issues. For example, in the previous scenario where details about new models were found in the description field, the next step would be to assess that field. A specification must then be developed to determine which data is relevant for the duplicate dataset, and which data should be scrubbed or removed.
