Analyze Best Practices

Throughout this manual we’ve covered the various features of Analyze and how they can help end users identify potential issues in their datasets. To get the most out of the software, we recommend following a general framework of steps. In broad terms, a solid approach consists of the following steps (each of which we’ll cover in more detail below):

  • Knowing your source database

  • Understanding what your organisation deems as data quality issues

  • Compiling your findings

  • Turning findings into actionable points

Knowing your source database

When working with data, it’s important to know how your source database is configured. This doesn’t mean knowing what the actual data looks like, but understanding the general design of your database and the intention behind certain tables. When working with legacy databases, for instance, it’s possible that certain tables are relics or no longer relate to the actual business in any way. While these are potential issues with the data, this is likely already known within the organisation, and identifying a plethora of issues in an archived table will only slow down the analysis of more significant tables.

If some time is spent at the start of a project determining which tables and schemas are relevant and which are not, projects are completed more quickly and the information gained from them is more likely to be put to use. Another consideration is not to analyze an entire organisation’s worth of data at once, but to break these projects down into manageable chunks, based on, for instance (a short scoping sketch follows this list):

  • Projects, integrating the Analyze step into any project that requires a duplicate of certain data. Doing this usually results in smaller, more concrete requirements for the data.

  • Departments, for larger projects, making sure that the source database is broken up into logical units. If an Analyze project is specifically designed to identify issues with accounts payable or other financial data, it becomes easier to find stakeholders within the organisation who can supply data privacy officers with their requirements and input.
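
As a minimal sketch of this scoping step, the following Python snippet inventories the tables in a source database together with their row counts, so relic or archived tables can be flagged for exclusion before any profiles are developed. It assumes a SQLite source and a file name (source.db) purely for illustration; adapt the metadata query to your own database platform.

    import sqlite3

    # Connect to an example source database (the file name is hypothetical).
    conn = sqlite3.connect("source.db")

    # List every table with its row count, so relics and archived tables
    # can be spotted and excluded from the scope of the Analyze project.
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]

    for table in tables:
        count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        print(f"{table}: {count} rows")

    conn.close()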

Understanding what your organisation deems as data quality issues

When we use Analyze, we’re trying to find issues in our data. Often, however, organisations don’t have a clear picture of which issues actually matter to them. Requirements such as GDPR compliance are often named. This is a great starting point that can be expressed as solid requirements, but it’s possible that other data elements are also undesirable to the individual organisation, and these too can be detected using Analyze.
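
To show how such a requirement can be made detectable, here is a small, hedged Python sketch: a rule like “no personal e-mail addresses in test data” becomes a concrete, testable pattern. The regular expression and the sample values are purely illustrative and not part of Analyze itself.

    import re

    # A requirement such as "no personal e-mail addresses in test data"
    # expressed as a testable pattern. The regex is deliberately simple.
    EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

    sample_values = [
        "Contact j.doe@example.com for details",
        "Standard maintenance interval: 15,000 km",
    ]

    for value in sample_values:
        match = EMAIL_PATTERN.search(value)
        if match:
            print(f"Potential GDPR issue: {match.group()!r} in {value!r}")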

Example: Car Manufacturer

Let’s take a look at the software testing department of a car manufacturer. Because the car industry is highly competitive and relies on new technology to offer an attractive product to its customers, information about new car models or technologies is often a closely guarded secret which, when leaked, could negatively impact the company’s competitive position.

If a duplicate dataset has to be made available for use by a group of web developers working on a new website for this manufacturer, the business may decide to grant access to an anonymized version of its MODEL_INFORMATION table. A data privacy officer determines that new model names are to be scrubbed from this data, which is achieved by manually updating the MODEL_NAME column within this table.

Unfortunately, during testing it’s revealed that for certain new models and technologies some information was erroneously stored in a large description field in the same database, which leads to a data leak.

This scenario could have been prevented by clearly communicating what constitutes a data issue for the organisation, expressing this in unambiguous requirements, and developing profiles that can catch such manual input errors.
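
To make this concrete, here is a minimal sketch of such a check in Python. The list of protected codenames and the sample descriptions are hypothetical; in practice the names would be supplied by the business or the data privacy officer, and the scan would run against the actual description column.

    import re

    # Hypothetical list of protected model codenames supplied by the
    # data privacy officer.
    PROTECTED_NAMES = ["Falcon EV", "Project Nimbus"]

    pattern = re.compile(
        "|".join(re.escape(name) for name in PROTECTED_NAMES), re.IGNORECASE)

    # Sample values from a free-text description field. Scanning this
    # field as well as the scrubbed MODEL_NAME column catches the kind
    # of manual input error described above.
    descriptions = [
        "Prototype battery pack, shared with the Falcon EV platform.",
        "Routine brake assembly, no restrictions.",
    ]

    for text in descriptions:
        if pattern.search(text):
            print(f"Protected name found in description: {text!r}")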

Compiling your findings and turning them into actionable points

We’ve identified what our source database looks like, determined a workable project, and compiled a solid list of requirements. After this, the development of profiles and the execution of an Analyze project can begin. Once a run has completed, however, the job of interpreting the results isn’t over quite yet. You can generate an overview of the findings in Analyze by pressing Project → Generate Analyze report. This generates a file containing overviews of any potential issues found.

Using this data, it’s wise to establish a plan of action to tackle the identified issues. In our earlier example, following the finding that vehicle model names are stored in the description field, that field would have to be assessed and a specification drawn up of which data remains relevant to the duplicate dataset and which can be scrubbed or removed.
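
A plan of action can be as simple as grouping the reported findings per table and assigning each group an owner and a decision (keep, scrub, or remove). The Python sketch below assumes, purely for illustration, that the findings have been exported to a CSV file named analyze_findings.csv with "table" and "finding" columns; adapt the file and column names to your own export.

    import csv
    from collections import defaultdict

    # Group the exported findings per table. The file name and column
    # names are assumptions for illustration only.
    findings_per_table = defaultdict(list)

    with open("analyze_findings.csv", newline="") as f:
        for row in csv.DictReader(f):
            findings_per_table[row["table"]].append(row["finding"])

    # Each table becomes one actionable item: assign an owner and decide
    # whether its data is kept, scrubbed, or removed.
    for table, findings in findings_per_table.items():
        print(f"{table}: {len(findings)} finding(s) to assess")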
