Data Quality

Data Quality Module

The Data Quality module allows you to run pre-defined and custom checks for data errors and discrepancies. These checks can help you identify data entry errors and values that no longer conform to a field’s requirements (e.g., field validation or “required” status) if the field has been modified.

Within the module, you can identify values that violate your rules and either correct them or exclude them from future checks.

Types of Data Quality Rules

Pre-Existing Rules

The Data Quality module comes with nine pre-defined rules to help you identify common issues.

  • A. Blank values
    • Find all fields that have been left blank (empty), provided fields aren’t hidden by branching logic.
  • B. Blank values (required fields only)
    • Find all required fields that have been left blank (empty), provided fields aren’t hidden by branching logic.
  • C. Field validation errors (incorrect data type)
    • Identify values that conflict with a field’s validation. For example, this rule would flag letters stored in an “integer” field.
  • D. Field validation errors (out of range)
    • Identify values that exceed the minimum and maximum values for a field. For example, if you have an “age” integer field with lower and upper boundaries of 18 and 120, this rule would flag any values less than 18 or greater than 120.
  • E. Outliers for numerical fields. (numbers, integers, sliders, calc fields)
    • Identify values in numeric fields that fall more than two standard deviations from the mean.
  • F. Hidden fields that contain values
    • Fields that are hidden by branching logic are expected to be blank (empty). This rule identifies any fields that should be hidden but contain a value.
  • G. Multiple choice fields with invalid values
    • Multiple-choice fields are expected to contain values that match the defined choice values. This rule identifies values that don’t appear in the list of choices.
  • H. Incorrect values for calculated fields
    • Find values that don’t match the expected result of a calculation formula or a @CALCTEXT or @CALCDATE action tag.
  • I. Fields containing “missing data codes”
    • Missing data codes can be used to document why a field is blank. Use this rule to find missing data codes in your project.

Custom Rules

You may wish to define one or more custom data quality rules. For example, your project may have two versions of a variable, item_v1 and item_v2. The following rule will find records that contain mismatched values in the two fields:

[item_v1]<>'' and [item_v2]<>'' and [item_v1] != [item_v2]

To define a custom rule for your project:

  1. Give the rule a name or label in the “Rule Name” box.
  2. Type the syntax into the “Rule Logic” box.
  3. Decide whether you want to execute the rule in real time on data entry forms. If real-time execution is enabled, users will see a popup if they try to save a form with data that violates the rule.
  4. Click Add.

Steps to Run Data Quality Rules

Ensure Access

Use the User Rights page to define who can create custom rules and who can execute existing rules.

Execute One Rule

Run an individual rule on all records by clicking the Execute button beside the rule.

If REDCap finds any records that violate the rule, the number of matches will appear in the Total Discrepancies column. Click “view” to see a list of matches or “export” to download matches as a CSV file.

Execute All Rules

To execute all rules an once, all rules except the “Blank Values” rules, or all custom rules, click the appropriate button.

Limit to One Record

By default, rules are run on all records. To execute rules on a specific record, select the record ID from the dropdown menu.

Fix Incorrect Values

After you execute one or more rules, click “view” to see a list of values that violate a rule. The list will include the record ID and the details of the issue.

Take followup actions to address each issue:

  1. Click a link in the “Discrepant fields with their values” column to navigate to the corresponding record.
  2. If the discrepancy isn’t a concern and shouldn’t be returned the next time you run the data quality rule, click “exclude”.

Rule H will find discrepancies between saved values and the current results of formulas in calculated fields. This rule may be useful if formulas have changed, if calculated fields were added after records were created, if calculated fields appear on instruments that don’t yet have saved data for a record, or if calculations rely on the “today” and “now” smart variables.

When Rule H identifies discrepancies, you will have the option to fix all issues at once by clicking the Fix calcs now button. To exclude an individual match from the action, first click “exclude.”

Attachments