Claims data is very different from other types of data typically encountered by actuaries, accountants, analysts, and researchers. Anyone not prepared for the unique challenges it presents can easily conclude that something is wrong with a data set. The phrase “data quality” is frequently used as a catch-all for a variety of concerns, some of which may have nothing to do with the quality of the data itself but with other characteristics that are not readily defined. Often, the term is used when something unexpected, undesired, or seemingly incorrect is discovered: it becomes a “data quality problem.”
Because of these challenges, CIVHC has developed a definition of data quality for the data collected and processed in the Colorado All Payer Claims Database (CO APCD), along with a couple of other terms related to the quality of analytics produced with the data. The following elements make up CIVHC’s definition of “data quality” in the CO APCD.
Submission Quality
Baseline CO APCD data quality is determined by the condition of the submitted data combined with the effectiveness of the data intake and validation processes. These processes have an extremely low tolerance for incomplete or incorrect claim submissions: if a file does not meet the necessary criteria, it is rejected, and the submitter must fix the errors and resubmit.
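The file-level rejection described above can be sketched in Python. This is a minimal illustration under stated assumptions, not CIVHC's intake system. The field names in `REQUIRED_FIELDS` and the `MAX_ERROR_RATE` tolerance are hypothetical placeholders, because the actual criteria are defined in the Data Submission Guide. The point is that problems are detected record by record, but the accept or reject decision applies to the whole file.

```python
from dataclasses import dataclass, field

# Hypothetical required fields; the real list is defined in the Data Submission Guide.
REQUIRED_FIELDS = ("payer_id", "member_id", "claim_id", "service_date", "paid_amount")

# Hypothetical tolerance: the share of bad records a file may contain before
# the entire file is rejected and must be corrected and resubmitted.
MAX_ERROR_RATE = 0.001


@dataclass
class ValidationResult:
    accepted: bool
    error_rate: float
    errors: list = field(default_factory=list)


def record_errors(record: dict) -> list:
    """Return the problems found in a single claim record."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS
                if record.get(name) in (None, "")]
    amount = record.get("paid_amount")
    if amount not in (None, "") and not isinstance(amount, (int, float)):
        problems.append("paid_amount is not numeric")
    return problems


def validate_file(records: list) -> ValidationResult:
    """Accept or reject a whole submission file based on its record error rate."""
    errors = []
    bad_records = 0
    for i, record in enumerate(records):
        problems = record_errors(record)
        if problems:
            bad_records += 1
            errors.extend(f"record {i}: {p}" for p in problems)
    # An empty file is treated as a failed submission.
    error_rate = bad_records / len(records) if records else 1.0
    return ValidationResult(error_rate <= MAX_ERROR_RATE, error_rate, errors)


# Example: one incomplete record out of two is far above the tolerance.
GOOD = {"payer_id": "P1", "member_id": "M1", "claim_id": "C1",
        "service_date": "2023-01-05", "paid_amount": 120.0}
BAD = dict(GOOD, member_id="")
print(validate_file([GOOD, GOOD]).accepted)  # True
print(validate_file([GOOD, BAD]).accepted)   # False
```

Because the decision is made per file, the submitter has to correct and resend the file. The warehouse never holds a partially accepted file.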
Business Processing Quality
Once the claims pass the intake validation process, they are processed using a set of complex business processing rules. These rules help make sense of the millions of claims submitted every month by applying logic that can, for example, track patients who switch insurance, identify providers who bill under multiple facility names, and remove duplicate claims.
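Of the business rules listed above, duplicate removal is the simplest to illustrate. The sketch below is a hypothetical example, not CIVHC's actual rule set. It makes two illustrative assumptions: each claim line can be identified by a `(payer_id, claim_id, line_number)` key, and a resubmitted copy carries a higher `version` number. Real claims processing must also handle reversals and adjustments, which this sketch ignores.

```python
def deduplicate(claims: list) -> list:
    """Keep one record per claim line, preferring the highest version.

    The key and the 'version' field are hypothetical stand-ins for however a
    payer identifies a claim line and marks a resubmitted copy of it.
    """
    latest = {}
    for claim in claims:
        key = (claim["payer_id"], claim["claim_id"], claim["line_number"])
        current = latest.get(key)
        if current is None or claim["version"] > current["version"]:
            latest[key] = claim
    # Dicts preserve insertion order, so lines keep the order of first appearance.
    return list(latest.values())


claims = [
    {"payer_id": "P1", "claim_id": "C1", "line_number": 1, "version": 1, "paid_amount": 100.0},
    {"payer_id": "P1", "claim_id": "C1", "line_number": 1, "version": 2, "paid_amount": 90.0},
    {"payer_id": "P1", "claim_id": "C2", "line_number": 1, "version": 1, "paid_amount": 50.0},
]
unique = deduplicate(claims)
print(len(unique), sum(c["paid_amount"] for c in unique))  # 2 140.0
```

Without this step, both copies of the C1 line would be counted and the total paid would be overstated (240.0 instead of 140.0).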
Data Accuracy
As mentioned above, the quality of the submitted data and of how it is processed is foundational to the quality of any analysis conducted with the information. Once CIVHC has validated that the data is as accurate as possible according to rigorous standards, the team can produce data sets that stakeholders can use to conduct analyses. For this reason, CIVHC implements quality checks throughout the process and confirms that the output is consistent with what is expected.
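One common way to check that output matches expectations is to compare each period against a recent baseline. The sketch below is an illustration based on assumptions, not a description of CIVHC's checks. It flags any month whose claim count differs from the average of the preceding `window` months by more than `tolerance`. Both defaults are hypothetical.

```python
from statistics import mean


def flag_unexpected(monthly_counts: dict, window: int = 3, tolerance: float = 0.25) -> list:
    """Return months whose count deviates from the trailing average by more than tolerance.

    monthly_counts maps 'YYYY-MM' strings to claim counts; window and tolerance
    are hypothetical defaults chosen for illustration only.
    """
    months = sorted(monthly_counts)
    flagged = []
    for i in range(window, len(months)):
        baseline = mean(monthly_counts[m] for m in months[i - window:i])
        current = monthly_counts[months[i]]
        if baseline and abs(current - baseline) / baseline > tolerance:
            flagged.append(months[i])
    return flagged


counts = {"2023-01": 1000, "2023-02": 1020, "2023-03": 980,
          "2023-04": 1010, "2023-05": 400}
print(flag_unexpected(counts))  # ['2023-05']
```

A flagged month does not necessarily indicate an error. It prompts someone to investigate before the output is released.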
Completeness
The contents of the CO APCD are governed by the CO APCD Rule and the Data Submission Guide, as well as by federal laws. Certain data elements cannot be collected without changes to legislation and therefore are not in the CO APCD. In addition, people with certain types of coverage, such as ERISA-based self-insured employer plans, TRICARE, or VA, are not included in the database. So even though the CO APCD includes the majority of insured Coloradans, it does not represent all covered Coloradans.
Timeliness
Claims data is, by nature, retrospective. Claims are filed after health care services are delivered and are then processed and adjudicated by the insurance companies. If all goes well, a claim moves through the system quickly and is submitted to the CO APCD; if the claim is disputed, the process can take longer. Payers submit data to the CO APCD every month, and the data is incorporated into the data warehouse every two months. On average, a claim becomes available for request in the CO APCD three months after it is paid. Making claims available any sooner would risk processing errors.
Consistency
The entire data warehouse is refreshed and reprocessed every other month. This helps CIVHC ensure that if a payer submitted incorrect claims, or if the business processing rules need to be updated, all historical data is processed in the same manner as the newly added files. As a result, the output of an analysis run on one data refresh may not exactly match the output of the same analysis run on the refresh two months later, even if the same methods were used. The database is a living, ever-evolving resource, which makes it very valuable for immediate use, but caution must be used when comparing analytics produced from different data refreshes.
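The reason results can shift between refreshes can be sketched as a full rebuild. On each refresh, all stored submissions are reprocessed with the current rules, instead of new months being appended to previously processed data. In this hypothetical Python sketch, `submissions`, `full_refresh`, and the field names are illustrative assumptions. A payer's corrected January file changes January's total in the second refresh, even though the analysis itself is identical.

```python
from collections import defaultdict
from typing import Callable

# A rule is any function that transforms the full list of claims
# (for example, de-duplication or patient-matching logic).
Rule = Callable[[list], list]


def full_refresh(submissions: dict, rules: list[Rule]) -> list:
    """Rebuild the warehouse from every stored submission with the current rules.

    submissions maps (payer_id, month) to the latest file received for that
    payer and month, so a corrected resubmission replaces the original.
    Any updated rule is likewise applied to every month, not only new ones.
    """
    claims = [c for key in sorted(submissions) for c in submissions[key]]
    for rule in rules:
        claims = rule(claims)
    return claims


def paid_by_month(claims: list) -> dict:
    """Example analysis: total paid amount per service month."""
    totals = defaultdict(float)
    for c in claims:
        totals[c["month"]] += c["paid_amount"]
    return dict(totals)


submissions = {
    ("P1", "2023-01"): [{"month": "2023-01", "paid_amount": 100.0}],
    ("P1", "2023-02"): [{"month": "2023-02", "paid_amount": 80.0}],
}
refresh_1 = paid_by_month(full_refresh(submissions, rules=[]))

# Before the next refresh, the payer resubmits a corrected January file.
submissions[("P1", "2023-01")] = [{"month": "2023-01", "paid_amount": 120.0}]
refresh_2 = paid_by_month(full_refresh(submissions, rules=[]))

print(refresh_1["2023-01"], refresh_2["2023-01"])  # 100.0 120.0
```

Historical months are rebuilt on every refresh, so repeating an analysis on a later refresh can produce a different result for the same period. This is why comparisons across refreshes need care.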