The HIV/STI/Viral Hepatitis Data Quality and Informatics (DQI) page is a central location for our work to integrate data systems and maintain high-quality, accurate data.
What is Informatics?
Informatics is the science of how to use data, information and knowledge to improve human health and the delivery of health care services.
Public health informatics is the application of informatics to areas of public health, including surveillance, prevention, preparedness, and health promotion. Public health informatics, and the related field of population informatics, address information and technology issues from the perspective of groups of individuals. Public health is extremely broad and can extend to the environment, workplaces, living spaces, and more. Generally, the American Medical Informatics Association focuses on the aspects of public health that enable the development and use of interoperable information systems for public health functions such as biosurveillance, outbreak management, electronic laboratory reporting, and prevention.
Why is Data Quality so Important?
Decisions about when and where state money and resources, both human and capital, are allocated are based on data showing a need to mobilize those resources, either to respond to an emergent need or to support an initiative that advances the mission of the Indiana Department of Health: to promote, protect, and improve the health and safety of all Hoosiers.
The quality of the data on which these decisions are based largely determines whether the resources put into these initiatives will be effective. If the data in our analyses is incorrect, or if we are missing large amounts of information because of a reporting issue, we may be looking in the wrong area or missing the beginning of an issue that, if detected earlier, could have been addressed at a lower cost in money, lives, and suffering. Ensuring that the data we review is correct and complete, with at least the minimum required information, enables us to use our resources efficiently and to best meet the community's needs.
Data quality also means more than correctness: the data must be in a format that lets us aggregate, or combine, it so the larger picture can be examined for patterns and trends. This type of data cleaning is called standardization. Standardizing the data, that is, making similar answers match so they can be aggregated, is what allows us to analyze the data at a group level for trends across areas or groups over time. Grouping the data correctly is also part of data quality, so having the correct information to place each case in the correct group informs the decision-making process. If information is missing or incorrect, or cannot be classified because it was sent in a format the receiving system cannot read, it can skew the results. This is especially true when data for a large number of cases is missing or incorrect, or when groups contain only a few cases because disease incidence is low.
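To make standardization concrete, here is a minimal Python sketch, assuming county of residence arrives as free text with inconsistent spellings. The lookup table `CANONICAL_COUNTY` and the functions `standardize` and `aggregate` are hypothetical illustrations, not part of any DQI system; in practice the accepted values would come from the program's own data dictionary. Values that cannot be matched are counted as "Unknown" rather than dropped.

```python
from collections import Counter

# Hypothetical lookup table mapping variant spellings to one canonical county name.
# A real table would be built from the program's data dictionary.
CANONICAL_COUNTY = {
    "marion": "Marion",
    "marion county": "Marion",
    "marion co": "Marion",
    "lake": "Lake",
    "lake county": "Lake",
    "lake co": "Lake",
}


def standardize(value, mapping, unknown="Unknown"):
    """Return the canonical category for a raw value, or `unknown` if it cannot be classified."""
    if value is None:
        return unknown
    # Normalize case, surrounding whitespace, and trailing periods before lookup.
    key = value.strip().lower().rstrip(".")
    return mapping.get(key, unknown)


def aggregate(raw_values, mapping):
    """Standardize every raw value, then count records per canonical group."""
    return Counter(standardize(v, mapping) for v in raw_values)


if __name__ == "__main__":
    reported = ["Marion", "MARION COUNTY", " marion co. ", "Lake", "lake county", "", None, "Springfeld"]
    for group, count in aggregate(reported, CANONICAL_COUNTY).most_common():
        print(f"{group}: {count}")
```

Keeping the "Unknown" group in the counts shows how much of the data could not be classified, which is the kind of gap that can skew results when groups are small.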
What are some examples of the types of data our teams continually review, correct, or standardize so the data can be analyzed cleanly?
- Data reporting patterns – are we receiving data regularly from our submitting partners?
- Residency of investigations
- Test result information
- Risk factor/interview information
- Completion of de-identified data transmissions to CDC
- Data standardization for aggregation
- Demographic group completeness information
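Several of these review areas, such as demographic group completeness, come down to measuring how often a required field is actually filled in. Below is a minimal Python sketch of that check, assuming case records are available as dictionaries. The field names in `REQUIRED_FIELDS`, the placeholder values treated as missing, and the 90% threshold are hypothetical choices for illustration, not program requirements.

```python
from collections.abc import Iterable, Mapping

# Hypothetical minimum fields; the actual required set is program-specific.
REQUIRED_FIELDS = ("county", "age_group", "race_ethnicity", "test_result")

# Placeholder values counted as missing (an assumption for this sketch).
MISSING_MARKERS = {"", "unknown", "unk", "n/a"}


def is_missing(value) -> bool:
    """Treat None, blanks, and common 'unknown' placeholders as missing."""
    if value is None:
        return True
    return str(value).strip().lower() in MISSING_MARKERS


def completeness(records: Iterable[Mapping], fields=REQUIRED_FIELDS) -> dict:
    """Return the percentage of records with a usable value for each field."""
    records = list(records)
    if not records:
        return {field: 0.0 for field in fields}
    report = {}
    for field in fields:
        present = sum(1 for record in records if not is_missing(record.get(field)))
        report[field] = round(100.0 * present / len(records), 1)
    return report


def flag_incomplete(report: dict, threshold: float = 90.0) -> list:
    """List the fields whose completeness falls below the (hypothetical) threshold."""
    return [field for field, pct in report.items() if pct < threshold]


if __name__ == "__main__":
    cases = [
        {"county": "Marion", "age_group": "25-34", "race_ethnicity": "Unknown", "test_result": "Positive"},
        {"county": "", "age_group": "35-44", "race_ethnicity": "White", "test_result": "Negative"},
    ]
    report = completeness(cases)
    print(report)
    print("Below threshold:", flag_incomplete(report))
```

Running the same check separately for each submitting partner can help show whether a completeness problem is tied to a particular reporting source.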