To provide researchers and report writers with easy access to high-quality data, the organization requested a metadata-driven extract, load, and transform (ELT) process to stand up an Observational Medical Outcomes Partnership (OMOP) data warehouse built on the OMOP Common Data Model (CDM). The warehouse supports insights generated with Observational Health Data Sciences and Informatics (OHDSI) standardized tools.

Concept Mapping and Data Population  

To set the project up for success, Tegria deployed an integrated team of data engineers, Epic data experts, and a data architect to work side by side with the internal teams. Using Caboodle's source data, the project team completed source-to-target mapping to the CDM. The team profiled the source data so that senior research data analysts could make informed decisions about conforming Caboodle data to the OMOP standard and extending the CDM to include additional data elements from both the Caboodle and Clarity databases.
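As an illustration, a profiling pass of this kind can be expressed directly in T-SQL. The table and column names below are hypothetical stand-ins, not the project's actual Caboodle schema; the point is the shape of the question an analyst asks before mapping a source field to an OMOP concept.

    -- Hypothetical profile of one source column: row counts, distinct
    -- values, and null rates help an analyst decide how a Caboodle-style
    -- field should map to an OMOP CDM concept.
    SELECT
        COUNT(*)                           AS TotalRows,
        COUNT(DISTINCT GenderAbbreviation) AS DistinctValues,
        SUM(CASE WHEN GenderAbbreviation IS NULL
                 THEN 1 ELSE 0 END)        AS NullRows,
        CAST(SUM(CASE WHEN GenderAbbreviation IS NULL THEN 1 ELSE 0 END)
             AS DECIMAL(18,4))
            / NULLIF(COUNT(*), 0)          AS NullRate
    FROM dbo.PatientDim;  -- stand-in for a Caboodle dimension table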

The project team selected a metadata-driven ELT architecture to minimize development time and long-term maintenance costs. The team used Business Intelligence Markup Language (Biml) to automatically generate the ELT packages that export and load data.
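A minimal sketch of the kind of control table that can drive this generation follows; the table design and column names are assumptions for illustration, not the project's actual schema. In this pattern, a Biml script iterates over the rows and emits one SSIS package per source table, so adding a new feed becomes a metadata insert rather than new code.

    -- Hypothetical control table: one row per source table to extract.
    CREATE TABLE etl.SourceTableMetadata (
        SourceTableId   INT IDENTITY(1,1) PRIMARY KEY,
        SourceDatabase  SYSNAME     NOT NULL,  -- e.g., Caboodle or Clarity
        SourceSchema    SYSNAME     NOT NULL,
        SourceTable     SYSNAME     NOT NULL,
        TargetSchema    SYSNAME     NOT NULL,  -- staging target
        LoadType        VARCHAR(12) NOT NULL   -- full reload or incremental
            CHECK (LoadType IN ('FULL', 'INCREMENTAL')),
        WatermarkColumn SYSNAME     NULL,      -- drives incremental loads
        IsEnabled       BIT         NOT NULL DEFAULT 1
    );

    INSERT INTO etl.SourceTableMetadata
        (SourceDatabase, SourceSchema, SourceTable, TargetSchema,
         LoadType, WatermarkColumn)
    VALUES
        ('Caboodle', 'dbo', 'PatientDim',    'stage', 'FULL',        NULL),
        ('Caboodle', 'dbo', 'EncounterFact', 'stage', 'INCREMENTAL',
         'LastUpdatedInstant');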

Staging and Validating Source Data  

Once the scripts were created, data analysts staged and validated source data. Loading the legacy data helped ensure that the system was configured to work with the existing implementation and informed the development of features that support an efficient, scalable architecture.

Large data sets can present performance challenges, so it was important that the ELT process load data daily into a staging database, persist it in a source layer, transform it into the OMOP format, and ultimately load it into the reporting database. A business rule framework was created to perform a variety of data quality checks and validations across multiple tables.
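One way such a framework can be structured, sketched below under assumed names (a dq schema, BusinessRule and RuleResult tables, an omop.person target), is to store each check as a SQL predicate and execute it with dynamic SQL; none of these names come from the project itself.

    -- Hypothetical rule table: each row stores a data quality check as a
    -- SELECT that returns the violating rows.
    CREATE TABLE dq.BusinessRule (
        RuleId    INT IDENTITY(1,1) PRIMARY KEY,
        RuleName  NVARCHAR(100) NOT NULL,
        CheckSql  NVARCHAR(MAX) NOT NULL,  -- SELECT returning bad rows
        IsEnabled BIT           NOT NULL DEFAULT 1
    );
    CREATE TABLE dq.RuleResult (
        RuleId         INT       NOT NULL,
        RunAt          DATETIME2 NOT NULL,
        ViolationCount INT       NOT NULL
    );

    INSERT INTO dq.BusinessRule (RuleName, CheckSql)
    VALUES (N'PersonYearOfBirthRequired',
            N'SELECT person_id FROM omop.person WHERE year_of_birth IS NULL');

    -- Execute every enabled rule with dynamic SQL; log one count per rule.
    DECLARE @RuleId INT, @CheckSql NVARCHAR(MAX),
            @Sql NVARCHAR(MAX), @Violations INT;
    DECLARE rule_cursor CURSOR LOCAL FAST_FORWARD FOR
        SELECT RuleId, CheckSql FROM dq.BusinessRule WHERE IsEnabled = 1;
    OPEN rule_cursor;
    FETCH NEXT FROM rule_cursor INTO @RuleId, @CheckSql;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @Sql = N'SELECT @Violations = COUNT(*) FROM ('
                   + @CheckSql + N') AS v;';
        EXEC sp_executesql @Sql, N'@Violations INT OUTPUT',
             @Violations = @Violations OUTPUT;
        INSERT INTO dq.RuleResult (RuleId, RunAt, ViolationCount)
        VALUES (@RuleId, SYSUTCDATETIME(), @Violations);
        FETCH NEXT FROM rule_cursor INTO @RuleId, @CheckSql;
    END
    CLOSE rule_cursor;
    DEALLOCATE rule_cursor;

Storing checks as data rather than code means new validations can be added, disabled, or audited without redeploying the ELT process.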

Improving Efficiency  

Previously, databases were loaded infrequently because of the large data volumes, slow load times, and conflicts with reporting schedules. To solve this issue moving forward, Tegria recommended a multi-threaded approach that runs multiple data packages at the same time, with built-in prioritization based on package size and timing.
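One common way to coordinate this kind of parallelism in SQL Server is a priority-ordered work queue that several worker threads poll concurrently. The sketch below is an assumed design, not the project's actual implementation; READPAST lets parallel workers skip rows another worker has already locked, so each package is claimed exactly once.

    -- Hypothetical work queue: one row per package to run, ordered by
    -- priority and earliest allowed start time.
    CREATE TABLE etl.PackageQueue (
        QueueId     INT IDENTITY(1,1) PRIMARY KEY,
        PackageName VARCHAR(200) NOT NULL,
        Priority    INT          NOT NULL,  -- lower number runs sooner
        NotBefore   DATETIME2    NOT NULL DEFAULT SYSUTCDATETIME(),
        ClaimedAt   DATETIME2    NULL
    );

    -- Each worker atomically claims the highest-priority unclaimed
    -- package that is eligible to start.
    WITH NextPackage AS (
        SELECT TOP (1) *
        FROM etl.PackageQueue WITH (UPDLOCK, READPAST, ROWLOCK)
        WHERE ClaimedAt IS NULL
          AND NotBefore <= SYSUTCDATETIME()
        ORDER BY Priority, NotBefore
    )
    UPDATE NextPackage
    SET ClaimedAt = SYSUTCDATETIME()
    OUTPUT inserted.QueueId, inserted.PackageName;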

To provide efficient access to the data, a dynamically generated set of user access views (UAVs) was created daily for multiple access modes, including the standard, protected health information (PHI), and de-identified data models, as well as patient cohorts defined by a researcher. To support this variety of data models, complex logic was developed to de-identify dates, the ages of elderly patients, and locations. This included record-level exclusion flags to identify records that cannot appear in views, such as patients with a special status (e.g., celebrities and employees), Community Connect patients, HIV patients, and substance abuse patients.
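A simplified sketch of what a de-identified UAV might look like follows. It assumes hypothetical supporting tables (a per-patient random date offset and an extension table carrying the exclusion flag); the schema and column names are illustrative, and the age cap reflects the common HIPAA Safe Harbor convention of grouping ages over 89.

    -- Hypothetical de-identified view over the OMOP person table.
    CREATE VIEW uav_deid.person
    AS
    SELECT
        p.person_id,
        p.gender_concept_id,
        -- Cap year of birth so patients over 89 are not identifiable.
        CASE WHEN YEAR(GETDATE()) - p.year_of_birth > 89
             THEN YEAR(GETDATE()) - 90
             ELSE p.year_of_birth END               AS year_of_birth,
        -- Shift dates by a consistent random offset per patient.
        DATEADD(DAY, o.DayOffset, p.birth_datetime) AS birth_datetime,
        -- Drop location detail entirely in the de-identified model.
        CAST(NULL AS INT)                           AS location_id
    FROM omop.person        AS p
    JOIN deid.PatientOffset AS o ON o.person_id = p.person_id
    JOIN omop.person_ext    AS x ON x.person_id = p.person_id
    WHERE x.ExcludeFromViews = 0;  -- special-status, Community Connect,
                                   -- HIV, and substance abuse records out

Because the offset is constant per patient, intervals between a patient's own events are preserved even though the absolute dates are shifted.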

Our data engineers also developed and deployed a metadata-driven flat file loader that uses dynamic SQL to generate tables and stored procedures and Biml to generate the SSIS packages. Source data is preserved in the persisted source layer, and the process handles file archiving and high-level execution auditing. The generated stored procedures conform to the project's general design specification and can be configured as type 1 (no change history retained) or type 2 (change history retained through versioning).
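For illustration, a type 2 load is often implemented by expiring the current version of changed rows and inserting new versions. The sketch below assumes hypothetical staging and persisted-source tables with EffectiveFrom/EffectiveTo versioning columns and a precomputed row hash; it is a sketch of the pattern, not the project's generated code.

    -- Hypothetical type 2 load: expire the current version of changed
    -- rows, then insert the new versions, retaining full history.
    UPDATE tgt
    SET tgt.EffectiveTo = SYSUTCDATETIME(),
        tgt.IsCurrent   = 0
    FROM persisted.PatientFeed AS tgt
    JOIN stage.PatientFeed     AS src
      ON src.PatientKey = tgt.PatientKey
    WHERE tgt.IsCurrent = 1
      AND src.RowHash <> tgt.RowHash;  -- row hash detects changed rows

    INSERT INTO persisted.PatientFeed
        (PatientKey, RowHash, FirstName, LastName,
         EffectiveFrom, EffectiveTo, IsCurrent)
    SELECT src.PatientKey, src.RowHash, src.FirstName, src.LastName,
           SYSUTCDATETIME(), NULL, 1
    FROM stage.PatientFeed AS src
    LEFT JOIN persisted.PatientFeed AS tgt
      ON tgt.PatientKey = src.PatientKey
     AND tgt.IsCurrent = 1
    WHERE tgt.PatientKey IS NULL;      -- new keys plus just-expired keys

A type 1 procedure would instead update the existing row in place, which is why the choice between the two is a per-table configuration rather than separate code paths.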

This is an ongoing project with an ever-evolving set of objectives. The next phase includes integrating Digital Imaging and Communications in Medicine (DICOM) data, adding new custom data model components (DMCs) to identify historical patient IDs, loading additional flowsheet records, and expanding the business rule framework.

How can Tegria help you do more with data?
