CopLab - The Copenhagen Primary Care Laboratory Database
In the Copenhagen area (Copenhagen Municipality and the former Copenhagen County) with its approx. 1.2 million inhabitants, there was only one laboratory serving general practitioners (GPs) and other private practicing specialists from 2000 through 2015, the Copenhagen General Practitioners’ Laboratory, CGPL (Københavns Praktiserende Lægers Laboratorium). The laboratory served doctors with a broad range of blood, urine, semen, clinical physiological, cardiac, and lung function tests. The Copenhagen Primary Care Laboratory (CopLab) Database contains all results (n=176 million) of these tests and analyses from 1.3 million different individuals.
Overall idea and ambition
The CopLab Database possesses the strength to unravel important physiological and pathophysiological relations for a plethora of medical conditions. The large number of clinical and administrative variables are being validated to create and maintain a state-of-the-art database infrastructure allowing for correct interpretation of the database content. We will establish an environment consisting of experienced researchers, data managers and statisticians with profound insight into the database variables, and all working with the goal of offering expertise and assistance to researchers presenting potential research projects of high quality using data from the CopLab Database.
Strengths of the consortium
This project brings together a unique partnership of institutions with a long-term dedication to translational research. The project is initiated by and rooted in the Department of Public Health, University of Copenhagen, and involves a multidisciplinary team of leading academic experts within epidemiology, basic science, nutrition, general practice, clinical medicine, clinical pharmacology, health economics, organizational research, and computer science. The commitment of this broad range of experienced partners consolidates the implementation of the database.
Denmark has a long tradition of collecting information on disease incidence, use of health services, socio-economic status and other data describing its population in national registries. For research purposes, more detailed data are also available in smaller clinical databases and research databases. National and research databases may be linked at the individual level using the personal identification (CPR) number.
The CopLab Database consists of prospectively collected clinical data from primary care patients who were consulting their primary care doctors for health issues from 2000 through 2015. The CopLab population was sampled continuously without any restrictions as to why the analyses were requested by doctors.
Access to data from so many individuals over such a long time period enables the CopLab Database to assess both common and rare disease outcomes. Furthermore, multiple measurements over time enable longitudinal research, also across health sector borders, and the prognostic value of the many clinical variables may be assessed for clinical outcomes while adjusting for relevant confounders.
The history, contents and quality assurance
The CGPL was founded in 1922 by the GPs in the Copenhagen Municipality, but in 1994 GPs from the County of Copenhagen joined. During 2000 through 2015, CGPL served approx. 750 GPs and 300 private practicing specialists and performed tests on 1.3 million unique individuals from a dynamic population amounting to approx. 1.2 million inhabitants at a given time. When the CGPL was closed down at the end of 2015 it was one of Europe's largest laboratories with 195 full-time employees. Besides a broad spectrum of biochemical analyses, CGPL offered a comprehensive selection of cardiac and clinical physiological tests as well as allergological, urine and semen analyses. Blood sampling and/or testing took place at CGPL, at its 8 local branches, in doctors’ consultation rooms, and in patients’ own homes or nursing homes.
Clinical biochemistry. This was CGPL’s quantitatively largest production area with some 10 million analyses performed yearly.
Fertility examinations. The CopLab Database contains results of 160,020 semen analyses from 92,335 men from 1950 through 2015, which makes it by far the largest collection of its kind in the world.
Electrocardiograms (ECGs). The ECGs were always recorded by specially trained nurses and interpreted by one of CGPL’s five cardiologists. This ensured high and uniform quality of the 1,050,000 ECGs performed.
Echocardiograms. All of the more than 30,000 echocardiograms were performed either by specially trained echo technicians or by cardiologists. The cardiologists supervised all examinations and ultimately read and described all tests.
Other relevant clinical analyses were ECG stress tests, event recordings, 24-hour ambulatory blood pressure measurements, pulmonary function tests, distal blood pressure measurements, skin and serological allergy testing, and EEGs.
Quality assurance of CGPL. As the first laboratory in Denmark, the CGPL became fully DANAK accredited in 2001. In addition, CGPL was the first laboratory in the world where cardiac, pulmonary and other analyses were fully accredited. Accreditation according to DANAK's 15189 standard (the former ISO 17025) was the highest form of quality standard and quality assurance that could be achieved. With accreditation, all aspects of information handling, patient data and data analytics, staff qualifications, procurement, IT management and staff education as well as quality goals at the CGPL were subject to strict and clearly defined criteria, and ongoing follow-up from DANAK staff ensured that CGPL constantly met the accreditation requirements.
Apart from the semen analyses, which date back to 1950, only data from the beginning of the millennium, when accreditation was issued, are included in the CopLab Database in order to ensure access to the highest possible quality of clinical data.
Former research using data from the CopLab Database
In 2008, researchers from The Research Unit for General Practice in Copenhagen began working with CGPL data in a pilot project entitled “The Copenhagen Primary Care Differential Count (CopDiff) Database” in order to prepare for the forthcoming CopLab Database.
The research group included consultants from CGPL with great insight into the data and their origin, and most of them are key members of the CopLab Steering Group (see “Organizational Structure”). The CopDiff project prepared the institution’s data managers, statisticians and researchers for the ensuing work with the much larger CopLab Database and emphasized the importance of a robust infrastructure allowing for swift and secure handling of the vast data. The experiences from the construction and analysis of the CopDiff Database are documented in 9 publications [2-10]. Over the course of CGPL's history, more limited and often unvalidated data extracts have also been used in other research fields [11-14].
Positioning of the CopLab Database
The CopLab Database is unique in an international context:
- It covers a well-defined geographical area where more than one fifth of the Danish population lives, and all age groups are included.
- It covers all examinations and tests ordered by GPs and private practicing specialists.
- The longitudinal data allow for the estimation of dynamic changes over 15 years and enable the assessment of duration and timing of various events.
- The test results derive from a single laboratory with the highest level of quality assurance and extensive documentation of all analyses and workflows. The Clinical Laboratory Information System (LABKA) research database [15] supplies some of the same biochemical data as the CopLab Database, but the utility of LABKA data is hampered by great variation in analytical methods and quality assurance between the many laboratories.
- Family studies can be carried out with great power since the database contains results from individual family members (Table 2).
The CopLab Database is an open cohort. Only patients for whom the primary care physician initiated testing or blood sampling at the CGPL have records in the database. CGPL served mainly GPs (86 % of requisitions), but also privately practicing specialists (Table 2).
Data are stored on a server at the University of Copenhagen, and access to this server is limited to a few key statisticians and data managers. Researchers will not be granted direct access to the SQL database; instead, the data managers select only the required information from the database and export the data to relevant statistical software. Documentation and code files for each project will be organized according to a common structure, and all relevant files will be kept under version control (for example, on GitHub) to facilitate full transparency and reproducibility. Most projects are likely to involve merging CopLab data with national health registers, which will be provided through The Public Health Database at the Department of Public Health, University of Copenhagen. Furthermore, an online search tool with anonymized data will be developed for public use.
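The extraction step described above (a data manager selects only the variables a project needs and links them to register data) can be illustrated with a minimal sketch. Everything in it is hypothetical: the table names `lab_results` and `registry`, the column names, the pseudonymous `person_id` key standing in for the CPR number, and the rows themselves. It says nothing about the actual CopLab schema or tooling.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with made-up rows (not CopLab data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE lab_results (
            person_id INTEGER, test_code TEXT, sample_date TEXT, value REAL);
        CREATE TABLE registry (
            person_id INTEGER PRIMARY KEY, birth_year INTEGER, sex TEXT);
    """)
    conn.executemany("INSERT INTO lab_results VALUES (?, ?, ?, ?)", [
        (1, "HB", "2004-03-01", 8.9),
        (1, "CRP", "2004-03-01", 12.0),
        (2, "HB", "2010-06-15", 7.4),
        (3, "TSH", "2012-01-20", 2.1),
    ])
    conn.executemany("INSERT INTO registry VALUES (?, ?, ?)", [
        (1, 1950, "F"), (2, 1972, "M"), (3, 1988, "F"),
    ])
    return conn


def extract_for_project(conn, test_codes):
    """Return only the requested tests, linked to register data by person_id."""
    placeholders = ",".join("?" for _ in test_codes)
    query = f"""
        SELECT l.person_id, l.test_code, l.sample_date, l.value,
               r.birth_year, r.sex
        FROM lab_results AS l
        JOIN registry AS r ON r.person_id = l.person_id
        WHERE l.test_code IN ({placeholders})
        ORDER BY l.person_id, l.sample_date
    """
    return conn.execute(query, list(test_codes)).fetchall()


conn = build_demo_db()
rows = extract_for_project(conn, ["HB"])
for row in rows:
    print(row)
```

Restricting the query to the requested test codes keeps the export limited to what a project actually needs, which is the point of routing access through the data managers.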
The open cohort structure of the CopLab Database imposes challenges for the statistical analyses and their interpretation. The specific challenges will depend upon the actual research project, but the following challenges are common:
- Sampling: the data were not sampled for research purposes. The population in the CopLab Database is therefore likely to be more ill than the general population, but at the same time less ill than hospitalized patients. In addition, the sampling mechanisms and rationales may have changed over time.
- Confounding: since the data are observational, the estimated effect of an exposure on an outcome may be distorted by other variables that are associated with both.
- Sample size: the large sample size is a great advantage, yet it can cause challenges like long computer processing time. In addition, statistical significance has limited meaning when the sample size is large.
- Dependency structures: the data often contain multiple observations on the same individual and these observations are not independent.
- Technical inaccuracies: some of the biomarkers have lower and upper detection limits.
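The remark on sample size can be made concrete with a small calculation. The sketch below is an illustration with invented numbers, not an analysis of CopLab data: it applies a two-sample z-test (assuming a known, common standard deviation) to the same tiny difference in means at two different sample sizes. The function name and all values are assumptions chosen for the example.

```python
import math


def two_sample_z(mean_a, mean_b, sd, n_per_group):
    """Two-sided z-test for a difference in means with known common sd."""
    se = sd * math.sqrt(2.0 / n_per_group)       # standard error of the difference
    z = (mean_a - mean_b) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))       # two-sided normal p-value
    return z, p


# Same difference (0.02 standard deviations), very different sample sizes.
small = two_sample_z(8.50, 8.52, sd=1.0, n_per_group=500)
large = two_sample_z(8.50, 8.52, sd=1.0, n_per_group=500_000)

print(f"n=500 per group:     z={small[0]:.2f}, p={small[1]:.3g}")
print(f"n=500,000 per group: z={large[0]:.2f}, p={large[1]:.3g}")
```

With 500 observations per group the difference is far from significant, while with 500,000 per group the p-value becomes extremely small, although the difference itself is unchanged and may be clinically irrelevant. Effect sizes and confidence intervals are therefore usually more informative than p-values in data of this size.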
For each research project the specific challenges will be identified a priori and an appropriate statistical method selected. However, many of the challenges mentioned above cannot be handled with simple statistical methods and may require the use of advanced methods (e.g. inverse probability weighting, propensity score matching, statistical machine learning and mixed models), or even the development of new methods. Sampling issues have received limited attention in the statistical literature. Hence, the CopLab Database presents an opportunity to advance the existing methodology in this field, also for the benefit of other databases facing similar sampling challenges [16].
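As an illustration of one of the methods named above, the following minimal sketch shows the idea behind inverse probability weighting for a sampling problem. It rests on strong simplifying assumptions: the numbers are invented, and each observation's probability of being sampled is taken as known, whereas in practice it would have to be estimated from data. The function names are hypothetical.

```python
def naive_mean(sample):
    """Unweighted mean of the observed values."""
    return sum(value for value, _ in sample) / len(sample)


def ipw_mean(sample):
    """Weighted mean where each value is weighted by 1 / P(being sampled)."""
    weights = [1.0 / prob for _, prob in sample]
    weighted_sum = sum(w * value for w, (value, _) in zip(weights, sample))
    return weighted_sum / sum(weights)


# Invented population: 200 ill people (value 10) and 800 healthy (value 5),
# so the true population mean is 6.0. Ill people are tested more often:
# sampling probability 0.5 versus 0.1, giving 100 and 80 observed records.
sample = [(10.0, 0.5)] * 100 + [(5.0, 0.1)] * 80

print(f"naive mean: {naive_mean(sample):.2f}")
print(f"IPW mean:   {ipw_mean(sample):.2f}")
```

The unweighted mean overstates the population mean because ill individuals are over-represented among those tested; weighting each record by the inverse of its sampling probability recovers the population mean in this idealized setting.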
An internet portal describing the CopLab Database contents, including a public online search tool linked to disease codes and demographic information from national administrative registries, will be launched. Researchers, irrespective of nationality and affiliation, may submit applications, which will then be assessed by the Steering Group.
A “News” section on the portal will continuously list proposed and ongoing projects, presentations and publications. Our institution's “Press and media service” office will facilitate news bulletins and monthly newsletters. LinkedIn and Twitter accounts managed by a dedicated resource will allow for dissemination on social media.
By offering CopLab clinical variables of supreme quality to the national and international research community, high quality evidence will be produced. To maximize the impact, the scientific results will also be disseminated to patient and caregiver organisations, municipalities, regional and national politicians, GPs and the general population.
Organizational Structure
The CopLab Database is administered by the Department of Public Health, University of Copenhagen. A Steering Group has been appointed and bylaws created. Steering Group responsibilities include: ensuring progress of the specific research projects; discussing the vision for and structure of the database; assessing all proposed projects for scientific merit and approving access to data; approving budgets and accounts; and appointing members of the Scientific Advisory Board for consultancy.
The Steering Group:
- Christen Lykkegaard Andersen, CopLab project leader, associate professor, MD DMSc PhD
- Bent Lind, Senior consultant leader, MD DMSc
- Volkert Siersma, Head statistician, MSc PhD
- Peter Felding, Senior consultant, MD DMSc
- Frans Waldorff, professor, MD PhD
- Niels de Fine Olivarius, professor, MD
- Steffen Loft, professor, MD DMSc
1. Thygesen LC, Daasnes C, Thaulow I, Bronnum-Hansen H. Introduction to Danish (nationwide) registers on health and social issues: structure, access, legislation, and archiving. Scand J Public Health 2011; 39 (7 suppl):12-16.
2. Andersen CL, Siersma VD, Karlslund W, Hasselbalch HC, Felding P, Bjerrum OW, et al. The Copenhagen Primary Care Differential Count (CopDiff) database. Clin Epidemiol 2014; 6:199-211.
3. Andersen CL, Eskelund CW, Siersma VD, Felding P, Lind B, Palmblad J, et al. Is thrombocytosis a valid indicator of advanced stage and high mortality of gynecological cancer? Gynecol Oncol 2015; 139:312-318.
4. Andersen CL, Lindegaard H, Vestergaard H, Siersma VD, Hasselbalch HC, de Fine Olivarius N, et al. Risk of lymphoma and solid cancer among patients with rheumatoid arthritis in a primary care setting. PLoS One 2014; 9:e99388.
5. Andersen CL, Siersma VD, Hasselbalch HC, Lindegaard H, Vestergaard H, Felding P, et al. Eosinophilia in routine blood samples and the subsequent risk of hematological malignancies and death. Am J Hematol 2013; 88:843-847.
6. Andersen CL, Siersma VD, Hasselbalch HC, Lindegaard H, Vestergaard H, Felding P, et al. Eosinophilia in routine blood samples as a biomarker for solid tumor development - A study based on the Copenhagen Primary Care Differential Count (CopDiff) Database. Acta Oncol 2014; 53:1245-1250.
7. Andersen CL, Siersma VD, Hasselbalch HC, Vestergaard H, Mesa R, Felding P, et al. Association of the blood eosinophil count with hematological malignancies and mortality. Am J Hematol 2015; 90:225-229.
8. Andersen CL, Tesfa D, Siersma VD, Sandholdt H, Hasselbalch H, Bjerrum OW, et al. Prevalence and clinical significance of neutropenia discovered in routine complete blood cell counts: a longitudinal study. J Intern Med 2016.
9. Hansen JW, Sandholdt H, Siersma V, Orskov AD, Holmberg S, Bjerrum OW, et al. Anemia is present years before myelodysplastic syndrome diagnosis: Results from the pre-diagnostic period. Am J Hematol 2017; 92:E130-E132.
10. Andersen CL. Eosinophilia and The Copenhagen Primary Care Differential Count (CopDiff) Database - from cells to cohorts. Doctoral Dissertation - University of Copenhagen 2017.
11. Durup D, Jorgensen HL, Christensen J, Schwarz P, Heegaard AM, Lind B. A reverse J-shaped association of all-cause mortality with serum 25-hydroxyvitamin D in general practice: the CopD study. J Clin Endocrinol Metab 2012; 97:2644-2652.
12. Jensen TK, Jacobsen R, Christensen K, Nielsen NC, Bostofte E. Good semen quality and life expectancy: a cohort study of 43,277 men. Am J Epidemiol 2009; 170:559-565.
13. Nielsen JB, Graff C, Pietersen A, Lind B, Struijk JJ, Olesen MS, et al. J-shaped association between QTc interval duration and the risk of atrial fibrillation: results from the Copenhagen ECG study. J Am Coll Cardiol 2013; 61:2557-2564.
14. Selmer C, Olesen JB, Hansen ML, von Kappelgaard LM, Madsen JC, Hansen PR, et al. Subclinical and overt thyroid dysfunction and risk of all-cause mortality and cardiovascular events: a large population study. J Clin Endocrinol Metab 2014; 99:2372-2382.
15. Grann AF, Erichsen R, Nielsen AG, Froslev T, Thomsen RW. Existing data sources for clinical epidemiology: The clinical laboratory information system (LABKA) research database at Aarhus University, Denmark. Clin Epidemiol 2011; 3:133-138.
16. Haneuse S, Daniels M. A General Framework for Considering Selection Bias in EHR-Based Studies: What Data Are Observed and Why? EGEMS (Wash DC) 2016; 4:1203.