Electronic Medical Records and Medical Research Databases - Can They Be Synonymous?

Copyright Statement:

The copyright in this work belongs to Radcliffe Medical Media. Only articles clearly marked with the CC BY-NC logo are published with the Creative Commons by Attribution Licence. The CC BY-NC option was not available for Radcliffe journals before 1 January 2019. Articles marked ‘Open Access’ but not marked ‘CC BY-NC’ are made freely accessible at the time of publication but are subject to standard copyright law regarding reproduction and distribution. Permission is required for reuse of this content.

Major efforts are currently under way in the US, UK, and other countries to construct entirely new systems for the management of electronic medical records. The thesis of this article is that, in principle, it is possible to design these systems in such a manner that they serve not only as an information resource for routine patient care but simultaneously as the primary backbone for medical research at little or no additional expense. This article presents a conceptual framework for an electronic system in which every medical record would be immediately available for research, thereby providing critical feedback on the efficacy and cost of medical procedures. Furthermore, the proposed system is designed in such a way that medical data are made available without the need for complicated and expensive security precautions or the need to obtain permission from multiple parties to utilize the medical data. The time window to adopt such a system, however, is limited. Once the nationwide system has been constructed it will be impractical to retrospectively integrate this fundamental feature.

Defining the Problem

What single item cost US$1.5 trillion or 14.9% of the gross domestic product (GDP) of the US in 2002?1 For what service does every man, woman, and child pay over US$5,000 each year to receive?2 What has grown faster than the rate of inflation for over 30 years?3 What does the Congressional Budget Office estimate will cost the US government three times more in 2050 than it did in 2000, even when calculated as a percentage of an increasing GDP?4 The answer to each of these questions is healthcare.

The problem of escalating costs for medical care has been recognized for years. However, despite several large-scale efforts to address the problem, including the Clinton administration’s historic initiative in the mid-1990s, medical care costs continue to grow unabated. Exacerbating this issue is a demographic shift: the number of elderly people in the US is expected to double between 2000 and 2030.5 While many aspects of this issue may be debated, it is a mathematical certainty that current rates of growth in healthcare spending cannot and will not be sustained. Arresting the rate of growth in healthcare costs can be achieved in one of two ways:

  • reduction in the quality and/or quantity of healthcare; or
  • improvements in efficiency.

The latter is more attractive, and few would deny there is plenty of room for improvement. One has only to attempt to have their medical records transferred from one medical care facility to another to recognize that there is more computerized technology at a supermarket checkout counter, an online banking system, or an airline flight reservation system than there is for communicating the most basic components of medical information. Indeed, this issue has received so much attention that President Bush has declared that the US should ensure that medical records are available electronically for most patients within the next 10 years.6

In stark contrast to the attention paid to medical records in support of patient care, however, medical research databases are generally dismissed as an entirely separate subject. Yet only through systematic analysis of the data generated by routine medical care can the real value of new and existing diagnostic tests and treatment options be objectively measured.

Creating an electronic medical record system that not only serves the patient but provides feedback on the pulse of the healthcare system itself should not be the focus of academic research alone. Once created, such a system would be expected to provide a continual return on investment by providing information essential to recognizing what is working, what is not working, and how things might be improved. Is it possible to create an electronic medical record (EMR) that simultaneously serves as a research database? How much more would it cost?

Database Design

The possibility that a nationwide EMR might simultaneously serve as a medical research database relates directly to the underlying system design. The heart of any information system is a database, and perhaps the most common form is the ‘relational database’. Indeed, relational databases underlie virtually every banking, retail, and airline reservation system in the world. Conceptually, relational databases are little more than two-dimensional arrays organized into columns, such as ‘patient name’, and rows such as ‘Doe, John’ and ‘Smith, Robert’. These name-value pairs can be easily manipulated to form, for example, a diagnostic medical report. Because relational databases are widely used in other fields, such as banking and retail sales, the path of least resistance will be to create a nationwide EMR based on relational databases.

Organizing medical data into name-value pairs, however, has fundamental disadvantages. For example, in banking systems, the number of unique columns such as ‘debit’ and ‘credit’ is relatively small and, perhaps more importantly, there is a limited need to create new columns in the future. Medical information, conversely, often involves large numbers of terms unique to each medical test, and new tests are continuously developed that require even more unique ‘name-value’ pairs. Accordingly, a medical record system based on one or more relational databases might ultimately prove surprisingly expensive. In this setting, one might even consider rejecting the relational database model altogether in favor of an ‘object-oriented database’, i.e. one in which electronic documents such as medical reports remain unstructured and the EMR is designed solely to standardize the document exchange mechanism, much like the office fax machine standardizes the exchange mechanism for the free-formatted paper documents in use today. In medical record system design, discussions of such issues can easily consume collective energies.

There is, however, a more fundamental issue. The number of database columns and rows, the type of organizational unit, and even whether or not to use the traditional relational database model at all are relatively minor concerns. The main issue is how to store the patient’s identity.

For routine clinical care the identity of the patient is of obvious importance. Associating a magnetic resonance imaging (MRI) scan showing a malignant tumor with the wrong patient, for example, is among the most dangerous of errors that can be made in medicine. For the purpose of patient care alone, therefore, there is considerable incentive to tightly integrate the patient’s identity with the rest of the medical record.

For medical research, conversely, the patient’s identity is theoretically irrelevant. The only reason a researcher might need to know the identity of the patient is to check for related records, i.e. did the patient with this MRI scan also have a cardiac catheterization? If one could design a system for which medical records could be associated with an individual and yet not require that the true identity of that individual be known, the patient’s name would never be needed. The single factor that separates information needed for routine patient care from information needed for medical research, therefore, is the identity of the patient. In all other respects, the needs for information are identical.
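One way to picture record linkage without identity is an opaque research identifier. The following is a minimal sketch under stated assumptions, not a design taken from the article: the class `PseudonymRegistry`, the function `also_had`, and the record fields are all hypothetical. A random token is used rather than a hash of the name, so that the identifier cannot be derived from the identity itself.

```python
import secrets


class PseudonymRegistry:
    """Maps a real identity to an opaque research ID (hypothetical design)."""

    def __init__(self):
        # Private mapping: would live only on the protected side of the system.
        self._by_identity = {}

    def research_id(self, identity: str) -> str:
        # Issue a random token the first time an identity is seen, then reuse it.
        if identity not in self._by_identity:
            self._by_identity[identity] = secrets.token_hex(8)
        return self._by_identity[identity]


registry = PseudonymRegistry()

# Public records carry only the research ID, never the name.
public_records = [
    {"research_id": registry.research_id("Doe, John"), "procedure": "MRI"},
    {"research_id": registry.research_id("Smith, Robert"), "procedure": "MRI"},
    {"research_id": registry.research_id("Doe, John"),
     "procedure": "cardiac catheterization"},
]


def also_had(records, first, second):
    """Return the research IDs of patients who had both procedures."""
    had_first = {r["research_id"] for r in records if r["procedure"] == first}
    had_second = {r["research_id"] for r in records if r["procedure"] == second}
    return had_first & had_second


linked = also_had(public_records, "MRI", "cardiac catheterization")
print(len(linked))  # number of patients with both an MRI and a catheterization
```

The researcher's question ("did the patient with this MRI scan also have a cardiac catheterization?") is answered from the public records alone; the name-to-token mapping never leaves the registry.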

The key to designing a nationwide electronic medical record system that simultaneously serves as a research database, therefore, lies in designing the system in such a way that two ‘views’ into the underlying data are provided: a public view and a private view.

In such a system only one physical copy of the data, stored on a single computer hard disk, would be all that is needed even when the data are to be used for multiple purposes. The first step toward achieving such a system design, therefore, is to very carefully consider the issue of patient privacy.
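The idea of two views over a single physical copy can be sketched with ordinary SQL views. This is an illustration only, assuming a relational store; SQLite is used here for convenience, and the table, column, and view names (`record`, `research_id`, `private_view`, `public_view`) are hypothetical rather than part of the proposed system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE record (
    research_id  TEXT,     -- opaque identifier (hypothetical)
    patient_name TEXT,     -- protected
    exam_type    TEXT,
    exam_year    INTEGER,
    state        TEXT
);
INSERT INTO record VALUES ('a1', 'Doe, John',     'MRI', 2012, 'NC');
INSERT INTO record VALUES ('b2', 'Smith, Robert', 'MRI', 2012, 'NC');

-- Private view: every column, for clinical care.
CREATE VIEW private_view AS SELECT * FROM record;

-- Public view: the same rows with the protected column left out.
CREATE VIEW public_view AS
    SELECT research_id, exam_type, exam_year, state FROM record;
""")

public_cursor = conn.execute("SELECT * FROM public_view")
public_columns = [d[0] for d in public_cursor.description]
public_rows = public_cursor.fetchall()
print(public_columns)
```

Both views read the same stored rows, so nothing is copied; the public view simply never selects the protected column.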

The Role of the Health Insurance Portability and Accountability Act

The Health Insurance Portability and Accountability Act (HIPAA) specifies exactly what does and does not constitute protected health information (PHI). As shown in Table 1 there are 18 identifiers that, when fully removed, render a health record officially ‘de-identified’. Data that are stripped of these 18 identifiers are regarded as de-identified, unless the covered entity has actual knowledge that it would be possible to use the remaining information alone or in combination with other information to identify the subject. Removing these 18 identifiers is referred to as the ‘safe-harbor’ method of de-identification; other methods are available but are subject to restrictions.
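As a rough illustration of identifier stripping, the sketch below removes a handful of protected fields and reduces dates to the year. It is deliberately incomplete: the field names are hypothetical, `PROTECTED_FIELDS` covers only a few of the 18 identifier classes, and details such as geographic subdivisions or very advanced ages are not handled.

```python
from datetime import date

# Hypothetical field names standing in for a few of the 18 identifier classes.
PROTECTED_FIELDS = {
    "name", "street_address", "telephone", "email",
    "social_security_number", "medical_record_number",
}


def deidentify(record: dict) -> dict:
    """Return a copy without protected fields and with dates cut to the year."""
    public = {}
    for key, value in record.items():
        if key in PROTECTED_FIELDS:
            continue  # drop the identifier entirely
        if isinstance(value, date):
            # Keep only the year of any date tied to the individual.
            public[key.replace("_date", "_year")] = value.year
        else:
            public[key] = value
    return public


record = {
    "name": "Doe, John",
    "telephone": "555-0100",
    "medical_record_number": "MRN-0001",
    "exam_date": date(2012, 3, 14),
    "exam_type": "MRI",
    "finding": "no abnormality",
}
public_record = deidentify(record)
print(public_record)
```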

The importance of the safe-harbor de-identification method is difficult to overemphasize. Consider what the Department of Health and Human Services has to say about health records that have been de-identified via the safe-harbor method:7

“The Privacy Rule [HIPAA] permits covered entities [e.g. hospitals] to release data that have been de-identified without obtaining an Authorization [e.g. the patient’s permission] and without further restrictions upon use or disclosure because de-identified data is not PHI [protected health information] and, therefore, not subject to the Privacy Rule.”

This statement underscores the fact that once medical records have been sufficiently de-identified they are no longer subject to federal regulations. For de-identified data, therefore, there is no need to ask the patient for their permission to utilize their data, no need to protect the computer that serves the data by housing it inside a computer network firewall, no need for user accounts or passwords, no need to encrypt the data during transmission. In short, the position of the federal government is that once medical records are de-identified the data can be used by anyone for any purpose.

If and Only If

The key to designing an EMR for which all medical records are available for research, therefore, is to ensure that the data are stored in two distinct logical units:

  • private; and
  • public.

Importantly, this design criterion does not imply that a specific underlying database structure must be used. For example, Table 1 shows how a traditional ‘relational database’ could be organized into private and public ‘views’ that effectively filter out all HIPAA-protected information from public access. Table 2 shows how the same concept might be applied to an ‘object-oriented database’; both private and public views are feasible.

It is important to note, however, that such a system requires that these distinct logical units are an integral part of the underlying system design. For the relational database model of Table 1, programmers must ensure that it is not possible for physicians and other healthcare providers to enter unstructured data such as ‘Mr Jones’ test result’ into a comment field that may appear via public access. In Table 2, the ‘free’ text of the diagnostic report must nevertheless be divided a priori into protected and unprotected blocks. If these issues are not addressed in the original system design it will be practically impossible to identify and remove protected information retrospectively.
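Dividing a report a priori into protected and unprotected blocks could look something like the following sketch. The `Block` type, the `render` function, and the sample report text are hypothetical illustrations, not structures defined in the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    text: str
    protected: bool  # no default: the author must decide when the text is entered


report = [
    Block("Patient: Doe, John", protected=True),
    Block("Cardiac MRI shows normal left ventricular function.", protected=False),
    Block("Referring physician notified by telephone.", protected=True),
]


def render(blocks, public: bool) -> str:
    """Join the blocks, omitting protected ones when rendering the public view."""
    return "\n".join(b.text for b in blocks if not (public and b.protected))


public_text = render(report, public=True)
private_text = render(report, public=False)
print(public_text)
```

Because `protected` has no default value, a block cannot be created without an explicit decision, which is one way to make the split part of data entry rather than a retrospective clean-up.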

It is also important to note that, in principle, such a system would not be limited to textual information. Figures 1 and 2 demonstrate how the same concept could be extended to medical images in addition to textual reports. Why not apply the same concepts to genetic information, mortality statistics, and pharmaceutical transactions?

The Potential

It is difficult to overstate the potential benefits of an EMR designed from the ground up with distinct private and public logical units. Every medical image, blood test, diagnosis, treatment, complication, death, and associated cost for every person in the nation would immediately become a part of the aggregated public research record. Imagine the potential implications of a federal mandate that required medical care providers such as hospitals to post the entire public portion of their medical records on the World Wide Web.

Every night, and with no additional governmental investment in infrastructure, existing Internet search engines such as Google and Yahoo! could automatically trawl the available medical data and silently index the day’s events. At any given time, from any hotel room, airport, or coffee shop, any individual could open a Web browser and type ‘MRI North Carolina 2012’ into a Google Web page and count how many MRI procedures were performed that year. With a single additional mouse click, that same person could view any one of the literally hundreds of millions of magnetic resonance images on the resulting Google ‘hit’ list (see Figure 2). With nothing more than a third mouse click, that same person could read the physician’s interpretation of the images (see Table 2), inspect the Current Procedural Terminology (CPT) and International Classification of Diseases, Ninth Revision (ICD-9) codes, and learn the amounts billed and the amounts paid, all without approval from an institutional review board.
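The kind of question posed above (‘MRI North Carolina 2012’) amounts to a simple aggregation over the public records. The sketch below assumes the records are already available as a list of dictionaries; the field names and all sample values, including the dollar amounts, are invented for illustration.

```python
from collections import Counter

# Hypothetical de-identified records, as a public index might expose them.
public_records = [
    {"exam_type": "MRI", "state": "NC", "year": 2012, "billed": 1200.0, "paid": 800.0},
    {"exam_type": "MRI", "state": "NC", "year": 2012, "billed": 1500.0, "paid": 900.0},
    {"exam_type": "MRI", "state": "VA", "year": 2012, "billed": 1100.0, "paid": 700.0},
    {"exam_type": "CT",  "state": "NC", "year": 2012, "billed": 600.0,  "paid": 400.0},
]

# Count procedures per (exam type, state, year).
counts = Counter((r["exam_type"], r["state"], r["year"]) for r in public_records)
mri_nc_2012 = counts[("MRI", "NC", 2012)]

# Total amount paid for the same slice of records.
paid_total = sum(
    r["paid"] for r in public_records
    if (r["exam_type"], r["state"], r["year"]) == ("MRI", "NC", 2012)
)
print(mri_nc_2012, paid_total)
```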

An Orwellian Risk?

Would such a system create a circumstance where ‘big brother is watching you’?8 Absolutely not. The aggregated public record would, by definition, not contain the identities of individuals and, therefore, could not be used to violate individual privacy. In fact, the proposed system could easily be modified specifically to allow a person to ‘watch big brother’. For example, under the current system, every prescription drug in the US must first undergo testing by the drug’s manufacturer and the data must be submitted to the US Food and Drug Administration (FDA) via a new drug application (NDA).9 Conspiracy theorists might argue that this process could be abused by officials at the FDA who, at least theoretically, have the power to inappropriately approve a drug based on limited supporting data. An aggregated public record would make it a simple matter to legislate a requirement whereby the FDA must publish the Web addresses of all medical records submitted in support of new drugs, thereby allowing direct inspection of the raw data by the individual. Similarly, medical journals could insist that academic investigators publish the Web addresses of the raw data for their research studies so that anyone can inspect them, or health insurance companies could be required to publish the scientific basis for the rates that they charge so that the individual can decide whether they are reasonable. What is proposed is not a risk to privacy; it is the medical equivalent of the Freedom of Information Act.10

Cost Implications

Because the public data would be, by definition, exempt from HIPAA regulations, there would be no need to pay for encryption technologies and no need to pay for firewalls or for password management. Because the data were originally created for the purpose of clinical care, the additional expense associated with its use in research would be US$0.00. Also, because the data would be served over the Web, there would be no need to purchase another hard disk to store copies of the data, no need to purchase an additional computer or another desk, and no need to hire another IT staff member with a fixed salary and fringe benefits. There would be no need to wait for the hospital administrator to submit a budget request, no need for the National Institutes of Health (NIH) study section to assign a priority score to a grant application, and no need for the hospital chief executive officer (CEO) to make a funding decision.

This can all happen if, and only if, the databases that underlie these new nationwide EMRs are designed with these goals in mind a priori.

The Challenge

A nationwide EMR will be created in the US within the next 10 years. Existing federal laws establish what constitutes public and private information. Providing both public and private ‘faces’ on the medical data produced during routine patient care, however, will not be possible unless the system is specifically designed that way. The question, therefore, is not whether electronic medical records and medical research databases can be synonymous. The question is how much future generations will pay if the opportunity that now presents itself is not recognized and acted upon.

Conflict Disclosure

The authors are founders of Heart Imaging Technologies, a company that manufactures computing systems for medical information management and owns a US patent on related methods.


  1. Centers for Medicare and Medicaid Services, Office of the Actuary, National Health Statistics Group; and US Department of Commerce, Bureau of Economic Analysis and Bureau of the Census, 2003/t1.asp
  2. Centers for Medicare and Medicaid Services, Office of the Actuary, National Health Statistics Group; and US Department of Commerce, Bureau of Economic Analysis and Bureau of the Census. 2003/t3.asp
  3. Centers for Medicare and Medicaid Services, Office of the Actuary; Bureau of Labor Statistics (CPI-U, U.S. city average, annual figures).
  4. A 125-Year Picture of the Federal Government's Share of the Economy, 1950 to 2075, A summary from the Congressional Budget Office. No. 1 June 14, 2002; revised July 3, 2002.
  5. Congressional Budget Office based on Bureau of the Census, US Interim Projections by Age, Sex, Race, and Hispanic Origin, Table 2a, Projected Population of the United States, by Age and Sex: 2000 to 2050 (March 2004).
  6. Transforming Health Care: The President's Health Information Technology Plan, infocus/technology/economic_policy200404/chap3.html
  7. Research Repositories, Databases, and the HIPAA Privacy Rule, US Department of Health and Human Services, National Institutes of Health,
  8. Nineteen Eighty-Four, by George Orwell, 1949.
  9. New Drug Application Process, US Food and Drug Administration, Center for Drug Evaluation and Research,
  10. The Freedom of Information Act, United States Code, Section 552.