Electronic Medical Records and Medical Research Databases—Can They Be Synonymous?

Robert M Judd; Raymond J Kim

It is well known that major efforts are currently under way in the US, UK, and other countries to construct entirely new systems for the management of electronic medical records. The thesis of this article is that, in principle, it is possible to design these systems in such a manner that they serve not only as an information resource for routine patient care but simultaneously as the primary backbone for medical research at little or no additional expense. This article presents a conceptual framework for an electronic system in which every medical record would be immediately available for research, thereby providing critical feedback on the efficacy and cost of medical procedures. Furthermore, the proposed system is designed in such a way that medical data are made available without the need for complicated and expensive security precautions or the need to obtain permission from multiple parties to utilize the medical data. The time window to adopt such a system, however, is limited. Once the nationwide system has been constructed, it will be impractical to retrospectively integrate this fundamental feature.

Defining the Problem

What single item cost US$1.5 trillion or 14.9% of the gross domestic product (GDP) of the US in 2002? For what service does every man, woman, and child pay over US$5,000 each year? What has grown faster than the rate of inflation for over 30 years? What does the Congressional Budget Office estimate will cost the US government three times more in 2050 than it did in 2000, even when calculated as a percentage of an increasing GDP? The answer to each of these questions is healthcare.

The problem of escalating costs for medical care has been recognized for years. However, despite several large-scale efforts to address the problem, including the Clinton administration's historic initiative in the mid 1990s, medical care costs continue to grow unabated. Exacerbating this issue is a population demographic for which the number of elderly people in the US is expected to double between 2000 and 2030. While many aspects of this issue may be debated, it is a mathematical certainty that current rates of growth in healthcare spending cannot and will not be sustained. Arresting the rate of growth in healthcare costs can be achieved in one of two ways:

reduction in the quality and/or quantity of healthcare; or
improvements in efficiency.

The latter is more attractive, and few would deny there is plenty of room for improvement. One has only to attempt to have their medical records transferred from one medical care facility to another to recognize that there is more computerized technology at a supermarket checkout counter, an online banking system, or an airline flight reservation system than there is for communicating the most basic components of medical information. Indeed, this issue has received so much attention that President Bush has declared that the US should ensure that medical records are available electronically for most patients within the next 10 years.

In stark contrast to the attention paid to medical records in support of patient care, however, medical research databases are generally dismissed as an entirely separate subject. Yet only through systematic analysis of the data generated by routine medical care can the real value of new and existing diagnostic tests and treatment options be objectively measured.

Creating an electronic medical record system that not only serves the patient but provides feedback on the pulse of the healthcare system itself should not be the focus of academic research alone. Once created, such a system would be expected to provide a continual return on investment by providing information essential to recognizing what is working, what is not working, and how things might be improved. Is it possible to create an electronic medical record (EMR) that simultaneously serves as a research database? How much more would it cost?

Database Design

The possibility that a nationwide EMR might simultaneously serve as a medical research database relates directly to the underlying system design. The heart of any information system is a database, and perhaps the most common form is the 'relational database'. Indeed, relational databases underlie virtually every banking, retail, and airline reservation system in the world. Conceptually, relational databases are little more than two-dimensional arrays organized into columns, such as 'patient name', and rows such as 'Doe, John' and 'Smith, Robert'. These name-value pairs can be easily manipulated to form, for example, a diagnostic medical report. Because relational databases are widely used in other fields, such as banking and retail sales, the path of least resistance will be to create a nationwide EMR based on relational databases.
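As a minimal sketch of this idea, the Python snippet below holds one row as a set of name-value pairs and formats it into a short report. Only 'patient name' and 'Doe, John' come from the text above; the other column names and values, and the function name `format_report`, are hypothetical and chosen purely for illustration.

```python
# One row of a hypothetical table, held as name-value pairs.
# Only 'patient name' and 'Doe, John' come from the text; the rest is made up.
row = {
    "patient name": "Doe, John",
    "test": "cardiac MRI",
    "ejection fraction (%)": 55,
}


def format_report(pairs):
    """Render name-value pairs as one 'name: value' line each."""
    return "\n".join(f"{name}: {value}" for name, value in pairs.items())


report = format_report(row)
print(report)
```

A real report generator would of course involve layout, units, and validation; the point here is only that column names and row values pair up naturally.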

Organizing medical data into name-value pairs, however, has fundamental disadvantages. For example, in banking systems, the number of unique columns such as 'debit' and 'credit' is relatively small and, perhaps more importantly, there is a limited need to create new columns in the future. Medical information, conversely, often involves large numbers of terms unique to each medical test, and new tests are continuously developed that require even more unique 'name-value' pairs. Accordingly, the cost of a medical record system based on one or more relational databases might ultimately prove surprisingly expensive. In this setting, one might even consider rejecting the relational database model altogether in favor of an 'object-oriented database', i.e. one in which electronic documents such as medical reports remain unstructured and the EMR is designed solely to standardize the document exchange mechanism, much like the office fax machine standardizes the exchange mechanism for the free-formatted paper documents in use today. In the design of medical record systems, discussions of such issues can easily consume collective energies.

There is, however, a more fundamental issue. The number of database columns and rows, the type of organizational unit, and even whether or not to use the traditional relational database model at all are relatively minor concerns. The main issue is how to store the patient's identity.

For routine clinical care the identity of the patient is of obvious importance. Associating a magnetic resonance imaging (MRI) scan showing a malignant tumor with the wrong patient, for example, is among the most dangerous of errors that can be made in medicine. For the purpose of patient care alone, therefore, there is considerable incentive to tightly integrate the patient's identity with the rest of the medical record.

For medical research, conversely, the patient's identity is theoretically irrelevant. The only reason a researcher might need to know the identity of the patient is to check for related records, i.e. did the patient with this MRI scan also have a cardiac catheterization? If one could design a system for which medical records could be associated with an individual and yet not require that the true identity of that individual be known, the patient's name would never be needed. The single factor that separates information needed for routine patient care from information needed for medical research, therefore, is the identity of the patient. In all other respects, the needs for information are identical.
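One way such linkage might work, sketched here under stated assumptions, is to give each patient a random code that carries no information about the person. Records that share a code can then be related without the name ever appearing. The class `PseudonymRegistry`, the field names, and the use of random UUIDs are all hypothetical choices for this illustration and are not taken from the article. Whether a given coding scheme satisfies the applicable de-identification rules is a separate question that this sketch does not settle.

```python
import uuid


class PseudonymRegistry:
    """Maps a real identity to a random research ID (hypothetical design).

    The mapping itself would live on the private side of the system; only
    the random ID would accompany a record made available for research.
    """

    def __init__(self):
        self._ids = {}  # real identity -> random research ID

    def research_id(self, identity):
        # Issue a fresh random ID on first sight and reuse it afterwards.
        if identity not in self._ids:
            self._ids[identity] = uuid.uuid4().hex
        return self._ids[identity]


registry = PseudonymRegistry()

# Two records for one patient and one record for a different patient.
records = [
    {"research_id": registry.research_id("Doe, John"),
     "procedure": "MRI scan"},
    {"research_id": registry.research_id("Doe, John"),
     "procedure": "cardiac catheterization"},
    {"research_id": registry.research_id("Smith, Robert"),
     "procedure": "MRI scan"},
]


def procedures_for(research_id):
    """Return every procedure linked to one research ID."""
    return [r["procedure"] for r in records if r["research_id"] == research_id]


# Did the patient with this MRI scan also have a cardiac catheterization?
mri_patient = records[0]["research_id"]
print(procedures_for(mri_patient))
```

The researcher's question is answered from the codes alone; the lookup from name to code is never consulted on the research side.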

The key to designing a nationwide electronic medical record system that simultaneously serves as a research database, therefore, lies in designing the system in such a way that two 'views' into the underlying data are provided – a public view and a private view.

In such a system only one physical copy of the data, stored on a single computer hard disk, would be all that is needed even when the data are to be used for multiple purposes. The first step toward achieving such a system design, therefore, is to very carefully consider the issue of patient privacy.

The Role of the Health Insurance Portability and Accountability Act

The Health Insurance Portability and Accountability Act (HIPAA) specifies exactly what does and does not constitute protected health information (PHI). As shown in Table 1, there are 18 identifiers that, when fully removed, render a health record officially 'de-identified'. Data that are stripped of these 18 identifiers are regarded as de-identified, unless the covered entity has actual knowledge that it would be possible to use the remaining information alone or in combination with other information to identify the subject. Removing these 18 identifiers is referred to as the 'safe-harbor' method of de-identification; other methods are available but are subject to restrictions.
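The mechanics of stripping identifiers can be sketched as a simple filter over a record. The field names below are hypothetical stand-ins for a handful of identifier categories (name, telephone number, e-mail address, medical record number) and are deliberately not the full list of 18. Dates are reduced to the year, on the understanding that the safe-harbor method treats date elements other than the year as identifiers. This is an illustration only, not a compliant implementation.

```python
# Hypothetical field names standing in for a few identifier categories.
# This is NOT the full list of 18 identifiers.
IDENTIFIER_FIELDS = {"name", "telephone", "email", "medical_record_number"}
DATE_FIELDS = {"scan_date"}  # hypothetical; assumed ISO 'YYYY-MM-DD' strings


def strip_identifiers(record):
    """Return a copy of record without identifier fields, dates cut to the year."""
    cleaned = {}
    for field, value in record.items():
        if field in IDENTIFIER_FIELDS:
            continue  # drop the identifier entirely
        if field in DATE_FIELDS:
            # Keep only the four-digit year, under a renamed field.
            cleaned[field.replace("_date", "_year")] = value[:4]
        else:
            cleaned[field] = value
    return cleaned


record = {
    "name": "Doe, John",
    "telephone": "555-0100",
    "medical_record_number": "12345",
    "scan_date": "2004-03-17",
    "procedure": "MRI scan",
    "finding": "no abnormality seen",
}

cleaned = strip_identifiers(record)
print(cleaned)
```

Free-text fields such as report narratives may themselves contain names or dates, so a field-level filter like this one would not be sufficient on its own.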

The importance of the safe-harbor de-identification method is difficult to overemphasize. Consider what the Department of Health and Human Services has to say about health records that have been de-identified via the safe-harbor method:

"The Privacy Rule [HIPAA] permits covered entities [e.g. hospitals] to release data that have been de-identified without obtaining an Authorization [e.g. the patient's permission] and without further restrictions upon use or disclosure because de-identified data is not PHI [protected health information] and, therefore, not subject to the Privacy Rule."

This statement underscores the fact that once medical records have been sufficiently de-identified they are no longer subject to federal regulations. For de-identified data, therefore, there is no need to ask the patient for their permission to utilize their data, no need to protect the computer that serves the data by housing it inside a computer network firewall, no need for user accounts or passwords, no need to encrypt the data during transmission. In short, the position of the federal government is that once medical records are de-identified the data can be used by anyone for any purpose.

If and Only If

The key to designing an EMR for which all medical records are available for research, therefore, is to ensure that the data are stored in two distinct logical units:

private; and
public.

Importantly, this design criterion does not imply that a specific underlying database structure must be used. For example, Table 1 shows how a traditional 'relational database' could be organized into private and public 'views' that effectively filter out all HIPAA-protected information from public access. Table 2 shows how the same concept might be applied to an 'object-oriented database'; both private and public views are feasible.
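For the relational case, the idea of two views over one stored copy can be sketched with Python's built-in `sqlite3` module. The table layout, column names, and sample rows are hypothetical and do not reproduce the article's tables; the `research_id` column stands for an arbitrary code that carries no personal information.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One physical table holds both identifying and non-identifying columns.
conn.execute(
    "CREATE TABLE records ("
    " research_id TEXT, patient_name TEXT, telephone TEXT,"
    " procedure_name TEXT, finding TEXT)"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?, ?)",
    [
        ("a1", "Doe, John", "555-0100", "MRI scan", "normal"),
        ("b2", "Smith, Robert", "555-0101", "MRI scan", "abnormal"),
    ],
)

# Private view: every column, as needed for patient care.
conn.execute("CREATE VIEW private_view AS SELECT * FROM records")

# Public view: the same rows with the identifying columns left out.
conn.execute(
    "CREATE VIEW public_view AS"
    " SELECT research_id, procedure_name, finding FROM records"
)

cursor = conn.execute("SELECT * FROM public_view ORDER BY research_id")
public_columns = [d[0] for d in cursor.description]
public_rows = cursor.fetchall()
print(public_columns)
print(public_rows)
```

Both views read the same underlying rows, so nothing is duplicated. A view only shapes what a query returns; actually restricting who may query which view would depend on the access controls of the database system in use, which SQLite itself does not provide.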