Stephanie A. Wankowicz

Science and medicine depend on data. Data is why we can state something as a fact or theory rather than an opinion or hypothesis. The more data you have supporting your theory, the more evidence you have, and thus the more confident you can be in recommending a treatment or predicting an outcome. People often talk about evidence-based medicine. That evidence comes from data generated in clinical trials.

While a lot of data has been amassed thus far, there is much more that we do not know in medicine and science. This is especially true in the cancer genetics field, as this field is still new. To obtain the real benefits from precision medicine, we need more data to generate new evidence.

There are over 80 million single-base variants that can occur in DNA. If we include variants that involve more than one base, the possibilities are countless. Because so many different variants can occur in a single cancer, much more data is needed. In addition, there are thousands of different clinical courses that a patient can take, which further increases the amount of data we need.

Increasingly, cancer patients are getting their tumors sequenced as part of their clinical care, but that data is often not deposited into public or research databases. In the world of cancer risk, where patients get tested for BRCA1 or BRCA2, this data has started to be pooled into public databases such as ClinVar. While these databases are still growing, they have provided an incredible resource to geneticists, which helps patients, families, health care providers, and payers. ClinVar grew out of multiple parties donating data, and it continues to grow as more parties contribute, including the insurer Aetna, which requires that results from any clinical test it pays for be donated to ClinVar. In addition, only recently has there been any consensus on how to classify these variants, such as pathogenic, likely pathogenic, etc.

However, there is no central public database in cancer genetics. While there are efforts to share data, such as The Cancer Genome Atlas, the International Cancer Genome Consortium, the Foundation Medicine database, and the Cancer Gene Census by GA4GH, these data sets include limited clinical information. For us to know how changes in a tumor’s DNA determine clinical outcome, we must have both the genetic data and the clinical data. There is no way for patients, payers, or providers to donate data. Nor is there a way to systematically label these patients, although multiple groups are working on this.

According to Article 27 of the 1948 Universal Declaration of Human Rights, every individual has the right to ‘share in scientific advancement and its benefits’. Moreover, healthcare data is a common good. A common good is something that benefits society as a whole, as opposed to a private good, which benefits only individuals or a section of society. Everyone has alterations in their DNA. Everyone will have some illness at some point in their lives. If we pool and use medical and genetic data, it will benefit everyone in our society.

As patients, as a society, as payers, as healthcare providers, we need to move the field to share and pool cancer genetic and clinical data. We need to #FreetheData.

Free the Data!