CPIC’s Biostatisticians: Captains in a sea of “big data”

The world is catching up to what CPIC has known for a long time: the bigger and richer the data, the better. CPIC collects many “big data” sets that are mind-boggling in their scope when it comes to describing cancers, including reams and reams of genetic sequencing data. It would be very easy to be overwhelmed by the sheer quantity of information, but thankfully, we have people who can make sense of it, captains who navigate to the answers within the data. At CPIC those people are called biostatisticians, and the work they do would make most of our heads spin with information overload.

According to David Nelson, Ph.D., principal biostatistician at CPIC, the need for quality data analysis has grown as access to data, and the quantity and complexity of that data, have grown. “There has always been a tremendous volume of data available, but until the arrival of easily accessible websites, it was not in a very usable form. Now we just go online and download the stuff.”

Dr. Nelson is involved in his own research, primarily environmental exposure assessment involving large data sets (of course), but he also helps CPIC’s research scientists design their studies, “making sure that the study design and the analysis plan make sense.” This is particularly important when dealing with long sequences of data.

In fact, CPIC’s research scientists rely on Dr. Nelson and his colleagues for more than that. “Most of the research scientists don’t have time to do their own analysis,” said Clayton Schupp, Ph.D., supervisor of the Surveillance Research department and an epidemiologist with an extensive background in biostatistics. “They are busy writing grants, so they turn that task over to our department.”

Dr. Schupp and his department, composed primarily of staff with master’s degrees in public health, policy or statistics, work specifically with the “big data” of the Greater Bay Area Cancer Registry operated by CPIC and provide statistical analysis to a number of additional research studies. “Many of the analysts also are involved in the arts of interpreting results and writing of studies as well.”

Outside of Dr. Schupp’s department are staff doing biostatistics work for non-surveillance research. One such biostatistician is Alison Canchola. With a master’s degree in biostatistics, Alison works very closely with Pamela Horn-Ross, Ph.D., and Rudy Rull, Ph.D., M.P.H., on the California Teachers Study.

“This is my dream job,” Alison said. “When I left graduate school I took a job with UCSF doing AIDS research, but kept hearing about CPIC and tried to find a way to get in here, but there are rarely openings.” With almost 11 years at CPIC now, Alison clearly found a way.

“I spend a lot of time doing data cleaning and management, getting it ready for analysis, and then doing the analysis,” she said. “I really like working with the data, for me that’s the fun part.”

If it seems that the work of the biostatisticians is a bit solitary, that’s because it is. According to Alison, however, they are working on that as well. “We’re putting together an SAS users group to connect with other people in the office.” (SAS is the statistical software the group uses for analysis.) “This is pretty solitary work requiring a lot of deep thinking.”

© Cancer Prevention Institute of California