NEW (updated March 4, 2020) Letter of Solicitation for Validation Studies: Using California Breast Cancer Registry Data with Synthetic Census Tracts (2006-2016)

    Posted 04-01-2020 15:28

    The NCI's SEER program has announced an updated letter soliciting applications for validation studies using California breast cancer registry data with synthetic census tracts (2006-2016). Study proposals are due May 31, 2020; an extension may be considered under special circumstances.


    Highlights of the updated SynCan database include 1) a time series of cancer incidence data from 2006 to 2016 and 2) the inclusion of Oncotype DX scores. The NCI also intends to make up to 5 fixed-price awards for this project. More information about this solicitation is below as well as on the NCI website.


    Please contact the NCI SEER Synthetic Data Staff if you have any questions.


    Thank you.


    Mandi Yu, Ph.D.

    Surveillance Research Program

    Division of Cancer Control and Population Sciences

    National Cancer Institute






    NEW (updated March 4, 2020) Solicitation of Study Proposals: Using California Breast Cancer Registry Data with Synthetic Census Tracts


    The National Cancer Institute (NCI)'s Surveillance, Epidemiology, and End Results (SEER) Program collects and releases de-identified data for individual cancer diagnoses and outcomes in the United States. The geographic locations of patients at the time of diagnosis are extremely valuable for studying associations between characteristics of an area and cancer rates, as well as for detecting cancer clusters and monitoring geospatial disparities in cancer burden.


    Census tracts are a very useful geographic unit to work with but are not publicly available. The Synthetic California Breast Cancer Registry Data (SynCan) is a pilot data product that uses statistical models to synthesize census tracts of residence for each breast cancer patient diagnosed in California from 2006 to 2016. A selected set of variables, such as patients' demographics, tumor characteristics, and census tract attributes, were modeled in a manner that potentially changes the census tracts of all patients within the county boundaries while preserving the covariate relationships between the census tracts and the selected variables. The purpose of SynCan is to provide external users with access to census tract data that are not publicly available because of confidentiality concerns. Without synthetic census tracts, much of the research that requires census tracts would be logistically difficult or impossible, as data permission must be obtained separately from each cancer registry and most (if not all) permissions require an IRB review. While most states within SEER catchment areas are supported by one Central Cancer Registry (CCR), the state of California is supported by four CCRs. For straightforward descriptive analyses, SynCan has been shown to produce similar cancer statistics by census tract-based socioeconomic variables. The usefulness of SynCan for complex studies has yet to be established. More details about an earlier version of SynCan, the synthesis methodology, and the utility of SynCan are documented in Yu et al. (2017).


    This announcement solicits proposals for publishable substantive and geospatial methodological studies of breast cancer in California that require census tract-level data and will use SynCan. A secondary purpose is to evaluate the usefulness of synthetic census tract data in supporting real-world studies. The NCI will select a few proposals that represent a wide range of analysis types. The proposals will be judged solely on the feasibility of the proposed analysis. Once a proposal is accepted, the user will be given a SEER*Stat account, through which the data can be accessed and analyzed. SynCan includes all variables that are currently released in the SEER research dataset and several census tract-level ecologic variables, such as median household income, median house value, median rent, percent below poverty, and education level. Users may be allowed to export individual data to be analyzed by any methods deemed appropriate for the study of interest. At the end of the study, the investigators will submit code for all statistical analyses (preferably written in SAS or R) to the NCI. NCI will run the analyses using the actual confidential data and report the results back to the investigators for publication (including all analyses to be reported). NCI will review publications for any confidentiality issues prior to submission. NCI will compile the results across the studies and plans to publish comparisons of SynCan results and validated results in a peer-reviewed publication. This publication will probably be a summary across completed analyses from all investigators. Interested investigators will have the option to be collaborators on this publication.


    The government's intent is to make up to 5 awards for this project. If you are interested in participating, submit a proposal priced per deliverable (please see section 9.0 of the Statement of Work) by 3:00 p.m. ET May 31, 2020. Selection decision notifications will be sent by June 1, 2020 with awards made by June 30, 2020. Please see the Statement of Work (PDF, 236 KB) for additional details.


    Under this agreement, investigators are required to submit analysis plans and table shells of analytic results to the NCI for approval by August 31, 2020 or two months after the award (whichever is later), and to submit code for approved analyses to the NCI no later than December 31, 2020 or six months after the award (whichever is later). NCI agrees not to submit its validation study until all selected investigators have had the opportunity to publish their results independently, although no later than December 31, 2021 or 18 months after the award (whichever is later). Investigators are also expected to sign the Data Use Agreement (PDF, 81 KB) before the validation study can proceed. Please use the attached application form (PDF, 111 KB) to prepare the proposal.
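
As a rough illustration of the "whichever is later" rule above, the sketch below computes the three milestone dates for a given award date. The function and key names (`add_months`, `milestones`, `analysis_plan`, `analysis_code`, `nci_hold_until`) are hypothetical, and treating "N months after the award" as a calendar-month shift clamped to month end is an assumption; the Statement of Work governs the actual dates.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift d by whole calendar months, clamping the day to the month's end."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


# (fixed date, months after award) for each milestone named in the announcement.
# The key names are illustrative, not official terms.
RULES = {
    "analysis_plan": (date(2020, 8, 31), 2),
    "analysis_code": (date(2020, 12, 31), 6),
    "nci_hold_until": (date(2021, 12, 31), 18),
}


def milestones(award_date):
    """Apply 'fixed date or N months after the award, whichever is later'."""
    return {
        name: max(fixed, add_months(award_date, months))
        for name, (fixed, months) in RULES.items()
    }


if __name__ == "__main__":
    for name, due in milestones(date(2020, 7, 15)).items():
        print(name, due.isoformat())
```

Under these assumptions, an award made on or before June 30, 2020 leaves the fixed calendar dates in force, while a later award pushes each date out accordingly.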


    To submit an application or send inquiries, please contact:


    NCI SEER Synthetic Data Staff
    Surveillance Research Program
    Division of Cancer Control and Population Sciences
    National Cancer Institute
    National Institutes of Health



    Yu M, Reiter JP, Zhu L, Liu B, Cronin KA, Feuer EJR. Protecting Confidentiality in Cancer Registry Data With Geographic Identifiers. Am J Epidemiol 2017 Jul 1;186(1):83-91.


