I did some investigation: anyone who wants a synthetic v18 XML dataset of about 500,000 records for evaluation or training can email Recinda Sherman at NAACCR (rsherman at naaccr.org) and include the following:
1. An explanation of what the data will be used for, and whether you want record type I or C. The data cannot be shared outside of your stated use.
2. A completed "Data Confidentiality Agreement for Researchers" form, available at this URL: https://www.naaccr.org/irb-information-for-cina/#IRBFORMS
This synthetic dataset is not just junk data. Recinda can explain its characteristics best, but the data values are meaningful with respect to the distribution of values in actual cancer datasets, and the data have been appropriately anonymized.