I want to make the case in writing that a different approach is needed to get data out of a central registry database into analytical tools like SAS, GenEdits, InterRecordEdits, SEER*PATH, Match*PRO, etc.
The NAACCR Volume 2 has the title 'Data Standards and Data Dictionary.' Then the next piece is called the 'XML Data Exchange Standard.' The primary goal of the data exchange standard is to ensure seamless transmission between registries be it a hospital registry or a central registry. Nowhere is it written that the XML Data Exchange Record has to be read by any of the analytical tools. Almost all of our analytical tools do not read or cannot read XML documents very well.
Plan B: A secondary standard is needed that allows for a pipe-delimited formatted ASCII file to be exported from a central registry database to be input into an analytical tool. The two models for this are SEER*STAT and MATCH*PRO.
The primary assumption I am making is that right now a pipe ('|') is not contained in any names, addresses, or coding schemes collected by a registry or imported into a registry software system. If that assumption is violated, then we need to find another delimiter.
I would like to see developed analytical file formats that consist of selected data items needed for normal work. For instance, prior to calls for data, a list of data items would be developed along with the order that could be brought into GenEdits and InterRecordEdits. That file format would be installed in the registry vendor software to output the subsets of necessary cases.
In New Hampshire, because we are so small, we would seek all cases 1995-2017 that meet the required criteria. The output file would contain approximately 136 reportable data items along with a few confidential data items to facilitate editing of cases. The resulting file would be smaller than an XML file, faster to output and faster to read in the analytical software.
I would strongly prefer that the header use NAACCR Item numbers as the variables names (N18_20, N18_390, N18_400, etc.) to make manipulation easier.
Rarely does a central registry need to output the entire NAACCR record. It would be necessary for inter-state data exchange, for archive purposes, and for transmission to some authorities.
Other pipe delimited file formats could be used for submission to the NAACCR Geocoder. Match*PRO, etc. SAS PROC IMPORT can easily read into a pipe-delimited file with a header.
The use of analytical pipe-delimited does not diminish the value of the XML Standard for Data Exchange. At a certain size, a delimited file becomes unwieldy and cumbersome.