Forum - XML Standard

Discussion: The case for preserving naaccrNum

    Posted 05-06-2021 03:04 PM
    Kathleen Beaumont

    Note: I argued with all my strength in the NAACCR XML Work Group about the choice to eliminate the data item number (naaccrNum) from NAACCR XML data files, and lost. Now the news comes in over my transom that a campaign is afoot to eliminate the item number from the entire XML specification and from Volume II. I am going to make one last appeal here to the community at large not to let that happen.

    The EDITS tools perform data validation by referring to NAACCR Volume II and custom data items through their unique number identifiers. There are two leading reasons why this design was implemented and remains in practice:

    * Volume II has historically made changes to data item names. The numbers are immutable.
    * Computers prefer numbers to strings.

    The argument was made to me that the naaccrId is now the "immutable" identifier. By design, it repackages the data item name so that a casual reader of a data file can tell what each item says about a patient's cancer. The naaccrId is already out of sync with the (modified) data item name in a number of cases, but it mostly suggests what the item is. (Does countyAtDxGeocode1990 mean the same thing as countyAtDxGeocode19708090? The latter represents how the NAACCR XML specification's algorithm would translate the current name of data item 94.)

    But more important to any programmer who has to listen to somebody complain that his software doesn't run fast enough is that second bullet. When you use a string as an identifier, your processing includes running your language's equivalent of string.compare() to determine whether your search for "phase1RadiationExternalBeamPlanningTech" is found in the XML record (it has to evaluate every byte of that string to be sure). And it has to run this process every time you want that item.

    Or you could simply ask it to find data item 1502. Pop quiz: Which look-up happens faster?
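    To make the pop quiz concrete, here is a minimal Python sketch, assuming a naive linear scan over a record's items. The record contents and the helper names (`char_comparisons`, `find_by_id`, `find_by_num`) are hypothetical, and the sketch counts conceptual comparisons rather than timing anything: real XML parsers and hash-based lookups behave differently, and a non-matching string can usually be rejected at its first differing character. The full-length cost is paid when the string actually matches.

```python
# Hypothetical record: (naaccrId, naaccrNum) pairs in file order.
RECORD = [
    ("sex", 220),
    ("dateOfDiagnosis", 390),
    ("primarySite", 400),
    ("phase1RadiationExternalBeamPlanningTech", 1502),
]

def char_comparisons(a: str, b: str) -> int:
    """Characters a naive left-to-right equality check examines before deciding."""
    count = 0
    for ca, cb in zip(a, b):
        count += 1
        if ca != cb:
            return count  # early exit on the first mismatch
    return count  # every shared character had to be checked

def find_by_id(record, target_id: str) -> int:
    """Scan by string ID; return the total characters examined."""
    examined = 0
    for item_id, _ in record:
        examined += char_comparisons(item_id, target_id)
        if item_id == target_id:
            break
    return examined

def find_by_num(record, target_num: int) -> int:
    """Scan by item number; return the number of integer comparisons."""
    examined = 0
    for _, item_num in record:
        examined += 1  # one integer comparison per item
        if item_num == target_num:
            break
    return examined

print(find_by_id(RECORD, "phase1RadiationExternalBeamPlanningTech"))  # 43
print(find_by_num(RECORD, 1502))  # 4
```

    In this toy record the string search examines 43 characters to reach item 1502, while the numeric search makes 4 integer comparisons.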

    I am unpersuaded of the value of naaccrId at all, but can accept it as a property of the specification since it seems to be fundamental to at least one vendor's implementation. But let's not rush to throw out the naaccrNum. All that accomplishes is diminished performance for high-volume batch processing such as EDITS.

    Kathleen

    Bruce Riddle

    I agree that the NAACCR number is very important to operations. Almost all my code uses the NAACCR number to refer to variables. The number gives me an exact name. I cannot imagine writing code day to day with the longer names in the XML specification. It is just a great deal of typing, with many chances to make errors.

    Bruce

    Fabian Depry

    To the best of my knowledge, there is no plan (short term or long term) to eliminate the NAACCR numbers from the NAACCR XML specifications.

    It is true that the numbers are optional in the data files, but they are required in the dictionaries, meaning it is not possible to define a data item without defining a unique number for it.

    The reasoning for introducing unique NAACCR XML IDs was that they are human readable, they introduce fewer potential conflicts (if an organization adds a proper prefix to its own data items, it basically eliminates any possible conflict with other data item IDs), and they are valid variable names in most programming languages, allowing some developers to use them as-is in their programs (it wouldn't make sense to call a variable 400, but it can be called primarySite).
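    A small Python sketch of the "valid variable name" point, assuming Python's own identifier rules stand in for "most languages". The `myRegistry` prefix and the `custom_item_id` helper are hypothetical, not part of the NAACCR XML specification; they only illustrate how an organization-specific prefix keeps a custom ID from colliding with a standard one.

```python
import keyword

# A few standard IDs mentioned in this thread.
STANDARD_IDS = {"primarySite", "phase1RadiationExternalBeamPlanningTech"}

def usable_as_variable(name: str) -> bool:
    """True if `name` can be used as-is as a Python variable name."""
    return name.isidentifier() and not keyword.iskeyword(name)

def custom_item_id(org_prefix: str, name: str) -> str:
    """Hypothetical convention: prepend an organization prefix, camelCase style."""
    return org_prefix + name[0].upper() + name[1:]

print(usable_as_variable("primarySite"))  # True
print(usable_as_variable("400"))          # False

custom = custom_item_id("myRegistry", "primarySite")
print(custom)                  # myRegistryPrimarySite
print(custom in STANDARD_IDS)  # False
```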

    The reasoning for making the numbers optional in the data files was that the IDs uniquely identify the data items, so software technically shouldn't need the numbers to consume a data file (if software prefers to work with the numbers, the dictionary provides a one-to-one mapping between the IDs and the numbers).
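    A minimal sketch of that one-to-one mapping using Python's standard `xml.etree.ElementTree`. The XML below is deliberately simplified and hypothetical: real NAACCR XML dictionaries and data files declare an XML namespace and carry more attributes, and the data values are made up. The loader rejects entries without a number or with a duplicate ID or number, mirroring the requirement that every dictionary entry define a unique number.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical dictionary fragment (namespace and most attributes omitted).
DICTIONARY_XML = """
<NaaccrDictionary>
  <ItemDefs>
    <ItemDef naaccrId="sex" naaccrNum="220"/>
    <ItemDef naaccrId="dateOfDiagnosis" naaccrNum="390"/>
    <ItemDef naaccrId="primarySite" naaccrNum="400"/>
    <ItemDef naaccrId="phase1RadiationExternalBeamPlanningTech" naaccrNum="1502"/>
  </ItemDefs>
</NaaccrDictionary>
"""

# Simplified, hypothetical data file whose items carry only naaccrId.
DATA_XML = """
<NaaccrData>
  <Patient>
    <Item naaccrId="sex">2</Item>
    <Tumor>
      <Item naaccrId="primarySite">C509</Item>
    </Tumor>
  </Patient>
</NaaccrData>
"""

def load_mappings(dictionary_xml: str):
    """Build ID->number and number->ID maps, enforcing a unique number per item."""
    root = ET.fromstring(dictionary_xml.strip())
    id_to_num, num_to_id = {}, {}
    for item_def in root.iter("ItemDef"):
        item_id = item_def.get("naaccrId")
        num_text = item_def.get("naaccrNum")
        if num_text is None:
            raise ValueError(f"{item_id}: dictionary entry has no naaccrNum")
        num = int(num_text)
        if item_id in id_to_num or num in num_to_id:
            raise ValueError(f"duplicate ID or number: {item_id}/{num}")
        id_to_num[item_id] = num
        num_to_id[num] = item_id
    return id_to_num, num_to_id

def values_by_number(data_xml: str, id_to_num: dict) -> dict:
    """Key every item value in an ID-only data file by its NAACCR number."""
    root = ET.fromstring(data_xml.strip())
    # Unknown IDs raise KeyError; a real reader would report them instead.
    return {id_to_num[item.get("naaccrId")]: item.text for item in root.iter("Item")}

id_to_num, num_to_id = load_mappings(DICTIONARY_XML)
print(values_by_number(DATA_XML, id_to_num))  # {220: '2', 400: 'C509'}
print(num_to_id[1502])  # phase1RadiationExternalBeamPlanningTech
```

    A program that prefers numbers can read an ID-only file and key its values by number, as `values_by_number` does; a program that prefers IDs simply skips the lookup.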

    In the end, some people and organizations prefer to work with numbers the way they always have, while others have embraced the new IDs and stopped using the numbers completely. Both approaches are perfectly fine.