Forum - XML Standard

Tip #1: Running EDITS efficiently on NAACCR XML structures

    Posted 05-06-2021 01:10 PM
    Edited by Tricia Kulmacz 05-06-2021 01:25 PM
    • Kathleen Beaumont

      Now that NAACCR XML has begun to change how we think about patient/tumor data, where the nesting looks like this:

      <Patient>
      	<Tumor> </Tumor>
      	<Tumor> </Tumor>
      </Patient>

      …it seems like it may be time to mention ways to make the editing process more efficient.

      My first concept requires some assistance from the EDITS metafile, specifically in how individual edits are organized into edit sets. Put simply, the edits should be organized into Patient and Tumor levels. The Patient edit set would contain edits that deal only with data items where parentXmlElement is "NaaccrData" or "PatientData". The Tumor edit set would contain edits where at least one of the referenced data items has parentXmlElement="TumorData".
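      To make the grouping rule concrete, here is a minimal Python sketch. It is only an illustration: the item-to-level table, the edit names, and the partition_edits helper are all hypothetical and are not read from a real metafile or taken from the EDITS API. The level names follow the ones used in this post.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup: data item -> parentXmlElement (level names as used in this post).
ITEM_PARENT: Dict[str, str] = {
    "registryId": "NaaccrData",
    "patientIdNumber": "PatientData",
    "sex": "PatientData",
    "primarySite": "TumorData",
    "histologicTypeIcdO3": "TumorData",
}

# Hypothetical edits: edit name -> data items the edit references.
EDIT_ITEMS: Dict[str, List[str]] = {
    "edit_registry_id": ["registryId"],
    "edit_sex": ["sex"],
    "edit_site_sex": ["primarySite", "sex"],
    "edit_histology": ["histologicTypeIcdO3"],
}


def partition_edits(
    edit_items: Dict[str, List[str]], item_parent: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Split edits into (patient_set, tumor_set).

    An edit goes to the Tumor set if at least one referenced item lives at
    the TumorData level; otherwise it goes to the Patient set.
    """
    patient_set: List[str] = []
    tumor_set: List[str] = []
    for edit, items in edit_items.items():
        if any(item_parent[item] == "TumorData" for item in items):
            tumor_set.append(edit)
        else:
            patient_set.append(edit)
    return patient_set, tumor_set


patient_set, tumor_set = partition_edits(EDIT_ITEMS, ITEM_PARENT)
```

      Note that an edit touching both levels (edit_site_sex here) lands in the Tumor set, because it has to be re-run for every tumor.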

      As explained in the XMLExchange Plus Help file, the Edit Engine still expects a flat data buffer, which the client application can generate dynamically from the NAACCR XML data.
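      As a rough picture of what "generating a flat buffer" could mean, the Python sketch below places item values at fixed column positions. The LAYOUT table (1-based start column and length) and the build_flat_buffer helper are made up for illustration; the real column positions come from the Layout in the metafile, and the XMLExchange Plus documentation remains the authority on how to build the buffer.

```python
from typing import Dict, Tuple

# Hypothetical layout: item id -> (1-based start column, length).
LAYOUT: Dict[str, Tuple[int, int]] = {
    "patientIdNumber": (1, 8),
    "sex": (9, 1),
    "primarySite": (10, 4),
}


def build_flat_buffer(values: Dict[str, str], layout: Dict[str, Tuple[int, int]]) -> str:
    """Place each value at its column position; unset fields stay blank."""
    size = max(start + length - 1 for start, length in layout.values())
    buf = [" "] * size
    for item, value in values.items():
        if item not in layout:
            continue  # item is not part of this layout, so it is skipped
        start, length = layout[item]
        # Truncate or pad so the field always occupies exactly `length` columns.
        field = value[:length].ljust(length)
        buf[start - 1 : start - 1 + length] = field
    return "".join(buf)


full_record = build_flat_buffer(
    {"patientIdNumber": "00000001", "sex": "2", "primarySite": "C509"}, LAYOUT
)
patient_only = build_flat_buffer({"sex": "1"}, LAYOUT)
```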

      Consider the scenario outlined above, where two tumor records are nested within a single patient record. You need to run the Patient-level edits only once for both tumors, but you want to run the Tumor-level edits once for each tumor.

      Pseudo-code for processing would look something like this:

      /* Prepare the flat data buffer according to the selected Layout from the metafile (refer to XMLExchange Plus documentation for details) */

      /* For Patient 1 to Patient <n> */
      {
          /* Load the NaaccrData and PatientData items into the data buffer */
          run Patient edit set

          /* For Tumor 1 to Tumor <m> of this Patient */
          {
              /* Load the TumorData items into the data buffer */
              run Tumor edit set

              /* on to the next Tumor */
          }

          /* on to the next Patient */
      }
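
      For anyone who prefers something runnable, the loop above can be sketched in Python with the standard xml.etree.ElementTree module. Several things here are assumptions for illustration: the sample document is simplified and has no namespace, a plain dict stands in for the flat data buffer, and run_patient_edits / run_tumor_edits are hypothetical callables standing in for whatever call runs an edit set. None of this is the EDITS Engine API.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Tuple

# Simplified, namespace-free sample data (illustrative values only).
SAMPLE = """
<NaaccrData>
  <Item naaccrId="registryId">0000001234</Item>
  <Patient>
    <Item naaccrId="patientIdNumber">00000001</Item>
    <Tumor><Item naaccrId="primarySite">C509</Item></Tumor>
    <Tumor><Item naaccrId="primarySite">C341</Item></Tumor>
  </Patient>
  <Patient>
    <Item naaccrId="patientIdNumber">00000002</Item>
    <Tumor><Item naaccrId="primarySite">C619</Item></Tumor>
  </Patient>
</NaaccrData>
"""

Buffer = Dict[str, str]  # stand-in for the flat data buffer


def items_of(element: ET.Element) -> Buffer:
    """Collect the direct <Item> children of an element as id -> value."""
    return {i.get("naaccrId", ""): (i.text or "") for i in element.findall("Item")}


def process(
    xml_text: str,
    run_patient_edits: Callable[[Buffer], None],
    run_tumor_edits: Callable[[Buffer], None],
) -> None:
    root = ET.fromstring(xml_text)
    root_items = items_of(root)
    for patient in root.findall("Patient"):
        # Load the root-level and patient-level items, then run Patient edits once.
        patient_buffer = {**root_items, **items_of(patient)}
        run_patient_edits(patient_buffer)
        for tumor in patient.findall("Tumor"):
            # Add this tumor's items on top of the patient data, then run Tumor edits.
            run_tumor_edits({**patient_buffer, **items_of(tumor)})


# Record the order of calls instead of running real edits.
calls: List[Tuple[str, str]] = []
process(
    SAMPLE,
    lambda buf: calls.append(("patient", buf["patientIdNumber"])),
    lambda buf: calls.append(("tumor", buf["primarySite"])),
)
```

      With the sample above, the Patient edit set is invoked twice (once per patient) and the Tumor edit set three times (once per tumor).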

      This process yields two advantages, IMO:

      * Edits are not unnecessarily run multiple times on the same Patient data, so…
      * Error reports do not fill up with a lot of noise reporting a Patient-level error more than once

      If this concept is adopted for the current NaaccrData, PatientData, TumorData structure, it should be expandable to future nesting structures (hospital/admission, treatment, etc.) when those come along.

      I hope this makes sense to everyone.

      Kathleen

      Fabian Depry

      In my opinion, this makes total sense!

      In theory, the same concept can be applied to "NaaccrData" edits vs. "Patient" edits, where an edit on the registry ID item (for example) would fail only once for an entire data file. But I completely understand why supporting that would be much more difficult, and there wouldn't be much gain anyway (there are so few data items at that root level).

      Jeff Reed

      My view is similar: each tumor instance needs a check, so if the header patient info fails the edit, processing should skip the tumor data and move on to the next patient. Is that something that is going to be in EDITS50?

      Kathleen Beaumont

      Hi Jeff,

      Yes, this is in EDITS50 (for what it's worth, it was in EDITS40, too). As you'll see when you start working with the EDITS50 API, your application will be notified of edit errors as they occur through the callback mechanism. So the decision about how to handle an edit error is in your control.

      Caveat: The Engine will run all of the edits in an edit set for the currently-loaded case record. If you catch an error upon running the Patient edit set, the Engine will run the remaining edits in that edit set and then return control to your processing loop. You may take whatever action you deem appropriate at that time… bail on that Patient, continue on to find out whether/how many Tumor level errors may exist, whatever.
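
      The control flow described in the caveat might look like the Python sketch below. To be clear about what is assumed: run_edit_set is a hypothetical stand-in that mimics the behavior described here (every edit in the set runs, and each failure is reported through a callback), and the edits themselves are toy checks. The real EDITS50 callback signature is not shown in this thread, so none of these names come from the API.

```python
from typing import Callable, Dict, List, Tuple

Buffer = Dict[str, str]
Edit = Tuple[str, Callable[[Buffer], bool]]  # (edit name, function returning True on pass)


def run_edit_set(edits: List[Edit], buffer: Buffer, on_error: Callable[[str], None]) -> None:
    """Hypothetical engine stand-in: runs ALL edits in the set, reporting each failure."""
    for name, passes in edits:
        if not passes(buffer):
            on_error(name)  # the caller is notified; the remaining edits still run


def process_patient(
    patient: Buffer,
    tumors: List[Buffer],
    patient_edits: List[Edit],
    tumor_edits: List[Edit],
    skip_tumors_on_patient_error: bool = True,
) -> List[str]:
    errors: List[str] = []
    run_edit_set(patient_edits, patient, errors.append)
    # Control is back in our loop: this is where the application decides what to do.
    if errors and skip_tumors_on_patient_error:
        return errors  # bail on this Patient
    for index, tumor in enumerate(tumors, start=1):
        run_edit_set(
            tumor_edits,
            {**patient, **tumor},
            lambda name, i=index: errors.append(f"Tumor {i}: {name}"),
        )
    return errors


# Toy edits: each one just checks that an item is not blank.
patient_edits: List[Edit] = [
    ("patient_id_present", lambda b: b.get("patientIdNumber", "") != ""),
    ("sex_present", lambda b: b.get("sex", "") != ""),
]
tumor_edits: List[Edit] = [
    ("primary_site_present", lambda b: b.get("primarySite", "") != ""),
]

patient = {"patientIdNumber": "", "sex": ""}
tumors = [{"primarySite": ""}, {"primarySite": "C509"}]

stopped = process_patient(patient, tumors, patient_edits, tumor_edits)
full = process_patient(patient, tumors, patient_edits, tumor_edits,
                       skip_tumors_on_patient_error=False)
```

      In the first call both Patient-level failures are reported before the loop bails out, which matches the caveat: the whole edit set runs before your code gets to decide. The second call shows the other choice, where processing continues so that Tumor-level errors are counted too.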