Forum - XML Standard

  Editor Tool for XML files

    Posted 05-06-2021 02:54 PM
    Bruce Riddle

    I've done a number of exercises with XML this spring, and they have not gone well. We will need a number of very powerful and robust tools to make this work.

    a) A NAACCR XML editor that would read a file and display it as a continuous flow, patient by patient, with all of the patient and tumor information labeled. In the tool's configuration, you could choose which data items are displayed. Within the tool, you should then be able to:
       a. Make mass data changes on items like hospital number, date case report received (#2111), date case completed (#2090), or, in theory, any data item in a received file. We receive files without facility codes and we need to add them.
       b. Edit any patient or tumor record to make corrections or additions prior to loading into the database.
    b) A tool to separate rapid reports from definitive reports in a transmission prior to loading. By New Hampshire law, we receive two kinds of reports from reporters: a 45-day rapid report and a 180-day definitive report. We need to separate those out before loading into a database.
    c) A tool that will let us take non-standard reports from clinics, path labs, and smaller hospitals and make an XML file to load into a registry database.
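    As a rough illustration of the mass-change requirement in a), here is a minimal Python sketch (standard library only) that fills in a missing Tumor-level item, such as a facility code, on every tumor in a gzipped NAACCR XML file. It assumes the usual NAACCR XML namespace and Patient/Tumor/Item layout; the item name reportingFacility (#540) and its placement at the Tumor level are assumptions to verify against your NAACCR XML dictionary. Because it loads the whole file with ElementTree.parse, it suits modest files better than very large ones.

```python
import gzip
import xml.etree.ElementTree as ET

# Default namespace of NAACCR XML documents (assumed; check your files' root element).
NS = "http://naaccr.org/naaccrxml"
ET.register_namespace("", NS)  # write the namespace back out without a prefix


def fill_missing_tumor_item(root, naaccr_id, naaccr_num, value):
    """Add or fill a Tumor-level Item wherever it is missing or blank.

    Returns the number of Tumor elements that were changed.
    """
    changed = 0
    for tumor in root.iter(f"{{{NS}}}Tumor"):
        item = None
        for candidate in tumor.findall(f"{{{NS}}}Item"):
            if candidate.get("naaccrId") == naaccr_id:
                item = candidate
                break
        if item is None:
            # No Item for this data item yet: append a new one to the Tumor.
            item = ET.SubElement(
                tumor, f"{{{NS}}}Item",
                {"naaccrId": naaccr_id, "naaccrNum": str(naaccr_num)},
            )
        if not (item.text or "").strip():
            item.text = value
            changed += 1
    return changed


def update_file(src_path, dst_path, naaccr_id, naaccr_num, value):
    """Read a gzipped NAACCR XML file, fill the item, and write a gzipped copy."""
    with gzip.open(src_path, "rb") as f:
        tree = ET.parse(f)
    changed = fill_missing_tumor_item(tree.getroot(), naaccr_id, naaccr_num, value)
    with gzip.open(dst_path, "wb") as f:
        tree.write(f, encoding="utf-8", xml_declaration=True)
    return changed
```

    For example, update_file("in.xml.gz", "out.xml.gz", "reportingFacility", 540, "0000012345") would return how many tumors it filled in; the file names and facility code there are placeholders. Existing non-blank values are left alone, so a wrong code would need a separate overwrite step.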

    As I have written before, many smaller registries rely on SAS to do the above.

    Fabian Depry

    Hello Bruce,

    Those are very specific requirements, and you might not be able to resolve all of them with a single tool.

    I know the SEER Data Viewer (https://seer.cancer.gov/tools/dataviewer/) can be used to filter data, recode variables, and re-create data files. The current version only supports the fixed-columns format, but it will support XML in the near future, and you will then be able to apply the same processing to XML data. I am not sure that tool will be able to handle all of those requirements, but it might be worth investigating it now to see how well it fits your processes.

    It is possible that somebody will come up with a way to read and write large XML data files with SAS, but so far those attempts have not been very successful, so looking into other available tools might be a good idea at this point.

    Bruce Riddle

    As I said in the beginning, we will need "a number of powerful and robust tools." Many smaller registries do not have IT staff, so they have not really thought about XML and the impact it will have on registry operations. Without many such tools, the move to XML will be very difficult.

    Isaac Hands

    Bruce, thank you for the specific requirements you outlined above. I am curious: what database are you using at your registry? In b) and c) of your original post, you wrote about the need to load XML data into a registry database. Can you tell me what database technology you are using?

    Bruce Riddle

    We use RMCDS as the registry database. NH Rules and Regs require reporters to send us a rapid report within 45 days of diagnosis. Almost all reporter transmissions contain a mix of rapid and definitive (complete) reports. I use SAS to separate the rapids from the definitives. In that step, I can also correct missing or incomplete data.
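    The thread does not spell out how the SAS step tells a rapid report from a definitive one, so the Python sketch below assumes a hypothetical rule: a tumor whose date case report received (#2111) falls within 45 days of its date of diagnosis (#390) is treated as rapid, later ones as definitive, and tumors with partial or missing dates go to a review pile. The naaccrId values dateOfDiagnosis and dateCaseReportReceived are assumptions to check against your dictionary, and classify_tumor is the place to substitute the registry's actual rule.

```python
import copy
import xml.etree.ElementTree as ET
from datetime import date

NS = "http://naaccr.org/naaccrxml"
ET.register_namespace("", NS)

RAPID_MAX_DAYS = 45  # NH rapid-report window mentioned in the thread
KINDS = ("rapid", "definitive", "review")


def item_value(element, naaccr_id):
    """Return the stripped text of the direct child Item with this naaccrId, or ''."""
    for item in element.findall(f"{{{NS}}}Item"):
        if item.get("naaccrId") == naaccr_id:
            return (item.text or "").strip()
    return ""


def parse_naaccr_date(text):
    """Parse a full YYYYMMDD date; partial, blank, or invalid dates return None."""
    if len(text) != 8 or not text.isdigit():
        return None
    try:
        return date(int(text[:4]), int(text[4:6]), int(text[6:8]))
    except ValueError:
        return None


def classify_tumor(tumor):
    """Hypothetical rule: report received within 45 days of diagnosis -> 'rapid'."""
    dx = parse_naaccr_date(item_value(tumor, "dateOfDiagnosis"))
    received = parse_naaccr_date(item_value(tumor, "dateCaseReportReceived"))
    if dx is None or received is None:
        return "review"
    return "rapid" if (received - dx).days <= RAPID_MAX_DAYS else "definitive"


def split_by_report_type(root):
    """Split one NaaccrData tree into separate trees keyed by report type."""
    outputs = {}
    for kind in KINDS:
        out = ET.Element(root.tag, dict(root.attrib))
        # Keep file-level Items (e.g. registry information) in every output.
        for item in root.findall(f"{{{NS}}}Item"):
            out.append(copy.deepcopy(item))
        outputs[kind] = out
    for patient in root.findall(f"{{{NS}}}Patient"):
        copies = {}  # report type -> this patient's copy in that output
        for tumor in patient.findall(f"{{{NS}}}Tumor"):
            kind = classify_tumor(tumor)
            if kind not in copies:
                # New Patient element carrying only the patient-level Items.
                new_patient = ET.SubElement(outputs[kind], patient.tag, dict(patient.attrib))
                for item in patient.findall(f"{{{NS}}}Item"):
                    new_patient.append(copy.deepcopy(item))
                copies[kind] = new_patient
            copies[kind].append(copy.deepcopy(tumor))
    return outputs
```

    Each output tree can then be saved with ET.ElementTree(out).write(...). A patient with both rapid and definitive tumors appears in both outputs, and patients with no tumors are dropped.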

    RMCDS only lets us load NAACCR records. I use SAS to take reports from non-hospital reporters (pathology cases, death-clearance-only records, clinic records) and create a NAACCR record to load into RMCDS.
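    To illustrate that normalization step, here is a minimal Python sketch that turns a CSV export from a non-hospital reporter into a NAACCR XML document, with one Patient and one Tumor per row. The CSV column names, the MM/DD/YYYY input date format, and the mapping in COLUMN_MAP are hypothetical; the naaccrId values and the NaaccrData attributes passed in (record type, dictionary URI, specification version) should be checked against the NAACCR XML specification and your loader. It writes only the naaccrId attribute on each Item.

```python
import csv
import io
import xml.etree.ElementTree as ET
from datetime import datetime

NS = "http://naaccr.org/naaccrxml"
ET.register_namespace("", NS)

# Hypothetical layout of a clinic/path-lab CSV: column -> (naaccrId, level).
COLUMN_MAP = {
    "patient_id": ("patientIdNumber", "Patient"),
    "dx_date": ("dateOfDiagnosis", "Tumor"),
    "site": ("primarySite", "Tumor"),
}
DATE_COLUMNS = {"dx_date"}  # assumed to arrive as MM/DD/YYYY


def to_naaccr_date(text):
    """Convert MM/DD/YYYY to the YYYYMMDD form used by NAACCR date items."""
    return datetime.strptime(text, "%m/%d/%Y").strftime("%Y%m%d")


def add_items(parent, row, level):
    """Append an Item to parent for each non-blank mapped column at this level."""
    for column, (naaccr_id, item_level) in COLUMN_MAP.items():
        value = (row.get(column) or "").strip()
        if item_level != level or not value:
            continue
        if column in DATE_COLUMNS:
            value = to_naaccr_date(value)
        ET.SubElement(parent, f"{{{NS}}}Item", {"naaccrId": naaccr_id}).text = value


def csv_to_naaccr_xml(csv_text, record_type, base_dictionary_uri, spec_version):
    """Build a NaaccrData tree with one Patient and one Tumor per CSV row."""
    root = ET.Element(f"{{{NS}}}NaaccrData", {
        "baseDictionaryUri": base_dictionary_uri,
        "recordType": record_type,
        "specificationVersion": spec_version,
    })
    for row in csv.DictReader(io.StringIO(csv_text)):
        patient = ET.SubElement(root, f"{{{NS}}}Patient")
        add_items(patient, row, "Patient")  # patient-level Items come before Tumors
        tumor = ET.SubElement(patient, f"{{{NS}}}Tumor")
        add_items(tumor, row, "Tumor")
    return root
```

    This does no record linkage: two rows for the same person become two Patient elements, so matching against existing registry patients would still have to happen before or during loading.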

    Bruce

    Isaac Hands

    OK, so the rapid reports are NAACCR records with a lot of empty/missing data? What record type do they use?

    And I assume the pathology/DC/clinic records are not a well defined format, but you use SAS to normalize them into a NAACCR record?

    Bruce Riddle

    We expect we will start to receive XML files in January or February. Although the flat file option will exist, I expect some IT people will make the choice for the registrars. Almost immediately, I will need tools to go into the file and make changes, such as filling in missing hospital numbers or missing dates. I will also need to figure out a way to separate rapid reports (within 45 days of diagnosis) from definitive reports (within 180 days of diagnosis). Can anyone suggest an XML text editor? Given the size of the files, I assume they will arrive zipped, so it would be nice if the editor could read and save ZIP files.
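    Large zipped files do not have to be unzipped or loaded whole just to be checked. Here is a minimal Python sketch, assuming the usual NAACCR XML namespace: it streams each .xml member of a ZIP archive with ElementTree.iterparse, handles one Patient at a time and then discards it, and counts tumors that are missing a given item (for example a facility code). It reports problems rather than editing them; iter_patients and count_missing are illustrative names, not part of any existing tool.

```python
import xml.etree.ElementTree as ET
import zipfile

NS = "http://naaccr.org/naaccrxml"
PATIENT = f"{{{NS}}}Patient"
TUMOR = f"{{{NS}}}Tumor"
ITEM = f"{{{NS}}}Item"


def iter_patients(xml_file):
    """Yield Patient elements one at a time, then drop them to bound memory use."""
    root = None
    for event, elem in ET.iterparse(xml_file, events=("start", "end")):
        if event == "start" and root is None:
            root = elem  # the NaaccrData element
        elif event == "end" and elem.tag == PATIENT:
            yield elem
            root.remove(elem)  # Patients are direct children of NaaccrData


def count_missing(zip_source, naaccr_id):
    """Return (tumors missing naaccr_id, total tumors) over all .xml members."""
    missing = total = 0
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".xml"):
                continue
            with archive.open(name) as member:
                for patient in iter_patients(member):
                    for tumor in patient.iter(TUMOR):
                        total += 1
                        values = [
                            (item.text or "").strip()
                            for item in tumor.iter(ITEM)
                            if item.get("naaccrId") == naaccr_id
                        ]
                        if not any(values):
                            missing += 1
    return missing, total
```

    count_missing accepts either a path or an open binary file object, since zipfile.ZipFile takes both.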

    Thanks.

    Isaac Hands

    Bruce, the best general-purpose XML editor I have found is Oxygen XML Editor (https://www.oxygenxml.com), which ended up costing us about $100 for the initial purchase and $22/year for maintenance and updates. It opens .gz files natively and has a nice feature for transforming XML documents with XQuery (if you are willing to learn XQuery transformations): https://www.w3schools.com/xml/xquery_intro.asp
    Just to be clear, I do not use an XML editor for NAACCR XML since we parse all of our incoming XML files into a relational database before making changes.
    Also, I think you are following the other forum threads, but the NAACCR XML SAS macro seems to be working for people who need to handle XML natively in SAS: https://github.com/imsweb/naaccr-xml/wiki/7:-NAACCR-XML-and-SAS

    Joseph Rogers

    Bruce and others,

    CDC/NPCR plans on updating XML Exchange Plus https://www.cdc.gov/cancer/npcr/tools/registryplus/xml-exchange-plus.htm to allow the user options in creating the output files. Here are a few options we plan on including:
    -Output files that look like the current NAACCR flat file standard. The sequence of the fields will be based upon the NAACCR item number.
    -The output file can include all NAACCR fields or a subset of fields
    -The user can select column or other character delimiters
    -Checks for non-standard characters
    -Record level editing options

    Please let me know if you think this will cover your needs.

    Bruce Riddle

    Joe,
    I think it is a wonderful idea. I have used XML Exchange Plus in my experiments, and I like the tool.

    Bruce
