omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.

omniparser


Omniparser is a native Golang ETL parser that ingests input data of various formats (CSV, txt, fixed length/width, XML, EDI/X12/EDIFACT, JSON, and custom formats) in streaming fashion and transforms data into desired JSON output based on a schema written in JSON.

Golang Version: 1.14

Documentation

Docs:

References:

Examples:

In the example folders above you will find pairs of input files and their schema files. In the .snapshots subdirectory you'll find their corresponding output files.

Online Playground

Use https://omniparser.herokuapp.com/ to try out schemas and inputs, your own or the existing samples, and see how ingestion and transform work. (You may need to wait a few seconds for the Heroku instance to wake up.)

Why

  • No good ETL transform/parser library exists in Golang.
  • Even looking into Java and other languages, choices aren't many and all have limitations:
    • Smooks is dead, plus its EDI parsing/transform is too heavyweight, needing code-gen.
    • BeanIO can't deal with EDI input.
    • Jolt can't deal with anything other than JSON input.
    • JSONata still only does JSON -> JSON transforms.
  • Many of the parsers/transforms don't support streaming reads and load the entire input into memory, which is not acceptable in some situations.

Requirements

  • Golang 1.14

Recent Major Feature Additions/Changes

  • Added Transform.RawRecord() for callers of omniparser to access the raw ingested record.
  • Deprecated custom_parse in favor of custom_func (custom_parse is still usable for back-compatibility; it has just been removed from all public docs and samples).
  • Added NonValidatingReader EDI segment reader.
  • Added fixed-length file format support in the omniv21 handler.
  • Added EDI file format support in the omniv21 handler.
  • Major restructure/refactoring
    • Upgraded omni schema version to omni.2.1 due to a number of incompatible schema changes:
      • 'result_type' -> 'type'
      • 'ignore_error_and_return_empty_str' -> 'ignore_error'
      • 'keep_leading_trailing_space' -> 'no_trim'
    • Changed how we handle custom functions: previously we always used strings as both the input and the result param types. Not anymore: all types are now supported for custom function in and out params.
    • Changed how we package custom functions for extensions: previously we collected custom functions from all extensions and passed all of them to whichever extension was in use, which felt weird. Now only the custom functions included in a particular extension are used in that extension.
    • Deprecated/removed most of the custom functions in favor of using 'javascript'.
    • A number of packages renamed.
  • Added CSV file format support in the omniv2 handler.
  • Introduced IDR node cache for allocation recycling.
  • Introduced IDR for in-memory data representation.
  • Added trie-based high performance times.SmartParse.
  • Command line interface (one-off transform cmd or long-running http server mode).
  • javascript engine integration as a custom_func.
  • JSON stream parser.
  • Extensibility:
    • Ability to provide custom functions.
    • Ability to provide custom schema handler.
    • Ability to customize the built-in omniv2 schema handler's parsing code.
    • Ability to provide a new file format support to built-in omniv2 schema handler.


Comments
  • [EDI] Huge memory leak when parsing ~20MB EDI file


    Hi, we found a huge memory leak when parsing an EDI file over 20 MB in size.

    • The EDI is a sample INVOIC with millions of LIN items.
    • Attaching an SVG of the benchmark's --memprofile export.
    • It allocates over 5 GB of memory for a single ~23 MB EDI file.

    Is there any alternative to calling ingester.Parse() for large files?

  • Return unique hash for input


    So I'm busy ingesting shipments; they arrive as CSV, JSON, XML or EDI.

    The interface I'm working on should take an array of shipments, divide it into individual shipments, hash those and store the original input for success/audit/retry/failure tracking. This would make it easier to ingest 99/100 shipments and retry (after localizing and fixing the issue) the one shipment that's invalid for whatever reason.

    To decide whether something has been ingested correctly, I thought a solution could be hashing each 'unit' of input and also storing the original input somewhere.

    Quite easy for CSV.

    Weird python-and-bash-esque pseudocode:

    for line in csv:
      process(line) && hash(line) && gzip(line) -> store result, hash, line in db
    

    It becomes less so for JSON and XML: even a marshal/unmarshal round trip is not 100% identical to the input.

    EDI is even worse.

    So, even though I liked the idea of storing the original, it quickly becomes cumbersome. A decent alternative is hashing and storing the output of transform.Read().

    But that comes with several issues:

    • I can change the output, and thus the hash, via the schema (not really an issue)
    • it's not the original (but it is more consistent: all JSON), so it's kind of a bug/feature
    • I don't see anything I haven't told omniparser to look for, such as newly added fields

    None of these is a major issue, but they all stem from hashing a new representation of the input rather than the input itself.

    I was wondering how hard it would be to hash the input that generates the output. So: hash, data, err := transform.Read()

    Is your internal data stable enough that you could, say, 'for loop' the IDR input through the sha256 encoder (it supports streaming) and return a stable, unchanging hash?

    As in, in theory ["a", "b", "c"] should return the same hash for a, b and c regardless of ordering.

    Also, I imagine being able to verify whether a file has been fully processed is interesting for more than one use case.
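    One common way to get an order-independent hash over a set of units is to hash each unit separately, sort the per-unit digests, and feed the sorted digests through a single streaming SHA-256. The sketch below assumes the units are already available as strings (e.g. one serialized record each); unorderedHash is a hypothetical helper, not an omniparser API. Duplicates still count, so ["a", "a"] and ["a"] hash differently.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// unorderedHash returns a SHA-256 digest over a multiset of items that does
// not depend on their order. Hashing each item first (instead of
// concatenating raw items) keeps ["ab", "c"] and ["a", "bc"] distinct.
func unorderedHash(items []string) string {
	digests := make([][sha256.Size]byte, len(items))
	for i, it := range items {
		digests[i] = sha256.Sum256([]byte(it))
	}
	// Sorting the fixed-size digests removes any dependence on input order.
	sort.Slice(digests, func(i, j int) bool {
		return string(digests[i][:]) < string(digests[j][:])
	})
	h := sha256.New() // streaming hasher over the sorted digests
	for _, d := range digests {
		h.Write(d[:])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := unorderedHash([]string{"a", "b", "c"})
	b := unorderedHash([]string{"c", "b", "a"})
	fmt.Println(a == b) // true: order does not matter
}
```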

  • [EDI] Handle segment compression


    Disclaimer: I'm only assuming this is segment compression as defined in the manual:

    7.1 Exclusion of segments: Conditional segments containing no data shall be omitted (including their segment tags).

    This is what I encountered in the schema: basically a mandatory/conditional sandwich.

    SG25 R 99
    43 NAD M 1
    44 LOC Orts 9 O
    
    SG25 R 99
    45 NAD M 1
    46 LOC O 9
        SG29 C 9
        47 RFF M 1
    
    SG25 O 99
    48 NAD M 1
    
    SG25 D 99
    49 NAD M 1
    
    SG25 D 99
    50 NAD M 1
    
    SG25 O 99
    51 NAD M 1
    
    SG25 M 99
    52 NAD M 1
        SG29 C 9
        53 RFF M 1
    
    SG25 D 99
    54 NAD M 1
    
    SG25 R 99
    55 NAD M 1
    
    SG25 R 99
    56 NAD M 1
        SG26 C 9
        57 CTA O 1
        58 COM O 9
    

    None of the conditional segments were present in the data I was trying to parse; I ended up fixing it with:

                        "name": "SG25-SENDER",
                        "min": 1,
                        "type": "segment_group",
                        "child_segments": [
                          {
                            "name": "NAD",
                            "min": 1,
                            "elements": [
                              { "name": "cityName", "index": 1 },
                              { "name": "provinceCode", "index": 2 },
                              { "name": "postalCode", "index": 3 },
                              { "name": "countryCode", "index": 4 }
                            ]
                          },
                          { "name": "LOC", "min": 0 }
                        ]
                      },
                      {
                        "name": "SG25-RECEIVER",
                        "min": 1,
                        "type": "segment_group",
                        "child_segments": [
                          { "name": "NAD", "min": 1 },
                          { "name": "LOC", "min": 0 },
                          {
                            "name": "SG29",
                            "min": 0,
                            "type": "segment_group",
                            "child_segments": [{ "name": "RFF", "min": 1 }]
                          }
                        ]
                      },
                      {
                        "name": "SG25-OTHERS",
                        "min": 0,
                        "max": 99,
                        "type": "segment_group",
                        "child_segments": [
                          {
                            "name": "SG26",
                            "min": 0,
                            "type": "segment_group",
                            "child_segments": [
                              { "name": "CTA", "min": 0 },
                              { "name": "COM", "min": 0, "max": -1 }
                            ]
                          },
                          { "name": "NAD", "min": 0, "max": -1 },
                          { "name": "LOC", "min": 0 },
                          {
                            "name": "SG29",
                            "min": 0,
                            "type": "segment_group",
                            "child_segments": [{ "name": "RFF", "min": 1 }]
                          }
                        ]
                      },
    

    The message I'm trying to parse:

    NAD+CZ+46388514++Foo A/S+Foo 2+Foo++Foo+DK'
    NAD+CN+46448510++NL01001 Foo Foo Foo:Foo+Foo 6+Foo++Foo+NL'
    CTA+CN+AS:NL01001 Foo'
    COM+0031765140344:TE'
    [email protected]:EM'
    NAD+LP+04900000250'
    

    Which basically means: grab the two explicit ones (luckily at the top), and do as you wish with the others, in whatever order you encounter them. I'm not sure how I would have handled it if I did care about NAD+LP.

    I also had to use min/max 1 instead of the specified 99, as the parser only considers the tag NAD, not NAD plus its first value, when 'collapsing' similar but not identical segments.

    Basically, the EDI specification has a lot of implicit structure, which I think is quite hard to parse easily.
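    The NAD problem can be illustrated with a toy matcher that assigns segments to declarations greedily, in order, honouring min/max. This is not omniparser's matching algorithm; decl, match and the two key functions are hypothetical, and the error text only mimics the style of the 'needs min occur' errors quoted on this page. Keyed by tag alone, a NAD group with max 99 swallows every NAD and the next group gets none; capping each group at 1 (the workaround described above) or keying on tag plus qualifier (NAD+CZ, NAD+CN, ...) avoids that.

```go
package main

import (
	"fmt"
	"strings"
)

// decl is a simplified segment declaration: a match key plus min/max
// occurrence bounds (max < 0 means unbounded).
type decl struct {
	key      string
	min, max int
}

// keyTagOnly identifies a segment by its tag alone, e.g. "NAD".
func keyTagOnly(seg string) string {
	return strings.SplitN(seg, "+", 2)[0]
}

// keyTagAndQualifier identifies a segment by its tag plus its first element,
// e.g. "NAD+CZ", so NAD segments with different qualifiers are distinct.
func keyTagAndQualifier(seg string) string {
	parts := strings.SplitN(seg, "+", 3)
	if len(parts) < 2 {
		return parts[0]
	}
	return parts[0] + "+" + parts[1]
}

// match walks the declarations in order; each one greedily consumes
// consecutive segments with a matching key, up to its max. It returns how
// many segments each declaration consumed.
func match(segs []string, decls []decl, key func(string) string) ([]int, error) {
	counts := make([]int, len(decls))
	i := 0
	for d, dc := range decls {
		for i < len(segs) && key(segs[i]) == dc.key && (dc.max < 0 || counts[d] < dc.max) {
			counts[d]++
			i++
		}
		if counts[d] < dc.min {
			return counts, fmt.Errorf("segment '%s' needs min occur %d, but only got %d", dc.key, dc.min, counts[d])
		}
	}
	if i < len(segs) {
		return counts, fmt.Errorf("unexpected segment %q", segs[i])
	}
	return counts, nil
}

func main() {
	segs := []string{"NAD+CZ+46388514", "NAD+CN+46448510", "NAD+LP+04900000250"}

	// Keyed by tag only, the first group (max 99) consumes all three NADs.
	_, err := match(segs, []decl{{"NAD", 1, 99}, {"NAD", 1, 99}}, keyTagOnly)
	fmt.Println(err)

	// Keyed by tag + qualifier, each NAD lands in its own group.
	counts, err := match(segs, []decl{{"NAD+CZ", 1, 1}, {"NAD+CN", 1, 1}, {"NAD+LP", 0, 99}}, keyTagAndQualifier)
	fmt.Println(counts, err)
}
```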

  • supports a "segment_prefix" in the edi parser file declaration

    I'm working with a non-standard EDI format that includes a segment prefix. For example, a message might be:

    |HDR|1|2|3|
    |DAT|X|
    |EOF|
    

    where every segment begins with a pipe. I thought I could get around this by making the segment delimiter include the next pipe (i.e. |\n|), but that doesn't catch the very first pipe.

    I propose including a new (optional) "segment_prefix" field in the file_declaration to catch segment prefixes.
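    Until something like segment_prefix exists, one workaround is to preprocess the input before handing it to the parser. A minimal sketch, assuming the segment delimiter is "|\n" as in the example and every segment starts with a single "|"; splitPrefixedSegments is a hypothetical helper, and because it loads the whole input into memory it only suits small files.

```go
package main

import (
	"fmt"
	"strings"
)

// splitPrefixedSegments splits input on segDelim and removes one leading
// prefix (e.g. "|") from every segment. Empty pieces, such as the one after
// the final delimiter, are dropped.
func splitPrefixedSegments(input, segDelim, prefix string) []string {
	var segs []string
	for _, s := range strings.Split(input, segDelim) {
		s = strings.TrimSpace(s)
		if s == "" {
			continue
		}
		segs = append(segs, strings.TrimPrefix(s, prefix))
	}
	return segs
}

func main() {
	input := "|HDR|1|2|3|\n|DAT|X|\n|EOF|\n"
	segs := splitPrefixedSegments(input, "|\n", "|")
	// Re-join with the plain delimiter so a parser configured with
	// segment delimiter "|\n" no longer sees the leading pipes.
	fmt.Print(strings.Join(segs, "|\n") + "|\n")
}
```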

  • Edi parsing failing, with error segment needs min occur 1, but only got 0


  • [EDI] Handle (or ignore?) line endings


    Consider the header

    UNA:+.? '

    This wouldn't be

        "component_delimiter": ":",
        "element_delimiter": "+",
        "segment_delimiter": "'",
        "release_character": "?",
    

    But instead is

        "component_delimiter": ":",
        "element_delimiter": "+",
        "segment_delimiter": "'\n", <--
        "release_character": "?",
    

    This is the case if the file you're trying to read contains line endings. Maybe the safest/easiest option would be to remove all line endings?

  • JSON/XML to EDI conversion


    Could you please clarify whether JSON / XML data can be converted to EDI EDIFACT data? If feasible, could you please provide a small example? Thank you in advance.

  • Ignore blank rows CSV


    Hi,

    I encountered a CSV (converted from XLSX) where one line was entirely blank (,,,,,,).

    It errored with 'cannot convert "" to int'.

    I've fixed it at the source for now, but in general it might be useful to add an 'ignore blank rows' option, or to be able to set a default for empty values (like you can in EDI), especially when casting (in this case, turning '' into 0)?

  • EDIFACT parser segment skip


    Could you please let me know whether it's possible to skip a segment that is present in the input but not declared in the schema, and vice versa?

    Error generated:

    bad request: transform failed. err: input 'test-input' at segment no.8 (char[247,247]): segment 'details/NAD' needs min occur 1, but only got 0

    Input:

    UNA:+.? '
    UNB+UNOC:3+9999999999999:14+9999999999998:14+210419:1622+446047262+ORDERS'
    UNH+1+ORDERS:D:96A:UN:EAN008'
    BGM+220::9+6666666666+9'
    DTM+137:20210419:102'
    DTM+2:20210518:102'
    FTX+PUR+3++STORE ORDER:DR01'
    RFF+PUR+3++STORE ORDER PLEN:DR01'
    NAD+BY+9999999999999::9'

    Schema:

    {
      "parser_settings": { "version": "omni.2.1", "file_format_type": "edi" },
      "file_declaration": {
        "segment_delimiter": "'",
        "element_delimiter": "+",
        "component_delimiter": ":",
        "ignore_crlf": true,
        "segment_declarations": [
          {
            "name": "details",
            "is_target": true,
            "type": "segment_group",
            "min": 0,
            "max": -1,
            "child_segments": [
              { "name": "UNA", "elements": [
                { "name": "random1", "index": 1 } ] },
              { "name": "UNB", "elements": [
                { "name": "syntaxIdentifier", "index": 1 },
                { "name": "buyerGln", "index": 2 },
                { "name": "sellerGln", "index": 3 },
                { "name": "docDate", "index": 4 },
                { "name": "transferNumber", "index": 5 },
                { "name": "documentType", "index": 6 } ] },
              { "name": "UNH", "elements": [
                { "name": "documentType2", "index": 1 },
                { "name": "fileFormatType", "index": 2 } ] },
              { "name": "BGM", "elements": [
                { "name": "orderType", "index": 1 },
                { "name": "orderNumber", "index": 2 },
                { "name": "SignatureForOriginal", "index": 3 } ] },
              { "name": "DTM", "elements": [
                { "name": "qualifierDocDate", "index": 1, "component_index": 1 },
                { "name": "docDate", "index": 1, "component_index": 2 },
                { "name": "formatDate", "index": 1, "component_index": 3 } ] },
              { "name": "DTM", "elements": [
                { "name": "qualifierDeliveryDate", "index": 1, "component_index": 1 },
                { "name": "deliveryDate", "index": 1, "component_index": 2 },
                { "name": "deliveryformatDate", "index": 1, "component_index": 3 } ] },
              { "name": "FTX", "elements": [
                { "name": "containPurchaseInformation", "index": 1 },
                { "name": "defaultValue", "index": 2 },
                { "name": "freeText1", "index": 4, "component_index": 1 },
                { "name": "freeText2", "index": 4, "component_index": 2 } ] },
              { "name": "NAD", "elements": [
                { "name": "partyQualifier1", "index": 1 },
                { "name": "partyGln1", "index": 2, "component_index": 1 },
                { "name": "partyIDcode1", "index": 2, "component_index": 3 } ] }
            ]
          }
        ]
      },
      "transform_declarations": {
        "FINAL_OUTPUT": { "object": {
          "una_elem1": { "xpath": "UNA/random1" },
          "header1": { "object": {
            "syntaxIdentifier": { "xpath": "UNB/syntaxIdentifier" },
            "buyerGln": { "xpath": "UNB/buyerGln" },
            "sellerGln": { "xpath": "UNB/sellerGln" },
            "docDate": { "xpath": "UNB/docDate" },
            "transferNumber": { "xpath": "UNB/transferNumber" },
            "documentType": { "xpath": "UNB/documentType" } } },
          "heade2": { "object": {
            "documentType2": { "xpath": "UNH/documentType2" },
            "fileFormatType": { "xpath": "UNH/fileFormatType" } } },
          "document": { "object": {
            "documentType2": { "xpath": "BGM/orderType" },
            "orderNumber": { "xpath": "BGM/orderNumber" },
            "SignatureForOriginal": { "xpath": "BGM/SignatureForOriginal" } } },
          "docDate": { "object": {
            "qualifierDocDate": { "xpath": "DTM/qualifierDocDate" },
            "docDate": { "xpath": "DTM/docDate" },
            "formatDate": { "xpath": "DTM/formatDate" } } },
          "deliveryDate": { "object": {
            "qualifierDeliveryDate": { "xpath": "DTM/qualifierDeliveryDate" },
            "deliveryDate": { "xpath": "DTM/deliveryDate" },
            "deliveryformatDate": { "xpath": "DTM/deliveryformatDate" } } },
          "freeText": { "object": {
            "containPurchaseInformation": { "xpath": "FTX/containPurchaseInformation" },
            "defaultValue": { "xpath": "FTX/defaultValue" },
            "freeText1": { "xpath": "FTX/freeText1" },
            "freeText2": { "xpath": "FTX/freeText2" } } },
          "PartyInformation1": { "object": {
            "partyQualifier": { "xpath": "NAD/partyQualifier1" },
            "partyGln": { "xpath": "NAD/partyGln1" },
            "partyIDcode": { "xpath": "NAD/partyIDcode1" } } }
        } }
      }
    }
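    The input contains an RFF segment right before NAD, and the schema doesn't declare RFF; that mismatch may be what trips the NAD declaration. Until undeclared segments can be skipped, one workaround is to drop them before parsing. A minimal sketch, assuming the UNA:+.? ' service characters; keepDeclared is a hypothetical helper that honours the ? release character, passes a leading 9-character UNA header through unchanged, and removes line breaks between segments.

```go
package main

import (
	"fmt"
	"strings"
)

// keepDeclared drops every segment whose tag is not in declared. It assumes
// "'" ends a segment, "+" follows the tag, and "?" releases (escapes) the
// next character. A leading UNA header is kept as-is if declared.
func keepDeclared(msg string, declared map[string]bool) string {
	var out, cur strings.Builder
	if strings.HasPrefix(msg, "UNA") && len(msg) >= 9 {
		if declared["UNA"] {
			out.WriteString(msg[:9]) // UNA is fixed-length; don't parse it
		}
		msg = msg[9:]
	}
	flush := func() {
		seg := strings.Trim(cur.String(), "\r\n")
		cur.Reset()
		if seg == "" {
			return
		}
		tag := seg
		if i := strings.IndexByte(seg, '+'); i >= 0 {
			tag = seg[:i]
		}
		if declared[tag] {
			out.WriteString(seg)
			out.WriteByte('\'')
		}
	}
	for i := 0; i < len(msg); i++ {
		c := msg[i]
		switch {
		case c == '?' && i+1 < len(msg):
			cur.WriteByte(c) // keep the release character and the escaped byte
			cur.WriteByte(msg[i+1])
			i++
		case c == '\'':
			flush()
		default:
			cur.WriteByte(c)
		}
	}
	flush()
	return out.String()
}

func main() {
	in := "UNA:+.? '\nUNB+UNOC:3'\nRFF+PUR+3++STORE ORDER PLEN:DR01'\nNAD+BY+9999999999999::9'\n"
	declared := map[string]bool{"UNA": true, "UNB": true, "NAD": true}
	fmt.Println(keepDeclared(in, declared))
}
```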

  • Parser only loops 10 times


    I cannot seem to get this to loop EB more than 10 times. Any help would be appreciated.

    {
        "parser_settings": {
            "version": "omni.2.1",
            "file_format_type": "edi"
        },
        "file_declaration": {
            "segment_delimiter": "~",
            "element_delimiter": "*",
            "component_delimiter": "|",
            "ignore_crlf": true,
            "segment_declarations": [
                {
                    "name": "ISA",
                    "child_segments": [
                        {
                            "name": "GS",
                            "child_segments": [
                                {
                                    "name": "transaction_set_id",
                                    "type": "segment_group",
                                    "is_target": true,
                                    "child_segments": [
                                        {
                                            "name": "ST",
                                            "elements": [
                                                {
                                                    "name": "X12Form",
                                                    "index": 1
                                                },
                                                {
                                                    "name": "TransactionSetControlNumber",
                                                    "index": 2
                                                },
                                                {
                                                    "name": "ImplementationConventionReference",
                                                    "index": 3
                                                }
                                            ]
                                        },
                                        {
                                            "name": "BHT",
                                            "min": 0,
                                            "elements": [
                                                {
                                                    "name": "BHT01",
                                                    "index": 1
                                                },
                                                {
                                                    "name": "BHT04",
                                                    "index": 3
                                                },
                                                {
                                                    "name": "BHT05",
                                                    "index": 4
                                                },
                                                {
                                                    "name": "BHT06",
                                                    "index": 5
                                                }
                                            ]
                                        },
                                        {
                                            "name": "HL",
                                            "type": "segment_group",
                                            "min": 0,
                                            "max": -1,
                                            "child_segments": [
                                                {
                                                    "name": "HL",
                                                    "elements": [
                                                        {
                                                            "name": "HL1",
                                                            "index": 1
                                                        },
                                                        {
                                                            "name": "HL4",
                                                            "index": 4
                                                        }
                                                    ]
                                                },
                                                {
                                                    "name": "TRN",
                                                    "min": 0,
                                                    "elements": [
                                                        {
                                                            "name": "TRN00",
                                                            "index": 1
                                                        },
                                                        {
                                                            "name": "TRN01",
                                                            "index": 2
                                                        },
                                                        {
                                                            "name": "TRN02",
                                                            "index": 3
                                                        }
                                                    ]
                                                },
                                                {
                                                    "name": "NM1",
                                                    "min": 0,
                                                    "elements": [
                                                        {
                                                            "name": "NM1101",
                                                            "index": 1
                                                        },
                                                        {
                                                            "name": "NM1102",
                                                            "index": 2
                                                        },
                                                        {
                                                            "name": "NM1103",
                                                            "index": 3
                                                        },
                                                        {
                                                            "name": "NM1108",
                                                            "index": 8
                                                        },
                                                        {
                                                            "name": "NM1109",
                                                            "index": 9
                                                        }
                                                    ]
                                                },
                                                {
                                                    "name": "N3",
                                                    "min": 0,
                                                    "elements": [
                                                        {
                                                            "name": "N301",
                                                            "index": 1
                                                        },
                                                        {
                                                            "name": "N302",
                                                            "index": 2
                                                        }
                                                    ]
                                                },
                                                {
                                                    "name": "N4",
                                                    "min": 0,
                                                    "elements": [
                                                        {
                                                            "name": "N401",
                                                            "index": 1
                                                        },
                                                        {
                                                            "name": "N402",
                                                            "index": 2
                                                        },
                                                        {
                                                            "name": "N403",
                                                            "index": 3
                                                        }
                                                    ]
                                                },
                                                {
                                                    "name": "DMG",
                                                    "min": 0,
                                                    "elements": [
                                                        {
                                                            "name": "DMG01",
                                                            "index": 1
                                                        },
                                                        {
                                                            "name": "DMG02",
                                                            "index": 2
                                                        },
                                                        {
                                                            "name": "DMG03",
                                                            "index": 3
                                                        }
                                                    ]
                                                },
                                                {
                                                    "name": "DTP",
                                                    "min": 0,
                                                    "elements": [
                                                        {
                                                            "name": "DTP01",
                                                            "index": 1
                                                        }
                                                    ]
                                                }
                                            ]
                                        },
                                        {
                                            "name": "EB",
                                            "min": 0,
                                            "max": -1,
                                            "type": "segment_group",
                                            "child_segments": [
                                                {
                                                    "name": "EB",
                                                    "elements": [
                                                        {
                                                            "name": "EB01",
                                                            "index": 1,
                                                            "default": ""
                                                        },
                                                        {
                                                            "name": "EB02",
                                                            "index": 2,
                                                            "default": ""
                                                        },
                                                        {
                                                            "name": "EB03",
                                                            "index": 3,
                                                            "default": ""
                                                        },
                                                        {
                                                            "name": "EB04",
                                                            "index": 4,
                                                            "default": ""
                                                        },
                                                        {
                                                            "name": "EB05",
                                                            "index": 5,
                                                            "default": ""
                                                        },
                                                        {
                                                            "name": "EB06",
                                                            "index": 6,
                                                            "default": ""
                                                        },
                                                        {
                                                            "name": "EB07",
                                                            "index": 7,
                                                            "default": ""
                                                        },
                                                        {
                                                            "name": "EB08",
                                                            "index": 8,
                                                            "default": ""
                                                        }
                                                    ]
                                                },
                                                {
                                                    "name": "DTP",
                                                    "min": 0, "max": -1,
                                                    "elements": [
                                                        {
                                                            "name": "DTP01",
                                                            "index": 1,
                                                            "default": ""
                                                        },
                                                        {
                                                            "name": "DTP02",
                                                            "index": 2,
                                                            "default": ""
                                                        },
                                                        {
                                                            "name": "DTP03",
                                                            "index": 3,
                                                            "default": ""
                                                        },
                                                        {
                                                            "name": "DTP04",
                                                            "index": 4,
                                                            "default": ""
                                                        }
                                                    ]
                                                },
                                                {
                                                    "name": "LS",
                                                    "min": 0, "max": -1,
                                                    "elements": [
                                                        {
                                                            "name": "LS01",
                                                            "index": 1,
                                                            "default": ""
                                                        }
                                                    ]
                                                },
                                                {
                                                    "name": "HSD",
                                                    "min": 0,
                                                    "max": -1,
                                                    "type": "segment_group",
                                                    "child_segments": [
                                                        {
                                                            "name": "HSD",
                                                            "min": 0,
                                                            "elements": [
                                                                {
                                                                    "name": "HSD01",
                                                                    "index": 1,
                                                                    "default": ""
                                                                },
                                                                {
                                                                    "name": "HSD02",
                                                                    "index": 2,
                                                                    "default": ""
                                                                },
                                                                {
                                                                    "name": "HSD03",
                                                                    "index": 3,
                                                                    "default": ""
                                                                },
                                                                {
                                                                    "name": "HSD04",
                                                                    "index": 4,
                                                                    "default": ""
                                                                },
                                                                {
                                                                    "name": "HSD05",
                                                                    "index": 5,
                                                                    "default": ""
                                                                },
                                                                {
                                                                    "name": "HSD06",
                                                                    "index": 6,
                                                                    "default": ""
                                                                },
                                                                {
                                                                    "name": "HSD07",
                                                                    "index": 7,
                                                                    "default": ""
                                                                },
                                                                {
                                                                    "name": "HSD08",
                                                                    "index": 8,
                                                                    "default": ""
                                                                }
                                                            ]
                                                        }
                                                    ]
                                                },
                                                {
                                                    "name": "MSG",
                                                    "min": 0,
                                                    "elements": [
                                                        {
                                                            "name": "MSG01",
                                                            "index": 1,
                                                            "default": ""
                                                        },
                                                        {
                                                            "name": "MSG02",
                                                            "index": 2,
                                                            "default": ""
                                                        }
                                                    ]
                                                }
                                            ]
                                        }
                                    ]
                                }
                            ]
                        }
                    ]
                },
                {
                    "name": "SE",
                    "min": 0
                },
                {
                    "name": "IEA",
                    "min": 0
                }
            ]
        },
        "transform_declarations": {
            "FINAL_OUTPUT": {
                "object": {
                    "transaction_set_id1": {
                        "xpath": "ST/X12Form"
                    },
                    "transaction_set_id2": {
                        "xpath": "ST/TransactionSetControlNumber"
                    },
                    "transaction_set_id3": {
                        "xpath": "ST/ImplementationConventionReference"
                    },
                    "BHT01": {
                        "xpath": "BHT/BHT01"
                    },
                    "BHT04": {
                        "xpath": "BHT/BHT04"
                    },
                    "BHT05": {
                        "xpath": "BHT/BHT05"
                    },
                    "BHT06": {
                        "xpath": "BHT/BHT06"
                    },
                    "HL": {
                        "array": [
                            {
                                "xpath": "HL",
                                "object": {
                                    "HL1": {
                                        "xpath": "HL/HL1"
                                    },
                                    "HL4": {
                                        "xpath": "HL/HL4"
                                    },
                                    "TRN": {
                                        "xpath": "TRN/TRN00"
                                    },
                                    "TRN1": {
                                        "xpath": "TRN/TRN01"
                                    },
                                    "TRN2": {
                                        "xpath": "TRN/TRN02"
                                    },
                                    "NM1": {
                                        "xpath": "NM1/NM1101"
                                    },
                                    "NM2": {
                                        "xpath": "NM1/NM1102"
                                    },
                                    "NM3": {
                                        "xpath": "NM1/NM1103"
                                    },
                                    "NM8": {
                                        "xpath": "NM1/NM1108"
                                    },
                                    "NM9": {
                                        "xpath": "NM1/NM1109"
                                    },
                                    "N3": {
                                        "xpath": "N3/N301"
                                    },
                                    "N302": {
                                        "xpath": "N3/N302"
                                    },
                                    "N4": {
                                        "xpath": "N4/N401"
                                    },
                                    "N402": {
                                        "xpath": "N4/N402"
                                    },
                                    "N403": {
                                        "xpath": "N4/N403"
                                    },
                                    "DMG01": {
                                        "xpath": "DMG01/DMG01"
                                    },
                                    "DM02": {
                                        "xpath": "DMG02/DMG02"
                                    },
                                    "DMG03": {
                                        "xpath": "DMG03/DMG03"
                                    },
                                    "DMG04": {
                                        "xpath": "DMG04/DMG04"
                                    }
                                }
                            }
                        ]
                    },
                    "EB": {
                        "array": [
                            {
                                "xpath": "EB",
                                "object": {
                                    "EB": {
                                        "xpath": "EB/EB01"
                                    },
                                    "EB2": {
                                        "xpath": "EB/EB02"
                                    },
                                    "EB3": {
                                        "xpath": "EB/EB03"
                                    },
                                    "EB4": {
                                        "xpath": "EB/EB04"
                                    },
                                    "EB5": {
                                        "xpath": "EB/EB05"
                                    },
                                    "DTP1": {
                                        "xpath": "DTP/DTP01"
                                    },
                                    "DTP2": {
                                        "xpath": "DTP/DTP02"
                                    },
                                    "DTP3": {
                                        "xpath": "DTP/DTP03"
                                    },
                                    "HSD": {
                                        "array": [
                                            {
                                                "xpath": "HSD",
                                                "object": {
                                                    "HSD1": {
                                                        "xpath": "HSD/HSD01"
                                                    },
                                                    "HSD2": {
                                                        "xpath": "HSD/HSD02"
                                                    },
                                                    "HSD3": {
                                                        "xpath": "HSD/HSD03"
                                                    },
                                                    "HSD4": {
                                                        "xpath": "HSD/HSD04"
                                                    },
                                                    "HSD5": {
                                                        "xpath": "HSD/HSD05"
                                                    },
                                                    "HSD6": {
                                                        "xpath": "HSD/HSD06"
                                                    },
                                                    "HSD7": {
                                                        "xpath": "HSD/HSD07"
                                                    },
                                                    "HSD8": {
                                                        "xpath": "HSD/HSD08"
                                                    }
                                                }
                                            }
                                        ]
                                    },
                                    "MSG1": {
                                        "xpath": "MSG/MSG01"
                                    },
                                    "MSG2": {
                                        "xpath": "MSG/MSG02"
                                    },
                                    "LS": {
                                        "xpath": "LS/LS01"
                                    }
                                }
                            }
                        ]
                    }
                }
            }
        }
    }
    
    ISA*00* *00* *ZZ*CMS *ZZ*SUBMITTERID *171104*0734*^*00501*111111111*0*P*|~
    GS*HB*CMS*SUBMITTERID*20171104*07340000*1*X*005010X279A1~
    ST*271*0001*005010X279A1~
    BHT*0022*11*TRANSA*20171104*07342355~
    HL*1**20*1~
    NM1*PR*2*CMS*****PI*CMS~
    HL*2*1*21*1~
    NM1*1P*2*IRNAME*****XX*1234567893~
    HL*3*2*22*0~
    TRN*2*TRACKNUM*ABCDEFGHIJ~
    NM1*IL*1*LNAME*FNAME*M***MI*123456789A~
    N3*ADDRESSLINE1*ADDRESSLINE2~
    N4*CITY*ST*ZIPCODE~
    DMG*D8*19400401*F~
    DTP*307*RD8*20170101-20171204~
    EB*6**30~
    DTP*307*RD8*20170101-20170108~
    EB*I**41^54~
    EB*1**88~
    EB*1**30^10^42^45^48^49^69^76^83^A5^A7^AG^BT^BU^BV*MA~
    DTP*291*D8*20050401~
    EB*D**30*MA~
    DTP*292*RD8*20170116-20170120~
    EB*C**30*MA**26*1316~
    DTP*291*RD8*20170101-20171231~
    EB*C**30*MA**29*1316~
    DTP*291*RD8*20170101-20171231~
    EB*C**30*MA**29*0~
    DTP*291*RD8*20170116-20170120~
    EB*C**42^45*MA**26*0~
    DTP*292*RD8*20170101-20171231~
    EB*B**30*MA**26*0~
    HSD***DA**30*0~
    HSD***DA**31*60~
    HSD*****26*1~
    DTP*435*RD8*20170101-20171231~
    EB*B**30*MA**7*329~
    HSD***DA**30*60~
    HSD***DA**31*90~
    HSD*****26*1~
    DTP*435*RD8*20170101-20171231~
    EB*B**30*MA**26*0~
    HSD***DA**29*60~
    HSD*****26*1~
    DTP*435*RD8*20170101-20171231~
    EB*B**30*MA**7*329~
    HSD***DA**29*30~
    HSD*****26*1~
    DTP*435*RD8*20170101-20171231~
    EB*B**30*MA**26*0~
    HSD***DA**29*56~
    HSD*****26*1~
    DTP*435*RD8*20170116-20170120~
    EB*B**30*MA**7*329~
    HSD***DA**29*30~
    HSD*****26*1~
    DTP*435*RD8*20170116-20170120~
    EB*B**AG*MA**26*0~
    HSD***DA**30*0~
    HSD***DA**31*20~
    HSD*****26*1~
    DTP*435*RD8*20170101-20171231~
    EB*B**AG*MA**7*164.50~
    HSD***DA**30*20~
    HSD***DA**31*100~
    HSD*****26*1~
    DTP*435*RD8*20170101-20171231~
    EB*B**AG*MA**26*0~
    HSD***DA**29*20~
    HSD*****26*1~
    DTP*435*RD8*20170101-20171231~
    EB*B**AG*MA**7*164.50~
    HSD***DA**29*80~
    HSD*****26*1~
    DTP*435*RD8*20170101-20171231~
    EB*B**AG*MA**26*0~
    HSD***DA**29*16~
    HSD*****26*1~
    DTP*435*RD8*20170116-20170120~
    EB*B**AG*MA**7*164.50~
    HSD***DA**29*80~
    HSD*****26*1~
    DTP*435*RD8*20170116-20170120~
    EB*K**30*MA**32***DY*60~
    EB*K**30*MA**33***DY*58~
    EB*K**30*MA**7*658~
    DTP*435*RD8*20170101-20171231~
    EB*K**A7*MA**32***DY*190~
    EB*K**A7*MA**33***DY*180~
    EB*D**45*MA**26***99*1~
    EB*1**30^2^3^5^10^14^23^24^25^26^27^28^33^36^37^38^39^40^42^50^51^52^53^67^69^73^76^83^86^98^A4^A6^A8^AD^AE^AF^AI^AJ^AK^AL^BF^BG^BT^BU^BV^DM^UC*MB~
    DTP*291*D8*20050401~
    EB*C**30*MB**23*183~
    DTP*291*RD8*20170101-20171231~
    EB*C**30*MB**29*0~
    DTP*291*RD8*20170101-20171231~
    EB*A**30*MB**27**.2~
    DTP*291*RD8*20170101-20171231~
    EB*C**42^67^AJ*MB**23*0~
    DTP*292*RD8*20170101-20171231~
    EB*A**42^67^AJ*MB**27**0~
    DTP*292*RD8*20170101-20171231~
    EB*C***MB**23*0******HC|80061~
    DTP*292*D8*20171104~
    EB*A***MB**27**0*****HC|80061~
    DTP*292*D8*20171104~
    EB*D***MB*********HC|80061~
    DTP*348*D8*20130105~
    EB*D***MB*********HC|G0117~
    DTP*348*D8*20120107~
    EB*F**67*MB**22***VS*8~
    HSD*VS*6***29~
    EB*D**AD*MB***200~
    DTP*292*RD8*20170101-20171231~
    MSG*USED AMOUNT~
    EB*D**AE*MB***0~
    DTP*292*RD8*20170101-20171231~
    MSG*USED AMOUNT~
    EB*F**BF*MB**29***CA*72~
    MSG*Professional~
    EB*F**BF*MB**29***CA*72~
    MSG*Technical~
    EB*F**BG*MB*****99*0~
    MSG*Professional~
    EB*F**BG*MB*****99*0~
    MSG*Technical~
    EB*F**BG*MB*****99*15~
    MSG*Intensive Cardiac Rehabilitation – Professional~
    EB*F**BG*MB*****99*15~
    MSG*Intensive Cardiac Rehabilitation – Technical~
    EB*X**42***26~
    DTP*472*RD8*20161222-20170116~
    LS*2120~
    NM1*PR*2*ORGNAME*****PI*CONTR~
    NM1*1P*2******XX*1234567890~
    LE*2120~
    EB*X************HC|G0180~
    DTP*193*D8*20140101~
    EB*X************HC|G0179~
    DTP*193*D8*20140501~
    DTP*193*D8*20140301~
    EB*X**45*MA**26~
    DTP*292*RD8*20170201-20170301~
    MSG*Revocation Code – 1~
    LS*2120~
    NM1*1P*2******XX*1234567890~
    LE*2120~
    EB*D**14*MB~
    DTP*356*D8*20110601~
    DTP*096*D8*20130105~
    EB*E**10***23***DB*3~
    HSD*FL*2***29~
    DTP*292*RD8*20170101-20171231~
    EB*R**88*OT~
    REF*18*S0000 999~
    DTP*292*D8*20130101~
    LS*2120~
    NM1*PRP*2*ORGNAME~
    N3*ADDRESSLINE1*ADDRESSLINE2~
    N4*CITY*ST*ZIPCODE~
    PER*IC**TE*AAABBBCCCC*UR*www.website.com~
    LE*2120~
    EB*R**30*IN~
    REF*18*H0000 999~
    DTP*290*D8*20090101~
    MSG*MCO Bill Option Code- C~
    LS*2120~
    NM1*PRP*2*ORGNAME~
    N3*ADDRESSLINE1*ADDRESSLINE2~
    N4*CITY*ST*ZIPCODE~
    PER*IC**TE*AAABBBCCCC*UR*www.website.com~
    LE*2120~
    EB*R**30*13~
    REF*IG*GROUPCOVERAGEPLANPOLICYNUMBER~
    DTP*290*RD8*20110601-20170601~
    LS*2120~
    NM1*PRP*2*ORGNAME~
    N3*ADDRESSLINE1*ADDRESSLINE2~
    N4*CITY*ST*ZIPCODE~
    LE*2120~
    SE*181*0001~
    GE*1*1~
    IEA*1*111111111~
    
  • Debug mode

    Hi,

    First off, thanks for this parser. Recently found out I needed to parse some EDI and this helped out, well, eventually. Being new to omniparser and EDI made the learning curve pretty much vertical.

    What didn't help was the nontrivial, nonstandard message I needed to parse (it comes with a 102-page manual). Only after giving up and moving to a different library, giving up again and moving to a JavaScript library, days of trial and error, and finally getting a vague grasp of EDI did I realize what I was doing wrong**. I came back to omniparser and managed to create a schema that could handle both test files I have.

    Anyway, what helped tremendously was adding these lines to print debug output:

    diff --git a/extensions/omniv21/fileformat/edi/seg.go b/extensions/omniv21/fileformat/edi/seg.go
    index cefc213..99a3833 100644
    --- a/extensions/omniv21/fileformat/edi/seg.go
    +++ b/extensions/omniv21/fileformat/edi/seg.go
    @@ -1,6 +1,8 @@
     package edi
    
     import (
    +       "fmt"
    +
            "github.com/jf-tech/go-corelib/maths"
     )
    
    @@ -95,8 +97,19 @@ func (d *segDecl) matchSegName(segName string) bool {
                    //    "...loop is optional, but if any segment in the loop is used, the first segment
                    //    within the loop becomes mandatory..."
                    //  - https://github.com/smooks/smooks-edi-cartridge/blob/54f97e89156114e13e1acd3b3c46fe9a4234918c/edi-sax/src/main/java/org/smooks/edi/edisax/model/internal/SegmentGroup.java#L68
    +               if len(d.Children) > 0 {
    +                       children := make([]string, len(d.Children))
    +                       for i, c := range d.Children {
    +                               children[i] = c.Name
    +                       }
    +                       fmt.Printf("group %s children %v\n", d.Name, children)
    +               }
                    return len(d.Children) > 0 && d.Children[0].matchSegName(segName)
            default:
    +               fmt.Printf("node %s found: %v\n", d.fqdn, d.Name == segName)
    +               if d.Name != segName {
    +                       fmt.Printf("unexpected node %s \n", segName)
    +               }
                    return d.Name == segName
            }
     }
    

    It helped me figure out the last known 'good state', what the parser saw, where I was, etc. I don't expect you to add that exact code, as it's pretty ugly/messy, but I'd like to suggest adding some kind of verbose mode.

    **I'm not sure, but I don't think any of the parsers out there handle EDI segment compression well. I was trying to strictly implement the specification I had, but had to loosen it up a bit.
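This assumes that "segment compression" here refers to the common practice of omitting trailing empty elements. Under that assumption, a strict reader that expects every declared element would reject segments such as `EB*6**30~` in the sample above. The following is a minimal sketch of a looser lookup. `elementAt` is a hypothetical helper, not part of omniparser, and `*`/`~` are assumed as the delimiters.

```go
package main

import (
	"fmt"
	"strings"
)

// elementAt returns the element at a 1-based index within one X12 segment.
// If the sender compressed the segment (dropped trailing empty elements), an
// index past the end is treated as absent and def is returned instead of an
// error. Hypothetical helper; delimiters '*' and '~' are assumed.
func elementAt(seg string, index int, def string) string {
	elems := strings.Split(strings.TrimSuffix(seg, "~"), "*")
	if index < 1 || index >= len(elems) {
		return def // element omitted by compression (or invalid index)
	}
	return elems[index] // present, possibly an explicitly empty ""
}

func main() {
	// "EB*6**30~" from the sample stops after EB03; EB04..EB08 were dropped.
	seg := "EB*6**30~"
	for i := 1; i <= 8; i++ {
		fmt.Printf("EB%02d=%q\n", i, elementAt(seg, i, "<absent>"))
	}
}
```

Keeping "absent" distinct from "present but empty" (EB02 here) lets a caller decide whether a fallback value should apply to both cases or only to compressed ones.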

Quick and simple parser for PFSense XML configuration files, good for auditing firewall rules

pfcfg-parser version 0.0.1 : 13 January 2022 A quick and simple parser for PFSense XML configuration files to generate a plain text file of the main c

Jan 13, 2022
Decode / encode XML to/from map[string]interface{} (or JSON); extract values with dot-notation paths and wildcards. Replaces x2j and j2x packages.

mxj - to/from maps, XML and JSON Decode/encode XML to/from map[string]interface{} (or JSON) values, and extract/modify values from maps by key or key-

Aug 7, 2022
Convert xml and json to go struct

xj2go The goal is to convert xml or json file to go struct file. Usage Download and install it: $ go get -u -v github.com/wk30/xj2go/cmd/... $ xj [-t

Jul 21, 2022
wikipedia-jsonl is a CLI that converts Wikipedia dump XML to JSON Lines format.

wikipedia-jsonl wikipedia-jsonl is a CLI that converts Wikipedia dump XML to JSON Lines format. How to use At first, download the XML dump from Wikime

Feb 13, 2022
This package provides Go (golang) types and helper functions to do some basic but useful things with mxGraph diagrams in XML, which is most famously used by app.diagrams.net, the new name of draw.io.

Go Draw - Golang MX This package provides types and helper functions to do some basic but useful things with mxGraph diagrams in XML, which is most fa

Nov 30, 2021
A simple json parser built using golang

jsonparser A simple json parser built using golang Installation: go get -u githu

Dec 29, 2021
xmlquery is Golang XPath package for XML query.

xmlquery Overview xmlquery is an XPath query package for XML documents, allowing you to extract data or evaluate from XML documents with an XPath expr

Aug 1, 2022
parse and generate XML easily in go

etree The etree package is a lightweight, pure go package that expresses XML in the form of an element tree. Its design was inspired by the Python Ele

Aug 7, 2022
'go test' runner with output optimized for humans, JUnit XML for CI integration, and a summary of the test results.

gotestsum gotestsum runs tests using go test --json, prints formatted test output, and a summary of the test run. It is designed to work well for both

Aug 8, 2022
Go XML sitemap and sitemap index generator

Install go get github.com/turk/go-sitemap Example for sitemapindex func () main(c *gin.Context) { s := sitemap.NewSitemapIndex(c.Writer, true)

Jun 29, 2022
Sqly - An easy-to-use extension for sqlx, based on XML files and named query/exec

sqly An easy-to-use extension for sqlx, based on XML files and named query/exec t

Jun 12, 2022
Compliance policy extraction: xlsx (tracking file) -> xml (AlgoSec format)

go_policyExtractor Compliance policy extraction: xlsx (tracking file) -> xml (AlgoSec format). The following program is based on the headings

Nov 4, 2021
axmlfmt is an opinionated formatter for Android XML resources

axmlfmt axmlfmt is an opinionated formatter for Android XML resources. It takes XML that looks like <?xml version="1.0" encoding="utf-8"?> <LinearLayo

May 14, 2022
🧑‍💻 Go XML generator without Structs™

exml 🧑‍💻 Go XML generator without Structs™ Package exml allows XML documents to be generated without the usage of structs or maps. It is not intende

May 16, 2022
A fast, easy-to-use and dependency-free custom mapping from .csv data into Golang structs

csvparser This package provides a fast and easy-to-use custom mapping from .csv data into Golang structs. Index Pre-requisites Installation Examples C

May 10, 2022
Your CSV pocket-knife (golang)

csvutil - Your CSV pocket-knife (golang) #WARNING I would advise against using this package. It was a language learning exercise from a time before "e

Feb 7, 2022
:book: A Golang library for text processing, including tokenization, part-of-speech tagging, and named-entity extraction.

prose prose is a natural language processing library (English only, at the moment) in pure Go. It supports tokenization, segmentation, part-of-speech

Aug 1, 2022
csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.

csvplus Package csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream processing operations, indices and joins. The

Apr 9, 2022
A general purpose application and library for aligning text.

align A general purpose application that aligns text The focus of this application is to provide a fast, efficient, and useful tool for aligning text.

Jul 31, 2022