
Specify PICA serialization XML format #9

@nichtich


Well, you created yet another PICA+ serialization format, so I would like to add its documentation to http://format.gbv.de/pica and support it in PICA::Data (see gbv/PICA-Data#83).

As far as I understand the script, PICA+ records are first transformed to XML with scripts/pica2xml.pl. Examples of this XML format can be found in scripts/test and in test. As far as I could analyze it, the format consists of:

  • root element collection with (optional?) attribute count
    • repeatable element record
      • element header with mandatory attribute status, having one of the values deleted or upsert
        • element identifier with the PPN
      • element metadata
        • repeatable element datafield with attributes tag, fulltag (mandatory) and occurrence (optional)
          • element subfield with mandatory attribute code
        • repeatable optional element item with mandatory attribute epn
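To check the structure listed above against real files, a minimal reader could look like the following sketch. It uses only Python's standard `xml.etree.ElementTree`. The sample document is made up, and the assumption that `item` wraps the level 2 `datafield` elements is mine; the list above does not state what `item` contains.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the first variant (header/identifier).
# Values are invented; nesting of datafield inside item is an assumption.
SAMPLE = """<collection count="1">
  <record>
    <header status="upsert">
      <identifier>123456789</identifier>
    </header>
    <metadata>
      <datafield tag="021A" fulltag="021A">
        <subfield code="a">Ein Titel</subfield>
      </datafield>
      <item epn="987654321">
        <datafield tag="209A" fulltag="209A/01" occurrence="01">
          <subfield code="a">Signatur</subfield>
        </datafield>
      </item>
    </metadata>
  </record>
</collection>"""


def _field(df, epn):
    # One datafield as a dict; epn is None for fields outside an item.
    return {
        "fulltag": df.get("fulltag"),
        "epn": epn,
        "subfields": [(sf.get("code"), sf.text or "") for sf in df.findall("subfield")],
    }


def parse_records(xml_text):
    """Return one dict per <record> of the first variant."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall("record"):
        header = rec.find("header")
        metadata = rec.find("metadata")
        # Direct datafields first, then fields of each item; the original
        # interleaving is not preserved if both are mixed.
        fields = [_field(df, None) for df in metadata.findall("datafield")]
        for item in metadata.findall("item"):
            fields += [_field(df, item.get("epn")) for df in item.findall("datafield")]
        records.append({
            "status": header.get("status"),
            "ppn": header.findtext("identifier"),
            "fields": fields,
        })
    return records
```

Note that flattening direct fields and item fields into one list is exactly where the question about multiple level 1 records and mixed `datafield`/`item` elements matters.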

Some files use a slightly different form:

  • root element collection with (optional?) attribute count
    • repeatable element record
      • element status having one of the values deleted or upsert
      • element hrid with the PPN
      • element metadata
        • repeatable element datafield with attributes tag, fulltag (mandatory) and occurrence (optional)
          • element subfield with mandatory attribute code
        • repeatable optional element item with mandatory attribute epn
    • optional (?) element rawrecord with full record (syntax of this is another issue)
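As far as the lists above show, the two variants differ only in how status and PPN are encoded. The following sketch, which is not part of the existing scripts, reads both forms and rewrites the second into the first. The function names are hypothetical. `metadata` and any `rawrecord` element are left untouched.

```python
import xml.etree.ElementTree as ET


def record_status_and_ppn(rec):
    """Return (status, ppn) for a <record> element in either variant."""
    header = rec.find("header")
    if header is not None:
        # Variant 1: <header status="..."><identifier>PPN</identifier></header>
        status, ppn = header.get("status"), header.findtext("identifier")
    else:
        # Variant 2: <status>...</status> and <hrid>PPN</hrid> as direct children
        status, ppn = rec.findtext("status"), rec.findtext("hrid")
    if status not in ("deleted", "upsert"):
        raise ValueError(f"unexpected status: {status!r}")
    return status, (ppn or "").strip()


def to_first_variant(rec):
    """Rewrite a variant-2 <record> in place into the header/identifier form."""
    status, ppn = record_status_and_ppn(rec)
    for name in ("status", "hrid"):
        el = rec.find(name)
        if el is not None:
            rec.remove(el)
    if rec.find("header") is None:
        header = ET.Element("header", status=status)
        ET.SubElement(header, "identifier").text = ppn
        rec.insert(0, header)
    return rec
```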

Questions:

  • Why not use PPXML or an extension of it? (Well, I guess it's too late now.)
  • Why two variants? Could they at least be consolidated?
  • What happens when a record contains multiple level 1 records? Can datafield and item be mixed, or is the format limited to one ILN?
  • Why are x-occurrences not included in fulltag (e.g. "209Ax00/01" for field 209Ax/01 with $x=00)? For some fields on level 2, subfield $x is crucial to distinguish the meaning of the field; see the formal specification at https://format.gbv.de/schema/avram/specification#field-identifier
  • Last but not least: what would be a proper name for the format? How about PICA Import XML (PIXML)?
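The x-occurrence question concerns building `fulltag` values like "209Ax00/01". The following is a sketch of a hypothetical helper that follows the pattern proposed in that question. The tag check assumes the usual PICA+ tag shape (digit 0 to 2, two digits, then an uppercase letter or @). It only illustrates the proposal and is not what scripts/pica2xml.pl currently does.

```python
import re


def fulltag_with_x(tag, occurrence=None, x=None):
    """Build a field identifier including the x-occurrence (hypothetical helper).

    Follows the proposed pattern: tag, then "x" + value of $x, then "/" + occurrence.
    """
    # Assumed PICA+ tag shape, e.g. "021A" or "209A".
    if not re.fullmatch(r"[012][0-9][0-9][A-Z@]", tag):
        raise ValueError(f"invalid tag: {tag!r}")
    ident = tag
    if x is not None:
        ident += "x" + x
    if occurrence is not None:
        ident += "/" + occurrence
    return ident
```

For example, `fulltag_with_x("209A", "01", "00")` yields "209Ax00/01", while fields without $x keep their current form such as "209A/01".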
