Merge imported files to existing records
Category: - | Estimated time: 100.00 hours
This is a new feature that would take an XML import file and check whether the
record(s) it contains are already in the database. If so, it assumes that the
XML import contains updates/changes that must be merged into the existing
record(s) rather than saved as new record(s). This new
feature is related to the ability to save previous versions of a record
(see #2861). The newly merged record would be saved as the next
version of the record so that the import merging can be reviewed by an
[g] Legacy categories: Import/Export
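To make the "saved as the next version" idea concrete, here is a minimal sketch in Python. It is not ICA-AtoM code: the function name `merge_as_next_version`, the list-of-dicts version history, and the rule that imported values win on conflict are all assumptions made for illustration.

```python
def merge_as_next_version(versions, imported_fields):
    """Return a new version list with the merged record appended.

    `versions` is a list of dicts, oldest first (hypothetical storage model).
    Earlier versions are never overwritten, so the merge can be reviewed
    against the previous version.
    """
    latest = versions[-1] if versions else {}
    # Assumption: values from the import replace existing values;
    # fields missing from the import are carried over unchanged.
    merged = {**latest, **imported_fields}
    return versions + [merged]


history = [{"identifier": "F 123", "title": "Old title", "extent": "2 m"}]
history = merge_as_next_version(
    history, {"identifier": "F 123", "title": "New title"}
)
```

After the call, `history[0]` still holds the record as it was before the import, and `history[-1]` holds the merged result.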
#1 Updated by David Juhasz almost 11 years ago
- Priority changed from High to Low
How can you determine whether a record is already in the database? Is there some unique field that can guarantee this, or would the user need to manually compare values?
[g] Labels added: Priority-Low, Milestone-Release-Post-1.2
[g] Labels removed: Priority-High, Milestone-Release-1.1
#3 Updated by Anonymous over 10 years ago
Two example use cases we need at Ghent University:
I have an archival description and want to add items. I export the EAD file. This file contains identifiers to reingest the EAD and update the record.
I have an archival description with a unique identifier. A programmer can prepare a batch of EAD files using the unique identifier. The programmer issues a batch upload. The items are added to the archival description using the unique identifier.
Both use cases require no manual input by a user.
#6 Updated by Anonymous about 9 years ago
- Target version changed from Release 1.3 to Release 2.1.0
In EAD we will merge if we find that the <archdesc> id matches a description identifier. Otherwise we will create a new record.
[CCAD-34: Routine maintenance of imports]
[g] Labels added: Milestone-Release-2.0
[g] Labels removed: Component-Versioning, Milestone-Release-1.3
[g] New owner: David Juhasz
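The match-or-create rule proposed in update #6 could be sketched roughly as follows. This is an illustration only: `import_description` and the in-memory `store` dictionary keyed by identifier are hypothetical stand-ins for the real database lookup, and the sketch assumes the identifier has already been read from the import file.

```python
def import_description(store, imported):
    """Merge `imported` into an existing record, or create a new one.

    `store` maps description identifiers to record dicts (a hypothetical
    stand-in for the database). Returns "merged" or "created".
    """
    key = imported["identifier"]
    if key in store:
        # Identifier matches an existing description: update it in place.
        store[key].update(imported)
        return "merged"
    # No match: treat the import as a new record.
    store[key] = dict(imported)
    return "created"


store = {"F 123": {"identifier": "F 123", "title": "Old title"}}
first = import_description(store, {"identifier": "F 123", "title": "New title"})
second = import_description(store, {"identifier": "F 999", "title": "Other fonds"})
```

Here `first` is `"merged"` and `second` is `"created"`. No manual input is needed, which matches the Ghent University use cases, but the result is only as reliable as the identifier used for matching.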
#7 Updated by Tim Hutchinson about 9 years ago
I'm glad to see this is being addressed but I'd like to suggest a refinement:
- on the EAD side, I'd use <eadid identifier="xxx">. <archdesc id="xxx"> should really map to the main identifier. This field (identifier) is also not used consistently at least in Canadian practice, since it's not in RAD - so it's not necessarily stable, if it's populated at all.
- ideally, I think the field used in ICA-AtoM shouldn't be editable (e.g. a new field sourceID or something). But if this is used for periodic feeds (e.g. from provincial networks into Archives Canada), maybe that's not a critical issue.
This way, <eadid identifier=xxx> can be used for a unique/system identifier if there is one (or an appropriate combination of fields generated by the contributing system); and then <archdesc id=xxx> can be used for the information actually needed for the end user, which could be edited if necessary.
#8 Updated by Tim Hutchinson about 9 years ago
I just realized my explanation above is partly inaccurate - I was mixing up <archdesc id="xxx"> and <did><unitid>. <unitid> is the element that maps to the main identifier. However, I would still argue that <eadid identifier="xxx"> is a better location for the unique identifier to be used for matching and merging. <archdesc id="xxx"> is in fact intended to be used as a linking attribute, and technically it only needs to be unique within a given EAD instance.
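The three candidate locations discussed above can be read from an EAD file as in the sketch below, which uses Python's standard `xml.etree.ElementTree`. The sample document, the function name `extract_identifiers`, and the returned key names are invented for illustration. The sketch also assumes a DTD-style EAD 2002 file without an XML namespace.

```python
import xml.etree.ElementTree as ET

# Made-up sample for illustration; not taken from a real finding aid.
SAMPLE_EAD = """<ead>
  <eadheader>
    <eadid identifier="sys-0001">example.xml</eadid>
  </eadheader>
  <archdesc id="a1" level="fonds">
    <did>
      <unitid>F 123</unitid>
    </did>
  </archdesc>
</ead>"""


def extract_identifiers(ead_xml):
    """Return the three identifiers discussed in this thread.

    match_key:          <eadid identifier="...">, proposed for matching/merging
    display_identifier: <did><unitid>, which maps to the main identifier
    link_id:            <archdesc id="...">, a linking attribute that only
                        needs to be unique within one EAD instance
    """
    root = ET.fromstring(ead_xml)
    eadid = root.find("./eadheader/eadid")
    archdesc = root.find("./archdesc")
    unitid = root.find("./archdesc/did/unitid")
    return {
        "match_key": eadid.get("identifier") if eadid is not None else None,
        "display_identifier": (
            unitid.text.strip() if unitid is not None and unitid.text else None
        ),
        "link_id": archdesc.get("id") if archdesc is not None else None,
    }


ids = extract_identifiers(SAMPLE_EAD)
```

Keeping `match_key` separate from `display_identifier` means the value shown to end users can be edited without breaking later merges.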
#13 Updated by Tim Hutchinson over 7 years ago
I was just noticing that the opening description links to the wrong issue, i.e. it uses the qubit number in the current system. The correct issue (re saving previous versions of records) is #2861.
That said, I don't think #2861 is necessarily a prerequisite for the current issue. For any import, you're going to need to test carefully before doing it in production, so I'm not sure having the ability to revert to an earlier record is a deal breaker.
It would be great to see this one developed, but clearly it's not simple :)