Feature #3910

Export routine (MARC) for Archives Canada

Added by Tim Hutchinson over 10 years ago. Updated almost 6 years ago.

Status: New
Start date:
Priority: High
Due date:
Assignee: -
% Done: 0%
Category: -
Estimated time: 40.00 hours
Target version: -
Sponsored: No
Tested version:

Description

This is required for any provincial networks using ICA-AtoM. Currently only the "top level" descriptions are submitted to Archives Canada: fonds-level descriptions, or series-level descriptions for government records in provinces that use the series system, e.g. Ontario and Manitoba.

[g] Legacy categories: Import/Export

CAIN-marc21-template-REVISION2-2.doc (66 KB) Tim Hutchinson, 12/01/2012 04:20 AM


Related issues

Duplicated by Access to Memory (AtoM) - Bug #2631: MARC XML export Duplicate

History

#1 Updated by Tim Hutchinson over 10 years ago

David and I discussed this in December 2010:

E-mail 10 Dec 2010, Tim to David:

I wanted to touch base about some work I'm doing on a batch export routine to send provincial data to Archives Canada / CAIN. Since this will be potentially useful for other provinces using ica-atom for their provincial networks, I want to make sure anything I come up with can be integrated/adapted easily enough, and that I'm not going off on the wrong path...

Currently, Archives Canada requires traditional MARC 21 (as opposed to MARC XML); and it's a complete file every time (no incremental updates). We are currently able to provide fonds-level descriptions or series-level descriptions where the series is the highest level.
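For readers unfamiliar with the difference, "traditional" MARC 21 is the binary ISO 2709 layout: a 24-character leader, a directory of 12-character entries, then the field data, with control characters as delimiters. The following is a minimal sketch of that layout only. The function name and the leader codes (status, record type, bibliographic level and so on) are illustrative assumptions, not values taken from the CAIN tagging guidelines, and control fields (00X) are not handled.

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
SUBFIELD_DELIMITER = b"\x1f"


def build_marc21_record(fields):
    """Serialize data fields into one ISO 2709 (traditional MARC 21) record.

    `fields` is a list of (tag, indicators, [(subfield_code, value), ...]).
    Hypothetical helper for illustration; leader codes are assumed values.
    """
    directory = b""
    body = b""
    for tag, indicators, subfields in fields:
        data = indicators.encode("ascii")
        for code, value in subfields:
            data += SUBFIELD_DELIMITER + code.encode("ascii") + value.encode("utf-8")
        data += FIELD_TERMINATOR
        # Directory entry: tag (3), field length (4), starting offset (5).
        directory += f"{tag}{len(data):04d}{len(body):05d}".encode("ascii")
        body += data

    # Base address = leader (24) + directory + one field terminator.
    base_address = 24 + len(directory) + 1
    record_length = base_address + len(body) + 1
    # Assumed leader codes: n = new, p = mixed materials, c = collection,
    # a = UCS/Unicode; "22" and "4500" are fixed by the MARC 21 format.
    leader = f"{record_length:05d}npc a22{base_address:05d} a 4500".encode("ascii")
    assert len(leader) == 24
    return leader + directory + FIELD_TERMINATOR + body + RECORD_TERMINATOR


record = build_marc21_record([("245", "10", [("a", "Example fonds")])])
```

Because each record carries its own length and terminator, a "complete file" of the kind described above is simply these records concatenated one after another.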

What I've done so far in terms of ica-atom programming is extend the sfRadPlugin module to allow export as marc21, e.g.:
http://saindev.usask.ca/ica-atom-fonds/index.php/justice-and-webb-landscape-architects-fonds;rad?sf_format=marc21
(view the source to see the correct formatting; but not all the fields are there yet)
I had to add an empty file apps/qubit/templates/layout.marc21.php to make this work, so I may not have done that correctly.

So my first question is: should this in fact be a separate plugin? At the very least I was thinking the action should be called something like cainmarc rather than marc21, so that it's clear that this is specifically for CAIN, i.e. following those tagging guidelines.

I then tried to loop through the database to generate a consolidated export file. Not surprisingly, I ran out of memory. While it might be nice to have this functionality built into ica-atom, since an export is required fairly regularly (we currently transfer a file, via crontab, once a month), I'm thinking that depending on a large amount of memory/execution time is best to be avoided. And there is not really a need to preview this file via the interface; it just needs to be ftp'd to LAC periodically. Instead, I turned to my new favourite tool, curl...

To generate the consolidated export file, I'm using a perl script to query the database and loop through top-level descriptions (where parent_id=1), and then do a curl call to the relevant url. This routine seems to work, but I'm wondering if you have any thoughts about a better approach, especially for longer-term integration/flexibility.
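The loop described above might look roughly like the sketch below. It is written in Python rather than Perl and is not the actual script. The function names are hypothetical, the list of slugs is assumed to come from a separate database query for top-level descriptions, and the URL pattern is copied from the example link earlier in this message. Each response is written out as soon as it arrives, so only one record is held in memory at a time.

```python
from typing import BinaryIO, Callable, Iterable
from urllib.parse import quote
from urllib.request import urlopen


def export_url(base_url: str, slug: str) -> str:
    """Build the per-description export URL (pattern taken from the example link)."""
    return f"{base_url.rstrip('/')}/{quote(slug)};rad?sf_format=marc21"


def http_get(url: str) -> bytes:
    """Fetch one exported record over HTTP, as the curl call does."""
    with urlopen(url) as response:
        return response.read()


def write_consolidated(
    slugs: Iterable[str],
    out: BinaryIO,
    base_url: str,
    fetch: Callable[[str], bytes] = http_get,
) -> int:
    """Append each top-level description's export to `out`; return the count."""
    count = 0
    for slug in slugs:
        out.write(fetch(export_url(base_url, slug)))
        count += 1
    return count
```

A call such as `write_consolidated(slugs, open("export.mrc", "wb"), base_url)` would produce the single file to be transferred. The output file name here is a placeholder.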

#2 Updated by Tim Hutchinson over 10 years ago

Hutchinson, Timothy added a comment - 10/Feb/11 2:05 PM
Response from David, 15 Dec 2010:

Yes, it would be best to use a separate plugin for MARC, as we are doing for the Dublin Core (plugins/sfDcPlugin) and EAD (plugins/sfEadPlugin) templates, and for our other templates based on metadata standards. EAD is probably the closest analogue to MARC 21 in that it is a text-based representation of an archival description, though the content is very different.

I don't know MARC 21 at all well, but my intuition is that it would be best to use one "sfMarcPlugin" for all MARC implementations (including XML) and then use a different template file (e.g. plugins/sfMarcPlugin/module/template/cain.marc21.php) to differentiate between the different standards. The sfDcPlugin provides one example of how to handle displaying the same standard (DC) in two different contexts (HTML and XML).

As for your CURL implementation and how to make it more integrated with the system, I would use the symfony task system [1]. Unfortunately the symfony documentation is not up-to-date with symfony 1.4, but most of the documentation for 1.2 is still relevant. We use tasks for migrating data (lib/task/migrate) as well as the built in symfony tasks for data-load from a yaml file, building table schema from the config files, etc [2] (Note: many of the "generate" tasks and form builders do NOT work with ICA-AtoM due to changes we've made to the framework).

The advantages I see to using the symfony task interface are:
  • Avoid web application overhead while allowing direct access to the ORM and other assets
  • One access/failure point as opposed to using PERL + CURL + PHP
  • No problems with client/server timeouts that may occur with the HTTP transport, so scripts can run for many hours (we've had data-loads run for days)
  • A robust and simple interface for run-time configuration (e.g. --turn-on-function)
  • Can write an export file directly or through stdout
  • Is CLI, so it can be called via crontab

The main hurdle with using tasks for your export is that you may still have memory problems due to the way the Propel ORM works (specifically its use of circular references) and how PHP does garbage collection [3]. However, version 5.3 of PHP has an "experimental" new garbage collector [4] that is supposed to clean up objects with circular references. We haven't had a chance to try out the new garbage collector yet, but you may want to give it a try.

[1] => http://www.symfony-project.org/cookbook/1_2/en/cli
[2] => http://www.symfony-project.org/reference/1_4/en/16-Tasks
[3] => http://bugs.php.net/bug.php?id=33595
[4] => http://ca.php.net/gc_enable

#3 Updated by Tim Hutchinson over 10 years ago

And here (attached) is the current CAIN-MARC mapping.

There are some more technical details based on the current SK/MB export that we'll be able to provide.

#6 Updated by David Juhasz about 10 years ago

  • Priority changed from Medium to High
  • Target version set to Release 1.2

Planned for Release 1.2

[g] Labels added: Priority-High, Component-Import-Export, Milestone-Release-1.2
[g] Labels removed: Priority-Medium
[g] New owner: MJ Suhonos

#7 Updated by David Juhasz about 10 years ago

  • Priority set to Medium

[g] Labels added: Priority-Medium

#8 Updated by David Juhasz over 9 years ago

  • Target version set to Release 1.3

Roll over to Release 1.3

[g] Labels added: Milestone-Release-1.3

#9 Updated by MJ Suhonos over 9 years ago

  • Target version changed from Release 1.3 to Release 2.1.0

Roll over to ArchivesCanada 2.0 deployment.

[g] Labels added: Milestone-Release-2.0
[g] Labels removed: Milestone-Release-1.3

#10 Updated by MJ Suhonos over 9 years ago

  • Priority changed from Medium to Critical

[g] Labels added: Priority-Critical
[g] Labels removed: Priority-Medium

#11 Updated by Jesús García Crespo about 9 years ago

[g] New owner: David Juhasz

#12 Updated by David Juhasz about 9 years ago

Reassign to new account.

[g] New owner: David Juhasz

#13 Updated by Jesús García Crespo over 8 years ago

  • Priority changed from Critical to High

#14 Updated by David Juhasz over 8 years ago

  • Tracker changed from Bug to Feature

#15 Updated by David Juhasz over 8 years ago

  • Category set to Import/Export
  • Estimated time set to 40.00
  • Sponsored set to No

#16 Updated by David Juhasz over 8 years ago

  • Assignee changed from David Juhasz to Jesús García Crespo

#17 Updated by Dan Gillean almost 7 years ago

  • Target version changed from Release 2.1.0 to Release 2.2.0

#18 Updated by Dan Gillean almost 7 years ago

  • Target version deleted (Release 2.2.0)

Note: Now that ArchivesCanada will be using AtoM, the urgency to create a MARC21 export routine for AtoM is significantly less. If anything, I might recommend that we create a MARCXML one rather than MARC21. Removing milestone for now - as this did not happen as part of AC development, I think that it is a major enough feature that we would likely need development sponsorship to be able to undertake it.

#19 Updated by Dan Gillean about 6 years ago

  • Project changed from Access to Memory (AtoM) to AtoM Wishlist
  • Category deleted (Import/Export)

Moved to AtoM wishlist until sponsored for inclusion.

#20 Updated by Jesús García Crespo almost 6 years ago

  • Assignee deleted (Jesús García Crespo)
