Export routine (MARC) for Archives Canada
Category: - | Estimated time: 40.00 hours
This is required for any provincial network using ICA-AtoM. Currently the "top level" descriptions are submitted to Archives Canada: fonds-level descriptions, or series-level descriptions for government records that use the series system (e.g. Ontario and Manitoba).
Legacy categories: Import/Export
#1 Updated by Tim Hutchinson over 10 years ago
David and I discussed this in December 2010:
E-mail 10 Dec 2010, Tim to David:
I wanted to touch base about some work I'm doing on a batch export routine to send provincial data to Archives Canada / CAIN. Since this will be potentially useful for other provinces using ica-atom for their provincial networks, I want to make sure anything I come up with can be integrated/adapted easily enough, and that I'm not going off on the wrong path...
Currently, Archives Canada requires traditional MARC 21 (as opposed to MARC XML); and it's a complete file every time (no incremental updates). We are currently able to provide fonds-level descriptions or series-level descriptions where the series is the highest level.
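For readers unfamiliar with "traditional" MARC 21 (the ISO 2709 transmission format), here is a minimal Python sketch of how one record is laid out: a 24-character leader, a directory of 12-character entries, then the field data. The function name `build_marc21_record` and the sample leader codes (record type `p`, bibliographic level `c`, archival control `a`, Unicode) are illustrative assumptions. They are not taken from the CAIN tagging guidelines, and control fields (001-009) are not handled.

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
SUBFIELD_DELIMITER = b"\x1f"


def build_marc21_record(fields):
    """Serialize data fields into one ISO 2709 (traditional MARC 21) record.

    fields: list of (tag, indicators, [(subfield_code, value), ...]).
    Hypothetical helper for illustration; handles data fields only.
    """
    directory = b""
    data = b""
    for tag, indicators, subfields in fields:
        body = indicators.encode("ascii")
        for code, value in subfields:
            body += SUBFIELD_DELIMITER + code.encode("ascii") + value.encode("utf-8")
        body += FIELD_TERMINATOR
        # Directory entry: 3-char tag, 4-digit field length, 5-digit start offset.
        directory += f"{tag:0>3}{len(body):04d}{len(data):05d}".encode("ascii")
        data += body

    # Base address = leader (24) + directory + the directory's field terminator.
    base_address = 24 + len(directory) + 1
    record_length = base_address + len(data) + 1  # +1 for the record terminator

    # Leader: length, status/type/level/control/encoding, indicator and
    # subfield code counts ("22"), base address, three blanks, entry map "4500".
    # The codes "npcaa" are assumed sample values.
    leader = f"{record_length:05d}npcaa22{base_address:05d}   4500".encode("ascii")

    return leader + directory + FIELD_TERMINATOR + data + RECORD_TERMINATOR


sample = build_marc21_record([("245", "10", [("a", "Example fonds")])])
```

Because every record carries its own length and terminator, a "complete file" is just the records concatenated end to end, so an export can be appended to one record at a time.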
What I've done so far in terms of ica-atom programming is extend the sfRadPlugin module to allow export as marc21, e.g.:
(view the source to see the correct formatting; but not all the fields are there yet)
I had to add an empty file apps/qubit/templates/layout.marc21.php to make this work, so I may not have done that correctly.
So my first question is: should this in fact be a separate plugin? At the very least I was thinking the action should be called something like cainmarc rather than marc21, so that it's clear that this is specifically for CAIN, i.e. following those tagging guidelines.
I then tried to loop through the database to generate a consolidated export file. Not surprisingly, I ran out of memory. While it might be nice to have this functionality built into ica-atom, since an export is required fairly regularly (we currently transfer a file, via crontab, once a month), I'm thinking that depending on a large amount of memory/execution time is best to be avoided. And there is not really a need to preview this file via the interface; it just needs to be ftp'd to LAC periodically. Instead, I turned to my new favourite tool, curl...
To generate the consolidated export file, I'm using a perl script to query the database and loop through top-level descriptions (where parent_id=1), and then do a curl call to the relevant url. This routine seems to work, but I'm wondering if you have any thoughts about a better approach, especially for longer-term integration/flexibility.
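The loop described above can be sketched roughly as follows, in Python rather than perl. The table name `information_object`, the `url_for` callback, and the use of `sqlite3` as a stand-in for the real database driver are all assumptions made for illustration. Only the `parent_id = 1` filter comes from the description above. Each record is written as soon as it is fetched, so memory use stays flat regardless of how many descriptions there are.

```python
import sqlite3
from urllib.request import urlopen


def _http_get(url):
    """Fetch one URL and return the response body as bytes."""
    with urlopen(url) as response:
        return response.read()


def top_level_ids(conn, root_id=1):
    """Yield ids of descriptions directly under the root (parent_id = 1).

    The table name is an assumption for this sketch.
    """
    cursor = conn.execute(
        "SELECT id FROM information_object WHERE parent_id = ? ORDER BY id",
        (root_id,),
    )
    for (object_id,) in cursor:
        yield object_id


def export_all(conn, url_for, out, fetch=_http_get):
    """Fetch each top-level description's export and append it to `out`.

    url_for: hypothetical callback mapping an id to its export URL.
    out: a binary file-like object opened for writing.
    Returns the number of records written.
    """
    count = 0
    for object_id in top_level_ids(conn):
        out.write(fetch(url_for(object_id)))
        count += 1
    return count
```

Passing `fetch` as a parameter keeps the HTTP step swappable, which makes it straightforward to replace the curl-style call with direct access to the application later.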
#2 Updated by Tim Hutchinson over 10 years ago
Hutchinson, Timothy added a comment - 10/Feb/11 2:05 PM
Response from David, 15 Dec 2010:
Yes, it would be best to use a separate plugin for MARC, as we are doing for the Dublin Core (plugins/sfDcPlugin) and EAD (plugins/sfEadPlugin) templates, and for our other metadata-standards-based templates. EAD is probably the closest analogue to MARC 21 in that it is a text-based representation of an archival description, though the content is very different.
I don't know MARC 21 at all well, but my intuition is that it would be best to use one "sfMarcPlugin" for all MARC implementations (including XML) and then use a different template file (e.g. plugins/sfMarcPlugin/module/template/cain.marc21.php) to differentiate between the different standards. The sfDcPlugin provides one example of how to handle displaying the same standard (DC) in two different contexts (HTML and XML).
As for your CURL implementation and how to make it more integrated with the system, I would use the symfony task system. Unfortunately the symfony documentation is not up-to-date with symfony 1.4, but most of the documentation for 1.2 is still relevant. We use tasks for migrating data (lib/task/migrate) as well as the built-in symfony tasks for data-load from a yaml file, building table schema from the config files, etc. (Note: many of the "generate" tasks and form builders do NOT work with ICA-AtoM due to changes we've made to the framework.) The advantages I see to using the symfony task interface are:
- Avoid web application overhead while allowing direct access to the ORM and other assets
- One access/failure point as opposed to using PERL + CURL + PHP
- No problems with client/server timeouts that may occur with the HTTP transport, so scripts can run for many hours (we've had data-loads run for days)
- A robust and simple interface for run-time configuration (e.g. --turn-on-function)
- Can write an export file directly or through stdout
- Is CLI, so it can be called via crontab
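The shape of such a command-line export can be sketched in Python. This is not the symfony task API, only a language-neutral illustration of the points listed above: run-time options, output to a file or to stdout, and no web request in the loop. The option names `--output` and `--limit` and the `iter_records` stub are hypothetical.

```python
import argparse
import sys


def iter_records():
    """Hypothetical stand-in for walking the ORM; yields one record at a time."""
    yield b"record-1\x1d"
    yield b"record-2\x1d"


def main(argv=None, records=iter_records):
    """Parse options, stream records to the chosen output, return the count."""
    parser = argparse.ArgumentParser(description="Batch export sketch")
    parser.add_argument("--output", "-o", default="-",
                        help="output file path, or '-' for stdout")
    parser.add_argument("--limit", type=int, default=None,
                        help="stop after this many records")
    args = parser.parse_args(argv)

    to_file = args.output != "-"
    out = open(args.output, "wb") if to_file else sys.stdout.buffer
    written = 0
    try:
        for record in records():
            if args.limit is not None and written >= args.limit:
                break
            out.write(record)  # written immediately, never accumulated
            written += 1
    finally:
        if to_file:
            out.close()
    return written
```

Calling `main(["--output", "export.mrc"])` writes the file directly, while `main([])` streams to stdout so the result can be piped or redirected from a cron job.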
The main hurdle with using tasks for your export is that you may still have memory problems due to the way the Propel ORM works (specifically its use of circular references) and how PHP does garbage collection. However, version 5.3 of PHP has an "experimental" new garbage collector that is supposed to clean up objects with circular references. We haven't had a chance to try out the new garbage collector yet, but you may want to give it a try.
#4 Updated by Tim Hutchinson about 10 years ago
#18 Updated by Dan Gillean almost 7 years ago
- Target version deleted (
Note: Now that ArchivesCanada will be using AtoM, the urgency to create a MARC21 export routine for AtoM is significantly less. If anything, I might recommend that we create a MARCXML one rather than MARC21. Removing milestone for now. As this did not happen as part of AC development, I think that it is a major enough feature that we would likely need development sponsorship to be able to undertake it.