Write METS metadata to database
Category: Performance / scalability
Target version: Release 2.4.0
To avoid keeping the METS files from DIP uploads in the AtoM folder and parsing them every time the search index is populated, AtoM will parse each METS file once, during the DIP upload process, and save the data to the database. These changes will include a task that parses the METS files of already uploaded DIPs and writes their metadata to the database.
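The parse-once approach can be illustrated with a minimal sketch. This is Python, not AtoM's PHP implementation, and it rests on several assumptions: PREMIS v2 namespaces, SQLite standing in for AtoM's MySQL database, and hypothetical helpers (`extract_premis_objects`, `store_objects`) and `premis_object` columns that may not match AtoM's real schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Namespace URIs for METS and PREMIS (PREMIS v2 is an assumption here).
NS = {
    "mets": "http://www.loc.gov/METS/",
    "premis": "info:lc/xmlns/premis-v2",
}

# Tiny hypothetical METS document with a single PREMIS object.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:premis="info:lc/xmlns/premis-v2">
  <mets:amdSec ID="amdSec_1">
    <mets:techMD ID="techMD_1">
      <mets:mdWrap MDTYPE="PREMIS:OBJECT">
        <mets:xmlData>
          <premis:object>
            <premis:objectIdentifier>
              <premis:objectIdentifierType>UUID</premis:objectIdentifierType>
              <premis:objectIdentifierValue>1234-abcd</premis:objectIdentifierValue>
            </premis:objectIdentifier>
            <premis:objectCharacteristics>
              <premis:size>2048</premis:size>
              <premis:format>
                <premis:formatDesignation>
                  <premis:formatName>PDF</premis:formatName>
                </premis:formatDesignation>
              </premis:format>
            </premis:objectCharacteristics>
          </premis:object>
        </mets:xmlData>
      </mets:mdWrap>
    </mets:techMD>
  </mets:amdSec>
</mets:mets>"""


def extract_premis_objects(mets_xml: str) -> list[dict]:
    """Parse the METS document once and return one dict per premis:object."""
    root = ET.fromstring(mets_xml)
    objects = []
    for obj in root.iter(f"{{{NS['premis']}}}object"):
        objects.append({
            "identifier": obj.findtext(
                "premis:objectIdentifier/premis:objectIdentifierValue", namespaces=NS),
            "size": obj.findtext(
                "premis:objectCharacteristics/premis:size", namespaces=NS),
            "format_name": obj.findtext(
                "premis:objectCharacteristics/premis:format/"
                "premis:formatDesignation/premis:formatName", namespaces=NS),
        })
    return objects


def store_objects(conn: sqlite3.Connection, information_object_id: int,
                  objects: list[dict]) -> None:
    """Write the extracted metadata to a hypothetical premis_object table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS premis_object ("
        "id INTEGER PRIMARY KEY, information_object_id INTEGER, "
        "identifier TEXT, size INTEGER, format_name TEXT)")
    conn.executemany(
        "INSERT INTO premis_object (information_object_id, identifier, size, format_name) "
        "VALUES (?, ?, ?, ?)",
        [(information_object_id, o["identifier"],
          int(o["size"]) if o["size"] else None, o["format_name"])
         for o in objects])
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_objects(conn, 1, extract_premis_objects(SAMPLE_METS))
    # Later indexing reads from the database instead of re-parsing the METS file.
    print(conn.execute("SELECT identifier, size, format_name FROM premis_object").fetchall())
```

The design point is that the XML is parsed a single time at upload; every later search index rebuild only needs a database query.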
#6 Updated by Dan Gillean over 5 years ago
- Status changed from QA/Review to Feedback
- Assignee changed from Dan Gillean to José Raddaoui Marín
I wasn't testing this, but I think this work just borked my VM.
I was trying to load some test data into 2.4. First I purged, then ran the SQL upgrade task, cleared the cache, restarted services, and populated the search index. Then I tried to import the same corpus of EAD XML data I've been using for testing for a while. The import went fine, but when I tried to re-index afterwards, the search index repopulation hit an error and aborted:
PHP Fatal error: Call to undefined method arElasticSearchInformationObjectPdo::getMetsData() in /usr/share/nginx/atom/plugins/arElasticSearchPlugin/lib/model/arElasticSearchInformationObjectPdo.class.php on line 1290
#10 Updated by José Raddaoui Marín almost 5 years ago
Notes for testing:
- Create an AtoM stable/2.3.x instance with some information objects. The SWORD plugin and the AtoM worker must be enabled.
- Perform some DIP uploads to that instance (a script and a DIP that simulate the upload without Archivematica are available in #11276, but more DIPs should be tested; the tester can ask me for some).
- Check the METS metadata in the Elasticsearch (ES) index (query ES directly, or browse the index with ES Head, Kibana, or a similar tool).
- Check that the METS files are stored in the AtoM uploads directory.
- Upgrade that instance to qa/2.4.x. A new task must be run between the SQL upgrade and search populate tasks:
- php symfony tools:upgrade-sql
- php symfony tools:mets2db
- php symfony search:populate
- Check that all the METS metadata is still in the ES index and in the database (the premis_object table and the information object properties with the "premisData" scope).
- If everything went well, the METS files can be removed from the uploads directory.
Let me know if you need more info.
#11 Updated by José Raddaoui Marín almost 5 years ago
Notes for documentation:
In the 2.4 upgrade docs we need to add a note/section for instances where DIPs were uploaded. In this section we could say that users who are not sure can check the uploads directory: if it contains an "aips" folder, DIPs were most likely uploaded to that instance. This section should sit between the SQL upgrade and search index rebuild steps, and it must tell users to run the "php symfony tools:mets2db" task.
#12 Updated by José Raddaoui Marín almost 5 years ago
Actually, running the new task on an instance where no DIPs were uploaded causes no problems, so we could simply add it right after the "php symfony tools:upgrade-sql" step. Alternatively, we could include the new task in the SQL upgrade itself, but that would require some extra development.
#15 Updated by Nick Wilkinson almost 5 years ago
- Status changed from Feedback to QA/Review
- Assignee changed from Nick Wilkinson to José Raddaoui Marín
Hi Radda, if you haven't already, can you run through the test procedure you outlined on this issue to make sure it behaves as expected? It would be helpful for Dan if you could take screenshots (and notes) along the way that you think would be helpful for documentation. Otherwise, let me know if you've already run through the test process.
#16 Updated by José Raddaoui Marín almost 5 years ago
- Status changed from QA/Review to Verified
- Requires documentation changed from Yes to No
It turns out that previous public AtoM versions did not save METS files in the uploads folder or add their metadata to the ES index, so this task is not needed for upgrades to 2.4.x. I've removed it from the public qa/2.4.x branch, and no documentation is needed either.
I've tested that everything else is working in qa/2.4.x, where the METS metadata is being added to the database and the Elasticsearch index.