Task #10250

Write METS metadata to database

Added by José Raddaoui Marín almost 4 years ago. Updated almost 3 years ago.

Status: Verified
Start date: 08/31/2016
Priority: Medium
Due date:
Assignee: -
% Done: 0%
Category: Performance / scalability
Target version: Release 2.4.0
Google Code Legacy ID:
Tested version:
Sponsored: No
Requires documentation: Yes

Description

To avoid keeping the METS files from DIP uploads in the AtoM folder and parsing them each time the search index is populated, AtoM will parse the METS file once during the DIP upload process and save the data to the database. These changes will include a task that parses the metadata from the METS files of already uploaded DIPs and writes it to the database.
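A minimal Python sketch of the parse-once approach described above. It is not AtoM's implementation (AtoM is written in PHP). The PREMIS v2 namespace, the simplified `premis_object` columns, and the helper names `parse_premis_objects`, `store_premis_objects` and `load_for_indexing` are all assumptions for illustration. The METS file is parsed a single time at upload, the extracted values are stored, and indexing reads them back from the database instead of re-parsing the file.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Assumption: PREMIS v2 namespace, as used in Archivematica METS files of that era.
NS = {
    "mets": "http://www.loc.gov/METS/",
    "premis": "info:lc/xmlns/premis-v2",
}

# Hypothetical, heavily simplified table; the real AtoM schema differs.
SCHEMA = """
CREATE TABLE IF NOT EXISTS premis_object (
    information_object_id INTEGER,
    filename TEXT,
    size INTEGER,
    format_name TEXT
)
"""


def parse_premis_objects(mets_xml):
    """Parse the METS document once (at DIP upload) and extract a few PREMIS fields."""
    root = ET.fromstring(mets_xml)
    objects = []
    for obj in root.iterfind(".//premis:object", NS):
        objects.append({
            "filename": obj.findtext("premis:originalName", default="", namespaces=NS),
            "size": int(obj.findtext(".//premis:size", default="0", namespaces=NS)),
            "format": obj.findtext(".//premis:formatName", default="", namespaces=NS),
        })
    return objects


def store_premis_objects(conn, io_id, objects):
    """Persist the parsed values so the METS file no longer has to be kept around."""
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO premis_object VALUES (?, ?, ?, ?)",
        [(io_id, o["filename"], o["size"], o["format"]) for o in objects],
    )
    conn.commit()


def load_for_indexing(conn, io_id):
    """At index time, read from the database instead of re-parsing the METS file."""
    rows = conn.execute(
        "SELECT filename, size, format_name FROM premis_object"
        " WHERE information_object_id = ?",
        (io_id,),
    ).fetchall()
    return [{"filename": f, "size": s, "format": fmt} for f, s, fmt in rows]


# Tiny hand-written METS fragment for demonstration only.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:premis="info:lc/xmlns/premis-v2">
  <mets:amdSec><mets:techMD><mets:mdWrap><mets:xmlData>
    <premis:object>
      <premis:objectCharacteristics>
        <premis:size>1024</premis:size>
        <premis:format><premis:formatDesignation>
          <premis:formatName>TIFF</premis:formatName>
        </premis:formatDesignation></premis:format>
      </premis:objectCharacteristics>
      <premis:originalName>objects/photo.tif</premis:originalName>
    </premis:object>
  </mets:xmlData></mets:mdWrap></mets:techMD></mets:amdSec>
</mets:mets>"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_premis_objects(conn, 42, parse_premis_objects(SAMPLE_METS))
    print(load_for_indexing(conn, 42))
```

The backfill task for already uploaded DIPs would, under the same assumptions, just loop over the stored METS files and call the first two functions for each one.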


Related issues

Related to Access to Memory (AtoM) - Bug #13381: Elasticsearch error adding [metsData] to AIP document New 07/08/2020

History

#1 Updated by José Raddaoui Marín almost 4 years ago

  • Status changed from In progress to Code Review
  • Assignee changed from José Raddaoui Marín to Nick Wilkinson

#2 Updated by Nick Wilkinson almost 4 years ago

  • Assignee changed from Nick Wilkinson to Mike Gale

#3 Updated by Nick Wilkinson almost 4 years ago

  • Assignee changed from Mike Gale to Jesús García Crespo

#4 Updated by Jesús García Crespo almost 4 years ago

  • Status changed from Code Review to Feedback
  • Assignee changed from Jesús García Crespo to José Raddaoui Marín

#5 Updated by José Raddaoui Marín almost 4 years ago

  • Status changed from Feedback to QA/Review
  • Assignee changed from José Raddaoui Marín to Dan Gillean
  • Requires documentation set to Yes

Merged in qa/2.4.x. I'll add some notes for testing and the documentation later.

#6 Updated by Dan Gillean almost 4 years ago

  • Status changed from QA/Review to Feedback
  • Assignee changed from Dan Gillean to José Raddaoui Marín

I wasn't testing this, but I think this work just borked my VM.

I was trying to load some test data into 2.4. First I purged, then ran the SQL upgrade task, cleared the cache (cc), restarted services, and populated the search index. Then I tried to import the same corpus of EAD XML data I've been using for testing for a while. The import went fine, but when I tried to re-index afterwards, the search index repopulation hit an error and aborted:

PHP Fatal error:  Call to undefined method arElasticSearchInformationObjectPdo::getMetsData() in /usr/share/nginx/atom/plugins/arElasticSearchPlugin/lib/model/arElasticSearchInformationObjectPdo.class.php on line 1290

Fatal error: Call to undefined method arElasticSearchInformationObjectPdo::getMetsData() in /usr/share/nginx/atom/plugins/arElasticSearchPlugin/lib/model/arElasticSearchInformationObjectPdo.class.php on line 1290

#7 Updated by José Raddaoui Marín almost 4 years ago

  • Status changed from Feedback to QA/Review
  • Assignee changed from José Raddaoui Marín to Dan Gillean

Sorry, my bad. It should be fixed now.

#8 Updated by Dan Gillean about 3 years ago

  • Assignee deleted (Dan Gillean)

#9 Updated by José Raddaoui Marín almost 3 years ago

  • Assignee set to Nick Wilkinson

Hi Nick, this is going to be hard to test as we don't display this metadata in any part of the AtoM GUI yet. I think it will be better if a developer reviews it.

#10 Updated by José Raddaoui Marín almost 3 years ago

Notes for testing:

- Create an AtoM stable/2.3.x instance with some information objects. The SWORD plugin and the AtoM worker must be enabled.
- Perform some DIP uploads to that instance (a script and a DIP to simulate the upload without Archivematica are in #11276, but more DIPs should be tested; the tester can ask me for some).
- Check the METS metadata in the ES index (query ES directly, or use ES head, Kibana or a similar ES plugin to browse the index).
- Check that the METS files are stored in the AtoM uploads directory.
- Upgrade that instance to qa/2.4.x. A new task must be run between the upgrade SQL and search populate tasks:

  - php symfony tools:upgrade-sql
  - php symfony tools:mets2db
  - php symfony search:populate

- Check that all the METS metadata is still in the ES index and in the database (premis_object table and IO properties with "premisData" scope).
- The METS files from the uploads directory could be removed if everything went well.
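The database half of the "still in the database" check above could be scripted roughly as follows. This is a sketch, not an AtoM tool: the real AtoM database is MySQL and its `premis_object` and `property` tables have more columns. Only `information_object_id`, `object_id` and `scope` are assumed here, and `find_incomplete`, `build_demo_db` and the demo rows are hypothetical. The query lists information objects that have PREMIS object rows but no property in the "premisData" scope.

```python
import sqlite3


def find_incomplete(conn):
    """Return IO ids with premis_object rows but no 'premisData'-scoped property."""
    # Plain correlated NOT EXISTS; the same SQL should also run on MySQL.
    return [row[0] for row in conn.execute("""
        SELECT DISTINCT p.information_object_id
        FROM premis_object AS p
        WHERE NOT EXISTS (
            SELECT 1 FROM property AS pr
            WHERE pr.object_id = p.information_object_id
              AND pr.scope = 'premisData'
        )
        ORDER BY p.information_object_id
    """)]


def build_demo_db():
    """Hypothetical minimal schema: IO 1 is complete, IO 2 is missing its properties."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE premis_object (id INTEGER PRIMARY KEY, information_object_id INTEGER);
        CREATE TABLE property (id INTEGER PRIMARY KEY, object_id INTEGER,
                               scope TEXT, name TEXT);
        INSERT INTO premis_object (information_object_id) VALUES (1), (2), (2);
        INSERT INTO property (object_id, scope, name) VALUES (1, 'premisData', 'formatName');
    """)
    return conn


if __name__ == "__main__":
    print(find_incomplete(build_demo_db()))
```

An empty result would suggest every IO with PREMIS objects also has its "premisData" properties; the ES side still needs to be checked separately.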

Let me know if you need more info.

#11 Updated by José Raddaoui Marín almost 3 years ago

Notes for documentation:

In the 2.4 upgrade docs we need to add a note/section for instances where DIPs were uploaded. In this section we could say that, if users are not sure about it, they can check the uploads directory: if there is an "aips" folder inside, they most likely uploaded DIPs to that instance. This section should go between the upgrade SQL and the search index rebuild tasks, and it must tell users to run the "php symfony tools:mets2db" task.
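The "aips" folder heuristic proposed above is simple enough to script. A minimal Python sketch; the function name `probably_uploaded_dips` is hypothetical, and so is the assumption that the uploads directory sits directly under the AtoM installation root (the root path in the usage line is the one from the error message earlier in this issue).

```python
from pathlib import Path


def probably_uploaded_dips(atom_root):
    """Heuristic only: an uploads/aips folder suggests DIPs were uploaded here."""
    # Assumption: the uploads directory lives directly under the AtoM root.
    return (Path(atom_root) / "uploads" / "aips").is_dir()


if __name__ == "__main__":
    if probably_uploaded_dips("/usr/share/nginx/atom"):
        print("DIP uploads found: run 'php symfony tools:mets2db' before search:populate")
```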

#12 Updated by José Raddaoui Marín almost 3 years ago

Actually, running the new task causes no problem even if no DIPs were uploaded, so we could just add it right after the "php symfony tools:upgrade-sql" step. Alternatively, we could try to include the new task inside the upgrade SQL task, but that would require some extra development.

#13 Updated by Nick Wilkinson almost 3 years ago

  • Status changed from QA/Review to Feedback
  • Assignee changed from Nick Wilkinson to Dan Gillean

Hi Dan, what are your thoughts on this (starting from Radda's comment #9)?

#14 Updated by Dan Gillean almost 3 years ago

  • Assignee changed from Dan Gillean to Nick Wilkinson

Yes, I'd love it if a dev could test this. It should be someone familiar with Archivematica as well as AtoM.

#15 Updated by Nick Wilkinson almost 3 years ago

  • Status changed from Feedback to QA/Review
  • Assignee changed from Nick Wilkinson to José Raddaoui Marín

Hi Radda, if you haven't already, can you run through the test procedure you outlined on this issue to make sure it behaves as expected? It would be helpful for Dan if you could take screenshots (and notes) along the way that you think would be helpful for documentation. Otherwise, let me know if you've already run through the test process.

#16 Updated by José Raddaoui Marín almost 3 years ago

  • Status changed from QA/Review to Verified
  • Requires documentation changed from Yes to No

In the end, METS files were not being saved in the uploads folder in previous public AtoM versions and their metadata was not being added to the ES index, so this task is not needed for upgrades to 2.4.x. I've removed it from the public qa/2.4.x branch, and no documentation is needed either.

I've tested that everything else is working in qa/2.4.x, where the METS metadata is being added to the database and the Elasticsearch index.

#17 Updated by Nick Wilkinson almost 3 years ago

  • Assignee deleted (José Raddaoui Marín)
  • Requires documentation changed from No to Yes

#18 Updated by David Juhasz 6 days ago

  • Related to Bug #13381: Elasticsearch error adding [metsData] to AIP document added
