Bug #11197

XML representation caching for EAD: drafts are getting cached

Added by Mike Cantelon about 3 years ago. Updated almost 3 years ago.

Status: Verified
Start date: 05/25/2017
Priority: Medium
Due date:
Assignee: Mike Cantelon
% Done: 0%
Category: XML import / export
Target version: Release 2.4.0
Google Code Legacy ID:
Tested version:
Sponsored: No
Requires documentation:

Description

We've added functionality to cache EAD XML upon saving a description, but the cached EAD XML includes drafts.

OAI relies on the EAD XML cache files. At the very least, a "public" EAD XML cache file should be created for unauthenticated OAI requests for EAD. For authenticated OAI requests, we can either use the existing EAD XML cache files, which include drafts, or generate EAD dynamically. Dynamic generation could have performance issues, but since users can have different permissions, the rendered OAI output could vary per user.
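A minimal sketch of the cache selection described above, in Python. The cache directories, the `OaiRequest` type and `choose_ead_source()` are hypothetical illustrations, not AtoM's actual code (AtoM itself is PHP); the sketch only encodes the proposed rule: unauthenticated requests get the public cache, authenticated ones get either the draft-inclusive cache or dynamic rendering.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Union

# Hypothetical cache layout, one directory per variant.
PUBLIC_CACHE = PurePosixPath("cache/ead/public")
DRAFTS_CACHE = PurePosixPath("cache/ead/with-drafts")

# Sentinel meaning "build the EAD for this request instead of reading a cache file".
DYNAMIC = "render-dynamically"


@dataclass(frozen=True)
class OaiRequest:
    description_id: int
    authenticated: bool


def choose_ead_source(req: OaiRequest, reuse_drafts_cache: bool) -> Union[PurePosixPath, str]:
    """Decide where the EAD for an OAI request comes from.

    Unauthenticated requests always get the public (draft-free) cache file.
    Authenticated requests either reuse the draft-inclusive cache or fall back
    to dynamic rendering, which is slower but could honour per-user permissions.
    """
    filename = f"{req.description_id}.xml"
    if not req.authenticated:
        return PUBLIC_CACHE / filename
    if reuse_drafts_cache:
        return DRAFTS_CACHE / filename
    return DYNAMIC
```

For example, `choose_ead_source(OaiRequest(42, authenticated=False), reuse_drafts_cache=True)` points at the public cache file regardless of the flag, since the flag only affects authenticated requests.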


Related issues

Related to Access to Memory (AtoM) - Bug #11191: XML representation caching for EAD: changing a child reco... Feedback 05/25/2017
Related to Access to Memory (AtoM) - Bug #12998: Cached XML is not being used for EAD downloads when expected Verified 05/03/2019

History

#1 Updated by Dan Gillean about 3 years ago

  • Related to Bug #11191: XML representation caching for EAD: changing a child record to draft deletes cached EAD XML even if the parent is published added

#2 Updated by Dan Gillean about 3 years ago

Hi Mike,

Again, I wonder if using the --public option might solve both use cases? Take a look at how we've implemented the setting for "Generate finding aid as a public user" - go to Admin > Settings > Finding aid. Essentially, clicking "yes" means that the EAD used to generate the finding aid is exported with the --public option, which excludes drafts. We could do something similar here - add a boolean setting for whether or not admins want the cached XML to be generated for all descriptions, or generated as a public user...?
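A rough Python sketch of what generating the cache "as a public user" could mean for a description tree. `Description`, `public_view()` and `export_tree()` are hypothetical, and `public_only` merely stands in for the proposed boolean setting. Pruning a draft's entire subtree is an assumption; this thread doesn't say how the --public option treats published children of a draft parent.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Description:
    title: str
    published: bool
    children: List["Description"] = field(default_factory=list)


def public_view(node: Description) -> Optional[Description]:
    """Return a copy of the tree with drafts removed, or None if `node` is a draft.

    Assumption: a draft's whole subtree is dropped, even published descendants.
    """
    if not node.published:
        return None
    kept = [c for c in (public_view(child) for child in node.children) if c is not None]
    return Description(node.title, True, kept)


def export_tree(root: Description, public_only: bool) -> Optional[Description]:
    # public_only=True: generate as a public user (drafts excluded);
    # public_only=False: generate for all descriptions, drafts included.
    return public_view(root) if public_only else root
```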

#3 Updated by Mike Cantelon about 3 years ago

Hi Dan... yeah, I figured out how to export a public version (thanks for pointing out that bulk option), so that's doable, but that leads to some observations/questions:

1) Authenticated OAI requests should probably include drafts... if so, then we're going to end up with 4 cache files per EAD description tree which means more rendering time/disk usage:

  • EAD authenticated
  • EAD authenticated (XML content only, no XML wrapper)
  • EAD public
  • EAD public (XML content only, no XML wrapper)

2) Authenticated users might not have the same permissions so it seems possible that EAD could render differently depending on who's authenticated (due to ACLs). Do we need to accommodate this level of complexity or just assume that EAD's going to render pretty much the same for all authenticated users?
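The count in point 1 is two access levels times two output formats. A small Python sketch of that combination, using a made-up file-naming scheme (`cache_variants()` and the names it produces are hypothetical, not AtoM's real cache paths):

```python
from itertools import product
from typing import List, Sequence

ACCESS_LEVELS = ("authenticated", "public")
FORMATS = ("full", "contents-only")  # contents-only: XML content without the XML wrapper


def cache_variants(
    description_id: int,
    access_levels: Sequence[str] = ACCESS_LEVELS,
    formats: Sequence[str] = FORMATS,
) -> List[str]:
    """Hypothetical cache file names for one EAD description tree."""
    return [
        f"{description_id}-{access}-{fmt}.xml"
        for access, fmt in product(access_levels, formats)
    ]
```

`cache_variants(42)` yields four names; passing `access_levels=("public",)` leaves two, which is the difference between caching per access level and caching a single version.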

#4 Updated by Dan Gillean about 3 years ago

Hi Mike,

I want to keep this as simple as possible, and having 4 different cached versions of the EAD sounds like overkill to me - and will mean the job scheduler will be running almost constantly.

My suggestion for a setting is to try to avoid this - there are definitely use cases where users who use the OAI authentication will still only want the harvester to expose public records. It is a much rarer case, but there may also be reasons to want drafts in the OAI.

If we add a global setting, then we just store 1 file - with or without drafts. It is up to the user to decide which version they want. By default, we can have the setting turned on. Custom user permissions can then be ignored - if you are authenticated, you have access to the EAD - whatever version of the EAD the admin has decided to make available. We can just make sure this is well documented. I think 95% of the time people are just going to want the --public version. The main reason you might want the versions with drafts will be if you are migrating out of AtoM.
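Under the global-setting approach, the cache writer reads one flag and the OAI side never needs to know who is asking. A Python sketch; the setting name `cache_ead_as_public`, the `Settings` type and the path format are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    # Hypothetical name for the proposed global setting; on by default,
    # so the single cached file excludes drafts unless an admin opts in.
    cache_ead_as_public: bool = True


def cache_includes_drafts(settings: Settings) -> bool:
    return not settings.cache_ead_as_public


def cached_ead_path(description_id: int, settings: Settings) -> str:
    """Path of the one cached EAD file served to every OAI request.

    There is deliberately no user argument: custom user permissions are
    ignored, so authenticated and unauthenticated requests get the same file.
    """
    variant = "with-drafts" if cache_includes_drafts(settings) else "public"
    return f"cache/ead/{description_id}-{variant}.xml"
```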

re: different versions - so... if we have the EAD pre-generated, is it not possible to add the XML wrapper for the OAI response on the fly, so we don't need 2 different versions cached? Not sure if this is possible, or how different it is from what is currently implemented.

In any case, let's talk to David before we go ahead with this work or decide how it relates to the current sponsored project.

#5 Updated by Mike Cantelon about 3 years ago

Global setting sounds good. Will wait on Dave's take.

> re: different versions - so... if we have the EAD pre-generated, is it not possible to add the XML wrapper for the OAI response on the fly, so we don't need 2 different versions cached? Not sure if this is possible, or how different it is from what is currently implemented.

Right now we just have 2 versions of the EAD: full and with the XML wrapper removed. This would stay the same if we implement your global setting solution.
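For comparison with caching both versions, here is a Python sketch of deriving the wrapper-less version on the fly from the full cached EAD. It assumes the "XML wrapper" means the XML declaration at the top of the file, which cannot appear inside an OAI-PMH `<metadata>` element; `strip_wrapper()` and the minimal `oai_record()` envelope are hypothetical, not AtoM's implementation.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

EAD_NS = "urn:isbn:1-931666-22-9"  # EAD 2002 namespace
# Serialize EAD as the default namespace instead of an auto-generated "ns0:" prefix.
ET.register_namespace("", EAD_NS)


def strip_wrapper(full_ead: bytes) -> str:
    """Return only the root <ead> element of a cached EAD document.

    Assumption: the "wrapper" is the XML declaration/prolog. Re-serializing
    keeps the content but may change cosmetic formatting.
    """
    root = ET.fromstring(full_ead)
    # encoding="unicode" returns a str and writes no XML declaration.
    return ET.tostring(root, encoding="unicode")


def oai_record(full_ead: bytes, identifier: str) -> str:
    """Embed the derived fragment in a minimal, hypothetical OAI-PMH record."""
    return (
        "<record><header><identifier>" + escape(identifier) + "</identifier></header>"
        "<metadata>" + strip_wrapper(full_ead) + "</metadata></record>"
    )
```

Deriving the fragment this way means parsing the whole cached tree on every harvest request, which is the cost to weigh against storing the second file.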

#6 Updated by David Juhasz about 3 years ago

I can't see any use case for exposing Draft records via OAI. The whole point of OAI is syndication; my assumption is that this means public records only.

#7 Updated by Mike Cantelon about 3 years ago

The only use I could see for syndicating drafts was to let one institution see what another is working on but hasn't published. So, yeah, pretty edge case.

I'll change EAD caching so drafts aren't included.

#9 Updated by Mike Cantelon about 3 years ago

  • Status changed from New to Code Review
  • Assignee changed from Mike Cantelon to Nick Wilkinson

#10 Updated by Nick Wilkinson about 3 years ago

  • Assignee changed from Nick Wilkinson to Mike Gale

Hi Mike, assigning to you for CR.

#11 Updated by Mike Gale about 3 years ago

  • Status changed from Code Review to Feedback
  • Assignee changed from Mike Gale to Mike Cantelon

Looks good to me, Mike. Merge that beast.

#12 Updated by Dan Gillean almost 3 years ago

  • Status changed from Feedback to Verified
  • Target version set to Release 2.4.0

#13 Updated by Dan Gillean 9 months ago

  • Related to Bug #12998: Cached XML is not being used for EAD downloads when expected added
