Bug #8992

CSV export: --single-slug option should export child records

Added by Dan Gillean over 5 years ago. Updated about 5 years ago.

Status: Verified
Start date: 09/23/2015
Priority: Medium
Due date:
Assignee: Mike Gale
% Done: 0%
Category: CSV export
Target version: Release 2.3.0
Google Code Legacy ID:
Tested version: 2.3
Sponsored: No
Requires documentation: Yes

Description

The CSV export CLI task added in 2.3 has a number of options, including (descriptions from the task help text):

  • --single-slug: Export a single fonds or collection based on slug
  • --public: Do not export draft physical locations or child descriptions
  • --current-level-only: Do not export child descriptions of exported items

For the --single-slug option to be most useful, it should be possible to export one specific fonds/collection, plus all of its children, using just this option. The other two options listed would still let users narrow the export further: adding --public would export all published children but exclude draft records; adding --current-level-only would export only the top-level description for that fonds/collection.

Right now, however, the --single-slug option ALWAYS limits the export to only the current level. This makes it a less useful option than it could otherwise be. If a user wants to export just a single hierarchy, they must know enough SQL to construct a query using the --criteria option, and many archival users will not know how to do this.
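The requested behaviour can be sketched roughly as follows. This is a minimal Python model of the selection logic only, not AtoM's PHP implementation: the Description class, the select_for_export function and the sample tree are all hypothetical, and skipping the descendants of a draft child under --public is an assumption the ticket does not settle.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Description:
    """Hypothetical stand-in for an archival description."""
    slug: str
    published: bool = True
    children: List["Description"] = field(default_factory=list)


def select_for_export(root, public=False, current_level_only=False):
    """Return the slugs a --single-slug style export would include."""
    selected = [root.slug]
    if current_level_only:
        # --current-level-only: only the top-level description
        return selected
    stack = list(reversed(root.children))
    while stack:
        node = stack.pop()
        if public and not node.published:
            # --public: skip drafts (assumption: their subtree is skipped too)
            continue
        selected.append(node.slug)
        stack.extend(reversed(node.children))
    return selected


# Hypothetical fonds with one published series (holding a file) and one draft series
fonds = Description("fonds", children=[
    Description("series-1", children=[Description("file-1")]),
    Description("series-2", published=False),
])

print(select_for_export(fonds))
print(select_for_export(fonds, public=True))
print(select_for_export(fonds, current_level_only=True))
```

With no extra options the whole hierarchy is selected; each additional option only narrows that set, which is the behaviour this ticket asks for.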

To reproduce
  • Copy the slug of an existing fonds with many children
  • Run the CSV export command, using the --single-slug option
  • Look at the resulting CSV
Error encountered
  • No child records included in CSV export
Expected result
  • CSV export includes child records
  • User can add "--current-level-only" if only the top-level is desired

Related issues

Related to Access to Memory (AtoM) - Bug #8993: CSV export: --public option does not exclude physical sto... Verified 09/23/2015
Related to Access to Memory (AtoM) - Bug #9560: Add ability to export EAD, with children, when single slu... Verified 03/14/2016

History

#1 Updated by Dan Gillean over 5 years ago

  • Related to Bug #8993: CSV export: --public option does not exclude physical storage information, even if hidden via visible elements added

#2 Updated by Dan Gillean about 5 years ago

  • Duplicated by Bug #9560: Add ability to export EAD, with children, when single slug specified in CLI export task added

#3 Updated by Dan Gillean about 5 years ago

  • Assignee changed from Mike Gale to Mike Cantelon
  • Target version set to Release 2.3.0
  • Requires documentation set to Yes

This has been raised recently in #9560 - which I have now marked as a duplicate, as this ticket has more details. Marking documentation needed so I can check out the current status of the documentation and make updates as needed.

#4 Updated by Mike Cantelon about 5 years ago

  • Status changed from New to Code Review
  • Assignee changed from Mike Cantelon to Mike Gale

PR with fix is here for code review: https://github.com/artefactual/atom/pull/288

#5 Updated by Mike Cantelon about 5 years ago

  • Status changed from Code Review to QA/Review
  • Assignee changed from Mike Gale to Dan Gillean

I've merged the PR into qa/2.3.x.

#6 Updated by Dan Gillean about 5 years ago

  • Duplicated by deleted (Bug #9560: Add ability to export EAD, with children, when single slug specified in CLI export task)

#7 Updated by Dan Gillean about 5 years ago

  • Related to Bug #9560: Add ability to export EAD, with children, when single slug specified in CLI export task added

#8 Updated by Dan Gillean about 5 years ago

  • Status changed from QA/Review to Feedback
  • Assignee changed from Dan Gillean to Mike Cantelon

So I have reopened #8992 because I realize now the difference was EAD XML vs CSV exports. However, Mike G has reported that the fix added here does NOT work for EAD. I have not yet tested it for CSV.

#9 Updated by Mike Cantelon about 5 years ago

  • Assignee changed from Mike Cantelon to Mike Gale

How is it failing?

It works for me using:

./symfony export:bulk --format="ead" --single-slug="some-subfonds" EAD/ead.xml

#10 Updated by Mike Gale about 5 years ago

  • Assignee changed from Mike Gale to Mike Cantelon

Sorry, to clarify, it seems broken when using the --criteria option to select lower level descriptions to export.

I think the problem is the code here https://github.com/artefactual/atom/blob/qa/2.3.x/lib/task/export/exportBulkBaseTask.class.php#L173
That code seems to be saying "if we're exporting EAD, ensure parent_id=1", which of course filters out any non-top-level descriptions from the query.

#11 Updated by Mike Gale about 5 years ago

I was using something like --criteria=" i.id=(SELECT object_id FROM slug WHERE slug='test-series')"

#12 Updated by Mike Cantelon about 5 years ago

  • Assignee changed from Mike Cantelon to Mike Gale

#13 Updated by Mike Cantelon about 5 years ago

  • Assignee changed from Mike Gale to Mike Cantelon

#14 Updated by Mike Cantelon about 5 years ago

  • Status changed from Feedback to Code Review
  • Assignee changed from Mike Cantelon to Mike Gale

I've created a PR that allows custom criteria to completely override the default criteria (rather than augment it):

https://github.com/artefactual/atom/pull/291
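To illustrate the difference between augmenting and overriding the default criteria, here is a small Python sketch. It is not the code from the PR: build_where and its parameters are hypothetical, and joining the two conditions with AND in the "augment" case is an assumption about the earlier behaviour. The parent_id value and the sample criteria are taken from the comments above.

```python
from typing import Optional

ROOT_ID = 1  # parent_id of top-level descriptions, as mentioned in comment #10


def build_where(export_format: str, criteria: Optional[str] = None,
                override: bool = True) -> str:
    """Build an illustrative WHERE clause (not the actual AtoM query builder)."""
    # Assumed default: EAD exports are restricted to top-level descriptions
    default = f"i.parent_id = {ROOT_ID}" if export_format == "ead" else ""
    custom = (criteria or "").strip()
    if custom and (override or not default):
        # Custom criteria replace the default entirely
        return custom
    if custom:
        # Custom criteria are added on top of the default
        return f"{default} AND ({custom})"
    return default


criteria = "i.id = (SELECT object_id FROM slug WHERE slug = 'test-series')"

# Augmenting: a series is never a direct child of the root, so nothing matches
print(build_where("ead", criteria, override=False))
# Overriding: only the custom criteria apply, so the series can be selected
print(build_where("ead", criteria))
```

In the augmented form both conditions must hold at once, which is why a --criteria value targeting a lower-level description returned no rows for EAD.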

The way EAD exports work is different from other bulk export methods. The default criteria that restrict the export query to top-level descriptions are there to prevent weird EAD export results by default.

The bulk exporter creates a file for each row returned by the export query. When a description is exported to an EAD file, the EAD export logic includes all of its child descriptions in the resulting XML file. This means child descriptions don't have to be returned by the export query. If children are returned, each of those rows will result in a redundant XML file containing the child description and any child descriptions it might have.

For example, let's say the export query doesn't specify a parent ID and we get results like this:

ID: 1 (top level description)
ID: 2 (child description of 1)

This would result in these EAD XML files:

ead_1.xml (contains EAD for description ID 1 and EAD for child description ID 2)
ead_2.xml (contains EAD for description ID 2)

Note that this results in ead_2.xml, which is redundant.
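The same example can be modelled in a few lines of Python. This is only a sketch of the one-file-per-row behaviour described above, with hypothetical helper names (ead_files, redundant_files) and a hand-written parent/child map in place of the database.

```python
def ead_files(query_ids, children):
    """Map each output file name to the description IDs its XML would contain."""
    def subtree(node_id):
        # An EAD file holds the description plus all of its descendants
        ids = [node_id]
        for child in children.get(node_id, []):
            ids.extend(subtree(child))
        return ids

    # One file per row returned by the export query
    return {f"ead_{i}.xml": subtree(i) for i in query_ids}


def redundant_files(files):
    """Names of files whose content is fully contained in another file."""
    return sorted(
        name for name, ids in files.items()
        if any(other != name and set(ids) < set(other_ids)
               for other, other_ids in files.items())
    )


children = {1: [2]}  # description 2 is a child of description 1

unrestricted = ead_files([1, 2], children)  # query returns parent and child
restricted = ead_files([1], children)       # query returns the parent only

print(unrestricted, redundant_files(unrestricted))
print(restricted, redundant_files(restricted))
```

Restricting the query to the top-level row still yields the child inside ead_1.xml, without producing a second file.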

#15 Updated by Mike Gale about 5 years ago

  • Assignee changed from Mike Gale to Mike Cantelon

It LGTM for now, since we should just get --criteria fixed (we refer to using it in the docs a few times).

As a side note, I feel like the --criteria option is kind of hacky and un-user-friendly in general, and that export:bulk should possibly be broken up into multiple tasks more specific to the format being exported, or at least have some different code handlers instead of how we're doing it now where we're trying to handle all the cases in one area. I'd much rather have more ad hoc tasks that keep it simple than having generic tasks that can do 100 things. Anyway, that's beyond the scope of this bug so I think this fix is OK, I just wanted to write my thoughts on the current situation with that code.

Thanks for looking at it

#16 Updated by Mike Cantelon about 5 years ago

NP... thanks for reviewing the code.

Yeah, --criteria is meant for power users and any use-cases that are likely to be common should be supported in a more user-friendly way.

As it stands, by fixing the issue with single-slug you can now use that instead of criteria for your use-case.

Breaking up the bulk export task into separate, use-case-specific tasks is definitely something to consider given EAD works differently than MODS, etc.

#17 Updated by Mike Cantelon about 5 years ago

  • Status changed from Code Review to QA/Review
  • Assignee changed from Mike Cantelon to Mike Gale

(I've merged the PR into qa/2.3.x. I've assigned to Mike G for QA.)

#18 Updated by Mike Gale about 5 years ago

  • Status changed from QA/Review to Verified

Attempted to export a series level description via both --single-slug and --criteria. They both seem to work correctly now.
