Bug #8147

Contents from external text documents are not being indexed

Added by José Raddaoui Marín about 7 years ago. Updated about 7 years ago.

Status: Verified
Start date: 03/26/2015
Priority: Medium
Due date:
Assignee: Dan Gillean
% Done: 0%
Category: Digital object
Target version: Release 2.2.0
Google Code Legacy ID:
Tested version:
Sponsored: No
Requires documentation:

History

#1 Updated by José Raddaoui Marín about 7 years ago

  • Status changed from New to Code Review
  • Assignee changed from José Raddaoui Marín to Mike Gale

PR 131

#3 Updated by Mike Gale about 7 years ago

  • Assignee changed from Mike Gale to José Raddaoui Marín

Looks fine.

#4 Updated by Jesús García Crespo about 7 years ago

  • Status changed from Code Review to QA/Review
  • Assignee changed from José Raddaoui Marín to Dan Gillean

Merged.

#5 Updated by José Raddaoui Marín about 7 years ago

  • Status changed from QA/Review to Code Review
  • Assignee changed from Dan Gillean to Mike Gale

I've added more fixes to the DO extractText function; they are ready for code review in PR 142.

I'll add some test notes when it gets merged.

#6 Updated by Mike Gale about 7 years ago

  • Assignee changed from Mike Gale to José Raddaoui Marín

looks fine

#7 Updated by José Raddaoui Marín about 7 years ago

  • Status changed from Code Review to QA/Review
  • Assignee changed from José Raddaoui Marín to Dan Gillean

Merged. These fixes allow text to be extracted from external documents when using the command-line task:

php symfony digitalobject:extract-text

A temporary copy of each external resource is downloaded before its text is extracted with 'pdftotext'. If you experience timeout problems, the timeout limit for downloads can be increased in:

https://github.com/artefactual/atom/blob/qa/2.2.x/config/app.yml#L13
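
For readers unfamiliar with the flow described above (download a temporary copy, then run 'pdftotext' on it), here is a rough sketch in Python. It is not AtoM's PHP implementation: the function names, the 10-second default timeout, and the ".pdf" suffix are assumptions made for illustration, and it assumes 'pdftotext' is installed and on the PATH.

```python
import os
import shutil
import subprocess
import tempfile
import urllib.request


def download_to_temp(url, timeout_seconds=10.0):
    """Copy the resource at `url` into a temporary file and return its path.

    `timeout_seconds` is a hypothetical stand-in for the configurable
    download timeout mentioned above; raise it if downloads time out.
    """
    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
        with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
            shutil.copyfileobj(response, tmp)
            return tmp.name


def extract_text(pdf_path):
    """Run pdftotext on a local file; "-" sends the text to stdout."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        check=True,
        stdin=subprocess.DEVNULL,
    )
    return result.stdout.decode("utf-8", errors="replace")


def extract_text_from_url(url, timeout_seconds=10.0):
    """Download an external document, extract its text, then clean up."""
    temp_path = download_to_temp(url, timeout_seconds)
    try:
        return extract_text(temp_path)
    finally:
        # The temporary copy is only needed while pdftotext runs.
        os.remove(temp_path)
```

Calling extract_text_from_url with a document URL and a larger timeout_seconds value mirrors the advice above: a slow remote host needs a longer download window before extraction can start.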

#8 Updated by Dan Gillean about 7 years ago

  • Status changed from QA/Review to Verified

Added a note in the 2.2 docs for this task as well about the new behavior:
