Bug #8147
Contents from external text documents are not being indexed
| Status: | Verified | Start date: | 03/26/2015 |
|---|---|---|---|
| Priority: | Medium | Due date: | |
| Assignee: | Dan Gillean | % Done: | 0% |
| Category: | Digital object | | |
| Target version: | Release 2.2.0 | | |
| Google Code Legacy ID: | | Tested version: | |
| Sponsored: | No | Requires documentation: | |
Description
Reported in the user forum:
History
#1 Updated by José Raddaoui Marín about 7 years ago
- Status changed from New to Code Review
- Assignee changed from José Raddaoui Marín to Mike Gale
PR 131
#3 Updated by Mike Gale about 7 years ago
- Assignee changed from Mike Gale to José Raddaoui Marín
Looks fine.
#4 Updated by Jesús García Crespo about 7 years ago
- Status changed from Code Review to QA/Review
- Assignee changed from José Raddaoui Marín to Dan Gillean
Merged.
#5 Updated by José Raddaoui Marín about 7 years ago
- Status changed from QA/Review to Code Review
- Assignee changed from Dan Gillean to Mike Gale
I've added more fixes to the DO extractText function; they are ready for code review in PR 142.
I'll add some test notes when it gets merged.
#6 Updated by Mike Gale about 7 years ago
- Assignee changed from Mike Gale to José Raddaoui Marín
looks fine
#7 Updated by José Raddaoui Marín about 7 years ago
- Status changed from Code Review to QA/Review
- Assignee changed from José Raddaoui Marín to Dan Gillean
Merged. These fixes allow text to be extracted from external documents when using the command-line task:
php symfony digitalobject:extract-text
A temporary copy of each external resource is downloaded before the text is extracted with 'pdftotext'. If downloads time out, the timeout limit for downloads can be increased in:
https://github.com/artefactual/atom/blob/qa/2.2.x/config/app.yml#L13
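For readers who want to see the flow described above in isolation, here is a minimal sketch in Python (not the AtoM PHP code): it downloads a temporary copy of an external document with a timeout, runs 'pdftotext' on it, and removes the copy afterwards. The function names, the `timeout_seconds` parameter, and its default of 10 seconds are all hypothetical and chosen for illustration; the real limit is whatever is configured in app.yml.

```python
import shutil
import subprocess
import tempfile
import urllib.request
from pathlib import Path


def download_temp_copy(url: str, timeout_seconds: float = 10.0) -> Path:
    """Download url into a new temporary file and return its path.

    timeout_seconds is a hypothetical stand-in for the download timeout
    configured in app.yml; 10.0 is an arbitrary placeholder value.
    """
    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
        with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
            shutil.copyfileobj(response, tmp)
            return Path(tmp.name)


def pdftotext_command(pdf_path: Path) -> list[str]:
    """Build the pdftotext invocation; '-' writes the text to stdout."""
    return ["pdftotext", str(pdf_path), "-"]


def extract_text_from_external(url: str, timeout_seconds: float = 10.0) -> str:
    """Download a temporary copy, extract its text, then delete the copy."""
    pdf_path = download_temp_copy(url, timeout_seconds)
    try:
        result = subprocess.run(
            pdftotext_command(pdf_path),
            capture_output=True,
            text=True,
            check=True,  # raise if pdftotext exits with an error
        )
        return result.stdout
    finally:
        # The copy is only needed for extraction, so always clean it up.
        pdf_path.unlink(missing_ok=True)
```

Calling `extract_text_from_external(url, timeout_seconds=60)` with a larger value mirrors the advice above about raising the limit when slow downloads time out; it requires 'pdftotext' to be installed and on the PATH.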
#8 Updated by Dan Gillean about 7 years ago
- Status changed from QA/Review to Verified
Added a note in the 2.2 docs for this task as well about the new behavior: