Bug #7167

Storage names that share a name, location and institution should merge

Added by Sarah Romkey over 7 years ago.

Status: New
Start date: 08/29/2014
Priority: Medium
Due date:
Assignee: Mike Gale
% Done:
Category: Physical storage
Target version: -
Google Code Legacy ID:
Tested version: 2.1
Sponsored: No
Requires documentation:


Physical storage objects that share a name, location and institution should be merged into one when browsing Physical Storage. See chat log below.

(10:22:26 AM) sromkey: fiver: Misty I feel like if the Name and Location are the same, it should be treated as one in browse physical storage. Thoughts?
(10:23:32 AM) fiver: sromkey: i think the problem was multi-repo instances. since there is no easy way to check for a repo relationship (i don't think)? you risk having too many merges when there are physical storage details coming in from many repos, with similar names.
(10:23:58 AM) sromkey: fiver: Ah, I see.
(10:24:12 AM) fiver: sromkey: it's also like - a box on a shelf might share space with 2+ fonds, but a folder in a box is unlikely. i wonder if we can make it more granular?
(10:24:45 AM) sromkey: fiver: yeah...
(10:25:12 AM) fiver: sromkey: it's hard having to think of so many use cases!
(10:25:54 AM) fiver: sromkey Misty: there is also the setting in Admin for multi-repository. one possibility might be to make the behavior different depending on a settings check?
(10:26:20 AM) sromkey: fiver: That would be nice
(10:27:08 AM) sromkey: fiver Misty another much much more complicated (I'm guessing) solution would be to add unique ID's to physical storage objects
(10:28:27 AM) sromkey: fiver: I wonder if the institution ID could be prepended to the location or something?
(10:29:11 AM) fiver: sromkey: unique IDs, and institution IDs would be a great approach!
(10:29:38 AM) fiver: sromkey: what if there is no institution record? seems like many single-repo places don't bother making one and linking everything
(10:29:51 AM) fiver: (which is bad for the EAD etc but whatevs)
(10:29:58 AM) sromkey: fiver: true.. maybe we do need separate behaviour for both
(10:33:58 AM) fiver: MikeG-: Misty and sromkey are still encountering duplicates, and we're thinking of solutions that won't mess up multi-repo instances...

(10:45:32 AM) MikeG-: Misty: so still duplicates? What physical object name, e.g.?
(10:45:56 AM) Misty: MikeG-: The duplicates happen only when the objects are in different collections.
(10:46:13 AM) Misty: So identical name, location, etc.
(10:46:25 AM) Misty: That seems to be expected behaviour, from the explicit collection checking in the method it uses?

(10:46:38 AM) MikeG-: Yeah.
(10:47:04 AM) MikeG-: We felt trying to merge physical objects across collections was a really bad idea for multiple repo institutions
(10:47:20 AM) MikeG-: Or wait sorry
(10:47:28 AM) MikeG-: Just across collections, not repos necessarily, heh.
(10:47:41 AM) MikeG-: Does it make sense to merge them by repo? That's an archivist question I guess.
(10:48:36 AM) sromkey: MikeG-: Misty fiver to me it makes sense to merge all collections in one name/location within one repo

(10:49:40 AM) sromkey: Because the idea (to me) behind browsing physical storage is you can go to one physical location, and know everything that's there- for physical control reasons.
(10:50:48 AM) sromkey: I think it should be a recommended use that institutions give a name that will differentiate it from other collections (eg CollectionA-Box 1 vs CollectionB-Box 1) but that's really up to them
(10:51:42 AM) MikeG-: sromkey: that makes sense. I think the reason we only matched on a collection basis originally for [institution] was because they didn't do that or something.
(10:51:54 AM) MikeG-: Anyway, if it's decided to go by repo instead of by collection, that's an easy enough change to make.
(10:52:00 AM) MikeG-: It'd actually be easier I think code-wise.
(10:52:09 AM) sromkey: fiver: your thoughts?
(10:52:26 AM) Misty: MikeG-, sromkey: That makes sense to me.

(10:55:20 AM) sromkey: haha, MikeG- fiver Misty I just stumbled on this: #4153
(10:55:21 AM) qubot: Links: https://projects.artefactual.com/issues/4153
(10:56:12 AM) Misty: Haha!
(10:56:25 AM) fiver: sromkey: sorry, reading now
(10:57:53 AM) fiver: sromkey MikeG- what about those where there is no repo? i'm thinking of single institution instances that don't bother to make a repo record
(10:58:30 AM) sromkey: Then it could default to a different behaviour?
(10:59:37 AM) fiver: sromkey MikeG- I think we need to think of what happens when there are records with no repo associated in a multi-repo instance too. bc ppl are sending all sorts of data to ArchivesCanada, for instance - and while it should technically all have a repo, we might want to consider the ideal no-repo behavior
(11:00:16 AM) sromkey: fiver: MikeG- Misty ugh, so true, we can't count on everyone submitting good data

(11:02:24 AM) sromkey: ... thinking...
(11:02:46 AM) fiver: i noticed that Sevein added an estimate of 24h on #4153 a couple years back - that seems out of reach for right now...
(11:02:47 AM) qubot: Links: https://projects.artefactual.com/issues/4153
(11:03:01 AM) sromkey: fiver: ooh, yeah. Big time.
(11:03:24 AM) sromkey: fiver: I guess we should really be thinking about a single repository at this exact moment
(11:03:30 AM) fiver: sromkey: we should probably make a physical storage dev page at some point with issues and possible fixes/improvements/wish-list items, like the treeview page... when we have time...
(11:06:55 AM) fiver: sromkey MikeG- well merging by repo is still useful... but will that still lead to dupes within a repo more often than we'd like?

(11:11:32 AM) sromkey: MikeG-: fiver my thought is, if storage name and location are the same, merge them
(11:11:44 AM) sromkey: MikeG-: fiver if they're not the same, or if location is null, don't
(11:12:38 AM) fiver: sromkey: that, plus repo check? sounds like a plan to me. what about no-repo? just check for a match on both storage name and location?
(11:12:55 AM) Misty: sromkey: If location is null will give us some problems for [institution], who have multiple physical objects that do have a null description
(11:13:54 AM) sromkey: Misty: right, thanks for the reminder..
(11:14:33 AM) sromkey: fiver: yeah, this is what I would do for single-repo
(11:14:48 AM) sromkey: multi-repo needs development that we can't do right now
(11:16:26 AM) sromkey: MikeG-: Misty I'm guessing this would be somewhat complicated to implement based on single repo or no
(11:16:33 AM) fiver: well, i think that use case might still be the best case for multi-repo we can come up with... we might have to refine it a bit for edge cases, but still, if we do the repo check, it should help in most cases
(11:17:42 AM) sromkey: fiver: true
(11:18:28 AM) sromkey: fiver: it seems somewhat unlikely that 2 repos will have the exact same combo of name and location
(11:20:32 AM) fiver: sromkey: i don't know about that - seems like ppl give things really generic names like shelf 1 and 1 often (in some of the data i've seen), but still, i think that if we can check for repo as well (even in single repo cases), that would be best case for now
(11:21:07 AM) sromkey: fiver: yeah, this is true I suppose
(11:21:54 AM) sromkey: fiver: I feel like an ideal would be for the institution id, the location and the name to be mashed together into a unique ID for that storage name, at that particular repo
(11:22:20 AM) fiver: sromkey: yeah that would be good
(11:26:53 AM) sromkey: MikeG-: can you give fiver and I an idea of our options for the above? ^

(11:28:32 AM) MikeG-: I think matching by repository and then merging physical objects with the same name&location sounds fine.
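
A minimal sketch of the rule proposed in the chat above: group physical objects by repository first, then merge those with the same name and location. It is written in Python for illustration only and is not the application's code. The `PhysicalObject` fields and the `merge_key` and `group_for_browse` helpers are hypothetical names. Two behaviours here are assumptions, since the chat leaves them open: objects with a null or empty location are never merged (sromkey's suggestion at 11:11:44, which Misty notes is a problem for one institution), and objects with no repository are matched against each other on name and location alone (fiver's suggestion at 11:12:38).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class PhysicalObject:
    # Hypothetical record shape, for illustration only.
    id: int
    name: str
    location: Optional[str]
    repository_id: Optional[int]  # None when no repository record is linked
    collection: str


MergeKey = Tuple[Optional[int], str, str]


def merge_key(obj: PhysicalObject) -> Optional[MergeKey]:
    """Return the grouping key, or None if the object must stay separate."""
    # Assumption: a null or blank location is never merged.
    if obj.location is None or not obj.location.strip():
        return None
    # Assumption: objects without a repository share the key component None,
    # so they are matched on name and location only.
    return (obj.repository_id, obj.name.strip(), obj.location.strip())


def group_for_browse(objects: List[PhysicalObject]) -> List[List[PhysicalObject]]:
    """Group objects into the entries a browse page would list."""
    merged: Dict[MergeKey, List[PhysicalObject]] = defaultdict(list)
    unmerged: List[List[PhysicalObject]] = []
    for obj in objects:
        key = merge_key(obj)
        if key is None:
            unmerged.append([obj])  # listed on its own
        else:
            merged[key].append(obj)  # collection is deliberately not in the key
    return list(merged.values()) + unmerged
```

Leaving the collection out of the key is what lets "Box 1 / Shelf 1" appear once per repository even when it holds material from several collections. The generic-name concern raised at 11:20:32 is covered only to the extent that repository records actually exist and are linked.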
