Self-Referential Many-To-Many Relationships
I am working on DataMgr 2.1 and I have run into an issue with which I would love some advice. This doesn't require any knowledge of DataMgr itself.
This problem deals with many-to-many joins. For example a user could be a member of multiple groups and each group could have multiple users as members. This would generally be handled by a join table. For example, if users were stored in a "users" table and groups were stored in a groups table, the relationships between them might be stored in a "users2groups" table.
A relationship could also be self-referential. For example, products (represented with a "products" table) could reference related products (also in the products) table. This could also be handled with a join table.
The question I have is how to handle the bidirectional quality of these relationships.
In the case of users and groups, if a user is a member of the group then the group has the user as a member (yes, I realize that was tautologous). This is because a row exists in "users2groups" to represent the relationship between the two. More specifically, this row represents a bidirectional relationship between the two tables.
In the case of a self-referential table, however, this is not necessarily the case. Each row of the join table only represents a unidirectional relationship between the product and the related product.
So, if "products2relatedproducts" had the fields "product_id" and "relatedproduct_id" (each referencing the "product_id" field of the "products" table) then the row would indicate that one-way relationship. If product "A" had product "B" as a related product in this table, then the row would not show that "B" had "A" as a related product.
This scenario is inconsistent with how many-to-many relationships generally work.
So, finally for the question:
If such a relationship is managed through DataMgr, would it make sense (and be helpful) if DataMgr enforces the bidirectional nature of the relationship (making self-referential relationships more consistent with other relationships)? Would it be better to have this as an option?
Thanks in advance for any and all thoughts.
Truly bidirectional arrangements are typically an edge case and I can imagine cases where "related products" for business reasons would NOT want to be bi-directional (although I'm guessing most of the time they would wnat to work both ways).
This is a really interesting problem and one I haven't had to implement before. I think my votr would be yes - to support an option for managing bidirectional arrangements. The only thing is that you'd have to do it for all of your methods, so when you "get" relationships, you'd get both ways round, but then if you deleted relationships as a precursor to re-inserting them (or looped through to find which records to add or remove when managing associations) they'd also have to all pull both ways round hich would have the interesting (but I think it'd be unimportant) side effect that the direction of relationships between two products would depende upon which last had its relationships edited.
Yep. I think this would be a nice feature for the edge cases that need it, and I'd love to chat to confirm how you'd implement this and se if we'd both take the same approach!
Will be interesting to see what everyone else has to say . . .
Well, I'm not sure that truly bidirectional relationships are really an edge case. Given that any many-to-many relationship that uses a join table and isn't self-referential is bidirectional (see the users and groups example above).
Still, your point is well made that just because product "A" has "B" listed as related doesn't mean "B" should have "A" listed as related. In that way, at least, the bidirectional nature of a self-referential relationship may not be the norm.
As to implementation, I haven't decided completely yet. I think that I would set it up so that (if a many-to-many self-referential relations is bidirectional) every time a row is created or removed, its reverse would also be created or removed.
For example, if an action causes "A" to have "B" listed as a related product, then "B" would automatically have "A" listed. Thus two rows would be created instead of one. Similarly, if "A" has "B" removed from being related then "B" would also have "A" removed.
This example certainly seems to argue that self-referential relationships should not always be bidirectional. My thoughts on this were started by seeing an example where I thought it made sense that a self-referential relationship would be. I wish I could remember what it was.
Perhaps I am misunderstanding the issue a bit, but perhaps it is best to think of the self referential portion as a separate element. For example, in this case my product has related products where a "related product" is *type of* product. Or, using the adjecency list example that is a common self-referential table (though one to many), my category has a subcategory which is a *type of* category.
I guess my point is, if I were getting a "related product" it would in fact have a product or my subcategory has a category (it is a parent child relationship thing and does work both ways). The fact that "related products" are stored in the products table (and can also serve as the parent side of the relationship) is simply a matter of how they are persisted and not how they are defined as "objects".
Am I making any sense at all here?
My concern isn't really about about showing bad data, but that duplicate work would be required (either by a user or elsewhere in the code) to make a self-referential (in terms of tables, not objects) relationship behave in the bidirectional manner that it would if two different tables were involved.
I don't think this is really a question of objects, but more one of database interaction. The database layer could be used to persist objects, but at that level I don't think the object modeling is relevant - it is more about the data storage.
Although the question of how objects are defined are important at the object level, I don't think it is important at the database level.
The real question, I think, is whether it would make sense to have DataMgr enforce the bidirectional nature of a self-referential table relationship where that relationship should be bidirectional (as would be the case when the relationship is self-referential).
Does that make sense?
In your reply to Peter you mention that "any many-to-many relationship that uses a join table and isn't self-referential is bidirectional". I guess you could also say that any many-to-many relationship that IS self-referential is unidirectional (since getting the relationship to behave bidirectionally will require two rows for each relationship and some extra management in your code).
The analogy that first occurred to me when thinking about this is the concept of a "friend" in a social network application. For some apps that I've seen, I can declare that I'm your friend (the first row), then you can either ignore me or also declare that you're my friend (the second row). In other apps one of us might offer the other friendship, and as long as the other accepts, the two rows are created together (there's a sample edge case Peter :).
All that said, I would suggest having DataMgr treat self-referential relationships as unidirectional by default (since that's how the database handles them by nature), but add the option to support bidirectional relationships (as Peter suggested) for the types of apps that would benefit from this behavior.
Hope that helps :)
Great input! I really love your friends analogy. I hadn't considered that most self-referential relationship are unidirectional, but I think you are right about that (though it still seems odd to me).
I like your suggestion about defaulting to unidirectional, but adding an option for automatic support of bidirectional self-referential relationships. This is also nice because it will keep all existing code working as it does now for any future upgrade - which is very important.
Now I just have to figure out how hard that will be to implement.
Thanks!
Well, I have a case where I want to insert a mind map into a relational database. On a mind map, a single line connects any two related concepts. Since every concept is of one type, and the lines are singular, it is by its nature a bi-directional many-to-many self-relationship. You might be able to decorate (in your mind mapping software) that the line has an arrowhead on one side but not the other, yet to the reader they are still visible as two objects at two ends of one string.
I ran into this problem while attempting to build a research database in FileMaker Pro. If an idea A is related to idea B, idea B will always be related to idea A (by the fact that the inverse is also true). However, because the join table would need two rows per "link", I cannot find any convenient way to do this in FileMaker Pro, since I lack insertion triggers, etc.
people don't need this at all, especially since folksonomy tagging and categorisation is now very popular and blogs and eccommerce sites alike.
I bet youtube to get related videos just search via owner (limit say 5) where id is not equal to id of video playing etc.
Or for general related items just query via tags or category where id is not equal to say the item being viewed.
This is only really appropriate in these scenarios i guess and does not solve the problem say if you wanted to have ten items down a page and under each item
show the number of related items.
I guess I am gonna try and avoid this scenario for now though.