bmoeskau wrote:
Hello all,
We are planning on implementing tagging in our app, similar to del.icio.us, etc. The basic concept is simple: add a Tags table (id / value) and add a TagRelations table (TagID, FK ID) that will link TagID to other entity ids.
My question is simply, has anyone else implemented this in LLBL? Did you do it this way or some other way? I'm a little concerned about the number of joins that may be required, as a generic tag search would essentially have to join to every single entity table that supports tags (could be 8-10 or more). Most systems I've seen that use tagging only operate on one or a handful of entity types, but our goal is to be able to link together any of a number of entities arbitrarily by tags and create dynamic "project" views of our data. Anyone have any insights about the best approach for this?
We use tagging extensively in our app... The problems you describe are well know.
You will need a separate TagRelations table for each type of data being tagged. Using a single table means that you will need a discriminator and this means that you need extra select criteria and you can't define "real" FKs which help the optimizer choose an appropriate query plan based on statistics.
If you want an aggregated resultset from all your TagRelation table you can always UNION the results. This BY FAR more perfromant than a single query with lots of JOINs as the optimiser will select a more appropriate plan for each query.
Depending on the usage profile of your app, i.e. how many concurrent users will be searching and then paging through the tagged data (I presume you are going to have paging) coupled with how the data will be sorted and whether or not there are permission on that data, JOINs will either be okay or a complete "no no".
The TagRelations table needs to have enough information to isolate the set of FKs for your data for a given page. If the sort column is not in the TagRelations table, then SQL will need to JOIN ALL the data and then sort before it can isolate the required page.
If you ensure that each TagRelation table contains exactly the data you need for FKs and sorting criteria etc AND you create an index for each query you are going to run. The JOIN will not be an issue. SQL Server will ONLY join the rows you are planning on fetching anyway using an Index Seek.
Our traffic profile is such that JOIN are a "no no" and therefore we have denormalised all of our tables to eliminate JOINs completely. The denormalisation is managed by triggers and is not for the faint hearted.
We have a seperate stored procedure for every possible fetch and these have been hand tweaked to ensure that SQL Server chooses the correct query plan. The results from the SPs are then "projected" into EntityCollections.
If you have more than 1 web server or business tier server, the other issue you will have is TagIDs. As it is quite possible that more than one user will create the same tag at the same time. You will need to have a unique contraint on the TagName field and ensure that failed inserts check to see if the tag has "since" been created.
I also recommend that you cache TagID/TagName pairs since this information is immutable (doesn't change) and will save you going back to the database to fetch TagIDs when people add tags that you have already in the cache. This means that you will only need to fetch TagIDs for tags that are not already in the cache.
Hope this helps,
Marcus