Monster Metadata Challenge!

Maxus
User
Posts: 76
Joined: 04-Aug-2006
# Posted on: 08-Nov-2007 07:45:16   

Hi People!

I knew the title might pique some interest :) We have a requirement in our project to store 30 fields of metadata (each varchar(100)) against every item in the database. I've tried a few things that haven't performed very well. I'm using SQL Server 2005; unfortunately the client can't be any more specific about the data or the data types, which forces a generic solution.

The two solutions I've tried:

A table with 30 columns, one per metadata item - it works, but LIKE searches crawl along and full-text indexing isn't much better.

The other was a table with a single metadata column, which was then pivoted and joined onto the item table the metadata relates to. This performed okay, but the table became exceptionally large very quickly: 2.5 million rows x 30.
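
Roughly, that second layout was shaped like this (the table, view and column names are invented for the post, not our real schema):

    -- One row per (item, field) pair:
    CREATE TABLE ItemMetadata
    (
        ItemId  INT NOT NULL,       -- FK to the item table
        FieldNo TINYINT NOT NULL,   -- 1..30: which metadata slot this row holds
        Value   VARCHAR(100) NULL,
        CONSTRAINT PK_ItemMetadata PRIMARY KEY (ItemId, FieldNo)
    );

    -- Pivoted back into 30 columns by a view, then joined on ItemId:
    CREATE VIEW ItemMetadataPivot AS
    SELECT ItemId,
           MAX(CASE WHEN FieldNo = 1  THEN Value END) AS Field1,
           MAX(CASE WHEN FieldNo = 2  THEN Value END) AS Field2,
           -- ... and so on, one MAX(CASE ...) per field, up to ...
           MAX(CASE WHEN FieldNo = 30 THEN Value END) AS Field30
    FROM ItemMetadata
    GROUP BY ItemId;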

One suggestion I've had is to try using XML, but I'm not sure how LLBLGen handles updating XML, or whether it can query XML easily. Maybe use an updatable view in between? (If that's possible?)

I was curious whether anyone else has had a similar requirement, and how you dealt with it.

Thanks! Alex

Max
User
Posts: 221
Joined: 14-Jul-2006
# Posted on: 08-Nov-2007 11:15:15   

A possible solution is a single table with a single varchar(3330) column, where 3330 = 100*30 + 11*30.

The 11 comes from the length of the separator string '>; fieldN=<'.

You encode the data in that column like 'field1=<......>; field2=<.....>; ....'

This is harder, because you need to manage escape characters for the field data, and you'll probably need to search everything twice: one SQL search to find the rows that 'may' contain the required data in the required field, followed by a second check (in a stored procedure? on the client? in a UDF?) that, once the rows are found, verifies the searched data really is in the required field.

This approach has the advantage that you can fill in and store only the required fields.

Also, you'll probably need a varchar column longer than 3330, because you'll need room for the escape characters...

Or maybe you choose not to use escape characters and go with "fixed length" string management, so each field stores exactly 100 characters. This approach makes the search simpler: if you are searching for the string "abcd" at the start of field 4, you can write WHERE myDataCol LIKE '%field4=<abcd%'

but you can't do something like WHERE myDataCol LIKE '%field4=<%abcd%' (i.e. "abcd" anywhere in field 4), because the middle wildcard will also match a string like 'field4=<kkkkrutiue...>; field5=<abcd....>'
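
To make that concrete, a tiny sketch ('Items' and 'myDataCol' are invented names for the table holding the packed column):

    -- Packed layout: myDataCol = 'field1=<...100 chars...>; field2=<...>; ...'

    -- Prefix search: items whose field 4 starts with 'abcd'. This works because
    -- the marker 'field4=<' occurs once per row (ignoring the escaping problem):
    SELECT ItemId
    FROM Items
    WHERE myDataCol LIKE '%field4=<abcd%';

    -- Broken variant: 'abcd' anywhere in field 4. The middle wildcard can run
    -- past field 4's closing '>' and match an 'abcd' that lives in field 5:
    -- WHERE myDataCol LIKE '%field4=<%abcd%'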

Max
User
Posts: 221
Joined: 14-Jul-2006
# Posted on: 08-Nov-2007 11:29:18   

Max wrote:

Also, you'll probably need a varchar column longer than 3330, because you'll need room for the escape characters...

Or maybe you choose not to use escape characters and go with "fixed length" string management, so each field stores exactly 100 characters. This approach makes the search simpler: if you are searching for the string "abcd" at the start of field 4, you can write WHERE myDataCol LIKE '%field4=<abcd%'

but you can't do something like WHERE myDataCol LIKE '%field4=<%abcd%' (i.e. "abcd" anywhere in field 4), because the middle wildcard will also match a string like 'field4=<kkkkrutiue...>; field5=<abcd....>'

If you use the "fixed length" string solution, you can do the search in the following way:

Say you are looking for the string "my home" in field 5; you can use a WHERE clause like the following: WHERE SUBSTRING(myDataCol, (5-1) * (100+11) + LEN('field5=<') + 1, 100) LIKE '%my home%'

(5-1) * (100+11) is the length of the data you need to skip (fields 1..4), and the +1 is there because SUBSTRING is 1-based.
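
Put together as a runnable query ('Items' and 'myDataCol' are again invented names):

    -- Find items with 'my home' anywhere in field 5 of the fixed-length layout.
    SELECT ItemId
    FROM Items
    WHERE SUBSTRING(myDataCol,
                    (5 - 1) * (100 + 11) + LEN('field5=<') + 1,  -- skip fields 1..4 and the 'field5=<' marker
                    100) LIKE '%my home%';

    -- Caveat: this assumes the marker length is constant; 'field10=<' is one
    -- character longer than 'field5=<', which would shift every later offset.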

But I don't know how this will perform...

This is just an example, I hope you get the idea :)

stefcl
User
Posts: 210
Joined: 23-Jun-2007
# Posted on: 08-Nov-2007 13:17:44   

Hello, I think I would go for the table with a single varchar(100) column plus a foreign key. The foreign key's index selectivity (1 distinct value among 30) is not too bad, and you can search for items by metadata quite efficiently.
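
For example, against a narrow table like the ItemMetadata sketch earlier in the thread (names still invented):

    -- Index the value column so metadata searches can seek instead of scan:
    CREATE INDEX IX_ItemMetadata_Value ON ItemMetadata (Value);

    -- Items whose field 7 starts with 'abc'; a prefix LIKE can seek the index,
    -- and the clustered key (ItemId, FieldNo) rides along at the leaf level:
    SELECT ItemId
    FROM ItemMetadata
    WHERE Value LIKE 'abc%' AND FieldNo = 7;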

Maxus
User
Posts: 76
Joined: 04-Aug-2006
# Posted on: 15-Nov-2007 00:38:34   

Hi Guys,

Thanks for the great suggestions. Unfortunately, packing it all into one varchar became difficult and created a performance overhead, but I admit I hadn't thought of trying that.

Other options I've tried:

  1. A wide table with 30 columns, each varchar(100). It worked but was slow and required 30 indexes (eek), so that's a no-go option, but it actually outperformed full-text indexing in this situation; they weren't kidding when they said they had improved the LIKE operator's use of indexes in SQL Server 2005. This solution doesn't allow for expansion, but it does work.

  2. XML stored against each item, which is, well, a class-A pain in the butt. The XML is slow to search and populate: a search on 90 rows with a primary XML index and a secondary XML index took 8 seconds, and I never let it finish with 50,000 rows (2+ hours). (Hate to think how slow it would be with 2.5 million ;) ). I haven't tried full-text indexing on it yet. As LLBLGen doesn't deal easily with XML reading and updating, I was thinking of reading the fields out into a view and using a stored procedure to update the details (there's a rough sketch of what I mean after this list).

  3. A single varchar(100) column with an Id and an order number, where the order number defines which of the 30 fields the row belongs to. This data was then pivoted inside a view to make it easy to use and join on; it worked well and performed okay, but nothing to write home about. One thing I didn't try is using the order number to create a table partition, which should speed it up a bit. I also haven't tried full-text indexing on it.
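
The view/proc idea from option 2 would look something like this; the XML shape and all the names here are invented, just to show the shape of it:

    -- Assumed XML shape: <meta><f n="1">value</f><f n="2">value</f> ... </meta>

    -- Read side: expose the fields to LLBLGen as plain view columns.
    CREATE VIEW ItemMetaFields AS
    SELECT ItemId,
           MetaXml.value('(/meta/f[@n="1"])[1]',  'varchar(100)') AS Field1,
           MetaXml.value('(/meta/f[@n="2"])[1]',  'varchar(100)') AS Field2,
           -- ... one line per field, up to ...
           MetaXml.value('(/meta/f[@n="30"])[1]', 'varchar(100)') AS Field30
    FROM Item;

    -- Write side: replace a single field's value in place with XML DML.
    CREATE PROCEDURE SetItemMetaField
        @ItemId INT, @FieldNo INT, @Value VARCHAR(100)
    AS
        UPDATE Item
        SET MetaXml.modify('replace value of
                            (/meta/f[@n=sql:variable("@FieldNo")]/text())[1]
                            with sql:variable("@Value")')
        WHERE ItemId = @ItemId;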

Any other suggestions? I might just end up picking option 3 if option 2 with full-text doesn't cut it.

trevorg
User
Posts: 104
Joined: 15-Nov-2007
# Posted on: 28-Nov-2007 23:49:15   

"A table with 30 columns for each of the metadata items - works but like searches crawl along "

This would seem like the fastest way to query the information, would it not? Did you have any indexes on the table? Do you have to do LIKE comparisons on every column? There must be some mandatory column(s) that would let you add indexes and get the query down to a subset before doing the LIKE comparisons, no?
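
For instance, something along these lines (column names invented):

    -- Seek on an indexed, mandatory column first; the LIKE then only runs
    -- over the surviving subset of rows:
    SELECT ItemId
    FROM Items
    WHERE CustomerId = @CustomerId   -- indexed and selective
      AND Field7 LIKE '%widget%';    -- expensive, but applied to few rows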

Maxus
User
Posts: 76
Joined: 04-Aug-2006
# Posted on: 18-Dec-2007 02:47:51   

Hi Trevorg,

Yep, after further testing, a single row that is 30 columns wide provided the best performance. The pivot-table approach killed the query because it effectively has to do MAX() 30 times on a single column that contains 30 times the rows, so it was a no-go.

The 30 fields worked fine as long as the results were kept as limited and as specific as possible.

Thanks A