Normalization and saving attribute/value

Posts   
 
    
tprohas
User
Posts: 257
Joined: 23-Mar-2004
# Posted on: 14-Jul-2004 01:01:39   

I'm not sure if the is an appropriate place to ask this question, but I wanted to ask this to others using LLBLGen Pro.

I'm trying to figure out how to save the state of an entity when some of the attributes/values are stored in a lookup table. After doing some research on this I still do not understand how to deal with the problem of saving data for an entity when the data is stored in a separate table.

Using the DDL below as an example, I create a table for a couple of different parts and their attributes. I have a table for Brakes and a table for Wheels. Now the major difference between the two entities here is that Wheels have a one-to-many relationship with Colors and Sizes. When a customer chooses either a brake or a wheel to add to the Assembly (an assembly of parts) the brake or wheel id is added to the PartID field of the assembly table.

The problem I know have is how do I know what color and size of wheel was chosen. When a customer chooses the wheel they are also going to choose a color and size for that wheel from a couple of drop down lists. My question now is how do I store what color and size wheel the customer chose? Do I somehow store that as part of the Assembly or in a separate table? This seems like a very common problem for relational database design, but I can't seem to find any explanations anywhere.


-- Brakes Table
CREATE TABLE [dbo].[Brakes] (
    [BrakeID] [int] NOT NULL,
    [Manufacturer] [varchar] (32) NOT NULL,
    [Model] [varchar] (32) NOT NULL
) ON [PRIMARY]
GO

ALTER TABLE [Brakes] ADD CONSTRAINT PK_BrakesID
    PRIMARY KEY NONCLUSTERED ([BrakesID])
GO

-- Wheels Table
CREATE TABLE [dbo].[Wheels] (
    [WheelID] [int] NOT NULL,
    [Manufacturer] [varchar] (32) NOT NULL,
    [Model] [varchar] (32) NOT NULL
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[WheelSizes] (
    [WheelSizeID] [int] IDENTITY (1,1) NOT NULL,
    [WheelID] [int] NOT NULL, -- Foreign key to Frames table.
    [Size] [decimal] (4,2) DEFAULT (0) NOT NULL,
    [Grams] [decimal] (5,2) DEFAULT (0) NOT NULL
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[WheelColors] (
    [WheelColorID] [int] IDENTITY (1,1) NOT NULL,
    [WheelID] [int] NOT NULL, -- Foreign key to Wheels table.
    [Color] [varchar] (20) DEFAULT ('') NOT NULL,
    [ImageName] [varchar] (20) DEFAULT ('') NOT NULL,
    [Description] [varchar] (200) DEFAULT ('') NOT NULL
) ON [PRIMARY]
GO

ALTER TABLE [Wheels] ADD CONSTRAINT PK_WheelID
    PRIMARY KEY NONCLUSTERED ([WheelID])
GO

ALTER TABLE [WheelSizes] ADD CONSTRAINT PK_WheelSizeID
    PRIMARY KEY NONCLUSTERED ([WheelSizeID])
GO

ALTER TABLE [WheelColors] ADD CONSTRAINT PK_WheelColorID
    PRIMARY KEY NONCLUSTERED ([WheelColorID])
GO

ALTER TABLE [WheelSizes] ADD CONSTRAINT FK_WheelSizes_Wheels
    FOREIGN KEY ([WheelID]) REFERENCES [Wheels] ([WheelID])
GO

ALTER TABLE [WheelColors] ADD CONSTRAINT FK_WheelColors_Wheels
    FOREIGN KEY ([WheelID]) REFERENCES [Wheels] ([WheelID])
GO

-- Assemblies Table
CREATE TABLE [dbo].[Assemblies] (
    [AssemblyID] [int] IDENTITY (1,1) NOT NULL,
    [PartID] [int] NOT NULL,
    [Quantity] [tinyint] NOT NULL,
    [SalePrice] [money] NOT NULL
) ON [PRIMARY]
GO
Otis avatar
Otis
LLBLGen Pro Team
Posts: 39797
Joined: 17-Aug-2003
# Posted on: 14-Jul-2004 11:07:21   

Color and size are attributes of the wheel entity. So you should store that there. You should add a type field to Assembly which illustrates of which type the PartID is: wheel, break, frame, tire etc. This can be an enum like field.

Frans Bouma | Lead developer LLBLGen Pro
tprohas
User
Posts: 257
Joined: 23-Mar-2004
# Posted on: 15-Jul-2004 01:42:51   

Otis,

If I put the Color and Size attributes in the Wheels table I will then have to duplicate the data for one model of wheel for each color/size combination. Wouldn't this be a non-normalized design?

I thought by using a separate table to store the colors and sizes that would be a more properly normalized design. One problem I have with having the color and size attributes in the Wheels table besides the duplication of data is that I then don’t have any way to distinctly identify a model of part. The duplication can be seen in the example below. What I'm trying to figure out how to do set the tables up so that I only get one model in the parts table and then store the multi-valued attributes in a different table.

Model Color Size Deus X-Type Black 170 Deus X-Type Black 175 Deus X-Type Black 180 Deus X-Type Silver 170 Deus X-Type Silver 175 Deus X-Type Silver 180

What do you think?

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39797
Joined: 17-Aug-2003
# Posted on: 15-Jul-2004 10:23:28   

tprohas wrote:

Otis,

If I put the Color and Size attributes in the Wheels table I will then have to duplicate the data for one model of wheel for each color/size combination. Wouldn't this be a non-normalized design?

No, only if you have predefined color definitions which have an ID. If color is 'Black' or 'Blue', storing it in a table like: ColorID int ColorDescription varchar(50)

it will be 'normalized' but you don't gain anything. Blue will not change all of a sudden to 'Brown'. simple_smile In fact, it can be bad for history rows, because what if you use that normalized table as I've described above and store the ColorID with the wheel. A wheel record is stored and over time you change ColorID 10's description from Blue to BlueBrown, as Blue is not sold anymore, but BlueBrown is. An old wheel's row is now suddenly became BlueBrown, while it was Blue simple_smile

Normalization is necessary to reduce redundancy of values for updates. So if you have a customer row and you store the ContactPerson for that customer in 5 different places, you have to update all those 5 places if the contactperson changes. That's why normalization is used, to reduce that amount.

However you're not going to store 'ContactPerson' in a separate table simple_smile

Colors, dates, countries, months of the year... these lists are often pretty static. One can decide to store them in a separate table to have an easy lookup table. However nothing stops you from storing that list in code and when a row which contains a country name is saved simply store the name, not an id.

It comes down to the decision if a color name is a value or a descriptive type. Wheel has Color Wheel has Size Color belongs to Wheel Size belongs to Wheel

When you add these scentenses to an ORM tool like Visio, it will ask you if Color and Size are values (like ContactPerson in customer) or types on their own (like Order belongs to Customer. Customer is a type)

I think that's the level you should make the decision on: is it a value or not. If not (that's up to you, both have pre/cons) you have to define a color type. If it is, you just define a value with Wheel, which becomes a column in the Wheel table.

To me it's logical that color and size are attributes of the wheel. I'd normalize OUT the manufacturer name however as a separate type.

I thought by using a separate table to store the colors and sizes that would be a more properly normalized design. One problem I have with having the color and size attributes in the Wheels table besides the duplication of data is that I then don’t have any way to distinctly identify a model of part. The duplication can be seen in the example below. What I'm trying to figure out how to do set the tables up so that I only get one model in the parts table and then store the multi-valued attributes in a different table.

Model Color Size Deus X-Type Black 170 Deus X-Type Black 175 Deus X-Type Black 180 Deus X-Type Silver 170 Deus X-Type Silver 175 Deus X-Type Silver 180

What do you think?

Well, having size in a separate table will give you the same list, only then '170' is the size ID simple_smile (for color idem)

I now see your point however. Although Color and Size are values, they do not have a 1:1 or 1:n relation with Wheel, but a m:n relation. I then think 2 tables are required: Wheel and WheelColorSize. Wheel contains the core 1:1 and m:1 data related to wheel and WheelColorSize contains teh wheelID and the available Color-Size combinations available of that type of wheel. YOu can also call that last one 'WheelVersions', as it contains per wheel the versions available: I'm sure not every wheel is available in all sizes and colors.

Frans Bouma | Lead developer LLBLGen Pro
tprohas
User
Posts: 257
Joined: 23-Mar-2004
# Posted on: 15-Jul-2004 19:32:13   

OK, so if you agree in the end that storing the wheel color/sizes in a WheelVerison table is probably best in the end then I'm back to my original question. How do I store the color/size that the customer has chosen when the product is added to the Assembly? This is a problem because all the different part types in the system have different attributes that a customer can choose from that have to be stored.

Some Examples

Frame: Color Size RearShockTravel

Wheel: Color Size

Stem: StemLength

Crank: CrankLength

I somehow need to store the version of part that was chosen by the customer in the Assembly. So I would need a column in the Assembly to store each one of the attributes in the Assembly which looks like this.


-- Assemblies Table
CREATE TABLE [dbo].[Assemblies] (
    [AssemblyID] [int] IDENTITY (1,1) NOT NULL,
    [PartID] [int] NOT NULL, -- Foreign key to Part PK.
    [PartType] [int] NOT NULL, -- Foreign key to PartType PK.
    [Quantity] [tinyint] NOT NULL,
    [SalePrice] [money] NOT NULL
) ON [PRIMARY]
GO

What do you think?

Otis wrote:

tprohas wrote:

Otis,

If I put the Color and Size attributes in the Wheels table I will then have to duplicate the data for one model of wheel for each color/size combination. Wouldn't this be a non-normalized design?

No, only if you have predefined color definitions which have an ID. If color is 'Black' or 'Blue', storing it in a table like: ColorID int ColorDescription varchar(50)

it will be 'normalized' but you don't gain anything. Blue will not change all of a sudden to 'Brown'. simple_smile In fact, it can be bad for history rows, because what if you use that normalized table as I've described above and store the ColorID with the wheel. A wheel record is stored and over time you change ColorID 10's description from Blue to BlueBrown, as Blue is not sold anymore, but BlueBrown is. An old wheel's row is now suddenly became BlueBrown, while it was Blue simple_smile

Normalization is necessary to reduce redundancy of values for updates. So if you have a customer row and you store the ContactPerson for that customer in 5 different places, you have to update all those 5 places if the contactperson changes. That's why normalization is used, to reduce that amount.

However you're not going to store 'ContactPerson' in a separate table simple_smile

Colors, dates, countries, months of the year... these lists are often pretty static. One can decide to store them in a separate table to have an easy lookup table. However nothing stops you from storing that list in code and when a row which contains a country name is saved simply store the name, not an id.

It comes down to the decision if a color name is a value or a descriptive type. Wheel has Color Wheel has Size Color belongs to Wheel Size belongs to Wheel

When you add these scentenses to an ORM tool like Visio, it will ask you if Color and Size are values (like ContactPerson in customer) or types on their own (like Order belongs to Customer. Customer is a type)

I think that's the level you should make the decision on: is it a value or not. If not (that's up to you, both have pre/cons) you have to define a color type. If it is, you just define a value with Wheel, which becomes a column in the Wheel table.

To me it's logical that color and size are attributes of the wheel. I'd normalize OUT the manufacturer name however as a separate type.

I thought by using a separate table to store the colors and sizes that would be a more properly normalized design. One problem I have with having the color and size attributes in the Wheels table besides the duplication of data is that I then don’t have any way to distinctly identify a model of part. The duplication can be seen in the example below. What I'm trying to figure out how to do set the tables up so that I only get one model in the parts table and then store the multi-valued attributes in a different table.

Model Color Size Deus X-Type Black 170 Deus X-Type Black 175 Deus X-Type Black 180 Deus X-Type Silver 170 Deus X-Type Silver 175 Deus X-Type Silver 180

What do you think?

Well, having size in a separate table will give you the same list, only then '170' is the size ID simple_smile (for color idem)

I now see your point however. Although Color and Size are values, they do not have a 1:1 or 1:n relation with Wheel, but a m:n relation. I then think 2 tables are required: Wheel and WheelColorSize. Wheel contains the core 1:1 and m:1 data related to wheel and WheelColorSize contains teh wheelID and the available Color-Size combinations available of that type of wheel. YOu can also call that last one 'WheelVersions', as it contains per wheel the versions available: I'm sure not every wheel is available in all sizes and colors.

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39797
Joined: 17-Aug-2003
# Posted on: 16-Jul-2004 09:58:11   

tprohas wrote:

OK, so if you agree in the end that storing the wheel color/sizes in a WheelVerison table is probably best in the end then I'm back to my original question. How do I store the color/size that the customer has chosen when the product is added to the Assembly? This is a problem because all the different part types in the system have different attributes that a customer can choose from that have to be stored.

First think of how to store the static data, thus the bike parts, the bike parts and their different colors/sizes and other data, like customers.

The assembly type is interesting, because it is an objectified relation. simple_smile The relation Assembly - PartVersion is again a new entity with new attributes. PartVersion is for example the WheelID - Size - Color combination, which you can give a new ID.

So the PartVersion already contains the PartID. You just need to know which partversion table you need look at, the wheelpartversion, the crank part versions etc. That is explained by the type field in the assembly table.

This shows a flaw in the relational model, as you can't define the FK constraints on that PartVersionID in Assembly as it points to a lot of tables, based on PartType.

Frans Bouma | Lead developer LLBLGen Pro
tprohas
User
Posts: 257
Joined: 23-Mar-2004
# Posted on: 17-Jul-2004 02:53:19   

Otis wrote:

tprohas wrote:

OK, so if you agree in the end that storing the wheel color/sizes in a WheelVerison table is probably best in the end then I'm back to my original question. How do I store the color/size that the customer has chosen when the product is added to the Assembly? This is a problem because all the different part types in the system have different attributes that a customer can choose from that have to be stored.

First think of how to store the static data, thus the bike parts, the bike parts and their different colors/sizes and other data, like customers.

The assembly type is interesting, because it is an objectified relation. simple_smile The relation Assembly - PartVersion is again a new entity with new attributes. PartVersion is for example the WheelID - Size - Color combination, which you can give a new ID.

So the PartVersion already contains the PartID. You just need to know which partversion table you need look at, the wheelpartversion, the crank part versions etc. That is explained by the type field in the assembly table.

This shows a flaw in the relational model, as you can't define the FK constraints on that PartVersionID in Assembly as it points to a lot of tables, based on PartType.

My design shows a flow in the relational model? This isn't really just a flow in my design?

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39797
Joined: 17-Aug-2003
# Posted on: 17-Jul-2004 10:20:37   

No simple_smile

It's a flaw in the relational model, or better: something that is not possible in the relational model: foreign key constraints which are only valid when a given field has a given value.

This means that you first have to interpret the PartType in assembly, then load the right item from the table belonging to the VALUE of PartType.

Frans Bouma | Lead developer LLBLGen Pro
tprohas
User
Posts: 257
Joined: 23-Mar-2004
# Posted on: 26-Jul-2004 19:57:17   

OK, I've been thinking about this for a while now and think that using your idea of a version table would be the easiest solution I still have a problem with it. The problem is that for the entity part type of Frame the version table could contain more then a hundred different versions/rows for one frame model. This is because each row in the table is dependent on the attibutes; Distributor, DistributorNumber, Color, ColorImage, and a number of size attributes. In one case I have a 112 rows for a single model from a single distributor. Once the frame can be purchased from a second distributor then the number of rows for this model doubles. Now the reason that I have the Distributor and DistributorNumber attributes in the version table is because the DistributorNumber dependent on the Color and Sizes of the frame.

What I have been trying to stay away from duplicating all this data, but because of the problem I have with storing the version of entity that was chosen I can't see a way around using the versioning table and duplicating the data. Do you think I will have any problems with this amount of data duplication?

Thanks

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39797
Joined: 17-Aug-2003
# Posted on: 27-Jul-2004 16:55:37   

Well, the combination of 2 or more attributes creates in your database new facts. (Wheel ABC is available in colors .... and sizes....) To identify these new facts, you can either give them a new ID or re-use the attributes making the facts to be used with new relations where these facts participate in. (for example if you want to identify a wheel of a given color of a given size, you either specify the wheelid, colorid and sizeid or an ID which is stored in the db identifying that combination of wheel type, color and size).

If you want to avoid data duplication, you have to create a new ID for the combination of attributes to identify those unique combinations in other relations.

Frans Bouma | Lead developer LLBLGen Pro