Normalization - Correct 1NF design for this table?

HerrSober

New Member
Joined
Aug 30, 2013
Messages
40
Hi,

I want to design a table with the sales for further use in my database.

I am only displaying two types of Products here, but in reality it could be about 20.

What would be the correct table design in according to the first normal form?


Code:
Date	         Department	Seller	   SummaryField   Volume
01.01.2015        East           Ross             Apple          4
01.01.2015        East            Ross            Banana        4

or:

Code:
Date	            Department	       Seller	        Apple	        Banana
01.01.2015	   East	               Ross	        4	             4
01.01.2015	   East	               Hardmann	        4	             2
01.01.2015	   East	               Pearson	        5	             5
01.01.2015	  West	              Specter	        6	             3

Thank you very much for your help!
 

Excel Facts

Lock one reference in a formula
Need 1 part of a formula to always point to the same range? use $ signs: $V$2:$Z$99 will always point to V2:Z99, even after copying
Hi,

Neither alternative is in first normal form. The rows require a unique key to comply.
 
Upvote 0
you'd have to have a Department table with and ID and DeptName (East or West)
you'd have to have a Seller table with and ID and SellerName (Ross or Hardmann)
you'd have to have a Product table with and ID and ProductName (Banana or Apple)

then you'd put the ID's into the first table instead of the names like you have now
 
Upvote 0
HerrSober,

Can I clarify whether you actually mean "First Normal Form" when you say 1NF?

I ask because people usually assume at least Third Normal Form when designing a database.

To put the question another way: Are you trying to design a database or is this a theoretical question about First Normal Form as distinct from Second, Third, Fourth etc Normal Forms?
 
Upvote 0
This is quite good as an introduction: Teach-ICT AS Level ICT OCR exam board - normalisation, first normal form 1NF, second normal form 2NF, third normal form 3NF

The problem with your example is that I can't tell what the rows are supposed to represent.

In general, a table should be created for a collection of things that are all similar. So you might have one table for suppliers , another for order headers and another for order details (line items). As such, each row should have a unique key (Primary Key). This is because relational databases do not recognize either order of rows or order of attributes (fields). Imagine everything jumbled up in a heap and you can only locate a record using the Primary Key.

If your table is of Products as you suggest then it should contain information about the Product only. If Apple is a product then that should be the key and there should be another table with Suppliers or Purchasers in it with links to the Product table. If the table is actually Orders then there needs to be an order number. Otherwise if you sell something to the same person twice in one day things will get complicated.

(Imagine if you sell 3 apples to company A and 5 Apples to company A on the same day. What do you do if company A calls you to say it received 4 apples? Do you send another or ask for one back?)
 
Upvote 0
To put the question another way: Are you trying to design a database or is this a theoretical question about First Normal Form as distinct from Second, Third, Fourth etc Normal Forms?

I try to design the input data which goes into the database:

My main problem is that the summary data I am working with comes in a vast variety of formats which I cannot upload to my Access database without modifying it first. I therefore need to set a standard input format and I now up with the one format below. After dicovering the a book by C.J. Date which I obviously haven't understood yet, I thought this format could be called 1NF (First Normal Form).

As I see it this input data cannot be further divided as all the four attributes: "Date, Department", "Seller" and "SummaryField" makes the SumID, so the only change I can see for this table is to include more fruits per record.

Code:
SumID   Date	         Department	Seller	   SummaryField      Volume
  1     01.01.2015        East           Ross             Apple           4
  2     01.01.2015        East           Ross             Banana         4

Maybe input data cannot follow 1NF since normalization first can happen after the data is in the database?

As for optimal input format for later processing/normalization: Does it look better to have the summary fields like in the example above or should it rather be spread over columns?

I hope you get a picture of what I am trying to achieve.
 
Upvote 0
@RickXL
Hi,
Neither alternative is in first normal form. The rows require a unique key to comply.
Not true, since Date, Department, Seller, SummaryField is a candidate key and uniquely identifies a record. The first alternative is 1NF (at least). Just so you know :) Remember that a key does not need to be a single column value.

@HerrSober
Date, Department, Seller, SummaryField, Volume is best. Do not spread the values over columns. This has to do with the design of your tables, and you want to avoid repeating fields -- in this case, "Banana, Apple, Orange, Mango, Plum, Apricot .... " which all belongs in one field "TypeOfFruit". From a practical perspective, it also makes sense. If another fruit is added (let's say, "Pear") you don't want to have to add a whole new column to the database just to accomodate the new TypeOfFruit. Your queries will also be easier to write and maintain, which ultimately ensures the correctness of your queries.

PS:
Maybe input data cannot follow 1NF since normalization first can happen after the data is in the database?
As far as input data is concerned, take shortcuts if you need to. But don't let it dictate the structure of your database. Get the input data, but then transform it so that it can be stored in an optimal way that conforms to standards for relational databases. if you have control over how input data is taken in, then let it be optimized so that it's fast and ends up in good form. Remember that one of the main concerns of input data is to make sure that the data is valid. We can generally work with good data, whatever the form. We cannot work with garbage.

For example, you said:
As I see it this input data cannot be further divided as all the four attributes: "Date, Department", "Seller" and "SummaryField" makes the SumID, so the only change I can see for this table is to include more fruits per record.

Code:

Code:
SumID   Date	         Department	Seller	   SummaryField      Volume
  1     01.01.2015        East           Ross             Apple           4
  2     01.01.2015        East           Ross             Banana         4

But why add columns? You can add another row (let's say again, for "Pear") so easily:
Code:
SumID   Date	         Department	Seller	   SummaryField      Volume
  1     01.01.2015        East           Ross             Apple           4
  2     01.01.2015        East           Ross             Banana          4
  3     01.01.2015        East           Ross             Pear            4
 
Last edited:
Upvote 0
Hi,

It makes quite a difference whether you are talking about a relational database table's contents of the contents of an input screen. A database system comes in three parts:

1. The database. That is mainly what CJ Date will be describing.
2. The input screens.
3. The output reports.

A spreadsheet combines all those steps into one. In fact, many people concentrate entirely on steps 2 and 3 and ignore the fact that step 1 is the most important to get right.

The reason I can't give you a straight answer about the apples and bananas is that it depends on what the record is supposed to mean. Is it compulsory that a record will always have a value for both apples and bananas or not?

The database should be normalized. The input screens not so much. Typically they come in multiple parts so you might have a header and then some detail items. For instance, when you enter an order into a system the input screen will usually contain a header with details of the order number, date, supplier, required date, total cost and carriage charges. It will also have separate lines for order items. So if you were selling fruit, you would place apples on one line with a quantity and bananas on another line with a quantity etc.

When the form had been completed a program would take that data and store it in the right tables in the database. Those tables would be normalized and they would be in at least third normal form. While CJ Date has written two volumes called "An Introduction to Database Systems" you can sum up the useful rules by realizing that they are trying to minimize data duplication.

For instance, if you have a single header with multiple detail lines you only store the header once. Also, don't repeat unnecessarily anything in the detail record that could be placed in the header. Have a separate table for detail and header.

Another rule makes all the tables have a fixed number of columns. So if you need to add more of something you add rows not columns. So in the database your data should look more like this:

Header Table
Code:
ID     Date             Department           Seller            
1      01.01.2015       East                 Ross          
2      01.01.2015       East                 Hardmann         
3      01.01.2015       East                 Pearson
Detail Table
Code:
ID     Fruit    Qty
1      apple    4
1      banana   4
2      apple    4    
2      banana   2
3      apple    5    
3      banana   5
 
Last edited:
Upvote 0

Forum statistics

Threads
1,223,228
Messages
6,170,871
Members
452,363
Latest member
merico17

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top