Skip to content
Advertisement

How do you normalize a table with a multi-valued field?

I want to normalize a staging table into two tables. One of the fields of input data contains multiple values delimited by a semicolon.

DATETIME | FILENAME | MULTIVALUEDFIELD
7-22-18 | somefile.txt | value1;value2;value3

I’m sure you data-wizards know where this is going, but I want this normalized thusly:

ID | FILENAME |
1 | somefile.txt

and

ID | SINGLEVALUES | OTHERTABLEID
1 | value1 | 1
2 | value2 | 1
3 | value3 | 1

Sorry for formatting issues: I didn’t see a way to do tables. Any help would be awesome!

Advertisement

Answer

It sounds like you actually have a many-to-many relationship here; a filename can refer to many singlevalues, and singlevalues can appear in many filenames.

The traditional way to represent that would be with a bridge table that consists of foreign key references to the primary keys of filename and singlevalues, and, if it’s applicable to your design, any additional fields that represent the relationship.

dbo.filename
+-------------+---------------+
| filename_sk |   filename    |
+-------------+---------------+
|           1 | somefile.txt  |
|           2 | otherfile.txt |
+-------------+---------------+
dbo.value
+----------+--------+
| value_sk | value  |
+----------+--------+
|        1 | value1 |
|        2 | value2 |
|        3 | value3 |
+----------+--------+
dbo.bridge
+-------------+----------+
| filename_sk | value_sk |
+-------------+----------+
|           1 |        1 |
|           1 |        2 |
|           2 |        2 |
|           2 |        3 |
+-------------+----------+
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement