Skip to content
Advertisement

Search for and reverse engineer foreign key relationships in DB without foreign keys defined

I’ve inherited a really old database with about 100 tables. I know there are foreign key relationships by looking through tables, but I’ve also determined that there are no relationships actually defined in the database. Further, there are no consistent naming conventions.

I’ve tried just working it out by looking at tables and trying joins, but it has been time consuming and I just don’t have that much time. So now I’m working on some automated ideas.

Has anyone ever tried to run a query on a database to find foreign keys?

My thoughts on the general logic of the query would be this:

For each table: If it has a primary key, check to see if that key name exists in all other tables. If so, do a check to see if there is a high percentage match between the two sets of IDs. If so, output this as a potential FK relationship. Specify if it is one to one or one to many.

If the same name doesn’t exist in the other tables, try searching all tables for fields with exactly the same data type. Try to see if there is a high percentage match in the two sets of IDs. If so, output this as a potential FK relationship. Specify if it is one to one or one to many.

I know this will likely lead to a lot of false positives, but it would be better than manually searching.

Is my logic sound here, or am I completely off base in trying to automate this search?

My eventual goal is to generate a ER diagram that I can use when building some new queries.

Using MSSQL

Advertisement

Answer

Let’s look for pairs:

SELECT * FROM
  information_schema.columns cl
  INNER JOIN
  information_schema.columns cr
  ON
    cl.table_name < cr.table_name AND
    cl.data_type = cr.data_type

This generates a list of every column in every table crossed with every other column in other tables, where the data types are equal. Hopefully using < on the table names means that tableA.somenumber will be paired with tableB.someothernumber but not the reverse (unless you actually want the reverse) – there isn’t much point asking the db how many values in a equal b and then later asking how many in b equal a

Now let’s make it write an SQL:

SELECT 
  REPLACE(REPLACE(REPLACE(REPLACE(
    'SELECT ''{Ltable}.{Lcol}'' as lefty, ''{Rtable}.{Rcol}'' as righty, count(l.{Lcol}) as countLefty, count(r.{Rcol}) as countRighty, case when count(r.{Rcol}) = 0 then 0 else count(l.{Lcol})/count(r.{Rcol}) end as percenty
     FROM {Ltable} l LEFT JOIN {Rtable} r ON l.{Lcol} = r.{Rcol} UNION ALL',
    '{Ltable}', cl.table_name),
    '{Rtable}', cr.table_name),
    '{Lcol}', cl.column_name),
    '{Rcol}', cr.column_name)
FROM
  information_schema.columns cl
  INNER JOIN
  information_schema.columns cr
  ON
    cl.table_name < cr.table_name AND
    cl.data_type = cr.data_type

If you run this SQL, it will generate a results grid that has an SQL in each row- copy it out the grid and paste it back into a query editor, remove the ultimate UNION ALL and then run it

On my reasonably small DB of 90 tables with ~8 columns in each it generated 62,000 combinations it wanted to do; use with caution.. Or set it going one night on the backup server and come back the next day

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement