I’ve inherited a really old database with about 100 tables. I know there are foreign key relationships by looking through tables, but I’ve also determined that there are no relationships actually defined in the database. Further, there are no consistent naming conventions.
I’ve tried just working it out by looking at tables and trying joins, but it has been time consuming and I just don’t have that much time. So now I’m working on some automated ideas.
Has anyone ever tried to run a query on a database to find foreign keys?
My thoughts on the general logic of the query would be this:
For each table: If it has a primary key, check to see if that key name exists in all other tables. If so, do a check to see if there is a high percentage match between the two sets of IDs. If so, output this as a potential FK relationship. Specify if it is one to one or one to many.
If the same name doesn’t exist in the other tables, try searching all tables for fields with exactly the same data type. Try to see if there is a high percentage match in the two sets of IDs. If so, output this as a potential FK relationship. Specify if it is one to one or one to many.
I know this will likely lead to a lot of false positives, but it would be better than manually searching.
Is my logic sound here, or am I completely off base in trying to automate this search?
My eventual goal is to generate a ER diagram that I can use when building some new queries.
Using MSSQL
Advertisement
Answer
Let’s look for pairs:
SELECT * FROM information_schema.columns cl INNER JOIN information_schema.columns cr ON cl.table_name < cr.table_name AND cl.data_type = cr.data_type
This generates a list of every column in every table crossed with every other column in other tables, where the data types are equal. Hopefully using < on the table names means that tableA.somenumber
will be paired with tableB.someothernumber
but not the reverse (unless you actually want the reverse) – there isn’t much point asking the db how many values in a equal b and then later asking how many in b equal a
Now let’s make it write an SQL:
SELECT REPLACE(REPLACE(REPLACE(REPLACE( 'SELECT ''{Ltable}.{Lcol}'' as lefty, ''{Rtable}.{Rcol}'' as righty, count(l.{Lcol}) as countLefty, count(r.{Rcol}) as countRighty, case when count(r.{Rcol}) = 0 then 0 else count(l.{Lcol})/count(r.{Rcol}) end as percenty FROM {Ltable} l LEFT JOIN {Rtable} r ON l.{Lcol} = r.{Rcol} UNION ALL', '{Ltable}', cl.table_name), '{Rtable}', cr.table_name), '{Lcol}', cl.column_name), '{Rcol}', cr.column_name) FROM information_schema.columns cl INNER JOIN information_schema.columns cr ON cl.table_name < cr.table_name AND cl.data_type = cr.data_type
If you run this SQL, it will generate a results grid that has an SQL in each row- copy it out the grid and paste it back into a query editor, remove the ultimate UNION ALL and then run it
On my reasonably small DB of 90 tables with ~8 columns in each it generated 62,000 combinations it wanted to do; use with caution.. Or set it going one night on the backup server and come back the next day