Skip to content
Advertisement

Selecting into a Table from a Join with Large Amounts of Data

I am trying to join two tables with the results filtered by a date range, and select the created join into a new table. There are two tables: one is about 500,000 records, but the other is about 150 million. I have attempted to run code to extract using an inner join. This usually runs until I receive an error that I have used up all the disc space. I am not sure if the problem lies in the join of the huge table itself or the fact that I am trying to write the results to a new table. Also of note is that the big table is on a linked server as a view.

SELECT
  * INTO New_Table
FROM
  OPENQUERY(
    [Linked_Server],
    'SELECT * FROM [LinkedDB].[Schema].[LinkedServerTable]'
  ) IL
  INNER JOIN [Schema].[OtherTableFromLocalHost] I ON IL.IdColumn = I.IdColumn
WHERE
  I.IDate >= CONVERT(DATE, '1/1/2020')
  AND I.IDate < CONVERT(DATE, '1/31/2020')

Advertisement

Answer

You are doing a distributed join in a transaction across servers. You very much want that join to run in memory on one server. Network is much slower than RAM.

I remember rebooting servers because of things like this. The progression was – try killing the process, try a restart of dtc, try restarting the instance, try rebooting the server.

  1. Do you need all columns from both tables?
  2. How many records are normally selected from [Schema].[OtherTableFromLocalHost]?

If this was executed where the big table was located, I would load the smaller table into a local temp table. Then I would use a local transaction to do the join. You might try doing this with the big table. You need to have space for the 150 million records. It will likely work much faster.

Is there a linked server on the other server to this server? In the remote query, you can load the IdColumn from [Schema].[OtherTableFromLocalHost] using the WHERE clause to a temp table on that server. Then do the join there. Then get the data back into a temp table. Do whatever joins are required with the temp table and then insert into the final destination.

For one process I did create a xml document of values to pass to the remote server. The remote server joined to the xml (nodes stuff) to get the data required without a distributed join.

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement