
Any suggestions for optimizing BCP export from SQL Server using Python subprocess?

I’m studying BCP for exporting a large amount of data (a one-time initial load, and then as a day-to-day job).

The source data are in SQL Server tables, ranging from small tables to larger ones (10M+ rows). The destination is another machine (export to files).

Currently, I’m implementing it using Python’s subprocess module.

I’m using the BCP command without specifying a batch size (queryout, -U, -P, -S, -c), and the query is very straightforward (SELECT <column_names> FROM <table_name>), perhaps adding a WHERE clause on dates in the day-to-day job.
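For concreteness, the call is roughly equivalent to this sketch (server, table, file path, and credentials are placeholders, not real values):

```python
import subprocess

# Straightforward export: character format (-c), no batch size specified.
# Server, credentials, query, and output path are all placeholders.
cmd = [
    "bcp",
    "SELECT col1, col2 FROM mydb.dbo.mytable",
    "queryout", r"C:\export\mytable.dat",
    "-S", "myserver",
    "-U", "myuser",
    "-P", "mypassword",
    "-c",
]
subprocess.run(cmd, check=True)
```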

I have tried it with 100k rows, and it took around 2 minutes. However, I haven’t tried 10M+ rows, due to my company’s restriction on using production data in the development environment. Also, I can’t insert any data into the source SQL Server (I have read-only access).

Could anyone suggest ways to optimize the BCP export process?

My understanding is that it should be possible to do better, since my current approach is very straightforward.

Thank you so much.


Answer

If you are moving data from one SQL Server to another SQL Server, then using the -N option to copy data to your files in native format will help reduce the time spent converting data types to text; see the combined sketch below.

Use the -a option to specify the network packet size. I cannot suggest a proper value here, as it will depend on your network (is the file going to a disk that is remote from the server? If so, try some different values here; if not, don’t bother, since no network is involved).

Use the -b option when importing data into your destination. I cannot suggest a proper value here either, as it will depend on your system architecture, but experiment with this value in testing to refine it. This option does not work with export.
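Putting those options together, a hedged sketch: the packet size and batch size below are purely illustrative starting points to tune in testing, and the server, table, and credential names are placeholders.

```python
import subprocess

# Export in Unicode native format (-N) with an explicit packet size (-a).
# 16384 is an illustrative value, not a recommendation.
export_cmd = [
    "bcp", "mydb.dbo.mytable", "out", r"C:\export\mytable.dat",
    "-S", "source_server", "-U", "myuser", "-P", "mypassword",
    "-N", "-a", "16384",
]
subprocess.run(export_cmd, check=True)

# Import with -b so the load commits every N rows (illustrative value);
# -b has no effect on export.
import_cmd = [
    "bcp", "mydb.dbo.mytable", "in", r"C:\export\mytable.dat",
    "-S", "dest_server", "-U", "myuser", "-P", "mypassword",
    "-N", "-b", "50000",
]
subprocess.run(import_cmd, check=True)
```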

When exporting a large table, spread the copy across multiple files. Hopefully your large tables have a numeric key or some other numeric value with high selectivity. That value can be used to partition your data into 10 or 100 streams, which lets you execute multiple bcp commands at the same time, all pulling from the same table. Use the "queryout" option and a command like:

SELECT * FROM db.dbo.mytable WHERE key % 10 = 0

to get one tenth of the data, and:

SELECT * FROM db.dbo.mytable WHERE key % 10 = 1

to get the next tenth of the data.

Execute as many at the same time as your source server can withstand. This is great for speeding up the copy out, but be careful when loading into the destination: you won’t be able to run as many together there. Getting as many bcp commands running as your source server can withstand will likely be your biggest gain in performance.
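Since the question already uses Python’s subprocess module, a minimal sketch of this fan-out might look like the following (table name, key column, partition count, connection details, and paths are all hypothetical):

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

PARTITIONS = 10  # number of parallel bcp streams; tune to what the source server tolerates

def export_partition(i: int) -> None:
    """Export the i-th modulo slice of the table to its own file."""
    query = f"SELECT * FROM db.dbo.mytable WHERE [key] % {PARTITIONS} = {i}"
    cmd = [
        "bcp", query,
        "queryout", rf"C:\export\mytable_{i}.dat",
        "-S", "myserver", "-U", "myuser", "-P", "mypassword",
        "-N",
    ]
    subprocess.run(cmd, check=True)

# Threads (rather than processes) are enough here: each bcp invocation is
# its own OS process, so Python merely waits while the copies run concurrently.
with ThreadPoolExecutor(max_workers=PARTITIONS) as pool:
    list(pool.map(export_partition, range(PARTITIONS)))
```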
