The data I wanted to download was about 45,000,000 rows by 38 columns, in an Oracle 11g database. When I tried to download it with chunksize=50000 in Python, I got
DatabaseError: ORA-01555: snapshot too old
and the error always occurred at the same point in the chunked download (code (1), at the 9th session).
However, when I downloaded the same data separately by date, it downloaded without any error (code (2)).
At first I guessed the ORA-01555 error occurred because downloading with a chunksize of 50000 took too long. But the data for a single day was about 300,000 rows, which is bigger than the 50,000-row chunk size, and those per-day downloads still succeeded. Can anyone share their opinion on what is going on?
My code is below.
(1) Download by chunk size
import gc
import pandas as pd

# con is an existing DB-API connection (e.g., cx_Oracle), created elsewhere

# SQL query: one cursor over the whole month, fetched 50,000 rows at a time
tmp = pd.read_sql_query(
    "SELECT * FROM Database "
    "WHERE TIME_STAMP BETWEEN TO_TIMESTAMP('2019-12-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') "
    "AND TO_TIMESTAMP('2019-12-31 23:59:59', 'YYYY-MM-DD HH24:MI:SS')",
    con=con, chunksize=50000)
gen = map(pd.DataFrame, tmp)  # each chunk is already a DataFrame; this wrapper is a no-op

# Download by chunk; every 100 chunks (one "session") is written to its own CSV
cnt = 0
flag = 0
for index, data in enumerate(gen):
    if flag == 0 or index == 0:
        df = pd.DataFrame(data)
        flag = 1
    elif index % 100 != 0:
        df = pd.concat([df, pd.DataFrame(data)], axis=0)
    elif index % 100 == 0:
        df = pd.concat([df, pd.DataFrame(data)], axis=0)
        cnt = cnt + 1
        # Save as CSV file (backslash in the Windows path must be escaped)
        df.to_csv("{0}{1}{2}".format("D:\\Data", cnt, ".csv"), header=True, index=False)
        print("Index: ", index)
        print("Shape: ", df.shape)
        print("==========={}th Session Ended==============".format(cnt))
        # Flush out memory
        del df
        gc.collect()
        flag = 0
(The error always appeared at the 9th session.)
(2) Download by date
# pandas/gc imported and con created as in code (1)
for i in range(1, 32):
    date = "2019-12-{:02d}".format(i)  # zero-pad the day to match the 'DD' mask
    print("============ {} ============".format(date))

    # SQL query: one short-lived query per day
    tmp = pd.read_sql_query(
        "SELECT * FROM Database "
        "WHERE TIME_STAMP BETWEEN TO_TIMESTAMP('{0} 00:00:00', 'YYYY-MM-DD HH24:MI:SS') "
        "AND TO_TIMESTAMP('{0} 23:59:59', 'YYYY-MM-DD HH24:MI:SS')".format(date),
        con=con)

    # Save as CSV file
    tmp.to_csv("D:\\ByDate\\{}.csv".format(date), header=True, index=False)

    # Flush out memory
    del tmp
    gc.collect()
Answer
When you launch your query, Oracle must keep a logically consistent view of the data from that moment until the query finishes. As the underlying data changes through user transactions, Oracle records the deltas in rollback segments (undo space). If too much data changes and the undo space is exhausted, Oracle can no longer maintain that consistent view of your requested data and has to abandon the query, which is the error you're seeing. See here: http://www.dba-oracle.com/t_ora_01555_snapshot_old.htm
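If you can check (or ask a DBA to check) how undo is configured, it will tell you how much headroom you have. A minimal sketch, assuming the same con connection as in the question and privileges to read v$parameter (both assumptions):

import pandas as pd

# undo_retention is the time (in seconds) Oracle tries to keep committed
# undo data available for read consistency; undo_tablespace names the
# tablespace whose size bounds how much undo can actually be kept.
undo = pd.read_sql_query(
    "SELECT name, value FROM v$parameter "
    "WHERE name IN ('undo_management', 'undo_retention', 'undo_tablespace')",
    con=con)
print(undo)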
Unless you can modify the database to allow more undo space for transactions, you will likely have to run your query when the database is idle, or take your data in much smaller, shorter-lived queries.
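In practice that means doing what your code (2) already does, just with a tunable window. A minimal sketch, assuming the same table and con connection as in the question; each window is its own short-lived query, so Oracle only has to hold a read-consistent snapshot for one window instead of the whole month:

import pandas as pd
from datetime import datetime, timedelta

start = datetime(2019, 12, 1)
end = datetime(2020, 1, 1)
window = timedelta(hours=6)  # smaller window = shorter-lived snapshot

sql = ("SELECT * FROM Database "
       "WHERE TIME_STAMP >= TO_TIMESTAMP('{}', 'YYYY-MM-DD HH24:MI:SS') "
       "AND TIME_STAMP < TO_TIMESTAMP('{}', 'YYYY-MM-DD HH24:MI:SS')")

part = 0
t = start
while t < end:
    lo = t.strftime("%Y-%m-%d %H:%M:%S")
    hi = (t + window).strftime("%Y-%m-%d %H:%M:%S")
    # Each iteration opens and fully drains its own cursor, so no single
    # query runs long enough to exhaust the undo space.
    df = pd.read_sql_query(sql.format(lo, hi), con=con)
    df.to_csv("D:\\Data_part{:04d}.csv".format(part), header=True, index=False)
    part += 1
    t += window

The half-open interval (>= / <) also avoids the gap your BETWEEN ... 23:59:59 version leaves: rows with sub-second timestamps between 23:59:59 and midnight would otherwise be skipped.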