Edit: Solved! thanks to @kfinity.
AskTom suggests using select /*+ opt_param(‘_optimizer_use_feedback’ ‘false’) */ at the start of your query to disable the feedback usage. This has fixed the problem for me.
tldr: Adding a randomized comment to a query makes it run consistent, removing this comment breaks it.
Warning: long.
Environment
My preferred way of working is with queries in the source as string so that they are in version control and I can see changes over time. Along with that I use dapper
and the oracle.ManagedDataAccess
NuGet package. The application in question is a WPF app (in both cases) running on .NET framework 4.7.2. I’m using Visual studio professional 2017 15.9.5.
The problem
About a year back I encountered this problem with a query. I don’t remember which it was, I do know that I didn’t have the time to document it and post it here. I do now and I ran into the same problem. Back then I somehow figured out that if I restarted my PC or changed the query text, it would run fine again. Just occasionally the symptoms of the problem would show up, I’d add a comment to the query, or remove the one previously there, I would test that specific function every release. I’d test it everytime because if it was faulty on my machine, it would also be faulty on the target user machine. At the time I figured it was a driver/hardware issue with my pc. The way I figured out I could fix it by changing the query text is because I would cut and paste (ctrl-x ctrl-v
) the entire query from the code to Oracle developer
and edit it there. At some point I noticed even an extra whitespace or enter would make it work again.
Back to now, I’ve got the problem again. This time it’s different because it doesn’t fail occasionally anymore. It’s very consistent. Thinking back about how I figured this was a driver/hardware issue, I build and ran the application on 3 different machines, all the same results. I can run the query in Oracle developer
, using ctrl + end
to run the entire query, not just 50 lines. It runs in about 10 seconds. It runs consistently, over and over again, which makes sense if nothing changes.
If I run it from my application, it runs fine – but only once. It takes about 10 seconds as well. If I refresh the data, which runs the query again, it hangs. There are no exceptions, if I break the debugger it’s just chilling on the database.Query<>()
call. If I either restart my PC or change the query, it will run – exactly once.
v$session_longops/full table scan
Of course, I went to google for help. Found some interesting articles that didn’t really help me along:
Oracle inconsistent performance behaviour of query
Until I found this:
https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1191435335912
They mention v$session_longops
which supposedly gives us insight into long running operations.
select * from v$session_longops where time_remaining > 0
I ran this while I had just refreshed the data in the application, causing the query to run for a second time. I see this:
The first query ran fine, indexes where fine. The 2nd time it started up a full table scan. It doesn’t need this because it runs fine in the first time and in oracle developer
as well. As expected, if I leave it running (takes over 20 minutes) the table scan completes and I get the same result as the first time around.
I found that you can use explain plan for
to explain a query plan without using the GUI in Oracle developer
. This gave me 2 wildly different plans, the one in Oracle developer
always has this note: - 'PLAN_TABLE' is old version
. So I’m not sure that I can trust this information, I don’t know what to do with it.
The fix
Like I said before, adding or removing a comment or rather changing the query text fixes the problem and doesn’t start a full table scan, ever. I added a comment containing DateTime.Now
to the query so that it is always different. It consistently works. While technically this fixes the problem I think it is quite a ridiculous fix to an even more ridiculous problem. I’d rather know why this happens and if maybe I’m doing something else wrong. So the question is, why does a randomized comment fix my query?
The code
Some context: this is an ERP system, the query gets all the workorders that have a hierarchic structure, or that are just on themselves, combines them, then adds their required materials and some other information like their description. It only does this for workorders that are not closed yet.
SQL:
select --Hierarchic info, some aliases exceeded 30 chars and had to be shorted to current state hierarchic_workorders.ccn as HierarchicCcn , hierarchic_workorders.mas_loc as HierarchicMasLoc, hierarchic_workorders.wo_num as HierarchicWoNum, hierarchic_workorders.wo_line as HierarchicWoLine, wo.item as HierarchicItem, wo.revision as HierarchicRevision, wo_item.description as HierarchicDescription, wo_rtg.wc as HierarchicWorkCenter, hierarchic_workorders.startdate as HierarchicStartDate, hierarchic_workorders.mfgclosedate as HierarchicMfgClosedDate, hierarchic_workorders.chassisnumbers as HierarchicChassisNumbers, hierarchic_workorders.wo_level as HierarchicLevel, hierarchic_workorders.parent_wo_num as HierarchicParentWoNum, hierarchic_workorders.parent_wo_line as HierarchicParentWoLine, hierarchic_workorders.parent_wo_bom_useq as HierarchicParentwobomuseq, --wo bom info wo_bom.ccn as WoRtgCcn, wo_bom.mas_loc as WoRtgMasloc, wo_bom.wo_num as WoRtgWoNum, wo_bom.wo_line as WoRtgWoLine, wo_bom.wo_bom_useq as WoRtgWobomUseq, wo_bom.item as WoRtgItem, wo_bom.revision as WoRtgRevision, wo_bom_item.description as WoRtgDescription, wo_bom.bom_comp_qty as WoRtgBomCompQty, wo_bom.bom_commit as WoRtgCommit, wo_bom.backflush as WoRtgBackflush, wo_bom.issue_qty as WoRtgIssueQty, wo_bom.commit_qty as WoRtgCommitQty, wo_bom.reqd_qty as WoRtgReqdQty from live.wo_bom --=========================================================================================================================================================================== -- Maybe it's possible to remove this or the other min operation join in hierarchic_workorders, to make it faster - not sure if possible ==================================== --=========================================================================================================================================================================== left join( select wo_rtg_min_operation.min_operation, wo_rtg.* from live.wo_rtg left join( select ccn, mas_loc, wo_num, wo_line, lpad(to_char(min(to_number(trim(operation)))), 4, ' ') as min_operation from live.wo_rtg group by ccn, mas_loc, wo_num, wo_line )wo_rtg_min_operation on wo_rtg_min_operation.ccn = wo_rtg.ccn and wo_rtg_min_operation.mas_loc = wo_rtg.mas_loc and wo_rtg_min_operation.wo_num = wo_rtg.wo_num and wo_rtg_min_operation.wo_line = wo_rtg.wo_line ) wo_rtg on wo_rtg.ccn = wo_bom.ccn and wo_rtg.mas_loc = wo_bom.mas_loc and wo_rtg.wo_num = wo_bom.wo_num and wo_rtg.wo_line = wo_bom.wo_line --This case when is painfully slow but it can't really be cached or indexed and wo_rtg.operation = ( case when wo_bom.operation = ' ' then wo_rtg.min_operation else wo_bom.operation end ) --=========================================================================================================================================================================== -- Find all open MPS orders and highest hierarchic PRP orders. Having these be a subquery instead of the starting data drastically increases performance ======================== --=========================================================================================================================================================================== join( select ccn, mas_loc, wo_num, wo_line, startdate, mfgclosedate, chassisnumbers, wo_level, parent_wo_num, parent_wo_line, parent_wo_bom_useq from ( --=========================================================================================================================================================================== -- PRP ====================================================================================================================================================================== --=========================================================================================================================================================================== select 'PRP' as type, wowob.ccn, wowob.mas_loc, wowob.wo_num, wowob.wo_line, apssplit_min_operation.operation_start_date as startdate, wo.mfg_close_date as mfgclosedate, trim( trim(wo.user_alpha2) || ' ' || trim(wo.user_alpha3) || ' ' || trim(wo.chassis3) || ' ' || trim(wo.chassis4) || ' ' || trim(wo.chassis5) ) as chassisnumbers, level as wo_level, wowob.parent_wo_num, wowob.parent_wo_line, wowob.parent_wo_bom_useq from live.wowob join live.wo on wo.ccn = wowob.ccn and wo.mas_loc = wowob.mas_loc and wo.wo_num = wowob.wo_num and wo.wo_line = wowob.wo_line left join( select ccn, mas_loc, orderident, order_line, lpad(to_char(min(to_number(trim(operation)))), 4, ' ') as min_operation, operation_start_date from live.apssplit where schedule = 'SHOP' and order_type = 'W' group by ccn, mas_loc, orderident, order_line, operation_start_date ) apssplit_min_operation on apssplit_min_operation.ccn = wowob.ccn and apssplit_min_operation.mas_loc = wowob.mas_loc and apssplit_min_operation.orderident = wowob.wo_num and apssplit_min_operation.order_line = wowob.wo_line --Only select open wo's --Underlying wo's obviously have to start BEFORE their parents, we don't have to check them all for this reason. where apssplit_min_operation.operation_start_date is not null and apssplit_min_operation.operation_start_date < sysdate + :days_ahead --wo.mfg_close_date is null and --wo.fin_close_date is null and --wo.ord_qty - wo.scrap_qty - wo.complete_qty > 0 --and wo.start_date < sysdate + :days_ahead --and wowob.wo_num = ' 334594' --Grab the childs of only the highest parents. connect by prior wowob.ccn = wowob.ccn and prior wowob.mas_loc = wowob.mas_loc and prior wowob.wo_num = wowob.parent_wo_num and prior wowob.wo_line = wowob.parent_wo_line start with wowob.ccn || wowob.mas_loc || wowob.wo_num || wowob.wo_line in ( --Subquery to select all the highest hierarchic wowob's that are still open in wo. --Performance: --all: 21253 in ? --Open only: 174 in 0.155 seconds select wowob.ccn || wowob.mas_loc || wowob.wo_num || wowob.wo_line as wowob_key from live.wowob --Parent join left join live.wowob parent_wowob on wowob.ccn = parent_wowob.ccn and wowob.mas_loc = parent_wowob.mas_loc and wowob.parent_wo_num = parent_wowob.wo_num and wowob.parent_wo_line = parent_wowob.wo_line --end parent join where wowob.ccn = :ccn and wowob.mas_loc = :mas_loc and parent_wowob.ccn is null ) union all --=========================================================================================================================================================================== -- MPS ====================================================================================================================================================================== --=========================================================================================================================================================================== select 'MPS' as type, wo.ccn, wo.mas_loc, wo.wo_num, wo.wo_line, apssplit_min_operation.operation_start_date as startdate, wo.mfg_close_date as mfgclosedate, trim( trim(wo.user_alpha2) || ' ' || trim(wo.user_alpha3) || ' ' || trim(wo.chassis3) || ' ' || trim(wo.chassis4) || ' ' || trim(wo.chassis5) ) as chassisnumbers, 1 as wo_level, '' as parent_wo_num, '' as parent_wo_line, '' as parent_wo_bom_useq from live.wo join live.item_ccn on item_ccn.ccn = wo.ccn and item_ccn.item = wo.item and item_ccn.revision = wo.revision and item_ccn.mastsched = 'Y' --mps and item_ccn.planned = ' ' --mrp and item_ccn.prp = ' ' --NOT prp... left join( select ccn, mas_loc, orderident, order_line, lpad(to_char(min(to_number(trim(operation)))), 4, ' ') as min_operation, operation_start_date from live.apssplit where schedule = 'SHOP' and order_type = 'W' group by ccn, mas_loc, orderident, order_line, operation_start_date ) apssplit_min_operation on apssplit_min_operation.ccn = wo.ccn and apssplit_min_operation.mas_loc = wo.mas_loc and apssplit_min_operation.orderident = wo.wo_num and apssplit_min_operation.order_line = wo.wo_line --Only select open wo's --Underlying wo's obviously have to start BEFORE their parents, we don't have to check them all for this reason. where apssplit_min_operation.operation_start_date is not null and apssplit_min_operation.operation_start_date < sysdate + :days_ahead ) order by startdate ) hierarchic_workorders on hierarchic_workorders.ccn = wo_bom.ccn and hierarchic_workorders.mas_loc = wo_bom.mas_loc and hierarchic_workorders.wo_num = wo_bom.wo_num and hierarchic_workorders.wo_line = wo_bom.wo_line --=========================================================================================================================================================================== -- Descriptions from wo. wowob and wo_bom are different items and they have different descriptions. ========================================================================= --=========================================================================================================================================================================== left join live.wo on wo.ccn = hierarchic_workorders.ccn and wo.mas_loc = hierarchic_workorders.mas_loc and wo.wo_num = hierarchic_workorders.wo_num and wo.wo_line = hierarchic_workorders.wo_line left join live.item wo_item on wo_item.item = wo.item and wo_item.revision = wo.revision left join live.item wo_bom_item on wo_bom_item.item = wo_bom.item and wo_bom_item.revision = wo_bom.revision
C# (does NOT work):
using (IDbConnection database = new OracleConnection(_applicationSettings.OracleConnectionString)) { DynamicParameters parameters = new DynamicParameters(); parameters.Add("ccn", ccn); parameters.Add("mas_loc", masLoc); parameters.Add("days_ahead", daysAhead); return database.Query<HierarchicWoWoBom>(Query, parameters).ToList(); }
C# (does work consistently):
using (IDbConnection database = new OracleConnection(_applicationSettings.OracleConnectionString)) { DynamicParameters parameters = new DynamicParameters(); parameters.Add("ccn", ccn); parameters.Add("mas_loc", masLoc); parameters.Add("days_ahead", daysAhead); return database.Query<HierarchicWoWoBom>($"-- {DateTime.Now}: randomized comment so that this query keeps working.n" + Query, parameters).ToList(); }
Advertisement
Answer
I’m not sure this is really an answer, but it’s too long for a comment.
I think a fast first query followed by a slow second query often indicates a statistics/cardinality feedback issue.
Basically while running the query the first time, the optimizer might detect that the estimated cardinality (number of rows) from the current table/index statistics are pretty inaccurate, so it tries to cache more accurate stats for the next run of the same query. But sometimes that actually makes things much worse.
As a quick fix, AskTom suggests that you can try disabling that feature with the /*+ opt_param('_optimizer_use_feedback' 'false') */
hint, or use SQL Plan Management to save the good plan, as Ted mentioned above.
In the longer term, I think it might indicate that some of your statistics might be stale? You can narrow down the problem stats by doing cardinality tuning and looking for where in the plan the actual rows are a lot higher than the expected rows. The basic process is to run your query with the /*+ GATHER_PLAN_STATISTICS */
hint and then do SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST'));
to see the results.