I need to parse an HTML table:
<table> <tbody> <tr class="blue"><td>code</td><td>rate</td></tr> <tr class="gray_1"><td><span>USD</span><em>1</em></td><td>476.16</td></tr> <tr class="gray_2"><td><span>AUD</span><em>1</em></td><td>327.65</td></tr> <tr class="gray_9"><td><span>IRR</span><em>100</em></td><td>1.13</td></tr> <tr class="blue"><td>some comment</td><td>some comment</td></tr> <tr class="gray_1"><td><span>EUR</span><em>1</em></td><td>526.54</td></tr> </tbody> </table>
and as a result to get:
+------+----+----+------+ |class |code|em |value | +------+----+----+------+ |gray_1|USD | 1| 11.11| |gray_2|AUD | 1| 22.22| |gray_9|IRR | 100| 33.33| |gray_1|EUR | 1| 44.44|
I tried to do it like this:
with tbl as ( select xmltype(' <table> <tbody> <tr class="blue"><td>code</td><td>rate</td></tr> <tr class="gray_1"><td><span>USD</span><em>1</em></td><td>476.16</td></tr> <tr class="gray_2"><td><span>AUD</span><em>1</em></td><td>327.65</td></tr> <tr class="gray_9"><td><span>IRR</span><em>100</em></td><td>1.13</td></tr> <tr class="blue"><td>some comment</td><td>some comment</td></tr> <tr class="gray_1"><td><span>EUR</span><em>1</em></td><td>526.54</td></tr> </tbody> </table> ') xml_data from dual ) select * from tbl, xmltable('//table/tbody/tr' passing tbl.xml_data columns data varchar2(128) path './td' )
But this causes an exception:
ORA-19279: XPTY0004 – XQuery dynamic type mismatch: expected singleton sequence – got multi-item sequence
What do I need to do to fix my code?
Advertisement
Answer
Your path
is looking for a td
under the tr
; but there are two, hence the “got multi-item sequence” error you’re seeing. You can reference each td
tag by its position, as td[1]
etc. It’s very reliant on the table structure being as expected though.
With this specific example you can do:
with tbl as ( select xmltype(' <table> <tbody> <tr class="blue"><td>code</td><td>rate</td></tr> <tr class="gray_1"><td><span>USD</span><em>1</em></td><td>476.16</td></tr> <tr class="gray_2"><td><span>AUD</span><em>1</em></td><td>327.65</td></tr> <tr class="gray_9"><td><span>IRR</span><em>100</em></td><td>1.13</td></tr> <tr class="blue"><td>some comment</td><td>some comment</td></tr> <tr class="gray_1"><td><span>EUR</span><em>1</em></td><td>526.54</td></tr> </tbody> </table> ') xml_data from dual ) select x.class, x.currency, x.amount, to_number(x.rate) as rate from tbl cross join xmltable('/table/tbody/tr' passing tbl.xml_data columns class varchar2(10) path '@class', currency varchar2(3) path 'td[1]/span', amount number path 'td[1]/em', rate varchar2(50) path 'td[2]' ) x where x.currency is not null
which gets:
CLASS CUR AMOUNT RATE ---------- --- ---------- ---------- gray_1 USD 1 476.16 gray_2 AUD 1 327.65 gray_9 IRR 100 1.13 gray_1 EUR 1 526.54
It won’t take much variation in the HTML to break it though. See this answer for some reasons it is fragile, and why it generally considered unwise to try to parse HTML as XML.