Skip to content
Advertisement

Shapiro-Wilk test

In PL/SQL, I want to calculate the Shapiro-Wilk value of my selected data.

Apparently, the function DBMS_STAT_FUNCS.normal_dist_fit is able to do it, but it takes a table name as a parameter, and not the result of a select.

The values I am working with are numbers, selected after many inner joins. Here is a sample of my values for a typical test (each value are only one number, I only regrouped them here for readability) :

-1,1168954372406006 -1,0339429378509521 -1,0194162130355835 -,9636680483818054 -,9536418914794922 -,9111286401748657 -,851311981678009 -,8325300216674805 -,8051750063896179 -,738100528717041 -,7174761295318604 -,6651638746261597 -,663613498210907 -,6444216966629028 -,6267942190170288 -,6180349588394165 -,6027824282646179 -,5999149084091187 -,5908389687538147 -,590206503868103 -,5845686793327332 -,5831132531166077 -,5629676580429077 -,5572993159294128 -,5481508374214172 -,5445670485496521 -,5093156099319458 -,49224603176116943 -,47992199659347534 -,4758097231388092 -,4546264410018921 -,38799363374710083 -,3803306519985199 -,3734436631202698 -,36905646324157715 -,27978914976119995 -,2048187553882599 -,16521787643432617

Advertisement

Answer

You could create a view based on your query. (“View” just means a named query, whose code is saved for future use; a view does not contain any data. So even if you would rather not create tables on a production server, it may be OK to create views.) Then you can use this view in a call to the goodness-of-fit procedure.

First let’s clarify that DBMS_STAT_FUNCS.normal_dist_fit is not a function, as might be suggested by the FUNCS in the package name (and as you say in your post), but a procedure. In fact, all the subprograms in that package are procedures!

The documentation for the package is wrong.

https://docs.oracle.com/database/121/ARPLS/d_stat_f.htm#ARPLS68476

It says that mean and stddev are IN parameters. They are not; they are OUT parameters. It is not really clear why they are needed in this procedure (there are separate functions to compute them), but again, that is not my choice, it’s Oracle’s.

Here is a brief example of using the normal_dist_fit procedure on a physical table (from a different schema: the standard HR schema). Note that I must have the select any table privilege, or some other privilege that allows me to select from tables in the HR schema. (I am not logged in as HR, but as another user, MATHGUY.)

declare
  mn number;
  sd number;
  sw number;
begin
  dbms_stat_funcs.normal_dist_fit('HR', 'EMPLOYEES', 'SALARY',
                                  'SHAPIRO_WILKS', mn, sd, sw);
end;
/

W value : .8739562109117848523112862359519603805821

PL/SQL procedure successfully completed.

(Note that the argument for the Shapiro-Wilk test is ‘SHAPIRO_WILKS’, with an S at the end; perhaps Oracle had a good reason for this, although I doubt it.)

Now, let’s say in my schema I have this query, which produces a column of numbers:

select salary * (1 + nvl(commission_pct, 0)) as total_compensation 
from   hr.employees
;

This computes total compensation, including commission (for those employees who have one; the NVL call is needed because for employees without a commission, the percentage is shown as NULL rather than zero).

I will create a view based on this query… but I will need to have the needed privileges, and there is just a slight complication.

So far I assumed I can select from a table in the HR schema; indeed, I have the SELECT ANY TABLE privilege. But, in my case, I have that privilege through a role. That’s OK for selecting, but not OK for creating a view based on that SELECT. To create a view, I must have SELECT ANY TABLE (or a much weaker grant: to select specifically from that one table, or perhaps restricted even further to just the two columns I need) granted to me directly, not through a role. This is something you need to discuss with your DBA; it’s a privilege you already have (otherwise your existing SELECT wouldn’t work), but if you have it through a role, now you will need to have it granted directly to you. And, of course, you must have the CREATE VIEW privilege.

OK, so suppose you have the right privileges. Then you can do this:

Create the view

create or replace view my_emp(total_comp) 
as 
    select salary*(1 + nvl(commission_pct, 0)) from hr.employees;

Invoke the procedure

declare
  mn number;
  sd number;
  sw number;
begin
  dbms_stat_funcs.normal_dist_fit('MATHGUY', 'MY_EMP', 'TOTAL_COMP', 
                                  'SHAPIRO_WILKS', mn, sd, sw);
end;
/

W value : .8852586932906502861798487994791857389177
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement