I am using Apache Derby DB (SQL) version 10.14 and this is the Derby Reference Manual: https://db.apache.org/derby/docs/10.14/ref/refderby.pdf
I am trying to create a star schema and am currently working on the region dimension table.
These are the tables I am using: DWH_PRICE_PAID_RECORDS
TRANSACTION_ID PRICE DATE_OF_TRANSFER PROPERTY_TYPE OLD_NEW DURATION TOWN_CITY DISTRICT COUNTY PPDCATEGORY_TYPE RECORD_TYPE {FDD12C8B-5A02-4B3F-8C67-9BC523DC780B} 71000 15.09.2000 00:00 D N F SCUNTHORPE NORTH LINCOLNSHIRE NORTH LINCOLNSHIRE A A {70F7F480-4A9A-4FEB-A58A-2B964605BFD2} 97000 01.08.2002 00:00 F N L MANCHESTER SALFORD GREATER MANCHESTER A A {64D48FA9-8C85-49D6-AF5A-23FABDDB4FEB} 104000 17.01.2006 00:00 S N L ROCHDALE ROCHDALE GREATER MANCHESTER A A {F0316F65-E375-4DC4-BCDF-3FDC054ADE9C} 188500 18.05.2015 00:00 S N F KIDDERMINSTER MALVERN HILLS WORCESTERSHIRE A A {2EC5A85B-7BEF-4127-B3D0-6B416899CAEB} 180000 07.05.1999 00:00 S N F KINGSTON UPON THAMES KINGSTON UPON THAMES GREATER LONDON A A {21E5FEB7-A62E-2439-E050-A8C06205342E} 55000 28.08.2015 00:00 T N F MOUNTAIN ASH RHONDDA CYNON TAFF RHONDDA CYNON TAFF B A {3E0330F0-0F44-8D89-E050-A8C062052140} 77000 30.08.2016 00:00 T N F WALLASEY WIRRAL MERSEYSIDE A A {D43A8B4A-6272-4706-9189-30F8E24EDF13} 210000 23.05.2007 00:00 S N F BRISTOL NORTH SOMERSET NORTH SOMERSET A A {3575DAF5-0E80-408F-9970-FDF5D1475E73} 185000 16.11.2007 00:00 S N F CREWKERNE SOUTH SOMERSET SOMERSET A A {A4246390-61F4-4228-BC82-79D3F369CA34} 32700 12.12.1996 00:00 F N L SOUTHAMPTON SOUTHAMPTON SOUTHAMPTON A A CREATE TABLE "DWH_PRICE_PAID_RECORDS" ("TRANSACTION_ID" VARCHAR(50) NOT NULL, "PRICE" INTEGER, "DATE_OF_TRANSFER" DATE NOT NULL, "PROPERTY_TYPE" CHAR(1), "OLD_NEW" CHAR(1), "DURATION" CHAR(1), "TOWN_CITY" VARCHAR(50), "DISTRICT" VARCHAR(50), "COUNTY" VARCHAR(50), "PPDCATEGORY_TYPE" CHAR(1), "RECORD_TYPE" CHAR(1));
Postcode Eastings Northings Latitude Longitude Town Region UK_Region Country_Code Country_String AB10 392900 804900 5,713,514 -211,731 Aberdeen Aberdeen City Scotland SCT Scotland AB13 385600 801900 5,710,801 -223,776 Milltimber Aberdeen City Scotland SCT Scotland AB14 383600 801100 5,710,076 -227,073 Peterculter Aberdeen City Scotland SCT Scotland AB21 387900 813200 572,096 -220,033 Aberdeen Airport Aberdeen City Scotland SCT Scotland AB22 392800 810700 5,718,724 -211,913 Bridge of Don Aberdeen City Scotland SCT Scotland AB30 370900 772900 5,684,678 -247,712 Laurencekirk Aberdeenshire Scotland SCT Scotland AB31 368100 798300 5,707,479 -252,623 Banchory Aberdeenshire Scotland SCT Scotland AB32 380800 807200 5,715,545 -231,742 Westhill Aberdeenshire Scotland SCT Scotland AB33 355200 815100 5,722,464 -274,203 Alford Aberdeenshire Scotland SCT Scotland AB34 350800 800600 5,709,393 -281,204 Aboyne Aberdeenshire Scotland SCT Scotland CREATE TABLE "DWH_POSTCODES" ("POSTCODE_ID" INTEGER generated always as identity (start with 1 increment by 1), "POSTCODE" VARCHAR(10), "EASTINGS" VARCHAR(50), "NORTHINGS" VARCHAR(50), "LATITUDE" VARCHAR(10), "LONGITUDE" VARCHAR(10), "TOWN" VARCHAR(50) NOT NULL, "REGION" VARCHAR(50), "UK_REGION" VARCHAR(50), "COUNTRY_CODE" VARCHAR(20), "COUNTRY_STRING" VARCHAR(20));
CREATE TABLE "DIM_REGION" ("REGION_ID" INTEGER generated always as identity (start with 1 increment by 1), "TOWN" VARCHAR(30), "COUNTY" VARCHAR(30), "DISTRICT" VARCHAR(30), "LATITUDE" VARCHAR(10), "LONGITUDE" VARCHAR(10), "COUNTRY_STRING" VARCHAR(20));
First I inserted Town(unique), County and District:
INSERT INTO DIM_REGION (TOWN, County, District) SELECT town_city, MAX(county), MAX(district) FROM DWH_PRICE_PAID_RECORDS GROUP BY town_city;
Town has to be unique because I dont have anything else as identifier to match/link the tables. The DIM_REGION table now has 938 unique town records and I want to “UPDATE” the remaining columns “Latitude”, “Longitude” and “COUNTRY_STRING”. The DWH_POSTCODES table has 1637 unique town records which means the towns are different! Only 532 towns are matching and only those shall be updated in the DIM_REGION table.
I’m starting with just one column “Latitude” and this is what I would expect it to look like after the Update statement:
UPDATE DIM_REGION SET DIM_REGION.LATITUDE = (SELECT DWH_POSTCODES.LATITUDE from dim_region join dwh_postcodes on dim_region.town = dwh_postcodes.town where dim_region.town = dwh_postcodes.town);
but when I do that I get this error message (the error message is german and I had to translate, sorry):
ERROR 21000 a scalar subquery returned more than one value
I don’t understand why. If I commit just the select statement:
SELECT DWH_POSTCODES.LATITUDE from dim_region join dwh_postcodes on dim_region.town = dwh_postcodes.town where dim_region.town = dwh_postcodes.town
I get exactly what I expected to get and thats the whole column with all latitudes.
Thank you very much!
Advertisement
Answer
IBM Db2
and Apache Derby DB
are completely different products. So, remove the db2
tag.
You provided absolutely non-matched lists of towns in both tables. Was it so hard to prepare matched data sets?
If you don’t have duplicates in the dwh_postcodes.town
column, then update
should look like this (may be with the same subselect
in the EXISTS
predicate in WHERE
):
UPDATE DIM_REGION SET DIM_REGION.LATITUDE = ( SELECT --MAX ( DWH_POSTCODES.LATITUDE --) from dwh_postcodes where dim_region.town = dwh_postcodes.town -- fetch first 1 row only ) /* WHERE EXISTS ( SELECT 1 from dwh_postcodes where dim_region.town = dwh_postcodes.town ) */ ;
Otherwise you have to use some aggregate function (or the fetch first
clause if Derby supports it) to make this subselect
return only 1 row as shown commented out in the example above.
Alternatively, you could do 1 INSERT instead of INSERT and UPDATE to fill all 4 columns:
INSERT INTO DIM_REGION (TOWN, County, District, latitude) SELECT p.town_city, p.county, p.district, d.latitude FROM ( SELECT town_city, MAX(county) county, MAX(district) district FROM DWH_PRICE_PAID_RECORDS GROUP BY town_city ) p join dwh_postcodes d on p.town_city = d.town;