I am running queries in a large IBM DB2 database table (let’s call it T) and have found that the cells for column Identifier tend to be padded not just on the margins, but in between as well, as in: ‘ ID1 ID2 ‘. I do not have rights to update this DB, nor would I, given a number of factors. However, I want a way to ignore the whitespace AT LEAST on the left and right, even if I need to simply add a couple of spaces in between. The following queries work, but are slow, upwards of 20 seconds slow….
SELECT * FROM T WHERE Identifier LIKE '%ID1%ID2%'; SELECT * FROM T WHERE TRIM(Identifier) LIKE 'ID1%ID2'; SELECT * FROM T WHERE TRIM(Identifier) = 'ID1 ID2'; SELECT * FROM T WHERE LTRIM(RTRIM(Identifier)) = 'ID1 ID2'; SELECT * FROM T WHERE LTRIM(Identifier) LIKE 'ID1 ID2%'; SELECT * FROM T WHERE LTRIM(Identifier) LIKE 'ID1%ID2%'; SELECT * FROM T WHERE RTRIM(Identifier) LIKE '%ID1 ID2'; SELECT * FROM T WHERE RTRIM(Identifier) LIKE '%ID1%ID2';
Trying to query something like “Select * FROM T WHERE REPLACE(Identifier, ‘ ‘, ”)…” of course just freezes up Access until I Ctrl+Break to end the operation. Is there a better, more efficient way to ignore the whitespace?
================================
UPDATE: As @Paul Vernon describes below, “Trailing spaces are ignored in Db2 for comparison purpose, so you only need to consider the leading and embedded spaces.”
This led me to generate combinations of spaces before ‘ID1’ and ‘ID2’ and select the records using the IN clause. The number of combinations means that the query is slower than if I knew the exact match. This is how it looks in my Java code with Jdbc (edited to make it more generic to the key issue):
private static final int MAX_LENGTH = 30; public List<Parts> queryMyTable(String ID1, String ID2) { String query="SELECT * FROM MYTABLE WHERE ID IN (:ids)"; final Map<String, List<String>> parameters = getIDCombinations(ID1, ID2); return namedJdbcTemplate.query(query,parameters,new PartsMapper()); } public static List<String> getIDCombinations(String ID1, String ID2) { List<String> combinations = new ArrayList<>(); final int literalLength = ID1.length() + ID2.length(); final int maxWhitespace = MAX_LENGTH - literalLength; combinations.add(ID1+ID2); for(int x = 1; x <= maxWhitespace; x++){ String xSpace = String.format("%1$"+x+"s", ""); String idZeroSpaceBeforeBase = String.format("%s%s%s",ID1,xSpace,ID2); String idZeroSpaceAfterBase = String.format("%s%s%s",xSpace,ID1,ID2); combinations.add(idZeroSpaceBeforeBase); combinations.add(idZeroSpaceAfterBase); for(int y = 1; (x+y) <= maxWhitespace; y++){ String ySpace = String.format("%1$"+y+"s", ""); String id = String.format("%s%s%s%s",xSpace,ID1,ySpace,ID2); combinations.add(id); } } return combinations; }
Advertisement
Answer
Trailing spaces are ignored in Db2 for comparison purpose, so you only need to consider the leading and embedded spaces.
Assuming there is an index on the Identifier
, your only option (if you can’t change the data, or add a functional index or index a generated column), is probably something like this
SELECT * FROM T WHERE Identifier = 'ID1 ID2' OR Identifier = ' ID1 ID2' OR Identifier = ' ID1 ID2' OR Identifier = 'ID1 ID2' OR Identifier = ' ID1 ID2' OR Identifier = ' ID1 ID2'
which the Db2 optimize might implement as 6 index lookups, which would be faster than a full index or table scan
You could also try this
SELECT * FROM T WHERE Identifier LIKE 'ID1 %ID2' OR Identifier LIKE ' ID1 %ID2' OR Identifier LIKE ' ID1 %ID2'
which the Db2 optimize might implement as 3 index range scans,
In both examples add more lines to cover the maximum number of leading spaces you have in your data if needed. In the first example add more lines for the embeded spaces too if needed