Efficient way to ignore whitespace in DB2?

I am running queries in a large IBM DB2 database table (let’s call it T) and have found that the cells for column Identifier tend to be padded not just on the margins, but in between as well, as in: ‘ ID1 ID2 ‘. I do not have rights to update this DB, nor would I, given a number of factors. However, I want a way to ignore the whitespace AT LEAST on the left and right, even if I need to simply add a couple of spaces in between. The following queries work, but are slow, upwards of 20 seconds slow….

SELECT * FROM T WHERE Identifier LIKE '%ID1%ID2%';
SELECT * FROM T WHERE TRIM(Identifier) LIKE 'ID1%ID2';
SELECT * FROM T WHERE TRIM(Identifier) = 'ID1  ID2';
SELECT * FROM T WHERE LTRIM(RTRIM(Identifier)) = 'ID1  ID2';
SELECT * FROM T WHERE LTRIM(Identifier) LIKE 'ID1  ID2%';
SELECT * FROM T WHERE LTRIM(Identifier) LIKE 'ID1%ID2%';
SELECT * FROM T WHERE RTRIM(Identifier) LIKE '%ID1  ID2';
SELECT * FROM T WHERE RTRIM(Identifier) LIKE '%ID1%ID2';

​x
 
SELECT * FROM T WHERE Identifier LIKE '%ID1%ID2%';SELECT * FROM T WHERE TRIM(Identifier) LIKE 'ID1%ID2';SELECT * FROM T WHERE TRIM(Identifier) = 'ID1  ID2';SELECT * FROM T WHERE LTRIM(RTRIM(Identifier)) = 'ID1  ID2';SELECT * FROM T WHERE LTRIM(Identifier) LIKE 'ID1  ID2%';SELECT * FROM T WHERE LTRIM(Identifier) LIKE 'ID1%ID2%';SELECT * FROM T WHERE RTRIM(Identifier) LIKE '%ID1  ID2';SELECT * FROM T WHERE RTRIM(Identifier) LIKE '%ID1%ID2';​

Trying to query something like “Select * FROM T WHERE REPLACE(Identifier, ‘ ‘, ”)…” of course just freezes up Access until I Ctrl+Break to end the operation. Is there a better, more efficient way to ignore the whitespace?

================================

UPDATE: As @Paul Vernon describes below, “Trailing spaces are ignored in Db2 for comparison purpose, so you only need to consider the leading and embedded spaces.”

This led me to generate combinations of spaces before ‘ID1’ and ‘ID2’ and select the records using the IN clause. The number of combinations means that the query is slower than if I knew the exact match. This is how it looks in my Java code with Jdbc (edited to make it more generic to the key issue):

    private static final int MAX_LENGTH = 30;

    public List<Parts> queryMyTable(String ID1, String ID2) {
        String query="SELECT * FROM MYTABLE WHERE ID IN (:ids)";
        final Map<String, List<String>> parameters = getIDCombinations(ID1, ID2);
        return namedJdbcTemplate.query(query,parameters,new PartsMapper());
    }


    public static List<String> getIDCombinations(String ID1, String ID2) {
        List<String> combinations = new ArrayList<>();
        final int literalLength = ID1.length() + ID2.length();
        final int maxWhitespace = MAX_LENGTH - literalLength;
        combinations.add(ID1+ID2);
        for(int x = 1; x <= maxWhitespace; x++){
            String xSpace = String.format("%1$"+x+"s", "");
            String idZeroSpaceBeforeBase = String.format("%s%s%s",ID1,xSpace,ID2);
            String idZeroSpaceAfterBase = String.format("%s%s%s",xSpace,ID1,ID2);
            combinations.add(idZeroSpaceBeforeBase);
            combinations.add(idZeroSpaceAfterBase);
            for(int y = 1; (x+y) <= maxWhitespace; y++){
                String ySpace = String.format("%1$"+y+"s", "");
                String id = String.format("%s%s%s%s",xSpace,ID1,ySpace,ID2);
                combinations.add(id);
            }
        }
        return combinations;
    }

 
    private static final int MAX_LENGTH = 30;​    public List<Parts> queryMyTable(String ID1, String ID2) {        String query="SELECT * FROM MYTABLE WHERE ID IN (:ids)";        final Map<String, List<String>> parameters = getIDCombinations(ID1, ID2);        return namedJdbcTemplate.query(query,parameters,new PartsMapper());    }​​    public static List<String> getIDCombinations(String ID1, String ID2) {        List<String> combinations = new ArrayList<>();        final int literalLength = ID1.length() + ID2.length();        final int maxWhitespace = MAX_LENGTH - literalLength;        combinations.add(ID1+ID2);        for(int x = 1; x <= maxWhitespace; x++){            String xSpace = String.format("%1$"+x+"s", "");            String idZeroSpaceBeforeBase = String.format("%s%s%s",ID1,xSpace,ID2);            String idZeroSpaceAfterBase = String.format("%s%s%s",xSpace,ID1,ID2);            combinations.add(idZeroSpaceBeforeBase);            combinations.add(idZeroSpaceAfterBase);            for(int y = 1; (x+y) <= maxWhitespace; y++){                String ySpace = String.format("%1$"+y+"s", "");                String id = String.format("%s%s%s%s",xSpace,ID1,ySpace,ID2);                combinations.add(id);            }        }        return combinations;    }​

Answer

Trailing spaces are ignored in Db2 for comparison purpose, so you only need to consider the leading and embedded spaces.

Assuming there is an index on the Identifier, your only option (if you can’t change the data, or add a functional index or index a generated column), is probably something like this

SELECT * FROM T
WHERE
    Identifier = 'ID1 ID2'
OR  Identifier = ' ID1 ID2'
OR  Identifier = '  ID1 ID2'
OR  Identifier = 'ID1  ID2'
OR  Identifier = ' ID1  ID2'
OR  Identifier = '  ID1  ID2'

 
SELECT * FROM TWHERE    Identifier = 'ID1 ID2'OR  Identifier = ' ID1 ID2'OR  Identifier = '  ID1 ID2'OR  Identifier = 'ID1  ID2'OR  Identifier = ' ID1  ID2'OR  Identifier = '  ID1  ID2'​

which the Db2 optimize might implement as 6 index lookups, which would be faster than a full index or table scan

You could also try this

SELECT * FROM T
WHERE
    Identifier LIKE 'ID1 %ID2'
OR  Identifier LIKE ' ID1 %ID2'
OR  Identifier LIKE '  ID1 %ID2'

 
SELECT * FROM TWHERE    Identifier LIKE 'ID1 %ID2'OR  Identifier LIKE ' ID1 %ID2'OR  Identifier LIKE '  ID1 %ID2'​

which the Db2 optimize might implement as 3 index range scans,

In both examples add more lines to cover the maximum number of leading spaces you have in your data if needed. In the first example add more lines for the embeded spaces too if needed

Advertisement

Answer