I have a telegram bot (Aiogram, Python), which is connected to PostgreSQL database.
If user enters a some username (in Cyrillic, Russian or Ukrainian), bot should return full information about this user, stored in database.
The problem is: when bot making query to the database, with username for example “Сергій”, it returns nothing, although information exists in database.
SELECT * FROM users WHERE username = 'Сергій';
I found out that problem is in symbols like ‘c’ or ‘i’, that looks similar, but have different hex code in UTF-8. If we’ll change query this way all works fine
SELECT * FROM users WHERE username LIKE '_ерг_й';
I’ve already changed database encoding to UTF-8, but still have this problem. If i copy username directly from database and paste it into bot’s input field – it works.
Here’s main fragments of my code:
@dp.message_handler() async def username_input_handler(message: Message): username = message.text answer_text = await db.search_by_username(username) await message.answer(text=answer_text) # database class class Database(): ... async def search_by_username(username): query = "SELECT * FROM users WHERE username LIKE $1" user = await self.pool.execute(query, username) return user ...
UPDATE:
Problem occurs only when I am querying the ‘users’ table. This table’s data was imported from a .txt
file.
Running the same queries against another table works fine.
Advertisement
Answer
If the users enter the exact same characters as are in the database, the query will work, but obviously they aren’t. You probably need similarity search:
CREATE EXTENSION pg_trgm; CREATE INDEX ON users USING gist (username gist_trgm_ops); SELECT * FROM users ORDER BY username <-> 'Сергій' LIMIT 1;
This will probably not be satisfactory for short names like Сергій
. If both С
and і
are the wrong character, there is only one trigram left to match: ерг
. And this may not find the closest match that you are looking for.
You may need to write a function cyrillic_normalize
that replaces every non-Cyrillic letter with the “closest” Cyrillic letter, like
CREATE FUNCTION cyrillic_normalize(text) RETURNS text LANGUAGE sql AS $$SELECT translate ($1, 'Ci', 'Сі')$$;
where the first string contains the Latin letters and the second the corresponding Cyrillic ones.
Then change your query to
WHERE cyrillic_normalize(username) = cyrillic_normalize('Сергій')