This question may like a duplicate question but I’m giving it back because I couldn’t perfect a solution.
Sorry for that.
Now I’m working on a drug search engine. The user cannot always enter the exact drug name. Depending on the characters entered by the user, I need to get a better result for them. But now it is not. How should I do to do result more accurately? For example, there is information in DB: Анальгин, Цитрамон-П, Лирика-75мг
, etc. … (in Cyrillic). If the user enters in the Latin alphabet, the system converts it to the Cyrillic alphabet and searches through it. So, I need that: Even if the user enters words in Цитрамон, cитрамон, ииттрамон or cтрамон
, the result should return Цитрамон. Or if user enter: Лирика, ЛЛЛирика, иииррика, Лика or Лирик
, the result should return Лирика. Or Аналгин, Анагин ...
should returns Анальгин.
I should get similar results. I tried several search algorithms and packages (Laravel-Searchy, Laravel Scout, TNTSearch and etc) but to no avail. Now a search is working but the results are returning the same as in a SQL LIKE operator. Also, I tried with Levenstein distance as follow as:
My search variable:
$data = []; //dd($request->all()); $search = _ltc($request->qry); // convert input text between latin and crylic $data = Drug::selectRaw('*, levenshtein(?, `name`) as `diff`', [$search]) ->havingBetween('diff', [0,4]) ->limit(10) ->get();
And my SQL levenshtein function:
CREATE DEFINER = 'root'@'localhost' FUNCTION arzondori.levenshtein(s1 VARCHAR(255), s2 VARCHAR(255) ) RETURNS INT(11) DETERMINISTIC BEGIN DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT; DECLARE s1_char CHAR; -- max strlen=255 DECLARE cv0, cv1 VARBINARY(256); SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = 0x00, j = 1, i = 1, c = 0; IF s1 = s2 THEN RETURN 0; ELSEIF s1_len = 0 THEN RETURN s2_len; ELSEIF s2_len = 0 THEN RETURN s1_len; ELSE WHILE j <= s2_len DO SET cv1 = CONCAT(cv1, UNHEX(HEX(j))), j = j + 1; END WHILE; WHILE i <= s1_len DO SET s1_char = SUBSTRING(s1, i, 1), c = i, cv0 = UNHEX(HEX(i)), j = 1; WHILE j <= s2_len DO SET c = c + 1; IF s1_char = SUBSTRING(s2, j, 1) THEN SET cost = 0; ELSE SET cost = 1; END IF; SET c_temp = CONV(HEX(SUBSTRING(cv1, j, 1)), 16, 10) + cost; IF c > c_temp THEN SET c = c_temp; END IF; SET c_temp = CONV(HEX(SUBSTRING(cv1, j+1, 1)), 16, 10) + 1; IF c > c_temp THEN SET c = c_temp; END IF; SET cv0 = CONCAT(cv0, UNHEX(HEX(c))), j = j + 1; END WHILE; SET cv1 = cv0, i = i + 1; END WHILE; END IF; RETURN c; END
I also considered the following options.:
$data = Drug::select("id", "name") ->where('name', 'LIKE', "%$search%") ->get(); $data5 = Searchy::search('drugs') ->fields('name') ->query($search) ->select('id', 'name') ->get();
Can anyone help find a solution to this problem? Sorry in advance if the question is a little confusing! Thanks!
Advertisement
Answer
This will be very hard to solve in the way you are trying. You need a full text search engine like Sphinx or elastic search. That has support for languages and fuzzy search. More info about it here: https://en.wikipedia.org/wiki/Full-text_search
I recommend Sphinx: http://sphinxsearch.com/docs/sphinx3.html#features-overview
The documentation is however very heavy. Another option is elasticsearch that has very good documentation.
Try to invent this your self will be very tricky and you will take time to get good result.