I have a table where some of the texts are atrociously big. I want to make sure that no row in the query output exceeds, say, 100,000 characters. How do I do that?
Here is a quick sample:
WITH large_texts AS (
  (SELECT 'humongous text goes here' AS text, '1' AS id)
  UNION ALL
  (SELECT 'small one' AS text, '2' AS id)
  UNION ALL
  (SELECT 'and another big one over here' AS text, '3' AS id)
)
SELECT * FROM large_texts
Let’s say I want the output text column to be less than 10 characters, so I need this result:
+----+------------+
| id | text       |
+----+------------+
| 1  | humongous  |
| 1  | text goes  |
| 1  | here       |
| 2  | small one  |
| 3  | and anothe |
| 3  | r big one  |
| 3  | over here  |
+----+------------+
It would be even better if I could also avoid splitting in the middle of words.
Answer
It would be even better if I could also avoid splitting in the middle of words.
Consider the approach below:
create temp function split_with_limit(text STRING, len FLOAT64)
returns ARRAY<STRING>
language js AS r"""
  let input = text.trim().split(' ');
  let [index, output] = [0, []];
  output[index] = '';
  input.forEach(word => {
    // append the word to the current chunk if the result stays under the limit,
    // otherwise start a new chunk with this word
    let temp = `${output[index]} ${word}`.trim();
    if (temp.length < len) {
      output[index] = temp;
    } else {
      index++;
      output[index] = word;
    }
  });
  return output;
""";
select id, small_chunk
from yourtable_with_large_texts,
unnest(split_with_limit(text, 10)) small_chunk with offset
order by id, offset
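To try it end to end, you can swap yourtable_with_large_texts for the sample CTE from your question (keeping the create temp function statement above it):

create temp function split_with_limit(text STRING, len FLOAT64)
returns ARRAY<STRING>
language js AS r"""
  /* same function body as above */
""";
with large_texts as (
  select 'humongous text goes here' as text, '1' as id union all
  select 'small one' as text, '2' as id union all
  select 'and another big one over here' as text, '3' as id
)
select id, small_chunk
from large_texts,
unnest(split_with_limit(text, 10)) small_chunk with offset
order by id, offset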
If applied to the sample data in your question, the output is

+----+-------------+
| id | small_chunk |
+----+-------------+
| 1  | humongous   |
| 1  | text goes   |
| 1  | here        |
| 2  | small one   |
| 3  | and         |
| 3  | another     |
| 3  | big one     |
| 3  | over here   |
+----+-------------+
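Note that the UDF never splits inside a word, so a single word longer than the limit stays whole; the limit is a soft cap. If you instead need a hard character cap and mid-word splits are acceptable (as in the desired result in your question), a pure-SQL sketch without a UDF can cut on fixed 10-character boundaries; it assumes the same yourtable_with_large_texts table as above:

-- hard cap: one row per fixed-width 10-character slice; words may be split
select
  id,
  trim(substr(text, (chunk - 1) * 10 + 1, 10)) as small_chunk
from yourtable_with_large_texts,
unnest(generate_array(1, cast(ceil(length(text) / 10) as int64))) as chunk
order by id, chunk

On the sample data this reproduces the table from the question, including the 'and anothe' / 'r big one' split; trim() only removes the stray space when a slice boundary lands on one.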