qufe.texthandler module

Text processing utilities for string manipulation, formatting, and analysis.

This module provides functions for:

- Converting lists to DokuWiki table format
- Finding string occurrences with context
- Pretty-printing nested dictionaries
- Extracting substrings between delimiters
- Displaying items in column format
- Extracting a number from a string and returning it as a float

qufe.texthandler.extract_number(text: str, strict: bool = True) → float

Extracts a number from text and returns it as float.

Gracefully handles various text formats containing numbers, including those with commas, currency symbols, and other characters.

Parameters:

text – String containing numeric information
strict – If True, raises ValueError when no number is found. If False, returns 0.0 when no number is found (default: True)

Returns:

Extracted number

Return type:

float

Raises:

ValueError – When strict=True and no number is found, or when multiple decimal points exist

Examples

>>> extract_number("1,234.56원")
1234.56
>>> extract_number("₩ 1_234_567")
1234567.0
>>> extract_number("2,500 items")
2500.0
>>> extract_number("Score: 98.5%")
98.5
>>> extract_number("text only", strict=False)
0.0
>>> extract_number("text only", strict=True)
Traceback (most recent call last):
    ...
ValueError: No number found

Test Examples:

# strict=True (default) tests
test_cases = [
    ("1,234.56원", 1234.56),
    ("₩ 2,500", 2500.0),
    ("1_000_000", 1000000.0),
    ("3,456.78 (including tax)", 3456.78),
    ("USD 99.99 [discounted]", 99.99),
    ("Score: 85.5", 85.5),
    ("Temperature: -12.3°C", 12.3),  # Note: minus sign not preserved
]

# strict=False tests
lenient_cases = [
    ("1,234.56원", 1234.56),
    ("no number here", 0.0),
    ("text only", 0.0),
    ("", 0.0),
    (None, 0.0),
    ("100 items", 100.0),
    ("free (0원)", 0.0),
]

# Always error cases (regardless of strict mode)
error_cases = [
    "1.234.56",  # multiple decimal points
]
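
The documented behaviour can be approximated with a short regular-expression sketch. This is not the actual qufe implementation: extract_number_sketch is a hypothetical name, and the approach (discard everything except ASCII digits and dots, then validate) is an assumption chosen because it agrees with the examples and test cases listed for this function.

```python
import re
from typing import Optional


def extract_number_sketch(text: Optional[str], strict: bool = True) -> float:
    """Hypothetical stand-in for extract_number, not the library code."""
    # Keep only ASCII digits and dots; commas, underscores, currency
    # symbols, minus signs and letters are all discarded.
    cleaned = re.sub(r"[^0-9.]", "", text or "")

    # More than one decimal point is an error in both modes.
    if cleaned.count(".") > 1:
        raise ValueError("Multiple decimal points")

    # No digit left means no number was found.
    if not any(ch.isdigit() for ch in cleaned):
        if strict:
            raise ValueError("No number found")
        return 0.0

    return float(cleaned)


print(extract_number_sketch("1,234.56원"))
print(extract_number_sketch("text only", strict=False))
```

Because the minus sign is stripped along with every other non-digit character, the sketch also drops the sign of negative values, as noted in the test cases.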

qufe.texthandler.find_all_occurrences(input_string: str, str_to_find: str, print_len: bool = True) → List[int]

Find all starting positions of a substring in a string.

Parameters:

input_string – The string to search in
str_to_find – The substring to find
print_len – Whether to print the number of occurrences found

Returns:

List of starting positions where the substring was found

Example

>>> find_all_occurrences("hello world hello", "hello")
occurrences found: 2
[0, 12]
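
A search like this can be sketched with repeated calls to str.find. The helper name find_positions is hypothetical, and counting overlapping matches is an assumption; the documentation above does not say how the library treats overlaps.

```python
from typing import List


def find_positions(haystack: str, needle: str) -> List[int]:
    """Hypothetical helper: start index of every match of needle."""
    if not needle:
        # An empty needle would match at every index; return nothing instead.
        return []
    positions = []
    start = haystack.find(needle)
    while start != -1:
        positions.append(start)
        # Advance by one character so overlapping matches are also found.
        start = haystack.find(needle, start + 1)
    return positions


print(find_positions("hello world hello", "hello"))
```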

qufe.texthandler.list_to_doku_wiki_table(data: List[List[str]]) → None

Convert a 2D list to DokuWiki table format and print it.

The first row is treated as headers (with ^ delimiters), subsequent rows are treated as data (with | delimiters).

Parameters:

data – 2D list where first row contains headers

Example

>>> data = [['Name', 'Age'], ['Alice', '25'], ['Bob', '30']]
>>> list_to_doku_wiki_table(data)
^ Name ^ Age ^
| Alice | 25 |
| Bob | 30 |
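
The row format shown above can be produced with a few lines of string joining. The sketch below uses a hypothetical helper, doku_wiki_lines, which returns the lines instead of printing them; it illustrates the header and data delimiters and is not the library's own code.

```python
from typing import List


def doku_wiki_lines(rows: List[List[str]]) -> List[str]:
    """Hypothetical helper: build DokuWiki table lines from a 2D list."""
    lines = []
    for index, row in enumerate(rows):
        # The first row is the header row and uses ^; all others use |.
        sep = "^" if index == 0 else "|"
        cells = f" {sep} ".join(str(cell) for cell in row)
        lines.append(f"{sep} {cells} {sep}")
    return lines


data = [["Name", "Age"], ["Alice", "25"], ["Bob", "30"]]
print("\n".join(doku_wiki_lines(data)))
```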

qufe.texthandler.print_dict(data: Dict[str, Any] | List[Any], depth: int = 0, indent: int = 2, max_depth: int = 99) → None

Pretty-print nested dictionaries and lists with indentation.

Parameters:

data – Dictionary or list to print
depth – Current depth level (for recursion)
indent – Number of spaces per indentation level
max_depth – Maximum depth to print (prevents infinite recursion)

Example

>>> data = {'key1': ['item1', 'item2'], 'key2': {'nested': 'value'}}
>>> print_dict(data)
* key1
- item1
- item2
* key2
  * nested
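
The role of depth, indent and max_depth can be illustrated with a small recursive sketch. format_nested is a hypothetical helper that returns lines rather than printing them. Its bullet characters and its choice to omit scalar values stored under dictionary keys are assumptions modelled on the example above, so the exact layout may differ from what print_dict prints.

```python
from typing import Any, List


def format_nested(data: Any, depth: int = 0, indent: int = 2,
                  max_depth: int = 99) -> List[str]:
    """Hypothetical helper: depth-limited walk over dicts and lists."""
    if depth > max_depth:
        # Stop descending once the depth limit is exceeded.
        return []
    pad = " " * (indent * depth)
    lines: List[str] = []
    if isinstance(data, dict):
        for key, value in data.items():
            lines.append(f"{pad}* {key}")
            # Only containers are expanded; scalar values are not shown.
            if isinstance(value, (dict, list)):
                lines.extend(format_nested(value, depth + 1, indent, max_depth))
    elif isinstance(data, list):
        for item in data:
            if isinstance(item, (dict, list)):
                lines.extend(format_nested(item, depth + 1, indent, max_depth))
            else:
                lines.append(f"{pad}- {item}")
    return lines


data = {"key1": ["item1", "item2"], "key2": {"nested": "value"}}
print("\n".join(format_nested(data)))
```

Passing max_depth=0 to this sketch keeps only the top-level keys, which shows how the limit bounds the recursion.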

qufe.texthandler.print_if_found(input_string: str, str_to_find: str, len_to_print: int, do_print: bool = True, print_empty: bool = False) → List[str]

Find occurrences of a substring and print surrounding context.

Parameters:

input_string – The string to search in
str_to_find – The substring to find
len_to_print – Total length of context to show around each occurrence
do_print – Whether to print the results
print_empty – Whether to print a message if nothing is found

Returns:

List of context strings around each occurrence

Example

>>> print_if_found("hello world hello", "world", 10)
llo world
['llo world ']

qufe.texthandler.print_in_columns(items: List[Any], num_cols: int = 2, add_spaces: int = 2, return_type: str = '') → List[str] | List[tuple] | None

Display items in a column format with proper alignment.

Parameters:

items – List of items to display
num_cols – Number of columns
add_spaces – Additional spaces between columns
return_type – Return format (‘raw’ for tuples, any other string for formatted strings)

Returns:

None (prints output), list of formatted strings, or list of tuples

Example

>>> print_in_columns(['a', 'b', 'c', 'd'], num_cols=2)
a  c
b  d
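
The example output fills the columns top to bottom before moving right. A minimal sketch of that column-major layout follows; format_columns is a hypothetical helper that returns the formatted lines, and padding every cell to the longest item plus add_spaces is an assumption about how the alignment is computed.

```python
import math
from typing import Any, List


def format_columns(items: List[Any], num_cols: int = 2,
                   add_spaces: int = 2) -> List[str]:
    """Hypothetical helper: lay items out column by column."""
    cells = [str(item) for item in items]
    if not cells:
        return []
    num_rows = math.ceil(len(cells) / num_cols)
    # Every column is as wide as the longest item plus the extra spacing.
    width = max(len(cell) for cell in cells) + add_spaces
    lines = []
    for row in range(num_rows):
        # Column-major order: cell (row, col) sits at index col * num_rows + row.
        row_cells = [cells[col * num_rows + row]
                     for col in range(num_cols)
                     if col * num_rows + row < len(cells)]
        lines.append("".join(cell.ljust(width) for cell in row_cells).rstrip())
    return lines


print("\n".join(format_columns(["a", "b", "c", "d"], num_cols=2)))
```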

qufe.texthandler.substring_between(input_string: str, start_string: str, end: str | int, start_offset: int = 0) → List[str]

Extract substrings between start and end markers.

Parameters:

input_string – The string to extract from
start_string – The starting delimiter
end – The ending delimiter (string) or length (int)
start_offset – Offset to apply before the start position

Returns:

List of extracted substrings

Example

>>> substring_between("start{content}end start{more}end", "start{", "}", 0)
['content}', 'more}']
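
The two forms of the end parameter (delimiter string or fixed length) can be sketched as follows. between_sketch is a hypothetical helper, not the library function. It keeps the closing delimiter in each result because the documented example output does, and it leaves out start_offset, whose exact semantics are not spelled out here.

```python
from typing import List, Union


def between_sketch(text: str, start: str, end: Union[str, int]) -> List[str]:
    """Hypothetical helper: text following each start marker."""
    if not start:
        raise ValueError("start must not be empty")
    results = []
    pos = text.find(start)
    while pos != -1:
        begin = pos + len(start)
        if isinstance(end, int):
            # Integer end: take a fixed number of characters.
            stop = begin + end
        else:
            end_pos = text.find(end, begin)
            if end_pos == -1:
                break
            # Keep the delimiter, matching the documented example output.
            stop = end_pos + len(end)
        results.append(text[begin:stop])
        # Continue searching after the extracted span.
        pos = text.find(start, stop)
    return results


print(between_sketch("start{content}end start{more}end", "start{", "}"))
```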