qufe.texthandler module
Text processing utilities for string manipulation, formatting, and analysis.
This module provides functions for:
- Converting lists to DokuWiki table format
- Finding string occurrences with context
- Pretty-printing nested dictionaries
- Extracting substrings between delimiters
- Displaying items in column format
- Extracting a number from a string and returning it as a float
- qufe.texthandler.extract_number(text: str, strict: bool = True) → float
Extracts a number from text and returns it as float.
Gracefully handles various text formats containing numbers, including those with commas, currency symbols, and other characters.
- Parameters:
text – String containing numeric information
strict – If True, raises ValueError when no number is found. If False, returns 0.0 when no number is found (default: True)
- Returns:
Extracted number
- Return type:
float
- Raises:
ValueError – When strict=True and no number is found, or when multiple decimal points exist
Examples
>>> extract_number("1,234.56원")
1234.56
>>> extract_number("₩ 1_234_567")
1234567.0
>>> extract_number("2,500 items")
2500.0
>>> extract_number("Score: 98.5%")
98.5
>>> extract_number("text only", strict=False)
0.0
>>> extract_number("text only", strict=True)
ValueError: No number found
- Test Examples:
# strict=True (default) tests
test_cases = [
    ("1,234.56원", 1234.56),
    ("₩ 2,500", 2500.0),
    ("1_000_000", 1000000.0),
    ("3,456.78 (including tax)", 3456.78),
    ("USD 99.99 [discounted]", 99.99),
    ("Score: 85.5", 85.5),
    ("Temperature: -12.3°C", 12.3),  # Note: minus sign not preserved
]
# strict=False tests
lenient_cases = [
    ("1,234.56원", 1234.56),
    ("no number here", 0.0),
    ("text only", 0.0),
    ("", 0.0),
    (None, 0.0),
    ("100 items", 100.0),
    ("free (0원)", 0.0),
]
# Always error cases (regardless of strict mode)
error_cases = [
    "1.234.56",  # multiple decimal points
]
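The documented behaviour can be approximated with a small standalone function. This is only a sketch under stated assumptions, not the library's implementation: the name `extract_number_sketch` is hypothetical, and the approach (drop everything except ASCII digits and decimal points) was chosen because it reproduces the test cases listed above, including the lost minus sign.

```python
import re


def extract_number_sketch(text, strict=True):
    """Hypothetical stand-in that mirrors the documented test cases."""
    # Keep only ASCII digits and dots; commas, underscores, currency
    # symbols, signs, and letters are all discarded. None becomes "".
    cleaned = re.sub(r"[^0-9.]", "", text or "")

    # More than one dot is treated as an error in both modes.
    if cleaned.count(".") > 1:
        raise ValueError("Multiple decimal points")

    # No digits left means no number was present.
    if not any(ch.isdigit() for ch in cleaned):
        if strict:
            raise ValueError("No number found")
        return 0.0

    return float(cleaned)
```

Because the minus sign is stripped along with every other non-digit character, `"Temperature: -12.3°C"` yields `12.3`, which matches the note in the test cases.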
- qufe.texthandler.find_all_occurrences(input_string: str, str_to_find: str, print_len: bool = True) → List[int]
Find all starting positions of a substring in a string.
- Parameters:
input_string – The string to search in
str_to_find – The substring to find
print_len – Whether to print the number of occurrences found
- Returns:
List of starting positions where the substring was found
Example
>>> find_all_occurrences("hello world hello", "hello")
occurrences found: 2
[0, 12]
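One common way to collect every starting position is to call `str.find` repeatedly, resuming one character past each hit. The sketch below uses the hypothetical name `find_all_sketch`, omits the `print_len` output, and assumes overlapping matches should be counted; the documentation does not say how the library treats overlaps.

```python
def find_all_sketch(input_string, str_to_find):
    """Hypothetical stand-in: return every start index of str_to_find."""
    positions = []
    start = input_string.find(str_to_find)
    while start != -1:
        positions.append(start)
        # Resume one character later so overlapping matches are found too
        # (an assumption; the real function may skip past the whole match).
        start = input_string.find(str_to_find, start + 1)
    return positions


print(find_all_sketch("hello world hello", "hello"))  # [0, 12]
```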
- qufe.texthandler.list_to_doku_wiki_table(data: List[List[str]]) → None
Convert a 2D list to DokuWiki table format and print it.
The first row is treated as headers (with ^ delimiters), subsequent rows are treated as data (with | delimiters).
- Parameters:
data – 2D list where first row contains headers
Example
>>> data = [['Name', 'Age'], ['Alice', '25'], ['Bob', '30']]
>>> list_to_doku_wiki_table(data)
^ Name ^ Age ^
| Alice | 25 |
| Bob | 30 |
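The header and data row rule described above can be sketched as follows. The function name `doku_wiki_table_sketch` is hypothetical, and unlike the documented function it returns the table as a string instead of printing it, which makes the result easier to inspect.

```python
def doku_wiki_table_sketch(data):
    """Hypothetical stand-in: build DokuWiki table markup from a 2D list."""
    lines = []
    for row_index, row in enumerate(data):
        # The first row is the header row and uses ^; all others use |.
        sep = "^" if row_index == 0 else "|"
        cells = f" {sep} ".join(str(cell) for cell in row)
        lines.append(f"{sep} {cells} {sep}")
    return "\n".join(lines)


print(doku_wiki_table_sketch([["Name", "Age"], ["Alice", "25"], ["Bob", "30"]]))
```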
- qufe.texthandler.print_dict(data: Dict[str, Any] | List[Any], depth: int = 0, indent: int = 2, max_depth: int = 99) → None
Pretty-print nested dictionaries and lists with indentation.
- Parameters:
data – Dictionary or list to print
depth – Current depth level (for recursion)
indent – Number of spaces per indentation level
max_depth – Maximum depth to print (prevents infinite recursion)
Example
>>> data = {'key1': ['item1', 'item2'], 'key2': {'nested': 'value'}}
>>> print_dict(data)
* key1
  - item1
  - item2
* key2
  * nested
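A recursive walk that reproduces the markers in the example (`*` for dictionary keys, `-` for list items) might look like the sketch below. The name `format_nested_sketch` is hypothetical, and it returns lines instead of printing them. Two points are assumptions inferred from the example rather than from the library source: each level is indented by `indent` spaces, and scalar values stored under a dictionary key are not shown.

```python
def format_nested_sketch(data, depth=0, indent=2, max_depth=99):
    """Hypothetical stand-in: return the lines a pretty-printer could emit."""
    # Stop descending once the depth limit is exceeded.
    if depth > max_depth:
        return []
    pad = " " * (indent * depth)
    lines = []
    if isinstance(data, dict):
        for key, value in data.items():
            lines.append(f"{pad}* {key}")
            # Scalars fall through both branches below and add nothing,
            # which matches the example where 'value' is not shown.
            lines.extend(format_nested_sketch(value, depth + 1, indent, max_depth))
    elif isinstance(data, list):
        for item in data:
            if isinstance(item, (dict, list)):
                lines.extend(format_nested_sketch(item, depth + 1, indent, max_depth))
            else:
                lines.append(f"{pad}- {item}")
    return lines


data = {"key1": ["item1", "item2"], "key2": {"nested": "value"}}
print("\n".join(format_nested_sketch(data)))
```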
- qufe.texthandler.print_if_found(input_string: str, str_to_find: str, len_to_print: int, do_print: bool = True, print_empty: bool = False) → List[str]
Find occurrences of a substring and print surrounding context.
- Parameters:
input_string – The string to search in
str_to_find – The substring to find
len_to_print – Total length of context to show around each occurrence
do_print – Whether to print the results
print_empty – Whether to print a message if nothing is found
- Returns:
List of context strings around each occurrence
Example
>>> print_if_found("hello world hello", "world", 10)
llo world 
['llo world ']
- qufe.texthandler.print_in_columns(items: List[Any], num_cols: int = 2, add_spaces: int = 2, return_type: str = '') → List[str] | List[tuple] | None
Display items in a column format with proper alignment.
- Parameters:
items – List of items to display
num_cols – Number of columns
add_spaces – Additional spaces between columns
return_type – Return format ('raw' for tuples, any other non-empty string for formatted strings; with the default '', the output is printed and None is returned)
- Returns:
None (prints output), list of formatted strings, or list of tuples
Example
>>> print_in_columns(['a', 'b', 'c', 'd'], num_cols=2)
a  c
b  d
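The example shows items filling the first column top to bottom before moving to the next. A minimal sketch of that column-major layout follows; the name `columns_sketch` is hypothetical, `return_type` is not modelled, and the width rule (longest item plus `add_spaces`) is an assumption.

```python
import math


def columns_sketch(items, num_cols=2, add_spaces=2):
    """Hypothetical stand-in: lay items out column by column."""
    texts = [str(item) for item in items]
    # Number of rows needed so that num_cols columns hold every item.
    num_rows = math.ceil(len(texts) / num_cols)
    # Assumed width rule: longest item plus the extra spacing.
    width = max((len(t) for t in texts), default=0) + add_spaces
    rows = []
    for r in range(num_rows):
        # Every num_rows-th item, starting at r, belongs to row r.
        cells = texts[r::num_rows]
        rows.append("".join(cell.ljust(width) for cell in cells).rstrip())
    return rows


print("\n".join(columns_sketch(["a", "b", "c", "d"], num_cols=2)))
```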
- qufe.texthandler.substring_between(input_string: str, start_string: str, end: str | int, start_offset: int = 0) → List[str]
Extract substrings between start and end markers.
- Parameters:
input_string – The string to extract from
start_string – The starting delimiter
end – The ending delimiter (string) or length (int)
start_offset – Offset to apply before the start position
- Returns:
List of extracted substrings
Example
>>> substring_between("start{content}end start{more}end", "start{", "}", 0)
['content}', 'more}']
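The two forms of `end` (a delimiter string or a fixed length) can be illustrated with the sketch below. The name `substring_between_sketch` is hypothetical and `start_offset` is left out because its exact semantics are not spelled out here. The end delimiter is kept in each result only because the documented example output includes it; the library's actual slicing rule is not confirmed.

```python
def substring_between_sketch(input_string, start_string, end):
    """Hypothetical stand-in: extract text after each start marker."""
    results = []
    search_from = 0
    while True:
        start = input_string.find(start_string, search_from)
        if start == -1:
            break
        content_start = start + len(start_string)
        if isinstance(end, int):
            # Integer end: take a fixed number of characters.
            content_end = content_start + end
        else:
            end_pos = input_string.find(end, content_start)
            if end_pos == -1:
                break
            # Keep the end marker, as in the documented example output.
            content_end = end_pos + len(end)
        results.append(input_string[content_start:content_end])
        # Always advance so the loop terminates even on empty matches.
        search_from = max(content_end, start + 1)
    return results


text = "start{content}end start{more}end"
print(substring_between_sketch(text, "start{", "}"))  # ['content}', 'more}']
print(substring_between_sketch(text, "start{", 3))    # ['con', 'mor']
```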