qufe.dbhandler module

Database handlers for PostgreSQL and SQLite with unified interface.

This module provides handlers for both PostgreSQL and SQLite databases with a common base class for consistent usage patterns.

Required dependencies:
  • For PostgreSQL: pip install qufe[database]

  • For SQLite: No additional dependencies (uses standard library)

class qufe.dbhandler.BaseDBHandler[source]

Bases: ABC

Abstract base class for database handlers.

Provides common interface and shared functionality for different database implementations.

__enter__()[source]

Context manager entry.

__exit__(exc_type, exc_val, exc_tb)[source]

Context manager exit.

__init__()[source]

Initialize base handler.

abstractmethod connect() None[source]

Establish database connection.

abstractmethod disconnect() None[source]

Close database connection.

execute_many(query: str, params_list: List[tuple]) None[source]

Execute same query multiple times with different parameters.

Parameters:
  • query – SQL query string with placeholders

  • params_list – List of parameter tuples

abstractmethod execute_query(query: str, params: tuple | None = None) List[source]

Execute a query and return results.

Parameters:
  • query – SQL query string

  • params – Query parameters for safe execution

Returns:

List of query results

table_exists(table_name: str) bool[source]

Check if a table exists in the database.

Parameters:

table_name – Name of the table to check

Returns:

True if table exists, False otherwise

transaction()[source]

Context manager for database transactions.

Automatically commits on success, rolls back on error.

class qufe.dbhandler.PostgreSQLHandler(db_name: str | None = None, user: str | None = None, password: str | None = None, host: str | None = None, port: int | None = None)[source]

Bases: BaseDBHandler

PostgreSQL connection handler with automatic environment variable support.

Extends BaseDBHandler for PostgreSQL-specific functionality.

__init__(db_name: str | None = None, user: str | None = None, password: str | None = None, host: str | None = None, port: int | None = None)[source]

Initialize PostgreSQL connection handler.

Parameters:
  • db_name – Database name (defaults to POSTGRES_DB env var)

  • user – Username (defaults to POSTGRES_USER env var)

  • password – Password (defaults to POSTGRES_PASSWORD env var)

  • host – Host address (defaults to POSTGRES_HOST or ‘localhost’)

  • port – Port number (defaults to POSTGRES_PORT or 5432)

connect() None[source]

Establish PostgreSQL connection.

disconnect() None[source]

Close PostgreSQL connection.

execute_query(query: str, params: tuple | None = None) List[source]

Execute a SQL query and return results.

Parameters:
  • query – SQL query string

  • params – Query parameters (not used in current implementation)

Returns:

List of query results for SELECT queries, empty list for DDL/DML

get_database_list(print_result: bool = False) List[str][source]

Get list of all databases in the PostgreSQL server.

Parameters:

print_result – Whether to print the database list

Returns:

List of database names

get_table_list(print_result: bool = True) List[Dict[str, str]][source]

Get list of all tables and views in the current database.

Parameters:

print_result – Whether to print the table list

Returns:

List of dictionaries containing table information

class qufe.dbhandler.SQLiteHandler(db_path: str | Path, create_dir: bool = False)[source]

Bases: BaseDBHandler

SQLite database handler for local file-based databases.

Provides convenient methods for common SQLite operations in Jupyter notebook environments.

__init__(db_path: str | Path, create_dir: bool = False)[source]

Initialize SQLite handler.

Parameters:
  • db_path – Path to SQLite database file

  • create_dir – Whether to create parent directory if missing

classmethod analyze_scan_results(results: Dict[str, Any], top_n: int = 10, sort_by: str = 'rows', show_unmatched: bool = False) None[source]

Analyze and display detailed scan results.

Parameters:
  • results – Results from scan_databases

  • top_n – Number of top files to show

  • sort_by – Sort criterion (‘rows’, ‘tables’, ‘size’, ‘match_rate’)

  • show_unmatched – Whether to show unmatched tables (default: False)

connect() None[source]

Establish SQLite connection.

create_table(table_name: str, columns: Dict[str, str]) None[source]

Create a table with specified columns.

Parameters:
  • table_name – Name of the table to create

  • columns – Dictionary of column_name: data_type

classmethod describe(db_path: str) None[source]

Show complete database structure.

Parameters:

db_path – Path to SQLite database

disconnect() None[source]

Close SQLite connection.

execute_query(query: str, params: tuple | None = None) List[source]

Execute a SQL query and return results.

Parameters:
  • query – SQL query string

  • params – Query parameters for safe execution

Returns:

List of query results for SELECT queries, empty list for DDL/DML

classmethod export_scan_results(results: Dict[str, Any], output_path: str, format: str = 'csv') None[source]

Export scan results to file.

Parameters:
  • results – Results from scan_databases

  • output_path – Output file path

  • format – Export format (‘csv’, ‘json’, ‘pickle’)

get_table_info(table_name: str) List[Dict[str, Any]][source]

Get column information for a table with multiple fallback methods.

Tries multiple approaches in order: 1. PRAGMA table_info (standard SQLite metadata) 2. Parse CREATE TABLE statement from sqlite_master 3. Infer from sample data using cursor.description

Parameters:

table_name – Name of the table

Returns:

  • cid: Column ID (position)

  • name: Column name

  • type: Data type (may be ‘UNKNOWN’ if inferred)

  • notnull: Whether NOT NULL constraint exists (0 or 1)

  • dflt_value: Default value

  • pk: Whether primary key (0 or 1)

Return type:

List of column information dictionaries with keys

get_table_schema(table_name: str, verbose: bool = False) Dict[str, Any][source]

Get comprehensive table schema information.

Provides detailed information about table structure including the method used to obtain the information.

Parameters:
  • table_name – Name of the table

  • verbose – Whether to include additional diagnostic info

Returns:

  • columns: List of column information

  • method: Method used to obtain info (‘pragma’, ‘schema’, ‘inferred’)

  • accuracy: Confidence level (‘high’, ‘medium’, ‘low’)

  • warnings: List of any warnings

Return type:

Dictionary containing

get_tables() List[str][source]

Get list of all tables in the database.

Returns:

List of table names

insert_data(table_name: str, data: Dict[str, Any]) None[source]

Insert a single row of data into a table.

Parameters:
  • table_name – Name of the table

  • data – Dictionary of column_name: value

classmethod quick_peek(db_path: str, max_rows: int = 5) None[source]

Quick preview of database contents without creating instance.

Parameters:
  • db_path – Path to SQLite database

  • max_rows – Maximum rows to show per table

classmethod quick_scan(folder_path: str, **kwargs) Dict[str, Any][source]

Convenience method for quick database scanning.

Automatically detects Jupyter environment and enables appropriate display mode for the best user experience.

Parameters:
  • folder_path – Directory path to scan

  • **kwargs – Additional arguments for scan_databases

Returns:

Scan results dictionary

Example

>>> # Quick scan with automatic environment detection
>>> results = SQLiteHandler.quick_scan('/data/databases')
>>> # Quick scan with specific columns
>>> results = SQLiteHandler.quick_scan(
...     '/data',
...     required_columns=['column_a', 'column_b']
... )
classmethod read_all_dbs(folder_path: str, table_name: str, pattern: str = '*.db') Dict[str, List][source]

Read same table from multiple database files in a folder.

Parameters:
  • folder_path – Path to folder containing .db files

  • table_name – Name of table to read from each database

  • pattern – File pattern to match (default: “*.db”)

Returns:

Dictionary mapping database filename to data

classmethod read_by_date(db_path: str, table_name: str, date_column: str = 'timestamp', start: str | None = None, end: str | None = None) List[source]

Read data filtered by date range.

Parameters:
  • db_path – Path to SQLite database

  • table_name – Name of the table

  • date_column – Name of the date column

  • start – Start date (inclusive)

  • end – End date (inclusive)

Returns:

List of query results

read_table(table_name: str, columns: str = '*', where: str | None = None, limit: int | None = None) List[source]

Read data from a table with optional filtering.

Parameters:
  • table_name – Name of the table

  • columns – Columns to select (default: ‘*’)

  • where – WHERE clause (without ‘WHERE’ keyword)

  • limit – Maximum number of rows to return

Returns:

List of query results

classmethod scan_databases(folder_path: str, required_columns: List[str] | None = None, pattern: str = '**/*.db', verbose: bool = True, jupyter_mode: bool = False) Dict[str, Any][source]

Scan all SQLite databases in a folder for tables with specific columns.

Efficiently scans multiple database files to find tables containing specified columns and counts their rows. Useful for data exploration and migration planning.

Parameters:
  • folder_path – Directory path to scan for database files

  • required_columns – List of column names that tables must have (None means count all tables)

  • pattern – Glob pattern for finding database files (default: ‘**/*.db’)

  • verbose – Display progress and statistics (default: True)

  • jupyter_mode – Enable Jupyter notebook display mode (default: False)

Returns:

  • total_rows: Total number of rows across all matching tables

  • matching_tables: Count of tables with required columns

  • total_tables_scanned: Total number of tables examined

  • processed_files: Number of successfully processed files

  • error_files: Number of files with errors

  • details: List of detailed information per file

  • errors: List of error messages

  • summary: Processing summary statistics

Return type:

Dictionary containing

Example

>>> # Basic usage - find tables with specific columns
>>> results = SQLiteHandler.scan_databases(
...     folder_path='/data/databases',
...     required_columns=['Column_A', 'Column_B'],
...     verbose=True
... )
>>> # Scan all tables in all databases
>>> results = SQLiteHandler.scan_databases(
...     folder_path='/data',
...     pattern='**/*.sqlite',
...     required_columns=None
... )
>>> # Jupyter notebook with progress tracking
>>> results = SQLiteHandler.scan_databases(
...     folder_path='/data',
...     required_columns=['user_id', 'created_at'],
...     jupyter_mode=True
... )
table_exists(table_name: str) bool[source]

Check if a table exists.

Parameters:

table_name – Name of the table

Returns:

True if table exists

classmethod to_dataframe(db_path: str, table_name: str, where: str | None = None) Any | None[source]

Read table directly into pandas DataFrame.

Parameters:
  • db_path – Path to SQLite database

  • table_name – Name of the table

  • where – Optional WHERE clause

Returns:

pandas DataFrame or None if pandas not available

qufe.dbhandler.help()[source]

Display help information for database handler functionality.

Shows usage examples and available handlers.