data_structures Module API
This section documents the core data structures used throughout datashard.
Enums
FileFormat
ManifestContent
Classes
Schema
PartitionField
PartitionSpec
SortField
SortOrder
DataFile
- class datashard.DataFile(file_path: str, file_format: FileFormat, partition_values: Dict[str, Any], record_count: int, file_size_in_bytes: int, column_sizes: Dict[int, int] | None = None, value_counts: Dict[int, int] | None = None, null_value_counts: Dict[int, int] | None = None, lower_bounds: Dict[int, Any] | None = None, upper_bounds: Dict[int, Any] | None = None, key_metadata: bytes | None = None, checksum: str | None = None, split_offsets: List[int] | None = None, split_compressed_offsets: List[int] | None = None, equality_ids: List[int] | None = None, sort_order_id: int | None = None)
Bases: object
Represents a data file in the table
- file_format: FileFormat
- __init__(file_path: str, file_format: FileFormat, partition_values: Dict[str, Any], record_count: int, file_size_in_bytes: int, column_sizes: Dict[int, int] | None = None, value_counts: Dict[int, int] | None = None, null_value_counts: Dict[int, int] | None = None, lower_bounds: Dict[int, Any] | None = None, upper_bounds: Dict[int, Any] | None = None, key_metadata: bytes | None = None, checksum: str | None = None, split_offsets: List[int] | None = None, split_compressed_offsets: List[int] | None = None, equality_ids: List[int] | None = None, sort_order_id: int | None = None) -> None
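The signature above has five required arguments followed by optional per-column statistics that default to None. The sketch below illustrates that calling pattern with a local stand-in dataclass that mirrors the documented fields; it is not the library's implementation. The `FileFormat.PARQUET` member, the file path, and the column id `1` are assumptions for illustration, since this section does not list the enum's members.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, List, Optional


class FileFormat(Enum):
    # Hypothetical member: the real enum's members are not listed in this section.
    PARQUET = "parquet"


@dataclass
class DataFile:
    # Local stand-in mirroring the documented signature, not datashard.DataFile itself.
    file_path: str
    file_format: FileFormat
    partition_values: Dict[str, Any]
    record_count: int
    file_size_in_bytes: int
    column_sizes: Optional[Dict[int, int]] = None
    value_counts: Optional[Dict[int, int]] = None
    null_value_counts: Optional[Dict[int, int]] = None
    lower_bounds: Optional[Dict[int, Any]] = None
    upper_bounds: Optional[Dict[int, Any]] = None
    key_metadata: Optional[bytes] = None
    checksum: Optional[str] = None
    split_offsets: Optional[List[int]] = None
    split_compressed_offsets: Optional[List[int]] = None
    equality_ids: Optional[List[int]] = None
    sort_order_id: Optional[int] = None


# Required arguments first; statistics are passed by keyword and keyed by column id.
data_file = DataFile(
    file_path="data/part-00000.parquet",
    file_format=FileFormat.PARQUET,
    partition_values={"region": "eu"},
    record_count=1000,
    file_size_in_bytes=52_428,
    lower_bounds={1: 0},
    upper_bounds={1: 999},
)
```

With the real class, the same keyword arguments would presumably apply, with `FileFormat` imported from datashard instead of defined locally.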
DeleteFile
- class datashard.DeleteFile(file_path: str, file_format: FileFormat, partition_values: Dict[str, Any], record_count: int, file_size_in_bytes: int, content: ManifestContent = ManifestContent.DELETES)
Bases: object
Represents a delete file in the table
- file_format: FileFormat
- content: ManifestContent = 1
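The `content` attribute defaults to `ManifestContent.DELETES`, rendered above as `1`. A minimal sketch of that default, using local stand-ins rather than the library classes: the numeric values `DATA = 0` and `DELETES = 1` are inferred from the rendered defaults in this section, and the `FileFormat.PARQUET` member and file path are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict


class FileFormat(Enum):
    # Hypothetical member, for illustration only.
    PARQUET = "parquet"


class ManifestContent(Enum):
    # Values inferred from the rendered defaults (0 for DATA, 1 for DELETES).
    DATA = 0
    DELETES = 1


@dataclass
class DeleteFile:
    # Local stand-in mirroring the documented signature.
    file_path: str
    file_format: FileFormat
    partition_values: Dict[str, Any]
    record_count: int
    file_size_in_bytes: int
    content: ManifestContent = ManifestContent.DELETES


# Omitting `content` leaves it at the DELETES default.
delete_file = DeleteFile(
    file_path="data/delete-00000.parquet",
    file_format=FileFormat.PARQUET,
    partition_values={"region": "eu"},
    record_count=12,
    file_size_in_bytes=2_048,
)
```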
ManifestFile
- class datashard.ManifestFile(manifest_path: str, manifest_length: int, partition_spec_id: int, added_snapshot_id: int, added_data_files_count: int, existing_data_files_count: int, deleted_data_files_count: int, partitions: List[Dict[str, Any]], content: ManifestContent = ManifestContent.DATA, sequence_number: int | None = None, min_sequence_number: int | None = None)
Bases: object
Manifest file that lists data or delete files
- content: ManifestContent = 0
- __init__(manifest_path: str, manifest_length: int, partition_spec_id: int, added_snapshot_id: int, added_data_files_count: int, existing_data_files_count: int, deleted_data_files_count: int, partitions: List[Dict[str, Any]], content: ManifestContent = ManifestContent.DATA, sequence_number: int | None = None, min_sequence_number: int | None = None) -> None
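A manifest entry carries three file counts (added, existing, deleted) alongside its path and partition summaries. The sketch below builds one with a local stand-in that mirrors the documented signature and derives the number of files still referenced. It assumes the counts follow the usual Iceberg meaning, where added and existing files are live and deleted files are not; the manifest path, snapshot id, and partition summary shape are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, List, Optional


class ManifestContent(Enum):
    # Values inferred from the rendered defaults in this section.
    DATA = 0
    DELETES = 1


@dataclass
class ManifestFile:
    # Local stand-in mirroring the documented signature.
    manifest_path: str
    manifest_length: int
    partition_spec_id: int
    added_snapshot_id: int
    added_data_files_count: int
    existing_data_files_count: int
    deleted_data_files_count: int
    partitions: List[Dict[str, Any]]
    content: ManifestContent = ManifestContent.DATA
    sequence_number: Optional[int] = None
    min_sequence_number: Optional[int] = None


manifest = ManifestFile(
    manifest_path="metadata/manifest-0001.avro",
    manifest_length=4_096,
    partition_spec_id=0,
    added_snapshot_id=1001,
    added_data_files_count=3,
    existing_data_files_count=5,
    deleted_data_files_count=1,
    partitions=[{"region": "eu"}],
)

# Assumption: added and existing files are still live; deleted ones are not.
live_files = manifest.added_data_files_count + manifest.existing_data_files_count
```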
Snapshot
HistoryEntry
TableMetadata
- class datashard.TableMetadata(location: str, table_uuid: str = <factory>, format_version: int = 2, last_sequence_number: int = 0, last_updated_ms: int = <factory>, last_column_id: int = 0, schemas: List[Schema] = <factory>, current_schema_id: int = 0, partition_specs: List[PartitionSpec] = <factory>, default_spec_id: int = 0, sort_orders: List[SortOrder] = <factory>, default_sort_order_id: int = 1, properties: Dict[str, str] = <factory>, current_snapshot_id: int | None = None, snapshots: List[Snapshot] = <factory>, snapshot_log: List[HistoryEntry] = <factory>, metadata_log: List[Dict[str, Any]] = <factory>)
Bases: object
Main metadata structure for an Iceberg table
- partition_specs: List[PartitionSpec]
- __init__(location: str, table_uuid: str = <factory>, format_version: int = 2, last_sequence_number: int = 0, last_updated_ms: int = <factory>, last_column_id: int = 0, schemas: List[Schema] = <factory>, current_schema_id: int = 0, partition_specs: List[PartitionSpec] = <factory>, default_spec_id: int = 0, sort_orders: List[SortOrder] = <factory>, default_sort_order_id: int = 1, properties: Dict[str, str] = <factory>, current_snapshot_id: int | None = None, snapshots: List[Snapshot] = <factory>, snapshot_log: List[HistoryEntry] = <factory>, metadata_log: List[Dict[str, Any]] = <factory>) -> None
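The `<factory>` defaults in the signature are how the documentation renders dataclass fields declared with `default_factory`: the default is computed per instance instead of being shared. The sketch below shows that behavior on a reduced stand-in with a handful of the documented fields. The class name `TableMetadataSketch` is hypothetical, and the specific factories (a UUID4 string for `table_uuid`, wall-clock milliseconds for `last_updated_ms`) are assumptions; this section does not say what the real factories produce.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class TableMetadataSketch:
    # Reduced stand-in: only a subset of the documented fields is shown.
    location: str
    # Assumed factory: a fresh UUID string per instance.
    table_uuid: str = field(default_factory=lambda: str(uuid.uuid4()))
    format_version: int = 2
    # Assumed factory: current time in milliseconds.
    last_updated_ms: int = field(default_factory=lambda: int(time.time() * 1000))
    default_sort_order_id: int = 1
    properties: Dict[str, str] = field(default_factory=dict)
    current_snapshot_id: Optional[int] = None
    metadata_log: List[Dict[str, Any]] = field(default_factory=list)


# Only `location` is required; everything else falls back to its default.
first = TableMetadataSketch(location="/tmp/tables/first")
second = TableMetadataSketch(location="/tmp/tables/second")

# Each instance gets its own dict, so this mutation does not leak into `second`.
first.properties["owner"] = "analytics"
```

A plain mutable default such as `properties: Dict[str, str] = {}` is rejected by `dataclasses` at class-definition time, which is why list and dict fields appear as `<factory>` in the signature.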