Quickstart Guide
================

This guide gets you up and running with datashard in a few minutes. It covers creating a table, appending data, inspecting snapshots, time travel, filtered reads, and streaming.

Creating Your First Table
-------------------------

Let's create a simple table to store user data:

.. code-block:: python

   from datashard import create_table, Schema
   import os

   # Define schema for user data
   user_schema = Schema(
       schema_id=1,
       fields=[
           {"id": 1, "name": "user_id", "type": "long", "required": True},
           {"id": 2, "name": "name", "type": "string", "required": True},
           {"id": 3, "name": "age", "type": "long", "required": True},
           {"id": 4, "name": "email", "type": "string", "required": True}
       ]
   )

   # Create a new table
   table_path = "/tmp/my_first_table"
   table = create_table(table_path, user_schema)

   print(f"Table created at: {table_path}")
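
Before appending, it can help to check records against the field list. The sketch below is a hypothetical, standalone helper (``validate_record`` and ``PYTHON_TYPES`` are not part of datashard) that reuses the same field dictionaries passed to ``Schema``. The mapping from type names to Python types is an assumption made for illustration.

.. code-block:: python

   # Hypothetical helper, not part of datashard: checks plain dict records
   # against the same field definitions passed to Schema above.
   FIELDS = [
       {"id": 1, "name": "user_id", "type": "long", "required": True},
       {"id": 2, "name": "name", "type": "string", "required": True},
       {"id": 3, "name": "age", "type": "long", "required": True},
       {"id": 4, "name": "email", "type": "string", "required": True},
   ]

   # Assumed mapping from schema type names to Python types
   PYTHON_TYPES = {"long": int, "string": str, "double": float}

   def validate_record(record, fields=FIELDS):
       """Return a list of problems; an empty list means the record fits."""
       problems = []
       for field in fields:
           name = field["name"]
           if name not in record or record[name] is None:
               if field["required"]:
                   problems.append(f"missing required field: {name}")
               continue
           value = record[name]
           expected = PYTHON_TYPES[field["type"]]
           # bool is a subclass of int, so reject it explicitly
           if isinstance(value, bool) or not isinstance(value, expected):
               problems.append(
                   f"{name}: expected {field['type']}, got {type(value).__name__}"
               )
       return problems

   good = {"user_id": 1, "name": "Alice", "age": 30, "email": "alice@example.com"}
   bad = {"user_id": "1", "name": "Bob", "age": 25}
   print(validate_record(good))  # []
   print(validate_record(bad))

Running a check like this before an append surfaces bad records early, independently of whatever validation the library itself performs.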

Adding Data with Transactions
-----------------------------

Now let's add some data using ACID transactions:

.. code-block:: python

   # Sample data to add
   sample_data = [
       {"user_id": 1, "name": "Alice", "age": 30, "email": "alice@example.com"},
       {"user_id": 2, "name": "Bob", "age": 25, "email": "bob@example.com"},
       {"user_id": 3, "name": "Charlie", "age": 35, "email": "charlie@example.com"}
   ]

   # Add data - append_records handles transactions automatically
   success = table.append_records(records=sample_data, schema=user_schema)

   print(f"Data added successfully: {success}")

Reading Data and Snapshots
--------------------------

Let's check what snapshots are available and access the current state:

.. code-block:: python

   # Get current snapshot
   current_snapshot = table.current_snapshot()
   print(f"Current snapshot ID: {current_snapshot.snapshot_id if current_snapshot else 'None'}")

   # List all snapshots
   all_snapshots = table.snapshots()
   print(f"Number of snapshots: {len(all_snapshots)}")

   for snapshot in all_snapshots:
       print(f"  - Snapshot {snapshot['snapshot_id']} at {snapshot['timestamp']}")
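
To build intuition for why each append produces a new snapshot while older ones stay readable, here is a toy, in-memory model. It is only a conceptual sketch: ``ToyTable`` is a made-up class and does not reflect how datashard stores data or metadata internally.

.. code-block:: python

   import itertools

   class ToyTable:
       """Toy model: immutable data files plus an append-only snapshot log."""

       def __init__(self):
           self._files = []      # each "file" is an immutable tuple of records
           self._snapshots = []  # each snapshot lists the files visible to it
           self._ids = itertools.count(1)

       def append_records(self, records):
           self._files.append(tuple(records))
           snapshot = {
               "snapshot_id": next(self._ids),
               # A snapshot references every file written so far
               "files": tuple(range(len(self._files))),
           }
           self._snapshots.append(snapshot)
           return snapshot["snapshot_id"]

       def snapshots(self):
           return list(self._snapshots)

       def read(self, snapshot_id=None):
           """Read the table as of a snapshot (default: the latest one)."""
           if not self._snapshots:
               return []
           if snapshot_id is None:
               snap = self._snapshots[-1]
           else:
               snap = next(
                   s for s in self._snapshots if s["snapshot_id"] == snapshot_id
               )
           return [rec for i in snap["files"] for rec in self._files[i]]

   toy = ToyTable()
   first = toy.append_records([{"user_id": 1}])
   second = toy.append_records([{"user_id": 2}, {"user_id": 3}])
   print(len(toy.read()))       # 3
   print(len(toy.read(first)))  # 1

Because existing files are never modified, reading an older snapshot only means ignoring the files added after it.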

Time Travel Example
-------------------

One of the key features of datashard is time travel. Let's see how to travel back to a previous snapshot:

.. code-block:: python

   # Time travel works with any recorded snapshot; here we use the oldest one
   if all_snapshots:
       # Get first snapshot (oldest)
       first_snapshot = all_snapshots[0]
       historical_data = table.time_travel(snapshot_id=first_snapshot['snapshot_id'])
       print(f"Traveled to snapshot: {first_snapshot['snapshot_id']}")
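
Often you know a point in time rather than a snapshot ID. The sketch below picks the newest snapshot at or before a given timestamp from a list of snapshot dictionaries shaped like the ones printed earlier. ``snapshot_as_of`` is a hypothetical helper, and the epoch-millisecond timestamps in the sample data are an assumption; any mutually comparable timestamp values would work.

.. code-block:: python

   from bisect import bisect_right

   def snapshot_as_of(snapshots, timestamp):
       """Return the newest snapshot with timestamp <= the given one, or None."""
       ordered = sorted(snapshots, key=lambda s: s["timestamp"])
       stamps = [s["timestamp"] for s in ordered]
       index = bisect_right(stamps, timestamp)
       return ordered[index - 1] if index else None

   # Made-up snapshot metadata (epoch milliseconds are an assumption)
   example = [
       {"snapshot_id": 101, "timestamp": 1_700_000_000_000},
       {"snapshot_id": 102, "timestamp": 1_700_000_060_000},
       {"snapshot_id": 103, "timestamp": 1_700_000_120_000},
   ]

   chosen = snapshot_as_of(example, 1_700_000_090_000)
   print(chosen["snapshot_id"])  # 102

The resulting ``snapshot_id`` could then be passed to ``table.time_travel(snapshot_id=...)`` as in the example above.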

Advanced Example: Complex Data Operations
-----------------------------------------

Here's a second, self-contained example that creates a table with a floating-point column, appends records to it, and checks the resulting snapshot count:

.. code-block:: python

   from datashard import create_table, Schema
   import os

   # Define schema for complex data
   complex_schema = Schema(
       schema_id=1,
       fields=[
           {"id": 1, "name": "id", "type": "long", "required": True},
           {"id": 2, "name": "category", "type": "string", "required": True},
           {"id": 3, "name": "value", "type": "double", "required": True},
           {"id": 4, "name": "timestamp", "type": "string", "required": True}
       ]
   )

   # Create a new table with more complex data
   complex_table_path = "/tmp/complex_table"
   complex_table = create_table(complex_table_path, complex_schema)

   # Add more complex data
   complex_data = [
       {"id": 1, "category": "A", "value": 100.0, "timestamp": "2023-01-01"},
       {"id": 2, "category": "B", "value": 200.5, "timestamp": "2023-01-02"},
       {"id": 3, "category": "A", "value": 150.0, "timestamp": "2023-01-03"}
   ]

   # Add data
   success = complex_table.append_records(records=complex_data, schema=complex_schema)

   # Verify the data was added
   print(f"Table has {len(complex_table.snapshots())} snapshots after adding complex data")

Reading Data with Filters
-------------------------

datashard can read data efficiently by pushing filter predicates down to the Parquet reader, projecting only the columns you need, and scanning files in parallel:

.. code-block:: python

   # Basic scan - read all data
   all_records = table.scan()
   print(f"Total records: {len(all_records)}")

   # Scan with a filter (predicate pushdown: filtering happens at the Parquet level)
   filtered = table.scan(filter={"name": "Alice"})
   print(f"Records for Alice: {len(filtered)}")

   # Comparison filters
   young_users = table.scan(filter={"age": ("<", 30)})

   # Range filter
   age_range = table.scan(filter={"age": ("between", (25, 35))})

   # IN filter
   specific_users = table.scan(filter={"user_id": ("in", [1, 2])})

   # Column projection (only read specific columns)
   names_only = table.scan(columns=["name", "email"])

   # Parallel reading (2-4x speedup with multiple files)
   results = table.scan(parallel=True)  # Use all CPU cores
   results = table.scan(parallel=4)     # Use 4 threads

   # Combine filter, columns, and parallel
   df = table.to_pandas(
       columns=["name", "age"],
       filter={"age": (">=", 25)},
       parallel=True
   )
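
To make the filter syntax concrete, the following pure-Python sketch evaluates the same dictionary shapes against in-memory records. It illustrates the intended semantics only and is not datashard's implementation: ``matches`` is a hypothetical helper, and treating ``between`` as inclusive on both ends is an assumption.

.. code-block:: python

   import operator

   # Operators used in the examples above; "between" is assumed inclusive
   OPS = {
       "<": operator.lt,
       ">": operator.gt,
       ">=": operator.ge,
       "in": lambda value, options: value in options,
       "between": lambda value, bounds: bounds[0] <= value <= bounds[1],
   }

   def matches(record, conditions):
       """Return True if the record satisfies every column condition."""
       for column, condition in conditions.items():
           if isinstance(condition, tuple):
               # Tuple form: (operator, operand)
               op, operand = condition
               if not OPS[op](record[column], operand):
                   return False
           elif record[column] != condition:
               # Bare value form: equality
               return False
       return True

   users = [
       {"user_id": 1, "name": "Alice", "age": 30},
       {"user_id": 2, "name": "Bob", "age": 25},
       {"user_id": 3, "name": "Charlie", "age": 35},
   ]

   def names(conditions):
       return [u["name"] for u in users if matches(u, conditions)]

   print(names({"name": "Alice"}))                # ['Alice']
   print(names({"age": ("<", 30)}))               # ['Bob']
   print(names({"age": ("between", (25, 35))}))   # ['Alice', 'Bob', 'Charlie']
   print(names({"user_id": ("in", [1, 2])}))      # ['Alice', 'Bob']

With predicate pushdown, the equivalent check happens while reading the files, so non-matching rows do not need to be materialized in Python first.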

Streaming Large Tables
----------------------

For memory-efficient processing of large tables:

.. code-block:: python

   # Process in batches (memory-efficient)
   for batch in table.scan_batches(batch_size=1000):
       for record in batch:
           print(record)

   # Iterate record by record
   for record in table.iter_records(filter={"age": (">", 30)}):
       print(record["name"])

   # Iterate as pandas DataFrame chunks
   for chunk_df in table.iter_pandas(chunksize=10000):
       # Process each chunk with pandas operations
       summary = chunk_df.groupby("name").count()
       print(summary)
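
Batch iteration pays off when you keep only a running aggregate instead of the rows themselves. The sketch below shows that pattern with plain Python. ``batched`` and ``fake_records`` are hypothetical stand-ins so the example runs without a table; with datashard you would loop over ``table.scan_batches(...)`` instead.

.. code-block:: python

   from itertools import islice

   def batched(records, batch_size):
       """Yield lists of up to batch_size records from any iterable."""
       iterator = iter(records)
       while True:
           batch = list(islice(iterator, batch_size))
           if not batch:
               return
           yield batch

   def fake_records(n):
       """Stand-in record source; generates rows lazily."""
       for i in range(n):
           yield {"user_id": i, "age": 20 + i % 30}

   batch_count = 0
   row_count = 0
   age_total = 0
   for batch in batched(fake_records(2500), batch_size=1000):
       # Only one batch is held in memory at a time
       batch_count += 1
       row_count += len(batch)
       age_total += sum(record["age"] for record in batch)

   print(batch_count, row_count)  # 3 2500
   print(f"Mean age: {age_total / row_count:.2f}")

Peak memory is bounded by the batch size rather than the table size, which is the reason to prefer batches over a full ``scan()`` on large tables.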

Cleaning Up
-----------

After you're done, you can remove the tables:

.. code-block:: python

   import os
   import shutil

   # Clean up (optional)
   for path in ("/tmp/my_first_table", "/tmp/complex_table"):
       if os.path.exists(path):
           shutil.rmtree(path)
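
An alternative that avoids hard-coded ``/tmp`` paths and manual cleanup is the standard library's ``tempfile.TemporaryDirectory``. In this sketch, ``os.makedirs`` stands in for ``create_table`` so the snippet runs without datashard installed; whether ``create_table`` accepts a path inside a fresh temporary directory is an assumption you should verify.

.. code-block:: python

   import os
   import tempfile

   with tempfile.TemporaryDirectory(prefix="datashard_quickstart_") as workdir:
       table_path = os.path.join(workdir, "my_first_table")
       # In real use: table = create_table(table_path, user_schema)
       os.makedirs(table_path)  # stand-in so this sketch is self-contained
       existed_inside = os.path.isdir(table_path)

   # The directory and everything in it is removed when the block exits
   print(existed_inside)           # True
   print(os.path.exists(workdir))  # False

This also keeps repeated runs of the quickstart from colliding with tables left over from an earlier run.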

What's Next?
------------

Now that you've completed the quickstart, you can:

- Read about :doc:`concepts` to understand the core ideas behind datashard
- Learn more about :doc:`transactions` for safe concurrent operations
- Explore :doc:`time_travel` for historical data access
- Dive into the :doc:`api/iceberg` for detailed API documentation