Usage Guide
===========

Installation
------------

From PyPI (once published)::

    pip install pqfilt

From source::

    git clone https://github.com/ysBach/pqfilt.git
    cd pqfilt
    pip install -e .

Python API
----------

Basic Filtering
~~~~~~~~~~~~~~~

The main entry point is :func:`pqfilt.read`::

    import pqfilt

    # Simple comparison
    df = pqfilt.read("data.parquet", filters="vmag < 20")

    # Equality
    df = pqfilt.read("data.parquet", filters="flag == 1")

Expression Syntax
~~~~~~~~~~~~~~~~~

Expressions support ``&`` (AND), ``|`` (OR), and parentheses for grouping.
``&`` binds tighter than ``|`` (standard boolean precedence)::

    # AND: both conditions must hold
    df = pqfilt.read("data.parquet", filters="a > 5 & b < 10")

    # OR: either condition holds
    df = pqfilt.read("data.parquet", filters="a < 3 | a > 8")

    # Mixed with parentheses
    df = pqfilt.read("data.parquet", filters="(a < 3 & b > 50) | c == 1")
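
As a rough illustration of the precedence rule (plain Python, not ``pqfilt``
internals): Python's ``and`` and ``or`` have the same relative precedence as
``&`` and ``|`` in the expressions above, so an unparenthesized mix groups the
AND part first. The ``row`` values below are made up for the example::

    # Hypothetical row; the column names a, b, c mirror the examples above.
    row = {"a": 5, "b": 0, "c": 1}

    # Unparenthesized: the AND part is grouped first.
    implicit = row["a"] < 3 and row["b"] > 50 or row["c"] == 1

    # The same grouping written out explicitly.
    explicit = (row["a"] < 3 and row["b"] > 50) or row["c"] == 1

    # A different grouping, which needs parentheses to express.
    regrouped = row["a"] < 3 and (row["b"] > 50 or row["c"] == 1)

    print(implicit, explicit, regrouped)  # True True False

When in doubt, adding parentheses to a filter string makes the intended
grouping explicit.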

Membership Filters
~~~~~~~~~~~~~~~~~~

Use ``in`` and ``not in`` with comma-separated values. You can optionally enclose the
list in brackets ``[]`` or parentheses ``()`` for readability::

    df = pqfilt.read("data.parquet", filters="desig in [1, 2, 3]")
    df = pqfilt.read("data.parquet", filters="name not in (foo, bar)")
    df = pqfilt.read("data.parquet", filters="desig in '1', '2', '3'")

If your Parquet column is a string type but contains numeric-looking values
(like ``"1"``), explicitly wrap the values in single or double quotes to
prevent ``pqfilt`` from coercing them to numbers. This avoids PyArrow type errors::

    # '1' is preserved as a string
    df = pqfilt.read("data.parquet", filters="desig in ['1', '356']")
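
When the value list comes from Python data, the quoting rule above can be
applied while building the expression string. The helper below is a
hypothetical sketch (``build_in_filter`` is not part of ``pqfilt``); it assumes
the bracketed list form shown earlier and does not try to escape quote
characters inside values::

    def build_in_filter(column, values, negate=False):
        """Return an in / not in expression string (hypothetical helper)."""
        parts = []
        for value in values:
            if isinstance(value, str):
                # Assumption: quote characters inside values are not handled.
                if "'" in value or '"' in value:
                    raise ValueError(f"cannot quote value: {value!r}")
                # Quote strings so numeric-looking text stays a string.
                parts.append(f"'{value}'")
            else:
                parts.append(str(value))
        keyword = "not in" if negate else "in"
        return f"{column} {keyword} [{', '.join(parts)}]"


    print(build_in_filter("desig", ["1", "356"]))  # desig in ['1', '356']
    print(build_in_filter("desig", [1, 2, 3]))     # desig in [1, 2, 3]

The resulting string can then be passed as ``filters`` to :func:`pqfilt.read`.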

Tuple Syntax
~~~~~~~~~~~~

For programmatic use, pass filters as a list of ``(column, operator, value)``
3-tuples; all conditions are AND-ed together::

    df = pqfilt.read("data.parquet", filters=[("a", ">", 5), ("b", "<", 10)])

Or pass a list of lists in disjunctive normal form (DNF): conditions within
each inner list are AND-ed, and the inner lists are OR-ed together::

    df = pqfilt.read("data.parquet", filters=[
        [("a", "<", 3)],
        [("a", ">", 8)],
    ])
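
To make the two tuple forms concrete, here is a small pure-Python sketch that
evaluates them against dictionary rows. It only illustrates the AND/OR
semantics described above; it is not how ``pqfilt`` evaluates filters, and the
``OPS`` table and ``matches`` function are assumptions for this example::

    import operator

    # Operators used in this sketch; the set pqfilt accepts may differ.
    OPS = {
        "<": operator.lt,
        "<=": operator.le,
        ">": operator.gt,
        ">=": operator.ge,
        "==": operator.eq,
        "!=": operator.ne,
    }


    def matches(row, filters):
        """Evaluate flat-AND or DNF tuple filters against one dict row."""
        # A flat list of tuples is treated as a single AND-group.
        groups = [filters] if isinstance(filters[0], tuple) else filters
        # The row matches if every condition in at least one group holds.
        return any(
            all(OPS[op](row[col], val) for col, op, val in group)
            for group in groups
        )


    rows = [{"a": 1, "b": 0}, {"a": 6, "b": 5}, {"a": 9, "b": 20}]
    flat = [("a", ">", 5), ("b", "<", 10)]
    dnf = [[("a", "<", 3)], [("a", ">", 8)]]

    print([r["a"] for r in rows if matches(r, flat)])  # [6]
    print([r["a"] for r in rows if matches(r, dnf)])   # [1, 9]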

Column Selection
~~~~~~~~~~~~~~~~

Use ``columns`` for projection pushdown (only listed columns are read)::

    df = pqfilt.read("data.parquet", filters="a > 5", columns=["a", "b"])

Special Column Names
~~~~~~~~~~~~~~~~~~~~

Column names that contain spaces, hyphens, or operator characters can be
backtick-quoted::

    df = pqfilt.read("data.parquet", filters="`alpha*360` > 100")
    df = pqfilt.read("data.parquet", filters="`my column` <= 50")
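
If column names are only known at run time, a small helper can add the
backticks when they are needed. This is a hypothetical sketch
(``quote_column`` is not part of ``pqfilt``); it assumes that plain
identifier-like names need no quoting and refuses names that themselves
contain a backtick::

    import re


    def quote_column(name):
        """Backtick-quote a column name unless it is a plain identifier."""
        if "`" in name:
            raise ValueError("backticks inside column names are not handled")
        # Letters, digits, and underscores only: leave the name as-is.
        if re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
            return name
        return f"`{name}`"


    print(quote_column("vmag"))                     # vmag
    print(f"{quote_column('alpha*360')} > 100")     # `alpha*360` > 100
    print(f"{quote_column('my column')} <= 50")     # `my column` <= 50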

Multi-file and Glob
~~~~~~~~~~~~~~~~~~~

Pass a glob pattern or a list of files::

    df = pqfilt.read("data/*.parquet", filters="vmag < 20")
    df = pqfilt.read(["file1.parquet", "file2.parquet"], filters="a > 5")
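
To check which files a pattern matches before reading them, the pattern can be
expanded with the standard library first and the resulting list passed on. The
sketch below uses only ``glob``; ``expand_inputs`` is a hypothetical helper,
and the file names are placeholders created in a temporary directory::

    import glob
    import os
    import tempfile


    def expand_inputs(patterns):
        """Expand glob patterns into a sorted, de-duplicated list of paths."""
        paths = set()
        for pattern in patterns:
            paths.update(glob.glob(pattern))
        return sorted(paths)


    # Demonstrate on a throwaway directory with empty placeholder files.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("b.parquet", "a.parquet", "notes.txt"):
            open(os.path.join(tmp, name), "w").close()
        found = expand_inputs([os.path.join(tmp, "*.parquet")])
        names = [os.path.basename(p) for p in found]

    print(names)  # ['a.parquet', 'b.parquet']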

Output
~~~~~~

Pass ``output`` to save the filtered result directly to a file::

    df = pqfilt.read("data.parquet", filters="a > 5", output="out.parquet")
    df = pqfilt.read("data.parquet", filters="a > 5", output="out.csv")

CLI Usage
---------

Basic usage::

    pqfilt data/*.parquet -f "vmag < 20" -o filtered.parquet

Multiple ``-f`` flags are AND-ed together::

    pqfilt data/*.parquet -f "vmag < 20" -f "dec > 30" -o filtered.parquet

Boolean expressions within a single ``-f``::

    pqfilt data/*.parquet -f "(a < 30 & b > 50) | c == 1" -o out.parquet

Column selection::

    pqfilt data/*.parquet -f "vmag < 20" --columns vmag,ra,dec -o out.parquet

Overwrite existing output::

    pqfilt data/*.parquet -f "vmag < 20" -o out.parquet --overwrite
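
When calling the CLI from a Python script, the command can be assembled as an
argument list. The sketch below uses only the flags shown above (``-f``,
``-o``, ``--columns``, ``--overwrite``); ``build_argv`` is a hypothetical
helper, and the input path is a placeholder::

    import shlex


    def build_argv(paths, filters, output, columns=None, overwrite=False):
        """Assemble a pqfilt command line as a list of arguments."""
        argv = ["pqfilt", *paths]
        # One -f flag per expression; multiple flags are AND-ed together.
        for expr in filters:
            argv += ["-f", expr]
        if columns:
            argv += ["--columns", ",".join(columns)]
        argv += ["-o", output]
        if overwrite:
            argv.append("--overwrite")
        return argv


    argv = build_argv(
        ["data/a.parquet"],
        ["vmag < 20", "dec > 30"],
        "out.parquet",
        columns=["vmag", "ra", "dec"],
    )
    # shlex.join shows the command as it would be typed in a shell.
    print(shlex.join(argv))

The list can be handed to ``subprocess.run(argv, check=True)``. Because no
shell is involved in that case, a pattern such as ``data/*.parquet`` is not
expanded by the shell, so it may be safer to expand it with ``glob.glob``
before building the list.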