Python • Pandas • PyArrow • Spark
Copy-paste-ready snippets for the most common ways to open Parquet data. Use the tabs to switch libraries and jump directly to the section you need.
Ideal for local analysis and quick exploration. Install pandas with a Parquet engine such as pyarrow.
pandas supports column selection via columns= and row-level predicates that are pushed down through PyArrow. Pass engine="pyarrow" (used by default in recent pandas when PyArrow is installed) for the best performance and type fidelity. This path is fast for local analysis, but it requires pandas with the pyarrow engine available.
import pandas as pd

df = pd.read_parquet("file.parquet")
print(df.head())
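As a minimal sketch of the column selection and predicate pushdown mentioned above (the column names and the "US" value are illustrative; the filters argument requires the pyarrow engine and a reasonably recent pandas):

import pandas as pd

# Read only two columns and push a simple row filter down to PyArrow
# (column names and the "US" value are assumptions for illustration)
df = pd.read_parquet(
    "file.parquet",
    engine="pyarrow",
    columns=["id", "country"],
    filters=[("country", "=", "US")],
)
print(df.head())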
Code not working as expected?
Dependency issues or errors loading your Parquet file? Try it instantly in the browser.
Use Parquet Tools Online

PyArrow exposes Parquet metadata, statistics, and row groups. It is excellent when you need control over columns, filters, or schema inspection before converting to pandas.
import pyarrow.parquet as pq

table = pq.read_table("file.parquet", columns=["id", "country"])
print(table.schema)

# Filter rows by predicate pushdown (supported when statistics exist)
dataset = pq.ParquetDataset("file.parquet", filters=[("country", "=", "US")])
filtered = dataset.read(columns=["id", "country"])
df = filtered.to_pandas()
Tip: call pq.read_metadata to inspect row groups, compression, and column types without loading the full dataset.
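A short sketch of that kind of inspection (the file name is a placeholder):

import pyarrow.parquet as pq

# Read only the footer: row counts, row groups, and per-column details
meta = pq.read_metadata("file.parquet")
print(meta.num_rows, meta.num_row_groups)

first_column = meta.row_group(0).column(0)
print(first_column.path_in_schema, first_column.compression)
print(first_column.statistics)  # min/max statistics, if they were written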
Spark handles partitioned datasets on cloud storage with predicate pushdown and column pruning. Use it when the dataset exceeds single-machine memory.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("read-parquet").getOrCreate()
df = spark.read.parquet("s3://bucket/path/to/table/")

# Column pruning + predicate pushdown
filtered = df.select("id", "country").where(col("country") == "US")
filtered.show(10)
filtered.write.mode("overwrite").parquet("s3://bucket/tmp/us-users/")
Tip: Keep partitions balanced (avoid millions of small files) and prune columns early to reduce I/O and shuffle costs.
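As a sketch of that advice, continuing from the df above (the country column, the target of 200 output partitions, and the output path are assumptions for illustration):

# Rebalance before writing so the output is neither a few huge files
# nor millions of tiny ones (numbers and names here are illustrative)
(
    df.select("id", "country")
    .repartition(200, "country")
    .write.mode("overwrite")
    .partitionBy("country")
    .parquet("s3://bucket/tmp/by-country/")
)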
Quick fixes for common issues: install pyarrow or fastparquet as the pandas engine (pip install "pandas[parquet]"); inspect files without loading them via pq.read_metadata; consolidate many small files with repartition in Spark; and for memory pressure, use dataset.to_pandas(split_blocks=True) or sample via Spark and then export.
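A minimal sketch of that memory-friendly conversion (self_destruct is an extra PyArrow option not mentioned above; it releases Arrow memory as columns are converted):

import pyarrow.parquet as pq

table = pq.read_table("file.parquet", columns=["id", "country"])
# split_blocks avoids consolidating columns into large 2-D blocks;
# self_destruct frees the Arrow table's memory during conversion
df = table.to_pandas(split_blocks=True, self_destruct=True)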