GeoFabric API Reference¶
Complete reference for all GeoFabric functions, classes, and methods.
Table of Contents¶
- Core Functions
- Configuration Functions
- ROI (Region of Interest) Functions
- Cache Functions
- Validation Functions
- Dataset Class
- Query Class
- Data Classes
Core Functions¶
gf.open(uri, *, engine=None)¶
Open a geospatial data source and return a Dataset.
Parameters:
| Parameter | Type | Description |
|-----------|------|-------------|
| uri | str | URI to the data source |
| engine | DuckDBEngine \| None | Optional custom engine instance |
Returns: Dataset
Supported URI Schemes:
| Scheme | Description | Example |
|--------|-------------|---------|
| file:// or path | Local files (Parquet, GeoJSON, etc.) | gf.open("data.parquet") |
| s3:// | Amazon S3 | gf.open("s3://bucket/data.parquet") |
| gs:// or gcs:// | Google Cloud Storage | gf.open("gs://bucket/data.parquet") |
| az:// | Azure Blob Storage | gf.open("az://container/data.parquet") |
| postgresql:// | PostGIS database | gf.open("postgresql://host/db?table=t") |
| overture:// | Overture Maps data | gf.open("overture://buildings") |
| stac:// | STAC catalogs | gf.open("stac://catalog.com/collection") |
Example:
import geofabric as gf
# Local file
ds = gf.open("buildings.parquet")
# S3 with credentials configured
gf.configure_s3(access_key_id="...", secret_access_key="...")
ds = gf.open("s3://my-bucket/data.parquet?anonymous=false")
# PostGIS
ds = gf.open("postgresql://user:pass@host:5432/db?table=parcels")
Configuration Functions¶
All configuration functions set credentials programmatically. Programmatic configuration takes precedence over environment variables.
gf.configure_s3(...)¶
Configure Amazon S3 credentials.
Parameters:
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| access_key_id | str \| None | None | AWS access key ID |
| secret_access_key | str \| None | None | AWS secret access key |
| region | str \| None | None | AWS region (e.g., 'us-east-1') |
| session_token | str \| None | None | AWS session token (for temporary credentials) |
| endpoint | str \| None | None | Custom S3 endpoint (for MinIO, DigitalOcean Spaces) |
| use_ssl | bool | True | Use SSL for connections |
Example:
# Standard AWS credentials
gf.configure_s3(
access_key_id="AKIAIOSFODNN7EXAMPLE",
secret_access_key="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
region="us-east-1"
)
# MinIO or S3-compatible service
gf.configure_s3(
access_key_id="minioadmin",
secret_access_key="minioadmin",
endpoint="http://localhost:9000",
use_ssl=False
)
gf.configure_gcs(...)¶
Configure Google Cloud Storage credentials.
Parameters:
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| access_key_id | str \| None | None | GCS HMAC access key ID |
| secret_access_key | str \| None | None | GCS HMAC secret access key |
| project | str \| None | None | GCP project ID |
Example:
gf.configure_gcs(
access_key_id="GOOGTS7C7FUP3AIRVJTE2BCD",
secret_access_key="bGoa+V7g/yqDXvKRqq+JTFn4uQZbPiQJo4pf9RzJ",
project="my-gcp-project"
)
gf.configure_azure(...)¶
Configure Azure Blob Storage credentials.
Parameters:
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| account_name | str \| None | None | Azure storage account name |
| account_key | str \| None | None | Azure storage account key |
| connection_string | str \| None | None | Full Azure connection string |
| sas_token | str \| None | None | Shared Access Signature token |
Example:
# Account name and key
gf.configure_azure(
account_name="mystorageaccount",
account_key="accountkey123..."
)
# Connection string
gf.configure_azure(
connection_string="DefaultEndpointsProtocol=https;AccountName=...;AccountKey=..."
)
# SAS token
gf.configure_azure(
account_name="mystorageaccount",
sas_token="sv=2021-06-08&ss=b&srt=sco&sp=r..."
)
gf.configure_postgis(...)¶
Configure default PostGIS connection parameters.
Parameters:
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| host | str \| None | None | Database host |
| port | int \| None | None | Database port (typically 5432) |
| database | str \| None | None | Database name |
| user | str \| None | None | Database user |
| password | str \| None | None | Database password |
| sslmode | str \| None | None | SSL mode (disable, allow, prefer, require, verify-ca, verify-full) |
Example:
gf.configure_postgis(
host="db.example.com",
port=5432,
user="geouser",
password="geopassword",
sslmode="require"
)
# Now use shorter connection strings
ds = gf.open("postgresql:///mydb?table=public.buildings")
gf.configure_stac(...)¶
Configure STAC catalog authentication.
Parameters:
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| api_key | str \| None | None | API key for authenticated catalogs |
| headers | dict[str, str] \| None | None | Custom HTTP headers |
| default_catalog | str \| None | None | Default STAC catalog URL |
Example:
# API key authentication
gf.configure_stac(api_key="my-stac-api-key")
# Bearer token authentication
gf.configure_stac(
headers={"Authorization": "Bearer eyJ..."}
)
# Combined configuration
gf.configure_stac(
api_key="my-api-key",
headers={"X-Custom-Header": "value"},
default_catalog="https://planetarycomputer.microsoft.com/api/stac/v1"
)
gf.configure_http(...)¶
Configure global HTTP settings for web requests.
Parameters:
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| proxy | str \| None | None | HTTP proxy URL |
| timeout | int | 30 | Request timeout in seconds |
| headers | dict[str, str] \| None | None | Custom HTTP headers for all requests |
| verify_ssl | bool | True | Verify SSL certificates |
Example:
gf.configure_http(
proxy="http://corporate-proxy:8080",
timeout=60,
headers={"User-Agent": "MyApp/1.0"},
verify_ssl=True
)
gf.get_config()¶
Get the current GeoFabric configuration.
Returns: GeoFabricConfig - The global configuration object
Example:
config = gf.get_config()
print(config.s3.access_key_id)
print(config.postgis.host)
gf.reset_config()¶
Reset all configuration to defaults, clearing any programmatically set credentials.
Example:
gf.configure_s3(access_key_id="...", secret_access_key="...")
gf.reset_config() # Clears S3 credentials, reverts to env vars
ROI (Region of Interest) Functions¶
gf.roi.bbox(minx, miny, maxx, maxy, srid=4326)¶
Create a bounding box ROI.
Parameters:
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| minx | float | - | Minimum X coordinate (longitude) |
| miny | float | - | Minimum Y coordinate (latitude) |
| maxx | float | - | Maximum X coordinate (longitude) |
| maxy | float | - | Maximum Y coordinate (latitude) |
| srid | int | 4326 | Spatial Reference ID |
Returns: ROI
Example:
# New York City area
roi = gf.roi.bbox(-74.10, 40.60, -73.70, 40.90)
result = ds.within(roi).to_pandas()
gf.roi.wkt(wkt_text, srid=4326)¶
Create an ROI from WKT (Well-Known Text) geometry.
Parameters:
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| wkt_text | str | - | WKT geometry string |
| srid | int | 4326 | Spatial Reference ID |
Returns: ROI
Example:
# Polygon ROI
roi = gf.roi.wkt("POLYGON((-74.0 40.7, -74.0 40.8, -73.9 40.8, -73.9 40.7, -74.0 40.7))")
result = ds.within(roi).to_pandas()
Cache Functions¶
gf.configure_cache(...)¶
Configure the global query cache.
Parameters:
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| cache_dir | str \| None | ~/.geofabric/cache | Cache directory path |
| enabled | bool | True | Enable/disable caching |
| max_size_mb | int | 1000 | Maximum cache size in MB |
Example:
gf.configure_cache(
cache_dir="/tmp/geofabric_cache",
enabled=True,
max_size_mb=2000
)
gf.get_cache()¶
Get the global cache instance.
Returns: QueryCache
Example:
cache = gf.get_cache()
print(f"Cache size: {cache.size_mb():.2f} MB")
cache.clear() # Clear all cached data
Validation Functions¶
gf.validate_geometries(engine, sql, geometry_col="geometry", id_col=None)¶
Validate geometries in a query result.
Parameters:
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| engine | DuckDBEngine | - | Database engine |
| sql | str | - | SQL query |
| geometry_col | str | "geometry" | Geometry column name |
| id_col | str \| None | None | ID column for issue reporting |
Returns: ValidationResult
gf.compute_stats(engine, sql, geometry_col="geometry")¶
Compute statistics for a query result.
Parameters:
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| engine | DuckDBEngine | - | Database engine |
| sql | str | - | SQL query |
| geometry_col | str | "geometry" | Geometry column name |
Returns: DatasetStats
Dataset Class¶
The Dataset class represents a geospatial data source.
Properties¶
| Property | Type | Description |
|---|---|---|
columns |
list[str] |
List of column names |
dtypes |
dict[str, str] |
Mapping of column names to data types |
Methods¶
dataset.query()¶
Create a new Query object for this dataset.
Returns: Query
dataset.within(roi)¶
Filter to geometries within the ROI.
Parameters:
| Parameter | Type | Description |
|-----------|------|-------------|
| roi | ROI | Region of interest |
Returns: Query
dataset.where(sql_predicate)¶
Filter with a SQL predicate.
Parameters:
| Parameter | Type | Description |
|-----------|------|-------------|
| sql_predicate | str | SQL WHERE clause condition |
Returns: Query
dataset.select(columns)¶
Select specific columns.
Parameters:
| Parameter | Type | Description |
|-----------|------|-------------|
| columns | str \| Sequence[str] | Column(s) to select |
Returns: Query
dataset.limit(n)¶
Limit the number of results.
Parameters:
| Parameter | Type | Description |
|-----------|------|-------------|
| n | int | Maximum number of rows |
Returns: Query
dataset.count()¶
Return the total number of rows.
Returns: int
dataset.head(n=10)¶
Return the first n rows.
Parameters:
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| n | int | 10 | Number of rows |
Returns: pd.DataFrame
dataset.sample(n=10, seed=None)¶
Return a random sample of n rows.
Parameters:
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| n | int | 10 | Number of rows |
| seed | int \| None | None | Random seed |
Returns: pd.DataFrame
dataset.validate(geometry_col="geometry", id_col=None)¶
Validate geometries in the dataset.
Returns: ValidationResult
dataset.stats(geometry_col="geometry")¶
Compute statistics for the dataset.
Returns: DatasetStats
Query Class¶
The Query class provides a fluent API for building and executing queries.
Filtering Methods¶
query.select(columns)¶
Select specific columns.
Returns: Query
query.where(sql_predicate)¶
Add a SQL WHERE condition.
Example:
query.where("population > 1000000")
query.where("type IN ('residential', 'commercial')")
Returns: Query
query.within(roi, geometry_col="geometry")¶
Filter to geometries within the ROI.
Returns: Query
query.limit(n)¶
Limit the number of results.
Returns: Query
Spatial Transformation Methods¶
query.buffer(distance, unit="meters", geometry_col="geometry")¶
Buffer geometries by a distance.
Parameters:
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| distance | float | - | Buffer distance |
| unit | str | "meters" | Distance unit |
| geometry_col | str | "geometry" | Geometry column |
Returns: Query
query.simplify(tolerance, preserve_topology=True, geometry_col="geometry")¶
Simplify geometries with a tolerance.
Returns: Query
query.centroid(geometry_col="geometry")¶
Replace geometries with their centroids.
Returns: Query
query.convex_hull(geometry_col="geometry")¶
Replace geometries with their convex hulls.
Returns: Query
query.envelope(geometry_col="geometry")¶
Replace geometries with their bounding boxes.
Returns: Query
query.make_valid(geometry_col="geometry")¶
Repair invalid geometries using ST_MakeValid.
Returns: Query
query.transform(to_srid, from_srid=4326, geometry_col="geometry")¶
Transform geometries to a different CRS.
Parameters:
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| to_srid | int | - | Target SRID (e.g., 3857) |
| from_srid | int | 4326 | Source SRID |
Returns: Query
query.clip(clip_wkt, geometry_col="geometry")¶
Clip geometries to a boundary (intersection).
Returns: Query
query.erase(erase_wkt, geometry_col="geometry")¶
Erase a region from geometries (difference).
Returns: Query
query.boundary(geometry_col="geometry")¶
Extract geometry boundaries.
Returns: Query
query.explode(geometry_col="geometry")¶
Explode multi-part geometries into single-part geometries.
Returns: Query
query.densify(max_distance, geometry_col="geometry")¶
Add intermediate vertices along geometry edges.
Returns: Query
query.point_on_surface(geometry_col="geometry")¶
Replace geometries with a point guaranteed to be on the surface.
Returns: Query
query.reverse(geometry_col="geometry")¶
Reverse the order of vertices in geometries.
Returns: Query
query.flip_coordinates(geometry_col="geometry")¶
Flip X and Y coordinates (useful for lat/lon vs lon/lat issues).
Returns: Query
query.collect(geometry_col="geometry")¶
Collect all geometries into a single MultiGeometry.
Returns: Query
query.symmetric_difference(other_wkt, geometry_col="geometry")¶
Compute symmetric difference (XOR) with another geometry.
Returns: Query
query.dissolve(by=None, geometry_col="geometry")¶
Dissolve geometries, optionally grouped by columns.
Parameters:
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| by | str \| list[str] \| None | None | Group by column(s) |
Returns: Query
Computed Column Methods¶
query.with_area(column_name="area", geometry_col="geometry")¶
Add an area column computed from geometries.
Returns: Query
query.with_length(column_name="length", geometry_col="geometry")¶
Add a length/perimeter column computed from geometries.
Returns: Query
query.with_perimeter(column_name="perimeter", geometry_col="geometry")¶
Alias for with_length() for semantic clarity with polygons.
Returns: Query
query.with_bounds(geometry_col="geometry")¶
Add columns for geometry bounding box (minx, miny, maxx, maxy).
Returns: Query
query.with_geometry_type(column_name="geom_type", geometry_col="geometry")¶
Add a column with the geometry type.
Returns: Query
query.with_num_points(column_name="num_points", geometry_col="geometry")¶
Add a column with the number of points in each geometry.
Returns: Query
query.with_is_valid(column_name="is_valid", geometry_col="geometry")¶
Add a column indicating if each geometry is valid.
Returns: Query
query.with_distance_to(reference_wkt, column_name="distance", geometry_col="geometry")¶
Add a column with distance to a reference geometry.
Parameters:
| Parameter | Type | Description |
|-----------|------|-------------|
| reference_wkt | str | WKT string of reference geometry |
Returns: Query
query.with_x(column_name="x", geometry_col="geometry")¶
Add a column with X coordinate (longitude).
Returns: Query
query.with_y(column_name="y", geometry_col="geometry")¶
Add a column with Y coordinate (latitude).
Returns: Query
query.with_coordinates(x_column="x", y_column="y", geometry_col="geometry")¶
Add X and Y coordinate columns.
Returns: Query
Spatial Join Methods¶
query.sjoin(other, predicate="intersects", how="inner", lsuffix="_left", rsuffix="_right", geometry_col="geometry")¶
Spatial join with another query.
Parameters:
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| other | Query | - | Other query to join with |
| predicate | str | "intersects" | Spatial predicate |
| how | str | "inner" | Join type ('inner', 'left') |
Supported predicates: intersects, within, contains, touches, crosses, overlaps
Returns: Query
query.nearest(other, k=1, max_distance=None, geometry_col="geometry")¶
Find k nearest neighbors from another query.
Columns from the right query that conflict with left query columns are automatically renamed with a _right suffix. The distance to each neighbor is returned in the _distance column.
Parameters:
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| other | Query | - | Query with reference features |
| k | int | 1 | Number of nearest neighbors |
| max_distance | float \| None | None | Maximum search distance |
Returns: Query - Contains left columns, renamed right columns (with _right suffix for conflicts), and _distance column
Example:
# Find nearest hospital for each building
buildings = gf.open("buildings.parquet")
hospitals = gf.open("hospitals.parquet")
result = buildings.query().nearest(hospitals.query(), k=1)
# If both have 'name' column: result has 'name' (building) and 'name_right' (hospital)
# Result also includes '_distance' column with the distance to nearest neighbor
Inspection Methods¶
query.count()¶
Return the number of rows matching the query.
Returns: int
query.head(n=10)¶
Return the first n rows as a DataFrame.
Returns: pd.DataFrame
query.sample(n=10, seed=None)¶
Return a random sample of n rows.
Returns: pd.DataFrame
query.columns¶
Return list of column names.
Returns: list[str]
query.dtypes¶
Return mapping of column names to data types.
Returns: dict[str, str]
query.describe(geometry_col="geometry")¶
Return summary statistics.
Returns: pd.DataFrame
query.explain()¶
Return the query execution plan.
Returns: str
query.sql()¶
Return the generated SQL.
Returns: str
Output Methods¶
query.to_pandas()¶
Execute query and return as pandas DataFrame.
Returns: pd.DataFrame
query.to_geopandas(geometry_col="geometry")¶
Execute query and return as GeoDataFrame.
Requires: pip install geofabric[viz]
Returns: gpd.GeoDataFrame
query.to_arrow()¶
Execute query and return as PyArrow Table.
Returns: pa.Table
query.to_parquet(path, geometry_col="geometry")¶
Export to Parquet file (with GeoParquet metadata if geopandas available).
Returns: str (path)
query.to_geojson(path)¶
Export to GeoJSON file.
Returns: str (path)
query.to_pmtiles(pmtiles_path, *, layer="features", maxzoom=14, minzoom=0, geometry_col="geometry")¶
Export to PMTiles format for web mapping.
Parameters:
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| pmtiles_path | str | - | Output file path |
| layer | str | "features" | Layer name |
| maxzoom | int | 14 | Maximum zoom level |
| minzoom | int | 0 | Minimum zoom level |
Returns: str (path)
query.to_flatgeobuf(path, geometry_col="geometry")¶
Export to FlatGeobuf format.
Returns: str (path)
query.to_geopackage(path, layer="data", geometry_col="geometry")¶
Export to GeoPackage format.
Returns: str (path)
query.to_csv(path, include_wkt=True, geometry_col="geometry")¶
Export to CSV format (optionally with WKT geometry).
Returns: str (path)
query.show(geometry_col="geometry")¶
Display results interactively using lonboard.
Requires: pip install geofabric[viz]
Returns: Interactive map widget
Streaming Methods¶
query.iter_chunks(chunk_size=10000)¶
Iterate over query results in PyArrow RecordBatch chunks.
Yields: pa.RecordBatch
query.iter_dataframes(chunk_size=10000)¶
Iterate over query results as DataFrames.
Yields: pd.DataFrame
Aggregation Methods¶
query.aggregate(agg)¶
Aggregate results.
Parameters:
| Parameter | Type | Description |
|-----------|------|-------------|
| agg | dict[str, str] | Aggregation specification (e.g., {"by": "type"}) |
Returns: pd.DataFrame
Data Classes¶
ValidationResult¶
Result of geometry validation.
| Field | Type | Description |
|---|---|---|
total_rows |
int |
Total number of rows |
valid_count |
int |
Number of valid geometries |
invalid_count |
int |
Number of invalid geometries |
null_count |
int |
Number of null geometries |
issues |
list[ValidationIssue] |
List of validation issues |
is_valid |
bool |
Whether all geometries are valid |
Methods:
- summary() - Returns a formatted summary string
ValidationIssue¶
A geometry validation issue.
| Field | Type | Description |
|---|---|---|
row_id |
Any |
Row identifier |
issue_type |
str |
Type of issue |
message |
str |
Issue description |
DatasetStats¶
Statistics about a dataset.
| Field | Type | Description |
|---|---|---|
row_count |
int |
Total number of rows |
column_count |
int |
Number of columns |
columns |
list[str] |
Column names |
dtypes |
dict[str, str] |
Column data types |
bounds |
tuple[float, float, float, float] \| None |
Bounding box (minx, miny, maxx, maxy) |
geometry_type |
str \| None |
Predominant geometry type |
crs |
str \| None |
Coordinate reference system |
null_counts |
dict[str, int] |
Null counts per column |
Methods:
- summary() - Returns a formatted summary string
ROI¶
Region of interest for spatial queries.
| Field | Type | Description |
|---|---|---|
kind |
str |
ROI type ('bbox' or 'wkt') |
wkt |
str \| None |
WKT geometry (for wkt kind) |
minx, miny, maxx, maxy |
float \| None |
Bounding box coordinates |
srid |
int |
Spatial reference ID (default: 4326) |
Configuration Classes¶
GeoFabricConfig¶
| Field | Type | Description |
|---|---|---|
s3 |
S3Config |
S3 configuration |
gcs |
GCSConfig |
GCS configuration |
azure |
AzureConfig |
Azure configuration |
postgis |
PostGISConfig |
PostGIS configuration |
stac |
STACConfig |
STAC configuration |
http |
HTTPConfig |
HTTP configuration |
S3Config¶
| Field | Type | Default | Description |
|---|---|---|---|
access_key_id |
str \| None |
None |
AWS access key ID |
secret_access_key |
str \| None |
None |
AWS secret access key |
region |
str \| None |
None |
AWS region |
session_token |
str \| None |
None |
Session token |
endpoint |
str \| None |
None |
Custom endpoint |
use_ssl |
bool |
True |
Use SSL |
GCSConfig¶
| Field | Type | Default | Description |
|---|---|---|---|
access_key_id |
str \| None |
None |
HMAC access key |
secret_access_key |
str \| None |
None |
HMAC secret |
project |
str \| None |
None |
GCP project ID |
AzureConfig¶
| Field | Type | Default | Description |
|---|---|---|---|
account_name |
str \| None |
None |
Storage account name |
account_key |
str \| None |
None |
Storage account key |
connection_string |
str \| None |
None |
Connection string |
sas_token |
str \| None |
None |
SAS token |
PostGISConfig¶
| Field | Type | Default | Description |
|---|---|---|---|
host |
str \| None |
None |
Database host |
port |
int \| None |
None |
Database port |
database |
str \| None |
None |
Database name |
user |
str \| None |
None |
Database user |
password |
str \| None |
None |
Database password |
sslmode |
str \| None |
None |
SSL mode |
STACConfig¶
| Field | Type | Default | Description |
|---|---|---|---|
api_key |
str \| None |
None |
API key |
headers |
dict[str, str] |
{} |
Custom headers |
default_catalog |
str \| None |
None |
Default catalog URL |
HTTPConfig¶
| Field | Type | Default | Description |
|---|---|---|---|
proxy |
str \| None |
None |
Proxy URL |
timeout |
int |
30 |
Timeout in seconds |
headers |
dict[str, str] |
{} |
Custom headers |
verify_ssl |
bool |
True |
Verify SSL certs |
Credential Precedence¶
GeoFabric follows industry-standard credential resolution:
-
Programmatic configuration (highest priority)
gf.configure_s3(access_key_id="...", secret_access_key="...") -
Environment variables
export AWS_ACCESS_KEY_ID="..." export AWS_SECRET_ACCESS_KEY="..." -
Credential files (e.g.,
~/.aws/credentials) -
Instance metadata (IAM roles, service accounts)
Version¶
Access the library version:
import geofabric as gf
print(gf.__version__)