Changelog
All notable changes to this project are documented here.
2026-03-18 (4)
Fixed
Root logger level DEBUG→INFO: setup_logging() in hls_utils.py set the root logger to DEBUG, causing rasterio and GDAL internal trace messages to flood the log (thousands of DEBUG [rasterio.env] / DEBUG [rasterio._io] lines per run). Changed to INFO so only pipeline-level messages and library WARNING+ output appear.
Steps 04/05: "Skipped" results logged at ERROR. The result-dispatch logic in the main loop of steps 04 and 05 routed any string not starting with "OK" or "WARNING" to logger.error. "Skipped (Exists)" (step 04) and "Skipped (No outliers)" (step 05) both fell into this branch. Added an explicit Skipped prefix check → logger.info before the error fallthrough.
Steps 09/10: BLOCKXSIZE without TILED=YES in temp tile writes. to_raster() calls for intermediate temp GeoTIFFs in steps 09 and 10 passed blockxsize/blockysize without tiled=True, producing GDAL CPLE_IllegalArg warnings on every tile. Added tiled=True to all three affected to_raster() calls.
Step 09: dask chunk mismatch UserWarning. xr.open_dataset(nc_path, chunks={'time': 10}) produced a dask performance warning when the on-disk NetCDF chunk layout differed from the requested chunking. Changed to chunks='auto' to align with the on-disk layout, consistent with how steps 04, 05, and 10 open datasets.
Step 03: CRS remap log message rewording. The southern hemisphere CRS adjustment log line used [CRS fix] and "corrected to", implying an error condition. Replaced with [CRS] and "southern hemisphere tile, remapped … → …" to reflect that this is routine, expected processing for any southern hemisphere tile.
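The "Skipped" fix for steps 04/05 can be sketched as a small dispatch function. This is an illustration only, not the project's code: the function name dispatch_result, the logger name, and the routing of "WARNING" results to the WARNING level are assumptions. Only the prefixes and the logger.info / logger.error targets come from the entry above.

```python
import logging

logger = logging.getLogger("dispatch_sketch")  # hypothetical logger name


def dispatch_result(result: str) -> int:
    """Log a worker status string at a level chosen by its prefix.

    Returns the logging level that was used, so the routing is easy to check.
    """
    if result.startswith("OK"):
        level = logging.INFO
    elif result.startswith("WARNING"):
        level = logging.WARNING  # assumed target for "WARNING" results
    elif result.startswith("Skipped"):
        # Explicit prefix check placed before the error fallthrough.
        level = logging.INFO
    else:
        level = logging.ERROR  # anything unrecognised is treated as a failure
    logger.log(level, result)
    return level
```

With this ordering, "Skipped (Exists)" and "Skipped (No outliers)" are logged at INFO, while an unrecognised string still reaches the error branch.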
2026-03-18 (3)
Added
Structured logging across all Python steps and hls_pipeline.sh. All ten Python pipeline scripts (steps 02–11) now use Python’s logging module via a shared setup_logging(step_name) helper in src/hls_utils.py. Every log line carries a timestamp, level, and bracketed step label: 2026-03-18 20:55:49 INFO [04_mean_reproject] message. Previously all diagnostic output used bare print() calls with no timestamps or severity levels. Key design points:
Single implementation in hls_utils.py; no logging boilerplate duplicated across scripts.
Root logger handler guard (if not root.handlers) makes setup_logging idempotent; calling it in multiprocessing.Pool or ProcessPoolExecutor child processes does not produce duplicate output.
Worker functions (steps 02–05, 09–10) are unchanged; they return status strings/dicts to the main process, which performs all logger.*() calls.
StreamHandler targets sys.stdout so 2>&1 | tee -a "$LOGFILE" in the shell captures all output.
hls_pipeline.sh gains log_info, log_warn, and log_error helper functions that emit the same timestamp + level + [pipeline] format, making combined shell/Python log output visually consistent.
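The design points above can be illustrated with a minimal sketch of such a helper. The name setup_logging(step_name), the if not root.handlers guard, and the sys.stdout target come from this entry. The exact format string, the date format, and the INFO level (taken from the later 2026-03-18 (4) fix) are assumptions and may differ from the real src/hls_utils.py.

```python
import logging
import sys


def setup_logging(step_name: str) -> logging.Logger:
    """Configure the root logger once and return a logger named after the step."""
    root = logging.getLogger()
    if not root.handlers:  # guard: repeated calls never add a second handler
        handler = logging.StreamHandler(sys.stdout)  # stdout, so `tee` sees it
        handler.setFormatter(
            logging.Formatter(
                fmt="%(asctime)s %(levelname)s [%(name)s] %(message)s",  # assumed
                datefmt="%Y-%m-%d %H:%M:%S",
            )
        )
        root.addHandler(handler)
        root.setLevel(logging.INFO)
    return logging.getLogger(step_name)


logger = setup_logging("04_mean_reproject")
setup_logging("04_mean_reproject")  # second call leaves the handlers untouched
logger.info("message")
```

Because the handler lives on the root logger, every named step logger propagates to it, and a child process that calls the helper again does not stack an extra handler.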
2026-03-18 (2)
Fixed
Step 03 — southern hemisphere CRS stored as UTM North — HLS v2.0 GeoTIFFs for tiles south of the equator embed a UTM North zone (EPSG:326xx) with negative northings instead of the standard UTM South convention (EPSG:327xx, false_northing=10,000,000). All southern Africa (BioSCape) and other southern hemisphere tiles were affected.
HLSNetCDFAggregator.run() in src/03_hls_netcdf_build.py now detects this case after reading the first GeoTIFF: if pyproj.to_epsg(min_confidence=20) returns a UTM North code (32601–32660) and the pixel-center y mean is negative, the CRS WKT is replaced with the UTM South equivalent (EPSG + 100, e.g. 32634 → 32734) and y-coordinates are shifted by +10,000,000 m. This correction is applied before chunk dicts are built, so both single-chunk and merged tiles are written with the correct EPSG:327xx CRS and positive UTM South northings. Previously rebuilt tiles will need to be regenerated with step 03 to pick up the corrected CRS and coordinates; downstream steps 04–11 that reproject to TARGET_CRS are not affected because they perform a full reprojection from the source CRS.
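The detection rule and the remap arithmetic described above can be sketched in plain Python. The function name remap_utm_north_to_south and its list-based signature are hypothetical. The real code works on the CRS WKT and coordinate arrays inside HLSNetCDFAggregator.run(), which is not reproduced here.

```python
UTM_SOUTH_FALSE_NORTHING_M = 10_000_000.0  # false northing of UTM South zones


def remap_utm_north_to_south(epsg, y_centers):
    """Return (epsg, y_centers), remapped when a southern hemisphere tile
    is stored with a UTM North code and negative northings.
    """
    is_utm_north = 32601 <= epsg <= 32660
    y_mean = sum(y_centers) / len(y_centers)
    if is_utm_north and y_mean < 0:
        # EPSG:326xx -> EPSG:327xx, and shift northings to be positive.
        shifted = [y + UTM_SOUTH_FALSE_NORTHING_M for y in y_centers]
        return epsg + 100, shifted
    return epsg, list(y_centers)  # northern tiles and UTM South pass through


print(remap_utm_north_to_south(32634, [-3_700_015.0, -3_700_045.0]))
# -> (32734, [6299985.0, 6299955.0])
```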
2026-03-18
Fixed
Step 03: _FillValue lost in merge_chunks. process_netcdf_chunk correctly creates the VI variable with fill_value=np.nan, but merge_chunks recreated the same variable without a fill_value argument. netCDF4 therefore fell back to its built-in default sentinel (9.969209968386869e+36) for all missing cells in merged files, and the _FillValue attribute was absent from the output. Any tile requiring chunk merging (virtually all multi-year tiles with more acquisitions than CHUNK_SIZE) was affected. Fixed by adding fill_value=np.nan to the createVariable call in merge_chunks (src/03_hls_netcdf_build.py line 231). Newly rebuilt tiles will store missing data as NaN and carry a proper _FillValue = NaN attribute.
2026-03-12
Changed
Pipeline scripts moved to src/: all 11 step scripts (01_hls_download_query.sh–11_hls_outlier_gpkg.py) and hls_utils.py relocated from the repository root into src/. hls_pipeline.sh remains at the root. All invocation paths in hls_pipeline.sh, CLAUDE.md, README.md, and docs/ updated accordingly. Python import hls_utils statements are unaffected (Python resolves the import from the script’s own directory).
Step 03: improved CF-1.8 CRS metadata in NetCDF output. The spatial_ref grid-mapping variable now carries both crs_wkt (CF-1.8 standard) and spatial_ref (GDAL / rioxarray compatibility) attributes, plus grid_mapping_name (derived via pyproj) and long_name. The x/y coordinate variables now include standard_name, long_name, and axis attributes; the time variable now includes standard_name, calendar, and axis. A global Conventions = "CF-1.8" attribute is now written. The merge_chunks path mirrors all the same attributes. These changes make da.rio.crs (rioxarray path 1 in detect_crs()) reliably resolve without falling back to the global crs attribute. Existing NetCDF files built with the prior format remain readable via the detect_crs() fallback chain.
Step 03: CRS WKT stored as pyproj WKT2 instead of GDAL WKT1. HLSNetCDFAggregator.run() now generates the CRS WKT string via ProjCRS.from_user_input(crs).to_wkt() (pyproj WKT2) instead of rasterio’s crs.to_wkt() (GDAL WKT1). GDAL WKT1 for some HLS tiles lacks a top-level AUTHORITY["EPSG","XXXXX"] node, causing pyproj.CRS.from_wkt(wkt).to_epsg() to return None. Downstream consumers that group tiles by EPSG code (e.g. cross-CRS reprojection checks) would treat same-zone tiles as different CRS groups. The pyproj WKT2 output always includes a resolvable authority node. Existing NetCDF files retain their original WKT; rebuilding with step 03 is recommended for tiles where EPSG grouping matters downstream.
2026-02-28
Added
NETCDF_COMPLEVEL: configurable zlib compression level (0–9, default 1) for NetCDF time-series files written by step 03. Threaded through HLSNetCDFAggregator into chunk_info dicts (worker) and merge_chunks.
GEOTIFF_COMPRESS: configurable compression codec (default LZW) for all GeoTIFF outputs in steps 02 and 04–10. Accepts any codec supported by the local GDAL build (LZW, DEFLATE, ZSTD, NONE).
GEOTIFF_BLOCK_SIZE: configurable internal tile block dimension in pixels (default 512) for all tiled GeoTIFF outputs in steps 04–10. 512 is standard for desktop GIS; 256 is preferred for Cloud-Optimized GeoTIFFs.
reproject_resolution() in hls_utils.py: CRS-unit-aware resolution helper replacing all hardcoded resolution=30 calls in steps 04, 05, 09, 10. Returns metres unchanged for projected CRS; converts to approximate degrees for geographic CRS and logs a warning.
Fixed
Steps 04, 05, 09, and 10 produced a 1×1 pixel output with no valid data when TARGET_CRS was set to a geographic CRS (e.g. EPSG:4148) because resolution=30 was interpreted as 30 degrees per pixel instead of 30 metres.
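The unit problem behind this fix, and the kind of conversion reproject_resolution() performs, can be sketched as follows. This is a hedged illustration, not the helper in hls_utils.py: the two-argument signature, the boolean crs_is_geographic flag, the logger name, and the conversion factor of roughly 111,320 m per degree (one degree of arc at the equator) are all assumptions.

```python
import logging

logger = logging.getLogger("resolution_sketch")  # hypothetical logger name

METRES_PER_DEGREE = 111_320.0  # approximate length of one degree at the equator


def reproject_resolution(resolution_m, crs_is_geographic):
    """Return the resolution in the units of the target CRS.

    Projected CRS: metres are returned unchanged.
    Geographic CRS: metres are converted to approximate degrees.
    """
    if not crs_is_geographic:
        return resolution_m
    resolution_deg = resolution_m / METRES_PER_DEGREE
    logger.warning(
        "Geographic CRS: using approximate resolution of %.6f degrees for %s m",
        resolution_deg,
        resolution_m,
    )
    return resolution_deg
```

Under these assumptions, 30 m becomes about 0.00027 degrees. Passing the raw value 30 to a geographic CRS instead asks for 30-degree pixels, so a study area only a few degrees across collapses into a single cell.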
2026-02-26
Added
Read the Docs configuration and Sphinx documentation scaffold (docs/)
docs/overview.md: comprehensive pipeline guide (full user documentation)
Changed
README.md restructured as a GitHub landing page (elevator pitch, outputs table, key features, quick start, and link to RTD); full documentation moved to docs/overview.md
docs/index.md updated to a hub toctree (overview, configuration, changelog); no longer uses {include} to pull README content
Fixed
System requirements table in README: added gdalinfo (called directly by step 01 for GeoTIFF validation; provided by the conda environment via rasterio’s GDAL dependency); clarified that conda is required not just for Python packages but because it supplies native geospatial libraries (GDAL, PROJ, HDF5, GEOS)
Per-file download validation with retry logic in step 01
Changed
NUM_WORKERS restored to 8 in config.env
Removed
Bulk download mode retired; tile-by-tile is now the only download mode, reducing peak disk usage to roughly one tile’s worth of raw data at a time
2026-02-25
Added
Step 09, CountValid mosaic: counts valid (unmasked, in-range) observations per pixel across all download cycles and mosaics the result into a single study-area-wide GeoTIFF. Reads from NetCDF files (step 03); independent of TIMESLICE_WINDOWS and the time-series step.
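The per-pixel counting idea can be shown with a small pure-Python sketch. It assumes that masked observations are stored as NaN and that the valid range is inclusive at both ends. The function name count_valid and the nested-list data layout are hypothetical and chosen only to keep the example dependency-free; the pipeline itself reads NetCDF files.

```python
import math


def count_valid(stack, valid_min, valid_max):
    """Count, per pixel, the time steps whose value is unmasked and in range.

    `stack` is a list of 2-D grids (lists of rows), one grid per time step.
    """
    n_rows, n_cols = len(stack[0]), len(stack[0][0])
    counts = [[0] * n_cols for _ in range(n_rows)]
    for grid in stack:
        for r in range(n_rows):
            for c in range(n_cols):
                value = grid[r][c]
                # NaN marks a masked observation (assumption).
                if not math.isnan(value) and valid_min <= value <= valid_max:
                    counts[r][c] += 1
    return counts


nan = float("nan")
stack = [[[0.2, nan]], [[0.5, 1.7]], [[nan, 0.3]]]  # 3 time steps, 1x2 pixels
print(count_valid(stack, -1.0, 1.0))  # -> [[2, 1]]
```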
Changed
Steps renumbered to reflect execution order:
Former step 09 (time-series) → Step 10
Former step 10 (outlier GeoPackage) → Step 11
2026-02-22
Added
Initial release of the HLS Vegetation Index Pipeline
11-step end-to-end workflow: download → VI calculation → NetCDF → reprojection → mosaics → time-series → outlier export
Support for NDVI, EVI2, and NIRv vegetation indices
Bitwise Fmask quality masking with independently configurable flags for cirrus, cloud, adjacent cloud, shadow, snow/ice, water, and aerosol mode
Tile-by-tile orchestration for steps 01–03, with optional space-saver flags to remove raw and/or VI intermediate files after each tile’s NetCDF is built
Configurable parallel processing via NUM_WORKERS
Per-VI valid range outlier detection with configurable bounds (VALID_RANGE_NDVI, VALID_RANGE_EVI2, VALID_RANGE_NIRv)
Multi-band seasonal composite stacks via TIMESLICE_WINDOWS (step 10)
GeoPackage export of per-pixel outlier observations with WGS84 coordinates (step 11)
Pre-flight band validation: the orchestrator checks that all bands required for the selected VIs are configured before any step executes
SKIP_APPROVAL flag for automated / non-interactive pipeline runs
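The bitwise Fmask masking listed above can be sketched as follows. The bit positions follow the HLS v2.0 Fmask layout as commonly documented (bit 0 cirrus through bit 5 water) and should be checked against the user guide. The flag names, function names, and the omission of the two-bit aerosol field (bits 6–7) are simplifications for illustration and do not reflect the pipeline's actual configuration keys.

```python
# Single-bit Fmask flags (assumed HLS v2.0 layout; aerosol bits 6-7 omitted).
FMASK_BITS = {
    "cirrus": 0,
    "cloud": 1,
    "adjacent_cloud": 2,
    "shadow": 3,
    "snow_ice": 4,
    "water": 5,
}


def build_mask_value(enabled_flags):
    """Combine the enabled flag names into one integer bit mask."""
    mask = 0
    for name in enabled_flags:
        mask |= 1 << FMASK_BITS[name]
    return mask


def is_masked(fmask_value, mask_value):
    """Return True if any enabled flag bit is set in the pixel's Fmask value."""
    return (fmask_value & mask_value) != 0


mask = build_mask_value(["cloud", "shadow"])  # bits 1 and 3 -> 0b1010
print(is_masked(0b00000010, mask))  # cloud bit set -> True
print(is_masked(0b00100000, mask))  # only water bit set -> False
```

Because each flag maps to its own bit, enabling or disabling one flag changes the mask value without affecting how the others are tested.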