HLS VI Pipeline Guide
A production-ready, 11-step processing pipeline for computing and analyzing vegetation indices from NASA’s Harmonized Landsat and Sentinel-2 (HLS) surface reflectance data. Designed for researchers across ecology, agriculture, land management, and remote sensing who need reproducible, time-series vegetation analysis at scale.
Table of Contents
Overview
The HLS Vegetation Index Pipeline automates the full workflow from raw HLS satellite imagery to analysis-ready vegetation index products.
Input: NASA HLS v2.0 granules (Landsat-8/9 L30 and Sentinel-2 S30) queried directly from the NASA Common Metadata Repository (CMR) API.
Output:
Cloud-masked, quality-filtered vegetation index GeoTIFFs per granule
Per-tile CF-1.8 compliant NetCDF time-series stacks with sensor metadata
Temporal mean raster mosaics reprojected to a user-specified CRS
Outlier-flagged pixel summaries (raster mean + count) and point-vector GeoPackages
Multi-band seasonal composite stacks with user-defined, named time windows
Supported Vegetation Indices:
Index |
Formula |
Typical Range |
|---|---|---|
NDVI |
|
−1.0 to 1.0 |
EVI2 |
|
−1.0 to 2.0 |
NIRv |
|
−0.5 to 1.0 |
Data Source: NASA Harmonized Landsat and Sentinel-2 (HLS) — 30 m spatial resolution, ~2–3 day combined revisit time.
Key Features
End-to-end automation — a single
bash hls_pipeline.shcommand runs all 11 steps sequentiallyFlexible step control — run the full pipeline or any subset using named step identifiers or built-in aliases
Quality masking — bitwise Fmask decode with independently configurable cloud, cloud shadow, snow/ice, water, and aerosol modes
Outlier detection — identifies and exports pixels outside per-VI valid ranges as raster summaries and searchable point-vector GeoPackages
Seasonal composites — user-defined, named time windows produce multi-band stacks for phenological or climatological analysis, with window labels embedded in band metadata
Parallel processing — multiprocessing across configurable worker counts for all compute-intensive steps
Memory-efficient — dask-chunked xarray processing and streaming rasterio mosaic merges scale to large study extents without out-of-memory failures
Consistent tile filtering —
HLS_TILESenforces a fixed MGRS tile set uniformly across all 11 stepsCloud-optimized output — all GeoTIFF outputs use LZW compression, internal tiling, and predictor settings appropriate to their data type
Pre-flight validation — the pipeline validates that all bands required for the selected VIs are configured before any step executes
Tile-by-tile processing — steps 01–03 always run one tile at a time, reducing peak disk usage to roughly one tile’s worth of raw data; optional space-saver flags automatically remove raw and/or VI intermediate files after each tile’s NetCDF is built
Pipeline Architecture
NASA CMR API
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ Step 01 · download │
│ src/01_hls_download_query.sh │
│ Query CMR API, estimate storage, download raw L30/S30 bands │
└──────────────────────────────┬──────────────────────────────────────┘
│ Raw GeoTIFF bands (B04, B05/B8A, Fmask)
▼
┌─────────────────────────────────────────────────────────────────────┐
│ Step 02 · vi_calc │
│ src/02_hls_vi_calc.py │
│ Apply Fmask quality masking; compute NDVI / EVI2 / NIRv │
└──────────────────────────────┬──────────────────────────────────────┘
│ Per-granule VI GeoTIFFs
▼
┌─────────────────────────────────────────────────────────────────────┐
│ Step 03 · netcdf │
│ src/03_hls_netcdf_build.py │
│ Aggregate GeoTIFFs → CF-1.8 NetCDF time-series per tile │
└──────────────────────────────┬──────────────────────────────────────┘
│ Per-tile NetCDF time-series
┌──────────┴──────────┐
▼ ▼
┌───────────────────┐ ┌──────────────────────┐
│ Steps 04 + 06 │ │ Steps 05 + 07 + 08 │
│ Mean products │ │ Outlier products │
└────────┬──────────┘ └──────────┬───────────┘
│ │
▼ ├──────────────────────┐
┌────────────────┐ ▼ ▼
│ Mean mosaic │ ┌─────────────────┐ ┌──────────────────┐
└────────┬───────┘ │ Outlier rasters│ │ Step 11 · GPKG │
│ └─────────────────┘ └──────────────────┘
▼
┌────────────────┐
│ Step 10 │
│ Time-series │
│ stacks │
└────────────────┘
Step Summary
Step |
Script |
Step Name |
Description |
|---|---|---|---|
01 |
|
|
Query NASA CMR API; download raw HLS granules (L30/S30 bands + Fmask) |
02 |
|
|
Compute VI GeoTIFFs; apply configurable bitwise Fmask quality masking |
03 |
|
|
Aggregate per-granule GeoTIFFs into CF-1.8 compliant NetCDF time-series per tile |
04 |
|
|
Temporal mean per tile; reproject to |
05 |
|
|
Outlier-aware mean + valid count per tile; reproject |
06 |
|
|
Mosaic per-tile means into a study-area-wide GeoTIFF |
07 |
|
|
Mosaic outlier-filtered means |
08 |
|
|
Mosaic valid-observation counts |
09 |
|
|
Count valid observations per pixel across all download cycles; mosaic into a study-area-wide GeoTIFF |
10 |
|
|
Multi-band seasonal composite stacks with user-defined time windows |
11 |
|
|
Export per-pixel outlier observations (value, date, location) to GeoPackage |
Prerequisites
1. NASA Earthdata Account
HLS data is hosted on NASA Earthdata. You must:
Register for a free account at https://urs.earthdata.nasa.gov/
Accept the required End User License Agreements (EULAs) for HLS data products on the Earthdata website
Create a
~/.netrcfile with your credentials:
machine urs.earthdata.nasa.gov login <your_username> password <your_password>
Secure the file:
chmod 600 ~/.netrc
2. System Requirements
Tool |
Minimum Version |
Notes |
|---|---|---|
conda or mamba |
any |
Required for environment management. conda-forge packages also supply the native geospatial libraries (GDAL, PROJ, HDF5, GEOS) that the pipeline depends on — these are not separate system installs |
bash |
3.2+ |
Uses standard POSIX-compatible syntax; tested on macOS (ZSH) and Linux |
wget or curl |
any |
Used by the download script (step 01); the script auto-detects whichever is available |
gdalinfo |
any |
GDAL command-line tool used by step 01 to validate each downloaded GeoTIFF; provided automatically by the conda environment via rasterio’s GDAL dependency — no separate installation needed, but the pipeline must be run with the conda environment active |
3. MGRS Tile Identification
HLS data is organized by MGRS (Military Grid Reference System) tiles. You need the MGRS tile IDs covering your area of interest before configuring the pipeline. Useful tools:
NASA Earthdata Search — interactive map with MGRS tile overlays
MGRS Mapper — web-based tile lookup by location
QGIS with the HLS tiling shapefile
4. Storage Estimate
When download is active, the pipeline prints a comprehensive storage estimate before any data are downloaded, then prompts for confirmation. The estimate covers all active steps and shows both a Peak and Final figure.
Steps 01–03 per-granule sizes (empirically calibrated):
Product |
Approximate size per granule |
|---|---|
Raw bands (NIR, Red, Fmask) |
~45 MB |
VI GeoTIFF per VI |
~15 MB |
NetCDF contribution per VI |
~12 MB (zlib compression achieves ~7× on NaN-heavy time-series) |
Steps 04–11 per-tile sizes (LZW-compressed estimates):
Product |
Approximate size |
|---|---|
Mean reprojected tile (step 04) |
~55 MB per tile per VI |
Outlier tiles — mean + count (step 05) |
~5 MB per tile per VI (highly variable — nearly 0 for NDVI with default valid range) |
Float32 mosaic contribution (steps 06, 07) |
~35 MB per tile per VI |
uint16 mosaic contribution (steps 08, 09) |
~3 MB per tile per VI |
Time-series stack per window (step 10) |
~50 MB per tile per VI per window |
Outlier GeoPackage (step 11) |
variable — depends on outlier rate |
Peak vs. Final storage:
Est. Peak — worst-case simultaneous disk usage during the tile loop: all NetCDF files accumulated so far plus one tile’s raw and/or VI files still in progress
Est. Final — what remains on disk when all active steps complete (with space savers: raw and VI intermediates are deleted; without: they persist)
A 10-tile study area with 5 years of data (bi-weekly acquisitions) running all 11 steps can require 100–300+ GB depending on VIs and time windows. Enable the space-saver options to minimize peak disk usage — raw and VI intermediate files are deleted per tile as soon as each NetCDF is built.
Installation
1. Clone the Repository
git clone https://github.com/stephenconklin/HLS_VI_Pipeline.git
cd HLS_VI_Pipeline
2. Create the Conda Environment
conda env create -f environment.yml
conda activate hls_pipeline
Tip: Mamba resolves conda environments significantly faster than conda:
mamba env create -f environment.yml
3. Verify the Installation
python -c "import numpy, pandas, rasterio, netCDF4, xarray, rioxarray, dask, fiona; print('Environment OK')"
Configuration
All pipeline parameters are set in config.env. This file is sourced by hls_pipeline.sh before each step — no environment variables need to be exported manually.
For the full parameter reference, see the Configuration Reference.
Paths
BASE_DIR="/path/to/your/project" # Root output directory — edit this first
LOG_DIR="${BASE_DIR}/0_Logs"
RAW_HLS_DIR="${BASE_DIR}/1_Raw" # Downloaded HLS granules
VI_OUTPUT_DIR="${BASE_DIR}/2_Interim/1_VI_Products"
NETCDF_DIR="${BASE_DIR}/2_Interim/2_NetCDF"
REPROJECTED_DIR="${BASE_DIR}/2_Interim/3_VI_Mean_Tiles"
REPROJECTED_DIR_OUTLIERS="${BASE_DIR}/2_Interim/4_VI_Outlier_Tiles"
MOSAIC_DIR="${BASE_DIR}/3_Out/1_Mosaic"
TIMESLICE_OUTPUT_DIR="${BASE_DIR}/3_Out/2_TimeSeries"
OUTLIER_GPKG_DIR="${BASE_DIR}/3_Out/3_Outlier_Points"
Vegetation Indices
PROCESSED_VIS="NDVI EVI2 NIRv" # Space-separated; compute all three
PROCESSED_VIS="NDVI" # Compute NDVI only
Step Control
STEPS="all" # Run all 11 steps
STEPS="products" # Steps 02–11 (skip download)
STEPS="mosaics" # Steps 06–08 only
STEPS="outliers" # Steps 05, 07, 08, 11
STEPS="vi_calc netcdf mean_flat" # Named steps, space-separated
Step aliases:
Alias |
Expands To |
|---|---|
|
Steps 01–11 |
|
Steps 02–11 |
|
Steps 01–03 |
|
Steps 06–08 |
|
Steps 05, 07, 08, 11 |
Named step identifiers:
download · vi_calc · netcdf · mean_flat · outlier_flat · mean_mosaic · outlier_mosaic · outlier_counts · count_valid_mosaic · timeseries · outlier_gpkg
Tile List
HLS_TILES="17TNE 17TNF 17TNG 17TPE 17TPF" # Space-separated MGRS tile IDs
Tile filtering is applied at every step. If HLS_TILES is unset, all tiles found in each input directory are processed.
Download Settings
DOWNLOAD_CYCLES="2015-12-01|2016-03-31 2016-12-01|2017-03-31" # Space-separated date ranges
CLOUD_COVERAGE_MAX=75 # Maximum cloud cover % accepted by CMR query
SPATIAL_COVERAGE_MIN=0 # Minimum spatial coverage % accepted by CMR query
Band Selection
Specifies which bands to download for each HLS sensor:
L30_BANDS="B05 B04 Fmask" # Landsat: NIR (B05), Red (B04), Quality (Fmask)
S30_BANDS="B8A B04 Fmask" # Sentinel-2: NIR narrow (B8A), Red (B04), Quality (Fmask)
Note: The pipeline validates that the bands required for each VI in
PROCESSED_VISare present in the band lists before any step runs and will warn you if something is missing.
Fmask Quality Masking
Control which pixel categories are excluded. Masked pixels are set to NaN and excluded from all downstream computations.
MASK_CIRRUS="TRUE"
MASK_CLOUD="TRUE"
MASK_ADJACENT_CLOUD="TRUE"
MASK_CLOUD_SHADOW="TRUE"
MASK_SNOW_ICE="TRUE"
MASK_WATER="TRUE"
MASK_AEROSOL_MODE="MODERATE" # NONE | HIGH | MODERATE | LOW
HLS_SCALE_FACTOR=0.0001 # HLS surface reflectance scale factor
Aerosol masking modes:
Mode |
Behaviour |
|---|---|
|
No aerosol masking applied |
|
Mask high-aerosol pixels only |
|
Mask moderate and high aerosol pixels |
|
Mask all non-zero aerosol levels |
Valid Ranges (Outlier Detection)
Pixels within the valid range contribute to the temporal mean. Pixels outside the valid range are treated as outliers and tracked separately through Steps 05, 07, 08, and 11.
VALID_RANGE_NDVI="-1,1"
VALID_RANGE_EVI2="-1,2"
VALID_RANGE_NIRv="-0.5,1"
Space Saver Options
The pipeline always processes steps 01–03 one tile at a time — downloading, computing VIs, and building the NetCDF before moving to the next tile. This limits peak disk usage to roughly one tile’s worth of raw data at any given moment.
Two optional space-saver flags delete intermediate files for each tile immediately after its NetCDF is built. Both default to FALSE and only take effect when netcdf is included in STEPS.
SPACE_SAVER_REMOVE_RAW="FALSE" # Delete raw HLS band + Fmask files after NetCDF is built
SPACE_SAVER_REMOVE_VI="FALSE" # Delete VI GeoTIFF files after NetCDF is built
Note: Both space-saver options can be enabled together. Raw files and VI GeoTIFFs are no longer needed once the NetCDF time-series is complete — all downstream steps (04–11) read only from the NetCDF files.
Download Approval
Before any data is downloaded, the pipeline prints a storage estimate and prompts for confirmation. To bypass this prompt in automated or non-interactive contexts:
SKIP_APPROVAL="FALSE" # Set TRUE to skip the download confirmation prompt
Failed tiles are skipped with a logged error and the pipeline continues to the next tile. A summary of succeeded and failed tiles is printed at the end of the tile loop.
Time-Series Windows
Define named time windows for seasonal composite stack generation (Step 10):
TIMESLICE_ENABLED="TRUE"
TIMESLICE_STAT="mean"
TIMESLICE_WINDOWS="Winter_2015_2016:2015-12-01|2016-03-31 \
Summer_2016:2016-06-01|2016-08-31 \
Winter_2016_2017:2016-12-01|2017-03-31"
Format: label:YYYY-MM-DD|YYYY-MM-DD, space-separated. Labels must be alphanumeric with underscores only. Each label becomes the band description in the output stack.
Processing Performance
NUM_WORKERS=8 # Parallel worker processes — set to available CPU cores
CHUNK_SIZE=10 # NetCDF time slices per dask chunk
TARGET_CRS="EPSG:6350" # Output CRS — must be a projected CRS (metres)
Output Format
NETCDF_COMPLEVEL=1 # zlib compression level for NetCDF files (0–9; 1 = fast, 9 = smallest)
GEOTIFF_COMPRESS="LZW" # GeoTIFF codec: LZW (default), DEFLATE, ZSTD, NONE
GEOTIFF_BLOCK_SIZE=512 # Internal tile block size in pixels (512 for GIS; 256 for COG/web)
Quick Start
A minimal example to process a single summer season for two tiles:
1. Edit config.env:
BASE_DIR="/path/to/your/output"
HLS_TILES="18TVL 18TVM"
PROCESSED_VIS="NDVI"
DOWNLOAD_CYCLES="2020-06-01|2020-08-31"
STEPS="all"
NUM_WORKERS=4
2. Activate the environment and run:
conda activate hls_pipeline
bash hls_pipeline.sh
The pipeline will print a run summary, prompt you to confirm the storage estimate before downloading, then execute each step sequentially, logging all output to ${BASE_DIR}/0_Logs/.
Running the Pipeline
Full Run
bash hls_pipeline.sh
The pipeline prints a run summary showing active steps, VIs, tile count, worker count, and target CRS, then logs all step output to a timestamped file in LOG_DIR.
Partial Runs
Use the STEPS variable to reprocess specific stages without rerunning the entire pipeline:
# In config.env
STEPS="mean_mosaic outlier_mosaic outlier_counts"
# Or inline as a one-off override
STEPS="mean_mosaic" bash hls_pipeline.sh
Resuming After Failure
Each step skips output files that already exist. If a run is interrupted mid-step, simply rerun the pipeline — completed output files will not be regenerated.
Step Reference
Step 01 — Download
Downloads HLS granules via the NASA CMR API with an interactive storage estimate and user approval gate.
Inputs: NASA CMR API (date ranges, tile IDs, cloud/spatial coverage thresholds, band list)
Outputs: Raw GeoTIFFs in
RAW_HLS_DIR, organized by sensor/year/tile hierarchyKey feature: Prints a comprehensive storage estimate covering all active steps (01–11) with Peak and Final figures, then requires user confirmation before downloading (set
SKIP_APPROVAL=TRUEto bypass for automated runs)Credentials required:
~/.netrcwith NASA Earthdata login
Step 02 — VI Calculation
Applies Fmask quality masking and computes vegetation indices from raw surface reflectance bands.
Inputs: Raw L30/S30 band GeoTIFFs from
RAW_HLS_DIROutputs: One GeoTIFF per granule per VI in
VI_OUTPUT_DIRMasking: Configurable bitwise Fmask decode (cloud, shadow, snow/ice, water, aerosol)
Parallelism:
multiprocessing.PoolwithNUM_WORKERS
Step 03 — NetCDF Build
Aggregates per-granule VI GeoTIFFs into per-tile time-series files.
Inputs: VI GeoTIFFs from
VI_OUTPUT_DIROutputs:
T{TILE}_{VI}.ncinNETCDF_DIR— CF-1.8 compliant withdays since 1970-01-01time encoding and sensor (L30/S30) metadata per observationParallelism: Chunked writes with
multiprocessing.PoolSouthern hemisphere CRS correction: HLS v2.0 GeoTIFFs for tiles south of the equator embed a UTM North zone (EPSG:326xx) with negative northings instead of the standard UTM South convention (EPSG:327xx, false_northing=10,000,000). Step 03 automatically detects this case (UTM North EPSG code + negative y centroid) and corrects it: the CRS is replaced with the UTM South equivalent (EPSG + 100) and y-coordinates are shifted by +10,000,000 m. The output NetCDF files carry the correct EPSG:327xx CRS with positive northings, as expected by GIS tools and CF-1.8 validators. This correction is applied transparently and logged with a
[CRS fix]prefix in the pipeline output.
Step 04 — Temporal Mean + Reproject
Computes the pixel-wise temporal mean for each tile across the full date range and reprojects to TARGET_CRS.
Inputs: NetCDF time-series from
NETCDF_DIROutputs:
T{TILE}_{VI}_average_{VI}_{CRS}.tifinREPROJECTED_DIR(30 m, Cloud-Optimized GeoTIFF)Note: Valid-range filtering is applied before computing the mean; outlier pixels do not contribute to the average
Step 05 — Outlier Extraction + Reproject
Identifies pixels with unmasked values outside the per-VI valid range and summarizes them per tile.
Inputs: NetCDF time-series from
NETCDF_DIROutputs: Two GeoTIFFs per tile per VI in
REPROJECTED_DIR_OUTLIERS:Outlier mean: temporal mean of out-of-range values
Outlier count: number of time slices with an outlier at each pixel
Steps 06–08 — Mosaics
Merge all per-tile rasters into study-area-wide mosaics using streaming merge (memory-efficient, handles large tile counts).
Step |
Output Filename |
Notes |
|---|---|---|
06 |
|
Temporal mean mosaic |
07 |
|
Outlier mean mosaic |
08 |
|
Outlier count mosaic (uint16, nodata=0) |
Step 09 — CountValid Mosaic
Counts valid observations per pixel across the full downloaded period and mosaics all tiles into a study-area-wide GeoTIFF. The temporal scope is implicitly defined by DOWNLOAD_CYCLES — only data within those cycles is present in the NetCDF files. This step reads directly from NetCDF and is fully independent of Step 10 (timeseries).
Inputs: Per-tile NetCDF time-series from
NETCDF_DIR(produced by Step 03)Outputs:
HLS_Mosaic_CountValid_{VI}_{CRS}.tifinMOSAIC_DIRTo run independently:
STEPS="count_valid_mosaic" bash hls_pipeline.shBand description:
CountValid_AllDownloadCyclesembedded in band metadata
Step 10 — Time-Series Stacks
Builds multi-band seasonal composite GeoTIFFs from user-defined time windows.
Inputs: NetCDF time-series +
TIMESLICE_WINDOWSdefinitions fromconfig.envOutputs: Two multi-band stacks per VI in
TIMESLICE_OUTPUT_DIR:HLS_TimeSeries_{VI}_Mean_{CRS}.tif— temporal mean per window (one band per window)HLS_TimeSeries_{VI}_CountValid_{CRS}.tif— valid-pixel count per window (one band per window)
Band descriptions: Window labels are embedded in band metadata and are visible in QGIS, GDAL, and rasterio
Step 11 — Outlier GeoPackage
Exports every individual outlier pixel-date observation as a point feature, enabling spatial and temporal exploration of anomalies.
Inputs: NetCDF time-series from
NETCDF_DIROutputs:
HLS_outliers_{VI}.gpkginOUTLIER_GPKG_DIR(WGS84 / EPSG:4326 points)Feature attributes:
tile_id,vi_type,sensor,date,vi_value,geometryUse cases: Sensor artifact investigation, data quality review, anomaly mapping
Output Products
Directory Structure
${BASE_DIR}/
├── 0_Logs/
│ └── hls_pipeline_YYYYMMDD_HHMMSS.log
├── 1_Raw/
│ ├── L30/YYYY/HH/T/T/T/
│ │ └── HLS.L30.T{TILE}.{DATE}.v2.0.{Band}.tif
│ └── S30/YYYY/HH/T/T/T/
│ └── HLS.S30.T{TILE}.{DATE}.v2.0.{Band}.tif
├── 2_Interim/
│ ├── 1_VI_Products/
│ │ └── HLS.{L30|S30}.T{TILE}.{DATE}.v2.0.{VI}.tif
│ ├── 2_NetCDF/
│ │ └── T{TILE}_{VI}.nc
│ ├── 3_VI_Mean_Tiles/
│ │ └── T{TILE}_{VI}_average_{VI}_{CRS}.tif
│ └── 4_VI_Outlier_Tiles/
│ ├── T{TILE}_{VI}_outlier_mean_{VI}_{CRS}.tif
│ └── T{TILE}_{VI}_outlier_count_{VI}_{CRS}.tif
└── 3_Out/
├── 1_Mosaic/
│ ├── HLS_Mosaic_{VI}_{CRS}.tif
│ ├── HLS_Mosaic_Outlier_Mean_{VI}_{CRS}.tif
│ ├── HLS_Mosaic_Outlier_Count_{VI}_{CRS}.tif
│ └── HLS_Mosaic_CountValid_{VI}_{CRS}.tif (step 09 — count across all download cycles)
├── 2_TimeSeries/
│ ├── HLS_TimeSeries_{VI}_Mean_{CRS}.tif (N bands — one per window)
│ └── HLS_TimeSeries_{VI}_CountValid_{CRS}.tif (N bands — one per window)
└── 3_Outlier_Points/
└── HLS_outliers_{VI}.gpkg
File Format Reference
Product |
Format |
Dtype |
Nodata |
Compression |
|---|---|---|---|---|
VI GeoTIFF (step 02) |
GeoTIFF |
float32 |
NaN |
LZW |
NetCDF time-series (step 03) |
NetCDF-4 |
float32 |
NaN |
zlib |
Mean tile (step 04) |
COG GeoTIFF |
float32 |
NaN |
LZW + predictor 3 |
Outlier mean tile (step 05) |
GeoTIFF |
float32 |
NaN |
LZW + predictor 3 |
Outlier count tile (step 05) |
GeoTIFF |
uint16 |
0 |
LZW + predictor 2 |
Mean / outlier mosaics (steps 06–07) |
GeoTIFF |
float32 |
NaN |
LZW |
Count mosaic (step 08) |
GeoTIFF |
uint16 |
0 |
LZW |
CountValid mosaic (step 09) |
GeoTIFF |
uint16 |
0 |
LZW + predictor 2 |
Time-series stacks (step 10) |
BigTIFF |
float32 / uint16 |
NaN / 0 |
LZW |
Outlier GeoPackage (step 11) |
GeoPackage |
— |
— |
— |
Nodata note: A value of
0in count products means no data at that pixel, not missing data in the raster sense. For outlier count products (steps 05/08),0means no outlier observations were recorded. For the CountValid mosaic (step 09),0means no valid observations were found across all download cycles.
Advanced Usage
Running a Subset of Steps
# In config.env — rerun mosaics after adding more tiles
STEPS="mean_mosaic outlier_mosaic outlier_counts"
# One-off override without editing config.env
STEPS="outlier_gpkg" bash hls_pipeline.sh
Computing Multiple VIs
PROCESSED_VIS="NDVI EVI2 NIRv"
All steps loop over each VI in PROCESSED_VIS. Ensure all required bands are listed in L30_BANDS and S30_BANDS. For NDVI, EVI2, and NIRv, the required bands are B04 (Red), B05/B8A (NIR), and Fmask.
Custom Time Windows
Time windows can span any date range and do not need to follow calendar boundaries:
TIMESLICE_WINDOWS="Peak_Green_2020:2020-06-15|2020-07-31 \
Late_Summer_2020:2020-08-01|2020-09-15 \
Dormant_2020_2021:2020-11-15|2021-03-15"
The output stack will have one band per window. Band descriptions (window labels) are stored in the GeoTIFF metadata and are visible in QGIS, ArcGIS Pro, and when reading with rasterio or GDAL.
Minimizing Disk Usage
For large study areas with limited storage, enable both space-saver options. This keeps only one tile’s raw and VI files on disk at a time — all other storage is the accumulated NetCDF outputs.
SPACE_SAVER_REMOVE_RAW="TRUE"
SPACE_SAVER_REMOVE_VI="TRUE"
STEPS="build_nc"
After all tiles complete steps 01–03, continue with downstream steps:
STEPS="mean_flat outlier_flat mean_mosaic outlier_mosaic outlier_counts count_valid_mosaic timeseries outlier_gpkg"
Note: The space-saver options only apply during steps 01–03. Steps 04–11 always operate across all tiles using the NetCDF files.
Tuning Valid Ranges
Default valid ranges are broad. Narrow them for domain-specific applications:
VALID_RANGE_NDVI="0.1,0.9" # Forest canopy: exclude bare soil and sparse cover
VALID_RANGE_EVI2="-0.5,1.5" # Wider range for agricultural areas
Any unmasked pixel outside these bounds is flagged as an outlier and routed to Steps 05, 07, 08, and 11.
Optimising Worker Count
Set NUM_WORKERS based on available physical CPU cores. A safe starting point is (total cores − 2) to leave headroom for the OS and dask overhead:
NUM_WORKERS=14 # Example for a 16-core workstation
Reduce this value if you encounter out-of-memory errors during Steps 04, 05, or 09.
Changing the Output CRS
The default CRS is EPSG:6350 (NAD83 Conus Albers Equal Area). Change TARGET_CRS to any projected CRS supported by PROJ:
TARGET_CRS="EPSG:32618" # UTM Zone 18N (WGS84)
TARGET_CRS="EPSG:9221" # South African Albers Equal Area
TARGET_CRS="EPSG:3857" # Web Mercator
All reprojected outputs and mosaics (Steps 04–11) will use the new CRS. The CRS code (dots stripped) is embedded in output filenames (e.g., EPSG6350).
Note:
TARGET_CRSmust be a projected CRS (linear units such as metres). If a geographic CRS (degrees) is set, the pipeline will accept it but print a[WARN]and convert the 30 m output resolution to an approximate degree equivalent — which is not equal-area and not recommended for pixel-level VI analysis.
Troubleshooting
Download fails with a 401 or authentication error
Verify your ~/.netrc credentials, file permissions (chmod 600 ~/.netrc), and that you have accepted all required EULAs for HLS on the NASA Earthdata website.
syntax error or unexpected token in hls_pipeline.sh
Verify that config.env is present in the repository root and that all required variables are set. The pipeline has been tested on macOS (ZSH) and Linux.
Out-of-memory errors during Steps 04, 05, 09, or 10
Reduce NUM_WORKERS and/or CHUNK_SIZE in config.env. Fewer simultaneous dask tasks significantly reduce peak RAM usage.
A step finishes but output files are missing
Check the timestamped log file in LOG_DIR for error messages. Most steps print tile-level warnings when inputs are missing or when a tile is skipped.
The mosaic is missing one or more tiles
Verify that the missing tiles are included in HLS_TILES and that their intermediate reprojected GeoTIFFs exist in REPROJECTED_DIR (for Steps 06–07) or REPROJECTED_DIR_OUTLIERS (for Steps 07–08).
NetCDF time coordinate errors or wrong dates
HLS granule filenames must not be renamed. Step 03 parses acquisition dates directly from the standard HLS filename convention: HLS.{L30|S30}.T{TILE}.{YYYYDDD}T{HHMMSS}.v2.0.
Step 10 time-series stack has fewer bands than expected
If TIMESLICE_ENABLED is not set to "TRUE", Step 10 will skip processing. Also verify that at least one observation falls within each configured time window for the tiles in HLS_TILES.
Credits & Acknowledgments
Adapted Code
NASA HLS Download Script
src/01_hls_download_query.sh is adapted in part from the NASA getHLS.sh script, published by the NASA HLS Data Resources Team under the Apache 2.0 License.
HLS Data Citation
Users of this pipeline who publish results should cite the HLS datasets:
Masek, J., Ju, J., Roger, J.-C., Skakun, S., Vermote, E., Claverie, M., Dungan, J., Yin, Z., Freitag, B., Justice, C. (2021). HLS Operational Land Imager Surface Reflectance and TOA Brightness Daily Global 30m v2.0 [Data set]. NASA EOSDIS Land Processes DAAC. https://doi.org/10.5067/HLS/HLSL30.002
Skakun, S., Ju, J., Roger, J.-C., Vermote, E., Masek, J., Justice, C. (2021). HLS Sentinel-2 Multi-spectral Instrument Surface Reflectance Daily Global 30m v2.0 [Data set]. NASA EOSDIS Land Processes DAAC. https://doi.org/10.5067/HLS/HLSS30.002
Python Libraries
This pipeline is built on the following open-source libraries:
Library |
Purpose |
License |
|---|---|---|
GeoTIFF I/O, reprojection, mosaicing |
BSD-3 |
|
N-dimensional array and NetCDF processing |
Apache 2.0 |
|
Spatial extensions for xarray |
Apache 2.0 |
|
Parallel and chunked computation |
BSD-3 |
|
GeoPackage vector I/O |
BSD-3 |
|
Array mathematics |
BSD-3 |
|
Tabular data handling |
BSD-3 |
|
Low-level NetCDF I/O |
MIT |
|
Point geometry construction |
BSD-3 |
|
CRS transformations |
MIT |
License
This project is licensed under the MIT License.
src/01_hls_download_query.sh is adapted in part from NASA’s getHLS.sh, released by NASA under the Apache 2.0 License.