Skip to content

Comments

r.sim: Parallelize dx/dy derivatives computation using OpenMP#7094

Open
Abhi-d-gr8 wants to merge 4 commits intoOSGeo:mainfrom
Abhi-d-gr8:dxdy-multicore
Open

r.sim: Parallelize dx/dy derivatives computation using OpenMP#7094
Abhi-d-gr8 wants to merge 4 commits intoOSGeo:mainfrom
Abhi-d-gr8:dxdy-multicore

Conversation

@Abhi-d-gr8
Copy link
Contributor

@Abhi-d-gr8 Abhi-d-gr8 commented Feb 16, 2026

Overview

This PR parallelizes the dx/dy slope derivatives computation in simlib/derivatives.c (Horn 3×3 method), addressing #7039.

The main simulation already supports multiple cores, but the dx/dy computation was still serial. This change enables that step to use available CPU cores as well.


What changed

  • Added an OpenMP parallel for on the outer row loop.
  • Used default(none) with explicit shared(...) and private(...) scoping.
  • Preserved original math and behavior.
  • Serial builds (without OpenMP) remain unchanged.

Performance Evaluation

To isolate the impact of this change, timing was performed inside derivatives() only, excluding raster I/O and the main simulation loop.

This avoids masking the effect, since total r.sim.water runtime is dominated by the simulation phase.

Test Setup

  • Grid size: 5000 × 5000 (25 million cells)
  • Interleaved runs: 1-thread vs 8-thread
  • First run discarded to avoid warm-up bias
  • Measured using omp_get_wtime() around derivatives() only

Results (Apple M3 Air)

  • OMP_NUM_THREADS=1: ~0.19 s (average)
  • OMP_NUM_THREADS=8: ~0.06 s (average)
  • Speedup: ~3.2×

This confirms that the dx/dy computation scales across cores as intended.

As expected, the overall r.sim.water runtime improvement is smaller since derivatives() is only one stage of the workflow.

telegram-cloud-photo-size-5-6093558501759192775-y


Build / OpenMP Notes

To observe multi-threaded behavior in derivatives():

  • GRASS must be configured with OpenMP support:

    ./configure --with-openmp
    
  • The compiler must support OpenMP (e.g., libomp on macOS).

  • The source includes guarded OpenMP headers:

    #if defined(_OPENMP)
    #include <omp.h>
    #endif
  • Control the number of threads via:

    OMP_NUM_THREADS=<N>
    

If OpenMP is not available, the code compiles and runs serially without any behavior or numerical changes.


Reproducible Benchmark Command

For reference, this is the exact command used for the interleaved benchmark:

g.gisenv set=DEBUG=1
g.region n=5000 s=0 e=5000 w=0 res=1

echo "=== Interleaved 1-vs-8 thread runs (5000x5000 = 25M cells) ==="

for nprocs in 1 8 1 8 1 8; do
  r.sim.water elevation=elev rain_value=50 man_value=0.05 \
      depth=depth_bench discharge=disch_bench \
      nwalkers=5000 niterations=0 random_seed=1 nprocs=$nprocs \
      2>&1 | grep -E "OpenMP threads|derivatives time"
  echo "---"
done

Timing instrumentation (added for evaluation)

To isolate the performance of derivatives() (not masked by the main simulation),I temporarily added the following OpenMP-guarded timing block:

diff --git a/raster/r.sim/simlib/derivatives.c b/raster/r.sim/simlib/derivatives.c
--- a/raster/r.sim/simlib/derivatives.c
+++ b/raster/r.sim/simlib/derivatives.c
@@ -23,6 +23,11 @@
 H = (geometry->stepx / geometry->conv) * 8.0;
 V = (geometry->stepy / geometry->conv) * 8.0;

+#if defined(_OPENMP)
+    G_debug(1, "OpenMP threads (derivatives): %d", omp_get_max_threads());
+    double t0 = omp_get_wtime();
+#endif

 #if defined(_OPENMP)
 #pragma omp parallel for default(none) schedule(static) \
@@ -101,4 +109,8 @@
     }

+#if defined(_OPENMP)
+    G_debug(1, "dx/dy derivatives time: %.4f s", omp_get_wtime() - t0);
+#endif
 }

It used for benchmarking and does not affect numerical behavior.

@github-actions github-actions bot added raster Related to raster data processing C Related code is in C module labels Feb 16, 2026
@wenzeslaus
Copy link
Member

Can you please add the diff into your description to add the time measuring code?

@Abhi-d-gr8
Copy link
Contributor Author

Hi @wenzeslaus !
I have added the diff in the description (at the end), to calculate the time measuring code of the derivatives.c

@wenzeslaus
Copy link
Member

Awesome! I'm getting good results locally. Let's finalize it! Some valgrind-like scrutinizing would be good. There is some discussion in issues or PRs about how to do that with threads in our context in relation to r.mapcalc parallel code.

image
for nprocs in 1 8 1 8 1 8 1 4 1 4 1 4 1 36 1 36 1 36 1 36; do   r.sim.water elevation=elevation rain_value=50 man_value=0.05       depth=depth_bench discharge=disch_bench       nwalkers=5000 niterations=0 random_seed=1 nprocs=$nprocs --o       2>&1 | grep -E "threads";   echo "---"; done
diff --git a/raster/r.sim/simlib/derivatives.c b/raster/r.sim/simlib/derivatives.c
index 1519055783..9df42503b7 100644
--- a/raster/r.sim/simlib/derivatives.c
+++ b/raster/r.sim/simlib/derivatives.c
@@ -22,6 +22,11 @@ void derivatives(const Geometry *geometry, float **elevation, double **dx,
     H = (geometry->stepx / geometry->conv) * 8.0;
     V = (geometry->stepy / geometry->conv) * 8.0;
 
+#if defined(_OPENMP)
+    //G_debug(1, "OpenMP threads (derivatives): %d", omp_get_max_threads());
+    double t0 = omp_get_wtime();
+#endif
+
 #if defined(_OPENMP)
 #pragma omp parallel for default(none) schedule(static) \
     shared(geometry, elevation, dx, dy, H, V)           \
@@ -101,4 +106,9 @@ void derivatives(const Geometry *geometry, float **elevation, double **dx,
             dy[row][col] = ((c7 + c8 + c8 + c9) - (c1 + c2 + c2 + c3)) / V;
         }
     }
+
+#if defined(_OPENMP)
+    G_debug(1, "threads=%d, time=%.4f", omp_get_max_threads(), omp_get_wtime() - t0);
+#endif
+
 }
# generated

import pandas as pd
import matplotlib.pyplot as plt

# Raw data
data = [
    (1, 0.0537), (8, 0.0164),
    (1, 0.0540), (8, 0.0119),
    (1, 0.0546), (8, 0.0113),
    (1, 0.0557), (4, 0.0208),
    (1, 0.0538), (4, 0.0263),
    (1, 0.0534), (4, 0.0202),
    (1, 0.0557), (36, 0.0126),
    (1, 0.0525), (36, 0.0061),
    (1, 0.0536), (36, 0.0092),
    (1, 0.0553), (36, 0.0104),
]

df = pd.DataFrame(data, columns=["threads", "time"])

# Aggregate statistics: mean and full range
stats = (
    df.groupby("threads")["time"]
    .agg(mean="mean", min="min", max="max")
    .reset_index()
)

# Compute asymmetric error bars (min/max range)
yerr = [
    stats["mean"] - stats["min"],
    stats["max"] - stats["mean"],
]

# Plot
plt.figure()
plt.errorbar(
    stats["threads"],
    stats["mean"],
    yerr=yerr,
    fmt="o-"
)
plt.xlabel("Number of threads")
plt.ylabel("Time (s)")
plt.title("Threads vs Time with Full Data Range (min–max)")
plt.grid(True)

plt.show()

@@ -1,4 +1,6 @@
#include "simlib.h"
#include <grass/gis.h>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needed? (It is for debug, but here without it...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is not needed , I was using different approaches so had included it earlier but yes its not required.

@wenzeslaus
Copy link
Member

Another thing, do the current tests actually cover the two cases for dx-dy and for enabled/disabled parallelism?

@Abhi-d-gr8
Copy link
Contributor Author

Abhi-d-gr8 commented Feb 16, 2026

Another thing, do the current tests actually cover the two cases for dx-dy and for enabled/disabled parallelism?

It doesn't cover external dx/dy for parallel I think but it doesn't need dx-dy parallelization at that point of time because it is being calculated externally for that particular cause and for the internal calculation we are using dx-dy so yeah for that we have the parallelization as of now.

@Abhi-d-gr8
Copy link
Contributor Author

Abhi-d-gr8 commented Feb 16, 2026

Awesome! I'm getting good results locally. Let's finalize it! Some valgrind-like scrutinizing would be good. There is some discussion in issues or PRs about how to do that with threads in our context in relation to r.mapcalc parallel code.

That's good ... could you give some more context on this ?
Also I did search up abit and Valgrind does not support Mac , the alternatives it showed were Dr.Memory and AddressSanitizer.

@github-actions github-actions bot added Python Related code is in Python tests Related to Test Suite labels Feb 16, 2026
@Abhi-d-gr8
Copy link
Contributor Author

I've removed the redundant include and added test_nodxdy_parallel, which specifically exercises the parallel derivatives calculation (nprocs=2) without external input maps.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C Related code is in C module Python Related code is in Python raster Related to raster data processing tests Related to Test Suite

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants