The SparkDebugResult contains quite a bit of additional information (e.g. “real” compute CPU time aggregated across all workers - not wall time, but the time actually spent number crunching - as well as memory usage, shuffle reads, …).
These statistics are currently only available in Debug Mode (Fast Debug gives you these metrics for the selected node) and look like this:
Assuming that your workflow consists of a single Spark Action, it is relatively easy to compare two versions: do a debug run on the “sink” of the workflow, i.e. the last processor in the chain.
When there are multiple Actions in your workflow, it gets a bit trickier. Currently, we have no integrated measurement beyond the metrics above.
Of course, it is technically possible to generate aggregated metrics from the building blocks of Debug Mode. If you need this in your life, feel free to file a feature request. I really like the idea of having a means of comparing workflows with respect to their performance, memory usage and so on.
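Until something integrated exists, a small script can do the aggregation by hand. The sketch below assumes you have collected the per-Action metrics from individual debug runs into plain dictionaries; the metric names, values, and the two workflow versions are entirely made up for illustration and are not the actual SparkDebugResult schema.

```python
# Hypothetical sketch: summing per-Action debug metrics so that two
# versions of a multi-Action workflow can be compared side by side.
# Metric keys and numbers below are illustrative assumptions only.

def aggregate(per_action_metrics):
    """Sum each metric across all Spark Actions of one workflow run."""
    totals = {}
    for metrics in per_action_metrics:
        for key, value in metrics.items():
            totals[key] = totals.get(key, 0) + value
    return totals

# Metrics noted down from debug runs of each Action (made-up numbers).
version_a = [
    {"cpu_time_ms": 12_000, "peak_memory_mb": 512, "shuffle_read_mb": 40},
    {"cpu_time_ms": 8_500,  "peak_memory_mb": 256, "shuffle_read_mb": 10},
]
version_b = [
    {"cpu_time_ms": 9_000,  "peak_memory_mb": 480, "shuffle_read_mb": 35},
    {"cpu_time_ms": 7_000,  "peak_memory_mb": 300, "shuffle_read_mb": 12},
]

totals_a = aggregate(version_a)
totals_b = aggregate(version_b)
for key in sorted(totals_a):
    print(f"{key}: {totals_a[key]} -> {totals_b[key]}")
```

Summing peak memory across Actions is a pessimistic upper bound (the Actions may not run concurrently), so treat that column as a rough signal rather than an exact figure.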