- Can we find out how much memory a data table in OD occupies?
- Can we see how many partitions a data table (in Parquet format) has? Preferably via REST API calls or some SQL tricks!
What I want to achieve: one of our customers is interested in seeing all this information in the Apps as a KPI. As far as I know, all of the above can be done by guardians manually logging into a server, but I am wondering whether there are other ways to do it.
Thanks in advance!
1. Create a new workflow.
2. Add a “Data Table Load” processor to the workflow to load the data table you would like to inspect.
3. Click “Save & Debug” to run the Full Debug Mode, then open the debugger tab of the Data Table Load processor.
- The memory taken up by a data table (in bytes) should be in the “SPARK_DEBUG_INFO” → processorMetrics → inputBytesRead
- You can see the number of partitions of a data table in the “SPARK_DEBUG_INFO” → rddDependences → partitions field.
No idea how to get this information via the REST API though!
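To make the fields above concrete, here is a minimal sketch of pulling the two values out of a debug payload. The JSON layout and the numbers are assumptions based purely on the field names mentioned above, not the real ONE DATA schema:

```python
import json

# Hypothetical excerpt of the SPARK_DEBUG_INFO payload shown in the
# debugger tab. The structure and values are assumed for illustration;
# only the field names come from the steps above.
debug_info = json.loads("""
{
  "processorMetrics": {"inputBytesRead": 4194304},
  "rddDependences": {"partitions": 32}
}
""")

bytes_read = debug_info["processorMetrics"]["inputBytesRead"]
partitions = debug_info["rddDependences"]["partitions"]
print(f"inputBytesRead: {bytes_read} bytes, partitions: {partitions}")
```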
Thanks, @DanyEle, but I found that the information obtained from Save & Debug is incorrect. I ran several tests. For example, SPARK_DEBUG_INFO says 32 partitions, but when I checked the corresponding data table on the onedata-server directly, I found that it has 1000 partitions (this table was actually created using the Manipulate Partitions processor).
The memory is not correct either: the Spark debug info says around 4 MB, but the Parquet file on the OD server is actually around 45 MB.
I am not sure whether Spark considers all the data in debug mode; maybe it only samples some partitions for debugging purposes.
Maybe @Flogge has some more insights about it!
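For reference, the on-disk numbers above can be double-checked with a small script, assuming direct filesystem access to the table directory on the server and the usual Spark output layout of one `part-*.parquet` file per partition (the directory path and file naming here are assumptions, not the actual ONE DATA layout):

```python
from pathlib import Path
import tempfile

def inspect_parquet_dir(path):
    """Count part files and sum their sizes in a Parquet directory.

    Under the assumed layout, each part-*.parquet file corresponds to
    one written partition, so counting them gives the on-disk partition
    count independently of what the Spark debug metrics report.
    """
    parts = sorted(Path(path).glob("part-*.parquet"))
    total_bytes = sum(p.stat().st_size for p in parts)
    return len(parts), total_bytes

# Demo with a fake table directory (a stand-in for the real path on
# the onedata-server, which is not known here).
with tempfile.TemporaryDirectory() as d:
    for i in range(3):
        (Path(d) / f"part-{i:05d}.parquet").write_bytes(b"x" * 1024)
    n_partitions, size_bytes = inspect_parquet_dir(d)
    print(n_partitions, size_bytes)  # prints "3 3072"
```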