Hi ONE DATA Community,
Let’s suppose I am using a Database Connection Load processor, followed by filter operations defined in ONE DATA query or Data-Filter processors.
Let’s furthermore assume that the processor uses a preconfigured connection to an Oracle DB.
Are these filters pushed down to the database by Spark/ONE DATA, or is the complete data set loaded over the network into the Spark context and the filters applied there?
I just tried it out on one of our customer’s instances with a filter and a row count: the count + filter defined in ONE DATA is 5-10 times slower than writing the count + filter directly into the Database Connection Load processor.
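To make the two cases concrete, here is a minimal Python sketch of the difference I am asking about. It is only an illustration: an in-memory SQLite table stands in for the Oracle DB, and the table `orders`, the column `status`, and all function names are made up. It does not show what ONE DATA or Spark actually does internally, which is exactly my question.

```python
import sqlite3


def make_demo_db(n_rows=1000):
    """Create an in-memory table (hypothetical stand-in for the Oracle source)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
    conn.executemany(
        "INSERT INTO orders (id, status) VALUES (?, ?)",
        # Every tenth row is OPEN, the rest are CLOSED.
        [(i, "OPEN" if i % 10 == 0 else "CLOSED") for i in range(n_rows)],
    )
    conn.commit()
    return conn


def count_pushed_down(conn):
    """Case 1: filter and count run inside the database.

    Returns (count, number of rows transferred to the client).
    """
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE status = 'OPEN'"
    ).fetchone()
    return count, 1  # only the single result row leaves the database


def count_after_full_load(conn):
    """Case 2: load the whole table first, then filter and count client-side.

    Returns (count, number of rows transferred to the client).
    """
    rows = conn.execute("SELECT id, status FROM orders").fetchall()
    count = sum(1 for _id, status in rows if status == "OPEN")
    return count, len(rows)  # every row had to be transferred


if __name__ == "__main__":
    conn = make_demo_db()
    print("pushed down:    ", count_pushed_down(conn))
    print("full load first:", count_after_full_load(conn))
```

Both functions return the same count, but the second one has to move the entire table to the client before filtering. The slowdown I measured would be consistent with the second case, though I cannot tell from the outside whether that is what happens.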
Thanks for your help!