Hello.
Our team is running into memory-management and CPU issues.
I use CPU, not GPU, so the questions below are about CPU only.
CPU questions:
1)
I found that a Python process can effectively use only one core at a time because of the global interpreter lock (GIL). Even if I set multi_thread=True, that only gives me concurrency, not parallelism.
So if I want to use all the cores, I have to create as many processes as there are cores.
But when I run BentoML, it creates only one runner process, which I believe can use only one core.
If I want a single model to use all of my cores, what should I do?
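To illustrate the GIL behavior described above, here is a minimal self-contained sketch (plain Python, not BentoML-specific, and the numbers are arbitrary): the same CPU-bound work produces identical results whether it runs in a thread pool or a process pool, but only the process pool can actually execute on multiple cores, because each worker process has its own interpreter and its own GIL.

```python
import multiprocessing as mp
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def cpu_bound(n: int) -> int:
    # Pure-Python loop: it holds the GIL the whole time,
    # so threads running this are serialized, not parallel.
    total = 0
    for i in range(n):
        total += i * i
    return total

def run_with_threads(chunks):
    # Concurrency only: all threads share one GIL.
    with ThreadPoolExecutor(max_workers=len(chunks)) as ex:
        return sum(ex.map(cpu_bound, chunks))

def run_with_processes(chunks):
    # Parallelism: each worker process has its own GIL,
    # so chunks can run on different cores at the same time.
    # "fork" start method assumed (Linux/macOS).
    ctx = mp.get_context("fork")
    with ProcessPoolExecutor(max_workers=len(chunks), mp_context=ctx) as ex:
        return sum(ex.map(cpu_bound, chunks))

if __name__ == "__main__":
    chunks = [200_000] * 4
    # Same answer either way; only the wall-clock time differs.
    assert run_with_threads(chunks) == run_with_processes(chunks)
```

With CPU-bound work like this, the thread-pool version takes roughly the same wall time as a plain loop, while the process-pool version scales with the number of cores. That is the gap you are seeing with a single runner process.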
2)
I compared the performance of one runner versus two runners.
First, I sent 5000 requests to a single runner.
Then, I sent 2500 requests to one runner and the remaining 2500 to the other runner.
Both took about the same amount of time.
I expected that with two runners, two cores would be used, so it would take less time.
What am I doing wrong?
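One possible explanation (an assumption on my part, since the benchmark client isn't shown): if the client sends requests one at a time and waits for each response, the client itself is the bottleneck, and the second runner sits idle regardless of how the 5000 requests are split. Here is a sketch of the difference, using `time.sleep` as a stand-in for a request round-trip; the "runner" here is purely hypothetical:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request(runner_id: int) -> int:
    # Stand-in for an HTTP call to a runner. sleep() releases the
    # GIL, just like waiting on a real network response does.
    time.sleep(0.05)
    return runner_id

def send_sequentially(n: int) -> float:
    # One request in flight at a time: splitting across two
    # runners cannot help, because only one is ever busy.
    start = time.perf_counter()
    for i in range(n):
        fake_request(i % 2)  # alternate between two "runners"
    return time.perf_counter() - start

def send_concurrently(n: int) -> float:
    # Many requests in flight at once: now two runners (or two
    # cores) can actually work in parallel.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=8) as ex:
        list(ex.map(fake_request, [i % 2 for i in range(n)]))
    return time.perf_counter() - start

if __name__ == "__main__":
    seq = send_concurrently(8)
    assert seq < send_sequentially(8)
```

If your benchmark already sends requests concurrently, then the similar timings would point elsewhere, e.g. both runners landing on the same core, or the workload being I/O-bound rather than CPU-bound.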
Memory question:
If I run one process, TensorFlow takes a lot of memory (not the model's memory, but the library's own memory).
If I run multiple runners, each process takes its own copy of that library memory, multiplied by the number of processes.
I want to share this library memory among the runners....
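One general technique for this (not a BentoML-specific feature, just an OS-level property I'm assuming applies to your setup): on Linux, if worker processes are created with `fork` *after* the heavy library is imported, the children inherit the parent's memory pages copy-on-write, so read-only library memory is physically shared rather than duplicated per process. A minimal sketch, with a large byte buffer standing in for the library's footprint:

```python
import multiprocessing as mp

# Simulate a heavy library: allocate it ONCE in the parent,
# BEFORE any worker processes are forked.
HEAVY_DATA = bytes(50 * 1024 * 1024)  # ~50 MB stand-in for library memory

def worker(idx: int) -> int:
    # Forked children see HEAVY_DATA through copy-on-write pages:
    # as long as they only read it, no extra physical memory is used.
    return idx + len(HEAVY_DATA)

if __name__ == "__main__":
    # "fork" is available on Linux/macOS only; "spawn" (the Windows
    # default) re-imports everything per child and loses the sharing.
    ctx = mp.get_context("fork")
    with ctx.Pool(processes=2) as pool:
        results = pool.map(worker, range(2))
    assert results == [len(HEAVY_DATA), len(HEAVY_DATA) + 1]
```

The caveat is that any page a child writes to gets copied at that moment, so the saving applies to memory that stays read-only; whether TensorFlow's internal state stays read-only enough for this to help in practice is something you would need to measure.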
Please give me some advice...
I need help..