# getting-started
i
Hello, is there a particular reason why docker images are created directly from datahub's source instead of relying on published artifacts? I.e. published jars for GMS, published packages for python? Right now, if I need to modify a particular image I need to have the entire codebase locally available to make relatively minor changes.
m
No specific reason other than convenience (of writing the docker script I imagine) ... @microscopic-receptionist-23548 or @steep-airplane-62865 might be able to shed more light
i
Would it be reasonable to release the code artifacts for each module to their respective places and then modify the dockerfiles? I understand the value of convenience for testing local changes, but perhaps we could find a middle ground, like having two sets of dockerfiles: the first for local development and the second as part of the release process.
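(For illustration, a release-oriented dockerfile along these lines could install a published artifact instead of compiling from source. This is only a sketch: no python artifact was published at the time of this thread, so the package name `acryl-datahub` and the `datahub` entrypoint are assumptions.)

```dockerfile
# Hypothetical "release" dockerfile: pull a published package rather
# than building the whole repo. Package name and entrypoint are
# assumptions for illustration.
FROM python:3.8-slim
RUN pip install --no-cache-dir acryl-datahub
ENTRYPOINT ["datahub"]
```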
m
we have docker files for local development, see `docker/dev.sh`. I guess I'm not sure I see the advantage of using prepublished packages to build images. Can you explain further what kind of changes you're making?
Also, I'm no docker expert, but I do think it is wise to build the code on the image it is going to run on, to help ensure compatibility
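(The "build where you run" approach being described looks roughly like this single-stage sketch; the base image, build command, and artifact path are generic placeholders, not datahub's actual dockerfiles.)

```dockerfile
# Sketch of building the code on the same image it will run on, so the
# build toolchain and runtime match. Paths and commands are placeholders.
FROM openjdk:8
COPY . /src
WORKDIR /src
RUN ./gradlew build                         # assumes a gradle wrapper in the repo
CMD ["java", "-jar", "build/libs/app.jar"]  # artifact path is illustrative
```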
i
In my case I was trying to modify the datahub-ingestion docker image to add support for druid, see this PR: https://github.com/linkedin/datahub/pull/2235 I've tested it on my end well enough for my use-case. I wanted to make the docker image available so that I could use it, but it implies having the entire datahub repository on my end during the build phase, which uses my company's resources and is essentially duplicated work once the PR is merged.
I imagined similar use-cases of smallish changes may occur for devs at other companies, and wondered whether there could be an easier way to extend/modify these dockerfiles
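(If a published base image and published python packages existed, a small change like this one could be layered on top instead of rebuilding from source. A rough sketch, assuming a published `linkedin/datahub-ingestion` image and a locally built, patched wheel; both names are illustrative.)

```dockerfile
# Hypothetical: extend a published ingestion image with a locally
# patched wheel instead of rebuilding the whole repo. Image name and
# wheel filename are assumptions.
FROM linkedin/datahub-ingestion:latest
COPY dist/acryl_datahub-0.0.1.dev0-py3-none-any.whl /tmp/
RUN pip install --no-cache-dir --force-reinstall /tmp/acryl_datahub-0.0.1.dev0-py3-none-any.whl
```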
m
> I wanted to make the docker image available so that I could use it
use it for what?
i
so sorry, so that it could be used in a deployment at my company without having to wait for a release on datahub's side.
m
so, assume we did publish python artifacts here, for the sake of argument. how would that "fix" this issue? the artifact itself would still need to be built and published, and then a new docker image that pulls in the artifact would need to be built...
right?
i
Yes, my only intention is to reduce the dependency overhead and the number of steps in the docker image build process (having to copy the entire codebase, compile it during the docker build, move compiled objects, etc.)
m
all you did was offload it to some prior step that builds artifacts; that still needs to be run
i
Very true, though in my experience that's the normal procedure anyway, whether in CI or locally; there's no need to duplicate it again in the dockerfile.
If those build artifacts are published somewhere, it allows others to create modified versions of the official docker images without having the codebase at hand when it isn't needed.
m
Hmm, that makes sense, though I don't think it helps this specific PR, since you're modifying artifacts anyway. Really this is the opposite? You want to modify the artifact but not the docker image 😛 This specific PR aside, I'll look into it a bit. Again, not a docker expert, so I'm not sure what best practices are. As far as I know, building on the image is a good idea, but python/java should be portable, so it shouldn't matter...
i
Probably; altering the artifact alters the docker image, I guess. Since it is python I think I could replace the relevant python files, but yes, it would not be the easiest thing.
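(The file-replacement idea mentioned here might look like the following; the image name and site-packages path are guesses for illustration only, not datahub's actual layout.)

```dockerfile
# Rough sketch of overlaying a patched python file onto the installed
# package inside an existing image. Image name and target path are
# assumptions.
FROM linkedin/datahub-ingestion:latest
COPY druid.py /usr/local/lib/python3.8/site-packages/datahub/ingestion/source/druid.py
```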