Perhaps a silly question, but are we truly limited...
# ingestion
l
Perhaps a silly question, but are we truly limited to environment options of [“DEV”, “EI”, “PROD” or “CORP”]? Is there a way to create additional, more specific environment values directly applicable to our use case?
m
Hey @little-van-63930, yes this is an unfortunate limitation right now. Has also been pointed out by others. We are currently figuring out ways to address this.
(while maintaining backwards compatibility)
l
@mammoth-bear-12532, is there a GitHub issue I can track?
m
@little-van-63930: Now there is 🙂 https://github.com/linkedin/datahub/issues/3001
l
…and commented. FWIW, I can’t adopt until this is done, but I can POC this till then, mostly for the reasons in the comment: production for us is the source, and the dw/analytics environment is ‘our’ production, but the naming has to be clear for the org, which is one of the reasons we’re taking this on. TYVM
m
Thanks for letting us know! Will give you a sense of timeline for getting this done early next week.
l
TYVM. If I come across any other dealbreakers that I think are general use cases, not specific to my environment, should I send you a note, or..? The only other things I have atm are:
• How to set lineage manually (and whether it is table or field level when I do).
• How to build the containers using the datahub docker, but parameterized for the AWS environment so the app containers are created, but using our existing AWS managed services (RDS / MSK / ES).
I have an idea on the second, but any direction on where to look or what to edit would prove useful (and I’d be happy to share a working config sample and instructions for use once I have it working). Again, TYVM.
b
Yes, please post them here! They are a great source of feedback, as most likely others share the same requirements. As for your points:
Currently, lineage must be provided via API. I can share an example of a curl command to do so, but we'd recommend you push this information using our Python client library.
I'm not sure I fully grasp the second requirement. You want to deploy to AWS but without using Kubernetes, so you want an easy way to override the default configs with ones that point at managed services instead. Is that correct? In cases like these, we usually recommend folks use our Helm charts, as it will be a bit easier to coordinate the configuration across the containers.
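For reference, pushing lineage through the Python client might look roughly like the sketch below. It assumes the `acryl-datahub` package with its REST extra is installed and that GMS is reachable at the URL you pass in; the table names and the `localhost:8080` address in the comments are hypothetical. As far as I know, `make_lineage_mce` expresses dataset-level (table-level) lineage, so treat field-level lineage as out of scope for this sketch.

```python
from typing import List


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN: platform, dataset name, then environment."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def emit_table_lineage(gms_url: str, upstream_urns: List[str], downstream_urn: str) -> None:
    """Send one set of upstream -> downstream dataset edges to GMS over REST."""
    # Imported lazily so dataset_urn() stays usable without the package.
    # Assumption: `pip install 'acryl-datahub[datahub-rest]'` has been run.
    import datahub.emitter.mce_builder as builder
    from datahub.emitter.rest_emitter import DatahubRestEmitter

    lineage_mce = builder.make_lineage_mce(upstream_urns, downstream_urn)
    DatahubRestEmitter(gms_url).emit_mce(lineage_mce)


# Example call (hypothetical tables and GMS address):
# emit_table_lineage(
#     "http://localhost:8080",
#     [dataset_urn("mysql", "sales.orders")],
#     dataset_urn("mysql", "analytics.orders_daily"),
# )
```

Note that the environment is the last component of the URN, which is where the fixed set of values discussed earlier in the thread shows up.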
l
@big-carpet-38439, I’m deploying to AWS, but using containers for the datahub services and native AWS SaaS for the other services (MSK for Kafka, RDS for MySQL, their ES). I found I’m having to create a new docker-compose which has these defined by name, and which also sets the service container URLs by name. Debugging now (connectivity, still have some name resolution issues).
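When chasing name resolution and connectivity problems like these, it can help to separate "the name does not resolve" from "the port is not reachable". A minimal sketch using only the standard library follows; run it from inside the same network as the containers. The hostnames and ports in the usage comment are hypothetical placeholders, not real endpoints.

```python
import socket
from typing import Dict, Tuple


def check_endpoints(endpoints: Dict[str, Tuple[str, int]], timeout: float = 3.0) -> Dict[str, str]:
    """Resolve each host, then try a TCP connect; report which step failed."""
    results: Dict[str, str] = {}
    for label, (host, port) in endpoints.items():
        try:
            # Step 1: DNS (IPv4 only in this sketch).
            address = socket.gethostbyname(host)
        except socket.gaierror as exc:
            results[label] = f"DNS failure: {exc}"
            continue
        try:
            # Step 2: plain TCP connect; says nothing about auth or TLS.
            with socket.create_connection((address, port), timeout=timeout):
                results[label] = "ok"
        except OSError as exc:
            results[label] = f"connect failure: {exc}"
    return results


# Example (hypothetical internal names for RDS, MSK and ES):
# print(check_endpoints({
#     "mysql": ("mysql.internal.example", 3306),
#     "kafka": ("kafka.internal.example", 9092),
#     "elasticsearch": ("es.internal.example", 443),
# }))
```

A result of "ok" only means the socket opened; credentials and protocol settings still need checking per service.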
m
@little-van-63930: are you following this guide? https://datahubproject.io/docs/deploy/aws/
l
@mammoth-bear-12532, not at all following that - we don’t use Kubernetes, and if I can’t hand management and monitoring off to our devops team, I’m stuck with that task (which is an expensive proposition). I’m planning on setting the internal services up using EC2 or Fargate containers, aliased using an internally referenced A or CNAME record, pointing at the cluster network interfaces. For now, this should work as I don’t expect a lot of changes / UI load. But I do have to go through each service and set the host names, and also figure out what to do for ES user/pass entry (the ES config seems to be different for different modules), and for Kafka using plaintext (the internal IPs are hard-locked for access with a global exclusion and few, specific inclusions).
That all said, any idea where this error could come from? I’m pulling the latest container for datahub-gms, and not seeing any connectivity issues to any external services, but it is non-responsive, so the container set (datahub-gms / datahub-frontend-react / schema-registry) won’t start…
Thank you, David
`datahub-gms | 2021-08-02 23:10:04.036:WARN:oeja.AnnotationParser:qtp544724190-15: org.antlr.runtime.ANTLRFileStream scanned from multiple locations: jar:file:///tmp/jetty-0_0_0_0-8080-war_war-_-any-5484694777862272382.dir/webapp/WEB-INF/lib/antlr-runtime-3.5.2.jar!/org/antlr/runtime/ANTLRFileStream.class, jar:file:///tmp/jetty-0_0_0_0-8080-war_war-_-any-5484694777862272382.dir/webapp/WEB-INF/lib/antlr4-4.5.jar!/org/antlr/runtime/ANTLRFileStream.class`
`datahub-gms | 2021-08-02 23:10:04.046:WARN:oeja.AnnotationParser:qtp544724190-15: org.antlr.runtime.ANTLRInputStream scanned from multiple locations: jar:file:///tmp/jetty-0_0_0_0-8080-war_war-_-any-5484694777862272382.dir/webapp/WEB-INF/lib/antlr-runtime-3.5.2.jar!/org/antlr/runtime/ANTLRInputStream.class, jar:file:///tmp/jetty-0_0_0_0-8080-war_war-_-any-5484694777862272382.dir/webapp/WEB-INF/lib/antlr4-4.5.jar!/org/antlr/runtime/ANTLRInputStream.class`
`datahub-gms | 2021-08-02 23:10:04.046:WARN:oeja.AnnotationParser:qtp544724190-15: org.antlr.runtime.ANTLRReaderStream scanned from multiple locations: jar:file:///tmp/jetty-0_0_0_0-8080-war_war-_-any-5484694777862272382.dir/webapp/WEB-INF/lib/antlr-runtime-3.5.2.jar!/org/antlr/runtime/ANTLRReaderStream.class, jar:file:///tmp/jetty-0_0_0_0-8080-war_war-_-any-5484694777862272382.dir/webapp/WEB-INF/lib/antlr4-4.5.jar!/org/antlr/runtime/ANTLRReaderStream.class`
`datahub-gms | 2021-08-02 23:10:04.047:WARN:oeja.AnnotationParser:qtp544724190-15: org.antlr.runtime.ANTLRStringStream scanned from multiple locations: jar:file:///tmp/jetty-0_0_0_0-8080-war_war-_-any-5484694777862272382.dir/webapp/WEB-INF/lib/antlr-runtime-3.5.2.jar!/org/antlr/runtime/ANTLRStringStream.class, jar:file:///tmp/jetty-0_0_0_0-8080-war_war-_-any-5484694777862272382.dir/webapp/WEB-INF/lib/antlr4-4.5.jar!/org/antlr/runtime/ANTLRStringStream.class`
`datahub-gms | 2021-08-02 23:10:04.048:WARN:oeja.AnnotationParser:qtp544724190-15: org.antlr.runtime.BaseRecognizer scanned from multiple locations: jar:file:///tmp/jetty-0_0_0_0-8080-war_war-_-any-5484694777862272382.dir/webapp/WEB-INF/lib/antlr-runtime-3.5.2.jar!/org/antlr/runtime/BaseRecognizer.class, jar:file:///tmp/jetty-0_0_0_0-8080-war_war-_-any-5484694777862272382.dir/webapp/WEB-INF/lib/antlr4-4.5.jar!/org/antlr/runtime/BaseRecognizer.class`
`datahub-gms | 2021-08-02 23:10:04.048:WARN:oeja.AnnotationParser:qtp544724190-15: org.antlr.runtime.BitSet scanned from multiple locations: jar:file:///tmp/jetty-0_0_0_0-8080-war_war-_-any-5484694777862272382.dir/webapp/WEB-INF/lib/antlr-runtime-3.5.2.jar!/org/antlr/runtime/BitSet.class, jar:file:///tmp/jetty-0_0_0_0-8080-war_war-_-any-5484694777862272382.dir/webapp/WEB-INF/lib/antlr4-4.5.jar!/org/antlr/runtime/BitSet.class`
`datahub-gms | 2021-08-02 23:10:04.048:WARN:oeja.AnnotationParser:qtp544724190-15: org.antlr.runtime.BufferedTokenStream scanned from multiple locations: jar:file:///tmp/jetty-0_0_0_0-8080-war_war-_-any-5484694777862272382.dir/webapp/WEB-INF/lib/antlr-runtime-3.5.2.jar!/org/antlr/runtime/BufferedTokenStream.class, jar:file:///tmp/jetty-0_0_0_0-8080-war_war-_-any-5484694777862272382.dir/webapp/WEB-INF/lib/antlr4-4.5.jar!/org/antlr/runtime/BufferedTokenStream.class`
`datahub-gms | 2021-08-02 23:10:04.049:WARN:oeja.AnnotationParser:qtp544724190-15: org.antlr.runtime.CharStream scanned from multiple locations: jar:file:///tmp/jetty-0_0_0_0-8080-war_war-_-any-5484694777862272382.dir/webapp/WEB-INF/lib/antlr-runtime-3.5.2.jar!/org/antlr/runtime/CharStream.class, jar:file:///tmp/jetty-0_0_0_0-8080-war_war-_-any-5484694777862272382.dir/webapp/WEB-INF/lib/antlr4-4.5.jar!/org/antlr/runtime/CharStream.class`
`datahub-gms | 2021-08-02 23:10:04.049:WARN:oeja.AnnotationParser:qtp544724190-15: org.antlr.runtime.CharStreamState scanned from multiple locations: jar:file:///tmp/jetty-0_0_0_0-8080-war_war-_-any-5484694777862272382.dir/webapp/WEB-INF/lib/antlr-runtime-3.5.2.jar!/org/antlr/runtime/CharStreamState.class, jar:file:///tmp/jetty-0_0_0_0-8080-war_war-_-any-5484694777862272382.dir/webapp/WEB-INF/lib/antlr4-4.5.jar!/org/antlr/runtime/CharStreamState.class`
`datahub-gms | 2021-08-02 23:10:04.049:WARN:oeja.AnnotationParser:qtp544724190-15: org.antlr.runtime.ClassicToken scanned from multiple locations: jar:file:///tmp/jetty-0_0_0_0-8080-war_war-_-any-5484694777862272382.dir/webapp/WEB-INF/lib/antlr-runtime-3.5.2.jar!/org/antlr/runtime/ClassicToken.class, jar:file:///tmp/jetty-0_0_0_0-8080-war_war-_-any-5484694777862272382.dir/webapp/WEB-INF/lib/antlr4-4.5.jar!/org/antlr/runtime/ClassicToken.class`
`datahub-gms | 2021-08-02 23:10:04.050:WARN:oeja.AnnotationParser:qtp544724190-15: org.antlr.runtime.CommonToken scanned from multiple locations: jar:file:///tmp/jetty-0_0_0_0-8080-war_war-_-any-5484694777862272382.dir/webapp/WEB-INF/lib/antlr-runtime-3.5.2.jar!/org/antlr/runtime/CommonToken.class, jar:file:///tmp/jetty-0_0_0_0-8080-war_war-_-any-5484694777862272382.dir/webapp/WEB-INF/lib/antlr4-4.5.jar!/org/antlr/runtime/CommonToken.class`
`datahub-gms | 2021-08-02 23:10:04.050:WARN:oeja.AnnotationParser:qtp544724190-15: org.antlr.runtime.CommonTokenStream scanned from multiple locations: jar:file:///tmp/jetty-0_0_0_0-8080-war_war-_-any-5484694777862272382.dir/webapp/WEB-INF/lib/antlr-runtime-3.5.2.jar!/org/antlr/runtime/CommonTokenStream.class, jar:file:///tmp/jetty-0_0_0_0-8080-war_war-_-any-5484694777862272382.dir/webapp/WEB-INF/lib/antlr4-4.5.jar!/org/antlr/runtime/CommonTokenStream.class`
`datahub-gms | 2021-08-02 23:10:04.050:WARN:oeja.AnnotationParser:qtp544724190-15: org.antlr.runtime.DFA scanned from multiple locations: jar:file:///tmp/jetty-0_0_0_0-8080-war_war-_-any-5484694777862272382.dir/webapp/WEB-INF/lib/antlr-runtime-3.5.2.jar!/org/antlr/runtime/DFA.class, jar:file:///tmp/jetty-0_0_0_0-8080-war_war-_-any-5484694777862272382.dir/webapp/WEB-INF/lib/antlr4-4.5.jar!/org/antlr/runtime/DFA.class`
e
Can you try posting the whole GMS logs?
l
I’ll need to review and ensure there isn’t anything there I shouldn’t disclose, but otherwise yes.
Here’s the start of the log. It doesn’t take long for this error to appear, and it just keeps running after that. FWIW, this setting is enabled:
- GRAPH_SERVICE_IMPL=elasticsearch
e
Which error are you referring to? I don’t see one in the above pasted log.
l
I’m referencing the repeated warnings, unless they are expected. They are written to the log constantly until I stop the container. Specifically, the front-end container running React reports that datahub-gms isn’t available, and this is the only issue I see noted in that container’s log.
e
You can ignore the warnings. Does GMS die after some time? Or do you see any more logs after those warnings?
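Since these warnings drown out everything else, one way to see whether anything follows them is to filter them out of the log stream. A minimal sketch, assuming the noisy lines can be recognized by the phrase "scanned from multiple locations" seen in the paste above (the function name is made up for illustration):

```python
from typing import Iterable, List

# Phrase taken from the pasted warnings; adjust if your noise looks different.
NOISE_MARKER = "scanned from multiple locations"


def drop_known_noise(lines: Iterable[str], marker: str = NOISE_MARKER) -> List[str]:
    """Return log lines that do not contain the known-harmless marker."""
    return [line.rstrip("\n") for line in lines if marker not in line]


# Example: save `docker logs datahub-gms` to gms.log, then
# with open("gms.log") as fh:
#     for line in drop_known_noise(fh):
#         print(line)
```

Whatever remains after filtering is what is worth posting back to the thread.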
b
It could be a memory issue - can you confirm that GMS has died by running
docker container ls
to view running containers?
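Beyond `docker container ls`, the JSON printed by `docker inspect <container>` carries the state fields relevant to a memory theory: whether the container is running, whether it was OOM-killed, and what its healthcheck reports. A small sketch that summarizes those fields follows; it only parses text you feed it and does not call Docker itself, and the function name is made up for illustration.

```python
import json
from typing import Any, Dict


def summarize_state(inspect_json: str) -> Dict[str, Any]:
    """Pull the liveness-related fields out of `docker inspect` output."""
    # `docker inspect` prints a JSON array with one object per container.
    state = json.loads(inspect_json)[0]["State"]
    return {
        "status": state.get("Status"),            # e.g. running, exited
        "oom_killed": state.get("OOMKilled", False),
        "exit_code": state.get("ExitCode"),
        # "Health" is only present when the image defines a healthcheck.
        "health": state.get("Health", {}).get("Status", "no healthcheck"),
    }


# Example: docker inspect datahub-gms > gms.json, then
# print(summarize_state(open("gms.json").read()))
```

A container that is `running` with `oom_killed` false but a health of `unhealthy` points away from memory and toward whatever the healthcheck probes.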
l
It doesn’t ever actually ‘die’ until it’s stopped. Its status message is “datahub-gms is running but not healthy”. It may be a docker network config issue. I’m playing with the configs to validate this hypothesis, but it will be a bit before I can do so. Network speed changed where I’m vacationing; the download speeds went from 100 to 3 Mb overnight, so…..
e
So are the above logs the end of the GMS logs?
No other logs are showing up?
Hmm, usually, if it stops in the middle like that, it means the container died. We’ve never seen it stop in the middle like that while still running.