# ingestion
  • strong-analyst-47204
    07/08/2020, 7:52 AM
    Yes, I used metadata-ingestion/sql-etl.
  • bumpy-keyboard-50565
    07/21/2020, 12:58 PM
    @strong-analyst-47204 @billowy-eye-48149 let's move the discussion to this issue created by @numerous-market-21675: https://github.com/linkedin/datahub/issues/1728
  • bumpy-keyboard-50565
    08/01/2020, 11:58 AM
    has renamed the channel from "data-ingestion" to "datahub-ingestion"
  • strong-pharmacist-65336
    10/01/2020, 5:55 PM
    I am trying to ingest data via the mysql_etl.py script, but I am getting an error that the Avro schema file is not available.
  • strong-pharmacist-65336
    10/01/2020, 5:55 PM
    image.png
  • strong-pharmacist-65336
    10/01/2020, 5:55 PM
    Do you know how to resolve it?
  • strong-pharmacist-65336
    10/01/2020, 5:56 PM
    (venv) (base) USA-MAC-NIS1908:sql-etl nlangaliya$ python3 mysql_etl.py
    Traceback (most recent call last):
      File "mysql_etl.py", line 13, in <module>
        run(URL, OPTIONS, PLATFORM)
      File "/Users/nlangaliya/Documents/GitHub/datahub/metadata-ingestion/sql-etl/common.py", line 111, in run
        produce_dataset_mce(mce, kafka_config)
      File "/Users/nlangaliya/Documents/GitHub/datahub/metadata-ingestion/sql-etl/common.py", line 97, in produce_dataset_mce
        record_schema = avro.load(kafka_config.avsc_path)
      File "/Users/nlangaliya/Documents/GitHub/datahub/venv/lib/python3.7/site-packages/confluent_kafka/avro/load.py", line 36, in load
        with open(fp) as f:
    FileNotFoundError: [Errno 2] No such file or directory: '../../metadata-events/mxe-schemas/src/renamed/avro/com/linkedin/mxe/MetadataChangeEvent.avsc'
    (venv) (base) USA-MAC-NIS1908:sql-etl nlangaliya$ cat mysql_etl.py
    from common import run
    # See https://github.com/PyMySQL/PyMySQL for more detail
    hostname = '127.0.0.1'
    DATABASE = 'datahub'
    USER = 'datahub'
    PASSWORD = 'datahub'
    URL = ''  # e.g. mysql+pymysql://username:password@hostname:port
    URL = 'mysql+pymysql://datahub:datahub@127.0.0.1'
    OPTIONS = {}  # e.g. {"encoding": "latin1"}
    PLATFORM = 'mysql'
    run(URL, OPTIONS, PLATFORM)
  • microscopic-receptionist-23548
    10/01/2020, 5:56 PM
    Have you built the project yet?
  • microscopic-receptionist-23548
    10/01/2020, 5:57 PM
    step 2
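    Concretely, the missing .avsc is generated by the build, and the relative path in the traceback only resolves when the script runs from metadata-ingestion/sql-etl. A minimal pre-flight sketch (path copied verbatim from the error above):
    import os

    # Path copied from the FileNotFoundError above. It is resolved relative
    # to the current working directory, so the script must be launched from
    # metadata-ingestion/sql-etl in a checkout where the build has already
    # generated the renamed Avro schemas.
    AVSC_PATH = ('../../metadata-events/mxe-schemas/src/renamed/avro/'
                 'com/linkedin/mxe/MetadataChangeEvent.avsc')

    if not os.path.exists(AVSC_PATH):
        raise SystemExit(
            'Schema not found at %s; build the project first so the Avro '
            'schemas are generated.' % os.path.abspath(AVSC_PATH)
        )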
  • stale-pilot-26214
    11/04/2020, 4:16 AM
    @stale-pilot-26214 has left the channel
  • quaint-pharmacist-15654
    11/17/2020, 6:44 PM
    @quaint-pharmacist-15654 has left the channel
  • average-city-12965
    11/30/2020, 2:28 PM
    The created field is another example. The field contains the creation timestamp and the author, but it is also required in the schema and can't be left empty for the GMS service to decide. Is the intention that I'll look that up from the GMS API when updating existing data via MCEs?
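    For reference, DataHub's sample/bootstrap MCE files sidestep this with a sentinel audit stamp rather than a GMS lookup; a hedged sketch of that convention (field names per the AuditStamp schema, values are just the sample-file placeholders):
    # Sentinel AuditStamp in the style of DataHub's bootstrap MCE samples:
    # the schema requires both fields, so they are filled with placeholders
    # instead of being looked up from GMS first. (Assumed convention; GMS is
    # not guaranteed to backfill the real values.)
    created = {
        "time": 0,  # epoch millis; 0 stands in for "unknown"
        "actor": "urn:li:corpuser:datahub",  # generic bootstrap principal
    }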
  • clever-journalist-89046
    12/15/2020, 11:05 AM
    Hi team, I am getting an exception when trying to run bigquery_etl.py.
  • clever-journalist-89046
    12/15/2020, 11:06 AM
    Invalid project ID 'bq_demo'. Project IDs must contain 6-63 lowercase letters, digits, or dashes. Some project IDs also include domain name separated by a colon. IDs must start with a letter and may not end with a dash.
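    The rules quoted in that error translate directly into a check; a minimal sketch, with the regex derived only from the error text (not from Google's official validator):
    import re

    # From the error text: 6-63 characters of lowercase letters, digits, or
    # dashes; must start with a letter and may not end with a dash.
    PROJECT_ID_RE = re.compile(r'^[a-z][a-z0-9-]{4,61}[a-z0-9]$')

    def is_valid_project_id(project_id):
        return bool(PROJECT_ID_RE.match(project_id))

    assert not is_valid_project_id('bq_demo')      # underscore is invalid
    assert is_valid_project_id('bq-demo-project')  # dashes are fine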
  • gentle-plumber-6625
    12/16/2020, 12:25 PM
    Hi team, getting error message "SyntaxError: invalid syntax" while trying with any data source. We tried both mysql and mssql but got the same error. We haven't touched or modified the common.py file. We used the connection string below for mysql:
    Traceback (most recent call last):
      File "mssql_etl.py", line 1, in <module>
        from common import run
      File "/home/ubuntu/datahub/metadata-ingestion/sql-etl/common.py", line 61
        "platform": f"urn:li:dataPlatform:{platform}",
                                                     ^
    SyntaxError: invalid syntax
  • gentle-plumber-6625
    12/16/2020, 12:26 PM
    cat mysql_etl.py
    from common import run
    # See https://github.com/PyMySQL/PyMySQL for more details
    #URL = ''  # e.g. mysql+pymysql://username:password@hostname:port
    URL = 'mysql+pymysql://root:root@123@127.0.0.1:3306'
    OPTIONS = {}  # e.g. {"encoding": "latin1"}
    PLATFORM = 'mysql'
    run(URL, OPTIONS, PLATFORM)
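    The caret under an f-string is the tell here: f-strings require Python 3.6+, so an older interpreter fails with exactly this SyntaxError. A minimal sketch of the version check, plus a hedged note on the '@' in the password above (an assumed second pitfall, not the reported error):
    import sys
    from urllib.parse import quote_plus

    # f-strings were added in Python 3.6; running common.py on an older
    # interpreter stops at the first f"..." literal with
    # "SyntaxError: invalid syntax", matching the traceback above.
    if sys.version_info < (3, 6):
        raise SystemExit("common.py needs Python 3.6+ (it uses f-strings)")

    # Separate, assumed pitfall in the URL above: an '@' inside the password
    # must be percent-encoded, or SQLAlchemy splits the URL at the wrong '@'.
    password = quote_plus("root@123")  # -> "root%40123"
    url = "mysql+pymysql://root:{}@127.0.0.1:3306".format(password)
    print(url)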
  • lemon-analyst-37781
    02/09/2021, 1:57 AM
    image.png
  • lemon-analyst-37781
    02/09/2021, 1:57 AM
    Hey, getting this error while running snowflake_etl.py
  • incalculable-ocean-74010
    02/16/2021, 12:51 PM
    Hello, is there any documentation on how to work with Spring's bean and injection logic? I've got the following instantiation exception after defining a new entity and creating the necessary files in the gms module:
    org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'restliRequestHandler' defined in ServletContext resource [/WEB-INF/beans.xml]: Bean instantiation via constructor failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [com.linkedin.restli.server.spring.ParallelRestliHttpRequestHandler]: Constructor threw exception; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No bean named 'streamSearchDao' available
    	at org.springframework.beans.factory.support.ConstructorResolver.instantiate(ConstructorResolver.java:314)
    	at org.springframework.beans.factory.support.ConstructorResolver.autowireConstructor(ConstructorResolver.java:295)
    	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.autowireConstructor(AbstractAutowireCapableBeanFactory.java:1358)
    	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:1204)
    	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:557)
    	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:517)
    	at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:323)
    	at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222)
    	at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:321)
    	at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:207)
    	at org.springframework.context.support.AbstractApplicationContext.getBean(AbstractApplicationContext.java:1114)
    	at org.springframework.web.context.support.HttpRequestHandlerServlet.init(HttpRequestHandlerServlet.java:61)
  • incalculable-ocean-74010
    02/16/2021, 12:52 PM
    I've created the referenced bean streamSearchDao as such:
    package com.linkedin.gms.factory.stream;
    import com.linkedin.metadata.configs.StreamSearchConfig;
    import com.linkedin.metadata.dao.search.ESSearchDAO;
    import com.linkedin.metadata.search.StreamDocument;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.context.ApplicationContext;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.context.annotation.DependsOn;
    import javax.annotation.Nonnull;
    
    @Configuration
    public class StreamSearchDaoFactory {
        @Autowired
        ApplicationContext applicationContext;
    
        @Nonnull
        @DependsOn({"elasticSearchRestHighLevelClient"})
        @Bean(name = "streamSearchDAO")
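    // NB: this registers the bean as "streamSearchDAO", while the exception
    // above looks for "streamSearchDao" -- Spring bean names are
    // case-sensitive, so the two spellings do not match.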
        protected ESSearchDAO createInstance() {
            return new ESSearchDAO(applicationContext.getBean(RestHighLevelClient.class), StreamDocument.class,
                    new StreamSearchConfig());
        }
    }
  • powerful-egg-69769
    02/22/2021, 4:11 PM
    Looking at the ingestion examples, they all require lots of Gradle or Python dependencies.
  • incalculable-ocean-74010
    03/01/2021, 10:24 AM
    Hello, has anyone tried using the ingestion recipes against a Hive table? I've been playing around with a dummy local setup using https://github.com/big-data-europe/docker-hive and the following recipe:
    ---
    source:
      type: hive
      config:
        username: hive
        password: hive
        database: default
        host_port: localhost:10000
    #   table_pattern:
    #        allow:
    #            - "schema1.table1"
    #            - "schema1.table2"
    #        deny:
    #            - "^.*\.sys_.*" # deny all tables that start with sys_
    
    sink:
      type: console
    However, I get an unfamiliar stacktrace:
  • incalculable-ocean-74010
    03/01/2021, 10:24 AM
    Has anyone come across this stack trace?
  • incalculable-ocean-74010
    03/01/2021, 10:24 AM
    ▶ datahub ingest -c hive_to_console.yml
    [2021-03-01 10:21:25,708] DEBUG    {datahub.entrypoints:64} - Using config: {'source': {'type': 'hive', 'config': {'username': 'hive', 'password': 'hive', 'database': 'default', 'host_port': 'localhost:10000'}}, 'sink': {'type': 'console'}}
    [2021-03-01 10:21:25,708] DEBUG    {datahub.ingestion.run.pipeline:63} - Source type:hive,<class 'datahub.ingestion.source.hive.HiveSource'> configured
    [2021-03-01 10:21:25,709] DEBUG    {datahub.ingestion.run.pipeline:69} - Sink type:console,<class 'datahub.ingestion.sink.console.ConsoleSink'> configured
    [2021-03-01 10:21:25,709] DEBUG    {datahub.ingestion.source.sql_common:152} - sql_alchemy_url=<hive://hive:hive@localhost:10000/default>
    Traceback (most recent call last):
      File "/home/pedro/dev/datahub/metadata-ingestion/venv/bin/datahub", line 11, in <module>
        load_entry_point('datahub', 'console_scripts', 'datahub')()
      File "/home/pedro/dev/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/click/core.py", line 829, in __call__
        return self.main(*args, **kwargs)
      File "/home/pedro/dev/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/click/core.py", line 782, in main
        rv = self.invoke(ctx)
      File "/home/pedro/dev/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/home/pedro/dev/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/home/pedro/dev/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/click/core.py", line 610, in invoke
        return callback(*args, **kwargs)
      File "/home/pedro/dev/datahub/metadata-ingestion/src/datahub/entrypoints.py", line 70, in ingest
        pipeline.run()
      File "/home/pedro/dev/datahub/metadata-ingestion/src/datahub/ingestion/run/pipeline.py", line 81, in run
        for wu in self.source.get_workunits():
      File "/home/pedro/dev/datahub/metadata-ingestion/src/datahub/ingestion/source/sql_common.py", line 154, in get_workunits
        inspector = reflection.Inspector.from_engine(engine)
      File "/home/pedro/dev/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/sqlalchemy/engine/reflection.py", line 135, in from_engine
        return Inspector(bind)
      File "/home/pedro/dev/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/sqlalchemy/engine/reflection.py", line 108, in __init__
        bind.connect().close()
      File "/home/pedro/dev/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2263, in connect
        return self._connection_cls(self, **kwargs)
      File "/home/pedro/dev/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 104, in __init__
        else engine.raw_connection()
      File "/home/pedro/dev/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2369, in raw_connection
        return self._wrap_pool_connect(
      File "/home/pedro/dev/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2336, in _wrap_pool_connect
        return fn()
      File "/home/pedro/dev/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 304, in unique_connection
        return _ConnectionFairy._checkout(self)
      File "/home/pedro/dev/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 778, in _checkout
        fairy = _ConnectionRecord.checkout(pool)
      File "/home/pedro/dev/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 495, in checkout
        rec = pool._do_get()
      File "/home/pedro/dev/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/sqlalchemy/pool/impl.py", line 140, in _do_get
        self._dec_overflow()
      File "/home/pedro/dev/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
        compat.raise_(
      File "/home/pedro/dev/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
        raise exception
      File "/home/pedro/dev/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/sqlalchemy/pool/impl.py", line 137, in _do_get
        return self._create_connection()
      File "/home/pedro/dev/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 309, in _create_connection
        return _ConnectionRecord(self)
      File "/home/pedro/dev/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 440, in __init__
        self.__connect(first_connect_check=True)
      File "/home/pedro/dev/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 661, in __connect
        pool.logger.debug("Error on connect(): %s", e)
      File "/home/pedro/dev/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
        compat.raise_(
      File "/home/pedro/dev/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
        raise exception
      File "/home/pedro/dev/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 656, in __connect
        connection = pool._invoke_creator(self)
      File "/home/pedro/dev/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect
        return dialect.connect(*cargs, **cparams)
      File "/home/pedro/dev/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 508, in connect
        return self.dbapi.connect(*cargs, **cparams)
      File "/home/pedro/dev/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/pyhive/hive.py", line 94, in connect
        return Connection(*args, **kwargs)
      File "/home/pedro/dev/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/pyhive/hive.py", line 123, in __init__
        raise ValueError("Password should be set if and only if in LDAP or CUSTOM mode; "
    ValueError: Password should be set if and only if in LDAP or CUSTOM mode; Remove password or use one of those modes
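    That ValueError is pyhive's own guard: it accepts a password only together with auth='LDAP' or auth='CUSTOM'. A minimal sketch against pyhive.hive.Connection, mirroring the recipe above (for the unsecured docker-hive dev setup, the simplest fix is to drop the password):
    from pyhive import hive

    # pyhive raises "Password should be set if and only if in LDAP or CUSTOM
    # mode" whenever password and auth disagree. Either omit the password for
    # an unsecured dev Hive, or set a matching auth mode.
    conn = hive.Connection(
        host='localhost',
        port=10000,
        username='hive',
        database='default',
        # password='hive', auth='LDAP',  # set both together, or neither
    )
    In recipe terms that means removing the password line from the hive source config above, or (assuming the source exposes connector options) passing a matching auth setting through to pyhive.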
  • incalculable-ocean-74010
    03/01/2021, 5:34 PM
    Has the new metadata ingestion framework been tested with alternative login methods other than username/password in the connection string to databases?
  • brief-toothbrush-55766
    03/05/2021, 12:36 PM
    Am running into this error while trying to ingest:
  • brief-toothbrush-55766
    03/05/2021, 12:37 PM
    Any hints?
  • loud-island-88694
    03/05/2021, 3:20 PM
    @gray-shoe-75895 can help more. In the meantime, can you try
    pip3 install --upgrade setuptools
  • curved-crayon-1929
    03/05/2021, 3:26 PM
    @loud-island-88694 it worked awesome thanks
    🙌 1
  • loud-island-88694
    03/05/2021, 3:28 PM
    👍