# best-practices
  • n

    Noam Siegel

    02/04/2025, 7:38 AM
    Asking for a friend... What's the loader/extractor combination I can use to perform ETL from a folder containing lots of JSON files into a Postgres DB? I have tried https://hub.meltano.com/extractors/tap-singer-jsonl/ but am running into this error:
    Copy code
    (etl) ➜  etl git:(main) ✗ meltano config tap-singer-jsonl set local.folders /Users/noamsiegel/Downloads/tripadvisor-matched-files/
    
    2025-02-04T07:34:25.310094Z [info     ] The default environment 'dev' will be ignored for `meltano config`. To configure a specific environment, please use the option `--environment=<environment name>`.
    Need help fixing this problem? Visit http://melta.no/ for troubleshooting steps, or to
    join our friendly Slack community.
    
    Failed to parse JSON array from string: '/Users/noamsiegel/Downloads/tripadvisor-matched-files/'
    I tried with and without the trailing "/" at the end of the folder path.
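    A possible fix, as a sketch (assuming the local.folders setting is array-typed, which the "Failed to parse JSON array" error suggests): pass the value as a JSON array rather than a bare path:
    Copy code
    meltano config tap-singer-jsonl set local.folders '["/Users/noamsiegel/Downloads/tripadvisor-matched-files/"]'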
    r
    • 2
    • 6
  • p

    Pawel Plaszczak

    02/04/2025, 4:14 PM
    What about sending over large CLOB data? Can Meltano/Singer do this? I have some cases of large documents stored in CLOB columns in a database, where fields may be many MB in size. I saw 40 MB in one case, but I do not know the upper limit. How does this translate to Meltano and Singer limitations? Will I hit some hard-coded limits? As I understand it, all data is sent as JSON. I don't think the JSON specification itself has any limit, but I imagine there may be practical issues with how particular Python implementations handle large data, including buffering. Please share best practices, and if you know of practical limitations, please share them. Scenario: I am copying from tap-oracle to target-oracle. I see that target-oracle does not handle CLOBs correctly: it translates them to VARCHAR. I saw that the client is hard-coded to use the thin client. I am thinking of following advice from @Edgar Ramírez (Arch.dev) and forking it to use the thick client rather than the thin one, hoping this resolves the issue. But I assume I might hit various other issues down the road. Maybe the translation from CLOB to VARCHAR is not a mistake, but a way to work around some problem? Before going too far and hitting a wall, I want to check what is awaiting me there. Any feedback is more than welcome.
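    One knob that may help, sketched under assumptions (the schema extra is a standard Meltano extractor option, but the table/column names below are placeholders, and whether target-oracle uses the hinted length to pick CLOB over VARCHAR is an assumption): override the stream schema so the CLOB column is declared as a long string.
    Copy code
      - name: tap-oracle
        schema:
          MY_TABLE:
            MY_CLOB_COLUMN:
              type: ["string", "null"]
              maxLength: 100000000   # hint that values can be tens of MB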
    e
    • 2
    • 2
  • e

    Emre Üstündağ

    02/04/2025, 9:55 PM
    Hi. I am new to data pipeline concepts, so these questions might be a piece of cake for most of you. Any help would be much appreciated. As seen in the attachment, I am creating an EL pipeline to extract data with tap-amazon-sp and load it into target-clickhouse. I've made some configuration changes and the data flow seems OK now (after countless daily tests, actually :)). What I did was execute "... run tap-amazon-sp target-clickhouse" and, after 3-4 hours, execute it again. When I check my local ClickHouse, the orders-related tables such as orders, orderitems, orderaddress etc. look as expected thanks to incremental replication. However, the product-related tables (product_details etc.) seem to have duplicated all rows: for example, there were 5000 rows after the first run and 10000 after the second. I don't think that is expected. Am I missing some tap-amazon-sp configuration (I checked tap.properties.json; these tables have no replication keys and their replication method is full-table after each run), or should target-clickhouse handle this by dropping the previous table and creating a new one via a single configuration option? I also wonder how I can configure tap-amazon-sp to run EL pipelines on separate schedules. The orders-related tables change frequently, so in my case they need to run at least daily, but the product-related tables' rows don't change often, so weekly or even monthly would be enough for them. What is the best practice for configuring Meltano to run these tasks separately? I don't know whether I should do this with different projects based on data needs. In addition, the solution should not affect the pipelines that already work properly; for example, if I set the target-clickhouse load method to "overwrite", that would delete all existing rows for the incremental pipelines, which it shouldn't. In short, I need a best-practice approach to make all of this happen, if possible in a single Meltano project.
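    One possible layout, as a sketch (plugin and stream names are assumptions based on the message above; the full-table duplication itself is usually addressed on the target side, e.g. a replace/overwrite load method for just those streams, which is not shown here): define two inheriting copies of the tap with different select rules, wire them into separate jobs, and schedule each job on its own interval, all in one project.
    Copy code
    plugins:
      extractors:
      - name: tap-amazon-sp--orders
        inherit_from: tap-amazon-sp
        select:
        - orders.*
        - orderitems.*
        - orderaddress.*
      - name: tap-amazon-sp--products
        inherit_from: tap-amazon-sp
        select:
        - product_details.*
    jobs:
    - name: orders-to-clickhouse
      tasks:
      - tap-amazon-sp--orders target-clickhouse
    - name: products-to-clickhouse
      tasks:
      - tap-amazon-sp--products target-clickhouse
    schedules:
    - name: orders-daily
      job: orders-to-clickhouse
      interval: '@daily'
    - name: products-weekly
      job: products-to-clickhouse
      interval: '0 3 * * 1'
    Both jobs share the base tap configuration; only the selected streams and the schedule differ.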
  • p

    Pawel Plaszczak

    02/18/2025, 9:45 AM
    I am looking for a versatile, open-source ETL solution. Having researched many options, we selected Singer and Meltano for a prototype, and we are currently still in the prototype stage. Along the way, people in this Slack community were extremely helpful; most help came from @Edgar Ramírez (Arch.dev) but also from others. I have now published a short article documenting our journey and the decision process behind why Singer and Meltano were preselected. Comments and feedback always welcome. https://medium.com/@pp_85623/towards-the-killer-open-source-elt-etl-4270df7d3d93
    👀 2
    👌 1
  • a

    Andy Crellin

    02/21/2025, 10:14 AM
    Hi, I've set up a Meltano instance on a dev machine and now want to move it to a live server so it can be properly scheduled etc. I have the state stored in the default local SQLite database. I want to move everything so that I keep the state and just get things running again on the new machine - I want to make sure that I don't need to re-extract the data, but can continue to use the state in the database. 1. My approach will be to literally just copy all of the files in the Meltano folder over to the new machine and run it from there - are there any gotchas or considerations that I need to bear in mind for this approach? 2. Once in the new location, I would expect the new maintainer of this to sync it to a GitHub repo - again, anything I should make them particularly aware of?
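    An alternative to copying the whole project directory, as a sketch (the state ID below is a placeholder; the real IDs come from meltano state list): export the job state on the old machine and load it on the new one with the meltano state commands.
    Copy code
    # on the old machine
    meltano state list
    meltano state get dev:tap-foo-to-target-bar > state.json

    # on the new machine, after installing the plugins
    meltano state set --force dev:tap-foo-to-target-bar "$(cat state.json)"
    Note for (2): the default system database lives at .meltano/meltano.db, and the .meltano/ directory is gitignored by default, so state will not travel with the GitHub repo unless it is moved to an external system database or state backend.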
    r
    • 2
    • 2
  • j

    Jesse Neumann

    02/21/2025, 10:45 PM
    Hi all, I'm working on a pipeline that pulls news data in from an api, stores it in postgres, scrubs it with dbt, and then does some sentiment analysis by calling a LLM endpoint with each new row. Currently I have the pipeline scheduled using the Airflow orchestrator, but it stops before the sentiment analysis/LLM endpoint step happens. I currently have the sentiment analysis/LLM endpoint step in a separate project and was going to call its Docker image from inside the Airflow orchestrator, but thought it would be nice to have everything inside Meltano. Is there a best practice for where to put the python scripts for calling the LLM endpoint? Also, is there an easy way to trigger them automatically after the previous steps run, or do I just create a new DAG for the Airflow orchestrator?
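    One common pattern, as a sketch (the script path, plugin names, and pipeline steps are assumptions): keep the Python script in the Meltano project, expose it as a utility plugin with a command, and append it to the same meltano run invocation so it fires automatically after the EL and dbt steps.
    Copy code
    plugins:
      utilities:
      - name: sentiment
        namespace: sentiment
        commands:
          analyze:
            executable: python
            args: scripts/sentiment_analysis.py
    jobs:
    - name: news-pipeline
      tasks:
      - tap-news target-postgres dbt:run sentiment:analyze
    Scheduling the job (rather than the bare tap/target pair) then gives the Airflow orchestrator a single DAG that includes the LLM step.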
    • 1
    • 1
  • s

    Siba Prasad Nayak

    02/24/2025, 7:10 AM
    Hi Everyone, good day! I installed Meltano version 2.20.0 on my machine and tried the "meltano ui" option to check the GUI web server. When I executed the command, the web server pages kept loading for a very long time. I am not sure what is causing this issue - maybe an incompatibility or a Python version issue?
    MeltanoUI_2.20.0.mp4
    e
    • 2
    • 9
  • c

    Chad

    02/26/2025, 2:47 AM
    Hi all, wondering if anyone has managed to get creative with Meltano job orchestration to build out more complex DAGs. We have some jobs with multiple tasks where some of the tasks benefit from and can be run in parallel, while other tasks can't. I can't see an easy way to specify tasks in a way that would let my custom DAG know which can run concurrently and which must run sequentially. Other than using annotations on the job, has anyone come up with creative ways to address this, or am I missing something? Also, annotations don't seem to be exported with the JSON schedules, so I would need to manually extract them from the YAML.
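    Since Meltano ignores annotations, one approach (a sketch; the dag annotation structure below is entirely made up and would be defined by your own DAG generator) is to encode the concurrency hints in the job's annotations and have the generator read meltano.yml directly instead of the JSON schedule export.
    Copy code
    jobs:
    - name: nightly-load
      tasks:
      - tap-a target-x
      - tap-b target-x
      - dbt:run
      annotations:
        dag:
          parallel_groups:
          - [tap-a target-x, tap-b target-x]   # safe to run concurrently
          sequential:
          - dbt:run                            # must run after both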
    a
    • 2
    • 3
  • a

    ashish singh

    03/06/2025, 9:50 PM
    Hi folks, I am a new Meltano user and I'm wondering if there is a suggested method/mechanism to mask data when ingesting with Meltano. If so, is there any documentation or are there examples I could look at?
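    The usual mechanism is stream maps, either inline in a Singer SDK tap/target config or via a dedicated mapper plugin such as meltano-map-transformer. A sketch, assuming a stream named public-users with email and ssn columns: hash one field and drop the other before the data ever reaches the target.
    Copy code
      - name: tap-postgres
        config:
          stream_maps:
            public-users:
              email: md5(email)     # replace the value with its MD5 hash
              ssn: __NULL__         # remove the property entirely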
    ✅ 1
    r
    • 2
    • 2
  • p

    Pouya Barrach-Yousefi

    03/10/2025, 7:00 PM
    Does anyone have a good deck on data versioning best practices and considerations that I can build off of? Happy to share my own decks in exchange.
    e
    v
    • 3
    • 2
  • p

    Pawel Plaszczak

    03/20/2025, 12:41 AM
    Dear all, in this Medium article I put together my experience evaluating the open-source Meltano as an ELT framework (an exercise in which many of you helped, and for which I am truly grateful). Please feel free to comment, point out flaws in my writing, and propose solutions where mine are imperfect. @Edgar Ramírez (Arch.dev) and @Steve Clarke may be especially interested. https://medium.com/@pp_85623/meltano-in-action-hands-on-evaluation-of-an-open-source-elt-framework-5a1d5b93b483
    👀 1
    e
    • 2
    • 1
  • p

    Pawel Plaszczak

    03/20/2025, 10:15 AM
    I am looking for good practices on propagading the DELETE operations over the Meltano ELT. As one of the things mentioned in the article below, it seems to me that propagation of DELETEs are not supported in target-oracle so I'd have to implement it myself. How? My guess is that I should be looking at
    _sdc_deleted_at
    timestamps (if produced by tap-oracle) and use DBT transformations to handle these. Would this be the recommended way? Does anyone know whether tap-oracle correctly produces these timestamps? More generally, I am of an impressions that deletes are rarely implemented by the targets. Am I correct and if so, why is it so? I understand they are somewhat more tricky to implement than barebones inserts, however they seem to me quite an important and probably quite frequent requirement for proper ETL. How are people in general dealing with this problem? https://medium.com/@pp_85623/meltano-in-action-hands-on-evaluation-of-an-open-source-elt-framework-5a1d5b93b483
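    If the tap does emit _sdc_deleted_at (typically only with log-based replication), the common pattern is to keep the soft-deleted rows in the raw layer and filter them in a staging model. A sketch in dbt (source and table names are placeholders):
    Copy code
    select *
    from {{ source('raw_oracle', 'my_table') }}
    where _sdc_deleted_at is null
    Physically removing the rows would then be a separate periodic clean-up (for example a dbt run-operation) that deletes where _sdc_deleted_at is not null.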
  • s

    Siba Prasad Nayak

    03/21/2025, 7:33 AM
    Hi Team, hope you are doing well. I am getting an issue when using tap-postgres and target-jsonl with my custom connector. Can anyone please help?
    Copy code
    (meltanoEnv) PS C:\Siba_\Work\POC_ConnectorFactory\Gerrit\Connector_Factory_Development\meltano-backend> meltano invoke tap-postgres
    2025-03-21T08:15:17.883761Z [warning ] Failed to create symlink to 'meltano.exe': administrator privilege required
    2025-03-21T08:15:17.917544Z [info    ] Environment 'dev' is active
    2025-03-21 13:45:20,737 | INFO | tap-postgres | Skipping deselected stream 'public-accounts'.
    (meltanoEnv) PS C:\Siba_\Work\POC_ConnectorFactory\Gerrit\Connector_Factory_Development\meltano-backend>
    The table exists and the postgres user has privileges on it:
    Copy code
    meltano=# \dp public.accounts
                                  Access privileges
     Schema |   Name   | Type  |     Access privileges      | Column privileges | Policies
    --------+----------+-------+----------------------------+-------------------+----------
     public | accounts | table | postgres=arwdDxtm/postgres |                   |
    (1 row)
    Below is my meltano.yml configuration:
    Copy code
    - name: tap-postgres
      namespace: tap_postgres
      pip_url: ./connectors/tap-postgres
      executable: tap-postgres
      config:
        database: meltano
        host: localhost
        port: 5432
        user: postgres
        password: ********
        filter_schemas: [public]
        select_all_tables: true    # Or false, and use a selection JSON
        select_all_fields: true    # If false, use a selection JSON to specify fields
        sqlalchemy_url: "postgresql://postgres:******@localhost:5432/meltano"
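    The "Skipping deselected stream" message suggests nothing is selected for this extractor. A sketch of one way to check and fix that (the entity/attribute patterns are examples):
    Copy code
    meltano select tap-postgres --list --all
    meltano select tap-postgres public-accounts "*"
    If the custom connector does not actually implement select_all_tables / select_all_fields, those keys are simply ignored and selection falls back to Meltano's select rules.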
    r
    • 2
    • 5
  • p

    Pawel Plaszczak

    03/21/2025, 4:02 PM
    In this public article I summarized my performance tests of Meltano. It would be much appreciated if people who are more experienced looked at what I've done and commented on whether the quoted numbers make sense. I tested the throughput of Oracle and Postgres connections with various types of data. https://medium.com/@pp_85623/testing-performance-of-elt-data-ingest-meltano-oracle-and-postgresql-471989ff82df
  • s

    Siba Prasad Nayak

    04/02/2025, 3:50 AM
    Hi @Everyone, good day! I am trying to run a Meltano pipeline using environment variables, so I set these variables:
    Copy code
    $env:MELTANO_SETTING_TAP_POSTGRES_DATABASE="meltano"; $env:MELTANO_SETTING_TAP_POSTGRES_HOST="localhost"; $env:MELTANO_SETTING_TAP_POSTGRES_PORT="5432"; $env:MELTANO_SETTING_TAP_POSTGRES_USER="postgres"; $env:MELTANO_SETTING_TAP_POSTGRES_PASSWORD="****"; $env:MELTANO_SETTING_TAP_POSTGRES_FILTER_SCHEMAS="public"; $env:MELTANO_SETTING_TAP_POSTGRES_SQLALCHEMY_URL="postgresql://postgres:siba1234@localhost:5432/meltano"
    I commented out the config sections in the meltano.yml file and then locked the plugin. When I executed it, I got:
    Copy code
    2025-04-02T03:41:15.605289Z [warning ] Failed to create symlink to 'meltano.exe': administrator privilege required
    2025-04-02T03:41:15.616596Z [info    ] The default environment 'dev' will be ignored for `meltano config`. To configure a specific environment, please use the option `--environment=<environment name>`.
    (meltanoEnv) PS C:\Siba_\Work\POC_ConnectorFactory\Gerrit\Connector_Factory_Development\meltano-backend>
    I am getting no values. Is this the right way to set configuration if I want to use environment variables? The purpose is not to store the config parameters in the meltano.yml file. Note: there is no .env file under my Meltano project root directory. Thanks in advance!
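    For what it's worth, a sketch of the naming convention Meltano normally expects for plugin settings (worth double-checking against the settings docs; the MELTANO_SETTING_ prefix above is not a form I recognize): the plugin name plus the setting name, upper-cased with underscores. In PowerShell:
    Copy code
    $env:TAP_POSTGRES_HOST="localhost"
    $env:TAP_POSTGRES_PORT="5432"
    $env:TAP_POSTGRES_USER="postgres"
    $env:TAP_POSTGRES_PASSWORD="****"
    $env:TAP_POSTGRES_DATABASE="meltano"
    $env:TAP_POSTGRES_FILTER_SCHEMAS='["public"]'
    meltano config tap-postgres list   # should now show the values coming from the environment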
    r
    v
    • 3
    • 7
  • s

    Siba Prasad Nayak

    04/02/2025, 7:46 AM
    Hi Team, I'm getting these warnings while running an ELT pipeline. Can anyone suggest what needs to be done on my side?
    2025-04-02T07:37:31.279724Z [warning  ] Certificate did not match expected hostname: sp.meltano.com. Certificate: {'subject': ((('commonName', '*.ops.snowcatcloud.com'),),), 'issuer': ((('countryName', 'US'),), (('organizationName', 'Amazon'),), (('commonName', 'Amazon RSA 2048 M02'),)), 'version': 3, 'serialNumber': '0668C0E7C8CD0F1A31A21E5DDD2FD67D', 'notBefore': 'Mar  7 00:00:00 2025 GMT', 'notAfter': 'Apr  5 23:59:59 2026 GMT', 'subjectAltName': (('DNS', '*.ops.snowcatcloud.com'),), 'OCSP': ('http://ocsp.r2m02.amazontrust.com',), 'caIssuers': ('http://crt.r2m02.amazontrust.com/r2m02.cer',), 'crlDistributionPoints': ('http://crl.r2m02.amazontrust.com/r2m02.crl',)}
    2025-04-02T07:37:31.283308Z [warning  ] Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=0)) after connection broken by 'SSLError(CertificateError("hostname 'sp.meltano.com' doesn't match '*.ops.snowcatcloud.com'"))': /com.snowplowanalytics.snowplow/tp2
    2025-04-02T07:37:32.200577Z [warning  ] Certificate did not match expected hostname: sp.meltano.com. Certificate: {...same certificate details as above...}
    2025-04-02T07:37:32.204045Z [warning  ] Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=0)) after connection broken by 'SSLError(CertificateError("hostname 'sp.meltano.com' doesn't match '*.ops.snowcatcloud.com'"))': /com.snowplowanalytics.snowplow/tp2
    2025-04-02T07:37:33.138758Z [warning  ] Certificate did not match expected hostname: sp.meltano.com. Certificate: {...same certificate details as above...}
    2025-04-02T07:37:33.141577Z [warning  ] Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=0)) after connection broken by 'SSLError(CertificateError("hostname 'sp.meltano.com' doesn't match '*.ops.snowcatcloud.com'"))': /com.snowplowanalytics.snowplow/tp2
    2025-04-02T07:37:34.072331Z [warning  ] Certificate did not match expected hostname: sp.meltano.com. Certificate: {...same certificate details as above...}
    (meltanoEnv) PS C:\Siba_\Work\POC_ConnectorFactory\Gerrit\Connector_Factory_Development\meltano-backend>
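    These warnings come from Meltano's anonymous usage telemetry trying to reach sp.meltano.com (a Snowplow collector); the snowcatcloud.com certificate suggests something on the network path or DNS is answering for that hostname. They do not affect the pipeline itself, and a sketch of how to silence them is simply to turn the telemetry off:
    Copy code
    meltano config meltano set send_anonymous_usage_stats false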
    a
    p
    r
    • 4
    • 8
  • s

    Siba Prasad Nayak

    04/02/2025, 4:06 PM
    Hi Team, I am trying to create a pipeline from tap-snowflake to target-salesforce. Below is the snippet of my root meltano.yml file for both tap-snowflake and target-salesforce.
    Copy code
    - name: tap-snowflake
        namespace: tap_snowflake
        pip_url: ./connectors/tap-snowflake
        executable: tap-snowflake
        capabilities:
        - state
        - catalog
        - discover
        - about
        - stream-maps
        settings:
        - name: account
          kind: string
          value: aigoiop-hq79023
          description: The Snowflake account identifier.
        - name: user
          kind: string
          value: udey
          description: The Snowflake username.
        - name: password
          kind: string
          value: ***********
          description: The Snowflake password.
          sensitive: true
        - name: database
          kind: string
          value: PARTHAN_DB
          description: The Snowflake database name.
        - name: warehouse
          kind: string
          value: COMPUTE_WH
          description: The Snowflake warehouse name.
        select:
        - public-account.* 
    -------------------------------------------------------------------
      - name: target-salesforce
        namespace: target_salesforce
        pip_url: ./connectors/target-salesforce
        executable: target-salesforce
        capabilities:
        - about
        - stream-maps
        - schema-flattening
        config:
          username: udey@vcs.sandbox
          password: ********
          security_token: nanLbbN3lexEw70gK7tLrzP4s
          api_type: sandbox
          #sobject: account
          action: insert
          stream_maps:
            public-employees:   
              target: Account     
              #key_properties: []   
              mappings:
              - source: name      
                target: Name
    I am getting the error below for this stream_maps configuration.
    Copy code
    2025-04-02T12:36:00.006657Z [info     ]        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cmd_type=elb consumer=True job_name=dev:tap-postgres-to-target-salesforce name=target-salesforce producer=False run_id=a2dbeee1-5078-41a8-a7d9-aba40dd70d46 stdio=stderr string_id=target-salesforce
    2025-04-02T12:36:00.008151Z [warning  ] Received state is invalid, incremental state has not been updated
    2025-04-02T12:36:00.098874Z [info     ] Incremental state has been updated at 2025-04-02 12:36:00.098808+00:00.
    2025-04-02T12:36:00.100101Z [info     ] TypeError: unhashable type: 'list' cmd_type=elb consumer=True job_name=dev:tap-postgres-to-target-salesforce name=target-salesforce producer=False run_id=a2dbeee1-5078-41a8-a7d9-aba40dd70d46 stdio=stderr string_id=target-salesforce
    2025-04-02T12:36:00.220899Z [error    ] Loader failed
    2025-04-02T12:36:00.221982Z [error    ] Block run completed.           block_type=ExtractLoadBlocks err=RunnerError('Loader failed') exit_codes={<PluginType.LOADERS: 'loaders'>: 1} set_number=0 success=False
    Need help fixing this problem? Visit http://melta.no/ for troubleshooting steps, or to
    join our friendly Slack community.
    
    Run invocation could not be completed as block failed: Loader failed
    (meltanoEnv) PS C:\Siba_\Work\POC_ConnectorFactory\Gerrit\Connector_Factory_Development\meltano-backend>
    There seems to be some issue with the stream_maps! Can anyone please guide me?
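    The error is consistent with mappings: not being part of the stream maps syntax: a stream map is a mapping of property name to expression (plus special keys such as __alias__), not a list of source/target pairs. A sketch of the equivalent intent (stream and field names are taken from the config above; whether target-salesforce then routes the aliased stream to the Account sobject is an assumption):
    Copy code
          stream_maps:
            public-employees:
              __alias__: Account    # rename the stream
              Name: name            # new property 'Name' copied from the source field 'name'
              name: __NULL__        # drop the original lower-case field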
    ✅ 1
    r
    • 2
    • 9
  • t

    Tanner Wilcox

    04/03/2025, 10:58 PM
    I need to scp a file off a server and then parse it into JSON with my tap. I'm leaning towards an extension, but it looks like I could just call scp directly in my extractor. Is there a reason to go with an extension? Is there already a solution for SCP?
    r
    • 2
    • 3
  • s

    Siba Prasad Nayak

    04/04/2025, 10:19 AM
    Hi Team, good day! I am working on a Snowflake -> Salesforce connector.
    Copy code
    - name: tap-snowflake
        namespace: tap_snowflake
        pip_url: ./connectors/tap-snowflake
        executable: tap-snowflake
        capabilities:
        - state
        - catalog
        - discover
        - about
        - stream-maps
        settings:
        - name: account
          kind: string
          value: aigoiop-hq79023
          description: The Snowflake account identifier.
        - name: user
          kind: string
          value: udey
          description: The Snowflake username.
        - name: password
          kind: string
          value: ********
          description: The Snowflake password.
          sensitive: true
        - name: database
          kind: string
          value: PARTHAN_DB
          description: The Snowflake database name.
        - name: warehouse
          kind: string
          value: COMPUTE_WH
          description: The Snowflake warehouse name.
        select:
        - public-account.name 
      
    
      - name: target-salesforce
        namespace: target_salesforce
        pip_url: ./connectors/target-salesforce
        executable: target-salesforce
        capabilities:
        - about
        - stream-maps
        #- schema-flattening
        config:
          username: udey@vcs.sandbox
          password: ********
          security_token: nanLbbN3lexEw70gK7tLrzP4s
          api_type: sandbox
          sobject: Employee__c
          action: insert
          stream_maps:
            '*':   # Stream from the tap
              __alias__: Employee__c
              mappings:
                "name": "Name"
    Somehow this "mappings" configuration is not working. It throws the error below.
    Copy code
    2025-04-04T10:11:29.098748Z [warning  ] Received state is invalid, incremental state has not been updated
    2025-04-04T10:11:29.163005Z [info     ] Incremental state has been updated at 2025-04-04 10:11:29.162967+00:00.
    2025-04-04T10:11:29.163935Z [info     ] TypeError: unhashable type: 'dict' cmd_type=elb consumer=True job_name=dev:tap-snowflake-to-target-salesforce name=target-salesforce producer=False run_id=0f693284-543d-4331-8029-ac397f2c6d83 stdio=stderr string_id=target-salesforce
    2025-04-04T10:11:29.182189Z [info     ] 2025-04-04 15:41:28,998 | INFO     | snowflake.connector.connection | Snowflake Connector for Python Version: 3.13.2, Python Version: 3.13.2, Platform: Windows-11-10.0.22631-SP0 cmd_type=elb consumer=False job_name=dev:tap-snowflake-to-target-salesforce name=tap-snowflake producer=True run_id=0f693284-543d-4331-8029-ac397f2c6d83 stdio=stderr string_id=tap-snowflake
    2025-04-04T10:11:29.183433Z [info     ] 2025-04-04 15:41:28,999 | INFO     | snowflake.connector.connection | Connecting to GLOBAL Snowflake domain cmd_type=elb consumer=False job_name=dev:tap-snowflake-to-target-salesforce name=tap-snowflake producer=True run_id=0f693284-543d-4331-8029-ac397f2c6d83 stdio=stderr string_id=tap-snowflake
    2025-04-04T10:11:29.184298Z [info     ] 2025-04-04 15:41:28,999 | INFO     | snowflake.connector.connection | This connection is in OCSP Fail Open Mode. TLS Certificates would be checked for validity and revocation status. Any other Certificate Revocation related exceptions or OCSP Responder failures would be disregarded in favor of connectivity. cmd_type=elb consumer=False job_name=dev:tap-snowflake-to-target-salesforce name=tap-snowflake producer=True run_id=0f693284-543d-4331-8029-ac397f2c6d83 stdio=stderr string_id=tap-snowflake
    2025-04-04T10:11:29.241665Z [error    ] Loader failed
    2025-04-04T10:11:29.242795Z [error    ] Block run completed.           block_type=ExtractLoadBlocks err=RunnerError('Loader failed') exit_codes={<PluginType.LOADERS: 'loaders'>: 1} set_number=0 success=False
    Need help fixing this problem? Visit http://melta.no/ for troubleshooting steps, or to
    join our friendly Slack community.
    
    Run invocation could not be completed as block failed: Loader failed
    (meltanoEnv) PS C:\Siba_\Work\POC_ConnectorFactory\Gerrit\Connector_Factory_Development\meltano-backend>
    So I am trying to map name (from Snowflake) to the Name field (in Salesforce) [camel case], but I get
    TypeError: unhashable type: 'dict'
    I did not find many examples online related to this "mappings" configuration.
    ✅ 1
    r
    • 2
    • 2
  • t

    Tanner Wilcox

    04/08/2025, 4:55 PM
    I am very confused here. I'm trying to run scp as a utility and I can't get it to work. This works
    Copy code
    - name: scp
          namespace: scp
          commands:
            get_file:
              executable: scp
              args: -h
    This doesn't
    Copy code
    - name: test
          namespace: test
          commands:
            get_file:
              executable: /bin/bash
              args: -c "./test.sh"
    Neither does this
    Copy code
    - name: test
          namespace: test
          commands:
            get_file:
              executable: ssh
              args: -h
    Here is my output from the first scp test
    Copy code
    [tanner@sato ubb-meltano]$ mel run scp
    2025-04-08T16:50:21.070409Z [info     ] Environment 'dev' is active   
    2025-04-08T16:50:21.117648Z [info     ] usage: scp [-346ABCOpqRrsTv] [-c cipher] [-D sftp_server_path] [-F ssh_config] cmd_type=command name=scp stdio=stderr
    2025-04-08T16:50:21.117864Z [info     ]            [-i identity_file] [-J destination] [-l limit] [-o ssh_option] cmd_type=command name=scp stdio=stderr
    2025-04-08T16:50:21.117973Z [info     ]            [-P port] [-S program] [-X sftp_option] source ... target cmd_type=command name=scp stdio=stderr
    Need help fixing this problem? Visit <http://melta.no/> for troubleshooting steps, or to
    join our friendly Slack community.
    
    'NoneType' object is not subscriptable
    The other tests produce the same "NoneType" error but they don't print the command's help message. test.sh is just
    echo hello
    . I made an scp extension that used to be in my utilities. It looked like this
    Copy code
    - name: scp-ext
          namespace: scp-ext
          pip_url: '../scp-ext'
          executable: scp
    I wonder if that's cached and that's why the scp command is the only thing that works? I've been banging my head against this for days. Any help would really be appreciated
    ✅ 1
  • r

    Reuben (Matatika)

    04/08/2025, 6:40 PM
    You invoke a command in the format
    <plugin name>:<command name>
    , so in your case
    Copy code
    meltano run scp:get_file
    or have you already tried that?
  • t

    Tanner Wilcox

    04/08/2025, 8:00 PM
    That's what I was missing. Thank you
    👍 1
  • t

    Tanner Wilcox

    04/08/2025, 8:01 PM
    Yup, it's right there in the documents. My bad
    👍 1
  • o

    Oscar Gullberg

    04/09/2025, 10:28 AM
    What’s the best practice for ingesting multiple taps in parallel into separate datasets, and then merging them into a final stg dataset after completion? Use case: ingesting data from multiple Shopify stores. Right now, we run one Meltano pipeline per store, which:
    • extracts raw data into a shared raw_shopify dataset in BigQuery
    • creates common views in a single stg_shopify dataset
    This setup causes some issues. Ideally, we want to:
    1. Ingest each store's raw data into its own dataset (e.g. raw_shopify_store1, raw_shopify_store2, etc.) in parallel
    2. Run per-store transforms into separate staging datasets (e.g. stg_shopify_store1, etc.)
    3. Run a final transform step that unions everything into a central stg_shopify dataset
    Is there a clean way to do this in Meltano? Any recommendations or patterns others are using?
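    One pattern, as a sketch (plugin names and the store/dataset setting names are assumptions; check the actual settings of your tap-shopify and target-bigquery variants): one inheriting tap and one inheriting loader per store, so each store lands in its own raw dataset, with dbt handling the per-store staging models and the final union.
    Copy code
    plugins:
      extractors:
      - name: tap-shopify--store1
        inherit_from: tap-shopify
        config:
          store: store1
      - name: tap-shopify--store2
        inherit_from: tap-shopify
        config:
          store: store2
      loaders:
      - name: target-bigquery--store1
        inherit_from: target-bigquery
        config:
          dataset: raw_shopify_store1
      - name: target-bigquery--store2
        inherit_from: target-bigquery
        config:
          dataset: raw_shopify_store2
    jobs:
    - name: shopify-store1
      tasks:
      - tap-shopify--store1 target-bigquery--store1
    - name: shopify-store2
      tasks:
      - tap-shopify--store2 target-bigquery--store2
    - name: shopify-union
      tasks:
      - dbt:run
    Tasks within a single meltano run are sequential, so the per-store parallelism comes from launching the store jobs as separate meltano run invocations (or separate orchestrator tasks) and triggering the union job once both have finished.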
    e
    • 2
    • 2
  • s

    Siddu Hussain

    04/11/2025, 10:28 AM
    Hi All, I am trying to understand how Meltano internals work. I am not able to figure out why one of the cases below works but not the other. • I am using multiprocessing inside the tap to call a child stream with parent keys. The data sent to the target hits a race condition and what gets posted is broken at times, like: ◦ expected sample (not written out in full Singer format, but consider these Singer-format RECORD messages): ▪︎
    {"key1": "value1" }, {"key2" : "value2"}
    ◦ random race-condition sample - the second record is emitted and written to the target before the first record finishes writing: ▪︎
    {"key1": {"key2" : "value2"}
    ◦ I was under the assumption this was happening because of batching at the tap and data being written to JSONL, ◦ but this happens even after removing batching at the tap. • I tried multiprocessing outside the tap instead, calling meltano el as a subprocess for each chunk of data, like below. This works without the race condition.
    Copy code
    import os
    import shlex
    import subprocess

    def run(time_range):
        try:
            is_backfill = os.environ.get("TAP_ZOOM_IS_BACKFILL")
            start_time, end_time = time_range
            start_time = shlex.quote(start_time)
            # start = start_time.replace(" ", "").replace(":", "")
            end_time = shlex.quote(end_time)
            # run one meltano el per time chunk in its own shell
            cmd = (
                f"export MELTANO_STATE_BACKEND_URI='s3://iflow-prod/state/backfill/zoom/'; "
                f"export TAP_ZOOM_FROM_DATE={start_time} TAP_ZOOM_TO_DATE={end_time} TAP_ZOOM_IS_BACKFILL={is_backfill}; "
                f"source .venv/bin/activate ; "
                f"meltano el tap-zoom target-s3-zoom --force;"
            )
            subprocess.run(cmd, shell=True, check=True)
            return {"time_range": time_range, "status": "success"}
        except Exception as e:
            return {"time_range": time_range, "status": "error", "error": str(e)}
    I was wondering: both approaches spin up an individual stdout pipe for each process, so why does case 1 hit the race condition but not case 2? My understanding is that Meltano reads whatever the tap emits to stdout and pipes it to the target. • Might be a silly question, but how does Meltano differentiate the log lines that are emitted from the Singer records that are emitted? • When I spin up a separate process, its stdout should be different from the main process's stdout, right? Or is it the same stdout pipe? Thanks for taking the time to read through - any help is much appreciated. Thanks and have a great day.
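    On the race condition itself: if several processes inside the tap write Singer messages to the same stdout concurrently, lines can interleave unless every message is written atomically by a single writer. A sketch of the usual workaround (generic Python, not a Meltano API; the record contents are placeholders): workers push records onto a multiprocessing.Queue and one process does all the stdout writing.
    Copy code
    import json
    import sys
    from multiprocessing import Process, Queue

    def worker(parent_key, out_queue):
        # fetch child records for one parent key (placeholder data)
        for i in range(3):
            out_queue.put({"type": "RECORD", "stream": "child", "record": {"parent": parent_key, "i": i}})
        out_queue.put(None)  # sentinel: this worker is done

    if __name__ == "__main__":
        queue = Queue()
        parents = ["p1", "p2", "p3"]
        procs = [Process(target=worker, args=(p, queue)) for p in parents]
        for p in procs:
            p.start()
        done = 0
        while done < len(procs):
            msg = queue.get()
            if msg is None:
                done += 1
                continue
            # single writer: one process serializes every message, so lines never interleave
            sys.stdout.write(json.dumps(msg) + "\n")
        for p in procs:
            p.join()
    As for telling logs and records apart: Singer taps write RECORD/SCHEMA/STATE messages to stdout and logging to stderr, which is how Meltano (and any target) distinguishes them.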
    e
    • 2
    • 6
  • a

    Anthony Shook

    04/17/2025, 5:06 PM
    Random thought problem: Let’s say I have a table with 1 billion+ rows, and for the longest time I've been replicating it on an auto-incrementing id column. However, the table is mutable at the source and has an updated_at column, which means I'm not catching changes to a row once I've pulled its at-the-moment value. So my situation is this:
    • I want to update the Meltano config from using id as my replication-key to using updated_at as my replication key, with id as a value in table-key-properties
    • I don't want to start from the beginning of time, because it's absolutely too much data to handle, so I've got to manually set a date
    So the question is — how would you go about it?
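    One possible sequence, as a sketch (the state ID, stream name, and timestamp are placeholders, and the exact bookmark payload depends on the tap, so inspect the existing state first): switch the stream metadata, then seed the bookmark manually so the next run starts from the chosen date.
    Copy code
    # 1. in meltano.yml, under the extractor:
    #      metadata:
    #        my_stream:
    #          replication-method: INCREMENTAL
    #          replication-key: updated_at
    #          table-key-properties: [id]
    # 2. inspect and overwrite the stored state
    meltano state list
    meltano state get dev:tap-mydb-to-target-warehouse
    meltano state set --force dev:tap-mydb-to-target-warehouse '{"singer_state": {"bookmarks": {"my_stream": {"replication_key": "updated_at", "replication_key_value": "2025-01-01T00:00:00Z"}}}}'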
    e
    • 2
    • 8
  • t

    Tanner Wilcox

    05/01/2025, 6:50 PM
    We need to run a show arp command on all routers at our ISP and get that data into our warehouse. Ansible is really good at communicating with network devices: it has profiles for each device type, recognizes when a command starts and ends, and parses the output for you. I don't think there's an equivalent tap for that in Meltano, so I'm wondering what the best way is to merge the two. Maybe I could make a tap that calls out to Ansible; Ansible writes what I want to a JSON file, and my Meltano tap reads that JSON and pumps it into a raw table. That seems kind of weird at that point because all my tap is doing is reading from a file. I could have Ansible write directly to my Postgres DB, but that feels like it'd be stepping on Meltano's toes. Looking for input.
    👀 1
    v
    • 2
    • 7
  • d

    Don Venardos

    05/05/2025, 10:57 PM
    What is the best practice for setting up a different sync schedule for a subset of tables in a database? Should that be a separate project, or maybe just set up as an environment in the same project? The use case is that we have a table that receives periodic large bulk inserts, and we don't want that to interfere with the other tables, which have small changes that we want to replicate on a faster interval.
    m
    • 2
    • 2
  • s

    Siba Prasad Nayak

    05/08/2025, 1:42 PM
    Hi Team, I am getting an issue with "meltano invoke tap-mysql". MySQL is installed on my local machine (localhost).
    Copy code
    - name: tap-mysql
        namespace: tap_mysql
        pip_url: ./connectors/tap-mysql
        executable: tap-mysql
        capabilities:
        - about
        - batch
        - stream-maps
        - schema-flattening
        - discover
        - catalog
        - state
        settings:
        - name: host
          kind: string
          value: localhost
        - name: port
          kind: integer
      value: 3306  # Or whatever port your MySQL is running on
        - name: user
          value: root
        - name: password
          kind: string
          value: *******  # Use an environment variable!
          sensitive: true
        - name: database
          kind: string
          value: world
        - name: is_vitess
          kind: boolean
          value: false
    Error:
    Copy code
    (sibaVenv) PS C:\Siba_\Work\POC_ConnectorFactory\Gerrit\Connector_Factory_Development\meltano-backend> meltano invoke tap-mysql
    2025-05-08T13:38:51.175173Z [warning  ] Failed to create symlink to 'meltano.exe': administrator privilege required
    2025-05-08T13:38:51.189778Z [info     ] Environment 'dev' is active   
    Need help fixing this problem? Visit http://melta.no/ for troubleshooting steps, or to
    join our friendly Slack community.
    
    Catalog discovery failed: command ['C:\\Siba_\\Work\\POC_ConnectorFactory\\Gerrit\\Connector_Factory_Development\\meltano-backend\\.meltano\\extractors\\tap-mysql\\venv\\Scripts\\tap-mysql.exe', '--config', 'C:\\Siba_\\Work\\POC_ConnectorFactory\\Gerrit\\Connector_Factory_Development\\meltano-backend\\.meltano\\run\\tap-mysql\\tap.79a26421-4773-4b39-a35d-577aa37522b8.config.json', '--discover'] returned 1 with stderr:
     Traceback (most recent call last):
      File "<frozen runpy>", line 198, in _run_module_as_main
      File "<frozen runpy>", line 88, in _run_code
      File "C:\Siba_\Work\POC_ConnectorFactory\Gerrit\Connector_Factory_Development\meltano-backend\.meltano\extractors\tap-mysql\venv\Scripts\tap-mysql.exe\__main__.py", line 7, in <module>
      File "C:\Siba_\Work\POC_ConnectorFactory\Gerrit\Connector_Factory_Development\meltano-backend\.meltano\extractors\tap-mysql\venv\Lib\site-packages\click\core.py", line 1161, in __call__
        return self.main(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "C:\Siba_\Work\POC_ConnectorFactory\Gerrit\Connector_Factory_Development\meltano-backend\.meltano\extractors\tap-mysql\venv\Lib\site-packages\click\core.py", line 1081, in main
        with self.make_context(prog_name, args, **extra) as ctx:
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "C:\Siba_\Work\POC_ConnectorFactory\Gerrit\Connector_Factory_Development\meltano-backend\.meltano\extractors\tap-mysql\venv\Lib\site-packages\click\core.py", line 949, in make_context
        self.parse_args(ctx, args)
      File "C:\Siba_\Work\POC_ConnectorFactory\Gerrit\Connector_Factory_Development\meltano-backend\.meltano\extractors\tap-mysql\venv\Lib\site-packages\click\core.py", line 1417, in parse_args
        value, args = param.handle_parse_result(ctx, opts, args)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "C:\Siba_\Work\POC_ConnectorFactory\Gerrit\Connector_Factory_Development\meltano-backend\.meltano\extractors\tap-mysql\venv\Lib\site-packages\click\core.py", line 2403, in handle_parse_result
        value = self.process_value(ctx, value)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "C:\Siba_\Work\POC_ConnectorFactory\Gerrit\Connector_Factory_Development\meltano-backend\.meltano\extractors\tap-mysql\venv\Lib\site-packages\click\core.py", line 2365, in process_value
        value = self.callback(ctx, self, value)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "C:\Siba_\Work\POC_ConnectorFactory\Gerrit\Connector_Factory_Development\meltano-backend\.meltano\extractors\tap-mysql\venv\Lib\site-packages\singer_sdk\tap_base.py", line 554, in cb_discover
        tap.run_discovery()
      File "C:\Siba_\Work\POC_ConnectorFactory\Gerrit\Connector_Factory_Development\meltano-backend\.meltano\extractors\tap-mysql\venv\Lib\site-packages\singer_sdk\tap_base.py", line 309, in run_discovery
        catalog_text = self.catalog_json_text
                       ^^^^^^^^^^^^^^^^^^^^^^
      File "C:\Siba_\Work\POC_ConnectorFactory\Gerrit\Connector_Factory_Development\meltano-backend\.meltano\extractors\tap-mysql\venv\Lib\site-packages\singer_sdk\tap_base.py", line 329, in catalog_json_text
        return dump_json(self.catalog_dict, indent=2)
                         ^^^^^^^^^^^^^^^^^
      File "C:\Siba_\Work\POC_ConnectorFactory\Gerrit\Connector_Factory_Development\meltano-backend\.meltano\extractors\tap-mysql\venv\Lib\site-packages\tap_mysql\tap.py", line 333, in catalog_dict
        result["streams"].extend(self.connector.discover_catalog_entries())
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "C:\Siba_\Work\POC_ConnectorFactory\Gerrit\Connector_Factory_Development\meltano-backend\.meltano\extractors\tap-mysql\venv\Lib\site-packages\singer_sdk\connectors\sql.py", line 998, in discover_catalog_entries
        (reflection.ObjectKind.TABLE, False),
         ^^^^^^^^^^^^^^^^^^^^^
    AttributeError: module 'sqlalchemy.engine.reflection' has no attribute 'ObjectKind'
    
    (sibaVenv) PS C:\Siba_\Work\POC_ConnectorFactory\Gerrit\Connector_Factory_Development\meltano-backend>
    Copy code
    (sibaVenv) PS C:\Siba_\Work\POC_ConnectorFactory\Gerrit\Connector_Factory_Development\meltano-backend> python -m pip show SQLAlchemy
    Name: SQLAlchemy
    Version: 2.0.39
    Summary: Database Abstraction Library
    Home-page: https://www.sqlalchemy.org
    Author: Mike Bayer
    Author-email: mike_mp@zzzcomputing.com
    License: MIT
    Location: C:\Siba_\Work\POC_ConnectorFactory\Gerrit\Connector_Factory_Development\meltano-backend\sibaVenv\Lib\site-packages
    Requires: greenlet, typing-extensions
    Required-by: alembic, meltano
    (sibaVenv) PS C:\Siba_\Work\POC_ConnectorFactory\Gerrit\Connector_Factory_Development\meltano-backend> meltano --version
    meltano, version 3.7.4
    (sibaVenv) PS C:\Siba_\Work\POC_ConnectorFactory\Gerrit\Connector_Factory_Development\meltano-backend>
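    The AttributeError points at the SQLAlchemy inside the plugin's own virtualenv, not the project venv that pip show is inspecting above: reflection.ObjectKind only exists in SQLAlchemy 2.x, so the tap-mysql venv has most likely resolved an older 1.4.x release. A sketch of how to check and rebuild it (paths taken from the traceback above):
    Copy code
    .meltano\extractors\tap-mysql\venv\Scripts\python.exe -m pip show sqlalchemy
    meltano install extractor tap-mysql --clean
    If the custom connector's own dependencies pin (or fail to constrain) SQLAlchemy below 2.0, updating that pin in ./connectors/tap-mysql and reinstalling should resolve it.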
    e
    • 2
    • 4
  • t

    Tanner Wilcox

    05/09/2025, 10:36 PM
    I need to drop my raw schema every time before running a specific tap. In my research I learned about macros. I wrote this one based on a macro I saw in a blog post:
    Copy code
    {%- macro drop_schema() -%}
        {%- set drop_query -%}
            drop schema {{ target.schema }}
        {%- endset -%}
        {% do run_query(drop_query) %}
    {%- endmacro -%}
    I'm assuming I should be able to do something like this:
    mel run dbt:run-operation:drop_schema sonar warehouse
    but I get an error saying it can't find drop_schema. I have it in
    ./macros/
    . I'm assuming I need to put it in my meltano.yml under my dbt transformer section. Maybe it should go under utilities? I'm at a loss.
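    Two things may be going on, sketched under assumptions (the transformer name dbt-postgres and the macro location are guesses): dbt only picks up macros inside its own project (in a Meltano project that is usually transform/macros/, per macro-paths in dbt_project.yml), and Meltano commands are invoked as plugin:command rather than plugin:arg:arg. So move the macro into the dbt project's macros folder and either invoke the operation directly or wrap it in a named command:
    Copy code
      transformers:
      - name: dbt-postgres
        commands:
          drop_raw_schema:
            args: run-operation drop_schema
    Then meltano invoke dbt-postgres run-operation drop_schema, or meltano run dbt-postgres:drop_raw_schema ahead of the tap/target pair in the same run.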
    r
    • 2
    • 1