Meltano #getting-started

Chad Bell

03/06/2025, 3:56 AM

Hi 👋 looking into Meltano for our ingestion from GCP Cloud SQL into BigQuery. Running:

meltano run tap-postgres target-bigquery

Is there a way to load the data into bigquery columns directly, instead of one json "data" column?

✅ 1

Allan Whatmough

03/07/2025, 5:38 AM

I haven't used Meltano for a while but I'm trying to get back to it now - I have a CI pipeline I built a long time ago but I've noticed it's giving me this error when trying to install Meltano:

No matching distribution found for meltano==2.10.0

Was this version yanked? I can still see it in PyPI

Juan Pablo Herrera

03/12/2025, 10:41 PM

Hi all, new meltano user here. I am running into a "BrokenPipeError", and im not sure why. I have a csv file in my desktop and I trying to store in a parquet file. Here is also my meltano.yml. I thought it could be file size but right now my file has 10k rows. Thank you!

Copy code

version: 1
default_environment: dev
project_id: 2fc4aa94-ed4d-49cd-9b6b-c1644bf4608e
environments:
- name: dev
- name: staging
- name: prod
plugins:
  extractors:
  - name: tap-spreadsheets-anywhere
    variant: ets
    pip_url: git+<https://github.com/ets/tap-spreadsheets-anywhere.git>
    config:
      tables:
      - path: 'file:///Users/juanherrera/Desktop/subway-monthly-data'
        name: 'subway_monthly_data'
        pattern: 'MTA_Subway_Hourly_Ridership_small.csv'
        start_date: '2025-03-12T15:30:00Z'
        prefer_schema_as_string: true
        key_properties: ['id']
        format: csv

  loaders:
  - name: target-parquet
    variant: automattic
    pip_url: git+<https://github.com/Automattic/target-parquet.git>
    config:
        destination_path: data/subway_data
        compression_method: snappy
        logging_level: info
        disable_collection: true

✅ 1

Nivetha

03/17/2025, 7:32 PM

Hi, I'm new to Meltano and data engg in general, wondering if there's something I'm doing wrong here or if it's just that Hubspot private apps are not yet supported in Meltano (https://github.com/singer-io/tap-hubspot/issues/211). I'm trying to configure the tap-hubspot extractor with my Hubspot private app details. I have the client_id set to my Hubspot private app access token, client_secret set to the client secret, and a redirect_url set as well. When I test the configuration, I keep getting the error "Exception: Config is missing required keys: ['refresh_token']". However, there is no refresh token available that I can see in my Hubspot private app. Is there a way to disable this requirement? I see a section called "overriding discoverable plugin properties" in this documentation (https://docs.meltano.com/guide/configuration) but unsure if that applies here. Thanks for any help you can provide.

Oren Teich

03/20/2025, 4:21 AM

What's the suggested approach for handling transformations that are beyond DBT's capabilities? I've got a postgres warehouse, loading CSV data in. I need to do fuzzy matching on company name (e.g. ACME Corp. vs Acme Co. vs Acme). It's brutal in SQL/DBT. Ideally I could write a python script to do it. I see that there are plugins that are 'strongly discouraged'. Is there a recommended approach? Worst case, I'll write a custom script that is outside the meltano pipeline, but was hoping for something that might be more integrated, for example to help deal with incremental updates.

Alejandro Rodriguez

03/20/2025, 2:04 PM

Hi, I’m trying to set up replication from Cloud SQL MySQL to BigQuery and I’m having an issue with tap-mysql not using state. The replication works fine except it always starts from scratch due to not picking up on the state the previous run generated. It does generate the state well when a run completes, but on the next run I always see the following two lines:

2025-03-20T13:59:43.766320Z [debug    ] Could not find state.json in /projects/.meltano/extractors/tap-mysql/state.json, skipping.

2025-03-20T13:59:43.793158Z [warning  ] No state was found, complete import.

and then every table says it requires a full resync. Even when I manually copy the state to the location in the first log line, it doesn’t pick it up. Any ideas?

Don Venardos

04/22/2025, 1:03 AM

I am having issues with tap-mssql variant SpaceCondor using log based replication (Change Tracking not CDC): https://hub.meltano.com/extractors/tap-mssql--spacecondor/ I have Change Tracking configured in SQL Server but am getting a full replication on each run when using meltano run (switched to target-jsonl from target-snowflake for debugging):

meltano run tap-mssql target-jsonl

The state gets updated with each run:

{"bookmarks": {"dbo-c_logical_field_user_values": {}}}

Extractor config:

extractors:

- name: tap-mssql

config:

host: PROJECT01

port: 60065

database: rss_test

username: svcTestAccount

default_replication_method: LOG_BASED

sqlalchemy_url_query_options:

- key: driver

value: ODBC Driver 18 for SQL Server

- key: TrustServerCertificate

value: yes

select:

- dbo-c_logical_field_user_values.*

I think this might be a configuration issue, not sure but perhaps isn't picking up the default replication method?

{"event": "Visiting CatalogNode.STREAM at '.streams[352]'.", "level": "debug", "timestamp": "2025-04-22T00:34:06.910101Z"}

{"event": "Setting '.streams[352].selected' to 'False'", "level": "debug", "timestamp": "2025-04-22T00:34:06.910162Z"}

{"event": "Setting '.streams[352].selected' to 'True'", "level": "debug", "timestamp": "2025-04-22T00:34:06.910211Z"}

{"event": "Skipping node at '.streams[352].tap_stream_id'", "level": "debug", "timestamp": "2025-04-22T00:34:06.910259Z"}

{"event": "Skipping node at '.streams[352].table_name'", "level": "debug", "timestamp": "2025-04-22T00:34:06.910306Z"}

{"event": "Skipping node at '.streams[352].replication_method'", "level": "debug", "timestamp": "2025-04-22T00:34:06.910354Z"}

{"event": "Skipping node at '.streams[352].key_properties[0]'", "level": "debug", "timestamp": "2025-04-22T00:34:06.910402Z"}

{"event": "Visiting CatalogNode.PROPERTY at '.streams[352].schema.properties.logical_field_sid'.", "level": "debug", "timestamp": "2025-04-22T00:34:06.910457Z"}

{"event": "Visiting CatalogNode.PROPERTY at '.streams[352].schema.properties.enabled_flag'.", "level": "debug", "timestamp": "2025-04-22T00:34:06.910513Z"}

{"event": "Skipping node at '.streams[352].schema.properties.enabled_flag.maxLength'", "level": "debug", "timestamp": "2025-04-22T00:34:06.910604Z"}

{"event": "Visiting CatalogNode.PROPERTY at '.streams[352].schema.properties.modified_by_user_sid'.", "level": "debug", "timestamp": "2025-04-22T00:34:06.910779Z"}

{"event": "Visiting CatalogNode.PROPERTY at '.streams[352].schema.properties.modified_datetime'.", "level": "debug", "timestamp": "2025-04-22T00:34:06.910924Z"}

{"event": "Skipping node at '.streams[352].schema.properties.modified_datetime.format'", "level": "debug", "timestamp": "2025-04-22T00:34:06.910988Z"}

{"event": "Visiting CatalogNode.PROPERTY at '.streams[352].schema.properties.timestamp'.", "level": "debug", "timestamp": "2025-04-22T00:34:06.911099Z"}

{"event": "Visiting CatalogNode.PROPERTY at '.streams[352].schema.properties.system_modified_datetime'.", "level": "debug", "timestamp": "2025-04-22T00:34:06.911160Z"}

{"event": "Skipping node at '.streams[352].schema.properties.system_modified_datetime.format'", "level": "debug", "timestamp": "2025-04-22T00:34:06.911212Z"}

{"event": "Skipping node at '.streams[352].schema.type'", "level": "debug", "timestamp": "2025-04-22T00:34:06.911262Z"}

{"event": "Skipping node at '.streams[352].schema.required[0]'", "level": "debug", "timestamp": "2025-04-22T00:34:06.911312Z"}

{"event": "Skipping node at '.streams[352].schema.$schema'", "level": "debug", "timestamp": "2025-04-22T00:34:06.911361Z"}

{"event": "Skipping node at '.streams[352].is_view'", "level": "debug", "timestamp": "2025-04-22T00:34:06.911410Z"}

{"event": "Skipping node at '.streams[352].stream'", "level": "debug", "timestamp": "2025-04-22T00:34:06.911458Z"}

{"event": "Visiting CatalogNode.METADATA at '.streams[352].metadata[0]'.", "level": "debug", "timestamp": "2025-04-22T00:34:06.911509Z"}

{"event": "Visiting metadata node for tap_stream_id 'dbo-c_logical_field_user_values', breadcrumb '['properties', 'logical_field_sid']'", "level": "debug", "timestamp": "2025-04-22T00:34:06.911558Z"}

{"event": "Setting '.streams[352].metadata[0].metadata.selected' to 'False'", "level": "debug", "timestamp": "2025-04-22T00:34:06.911616Z"}

{"event": "Setting '.streams[352].metadata[0].metadata.selected' to 'True'", "level": "debug", "timestamp": "2025-04-22T00:34:06.911665Z"}

Anyone have suggestions on troubleshooting? No errors like the previous question about not finding the state. SQL Server tables have Change Tracking enabled in SQL Server as:

ALTER TABLE dbo.' + @ls_table_name + N'

ENABLE CHANGE_TRACKING

WITH (TRACK_COLUMNS_UPDATED = OFF);

jack yang

04/24/2025, 2:15 AM

Does Meltano support Change Data Capture (CDC) functionality for MySQL?

Rafael Rotter

04/28/2025, 6:47 PM

Hello! I'm starting out in data engineering and I need to integrate a MongoDB database with BigQuery. I found Meltano with a solution for this, but I'm having problems; when I try to test the connection (

meltano config tap-mongodb test

) I get the message:

Copy code

m-meltano:~/prj-mdb-gbq$ meltano config tap-mongodb test
2025-04-28T17:50:03.990046Z [info     ] The default environment 'dev' will be ignored for `meltano config`. To configure a specific environment, please use the option `--environment=<environment name>`.
2025-04-28T18:03:11.496374Z [warning  ] Stream `classe` was not found in the catalog
Need help fixing this problem? Visit <http://melta.no/> for troubleshooting steps, or to join our friendly Slack community.
Plugin configuration is invalid
No RECORD or BATCH message received. Verify that at least one stream is selected using 'meltano select tap-mongodb --list'.

The meltano.yml looks like this:

Copy code

version: 1
default_environment: dev
project_id: c1ac854b-545d
environments:
- name: dev
plugins:
  extractors:
  - name: tap-mongodb
    variant: z3z1ma
    pip_url: git+<https://github.com/z3z1ma/tap-mongodb.git>
    config:
      mongo:
        host: 12.34.5.678
        port: 27017
        directConnection: true
        readPreference: primary
        username: datalake
        password: ****
        authSource: db
        tls: false
      strategy: infer
    select:
    - classe.*
    metadata:
      dbprocapi_classe:
        replication_key: replication_key
        replication-method: LOG_BASED

For testing purposes I am trying to load only the "classe" collection (- classe.*) from the db database. When I use the command "`meltano select tap-mongodb --list --all`" I have :

Copy code

Enabled patterns: classe.*

but also appears in

Copy code

[excluded   ] db_classe.field1
[excluded   ] db_classe.field2
[excluded   ] db_classe.field3

It is important to note that MongoDB does not have replicas. I'm using: • a VM on Google Cloud to access MongoDB, both on the same network; • the tap-mongodb extractor (z3z1ma). Could someone please help me? Thank you.

✅ 1

Tanner Wilcox

04/29/2025, 8:37 PM

We have a bunch of network devices all running the same software. I need to query each of them with the same extractor. The queries will all be the same and the schemas will also all be the same. What's the correct way to do that? I'm planning on using the rest tap but can write my own if that's cleaner. Ideally I would be able to supply a list of hosts to reach out to based on the response from an initial api call to get a list of hosts

Jordan Lee

04/30/2025, 1:29 AM

The containerization documentation (https://docs.meltano.com/guide/containerization/) recommends

meltano add files files-docker-compose

, but this adds a broken

docker-compose.yml

definition that doesn't start, throwing

Error: No such command 'ui'.

Steven Searcy

05/02/2025, 3:17 PM

Hello! I am new to Meltano and I’m working on a pipeline using tap-csv to ingest a CSV file and would like to load its data into multiple Postgres tables, depending on column mappings. Has anyone done this with Meltano before? Curious if you’d recommend using stream maps, custom plugins, or something like dbt for post-load splitting. Any best practices or patterns would be greatly appreciated!

Rafael Rotter

05/09/2025, 2:41 PM

Hello! I'm using

target-bigqeury

(z3z1ma) to receive data from MongoDB into BigQuery. I managed to send some collections to the target (not all), but some questions arose. If you could help me when you can, please, I would appreciate it: 1. How can I specify in

target-bigquery

some tables in BigQuery that should be partitioned by field X and clustered by Y, Z? 2. Why are two tables created in BigQuery: one with the execution time suffix, with data, and another without suffix and without data? Is a new table created with each load? (attached file) 3. I would like to confirm: normally there is no change in the MongoDB schema, but it can occur in case of an update. I am using denormalized: true. In case of a change, this can impact the load, correct? 4. The last error I got was "`ParseError: null is not allowed to be used as an element in a repeated field at processo.prioridade[0]`". Is it possible to handle this in stream-maps? Thanks!

Christian Hilden

05/12/2025, 9:11 AM

Hello, I am trying to setup the rest-api tap. But, there seems to be a problem either with auth or my api url. Is there anyway to see which requests are beeing run by the plugin during config test (config tap-rest-api-msdk test)?

Florian Bergmann

05/12/2025, 11:45 AM

Hi all, I have a question concerning the use_singer_decimal property. My pipeline is from tap-oracle (variant: s7clarke10) to target-snowflake (variant: meltanolabs). The source column MYCOLUMN is Number(20,2) in the Oracle DB. My extractor's properties look like this:

Copy code

plugins:
  extractors:
  - name: tap-oracle
    variant: s7clarke10
    pip_url: git+<https://github.com/s7clarke10/pipelinewise-tap-oracle.git>
    config:
      ...
      use_singer_decimal: true
    select:
    - TEST-TEST_TABLE.*
    schema:
      TEST-TEST_TABLE:
        MYCOLUMN:
          type: [string, 'null']
          format: x-singer.decimal
          precision: 20
          scale: 2

- The column MYCOLUMN is extracted in the JSON as expected: ...,"MYCOLUMN":"1234.56",... - However, in Snowflake the column MYCOLUMN is created as NUMBER(38,0) instead of NUMBER(20,2) and inserted as the rounded value '1235' - In case I create the target table in advance as NUMBER(20,2), the inserted value looks like '1235.00' What am I missing here / doing wrong?

Rafael Rotter

05/15/2025, 7:13 PM

Hi everyone! I would like to confirm some informations, please: 1. _Does tap-mongodb (

z3z1ma

) support the LOG_BASED replication-method?_ 2. If so, then the MongoDB database needs a replica set, right? Thanks!

Dries Beheydt

05/20/2025, 8:11 PM

Hi Melty Crew, I wanted to set up a toy example of a dbt-meltano-dagster combo. I used this awesome starting guide: https://medium.com/@kenokumura/how-to-orchestrate-dbt-with-dagster-in-multi-containers-on-docker-ebf0d171a3a9 and now added meltano and dagster-meltano to the Dockerfile_user_code, mounted my meltano project, and added a "load_jobs_from_meltano_project" to repo.py. Doing only dagster+meltano or dagster+dbt this way works, but together it breaks (something wrong with snowplow-tracker...), which I suspect is a version thing. To get started, which version of these packages should be compatible here? I trial-errored some combinations but got nowhere. Thanks a lot!!

hammad_khan

05/22/2025, 3:09 PM

Hey there, I am using tap-salesforce to pull data, first time full load and then incremental. There are situations when I need to pull data only latest 10 records from Contact or Account. Is it possible to do in MeltanoLab variant?

Copy code

plugins:
  extractors:
  - name: tap-salesforce
    variant: meltanolabs
    pip_url: git+<https://github.com/MeltanoLabs/tap-salesforce.git@v1.9.0>
    config:
      client_id: xxxxx
      max_workers: 8
      api_type: REST
      instance_url: <https://xx.xx.salesforce.com>
    select:
    - Account.*

Miroslav Nedyalkov

06/19/2025, 8:23 AM

Hey, everyone. I’m trying to move some data from MongoDB to Snowflake. We already have a lot of dbt transformations running on Snoflake, so I’m just re-implementing our EL part of the ELT. I’ve decided to use the MongoDB Tap by MeltanoLabs and it successfully connected to my MongoDB database. I’m not sure, though, how am I supposed to: 1. Define which collections (and ideally which of their fields) to get moved to Snowflake (I figured I can use

select

in the

meltano.yml

, but not sure if that’s the right way to do it) 2. Define what strategy to use to sync the data - I’d like to generally use LOG_BASED, but need to do a complete sync first. It sees this particular tap doesn’t support FULL_SYNC. I also couldn’t get

LOG_BASED

to work (I set it up with

metadata

section in the

meltano.yml

file, where I set

under which

replication-key: replication_key

and

replication-method: LOG_BASED

) - I get

resume token string was not a valid hex string

which I assume is because I’ve never done a full sync, but when I try to do it, I get an error this tap doesn’t work with full sync. My use case sounds rather standard - I just need to move data from MongoDB (limited set of collections) => Snowflake tracking changes, but without applying any transformations nor parsing. I need initial full import, and log-based import after that, but unfortunately I couldn’t get it to work…

Vinay Mahindrakar

06/30/2025, 8:24 AM

Hi Everyone, Is it possible to store

select:

and

metadata:

in a separate file instead of including them directly in the

meltano.yml

file?

Copy code

plugins:
  extractors:
  - name: tap-postgres
    variant: meltanolabs
    pip_url: meltanolabs-tap-postgres
    config:
      host: localhost
      port: 5432
      user: postgres
      password: '123123'
      database: postgres
      filter_schemas: [public]
    select:
    - public-work_orders.*
    - public-trips.*
    metadata:
      public-work_orders:
        replication_method: "INCREMENTAL"
        replication_key: created_at
      public-trips:
        replication_method: "INCREMENTAL"
        replication_key: created_at

Jonathan Nunes

07/03/2025, 9:22 PM

plugins: extractors: - name: tap-mysql variant: transferwise pip_url: git+https://github.com/transferwise/pipelinewise.git#subdirectory=singer-connectors/tap-mysql config: host: $MYSQL_HOST port: $MYSQL_PORT user: $MYSQL_USER password: $MYSQL_PASSWORD database: $MYSQL_DATABASE use_gtid: true filter_dbs: $MYSQL_DATABASE how can I configure so the tap just get the selected database and selected tables that I want to extract

✅ 1

Jonathan Nunes

07/04/2025, 12:21 PM

any of you guys know about this error i'm having when using target-redshift --variant ticketswap? and how to solve it? already checked if my loader is up to date 2025-07-04T121112.175516Z [error ] Loading failed code=1 message="TypeError: prepare_column() missing 1 required positional argument: 'cursor'" name=meltano run_id=f7f213ef-0e5e-4dd2-87c2-45f83e0f983f state_id=2025-07-04T120759--tap-mysql--target-redshift

✅ 1

Jonathan Nunes

07/07/2025, 1:18 PM

Is there any documentation on how to config your catalog.json after I run a --discover ? I'm using tap-mysql target-redshift both from transferwise, but I'm having some issues with log_based load method, since every run it runs discover again and it seems to do a full load of the table again, instead of just updating based on the binlog extractors: - name: tap-mysql variant: transferwise pip_url: git+https://github.com/transferwise/pipelinewise.git#subdirectory=singer-connectors/tap-mysql config: host: $MYSQL_HOST port: $MYSQL_PORT user: $MYSQL_USER password: $MYSQL_PASSWORD database: $MYSQL_DATABASE filter_dbs: $MYSQL_DATABASE use_gtid: true select: /*tables ##### */

✅ 1

Jonathan Nunes

07/10/2025, 3:12 PM

Hi guys, I have a quick question regarding the best way to handle multiple taps in Meltano, each targeting around 1,000 tables. These tables are filtered from a source database that contains over 5,000 tables. Due to the volume, it's not feasible to list all 1,000 tables under the

select

parameter in the

meltano.yml

file. On the other hand, if I use a custom

catalog.json

, Meltano seems to ignore the tap configuration defined in

meltano.yml

. What would be the recommended approach in this scenario? If there are any documented use cases or best practices for managing large-scale tap configurations like this, I’d really appreciate it if you could share them.

Lennart

07/16/2025, 4:04 PM

Hi all, I've just started using meltano and testing, but I'm at a loss. I'm using https://github.com/MeltanoLabs/tap-salesforce and been able to set this up and get it to work, but not the way I expect it to and can't figure out why. Please see my meltano yml for this tap below, using this as is without a catalog results into an empty run as if nothing is selected. If I point it to a catalog by uncommenting it, it tells me that it doesn't have the capabilities for which reason I added it manually. Running a simple el still gives me nothing and nothing is selected from the catalog. Only when I point it at a different catalog I made named filtered_tap-salesforce.catalog.json and run el it gives me an output, but not as expected as it outputs both Account and Opportunity. Difference between the 2 catalogs is that the filtered only includes Account and Opportunity and in the metadata for each property "selected": true is added, whereas the other catalog doesn't have this. I'm very confused why only this setup will let me extract data as my understanding is that I should be able to use the select block to set this and have meltano deal with the discovery, but it doesn't select anything if I run it this way. And even when I use a overall catalog, why does it ignore the select block and return with nothing. Would like to be able to run the tap without having to define for all 1000+ objects and many more properties "selected": true and be able to handle new fields. Thanks!

Copy code

# --- Salesforce ---
  - name: tap-salesforce
    namespace: tap_salesforce_meltanolabs
    variant: meltanolabs
    pip_url: git+<https://github.com/MeltanoLabs/tap-salesforce.git@v1.9.0>
    capabilities:
    - catalog
    - discover
    - state
    #catalog: catalogs/tap-salesforce.catalog.json
    config:
      username: ${SF_USERNAME}
      password: ${SF_PASSWORD}
      security_token: ${SF_SECURITY_TOKEN}
      api_type: REST
      start_date: '2025-07-01T00:00:00Z'
      select_fields_by_default: true
      #streams_to_discover:
      #- Account
      #- Opportunity
      select:
      #- Account.*
      - Opportunity.*

✅ 1

Lazaros Panitsidis

08/13/2025, 11:31 AM

Hello everyone, I’m a junior data engineer doing a POC to see if Meltano can be used for our bronze layer EL (extract + load raw data into Databricks tables/S3) while taking advantage of Spark and distributed compute. Main things I’m trying to find out: 1. Has anyone successfully run Meltano inside Databricks Jobs/Clusters? Was it stable and easy to maintain? 2. Has anyone packaged a Meltano project as a Python wheel and installed it on Databricks clusters? How did you handle dependencies/plugins? 3. Is Databricks Container Service a good option for Meltano, or is there a simpler, proven approach? 4. For orchestration, is Airflow still better, or can Databricks-native workflows handle Meltano well? Thanks in advance!

Cristiano Motta

08/23/2025, 6:59 PM

Hi everyone 👋 Does anyone know how I can configure the target-postgres so that it truncates a table as the first step of a data load? Thanks in advance!

Matěj Novák

09/17/2025, 12:33 PM

Hi, does anyone have experience running Meltano on GCP or on Snowflake?

Matěj Novák

09/17/2025, 12:40 PM

Also, is there somewhere explained how does Meltano work? Does it load everything into memory and then pushes everything into the target? Is it extracting and loading partially?

hawkar_mahmod

11/19/2025, 4:12 PM

how can i setup two variants of the same plugin (a target), in the same project? The docs explain how to configure two of the same plugin, with the same variant, but not the other way round, it seems to me.