# docker

    jan_soubusta

    02/01/2024, 11:05 PM
    Hey, I am building custom images for Meltano, dbt, and the GoodData client. The Meltano build takes much longer than the others, and the image is at least 2x bigger. I'm thinking about creating an optimized Dockerfile that builds a base Meltano image (without plugins, ...). Could you point me to the Dockerfile you currently use to build the images you push to Dockerhub? Have you tried using a smaller base (OS) image?
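    (A hedged sketch of one common way to shrink such an image: a multi-stage build that installs Meltano into a venv in a builder stage and ships only that venv on a slim base. Versions and paths below are illustrative, not the official Meltano Dockerfile.)

```dockerfile
# Builder: the compiler toolchain lives only in this stage
FROM python:3.11-slim AS builder
RUN apt-get update && apt-get install -y --no-install-recommends build-essential git \
    && rm -rf /var/lib/apt/lists/*
RUN python -m venv /venv
# Version pin is illustrative
RUN /venv/bin/pip install --no-cache-dir meltano==3.3.1

# Runtime: only the venv is copied over, no compilers
FROM python:3.11-slim
COPY --from=builder /venv /venv
ENV PATH="/venv/bin:$PATH"
WORKDIR /project
ENTRYPOINT ["meltano"]
```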

    Andy Carter

    02/12/2024, 3:51 PM
    Moving over to deploying my docker image via Azure Pipelines (rather than running docker locally): it emerges that my `dbt_packages` dir was not getting rebuilt and was just picking up my local copy. How can I run a `dbt deps` command in my Dockerfile if dbt is installed as part of my dagster utility env? Here's the relevant section of my `meltano.yml`:
    utilities:
      - name: dagster
        variant: quantile-development
        pip_url: dagster-ext dagster-postgres dagster-dbt dbt-postgres dagster-azure pendulum==2.1.2 dagster_msteams
        settings:
        - name: dagster_home
          env: DAGSTER_HOME
          value: $MELTANO_PROJECT_ROOT/orchestrate/dagster
        commands:
          dev:
            args: dev -f $REPOSITORY_DIR/repository.py --dagit-host 0.0.0.0 -d $REPOSITORY_DIR
            executable: dagster_invoker
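    (One hedged option for the question above: since dbt lives inside the dagster utility's venv, call that venv's dbt binary directly during the image build. The path assumes Meltano's default `.meltano/utilities/<name>/venv` layout; the dbt project/profiles directories are illustrative.)

```dockerfile
# Install plugins first, then resolve dbt packages inside the image so
# dbt_packages is rebuilt rather than inherited from a local copy
RUN meltano install \
    && .meltano/utilities/dagster/venv/bin/dbt deps \
       --project-dir transform --profiles-dir transform/profiles
```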

    choudary kukkapalli

    02/12/2024, 8:14 PM
    πŸ‘‹ Hello, team! I am trying to install Meltano in a docker container on my work PC, using a custom CA certificate issued by Zscaler. However, I get an "SSL verify failed" error. I'd appreciate it if anyone could point me to a blog on resolving this. My requirements file contains `meltano==3.3.1`.
    FROM python:3.11-slim
    
    LABEL maintainer="WFS Corp"
    
    # Set working directory
    WORKDIR /opt
    
    # Install OS dependencies
    RUN apt-get update && \
        apt-get install -y build-essential freetds-bin freetds-dev git libkrb5-dev libssl-dev tdsodbc unixodbc unixodbc-dev && \
        rm -rf /var/cache/apt/archives /var/lib/apt/lists/*
    
    # ssl cert
    COPY certs/zscaler.pem /usr/local/share/ca-certificates/zscaler.crt
    RUN update-ca-certificates --fresh
    
    #ENV HTTP_PROXY=
    #ENV HTTPS_PROXY=
    ENV REQUESTS_CA_BUNDLE=/usr/local/share/ca-certificates/zscaler.crt
    ENV CURL_CA_BUNDLE=/usr/local/share/ca-certificates/zscaler.crt
    
    # Make sure we are using latest pip
    RUN pip install --upgrade pip wheel
    
    # Copy requirements.txt (COPY cannot reach outside the build context, so ../ would fail)
    COPY requirements.txt requirements.txt
    
    # Install dependencies
    RUN pip install -r requirements.txt
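    (A hedged tweak for the Dockerfile above: `update-ca-certificates` merges custom certs into the system bundle at /etc/ssl/certs/ca-certificates.crt, so pointing Python tooling at that merged bundle, rather than at the single Zscaler cert, keeps verification working both for Zscaler-intercepted hosts and for everything else. pip reads `PIP_CERT`; OpenSSL-based clients read `SSL_CERT_FILE`.)

```dockerfile
# Use the merged system bundle produced by update-ca-certificates
ENV REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt \
    CURL_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt \
    SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt \
    PIP_CERT=/etc/ssl/certs/ca-certificates.crt
```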

    Ian OLeary

    03/12/2024, 6:40 PM
    How do you guys set environment variables in your container? Do you pull them from your .env? Build them into your container as part of your CI/CD pipeline? Do you pull them from a key vault? What are some best practices to follow? Locally I just have them set in my .env in plaintext - is that fine?
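    (One common pattern, sketched with illustrative names: keep secrets out of the image entirely and inject them at run time, e.g. via a gitignored `.env` locally and a vault/secret manager in CI/CD.)

```yaml
# docker-compose.yml sketch: nothing secret is baked into the image
services:
  meltano:
    image: my-meltano-project:latest   # illustrative
    env_file: .env                     # gitignored; CI/CD would source a vault instead
    environment:
      MELTANO_ENVIRONMENT: prod
```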

    yao_zhang

    03/19/2024, 11:34 PM
    Hi, we're trying to upgrade our Meltano docker image to `meltano/meltano:v3.3-python3.10`, and we have a plugin that depends on `python3.9` while the rest are compatible with `3.10`. I want to structure the Dockerfile so that it supports multiple Python versions and runs `meltano install` for the plugins in different build stages with different Python versions. Can someone show me how the Dockerfile should be structured, or provide an example? https://docs.meltano.com/reference/settings/#python
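    (A hedged sketch, not a tested recipe: the plugin-level `python` setting from the settings page linked above lets a single plugin use a different interpreter, so instead of separate install stages you can bring a second interpreter into the image and point only the legacy plugin at it. Copying an interpreter between images can break on shared libraries, so treat the paths below as illustrative.)

```dockerfile
FROM python:3.9-slim AS py39            # donor stage for the 3.9 interpreter

FROM meltano/meltano:v3.3-python3.10
# Illustrative: make python3.9 available alongside 3.10
COPY --from=py39 /usr/local /opt/python3.9
ENV PATH="/opt/python3.9/bin:$PATH"
WORKDIR /project
COPY . .
# meltano.yml would pin just the one legacy plugin, e.g.:
#   - name: tap-legacy       # hypothetical plugin name
#     python: python3.9
RUN meltano install
```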

    Chris Goodell

    03/21/2024, 1:26 PM
    Hi everyone, we are using the `tap-linkedin-ads` extractor (meltanolabs implementation) to obtain LinkedIn Ads data, and I am a little curious about how the token refresh is intended to be managed. When you obtain an OAuth 2.0 access token for LinkedIn, you are also provided a refresh token for a secondary call to refresh your access token. (The access token expires after 60 days, the refresh token after 365 days.) I didn't see anything in client.py where the refresh token is actively managed, so the access token would expire in 60 days by default, unless I am missing something? Is the tap able to manage this somehow? I also looked at the other versions of this same `tap-linkedin-ads`; the Stitch Data implementation seems to have a provision for that functionality in client.py, in two functions named `refresh_access_token` and `fetch_and_set_access_token`, but I am unsure how the new access token would be stored: it would refresh the token and receive the new one, but that is self-contained within the k8s pod and won't persist. https://hub.meltano.com/extractors/tap-linkedin-ads https://github.com/MeltanoLabs/tap-linkedin-ads https://github.com/singer-io/tap-linkedin-ads
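    (Tap internals aside, the refresh itself is a standard OAuth 2.0 form POST against LinkedIn's token endpoint; a hedged sketch is below. Persisting the returned access token back into the tap's config or secret store is the part that still has to be wired up separately.)

```shell
# Exchange the refresh token for a new access token (env var names illustrative)
curl -s -X POST https://www.linkedin.com/oauth/v2/accessToken \
  -d grant_type=refresh_token \
  -d refresh_token="$LINKEDIN_REFRESH_TOKEN" \
  -d client_id="$LINKEDIN_CLIENT_ID" \
  -d client_secret="$LINKEDIN_CLIENT_SECRET"
```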

    Willi Langlitz

    03/22/2024, 11:44 AM
    Hi all, I am trying to run Meltano via docker in our GitLab CI/CD pipeline. So far I can mount my Meltano project into the container, but for some curious reason it looks like the meltano.yml is not read at all. When I run `docker run -v $(pwd)/dbmigrator:/project -w /project meltano/meltano:v1.99.0-python3.7 install`, Meltano does not install any plugins, and I have to run `docker run -v $(pwd)/dbmigrator:/project -w /project meltano/meltano:v1.99.0-python3.7 add extractor tap-mysql` instead. And when I run the elt pipeline, Meltano complains about the replication method, despite replication-method being set to FULL_TABLE in the meltano.yml.

    Willi Langlitz

    03/22/2024, 11:45 AM
    So to me it looks like the meltano.yml is not read or found properly.

    Willi Langlitz

    03/22/2024, 11:45 AM
    Does anyone have an idea what I might be doing wrong?

    Siva Achyuth

    04/02/2024, 10:57 AM
    Hi, we are creating Meltano using docker via Ansible. We are using WORKDIR /project, and the volume mount is also "project". We want to sync the files in the project volume to a custom location. We have tried using -v {custom_location}/projectz, but the files are not being copied to the custom location. Could anyone help me here?

    Fayaz Ahmed

    05/06/2024, 6:50 AM
    Hi all, just looking for a confirmation, please. While building the image with docker, I simply run `meltano add extractor tap-rest-api-msdk`. My questions: 1. Is there any way I can specify, as an argument, which version of this tap I want to install? 2. Or is the only way to specify this in the .yml, as described in https://docs.meltano.com/guide/plugin-management/#pypi-package ?
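    (For the question above: `meltano add` itself takes no version argument; the installed version comes from `pip_url`, which accepts any pip requirement specifier, so the meltano.yml route from the linked docs is the usual one. The pin below is illustrative.)

```yaml
extractors:
  - name: tap-rest-api-msdk
    pip_url: tap-rest-api-msdk==2.0.0   # illustrative pin; any pip specifier works
```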

    joshua_janicas

    05/09/2024, 7:16 PM
    Hi all, I have been banging my head against a wall trying to get Docker to start Dagster as an entrypoint via Meltano. I am currently using the https://hub.meltano.com/utilities/dagster/ utility. If I run meltano inside the container to invoke dagster and start it up, it works just fine. However, if I try `CMD` or `ENTRYPOINT`, nothing seems to happen and Docker composes without ever starting Dagster. Looking for thoughts as to what I could be doing wrong here.
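    (A hedged sketch for the entrypoint question: use exec form, and make sure nothing in docker-compose overrides `entrypoint`/`command`. The `dagster:dev` command name assumes the utility config shown earlier in this channel.)

```dockerfile
WORKDIR /project
# Exec form, so the process receives signals and its logs reach `docker compose up`
ENTRYPOINT ["meltano", "invoke", "dagster:dev"]
```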

    Andy Carter

    05/16/2024, 11:01 AM
    FYI I have been working on a Meltano deployment with Dagster and dbt in Azure for about a year now. I have tried various containerised platforms in Azure and have just migrated onto App Service, which I hope will be a 'final resting place' for our infra; it seems to have good value and high performance. If anyone has any questions, I would be happy to answer them.

    Siddu Hussain

    05/29/2024, 5:12 AM
    Hi Team, what is the best way to test that all my code works after building an image? Do you run the EL with dev/staging creds for all the pipelines you have, or is there a simpler/better approach?

    sreepriya m

    05/31/2024, 3:01 PM
    Hi, I have created a docker image of my Meltano code and deployed it in Cloud Run, where it is executed from a Python wrapper that passes parameters on the fly. But on running parallel table loads, it throws a 'duplicate plugin' error; there seems to be a clash with the meltano.yml file that both jobs are creating on the fly. If we set concurrency in Cloud Run to 1, the issue is not present, because the container instances are separate and there is no duplicate file. But with concurrency set higher than 1, the 'duplicate plugin' issue arises and the second parallel Meltano job fails. Can you guide me on how parallel execution is possible with a Meltano image deployed in a service like Cloud Run? I'd appreciate a quick response if possible.

    sreepriya m

    05/31/2024, 3:06 PM
    `meltano run tap-mssql tap-bigquery` - the 2 jobs are using the same command @Edgar RamΓ­rez (Arch.dev)

    sreepriya m

    05/31/2024, 3:09 PM
    Oh ok. Any idea how the --state-id-suffix can be made different for the two jobs? Any reference?
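    (One hedged option: feed the suffix from whatever per-job parameter the wrapper already passes, e.g. the table name, so each parallel Cloud Run invocation gets its own state ID. Variable and plugin names below are illustrative.)

```shell
# Each parallel job supplies its own suffix
meltano run --state-id-suffix="$TABLE_NAME" tap-mssql target-bigquery
```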

    Ahmed Hamid

    06/02/2024, 10:10 AM
    Hi all, I'm working on a project where I aim to leverage Docker for containerizing both Meltano and Dagster. My goal is to create a demo setup that showcases how these two technologies can work together, particularly focusing on data transformation workflows. Do any of you have experience with, or know of, existing demo projects that use Docker to run Meltano alongside Dagster? Ideally, I'm looking for setups where Meltano acts as either a source or target within Dagster pipelines, all managed through Docker containers. Any insights, resources, or even example Dockerfiles/Dagster pipelines would be incredibly helpful. This will not only help me understand best practices but also inspire my own project.

    Siddu Hussain

    06/03/2024, 7:53 PM
    Hi all, did anyone deploy using python:3.11-alpine? I started doing it and keep running into one missing package or another, first at the Meltano level, then at the tap and target level.
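    (A hedged sketch of the usual Alpine fix: musl means no manylinux wheels for some packages, so build dependencies must be installed first. The package list below is a guess at common needs, not exhaustive; a Debian `-slim` base often ends up comparable in size once build deps are counted.)

```dockerfile
FROM python:3.11-alpine
# Illustrative build deps; individual taps/targets may need more (e.g. postgresql-dev)
RUN apk add --no-cache build-base libffi-dev openssl-dev git
RUN pip install --no-cache-dir meltano
```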

    sreepriya m

    06/12/2024, 6:43 PM
    Hi, I have a scenario to load a table with close to 100 million records from MSSQL Server (on-premise) to BigQuery in GCP. I am using the tap-mssql extractor and target-bigquery loader (default variants). Can you tell me which settings/configurations need to be set in both the extractor and loader in meltano.yml to complete the data load quickly and efficiently?

    sreepriya m

    06/21/2024, 11:21 AM
    Hi, we are looking for a way to apply a table-level filter when extracting data from MSSQL Server using the tap-mssql plugin in Meltano. We tried the option below, but it is not working. Can you help provide the right method to achieve table-level filtering during extraction? (edited)
      extractors:
        - name: tap-mssql
          variant: wintersrd
          pip_url: tap-mssql
          config:
            database:
            host:
            port:
            tds_version: '7.3'
            use_date_datatype: true
            user: dev
            password: **
          select:
            - dbo-emp.*
          filter:
            empId: =3
          metadata:
            dbo-emp:
              replication-method: FULL_TABLE

    Jens Christian Hillerup

    07/14/2024, 5:49 PM
    Hi. I'm working with PII data, so to avoid a lot of paperwork I want to deploy Meltano in an environment we're already cleared for in terms of compliance audits. That means Heroku. Part of the EL workflow is to access our prod database and strip/mask the PII before loading it into another database (a new Heroku Postgres deployment with different credentials etc.). I got `tap-postgres` and `target-postgres` up and running pretty quickly, and I'm ready to try deploying to Heroku, but I'm wondering about the `.meltano` directory: besides the meltano.db SQLite database, does it contain anything that must be persisted? This docs page lists what's in the directory:
    • I can live without the log files for prod (or potentially find a way to get the logs themselves extracted and loaded somehow)
    • I suppose the `venv`s of the needed Python packages could be created at `docker build` time?
    I know Meltano supports pluggable system databases, and I'm planning on just letting Meltano have a schema in my BI database for that. Other than that, what else do I need to know for a stateless Docker deployment (on Heroku, in my case)?
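    (A hedged sketch of the stateless setup described above: bake the plugin venvs into the image with `meltano install` at build time, and point the system database at Postgres via the documented `MELTANO_DATABASE_URI` setting, so nothing under `.meltano` needs to persist. Image tag and URI are illustrative.)

```dockerfile
FROM meltano/meltano:latest
WORKDIR /project
COPY . .
# Plugin venvs land under .meltano/ inside the image at build time
RUN meltano install
# At run time (e.g. a Heroku config var), point the system db away from SQLite:
#   MELTANO_DATABASE_URI=postgresql://user:pass@host:5432/db   # illustrative
```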

    Matthew Hooson

    08/10/2024, 10:28 AM
    I am really struggling to get my docker container running using Cloud Composer and Kubernetes Engine. Locally the container runs, but in the live environment I see '[base] exec /venv/bin/meltano: exec format error'. From searching, I take this to be an architecture mismatch?
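    (That reading is usually right: "exec format error" typically means the image was built for a different CPU architecture than the node, e.g. built on an Apple Silicon laptop (arm64) but run on amd64 GKE nodes. A hedged fix is to build explicitly for the cluster's platform; the registry/image name below is illustrative.)

```shell
# Build and push an amd64 image regardless of the build machine's architecture
docker buildx build --platform linux/amd64 -t my-registry/my-meltano:latest --push .
```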

    haleemur_ali

    09/20/2024, 1:52 PM
    For folks deploying meltano on Azure App Jobs or AWS batch (or comparable service). How do you handle scaling up your container if it encounters an OOM error?

    Nghia Nguyen Truong Tri

    11/07/2024, 5:21 PM
    πŸ‘‹ Hello, team!

    Nghia Nguyen Truong Tri

    11/07/2024, 5:23 PM
    I ran a container command to mount my project: `podman run -v /Users/nghiamac/Development/dashboard_project:/project meltano/meltano init /project`. It then returned the error below. Could anyone help me?
    Creating system database...Need help fixing this problem? Visit http://melta.no/ for troubleshooting steps, or to
    join our friendly Slack community.
    
    Failed to initialize database: (sqlite3.OperationalError) disk I/O error
    [SQL: PRAGMA journal_mode=WAL]
    (Background on this error at: https://sqlalche.me/e/20/e3q8)
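    (Two hedged workarounds for the error above: SQLite's WAL journal needs file-locking and shared-memory support that some Podman host-path mounts on macOS lack, so either use a named volume or move the system database off the mount via `MELTANO_DATABASE_URI`. Flags and URI are illustrative.)

```shell
# Workaround 1: a named volume instead of a macOS host path
podman volume create meltano_project
podman run -v meltano_project:/project meltano/meltano init /project

# Workaround 2: keep the host mount, but move the system db off it
podman run -v "$PWD":/project \
  -e MELTANO_DATABASE_URI=sqlite:////tmp/meltano.db \
  meltano/meltano init /project
```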

    Abednego Santoso

    11/26/2024, 2:35 PM
    hello everyone! I tried using docker following the guidance from https://docs.meltano.com/guide/installation-guide/. My project was successfully created, and I went to my project folder. Then, when I tried to run `meltano --version`, it said `meltano: command not found`. Can you please help me?
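    (Expected with the Docker install route: `meltano` is never installed on the host, so every command has to run through the container, per the linked guide. A convenience alias, sketched below, makes that less painful.)

```shell
# No host binary exists; run meltano through the container
docker run --rm -v "$PWD":/project -w /project meltano/meltano:latest --version

# Optional alias so plain `meltano ...` works in this shell
alias meltano='docker run --rm -v "$PWD":/project -w /project meltano/meltano:latest'
```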

    Jacob Ukokobili

    01/21/2025, 2:01 PM
    Hi! It’s my first time using Meltano to perform an EL workflow, and I must admit, putting this together has been quite challenging. I’d love for this EL pipeline to be reviewed and to hear your suggestions on how it can be improved. Your recommendations would be greatly appreciated!
    version: 1
    default_environment: dev
    project_id: 751cca76-711b-46ec-8e5c-26afb7f94623
    environments:
      - name: dev
      - name: staging
      - name: prod
    plugins:
      extractors:
        - name: tap-mysql
          variant: transferwise
          pip_url: git+https://github.com/transferwise/pipelinewise.git#subdirectory=singer-connectors/tap-mysql
          config:
            database: ${TAP_MYSQL_DATABASE} 
            user: ${TAP_MYSQL_USER} 
            port: ${TAP_MYSQL_PORT} 
            host: ${TAP_MYSQL_HOST} 
          select:
            '*.*': true  # Select all tables by default
          metadata:
            '*.*':
              replication-method: INCREMENTAL  # Use INCREMENTAL replication for all tables
              replication_key: update_time    # Replace with your timestamp column
              key_properties:
                - id                         # Replace with your primary key column
            '*.*_audit':
              selected: false  # Exclude tables ending with "_audit"
    
      loaders:
      - name: target-bigquery
        variant: z3z1ma
        pip_url: git+https://github.com/z3z1ma/target-bigquery.git
        config:
          dataset: ${TARGET_BIGQUERY_DATASET}
          location: ${TARGET_BIGQUERY_LOCATION}
          project: ${TARGET_BIGQUERY_PROJECT} 
          credentials_json: ${TARGET_BIGQUERY_CREDENTIALS_JSON}
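    (One hedged refinement for review requests like this: Meltano environments can carry per-environment config overrides, so the dev/staging/prod environments declared above need not share one flat plugin config. Dataset names below are illustrative.)

```yaml
environments:
  - name: dev
    config:
      plugins:
        loaders:
          - name: target-bigquery
            config:
              dataset: raw_dev    # illustrative per-environment override
  - name: prod
    config:
      plugins:
        loaders:
          - name: target-bigquery
            config:
              dataset: raw_prod
```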

    Victor Castro

    01/23/2025, 12:46 AM
    Hi everyone, I recently started working with Meltano and Docker, but I'm running into some issues. I have an urgent project where Airflow needs to act as the orchestrator and Meltano will handle the extract and load (EL) processes. The data will be extracted from a PostgreSQL database, and the entire setup must run in Docker.
    
    I'm using the docker-compose file provided by Airflow as a template. However, when I add a Meltano container to the docker-compose configuration, the Meltano container keeps restarting (not sure if that's intended). Additionally, I'm unable to execute Meltano commands in an Airflow DAG using the BashOperator (it gives a "meltano command not found" error), which is a key requirement for my project.
    
    I've spent over 20 hours searching for a solution, but as a beginner I might be missing something fundamental. I'm feeling very lost and could really use some help! Where do I start to effectively use Docker, Airflow, and Meltano at once, while building the connection that enables me to run Meltano commands with BashOperators? I'm sorry if this is an unrelated question; I'm very lost and starting to lose hope.
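    (A hedged note on the two symptoms above: a BashOperator runs inside the Airflow worker container, which has no Meltano installed, hence "command not found"; and a Meltano service with no long-running command exits immediately, which compose reports as restarting. Common patterns are baking Meltano into the Airflow image, or invoking the Meltano container per task, e.g. with the DockerOperator from apache-airflow-providers-docker. A compose-side sketch with illustrative names:)

```yaml
services:
  meltano:
    image: my-meltano-project:latest      # illustrative image name
    command: ["tail", "-f", "/dev/null"]  # placeholder; per-task invocation is cleaner
```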

    Mario

    02/06/2025, 12:30 AM
    Hello, I've got a bit of an interesting problem that I haven't had much luck in solving. We use the `tap-braze` plugin which relies on Airbyte. Recently (as of this PR, it seems), Airbyte stopped making their PyPI package available (based on their registry), forcing Meltano to use the docker image. I currently use our existing Airflow to orchestrate Meltano jobs. A `KubernetesPodOperator` using the following arguments
      arguments=["run", "tap-braze", "target-postgres"],
    runs the image that we built which contains our project. Up until about a week ago, when said PR was merged, this was working just fine (I assume because it was using the PyPI package with no problem). Now the `tap-braze` plugin fails when launching the Airbyte docker image, since we're running our Airflow cluster on EKS. The Airbyte wrapper assumes usage of `docker` even though `OCI_RUNTIME` is being overridden. I'm running EKS 1.31, which no longer uses docker as a runtime but rather `containerd`. I've worked around this by installing `nerdctl` into my project docker image (as it should be a drop-in replacement for the docker CLI) and `ln`'ing `nerdctl` to `docker`. I've also gone ahead and mounted the following host directories to get `nerdctl` to at least be able to pull the Airbyte image:
    • /tmp
    • /run/containerd
    • /var/lib/containerd
    • /var/lib/nerdctl
    My problem is that I still get the following error message and I'm kinda lost, lol. My next guess is that the host running the Airbyte image needs to have `nerdctl` installed, but I'm hoping someone else has come across something similar and solved it in a different way.
      time="2025-02-06T000356Z" level=fatal msg="failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: fork/exec /usr/bin/nerdctl: no such file or directory: unknown"