xiang chen
08/07/2022, 8:22 AM
Zach Brak
08/09/2022, 3:29 PM
big_query_array
into the schema, therefore corrupting the source schema.
• There is a pending fix for this, but it has stalled because the schema change can impact existing running connections and require data resets.
• This fix also only removes big_query_array schema values from top-level objects, not from objects nested lower down.
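For anyone thinking about a workaround in the meantime: stripping big_query_array everywhere, not just at the top level, amounts to a small recursive walk. A minimal sketch (the function and schema shape are illustrative, not Airbyte's actual fix):

```python
def scrub_big_query_array(schema):
    """Recursively drop 'big_query_array' values anywhere in a
    JSON-schema-like structure, not only in top-level objects."""
    if isinstance(schema, dict):
        return {
            k: scrub_big_query_array(v)
            for k, v in schema.items()
            if v != "big_query_array"
        }
    if isinstance(schema, list):
        return [scrub_big_query_array(v) for v in schema if v != "big_query_array"]
    return schema

# Hypothetical corrupted schema fragment
corrupted = {
    "type": ["object", "big_query_array"],
    "properties": {"items": {"type": "big_query_array"}},
}
print(scrub_big_query_array(corrupted))
# → {'type': ['object'], 'properties': {'items': {}}}
```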
We also looked at how the bigquery-denormalized connector works with GCS staging:
• Read data from application
• Load to GCS as Avro
• Upload Avro to BigQuery
So this leads me to think: why would I rely on the Avro interpretation of the source schema if I feel it isn't aligning with my source schema?
Our newer approach:
• Use the GCS destination instead of the bigquery-denormalized destination, loading JSON files directly to a GCS bucket.
• Create external table definitions on BigQuery to read from the GCS bucket.
◦ The table definition options then let you either interpret or explicitly declare a schema
◦ You can also set ignore_unknown_values to true, allowing reads across changing schemas.
◦ Multiple table definitions can be made on the same source data to serve different purposes.
◦ Should allow more granular control over managing historical data.
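As a concrete sketch of the external-table step, assuming newline-delimited JSON files and hypothetical dataset/bucket names (the helper just builds the BigQuery DDL string):

```python
def external_table_ddl(dataset, table, bucket, prefix):
    """Build DDL for a BigQuery external table over JSON files in GCS.
    ignore_unknown_values lets reads succeed across changing schemas:
    fields not in the (auto-detected or declared) schema are skipped."""
    return (
        f"CREATE OR REPLACE EXTERNAL TABLE `{dataset}.{table}`\n"
        "OPTIONS (\n"
        "  format = 'NEWLINE_DELIMITED_JSON',\n"
        f"  uris = ['gs://{bucket}/{prefix}/*.jsonl'],\n"
        "  ignore_unknown_values = true\n"
        ")"
    )

print(external_table_ddl("analytics", "events_ext", "my-airbyte-bucket", "events"))
```

Because these definitions are cheap to create and drop, several of them over the same objects (e.g. with different declared schemas) can serve different purposes.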
The main challenge with this newer approach is being selective about which parts of your GCS bucket to read so that querying the objects stays efficient. To solve this I have been rapidly creating and destroying external definitions that only look at the last day or a few days of data.
This is why I would love to have the hive partition spec be an option for the upload file path, as it would solve for reading only portions of the bucket, and would effectively be a partition filter for the entire collection.
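To make the hive-partition idea concrete, here is a sketch of the dt=YYYY-MM-DD layout it implies and how it would limit reads to a slice of the bucket (bucket and stream names are hypothetical):

```python
from datetime import date, timedelta

def last_n_days_uris(bucket, stream, days, today=None):
    """URIs for a hive-style dt=YYYY-MM-DD layout covering the last
    `days` days; handing only these to an external table acts as a
    partition filter and avoids scanning the whole bucket."""
    today = today or date.today()
    return [
        f"gs://{bucket}/{stream}/dt={today - timedelta(days=n):%Y-%m-%d}/*.jsonl"
        for n in range(days)
    ]

print(last_n_days_uris("my-airbyte-bucket", "events", 2, today=date(2022, 8, 25)))
# → ['gs://my-airbyte-bucket/events/dt=2022-08-25/*.jsonl',
#    'gs://my-airbyte-bucket/events/dt=2022-08-24/*.jsonl']
```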
Appreciate any comments, questions, suggestions here.
I'll say we've had decent success in the past week and fewer load failures overall taking this approach.
Don H
08/11/2022, 5:41 PM
Jordan Fox
08/16/2022, 5:05 PM
kylashpriya NA
08/18/2022, 8:56 AM
alpha
. From this point I absolutely cannot recommend using an alpha version in production environments. Was it carefully evaluated and tested, and was the decision to use it in production deliberate? Here you can find what an alpha version is and what the risks are: Software release life cycle
Could someone help us with the above? Is it still in the alpha phase, or shall we try a stable release?
We have followed the setup documentation page: https://docs.airbyte.com/quickstart/deploy-airbyte/
Thanks in advance!
Lenin Mishra
08/19/2022, 12:26 PM
Ramesh Shanmugam
08/23/2022, 10:23 PM
Ignacio Reyna
08/24/2022, 12:28 AM
/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: IPv6 listen already enabled
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
20-envsubst-on-templates.sh: Running envsubst on /etc/nginx/templates/default.conf.template to /etc/nginx/conf.d/default.conf
/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
/docker-entrypoint.sh: Configuration complete; ready for start up
2022/08/24 00:22:26 [emerg] 1#1: unknown "is_demo" variable
nginx: [emerg] unknown "is_demo" variable
I wanted to take a look at the source code, so I went to the repo and saw that the last commit removed demo mode from the UI. Could this commit be related to my error?
Daniel Meyer
08/24/2022, 7:11 AM
Ramon Vermeulen
08/24/2022, 7:47 AM
Gopal Chand
08/24/2022, 10:50 AM
Rolex
08/25/2022, 1:20 PM
Rolex
08/25/2022, 1:21 PM
Andre Gallo
08/27/2022, 12:23 AM
Federico Cipriani Corvalan
08/31/2022, 2:20 AM
Federico Cipriani Corvalan
09/02/2022, 2:12 AM
NORMALIZATION_JOB_MAIN_CONTAINER_MEMORY_REQUEST
Is there anything about it anywhere?
Craig Condie
09/12/2022, 4:56 PM
Hakeem Olu
09/14/2022, 4:58 PM
Dominik Mall
09/15/2022, 12:02 PM
When running helm search repo <name>, it's missing the <name>/airbyte chart. The GitHub repo seems to have been updated ~2 hours ago, which I guess broke something. Is there a way I can use the previous version when doing helm repo add …?
Pedro Manuel
09/15/2022, 1:24 PM
Ihor Konovalenko
09/15/2022, 1:48 PM
eksctl
command line utility. Is there a Terraform module (or just a script) that creates everything needed to deploy Airbyte in EKS?
Andrii Zelinskyi
09/16/2022, 1:00 PM
I'm using the /v1/sources/get Airbyte API call and trying to get the values in the connectionConfiguration which in spec.json are marked as "airbyte_secret": true.
Is there any way to get the value of an airbyte_secret property via the API instead of '**********'? Or do I have to update all Source connectors by removing "airbyte_secret": true?
Don H
09/16/2022, 3:15 PM
helm install <name> airbyte/airbyte
It starts all the services, but the server fails to start with the following message.
2022-09-16 15:05:10 ERROR i.a.w.WorkerApp(main):592 - Worker app failed
java.lang.IllegalArgumentException: 'INTERNAL_API_HOST' environment variable cannot be null
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:220) ~[guava-31.0.1-jre.jar:?]
at io.airbyte.config.EnvConfigs.getEnsureEnv(EnvConfigs.java:1107) ~[io.airbyte.airbyte-config-config-models-0.40.6.jar:?]
at io.airbyte.config.EnvConfigs.getAirbyteApiHost(EnvConfigs.java:490) ~[io.airbyte.airbyte-config-config-models-0.40.6.jar:?]
at io.airbyte.workers.WorkerApiClientFactoryImpl.<init>(WorkerApiClientFactoryImpl.java:35) ~[io.airbyte-airbyte-workers-0.40.6.jar:?]
at io.airbyte.workers.WorkerApp.initializeCommonDependencies(WorkerApp.java:442) ~[io.airbyte-airbyte-workers-0.40.6.jar:?]
at io.airbyte.workers.WorkerApp.main(WorkerApp.java:578) [io.airbyte-airbyte-workers-0.40.6.jar:?]
I can see that in airbyte/charts/airbyte/templates/env-configmap.yaml it is supposed to be set with the following value:
INTERNAL_API_HOST: {{ .Release.Name }}-server-svc:{{ .Values.server.service.port }}
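For comparison, that template should render to a plain host:port string; a quick sketch of what a correctly rendered value would look like (release name and port here are examples, not your actual values):

```python
def internal_api_host(release_name, server_port):
    """Mirror the env-configmap template
    {{ .Release.Name }}-server-svc:{{ .Values.server.service.port }}
    to show the shape of a correctly rendered INTERNAL_API_HOST."""
    return f"{release_name}-server-svc:{server_port}"

print(internal_api_host("airbyte", 8001))
# → airbyte-server-svc:8001
```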
Any insight into why I am seeing this issue and how to proceed?
Thanks
Zaza Javakhishvili
09/17/2022, 12:48 AM
Albert Lie
09/18/2022, 2:03 AM
Jovan Sakovic
09/21/2022, 6:55 PM
last_modified_at
type of column, and then using the start time of the pipeline at runtime. Unfortunately, not all of our tables have a timestamp column that would allow this…
Note that these tables do have an auto-increment id
field, so technically a form of incremental extracts is possible if we would save that state somewhere. 💡
I believe we bounced around ideas about being hacky with the AWS data pipelines by fetching the last loaded id of the table from DynamoDB (or anything else for that matter) and injecting it into the Data Pipeline (DP) parameter value, but that's beyond what this DP service can do octavia rolling eyes
Bottom line Q: Ideas for dumping MySQL data into S3, reproducible across multiple DBs, and allowing for incremental updates.
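The auto-increment idea above boils down to a high-water-mark filter; a minimal sketch, with the saved id standing in for state kept in DynamoDB or elsewhere (names illustrative):

```python
def incremental_batch(rows, last_id):
    """Return (new_rows, new_last_id): the rows above the saved
    high-water mark, plus the mark to persist for the next run.
    `rows` stands in for `SELECT ... WHERE id > last_id` on MySQL."""
    new_rows = [r for r in rows if r["id"] > last_id]
    new_last_id = max((r["id"] for r in new_rows), default=last_id)
    return new_rows, new_last_id

rows = [{"id": 3}, {"id": 7}, {"id": 12}]
print(incremental_batch(rows, 5))
# → ([{'id': 7}, {'id': 12}], 12)
```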
If you’ve read this far, thanks a bunch, I appreciate your time! ♥️
Here for any additional questions, context or feedback 🙏
Göktuğ Aşcı
09/28/2022, 5:06 PM
terekete
09/29/2022, 1:17 PM
ERRO[0000] json-file logging specified but not supported. Choosing k8s-file logging instead
podman start -a airbyte-db
2022-09-29 05:45:50 INFO i.a.c.EnvConfigs(getEnvOrDefault):1096 - Using default value for environment variable SHOULD_RUN_SYNC_WORKFLOWS: 'true'
2022-09-29 05:45:50 INFO i.a.c.EnvConfigs(getEnvOrDefault):1096 - Using default value for environment variable WORKER_PLANE: 'CONTROL_PLANE'
2022-09-29 05:45:50 INFO c.z.h.HikariDataSource(<init>):80 - HikariPool-1 - Starting...
Exception in thread "main" java.lang.RuntimeException: Driver org.postgresql.Driver claims to not accept jdbcUrl, ${CONFIG_DATABASE_URL:-}
at com.zaxxer.hikari.util.DriverDataSource.<init>(DriverDataSource.java:110)
at com.zaxxer.hikari.pool.PoolBase.initializeDataSource(PoolBase.java:326)
at com.zaxxer.hikari.pool.PoolBase.<init>(PoolBase.java:112)
at com.zaxxer.hikari.pool.HikariPool.<init>(HikariPool.java:93)
at com.zaxxer.hikari.HikariDataSource.<init>(HikariDataSource.java:81)
at io.airbyte.db.factory.DataSourceFactory$DataSourceBuilder.build(DataSourceFactory.java:304)
at io.airbyte.db.factory.DataSourceFactory.create(DataSourceFactory.java:40)
at io.airbyte.bootloader.BootloaderApp.main(BootloaderApp.java:224)
exit code: 1
ERRO[0000] json-file logging specified but not supported. Choosing k8s-file logging instead
Don H
09/29/2022, 7:56 PM
helm install airbyte-helm airbyte/airbyte --set webapp.service.type=NodePort
This "works", but it requires me to keep track of the node's DNS name, and if that node is replaced by another in the cluster I don't think it would work anymore.
curl --location --request POST 'ip-xx-x-x-xx.ec2.internal:30334/api/v1/workspaces/list'
I am new to K8s and tried to use ingress as an option to see what the results were.
helm install --values ../airbyte/charts/airbyte/test-values.yaml airbyte-helm airbyte/airbyte
where test-values.yaml looked like this
webapp:
ingress:
enabled: true
className: ""
annotations: {}
hosts:
- host: chart-example.local
paths:
- path: /
pathType: ImplementationSpecific
tls: []
However, I received the following error (looks like an issue in ingress.yaml):
Error: INSTALLATION FAILED: template: airbyte/charts/webapp/templates/ingress.yaml:53:33: executing "airbyte/charts/webapp/templates/ingress.yaml" at <.Release.Name>: nil pointer evaluating interface {}.Name
I will be deploying this cluster via AWS CDK and need to know how to reach the webapp at deployment time, since I will pass the hostname down to other CDK stacks. Using NodePort works because I can get the node's address from the cluster, but my concern about nodes changing in the future still stands. If I use a load balancer, I need to know the hostname before it is deployed; the service type LoadBalancer may work, but it generates the hostname when helm runs, so I would need to query the cluster to figure it out.
How do you recommend that I expose that service?
Thanks in advance, I know there is a lot to that question.
tanuj soni
09/30/2022, 9:26 AM