Bassem Ben Jemaa
09/25/2022, 8:42 PM
Tiri Georgiou
09/26/2022, 7:33 AM
salesforce connector. We notice that every time we sync the Case Salesforce object there is a significant number of duplicates (i.e. per case_id we could have up to 13 duplicates) in the source table (i.e. _airbyte_raw_case). It looks like the normalization stage takes care of the deduplication in the transform stage; however, the duplication of the raw data is causing significant overhead when loading into a DWH. I was thinking of raising this as an issue on GH but want to first make sure:
1. This is not expected behaviour?
2. Could this be a quick fix and if so does anybody know where in the codebase this issue might be originating from?
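To put a number on the duplication before raising the issue, something like the following could be run over a sample of the raw rows. This is only a sketch: it assumes each raw row carries the record JSON in _airbyte_data and a timestamp in _airbyte_emitted_at, and that the case id sits under an "Id" key (that key name is a guess, and the sample rows are made up).

```python
import json
from collections import Counter


def _payload(row):
    """Return the record body of a raw row as a dict (it may be stored as a JSON string)."""
    data = row["_airbyte_data"]
    return json.loads(data) if isinstance(data, str) else data


def count_duplicates(raw_rows, id_key="Id"):
    """Map each record id that appears more than once to its number of raw rows."""
    counts = Counter(_payload(row)[id_key] for row in raw_rows)
    return {record_id: n for record_id, n in counts.items() if n > 1}


def latest_per_id(raw_rows, id_key="Id"):
    """Keep only the most recently emitted raw row for each record id."""
    latest = {}
    for row in raw_rows:
        record_id = _payload(row)[id_key]
        kept = latest.get(record_id)
        if kept is None or row["_airbyte_emitted_at"] > kept["_airbyte_emitted_at"]:
            latest[record_id] = row
    return list(latest.values())


# Made-up sample rows, only to show the shape of the input.
sample_rows = [
    {"_airbyte_ab_id": "a1", "_airbyte_emitted_at": "2022-09-25T10:00:00",
     "_airbyte_data": '{"Id": "500A", "Status": "New"}'},
    {"_airbyte_ab_id": "a2", "_airbyte_emitted_at": "2022-09-26T10:00:00",
     "_airbyte_data": '{"Id": "500A", "Status": "Closed"}'},
    {"_airbyte_ab_id": "a3", "_airbyte_emitted_at": "2022-09-26T10:00:00",
     "_airbyte_data": '{"Id": "500B", "Status": "New"}'},
]

duplicates = count_duplicates(sample_rows)
deduplicated = latest_per_id(sample_rows)
```

Here duplicates lists only the ids with more than one raw row, and deduplicated keeps the newest row per id, which is roughly what the normalization step ends up doing downstream.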
Thanks
Maykon Lopes
09/26/2022, 11:47 AM
Chirag Gupta
09/26/2022, 1:50 PM
Resford Rouzer
09/26/2022, 5:32 PM
AirbyteForumPost
09/23/2022, 4:28 PM
Caio Henrique
09/27/2022, 1:09 AM
2022-09-27 01:05:09 INFO i.a.w.p.KubeProcessFactory(create):100 - Attempting to start pod = source-mysql-check-db3ddf73-2c06-4734-b9b3-566809c430c4-0-rpvnv for airbyte/source-mysql:0.6.14 with resources io.airbyte.config.ResourceRequirements@681e40af[cpuRequest=,cpuLimit=,memoryRequest=,memoryLimit=]
Could someone help me?
Akshay Baura
09/27/2022, 8:41 AM
Sanjar Baghchehsaraee
09/27/2022, 12:32 PM
Create service account.
I started to create a service account, but in step two (optional) I can select a role (see picture). Do I need to choose a role or can I just leave it? If so, which role do I need to choose?
docs.airbyte.com
Google Analytics 4 (GA4) | Airbyte Documentation
This page guides you through the process of setting up the Google Analytics source connector.
Kyle Rosenstein
09/27/2022, 8:22 PM
{
  "_id": "<airbyte-id>",
  "active": true,
  "identifier": [
    {
      "system": "<some-link>",
      "value": "<some-value>"
    }
  ],
  "managingOrganization": {
    "identifier": {
      "system": "<some-link>",
      "value": "<some-value>"
    }
  },
  "meta": {
    "lastUpdated": "<some-date>",
    "source": "<some-link>"
  }
}
The same document in Snowflake:
{
  "_id": "<airbyte-id>",
  "active": true,
  "identifier": "",
  "managingOrganization": {
    "identifier": ""
  },
  "meta": {
    "lastUpdated": "<some-date>",
    "source": "<some-link>"
  }
}
When checking the export logs, I see:
WARN i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$5):339 - Schema validation errors found for stream <stream-name>. Error messages: [$.identifier is of an incorrect type. Expected it to be object, $.managingOrganization.identifier is of an incorrect type. Expected it to be object]
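Since the warning is about types, one thing worth checking is which JSON type each field named "identifier" actually has in a document. The helper below is a rough sketch for that; the function names are made up for illustration and it only inspects a document already loaded as a Python dict, it does not talk to DocumentDB or Airbyte.

```python
def json_type(value):
    """Name the JSON type of a Python value loaded from JSON."""
    if isinstance(value, bool):  # bool must be tested before int
        return "boolean"
    if value is None:
        return "null"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    if isinstance(value, str):
        return "string"
    if isinstance(value, (int, float)):
        return "number"
    return "unknown"


def find_field_types(doc, field_name="identifier", path="$"):
    """Return {json_path: json_type} for every field called field_name."""
    found = {}
    if isinstance(doc, dict):
        for key, value in doc.items():
            child_path = f"{path}.{key}"
            if key == field_name:
                found[child_path] = json_type(value)
            found.update(find_field_types(value, field_name, child_path))
    elif isinstance(doc, list):
        for index, item in enumerate(doc):
            found.update(find_field_types(item, field_name, f"{path}[{index}]"))
    return found


# The source document from the message above, placeholders kept as-is.
source_doc = {
    "_id": "<airbyte-id>",
    "active": True,
    "identifier": [{"system": "<some-link>", "value": "<some-value>"}],
    "managingOrganization": {
        "identifier": {"system": "<some-link>", "value": "<some-value>"}
    },
    "meta": {"lastUpdated": "<some-date>", "source": "<some-link>"},
}

source_types = find_field_types(source_doc)
```

For the sample document this reports an array at the top level and an object under managingOrganization, so the same field name carries two different types within one stream.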
For some reason, regardless of the data type, if the field is called “identifier” it is wiped and replaced with an empty string during export. I was able to double-check this by duplicating these documents in the DocDB collection but renaming the “identifier” fields to something like “org_identifier”; when I did that, the data was preserved just fine. This makes me think it is not a data type issue but rather that the word “identifier” is either throwing Airbyte off or is some DocumentDB reserved keyword. Please let me know if you have seen anything like this, have any ideas, or know of ways to cast these fields during the connection, so that I can either fix this for all existing records or get a better sense of whether this issue is coming from Airbyte or DocDB.
Craig Condie
09/27/2022, 10:59 PM
Krisjan Oldekamp
09/28/2022, 11:20 AM
Mycchaka Kleinbort
09/28/2022, 1:10 PM
Tanmay Kulkarni
09/28/2022, 1:13 PM
SELECT department, count(distinct employeeId)
FROM Employee
GROUP BY department
Then the result of this query will be dumped on the S3 target.
Bruno Agresta González
09/28/2022, 5:09 PM
Giovani Freitas
09/29/2022, 12:11 AM
SUB_BUILD=PLATFORM ./gradlew build
it starts to build, but after a few minutes it always fails:
FAILURE: Build failed with an exception.
* What went wrong:
Gradle build daemon disappeared unexpectedly (it may have been killed or may have crashed)
I have no idea how to proceed. I searched a lot about this error, but because I don't have much knowledge of Docker, I'm pretty lost. Can anyone shed some light?
Chirag Gupta
09/29/2022, 7:50 AM
Chirag Gupta
09/29/2022, 7:52 AM
Dan Siegel
09/29/2022, 12:18 PM
2022-09-29 00:11:20 INFO i.a.c.EnvConfigs(getEnvOrDefault):1096 - Using default value for environment variable AWS_ACCESS_KEY_ID: ''
2022-09-29 00:11:20 INFO i.a.c.EnvConfigs(getEnvOrDefault):1096 - Using default value for environment variable AWS_SECRET_ACCESS_KEY: ''
2022-09-29 00:11:20 INFO i.a.c.EnvConfigs(getEnvOrDefault):1096 - Using default value for environment variable SHOULD_RUN_SYNC_WORKFLOWS: 'true'
2022-09-29 00:11:20 INFO i.a.c.EnvConfigs(getEnvOrDefault):1096 - Using default value for environment variable WORKER_PLANE: 'CONTROL_PLANE'
2022-09-29 00:11:20 INFO i.a.c.EnvConfigs(getEnvOrDefault):1096 - Using default value for environment variable CONFIG_DATABASE_USER: 'airbyte'
2022-09-29 00:11:20 INFO i.a.c.EnvConfigs(getEnvOrDefault):1096 - Using default value for environment variable CONFIG_DATABASE_PASSWORD: '**********'
2022-09-29 00:11:20 INFO i.a.c.EnvConfigs(getEnvOrDefault):1096 - Using default value for environment variable CONFIG_DATABASE_URL: 'REMOVED for SLACK'
2022-09-29 00:11:21 INFO c.z.h.HikariDataSource(<init>):80 - HikariPool-1 - Starting...
2022-09-29 00:11:21 INFO c.z.h.HikariDataSource(<init>):82 - HikariPool-1 - Start completed.
2022-09-29 00:11:21 INFO c.z.h.HikariDataSource(<init>):80 - HikariPool-2 - Starting...
2022-09-29 00:11:21 INFO c.z.h.HikariDataSource(<init>):82 - HikariPool-2 - Start completed.
2022-09-29 00:11:22 WARN c.n.s.JsonMetaSchema(newValidator):278 - Unknown keyword existingJavaType - you should define your own Meta Schema. If the keyword is irrelevant for valida
2022-09-29 00:11:23 ERROR i.a.s.ServerApp(main):336 - Server failed
java.lang.IllegalArgumentException: null
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:131) ~[guava-31.0.1-jre.jar:?]
at io.airbyte.config.storage.DefaultS3ClientFactory.validateBase(DefaultS3ClientFactory.java:36) ~[io.airbyte.airbyte-config-config-models-0.40.9.jar:?]
at io.airbyte.config.storage.MinioS3ClientFactory.validate(MinioS3ClientFactory.java:33) ~[io.airbyte.airbyte-config-config-models-0.40.9.jar:?]
at io.airbyte.config.storage.MinioS3ClientFactory.<init>(MinioS3ClientFactory.java:27) ~[io.airbyte.airbyte-config-config-models-0.40.9.jar:?]
at io.airbyte.config.helpers.CloudLogs.createCloudLogClient(CloudLogs.java:48) ~[io.airbyte.airbyte-config-config-models-0.40.9.jar:?]
at io.airbyte.config.helpers.LogClientSingleton.createCloudClientIfNull(LogClientSingleton.java:164) ~[io.airbyte.airbyte-config-config-models-0.40.9.jar:?]
at io.airbyte.config.helpers.LogClientSingleton.setWorkspaceMdc(LogClientSingleton.java:151) ~[io.airbyte.airbyte-config-config-models-0.40.9.jar:?]
at io.airbyte.server.ServerApp.getServer(ServerApp.java:177) ~[io.airbyte-airbyte-server-0.40.9.jar:?]
at io.airbyte.server.ServerApp.main(ServerApp.java:333) ~[io.airbyte-airbyte-server-0.40.9.jar:?]
2022-09-29 00:11:23 INFO c.z.h.HikariDataSource(close):350 - HikariPool-1 - Shutdown initiated...
2022-09-29 00:11:23 INFO c.z.h.HikariDataSource(close):352 - HikariPool-1 - Shutdown completed.
2022-09-29 00:11:23 INFO c.z.h.HikariDataSource(close):350 - HikariPool-2 - Shutdown initiated...
2022-09-29 00:11:23 INFO c.z.h.HikariDataSource(close):352 - HikariPool-2 - Shutdown completed.
Worker Pod Log:
    ___    _      __          __
   /   |  (_)____/ /_  __  __/ /____
  / /| | / / ___/ __ \/ / / / __/ _ \
 / ___ |/ / /  / /_/ / /_/ / /_/  __/
/_/  |_/_/_/  /_.___/\__, /\__/\___/
                    /____/
: airbyte-workers :
Micronaut (v3.6.3)
2022-09-29 00:14:42 INFO i.m.c.e.DefaultEnvironment(<init>):159 - Established active environments: [k8s, cloud, ec2, control]
2022-09-29 00:14:43 INFO c.z.h.HikariDataSource(<init>):71 - HikariPool-1 - Starting...
2022-09-29 00:14:43 INFO c.z.h.HikariDataSource(<init>):73 - HikariPool-1 - Start completed.
2022-09-29 00:14:43 INFO c.z.h.HikariDataSource(<init>):71 - HikariPool-2 - Starting...
2022-09-29 00:14:43 INFO c.z.h.HikariDataSource(<init>):73 - HikariPool-2 - Start completed.
2022-09-29 00:14:43 INFO i.m.l.PropertiesLoggingLevelsConfigurer(configureLogLevelForPrefix):107 - Setting log level 'DEBUG' for logger: 'io.airbyte.bootloader'
2022-09-29 00:14:44 INFO i.a.w.c.DatabaseBeanFactory(configsDatabaseMigrationCheck):139 - Configs database configuration: removedfromslack
2022-09-29 00:14:44 WARN i.a.a.TrackingClientSingleton(get):30 - Attempting to fetch an initialized track client. Initializing a default one.
2022-09-29 00:14:44 INFO i.a.w.t.TemporalUtils(getTemporalClientWhenConnected):220 - Waiting for temporal server...
2022-09-29 00:14:44 WARN i.a.w.t.TemporalUtils(getTemporalClientWhenConnected):231 - Waiting for namespace default to be initialized in temporal...
2022-09-29 00:14:46 INFO i.t.s.WorkflowServiceStubsImpl(<init>):188 - Created GRPC client for channel: ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=1, target=airby
2022-09-29 00:14:51 INFO i.a.w.t.TemporalUtils(getTemporalClientWhenConnected):248 - Temporal namespace default initialized!
2022-09-29 00:14:51 INFO i.a.c.EnvConfigs(getEnvOrDefault):1096 - Using default value for environment variable AWS_ACCESS_KEY_ID: ''
2022-09-29 00:14:51 INFO i.a.c.EnvConfigs(getEnvOrDefault):1096 - Using default value for environment variable AWS_SECRET_ACCESS_KEY: ''
2022-09-29 00:14:51 INFO i.a.c.EnvConfigs(getEnvOrDefault):1096 - Using default value for environment variable SHOULD_RUN_SYNC_WORKFLOWS: 'true'
2022-09-29 00:14:51 INFO i.a.c.EnvConfigs(getEnvOrDefault):1096 - Using default value for environment variable WORKER_PLANE: 'CONTROL_PLANE'
2022-09-29 00:14:51 WARN i.a.m.l.MetricClientFactory(getMetricClient):46 - MetricClient has not been initialized. Must call MetricClientFactory.CreateMetricClient before using Metr
2022-09-29 00:14:52 INFO i.a.w.ApplicationInitializer(initializeCommonDependencies):174 - Initializing common worker dependencies.
2022-09-29 00:14:52 INFO i.a.c.EnvConfigs(getEnvOrDefault):1096 - Using default value for environment variable METRIC_CLIENT: ''
2022-09-29 00:14:52 INFO i.a.c.EnvConfigs(getEnvOrDefault):1096 - Using default value for environment variable METRIC_CLIENT: ''
2022-09-29 00:14:52 WARN i.a.m.l.MetricClientFactory(initialize):74 - MetricClient was not recognized or not provided. Accepted values are `datadog` or `otel`.
2022-09-29 00:14:52 ERROR i.m.r.Micronaut(handleStartupException):338 - Error starting Micronaut server: null
java.lang.IllegalArgumentException: null
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:131) ~[guava-31.1-jre.jar:?]
at io.airbyte.config.storage.DefaultS3ClientFactory.validateBase(DefaultS3ClientFactory.java:36) ~[io.airbyte.airbyte-config-config-models-0.40.9.jar:?]
at io.airbyte.config.storage.MinioS3ClientFactory.validate(MinioS3ClientFactory.java:33) ~[io.airbyte.airbyte-config-config-models-0.40.9.jar:?]
at io.airbyte.config.storage.MinioS3ClientFactory.<init>(MinioS3ClientFactory.java:27) ~[io.airbyte.airbyte-config-config-models-0.40.9.jar:?]
at io.airbyte.config.helpers.CloudLogs.createCloudLogClient(CloudLogs.java:48) ~[io.airbyte.airbyte-config-config-models-0.40.9.jar:?]
at io.airbyte.config.helpers.LogClientSingleton.createCloudClientIfNull(LogClientSingleton.java:164) ~[io.airbyte.airbyte-config-config-models-0.40.9.jar:?]
at io.airbyte.config.helpers.LogClientSingleton.setWorkspaceMdc(LogClientSingleton.java:151) ~[io.airbyte.airbyte-config-config-models-0.40.9.jar:?]
at io.airbyte.workers.ApplicationInitializer.initializeCommonDependencies(ApplicationInitializer.java:180) ~[io.airbyte-airbyte-workers-0.40.9.jar:?]
at io.airbyte.workers.ApplicationInitializer.onApplicationEvent(ApplicationInitializer.java:153) ~[io.airbyte-airbyte-workers-0.40.9.jar:?]
at io.airbyte.workers.ApplicationInitializer.onApplicationEvent(ApplicationInitializer.java:65) ~[io.airbyte-airbyte-workers-0.40.9.jar:?]
at io.micronaut.context.event.ApplicationEventPublisherFactory.notifyEventListeners(ApplicationEventPublisherFactory.java:262) ~[micronaut-inject-3.6.3.jar:3.6.3]
at io.micronaut.context.event.ApplicationEventPublisherFactory.access$200(ApplicationEventPublisherFactory.java:60) ~[micronaut-inject-3.6.3.jar:3.6.3]
at io.micronaut.context.event.ApplicationEventPublisherFactory$2.publishEvent(ApplicationEventPublisherFactory.java:229) ~[micronaut-inject-3.6.3.jar:3.6.3]
at io.micronaut.http.server.netty.NettyHttpServer.lambda$fireStartupEvents$15(NettyHttpServer.java:574) ~[micronaut-http-server-netty-3.6.3.jar:3.6.3]
at java.util.Optional.ifPresent(Optional.java:178) ~[?:?]
at io.micronaut.http.server.netty.NettyHttpServer.fireStartupEvents(NettyHttpServer.java:568) ~[micronaut-http-server-netty-3.6.3.jar:3.6.3]
at io.micronaut.http.server.netty.NettyHttpServer.start(NettyHttpServer.java:297) ~[micronaut-http-server-netty-3.6.3.jar:3.6.3]
at io.micronaut.http.server.netty.NettyHttpServer.start(NettyHttpServer.java:104) ~[micronaut-http-server-netty-3.6.3.jar:3.6.3]
at io.micronaut.runtime.Micronaut.lambda$start$2(Micronaut.java:81) ~[micronaut-context-3.6.3.jar:3.6.3]
at java.util.Optional.ifPresent(Optional.java:178) ~[?:?]
at io.micronaut.runtime.Micronaut.start(Micronaut.java:79) ~[micronaut-context-3.6.3.jar:3.6.3]
at io.micronaut.runtime.Micronaut.run(Micronaut.java:323) ~[micronaut-context-3.6.3.jar:3.6.3]
at io.micronaut.runtime.Micronaut.run(Micronaut.java:309) ~[micronaut-context-3.6.3.jar:3.6.3]
at io.airbyte.workers.Application.main(Application.java:12) ~[io.airbyte-airbyte-workers-0.40.9.jar:?]
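Both traces end in a precondition inside DefaultS3ClientFactory.validateBase, and both logs show AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY falling back to ''. A quick way to confirm what the pods actually see is a blank-value check like the sketch below. Which settings the precondition really inspects is an assumption here; the two variable names are simply the ones the logs report as empty.

```python
import os

# Assumption: these are the settings the failing precondition cares about.
# They are taken from the "Using default value" log lines, not from the source.
SUSPECT_VARIABLES = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def blank_settings(env, names=SUSPECT_VARIABLES):
    """Return the names that are missing from env or set to a blank string."""
    return [name for name in names if not (env.get(name) or "").strip()]


# Run inside the server or worker container to see what is unset there.
unset_here = blank_settings(dict(os.environ))
print("Blank or missing:", unset_here)
```

If both names come back as blank inside the pod, the values are not reaching the container, regardless of what is set elsewhere in the deployment.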
Renan Rigo Calesso
09/29/2022, 5:01 PM
Caused by: com.google.cloud.bigquery.BigQueryException: Cannot query rows larger than 100MB limit.
Do you know how I can solve it?
Markus Notti
09/29/2022, 10:04 PM
Robert Put
09/29/2022, 10:36 PM
Erik Eppel
09/29/2022, 10:38 PM
docker-compose up -d. It seems to me that the more straightforward approach would be to simply create your own repo with a Docker Compose file that effectively mirrors the one that ships with the codebase, but the total absence of any mention of this approach makes me suspect I'm overlooking something important.
FYI, I've currently incorporated the docs for AWS into a Packer build that produces an AMI with all of the prerequisite technologies installed, so I'm not blocked by any means. I'm more anticipating the (possibly very obvious) question from my DevOps/SRE team about why I'm using Packer instead of just using Docker.
Any and all information is greatly appreciated.
Slackbot
09/30/2022, 6:48 AM
Jerri Comeau (Airbyte)
Venkat Dasari
09/30/2022, 3:51 PM
Opeyemi Daniel
09/30/2022, 4:53 PM
Steve Palm
09/30/2022, 6:05 PM
Chasen Sherman
09/30/2022, 6:36 PM
POST /v1/source_definitions/delete_custom
but can't figure out how to do this from the UI 😖
Simon Thelin
10/01/2022, 6:26 AM
incremental append and my cursor is updated_at. And I get a new row with PK id in a Postgres source, it would still append the row with the new id?
And if I do incremental append with, let's say, id PK, will I then lose track of updated values, since the id could be the same?
Assuming this is not CDC.
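A toy model of a cursor-based incremental append may help with reasoning about the two cases. It is a simplification and not Airbyte's actual implementation: it assumes the sync simply picks up rows whose cursor value is strictly greater than the last saved one, and the table contents are made up.

```python
def incremental_append(source_rows, cursor_field, last_cursor):
    """Return (rows to append, new cursor) for rows past the saved cursor value."""
    new_rows = [
        row for row in source_rows
        if last_cursor is None or row[cursor_field] > last_cursor
    ]
    new_rows.sort(key=lambda row: row[cursor_field])
    new_cursor = max((row[cursor_field] for row in new_rows), default=last_cursor)
    return new_rows, new_cursor


# Table contents at the time of the first sync.
table_v1 = [
    {"id": 1, "name": "a", "updated_at": "2022-10-01T00:00:00"},
    {"id": 2, "name": "b", "updated_at": "2022-10-01T01:00:00"},
]
# Later: row 1 was updated and a brand-new row 3 was inserted.
table_v2 = [
    {"id": 1, "name": "a2", "updated_at": "2022-10-01T02:00:00"},
    {"id": 2, "name": "b", "updated_at": "2022-10-01T01:00:00"},
    {"id": 3, "name": "c", "updated_at": "2022-10-01T03:00:00"},
]

# Cursor = updated_at: the second sync sees the update and the new row.
_, state_ts = incremental_append(table_v1, "updated_at", None)
by_updated_at, _ = incremental_append(table_v2, "updated_at", state_ts)

# Cursor = id: the second sync only sees ids above the saved one.
_, state_id = incremental_append(table_v1, "id", None)
by_id, _ = incremental_append(table_v2, "id", state_id)

print([row["id"] for row in by_updated_at])
print([row["id"] for row in by_id])
```

In this model the updated_at cursor appends both the changed row and the new row, while the id cursor appends only the new row, so the update to an existing id is never picked up.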