#ask-ai

Vasiliy Knizhnikov

07/26/2021, 1:46 PM
Hey guys! Need some help 🙏 I’m developing a custom source connector and have a source data structure represented as JSON:
{
   "visitId":"xxx",
   "visitStartTime":"xxx",
   "date":"xxx",
   "geoNetwork":{
      "country":"xxx"
   },
   "hits":[
      {
         "page":{
            "pagePath":"xxx",
            "hostname":"xxx",
            "pageTitle":"xxx"
         }
      }
   ]
}
Corresponding catalog for the schema:
{
  "streams": [
    {
      "sync_mode": "full_refresh",
      "destination_sync_mode": "overwrite",
      "stream": {
        "name": "data",
        "json_schema": {
          "type": "object",
          "properties": {
            "visitId": {
              "type": "number"
            },
            "visitStartTime": {
              "type": "number"
            },
            "date": {
              "type": "string"
            },
            "geoNetwork": {
              "type": "object",
              "properties": {
                "country": {
                  "type": "string"
                }
              }
            },
            "hits": {
              "type": "array",
              "properties": {
                "page": {
                  "type": "object",
                  "properties": {
                    "pagePath": {
                      "type": "string"
                    },
                    "hostname": {
                      "type": "string"
                    },
                    "pageTitle": {
                      "type": "string"
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  ]
}
So effectively, the hits property is an array where one of the properties is an object. I’m struggling with normalization here. Basic normalization works fine for flat nested objects like geoNetwork, but the hits table is not even being created. Do I need to create a custom dbt transform in order to normalize records like these, or is there another way to do this? Thank you!

Chris (deprecated profile)

07/26/2021, 1:49 PM
can you share logs of your sync/normalization run? (fyi you could post this kind of question in #troubleshooting which is more appropriate than random)

Vasiliy Knizhnikov

07/26/2021, 1:58 PM
Hi Chris! Thank you, this is my first message here, so I was not sure which channel is best for this :) I have reposted it there! Here’s the sync log:
2021-07-26 13:16:26 INFO () SyncWorkflow$ReplicationActivityImpl(replicate):200 - attempt summaries: [io.airbyte.config.ReplicationOutput@6a800ce8[replicationAttemptSummary=io.airbyte.config.ReplicationAttemptSummary@4c0a83a[status=completed,recordsSynced=5,bytesSynced=1738,startTime=1627305382981,endTime=1627305386535],state=io.airbyte.config.State@583a932c[state={}],outputCatalog=io.airbyte.protocol.models.ConfiguredAirbyteCatalog@4003e474[streams=[io.airbyte.protocol.models.ConfiguredAirbyteStream@1cae0995[stream=io.airbyte.protocol.models.AirbyteStream@7b8a9e41[name=google_analytics_data,jsonSchema={"type":"object","$schema":"http://json-schema.org/draft-07/schema#","properties":{"date":{"type":"string"},"hits":{"type":"array","$schema":"http://json-schema.org/draft-07/schema#","properties":{"page":{"type":"object","$schema":"http://json-schema.org/draft-07/schema#","properties":{"hostname":{"type":"string"},"pagePath":{"type":"string"},"pageTitle":{"type":"string"}}}}},"visitId":{"type":"number"},"clientId":{"type":"string"},"geoNetwork":{"type":"object","$schema":"http://json-schema.org/draft-07/schema#","properties":{"country":{"type":"string"}}},"fullVisitorId":{"type":"string"},"visitStartTime":{"type":"number"}}},supportedSyncModes=[full_refresh],sourceDefinedCursor=<null>,defaultCursorField=[],sourceDefinedPrimaryKey=[],namespace=<null>,additionalProperties={}],syncMode=full_refresh,cursorField=[visitId],destinationSyncMode=append,primaryKey=[],additionalProperties={}]],additionalProperties={}]]]
2021-07-26 13:16:26 INFO () SyncWorkflow$ReplicationActivityImpl(replicate):201 - sync summary: io.airbyte.config.StandardSyncOutput@22f6166f[standardSyncSummary=io.airbyte.config.StandardSyncSummary@d21ece6[status=completed,recordsSynced=5,bytesSynced=1738,startTime=1627305382981,endTime=1627305386535],state=io.airbyte.config.State@583a932c[state={}],outputCatalog=io.airbyte.protocol.models.ConfiguredAirbyteCatalog@4003e474[streams=[io.airbyte.protocol.models.ConfiguredAirbyteStream@1cae0995[stream=io.airbyte.protocol.models.AirbyteStream@7b8a9e41[name=google_analytics_data,jsonSchema={"type":"object","$schema":"http://json-schema.org/draft-07/schema#","properties":{"date":{"type":"string"},"hits":{"type":"array","$schema":"http://json-schema.org/draft-07/schema#","properties":{"page":{"type":"object","$schema":"http://json-schema.org/draft-07/schema#","properties":{"hostname":{"type":"string"},"pagePath":{"type":"string"},"pageTitle":{"type":"string"}}}}},"visitId":{"type":"number"},"clientId":{"type":"string"},"geoNetwork":{"type":"object","$schema":"http://json-schema.org/draft-07/schema#","properties":{"country":{"type":"string"}}},"fullVisitorId":{"type":"string"},"visitStartTime":{"type":"number"}}},supportedSyncModes=[full_refresh],sourceDefinedCursor=<null>,defaultCursorField=[],sourceDefinedPrimaryKey=[],namespace=<null>,additionalProperties={}],syncMode=full_refresh,cursorField=[visitId],destinationSyncMode=append,primaryKey=[],additionalProperties={}]],additionalProperties={}]]
2021-07-26 13:16:26 INFO () TemporalAttemptExecution(get):110 - Executing worker wrapper. Airbyte version: 0.26.4-alpha
2021-07-26 13:16:26 INFO () DefaultNormalizationWorker(run):61 - Running normalization.
2021-07-26 13:16:26 INFO () LineGobbler(voidCall):85 - Checking if airbyte/normalization:0.1.33 exists...
2021-07-26 13:16:26 INFO () LineGobbler(voidCall):85 - airbyte/normalization:0.1.33 was found locally.
2021-07-26 13:16:26 INFO () DockerProcessFactory(create):127 - Preparing command: docker run --rm --init -i -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -w /data/23/0/normalize --network host airbyte/normalization:0.1.33 run --integration-type postgres --config destination_config.json --catalog destination_catalog.json
2021-07-26 13:16:27 INFO () LineGobbler(voidCall):85 - Running: transform-config --config destination_config.json --integration-type postgres --out /data/23/0/normalize
2021-07-26 13:16:27 INFO () LineGobbler(voidCall):85 - Namespace(config='destination_config.json', integration_type=<DestinationType.postgres: 'postgres'>, out='/data/23/0/normalize')
2021-07-26 13:16:27 INFO () LineGobbler(voidCall):85 - transform_postgres
2021-07-26 13:16:27 INFO () LineGobbler(voidCall):85 - Running: transform-catalog --integration-type postgres --profile-config-dir /data/23/0/normalize --catalog destination_catalog.json --out /data/23/0/normalize/models/generated/ --json-column _airbyte_data
2021-07-26 13:16:27 INFO () LineGobbler(voidCall):85 - Processing destination_catalog.json...
2021-07-26 13:16:27 INFO () LineGobbler(voidCall):85 - Generating airbyte_ctes/public/google_analytics_data_ab1.sql from google_analytics_data
2021-07-26 13:16:27 INFO () LineGobbler(voidCall):85 - Generating airbyte_ctes/public/google_analytics_data_ab2.sql from google_analytics_data
2021-07-26 13:16:27 INFO () LineGobbler(voidCall):85 - Generating airbyte_ctes/public/google_analytics_data_ab3.sql from google_analytics_data
2021-07-26 13:16:27 INFO () LineGobbler(voidCall):85 - Generating airbyte_tables/public/google_analytics_data.sql from google_analytics_data
2021-07-26 13:16:27 INFO () LineGobbler(voidCall):85 - Generating airbyte_ctes/public/google_analytics_data_geonetwork_ab1.sql from google_analytics_data/geoNetwork
2021-07-26 13:16:27 INFO () LineGobbler(voidCall):85 - Generating airbyte_ctes/public/google_analytics_data_geonetwork_ab2.sql from google_analytics_data/geoNetwork
2021-07-26 13:16:27 INFO () LineGobbler(voidCall):85 - Generating airbyte_ctes/public/google_analytics_data_geonetwork_ab3.sql from google_analytics_data/geoNetwork
2021-07-26 13:16:27 INFO () LineGobbler(voidCall):85 - Generating airbyte_tables/public/google_analytics_data_geonetwork.sql from google_analytics_data/geoNetwork
2021-07-26 13:16:29 INFO () LineGobbler(voidCall):85 - Running with dbt=0.19.1
2021-07-26 13:16:30 INFO () LineGobbler(voidCall):85 - [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
2021-07-26 13:16:30 INFO () LineGobbler(voidCall):85 - There are 1 unused configuration paths:
2021-07-26 13:16:30 INFO () LineGobbler(voidCall):85 - - models.airbyte_utils.generated.airbyte_views
2021-07-26 13:16:30 INFO () LineGobbler(voidCall):85 -
2021-07-26 13:16:30 INFO () LineGobbler(voidCall):85 - Found 8 models, 0 tests, 0 snapshots, 0 analyses, 364 macros, 0 operations, 0 seed files, 1 source, 0 exposures
2021-07-26 13:16:30 INFO () LineGobbler(voidCall):85 -
2021-07-26 13:16:31 INFO () LineGobbler(voidCall):85 - 13:16:31 | Concurrency: 32 threads (target='prod')
2021-07-26 13:16:31 INFO () LineGobbler(voidCall):85 - 13:16:31 |
2021-07-26 13:16:31 INFO () LineGobbler(voidCall):85 - 13:16:31 | 1 of 2 START table model public.google_analytics_data........................................................ [RUN]
2021-07-26 13:16:31 INFO () LineGobbler(voidCall):85 - 13:16:31 | 1 of 2 OK created table model public.google_analytics_data................................................... [SELECT 5 in 0.13s]
2021-07-26 13:16:31 INFO () LineGobbler(voidCall):85 - 13:16:31 | 2 of 2 START table model public.google_analytics_data_geonetwork............................................. [RUN]
2021-07-26 13:16:31 INFO () LineGobbler(voidCall):85 - 13:16:31 | 2 of 2 OK created table model public.google_analytics_data_geonetwork........................................ [SELECT 5 in 0.07s]
2021-07-26 13:16:31 INFO () LineGobbler(voidCall):85 - 13:16:31 |
2021-07-26 13:16:31 INFO () LineGobbler(voidCall):85 - 13:16:31 | Finished running 2 table models in 0.46s.
2021-07-26 13:16:31 INFO () LineGobbler(voidCall):85 -
2021-07-26 13:16:31 INFO () LineGobbler(voidCall):85 - Completed successfully
2021-07-26 13:16:31 INFO () LineGobbler(voidCall):85 -
2021-07-26 13:16:31 INFO () LineGobbler(voidCall):85 - Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
2021-07-26 13:16:31 INFO () DefaultNormalizationWorker(run):76 - Normalization executed in 0.
2021-07-26 13:16:31 INFO () TemporalAttemptExecution(get):133 - Stopping cancellation check scheduling...
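The transform-catalog lines in the log above list models for google_analytics_data and google_analytics_data/geoNetwork only, with none for hits, which matches the missing table. Below is a minimal Python sketch for pulling that list out of a saved log. It assumes the "Generating <file>.sql from <stream path>" line format seen in this run; the function name streams_with_models is made up for illustration and is not part of Airbyte.

```python
import re

# Matches the normalization lines seen in this log, for example:
#   "... - Generating airbyte_tables/public/x.sql from x/child"
# The line format is an assumption taken from this one run.
GENERATING = re.compile(r"Generating \S+\.sql from (\S+)")


def streams_with_models(log_lines):
    """Return the stream paths that had at least one model generated, in order."""
    seen = []
    for line in log_lines:
        match = GENERATING.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# A few lines copied from the log in this thread.
sample_log = [
    "2021-07-26 13:16:27 INFO () LineGobbler(voidCall):85 - Generating airbyte_ctes/public/google_analytics_data_ab1.sql from google_analytics_data",
    "2021-07-26 13:16:27 INFO () LineGobbler(voidCall):85 - Generating airbyte_tables/public/google_analytics_data.sql from google_analytics_data",
    "2021-07-26 13:16:27 INFO () LineGobbler(voidCall):85 - Generating airbyte_tables/public/google_analytics_data_geonetwork.sql from google_analytics_data/geoNetwork",
    "2021-07-26 13:16:29 INFO () LineGobbler(voidCall):85 - Running with dbt=0.19.1",
]

for stream in streams_with_models(sample_log):
    print(stream)
```

If a nested stream such as google_analytics_data/hits is absent from this list, no models were generated for it, so the schema is the first place to look.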

Oleksandr Bazarnov [GL]

07/26/2021, 2:02 PM
You should define items for the array of hits, like this:
{
  "hits": {
    "type": "array",
    "items": {
      "type": "object",
      "properties": {
        "page": {
          "type": "object",
          "properties": {
            "pagePath": {
              "type": "string"
            },
            "hostname": {
              "type": "string"
            },
            "pageTitle": {
              "type": "string"
            }
          }
        }
      }
    }
  }
}
👍 1
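To check a larger schema for the same mistake, a small helper can walk it and flag every array that declares properties directly instead of under items. This is a rough sketch in plain Python, not part of Airbyte or any JSON Schema library; both function names are hypothetical, and the rewrite assumes the array elements are objects, as in the hits example above.

```python
def _is_array(schema):
    """True if the schema's "type" is "array" or a list containing "array"."""
    declared = schema.get("type")
    return declared == "array" or (isinstance(declared, list) and "array" in declared)


def find_arrays_missing_items(schema, path="$"):
    """Return paths of array schemas that have "properties" but no "items"."""
    problems = []
    if not isinstance(schema, dict):
        return problems
    if _is_array(schema) and "items" not in schema and "properties" in schema:
        problems.append(path)
    properties = schema.get("properties")
    if isinstance(properties, dict):
        for name, sub in properties.items():
            problems.extend(find_arrays_missing_items(sub, f"{path}.{name}"))
    if isinstance(schema.get("items"), dict):
        problems.extend(find_arrays_missing_items(schema["items"], f"{path}[]"))
    return problems


def move_properties_into_items(schema):
    """Return a copy where such arrays get items = {type: object, properties: ...}.

    Assumption: the misplaced properties describe object elements of the array.
    """
    if not isinstance(schema, dict):
        return schema
    fixed = dict(schema)  # shallow copy; nested dicts are rebuilt below
    if _is_array(fixed) and "items" not in fixed and "properties" in fixed:
        fixed["items"] = {"type": "object", "properties": fixed.pop("properties")}
    if isinstance(fixed.get("properties"), dict):
        fixed["properties"] = {
            name: move_properties_into_items(sub)
            for name, sub in fixed["properties"].items()
        }
    if isinstance(fixed.get("items"), dict):
        fixed["items"] = move_properties_into_items(fixed["items"])
    return fixed


# Trimmed version of the json_schema from the original question.
original = {
    "type": "object",
    "properties": {
        "geoNetwork": {
            "type": "object",
            "properties": {"country": {"type": "string"}},
        },
        "hits": {
            "type": "array",
            "properties": {
                "page": {
                    "type": "object",
                    "properties": {"pagePath": {"type": "string"}},
                }
            },
        },
    },
}

print(find_arrays_missing_items(original))  # ['$.hits']
corrected = move_properties_into_items(original)
print(find_arrays_missing_items(corrected))  # []
```

The rewrite returns a new dictionary and leaves the input untouched, so the two versions can be compared before the catalog is updated.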

Vasiliy Knizhnikov

07/26/2021, 3:14 PM
@Oleksandr Bazarnov [GL] after a couple of fixes in my schema and adding your suggested example, this worked out! 👍 Thank you and @Chris (deprecated profile) so much for the help!

Chris (deprecated profile)

07/26/2021, 3:15 PM
Nice!

Oleksandr Bazarnov [GL]

07/26/2021, 6:44 PM
Glad you solved it)