Vasiliy Knizhnikov
07/26/2021, 1:46 PM
{
  "visitId":"xxx",
  "visitStartTime":"xxx",
  "date":"xxx",
  "geoNetwork":{
    "country":"xxx"
  },
  "hits":[
    {
      "page":{
        "pagePath":"xxx",
        "hostname":"xxx",
        "pageTitle":"xxx"
      }
    }
  ]
}
Corresponding catalog for the schema:
{
  "streams": [
    {
      "sync_mode": "full_refresh",
      "destination_sync_mode": "overwrite",
      "stream": {
        "name": "data",
        "json_schema": {
          "type": "object",
          "properties": {
            "visitId": {
              "type": "number"
            },
            "visitStartTime": {
              "type": "number"
            },
            "date": {
              "type": "string"
            },
            "geoNetwork": {
              "type": "object",
              "properties": {
                "country": {
                  "type": "string"
                }
              }
            },
            "hits": {
              "type": "array",
              "properties": {
                "page": {
                  "type": "object",
                  "properties": {
                    "pagePath": {
                      "type": "string"
                    },
                    "hostname": {
                      "type": "string"
                    },
                    "pageTitle": {
                      "type": "string"
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  ]
}
So effectively, the hits property is an array whose elements are objects.
I’m struggling with normalization here. Basic normalization works fine for flat nested objects like geoNetwork, but the hits table is not even being created.
❓ Do I need to create a custom dbt transform to normalize records like these, or is there another way to do this?
Thank you!
Chris (deprecated profile)
Vasiliy Knizhnikov
07/26/2021, 1:58 PM
2021-07-26 13:16:26 INFO () SyncWorkflow$ReplicationActivityImpl(replicate):200 - attempt summaries: [io.airbyte.config.ReplicationOutput@6a800ce8[replicationAttemptSummary=io.airbyte.config.ReplicationAttemptSummary@4c0a83a[status=completed,recordsSynced=5,bytesSynced=1738,startTime=1627305382981,endTime=1627305386535],state=io.airbyte.config.State@583a932c[state={}],outputCatalog=io.airbyte.protocol.models.ConfiguredAirbyteCatalog@4003e474[streams=[io.airbyte.protocol.models.ConfiguredAirbyteStream@1cae0995[stream=io.airbyte.protocol.models.AirbyteStream@7b8a9e41[name=google_analytics_data,jsonSchema={"type":"object","$schema":"<http://json-schema.org/draft-07/schema#>","properties":{"date":{"type":"string"},"hits":{"type":"array","$schema":"<http://json-schema.org/draft-07/schema#>","properties":{"page":{"type":"object","$schema":"<http://json-schema.org/draft-07/schema#>","properties":{"hostname":{"type":"string"},"pagePath":{"type":"string"},"pageTitle":{"type":"string"}}}}},"visitId":{"type":"number"},"clientId":{"type":"string"},"geoNetwork":{"type":"object","$schema":"<http://json-schema.org/draft-07/schema#>","properties":{"country":{"type":"string"}}},"fullVisitorId":{"type":"string"},"visitStartTime":{"type":"number"}}},supportedSyncModes=[full_refresh],sourceDefinedCursor=<null>,defaultCursorField=[],sourceDefinedPrimaryKey=[],namespace=<null>,additionalProperties={}],syncMode=full_refresh,cursorField=[visitId],destinationSyncMode=append,primaryKey=[],additionalProperties={}]],additionalProperties={}]]]
2021-07-26 13:16:26 INFO () SyncWorkflow$ReplicationActivityImpl(replicate):201 - sync summary: io.airbyte.config.StandardSyncOutput@22f6166f[standardSyncSummary=io.airbyte.config.StandardSyncSummary@d21ece6[status=completed,recordsSynced=5,bytesSynced=1738,startTime=1627305382981,endTime=1627305386535],state=io.airbyte.config.State@583a932c[state={}],outputCatalog=io.airbyte.protocol.models.ConfiguredAirbyteCatalog@4003e474[streams=[io.airbyte.protocol.models.ConfiguredAirbyteStream@1cae0995[stream=io.airbyte.protocol.models.AirbyteStream@7b8a9e41[name=google_analytics_data,jsonSchema={"type":"object","$schema":"<http://json-schema.org/draft-07/schema#>","properties":{"date":{"type":"string"},"hits":{"type":"array","$schema":"<http://json-schema.org/draft-07/schema#>","properties":{"page":{"type":"object","$schema":"<http://json-schema.org/draft-07/schema#>","properties":{"hostname":{"type":"string"},"pagePath":{"type":"string"},"pageTitle":{"type":"string"}}}}},"visitId":{"type":"number"},"clientId":{"type":"string"},"geoNetwork":{"type":"object","$schema":"<http://json-schema.org/draft-07/schema#>","properties":{"country":{"type":"string"}}},"fullVisitorId":{"type":"string"},"visitStartTime":{"type":"number"}}},supportedSyncModes=[full_refresh],sourceDefinedCursor=<null>,defaultCursorField=[],sourceDefinedPrimaryKey=[],namespace=<null>,additionalProperties={}],syncMode=full_refresh,cursorField=[visitId],destinationSyncMode=append,primaryKey=[],additionalProperties={}]],additionalProperties={}]]
2021-07-26 13:16:26 INFO () TemporalAttemptExecution(get):110 - Executing worker wrapper. Airbyte version: 0.26.4-alpha
2021-07-26 13:16:26 INFO () DefaultNormalizationWorker(run):61 - Running normalization.
2021-07-26 13:16:26 INFO () LineGobbler(voidCall):85 - Checking if airbyte/normalization:0.1.33 exists...
2021-07-26 13:16:26 INFO () LineGobbler(voidCall):85 - airbyte/normalization:0.1.33 was found locally.
2021-07-26 13:16:26 INFO () DockerProcessFactory(create):127 - Preparing command: docker run --rm --init -i -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -w /data/23/0/normalize --network host airbyte/normalization:0.1.33 run --integration-type postgres --config destination_config.json --catalog destination_catalog.json
2021-07-26 13:16:27 INFO () LineGobbler(voidCall):85 - Running: transform-config --config destination_config.json --integration-type postgres --out /data/23/0/normalize
2021-07-26 13:16:27 INFO () LineGobbler(voidCall):85 - Namespace(config='destination_config.json', integration_type=<DestinationType.postgres: 'postgres'>, out='/data/23/0/normalize')
2021-07-26 13:16:27 INFO () LineGobbler(voidCall):85 - transform_postgres
2021-07-26 13:16:27 INFO () LineGobbler(voidCall):85 - Running: transform-catalog --integration-type postgres --profile-config-dir /data/23/0/normalize --catalog destination_catalog.json --out /data/23/0/normalize/models/generated/ --json-column _airbyte_data
2021-07-26 13:16:27 INFO () LineGobbler(voidCall):85 - Processing destination_catalog.json...
2021-07-26 13:16:27 INFO () LineGobbler(voidCall):85 - Generating airbyte_ctes/public/google_analytics_data_ab1.sql from google_analytics_data
2021-07-26 13:16:27 INFO () LineGobbler(voidCall):85 - Generating airbyte_ctes/public/google_analytics_data_ab2.sql from google_analytics_data
2021-07-26 13:16:27 INFO () LineGobbler(voidCall):85 - Generating airbyte_ctes/public/google_analytics_data_ab3.sql from google_analytics_data
2021-07-26 13:16:27 INFO () LineGobbler(voidCall):85 - Generating airbyte_tables/public/google_analytics_data.sql from google_analytics_data
2021-07-26 13:16:27 INFO () LineGobbler(voidCall):85 - Generating airbyte_ctes/public/google_analytics_data_geonetwork_ab1.sql from google_analytics_data/geoNetwork
2021-07-26 13:16:27 INFO () LineGobbler(voidCall):85 - Generating airbyte_ctes/public/google_analytics_data_geonetwork_ab2.sql from google_analytics_data/geoNetwork
2021-07-26 13:16:27 INFO () LineGobbler(voidCall):85 - Generating airbyte_ctes/public/google_analytics_data_geonetwork_ab3.sql from google_analytics_data/geoNetwork
2021-07-26 13:16:27 INFO () LineGobbler(voidCall):85 - Generating airbyte_tables/public/google_analytics_data_geonetwork.sql from google_analytics_data/geoNetwork
2021-07-26 13:16:29 INFO () LineGobbler(voidCall):85 - Running with dbt=0.19.1
2021-07-26 13:16:30 INFO () LineGobbler(voidCall):85 - [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
2021-07-26 13:16:30 INFO () LineGobbler(voidCall):85 - There are 1 unused configuration paths:
2021-07-26 13:16:30 INFO () LineGobbler(voidCall):85 - - models.airbyte_utils.generated.airbyte_views
2021-07-26 13:16:30 INFO () LineGobbler(voidCall):85 -
2021-07-26 13:16:30 INFO () LineGobbler(voidCall):85 - Found 8 models, 0 tests, 0 snapshots, 0 analyses, 364 macros, 0 operations, 0 seed files, 1 source, 0 exposures
2021-07-26 13:16:30 INFO () LineGobbler(voidCall):85 -
2021-07-26 13:16:31 INFO () LineGobbler(voidCall):85 - 13:16:31 | Concurrency: 32 threads (target='prod')
2021-07-26 13:16:31 INFO () LineGobbler(voidCall):85 - 13:16:31 |
2021-07-26 13:16:31 INFO () LineGobbler(voidCall):85 - 13:16:31 | 1 of 2 START table model public.google_analytics_data........................................................ [RUN]
2021-07-26 13:16:31 INFO () LineGobbler(voidCall):85 - 13:16:31 | 1 of 2 OK created table model public.google_analytics_data................................................... [SELECT 5 in 0.13s]
2021-07-26 13:16:31 INFO () LineGobbler(voidCall):85 - 13:16:31 | 2 of 2 START table model public.google_analytics_data_geonetwork............................................. [RUN]
2021-07-26 13:16:31 INFO () LineGobbler(voidCall):85 - 13:16:31 | 2 of 2 OK created table model public.google_analytics_data_geonetwork........................................ [SELECT 5 in 0.07s]
2021-07-26 13:16:31 INFO () LineGobbler(voidCall):85 - 13:16:31 |
2021-07-26 13:16:31 INFO () LineGobbler(voidCall):85 - 13:16:31 | Finished running 2 table models in 0.46s.
2021-07-26 13:16:31 INFO () LineGobbler(voidCall):85 -
2021-07-26 13:16:31 INFO () LineGobbler(voidCall):85 - Completed successfully
2021-07-26 13:16:31 INFO () LineGobbler(voidCall):85 -
2021-07-26 13:16:31 INFO () LineGobbler(voidCall):85 - Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
2021-07-26 13:16:31 INFO () DefaultNormalizationWorker(run):76 - Normalization executed in 0.
2021-07-26 13:16:31 INFO () TemporalAttemptExecution(get):133 - Stopping cancellation check scheduling...
Oleksandr Bazarnov [GL]
07/26/2021, 2:02 PM
Use items for the array of hits, like this:
{
  "hits": {
    "type": "array",
    "items": {
      "type": "object",
      "properties": {
        "page": {
          "type": "object",
          "properties": {
            "pagePath": {
              "type": "string"
            },
            "hostname": {
              "type": "string"
            },
            "pageTitle": {
              "type": "string"
            }
          }
        }
      }
    }
  }
}
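If the catalog has several arrays declared this way, the same change can be applied programmatically. Below is a minimal Python sketch, not something from Airbyte itself: fix_array_schemas is a hypothetical helper name, and it assumes the schema is already loaded as plain dicts and that each array element is meant to be an object carrying the misplaced properties.

```python
import copy
import json


def fix_array_schemas(schema):
    """Return a copy of a JSON schema in which every array node that has
    "properties" but no "items" gets those properties moved under "items".

    Hypothetical helper for illustration; assumes array elements are objects.
    """
    schema = copy.deepcopy(schema)  # leave the caller's schema untouched

    def is_array(node):
        t = node.get("type")
        return t == "array" or (isinstance(t, list) and "array" in t)

    def walk(node):
        if not isinstance(node, dict):
            return
        if is_array(node) and "properties" in node and "items" not in node:
            # Describe each element as an object with the old properties.
            node["items"] = {
                "type": "object",
                "properties": node.pop("properties"),
            }
        # Recurse into object properties and into the array element schema.
        for child in node.get("properties", {}).values():
            walk(child)
        walk(node.get("items"))

    walk(schema)
    return schema


# Trimmed version of the "hits" definition from the catalog in this thread.
broken = {
    "type": "object",
    "properties": {
        "hits": {
            "type": "array",
            "properties": {
                "page": {
                    "type": "object",
                    "properties": {"pagePath": {"type": "string"}},
                }
            },
        }
    },
}

fixed = fix_array_schemas(broken)
print(json.dumps(fixed["properties"]["hits"], indent=2))
```

The printed hits node has the same shape as the snippet above: type array, with an items object holding the page properties. Whether the hits table then gets created still depends on the normalization version in use, so treat this only as a way to produce the corrected schema.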
Vasiliy Knizhnikov
07/26/2021, 3:14 PM
Chris (deprecated profile)
Oleksandr Bazarnov [GL]
07/26/2021, 6:44 PM