#ingestion
high-ice-84066

06/03/2022, 5:31 PM
Hey all! I'm working on ingesting stats (profiling) data from Postgres. I've seen some older threads where people had a similar issue. Running the CLI with --debug, I can see the stats being sent, and the dataset is created successfully, but the Stats tab in the UI remains greyed out. Any advice on where to look next, or how to resolve this? cc: @User. Recipe, logs, and results are below. DataHub version: v0.8.36
datahub --debug ingest -c recipe-postgres.yaml
###recipe-postgres.yaml
source:
    type: postgres
    config:
      host_port: xxxx
      database: postgres
      username: postgres
      password: xxx
      include_tables: true
      include_views: true
      profiling:
        enabled: true
        include_field_null_count: true
        include_field_min_value: true
        include_field_max_value: true
        include_field_mean_value: true
        include_field_median_value: false
sink:
    type: datahub-rest
    config:
      server: 'xxxx'
      token: 'xxx'
curl -X POST -H 'User-Agent: python-requests/2.26.0' -H 'Accept-Encoding: gzip, deflate' \
-H 'Accept: */*' -H 'Connection: keep-alive' -H 'X-RestLi-Protocol-Version: 2.0.0' \
-H 'Content-Type: application/json' -H 'Authorization: Bearer xxx' \
--data '{"proposal": {"entityType": "dataset", "entityUrn": "urn:li:dataset:(urn:li:dataPlatform:postgres,postgres.public.datahub_sample_profile_data,DEV)", "changeType": "UPSERT", "aspectName": "datasetProfile", "aspect": {"value": "{\"timestampMillis\": 1654273217172, \"partitionSpec\": {\"type\": \"FULL_TABLE\", \"partition\": \"FULL_TABLE_SNAPSHOT\"}, \"rowCount\": 20412, \"columnCount\": 3, \"fieldProfiles\": [{\"fieldPath\": \"id\", \"uniqueCount\": 20412, \"uniqueProportion\": 1.0, \"nullCount\": 0, \"nullProportion\": 0.0, \"sampleValues\": [\"1\", \"2\", \"3\", \"4\", \"5\", \"6\", \"7\", \"8\", \"9\", \"10\", \"11\", \"12\", \"13\", \"14\", \"15\", \"16\", \"17\", \"18\", \"19\", \"20\"]}, {\"fieldPath\": \"code\", \"uniqueCount\": 20412, \"uniqueProportion\": 1.0, \"nullCount\": 0, \"nullProportion\": 0.0, \"sampleValues\": [\"111\", \"123\", \"135\", \"147\", \"159\", \"171\", \"183\", \"195\", \"207\", \"219\", \"231\", \"243\", \"255\", \"267\", \"279\", \"291\", \"303\", \"315\", \"327\", \"339\"]}, {\"fieldPath\": \"name\", \"uniqueCount\": 2, \"uniqueProportion\": 9.798157946306095e-05, \"nullCount\": 0, \"nullProportion\": 0.0, \"sampleValues\": [\"testA\", \"testB\", \"testA\", \"testB\", \"testA\", \"testB\", \"testA\", \"testB\", \"testA\", \"testB\", \"testA\", \"testB\", \"testA\", \"testB\", \"testA\", \"testB\", \"testA\", \"testB\", \"testA\", \"testB\"]}]}", "contentType": "application/json"}, "systemMetadata": {"lastObserved": 1654273219726, "runId": "postgres-2022_06_03-09_20_03"}}}' 'https://xxxx/aspects?action=ingestProposal'
##JSON BODY##
{
    "proposal": {
        "entityType": "dataset",
        "entityUrn": "urn:li:dataset:(urn:li:dataPlatform:postgres,postgres.public.datahub_sample_profile_data,DEV)",
        "changeType": "UPSERT",
        "aspectName": "datasetProfile",
        "aspect": {
            "value": "{\"timestampMillis\": 1654273217172, \"partitionSpec\": {\"type\": \"FULL_TABLE\", \"partition\": \"FULL_TABLE_SNAPSHOT\"}, \"rowCount\": 20412, \"columnCount\": 3, \"fieldProfiles\": [{\"fieldPath\": \"id\", \"uniqueCount\": 20412, \"uniqueProportion\": 1.0, \"nullCount\": 0, \"nullProportion\": 0.0, \"sampleValues\": [\"1\", \"2\", \"3\", \"4\", \"5\", \"6\", \"7\", \"8\", \"9\", \"10\", \"11\", \"12\", \"13\", \"14\", \"15\", \"16\", \"17\", \"18\", \"19\", \"20\"]}, {\"fieldPath\": \"code\", \"uniqueCount\": 20412, \"uniqueProportion\": 1.0, \"nullCount\": 0, \"nullProportion\": 0.0, \"sampleValues\": [\"111\", \"123\", \"135\", \"147\", \"159\", \"171\", \"183\", \"195\", \"207\", \"219\", \"231\", \"243\", \"255\", \"267\", \"279\", \"291\", \"303\", \"315\", \"327\", \"339\"]}, {\"fieldPath\": \"name\", \"uniqueCount\": 2, \"uniqueProportion\": 9.798157946306095e-05, \"nullCount\": 0, \"nullProportion\": 0.0, \"sampleValues\": [\"testA\", \"testB\", \"testA\", \"testB\", \"testA\", \"testB\", \"testA\", \"testB\", \"testA\", \"testB\", \"testA\", \"testB\", \"testA\", \"testB\", \"testA\", \"testB\", \"testA\", \"testB\", \"testA\", \"testB\"]}]}",
            "contentType": "application/json"
        },
        "systemMetadata": {
            "lastObserved": 1654273219726,
            "runId": "postgres-2022_06_03-09_20_03"
        }
    }
}
## Nested proposal.aspect.value JSON, pretty-printed
{
    "timestampMillis": 1654273217172,
    "partitionSpec": {
        "type": "FULL_TABLE",
        "partition": "FULL_TABLE_SNAPSHOT"
    },
    "rowCount": 20412,
    "columnCount": 3,
    "fieldProfiles": [
        {
            "fieldPath": "id",
            "uniqueCount": 20412,
            "uniqueProportion": 1.0,
            "nullCount": 0,
            "nullProportion": 0.0,
            "sampleValues": [
                "1",
                "2",
                "3",
                "4",
                "5",
                "6",
                "7",
                "8",
                "9",
                "10",
                "11",
                "12",
                "13",
                "14",
                "15",
                "16",
                "17",
                "18",
                "19",
                "20"
            ]
        },
        {
            "fieldPath": "code",
            "uniqueCount": 20412,
            "uniqueProportion": 1.0,
            "nullCount": 0,
            "nullProportion": 0.0,
            "sampleValues": [
                "111",
                "123",
                "135",
                "147",
                "159",
                "171",
                "183",
                "195",
                "207",
                "219",
                "231",
                "243",
                "255",
                "267",
                "279",
                "291",
                "303",
                "315",
                "327",
                "339"
            ]
        },
        {
            "fieldPath": "name",
            "uniqueCount": 2,
            "uniqueProportion": 9.798157946306095e-05,
            "nullCount": 0,
            "nullProportion": 0.0,
            "sampleValues": [
                "testA",
                "testB",
                "testA",
                "testB",
                "testA",
                "testB",
                "testA",
                "testB",
                "testA",
                "testB",
                "testA",
                "testB",
                "testA",
                "testB",
                "testA",
                "testB",
                "testA",
                "testB",
                "testA",
                "testB"
            ]
        }
    ]
}
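The aspect payload in the request above is a JSON string nested inside the JSON body (proposal.aspect.value), so it has to be decoded a second time before it is readable. A minimal Python sketch of that step, using an abbreviated copy of the body; the `decode_aspect` helper is a hypothetical name for illustration and is not part of the DataHub CLI:

```python
import json

# Abbreviated copy of the request body shown above; most fields are omitted.
body = {
    "proposal": {
        "entityUrn": "urn:li:dataset:(urn:li:dataPlatform:postgres,postgres.public.datahub_sample_profile_data,DEV)",
        "aspectName": "datasetProfile",
        "aspect": {
            # The aspect itself is serialized as a JSON string inside the body.
            "value": "{\"timestampMillis\": 1654273217172, \"rowCount\": 20412, \"columnCount\": 3}",
            "contentType": "application/json",
        },
    }
}


def decode_aspect(body):
    """Return the nested aspect payload as a dict (hypothetical helper)."""
    return json.loads(body["proposal"]["aspect"]["value"])


profile = decode_aspect(body)
print(json.dumps(profile, indent=4))
```

Printing the decoded dict with `indent=4` gives the same kind of pretty-printed view as the block above.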
GMS pod logs
16:07:49.723 [qtp1830908236-16] INFO  c.l.m.r.entity.AspectResource:76 - GET ASPECT urn: urn:li:telemetry:clientId aspect: telemetryClientId version: 0
16:07:49.727 [pool-10-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /aspects/urn%3Ali%3Atelemetry%3AclientId?aspect=telemetryClientId&version=0 - get - 200 - 4ms
16:07:57.769 [qtp1830908236-198] INFO  c.l.m.r.entity.AspectResource:125 - INGEST PROPOSAL proposal: {aspectName=containerProperties, systemMetadata={lastObserved=1654272477712, runId=postgres-2022_06_03-09_07_49}, entityUrn=urn:li:container:0202f800c992262c01ae6bbd5ee313f7, entityType=container, aspect={contentType=application/json, value=ByteString(length=110,bytes=7b226375...6573227d)}, changeType=UPSERT}
16:07:57.781 [pool-10-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - POST /aspects?action=ingestProposal - ingestProposal - 200 - 12ms
16:07:57.895 [qtp1830908236-198] INFO  c.l.m.r.entity.AspectResource:125 - INGEST PROPOSAL proposal: {aspectName=dataPlatformInstance, systemMetadata={lastObserved=1654272477714, runId=postgres-2022_06_03-09_07_49}, entityUrn=urn:li:container:0202f800c992262c01ae6bbd5ee313f7, entityType=container, aspect={contentType=application/json, value=ByteString(length=44,bytes=7b22706c...6573227d)}, changeType=UPSERT}
16:07:57.906 [pool-10-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - POST /aspects?action=ingestProposal - ingestProposal - 200 - 11ms
16:07:58.012 [qtp1830908236-198] INFO  c.l.m.r.entity.AspectResource:125 - INGEST PROPOSAL proposal: {aspectName=subTypes, systemMetadata={lastObserved=1654272477714, runId=postgres-2022_06_03-09_07_49}, entityUrn=urn:li:container:0202f800c992262c01ae6bbd5ee313f7, entityType=container, aspect={contentType=application/json, value=ByteString(length=27,bytes=7b227479...65225d7d)}, changeType=UPSERT}
16:07:58.023 [pool-10-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - POST /aspects?action=ingestProposal - ingestProposal - 200 - 11ms
16:07:58.129 [qtp1830908236-15] INFO  c.l.m.r.entity.AspectResource:125 - INGEST PROPOSAL proposal: {aspectName=containerProperties, systemMetadata={lastObserved=1654272477904, runId=postgres-2022_06_03-09_07_49}, entityUrn=urn:li:container:a208486b83be39fa411922e07701d984, entityType=container, aspect={contentType=application/json, value=ByteString(length=128,bytes=7b226375...6963227d)}, changeType=UPSERT}
16:07:58.141 [pool-10-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - POST /aspects?action=ingestProposal - ingestProposal - 200 - 12ms
16:07:58.246 [qtp1830908236-198] INFO  c.l.m.r.entity.AspectResource:125 - INGEST PROPOSAL proposal: {aspectName=dataPlatformInstance, systemMetadata={lastObserved=1654272477904, runId=postgres-2022_06_03-09_07_49}, entityUrn=urn:li:container:a208486b83be39fa411922e07701d984, entityType=container, aspect={contentType=application/json, value=ByteString(length=44,bytes=7b22706c...6573227d)}, changeType=UPSERT}
16:07:58.255 [pool-10-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - POST /aspects?action=ingestProposal - ingestProposal - 200 - 9ms
16:07:58.364 [qtp1830908236-15] INFO  c.l.m.r.entity.AspectResource:125 - INGEST PROPOSAL proposal: {aspectName=subTypes, systemMetadata={lastObserved=1654272477904, runId=postgres-2022_06_03-09_07_49}, entityUrn=urn:li:container:a208486b83be39fa411922e07701d984, entityType=container, aspect={contentType=application/json, value=ByteString(length=25,bytes=7b227479...61225d7d)}, changeType=UPSERT}
16:07:58.375 [pool-10-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - POST /aspects?action=ingestProposal - ingestProposal - 200 - 11ms
16:07:58.488 [qtp1830908236-15] INFO  c.l.m.r.entity.AspectResource:125 - INGEST PROPOSAL proposal: {aspectName=container, systemMetadata={lastObserved=1654272477905, runId=postgres-2022_06_03-09_07_49}, entityUrn=urn:li:container:a208486b83be39fa411922e07701d984, entityType=container, aspect={contentType=application/json, value=ByteString(length=66,bytes=7b22636f...6637227d)}, changeType=UPSERT}
16:07:58.509 [pool-10-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - POST /aspects?action=ingestProposal - ingestProposal - 200 - 21ms
16:07:58.669 [pool-4-thread-1] WARN  org.elasticsearch.client.RestClient:65 - request [POST http://elasticsearch-master:9200/datahubpolicyindex_v2/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true] returned 2 warnings: [299 Elasticsearch-7.16.2-2b937c44140b6559905130a8650c64dbd0879cfb "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.16/security-minimal-setup.html to enable security."],[299 Elasticsearch-7.16.2-2b937c44140b6559905130a8650c64dbd0879cfb "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."]
16:07:58.899 [qtp1830908236-198] INFO  c.l.m.r.entity.AspectResource:125 - INGEST PROPOSAL proposal: {aspectName=container, systemMetadata={lastObserved=1654272478849, runId=postgres-2022_06_03-09_07_49}, entityUrn=urn:li:dataset:(urn:li:dataPlatform:postgres,postgres.public.accounts_master,PROD), entityType=dataset, aspect={contentType=application/json, value=ByteString(length=66,bytes=7b22636f...3834227d)}, changeType=UPSERT}
16:07:58.919 [pool-10-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - POST /aspects?action=ingestProposal - ingestProposal - 200 - 20ms
16:07:59.032 [qtp1830908236-198] INFO  c.l.metadata.entity.EntityService:1115 - INGEST urn urn:li:dataset:(urn:li:dataPlatform:postgres,postgres.public.accounts_master,PROD) with system metadata {lastObserved=1654272478850, runId=postgres-2022_06_03-09_07_49}
16:07:59.044 [pool-10-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - POST /entities?action=ingest - ingest - 200 - 12ms
16:07:59.153 [qtp1830908236-15] INFO  c.l.m.r.entity.AspectResource:125 - INGEST PROPOSAL proposal: {aspectName=subTypes, systemMetadata={lastObserved=1654272478850, runId=postgres-2022_06_03-09_07_49}, entityUrn=urn:li:dataset:(urn:li:dataPlatform:postgres,postgres.public.accounts_master,PROD), entityType=dataset, aspect={contentType=application/json, value=ByteString(length=24,bytes=7b227479...65225d7d)}, changeType=UPSERT}
16:07:59.172 [pool-10-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - POST /aspects?action=ingestProposal - ingestProposal - 200 - 19ms
16:07:59.772 [qtp1830908236-198] INFO  c.l.m.r.entity.AspectResource:125 - INGEST PROPOSAL proposal: {aspectName=container, systemMetadata={lastObserved=1654272479700, runId=postgres-2022_06_03-09_07_49}, entityUrn=urn:li:dataset:(urn:li:dataPlatform:postgres,postgres.public.datahub_sample_profile_data,PROD), entityType=dataset, aspect={contentType=application/json, value=ByteString(length=66,bytes=7b22636f...3834227d)}, changeType=UPSERT}
16:07:59.792 [pool-10-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - POST /aspects?action=ingestProposal - ingestProposal - 200 - 20ms
16:07:59.908 [qtp1830908236-198] INFO  c.l.metadata.entity.EntityService:1115 - INGEST urn urn:li:dataset:(urn:li:dataPlatform:postgres,postgres.public.datahub_sample_profile_data,PROD) with system metadata {lastObserved=1654272479700, runId=postgres-2022_06_03-09_07_49}
16:07:59.918 [pool-10-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - POST /entities?action=ingest - ingest - 200 - 11ms
16:08:00.379 [qtp1830908236-198] INFO  c.l.m.r.entity.AspectResource:125 - INGEST PROPOSAL proposal: {aspectName=subTypes, systemMetadata={lastObserved=1654272479701, runId=postgres-2022_06_03-09_07_49}, entityUrn=urn:li:dataset:(urn:li:dataPlatform:postgres,postgres.public.datahub_sample_profile_data,PROD), entityType=dataset, aspect={contentType=application/json, value=ByteString(length=24,bytes=7b227479...65225d7d)}, changeType=UPSERT}
16:08:00.400 [pool-10-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - POST /aspects?action=ingestProposal - ingestProposal - 200 - 21ms
16:08:02.740 [qtp1830908236-15] INFO  c.l.m.r.entity.AspectResource:125 - INGEST PROPOSAL proposal: {aspectName=datasetProfile, systemMetadata={lastObserved=1654272482685, runId=postgres-2022_06_03-09_07_49}, entityUrn=urn:li:dataset:(urn:li:dataPlatform:postgres,postgres.public.accounts_master,PROD), entityType=dataset, aspect={contentType=application/json, value=ByteString(length=565,bytes=7b227469...5d7d5d7d)}, changeType=UPSERT}
16:08:02.755 [pool-10-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - POST /aspects?action=ingestProposal - ingestProposal - 200 - 15ms
16:08:03.092 [qtp1830908236-198] INFO  c.l.m.r.entity.AspectResource:125 - INGEST PROPOSAL proposal: {aspectName=datasetProfile, systemMetadata={lastObserved=1654272483025, runId=postgres-2022_06_03-09_07_49}, entityUrn=urn:li:dataset:(urn:li:dataPlatform:postgres,postgres.public.datahub_sample_profile_data,PROD), entityType=dataset, aspect={contentType=application/json, value=ByteString(length=991,bytes=7b227469...5d7d5d7d)}, changeType=UPSERT}
16:08:03.108 [pool-10-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - POST /aspects?action=ingestProposal - ingestProposal - 200 - 16ms
16:08:03.214 [I/O dispatcher 1] WARN  org.elasticsearch.client.RestClient:65 - request [POST http://elasticsearch-master:9200/_bulk?timeout=1m] returned 1 warnings: [299 Elasticsearch-7.16.2-2b937c44140b6559905130a8650c64dbd0879cfb "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.16/security-minimal-setup.html to enable security."]
16:08:03.214 [I/O dispatcher 1] INFO  c.l.m.s.e.update.BulkListener:28 - Successfully fed bulk request. Number of events: 2 Took time ms: -1
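To see at a glance which dataset URNs actually received a datasetProfile proposal, the INGEST PROPOSAL lines in the GMS log can be filtered. A rough Python sketch, assuming the log format shown above; the regex and the `profiled_urns` helper are my own illustration and not a DataHub tool:

```python
import re

# Three INGEST PROPOSAL lines copied from the GMS log above.
LOG_LINES = [
    "16:08:00.379 [qtp1830908236-198] INFO  c.l.m.r.entity.AspectResource:125 - INGEST PROPOSAL proposal: {aspectName=subTypes, systemMetadata={lastObserved=1654272479701, runId=postgres-2022_06_03-09_07_49}, entityUrn=urn:li:dataset:(urn:li:dataPlatform:postgres,postgres.public.datahub_sample_profile_data,PROD), entityType=dataset, aspect={contentType=application/json, value=ByteString(length=24,bytes=7b227479...65225d7d)}, changeType=UPSERT}",
    "16:08:02.740 [qtp1830908236-15] INFO  c.l.m.r.entity.AspectResource:125 - INGEST PROPOSAL proposal: {aspectName=datasetProfile, systemMetadata={lastObserved=1654272482685, runId=postgres-2022_06_03-09_07_49}, entityUrn=urn:li:dataset:(urn:li:dataPlatform:postgres,postgres.public.accounts_master,PROD), entityType=dataset, aspect={contentType=application/json, value=ByteString(length=565,bytes=7b227469...5d7d5d7d)}, changeType=UPSERT}",
    "16:08:03.092 [qtp1830908236-198] INFO  c.l.m.r.entity.AspectResource:125 - INGEST PROPOSAL proposal: {aspectName=datasetProfile, systemMetadata={lastObserved=1654272483025, runId=postgres-2022_06_03-09_07_49}, entityUrn=urn:li:dataset:(urn:li:dataPlatform:postgres,postgres.public.datahub_sample_profile_data,PROD), entityType=dataset, aspect={contentType=application/json, value=ByteString(length=991,bytes=7b227469...5d7d5d7d)}, changeType=UPSERT}",
]

# Capture the aspect name and the entity URN from an INGEST PROPOSAL line.
# Assumes entityUrn is always followed by ", entityType=" as in the log above.
PROPOSAL_RE = re.compile(
    r"INGEST PROPOSAL proposal: \{aspectName=(?P<aspect>\w+), "
    r".*?entityUrn=(?P<urn>.+?), entityType="
)


def profiled_urns(lines, aspect="datasetProfile"):
    """Return the URNs of all proposals carrying the given aspect (hypothetical helper)."""
    urns = []
    for line in lines:
        match = PROPOSAL_RE.search(line)
        if match and match.group("aspect") == aspect:
            urns.append(match.group("urn"))
    return urns


for urn in profiled_urns(LOG_LINES):
    print(urn)
```

In this log excerpt both datasetProfile proposals target URNs ending in PROD, while the curl request above targets a URN ending in DEV, and the two captures carry different runId values, so they come from different runs.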