agreeable-hamburger-38305
10/05/2021, 11:11 PM

agreeable-hamburger-38305
10/06/2021, 5:39 AM

boundless-room-44377
10/06/2021, 8:45 PM

from typing import List

from datahub.metadata.schema_classes import (
    DatasetLineageTypeClass,
    DatasetSnapshotClass,
    MetadataChangeEventClass,
    UpstreamClass,
    UpstreamLineageClass,
)

def make_lineage_mce(
    upstream_urns: List[str],
    downstream_urn: str,
    lineage_type: str = DatasetLineageTypeClass.TRANSFORMED,
) -> MetadataChangeEventClass:
    mce = MetadataChangeEventClass(
        proposedSnapshot=DatasetSnapshotClass(
            urn=downstream_urn,
            aspects=[
                UpstreamLineageClass(
                    upstreams=[
                        UpstreamClass(
                            dataset=upstream_urn,
                            type=lineage_type,
                        )
                        for upstream_urn in upstream_urns
                    ]
                )
            ],
        )
    )
    return mce
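For reference, the upstream_urns and downstream_urn arguments above are DataHub dataset URN strings. A minimal sketch of how such URNs are constructed (the project/table names below are placeholders; the standard helper for this is datahub.emitter.mce_builder.make_dataset_urn):

```python
def make_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a DataHub dataset URN of the standard form
    urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"

# Placeholder table names, for illustration only.
upstream_urns = [
    make_dataset_urn("bigquery", "myproject.source.events"),
    make_dataset_urn("bigquery", "myproject.source.users"),
]
downstream_urn = make_dataset_urn("bigquery", "myproject.derived.daily_summary")
print(downstream_urn)
# -> urn:li:dataset:(urn:li:dataPlatform:bigquery,myproject.derived.daily_summary,PROD)
```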
it looks like a DatasetSnapshotClass is required when defining the downstream URN, while I would like to be able to set that as an ML Model or even a FeatureTable.

rough-eye-60206
10/06/2021, 9:16 PM

better-orange-49102
10/07/2021, 7:20 AM

/prod/source/dataset_name, I'm just wondering if there are any implications of renaming it such that it becomes /prod/source/x, where x is a short string that is not necessarily unique. I don't see any issues navigating to the datasets if I were to use the same value for x for all datasets.

agreeable-hamburger-38305
10/07/2021, 5:43 PM

source:
  type: bigquery-usage
  config:
    env: "DEV"
    projects:
      - <project-id>
    top_n_queries: 10
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
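As a sanity check before sending anything to the server, the same source can be pointed at DataHub's file sink instead of datahub-rest (a sketch; the filename is an arbitrary placeholder):

```yaml
source:
  type: bigquery-usage
  config:
    env: "DEV"
    projects:
      - <project-id>
    top_n_queries: 10
sink:
  type: "file"
  config:
    filename: "./bigquery_usage_mces.json"
```

This writes the emitted metadata events to a local JSON file so they can be inspected before ingesting for real.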
gentle-father-80172
10/07/2021, 10:40 PM

Looker metadata ingestion has a nice set of permissions so that I can query the API. However, LookML ingestion via API requires Admin-level access. Is there a specific reason for this? Trying to get Admin access from my boss, so any extra info would be useful! Thanks! 😄

calm-sunset-28996
10/08/2021, 12:35 PM

clean-piano-28976
10/08/2021, 2:23 PM

Looker and LookML)?

numerous-yak-58823
10/08/2021, 7:43 PM

mammoth-lawyer-49919
10/11/2021, 3:00 AM

bland-orange-13353
10/11/2021, 7:56 AM

witty-keyboard-20400
10/11/2021, 12:57 PM

OperationFailure: BSONObj size: 17365481 (0x108F9E9) is invalid. Size must be between 0 and 16793600(16MB)
Is there any configuration which lets DataHub skip documents with larger field values?

agreeable-hamburger-38305
10/11/2021, 8:20 PM

fresh-fish-73471
10/11/2021, 9:09 PM

rough-eye-60206
10/11/2021, 10:15 PM

witty-keyboard-20400
10/12/2021, 5:33 AM

nice-planet-17111
10/12/2021, 6:21 AM

CloudSQL (on GCP) -> datahub? If it is, is there any quickstart guide or docs? Thanks in advance 🙂

bumpy-activity-74405
10/12/2021, 11:41 AM

n last versions of aspects? MySQL is growing rapidly in size and I don't really have any use for old aspects.

nice-planet-17111
10/12/2021, 12:35 PM

boundless-room-44377
10/12/2021, 6:16 PM

agreeable-hamburger-38305
10/12/2021, 8:21 PM

Min, Max, Mean, Median and Standard Deviation, while all the others just show “unknown”. Null and Distinct stats and sample values are all working fine. Anyone know what might be causing this? The columns with missing stats have a high % of null (99.9x%), but there are still some valid values in there.

rapid-piano-43271
10/13/2021, 2:59 AM

cuddly-family-62352
10/13/2021, 8:12 AM

melodic-helmet-78607
10/13/2021, 8:55 AM

melodic-helmet-78607
10/13/2021, 8:59 AM

rough-eye-60206
10/13/2021, 4:48 PM

witty-keyboard-20400
10/14/2021, 6:38 AM

OperationFailure: not authorized on db_kg to execute command { aggregate: "system.views", pipeline: [ { $sample: { size: 100 } } ], allowDiskUse: true, cursor: {}, lsid: { id: UUID("305f9d4f-fd8b-4fbd-8cf6-9257c4399403") }, $clusterTime: { clusterTime: Timestamp(1634193294, 4), signature: { hash: BinData(0, 9B140107B447AC1BFBE704B411400CF7EEF4E04D), keyId: 7012684930027094017 } }, $db: "db_kg", $readPreference: { mode: "primaryPreferred" } }, full error: {'operationTime': Timestamp(1634193295, 1), 'ok': 0.0, 'errmsg': 'not authorized on db_kg to execute command { aggregate: "system.views", pipeline: [ { $sample: { size: 100 } } ], allowDiskUse: true, cursor: {}, lsid: { id: UUID("305f9d4f-fd8b-4fbd-8cf6-9257c4399403") }, $clusterTime: { clusterTime: Timestamp(1634193294, 4), signature: { hash: BinData(0, 9B140107B447AC1BFBE704B411400CF7EEF4E04D), keyId: 7012684930027094017 } }, $db: "db_kg", $readPreference: { mode: "primaryPreferred" } }', 'code': 13, 'codeName': 'Unauthorized', '$clusterTime': {'clusterTime': Timestamp(1634193295, 1), 'signature': {'hash': b'\x8f\x98\x0b\x97l\xbd\xab\x96\xcc\x91\x14QQ7\xc8)d\xd7W"', 'keyId': 7012684930027094017}}}
The sample size is just 100: schemaSamplingSize: 100
Why does the ingestion's schema inference need to execute aggregate: "system.views"?

witty-keyboard-20400
10/14/2021, 11:50 AM

collection_pattern.deny is mentioned in the config section and its sample values.

crooked-wolf-53758
10/14/2021, 4:26 PM