brief-ability-41819
10/07/2022, 1:00 PM
source:
  type: dbt
  config:
    aws_connection:
      aws_region: us-east-1
      aws_role: 'arn:aws:iam::************:role/DataHub-role'
    target_platform: s3
    manifest_path:
      - 's3://bucket1/manifest.json'
      - 's3://bucket2/manifest.json'
    test_results_path:
      - 's3://bucket1/run_results.json'
      - 's3://bucket2/run_results.json'
    sources_path:
      - 's3://bucket1/sources.json'
      - 's3://bucket2/sources.json'
    catalog_path:
      - 's3://bucket1/catalog.json'
      - 's3://bucket2/catalog.json'
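The recipe above passes a list of S3 URIs for each `*_path` key. The validation error this user goes on to report (`str type expected` for `manifest_path`) suggests that, in their version, the dbt source accepts only a single string per path field. That would match the observation that one bucket works. Below is a minimal sketch of one possible workaround, assuming it is acceptable to run one recipe per bucket: split the list-valued config into single-string configs, one per bucket. `split_recipe_per_bucket` is a hypothetical helper, and the bucket URIs are the placeholders from the recipe.

```python
import copy
from typing import Any, Dict, List

# Path keys from the recipe above that may hold a list of URIs.
PATH_KEYS = ("manifest_path", "test_results_path", "sources_path", "catalog_path")


def split_recipe_per_bucket(recipe: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Hypothetical helper: turn list-valued *_path keys into one recipe per index.

    Assumes every list-valued path key has the same length and that the
    i-th entry of each list belongs to the same bucket.
    """
    config = recipe["source"]["config"]
    list_keys = [k for k in PATH_KEYS if isinstance(config.get(k), list)]
    if not list_keys:
        return [recipe]  # already single-valued, nothing to split
    lengths = {len(config[k]) for k in list_keys}
    if len(lengths) != 1:
        raise ValueError(f"path lists have different lengths: {lengths}")
    recipes = []
    for i in range(lengths.pop()):
        single = copy.deepcopy(recipe)  # leave the original recipe untouched
        for k in list_keys:
            single["source"]["config"][k] = config[k][i]  # one string per key
        recipes.append(single)
    return recipes


recipe = {
    "source": {
        "type": "dbt",
        "config": {
            "aws_connection": {"aws_region": "us-east-1"},
            "target_platform": "s3",
            "manifest_path": ["s3://bucket1/manifest.json", "s3://bucket2/manifest.json"],
            "test_results_path": ["s3://bucket1/run_results.json", "s3://bucket2/run_results.json"],
            "sources_path": ["s3://bucket1/sources.json", "s3://bucket2/sources.json"],
            "catalog_path": ["s3://bucket1/catalog.json", "s3://bucket2/catalog.json"],
        },
    }
}

for r in split_recipe_per_bucket(recipe):
    print(r["source"]["config"]["manifest_path"])
```

Each resulting dict could then be saved as its own recipe file and ingested separately.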
Whenever I try to create a list of them (in [] brackets), ingestion stops working and throws:
Failed to configure source (dbt) due to
4 validation errors for DBTConfig
manifest_path
  str type expected (type=type_error.str)
But it works perfectly fine with only one bucket configured. Am I missing something in my recipe?
alert-fall-82501
10/07/2022, 1:51 PM
owners: List[OwnerClass] = self.maybe_extract_owners(
File "/home/kiranto@cybage.com/.local/lib/python3.8/site-packages/datahub/ingestion/source/csv_enricher.py", line 594, in maybe_extract_owners
row["ownership_type"] if row["ownership_type"] else OwnershipTypeClass.NONE
KeyError: 'ownership_type'
[2022-10-07 19:05:26,767] ERROR {datahub.entrypoints:195} - Command failed:
'ownership_type'.
Run with --debug to get full stacktrace.
e.g. 'datahub --debug ingest -c csv.yaml'
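The traceback above shows `csv_enricher.py` reading `row["ownership_type"]` directly, so the `KeyError` points to the CSV file having no `ownership_type` column in its header. Below is a minimal preflight check, using only the standard library, that reports missing header columns before running `datahub ingest`. `missing_columns` is a hypothetical name, and only `ownership_type` comes from the traceback; any other columns your csv-enricher version expects would need to be added to `required`.

```python
import csv
import io
from typing import Iterable, List, TextIO


def missing_columns(csv_file: TextIO, required: Iterable[str] = ("ownership_type",)) -> List[str]:
    """Hypothetical preflight check: return required headers absent from the CSV."""
    reader = csv.DictReader(csv_file)
    header = set(reader.fieldnames or [])  # fieldnames is None for an empty file
    return [col for col in required if col not in header]


# Example: a file with an 'owners' column but no 'ownership_type' column,
# which would make row["ownership_type"] raise KeyError during ingestion.
sample = io.StringIO(
    "resource,owners,description\n"
    '"urn:li:dataset:(urn:li:dataPlatform:hive,db.table,PROD)",urn:li:corpuser:jdoe,Example\n'
)
print(missing_columns(sample))  # ['ownership_type']
```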
alert-fall-82501
10/07/2022, 1:51 PM
alert-fall-82501
10/07/2022, 1:53 PM
adamant-furniture-37835
10/07/2022, 6:09 PM
transformers:
  - type: "simple_add_dataset_tags"
    config:
      tag_urns:
        - "urn:li:tag:TestTagToBeAppliedAutomatically"
      replace_existing: false
      semantics: PATCH
  - type: "simple_add_dataset_ownership"
    config:
      semantics: PATCH
      owner_urns:
        - "urn:li:corpuser:USER_ID"
      ownership_type: "PRODUCER"
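Both transformers above use `semantics: PATCH`. According to DataHub's transformer documentation, `OVERWRITE` (the default) replaces the aspect stored on the server, while `PATCH` merges the transformer's values with the aspect already in DataHub. The sketch below is only a simplified model of that difference for an ownership aspect, with owners represented as plain dicts. It is not DataHub's implementation, and `apply_semantics` and `existing_user` are hypothetical names.

```python
from typing import Dict, List

Owner = Dict[str, str]  # e.g. {"owner": "urn:li:corpuser:...", "type": "PRODUCER"}


def apply_semantics(server: List[Owner], incoming: List[Owner], semantics: str) -> List[Owner]:
    """Simplified model of OVERWRITE vs PATCH for an owners list (not DataHub code)."""
    if semantics == "OVERWRITE":
        return list(incoming)  # owners already on the server are discarded
    if semantics == "PATCH":
        merged = {o["owner"]: o for o in server}  # keep existing owners
        for o in incoming:
            merged[o["owner"]] = o  # add new owners, refresh ones already present
        return list(merged.values())
    raise ValueError(f"unknown semantics: {semantics}")


server_owners = [{"owner": "urn:li:corpuser:existing_user", "type": "DATAOWNER"}]
new_owners = [{"owner": "urn:li:corpuser:USER_ID", "type": "PRODUCER"}]

print([o["owner"] for o in apply_semantics(server_owners, new_owners, "PATCH")])
# ['urn:li:corpuser:existing_user', 'urn:li:corpuser:USER_ID']
print([o["owner"] for o in apply_semantics(server_owners, new_owners, "OVERWRITE")])
# ['urn:li:corpuser:USER_ID']
```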
The tag and owner (USER_ID) mentioned above already exist in datahub-gms before we run the ingestion (the tag's availability doesn't affect the result).
The token used in the recipe is a personal access token created by a user with admin access to DataHub.
The DataHub CLI and server are both on v0.8.45 (elasticsearch-setup-job is on v0.8.44, since v0.8.45 seems to have a bug).
Please let us know if we are missing something here.
many-rainbow-50695
10/09/2022, 6:24 AM
breezy-camera-11182
10/10/2022, 2:48 AM
limited-cricket-18852
10/10/2022, 6:02 PM
appName on the SparkSession, I keep getting Databricks Shell. Has anyone had more success?
limited-forest-73733
10/10/2022, 2:12 PM
future-hair-23690
10/11/2022, 5:17 AM
source:
  type: mssql
  config:
    password: ---------
    database: sandbox_validation
    host_port: 'az-uk-mssql-accept-01.logex.cloud:1433'
    username: ------
    use_odbc: 'true'
    uri_args:
      driver: 'ODBC Driver 17 for SQL Server'
      Encrypt: 'Yes'
      TrustServerCertificate: 'Yes'
      ssl: 'True'
    env: STG
    profiling:
      enabled: true
      limit: 10000
      report_dropped_profiles: false
      profile_table_level_only: false
      include_field_null_count: true
      include_field_min_value: true
      include_field_max_value: true
      include_field_mean_value: true
      include_field_median_value: true
      include_field_stddev_value: true
      include_field_quantiles: true
      include_field_distinct_value_frequencies: true
      include_field_sample_values: true
      turn_off_expensive_profiling_metrics: false
      include_field_histogram: true
      catch_exceptions: false
      max_workers: 4
      query_combiner_enabled: true
      max_number_of_fields_to_profile: 100
      profile_if_updated_since_days: null
      partition_profiling_enabled: false
    schema_pattern:
      deny:
        - DS\\oleksii
        - ds*
        - Logex*
      allow:
        - dbo.*
        - dbo
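The `schema_pattern` entries above read like shell globs (`ds*`, `Logex*`), but DataHub's allow/deny patterns are regular expressions. Assuming they are evaluated roughly like Python's `re.match` (anchored at the start, case-insensitive by default, deny checked before allow), `ds*` means "a `d` followed by zero or more `s`", which also matches `dbo`. If the schema name is matched on its own, the deny list would then filter out the very schema the allow list is meant to keep. The `allowed` function below is a simplified model written for illustration, not DataHub's `AllowDenyPattern` class, and `ds_staging` is a hypothetical schema name.

```python
import re
from typing import List


def allowed(name: str, allow: List[str], deny: List[str], ignore_case: bool = True) -> bool:
    """Simplified allow/deny model: deny wins, patterns matched with re.match."""
    flags = re.IGNORECASE if ignore_case else 0
    if any(re.match(p, name, flags) for p in deny):
        return False
    return any(re.match(p, name, flags) for p in allow)


allow = ["dbo.*", "dbo"]
deny_as_written = [r"DS\\oleksii", "ds*", "Logex*"]
deny_anchored = [r"DS\\oleksii", "^ds.*", "^logex.*"]  # hypothetical glob-style intent

print(allowed("dbo", allow, deny_as_written))       # False: 'ds*' matches the leading 'd'
print(allowed("dbo", allow, deny_anchored))         # True
print(allowed("ds_staging", allow, deny_anchored))  # False
```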
cheers!
little-spring-72943
10/11/2022, 8:30 AM
damp-ambulance-34232
10/11/2022, 9:59 AM
famous-florist-7218
10/11/2022, 10:02 AM
alert-fall-82501
10/11/2022, 10:18 AM
mammoth-apple-56011
10/11/2022, 10:56 AM
ripe-tailor-61058
10/11/2022, 4:56 PM
ripe-tailor-61058
10/11/2022, 4:58 PM
ripe-tailor-61058
10/11/2022, 5:00 PM
salmon-jackal-36326
10/11/2022, 6:51 PM
'[2022-10-11 18:20:04,294] ERROR {datahub.ingestion.source.ge_data_profiler:934} - Encountered exception while profiling ', ["Profiling exception \'partial_unexpected_list\'"]},\n' "KeyError: 'partial_unexpected_list'\n"
'[2022-10-11 18:20:26,841] ERROR {datahub.ingestion.source.ge_data_profiler:315} - Failed to get unique count for column '
Some of my tables have columns with spaces in their names and no primary key; I don't know if that is relevant. This is my first time running it in Docker on an EC2 instance, so I'm not sure of the best practices.
DATAHUB VERSION: v0.8.45
source:
  type: snowflake
  config:
    include_table_lineage: true
    password: '${SNOWFLAKE_PASSWORD}'
    account_id: ACCOUNT
    role: accountadmin
    profiling:
      enabled: true
    include_view_lineage: true
    warehouse: DEV
    stateful_ingestion:
      enabled: true
    schema_pattern:
      deny:
        - '.*DEV'
        - '.*INFORMATION_SCHEMA'
        - '.*PUBLIC'
    database_pattern:
      allow:
        - ^DATABASE_A$
        - ^DATABASE_B$
        - ^DATABASE_C$
        - ^DATABASE_D$
        - ^DATABASE_E$
        - ^DATABASE_F$
        - ^DATABASE_G$
        - ^DATABASE_H$
    username: '${SNOWFLAKE_USER}'
pipeline_name: 'urn:li:dataHubIngestionSource:5a8d58a3-dc4e-43b4-a59e-05c6ef9e0bce'
As far as I can see, it's the same problem as here, and I tested using the params:
https://www.linen.dev/s/datahubspace/t/439789/hi-all-i-enable-profiling-but-got-an-error-called-partial-un
cool-translator-98249
10/11/2022, 7:15 PM
narrow-toddler-80534
10/12/2022, 7:35 AM
billowy-pager-44683
10/12/2022, 9:35 AM
careful-action-61962
10/12/2022, 10:14 AM
colossal-hairdresser-6799
10/12/2022, 12:20 PM
delightful-barista-90363
10/12/2022, 3:28 PM
brainy-crayon-53549
10/12/2022, 4:19 PM
rich-state-73859
10/12/2022, 6:12 PM
test-group. Does anyone have a suggestion on how I can resolve this?
Invalid urn format for aspect: {owners=[{owner=urn:li:corpgroup:test-group, type=PRODUCER, source={type=MANUAL}}], lastModified={actor=urn:li:corpuser:datahub, time=1665598238050}} for entity: urn:li:dataset:(urn:li:dataPlatform:athena,<table name here>,DEV)
Cause: ERROR :: /owners/0/owner :: \"Provided urn urn:li:corpgroup:test-group\" is invalid: Entity type for urn: urn:li:corpgroup:test-group is not a valid destination for field path: /owners/*/owner
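The rejected URN spells the entity type `corpgroup` in lowercase. DataHub group URNs are written `urn:li:corpGroup:<name>` in camel case (the Python SDK's `make_group_urn` builds them that way), so the server likely does not recognize `corpgroup` as a valid owner entity type. Below is a minimal sketch that normalizes the entity-type casing of owner URNs before they go into a recipe or CSV. `normalize_owner_urn` is a hypothetical helper, and it handles only the two owner types that appear in this thread.

```python
# Canonical entity-type spellings for owner URNs
# (corpuser is all lowercase, corpGroup is camel case).
CANONICAL_OWNER_TYPES = {"corpuser": "corpuser", "corpgroup": "corpGroup"}


def normalize_owner_urn(urn: str) -> str:
    """Hypothetical helper: fix the casing of the entity type in an owner URN."""
    parts = urn.split(":", 3)  # ['urn', 'li', '<entityType>', '<id>']
    if len(parts) != 4 or parts[0] != "urn" or parts[1] != "li":
        raise ValueError(f"not an owner URN: {urn!r}")
    entity_type = CANONICAL_OWNER_TYPES.get(parts[2].lower())
    if entity_type is None:
        raise ValueError(f"unsupported owner entity type: {parts[2]!r}")
    return f"urn:li:{entity_type}:{parts[3]}"


print(normalize_owner_urn("urn:li:corpgroup:test-group"))  # urn:li:corpGroup:test-group
```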
brainy-table-99728
10/12/2022, 6:27 PM
quiet-wolf-56299
10/12/2022, 8:23 PM
quiet-wolf-56299
10/12/2022, 9:03 PM