bland-orange-13353
05/31/2023, 6:53 AM
fierce-agent-11572
05/31/2023, 8:46 AM
proud-school-44110
05/31/2023, 9:19 AM
billions-rose-75566
05/31/2023, 12:44 PM
powerful-tent-14193
05/31/2023, 1:32 PM
powerful-tent-14193
05/31/2023, 1:53 PM
late-arm-1146
06/01/2023, 9:41 AM
csv-enricher with v0.8.45. I am unable to get an existing 'domain' added to a dataset. I have checked the URN for the domain created through the UI, and the CSV ingestion does not throw an error either. Is there anything I might be missing?
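For cross-checking, a minimal csv-enricher sketch, assuming the CSV layout from the current docs; the file name and URNs are placeholders, and the domain column may not be supported in a release as old as v0.8.45:

source:
  type: csv-enricher
  config:
    filename: ./enrich.csv     # placeholder path to the enrichment CSV
    write_semantics: PATCH     # merge into existing aspects rather than overwrite
sink:
  type: datahub-rest
  config:
    server: http://localhost:8080

# enrich.csv pairs a resource URN with a domain URN, e.g. (header plus one row):
# resource,subresource,glossary_terms,tags,owners,ownership_type,description,domain
# "urn:li:dataset:(urn:li:dataPlatform:postgres,mydb.public.users,PROD)",,,,,,,"urn:li:domain:marketing"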
strong-wall-16201
06/01/2023, 11:37 AM
silly-intern-25190
06/01/2023, 2:30 PM
nutritious-lifeguard-19727
06/01/2023, 6:16 PM
limited-train-99757
06/02/2023, 8:03 AM
limited-forest-73733
06/02/2023, 1:16 PM
little-spring-72943
06/03/2023, 2:15 AMERROR {datahub.entrypoints:199} - Command failed: failed to reach RUNNING, got State.STOPPED: current status: State.STOPPED
Do we know how can we increase wait time for warehouse cluster to be warmed up?shy-dog-84302
06/05/2023, 7:58 AM
creamy-ram-28134
06/05/2023, 1:56 PM
acoustic-quill-54426
06/05/2023, 2:24 PM
happy-branch-61686
06/05/2023, 2:59 PM
numerous-address-22061
06/05/2023, 5:47 PM
broad-yak-43537
06/06/2023, 10:05 AM
nutritious-megabyte-12020
06/06/2023, 1:37 PM
ERROR {datahub.entrypoints:199} - Command failed: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
Is it supported at all?
recipe:
source:
  type: "file"
  config:
    path: ./exampleimgs/
    file_extension: ".jpg"
sink:
  type: "datahub-rest"
  config:
    server: secret
    token: secret
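One hedged note on the error above: the file source parses DataHub metadata events serialized as JSON, not arbitrary binary files, so a .jpg failing utf-8 decoding is expected. A minimal sketch assuming a directory of metadata JSON files (the ./metadata/ path is a placeholder):

source:
  type: "file"
  config:
    path: ./metadata/          # directory of MCE/MCP JSON files, not images
    file_extension: ".json"    # each file is parsed as JSON metadata events
sink:
  type: "datahub-rest"
  config:
    server: secret
    token: secret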
elegant-salesmen-99143
06/06/2023, 3:15 PM
profile_table_level_only is set to true, but in the Stats tab for tables I see stats for each column. Is that correct behaviour?
I have the same thing on Presto and on Postgres. The DataHub version is 0.10.1.
profiling:
  enabled: true
  profile_table_level_only: true
  include_field_sample_values: false
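A hedged variant worth trying: profile_table_level_only alone should normally suppress column stats, so this is only a workaround sketch using the per-field toggles that the profiling config also exposes:

profiling:
  enabled: true
  profile_table_level_only: true
  # workaround sketch: also disable the per-field metrics explicitly
  include_field_null_count: false
  include_field_min_value: false
  include_field_max_value: false
  include_field_mean_value: false
  include_field_median_value: false
  include_field_sample_values: false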
wide-florist-83539
06/06/2023, 5:27 PM
few-sugar-84064
06/07/2023, 7:48 AM
- s3://test-datalake/user-event/etl_year={year}/etl_month={month}/etl_day={day}/etl_hour={hour}/{i}.parquet
- s3://test-datalake/platform-event/pn/aos/etl_year={year}/etl_month={month}/etl_day={day}/etl_hour={hour}/{i}.parquet
- s3://test-datalake/platform-event/pn/ios/etl_year={year}/etl_month={month}/etl_day={day}/etl_hour={hour}/{i}.parquet
• [recipe]
...
path_specs:
  - include: "s3://test-datalake/{table}/{partition_key[0]}={partition[0]}/{partition_key[1]}={partition[1]}/{partition_key[2]}={partition[2]}/{partition_key[3]}={partition[3]}/{partition_key[4]}={partition[4]}/*.parquet"
  - include: "s3://test-datalake/platform-event/pn/{table}/{partition_key[0]}={partition[0]}/{partition_key[1]}={partition[1]}/{partition_key[2]}={partition[2]}/{partition_key[3]}={partition[3]}/{partition_key[4]}={partition[4]}/*.parquet"
• [expectation]
  ◦ the first path: a dataset named "user-event" is created under the "test-datalake" folder
  ◦ the second path: two datasets named "aos" and "ios" are created under "test-datalake/platform-event/pn"
• [result on datahub]
  ◦ the first path: a dataset named "1.parquet" is created under "test-datalake/user-event/etl_year=2023/etl_month=1/etl_date=1/etl_hour=3"
  ◦ the second path: didn't create any dataset
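Two hedged observations on the specs above: the sample paths have four partition levels (etl_year through etl_hour) while both includes reference a fifth ({partition_key[4]}={partition[4]}), and, assuming specs are evaluated in order, the generic first spec can capture "platform-event" as {table} before the more specific spec is tried. A sketch under those assumptions, not a confirmed fix:

path_specs:
  # more specific spec first, and four partition levels to match the actual layout
  - include: "s3://test-datalake/platform-event/pn/{table}/{partition_key[0]}={partition[0]}/{partition_key[1]}={partition[1]}/{partition_key[2]}={partition[2]}/{partition_key[3]}={partition[3]}/*.parquet"
  - include: "s3://test-datalake/{table}/{partition_key[0]}={partition[0]}/{partition_key[1]}={partition[1]}/{partition_key[2]}={partition[2]}/{partition_key[3]}={partition[3]}/*.parquet"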
cold-father-66356
06/07/2023, 8:44 AM
proud-dusk-671
06/07/2023, 9:24 AM
prehistoric-kangaroo-75605
06/07/2023, 4:15 PM
[ODBC Driver 18 for SQL Server][SQL Server]Transaction (Process ID 207) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction. (1205)
We're seeing failures prior to that like:
2023-06-07 10:49:31,337 INFO sqlalchemy.engine.Engine BEGIN (implicit)
[2023-06-07 10:49:31,337] INFO {sqlalchemy.engine.Engine:1032} - BEGIN (implicit)
2023-06-07 10:49:31,338 INFO sqlalchemy.engine.Engine SELECT object_id(?, 'U')
[2023-06-07 10:49:31,338] INFO {sqlalchemy.engine.Engine:1858} - SELECT object_id(?, 'U')
2023-06-07 10:49:31,338 INFO sqlalchemy.engine.Engine [generated in 0.00016s] ('tempdb.dbo.[#ge_temp_133ecb0f]',)
[2023-06-07 10:49:31,338] INFO {sqlalchemy.engine.Engine:1863} - [generated in 0.00016s] ('tempdb.dbo.[#ge_temp_133ecb0f]',)
[2023-06-07 10:49:31,427] ERROR {datahub.utilities.sqlalchemy_query_combiner:257} - Failed to execute query normally, using fallback:
CREATE TABLE "#ge_temp_133ecb0f" (
condition INTEGER NOT NULL
)
Traceback (most recent call last):
  File "/Users/jerrythome/Library/Python/3.9/lib/python/site-packages/datahub/utilities/sqlalchemy_query_combiner.py", line 253, in _sa_execute_fake
    handled, result = self._handle_execute(conn, query, args, kwargs)
  File "/Users/jerrythome/Library/Python/3.9/lib/python/site-packages/datahub/utilities/sqlalchemy_query_combiner.py", line 218, in _handle_execute
    if not self.is_single_row_query_method(query):
  File "/Users/jerrythome/Library/Python/3.9/lib/python/site-packages/datahub/ingestion/source/ge_data_profiler.py", line 228, in _is_single_row_query_method
    column_names = [column.name for column in query_columns]
  File "/Users/jerrythome/Library/Python/3.9/lib/python/site-packages/datahub/ingestion/source/ge_data_profiler.py", line 228, in <listcomp>
    column_names = [column.name for column in query_columns]
AttributeError: 'CreateColumn' object has no attribute 'name'
2023-06-07 10:49:31,428 INFO sqlalchemy.engine.Engine
CREATE TABLE [#ge_temp_133ecb0f] (
condition INTEGER NOT NULL
)
===========
SETUP
===========
DataHub: Quickstart v0.10.3
Datasource: Azure SQL database
General setup:
source:
  type: mssql
  config:
    # Coordinates
    host_port: <myserver>.database.windows.net:1433
    database: <mydb>
    # Credentials
    username: <username>
    password: <password>
    # Options
    use_odbc: True
    uri_args:
      driver: "ODBC Driver 18 for SQL Server"
      Encrypt: "yes"
      TrustServerCertificate: "Yes"
      ssl: "True"
Has anyone experienced this? Is there an alternate connection setup for Azure SQL that's different from local SQL Server? I tried this with an admin user (vs. read-only) too, with the same results.
Thanks for any thoughts.
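Not a confirmed fix, but the traceback points into the profiler's query combiner, and the profiling config exposes knobs that change that code path; a hedged sketch (both options exist in the profiling config, their effect on this particular deadlock is an assumption):

profiling:
  enabled: true
  max_workers: 1                 # assumption: serializing profiler queries may avoid the tempdb deadlock
  query_combiner_enabled: false  # bypasses the sqlalchemy_query_combiner path shown in the traceback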
early-hydrogen-27542
06/07/2023, 8:00 PM
source:
  type: kafka
  config:
    connection:
      bootstrap: ${KAFKA_BOOTSTRAP}
      consumer_config:
        security.protocol: "SSL"
      schema_registry_url: ${KAFKA_SCHEMAURL}
    env: ${DATAHUB_ENV}
    stateful_ingestion:
      enabled: true
      remove_stale_metadata: true
pipeline_name: kafka_ingest
sink:
  type: "datahub-rest"
  config:
    server: ${DATAHUB_REST}
    retry_max_times: 10
We have a set of Avro schemas (.avsc) that are grouped in folders. For example, test_folder holds test_one.avsc and test_two.avsc. The above recipe only ingests test_folder as a topic. How would we also tell it to ingest test_one.avsc and test_two.avsc?
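A hedged note on the question above: the kafka source discovers topics from the broker and fetches schemas from the schema registry; it does not scan local .avsc files, so test_one and test_two will only appear if they exist as topics or are mapped to registry subjects. If the subjects don't follow the default topic-name strategy, one option, assuming the topic_subject_map option is available in your version, is to map topics to subjects explicitly; the subject names below are hypothetical:

source:
  type: kafka
  config:
    connection:
      bootstrap: ${KAFKA_BOOTSTRAP}
      schema_registry_url: ${KAFKA_SCHEMAURL}
    # hypothetical "<topic>-value" -> schema registry subject mappings
    topic_subject_map:
      "test_one-value": "test_folder.test_one"
      "test_two-value": "test_folder.test_two"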
freezing-sunset-28534
06/08/2023, 3:09 AM
orange-river-19475
06/08/2023, 3:25 AM
microscopic-room-90690
06/08/2023, 6:10 AM