# feedback-and-requests
b
Hi Team, I use Airbyte with a Kafka source to GCS - I cannot understand how to parse the message value from Kafka against a JSON schema? I set up the JSON schema via the API, but JSON validation fails:
2022-03-14 09:31:22 INFO i.a.v.j.JsonSchemaValidator(test):56 - JSON schema validation failed. 
errors: $: null found, object expected
2022-03-14 09:31:22 ERROR i.a.w.p.a.DefaultAirbyteStreamFactory(lambda$create$1):70 - Validation failed: null
2022-03-14 09:31:22 destination > 2022-03-14 09:31:22 INFO i.a.i.b.IntegrationRunner(runInternal):154 - Completed integration: io.airbyte.integrations.destination.gcs.GcsDestination
2022-03-14 09:31:22 INFO i.a.w.DefaultReplicationWorker(run):165 - Source and destination threads complete.
a
All right, I understand your problem better now 🙂 Our Kafka connector treats all incoming message values as strings and does not perform any schema validation. If you really need schema validation, I think you should do it after the sync, directly on the Parquet file in GCS. Or, if you are considering a data warehouse, use a custom dbt transformation to perform custom validation on the records (parsing the string as JSON and running whatever validation you want).
You could also open an issue on our repo and suggest that this connector accept a schema parameter against which it would perform schema validation.
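The post-sync approach above can be sketched in a few lines. This is a hypothetical illustration, not Airbyte's actual validator: it parses each stringified message value with the standard-library `json` module and checks for a JSON object with some required fields (the field names here are made up for the example).

```python
import json

# Minimal post-sync check (a sketch, not Airbyte's validator): parse each
# stringified Kafka message value and verify it is a JSON object with the
# required fields. REQUIRED_FIELDS is an illustrative assumption.
REQUIRED_FIELDS = {"id", "name"}

def validate_value(raw: str) -> list[str]:
    """Return a list of problems found in one stringified message value."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        # Mirrors the "null found, object expected" style of error in the log.
        return ["null/array/scalar found, object expected"]
    missing = REQUIRED_FIELDS - record.keys()
    return [f"missing field: {f}" for f in sorted(missing)]
```

You would run something like this over the value column after the sync (or port the same logic into a dbt test if the data lands in a warehouse).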
b
I understand that I can do that @[DEPRECATED] Augustin Lafanechere, but if my value column is a valid JSON object, then why not just parse the JSON and let me write it? I cannot find any reason to write a Parquet file with the JSON stored as a string 😲