Hello, All. I am trying to create an Azure Table S...
# ask-community-for-troubleshooting
b
Hello, All. I am trying to create an Azure Table Storage source via the Airbyte GUI. Azure Table <> S3 bucket. The connection and sync succeed, but I only get the first column of my data in the S3 destination file. This seems to be expected based on what I see for the connection's Replication streams output:
u
Hey Brian! This is a great question. After a quick look I haven't found anything in our docs/tutorials yet, so let me look a bit further. For the future, could you please keep your question to one thread? This helps our team stay organized and do our jobs better :)
b
Thanks, Nataly.
Moving to this thread... Under the Source Settings, the docs warn us about this:
image.png
The trouble is, I don't understand what 'data' is being discussed here. Any hints? How / where do I get or set this information?
u
Hey Brian! Sorry for the wait - I'm still looking into this. It seems like the 'data' object should automatically get mapped in the S3 bucket. What filetype are you using in S3? I'm stumped as to why it's not happening.
b
The trouble is on the source side - the Azure Table Storage connector. I have a table defined in Azure. When Airbyte tries to pull over streams for me to select from, I get one and only one stream:
You can see at the bottom of the image, the only thing it is showing me is the name of the first column in the Azure table: PartitionKey. When uploading a table into Azure from CSV, you are required to have 'PartitionKey' and 'RowKey' columns in your CSV file. Outside of Azure, these columns are meaningless. The point is, the Azure connector is having trouble generating a schema from the Azure table. The connector documentation says that this is a challenge:
This Source have generic schema for all streams. Azure Table storage is a service that stores non-relational structured data (also known as structured NoSQL data). There is no efficient way to read schema for the given table.
The author of the documentation suggests that there is a way to work around this problem:
We use data property to have all the properties for any given row.

data - This property contain all values
additionalProperties - This property denotes that all the values are in data property.
    {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": {
            "data": {
                "type": "object"
            },
            "additionalProperties": {
                "type": "boolean"
            }
        }
    }
The trouble is that the author provides no context for someone like me. Where is this 'data' property? It seems to be related to the source. And what does this sentence mean: 'data - This property contain all values'?
I think the author is suggesting that I can create 'data', which is a schema to represent the table. How do I do this? Where does such a definition live?
I figured out how to make this work. I used the Airbyte API to create the connection and passed in a schema that corresponded to my table.
http://localhost:8000/api/v1/connections/create
    {
        "name": "Azure Table Storage 4 <> data-ingestor-dest",
        "namespaceDefinition": "source",
        "namespaceFormat": "${SOURCE_NAMESPACE}",
        "prefix": "",
        "sourceId": "13150ab9-9a02-47a5-8e0a-a48d3d54cde3",
        "destinationId": "8fac964e-42c4-4a21-8c75-77ab95756627",
        "operationIds": [],
        "status": "active",
        "syncCatalog": {
            "streams": [
                {
                    "stream": {
                        "name": "swayairbytetesttable",
                        "jsonSchema": {
                            "type": "object",
                            "$schema": "http://json-schema.org/draft-07/schema#",
                            "properties": {
                                "PartitionKey": { "type": "string" },
                                "RowKey": { "type": "string" },
                                "CRIM": { "type": "number" },
                                "ZN": { "type": "number" },
                                "INDUS": { "type": "number" },
                                "CHAS": { "type": "number" },
                                "NOX": { "type": "number" },
                                "RM": { "type": "number" },
                                "AGE": { "type": "number" },
                                "DIS": { "type": "number" },
                                "RAD": { "type": "number" },
                                "TAX": { "type": "number" },
                                "PTRATIO": { "type": "number" },
                                "B": { "type": "number" },
                                "LSTAT": { "type": "number" },
                                "target": { "type": "number" }
                            }
                        },
                        "supportedSyncModes": [ "full_refresh", "incremental" ],
                        "sourceDefinedCursor": true,
                        "defaultCursorField": [ "PartitionKey" ],
                        "sourceDefinedPrimaryKey": []
                    },
                    "config": {
                        "syncMode": "full_refresh",
                        "cursorField": [ "PartitionKey" ],
                        "destinationSyncMode": "overwrite",
                        "primaryKey": [],
                        "aliasName": "swayairbytetesttable",
                        "selected": true
                    }
                }
            ]
        }
    }
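For anyone who wants to script this rather than paste JSON by hand, here is a minimal Python sketch of the same request. Treat it as an illustration under assumptions, not the official way to do it: the helper names (schema_from_row, build_connection_payload, create_connection) and the sample row values are made up for this example, the IDs are placeholders you would replace with your own, and it assumes a local deployment at the URL above that accepts a plain JSON POST without extra authentication. Deriving the column types from one sample row is a convenience I added; the request above listed the properties by hand.

```python
import json
import urllib.request

# Assumption: local Airbyte deployment, same endpoint as in the post above.
AIRBYTE_URL = "http://localhost:8000/api/v1/connections/create"


def json_type(value):
    """Map a Python value to a JSON Schema type name (hypothetical helper)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    return "string"


def schema_from_row(row):
    """Build a draft-07 object schema with one property per column of a sample row."""
    return {
        "type": "object",
        "$schema": "http://json-schema.org/draft-07/schema#",
        "properties": {name: {"type": json_type(v)} for name, v in row.items()},
    }


def build_connection_payload(name, source_id, destination_id, stream_name, json_schema):
    """Assemble the request body, mirroring the fields used in the post above."""
    return {
        "name": name,
        "namespaceDefinition": "source",
        "namespaceFormat": "${SOURCE_NAMESPACE}",
        "prefix": "",
        "sourceId": source_id,
        "destinationId": destination_id,
        "operationIds": [],
        "status": "active",
        "syncCatalog": {
            "streams": [
                {
                    "stream": {
                        "name": stream_name,
                        "jsonSchema": json_schema,
                        "supportedSyncModes": ["full_refresh", "incremental"],
                        "sourceDefinedCursor": True,
                        "defaultCursorField": ["PartitionKey"],
                        "sourceDefinedPrimaryKey": [],
                    },
                    "config": {
                        "syncMode": "full_refresh",
                        "cursorField": ["PartitionKey"],
                        "destinationSyncMode": "overwrite",
                        "primaryKey": [],
                        "aliasName": stream_name,
                        "selected": True,
                    },
                }
            ]
        },
    }


def create_connection(payload, url=AIRBYTE_URL):
    """POST the payload as JSON and return the decoded response body."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Made-up sample row; use one real row from your own table instead.
    sample_row = {"PartitionKey": "p1", "RowKey": "r1", "CRIM": 0.1, "ZN": 18.0}
    payload = build_connection_payload(
        name="Azure Table Storage <> S3",
        source_id="<your-source-id>",            # placeholder
        destination_id="<your-destination-id>",  # placeholder
        stream_name="swayairbytetesttable",
        json_schema=schema_from_row(sample_row),
    )
    print(json.dumps(payload, indent=2))
    # create_connection(payload)  # uncomment once the placeholders are filled in
```

Printing the payload first makes it easy to compare against the JSON above before anything is sent to the server.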
n
That's wonderful to hear, thanks for posting the details! I'll make an issue to include this in our documentation!