Hi team, I am trying to use csv as source and got ...
# contributing-to-airbyte
Hi team, I am trying to use CSV as a source and got an error saying one of the columns was expected to be a string, e.g. "externalid" - Should be 'string', but found 'integer'. How can I typecast this particular column from int to string? I found a way through "Additional Reader Options", but sample code for this typecast would be appreciated. Thanks in advance.
Hi @Kabilan Ravi, you can set reader options according to your needs. For CSV, these options are similar to those offered by `pandas.read_csv`. You could set the additional reader options to
{"dtype": "str"}
to have all your columns interpreted as strings.
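To see what that option does, here is a minimal sketch that calls `pandas.read_csv` directly. It assumes the reader options are forwarded to pandas as keyword arguments, which is how the reply above describes them. The sample CSV and its `amount` column are made up for illustration.

```python
import io
import json

import pandas as pd

# Made-up sample data; "externalid" and "currency" are the columns named in this thread.
csv_text = "externalid,currency,amount\n001,USD,10\n002,EUR,20\n"

# The reader options as they would be typed into the connector field (a JSON string).
reader_options = json.loads('{"dtype": "str"}')

# Without options, pandas infers types: "externalid" becomes an integer column.
inferred = pd.read_csv(io.StringIO(csv_text))

# With dtype set to str, every column is kept as text, leading zeros included.
as_text = pd.read_csv(io.StringIO(csv_text), **reader_options)

print(inferred["externalid"].tolist())
print(as_text["externalid"].tolist())
```

To cast only one column with pandas, `dtype` also accepts a mapping such as `{"dtype": {"externalid": "str"}}`.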
Let me check this out. Thanks for the quick response.
Gave the dtype as you mentioned. But received an error while fetching the schema.
@[DEPRECATED] Augustin Lafanechere ☝️ Any luck?
Which source connector are you using?
S3
OK, I thought you were using the File connector. For the S3 connector, I suggest you define the schema of your CSV in the Schema field.
Sure, I will try that. Alternatively, I found a solution:
{
  "column_types": {
    "externalid": "string",
    "currency": "string"
  }
}
Setting this in the additional reader options seems to work. I am trying the sync now. Hopefully it works.
I hope you've read the Additional reader options description and realize that forcing column types might cause problems 😄
The conversion error during setup is now solved, but during the sync I am getting another conversion error. It doesn't show which column caused it, which makes the issue difficult to fix. For example, see the error below:
2022-01-18 14:45:56 INFO () DefaultReplicationWorker(lambda$getReplicationRunnable$2):203 - Records read: 1000
2022-01-18 14:45:57 INFO () DefaultReplicationWorker(lambda$getReplicationRunnable$2):203 - Records read: 2000
source - 2022-01-18 14:45:58 ERROR () LineGobbler(voidCall):82 - Traceback (most recent call last):
source - 2022-01-18 14:45:58 ERROR () LineGobbler(voidCall):82 -   File "/airbyte/integration_code/main.py", line 13, in <module>
source - 2022-01-18 14:45:58 ERROR () LineGobbler(voidCall):82 -     launch(source, sys.argv[1:])
source - 2022-01-18 14:45:58 ERROR () LineGobbler(voidCall):82 -   File "/usr/local/lib/python3.7/site-packages/airbyte_cdk/entrypoint.py", line 108, in launch
source - 2022-01-18 14:45:58 ERROR () LineGobbler(voidCall):82 -     for message in source_entrypoint.run(parsed_args):
source - 2022-01-18 14:45:58 ERROR () LineGobbler(voidCall):82 -   File "/usr/local/lib/python3.7/site-packages/airbyte_cdk/entrypoint.py", line 99, in run
source - 2022-01-18 14:45:58 ERROR () LineGobbler(voidCall):82 -     for message in generator:
source - 2022-01-18 14:45:58 ERROR () LineGobbler(voidCall):82 -   File "/usr/local/lib/python3.7/site-packages/airbyte_cdk/sources/abstract_source.py", line 110, in read
source - 2022-01-18 14:45:58 ERROR () LineGobbler(voidCall):82 -     raise e
source - 2022-01-18 14:45:58 ERROR () LineGobbler(voidCall):82 -   File "/usr/local/lib/python3.7/site-packages/airbyte_cdk/sources/abstract_source.py", line 106, in read
source - 2022-01-18 14:45:58 ERROR () LineGobbler(voidCall):82 -     internal_config=internal_config,
Please define your schemas in the Schemas field, it's the go-to way to solve your issue.
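For reference, a rough sketch of how a value for that field could be built. It assumes the field accepts a JSON string mapping each column name to a JSON Schema type name; check the connector's field description for the exact format. The helper `build_schema_value` is hypothetical and not part of Airbyte.

```python
import json

# Assumption: these JSON Schema type names are the accepted datatypes.
ALLOWED_TYPES = {"string", "number", "integer", "boolean", "object", "array", "null"}


def build_schema_value(columns):
    """Hypothetical helper: check the type names and return a JSON string."""
    for name, type_name in columns.items():
        if type_name not in ALLOWED_TYPES:
            raise ValueError(f"column {name!r} has unsupported type {type_name!r}")
    return json.dumps(columns)


# The two columns forced to string earlier in this thread.
schema_value = build_schema_value({"externalid": "string", "currency": "string"})
print(schema_value)
```

The printed string is what would be pasted into the field. Columns not listed here would need to be added with their own types.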