# general
t
Hello, I have a question regarding Unicode characters in contracts. I'm using expressions like `/.{1,4}/` just to assert that a string has a certain length. The dot also matches Unicode characters, which is intentional. Pact-JVM saves Unicode characters in the contract file using backslash escape notation, like `\uDAA5` for a surrogate, even though the file is UTF-8. However, the Pact CLI tools seem unable to handle some of those sequences. For example, the surrogate escape causes the error `incomplete surrogate pair at '\uDAA5`. Where does the problem lie? Should Pact-JVM just write the actual characters, or should the CLI tools be able to handle the escaped sequences?
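For context on the error: in UTF-16, a high surrogate (U+D800 to U+DBFF) is only valid when it is immediately followed by a low surrogate (U+DC00 to U+DFFF). A minimal Java sketch using only `java.lang.Character` that finds the first unpaired surrogate in a string; `LoneSurrogateCheck` and `firstLoneSurrogate` are hypothetical names, not part of Pact-JVM or the CLI.

```java
public class LoneSurrogateCheck {

    // Returns the index of the first unpaired surrogate in s,
    // or -1 if every surrogate belongs to a valid high+low pair.
    static int firstLoneSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A high surrogate must be immediately followed by a low surrogate.
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the low half of a valid pair
                } else {
                    return i;
                }
            } else if (Character.isLowSurrogate(c)) {
                // A low surrogate without a preceding high surrogate is also invalid.
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String lone = "ab\uDAA5c";          // high surrogate followed by 'c'
        String paired = "ab\uDAA5\uDC00c";  // complete pair, i.e. one code point
        System.out.println(firstLoneSurrogate(lone));
        System.out.println(firstLoneSurrogate(paired));
    }
}
```

A string like `lone` cannot be encoded as valid UTF-8, which is consistent with a strict JSON parser refusing the escaped form.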
m
Good question. I think we may have seen some of these errors in PactFlow (probably the same issue as in the Ruby CLI tooling, since they use the same underlying libraries). I don’t have the answers, but I'm looping in a few people to start a conversation. cc: @phanindra srungavarapu @rholshausen
and @Steven Holloway
t
@rholshausen mentioned [here](https://github.com/pact-foundation/pact-jvm/issues/1536#issuecomment-2533208036) that pact-jvm intentionally escapes all non-ASCII sequences.
I opened an issue with a reproduction example on the pact-ruby-standalone repo: https://github.com/pact-foundation/pact-ruby-standalone/issues/156
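Escaping non-ASCII characters is valid JSON in itself: a character above U+FFFF is written as two escapes forming a surrogate pair. The sketch below illustrates that rule. It is a hypothetical helper (`AsciiJsonEscaper`), not Pact-JVM's actual serializer, and it deliberately ignores the ASCII characters JSON also requires escaping (quotes, backslashes, control characters).

```java
public class AsciiJsonEscaper {

    // Escapes every non-ASCII UTF-16 code unit as a JSON backslash-u sequence.
    // Supplementary characters (above U+FFFF) are two code units in Java,
    // so they come out as an escaped high+low surrogate pair.
    // Quotes, backslashes and ASCII control characters are NOT handled here.
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String accented = "caf" + (char) 0xE9;                  // "café"
        String emoji = new String(Character.toChars(0x1F600));  // one code point, two chars
        System.out.println(escapeNonAscii(accented));
        System.out.println(escapeNonAscii(emoji));
    }
}
```

Under this scheme a lone high surrogate produces a single escape with no matching low half, which is exactly the shape the CLI rejects; the escaping itself is not the problem.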
Found a little time to dig into how Unicode works. It seems my problem is caused by something completely different. The CLI is rejecting an incomplete Unicode sequence, `\uDAA5` (a high surrogate), which in valid Unicode must be followed by a low surrogate. This sequence was generated by pact-jvm's regex generator. The underlying library has also been unmaintained for at least 5 years. I opened an issue on the pact-jvm repo: https://github.com/pact-foundation/pact-jvm/issues/1848
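One way a generator for `.` could avoid emitting a lone surrogate is to pick whole code points instead of individual UTF-16 `char`s, skipping the surrogate block U+D800 to U+DFFF. This is a hypothetical sketch (`CodePointGenerator`, `randomAnyChars`), not the generator pact-jvm or its regex library actually uses. It also skips the line terminators that Java's `.` does not match without `DOTALL`, so every result matches `.{min,max}`.

```java
import java.util.Random;
import java.util.regex.Pattern;

public class CodePointGenerator {

    // Java's '.' (without DOTALL) does not match these line terminators.
    private static boolean isLineTerminator(int cp) {
        return cp == 0x0A || cp == 0x0D || cp == 0x85 || cp == 0x2028 || cp == 0x2029;
    }

    // Hypothetical helper: returns a random string that matches ".{min,max}"
    // and never contains an unpaired surrogate.
    static String randomAnyChars(Random rnd, int min, int max) {
        int length = min + rnd.nextInt(max - min + 1);
        StringBuilder sb = new StringBuilder();
        int count = 0;
        while (count < length) {
            // Pick a whole code point (U+0000..U+10FFFF), not a single UTF-16 char.
            int cp = rnd.nextInt(Character.MAX_CODE_POINT + 1);
            // Code points in the surrogate block are never valid on their own.
            if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
                continue;
            }
            if (isLineTerminator(cp)) {
                continue;
            }
            // Supplementary code points are appended as a complete high+low pair.
            sb.appendCodePoint(cp);
            count++;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        Pattern pattern = Pattern.compile(".{1,4}");
        for (int i = 0; i < 5; i++) {
            String s = randomAnyChars(rnd, 1, 4);
            System.out.println(pattern.matcher(s).matches() + ", code points: "
                    + s.codePointCount(0, s.length()));
        }
    }
}
```

Java's regex engine treats a surrogate pair as one code point, so a supplementary character counts once toward `{1,4}`, matching how the generator counts.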
r
Oh, that old chestnut. Yes, we recommend giving an example value (second parameter) to avoid this.
t
I find this a bit sad, because generating the example value from the regex forces the application to actually support the specified format. For example, if I specify `.{9}` in the contract tests but the application actually only supports `[a-zA-Z0-9]`, Pact wouldn't catch this as long as the example value I provide matches `[a-zA-Z0-9]`. Of course, this can still happen with a working generator, as long as I CAN provide an example.
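The gap described above can be made concrete with `java.util.regex`: a hand-picked example that satisfies both patterns says nothing about whether the provider accepts the wider `.{9}` format. The values are illustrative, and `[a-zA-Z0-9]{9}` is an assumed stand-in for what the provider actually supports.

```java
import java.util.regex.Pattern;

public class ExampleCoverage {
    // What the contract promises vs. what the provider (hypothetically) accepts.
    static final Pattern CONTRACT = Pattern.compile(".{9}");
    static final Pattern PROVIDER = Pattern.compile("[a-zA-Z0-9]{9}");

    // True if the value is allowed by the contract but rejected by the provider.
    static boolean exposesGap(String value) {
        return CONTRACT.matcher(value).matches() && !PROVIDER.matcher(value).matches();
    }

    public static void main(String[] args) {
        String handPicked = "abc123XYZ";  // matches both patterns
        String generated = "ab-cd ef!";   // matches .{9} only
        System.out.println(exposesGap(handPicked)); // false
        System.out.println(exposesGap(generated));  // true
    }
}
```

Only a value like `generated` would make provider verification fail, which is why a generator that samples the full `.{9}` space can catch the mismatch while a fixed example cannot.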