-
Notifications
You must be signed in to change notification settings - Fork 1.8k
feat(spark): Implement Spark functions url_encode, url_decode and try_url_decode
#17399
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| } | ||
|
|
||
| /// Replace b'+' with b' ' | ||
| fn replace_plus(input: &[u8]) -> Cow<'_, [u8]> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add a test case that covers this replacement?
| /// * `Ok(String)` - The encoded string | ||
| /// | ||
| fn encode(value: &str) -> Result<String> { | ||
| Ok(byte_serialize(value.as_bytes()).collect::<String>()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So for encoding you don't need to do the inverse operations you do in decoding? Like the + -> ' ' manipulation? I am unfamiliar with the specific needs here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@timsaucer Let me explain a bit about the specific needs here. What we are trying to archieve in encode/decode functions are to transform the string from/to application/x-www-form-urlencoded format. The url::form_urlencoded crate is already support this transformation.
- So for the
encodefunction, we can just simply use thebyte_serializefunction exported from the crates - For
decodefunction, there are 2 drawbacks:- The crate doesn't expose
decodefunction for public use - It doesn't return Error when there is invalid format, for example: %2s is an invalid percent encoded string
- The crate doesn't expose
This is the reason why on the url_decode, I tried to do the transformation manually. It's actually the same code as the decode function in the crate except that there is a valiation step.
So to answer your question:
encode: The function already did the necessary manipulationdecode: Thedecodefunction in the crate is not exposed and there's no format validation, so we try to copy the logic of thedecodefunction in the crate with a validation step
Let me know if you have any suggestion on this
| # For more information, please see: | ||
| # https://github.com/apache/datafusion/issues/15914 | ||
| query T | ||
| SELECT url_decode('https%3A%2F%2Fspark.apache.org'::string); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unfamiliar with the slt files, but if there was a way to have these do all 3 types of input (utf8, utf8view, largeutf8) that would be nice.
|
Thank @timsaucer for your feedback. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more test cases such as different types of inputs (as mentioned), nulls, empty strings, etc. would be beneficial 🙂
Edit: also perhaps a few roundtrip tests?
abfac53 to
741f9a8
Compare
|
Still seeing a lack of test cases mentioned by previous review comments: nulls, empty strings, roundtrips, plus replacement |
|
Thank @Jefffrey Let me know if you have any suggestions |
Yes, we should still have a test case for this, preferably in an SLT case. Consider what happens if someone decides to refactor that code; there is currently no test case that would prevent them breaking it, I believe? |
741f9a8 to
b60dd69
Compare
url_encode and url_decode url_encode, url_decode and
url_encode, url_decode and url_encode, url_decode and try_url_decode
|
Hi @Jefffrey
Please help review and let me know your feedback |
|
CI failures seem unrelated 🤔 |
Seem so, the error code is unfamiliar to me. |
|
Thanks @anhvdq & @timsaucer |
Which issue does this PR close?
datafusion-sparkSpark Compatible Functions #15914Rationale for this change
What changes are included in this PR?
Implement Spark functions
url_encode,url_decodeandtry_url_decodeAre these changes tested?
Yes
Are there any user-facing changes?
Yes