Skip to content
Merged
Changes from 1 commit
Commits
Show all changes
21 commits
Select commit Hold shift + click to select a range
ea46579
Cache UTF-8 partials of ConhostConnection output pipe
german-one Jun 30, 2019
7c83c38
Revert "Cache UTF-8 partials of ConhostConnection output pipe"
german-one Jul 6, 2019
7a4a814
Fix for UTF-8 partials in functions `ConhostConnection::_OutputThread…
german-one Jul 6, 2019
a424a16
Revert "Fix for UTF-8 partials in functions `ConhostConnection::_Outp…
german-one Jul 6, 2019
85b5509
Fix for UTF-8 partials in function `ConhostConnection::_OutputThread`
german-one Jul 6, 2019
5780770
Fix for UTF-8 partials in function `ConhostConnection::_OutputThread`
german-one Jul 6, 2019
d0e6e82
Fix for UTF-8 partials in function `ConhostConnection::_OutputThread`
german-one Jul 6, 2019
d98c589
Fix for UTF-8 partials in function `ConhostConnection::_OutputThread`
german-one Jul 7, 2019
7df6609
Fix for UTF-8 partials in function `ApiRoutines::WriteConsoleOutputCh…
german-one Jul 7, 2019
507f526
Fix for UTF-8 partials in function `ConhostConnection::_OutputThread`
german-one Jul 7, 2019
7d8c4b1
Fix for UTF-8 partials in function `ConhostConnection::_OutputThread`
german-one Jul 9, 2019
f08f31c
Fix for UTF-8 partials in function `ConhostConnection::_OutputThread`
german-one Jul 9, 2019
bb299d7
Fix for UTF-8 partials in function `ConhostConnection::_OutputThread`
german-one Jul 9, 2019
c395f3b
Fix for UTF-8 partials in function `ConhostConnection::_OutputThread`
german-one Jul 9, 2019
23c4286
Utf8OutPipeReader class added
german-one Jul 11, 2019
99d0713
Utf8OutPipeReader class added
german-one Jul 12, 2019
c877edd
Utf8OutPipeReader class added
german-one Jul 12, 2019
e35a78f
Utf8OutPipeReader class added
german-one Jul 12, 2019
31db29b
Unit Test added
german-one Jul 14, 2019
d09ac87
use specific macros and WIL classes
german-one Jul 15, 2019
f23bcd9
avoid possible deadlock caused by unclosed pipe handle
german-one Jul 15, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 4 additions & 3 deletions src/cascadia/TerminalConnection/ConhostConnection.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
#include "pch.h"
#include "ConhostConnection.h"
#include "windows.h"
#include "wil/common.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we already have wil/common, string_view, algorithm and type_traits in our precompiled header through pch.h -> LibraryIncludes.h!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm used to including the related headers. It's difficult for me to find the way across other header files to figure out which have been already included over detours.
Before I push the next commit I should probably check if it compiles without string_view, algorithm, and type_traits included directly.

#include <sstream>
#include <string_view>
#include <algorithm>
Expand Down Expand Up @@ -234,16 +235,16 @@ namespace winrt::Microsoft::Terminal::TerminalConnection::implementation
const BYTE* const endPtr{ buffer + dwRead };
const BYTE* backIter{ endPtr - 1 };
// If the last byte in the buffer was a byte belonging to a UTF-8 multi-byte character
if ((*backIter & 0b10000000) == 0b10000000)
if (WI_AreAllFlagsSet(*backIter & 0b10000000, 0b10000000))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually pretty cool: WI_AreAllFlagsSet lets you skip the & 0b000 part. This would just be WI_AreAllFlagsSet(*backIter, 0b10000000)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. That was dumb of me.

{
// Check only up to 3 last bytes, if no Lead Byte was found then the byte before must be the Lead Byte and no partials are in the buffer
for (DWORD dwSequenceLen{ 1UL }, stop{ dwRead < 4UL ? dwRead : 4UL }; dwSequenceLen < stop; ++dwSequenceLen, --backIter)
{
// If Lead Byte found
if ((*backIter & 0b11000000) == 0b11000000)
if (WI_AreAllFlagsSet(*backIter & 0b11000000, 0b11000000))
{
// If the Lead Byte indicates that the last bytes in the buffer is a partial UTF-8 code point then cache them
if ((*backIter & bitmasks[dwSequenceLen]) != bitmasks[dwSequenceLen - 1])
if (WI_IsAnyFlagClear(*backIter & bitmasks[dwSequenceLen], bitmasks[dwSequenceLen - 1]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one deserves a comment -- since you're masking with the sequence prefix and checking the bits from the next sequence prefix down, you may want to call that out specifically. It took me three reviews to figure out why you were doing this 😄

I'm assuming it's because 0b11100000 is the prefix for a 3-byte sequence, but for it to be a 3-byte introducer those three high bits must be 110 (not 111), and we're just using the off-by-one array index because those mask/check values are luckily adjacent?

Copy link
Contributor Author

@german-one german-one Jul 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I begin with dwSequenceLen = 1 which is the reason why index dwSequenceLen - 1 can't be off by one in the array.
Actually only the WI_IsAnyFlagClear(*backIter & 0b11000000, 0) in the first iteration is tricky to understand. I checked before that a lead byte was found (that is, the two highest bits are 1). WI_IsAnyFlagClear will always return true here and the byte is getting cached.
The rest behaves exactly as you said. We need 0b111'00000 as mask to compare the result with 0b110'00000 etc. And that's the reason why naming these bytes would be ambiguous as I explained in the conversation above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DHowett-MSFT Now that I read it twice I get this uncomfortable feeling. Should the first argument the second and vice versa?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.

WI_IsAnyFlagClear(val, flags) Any bit specified by flags is not set in val.

I would prefer the direct equality comparison here (which you had before without WI_). You want to make sure that all bit values match within the mask.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright. I figured something is going wrong here. Thanks!

{
std::move(backIter, endPtr, utf8Partials);
dwRead -= dwSequenceLen;
Expand Down