Skip to content
Merged
Changes from 12 commits
Commits
Show all changes
21 commits
Select commit Hold shift + click to select a range
ea46579
Cache UTF-8 partials of ConhostConnection output pipe
german-one Jun 30, 2019
7c83c38
Revert "Cache UTF-8 partials of ConhostConnection output pipe"
german-one Jul 6, 2019
7a4a814
Fix for UTF-8 partials in functions `ConhostConnection::_OutputThread…
german-one Jul 6, 2019
a424a16
Revert "Fix for UTF-8 partials in functions `ConhostConnection::_Outp…
german-one Jul 6, 2019
85b5509
Fix for UTF-8 partials in function `ConhostConnection::_OutputThread`
german-one Jul 6, 2019
5780770
Fix for UTF-8 partials in function `ConhostConnection::_OutputThread`
german-one Jul 6, 2019
d0e6e82
Fix for UTF-8 partials in function `ConhostConnection::_OutputThread`
german-one Jul 6, 2019
d98c589
Fix for UTF-8 partials in function `ConhostConnection::_OutputThread`
german-one Jul 7, 2019
7df6609
Fix for UTF-8 partials in function `ApiRoutines::WriteConsoleOutputCh…
german-one Jul 7, 2019
507f526
Fix for UTF-8 partials in function `ConhostConnection::_OutputThread`
german-one Jul 7, 2019
7d8c4b1
Fix for UTF-8 partials in function `ConhostConnection::_OutputThread`
german-one Jul 9, 2019
f08f31c
Fix for UTF-8 partials in function `ConhostConnection::_OutputThread`
german-one Jul 9, 2019
bb299d7
Fix for UTF-8 partials in function `ConhostConnection::_OutputThread`
german-one Jul 9, 2019
c395f3b
Fix for UTF-8 partials in function `ConhostConnection::_OutputThread`
german-one Jul 9, 2019
23c4286
Utf8OutPipeReader class added
german-one Jul 11, 2019
99d0713
Utf8OutPipeReader class added
german-one Jul 12, 2019
c877edd
Utf8OutPipeReader class added
german-one Jul 12, 2019
e35a78f
Utf8OutPipeReader class added
german-one Jul 12, 2019
31db29b
Unit Test added
german-one Jul 14, 2019
d09ac87
use specific macros and WIL classes
german-one Jul 15, 2019
f23bcd9
avoid possible deadlock caused by unclosed pipe handle
german-one Jul 15, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
73 changes: 57 additions & 16 deletions src/cascadia/TerminalConnection/ConhostConnection.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,11 @@
#include "pch.h"
#include "ConhostConnection.h"
#include "windows.h"
#include "wil/common.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we already have wil/common, string_view, algorithm and type_traits in our precompiled header through pch.h -> LibraryIncludes.h!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm used to including the related headers. It's difficult for me to find the way across other header files to figure out which have been already included over detours.
Before I push the next commit I should probably check if it compiles without string_view, algorithm, and type_traits included directly.

#include <sstream>
#include <string_view>
#include <algorithm>
#include <type_traits>
// STARTF_USESTDHANDLES is only defined in WINAPI_PARTITION_DESKTOP
// We're just gonna manually define it for this prototyping code
#ifndef STARTF_USESTDHANDLES
Expand Down Expand Up @@ -190,38 +194,75 @@ namespace winrt::Microsoft::Terminal::TerminalConnection::implementation
DWORD ConhostConnection::_OutputThread()
{
const size_t bufferSize = 4096;
BYTE buffer[bufferSize];
DWORD dwRead;
BYTE buffer[bufferSize]{ 0 };
BYTE utf8Partials[4]{ 0 }; // buffer for code units of a partial UTF-8 code point that have to be cached
DWORD dwRead{}; // length of a chunk read
DWORD dwPartialsLen{}; // number of cached UTF-8 code units
bool fSuccess{};
const BYTE bitmasks[]{ 0, 0b11000000, 0b11100000, 0b11110000 }; // for comparisons after the Lead Byte was found

// process the data of the output pipe in a loop
while (true)
{
dwRead = 0;
bool fSuccess = false;
// copy UTF-8 code units that were remaining from the previously read chunk (if any)
if (dwPartialsLen != 0)
{
std::move(utf8Partials, utf8Partials + dwPartialsLen, buffer);
}

// try to read data
fSuccess = !!ReadFile(_outPipe.get(), &buffer[dwPartialsLen], std::extent<decltype(buffer)>::value - dwPartialsLen, &dwRead, nullptr);

dwRead += dwPartialsLen;
dwPartialsLen = 0;

fSuccess = !!ReadFile(_outPipe.get(), buffer, bufferSize, &dwRead, nullptr);
if (!fSuccess)
if (dwRead == 0) // quit if no data has been read and no cached data was left over
{
return 0;
}
else if (!fSuccess)
{
if (_closing.load())
{
// This is okay, break out to kill the thread
return 0;
}
else
{
_disconnectHandlers();
return (DWORD)-1;
}

_disconnectHandlers();
return (DWORD)-1;
}
if (dwRead == 0)

const BYTE* const endPtr{ buffer + dwRead };
const BYTE* backIter{ endPtr - 1 };
// If the last byte in the buffer was a byte belonging to a UTF-8 multi-byte character
if (WI_AreAllFlagsSet(*backIter, 0b10000000))
{
continue;
// Check only up to 3 last bytes, if no Lead Byte was found then the byte before must be the Lead Byte and no partials are in the buffer
for (DWORD dwSequenceLen{ 1UL }, stop{ dwRead < 4UL ? dwRead : 4UL }; dwSequenceLen < stop; ++dwSequenceLen, --backIter)
{
// If Lead Byte found
if (WI_AreAllFlagsSet(*backIter, 0b11000000))
{
// If the Lead Byte indicates that the last bytes in the buffer is a partial UTF-8 code point then cache them
if (WI_IsAnyFlagClear(*backIter & bitmasks[dwSequenceLen], bitmasks[dwSequenceLen - 1]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one deserves a comment -- since you're masking with the sequence prefix and checking the bits from the next sequence prefix down, you may want to call that out specifically. It took me three reviews to figure out why you were doing this 😄

I'm assuming it's because 0b11100000 is the prefix for a 3-byte sequence, but for it to be a 3-byte introducer those three high bits must be 110 (not 111), and we're just using the off-by-one array index because those mask/check values are luckily adjacent?

Copy link
Contributor Author

@german-one german-one Jul 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I begin with dwSequenceLen = 1 which is the reason why index dwSequenceLen - 1 can't be off by one in the array.
Actually only the WI_IsAnyFlagClear(*backIter & 0b11000000, 0) in the first iteration is tricky to understand. I checked before that a lead byte was found (that is, the two highest bits are 1). WI_IsAnyFlagClear will always return true here and the byte is getting cached.
The rest behaves exactly as you said. We need 0b111'00000 as mask to compare the result with 0b110'00000 etc. And that's the reason why naming these bytes would be ambiguous as I explained in the conversation above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DHowett-MSFT Now that I read it twice I get this uncomfortable feeling. Should the first argument the second and vice versa?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.

WI_IsAnyFlagClear(val, flags) Any bit specified by flags is not set in val.

I would prefer the direct equality comparison here (which you had before without WI_). You want to make sure that all bit values match within the mask.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright. I figured something is going wrong here. Thanks!

{
std::move(backIter, endPtr, utf8Partials);
dwRead -= dwSequenceLen;
dwPartialsLen = dwSequenceLen;
}

break;
}
}
}
// Convert buffer to hstring
char* pchStr = (char*)(buffer);
std::string str{ pchStr, dwRead };
auto hstr = winrt::to_hstring(str);
const std::string_view strView{ reinterpret_cast<char*>(buffer), dwRead };
auto hstr{ winrt::to_hstring(strView) };

// Pass the output to our registered event handlers
_outputHandlers(hstr);
}

return 0;
}
}