Skip to content

OpenApiDocument.LoadAsync with non-seekable JSON stream and no format fails to read document #2638

@YanikCeulemans

Description

@YanikCeulemans

Describe the bug
Calling OpenApiDocument.LoadAsync with a non-seekable JSON stream (stream.CanSeek -> false) and without a given format fails to load the open api document correctly. This is due to the fact that the code, as seen below, tries to guess the format by "reading the first bytes". However, the input stream is fully read to the end, and is returned as such. This leads to code further down to try to read the JSON open api document from a stream which is already fully read to the end, which doesn't work.

OpenApi File To Reproduce

{
  "openapi" : "3.0.0",
  "info" : {
    "title" : "Weather API",
    "version" : "1.0.0"
  },
  "paths" : {
    "/api/weather" : {
      "get" : {
        "summary" : "Get current weather information",
        "operationId" : "checkWeatherUsingGET",
        "parameters" : [ {
          "name" : "location",
          "in" : "query",
          "required" : true,
          "description" : "Location for which to retrieve weather information.",
          "schema" : {
            "type" : "string"
          }
        } ],
        "responses" : {
          "200" : {
            "description" : "Current weather information",
            "content" : {
              "application/json" : {
                "schema" : {
                  "type" : "object",
                  "properties" : {
                    "location" : {
                      "type" : "object",
                      "properties" : {
                        "name" : {
                          "type" : "string"
                        },
                        "region" : {
                          "type" : "string"
                        },
                        "country" : {
                          "type" : "string"
                        },
                        "lat" : {
                          "type" : "number"
                        },
                        "lon" : {
                          "type" : "number"
                        },
                        "tz_id" : {
                          "type" : "string"
                        },
                        "localtime_epoch" : {
                          "type" : "integer"
                        },
                        "localtime" : {
                          "type" : "string"
                        }
                      }
                    },
                    "current" : {
                      "type" : "object",
                      "properties" : {
                        "last_updated" : {
                          "type" : "string",
                          "description" : "Local time when the real time data was updated"
                        },
                        "last_updated_epoch" : {
                          "type" : "integer",
                          "description" : "Local time when the real time data was updated in unix time"
                        },
                        "temp_c" : {
                          "type" : "number",
                          "description" : "Temperature in celsius"
                        },
                        "temp_f" : {
                          "type" : "number",
                          "description" : "Temperature in fahrenheit"
                        },
                        "is_day" : {
                          "type" : "integer",
                          "description" : "1 = Yes 0 = No, Whether to show day condition icon or night icon"
                        },
                        "condition" : {
                          "type" : "object",
                          "properties" : {
                            "text" : {
                              "type" : "string",
                              "description" : "Weather condition text"
                            },
                            "icon" : {
                              "type" : "string",
                              "description" : "Weather icon url"
                            },
                            "code" : {
                              "type" : "integer",
                              "description" : "Weather condition unique code"
                            }
                          }
                        },
                        "wind_mph" : {
                          "type" : "number",
                          "description" : "Wind speed in miles per hour"
                        },
                        "wind_kph" : {
                          "type" : "number",
                          "description" : "Wind speed in kilometer per hour"
                        },
                        "wind_degree" : {
                          "type" : "integer",
                          "description" : "Wind direction in degrees"
                        },
                        "wind_dir" : {
                          "type" : "string",
                          "description" : "Wind direction as 16 point compass, e.g., NSW"
                        },
                        "pressure_mb" : {
                          "type" : "number",
                          "description" : "Pressure in millibars"
                        },
                        "pressure_in" : {
                          "type" : "number",
                          "description" : "Pressure in inches"
                        },
                        "precip_mm" : {
                          "type" : "number",
                          "description" : "Precipitation amount in millimeters"
                        },
                        "precip_in" : {
                          "type" : "number",
                          "description" : "Precipitation amount in inches"
                        },
                        "humidity" : {
                          "type" : "integer",
                          "description" : "Humidity as percentage"
                        },
                        "cloud" : {
                          "type" : "integer",
                          "description" : "Cloud cover as percentage"
                        },
                        "feelslike_c" : {
                          "type" : "number",
                          "description" : "Feels like temperature in celsius"
                        },
                        "feelslike_f" : {
                          "type" : "number",
                          "description" : "Feels like temperature in fahrenheit"
                        },
                        "vis_km" : {
                          "type" : "number",
                          "description" : "Visibility in kilometers"
                        },
                        "vis_miles" : {
                          "type" : "number",
                          "description" : "Visibility in miles"
                        },
                        "uv" : {
                          "type" : "number",
                          "description" : "UV Index"
                        },
                        "gust_mph" : {
                          "type" : "number",
                          "description" : "Wind gust in miles per hour"
                        },
                        "gust_kph" : {
                          "type" : "number",
                          "description" : "Wind gust in kilometer per hour"
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Expected behavior
The input non-seekable stream is either not fully consumed (by using ReadAsync) and the returned stream takes the read part from the input stream into account. Or the input stream is fully read, but the MemoryStream into which it was read is returned instead of the input stream. This doesn't provide the performance improvement the original code was aiming for however.

Screenshots/Code Snippets
I will paste some code snippets that apply to, as far as I've checked, v3.0.2, v3.0.1, v3.0.0, v2.3.11 which I have annotated with inline comments:
Calling OpenApiDocument.LoadAsync calls LoadAsync on OpenApiModelFactory In https://github.com/microsoft/OpenAPI.NET/blob/v3.0.2/src/Microsoft.OpenApi/Reader/OpenApiModelFactory.cs#L126

        public static async Task<ReadResult> LoadAsync(Stream input, string? format = null, OpenApiReaderSettings? settings = null, CancellationToken cancellationToken = default)
        {
#if NET6_0_OR_GREATER
            ArgumentNullException.ThrowIfNull(input);
#else
            if (input is null) throw new ArgumentNullException(nameof(input));
#endif
            settings ??= new OpenApiReaderSettings();

            Stream? preparedStream = null;
            if (format is null) // <---- we hit this check because we didn't pass a format parameter, which then defaults to null
            {
                (preparedStream, format) = await PrepareStreamForReadingAsync(input, format, cancellationToken).ConfigureAwait(false);
            }

Which calls https://github.com/microsoft/OpenAPI.NET/blob/v3.0.2/src/Microsoft.OpenApi/Reader/OpenApiModelFactory.cs#L400

        private static async Task<(Stream, string)> PrepareStreamForReadingAsync(Stream input, string? format, CancellationToken token = default)
        {
            Stream preparedStream = input;

            if (!input.CanSeek) // <---- we hit this check because our stream is not seekable
            {
                // Use a temporary buffer to read a small portion for format detection // <---- This comment is misleading because the whole input stream is consumed. If it wants to read a small portion, the `ReadAsync` method on the stream is likely the desired call to make
                using var bufferStream = new MemoryStream();
                await input.CopyToAsync(bufferStream, 1024, token).ConfigureAwait(false); // <---- This line consumes the whole input stream and copies it to the `bufferStream`
                bufferStream.Position = 0;

                // Inspect the format from the buffered portion
                format ??= InspectStreamFormat(bufferStream);

                // If format is JSON, no need to buffer further — use the original stream.
                if (format.Equals(OpenApiConstants.Json, StringComparison.OrdinalIgnoreCase)) // <---- we hit this check because our document is JSON
                {
                    preparedStream = input; // <---- The resulting preparedStream is set to the read-to-end input stream
                }
                else
                {
                    // YAML or other non-JSON format; copy remaining input to a new stream.
                    preparedStream = new MemoryStream();
                    bufferStream.Position = 0;
                    await bufferStream.CopyToAsync(preparedStream, 81920, token).ConfigureAwait(false); // Copy buffered portion
                    await input.CopyToAsync(preparedStream, 81920, token).ConfigureAwait(false); // Copy remaining data
                    preparedStream.Position = 0;
                }
            }

Suggested fix
A suggested fix which I have not tested is something like this:

        private static async Task<(Stream, string)> PrepareStreamForReadingAsync(Stream input, string? format, CancellationToken token = default)
        {
            Stream preparedStream = input;

            if (!input.CanSeek)
            {
                preparedStream = new MemoryStream();
                await input.CopyToAsync(preparedStream, 1024, token).ConfigureAwait(false);
                preparedStream.Position = 0;

                // Inspect the format from the seekable memory stream, which does reset the position after reading some bytes
                format ??= InspectStreamFormat(preparedStream);
            }

Where the potential performance difference between YAML and JSON would be ignored, but that seems out of scope for this fix.

Additional context
A workaround for this issue is to pass the format along in the first place. That way the stream isn't read from to guess the format and the code works as expected. However, that obviously isn't possible when the consumer of the code doesn't know the format.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions