Skip to content

Conversation

@jan-cerny
Copy link
Collaborator

Description:

The build system resolves ("compiles") items and saves them to files in the build directory in the first phase of the build process and then loads these files to load items in later phases of the build process.

Currently, YAML format is used as a format of these interim artifacts. This PR changes the format used for these files to JSON.

This PR affects only the build process, it doesn't change format of the files in the git repository. Since JSON is a subset of YAML, the files are still consumable by tools that work with YAML files.

Rationale:

The reason is that Python JSON library is faster than Python YAML library.

This change means a significant speed up of the build process. On my machine, the ./build_product rhel8 command reduced from average 54 seconds to 40 seconds. That's 25 % reduction of the build time.

Review Hints:

Compare built data streams, XCCDFs and other build outputs built using this PR and using current master. Hint: Set SOURCE_DATE_EPOCH=1 environment variable to eliminate differences in time stamps so that you can easily use meld or diff to compare files.

Review the files in directories in build directory, eg. build/rhel9/rules, build/rhel9/groups ...

For more details, please read commit messages of each commit.

jan-cerny added 2 commits May 9, 2025 10:53
This commit adds methods to XCCDFEntity that allow to use JSON format
for storing XCCDFEntity similar to YAML. The `dump_json` method can
serialize XCCDFEntity class instances to JSON format. The
`from_compiled_json` can instantiate XCCDFEntity objects by loading a
JSON file. These methods can be used or overridden in child classes of
XCCDFEntity, eg. Rule, Profile, Group. It's important to notice that
`from_compiled_json` doesn't do any Jinja 2 processing, the input file
must be valid JSON, and the method should be used to load only fully
resolved data.
The build system resolves ("compiles") items and saves them to files in
the build directory in the first phase of the build process and then
loads these files to load items in later phases of the build process.
Currently, YAML format is used as a format of these interim artifacts.
This commit changes the format to JSON. The reason is that Python JSON
library is faster than Python YAML library. This change affects only the
build process, it doesn't change format of the files in the git
repository.
@jan-cerny jan-cerny added this to the 0.1.77 milestone May 9, 2025
@jan-cerny jan-cerny added the Infrastructure Our content build system label May 9, 2025
jan-cerny added 7 commits May 9, 2025 11:43
Reduce code duplication by extracting common code to a new method.
Reduce code duplication by extracting common code to a new function
Refactor the method to reduce the cognitive complexity.
@qlty-cloud-legacy
Copy link

Code Climate has analyzed commit b507749 and detected 3 issues on this pull request.

Here's the issue category breakdown:

Category Count
Complexity 2
Style 1

Note: there is 1 critical issue.

The test coverage on the diff in this pull request is 69.5% (50% is the threshold).

This pull request will bring the total coverage in the repository to 61.9% (0.0% change).

View more on Code Climate.

@Mab879 Mab879 self-assigned this May 9, 2025
@Mab879
Copy link
Member

Mab879 commented May 13, 2025

Since this changes the format of build/rhel10/profiles/stig.profile which means coping that file to tests/data/profile_stability/rhel10/stig_gui.profile will now be in JSON vs the stored YAML. Should we also change those to JSON? It seems that current test will pass with JSON data.

@jan-cerny
Copy link
Collaborator Author

@Mab879

The current tests works with both YAML and JSON data.

We could change the stat in the profile_stability directory to JSON. But, I think that making the mass change of profile stability data in this PR would make this PR too big and difficult to review.

People can easily convert json to yaml using a simple command. For example:

python -c 'import sys, yaml, json; yaml.dump(json.load(sys.stdin), sys.stdout)' \
 < build/rhel9/profiles/cis.profile > tests/data/profile_stability/rhel9/cis.profile

I suggest changing the test so that it will suggest using this command instead the cp command.

In future, I would like to improve the test. The test compares only the selections list. I would like to change the data in profile_stability to a simple text file containing only sorted selections. That will be resistant to format changes eg. whitespace or key addition.

@Mab879
Copy link
Member

Mab879 commented May 15, 2025

@jan-cerny So for now, I will merge this as is. We should have a follow up PR to fix the stabilization test like you suggested.

Copy link
Member

@Mab879 Mab879 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is faster, and the builds seem to fine.

@Mab879 Mab879 merged commit 4953088 into ComplianceAsCode:master May 15, 2025
110 checks passed
jan-cerny added a commit to jan-cerny/scap-security-guide that referenced this pull request May 15, 2025
This is a follow up on ComplianceAsCode#13445
where we started to use JSON as a file format for resolved items.
This change caused that the suggestion to update profile stability data
by copying the resolved profile files doesn't work and the copying needs
to be replaced by converting the files from JSON to YAML.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Infrastructure Our content build system

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants