Compare commits

...

63 Commits
v1.9.1 ... main

Author SHA1 Message Date
leafspark 4458ad1e58
Merge pull request #139 from leafspark/dependabot/pip/huggingface-hub-approx-eq-0.33.1
build(deps): update huggingface-hub requirement from ~=0.31.2 to ~=0.33.1
2025-07-03 11:38:30 -07:00
dependabot[bot] fe5c943b7d
build(deps): update huggingface-hub requirement
Updates the requirements on [huggingface-hub](https://github.com/huggingface/huggingface_hub) to permit the latest version.
- [Release notes](https://github.com/huggingface/huggingface_hub/releases)
- [Commits](https://github.com/huggingface/huggingface_hub/compare/v0.31.2...v0.33.1)

---
updated-dependencies:
- dependency-name: huggingface-hub
  dependency-version: 0.33.1
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-06-29 11:09:34 +00:00
leafspark 8681a36452
Merge pull request #132 from leafspark/dependabot/pip/pyside6-approx-eq-6.9.1
build(deps): update pyside6 requirement from ~=6.9.0 to ~=6.9.1
2025-06-27 10:48:20 -07:00
dependabot[bot] 9516762dae
build(deps): update pyside6 requirement from ~=6.9.0 to ~=6.9.1
Updates the requirements on [pyside6](https://pyside.org) to permit the latest version.

---
updated-dependencies:
- dependency-name: pyside6
  dependency-version: 6.9.1
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-06-08 11:09:56 +00:00
leafspark f7a83b3cd8
docs: changelog for v2.0.1 2025-05-28 16:17:16 -07:00
leafspark cd16f3eab6
ci: automate checksums and fix build errors 2025-05-24 21:41:33 -07:00
leafspark 8f3d93461a
ci: update upload-artifact to v4 2025-05-24 21:32:56 -07:00
leafspark 50792722e9
docs: update readme for v2.0.1
- added more information about patches and new updates
- edited quick start instructions
2025-05-24 21:27:02 -07:00
BuildTools 7c2a0b7ec1
feat(ui): update display of properties and add certifi
- updated project files
- added certifi to backend download and update checking
- add and fix type hints
- small file formatting changes
- update formatting of KV pairs to be cleaner
- update tensor data formatting and remove redundant KV pairs property
- add human readable mappings from KV pairs into model properties
- update CUDA backend check for latest llama.cpp format
- use urllib globally
2025-05-24 21:12:22 -07:00
BuildTools 1381665d00
Merge branch 'main' of https://github.com/leafspark/AutoGGUF 2025-05-15 19:02:44 -07:00
BuildTools 35ad690198
feat(core): update llama.cpp, improve backend UI, logging, and task handling
- update llama.cpp python to `bc098c3` (now adds support for Qwen3, Llama 4, etc.)
- update requirements and general maint
- UI fixes in AutoGGUF
- Updated backend selection box to sort by newest version
- Fixed log information box inserting newlines on open and autoscroll
- Modified task deletion behavior
- Fixed logging for cancellation/deletion
- Updated readme information
2025-05-15 19:01:51 -07:00
leafspark 0d97ea1d46
Merge pull request #116 from leafspark/dependabot/pip/torch-approx-eq-2.7.0
build(deps): update torch requirement from ~=2.5.1 to ~=2.7.0
2025-04-27 10:01:28 -06:00
dependabot[bot] 0625a0776e
build(deps): update torch requirement from ~=2.5.1 to ~=2.7.0
Updates the requirements on [torch](https://github.com/pytorch/pytorch) to permit the latest version.
- [Release notes](https://github.com/pytorch/pytorch/releases)
- [Changelog](https://github.com/pytorch/pytorch/blob/main/RELEASE.md)
- [Commits](https://github.com/pytorch/pytorch/compare/v2.5.1...v2.7.0)

---
updated-dependencies:
- dependency-name: torch
  dependency-version: 2.7.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-04-27 11:04:50 +00:00
leafspark 6f74245f29
Merge pull request #114 from leafspark/dependabot/pip/uvicorn-approx-eq-0.34.2
build(deps): update uvicorn requirement from ~=0.34.0 to ~=0.34.2
2025-04-21 21:12:44 -06:00
dependabot[bot] a8ac35d6b7
build(deps): update uvicorn requirement from ~=0.34.0 to ~=0.34.2
Updates the requirements on [uvicorn](https://github.com/encode/uvicorn) to permit the latest version.
- [Release notes](https://github.com/encode/uvicorn/releases)
- [Changelog](https://github.com/encode/uvicorn/blob/master/docs/release-notes.md)
- [Commits](https://github.com/encode/uvicorn/compare/0.34.0...0.34.2)

---
updated-dependencies:
- dependency-name: uvicorn
  dependency-version: 0.34.2
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-04-20 11:32:15 +00:00
BuildTools b4817eee06
refactor(ggml): update safetensor conversion scripts 2025-03-22 09:41:54 -07:00
leafspark c9c2b04534
Merge pull request #97 from leafspark/dependabot/pip/huggingface-hub-approx-eq-0.29.2
build(deps): update huggingface-hub requirement from ~=0.29.1 to ~=0.29.2
2025-03-17 15:35:47 -07:00
dependabot[bot] cc47e59f37
build(deps): update huggingface-hub requirement
Updates the requirements on [huggingface-hub](https://github.com/huggingface/huggingface_hub) to permit the latest version.
- [Release notes](https://github.com/huggingface/huggingface_hub/releases)
- [Commits](https://github.com/huggingface/huggingface_hub/compare/v0.29.1...v0.29.2)

---
updated-dependencies:
- dependency-name: huggingface-hub
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-03-09 11:52:53 +00:00
leafspark 59bc29b2ab
chore: update setup.py email and version 2025-03-04 20:34:10 -08:00
leafspark 14ceec61da
Merge pull request #91 from leafspark/dependabot/pip/huggingface-hub-approx-eq-0.29.1
build(deps): update huggingface-hub requirement from ~=0.27.0 to ~=0.29.1
2025-03-04 20:32:25 -08:00
dependabot[bot] 23ebe47d26
build(deps): update huggingface-hub requirement
Updates the requirements on [huggingface-hub](https://github.com/huggingface/huggingface_hub) to permit the latest version.
- [Release notes](https://github.com/huggingface/huggingface_hub/releases)
- [Commits](https://github.com/huggingface/huggingface_hub/compare/v0.27.0...v0.29.1)

---
updated-dependencies:
- dependency-name: huggingface-hub
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-02-23 11:48:40 +00:00
leafspark 29886faff6
Merge pull request #90 from leafspark/dependabot/pip/psutil-approx-eq-7.0.0
build(deps): update psutil requirement from ~=6.1.1 to ~=7.0.0
2025-02-19 20:03:14 -08:00
dependabot[bot] 4742e6b242
build(deps): update psutil requirement from ~=6.1.1 to ~=7.0.0
Updates the requirements on [psutil](https://github.com/giampaolo/psutil) to permit the latest version.
- [Changelog](https://github.com/giampaolo/psutil/blob/master/HISTORY.rst)
- [Commits](https://github.com/giampaolo/psutil/compare/release-6.1.1...release-7.0.0)

---
updated-dependencies:
- dependency-name: psutil
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-02-16 11:20:40 +00:00
leafspark 3ebc0d63f4
ci: update artifact upload version 2025-02-10 17:36:09 -08:00
leafspark ab0035f2e9
Merge pull request #84 from leafspark/dependabot/pip/pyside6-approx-eq-6.8.2
build(deps): update pyside6 requirement from ~=6.8.1 to ~=6.8.2
2025-02-03 20:01:46 -08:00
dependabot[bot] a266dfba92
build(deps): update pyside6 requirement from ~=6.8.1 to ~=6.8.2
Updates the requirements on [pyside6](https://pyside.org) to permit the latest version.

---
updated-dependencies:
- dependency-name: pyside6
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-02-02 11:29:49 +00:00
leafspark 97d5050a8b
chore: updated changelog for v2 2025-01-27 19:04:15 -08:00
BuildTools 93daedc285
feat(backend): allow setting fetch repository
- add AUTOGGUF_BACKEND_REPO environment variable to set GitHub repo to fetch releases
- remove Import Model confirmation
- fix error when deleting models from list
- add localizations and update README with message
2025-01-27 15:32:07 -08:00
leafspark a0d00ab999
Merge pull request #76 from leafspark/dependabot/pip/safetensors-approx-eq-0.5.2
build(deps): update safetensors requirement from ~=0.5.0 to ~=0.5.2
2025-01-24 18:16:07 -08:00
leafspark 8c79a5d213
Merge pull request #78 from leafspark/dependabot/pip/transformers-approx-eq-4.48.0
build(deps): update transformers requirement from ~=4.47.1 to ~=4.48.0
2025-01-15 18:31:53 -08:00
dependabot[bot] 1955495899
build(deps): update transformers requirement from ~=4.47.1 to ~=4.48.0
Updates the requirements on [transformers](https://github.com/huggingface/transformers) to permit the latest version.
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](https://github.com/huggingface/transformers/compare/v4.47.1...v4.48.0)

---
updated-dependencies:
- dependency-name: transformers
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-01-12 11:22:11 +00:00
dependabot[bot] 18dd8878a4
build(deps): update safetensors requirement from ~=0.5.0 to ~=0.5.2
Updates the requirements on [safetensors](https://github.com/huggingface/safetensors) to permit the latest version.
- [Release notes](https://github.com/huggingface/safetensors/releases)
- [Changelog](https://github.com/huggingface/safetensors/blob/main/RELEASE.md)
- [Commits](https://github.com/huggingface/safetensors/compare/v0.5.0...v0.5.2)

---
updated-dependencies:
- dependency-name: safetensors
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-01-08 23:13:16 +00:00
BuildTools 102e3a14fd
chore: update for new year and improve compliance
- updated copyright year in LICENSE file to 2025
- bundled llama.cpp licensing text in About menu to maintain MIT compliance
- updated llama.cpp and gguf Python library and scripts
- adjusted monitoring intervals from 0.2s to 0.5s
- updated Python requirements to latest compatible versions
- added new HF to GGUF conversion types: `tq1_0` and `tq2_0`

Happy New Year 🎉!
2025-01-08 15:11:47 -08:00
leafspark ddbf96c8e9
Merge pull request #71 from leafspark/dependabot/pip/uvicorn-approx-eq-0.34.0
build(deps): update uvicorn requirement from ~=0.33.0 to ~=0.34.0
2025-01-02 22:51:18 -08:00
dependabot[bot] 403546bfcf
build(deps): update uvicorn requirement from ~=0.33.0 to ~=0.34.0
Updates the requirements on [uvicorn](https://github.com/encode/uvicorn) to permit the latest version.
- [Release notes](https://github.com/encode/uvicorn/releases)
- [Changelog](https://github.com/encode/uvicorn/blob/master/CHANGELOG.md)
- [Commits](https://github.com/encode/uvicorn/compare/0.33.0...0.34.0)

---
updated-dependencies:
- dependency-name: uvicorn
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-12-18 05:07:18 +00:00
BuildTools 53482af554
feat(ui): add clipboard support to save/load preset
- add clipboard support to save/load preset with shift clicking
- update README.md for clarity
- fixes incorrect menu bar name for Load Preset
- update Czech translations
2024-12-17 21:05:58 -08:00
leafspark b49d4ca774
Merge pull request #66 from leafspark/dependabot/pip/pyside6-approx-eq-6.8.1
build(deps): update pyside6 requirement from ~=6.8.0.2 to ~=6.8.1
2024-12-12 14:45:17 -08:00
dependabot[bot] 62e5560650
build(deps): update pyside6 requirement from ~=6.8.0.2 to ~=6.8.1
Updates the requirements on [pyside6](https://pyside.org) to permit the latest version.

---
updated-dependencies:
- dependency-name: pyside6
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-12-08 11:45:09 +00:00
leafspark 3f5d9e6a1b
Merge pull request #60 from leafspark/dependabot/pip/uvicorn-approx-eq-0.32.1
build(deps): update uvicorn requirement from ~=0.32.0 to ~=0.32.1
2024-12-02 18:08:14 -08:00
dependabot[bot] 980a5b6656
build(deps): update uvicorn requirement from ~=0.32.0 to ~=0.32.1
Updates the requirements on [uvicorn](https://github.com/encode/uvicorn) to permit the latest version.
- [Release notes](https://github.com/encode/uvicorn/releases)
- [Changelog](https://github.com/encode/uvicorn/blob/master/CHANGELOG.md)
- [Commits](https://github.com/encode/uvicorn/compare/0.32.0...0.32.1)

---
updated-dependencies:
- dependency-name: uvicorn
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-11-24 11:38:16 +00:00
leafspark 2ac27d62f1
Merge pull request #57 from leafspark/dependabot/pip/setuptools-approx-eq-75.5.0
build(deps): update setuptools requirement from ~=75.1.0 to ~=75.5.0
2024-11-19 17:37:45 -08:00
dependabot[bot] 7dd39b208a
build(deps): update setuptools requirement from ~=75.1.0 to ~=75.5.0
Updates the requirements on [setuptools](https://github.com/pypa/setuptools) to permit the latest version.
- [Release notes](https://github.com/pypa/setuptools/releases)
- [Changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst)
- [Commits](https://github.com/pypa/setuptools/compare/v75.1.0...v75.5.0)

---
updated-dependencies:
- dependency-name: setuptools
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-11-17 11:12:42 +00:00
BuildTools 749f3215ec
feat(ui): support shift clicking to get quantization command
- support shift clicking Quantize Model button to get quantize command
- clean up imports in AutoGGUF.py and add localization keys
- use str() for getting log_dir_name
- remove legacy validate_quantization_inputs() function
- add return_command parameter to quantize_model() function
2024-11-12 19:41:59 -08:00
leafspark 6aaefb2ccb
Merge pull request #56 from leafspark/dependabot/pip/fastapi-approx-eq-0.115.5
build(deps): update fastapi requirement from ~=0.115.2 to ~=0.115.5
2024-11-12 19:38:20 -08:00
leafspark 9955640f03
Merge pull request #48 from leafspark/dependabot/pip/huggingface-hub-approx-eq-0.26.2
build(deps): update huggingface-hub requirement from ~=0.25.2 to ~=0.26.2
2024-11-12 19:37:58 -08:00
dependabot[bot] 50a36e5abe
build(deps): update fastapi requirement from ~=0.115.2 to ~=0.115.5
Updates the requirements on [fastapi](https://github.com/fastapi/fastapi) to permit the latest version.
- [Release notes](https://github.com/fastapi/fastapi/releases)
- [Commits](https://github.com/fastapi/fastapi/compare/0.115.2...0.115.5)

---
updated-dependencies:
- dependency-name: fastapi
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-11-13 03:37:51 +00:00
dependabot[bot] da7d1152ea
build(deps): update huggingface-hub requirement
Updates the requirements on [huggingface-hub](https://github.com/huggingface/huggingface_hub) to permit the latest version.
- [Release notes](https://github.com/huggingface/huggingface_hub/releases)
- [Commits](https://github.com/huggingface/huggingface_hub/compare/v0.25.2...v0.26.2)

---
updated-dependencies:
- dependency-name: huggingface-hub
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-11-13 03:37:40 +00:00
leafspark 6b3d9ce9b1
Merge pull request #54 from leafspark/dependabot/pip/transformers-approx-eq-4.46.2
build(deps): update transformers requirement from ~=4.46.0 to ~=4.46.2
2024-11-12 19:36:36 -08:00
dependabot[bot] 6230450f6e
build(deps): update transformers requirement from ~=4.46.0 to ~=4.46.2
Updates the requirements on [transformers](https://github.com/huggingface/transformers) to permit the latest version.
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](https://github.com/huggingface/transformers/compare/v4.46.0...v4.46.2)

---
updated-dependencies:
- dependency-name: transformers
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-11-10 11:49:15 +00:00
leafspark b9aad59fa0
Merge pull request #53 from leafspark/dependabot/pip/torch-approx-eq-2.5.1
build(deps): update torch requirement from ~=2.5.0 to ~=2.5.1
2024-11-07 15:28:21 -08:00
dependabot[bot] 24e19dad9d
build(deps): update torch requirement from ~=2.5.0 to ~=2.5.1
Updates the requirements on [torch](https://github.com/pytorch/pytorch) to permit the latest version.
- [Release notes](https://github.com/pytorch/pytorch/releases)
- [Changelog](https://github.com/pytorch/pytorch/blob/main/RELEASE.md)
- [Commits](https://github.com/pytorch/pytorch/compare/v2.5.0...v2.5.1)

---
updated-dependencies:
- dependency-name: torch
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-11-03 11:37:47 +00:00
leafspark 0855a88433
ci: remove broken x86 build matrix 2024-11-02 09:34:38 -07:00
leafspark 96c31b58c9
Merge pull request #46 from leafspark/dependabot/pip/transformers-approx-eq-4.46.0
build(deps): update transformers requirement from ~=4.45.1 to ~=4.46.0
2024-11-02 09:33:30 -07:00
leafspark c8d6cf0ea8
Merge pull request #39 from leafspark/dependabot/pip/psutil-approx-eq-6.1.0
build(deps): update psutil requirement from ~=6.0.0 to ~=6.1.0
2024-10-27 20:20:41 -07:00
leafspark 0d95af5f72
Merge pull request #42 from leafspark/dependabot/pip/torch-approx-eq-2.5.0
build(deps): update torch requirement from ~=2.4.1 to ~=2.5.0
2024-10-27 20:20:23 -07:00
leafspark 988b5b61c3
Merge pull request #45 from leafspark/dependabot/pip/pyside6-approx-eq-6.8.0.2
build(deps): update pyside6 requirement from ~=6.8.0.1 to ~=6.8.0.2
2024-10-27 20:20:10 -07:00
dependabot[bot] f66b7fb870
build(deps): update transformers requirement from ~=4.45.1 to ~=4.46.0
Updates the requirements on [transformers](https://github.com/huggingface/transformers) to permit the latest version.
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](https://github.com/huggingface/transformers/compare/v4.45.1...v4.46.0)

---
updated-dependencies:
- dependency-name: transformers
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-10-27 11:14:48 +00:00
dependabot[bot] 87ddc00452
build(deps): update pyside6 requirement from ~=6.8.0.1 to ~=6.8.0.2
Updates the requirements on [pyside6](https://pyside.org) to permit the latest version.

---
updated-dependencies:
- dependency-name: pyside6
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-10-27 11:14:44 +00:00
leafspark fe914f84c2
Merge pull request #40 from leafspark/dependabot/pip/uvicorn-approx-eq-0.32.0
build(deps): update uvicorn requirement from ~=0.31.1 to ~=0.32.0
2024-10-22 17:45:48 -07:00
dependabot[bot] 3b49ceedb1
build(deps): update torch requirement from ~=2.4.1 to ~=2.5.0
Updates the requirements on [torch](https://github.com/pytorch/pytorch) to permit the latest version.
- [Release notes](https://github.com/pytorch/pytorch/releases)
- [Changelog](https://github.com/pytorch/pytorch/blob/main/RELEASE.md)
- [Commits](https://github.com/pytorch/pytorch/compare/v2.4.1...v2.5.0)

---
updated-dependencies:
- dependency-name: torch
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-10-20 11:21:22 +00:00
dependabot[bot] 4df2525e8a
build(deps): update uvicorn requirement from ~=0.31.1 to ~=0.32.0
Updates the requirements on [uvicorn](https://github.com/encode/uvicorn) to permit the latest version.
- [Release notes](https://github.com/encode/uvicorn/releases)
- [Changelog](https://github.com/encode/uvicorn/blob/master/CHANGELOG.md)
- [Commits](https://github.com/encode/uvicorn/compare/0.31.1...0.32.0)

---
updated-dependencies:
- dependency-name: uvicorn
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-10-20 11:21:12 +00:00
dependabot[bot] 5747807391
build(deps): update psutil requirement from ~=6.0.0 to ~=6.1.0
Updates the requirements on [psutil](https://github.com/giampaolo/psutil) to permit the latest version.
- [Changelog](https://github.com/giampaolo/psutil/blob/master/HISTORY.rst)
- [Commits](https://github.com/giampaolo/psutil/compare/release-6.0.0...release-6.1.0)

---
updated-dependencies:
- dependency-name: psutil
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-10-20 11:21:07 +00:00
leafspark 0c1df319cd
build(deps): update PySide6 to resolve segfault 2024-10-16 17:40:15 -07:00
34 changed files with 8146 additions and 1944 deletions

View File

@ -10,3 +10,4 @@ AUTOGGUF_SERVER=enabled
AUTOGGUF_SERVER_PORT=7001
AUTOGGUF_SERVER_API_KEY=
AUTOGGUF_LANGUAGE=en-US
AUTOGGUF_BACKEND_REPO=ggerganov/llama.cpp
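The new `AUTOGGUF_BACKEND_REPO` variable points backend release fetching at a GitHub repository other than the default `ggerganov/llama.cpp`. A minimal sketch of how such a setting could be consumed, assuming the standard GitHub releases API; the `backend_releases_url` helper and surrounding code are illustrative, not the project's actual implementation:

```python
import json
import os
import ssl
import urllib.request

import certifi


def backend_releases_url() -> str:
    # Illustrative helper: fall back to the default repo when the variable is unset.
    repo = os.environ.get("AUTOGGUF_BACKEND_REPO", "ggerganov/llama.cpp")
    return f"https://api.github.com/repos/{repo}/releases"


def fetch_backend_releases() -> list:
    # Uses the same certifi-backed SSL context that a later diff in this compare
    # adds for update checks and downloads.
    ctx = ssl.create_default_context(cafile=certifi.where())
    with urllib.request.urlopen(backend_releases_url(), context=ctx) as resp:
        return json.loads(resp.read().decode("utf-8"))
```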

View File

@ -18,10 +18,9 @@ jobs:
matrix:
os: [windows-latest, ubuntu-latest, macos-latest]
arch: [x64]
include:
- os: windows-latest
arch: x86
runs-on: ${{ matrix.os }}
outputs:
artifact-names: ${{ steps.set-outputs.outputs.artifact-names }}
steps:
- uses: actions/checkout@v4
@ -67,6 +66,7 @@ jobs:
Copy-Item -Path "src\convert_lora_to_gguf.py" -Destination "$distPath\src"
Copy-Item -Path "src\convert_lora_to_ggml.py" -Destination "$distPath\src"
Copy-Item -Path "src\quantize_to_fp8_dynamic.py" -Destination "$distPath\src"
Copy-Item -Path ".env.example" -Destination "$distPath\"
- name: Copy additional files (Linux/macOS)
if: matrix.os != 'windows-latest'
@ -78,46 +78,58 @@ jobs:
cp src/convert_lora_to_gguf.py $distPath/src/
cp src/convert_lora_to_ggml.py $distPath/src/
cp src/quantize_to_fp8_dynamic.py $distPath/src/
cp .env.example $distPath/
- name: Generate SHA256 (Windows)
if: matrix.os == 'windows-latest'
run: |
$distPath = if ("${{ github.event.inputs.build_type }}" -eq "RELEASE") { "build\release\dist" } else { "build\dev\dist" }
$archSuffix = if ("${{ matrix.arch }}" -eq "x86") { "-x86" } else { "-x64" }
$exeName = "AutoGGUF$archSuffix.exe"
$versionHash = "${{ github.sha }}".Substring(0, 7)
$hashFile = "AutoGGUF-${{ matrix.os }}-${{ matrix.arch }}-$versionHash.sha256"
$hash = (Get-FileHash "$distPath\$exeName" -Algorithm SHA256).Hash.ToLower()
"$hash *$exeName" | Out-File -FilePath "$distPath\$hashFile" -Encoding utf8
- name: Generate SHA256 (Linux)
if: matrix.os == 'ubuntu-latest'
run: |
distPath=$(if [ "${{ github.event.inputs.build_type }}" = "RELEASE" ]; then echo "build/release/dist"; else echo "build/dev/dist"; fi)
exeName="AutoGGUF-x64"
versionHash=$(echo ${{ github.sha }} | cut -c1-7)
hashFile="AutoGGUF-${{ matrix.os }}-x64-$versionHash.sha256"
cd $distPath && sha256sum $exeName > $hashFile
- name: Generate SHA256 (macOS)
if: matrix.os == 'macos-latest'
run: |
distPath=$(if [ "${{ github.event.inputs.build_type }}" = "RELEASE" ]; then echo "build/release/dist"; else echo "build/dev/dist"; fi)
exeName="AutoGGUF-x64"
versionHash=$(echo ${{ github.sha }} | cut -c1-7)
hashFile="AutoGGUF-${{ matrix.os }}-x64-$versionHash.sha256"
cd $distPath && shasum -a 256 $exeName > $hashFile
- name: Set outputs for artifact name
id: set-outputs
run: echo "artifact-name=AutoGGUF-${{ matrix.os }}-${{ matrix.arch }}-${{ github.event.inputs.build_type }}-${{ github.sha }}" >> $GITHUB_OUTPUT
- name: Upload Artifact
uses: actions/upload-artifact@v3
uses: actions/upload-artifact@v4
with:
name: AutoGGUF-${{ matrix.os }}-${{ matrix.arch }}-${{ github.event.inputs.build_type }}-${{ github.sha }}
path: |
build/${{ github.event.inputs.build_type == 'RELEASE' && 'release' || 'dev' }}/dist
!build/${{ github.event.inputs.build_type == 'RELEASE' && 'release' || 'dev' }}/dist/AutoGGUF-*.sha256
path: build/${{ github.event.inputs.build_type == 'RELEASE' && 'release' || 'dev' }}/dist
- name: Upload SHA256
uses: actions/upload-artifact@v3
generate-checksums:
needs: build
runs-on: ubuntu-latest
steps:
- name: Download all artifacts
uses: actions/download-artifact@v4
with:
path: ./artifacts
- name: Generate SHA256 checksums for all artifacts
run: |
cd artifacts
versionHash=$(echo ${{ github.sha }} | cut -c1-7)
echo "# AutoGGUF Build Checksums" > ../checksums.txt
echo "Build: ${{ github.event.inputs.build_type }}" >> ../checksums.txt
echo "Commit: ${{ github.sha }}" >> ../checksums.txt
echo "Date: $(date -u)" >> ../checksums.txt
echo "" >> ../checksums.txt
# Find all artifact directories and generate checksums of their zip equivalents
for artifact_dir in AutoGGUF-*-${{ github.event.inputs.build_type }}-${{ github.sha }}; do
if [ -d "$artifact_dir" ]; then
echo "Processing $artifact_dir..."
cd "$artifact_dir"
# Create a temporary zip to calculate hash (simulating what GitHub creates)
zip -r "../temp_${artifact_dir}.zip" .
cd ..
# Generate SHA256 of the zip file
hash=$(sha256sum "temp_${artifact_dir}.zip" | cut -d' ' -f1)
echo "${hash} ${artifact_dir}.zip" >> ../checksums.txt
# Clean up the temporary zip
rm "temp_${artifact_dir}.zip"
fi
done
- name: Upload checksums
uses: actions/upload-artifact@v4
with:
name: AutoGGUF-${{ github.sha }}-SHA256
path: build/${{ github.event.inputs.build_type == 'RELEASE' && 'release' || 'dev' }}/dist/AutoGGUF-*.sha256
path: checksums.txt
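The job above writes one `<sha256> <artifact-name>.zip` line per build into `checksums.txt`. A hedged sketch of verifying a downloaded artifact zip against that file locally; file names are assumptions based on the workflow output, and, as the workflow comment notes, the hash only matches if GitHub serves a byte-identical archive:

```python
import hashlib
import sys
from pathlib import Path


def verify_artifact(zip_path: str, checksums_file: str = "checksums.txt") -> bool:
    # Hash the downloaded artifact zip.
    digest = hashlib.sha256(Path(zip_path).read_bytes()).hexdigest()

    # Checksum lines look like "<hash> <name>.zip"; header lines ("# ...",
    # "Build: ...", "Commit: ...", "Date: ...") are skipped by the name match.
    for line in Path(checksums_file).read_text().splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == Path(zip_path).name:
            return parts[0] == digest
    return False


if __name__ == "__main__":
    ok = verify_artifact(sys.argv[1])
    print("OK" if ok else "MISMATCH OR NOT FOUND")
    sys.exit(0 if ok else 1)
```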

View File

@ -52,7 +52,7 @@ jobs:
cat requirements.txt >> detailed_report.txt
- name: Upload audit results
uses: actions/upload-artifact@v3
uses: actions/upload-artifact@v4
with:
name: pip-audit-report
path: detailed_report.txt

View File

@ -1,5 +1,50 @@
# Changelog
## [v2.0.1] - 2025-05-24
### Added
- Human readable mappings from KV pairs into model properties
- certifi library for backend download and update checking
- Automated checksums in CI process
### Changed
- Updated llama.cpp backend
- Improved backend UI, logging, and task handling
- Enhanced display of model properties and cleaner formatting of KV pairs
- Updated tensor data formatting and removed redundant KV pairs property
- Updated CUDA backend check for latest llama.cpp release format
- Global urllib usage implementation
- Updated README with more information about patches and updates
- Edited quick start instructions
- Small file formatting improvements
### Fixed
- Type hints corrections
- Build errors in CI
- `@upload-artifact` updated to v4
## [v2.0.0] - 2025-01-27
### Added
- Clipboard support for save/load preset functionality with shift-click option
- Support for shift-clicking to get quantization command
- AUTOGGUF_BACKEND_REPO environment variable for custom GitHub repository fetching
- New HF to GGUF conversion types: `tq1_0` and `tq2_0`
### Changed
- Updated multiple dependencies:
- PySide6, PyTorch, Transformers, FastAPI, uvicorn, and other core libraries to their latest compatible versions
- Adjusted monitoring intervals from 0.2s to 0.5s
- Updated copyright year to 2025
- Bundled llama.cpp licensing text in About menu
- Removed x86 build matrix from CI
- Removed Import Model confirmation dialog
### Fixed
- Resolved PySide6 segfault issue
- Fixed error when deleting models from list
- Corrected incorrect menu bar name for Load Preset
## [v1.9.1] - 2024-10-13
### Added
@ -215,7 +260,7 @@ ### Notes
- Fast build: Higher unzipped size (97MB), smaller download (38MB)
- Standard build: Created with PyInstaller, medium download and unzipped size (50MB), potentially slower
## [1.6.0] - 2024-08-08
## [v1.6.0] - 2024-08-08
### Changed
- Resolve licensing issues by using PySide6
@ -223,7 +268,7 @@ ### Changed
### Added
- Add GPU monitoring support for NVIDIA GPUs
## [1.5.1] - 2024-08-08
## [v1.5.1] - 2024-08-08
### Changed
- Refactor localizations to use them in HF conversion area
@ -235,7 +280,7 @@ ### Removed
### Added
- Support loading *.gguf file types
## [1.5.0] - 2024-08-06
## [v1.5.0] - 2024-08-06
### Changed
- Refactor localizations to use them in HF conversion area
@ -248,7 +293,7 @@ ### Added
### Fixed
- Fix scaling on low resolution screens, interface now scrolls
## [1.4.3] - 2024-08-05
## [v1.4.3] - 2024-08-05
### Changed
- Updated src file in release to be Black formatted
@ -261,7 +306,7 @@ ### Added
- Added model sharding management support
- Allow multiple quantization types to be selected and started simultaneously
## [1.4.2] - 2024-08-04
## [v1.4.2] - 2024-08-04
### Fixed
- Resolves bug where Base Model text was shown even when GGML type was selected
@ -270,13 +315,13 @@ ### Fixed
### Changed
- Minor repository changes
## [1.4.1] - 2024-08-04
## [v1.4.1] - 2024-08-04
### Added
- Dynamic KV Overrides (see wiki: AutoGGUF/wiki/Dynamic-KV-Overrides)
- Quantization commands are now printed and logged
## [1.4.0] - 2024-08-04
## [v1.4.0] - 2024-08-04
### Added
- LoRA Conversion:
@ -300,7 +345,7 @@ ### Added
- Currently includes src folder with conversion tools
- No console window popup
## [1.3.1] - 2024-08-04
## [v1.3.1] - 2024-08-04
### Added
- AUTOGGUF_CHECK_BACKEND environment variable to disable backend check on start
@ -308,7 +353,7 @@ ### Added
### Changed
- --onefile build with PyInstaller, _internal directory is no longer required
## [1.3.0] - 2024-08-03
## [v1.3.0] - 2024-08-03
### Added
- Support for new llama-imatrix parameters:
@ -330,7 +375,7 @@ ### Fixed
### Removed
- Duplicated functions
## [1.2.1] - 2024-08-03
## [v1.2.1] - 2024-08-03
### Added
- Refresh Models button
@ -339,13 +384,13 @@ ### Added
### Fixed
- iostream llama.cpp issue, quantized_models directory created on launch
## [1.2.0] - 2024-08-03
## [v1.2.0] - 2024-08-03
### Added
- More robust logging (find logs at latest_<timestamp>.log in logs folder)
- Localizations with support for 28 languages (machine translated using Gemini Experimental 0801)
## [1.1.0] - 2024-08-03
## [v1.1.0] - 2024-08-03
### Added
- Dynamic KV override functionality
@ -368,7 +413,7 @@ ### Added
### Fixed
- Issue where quantization errored with "AutoGGUF does not have x attribute"
## [1.0.0] - 2024-08-02
## [v1.0.0] - 2024-08-02
### Added
- Initial release

View File

@ -186,7 +186,7 @@
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright 2024 leafspark
Copyright (c) 2024-2025 leafspark
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.

View File

@ -9,44 +9,55 @@ # AutoGGUF - automated GGUF model quantizer
<!-- Project Info -->
[![Powered by llama.cpp](https://img.shields.io/badge/Powered%20by-llama.cpp-green.svg)](https://github.com/ggerganov/llama.cpp)
![GitHub top language](https://img.shields.io/github/languages/top/leafspark/AutoGGUF.svg)
[![Platform Compatibility](https://img.shields.io/badge/platform-Linux%20%7C%20macOS%20%7C%20Windows-blue)]()
[![GitHub license](https://img.shields.io/github/license/leafspark/AutoGGUF.svg)](https://github.com/leafspark/AutoGGUF/blob/main/LICENSE)
![GitHub top language](https://img.shields.io/github/languages/top/leafspark/AutoGGUF.svg)
<!-- Repository Stats -->
![GitHub stars](https://img.shields.io/github/stars/leafspark/AutoGGUF.svg)
![GitHub forks](https://img.shields.io/github/forks/leafspark/AutoGGUF.svg)
![GitHub release (latest by date)](https://img.shields.io/github/downloads/leafspark/AutoGGUF/latest/total?color=green)
![GitHub repo size](https://img.shields.io/github/repo-size/leafspark/AutoGGUF.svg)
![Lines of Code](https://tokei.rs/b1/github/leafspark/AutoGGUF?category=code)
<!-- ![Lines of Code](https://ghloc.vercel.app/leafspark/AutoGGUF?filter=.bat$,.py$,.sh$,.bat$) -->
<!-- Contribution -->
[![Code Style: Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Issues](https://img.shields.io/github/issues/leafspark/AutoGGUF)](https://github.com/leafspark/AutoGGUF/issues)
[![Code Style: Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](https://github.com/leafspark/AutoGGUF/pulls)
AutoGGUF provides a graphical user interface for quantizing GGUF models using the llama.cpp library. It allows users to download different versions of llama.cpp, manage multiple backends, and perform quantization tasks with various options.
The most comprehensive GUI tool for GGUF model quantization. Stop wrestling with command lines - quantize, merge, and optimize your models with just a few clicks.
## Features
- Download and manage llama.cpp backends
- Select and quantize GGUF models
- Configure quantization parameters
- Monitor system resources during quantization
- Parallel quantization + imatrix generation
- LoRA conversion and merging
- Preset saving and loading
- AutoFP8 quantization
- GGUF splitting and merging
- 📩 Update and manage llama.cpp backends
- 🗃️ Download and quantize GGUF/safetensors models
- 📐 Configure quantization parameters
- 💻 Monitor system resources in real time during quantization
- ⏳ Parallel quantization + imatrix generation
- 🎉 LoRA conversion and merging
- 📁 Preset saving and loading
- 8⃣ AutoFP8 quantization
- 🪓 GGUF splitting and merging
- 🌐 HTTP API for automation and monitoring
## Usage
## Why AutoGGUF?
- Fast: Saves time on manual configuration
- Simple: Clean UI, no terminal needed
- Powerful: Handles models of any size, limited only by your available RAM
- Resource-aware: Optimized memory management and efficient UI library
### Cross-platform
1. Install dependencies:
![AutoGGUF-v1 8 1-showcase-blue](https://github.com/user-attachments/assets/b136ccc3-5983-4266-9e66-00cebf3ca590)
## Quick Start
### Cross-platform (recommended)
1. `git clone https://github.com/leafspark/AutoGGUF`
2. `cd AutoGGUF`
3. Install dependencies:
```
pip install -r requirements.txt
```
2. Run the application:
4. Run the application:
```
python src/main.py
```
@ -54,7 +65,7 @@ ### Cross-platform
macOS and Ubuntu builds are provided via GitHub Actions; you can download the binaries in the releases section.
### Windows
### Windows (for the impatient)
Standard builds:
1. Download the latest release
2. Extract all files to a folder
@ -62,13 +73,13 @@ ### Windows
4. Any necessary folders will be automatically created
Setup builds:
1. Download setup variant of latest release
1. Download the setup variant of the latest release
2. Extract all files to a folder
3. Run the setup program
4. The .GGUF extension will be registered with the program automatically
4. The .gguf extension will be registered with the program automatically
5. Run the program from the Start Menu or desktop shortcuts
After launching the program, you may access its local server at port 7001 (set `AUTOGGUF_SERVER` to "enabled" first)
After launching the program, you may access its local server at port 7001 (set `AUTOGGUF_SERVER` to "enabled" first).
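The local server is only documented here as listening on port 7001 when `AUTOGGUF_SERVER` is enabled; its endpoints are not shown in this diff. A hedged connectivity check that assumes nothing beyond an open TCP port:

```python
import socket


def autogguf_server_up(host: str = "127.0.0.1", port: int = 7001) -> bool:
    # Only checks that something accepts connections on the port;
    # it does not assume any particular HTTP route.
    try:
        with socket.create_connection((host, port), timeout=2):
            return True
    except OSError:
        return False


print("AutoGGUF server reachable" if autogguf_server_up() else "not reachable")
```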
### Verifying Releases
@ -110,47 +121,50 @@ ### Windows
pip install -U pyinstaller
build RELEASE | DEV
```
Find the executable in `build/<type>/dist/AutoGGUF.exe`.
Find the executable in `build/<type>/dist/AutoGGUF-x64.exe`.
You can also use Nuitka, which may result in a slower build but a faster output executable:
```bash
build_optimized RELEASE | DEV
```
## Dependencies
Find them in `requirements.txt`.
## Localizations
View the list of supported languages at [AutoGGUF/wiki/Installation#configuration](https://github.com/leafspark/AutoGGUF/wiki/Installation#configuration) (LLM translated, except for English).
To use a specific language, set the `AUTOGGUF_LANGUAGE` environment variable to one of the listed language codes (note: some languages may not be fully supported yet, those will fall back to English).
Languages will be updated as soon as possible after an update, or as a part of the update.
To use a specific language, set the `AUTOGGUF_LANGUAGE` environment variable to one of the listed language codes (note: some languages may not be fully supported yet, in which case the UI elements will fall back to English).
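For example, a hedged launcher snippet that selects a non-default UI language before starting the app from source ("fr-FR" is illustrative; the wiki page above lists the actual supported codes):

```python
import os
import subprocess

# AUTOGGUF_LANGUAGE is read at startup; unsupported codes fall back to English.
os.environ["AUTOGGUF_LANGUAGE"] = "fr-FR"
subprocess.run(["python", "src/main.py"], check=True)
```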
## Issues
- Some inconsistent logging
- Some inconsistent logging and signal handling
- Missing or duplicated translations (priority)
- Buggy/incomplete API interfaces
- Code review and formatting (priority)
## Planned Features
- Time estimation for quantization
- Quantization file size estimate
- Perplexity testing
- HuggingFace upload/download (coming in the next release)
- bitsandbytes (coming soon)
- [ ] Time estimation for quantization
- [ ] Quantization file size estimate
- [ ] Perplexity testing
- [ ] bitsandbytes support
## Troubleshooting
#### Project Status
AutoGGUF has now entered maintenance mode. It's considered stable and feature-complete for most use cases, so I'm not actively developing new features, but I'll continue to publish occasional builds, update dependencies regularly, and fix critical bugs as needed. If you encounter issues or have suggestions, feel free to open an issue.
## Support
- SSL module cannot be found error: Install OpenSSL or run from source using `python src/main.py` with the `run.bat` script (`pip install requests`)
- Check out the [Wiki](https://github.com/leafspark/AutoGGUF/wiki) for advanced usage and configuration
## Contributing
Fork the repo, make your changes, and ensure you have the latest commits when merging. Include a changelog of new features in your pull request description. Read `CONTRIBUTING.md` for more information.
## User Interface
![AutoGGUF-v1 8 1-showcase-blue](https://github.com/user-attachments/assets/b136ccc3-5983-4266-9e66-00cebf3ca590)
## Stargazers
[![Star History Chart](https://api.star-history.com/svg?repos=leafspark/AutoGGUF&type=Date)](https://star-history.com/#leafspark/AutoGGUF&Date)
`Last Updated: May 24, 2025`

View File

@ -4,10 +4,10 @@ ## Supported Versions
| Version | Supported |
|-----------------|--------------------|
| stable (v1.9.x) | :white_check_mark: |
| stable (v2.0.x) | :white_check_mark: |
Beta versions are not officially supported and may contain unknown security vulnerabilities. Use them at your own risk.
## Reporting a Vulnerability
Use the Issues tab, or for severe vulnerabilities please contact the maintainers via email.
Use the Issues tab, or for severe vulnerabilities, please contact the maintainers via email.

View File

@ -1,13 +1,14 @@
PyYAML~=6.0.2
psutil~=6.0.0
pynvml~=11.5.3
PySide6~=6.8.0
safetensors~=0.4.5
psutil~=7.0.0
pynvml~=12.0.0
PySide6~=6.9.1
safetensors~=0.5.3
numpy<2.0.0
torch~=2.4.1
torch~=2.7.0
sentencepiece~=0.2.0
setuptools~=75.1.0
huggingface-hub~=0.25.2
transformers~=4.45.1
fastapi~=0.115.2
uvicorn~=0.31.1
setuptools~=80.7.1
huggingface-hub~=0.33.1
transformers~=4.51.3
fastapi~=0.115.12
uvicorn~=0.34.2
certifi~=2025.4.26

View File

@ -5,12 +5,12 @@
setup(
name="AutoGGUF",
version="v1.9.0",
version="v2.0.1",
packages=[""],
url="https://github.com/leafspark/AutoGGUF",
license="apache-2.0",
author="leafspark",
author_email="",
author_email="leafspark@proton.me",
description="automatically quant GGUF models",
install_requires=required,
entry_points={"console_scripts": ["autogguf-gui = main:main"]},

View File

@ -1,8 +1,9 @@
import json
import os
import shutil
import urllib.error
import urllib.request
import certifi
import ssl
from datetime import datetime
from functools import partial, wraps
from typing import List
@ -71,7 +72,7 @@ def __init__(self, args: List[str]) -> None:
self.parse_resolution = ui_update.parse_resolution.__get__(self)
self.log_dir_name = os.environ.get("AUTOGGUF_LOG_DIR_NAME", "logs")
self.log_dir_name = str(os.environ.get("AUTOGGUF_LOG_DIR_NAME", "logs"))
width, height = self.parse_resolution()
self.logger = Logger("AutoGGUF", self.log_dir_name)
@ -194,7 +195,7 @@ def __init__(self, args: List[str]) -> None:
save_preset_action = QAction(f"&{SAVE_PRESET}", self)
save_preset_action.setShortcut(QKeySequence("Ctrl+S"))
save_preset_action.triggered.connect(self.save_preset)
load_preset_action = QAction(f"&{SAVE_PRESET}", self)
load_preset_action = QAction(f"&{LOAD_PRESET}", self)
load_preset_action.setShortcut(QKeySequence("Ctrl+S"))
load_preset_action.triggered.connect(self.load_preset)
file_menu.addAction(close_action)
@ -339,15 +340,15 @@ def __init__(self, args: List[str]) -> None:
output_layout.addWidget(output_button)
self.merge_gguf_layout.addLayout(output_layout)
# Split button
split_button = QPushButton(MERGE_GGUF)
split_button.clicked.connect(
# Merge button
merge_button = QPushButton(MERGE_GGUF)
merge_button.clicked.connect(
lambda: self.merge_gguf(
self.merge_gguf_input.text(),
self.merge_gguf_output.text(),
)
)
self.merge_gguf_layout.addWidget(split_button)
self.merge_gguf_layout.addWidget(merge_button)
self.merge_gguf_dialog.setLayout(self.merge_gguf_layout)
# HF Upload Window
@ -500,7 +501,7 @@ def __init__(self, args: List[str]) -> None:
# Timer for updating system info
self.timer = QTimer()
self.timer.timeout.connect(self.update_system_info)
self.timer.start(200)
self.timer.start(500)
# Backend selection
backend_layout = QHBoxLayout()
@ -763,7 +764,7 @@ def __init__(self, args: List[str]) -> None:
self.extra_arguments = QLineEdit()
quant_options_layout.addRow(
self.create_label(EXTRA_ARGUMENTS, EXTRA_COMMAND_ARGUMENTS),
self.create_label(EXTRA_ARGUMENTS, EXTRA_ARGUMENTS_LABEL),
self.extra_arguments,
)
@ -775,7 +776,7 @@ def __init__(self, args: List[str]) -> None:
# Quantize button layout
quantize_layout = QHBoxLayout()
quantize_button = QPushButton(QUANTIZE_MODEL)
quantize_button.clicked.connect(self.quantize_model)
quantize_button.clicked[bool].connect(self.quantize_model_handler)
save_preset_button = QPushButton(SAVE_PRESET)
save_preset_button.clicked.connect(self.save_preset)
load_preset_button = QPushButton(LOAD_PRESET)
@ -1023,7 +1024,9 @@ def __init__(self, args: List[str]) -> None:
hf_to_gguf_layout.addRow(OUTPUT_FILE, hf_outfile_layout)
self.hf_outtype = QComboBox()
self.hf_outtype.addItems(["f32", "f16", "bf16", "q8_0", "auto"])
self.hf_outtype.addItems(
["f32", "f16", "bf16", "q8_0", "tq1_0", "tq2_0", "auto"]
)
hf_to_gguf_layout.addRow(OUTPUT_TYPE, self.hf_outtype)
self.hf_vocab_only = QCheckBox(VOCAB_ONLY)
@ -1101,6 +1104,20 @@ def __init__(self, args: List[str]) -> None:
self.logger.info(AUTOGGUF_INITIALIZATION_COMPLETE)
self.logger.info(STARTUP_ELASPED_TIME.format(init_timer.elapsed()))
def quantize_model_handler(self) -> None:
if QApplication.keyboardModifiers() == Qt.ShiftModifier and self.quantize_model(
return_command=True
):
QApplication.clipboard().setText(self.quantize_model(return_command=True))
QMessageBox.information(
None,
INFO,
f"{COPIED_COMMAND_TO_CLIPBOARD} "
+ f"<code style='font-family: monospace; white-space: pre;'>{self.quantize_model(return_command=True)}</code>",
)
else:
self.quantize_model()
def resizeEvent(self, event) -> None:
super().resizeEvent(event)
path = QPainterPath()
@ -1113,7 +1130,7 @@ def delete_model(self, item):
reply = QMessageBox.question(
self,
CONFIRM_DELETE,
DELETE_WARNING.format(model_name),
DELETE_MODEL_WARNING.format(model_name),
QMessageBox.StandardButton.Yes | QMessageBox.StandardButton.No,
QMessageBox.StandardButton.No,
)
@ -1126,14 +1143,17 @@ def delete_model(self, item):
)
self.logger.info(MODEL_DELETED_SUCCESSFULLY.format(model_name))
except Exception as e:
show_error(self.logger, f"Error deleting model: {e}")
show_error(self.logger, ERROR_DELETING_MODEL.format(e))
def check_for_updates(self) -> None:
try:
url = "https://api.github.com/repos/leafspark/AutoGGUF/releases/latest"
req = urllib.request.Request(url)
with urllib.request.urlopen(req) as response:
# Create SSL context with certifi certificates
ssl_context = ssl.create_default_context(cafile=certifi.where())
with urllib.request.urlopen(req, context=ssl_context) as response:
if response.status != 200:
raise urllib.error.HTTPError(
url, response.status, "HTTP Error", response.headers, None
@ -1186,15 +1206,25 @@ def refresh_backends(self) -> None:
and "cudart-llama" not in item.lower()
]
def extract_b_val(name: str) -> int:
match = re.search(r"b(\d+)", name)
return int(match.group(1)) if match else -1
if valid_backends:
# Sort by newest version
valid_backends.sort(key=lambda x: extract_b_val(x[0]), reverse=True)
for name, path in valid_backends:
self.backend_combo.addItem(name, userData=path)
self.backend_combo.setEnabled(
True
) # Enable the combo box if there are valid backends
self.backend_combo.setEnabled(True)
# Selects the newest version (now at index 0)
self.backend_combo.setCurrentIndex(0)
else:
self.backend_combo.addItem(NO_BACKENDS_AVAILABLE)
self.backend_combo.setEnabled(False)
self.logger.info(FOUND_VALID_BACKENDS.format(len(valid_backends)))
def save_task_preset(self, task_item) -> None:
@ -1236,13 +1266,13 @@ def download_finished(self, extract_dir) -> None:
)
else:
QMessageBox.warning(
self, CUDA_EXTRACTION_FAILED, NO_SUITABLE_CUDA_BACKEND_FOUND
self, CUDA_EXTRACTION_FAILED, NO_SUITABLE_CUDA_BACKEND_EXTRACTION
)
else:
QMessageBox.information(
self,
DOWNLOAD_COMPLETE,
LLAMACPP_BINARY_DOWNLOADED_AND_EXTRACTED.format(extract_dir),
LLAMACPP_DOWNLOADED_AND_EXTRACTED.format(extract_dir),
)
self.refresh_backends() # Refresh the backends after successful download
@ -1254,23 +1284,6 @@ def download_finished(self, extract_dir) -> None:
if index >= 0:
self.backend_combo.setCurrentIndex(index)
def validate_quantization_inputs(self) -> None:
self.logger.debug(VALIDATING_QUANTIZATION_INPUTS)
errors = []
if not self.backend_combo.currentData():
errors.append(NO_BACKEND_SELECTED)
if not self.models_input.text():
errors.append(MODELS_PATH_REQUIRED)
if not self.output_input.text():
errors.append(OUTPUT_PATH_REQUIRED)
if not self.logs_input.text():
errors.append(LOGS_PATH_REQUIRED)
if not self.model_tree.currentItem():
errors.append(NO_MODEL_SELECTED)
if errors:
raise ValueError("\n".join(errors))
def load_models(self) -> None:
self.logger.info(LOADING_MODELS)
models_dir = self.models_input.text()
@ -1698,10 +1711,9 @@ def merge_gguf(self, model_dir: str, output_dir: str) -> None:
show_error(self.logger, "Error starting merge GGUF task: {}".format(e))
self.logger.info("Split GGUF task finished.")
def quantize_model(self) -> None:
def quantize_model(self, return_command=False) -> str:
self.logger.info(STARTING_MODEL_QUANTIZATION)
try:
self.validate_quantization_inputs()
selected_item = self.model_tree.currentItem()
if not selected_item:
raise ValueError(NO_MODEL_SELECTED)
@ -1822,6 +1834,12 @@ def quantize_model(self) -> None:
if self.extra_arguments.text():
command.extend(self.extra_arguments.text().split())
if return_command:
self.logger.info(
f"{QUANTIZATION_COMMAND}: {str(' '.join(command))}"
)
return str(" ".join(command))
logs_path = self.logs_input.text()
ensure_directory(logs_path)
@ -1902,12 +1920,25 @@ def show_task_details(self, item) -> None:
# Load existing content
if os.path.exists(task_item.log_file):
with open_file_safe(task_item.log_file, "r") as f:
log_text.setPlainText(f.read())
content = f.read().rstrip("\n") # Remove trailing newlines
log_text.setPlainText(content)
# Scroll to the end
log_text.moveCursor(QTextCursor.End)
# Connect to the thread if it's still running
for thread in self.quant_threads:
if thread.log_file == task_item.log_file:
thread.output_signal.connect(log_text.appendPlainText)
# Create a local slot function that updates the text
def update_log(text):
log_text.appendPlainText(text)
log_text.moveCursor(QTextCursor.End)
thread.output_signal.connect(update_log)
# Disconnect the signal when the dialog is destroyed
log_dialog.destroyed.connect(
lambda: thread.output_signal.disconnect(update_log)
)
break
log_dialog.exec()
@ -1925,17 +1956,9 @@ def import_model(self) -> None:
show_error(self.logger, INVALID_GGUF_FILE.format(file_name))
return
reply = QMessageBox.question(
self,
CONFIRM_IMPORT,
IMPORT_MODEL_CONFIRMATION.format(file_name),
QMessageBox.StandardButton.Yes | QMessageBox.StandardButton.No,
QMessageBox.StandardButton.No,
)
if reply == QMessageBox.StandardButton.Yes:
self.imported_models.append(file_path)
self.load_models()
self.logger.info(MODEL_IMPORTED_SUCCESSFULLY.format(file_name))
self.imported_models.append(file_path)
self.load_models()
self.logger.info(MODEL_IMPORTED_SUCCESSFULLY.format(file_name))
@validate_input(
"imatrix_model", "imatrix_datafile", "imatrix_model", "imatrix_output"

View File

@ -98,7 +98,7 @@ def mouseMoveEvent(self, event) -> None:
def mouseReleaseEvent(self, event) -> None:
self.pressing = False
def toggle_maximize(self):
def toggle_maximize(self) -> None:
if self.isMaximized:
self.parent.showNormal()
if self.normal_size:

View File

@ -2,6 +2,8 @@
import urllib.request
import urllib.error
import zipfile
import ssl
import certifi
from PySide6.QtCore import QThread, Signal
@ -19,7 +21,10 @@ def run(self) -> None:
try:
req = urllib.request.Request(self.url)
with urllib.request.urlopen(req) as response:
# Create SSL context with certifi certificates
ssl_context = ssl.create_default_context(cafile=certifi.where())
with urllib.request.urlopen(req, context=ssl_context) as response:
if response.status != 200:
raise urllib.error.HTTPError(
self.url, response.status, "HTTP Error", response.headers, None

View File

@ -95,7 +95,7 @@ def __init__(self, parent=None) -> None:
self.timer = QTimer(self)
self.timer.timeout.connect(self.update_gpu_info)
self.timer.start(200) # Update every 0.2 seconds
self.timer.start(500) # Update every 0.5 seconds
self.gpu_data = []
self.vram_data = []
@ -192,7 +192,7 @@ def update_graph_data() -> None:
timer = QTimer(dialog)
timer.timeout.connect(update_graph_data)
timer.start(200) # Update every 0.2 seconds
timer.start(500) # Update every 0.5 seconds
dialog.exec()
@ -227,7 +227,7 @@ def update_graph_data() -> None:
timer = QTimer(dialog)
timer.timeout.connect(update_graph_data)
timer.start(200) # Update every 0.2 seconds
timer.start(500) # Update every 0.5 seconds
tab_widget.addTab(gpu_graph, GPU_USAGE_OVER_TIME)
tab_widget.addTab(vram_graph, VRAM_USAGE_OVER_TIME)

View File

@ -22,6 +22,7 @@ def __init__(self, parent=None) -> None:
self.key_input = QLineEdit()
self.key_input.setPlaceholderText("Key")
# Set validator for key input (letters and dots only)
key_validator = QRegularExpressionValidator(QRegularExpression(r"[A-Za-z.]+"))
self.key_input.setValidator(key_validator)

View File

@ -1,7 +1,7 @@
import os
import re
AUTOGGUF_VERSION = "v2.0.0"
AUTOGGUF_VERSION = "v2.0.1"
class _Localization:
@ -53,13 +53,11 @@ def __init__(self):
self.QUANTIZE_TO_FP8_DYNAMIC = "Quantize to FP8 Dynamic"
self.OPEN_MODEL_FOLDER = "Open Model Folder"
self.QUANTIZE = "Quantize"
self.OPEN_MODEL_FOLDER = "Open Model Folder"
self.INPUT_MODEL = "Input Model:"
# GGUF Verification
self.INVALID_GGUF_FILE = "Invalid GGUF file: {}"
self.SHARDED_MODEL_NAME = "{} (Sharded)"
self.IMPORTED_MODEL_TOOLTIP = "Imported model: {}"
self.CONCATENATED_FILE_WARNING = "This is a concatenated file part. It will not work with llama-quantize; please concat the file first."
self.CONCATENATED_FILES_FOUND = (
"Found {} concatenated file parts. Please concat the files first."
@ -250,12 +248,6 @@ def __init__(self):
self.LLAMACPP_DOWNLOADED_AND_EXTRACTED = (
"llama.cpp binary downloaded and extracted to {0}"
)
self.NO_SUITABLE_CUDA_BACKEND_FOUND = (
"No suitable CUDA backend found for extraction"
)
self.LLAMACPP_BINARY_DOWNLOADED_AND_EXTRACTED = (
"llama.cpp binary downloaded and extracted to {0}"
)
self.REFRESHING_LLAMACPP_RELEASES = "Refreshing llama.cpp releases"
self.UPDATING_ASSET_LIST = "Updating asset list"
self.UPDATING_CUDA_OPTIONS = "Updating CUDA options"
@ -373,7 +365,7 @@ def __init__(self):
self.ADDING_LORA_ADAPTER = "Adding LoRA Adapter..."
self.DELETING_LORA_ADAPTER = "Deleting LoRA Adapter..."
self.SELECT_LORA_ADAPTER_FILE = "Select LoRA Adapter File"
self.STARTING_LORA_EXPORT = "Starting LoRA export..."
self.STARTING_LORA_EXPORT = "Starting LoRA export"
self.SELECT_OUTPUT_TYPE = "Select Output Type (GGUF or GGML)"
self.BASE_MODEL = "Base Model"
self.SELECT_BASE_MODEL_FILE = "Select Base Model File (GGUF)"
@ -415,6 +407,7 @@ def __init__(self):
# Model actions
self.CONFIRM_DELETE = "Confirm Delete"
self.DELETE_MODEL_WARNING = "Are you sure you want to delete the model: {}?"
self.ERROR_DELETING_MODEL = "Error deleting model: {}"
self.MODEL_RENAMED_SUCCESSFULLY = "Model renamed successfully."
self.MODEL_DELETED_SUCCESSFULLY = "Model deleted successfully."
@ -451,8 +444,15 @@ def __init__(self):
self.HF_REPOSITORY_TYPE = "Repository Type"
self.UPLOAD_TYPE = "Upload Type"
self.UPLOAD = "Upload"
self.INFO = "Info"
self.EXTRA_COMMAND_ARGUMENTS = "Additional command-line arguments"
self.COPIED_COMMAND_TO_CLIPBOARD = "Copied command to clipboard:"
# Repository
self.INVALID_REPOSITORY_FORMAT = (
"Invalid repository format. Must be 'owner/repo'"
)
self.REPO_CANNOT_BE_EMPTY = "Owner and repository name cannot be empty"
class _French(_Localization):
@ -5797,187 +5797,339 @@ def __init__(self):
class _Czech(_Localization):
def __init__(self):
super().__init__()
self.WINDOW_TITLE = "AutoGGUF (Automatický kvantizátor modelů GGUF)"
# General UI
self.WINDOW_TITLE = (
"AutoGGUF (automatizovaný nástroj pro kvantizaci GGUF modelů)"
)
self.RAM_USAGE = "Využití RAM:"
self.CPU_USAGE = "Využití CPU:"
self.BACKEND = "Backend Llama.cpp:"
self.REFRESH_BACKENDS = "Obnovit backendy"
self.MODELS_PATH = "Cesta k modelům:"
self.OUTPUT_PATH = "Výstupní cesta:"
self.OUTPUT_PATH = "Cesta pro výstup:"
self.LOGS_PATH = "Cesta k logům:"
self.BROWSE = "Procházet"
self.AVAILABLE_MODELS = "Dostupné modely:"
self.REFRESH_MODELS = "Obnovit modely"
self.STARTUP_ELASPED_TIME = "Inicializace trvala {0} ms"
# Usage Graphs
self.CPU_USAGE_OVER_TIME = "Využití CPU v čase"
self.RAM_USAGE_OVER_TIME = "Využití RAM v čase"
# Environment variables
self.DOTENV_FILE_NOT_FOUND = "Soubor .env nenalezen."
self.COULD_NOT_PARSE_LINE = "Nelze zpracovat řádek: {0}"
self.ERROR_LOADING_DOTENV = "Chyba při načítání .env: {0}"
# Model Import
self.IMPORT_MODEL = "Importovat model"
self.SELECT_MODEL_TO_IMPORT = "Vyberte model k importu"
self.CONFIRM_IMPORT = "Potvrdit import"
self.IMPORT_MODEL_CONFIRMATION = "Chcete importovat model {}?"
self.MODEL_IMPORTED_SUCCESSFULLY = "Model {} byl úspěšně importován"
self.IMPORTING_MODEL = "Importuji model"
self.IMPORTED_MODEL_TOOLTIP = "Importovaný model: {}"
# AutoFP8 Quantization
self.AUTOFP8_QUANTIZATION_TASK_STARTED = (
"Úloha kvantizace AutoFP8 byla spuštěna"
)
self.ERROR_STARTING_AUTOFP8_QUANTIZATION = (
"Chyba při spuštění kvantizace AutoFP8"
)
self.QUANTIZING_WITH_AUTOFP8 = "Kvantizuji {0} pomocí AutoFP8"
self.QUANTIZING_TO_WITH_AUTOFP8 = "Kvantizuji {0} na {1}"
self.QUANTIZE_TO_FP8_DYNAMIC = "Kvantizovat na FP8 Dynamic"
self.OPEN_MODEL_FOLDER = "Otevřít složku modelu"
self.QUANTIZE = "Kvantizovat"
self.OPEN_MODEL_FOLDER = "Otevřít složku modelu"
self.INPUT_MODEL = "Vstupní model:"
# GGUF Verification
self.INVALID_GGUF_FILE = "Neplatný GGUF soubor: {}"
self.SHARDED_MODEL_NAME = "{} (Rozdělený)"
self.IMPORTED_MODEL_TOOLTIP = "Importovaný model: {}"
self.CONCATENATED_FILE_WARNING = "Toto je spojená část souboru. Nebude fungovat s llama-quantize; prosím, spojte soubor nejdříve."
self.CONCATENATED_FILES_FOUND = (
"Nalezeno {} spojených částí souboru. Prosím, spojte soubory nejdříve."
)
# Plugins
self.PLUGINS_DIR_NOT_EXIST = (
"Adresář pluginů '{}' neexistuje. Žádné pluginy nebudou načteny."
)
self.PLUGINS_DIR_NOT_DIRECTORY = (
"'{}' existuje, ale není to adresář. Žádné pluginy nebudou načteny."
)
self.PLUGIN_LOADED = "Načten plugin: {} {}"
self.PLUGIN_INCOMPATIBLE = (
"Plugin {} {} není kompatibilní s verzí AutoGGUF {}. Podporované verze: {}"
)
self.PLUGIN_LOAD_FAILED = "Nepodařilo se načíst plugin {}: {}"
self.NO_PLUGINS_LOADED = "Žádné pluginy nebyly načteny."
# GPU Monitoring
self.GPU_USAGE = "Využití GPU:"
self.GPU_USAGE_FORMAT = "GPU: {:.1f}% | VRAM: {:.1f}% ({} MB / {} MB)"
self.GPU_DETAILS = "Detaily GPU"
self.GPU_USAGE_OVER_TIME = "Využití GPU v čase"
self.VRAM_USAGE_OVER_TIME = "Využití VRAM v čase"
self.PERCENTAGE = "Procento"
self.TIME = "Čas (s)"
self.NO_GPU_DETECTED = "Nebyla detekována žádná GPU"
self.SELECT_GPU = "Vybrat GPU"
self.AMD_GPU_NOT_SUPPORTED = "Byla detekována AMD GPU, ale není podporována"
# Quantization
self.QUANTIZATION_TYPE = "Typ kvantizace:"
self.ALLOW_REQUANTIZE = "Povolit rekvantizaci"
self.LEAVE_OUTPUT_TENSOR = "Ponechat výstupní tenzor"
self.LEAVE_OUTPUT_TENSOR = "Ponechat výstupní tensor"
self.PURE = "Čistý"
self.IMATRIX = "IMatrix:"
self.INCLUDE_WEIGHTS = "Zahrnout váhy:"
self.EXCLUDE_WEIGHTS = "Vyloučit váhy:"
self.USE_OUTPUT_TENSOR_TYPE = "Použít typ výstupního tenzoru"
self.USE_TOKEN_EMBEDDING_TYPE = "Použít typ vkládání tokenů"
self.USE_TOKEN_EMBEDDING_TYPE = "Použít typ pro token embeddings"
self.KEEP_SPLIT = "Zachovat rozdělení"
self.KV_OVERRIDES = "Přepsání KV:"
self.ADD_NEW_OVERRIDE = "Přidat nové přepsání"
self.QUANTIZE_MODEL = "Kvantizovat model"
self.EXTRA_ARGUMENTS = "Další argumenty:"
self.EXTRA_ARGUMENTS_LABEL = "Další argumenty příkazové řádky"
self.QUANTIZATION_COMMAND = "Příkaz pro kvantizaci"
# Presets
self.SAVE_PRESET = "Uložit předvolbu"
self.LOAD_PRESET = "Načíst předvolbu"
self.TASKS = "Úkoly:"
# Tasks
self.TASKS = "Úlohy:"
# llama.cpp Download
self.DOWNLOAD_LLAMACPP = "Stáhnout llama.cpp"
self.SELECT_RELEASE = "Vybrat verzi:"
self.SELECT_ASSET = "Vybrat aktivum:"
self.EXTRACT_CUDA_FILES = "Extrahovat soubory CUDA"
self.SELECT_CUDA_BACKEND = "Vybrat backend CUDA:"
self.SELECT_ASSET = "Vybrat asset:"
self.EXTRACT_CUDA_FILES = "Extrahovat CUDA soubory"
self.SELECT_CUDA_BACKEND = "Vybrat CUDA backend:"
self.DOWNLOAD = "Stáhnout"
self.REFRESH_RELEASES = "Obnovit verze"
# IMatrix Generation
self.IMATRIX_GENERATION = "Generování IMatrix"
self.DATA_FILE = "Datový soubor:"
self.MODEL = "Model:"
self.OUTPUT = "Výstup:"
self.OUTPUT_FREQUENCY = "Frekvence výstupu:"
self.GPU_OFFLOAD = "Odlehčení GPU:"
self.GPU_OFFLOAD = "Využití GPU:"
self.AUTO = "Automaticky"
self.GENERATE_IMATRIX = "Generovat IMatrix"
self.CONTEXT_SIZE = "Velikost kontextu:"
self.CONTEXT_SIZE_FOR_IMATRIX = "Velikost kontextu pro generování IMatrix"
self.THREADS = "Vlákna:"
self.NUMBER_OF_THREADS_FOR_IMATRIX = "Počet vláken pro generování IMatrix"
self.IMATRIX_GENERATION_COMMAND = "Příkaz pro generování IMatrix"
# LoRA Conversion
self.LORA_CONVERSION = "Konverze LoRA"
self.LORA_INPUT_PATH = "Vstupní cesta LoRA"
self.LORA_OUTPUT_PATH = "Výstupní cesta LoRA"
self.SELECT_LORA_INPUT_DIRECTORY = "Vybrat vstupní adresář LoRA"
self.SELECT_LORA_OUTPUT_FILE = "Vybrat výstupní soubor LoRA"
self.CONVERT_LORA = "Převést LoRA"
self.LORA_CONVERSION_COMMAND = "Příkaz pro konverzi LoRA"
# LoRA Export
self.EXPORT_LORA = "Exportovat LoRA"
self.GGML_LORA_ADAPTERS = "GGML LoRA adaptéry"
self.SELECT_LORA_ADAPTER_FILES = "Vybrat soubory LoRA adaptéru"
self.ADD_ADAPTER = "Přidat adaptér"
self.DELETE_ADAPTER = "Smazat"
self.LORA_SCALE = "LoRA škála"
self.ENTER_LORA_SCALE_VALUE = "Zadejte hodnotu LoRA škály (volitelné)"
self.NUMBER_OF_THREADS_FOR_LORA_EXPORT = "Počet vláken pro export LoRA"
self.LORA_EXPORT_COMMAND = "Příkaz pro export LoRA"
# HuggingFace to GGUF Conversion
self.HF_TO_GGUF_CONVERSION = "Konverze HuggingFace na GGUF"
self.MODEL_DIRECTORY = "Adresář modelu:"
self.OUTPUT_FILE = "Výstupní soubor:"
self.OUTPUT_TYPE = "Typ výstupu:"
self.VOCAB_ONLY = "Pouze slovník"
self.USE_TEMP_FILE = "Použít dočasný soubor"
self.NO_LAZY_EVALUATION = "Bez líného vyhodnocování"
self.MODEL_NAME = "Název modelu:"
self.VERBOSE = "Podrobný výpis"
self.SPLIT_MAX_SIZE = "Maximální velikost pro rozdělení:"
self.DRY_RUN = "Zkušební běh"
self.CONVERT_HF_TO_GGUF = "Převést HF na GGUF"
self.SELECT_HF_MODEL_DIRECTORY = "Vybrat adresář modelu HuggingFace"
self.BROWSE_FOR_HF_MODEL_DIRECTORY = "Procházení adresáře modelu HuggingFace"
self.BROWSE_FOR_HF_TO_GGUF_OUTPUT = (
"Procházení výstupního souboru pro konverzi HuggingFace na GGUF"
)
# Update Checking
self.UPDATE_AVAILABLE = "Aktualizace je k dispozici"
self.NEW_VERSION_AVAILABLE = "Je k dispozici nová verze: {}"
self.DOWNLOAD_NEW_VERSION = "Stáhnout?"
self.ERROR_CHECKING_FOR_UPDATES = "Chyba při kontrole aktualizací:"
self.CHECKING_FOR_UPDATES = "Kontrola aktualizací"
# General Messages
self.ERROR = "Chyba"
self.WARNING = "Varování"
self.PROPERTIES = "Vlastnosti"
self.CANCEL = "Zrušit"
self.RESTART = "Restartovat"
self.DELETE = "Smazat"
self.CONFIRM_DELETION = "Jste si jisti, že chcete smazat tento úkol?"
self.RENAME = "Přejmenovat"
self.CONFIRM_DELETION = "Opravdu chcete smazat tuto úlohu?"
self.TASK_RUNNING_WARNING = (
"Některé úkoly stále běží. Jste si jisti, že chcete ukončit?"
"Některé úlohy stále běží. Opravdu chcete aplikaci ukončit?"
)
self.YES = "Ano"
self.NO = "Ne"
self.COMPLETED = "Dokončeno"
# File Types
self.ALL_FILES = "Všechny soubory (*)"
self.GGUF_FILES = "GGUF soubory (*.gguf)"
self.DAT_FILES = "DAT soubory (*.dat)"
self.JSON_FILES = "JSON soubory (*.json)"
self.BIN_FILES = "Binární soubory (*.bin)"
self.LORA_FILES = "LoRA soubory (*.bin *.gguf)"
self.GGUF_AND_BIN_FILES = "GGUF a binární soubory (*.gguf *.bin)"
self.SHARDED = "rozdělený"
# Status Messages
self.DOWNLOAD_COMPLETE = "Stahování dokončeno"
self.CUDA_EXTRACTION_FAILED = "Extrahování CUDA se nezdařilo"
self.CUDA_EXTRACTION_FAILED = "Extrakce CUDA selhala"
self.PRESET_SAVED = "Předvolba uložena"
self.PRESET_LOADED = "Předvolba načtena"
self.NO_ASSET_SELECTED = "Nebylo vybráno žádné aktivum"
self.DOWNLOAD_FAILED = "Stahování se nezdařilo"
self.NO_ASSET_SELECTED = "Nebyl vybrán žádný asset"
self.DOWNLOAD_FAILED = "Stahování selhalo"
self.NO_BACKEND_SELECTED = "Nebyl vybrán žádný backend"
self.NO_MODEL_SELECTED = "Nebyl vybrán žádný model"
self.REFRESH_RELEASES = "Obnovit verze"
self.NO_SUITABLE_CUDA_BACKENDS = "Nebyly nalezeny žádné vhodné backendy CUDA"
self.LLAMACPP_DOWNLOADED_EXTRACTED = "Binární soubor llama.cpp byl stažen a extrahován do {0}\nSoubory CUDA extrahovány do {1}"
self.CUDA_FILES_EXTRACTED = "Soubory CUDA extrahovány do"
self.NO_SUITABLE_CUDA_BACKENDS = "Nebyly nalezeny žádné vhodné CUDA backendy"
self.IN_PROGRESS = "Probíhá"
self.LLAMACPP_DOWNLOADED_EXTRACTED = (
"Binární soubor llama.cpp byl stažen a extrahován do {0}"
)
self.CUDA_FILES_EXTRACTED = "CUDA soubory byly extrahovány do"
self.NO_SUITABLE_CUDA_BACKEND_EXTRACTION = (
"Nebyl nalezen žádný vhodný backend CUDA pro extrakci"
"Nebyl nalezen žádný vhodný CUDA backend pro extrakci"
)
self.ERROR_FETCHING_RELEASES = "Chyba při načítání verzí: {0}"
self.CONFIRM_DELETION_TITLE = "Potvrdit smazání"
self.LOG_FOR = "Log pro {0}"
self.ALL_FILES = "Všechny soubory (*)"
self.GGUF_FILES = "Soubory GGUF (*.gguf)"
self.DAT_FILES = "Soubory DAT (*.dat)"
self.JSON_FILES = "Soubory JSON (*.json)"
self.FAILED_LOAD_PRESET = "Nepodařilo se načíst předvolbu: {0}"
self.FAILED_TO_LOAD_PRESET = "Nepodařilo se načíst předvolbu: {0}"
self.INITIALIZING_AUTOGGUF = "Inicializace aplikace AutoGGUF"
self.AUTOGGUF_INITIALIZATION_COMPLETE = "Inicializace AutoGGUF dokončena"
self.REFRESHING_BACKENDS = "Obnovování backendů"
self.NO_BACKENDS_AVAILABLE = "Žádné dostupné backendy"
self.REFRESHING_BACKENDS = "Obnovuji backendy"
self.NO_BACKENDS_AVAILABLE = "Nejsou dostupné žádné backendy"
self.FOUND_VALID_BACKENDS = "Nalezeno {0} platných backendů"
self.SAVING_PRESET = "Ukládání předvolby"
self.SAVING_PRESET = "Ukládám předvolbu"
self.PRESET_SAVED_TO = "Předvolba uložena do {0}"
self.LOADING_PRESET = "Načítání předvolby"
self.LOADING_PRESET = "Načítám předvolbu"
self.PRESET_LOADED_FROM = "Předvolba načtena z {0}"
self.ADDING_KV_OVERRIDE = "Přidávání přepsání KV: {0}"
self.SAVING_TASK_PRESET = "Ukládání předvolby úkolu pro {0}"
self.TASK_PRESET_SAVED = "Předvolba úkolu uložena"
self.TASK_PRESET_SAVED_TO = "Předvolba úkolu uložena do {0}"
self.RESTARTING_TASK = "Restartování úkolu: {0}"
self.IN_PROGRESS = "Probíhá"
self.ADDING_KV_OVERRIDE = "Přidávám přepsání KV: {0}"
self.SAVING_TASK_PRESET = "Ukládám předvolbu úlohy pro {0}"
self.TASK_PRESET_SAVED = "Předvolba úlohy uložena"
self.TASK_PRESET_SAVED_TO = "Předvolba úlohy uložena do {0}"
self.RESTARTING_TASK = "Restartuji úlohu: {0}"
self.DOWNLOAD_FINISHED_EXTRACTED_TO = "Stahování dokončeno. Extrahováno do: {0}"
self.LLAMACPP_DOWNLOADED_AND_EXTRACTED = "Binární soubor llama.cpp byl stažen a extrahován do {0}\nSoubory CUDA extrahovány do {1}"
self.LLAMACPP_DOWNLOADED_AND_EXTRACTED = (
"Binární soubor llama.cpp byl stažen a extrahován do {0}"
)
self.NO_SUITABLE_CUDA_BACKEND_FOUND = (
"Nebyl nalezen žádný vhodný backend CUDA pro extrakci"
"Nebyl nalezen žádný vhodný CUDA backend pro extrakci"
)
self.LLAMACPP_BINARY_DOWNLOADED_AND_EXTRACTED = (
"Binární soubor llama.cpp byl stažen a extrahován do {0}"
)
self.REFRESHING_LLAMACPP_RELEASES = "Obnovování verzí llama.cpp"
self.UPDATING_ASSET_LIST = "Aktualizace seznamu aktiv"
self.UPDATING_CUDA_OPTIONS = "Aktualizace možností CUDA"
self.STARTING_LLAMACPP_DOWNLOAD = "Zahájení stahování llama.cpp"
self.UPDATING_CUDA_BACKENDS = "Aktualizace backendů CUDA"
self.NO_CUDA_BACKEND_SELECTED = "Nebyl vybrán žádný backend CUDA pro extrakci"
self.EXTRACTING_CUDA_FILES = "Extrahování souborů CUDA z {0} do {1}"
self.REFRESHING_LLAMACPP_RELEASES = "Obnovuji verze llama.cpp"
self.UPDATING_ASSET_LIST = "Aktualizuji seznam assetů"
self.UPDATING_CUDA_OPTIONS = "Aktualizuji možnosti CUDA"
self.STARTING_LLAMACPP_DOWNLOAD = "Spouštím stahování llama.cpp"
self.UPDATING_CUDA_BACKENDS = "Aktualizuji CUDA backendy"
self.NO_CUDA_BACKEND_SELECTED = "Nebyl vybrán žádný CUDA backend pro extrakci"
self.EXTRACTING_CUDA_FILES = "Extrahuji CUDA soubory z {0} do {1}"
self.DOWNLOAD_ERROR = "Chyba stahování: {0}"
self.SHOWING_TASK_CONTEXT_MENU = "Zobrazení kontextové nabídky úkolu"
self.SHOWING_PROPERTIES_FOR_TASK = "Zobrazení vlastností úkolu: {0}"
self.CANCELLING_TASK = "Zrušení úkolu: {0}"
self.SHOWING_TASK_CONTEXT_MENU = "Zobrazuji kontextové menu úlohy"
self.SHOWING_PROPERTIES_FOR_TASK = "Zobrazuji vlastnosti pro úlohu: {0}"
self.CANCELLING_TASK = "Ruším úlohu: {0}"
self.CANCELED = "Zrušeno"
self.DELETING_TASK = "Mazání úkolu: {0}"
self.LOADING_MODELS = "Načítání modelů"
self.DELETING_TASK = "Mažu úlohu: {0}"
self.LOADING_MODELS = "Načítám modely"
self.LOADED_MODELS = "Načteno {0} modelů"
self.BROWSING_FOR_MODELS_DIRECTORY = "Procházení adresáře modelů"
self.SELECT_MODELS_DIRECTORY = "Vyberte adresář modelů"
self.SELECT_MODELS_DIRECTORY = "Vybrat adresář modelů"
self.BROWSING_FOR_OUTPUT_DIRECTORY = "Procházení výstupního adresáře"
self.SELECT_OUTPUT_DIRECTORY = "Vyberte výstupní adresář"
self.SELECT_OUTPUT_DIRECTORY = "Vybrat výstupní adresář"
self.BROWSING_FOR_LOGS_DIRECTORY = "Procházení adresáře logů"
self.SELECT_LOGS_DIRECTORY = "Vyberte adresář logů"
self.SELECT_LOGS_DIRECTORY = "Vybrat adresář logů"
self.BROWSING_FOR_IMATRIX_FILE = "Procházení souboru IMatrix"
self.SELECT_IMATRIX_FILE = "Vyberte soubor IMatrix"
self.SELECT_IMATRIX_FILE = "Vybrat soubor IMatrix"
self.RAM_USAGE_FORMAT = "{0:.1f}% ({1} MB / {2} MB)"
self.CPU_USAGE_FORMAT = "Využití CPU: {0:.1f}%"
self.VALIDATING_QUANTIZATION_INPUTS = "Ověřování vstupů kvantizace"
self.MODELS_PATH_REQUIRED = "Cesta k modelům je vyžadována"
self.OUTPUT_PATH_REQUIRED = "Výstupní cesta je vyžadována"
self.LOGS_PATH_REQUIRED = "Cesta k logům je vyžadována"
self.STARTING_MODEL_QUANTIZATION = "Spuštění kvantizace modelu"
self.VALIDATING_QUANTIZATION_INPUTS = "Validuji vstupy pro kvantizaci"
self.MODELS_PATH_REQUIRED = "Je vyžadována cesta k modelům"
self.OUTPUT_PATH_REQUIRED = "Je vyžadována cesta pro výstup"
self.LOGS_PATH_REQUIRED = "Je vyžadována cesta k logům"
self.STARTING_MODEL_QUANTIZATION = "Spouštím kvantizaci modelu"
self.INPUT_FILE_NOT_EXIST = "Vstupní soubor '{0}' neexistuje."
self.QUANTIZING_MODEL_TO = "Kvantizace {0} na {1}"
self.QUANTIZATION_TASK_STARTED = "Úkol kvantizace spuštěn pro {0}"
self.QUANTIZING_MODEL_TO = "Kvantizuji {0} na {1}"
self.QUANTIZATION_TASK_STARTED = "Úloha kvantizace spuštěna pro {0}"
self.ERROR_STARTING_QUANTIZATION = "Chyba při spuštění kvantizace: {0}"
self.UPDATING_MODEL_INFO = "Aktualizace informací o modelu: {0}"
self.TASK_FINISHED = "Úkol dokončen: {0}"
self.SHOWING_TASK_DETAILS_FOR = "Zobrazení detailů úkolu pro: {0}"
self.UPDATING_MODEL_INFO = "Aktualizuji informace o modelu: {0}"
self.TASK_FINISHED = "Úloha dokončena: {0}"
self.SHOWING_TASK_DETAILS_FOR = "Zobrazuji detaily úlohy pro: {0}"
self.BROWSING_FOR_IMATRIX_DATA_FILE = "Procházení datového souboru IMatrix"
self.SELECT_DATA_FILE = "Vyberte datový soubor"
self.SELECT_DATA_FILE = "Vybrat datový soubor"
self.BROWSING_FOR_IMATRIX_MODEL_FILE = "Procházení souboru modelu IMatrix"
self.SELECT_MODEL_FILE = "Vyberte soubor modelu"
self.SELECT_MODEL_FILE = "Vybrat soubor modelu"
self.BROWSING_FOR_IMATRIX_OUTPUT_FILE = "Procházení výstupního souboru IMatrix"
self.SELECT_OUTPUT_FILE = "Vyberte výstupní soubor"
self.STARTING_IMATRIX_GENERATION = "Spuštění generování IMatrix"
self.BACKEND_PATH_NOT_EXIST = "Cesta backendu neexistuje: {0}"
self.GENERATING_IMATRIX = "Generování IMatrix"
self.SELECT_OUTPUT_FILE = "Vybrat výstupní soubor"
self.STARTING_IMATRIX_GENERATION = "Spouštím generování IMatrix"
self.BACKEND_PATH_NOT_EXIST = "Cesta k backendu neexistuje: {0}"
self.GENERATING_IMATRIX = "Generuji IMatrix"
self.ERROR_STARTING_IMATRIX_GENERATION = (
"Chyba při spuštění generování IMatrix: {0}"
)
self.IMATRIX_GENERATION_TASK_STARTED = "Úkol generování IMatrix spuštěn"
self.IMATRIX_GENERATION_TASK_STARTED = "Úloha generování IMatrix spuštěna"
self.ERROR_MESSAGE = "Chyba: {0}"
self.TASK_ERROR = "Chyba úkolu: {0}"
self.APPLICATION_CLOSING = "Zavírání aplikace"
self.APPLICATION_CLOSED = "Aplikace zavřena"
self.TASK_ERROR = "Chyba úlohy: {0}"
self.APPLICATION_CLOSING = "Ukončuji aplikaci"
self.APPLICATION_CLOSED = "Aplikace ukončena"
self.SELECT_QUANTIZATION_TYPE = "Vyberte typ kvantizace"
self.ALLOWS_REQUANTIZING = (
"Umožňuje rekvantizovat tenzory, které již byly kvantizovány"
)
self.LEAVE_OUTPUT_WEIGHT = (
"Ponechá output.weight nekvantizovaný (nebo rekvantizovaný)"
"Povoluje rekvantizaci tenzorů, které již byly kvantizovány"
)
self.LEAVE_OUTPUT_WEIGHT = "Ponechá output.weight ne(re)kvantizovaný"
self.DISABLE_K_QUANT_MIXTURES = (
"Zakázat k-kvantové směsi a kvantizovat všechny tenzory na stejný typ"
"Zakáže k-kvantové směsi a kvantizuje všechny tenzory na stejný typ"
)
self.USE_DATA_AS_IMPORTANCE_MATRIX = (
"Použít data v souboru jako matici důležitosti pro optimalizace kvantizace"
"Použije data v souboru jako matici důležitosti pro optimalizace kvantizace"
)
self.USE_IMPORTANCE_MATRIX_FOR_TENSORS = (
"Použít matici důležitosti pro tyto tenzory"
"Použije matici důležitosti pro tyto tenzory"
)
self.DONT_USE_IMPORTANCE_MATRIX_FOR_TENSORS = (
"Nepoužívat matici důležitosti pro tyto tenzory"
"Nepoužije matici důležitosti pro tyto tenzory"
)
self.OUTPUT_TENSOR_TYPE = "Typ výstupního tenzoru:"
self.USE_THIS_TYPE_FOR_OUTPUT_WEIGHT = (
"Použít tento typ pro tenzor output.weight"
"Použije tento typ pro tenzor output.weight"
)
self.TOKEN_EMBEDDING_TYPE = "Typ vkládání tokenů:"
self.TOKEN_EMBEDDING_TYPE = "Typ token embeddings:"
self.USE_THIS_TYPE_FOR_TOKEN_EMBEDDINGS = (
"Použít tento typ pro tenzor vkládání tokenů"
"Použije tento typ pro tenzor token embeddings"
)
self.WILL_GENERATE_QUANTIZED_MODEL_IN_SAME_SHARDS = (
"Vygeneruje kvantizovaný model ve stejných fragmentech jako vstup"
"Vygeneruje kvantizovaný model ve stejných sharded souborech jako vstup"
)
self.OVERRIDE_MODEL_METADATA = "Přepsat metadata modelu"
self.INPUT_DATA_FILE_FOR_IMATRIX = (
@ -5986,9 +6138,123 @@ def __init__(self):
self.MODEL_TO_BE_QUANTIZED = "Model, který má být kvantizován"
self.OUTPUT_PATH_FOR_GENERATED_IMATRIX = "Výstupní cesta pro generovaný IMatrix"
self.HOW_OFTEN_TO_SAVE_IMATRIX = "Jak často ukládat IMatrix"
self.SET_GPU_OFFLOAD_VALUE = "Nastavit hodnotu odlehčení GPU (-ngl)"
self.COMPLETED = "Dokončeno"
self.REFRESH_MODELS = "Reîmprospătează modelele"
self.SET_GPU_OFFLOAD_VALUE = "Nastavit hodnotu pro využití GPU (-ngl)"
self.STARTING_LORA_CONVERSION = "Spouštím konverzi LoRA"
self.LORA_INPUT_PATH_REQUIRED = "Je vyžadována vstupní cesta LoRA."
self.LORA_OUTPUT_PATH_REQUIRED = "Je vyžadována výstupní cesta LoRA."
self.ERROR_STARTING_LORA_CONVERSION = "Chyba při spuštění konverze LoRA: {}"
self.LORA_CONVERSION_TASK_STARTED = "Úloha konverze LoRA spuštěna."
self.BROWSING_FOR_LORA_INPUT_DIRECTORY = "Procházení vstupního adresáře LoRA..."
self.BROWSING_FOR_LORA_OUTPUT_FILE = "Procházení výstupního souboru LoRA..."
self.CONVERTING_LORA = "Konverze LoRA"
self.LORA_CONVERSION_FINISHED = "Konverze LoRA dokončena."
self.LORA_FILE_MOVED = "Soubor LoRA přesunut z {} do {}."
self.LORA_FILE_NOT_FOUND = "Soubor LoRA nenalezen: {}."
self.ERROR_MOVING_LORA_FILE = "Chyba při přesouvání souboru LoRA: {}"
self.MODEL_PATH_REQUIRED = "Je vyžadována cesta k modelu."
self.AT_LEAST_ONE_LORA_ADAPTER_REQUIRED = (
"Je vyžadován alespoň jeden LoRA adaptér."
)
self.INVALID_LORA_SCALE_VALUE = "Neplatná hodnota LoRA škály."
self.ERROR_STARTING_LORA_EXPORT = "Chyba při spuštění exportu LoRA: {}"
self.LORA_EXPORT_TASK_STARTED = "Úloha exportu LoRA spuštěna."
self.EXPORTING_LORA = "Exportuji LoRA..."
self.BROWSING_FOR_EXPORT_LORA_MODEL_FILE = (
"Procházení souboru modelu pro export LoRA..."
)
self.BROWSING_FOR_EXPORT_LORA_OUTPUT_FILE = (
"Procházení výstupního souboru pro export LoRA..."
)
self.ADDING_LORA_ADAPTER = "Přidávám LoRA adaptér..."
self.DELETING_LORA_ADAPTER = "Mažu LoRA adaptér..."
self.SELECT_LORA_ADAPTER_FILE = "Vybrat soubor LoRA adaptéru"
self.STARTING_LORA_EXPORT = "Spouštím export LoRA..."
self.SELECT_OUTPUT_TYPE = "Vyberte typ výstupu (GGUF nebo GGML)"
self.BASE_MODEL = "Základní model"
self.SELECT_BASE_MODEL_FILE = "Vybrat soubor základního modelu (GGUF)"
self.BASE_MODEL_PATH_REQUIRED = (
"Pro výstup GGUF je vyžadována cesta k základnímu modelu."
)
self.BROWSING_FOR_BASE_MODEL_FILE = "Procházení souboru základního modelu..."
self.SELECT_BASE_MODEL_FOLDER = (
"Vybrat složku základního modelu (obsahující safetensors)"
)
self.BROWSING_FOR_BASE_MODEL_FOLDER = "Procházení složky základního modelu..."
self.LORA_CONVERSION_FROM_TO = "Konverze LoRA z {} na {}"
self.GENERATING_IMATRIX_FOR = "Generuji IMatrix pro {}"
self.MODEL_PATH_REQUIRED_FOR_IMATRIX = (
"Pro generování IMatrix je vyžadována cesta k modelu."
)
self.NO_ASSET_SELECTED_FOR_CUDA_CHECK = (
"Nebyl vybrán žádný asset pro kontrolu CUDA"
)
self.NO_QUANTIZATION_TYPE_SELECTED = "Nebyl vybrán žádný typ kvantizace. Prosím, vyberte alespoň jeden typ kvantizace."
self.STARTING_HF_TO_GGUF_CONVERSION = "Spouštím konverzi HuggingFace na GGUF"
self.MODEL_DIRECTORY_REQUIRED = "Je vyžadován adresář modelu"
self.HF_TO_GGUF_CONVERSION_COMMAND = "Příkaz pro konverzi HF na GGUF: {}"
self.CONVERTING_TO_GGUF = "Převádím {} na GGUF"
self.ERROR_STARTING_HF_TO_GGUF_CONVERSION = (
"Chyba při spuštění konverze HuggingFace na GGUF: {}"
)
self.HF_TO_GGUF_CONVERSION_TASK_STARTED = (
"Úloha konverze HuggingFace na GGUF spuštěna"
)
# Split GGUF
self.SPLIT_GGUF = "Rozdělit GGUF"
self.SPLIT_MAX_SIZE = "Maximální velikost pro rozdělení"
self.SPLIT_MAX_TENSORS = "Maximální počet tensorů pro rozdělení"
self.SPLIT_GGUF_TASK_STARTED = "Úloha rozdělení GGUF spuštěna"
self.SPLIT_GGUF_TASK_FINISHED = "Úloha rozdělení GGUF dokončena"
self.SPLIT_GGUF_COMMAND = "Příkaz pro rozdělení GGUF"
self.SPLIT_GGUF_ERROR = "Chyba při spuštění rozdělení GGUF"
self.NUMBER_OF_TENSORS = "Počet tensorů"
self.SIZE_IN_UNITS = "Velikost v G/M"
# Model actions
self.CONFIRM_DELETE = "Potvrdit smazání"
self.DELETE_MODEL_WARNING = "Opravdu chcete smazat model: {}?"
self.MODEL_RENAMED_SUCCESSFULLY = "Model byl úspěšně přejmenován."
self.MODEL_DELETED_SUCCESSFULLY = "Model byl úspěšně smazán."
# HuggingFace Transfer
self.ALL_FIELDS_REQUIRED = "Všechna pole jsou vyžadována."
self.HUGGINGFACE_UPLOAD_COMMAND = "Příkaz pro nahrávání na HuggingFace: "
self.UPLOADING = "Nahrávám"
self.UPLOADING_FOLDER = "Nahrávám složku"
self.HF_TRANSFER_TASK_NAME = "{} {} do {} z {}"
self.ERROR_STARTING_HF_TRANSFER = "Chyba při spuštění přenosu na HF: {}"
self.STARTED_HUGGINGFACE_TRANSFER = "Spuštěna operace {} na HuggingFace."
self.SELECT_FOLDER = "Vybrat složku"
self.SELECT_FILE = "Vybrat soubor"
# Menubar
self.CLOSE = "Zavřít"
self.FILE = "Soubor"
self.FOLDER = "Složka"
self.HELP = "Nápověda"
self.ABOUT = "O aplikaci"
self.AUTOFP8 = "AutoFP8"
self.TOOLS = "Nástroje"
self.HF_TRANSFER = "Přenos HF"
self.MERGE_GGUF = "Sloučit GGUF"
self.HF_UPLOAD = "Nahrát na HF"
self.HF_REPOSITORY = "Repozitář:"
self.HF_REMOTE_PATH = "Vzdálená cesta:"
self.HF_LOCAL_PATH = "Lokální cesta:"
self.MODEL = "Model"
self.DATASET = "Dataset"
self.SPACE = "Space"
self.HF_REPOSITORY_TYPE = "Typ repozitáře"
self.UPLOAD_TYPE = "Typ nahrávání"
self.UPLOAD = "Nahrát"
self.EXTRA_COMMAND_ARGUMENTS = "Další argumenty příkazové řádky"
self.INFO = "Info"
self.COPIED_COMMAND_TO_CLIPBOARD = "Zkopírován příkaz do schránky:"
class _CanadianFrench(_Localization):

View File

@ -24,8 +24,21 @@ def __init__(self, model_info, parent=None) -> None:
def format_model_info(self, model_info) -> str:
html = "<h2>Model Information</h2>"
html += f"<p><b>Architecture:</b> {model_info.get('architecture', 'N/A')}</p>"
html += f"<p><b>Quantization Type:</b> {model_info.get('quantization_type', 'N/A')}</p>"
html += f"<p><b>KV Pairs:</b> {model_info.get('kv_pairs', 'N/A')}</p>"
# Format quantization types
quant_types = model_info.get("quantization_type", [])
if quant_types:
# Clean up the format: remove "- type " prefix and join with " | "
formatted_types = []
for qtype in quant_types:
# Remove "- type " prefix if present
clean_type = qtype.replace("- type ", "").strip()
formatted_types.append(clean_type)
quant_display = " | ".join(formatted_types)
else:
quant_display = "N/A"
html += f"<p><b>Quantization Type:</b> {quant_display}</p>"
html += f"<p><b>Tensors:</b> {model_info.get('tensors', 'N/A')}</p>"
html += "<h3>Key-Value Pairs:</h3>"

View File

@ -59,6 +59,34 @@ def run(self) -> None:
self.error_signal.emit(str(e))
def parse_model_info(self, line) -> None:
# Mapping of technical keys to human-readable names
key_mappings = {
"general.architecture": "Architecture",
"general.name": "Model Name",
"general.file_type": "File Type",
"general.quantization_version": "Quantization Version",
"llama.block_count": "Layers",
"llama.context_length": "Context Length",
"llama.embedding_length": "Embedding Size",
"llama.feed_forward_length": "Feed Forward Length",
"llama.attention.head_count": "Attention Heads",
"llama.attention.head_count_kv": "Key-Value Heads",
"llama.attention.layer_norm_rms_epsilon": "RMS Norm Epsilon",
"llama.rope.freq_base": "RoPE Frequency Base",
"llama.rope.dimension_count": "RoPE Dimensions",
"llama.vocab_size": "Vocabulary Size",
"tokenizer.ggml.model": "Tokenizer Model",
"tokenizer.ggml.pre": "Tokenizer Preprocessing",
"tokenizer.ggml.tokens": "Tokens",
"tokenizer.ggml.token_type": "Token Types",
"tokenizer.ggml.merges": "BPE Merges",
"tokenizer.ggml.bos_token_id": "Begin of Sequence Token ID",
"tokenizer.ggml.eos_token_id": "End of Sequence Token ID",
"tokenizer.chat_template": "Chat Template",
"tokenizer.ggml.padding_token_id": "Padding Token ID",
"tokenizer.ggml.unk_token_id": "Unknown Token ID",
}
# Parse output for model information
if "llama_model_loader: loaded meta data with" in line:
parts = line.split()
@ -66,10 +94,25 @@ def parse_model_info(self, line) -> None:
self.model_info["tensors"] = parts[9]
elif "general.architecture" in line:
self.model_info["architecture"] = line.split("=")[-1].strip()
elif line.startswith("llama_model_loader: - kv"):
key = line.split(":")[2].strip()
value = line.split("=")[-1].strip()
self.model_info.setdefault("kv_data", {})[key] = value
elif line.startswith("llama_model_loader: - kv") and "=" in line:
# Split on '=' and take the parts
parts = line.split("=", 1) # Split only on first '='
left_part = parts[0].strip()
value = parts[1].strip()
# Extract key and type from left part
# Format: "llama_model_loader: - kv N: key type"
kv_parts = left_part.split(":")
if len(kv_parts) >= 3:
key_type_part = kv_parts[2].strip() # This is "key type"
key = key_type_part.rsplit(" ", 1)[
0
] # Everything except last word (type)
# Use human-readable name if available, otherwise use original key
display_key = key_mappings.get(key, key)
self.model_info.setdefault("kv_data", {})[display_key] = value
elif line.startswith("llama_model_loader: - type"):
parts = line.split(":")
if len(parts) > 1:
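For reference, a hypothetical loader line run through the new splitting logic; the sample key and value are made up but follow the "llama_model_loader: - kv N: key type = value" format the comment describes:
line = "llama_model_loader: - kv   2:                 llama.context_length u32              = 4096"
left, value = (part.strip() for part in line.split("=", 1))
key = left.split(":")[2].strip().rsplit(" ", 1)[0]
# key == "llama.context_length", value == "4096";
# key_mappings (defined above) then maps it to the display name "Context Length"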

View File

@ -95,29 +95,41 @@ def show_task_context_menu(self, position) -> None:
def show_task_properties(self, item) -> None:
self.logger.debug(SHOWING_PROPERTIES_FOR_TASK.format(item.text()))
task_item = self.task_list.itemWidget(item)
for thread in self.quant_threads:
if thread.log_file == task_item.log_file:
model_info_dialog = ModelInfoDialog(thread.model_info, self)
model_info_dialog.exec()
break
model_info_dialog = ModelInfoDialog(thread.model_info, self)
model_info_dialog.exec()
break
def cancel_task(self, item) -> None:
self.logger.info(CANCELLING_TASK.format(item.text()))
# TODO: fix possibly buggy signal behavior
task_item = self.task_list.itemWidget(item)
for thread in self.quant_threads:
if thread.log_file == task_item.log_file:
thread.terminate()
task_item.update_status(CANCELED)
self.quant_threads.remove(thread)
break
if task_item:
task_name = task_item.task_name # Store the name before any changes
self.logger.info(CANCELLING_TASK.format(task_name))
# Find the thread and disconnect signals before terminating
for thread in self.quant_threads:
if thread.log_file == task_item.log_file:
# Disconnect all signals from this thread first
try:
thread.error_signal.disconnect() # Disconnect all error signal connections
thread.output_signal.disconnect() # Disconnect all output signal connections
except TypeError:
# No connections to disconnect
pass
# Now terminate the thread
thread.terminate()
self.quant_threads.remove(thread)
break
def delete_task(self, item) -> None:
self.logger.info(DELETING_TASK.format(item.text()))
task_item = self.task_list.itemWidget(item)
if not task_item:
return
# Cancel the task first
self.cancel_task(item)
task_name = task_item.task_name # Store task_name before deletion
self.logger.info(DELETING_TASK.format(task_name))
reply = QMessageBox.question(
self,
@ -126,13 +138,17 @@ def delete_task(self, item) -> None:
QMessageBox.StandardButton.Yes | QMessageBox.StandardButton.No,
QMessageBox.StandardButton.No,
)
if reply == QMessageBox.StandardButton.Yes:
task_item = self.task_list.itemWidget(item)
# Cancel the task first (which disconnects signals)
self.cancel_task(item)
# Now remove from list and delete
row = self.task_list.row(item)
self.task_list.takeItem(row)
if task_item:
task_item.deleteLater()
# Delete the widget after removing from list
task_item.deleteLater()
def update_status(self, status) -> None:
self.status = status

File diff suppressed because it is too large

View File

@ -18,15 +18,16 @@
SupportsIndex,
cast,
)
from transformers import AutoConfig
import torch
if TYPE_CHECKING:
from torch import Tensor
import gguf
from gguf.constants import *
from convert_hf_to_gguf import LazyTorchTensor, Model
# reuse model definitions from convert_hf_to_gguf.py
from convert_hf_to_gguf import LazyTorchTensor, ModelBase
logger = logging.getLogger("lora-to-gguf")
@ -37,9 +38,10 @@ class PartialLoraTensor:
B: Tensor | None = None
# magic to support tensor shape modifications and splitting
class LoraTorchTensor:
_lora_A: Tensor
_lora_B: Tensor
_lora_A: Tensor # (n_rank, row_size)
_lora_B: Tensor # (col_size, n_rank)
_rank: int
def __init__(self, A: Tensor, B: Tensor):
@ -57,14 +59,20 @@ def get_lora_A_B(self) -> tuple[Tensor, Tensor]:
def __getitem__(
self,
indices: SupportsIndex | slice | tuple[SupportsIndex | slice | Tensor, ...],
indices: (
SupportsIndex
| slice
| tuple[
SupportsIndex | slice | Tensor, ...
] # TODO: add ellipsis in the type signature
),
) -> LoraTorchTensor:
shape = self.shape
if isinstance(indices, SupportsIndex):
if len(shape) > 2:
return LoraTorchTensor(self._lora_A[indices], self._lora_B[indices])
else:
raise NotImplementedError
raise NotImplementedError # can't return a vector
elif isinstance(indices, slice):
if len(shape) > 2:
return LoraTorchTensor(self._lora_A[indices], self._lora_B[indices])
@ -74,7 +82,7 @@ def __getitem__(
assert len(indices) > 0
if indices[-1] is Ellipsis:
return self[indices[:-1]]
# expand ellipsis
indices = tuple(
u
for v in (
@ -94,6 +102,7 @@ def __getitem__(
*(slice(None, None) for _ in range(len(indices), len(shape))),
)
# TODO: make sure this is correct
indices_A = (
*(
(
@ -109,7 +118,7 @@ def __getitem__(
indices_B = indices[:-1]
return LoraTorchTensor(self._lora_A[indices_A], self._lora_B[indices_B])
else:
raise NotImplementedError
raise NotImplementedError # unknown index type
@property
def dtype(self) -> torch.dtype:
@ -132,8 +141,9 @@ def reshape(self, *shape: int | tuple[int, ...]) -> LoraTorchTensor:
new_shape = cast(tuple[int, ...], shape)
orig_shape = self.shape
if len(new_shape) < 2:
raise NotImplementedError
raise NotImplementedError # can't become a vector
# expand -1 in the shape
if any(dim == -1 for dim in new_shape):
n_elems = prod(orig_shape)
n_new_elems = prod(dim if dim != -1 else 1 for dim in new_shape)
@ -143,7 +153,7 @@ def reshape(self, *shape: int | tuple[int, ...]) -> LoraTorchTensor:
)
if new_shape[-1] != orig_shape[-1]:
raise NotImplementedError
raise NotImplementedError # can't reshape the row size trivially
shape_A = (*(1 for _ in new_shape[:-2]), self._rank, orig_shape[-1])
shape_B = (*new_shape[:-1], self._rank)
@ -162,7 +172,7 @@ def permute(self, *dims: int) -> LoraTorchTensor:
shape = self.shape
dims = tuple(dim - len(shape) if dim >= 0 else dim for dim in dims)
if dims[-1] == -1:
# TODO: support higher dimensional A shapes bigger than 1
assert all(dim == 1 for dim in self._lora_A.shape[:-2])
return LoraTorchTensor(self._lora_A, self._lora_B.permute(*dims))
if len(shape) == 2 and dims[-1] == -2 and dims[-2] == -1:
@ -170,7 +180,7 @@ def permute(self, *dims: int) -> LoraTorchTensor:
self._lora_B.permute(*dims), self._lora_A.permute(*dims)
)
else:
# TODO: compose the above two
raise NotImplementedError
def transpose(self, dim0: int, dim1: int) -> LoraTorchTensor:
@ -189,7 +199,7 @@ def to(self, *args, **kwargs):
@classmethod
def __torch_function__(cls, func: Callable, types, args=(), kwargs=None):
del types
del types # unused
if kwargs is None:
kwargs = {}
@ -230,28 +240,73 @@ def get_base_tensor_name(lora_tensor_name: str) -> str:
base_name = lora_tensor_name.replace("base_model.model.", "")
base_name = base_name.replace(".lora_A.weight", ".weight")
base_name = base_name.replace(".lora_B.weight", ".weight")
# models produced by mergekit-extract-lora have token embeddings in the adapter
base_name = base_name.replace(".lora_embedding_A", ".weight")
base_name = base_name.replace(".lora_embedding_B", ".weight")
return base_name
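Two hypothetical mappings, including the new mergekit-extract-lora embedding case (tensor names are illustrative):
assert get_base_tensor_name(
    "base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight"
) == "model.layers.0.self_attn.q_proj.weight"
assert get_base_tensor_name(
    "base_model.model.model.embed_tokens.lora_embedding_A"
) == "model.embed_tokens.weight"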
def parse_args() -> argparse.Namespace:
parser = argparse.ArgumentParser()
parser.add_argument("--outfile", type=Path)
parser = argparse.ArgumentParser(
description="Convert a Hugging Face PEFT LoRA adapter to a GGUF file"
)
parser.add_argument(
"--outfile",
type=Path,
help="path to write to; default: based on input. {ftype} will be replaced by the outtype.",
)
parser.add_argument(
"--outtype",
type=str,
choices=["f32", "f16", "bf16", "q8_0", "auto"],
default="f16",
help="output format - use f32 for float32, f16 for float16, bf16 for bfloat16, q8_0 for Q8_0, auto for the highest-fidelity 16-bit float type depending on the first loaded tensor type",
)
parser.add_argument(
"--bigendian",
action="store_true",
help="model is executed on big endian machine",
)
parser.add_argument(
"--no-lazy",
action="store_true",
help="use more RAM by computing all outputs before writing (use in case lazy evaluation is broken)",
)
parser.add_argument(
"--verbose",
action="store_true",
help="increase output verbosity",
)
parser.add_argument(
"--dry-run",
action="store_true",
help="only print out what will be done, without writing any new files",
)
parser.add_argument(
"--base",
type=Path,
help="directory containing Hugging Face model config files (config.json, tokenizer.json) for the base model that the adapter is based on - only config is needed, actual model weights are not required. If base model is unspecified, it will be loaded from Hugging Face hub based on the adapter config",
)
parser.add_argument(
"--base-model-id",
type=str,
help="the model ID of the base model, if it is not available locally or in the adapter config. If specified, it will ignore --base and load the base model config from the Hugging Face hub (Example: 'meta-llama/Llama-3.2-1B-Instruct')",
)
parser.add_argument(
"lora_path",
type=Path,
help="directory containing Hugging Face PEFT LoRA config (adapter_model.json) and weights (adapter_model.safetensors or adapter_model.bin)",
)
parser.add_argument("--bigendian", action="store_true")
parser.add_argument("--no-lazy", action="store_true")
parser.add_argument("--verbose", action="store_true")
parser.add_argument("--dry-run", action="store_true")
parser.add_argument("--base", type=Path, required=True)
parser.add_argument("lora_path", type=Path)
return parser.parse_args()
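A hypothetical invocation using the new flags; the script name, paths, and outfile pattern are placeholders, and --base is no longer required when the adapter config (or --base-model-id) names the base model:
#   python convert_lora_to_gguf.py ./my-lora-adapter \
#       --outtype q8_0 \
#       --base-model-id meta-llama/Llama-3.2-1B-Instruct \
#       --outfile ./my-lora-adapter-{ftype}.gguf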
def load_hparams_from_hf(hf_model_id: str) -> dict[str, Any]:
# normally, adapter does not come with base model config, we need to load it from AutoConfig
config = AutoConfig.from_pretrained(hf_model_id)
return config.to_dict()
if __name__ == "__main__":
args = parse_args()
logging.basicConfig(level=logging.DEBUG if args.verbose else logging.INFO)
@ -266,19 +321,20 @@ def parse_args() -> argparse.Namespace:
ftype = ftype_map[args.outtype]
dir_base_model: Path = args.base
dir_base_model: Path | None = args.base
dir_lora: Path = args.lora_path
base_model_id: str | None = args.base_model_id
lora_config = dir_lora / "adapter_config.json"
input_model = dir_lora / "adapter_model.safetensors"
if args.outfile is not None:
fname_out = args.outfile
else:
# output in the same directory as the model by default
fname_out = dir_lora
if os.path.exists(input_model):
# lazy import load_file only if lora is in safetensors format.
from safetensors.torch import load_file
lora_model = load_file(input_model, device="cpu")
@ -286,11 +342,41 @@ def parse_args() -> argparse.Namespace:
input_model = os.path.join(dir_lora, "adapter_model.bin")
lora_model = torch.load(input_model, map_location="cpu", weights_only=True)
logger.info(f"Loading base model: {dir_base_model.name}")
hparams = Model.load_hparams(dir_base_model)
# load LoRA config
with open(lora_config, "r") as f:
lparams: dict[str, Any] = json.load(f)
# load base model
if base_model_id is not None:
logger.info(f"Loading base model from Hugging Face: {base_model_id}")
hparams = load_hparams_from_hf(base_model_id)
elif dir_base_model is None:
if "base_model_name_or_path" in lparams:
model_id = lparams["base_model_name_or_path"]
logger.info(f"Loading base model from Hugging Face: {model_id}")
try:
hparams = load_hparams_from_hf(model_id)
except OSError as e:
logger.error(f"Failed to load base model config: {e}")
logger.error(
"Please try downloading the base model and add its path to --base"
)
sys.exit(1)
else:
logger.error(
"'base_model_name_or_path' is not found in adapter_config.json"
)
logger.error(
"Base model config is required. Please download the base model and add its path to --base"
)
sys.exit(1)
else:
logger.info(f"Loading base model: {dir_base_model.name}")
hparams = ModelBase.load_hparams(dir_base_model)
with torch.inference_mode():
try:
model_class = Model.from_model_architecture(hparams["architectures"][0])
model_class = ModelBase.from_model_architecture(hparams["architectures"][0])
except NotImplementedError:
logger.error(f"Model {hparams['architectures'][0]} is not supported")
sys.exit(1)
@ -309,6 +395,9 @@ def __init__(
self.dir_model_card = dir_lora_model
self.lora_alpha = float(lora_alpha)
def set_vocab(self):
pass
def set_type(self):
self.gguf_writer.add_type(gguf.GGUFType.ADAPTER)
self.gguf_writer.add_string(gguf.Keys.Adapter.TYPE, "lora")
@ -317,7 +406,10 @@ def set_gguf_parameters(self):
self.gguf_writer.add_float32(
gguf.Keys.Adapter.LORA_ALPHA, self.lora_alpha
)
super().set_gguf_parameters()
def generate_extra_tensors(self) -> Iterable[tuple[str, Tensor]]:
# Never add extra tensors (e.g. rope_freqs) for LoRA adapters
return ()
def get_tensors(self) -> Iterator[tuple[str, Tensor]]:
tensor_map: dict[str, PartialLoraTensor] = {}
@ -326,14 +418,26 @@ def get_tensors(self) -> Iterator[tuple[str, Tensor]]:
if self.lazy:
tensor = LazyTorchTensor.from_eager(tensor)
base_name = get_base_tensor_name(name)
is_lora_a = ".lora_A.weight" in name
is_lora_b = ".lora_B.weight" in name
# note: mergekit-extract-lora also adds token embeddings to the adapter
is_lora_a = ".lora_A.weight" in name or ".lora_embedding_A" in name
is_lora_b = ".lora_B.weight" in name or ".lora_embedding_B" in name
if not is_lora_a and not is_lora_b:
if ".base_layer.weight" in name:
continue
# mergekit-extract-lora add these layernorm to the adapter, we need to keep them
if "_layernorm" in name or ".norm" in name:
yield (base_name, tensor)
continue
logger.error(
f"Unexpected name '{name}': Not a lora_A or lora_B tensor"
)
if ".embed_tokens.weight" in name or ".lm_head.weight" in name:
logger.error(
"Embeddings is present in the adapter. This can be due to new tokens added during fine tuning"
)
logger.error(
"Please refer to https://github.com/ggml-org/llama.cpp/pull/9948"
)
sys.exit(1)
if base_name in tensor_map:
@ -358,17 +462,34 @@ def get_tensors(self) -> Iterator[tuple[str, Tensor]]:
def modify_tensors(
self, data_torch: Tensor, name: str, bid: int | None
) -> Iterable[tuple[str, Tensor]]:
dest = super().modify_tensors(data_torch, name, bid)
dest = list(super().modify_tensors(data_torch, name, bid))
# some archs may have the same tensor for lm_head and output (tie word embeddings)
# in this case, adapters targeting lm_head will fail when using llama-export-lora
# therefore, we ignore them for now
# see: https://github.com/ggml-org/llama.cpp/issues/9065
if name == "lm_head.weight" and len(dest) == 0:
raise ValueError(
"lm_head is present in adapter, but is ignored in base model"
)
for dest_name, dest_data in dest:
# mergekit-extract-lora add these layernorm to the adapter
if "_norm" in dest_name:
assert dest_data.dim() == 1
yield (dest_name, dest_data)
continue
# otherwise, we must get the lora_A and lora_B tensors
assert isinstance(dest_data, LoraTorchTensor)
lora_a, lora_b = dest_data.get_lora_A_B()
# note: mergekit-extract-lora flip and transpose A and B
# here we only need to transpose token_embd.lora_a, see llm_build_inp_embd()
if "token_embd.weight" in dest_name:
lora_a = lora_a.T
yield (dest_name + ".lora_a", lora_a)
yield (dest_name + ".lora_b", lora_b)
with open(lora_config, "r") as f:
lparams: dict[str, Any] = json.load(f)
alpha: float = lparams["lora_alpha"]
model_instance = LoraModel(
@ -381,7 +502,7 @@ def modify_tensors(
dry_run=args.dry_run,
dir_lora_model=dir_lora,
lora_alpha=alpha,
is_lora=True,
hparams=hparams,
)
logger.info("Exporting model...")

File diff suppressed because it is too large

src/gguf/gguf.py (new file, 11 lines)
View File

@ -0,0 +1,11 @@
import importlib
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent))
# Compatibility for people trying to import gguf/gguf.py directly instead of as a package.
importlib.invalidate_caches()
import gguf # noqa: E402
importlib.reload(gguf)

View File

@ -1,11 +1,8 @@
#
# GGUF file reading/modification support. For API usage information,
# please see the files scripts/ for some fairly simple examples.
#
from __future__ import annotations
import logging
import os
import sys
from collections import OrderedDict
from typing import Any, Literal, NamedTuple, TypeVar, Union
@ -15,7 +12,6 @@
from .quants import quant_shape_to_byte_shape
if __name__ == "__main__":
import sys
from pathlib import Path
# Allow running file in package as a script.
@ -28,6 +24,7 @@
GGUF_VERSION,
GGMLQuantizationType,
GGUFValueType,
GGUFEndian,
)
logger = logging.getLogger(__name__)
@ -53,6 +50,52 @@ class ReaderField(NamedTuple):
types: list[GGUFValueType] = []
def contents(self, index_or_slice: int | slice = slice(None)) -> Any:
if self.types:
to_string = lambda x: str(x.tobytes(), encoding="utf-8") # noqa: E731
main_type = self.types[0]
if main_type == GGUFValueType.ARRAY:
sub_type = self.types[-1]
if sub_type == GGUFValueType.STRING:
indices = self.data[index_or_slice]
if isinstance(index_or_slice, int):
return to_string(self.parts[indices]) # type: ignore
else:
return [to_string(self.parts[idx]) for idx in indices] # type: ignore
else:
# FIXME: When/if _get_field_parts() support multi-dimensional arrays, this must do so too
# Check if it's unsafe to perform slice optimization on data
# if any(True for idx in self.data if len(self.parts[idx]) != 1):
# optim_slice = slice(None)
# else:
# optim_slice = index_or_slice
# index_or_slice = slice(None)
# if isinstance(optim_slice, int):
# return self.parts[self.data[optim_slice]].tolist()[0]
# else:
# return [pv for idx in self.data[optim_slice] for pv in self.parts[idx].tolist()][index_or_slice]
if isinstance(index_or_slice, int):
return self.parts[self.data[index_or_slice]].tolist()[0]
else:
return [
pv
for idx in self.data[index_or_slice]
for pv in self.parts[idx].tolist()
]
if main_type == GGUFValueType.STRING:
return to_string(self.parts[-1])
else:
return self.parts[-1].tolist()[0]
return None
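A hypothetical use of the new contents() helper, assuming a reader is already open; the field names follow the usual GGUF key conventions and the import path is the gguf package shown in this diff:
from gguf import GGUFReader  # assumed import path

reader = GGUFReader("model.gguf")
arch = reader.fields["general.architecture"].contents()             # e.g. "llama"
first_tokens = reader.fields["tokenizer.ggml.tokens"].contents(slice(0, 3))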
class ReaderTensor(NamedTuple):
name: str
@ -103,12 +146,23 @@ def __init__(
# If we get 0 here that means it's (probably) a GGUF file created for
# the opposite byte order of the machine this script is running on.
self.byte_order = "S"
temp_version = temp_version.newbyteorder(self.byte_order)
temp_version = temp_version.view(
temp_version.dtype.newbyteorder(self.byte_order)
)
version = temp_version[0]
if version not in READER_SUPPORTED_VERSIONS:
raise ValueError(
f"Sorry, file appears to be version {version} which we cannot handle"
)
if sys.byteorder == "little":
# Host is little endian
host_endian = GGUFEndian.LITTLE
swapped_endian = GGUFEndian.BIG
else:
# Sorry PDP or other weird systems that don't use BE or LE.
host_endian = GGUFEndian.BIG
swapped_endian = GGUFEndian.LITTLE
self.endianess = swapped_endian if self.byte_order == "S" else host_endian
self.fields: OrderedDict[str, ReaderField] = OrderedDict()
self.tensors: list[ReaderTensor] = []
offs += self._push_field(
@ -169,10 +223,11 @@ def _get(
count = int(count)
itemsize = int(np.empty([], dtype=dtype).itemsize)
end_offs = offset + itemsize * count
return (
self.data[offset:end_offs]
.view(dtype=dtype)[:count]
.newbyteorder(override_order or self.byte_order)
arr = self.data[offset:end_offs].view(dtype=dtype)[:count]
return arr.view(
arr.dtype.newbyteorder(
self.byte_order if override_order is None else override_order
)
)
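The rewrite above swaps ndarray.newbyteorder() (removed in NumPy 2.x) for a view with a byte-swapped dtype; a minimal standalone sketch of the same idiom:
import numpy as np

arr = np.arange(4, dtype=np.uint32)
swapped = arr.view(arr.dtype.newbyteorder("S"))  # same bytes, opposite byte order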
def _push_field(self, field: ReaderField, skip_sum: bool = False) -> int:
@ -219,6 +274,7 @@ def _get_field_parts(
offs += int(alen.nbytes)
aparts: list[npt.NDArray[Any]] = [raw_itype, alen]
data_idxs: list[int] = []
# FIXME: Handle multi-dimensional arrays properly instead of flattening
for idx in range(alen[0]):
curr_size, curr_parts, curr_idxs, curr_types = self._get_field_parts(
offs, raw_itype[0]

View File

@ -26,12 +26,14 @@
RopeScalingType,
PoolingType,
TokenType,
ExpertGatingFuncType,
)
from .quants import quant_shape_from_byte_shape
logger = logging.getLogger(__name__)
SHARD_NAME_FORMAT = "{:s}-{:05d}-of-{:05d}.gguf"
@ -135,7 +137,7 @@ def get_total_parameter_count(self) -> tuple[int, int, int, int]:
continue
elif name.endswith(".lora_b"):
if last_lora_a is None or last_lora_a[0] != name[:-1] + "a":
# Bail when the LoRA pair can't be found trivially
logger.warning(
"can't measure LoRA size correctly, tensor order is unusual"
)
@ -154,11 +156,14 @@ def get_total_parameter_count(self) -> tuple[int, int, int, int]:
total_params += size
# Hopefully this should work even for variable-expert-count models
expert_count = (expert_sum // n_expert_tensors) if n_expert_tensors > 0 else 0
# Negate the total to signal it's likely not exact
if last_lora_a is not None:
total_params = -total_params
# NOTE: keep the output in the same order as accepted by 'size_label' in gguf-py/gguf/utility.py
return total_params, shared_params, expert_params, expert_count
def format_shard_names(self, path: Path) -> list[Path]:
@ -177,7 +182,7 @@ def open_output_file(self, path: Path | None = None) -> None:
and self.fout is not None
and (path is None or path == self.path)
):
# allow calling this multiple times as long as the path is the same
return
if self.state is not WriterState.NO_FILE:
@ -206,7 +211,7 @@ def print_plan(self) -> list[Path]:
if self.dry_run:
logger.info("Dry run, not writing files")
for name in filenames:
print(name)
print(name) # noqa: NP100
exit()
return filenames
@ -390,11 +395,12 @@ def add_tensor_info(
if tensor_dtype == np.uint8:
tensor_shape = quant_shape_from_byte_shape(tensor_shape, raw_dtype)
# make sure there is at least one tensor before splitting
if len(self.tensors[-1]) > 0:
if (
if ( # split when over tensor limit
self.split_max_tensors != 0
and len(self.tensors[-1]) >= self.split_max_tensors
) or (
) or ( # split when over size limit
self.split_max_size != 0
and sum(ti.nbytes for ti in self.tensors[-1].values()) + tensor_nbytes
> self.split_max_size
@ -460,6 +466,8 @@ def write_tensor_data(self, tensor: np.ndarray[Any, Any]) -> None:
fout = self.fout[file_id]
# pop the first tensor info
# TODO: cleaner way to get the first key
first_tensor_name = [
name for name, _ in zip(self.tensors[file_id].keys(), range(1))
][0]
@ -506,8 +514,11 @@ def write_tensors_to_file(self, *, progress: bool = False) -> None:
total = sum(ti.nbytes for ti in tensors.values())
shard_bar.reset(total=(total if total > 0 else None))
# relying on the fact that Python dicts preserve insertion order (since 3.7)
for ti in tensors.values():
assert ti.tensor is not None
assert (
ti.tensor is not None
) # can only iterate once over the tensors
assert ti.tensor.nbytes == ti.nbytes
ti.tensor.tofile(fout)
if shard_bar is not None:
@ -631,6 +642,11 @@ def add_base_model_organization(self, source_id: int, organization: str) -> None
Keys.General.BASE_MODEL_ORGANIZATION.format(id=source_id), organization
)
def add_base_model_description(self, source_id: int, description: str) -> None:
self.add_string(
Keys.General.BASE_MODEL_DESCRIPTION.format(id=source_id), description
)
def add_base_model_url(self, source_id: int, url: str) -> None:
self.add_string(Keys.General.BASE_MODEL_URL.format(id=source_id), url)
@ -643,15 +659,46 @@ def add_base_model_uuid(self, source_id: int, uuid: str) -> None:
def add_base_model_repo_url(self, source_id: int, repo_url: str) -> None:
self.add_string(Keys.General.BASE_MODEL_REPO_URL.format(id=source_id), repo_url)
def add_dataset_count(self, source_count: int) -> None:
self.add_uint32(Keys.General.DATASET_COUNT, source_count)
def add_dataset_name(self, source_id: int, name: str) -> None:
self.add_string(Keys.General.DATASET_NAME.format(id=source_id), name)
def add_dataset_author(self, source_id: int, author: str) -> None:
self.add_string(Keys.General.DATASET_AUTHOR.format(id=source_id), author)
def add_dataset_version(self, source_id: int, version: str) -> None:
self.add_string(Keys.General.DATASET_VERSION.format(id=source_id), version)
def add_dataset_organization(self, source_id: int, organization: str) -> None:
self.add_string(
Keys.General.DATASET_ORGANIZATION.format(id=source_id), organization
)
def add_dataset_description(self, source_id: int, description: str) -> None:
self.add_string(
Keys.General.DATASET_DESCRIPTION.format(id=source_id), description
)
def add_dataset_url(self, source_id: int, url: str) -> None:
self.add_string(Keys.General.DATASET_URL.format(id=source_id), url)
def add_dataset_doi(self, source_id: int, doi: str) -> None:
self.add_string(Keys.General.DATASET_DOI.format(id=source_id), doi)
def add_dataset_uuid(self, source_id: int, uuid: str) -> None:
self.add_string(Keys.General.DATASET_UUID.format(id=source_id), uuid)
def add_dataset_repo_url(self, source_id: int, repo_url: str) -> None:
self.add_string(Keys.General.DATASET_REPO_URL.format(id=source_id), repo_url)
def add_tags(self, tags: Sequence[str]) -> None:
self.add_array(Keys.General.TAGS, tags)
def add_languages(self, languages: Sequence[str]) -> None:
self.add_array(Keys.General.LANGUAGES, languages)
def add_datasets(self, datasets: Sequence[str]) -> None:
self.add_array(Keys.General.DATASETS, datasets)
def add_tensor_data_layout(self, layout: str) -> None:
self.add_string(Keys.LLM.TENSOR_DATA_LAYOUT.format(arch=self.arch), layout)
@ -664,6 +711,21 @@ def add_context_length(self, length: int) -> None:
def add_embedding_length(self, length: int) -> None:
self.add_uint32(Keys.LLM.EMBEDDING_LENGTH.format(arch=self.arch), length)
def add_features_length(self, length: int) -> None:
self.add_uint32(Keys.LLM.FEATURES_LENGTH.format(arch=self.arch), length)
def add_posnet_embedding_length(self, length: int) -> None:
self.add_uint32(Keys.PosNet.EMBEDDING_LENGTH.format(arch=self.arch), length)
def add_posnet_block_count(self, length: int) -> None:
self.add_uint32(Keys.PosNet.BLOCK_COUNT.format(arch=self.arch), length)
def add_convnext_embedding_length(self, length: int) -> None:
self.add_uint32(Keys.ConvNext.EMBEDDING_LENGTH.format(arch=self.arch), length)
def add_convnext_block_count(self, length: int) -> None:
self.add_uint32(Keys.ConvNext.BLOCK_COUNT.format(arch=self.arch), length)
def add_block_count(self, length: int) -> None:
self.add_uint32(Keys.LLM.BLOCK_COUNT.format(arch=self.arch), length)
@ -712,6 +774,12 @@ def add_key_length(self, length: int) -> None:
def add_value_length(self, length: int) -> None:
self.add_uint32(Keys.Attention.VALUE_LENGTH.format(arch=self.arch), length)
def add_key_length_mla(self, length: int) -> None:
self.add_uint32(Keys.Attention.KEY_LENGTH_MLA.format(arch=self.arch), length)
def add_value_length_mla(self, length: int) -> None:
self.add_uint32(Keys.Attention.VALUE_LENGTH_MLA.format(arch=self.arch), length)
def add_max_alibi_bias(self, bias: float) -> None:
self.add_float32(Keys.Attention.MAX_ALIBI_BIAS.format(arch=self.arch), bias)
@ -739,6 +807,18 @@ def add_expert_shared_count(self, count: int) -> None:
def add_expert_weights_scale(self, value: float) -> None:
self.add_float32(Keys.LLM.EXPERT_WEIGHTS_SCALE.format(arch=self.arch), value)
def add_expert_weights_norm(self, value: bool) -> None:
self.add_bool(Keys.LLM.EXPERT_WEIGHTS_NORM.format(arch=self.arch), value)
def add_expert_gating_func(self, value: ExpertGatingFuncType) -> None:
self.add_uint32(Keys.LLM.EXPERT_GATING_FUNC.format(arch=self.arch), value.value)
def add_moe_every_n_layers(self, value: int) -> None:
self.add_uint32(Keys.LLM.MOE_EVERY_N_LAYERS.format(arch=self.arch), value)
def add_swin_norm(self, value: bool) -> None:
self.add_bool(Keys.LLM.SWIN_NORM.format(arch=self.arch), value)
def add_rescale_every_n_layers(self, count: int) -> None:
self.add_uint32(Keys.LLM.RESCALE_EVERY_N_LAYERS.format(arch=self.arch), count)
@ -757,12 +837,26 @@ def add_embedding_scale(self, value: float) -> None:
def add_wkv_head_size(self, size: int) -> None:
self.add_uint32(Keys.WKV.HEAD_SIZE.format(arch=self.arch), size)
def add_token_shift_count(self, count: int) -> None:
self.add_uint32(Keys.LLM.TOKEN_SHIFT_COUNT.format(arch=self.arch), count)
def add_interleave_moe_layer_step(self, value: int) -> None:
self.add_uint32(
Keys.LLM.INTERLEAVE_MOE_LAYER_STEP.format(arch=self.arch), value
)
def add_layer_norm_eps(self, value: float) -> None:
self.add_float32(Keys.Attention.LAYERNORM_EPS.format(arch=self.arch), value)
def add_layer_norm_rms_eps(self, value: float) -> None:
self.add_float32(Keys.Attention.LAYERNORM_RMS_EPS.format(arch=self.arch), value)
def add_group_norm_eps(self, value: float) -> None:
self.add_float32(Keys.Attention.GROUPNORM_EPS.format(arch=self.arch), value)
def add_group_norm_groups(self, value: int) -> None:
self.add_uint32(Keys.Attention.GROUPNORM_GROUPS.format(arch=self.arch), value)
def add_causal_attention(self, value: bool) -> None:
self.add_bool(Keys.Attention.CAUSAL.format(arch=self.arch), value)
@ -772,6 +866,20 @@ def add_q_lora_rank(self, length: int) -> None:
def add_kv_lora_rank(self, length: int) -> None:
self.add_uint32(Keys.Attention.KV_LORA_RANK.format(arch=self.arch), length)
def add_decay_lora_rank(self, length: int) -> None:
self.add_uint32(Keys.Attention.DECAY_LORA_RANK.format(arch=self.arch), length)
def add_iclr_lora_rank(self, length: int) -> None:
self.add_uint32(Keys.Attention.ICLR_LORA_RANK.format(arch=self.arch), length)
def add_value_residual_mix_lora_rank(self, length: int) -> None:
self.add_uint32(
Keys.Attention.VALUE_RESIDUAL_MIX_LORA_RANK.format(arch=self.arch), length
)
def add_gate_lora_rank(self, length: int) -> None:
self.add_uint32(Keys.Attention.GATE_LORA_RANK.format(arch=self.arch), length)
def add_relative_attn_buckets_count(self, value: int) -> None:
self.add_uint32(Keys.Attention.REL_BUCKETS_COUNT.format(arch=self.arch), value)
@ -787,6 +895,9 @@ def add_pooling_type(self, value: PoolingType) -> None:
def add_rope_dimension_count(self, count: int) -> None:
self.add_uint32(Keys.Rope.DIMENSION_COUNT.format(arch=self.arch), count)
def add_rope_dimension_sections(self, dims: Sequence[int]) -> None:
self.add_array(Keys.Rope.DIMENSION_SECTIONS.format(arch=self.arch), dims)
def add_rope_freq_base(self, value: float) -> None:
self.add_float32(Keys.Rope.FREQ_BASE.format(arch=self.arch), value)
@ -863,9 +974,6 @@ def add_sep_token_id(self, id: int) -> None:
def add_pad_token_id(self, id: int) -> None:
self.add_uint32(Keys.Tokenizer.PAD_ID, id)
def add_cls_token_id(self, id: int) -> None:
self.add_uint32(Keys.Tokenizer.CLS_ID, id)
def add_mask_token_id(self, id: int) -> None:
self.add_uint32(Keys.Tokenizer.MASK_ID, id)
@ -893,6 +1001,7 @@ def add_chat_template(self, value: str | Sequence[Mapping[str, str]]) -> None:
name = choice.get("name", "")
template = choice.get("template")
# Allowing non-alphanumerical characters in template name is probably not a good idea, so filter it
name = "".join(
(c if c in ascii_letters + digits else "_" for c in name)
)
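A quick hypothetical of that filter on its own:
from string import ascii_letters, digits

name = "rag:v1.2"  # hypothetical template name
name = "".join(c if c in ascii_letters + digits else "_" for c in name)
# name == "rag_v1_2"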
@ -916,21 +1025,65 @@ def add_chat_template(self, value: str | Sequence[Mapping[str, str]]) -> None:
self.add_string(Keys.Tokenizer.CHAT_TEMPLATE, value)
def add_prefix_token_id(self, id: int) -> None:
self.add_uint32(Keys.Tokenizer.PREFIX_ID, id)
def add_suffix_token_id(self, id: int) -> None:
self.add_uint32(Keys.Tokenizer.SUFFIX_ID, id)
def add_middle_token_id(self, id: int) -> None:
self.add_uint32(Keys.Tokenizer.MIDDLE_ID, id)
def add_eot_token_id(self, id: int) -> None:
self.add_uint32(Keys.Tokenizer.EOT_ID, id)
def add_eom_token_id(self, id: int) -> None:
self.add_uint32(Keys.Tokenizer.EOM_ID, id)
# for vision models
def add_vision_projection_dim(self, value: int) -> None:
self.add_uint32(Keys.ClipVision.PROJECTION_DIM, value)
def add_vision_has_vision_encoder(self, value: bool) -> None:
self.add_bool(Keys.ClipVision.HAS_VISION_ENCODER, value)
def add_vision_patch_size(self, value: int) -> None:
self.add_uint32(Keys.ClipVision.PATCH_SIZE, value)
def add_vision_embedding_length(self, value: int) -> None:
self.add_uint32(Keys.ClipVision.EMBEDDING_LENGTH, value)
def add_vision_feed_forward_length(self, value: int) -> None:
self.add_uint32(Keys.ClipVision.FEED_FORWARD_LENGTH, value)
def add_vision_block_count(self, value: int) -> None:
self.add_uint32(Keys.ClipVision.BLOCK_COUNT, value)
def add_vision_head_count(self, value: int) -> None:
self.add_uint32(Keys.ClipVision.Attention.HEAD_COUNT, value)
def add_vision_projector_type(self, value: str) -> None:
self.add_string(Keys.ClipVision.PROJECTOR_TYPE, value)
def add_vision_attention_layernorm_eps(self, value: float) -> None:
self.add_float32(Keys.ClipVision.Attention.LAYERNORM_EPS, value)
def add_vision_image_size(self, value: int) -> None:
self.add_uint32(Keys.ClipVision.IMAGE_SIZE, value)
def add_vision_image_mean(self, values: Sequence[float]) -> None:
self.add_array(Keys.ClipVision.IMAGE_MEAN, values)
def add_vision_image_std(self, values: Sequence[float]) -> None:
self.add_array(Keys.ClipVision.IMAGE_STD, values)
def add_vision_spatial_merge_size(self, value: int) -> None:
self.add_uint32(Keys.ClipVision.SPATIAL_MERGE_SIZE, value)
def add_vision_use_gelu(self, value: bool) -> None:
self.add_bool(Keys.ClipVision.USE_GELU, value)
def add_vision_use_silu(self, value: bool) -> None:
self.add_bool(Keys.ClipVision.USE_SILU, value)
def add_vision_projector_scale_factor(self, value: int) -> None:
self.add_uint32(Keys.ClipVision.Projector.SCALE_FACTOR, value)
def add_vision_n_wa_pattern(self, value: int) -> None:
self.add_uint32(Keys.ClipVision.N_WA_PATTERN, value)
def _pack(self, fmt: str, value: Any, skip_pack_prefix: bool = False) -> bytes:
pack_prefix = ""
if not skip_pack_prefix:

View File

@ -12,6 +12,7 @@
class LazyMeta(ABCMeta):
def __new__(
cls, name: str, bases: tuple[type, ...], namespace: dict[str, Any], **kwargs
):
@ -34,7 +35,7 @@ def __getattr__(self, name: str) -> Any:
# need to make a builder for the wrapped wrapper to copy the name,
# or else it fails with very cryptic error messages,
# because somehow the same string would end up in every closure
# because somehow the same string would end up in every closures
def mk_wrap(op_name: str, *, meta_noop: bool = False):
# need to wrap the wrapper to get self
def wrapped_special_op(self, *args, **kwargs):
@ -200,6 +201,27 @@ def wrapped_fn(*args, **kwargs):
return cls(
meta=cls.eager_to_meta(res), args=args, kwargs=kwargs, func=fn
)
elif isinstance(res, tuple) and all(
isinstance(t, cls._tensor_type) for t in res
):
# share the evaluation between lazy tuple elements
shared_args: list = [args, None]
def eager_tuple_element(a: list[Any], i: int = 0, /, **kw) -> LazyBase:
assert len(a) == 2
if a[1] is None:
a[1] = fn(*a[0], **kw)
return a[1][i]
return tuple(
cls(
meta=cls.eager_to_meta(res[i]),
args=(shared_args, i),
kwargs=kwargs,
func=eager_tuple_element,
)
for i in range(len(res))
)
else:
del res # not needed
# non-tensor return likely relies on the contents of the args
@ -254,6 +276,8 @@ def from_eager(cls, t: Any) -> Any:
class LazyNumpyTensor(LazyBase):
_tensor_type = np.ndarray
shape: tuple[int, ...] # Makes the type checker happy in quants.py
@classmethod
def meta_with_dtype_and_shape(
cls, dtype: DTypeLike, shape: tuple[int, ...]

View File

@ -41,7 +41,7 @@ class Metadata:
base_models: Optional[list[dict]] = None
tags: Optional[list[str]] = None
languages: Optional[list[str]] = None
datasets: Optional[list[str]] = None
datasets: Optional[list[dict]] = None
@staticmethod
def load(
@ -50,7 +50,7 @@ def load(
model_name: Optional[str] = None,
total_params: int = 0,
) -> Metadata:
# This grabs as much contextual authorship metadata as possible from the model repository
# This grabs as many contextual authorship metadata as possible from the model repository
# making any conversion as required to match the gguf kv store metadata format
# as well as giving users the ability to override any authorship metadata that may be incorrect
@ -126,13 +126,13 @@ def load(
"general.base_models", metadata.base_models
)
# Datasets is received here as an array of datasets
metadata.datasets = metadata_override.get("general.datasets", metadata.datasets)
metadata.tags = metadata_override.get(Keys.General.TAGS, metadata.tags)
metadata.languages = metadata_override.get(
Keys.General.LANGUAGES, metadata.languages
)
metadata.datasets = metadata_override.get(
Keys.General.DATASETS, metadata.datasets
)
# Direct Metadata Override (via direct cli argument)
if model_name is not None:
@ -160,21 +160,41 @@ def load_model_card(model_path: Optional[Path] = None) -> dict[str, Any]:
if not model_card_path.is_file():
return {}
# The model card metadata is assumed to always be in YAML
# The model card metadata is assumed to always be in YAML (frontmatter)
# ref: https://github.com/huggingface/transformers/blob/a5c642fe7a1f25d3bdcd76991443ba6ff7ee34b2/src/transformers/modelcard.py#L468-L473
yaml_content: str = ""
with open(model_card_path, "r", encoding="utf-8") as f:
if f.readline() == "---\n":
raw = f.read().partition("---\n")[0]
data = yaml.safe_load(raw)
if isinstance(data, dict):
return data
else:
logger.error(
f"while reading YAML model card frontmatter, data is {type(data)} instead of dict"
)
return {}
else:
content = f.read()
lines = content.splitlines()
lines_yaml = []
if len(lines) == 0:
# Empty file
return {}
if len(lines) > 0 and lines[0] != "---":
# No frontmatter
return {}
for line in lines[1:]:
if line == "---":
break # End of frontmatter
else:
lines_yaml.append(line)
yaml_content = "\n".join(lines_yaml) + "\n"
# Quick hack to fix the Norway problem
# https://hitchdev.com/strictyaml/why/implicit-typing-removed/
yaml_content = yaml_content.replace("- no\n", '- "no"\n')
if yaml_content:
data = yaml.safe_load(yaml_content)
if isinstance(data, dict):
return data
else:
logger.error(
f"while reading YAML model card frontmatter, data is {type(data)} instead of dict"
)
return {}
else:
return {}
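The '- no' replacement works around PyYAML's YAML 1.1 implicit typing, where bare no/yes/on/off parse as booleans; a quick illustration (assumes PyYAML):
import yaml

print(yaml.safe_load("language:\n- no\n"))    # {'language': [False]} - implicit bool
print(yaml.safe_load('language:\n- "no"\n'))  # {'language': ['no']}  - what model cards mean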
@staticmethod
def load_hf_parameters(model_path: Optional[Path] = None) -> dict[str, Any]:
@ -228,7 +248,11 @@ def get_model_id_components(
org_component, model_full_name_component = None, model_id
# Check if we erroneously matched against './' or '../' etc...
if org_component is not None and org_component[0] == ".":
if (
org_component is not None
and len(org_component) > 0
and org_component[0] == "."
):
org_component = None
name_parts: list[str] = model_full_name_component.split("-")
@ -387,27 +411,86 @@ def apply_metadata_heuristic(
########################
if model_card is not None:
if "model_name" in model_card and metadata.name is None:
# Not part of huggingface model card standard but notice some model creator using it
# such as TheBloke in 'TheBloke/Mistral-7B-Instruct-v0.2-GGUF'
metadata.name = model_card.get("model_name")
def use_model_card_metadata(metadata_key: str, model_card_key: str):
if (
model_card_key in model_card
and getattr(metadata, metadata_key, None) is None
):
setattr(metadata, metadata_key, model_card.get(model_card_key))
if "model_creator" in model_card and metadata.author is None:
# Not part of huggingface model card standard but notice some model creator using it
# such as TheBloke in 'TheBloke/Mistral-7B-Instruct-v0.2-GGUF'
metadata.author = model_card.get("model_creator")
def use_array_model_card_metadata(metadata_key: str, model_card_key: str):
# Note: Will append rather than replace if it already exists
tags_value = model_card.get(model_card_key, None)
if tags_value is None:
return
if "model_type" in model_card and metadata.basename is None:
# Not part of huggingface model card standard but notice some model creator using it
# such as TheBloke in 'TheBloke/Mistral-7B-Instruct-v0.2-GGUF'
metadata.basename = model_card.get("model_type")
current_value = getattr(metadata, metadata_key, None)
if current_value is None:
current_value = []
if "base_model" in model_card:
if isinstance(tags_value, str):
current_value.append(tags_value)
elif isinstance(tags_value, list):
current_value.extend(tags_value)
setattr(metadata, metadata_key, current_value)
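A standalone sketch of what the two helpers do, using a plain namespace in place of the Metadata dataclass (values are illustrative):
from types import SimpleNamespace

metadata = SimpleNamespace(name=None, author="Known Author")
model_card = {"model_name": "Mistral-7B-Instruct-v0.2", "model_creator": "MistralAI"}

def use_model_card_metadata(metadata_key: str, model_card_key: str):
    # Only fill fields that are still unset, never overwrite explicit metadata
    if model_card_key in model_card and getattr(metadata, metadata_key, None) is None:
        setattr(metadata, metadata_key, model_card.get(model_card_key))

use_model_card_metadata("name", "model_name")       # fills name from the card
use_model_card_metadata("author", "model_creator")  # author already set, left alone
print(metadata.name, metadata.author)               # Mistral-7B-Instruct-v0.2 Known Author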
# LLAMA.cpp's direct internal convention
# (Definitely not part of hugging face formal/informal standard)
#########################################
use_model_card_metadata("name", "name")
use_model_card_metadata("author", "author")
use_model_card_metadata("version", "version")
use_model_card_metadata("organization", "organization")
use_model_card_metadata("description", "description")
use_model_card_metadata("finetune", "finetune")
use_model_card_metadata("basename", "basename")
use_model_card_metadata("size_label", "size_label")
use_model_card_metadata("source_url", "url")
use_model_card_metadata("source_doi", "doi")
use_model_card_metadata("source_uuid", "uuid")
use_model_card_metadata("source_repo_url", "repo_url")
# LLAMA.cpp's huggingface style convention
# (Definitely not part of hugging face formal/informal standard... but with model_ appended to match their style)
###########################################
use_model_card_metadata("name", "model_name")
use_model_card_metadata("author", "model_author")
use_model_card_metadata("version", "model_version")
use_model_card_metadata("organization", "model_organization")
use_model_card_metadata("description", "model_description")
use_model_card_metadata("finetune", "model_finetune")
use_model_card_metadata("basename", "model_basename")
use_model_card_metadata("size_label", "model_size_label")
use_model_card_metadata("source_url", "model_url")
use_model_card_metadata("source_doi", "model_doi")
use_model_card_metadata("source_uuid", "model_uuid")
use_model_card_metadata("source_repo_url", "model_repo_url")
# Hugging Face Direct Convention
#################################
# Not part of huggingface model card standard but notice some model creator using it
# such as TheBloke in 'TheBloke/Mistral-7B-Instruct-v0.2-GGUF'
use_model_card_metadata("name", "model_name")
use_model_card_metadata("author", "model_creator")
use_model_card_metadata("basename", "model_type")
if (
"base_model" in model_card
or "base_models" in model_card
or "base_model_sources" in model_card
):
# This represents the parent models that this is based on
# Example: stabilityai/stable-diffusion-xl-base-1.0. Can also be a list (for merges)
# Example of merges: https://huggingface.co/EmbeddedLLM/Mistral-7B-Merge-14-v0.1/blob/main/README.md
metadata_base_models = []
base_model_value = model_card.get("base_model", None)
base_model_value = model_card.get(
"base_model",
model_card.get(
"base_models", model_card.get("base_model_sources", None)
),
)
if base_model_value is not None:
if isinstance(base_model_value, str):
@ -420,86 +503,195 @@ def apply_metadata_heuristic(
for model_id in metadata_base_models:
# NOTE: model size of base model is assumed to be similar to the size of the current model
(
model_full_name_component,
org_component,
basename,
finetune,
version,
size_label,
) = Metadata.get_model_id_components(model_id, total_params)
base_model = {}
if model_full_name_component is not None:
base_model["name"] = Metadata.id_to_title(
model_full_name_component
)
if org_component is not None:
base_model["organization"] = Metadata.id_to_title(org_component)
if version is not None:
base_model["version"] = version
if (
org_component is not None
and model_full_name_component is not None
):
base_model["repo_url"] = (
f"https://huggingface.co/{org_component}/{model_full_name_component}"
if isinstance(model_id, str):
if (
model_id.startswith("http://")
or model_id.startswith("https://")
or model_id.startswith("ssh://")
):
base_model["repo_url"] = model_id
# Check if Hugging Face ID is present in URL
if "huggingface.co" in model_id:
match = re.match(
r"https?://huggingface.co/([^/]+/[^/]+)$", model_id
)
if match:
model_id_component = match.group(1)
(
model_full_name_component,
org_component,
basename,
finetune,
version,
size_label,
) = Metadata.get_model_id_components(
model_id_component, total_params
)
# Populate model dictionary with extracted components
if model_full_name_component is not None:
base_model["name"] = Metadata.id_to_title(
model_full_name_component
)
if org_component is not None:
base_model["organization"] = (
Metadata.id_to_title(org_component)
)
if version is not None:
base_model["version"] = version
else:
# Likely a Hugging Face ID
(
model_full_name_component,
org_component,
basename,
finetune,
version,
size_label,
) = Metadata.get_model_id_components(model_id, total_params)
# Populate model dictionary with extracted components
if model_full_name_component is not None:
base_model["name"] = Metadata.id_to_title(
model_full_name_component
)
if org_component is not None:
base_model["organization"] = Metadata.id_to_title(
org_component
)
if version is not None:
base_model["version"] = version
if (
org_component is not None
and model_full_name_component is not None
):
base_model["repo_url"] = (
f"https://huggingface.co/{org_component}/{model_full_name_component}"
)
elif isinstance(model_id, dict):
base_model = model_id
else:
logger.error(
f"base model entry '{str(model_id)}' not in a known format"
)
metadata.base_models.append(base_model)
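For the repo-URL branch, only huggingface.co URLs of the org/model form are decomposed; a reduced sketch of that recognition step (the full code additionally title-cases names via Metadata.id_to_title):
import re

model_id = "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0"  # example URL
base_model = {"repo_url": model_id}
match = re.match(r"https?://huggingface.co/([^/]+/[^/]+)$", model_id)
if match:
    org, name = match.group(1).split("/", 1)
    base_model["organization"] = org
    base_model["name"] = name
print(base_model)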
if "license" in model_card and metadata.license is None:
metadata.license = model_card.get("license")
if (
"datasets" in model_card
or "dataset" in model_card
or "dataset_sources" in model_card
):
# This represents the datasets that this was trained from
metadata_datasets = []
dataset_value = model_card.get(
"datasets",
model_card.get("dataset", model_card.get("dataset_sources", None)),
)
if "license_name" in model_card and metadata.license_name is None:
metadata.license_name = model_card.get("license_name")
if "license_link" in model_card and metadata.license_link is None:
metadata.license_link = model_card.get("license_link")
tags_value = model_card.get("tags", None)
if tags_value is not None:
if metadata.tags is None:
metadata.tags = []
if isinstance(tags_value, str):
metadata.tags.append(tags_value)
elif isinstance(tags_value, list):
metadata.tags.extend(tags_value)
pipeline_tags_value = model_card.get("pipeline_tag", None)
if pipeline_tags_value is not None:
if metadata.tags is None:
metadata.tags = []
if isinstance(pipeline_tags_value, str):
metadata.tags.append(pipeline_tags_value)
elif isinstance(pipeline_tags_value, list):
metadata.tags.extend(pipeline_tags_value)
language_value = model_card.get(
"languages", model_card.get("language", None)
)
if language_value is not None:
if metadata.languages is None:
metadata.languages = []
if isinstance(language_value, str):
metadata.languages.append(language_value)
elif isinstance(language_value, list):
metadata.languages.extend(language_value)
dataset_value = model_card.get("datasets", model_card.get("dataset", None))
if dataset_value is not None:
if dataset_value is not None:
if isinstance(dataset_value, str):
metadata_datasets.append(dataset_value)
elif isinstance(dataset_value, list):
metadata_datasets.extend(dataset_value)
if metadata.datasets is None:
metadata.datasets = []
if isinstance(dataset_value, str):
metadata.datasets.append(dataset_value)
elif isinstance(dataset_value, list):
metadata.datasets.extend(dataset_value)
for dataset_id in metadata_datasets:
# NOTE: model size of base model is assumed to be similar to the size of the current model
dataset = {}
if isinstance(dataset_id, str):
if dataset_id.startswith(("http://", "https://", "ssh://")):
dataset["repo_url"] = dataset_id
# Check if Hugging Face ID is present in URL
if "huggingface.co" in dataset_id:
match = re.match(
r"https?://huggingface.co/([^/]+/[^/]+)$",
dataset_id,
)
if match:
dataset_id_component = match.group(1)
(
dataset_name_component,
org_component,
basename,
finetune,
version,
size_label,
) = Metadata.get_model_id_components(
dataset_id_component, total_params
)
# Populate dataset dictionary with extracted components
if dataset_name_component is not None:
dataset["name"] = Metadata.id_to_title(
dataset_name_component
)
if org_component is not None:
dataset["organization"] = Metadata.id_to_title(
org_component
)
if version is not None:
dataset["version"] = version
else:
# Likely a Hugging Face ID
(
dataset_name_component,
org_component,
basename,
finetune,
version,
size_label,
) = Metadata.get_model_id_components(
dataset_id, total_params
)
# Populate dataset dictionary with extracted components
if dataset_name_component is not None:
dataset["name"] = Metadata.id_to_title(
dataset_name_component
)
if org_component is not None:
dataset["organization"] = Metadata.id_to_title(
org_component
)
if version is not None:
dataset["version"] = version
if (
org_component is not None
and dataset_name_component is not None
):
dataset["repo_url"] = (
f"https://huggingface.co/{org_component}/{dataset_name_component}"
)
elif isinstance(dataset_id, dict):
dataset = dataset_id
else:
logger.error(
f"dataset entry '{str(dataset_id)}' not in a known format"
)
metadata.datasets.append(dataset)
use_model_card_metadata("license", "license")
use_model_card_metadata("license_name", "license_name")
use_model_card_metadata("license_link", "license_link")
use_array_model_card_metadata("tags", "tags")
use_array_model_card_metadata("tags", "pipeline_tag")
use_array_model_card_metadata("languages", "languages")
use_array_model_card_metadata("languages", "language")
# Hugging Face Parameter Heuristics
####################################
@ -508,7 +700,7 @@ def apply_metadata_heuristic(
hf_name_or_path = hf_params.get("_name_or_path")
if hf_name_or_path is not None and hf_name_or_path.count("/") <= 1:
# Use _name_or_path only if it's actually a model name and not some computer path
# Use _name_or_path only if its actually a model name and not some computer path
# e.g. 'meta-llama/Llama-2-7b-hf'
model_id = hf_name_or_path
(
@ -584,7 +776,10 @@ def set_gguf_meta_model(self, gguf_writer: gguf.GGUFWriter):
gguf_writer.add_size_label(self.size_label)
if self.license is not None:
gguf_writer.add_license(self.license)
if isinstance(self.license, list):
gguf_writer.add_license(",".join(self.license))
else:
gguf_writer.add_license(self.license)
if self.license_name is not None:
gguf_writer.add_license_name(self.license_name)
if self.license_link is not None:
@ -621,6 +816,10 @@ def set_gguf_meta_model(self, gguf_writer: gguf.GGUFWriter):
gguf_writer.add_base_model_organization(
key, base_model_entry["organization"]
)
if "description" in base_model_entry:
gguf_writer.add_base_model_description(
key, base_model_entry["description"]
)
if "url" in base_model_entry:
gguf_writer.add_base_model_url(key, base_model_entry["url"])
if "doi" in base_model_entry:
@ -632,9 +831,33 @@ def set_gguf_meta_model(self, gguf_writer: gguf.GGUFWriter):
key, base_model_entry["repo_url"]
)
if self.datasets is not None:
gguf_writer.add_dataset_count(len(self.datasets))
for key, dataset_entry in enumerate(self.datasets):
if "name" in dataset_entry:
gguf_writer.add_dataset_name(key, dataset_entry["name"])
if "author" in dataset_entry:
gguf_writer.add_dataset_author(key, dataset_entry["author"])
if "version" in dataset_entry:
gguf_writer.add_dataset_version(key, dataset_entry["version"])
if "organization" in dataset_entry:
gguf_writer.add_dataset_organization(
key, dataset_entry["organization"]
)
if "description" in dataset_entry:
gguf_writer.add_dataset_description(
key, dataset_entry["description"]
)
if "url" in dataset_entry:
gguf_writer.add_dataset_url(key, dataset_entry["url"])
if "doi" in dataset_entry:
gguf_writer.add_dataset_doi(key, dataset_entry["doi"])
if "uuid" in dataset_entry:
gguf_writer.add_dataset_uuid(key, dataset_entry["uuid"])
if "repo_url" in dataset_entry:
gguf_writer.add_dataset_repo_url(key, dataset_entry["repo_url"])
if self.tags is not None:
gguf_writer.add_tags(self.tags)
if self.languages is not None:
gguf_writer.add_languages(self.languages)
if self.datasets is not None:
gguf_writer.add_datasets(self.datasets)

File diff suppressed because it is too large

File diff suppressed because it is too large

View File

@ -1,7 +1,11 @@
from __future__ import annotations
from dataclasses import dataclass
from typing import Literal
import os
import json
def fill_templated_filename(filename: str, output_type: str | None) -> str:
# Given a file name fill in any type templates e.g. 'some-model-name.{ftype}.gguf'
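For reference, a hedged example of the template helper named above, assuming it lowercases the output type for the {ftype} key (and uppercases it for {FTYPE}):
from gguf.utility import fill_templated_filename

print(fill_templated_filename("some-model-name.{ftype}.gguf", "Q8_0"))
# expected: 'some-model-name.q8_0.gguf'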
@ -67,7 +71,7 @@ def naming_convention(
output_type: str | None,
model_type: Literal["vocab", "LoRA"] | None = None,
) -> str:
# Reference: https://github.com/ggerganov/ggml/blob/master/docs/gguf.md#gguf-naming-convention
# Reference: https://github.com/ggml-org/ggml/blob/master/docs/gguf.md#gguf-naming-convention
if base_name is not None:
name = base_name.strip().replace(" ", "-").replace("/", "-")
@ -99,3 +103,214 @@ def naming_convention(
kind = f"-{model_type.strip().replace(' ', '-')}" if model_type is not None else ""
return f"{name}{parameters}{finetune}{version}{encoding}{kind}"
@dataclass
class RemoteTensor:
dtype: str
shape: tuple[int, ...]
offset_start: int
size: int
url: str
def data(self) -> bytearray:
# TODO: handle request errors (maybe with limited retries?)
# NOTE: using a bytearray, otherwise PyTorch complains the buffer is not writeable
data = bytearray(
SafetensorRemote.get_data_by_range(
url=self.url, start=self.offset_start, size=self.size
)
)
return data
class SafetensorRemote:
"""
Utility class to handle remote safetensor files.
This class is designed to work with Hugging Face model repositories.
Example (one model has single safetensor file, the other has multiple):
for model_id in ["ngxson/TEST-Tiny-Llama4", "Qwen/Qwen2.5-7B-Instruct"]:
tensors = SafetensorRemote.get_list_tensors_hf_model(model_id)
print(tensors)
Example reading tensor data:
tensors = SafetensorRemote.get_list_tensors_hf_model(model_id)
for name, meta in tensors.items():
dtype, shape, offset_start, size, remote_safetensor_url = meta
# read the tensor data
data = SafetensorRemote.get_data_by_range(remote_safetensor_url, offset_start, size)
print(data)
"""
BASE_DOMAIN = "https://huggingface.co"
ALIGNMENT = 8 # bytes
@classmethod
def get_list_tensors_hf_model(cls, model_id: str) -> dict[str, RemoteTensor]:
"""
Get list of tensors from a Hugging Face model repository.
Returns a dictionary of tensor names and their metadata.
Each tensor is represented as a tuple of (dtype, shape, offset_start, size, remote_safetensor_url)
"""
# case 1: model has only one single model.safetensor file
is_single_file = cls.check_file_exist(
f"{cls.BASE_DOMAIN}/{model_id}/resolve/main/model.safetensors"
)
if is_single_file:
url = f"{cls.BASE_DOMAIN}/{model_id}/resolve/main/model.safetensors"
return cls.get_list_tensors(url)
# case 2: model has multiple files
index_url = (
f"{cls.BASE_DOMAIN}/{model_id}/resolve/main/model.safetensors.index.json"
)
is_multiple_files = cls.check_file_exist(index_url)
if is_multiple_files:
# read the index file
index_data = cls.get_data_by_range(index_url, 0)
index_str = index_data.decode("utf-8")
index_json = json.loads(index_str)
assert (
index_json.get("weight_map") is not None
), "weight_map not found in index file"
weight_map = index_json["weight_map"]
# get the list of files
all_files = list(set(weight_map.values()))
all_files.sort() # make sure we load shard files in order
# get the list of tensors
tensors: dict[str, RemoteTensor] = {}
for file in all_files:
url = f"{cls.BASE_DOMAIN}/{model_id}/resolve/main/{file}"
for key, val in cls.get_list_tensors(url).items():
tensors[key] = val
return tensors
raise ValueError(f"Model {model_id} does not have any safetensor files")
@classmethod
def get_list_tensors(cls, url: str) -> dict[str, RemoteTensor]:
"""
Get list of tensors from a remote safetensor file.
Returns a dictionary of tensor names and their metadata.
Each tensor is represented as a tuple of (dtype, shape, offset_start, size)
"""
metadata, data_start_offset = cls.get_metadata(url)
res: dict[str, RemoteTensor] = {}
for name, meta in metadata.items():
if name == "__metadata__":
continue
if not isinstance(meta, dict):
raise ValueError(f"Invalid metadata for tensor '{name}': {meta}")
try:
dtype = meta["dtype"]
shape = meta["shape"]
offset_start_relative, offset_end_relative = meta["data_offsets"]
size = offset_end_relative - offset_start_relative
offset_start = data_start_offset + offset_start_relative
res[name] = RemoteTensor(
dtype=dtype,
shape=tuple(shape),
offset_start=offset_start,
size=size,
url=url,
)
except KeyError as e:
raise ValueError(
f"Missing key in metadata for tensor '{name}': {e}, meta = {meta}"
)
return res
@classmethod
def get_metadata(cls, url: str) -> tuple[dict, int]:
"""
Get JSON metadata from a remote safetensor file.
Returns tuple of (metadata, data_start_offset)
"""
# Request first 5MB of the file (hopefully enough for metadata)
read_size = 5 * 1024 * 1024
raw_data = cls.get_data_by_range(url, 0, read_size)
# Parse header
# First 8 bytes contain the metadata length as u64 little-endian
if len(raw_data) < 8:
raise ValueError("Not enough data to read metadata size")
metadata_length = int.from_bytes(raw_data[:8], byteorder="little")
# Calculate the data start offset
data_start_offset = 8 + metadata_length
alignment = SafetensorRemote.ALIGNMENT
if data_start_offset % alignment != 0:
data_start_offset += alignment - (data_start_offset % alignment)
# Check if we have enough data to read the metadata
if len(raw_data) < 8 + metadata_length:
raise ValueError(
f"Could not read complete metadata. Need {8 + metadata_length} bytes, got {len(raw_data)}"
)
# Extract metadata bytes and parse as JSON
metadata_bytes = raw_data[8 : 8 + metadata_length]
metadata_str = metadata_bytes.decode("utf-8")
try:
metadata = json.loads(metadata_str)
return metadata, data_start_offset
except json.JSONDecodeError as e:
raise ValueError(f"Failed to parse safetensor metadata as JSON: {e}")
@classmethod
def get_data_by_range(cls, url: str, start: int, size: int = -1) -> bytes:
"""
Get raw byte data from a remote file by range.
If size is not specified, it will read the entire file.
"""
import requests
from urllib.parse import urlparse
parsed_url = urlparse(url)
if not parsed_url.scheme or not parsed_url.netloc:
raise ValueError(f"Invalid URL: {url}")
headers = cls._get_request_headers()
if size > -1:
headers["Range"] = f"bytes={start}-{start + size}"
response = requests.get(url, allow_redirects=True, headers=headers)
response.raise_for_status()
# Get raw byte data
return response.content[:size]
@classmethod
def check_file_exist(cls, url: str) -> bool:
"""
Check if a file exists at the given URL.
Returns True if the file exists, False otherwise.
"""
import requests
from urllib.parse import urlparse
parsed_url = urlparse(url)
if not parsed_url.scheme or not parsed_url.netloc:
raise ValueError(f"Invalid URL: {url}")
try:
headers = cls._get_request_headers()
headers["Range"] = "bytes=0-0"
response = requests.head(url, allow_redirects=True, headers=headers)
# Success (2xx) or redirect (3xx)
return 200 <= response.status_code < 400
except requests.RequestException:
return False
@classmethod
def _get_request_headers(cls) -> dict[str, str]:
"""Prepare common headers for requests."""
headers = {"User-Agent": "convert_hf_to_gguf"}
if os.environ.get("HF_TOKEN"):
headers["Authorization"] = f"Bearer {os.environ['HF_TOKEN']}"
return headers

View File

@ -157,8 +157,36 @@ def _try_load_from_tokenizer_json(self, path: Path) -> bool:
tokenizer = json.load(f)
if self.load_merges:
merges = tokenizer.get("model", {}).get("merges")
if isinstance(merges, list) and merges and isinstance(merges[0], str):
self.merges = merges
if isinstance(merges, list) and merges:
if isinstance(merges[0], str):
self.merges = merges
elif (
isinstance(merges[0], list)
and len(merges[0]) == 2
and isinstance(merges[0][0], str)
):
# New format since transformers 4.45 to support spaces in merges
# ref: https://github.com/ggml-org/llama.cpp/issues/9692
# TODO: internally store as the new format instead of converting to old
if any(" " in s for pair in merges for s in pair):
logger.warning(
f'Spaces in merges detected, encoding as {chr(ord(" ") + 256)!r}'
)
self.merges = [
" ".join(
[
# ensure the spaces are properly encoded
"".join(
chr(ord(c) + 256) if c == " " else c
for c in part
)
for part in pair
]
)
for pair in merges
]
else:
raise ValueError("Unknown tokenizer merges format")
added_tokens = tokenizer.get("added_tokens", {})
else:
added_tokens = {}
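A quick illustration of the space re-encoding applied to one new-style merge entry; the pair below is made up, and chr(ord(" ") + 256) is U+0120, the byte-level BPE stand-in for a space:
pair = ["▁the", "qu ick"]  # hypothetical 2-item merge containing a space

def encode(part: str) -> str:
    return "".join(chr(ord(c) + 256) if c == " " else c for c in part)

old_style = " ".join(encode(part) for part in pair)
print(old_style)  # '▁the quĠick'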
@ -167,7 +195,12 @@ def _try_load_from_tokenizer_json(self, path: Path) -> bool:
return True
with open(tokenizer_config_file, encoding="utf-8") as f:
tokenizer_config = json.load(f)
chat_template = tokenizer_config.get("chat_template")
chat_template_alt = None
chat_template_file = path / "chat_template.json"
if chat_template_file.is_file():
with open(chat_template_file, encoding="utf-8") as f:
chat_template_alt = json.load(f).get("chat_template")
chat_template = tokenizer_config.get("chat_template", chat_template_alt)
if chat_template is None or isinstance(chat_template, (str, list)):
self.chat_template = chat_template
else:
@ -225,7 +258,6 @@ class Vocab(BaseVocab, Protocol):
fname_tokenizer: Path
def __init__(self, base_path: Path): ...
def all_tokens(self) -> Iterable[tuple[bytes, float, gguf.TokenType]]: ...

View File

@ -1,7 +1,7 @@
import os
import re
import sys
from typing import Any, List, TextIO, Union
from typing import Any, IO, List, TextIO, Union
from PySide6.QtWidgets import (
QMessageBox,
@ -80,11 +80,15 @@ def load_dotenv(self=Any) -> None:
def show_about(self) -> None:
about_text = (
"AutoGGUF\n\n"
f"Version: {AUTOGGUF_VERSION}\n\n"
"A tool for managing and converting GGUF models."
)
about_text = f"""AutoGGUF
Version: {AUTOGGUF_VERSION}
A tool for managing and converting GGUF models.
This application is licensed under the Apache License 2.0.
Copyright (c) 2024-2025 leafspark.
It also utilizes llama.cpp, licensed under the MIT License.
Copyright (c) 2023-2025 The ggml authors."""
QMessageBox.about(self, "About AutoGGUF", about_text)
@ -93,7 +97,7 @@ def ensure_directory(path) -> None:
os.makedirs(path)
def open_file_safe(file_path, mode="r") -> TextIO:
def open_file_safe(file_path, mode="r") -> IO[Any]:
encodings = ["utf-8", "latin-1", "ascii", "utf-16"]
for encoding in encodings:
try:

View File

@ -1,7 +1,7 @@
import json
from PySide6.QtCore import Qt
from PySide6.QtWidgets import QFileDialog, QMessageBox
from PySide6.QtWidgets import QApplication, QFileDialog, QMessageBox
from Localizations import (
SAVING_PRESET,
SAVE_PRESET,
@ -36,20 +36,40 @@ def save_preset(self) -> None:
"extra_arguments": self.extra_arguments.text(),
}
file_name, _ = QFileDialog.getSaveFileName(self, SAVE_PRESET, "", JSON_FILES)
if file_name:
with open(file_name, "w") as f:
json.dump(preset, f, indent=4)
QMessageBox.information(self, PRESET_SAVED, PRESET_SAVED_TO.format(file_name))
self.logger.info(PRESET_SAVED_TO.format(file_name))
if not QApplication.keyboardModifiers() & Qt.ShiftModifier:
file_name, _ = QFileDialog.getSaveFileName(self, SAVE_PRESET, "", JSON_FILES)
if file_name:
with open(file_name, "w") as f:
json.dump(preset, f, indent=4)
QMessageBox.information(
self, PRESET_SAVED, PRESET_SAVED_TO.format(file_name)
)
self.logger.info(PRESET_SAVED_TO.format(file_name))
else:
clipboard = QApplication.clipboard()
preset_str = json.dumps(preset, indent=1)
clipboard.setText(preset_str)
QMessageBox.information(self, PRESET_SAVED, "Preset copied to clipboard")
self.logger.info("Preset copied to clipboard")
def load_preset(self) -> None:
self.logger.info(LOADING_PRESET)
file_name, _ = QFileDialog.getOpenFileName(self, LOAD_PRESET, "", JSON_FILES)
if file_name:
with open(file_name, "r") as f:
preset = json.load(f)
try:
if QApplication.keyboardModifiers() & Qt.ShiftModifier:
clipboard = QApplication.clipboard()
preset = json.loads(clipboard.text())
source = "clipboard"
else:
file_name, _ = QFileDialog.getOpenFileName(
self, LOAD_PRESET, "", JSON_FILES
)
if not file_name:
return
with open(file_name, "r") as f:
preset = json.load(f)
source = file_name
self.quant_type.clearSelection()
for quant_type in preset.get("quant_types", []):
@ -80,6 +100,19 @@ def load_preset(self) -> None:
self.add_kv_override(override)
QMessageBox.information(
self, PRESET_LOADED, PRESET_LOADED_FROM.format(file_name)
self,
PRESET_LOADED,
PRESET_LOADED_FROM.format(
source
if not QApplication.keyboardModifiers() & Qt.ShiftModifier
else "clipboard"
),
)
self.logger.info(PRESET_LOADED_FROM.format(file_name))
self.logger.info(PRESET_LOADED_FROM.format(source))
except json.JSONDecodeError:
QMessageBox.critical(self, "Error", "Invalid JSON in clipboard")
self.logger.error("Failed to parse JSON from clipboard")
except Exception as e:
QMessageBox.critical(self, "Error", f"Failed to load preset: {str(e)}")
self.logger.error(f"Failed to load preset: {str(e)}")

View File

@ -159,7 +159,9 @@ def update_cuda_backends(self) -> None:
for item in os.listdir(llama_bin):
item_path = os.path.join(llama_bin, item)
if os.path.isdir(item_path) and "cudart-llama" not in item.lower():
if "cu1" in item.lower(): # Only include CUDA-capable backends
if (
"cu1" in item.lower() or "cuda-1" in item.lower()
): # Only include CUDA-capable backends
self.backend_combo_cuda.addItem(item, userData=item_path)
if self.backend_combo_cuda.count() == 0:
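A small illustration of which directory names the widened check would now accept; the names below are hypothetical llama.cpp backend folders:
names = [
    "llama-b4000-bin-win-cuda-12.4-x64",  # matched via "cuda-1"
    "llama-b4000-bin-win-cu11.7-x64",     # matched via "cu1"
    "cudart-llama-bin-win-cu12.4-x64",    # excluded: cudart runtime package
    "llama-b4000-bin-win-vulkan-x64",     # excluded: not CUDA
]
for n in names:
    is_cuda = "cudart-llama" not in n.lower() and (
        "cu1" in n.lower() or "cuda-1" in n.lower()
    )
    print(n, is_cuda)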

View File

@ -1,6 +1,10 @@
from typing import Any, Union
import requests
import urllib.request
import urllib.error
import json
import ssl
import certifi
from PySide6.QtCore import Qt
from PySide6.QtWidgets import QFileDialog, QInputDialog, QMenu
@ -169,18 +173,47 @@ def download_llama_cpp(self) -> None:
self.download_progress.setValue(0)
def get_repo_from_env() -> tuple[str, str]:
repo = os.getenv("AUTOGGUF_BACKEND_REPO", "ggerganov/llama.cpp")
if not repo or "/" not in repo:
raise ValueError(INVALID_REPOSITORY_FORMAT)
owner, repo_name = repo.split("/", 1)
if not all(part.strip() for part in (owner, repo_name)):
raise ValueError(REPO_CANNOT_BE_EMPTY)
return owner, repo_name
def refresh_releases(self) -> None:
self.logger.info(REFRESHING_LLAMACPP_RELEASES)
try:
response = requests.get(
"https://api.github.com/repos/ggerganov/llama.cpp/releases"
)
response.raise_for_status() # Raise an exception for bad status codes
releases = response.json()
owner, repo = get_repo_from_env()
url = f"https://api.github.com/repos/{owner}/{repo}/releases"
# Create SSL context with certifi certificates
ssl_context = ssl.create_default_context(cafile=certifi.where())
# Create request
req = urllib.request.Request(url)
# Make the request
with urllib.request.urlopen(req, context=ssl_context) as response:
if response.status != 200:
raise urllib.error.HTTPError(
url, response.status, "HTTP Error", response.headers, None
)
releases = json.loads(response.read().decode("utf-8"))
self.release_combo.clear()
for release in releases:
self.release_combo.addItem(release["tag_name"], userData=release)
self.release_combo.currentIndexChanged.connect(self.update_assets)
self.update_assets()
except requests.exceptions.RequestException as e:
except ValueError as e:
show_error(self.logger, f"Invalid repository configuration: {str(e)}")
except (urllib.error.URLError, urllib.error.HTTPError) as e:
show_error(self.logger, ERROR_FETCHING_RELEASES.format(str(e)))
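Since the repository is now read from the environment, pointing AutoGGUF at a fork or mirror is a one-line change before launch; the value below is only an example:
import os

os.environ["AUTOGGUF_BACKEND_REPO"] = "ggml-org/llama.cpp"  # owner/repo format, per get_repo_from_env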