add support for qwen3.5 vl model #693
Conversation
Note: Reviews paused. This branch is under active development; to avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings.
📝 Walkthrough
Collects multiple no-split module classes for FSDP wrap policies, adds VLM detection/loading/extraction utilities, detects M‑RoPE to adjust FlashAttention usage (preferring SDPA when needed), and exposes Model._use_local_mamba_kernels() to attempt swapping in local transformer kernels.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI as Client
    participant Model as ModelLoader
    participant VLM as vlm_utils
    participant HF as HuggingFaceAuto
    participant Kernels as KernelSelector
    CLI->>Model: load_model(model_path, load_kwargs)
    Model->>VLM: is_vlm_with_causal_lm(model_path)?
    alt VLM wraps CausalLM
        VLM-->>Model: True
        Model->>VLM: extract_causal_lm_from_vlm(model_path, load_kwargs)
        VLM-->>Model: PreTrained CausalLM
    else VLM direct-load candidate
        Model->>VLM: is_vlm_for_direct_loading(model_path)?
        alt direct-load
            Model->>VLM: load_vlm_for_text_training(model_path, load_kwargs)
            VLM-->>Model: PreTrained model
        else standard
            Model->>HF: AutoModelForCausalLM.from_pretrained(model_path, **load_kwargs)
            HF-->>Model: PreTrained model
        end
    end
    Model->>VLM: has_mrope(model_path)?
    alt M-RoPE detected
        Model->>Kernels: disable FlashAttention2 -> use SDPA
    else no M-RoPE
        Model->>Kernels: select flash/eager per GPU & GPT-OSS rules
    end
    Kernels->>Model: optionally _use_local_mamba_kernels() to swap kernels
```
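The kernel-selection branch in the diagram above can be sketched as a small pure function. This is an illustrative reduction, not the module's actual code; the function name and parameters are hypothetical:

```python
def pick_attn_implementation(has_mrope: bool, flash_attn_available: bool) -> str:
    """Mirror the branch in the diagram: M-RoPE models fall back to SDPA
    because the FlashAttention2 path does not handle their position
    encoding; otherwise prefer FlashAttention2 when installed, else eager."""
    if has_mrope:
        return "sdpa"
    return "flash_attention_2" if flash_attn_available else "eager"
```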
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/instructlab/training/accelerator.py`:
- Around line 148-157: The code currently raises a ValueError when no classes
are resolved from self.model._no_split_modules; instead, try resolving names
against the underlying HF model (e.g., check getattr(self.model, "model", None)
or getattr(self.model, "hf_model", None) and call get_module_class_from_name
against that if the first pass yields nothing), and if still no layer_classes,
do not raise—instead emit a clear warning (use warnings.warn or the module
logger) that resolution failed and that FSDP will fall back to a safe wrap
policy, then continue with an empty layer_classes so the downstream FSDP wrap
logic uses the safe/default policy. Ensure you reference
get_module_class_from_name, self.model._no_split_modules, and layer_classes in
the change.
In `@src/instructlab/training/model.py`:
- Around line 125-131: The _has_mrope() helper should catch config-load errors
and validate types: wrap AutoConfig.from_pretrained(model_path) in a try/except
and return False on any exception, then retrieve rope_scaling safely by checking
attributes and types (use getattr(config, "rope_scaling", None) and
getattr(getattr(config, "text_config", None), "rope_scaling", None) but ensure
the resulting rope_scaling is a dict or has __contains__ before testing
"mrope_section"); if type checks fail, return False so M-RoPE detection is a
safe, non-fatal fallback.
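The accelerator.py fallback described above can be sketched as a helper. This is a dependency-free illustration, not the repo's implementation: `get_module_class_from_name` is injected (the real code imports it from accelerate), and the `model`/`hf_model` fallback attribute names are taken from the comment, not verified:

```python
import warnings


def resolve_layer_classes(model, no_split_names, get_module_class_from_name):
    """Resolve _no_split_modules names against the wrapper first, then any
    underlying HF model; warn and return an empty set instead of raising,
    so FSDP can fall back to a safe default wrap policy."""
    candidates = [model, getattr(model, "model", None), getattr(model, "hf_model", None)]
    layer_classes = set()
    for target in candidates:
        if target is None:
            continue
        for name in no_split_names or []:
            cls = get_module_class_from_name(target, name)
            if cls is not None:
                layer_classes.add(cls)
        if layer_classes:
            break
    if not layer_classes:
        warnings.warn(
            "Could not resolve any _no_split_modules; "
            "FSDP will fall back to a safe default wrap policy."
        )
    return layer_classes
```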
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
src/instructlab/training/vlm_utils.py (1)
145-157: ⚠️ Potential issue | 🟠 Major
Make `has_mrope()` a best-effort probe.
This still lets `AutoConfig.from_pretrained()` failures and non-mapping rope fields abort model init, even though the check only decides between FA2 and SDPA. It should return `False` unless it can safely confirm `mrope_section`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/instructlab/training/vlm_utils.py` around lines 145 - 157, Make has_mrope() a best-effort probe by catching errors from AutoConfig.from_pretrained and returning False on exception, and by treating rope fields safely: after loading config (in has_mrope) wrap the from_pretrained call in a try/except and return False on failure, then when collecting candidates check both config and getattr(config, "text_config", None) but only inspect rope_scaling/rope_parameters if they exist and are mappings (use isinstance(rope, Mapping) or similar) before looking for "mrope_section"; only return True when you can safely confirm the key exists, otherwise return False.
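The best-effort shape the comment asks for can be sketched as a pure check over an already-loaded config. Here `config` stands in for the object `AutoConfig.from_pretrained` would return (the caller wraps that load in its own try/except); the function name is hypothetical:

```python
from collections.abc import Mapping


def config_has_mrope(config) -> bool:
    """Return True only when an M-RoPE section can be safely confirmed.
    Missing attributes and non-mapping rope fields all yield False."""
    rope_fields = []
    for cfg in (config, getattr(config, "text_config", None)):
        if cfg is None:
            continue
        for attr in ("rope_scaling", "rope_parameters"):
            rope = getattr(cfg, attr, None)
            if isinstance(rope, Mapping):
                rope_fields.append(rope)
    return any("mrope_section" in rope for rope in rope_fields)
```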
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/instructlab/training/model.py`:
- Around line 123-133: Replace the hardcoded GPU index 0 with the actual CUDA
device for this process (use torch.cuda.current_device() when querying
capabilities) so the SM check uses the pinned/LOCAL_RANK GPU; update the call
torch.cuda.get_device_capability(0) to
torch.cuda.get_device_capability(torch.cuda.current_device()) (and keep the rest
of the logic that sets self.base_model_args["attn_implementation"] and the
logger warning intact) to avoid selecting flash-attn3 based on the wrong device.
In `@src/instructlab/training/vlm_utils.py`:
- Line 29: The utility currently hardcodes AutoConfig.from_pretrained(...,
trust_remote_code=True); update the three functions is_vlm_with_causal_lm,
extract_causal_lm_from_vlm, and has_mrope to accept an optional parameter
trust_remote_code: bool = False and pass that parameter into their calls to
AutoConfig.from_pretrained(model_path, trust_remote_code=trust_remote_code) (and
propagate it through any internal helper calls), so callers can control whether
remote code is trusted instead of forcing True; ensure default remains False.
- Around line 117-125: Avoid instantiating a full random-weight model: create
the causal wrapper using an empty-weight context (e.g.,
transformers.init_empty_weights) around causal_lm_class(text_config) so no large
tensors are allocated, then attach the real submodules (set text_model.model =
backbone and text_model.lm_head = vlm.lm_head) and copy quantization metadata
from vlm (e.g., text_model.is_loaded_in_4bit, text_model.is_loaded_in_8bit,
text_model.hf_quantizer = getattr(vlm, 'hf_quantizer', None)) so
Model.prepare_peft_model()/prepare_model_for_kbit_training() sees the correct
flags; locate code around AutoConfig.from_pretrained, text_config,
causal_lm_class, and text_model to implement this change.
---
Duplicate comments:
In `@src/instructlab/training/vlm_utils.py`:
- Around line 145-157: Make has_mrope() a best-effort probe by catching errors
from AutoConfig.from_pretrained and returning False on exception, and by
treating rope fields safely: after loading config (in has_mrope) wrap the
from_pretrained call in a try/except and return False on failure, then when
collecting candidates check both config and getattr(config, "text_config", None)
but only inspect rope_scaling/rope_parameters if they exist and are mappings
(use isinstance(rope, Mapping) or similar) before looking for "mrope_section";
only return True when you can safely confirm the key exists, otherwise return
False.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 483f2af4-0503-4536-ae08-c08400328ba2
📒 Files selected for processing (2)
src/instructlab/training/model.py
src/instructlab/training/vlm_utils.py
Actionable comments posted: 1
♻️ Duplicate comments (4)
src/instructlab/training/vlm_utils.py (3)
192-218: ⚠️ Potential issue | 🟠 Major
Make `has_mrope()` a safe fallback.
This runs during `Model.__init__`, so a config fetch failure or an unexpected non-mapping `rope_*` value will currently abort model startup just to decide an attention backend. Please catch config-load errors and return `False`, and only test `"mrope_section"` on mapping-like values.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/instructlab/training/vlm_utils.py` around lines 192 - 218, The has_mrope function can crash during Model.__init__ if AutoConfig.from_pretrained fails or if rope_scaling/rope_parameters are not mapping-like; wrap the config load in a try/except and return False on any exception, then when iterating candidates only treat mapping-like values (e.g., isinstance(rope, dict) or collections.abc.Mapping) before checking for the "mrope_section" key; keep checks against both top-level config and getattr(config, "text_config", None) as before and return any("mrope_section" in rope for rope in candidates) only after filtering to valid mappings.
15-32: ⚠️ Potential issue | 🟠 Major
Don’t force `trust_remote_code=True` in the VLM probes.
These helpers unconditionally trust remote code during config inspection, but the subsequent model load path does not thread the same trust decision through `load_kwargs`. That both widens the trust boundary and can make probe/load behavior disagree. Please take `trust_remote_code` from the caller and apply it consistently across both probe and load paths.
Also applies to: 45-59, 179-179, 206-206
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/instructlab/training/vlm_utils.py` around lines 15 - 32, The probes currently call AutoConfig.from_pretrained with trust_remote_code=True unconditionally (e.g., in is_vlm_with_causal_lm) which mis-aligns with model load trust decisions; add a trust_remote_code: bool parameter to the probe functions (including is_vlm_with_causal_lm and the other VLM probe helpers referenced around lines 45-59, 179, 206), pass that value into AutoConfig.from_pretrained, and ensure the same trust_remote_code flag is threaded into any subsequent model-loading/inspection codepaths (e.g., the load_kwargs or calls that later load the model) so the probe and load behavior use the same trust decision.
178-186: ⚠️ Potential issue | 🟠 Major
Avoid allocating a second full CausalLM wrapper here.
`causal_lm_class(text_config)` materializes a fresh random-weight model before `model` and `lm_head` are immediately replaced. For multi-B checkpoints this creates a large transient RAM spike, and the replacement wrapper also drops quantization flags that `Model.prepare_peft_model()` in src/instructlab/training/model.py:286-293 relies on. Build the wrapper with empty/meta weights and copy the quantization metadata from `vlm`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/instructlab/training/vlm_utils.py` around lines 178 - 186, The current code instantiates a full random-weight CausalLM via causal_lm_class(text_config) which creates a large transient allocation and loses quantization metadata; instead construct the CausalLM wrapper with empty/meta weights (e.g., use an empty init context such as init_empty_weights or the model.config-only constructor) so no heavy tensors are allocated, then assign text_model.model = backbone and text_model.lm_head = vlm.lm_head, and copy any quantization/metadata flags from vlm to text_model (so Model.prepare_peft_model in src/instructlab/training/model.py can detect them) before returning/using text_model; locate these symbols: AutoConfig.from_pretrained, MODEL_FOR_CAUSAL_LM_MAPPING, causal_lm_class, text_model, vlm, and Model.prepare_peft_model to implement the change.
src/instructlab/training/model.py (1)
120-133: ⚠️ Potential issue | 🟠 Major
Use the process-local CUDA device for the GPT-OSS SM check.
Line 123 still probes device `0`. In multi-GPU runs this process may be pinned to a different GPU, so `attn_implementation` can be chosen from the wrong device capability.
Suggested fix

```diff
- major, _ = torch.cuda.get_device_capability(0)
+ device_id = torch.cuda.current_device()
+ major, _ = torch.cuda.get_device_capability(device_id)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/instructlab/training/model.py` around lines 120 - 133, The SM-level check currently calls torch.cuda.get_device_capability(0) which always probes device 0; change it to probe the process-local CUDA device by using torch.cuda.current_device() (e.g., device = torch.cuda.current_device()) and then call torch.cuda.get_device_capability(device). Update the logic in the self.is_gpt_oss branch that sets self.base_model_args["attn_implementation"] and the logger.warning message to use the retrieved capability from the correct device so the attn_implementation choice and warning reflect the process's actual GPU.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/instructlab/training/model.py`:
- Around line 377-385: The mismatch check uses a truthiness test that skips
valid EOS IDs of 0; change the condition in the reconciliation block to
explicitly check for None: replace the middle clause `and
self.tokenizer.eos_token_id` with `and self.tokenizer.eos_token_id is not None`
so the guarded expression becomes `if text_config.eos_token_id is not None and
self.tokenizer.eos_token_id is not None and text_config.eos_token_id !=
self.tokenizer.eos_token_id:` and keep the warning and assignment that set
`text_config.eos_token_id = self.tokenizer.eos_token_id`.
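The guard the comment above describes reduces to a None-safe three-way comparison. A tiny illustrative sketch (function name hypothetical, extracted from the `if` clause for clarity):

```python
def eos_needs_sync(config_eos, tokenizer_eos) -> bool:
    """True when both sides define an EOS id and they disagree.
    Explicit None checks keep a valid EOS id of 0 from being skipped."""
    return (
        config_eos is not None
        and tokenizer_eos is not None
        and config_eos != tokenizer_eos
    )
```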
---
Duplicate comments:
In `@src/instructlab/training/model.py`:
- Around line 120-133: The SM-level check currently calls
torch.cuda.get_device_capability(0) which always probes device 0; change it to
probe the process-local CUDA device by using torch.cuda.current_device() (e.g.,
device = torch.cuda.current_device()) and then call
torch.cuda.get_device_capability(device). Update the logic in the
self.is_gpt_oss branch that sets self.base_model_args["attn_implementation"] and
the logger.warning message to use the retrieved capability from the correct
device so the attn_implementation choice and warning reflect the process's
actual GPU.
In `@src/instructlab/training/vlm_utils.py`:
- Around line 192-218: The has_mrope function can crash during Model.__init__ if
AutoConfig.from_pretrained fails or if rope_scaling/rope_parameters are not
mapping-like; wrap the config load in a try/except and return False on any
exception, then when iterating candidates only treat mapping-like values (e.g.,
isinstance(rope, dict) or collections.abc.Mapping) before checking for the
"mrope_section" key; keep checks against both top-level config and
getattr(config, "text_config", None) as before and return any("mrope_section" in
rope for rope in candidates) only after filtering to valid mappings.
- Around line 15-32: The probes currently call AutoConfig.from_pretrained with
trust_remote_code=True unconditionally (e.g., in is_vlm_with_causal_lm) which
mis-aligns with model load trust decisions; add a trust_remote_code: bool
parameter to the probe functions (including is_vlm_with_causal_lm and the other
VLM probe helpers referenced around lines 45-59, 179, 206), pass that value into
AutoConfig.from_pretrained, and ensure the same trust_remote_code flag is
threaded into any subsequent model-loading/inspection codepaths (e.g., the
load_kwargs or calls that later load the model) so the probe and load behavior
use the same trust decision.
- Around line 178-186: The current code instantiates a full random-weight
CausalLM via causal_lm_class(text_config) which creates a large transient
allocation and loses quantization metadata; instead construct the CausalLM
wrapper with empty/meta weights (e.g., use an empty init context such as
init_empty_weights or the model.config-only constructor) so no heavy tensors are
allocated, then assign text_model.model = backbone and text_model.lm_head =
vlm.lm_head, and copy any quantization/metadata flags from vlm to text_model (so
Model.prepare_peft_model in src/instructlab/training/model.py can detect them)
before returning/using text_model; locate these symbols:
AutoConfig.from_pretrained, MODEL_FOR_CAUSAL_LM_MAPPING, causal_lm_class,
text_model, vlm, and Model.prepare_peft_model to implement the change.
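The metadata-carryover step of the extraction fix above can be isolated as a small helper. The attribute names follow the ones the review lists; the empty-weights construction itself (accelerate's `init_empty_weights`) is elided so this sketch stays dependency-free:

```python
# Loaded-state flags the review says prepare_peft_model() relies on.
_QUANT_ATTRS = ("is_loaded_in_4bit", "is_loaded_in_8bit", "hf_quantizer")


def copy_quantization_state(vlm, text_model):
    """Copy loaded-state flags from the original VLM onto the extracted
    CausalLM wrapper so the downstream k-bit path is still detected."""
    for attr in _QUANT_ATTRS:
        if hasattr(vlm, attr):
            setattr(text_model, attr, getattr(vlm, attr))
    return text_model
```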
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3ec0b7c0-a94e-4319-b818-e090fd278c58
📒 Files selected for processing (2)
src/instructlab/training/model.py
src/instructlab/training/vlm_utils.py
♻️ Duplicate comments (4)
src/instructlab/training/model.py (2)
118-122: ⚠️ Potential issue | 🟠 Major
Use the current CUDA device for the SM check.
In distributed runs this process may be pinned to a nonzero `LOCAL_RANK`. Querying device `0` here can select `vllm-flash-attn3` from the wrong GPU and then fail or fall back unnecessarily on the actual training device.
Suggested fix

```diff
- major, _ = torch.cuda.get_device_capability(0)
+ device_id = torch.cuda.current_device()
+ major, _ = torch.cuda.get_device_capability(device_id)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/instructlab/training/model.py` around lines 118 - 122, The SM check currently calls torch.cuda.get_device_capability(0) which always queries device 0; update the check to use the actual CUDA device for this process by calling torch.cuda.current_device() (or an available self.local_rank if present) and pass that index to torch.cuda.get_device_capability so the is_gpt_oss branch (and the flash-attn3 major version check) uses the correct GPU device instead of device 0.
389-392: ⚠️ Potential issue | 🟡 Minor
Handle `eos_token_id == 0`.
The truthiness check skips valid EOS IDs of `0`, so the mismatch fix never runs for those tokenizers.
Suggested fix

```diff
- and self.tokenizer.eos_token_id
+ and self.tokenizer.eos_token_id is not None
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/instructlab/training/model.py` around lines 389 - 392, The condition that checks EOS token IDs uses a truthiness check which skips valid id 0; update the if to explicitly test for None instead of truthiness: replace "and self.tokenizer.eos_token_id" with "and self.tokenizer.eos_token_id is not None" (so the clause reads: if text_config.eos_token_id is not None and self.tokenizer.eos_token_id is not None and text_config.eos_token_id != self.tokenizer.eos_token_id) so mismatches where eos_token_id == 0 are correctly detected and handled; keep the same variables (text_config.eos_token_id and self.tokenizer.eos_token_id) to locate the change.
src/instructlab/training/vlm_utils.py (2)
32-32: ⚠️ Potential issue | 🟠 Major
Make `trust_remote_code` caller-controlled.
These config probes currently force remote-code execution on their own. A model classification check should not opt into repo-supplied Python unless the caller explicitly asked for that. Please add a `trust_remote_code` parameter (default `False`) and thread it through these probes and the later VLM load paths instead of hardcoding `True`.
Also applies to: 59-59, 179-179, 206-206, 237-237, 258-258
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/instructlab/training/vlm_utils.py` at line 32, Change the hardcoded trust_remote_code=True usage to a caller-controlled boolean: add a trust_remote_code: bool = False parameter to the functions that call AutoConfig.from_pretrained (and any helper probes) and pass that parameter into AutoConfig.from_pretrained instead of True; thread the same parameter through the VLM load paths that later instantiate the model so repository code execution is only enabled when the caller explicitly sets trust_remote_code=True, and update all call sites to accept/forward the new parameter.
178-186: ⚠️ Potential issue | 🟠 Major
Preserve the loaded backbone state when extracting the standalone CausalLM.
`causal_lm_class(text_config)` materializes another full decoder only to overwrite it, which is a large RAM spike for bigger VLMs. The returned wrapper also drops the loaded k-bit/runtime state (`is_loaded_in_4bit`, `is_loaded_in_8bit`, `hf_quantizer`) and keeps a fresh config object instead of the backbone's config, so `Model.prepare_peft_model()` can miss the k-bit path and later tokenizer fixes land on the wrong config. Using `.model`/`.lm_head` directly here is also only safe for a subset of HF CausalLM wrappers. Build the wrapper under an empty/meta init context, transplant the backbone/head through the generic wrapper APIs, and carry over the loaded metadata before returning it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/instructlab/training/vlm_utils.py` around lines 178 - 186, The current code creates a full decoder via causal_lm_class(text_config) then overwrites its internals, which spikes RAM and loses k-bit/runtime metadata; instead instantiate the wrapper under an empty/meta init (so no weights are materialized), use the wrapper's official APIs to set/transpose the backbone and lm_head (replace text_model.model and text_model.lm_head via those generic transplant APIs) and copy the backbone's config and loaded-state flags (is_loaded_in_4bit, is_loaded_in_8bit, hf_quantizer) into the new wrapper before returning it, so Model.prepare_peft_model and tokenizer-fixup see the original backbone config and quantization state; keep references to MODEL_FOR_CAUSAL_LM_MAPPING, causal_lm_class, text_model, backbone and vlm.lm_head to locate the change.
🧹 Nitpick comments (1)
src/instructlab/training/model.py (1)
589-592: Text-only LoRA on this branch will still target vision projections.
Because `projection_layer_names`/`prepare_peft_model()` are suffix-based, keeping the full VLM here means the default/common `q_proj`/`k_proj` targets also attach adapters to the vision tower. For a "text-only" path that's a sizable extra memory/trainable-params hit; consider filtering adapter target discovery to text-module prefixes or bypassing this branch when LoRA is enabled.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/instructlab/training/model.py` around lines 589 - 592, The current branch keeps the full VLM when is_vlm_with_causal_lm()/is_vlm_for_direct_loading() returns true, which lets suffix-based projection_layer_names / prepare_peft_model() attach adapters to vision projections; detect when LoRA/text-only training is requested and either (a) only extract/load the text LM (use extract_causal_lm_from_vlm or a new helper) instead of preserving the full VLM, or (b) restrict projection_layer_names discovery to text-module prefixes (e.g., "text_encoder", "lm_head", "transformer.text", etc.) before calling prepare_peft_model; update the conditional around extract_causal_lm_from_vlm/load_vlm_for_text_training to branch on the LoRA/text-only flag and apply one of these fixes so adapters are not attached to the vision tower.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/instructlab/training/model.py`:
- Around line 118-122: The SM check currently calls
torch.cuda.get_device_capability(0) which always queries device 0; update the
check to use the actual CUDA device for this process by calling
torch.cuda.current_device() (or an available self.local_rank if present) and
pass that index to torch.cuda.get_device_capability so the is_gpt_oss branch
(and the flash-attn3 major version check) uses the correct GPU device instead of
device 0.
- Around line 389-392: The condition that checks EOS token IDs uses a truthiness
check which skips valid id 0; update the if to explicitly test for None instead
of truthiness: replace "and self.tokenizer.eos_token_id" with "and
self.tokenizer.eos_token_id is not None" (so the clause reads: if
text_config.eos_token_id is not None and self.tokenizer.eos_token_id is not None
and text_config.eos_token_id != self.tokenizer.eos_token_id) so mismatches where
eos_token_id == 0 are correctly detected and handled; keep the same variables
(text_config.eos_token_id and self.tokenizer.eos_token_id) to locate the change.
In `@src/instructlab/training/vlm_utils.py`:
- Line 32: Change the hardcoded trust_remote_code=True usage to a
caller-controlled boolean: add a trust_remote_code: bool = False parameter to
the functions that call AutoConfig.from_pretrained (and any helper probes) and
pass that parameter into AutoConfig.from_pretrained instead of True; thread the
same parameter through the VLM load paths that later instantiate the model so
repository code execution is only enabled when the caller explicitly sets
trust_remote_code=True, and update all call sites to accept/forward the new
parameter.
- Around line 178-186: The current code creates a full decoder via
causal_lm_class(text_config) then overwrites its internals, which spikes RAM and
loses k-bit/runtime metadata; instead instantiate the wrapper under an
empty/meta init (so no weights are materialized), use the wrapper's official
APIs to set/transpose the backbone and lm_head (replace text_model.model and
text_model.lm_head via those generic transplant APIs) and copy the backbone's
config and loaded-state flags (is_loaded_in_4bit, is_loaded_in_8bit,
hf_quantizer) into the new wrapper before returning it, so
Model.prepare_peft_model and tokenizer-fixup see the original backbone config
and quantization state; keep references to MODEL_FOR_CAUSAL_LM_MAPPING,
causal_lm_class, text_model, backbone and vlm.lm_head to locate the change.
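The signature change the `trust_remote_code` comments above ask for can be sketched as follows. The config loader is injected so the example runs without transformers (in the real code it would be `AutoConfig.from_pretrained`); the function name and the `vision_config` detection heuristic are illustrative assumptions, not the module's actual rule:

```python
def probe_is_vlm(model_path, trust_remote_code: bool = False, config_loader=None):
    """Classification probe that honors the caller's trust policy and never
    raises: any config-load failure simply means 'not detectably a VLM'."""
    if config_loader is None:
        return False
    try:
        config = config_loader(model_path, trust_remote_code=trust_remote_code)
    except Exception:  # best-effort probe: treat any load failure as "no"
        return False
    # Illustrative heuristic only: presence of a vision sub-config.
    return getattr(config, "vision_config", None) is not None
```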
---
Nitpick comments:
In `@src/instructlab/training/model.py`:
- Around line 589-592: The current branch keeps the full VLM when
is_vlm_with_causal_lm()/is_vlm_for_direct_loading() returns true, which lets
suffix-based projection_layer_names / prepare_peft_model() attach adapters to
vision projections; detect when LoRA/text-only training is requested and either
(a) only extract/load the text LM (use extract_causal_lm_from_vlm or a new
helper) instead of preserving the full VLM, or (b) restrict
projection_layer_names discovery to text-module prefixes (e.g., "text_encoder",
"lm_head", "transformer.text", etc.) before calling prepare_peft_model; update
the conditional around extract_causal_lm_from_vlm/load_vlm_for_text_training to
branch on the LoRA/text-only flag and apply one of these fixes so adapters are
not attached to the vision tower.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2c5cdfe4-e446-4dfc-a4ba-1fa8462fce2a
📒 Files selected for processing (2)
src/instructlab/training/model.py
src/instructlab/training/vlm_utils.py
- Fix eos_token_id truthiness check (0 is valid)
- Add isinstance guard for RopeParameters in mrope detection
- Add hasattr fallback for non-dict rope objects
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
♻️ Duplicate comments (3)
src/instructlab/training/model.py (1)
125-129: ⚠️ Potential issue | 🟠 Major
Use the process-local CUDA device for the GPT-OSS capability check.
In distributed runs this process may be pinned to a nonzero GPU, so `get_device_capability(0)` can choose `vllm-flash-attn3` based on the wrong card. Query `torch.cuda.current_device()` here instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/instructlab/training/model.py` around lines 125 - 129, The GPT-OSS GPU capability check in the is_gpt_oss branch uses torch.cuda.get_device_capability(0), which can be wrong in multi-process/distributed runs; change it to query the process-local device via torch.cuda.current_device() and pass that device index to torch.cuda.get_device_capability(...) so the flash-attn3 decision in the is_gpt_oss block uses the correct local GPU.
src/instructlab/training/vlm_utils.py (2)
174-185: ⚠️ Potential issue | 🟠 Major
Preserve quantization state when extracting the text LM.
`causal_lm_class(text_config)` builds a second full wrapper, then Lines 181-182 replace only `.model` and `.lm_head`. That drops quantization metadata from the loaded VLM, so `Model.prepare_peft_model()` in src/instructlab/training/model.py:310-319 will miss the k-bit path for quantized VLMs. Please build the wrapper under an empty/meta-weight context and copy flags like `is_loaded_in_4bit`, `is_loaded_in_8bit`, and `hf_quantizer` from `vlm` before returning it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/instructlab/training/vlm_utils.py` around lines 174 - 185, When constructing the standalone CausalLM wrapper, create the wrapper (causal_lm_class(text_config)) inside an empty/meta-weight context so it doesn't allocate real weights, then assign the loaded backbone and lm_head as you do (text_model.model = backbone; text_model.lm_head = vlm.lm_head) and also copy quantization metadata from the original vlm onto the new wrapper (e.g., text_model.is_loaded_in_4bit = vlm.is_loaded_in_4bit, text_model.is_loaded_in_8bit = vlm.is_loaded_in_8bit, text_model.hf_quantizer = vlm.hf_quantizer) so downstream code in Model.prepare_peft_model can detect the k-bit path correctly before deleting vlm and returning text_model.
15-32: ⚠️ Potential issue | 🟠 Major
Stop forcing `trust_remote_code=True` in the VLM probes.
These helpers are part of ordinary model classification/loading, but every `AutoConfig.from_pretrained()` here bypasses the caller’s trust policy. In this PR, src/instructlab/training/model.py routes through them before the actual model load, so config inspection alone now opts into remote code execution. Please thread a `trust_remote_code` argument from the caller and default it to `False`.
Also applies to: 45-59, 174-175, 188-202, 223-258
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/instructlab/training/vlm_utils.py` around lines 15 - 32, The VLM probe functions (e.g., is_vlm_with_causal_lm) currently force AutoConfig.from_pretrained(..., trust_remote_code=True); change their signatures to accept a trust_remote_code: bool = False parameter, pass that value to all AutoConfig.from_pretrained calls in this module (including the other probe helpers referenced), and update the caller in src/instructlab/training/model.py to thread the caller's trust_remote_code flag into these probes so config inspection no longer unconditionally opts into remote code execution. Ensure default remains False and propagate the flag through any internal helper calls within vlm_utils.py.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/instructlab/training/model.py`:
- Around line 125-129: The GPT-OSS GPU capability check in the is_gpt_oss branch
uses torch.cuda.get_device_capability(0), which can be wrong in
multi-process/distributed runs; change it to query the process-local device via
torch.cuda.current_device() and pass that device index to
torch.cuda.get_device_capability(...) so the flash-attn3 decision in the
is_gpt_oss block uses the correct local GPU.
In `@src/instructlab/training/vlm_utils.py`:
- Around line 174-185: When constructing the standalone CausalLM wrapper, create
the wrapper (causal_lm_class(text_config)) inside an empty/meta-weight context
so it doesn't allocate real weights, then assign the loaded backbone and lm_head
as you do (text_model.model = backbone; text_model.lm_head = vlm.lm_head) and
also copy quantization metadata from the original vlm onto the new wrapper
(e.g., text_model.is_loaded_in_4bit = vlm.is_loaded_in_4bit,
text_model.is_loaded_in_8bit = vlm.is_loaded_in_8bit, text_model.hf_quantizer =
vlm.hf_quantizer) so downstream code in Model.prepare_peft_model can detect the
k-bit path correctly before deleting vlm and returning text_model.
- Around line 15-32: The VLM probe functions (e.g., is_vlm_with_causal_lm)
currently force AutoConfig.from_pretrained(..., trust_remote_code=True); change
their signatures to accept a trust_remote_code: bool = False parameter, pass
that value to all AutoConfig.from_pretrained calls in this module (including the
other probe helpers referenced), and update the caller in
src/instructlab/training/model.py to thread the caller's trust_remote_code flag
into these probes so config inspection no longer unconditionally opts into
remote code execution. Ensure default remains False and propagate the flag
through any internal helper calls within vlm_utils.py.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: dc36d129-e0da-4044-a9b2-c93bd7382ab6
📒 Files selected for processing (2)
src/instructlab/training/model.py
src/instructlab/training/vlm_utils.py
- Fix isort ordering in vlm_utils.py and model.py
- Fix pylint: use 'from torch import nn', mark unused-argument
- Mock needs_sdpa and get_module_class_from_name in unit tests
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- accelerator.py: fall back to warning + default wrap policy instead of ValueError when no _no_split_modules resolve; try underlying HF model as secondary target
- model.py: use torch.cuda.current_device() instead of hardcoded 0
- vlm_utils.py: add trust_remote_code param (default False) to all config-loading functions; use init_empty_weights for CausalLM shell; copy quantization metadata from VLM
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove fabricated claim about C API incompatibility
- Accurately describe the issue as PyTorch/CUDA ABI mismatch
- Broaden exception handling to catch AttributeError
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This PR adds support for the qwen3.5 vl model.
Summary by CodeRabbit
New Features
Bug Fixes