Add OSA chat widget to documentation#13702
neuromechanist wants to merge 6 commits into mne-tools:main from
Conversation
Hello! 👋 Thanks for opening your first pull request here! ❤️ We will try to get back to you soon. 🚴
Hi maintainers! Could someone approve the CircleCI pipeline? It requires maintainer approval for first-time contributors from forks. Also, if you'd like to customize the widget further (e.g., adding your own logo), you can pass additional options:

```js
OSAChatWidget.setConfig({
  communityId: 'mne',
  logo: 'https://mne.tools/stable/_static/mne_logo.svg',
  suggestedQuestions: [...]
});
```

Full configuration reference: https://docs.osc.earth/osa/deployment/widget/#full-configuration
@neuromechanist Thanks for opening this PR. Pushing AI tools is not something that everyone has been fully comfortable with. See for instance this discussion regarding an MCP: #13288 (comment). If I understand correctly, OSA is more of a chatbot, but still those concerns over the correctness of the bot's answers would apply. One of the suggested prompts … And yes, while nobody has to use it, its inclusion in the documentation is still implicit support that its answers can be trusted.
I tried it and it failed, but the chat icon in the lower right was at least discoverable enough
Maybe this is an issue with it being on CircleCI, not sure... Would be cool to be able to test here by clicking the CircleCI link, as then we could look into how reasonable its responses are.
I wonder if, as part of this, we could adjust the widget to have a nice clear warning that it's AI-generated and answers may or may not be correct. Maybe with a link to read more. From talking to people in the community, people are already "going to ChatGPT for help," so if we can get OSA to be at least as good or better than that, it might be a step in a helpful direction for our users. In other words, try to express: "Hey, it's not clear this is a good idea, but if you're going to use AI tools, we tried to make this one accurate." I suspect because @neuromechanist here has carefully chosen what to ingest to teach it (e.g., our docs, published papers) it probably does some reasonable things, hopefully more so than ChatGPT etc. As maintainers we probably do need to discuss more over Zoom or similar at some point though. @neuromechanist is this used by other large scientific projects so far that you know of? For example, if NumPy or somebody used it, that would help me at least trust it a bit more.
TODO:

- looks like an entry is still missing in `doc/changes/names.inc`; the changelog entry should use the `:newcontrib:` role
- before I'm comfortable making this live, we should figure out how to add a prominent caveat (something like "this is AI, no guarantee it's accurate, please double-check against our docs")
- before this goes live, I think we need to decide whether we're committing to this long-term (and thus committing to Yahya's suggested monthly cost-sharing) or just trying it out. If we're just trying it out, that should be prominently stated in the caveat mentioned above (so users don't get as mad if we turn it off later).
- if we can't get it working on CircleCI, someone will need to do a local doc build and get it working there in order to test it out I guess
@tsbinns FWIW, I'm comfortable making it easier for end users to get good mne-related results from AI for their own scripts. To me that's a separate problem from whether we allow AI-aided contributions to the codebase.
@drammock I think there was a contrib back in 1.7, see #13702 (comment)
This was a suggestion in an email to @drammock, @larsoner, @agramfort with some more background. It is probably useful for everyone to put the email here as well:

Hi Dan,

You might have heard that I've been building the Open Science Assistant (OSA) platform and have already onboarded EEGLAB, NEMAR, HED, BIDS, and FieldTrip. I went ahead and created one for MNE-Python as well. You can try it here:

Live demo: https://demo.osc.earth/mne

**What it knows**

The assistant has a continuously synced knowledge base with: …

It understands the MNE data pipeline (Raw -> Epochs -> Evoked -> SourceEstimate), cites sources with links, and provides concise answers. Let me know if you want to onboard it and add it to your docs. There are some requirements, outlined below.

**Embedding on mne.tools**

If you want to add it to your website:

```html
<script src="https://demo.osc.earth/osa-chat-widget.js"></script>
<script>
  OSAChatWidget.setConfig({
    communityId: 'mne'
  });
</script>
```

Floating chat button, bottom-right corner. Auto-detects the page URL for context-aware answers. Lightweight (~30KB), no dependencies, supports dark mode, works on mobile. You can also give it page-specific context:

```html
<script>
  OSAChatWidget.setConfig({
    communityId: 'mne',
    widgetInstructions: 'The user is reading the ICA tutorial.'
  });
</script>
```

**Customizing the assistant**

The whole configuration is one YAML file. You manage it via PRs to our repo. You can change the system prompt, which docs are preloaded, widget appearance, suggested questions, CORS origins for mne.tools, sync schedule, etc. The schema reference documents all options. To enable the widget on mne.tools, uncomment the CORS section in the config:

```yaml
cors_origins:
  - https://mne.tools
  - https://*.mne.tools
```

**What you'd need**

The demo currently uses a shared platform key, so feel free to try it out. For production use on mne.tools, you'd set up your own OpenRouter key.

**Why it matters even if you don't embed it**

Even if you decide not to add the widget to mne.tools, having these MNE resources synced and indexed is valuable for the broader ecosystem. I'm planning to build an arena where a question to one assistant (say, the BIDS assistant about data preparation) can also query the MNE-Python, EEGLAB, FieldTrip, and other onboarded assistants, so the response is much richer for every community. The more knowledge sources we have connected, the better the answers get across the board.

Let me know what you think.
Yes, CircleCI cannot access the backend, at least as of now. This is a protective precaution and can be controlled with the CORS flag (please see the email and also the documentation).
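To make the CORS point above concrete, here is a minimal sketch of how a backend typically matches a browser's `Origin` header against wildcard patterns; this is not OSA's actual implementation, only the allow-list values come from the email in this thread.

```python
from fnmatch import fnmatch

# Allow-list taken from the email in this thread; the matching logic
# itself is an illustrative assumption, not OSA's real code.
ALLOWED_ORIGINS = ["https://mne.tools", "https://*.mne.tools"]

def origin_allowed(origin: str) -> bool:
    """Return True if the request's Origin header matches the allow-list."""
    return any(fnmatch(origin, pattern) for pattern in ALLOWED_ORIGINS)

origin_allowed("https://mne.tools")                    # matches the first pattern
origin_allowed("https://output.circle-artifacts.com")  # matches nothing
```

Until the CORS section of the MNE config is uncommented, a docs preview hosted on a CircleCI artifacts domain would fail a check like this, consistent with the behavior reported above.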
Thanks, very good and necessary suggestion; see OpenScience-Collective/osa#245, it will be added today.
Thanks. I have barely experimented with the prompt and knowledge sources for MNE. Please note that the responses are only as good as the information and the prompts (i.e., how that information is used). I would appreciate it if the community got involved in tuning the responses. Note that creating ad-hoc backends is planned (OpenScience-Collective/osa#219). For now, if you want to see the effect of your changes to the YAML file, you need to merge into the develop branch and wait about 10 to 15 minutes for it to deploy to the backend. I can create a team for MNE, so you can merge into develop without approval for the MNE Assistant directory. Having said that, one advantage of OSA is that it can parse information from many resources that a project has (PRs, issues, docstrings, and even the Discourse server); see the details at https://status.osc.earth/osa/mne.
Not yet, but I am probably presenting OSA at an Open Science Conference next month (if I can make it), organized by a couple of these large communities. A couple of other neuroscience communities have reached out and I am working with them to be onboarded.
Yes, it is a chatbot, with the goal of providing as much transparency as possible about how it is designed and what information it uses, while remaining easy to implement for already stretched open-source maintainers (basic setups require only one YAML file). Any use of AI comes with its own concerns. Yet it seems that we inevitably use AI or AI products on a daily basis. Whether we choose to be active participants or mere users of these products is up to us, for sure.
Similar to almost all parts of OSA, the suggested questions can also be adjusted. Probably even more easily than most, because you can change the questions from the widget script, and even make them specific to the page being served (similarly, you can amend the prompt based on the page as well).
Co-authored-by: Daniel McCloy <dan@mccloy.info>
The widget now has a disclaimer baked in for all communities after being merged into prod (I still need to add an option to remove/adjust it more easily); thanks @larsoner. You can also add or customize the welcome message, and add more caution within the message if needed, by updating the YAML file. Or, more easily: https://feature-issue-245-widget-foo-demo.osc.earth/mne
I imagine that if MNE were to adopt this bot, folks here would be pretty motivated to make sure that the responses are correct, and to make fixes upstream as needed! I'd be interested to learn how difficult it is to get a development environment set up for debugging issues and implementing fixes. For example, I just asked the MNE-Python assistant:
and the bot told me "MNE-Python doesn't have built-in functions specifically for converting eyetracking pixels to visual angle". But we do have a function exactly for this, documented and used in 3 tutorials: `mne.preprocessing.eyetracking.convert_units`. If I have some time in the coming weeks, maybe I can explore how feasible it is to fix/fine-tune this response.
Related to this discussion, we might also want to consider Kapa AI, who offer a similar service which can be free of charge for OSS (https://docs.kapa.ai/kapa-for-open-source). I've used their chatbot on https://docs.pola.rs/api/python/stable/reference/index.html and https://dplyr.tidyverse.org and so far always received very good answers. They also claim that if the bot doesn't have any sources from the docs to back up an answer, it will just say "I don't know" instead of hallucinating something.
Just to add one more example, I tried the OSA chatbot with the following question:
And the answer was completely wrong. The correct answer is that the status channel is not parsed automatically and …
+1 on exploring Kapa. Trying to test use cases and "tune" the AI to provide the correct responses is not something we should have to do. There are systems out there that provide useful responses automatically, and we should evaluate those instead. @neuromechanist I appreciate your effort; this is certainly pushing in the right direction and sparking important discussions! I just believe that this concrete solution here is not quite it (yet!)
I greatly appreciate all the feedback here. Regardless of whether MNE adopts the widget, we will maintain the MNE knowledge base on OSA. It serves users across other onboarded communities (BIDS, NEMAR, EEGLAB, FieldTrip, HED; see OpenScience-Collective/osa#167) who may have MNE-related questions. Having an MNE resource would benefit the larger neuroscience community.

**On Kapa AI**

@cbrnr @hoechenberger evaluating Kapa alongside OSA makes sense. Key differences: OSA is fully open source (MIT); the config is community-owned (a single YAML file); knowledge sources include GitHub issues/PRs, Discourse, and academic papers (not just docs); and a cross-community layer is planned where asking BIDS a question can also query MNE, EEGLAB, and FieldTrip. Kapa also scrapes docs, GitHub, and Discourse. I could not find whether they have infrastructure for traversing citing papers, or for adding specific tools like a BIDS validator. I also did not find any mention of how long they commit to providing this service for free. That said, evaluate both and pick what works best for MNE.

**On commitment and cost**

@drammock I'd suggest framing this as a trial. The widget can be removed with a single commit, no lock-in 🤓. SCCN is happy to cover costs as part of our mission to advance neuroscience tools and our grants' outreach. No financial commitment is required from MNE. If you later want direct cost control, BYOK (Bring Your Own Key) is available.

**On the BDF status channel issue (@cbrnr)**

Knowing that a … What the LLM actually did: 1) identified the premise, 2) searched the function docs, 3) found … To test this, ask the same question with: "How do I parse the …"
thank you for the very detailed response @neuromechanist. I took a few minutes to look at Kapa's and OSA's websites. Here are my "hot takes":
For me, the most pressing question is: who is willing to do the work to test out one or both of these tools, and eventually declare themselves satisfied enough to recommend that we adopt it? I'm not in favor of unleashing an unvetted tool onto our users and hoping for the best.
Thanks @neuromechanist, I really appreciate your answer and your effort! For reference, I asked Claude Sonnet 4.6 (this is not even their best model, but it had the context of the MNE-Python source repository) the same question, and it got it right perfectly. If a model is so brittle that it cannot understand the difference between status (and its connection to TAL and annotations) and …

For the sake of completeness, here's Claude's answer:

<details>
<summary>Details</summary>

Here's how to correctly parse the **Status channel** of a BDF file (typically produced by Biosemi ActiveTwo systems) in MNE-Python:

**1. Reading the file**

```python
import mne

raw = mne.io.read_raw_bdf('my_file.bdf', preload=True)
```

See `edf.py`; the …

**2. The key bit-layout problem**

On Biosemi devices, the 24-bit Status channel has this layout: …

If you call …

**3. Masking trigger bits (bits 0–15)**

```python
# Approach 1: mask post-hoc
events = mne.find_events(raw)
events[:, 2] &= (2**16 - 1)  # keep only bits 0–15

# Approach 2: use the mask parameter directly
events = mne.find_events(raw, mask=2**16 - 1)
```

**4. Checking system codes (bits 16–23)**

```python
events = mne.find_events(raw)
# Example: check the CMS-in-range bit (bit 20)
cms_bit = 20
cms_high = (events[:, 2] & (1 << cms_bit)) != 0
```

**5. Event shift (device-dependent)**

Some triggering devices encode events offset by 8 bits; if you get unexpected event codes, try:

```python
events[:, 2] >>= 8
```

**6. BDF Annotations (TAL)**

If your BDF file contains a …:

```python
events, event_id = mne.events_from_annotations(raw)
```

Summary: the most common mistake is forgetting to mask out bits 16–23 (system codes). Always use …

</details>

I agree with most of Dan's takes. I think the next step for us should be to test Kapa, as we don't know how their model performs (although I would be surprised if it wasn't substantially better). I could take care of the application, and then everybody can test their chatbot and provide feedback (as I assume they will provide a test implementation of their product). I would clearly state that we want to evaluate whether their product is a worthwhile addition for us, so I would not commit to integrating their chatbot without a thorough testing phase.
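Since several comments in this thread hinge on the Biosemi bit layout (trigger codes in bits 0–15, system flags such as the CMS bit in bits 16–23), here is a standalone, MNE-free sketch of that arithmetic; the helper name is made up for illustration.

```python
def split_status_word(status: int) -> tuple[int, int]:
    """Split a 24-bit Biosemi status word into (trigger_code, system_flags).

    Bits 0-15 carry the trigger code; bits 16-23 carry system flags
    (e.g. the CMS-in-range bit at position 20).
    """
    trigger = status & 0xFFFF       # keep bits 0-15
    flags = (status >> 16) & 0xFF   # shift bits 16-23 down
    return trigger, flags

# Trigger code 7 with bit 20 (CMS-in-range) also set:
word = 7 | (1 << 20)
trigger, flags = split_status_word(word)  # trigger == 7, flags == 0b10000
```

Masking with `0xFFFF` (as in the `mne.find_events` examples above) is exactly the `trigger` half of this split; forgetting it lets the `flags` bits leak into your event codes.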
Not exactly (at least in the traditional sense); that is why the docs do not claim it. For the common instructions, it loads all of them; for functions, PRs, Discourse, and paper citations, it queries the database. The assumption is that RAG (where usually some embedding vector is used, docs are chunked, etc.) may not benefit our usually short documents (<10k words/tokens per doc).
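To illustrate the preload-vs-query split described above, here is a minimal sketch. All names are hypothetical and this is not OSA's actual code; it only mirrors the described shape: common instructions always in the prompt, larger corpora fetched on demand rather than chunked and embedded.

```python
# Hypothetical names throughout; illustrative of the described design only.
PRELOADED_INSTRUCTIONS = ["Answer concisely.", "Cite sources with links."]

# Stand-in for the database of functions, PRs, Discourse threads, papers.
DOC_DB = {
    "functions": {"mne.find_events": "Find events from a stim channel."},
    "discourse": {},
}

def build_context(question: str) -> list[str]:
    """Assemble prompt context: preload instructions, query the DB on demand."""
    context = list(PRELOADED_INSTRUCTIONS)  # always included
    for corpus in DOC_DB.values():
        for name, text in corpus.items():
            if name in question:  # naive relevance check, just for the sketch
                context.append(text)
    return context
```

The design trade-off this sketches: for short documents, pulling whole entries by lookup avoids the chunking and embedding machinery of classic RAG.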
Nice reference. If you were to ask the same question of OSA with Sonnet 4.6, it would give you a similar answer:

> How do I parse the status channel of a BDF file correctly?

## Parsing the BDF Status Channel

In BioSemi BDF files, the Status channel encodes trigger/event information as a 24-bit integer. The key is applying the correct bitmask when extracting events.

**The Core Issue: Bitmask**

The Status channel contains more than just trigger codes: bits 16–23 encode system flags. You typically only want the lower 16 bits (or lower 8 bits for simple triggers):

```python
import mne

raw = mne.io.read_raw_bdf('your_file.bdf', preload=True)

# Find events, masking to the lower 16 bits
events = mne.find_events(raw, stim_channel='Status', mask=0xFFFF)

# Or if only 8-bit trigger codes are used:
events = mne.find_events(raw, stim_channel='Status', mask=0xFF)
```

**Why This Matters**

Without the mask, spurious "events" appear because status bits (like the "new epoch" bit at position 20) toggle independently of your triggers.

**Additional Options**

```python
events = mne.find_events(raw, stim_channel='Status',
                         mask=0xFFFF, uint_cast=True)
```

Follow-up questions to consider: …
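As a standalone illustration of why a `uint_cast=True` option like the one mentioned above exists: if a 24-bit status word with its top bit set is read as a signed integer it comes out negative, and reinterpreting the same bits as unsigned recovers the intended value. This is pure-Python arithmetic with made-up helper names, not MNE internals.

```python
def to_signed24(value: int) -> int:
    """Interpret the low 24 bits of `value` as a signed 24-bit integer."""
    value &= 0xFFFFFF
    return value - 0x1000000 if value & 0x800000 else value

def to_unsigned24(value: int) -> int:
    """Reinterpret a (possibly negative) signed 24-bit value as unsigned."""
    return value & 0xFFFFFF

raw_word = 0x800007                 # bit 23 set, plus trigger code 7
signed = to_signed24(raw_word)      # negative when read as signed
restored = to_unsigned24(signed)    # back to 0x800007
```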
This answer cost 8 cents and used about 15k tokens (with three tool calls, using caching with a 90% discount on the tool calls). In contrast, using Haiku 4.5 (which is the default model for OSA) cost 2 cents. Part of making comparisons is controlling for the test parameters: the model, context amount, incurred cost, and caching all change the calculus. @drammock, @scott-huberty: a complication for OSA development has been access to the DB (creating new ones is time-consuming and usually requires setting up API keys, an environment, etc.). This will be resolved in the upcoming 0.7.2. With the OSA CLI, you will be able to interact with the DB while running the LLM calls locally (even using local LLMs if you like), test tuning instructions, mirror the DB and get write access to the mirror to tweak it, or even clone the DB for local use.
This explains a lot! I didn't know the default model, and obviously Haiku is a lot worse than Sonnet. I agree that we need to control for all these parameters in a comparison, but at the end of the day what matters is the quality of the answers, and it seems like a better base model than Haiku 4.5 is necessary. If we can use Sonnet 4.6 (I only found 4.5 in the options, but there shouldn't be that much difference), that's a whole different story!
Summary

Details

The widget is a lightweight floating chat button that appears on all documentation pages. It:

- uses the `mne` community configuration from OSA

The widget is served from `demo.osc.earth` and configured via the `communityId: 'mne'` setting, which auto-configures the API endpoint, title, theme color, and initial greeting.

Test plan

- `mne.tools` and `*.mne.tools` are enabled on the OSA backend
- local doc build (`make html`)