Chapter 25 — Debugging Guide
This chapter covers the most common issues encountered when integrating and debugging the OpenLCB C Library, with symptoms, diagnosis steps, and solutions.
25.1 Buffer Pool Exhaustion
Symptoms
- Buffer allocation functions return NULL.
- Messages are silently dropped: no response is sent to incoming requests.
- Datagrams fail after a burst of activity.
- Node appears to stop responding after heavy traffic.
Diagnosis
Use the peak allocation counters provided by the buffer store modules. These track the high-water mark for each pool type during runtime:
```c
// Check peak allocation for each pool type
// OpenLCB buffers:
OpenLcbBufferStore_get_peak_allocated_count(BASIC);
OpenLcbBufferStore_get_peak_allocated_count(DATAGRAM);
OpenLcbBufferStore_get_peak_allocated_count(SNIP);
OpenLcbBufferStore_get_peak_allocated_count(STREAM);
// CAN buffers:
CanBufferStore_get_peak_allocated_count();
```
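If a pool's peak count equals its configured depth, every buffer in that pool was in use at the same time. Any allocation attempted at that moment would have failed.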
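You can automate this comparison after a stress test. The sketch below assumes you copy each peak value into a small table together with the configured depth. The peak values come from OpenLcbBufferStore_get_peak_allocated_count() or CanBufferStore_get_peak_allocated_count(). The names pool_usage_t, pool_was_exhausted(), and report_pool_usage() are hypothetical helpers, not part of the library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical snapshot of one pool: its configured depth (the matching
 * USER_DEFINED_*_BUFFER_DEPTH value) and the peak count read at runtime. */
typedef struct {
    const char *name;
    uint16_t depth;
    uint16_t peak;
} pool_usage_t;

/* A pool whose peak reached its depth was completely in use at least once. */
static bool pool_was_exhausted(const pool_usage_t *pool) {
    return pool->peak >= pool->depth;
}

/* Prints one line per pool and returns how many pools hit their limit. */
static int report_pool_usage(const pool_usage_t *pools, int count) {
    int exhausted = 0;
    for (int i = 0; i < count; i++) {
        const pool_usage_t *p = &pools[i];
        assert(p->depth > 0); /* a zero-depth pool would divide by zero */
        bool full = pool_was_exhausted(p);
        printf("%-9s peak %3u / depth %3u (%3u%%)%s\n",
               p->name, (unsigned)p->peak, (unsigned)p->depth,
               (unsigned)(100u * p->peak / p->depth),
               full ? "  <-- exhausted, increase depth" : "");
        if (full) {
            exhausted++;
        }
    }
    return exhausted;
}
```

A nonzero return value means at least one depth macro from the table below needs raising. A peak that sits just under its depth also leaves very little headroom for the next burst.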
Tuning
| Pool | Macro | Increase When |
|---|---|---|
| BASIC (16B) | USER_DEFINED_BASIC_BUFFER_DEPTH | Many concurrent simple messages (Verify Node ID, events) |
| DATAGRAM (72B) | USER_DEFINED_DATAGRAM_BUFFER_DEPTH | Multiple overlapping datagram transfers |
| SNIP (256B) | USER_DEFINED_SNIP_BUFFER_DEPTH | Multiple concurrent SNIP requests or events with payload |
| STREAM (512B) | USER_DEFINED_STREAM_BUFFER_DEPTH | Stream transfers (future use) |
| CAN frames | USER_DEFINED_CAN_MSG_BUFFER_DEPTH | CAN RX FIFO overflows (ISR producing faster than main loop consumes) |
8-bit Target Limit
Total buffer count across all pools must not exceed 126 on 8-bit processors. This is enforced at compile time by openlcb_config.h.
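All pools draw on this shared 126-buffer budget, so raising one pool's depth leaves less room for the others. The sketch below uses made-up depths to show a C11 `_Static_assert` that catches an over-budget configuration at compile time. It is not the code in openlcb_config.h. TOTAL_OPENLCB_BUFFERS and EIGHT_BIT_MAX_TOTAL_BUFFERS are hypothetical names. Check openlcb_config.h to confirm whether CAN frame buffers also count toward the limit.

```c
#include <assert.h>

/* Illustrative depths for an 8-bit target; choose your own values. */
#define USER_DEFINED_BASIC_BUFFER_DEPTH    32
#define USER_DEFINED_DATAGRAM_BUFFER_DEPTH 4
#define USER_DEFINED_SNIP_BUFFER_DEPTH     4
#define USER_DEFINED_STREAM_BUFFER_DEPTH   1

/* Assumption: the limit covers these four OpenLCB pools. */
#define TOTAL_OPENLCB_BUFFERS (USER_DEFINED_BASIC_BUFFER_DEPTH +    \
                               USER_DEFINED_DATAGRAM_BUFFER_DEPTH + \
                               USER_DEFINED_SNIP_BUFFER_DEPTH +     \
                               USER_DEFINED_STREAM_BUFFER_DEPTH)

/* Hypothetical name for the 126-buffer limit described above. */
#define EIGHT_BIT_MAX_TOTAL_BUFFERS 126

/* Fails the build, rather than the node at runtime, when the budget is exceeded. */
_Static_assert(TOTAL_OPENLCB_BUFFERS <= EIGHT_BIT_MAX_TOTAL_BUFFERS,
               "Too many buffers for an 8-bit target: reduce pool depths");
```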
25.2 Alias Collisions
Symptoms
- Node repeatedly restarts login (CID frames seen repeatedly).
- Node never reaches RUNSTATE_RUN.
- Intermittent communication loss with a specific node.
Detection Flow
```mermaid
flowchart TD
    A["CAN RX ISR receives frame"] --> B{"Source alias matches one of our reserved aliases?"}
    B -->|Yes| C["Set entry.is_duplicate = true"]
    C --> D["Set has_duplicate_alias = true"]
    D --> E["Main loop detects flag"]
    E --> F["Scan table for is_duplicate entries"]
    F --> G["Unregister old alias, re-seed, restart login"]
    B -->|No| H["Normal processing"]
```
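In the flowchart, the ISR only sets flags and the main loop does the actual work. That split can be sketched as follows. The table layout, alias_entry_t, and both function names are hypothetical simplifications. Only the is_duplicate and has_duplicate_alias flags come from the flow above. A production version must also protect the shared table from concurrent ISR access, for example by briefly disabling the CAN RX interrupt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAPPING_COUNT 4 /* hypothetical table size */

/* Hypothetical, simplified alias table entry. */
typedef struct {
    uint16_t alias;        /* 12-bit CAN alias; 0 marks an unused slot here */
    bool     is_duplicate; /* set by the ISR, cleared by the main loop */
} alias_entry_t;

/* volatile: both fields are written from interrupt context. */
static volatile alias_entry_t alias_table[MAPPING_COUNT];
static volatile bool has_duplicate_alias = false;

/* ISR side: only flag the collision; defer the expensive work. */
static void on_can_frame_received_isr(uint16_t source_alias) {
    for (int i = 0; i < MAPPING_COUNT; i++) {
        if (alias_table[i].alias != 0 && alias_table[i].alias == source_alias) {
            alias_table[i].is_duplicate = true;
            has_duplicate_alias = true;
        }
    }
}

/* Main-loop side: returns the number of entries whose login was restarted. */
static int handle_duplicate_aliases(void) {
    if (!has_duplicate_alias) {
        return 0;
    }
    /* Clear before scanning so a collision flagged mid-scan is seen next pass. */
    has_duplicate_alias = false;
    int restarted = 0;
    for (int i = 0; i < MAPPING_COUNT; i++) {
        if (alias_table[i].is_duplicate) {
            alias_table[i].alias = 0; /* unregister the old alias */
            alias_table[i].is_duplicate = false;
            /* A real node would re-seed and restart login (CID frames) here. */
            restarted++;
        }
    }
    return restarted;
}
```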
Log Points
- Add logging in AliasMappings_set_has_duplicate_alias_flag() to see when collisions are detected.
- Add logging in the on_alias_change callback to track alias changes.
- Monitor the run_state field of each node; a node bouncing between INIT/GENERATE_SEED/GENERATE_ALIAS indicates repeated collisions.
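You can automate the last log point with a small monitor. It counts how often a node drops back into alias generation after it has moved past that phase. This is a hypothetical helper. run_state_t and its values stand in for the library's RUNSTATE_* constants, whose numeric values differ. The helper also assumes the node is sampled at least once while it is in one of the alias-generation states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; these values do NOT match the library's RUNSTATE_*. */
typedef enum {
    STATE_INIT,
    STATE_GENERATE_SEED,
    STATE_GENERATE_ALIAS,
    STATE_LOGIN_OTHER, /* any later login step */
    STATE_RUN
} run_state_t;

typedef struct {
    run_state_t last_state; /* zero-initialized to STATE_INIT */
    uint16_t    restarts;   /* falls back into alias generation */
} login_monitor_t;

static bool is_alias_generation_state(run_state_t s) {
    return s == STATE_INIT || s == STATE_GENERATE_SEED || s == STATE_GENERATE_ALIAS;
}

/* Call once per main-loop pass with the node's current run_state. The first
 * login is not counted because last_state starts in STATE_INIT. */
static void login_monitor_update(login_monitor_t *mon, run_state_t state) {
    if (is_alias_generation_state(state) &&
        !is_alias_generation_state(mon->last_state)) {
        mon->restarts++;
    }
    mon->last_state = state;
}
```

Logging restarts whenever it increases gives you a collision count per node without instrumenting the CAN driver.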
25.3 Datagram Assembly Timeout
Symptoms
- Multi-frame datagrams are never completed.
- A Datagram Rejected message with a temporary error code is sent after 3 seconds.
- Configuration memory reads/writes fail intermittently.
Causes
| Cause | Diagnosis | Fix |
|---|---|---|
| Main loop too slow | The 100 ms tick keeps advancing while received frames wait to be processed | Speed up the main loop; reduce other work between OpenLcb_run() calls |
| CAN bus error | Middle or Final frame lost in transit | Check CAN bus termination, cable length, baud rate |
| Buffer exhaustion | CAN buffer pool depleted, incoming frames dropped | Increase USER_DEFINED_CAN_MSG_BUFFER_DEPTH |
| Sender paused too long | Sender inserts delay between datagram frames | Sender must send all datagram frames consecutively |
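Every cause in the table ends the same way: the 3-second window expires before the Final frame is processed. In 100 ms ticks, that window is 3000 / 100 = 30 ticks. The sketch below shows such a check. It assumes the assembly start is stamped with an 8-bit tick counter. The struct, macros, and function name are hypothetical, not the library's internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TICK_PERIOD_MS               100u
#define DATAGRAM_ASSEMBLY_TIMEOUT_MS 3000u
/* 3000 ms / 100 ms = 30 ticks */
#define DATAGRAM_ASSEMBLY_TIMEOUT_TICKS (DATAGRAM_ASSEMBLY_TIMEOUT_MS / TICK_PERIOD_MS)

/* Hypothetical per-datagram assembly state. */
typedef struct {
    bool    in_progress;
    uint8_t start_tick; /* tick count when the First frame arrived */
} datagram_assembly_t;

/* Casting the difference back to uint8_t keeps elapsed correct when the
 * 8-bit tick counter wraps from 255 to 0. */
static bool datagram_assembly_timed_out(const datagram_assembly_t *d, uint8_t now_tick) {
    uint8_t elapsed = (uint8_t)(now_tick - d->start_tick);
    return d->in_progress && elapsed >= DATAGRAM_ASSEMBLY_TIMEOUT_TICKS;
}
```

The tick counter advances regardless of how busy the main loop is. A Final frame that sits unprocessed in the CAN buffer therefore still "ages", which is why a slow main loop can trip this check even on a healthy bus.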
25.4 State Machine Stuck
How to Identify
Check the run_state field of each node. A node that stays in a non-RUN state for an extended period is stuck:
| Stuck State | Likely Cause |
|---|---|
| RUNSTATE_WAIT_200ms (7) | 100 ms timer tick not incrementing. Check that OpenLcb_100ms_timer_tick() is being called. |
| RUNSTATE_LOAD_CHECK_ID_* (3-6) | CAN TX hardware not sending frames. Check transmit_raw_can_frame() and is_tx_buffer_clear(). |
| RUNSTATE_LOAD_CONSUMER/PRODUCER_EVENTS (11-12) | Event enumeration stuck. Check that event lists are properly initialized in node_parameters_t. |
| RUNSTATE_LOGIN_COMPLETE (13) | on_login_complete callback returning false. The callback must return true to allow the node to proceed to RUN. |
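Instead of inspecting run_state by hand, you can use a small watchdog that reports which state a node is stuck in and then look that state up in the table above. This sketch makes three assumptions. First, run_state can be read as a uint8_t. Second, the caller supplies the numeric value of RUNSTATE_RUN, since this sketch does not assume it. Third, state_watchdog_t and state_watchdog_tick() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t  last_state;     /* run_state seen on the previous call */
    uint16_t ticks_in_state; /* calls spent in last_state since entering it */
} state_watchdog_t;

/* Call at a fixed rate with the node's current run_state. run_state_value is
 * the numeric value of RUNSTATE_RUN in your build. Returns true once a non-RUN
 * state has lasted at least limit_ticks calls. */
static bool state_watchdog_tick(state_watchdog_t *wd, uint8_t state,
                                uint8_t run_state_value, uint16_t limit_ticks) {
    if (state != wd->last_state) {
        wd->last_state = state; /* new state: restart the count */
        wd->ticks_in_state = 0;
    } else if (wd->ticks_in_state < UINT16_MAX) {
        wd->ticks_in_state++;   /* saturate instead of wrapping */
    }
    return state != run_state_value && wd->ticks_in_state >= limit_ticks;
}
```

Drive the watchdog from a time base that is independent of OpenLcb_100ms_timer_tick(). If it shared that tick, the RUNSTATE_WAIT_200ms failure (tick not being called) would silence the watchdog as well. When it fires, log wd.last_state.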
25.5 Common Mistakes
When Adding Protocols
- Forgetting to wire the handler into the main state machine interface struct (messages with that MTI are then answered with Optional Interaction Rejected, OIR).
- Not adding the compile-time feature guard around the handler and its wiring.
- Using the wrong buffer type for the response (e.g., BASIC for a datagram reply that needs 72 bytes).
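The first two mistakes are easiest to see in a cut-down model of the dispatch table. In this hypothetical sketch, a handler slot left NULL makes the dispatcher fall through to an Optional Interaction Rejected reply. The names protocol_interface_t, dispatch(), and FEATURE_SNIP_ENABLED are illustrative, not the library's. The MTI values are the standard OpenLCB ones for Simple Node Information Request and Datagram.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MTI_SIMPLE_NODE_INFO_REQUEST 0x0DE8
#define MTI_DATAGRAM                 0x1C48

#define FEATURE_SNIP_ENABLED 1 /* hypothetical compile-time feature guard */

typedef void (*mti_handler_t)(uint16_t mti);

/* Hypothetical interface struct: one handler slot per supported MTI. */
typedef struct {
    mti_handler_t snip_request;
    mti_handler_t datagram;
} protocol_interface_t;

static int handled_count = 0; /* messages a handler accepted */
static int oir_count = 0;     /* simulated OIR replies sent */

#ifdef FEATURE_SNIP_ENABLED
static void handle_snip_request(uint16_t mti) {
    (void)mti;
    handled_count++;
}
#endif

static void send_optional_interaction_rejected(uint16_t mti) {
    (void)mti;
    oir_count++;
}

/* The guard wraps both the handler and its wiring. */
static const protocol_interface_t node_interface = {
#ifdef FEATURE_SNIP_ENABLED
    .snip_request = handle_snip_request,
#endif
    .datagram = NULL, /* never wired: every datagram gets OIR */
};

static void dispatch(const protocol_interface_t *iface, uint16_t mti) {
    mti_handler_t handler = NULL;
    switch (mti) {
        case MTI_SIMPLE_NODE_INFO_REQUEST: handler = iface->snip_request; break;
        case MTI_DATAGRAM:                 handler = iface->datagram;     break;
        default:                           break;
    }
    if (handler != NULL) {
        handler(mti);
    } else {
        send_optional_interaction_rejected(mti); /* unwired or unknown MTI */
    }
}
```

Because the same #ifdef wraps both the handler definition and the struct initializer, disabling the feature removes the code and leaves the slot NULL. The node then rejects that MTI instead of failing to link.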
When Adding Nodes
- Not increasing USER_DEFINED_NODE_BUFFER_DEPTH when adding more virtual nodes.
- Not increasing ALIAS_MAPPING_BUFFER_DEPTH (it defaults to the node buffer depth, so this is usually automatic).
- Forgetting to set up the node_parameters_t with correct protocol support bits, SNIP data, and event lists.
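If you override ALIAS_MAPPING_BUFFER_DEPTH explicitly, a compile-time check keeps it in step with the node count. The sketch below assumes a C11 compiler. The default-then-assert pattern mirrors the default described above, but it is not copied from the library's headers, and the depth value is illustrative.

```c
#include <assert.h>

/* Illustrative value from a user configuration. */
#define USER_DEFINED_NODE_BUFFER_DEPTH 4

/* The default described above: one alias mapping slot per virtual node. */
#ifndef ALIAS_MAPPING_BUFFER_DEPTH
#define ALIAS_MAPPING_BUFFER_DEPTH USER_DEFINED_NODE_BUFFER_DEPTH
#endif

/* Assumption: each virtual node needs its own alias mapping, so an explicit
 * override smaller than the node count is almost certainly a mistake. */
_Static_assert(ALIAS_MAPPING_BUFFER_DEPTH >= USER_DEFINED_NODE_BUFFER_DEPTH,
               "ALIAS_MAPPING_BUFFER_DEPTH is smaller than the virtual node count");
```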
25.6 Google Test Patterns
The library's test suite uses Google Test (gTest) with a consistent pattern for mocking the callback interfaces:
- Create a mock interface struct with function pointers set to test-specific implementations.
- Call Module_initialize(&mock_interface) to inject the test dependencies.
- Call the function under test with known inputs.
- Assert on the outputs and on any side effects captured by the mock functions.
```cpp
// Example test pattern (simplified)
#include "gtest/gtest.h"
// Also include the AliasMappings module header (wrapped in extern "C" if it
// lacks its own guards); its path depends on your project layout.

TEST(AliasMappings, RegisterAndLookup) {
    AliasMappings_initialize();

    // Register an alias
    alias_mapping_t *result = AliasMappings_register(0x3AB, 0x050101010001ULL);
    ASSERT_NE(result, nullptr);
    EXPECT_EQ(result->alias, 0x3AB);
    EXPECT_EQ(result->node_id, 0x050101010001ULL);

    // Lookup by alias returns the same entry
    alias_mapping_t *found = AliasMappings_find_mapping_by_alias(0x3AB);
    EXPECT_EQ(found, result);

    // Lookup by Node ID returns the same entry
    found = AliasMappings_find_mapping_by_node_id(0x050101010001ULL);
    EXPECT_EQ(found, result);
}
```
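In the example above, AliasMappings_initialize() takes no interface, so that test skips the mock steps of the pattern. For a module with callbacks, the injection looks like the following sketch. It is written in plain C with assert() instead of gTest so that it stands alone. Sender_initialize(), Sender_send(), and sender_interface_t are hypothetical stand-ins for a real module and its interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ---- Module under test (hypothetical) ---- */
typedef struct {
    bool (*transmit_frame)(uint32_t identifier); /* dependency to mock */
} sender_interface_t;

static const sender_interface_t *sender_iface = NULL;

/* Step 2 of the pattern: the test injects its dependencies here. */
static void Sender_initialize(const sender_interface_t *iface) {
    sender_iface = iface;
}

/* Sends one frame; returns false if the transport refused it. */
static bool Sender_send(uint32_t identifier) {
    assert(sender_iface != NULL); /* Sender_initialize() must run first */
    return sender_iface->transmit_frame(identifier);
}

/* ---- Test side: a mock that records every call ---- */
static int      mock_call_count = 0;
static uint32_t mock_last_identifier = 0;
static bool     mock_result = true; /* what the mock "transport" returns */

static bool mock_transmit_frame(uint32_t identifier) {
    mock_call_count++;
    mock_last_identifier = identifier;
    return mock_result;
}

/* Step 1 of the pattern: function pointers set to test implementations. */
static const sender_interface_t mock_interface = {
    .transmit_frame = mock_transmit_frame,
};
```

In a gTest file, the same mock functions and struct would sit at file scope, and each TEST would reset the counters before calling the module's initialize function.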