Senior · 13 min read · June 13, 2026

Error Boundaries and Failure Isolation in Micro Frontends

Learn how to design failure isolation in micro frontend architecture using error boundaries, fallback UI, remote loading failure handling, retry strategies, monitoring, rollback, and graceful degradation.

Tags: Micro Frontends, Error Boundaries, Failure Isolation, Frontend Architecture, Reliability, Interview Prep

One of the biggest promises of micro frontends is independent ownership.

But independent ownership is not enough.

A production-ready micro frontend architecture must also provide failure isolation.

That means:

If one remote fails, the whole application should not crash.

In a large frontend system, failures are normal.

A remote may fail because:

  • remoteEntry.js is unavailable.
  • A JavaScript chunk fails to load.
  • A deployed remote has a runtime bug.
  • A shared dependency version is incompatible.
  • An API used by the remote is down.
  • A third-party script breaks rendering.
  • A remote throws during render.

A good micro frontend system should handle these failures gracefully.

The user should still see a stable shell, navigation should remain usable, and the owning team should receive actionable monitoring data.

1. Why Failure Isolation Matters

In a frontend monolith, one runtime error can break the whole app.

In micro frontends, we want a better outcome.

Bad result:

Cart Remote crashes.
Entire website becomes blank.
User cannot navigate anywhere.
No team knows what failed.

Good result:

Cart Remote crashes.
Shell remains stable.
Header and navigation still work.
Cart fallback appears.
Error is logged with remote name and version.
Cart team is alerted.
Rollback is possible.

Failure isolation protects:

  • User experience
  • Revenue-critical journeys
  • Debugging speed
  • Team ownership
  • Production reliability

Strong interview phrase:

Micro frontends should reduce the blast radius of frontend failures.

2. Common Failure Types

Micro frontend failures can happen at different stages.

| Failure Type | Example |
|---|---|
| Manifest failure | Shell cannot fetch remote manifest |
| Remote entry failure | remoteEntry.js returns 404 |
| Chunk loading failure | Remote entry loads but child chunk fails |
| Runtime render failure | Remote component throws during render |
| Dependency conflict | Duplicate/incompatible React versions |
| API failure | Remote API returns 500 |
| Auth failure | Session expires during remote load |
| CSS failure | Styles do not load or cause layout issues |
| Third-party failure | Payment/recommendation script fails |
| Contract mismatch | Shell expects one prop/event, remote changed it |

A production strategy should handle more than simple React render errors.

3. Failure Isolation Principle

The core principle is:

The shell should isolate remote failures and keep the rest of the product usable.

The shell should:

  • Wrap each remote with an error boundary.
  • Handle loading failures.
  • Show useful fallback UI.
  • Keep global navigation alive.
  • Log remote failure details.
  • Support retry where appropriate.
  • Support rollback for critical remotes.

The remote should:

  • Handle local domain errors.
  • Show domain-specific empty/error states.
  • Avoid crashing the shell.
  • Log domain errors with context.

The backend should:

  • Return meaningful API error responses.
  • Protect state consistency.
  • Avoid exposing sensitive errors.

Failure isolation is a shared responsibility.

4. High-Level Architecture

A safe micro frontend layout:

                    ┌──────────────────────┐
                    │      Shell App        │
                    │ Header/Nav/Auth       │
                    └──────────┬───────────┘
                               │
                  ┌────────────┴────────────┐
                  │  Remote Error Boundary  │
                  └────────────┬────────────┘
                               │
                               ▼
                    ┌──────────────────────┐
                    │     Cart Remote       │
                    └──────────────────────┘

If the Cart Remote fails:

  • Shell App remains alive.
  • Remote Error Boundary catches the failure.
  • Fallback UI appears in the content area.
  • Header and navigation remain usable.

The user should not see a blank white page.

5. Error Boundaries

In React, error boundaries catch errors thrown during rendering, in lifecycle methods, and in constructors of the components below them.

They help handle failures like:

  • Component throws during render.
  • Lifecycle error.
  • Unexpected undefined value.
  • Broken remote component.

Example concept:

<RemoteErrorBoundary remoteName="cartApp">
  <CartRemote />
</RemoteErrorBoundary>

If CartRemote throws, the boundary catches it and renders fallback UI.

Important:

Error boundaries catch render errors, but not every possible failure.

They do not automatically catch:

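  • Rejected promises and other asynchronous errors, such as those thrown in setTimeout callbacks
  • Network failures that happen before the component loads, unless the load goes through React.lazy, which rethrows a rejected import during render
  • Errors thrown in event handlers, which need their own try/catch
  • API failures, unless the remote surfaces them into render state
  • Remote entry load failures, unless the loader is wrapped separately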

So micro frontends need both error boundaries and remote loading error handling.

6. Where to Place Error Boundaries

Place boundaries at multiple levels.

Recommended:

  • Shell-level boundary
  • Route-level remote boundary
  • Remote-level section boundary
  • Critical component boundary

Example:

Shell App
├── Global Error Boundary
├── Header
├── Navigation
└── Route Content
    ├── Catalog Remote Boundary
    │   └── Catalog Remote
    ├── Cart Remote Boundary
    │   └── Cart Remote
    └── Checkout Remote Boundary
        └── Checkout Remote

This prevents one remote from taking down the full application.

7. Global vs Remote Error Boundary

Global Error Boundary

Purpose:

Protect the entire shell from unexpected application-level crashes.

It should catch rare platform-level failures.

Remote Error Boundary

Purpose:

Protect the shell from a specific remote crash.

This boundary matters more for micro frontends because it contains a crash to a single remote.

Example:

Cart Remote throws
      │
      ▼
Cart Boundary catches it
      │
      ▼
Cart fallback appears
      │
      ▼
Shell remains usable

If only a global boundary exists, a crash in any one remote can replace the whole app with a generic error screen.

Remote-level boundaries provide better isolation.

8. Remote Loading Failure

A remote can fail before React even renders it.

Example:

Shell tries to load:
https://cdn.company.com/cart/1.8.5/remoteEntry.js

Request fails:
404 Not Found

This is not a normal render error.

The shell must handle remote loading states:

  • Loading
  • Loaded
  • Failed
  • Timed out
  • Retrying
  • Fallback

Remote loading failure should not crash the shell.

Expected behavior:

Remote fails to load
      │
      ▼
Shell shows fallback UI
      │
      ▼
Error logged with remote URL/version
      │
      ▼
User can navigate elsewhere
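
The loading states listed above can be modeled as a small state machine so the shell never ends up in an undefined state. The following is a minimal sketch, not a prescribed implementation: the status and event names are illustrative assumptions, and a real shell would attach data such as the error and the attempt count to each state.

```typescript
// Illustrative status and event names; they mirror the list of loading states.
type RemoteStatus = "loading" | "loaded" | "failed" | "timedOut" | "retrying" | "fallback";
type RemoteEvent = "success" | "error" | "timeout" | "retry" | "giveUp";

// Allowed transitions. Any event not listed for a status is ignored.
const transitions: Record<RemoteStatus, Partial<Record<RemoteEvent, RemoteStatus>>> = {
  loading: { success: "loaded", error: "failed", timeout: "timedOut" },
  retrying: { success: "loaded", error: "failed", timeout: "timedOut" },
  failed: { retry: "retrying", giveUp: "fallback" },
  timedOut: { retry: "retrying", giveUp: "fallback" },
  fallback: { retry: "retrying" }, // for example, the user pressed "Try again"
  loaded: {},
};

function transition(status: RemoteStatus, event: RemoteEvent): RemoteStatus {
  return transitions[status][event] ?? status;
}
```

Because unknown events leave the status unchanged, a late error from an abandoned request cannot push an already loaded remote back into a failure state.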

9. Chunk Load Failure

Sometimes remoteEntry.js loads, but a child chunk fails.

Example:

remoteEntry.js loads successfully.
main.a82d91.js fails.

Possible reasons:

  • CDN cache issue
  • Deleted old artifact
  • Network instability
  • Bad deployment
  • Version mismatch

The shell should treat this as a remote failure.

Log:

remoteName
remoteVersion
chunkUrl
route
shellVersion
errorType: ChunkLoadError

Chunk load errors are common in production, particularly just after a deployment, when open tabs still request chunks that the new release has removed.
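
One way to produce the log fields above is to classify the caught error before sending it to monitoring. This sketch assumes the remote is built with webpack, whose runtime is understood to name failed chunk loads `ChunkLoadError` and to attach the failing URL as `request`; both are read defensively here. The `RemoteContext` shape and the `UnknownRemoteError` label are illustrative assumptions.

```typescript
interface RemoteContext {
  remoteName: string;
  remoteVersion: string;
  route: string;
  shellVersion: string;
}

interface RemoteFailureLog extends RemoteContext {
  chunkUrl: string | undefined;
  errorType: "ChunkLoadError" | "UnknownRemoteError";
}

function toFailureLog(error: unknown, context: RemoteContext): RemoteFailureLog {
  // Assumption: the bundler runtime sets error.name to "ChunkLoadError".
  const isChunkError = error instanceof Error && error.name === "ChunkLoadError";
  // Assumption: the failing chunk URL, if present, is exposed as `request`.
  const request = (error as { request?: unknown } | null)?.request;
  return {
    ...context,
    chunkUrl: typeof request === "string" ? request : undefined,
    errorType: isChunkError ? "ChunkLoadError" : "UnknownRemoteError",
  };
}
```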

10. Timeout Handling

A remote may not fail immediately.

It may hang.

Example:

Checkout Remote takes 20 seconds to load.

The shell should define timeouts.

Example policy:

| Remote | Timeout |
|---|---|
| Marketing widget | 2 seconds |
| Catalog page | 5 seconds |
| Cart page | 5 seconds |
| Checkout page | 8 seconds |
| Recommendation widget | 2 seconds |

Timeout behavior:

  • Show fallback UI.
  • Log timeout.
  • Allow retry.
  • Do not block shell forever.

Critical remotes may get longer timeouts, but not infinite waiting.
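
A timeout policy like the one above can be enforced by racing the remote load against a timer. The sketch below is one possible shape: the policy keys, the default of 5 seconds for unlisted remotes, and the `RemoteTimeoutError` name are assumptions made for illustration.

```typescript
// Timeouts in milliseconds, taken from the example policy table.
const timeoutPolicyMs: Record<string, number> = {
  marketingWidget: 2000,
  catalog: 5000,
  cart: 5000,
  checkout: 8000,
  recommendations: 2000,
};
const DEFAULT_TIMEOUT_MS = 5000; // assumption for remotes without an entry

function timeoutFor(remoteName: string): number {
  return timeoutPolicyMs[remoteName] ?? DEFAULT_TIMEOUT_MS;
}

function withTimeout<T>(
  load: Promise<T>,
  remoteName: string,
  ms: number = timeoutFor(remoteName),
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      const error = new Error(`${remoteName} did not load within ${ms}ms`);
      error.name = "RemoteTimeoutError";
      reject(error);
    }, ms);
    // Whichever settles first wins; the timer is cleared so it cannot fire later.
    load.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}
```

Note that the timeout only stops the shell from waiting. The underlying request may still complete in the background, so the shell should ignore a late result once it has shown the fallback.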

11. Fallback UI Design

Fallback UI should be domain-specific.

Bad fallback:

Something went wrong.

Better fallback:

We are having trouble loading your cart. Your items are safe. Please refresh or try again.

Fallback should answer:

  • What failed?
  • Can the user continue?
  • Should the user retry?
  • Is their data safe?
  • What can they do next?

For non-critical widgets:

Recommendations are temporarily unavailable.

For critical flows:

Checkout is temporarily unavailable. Your cart is saved. Please try again shortly.

The fallback should reduce panic.

12. Critical vs Non-Critical Remotes

Not all remotes have the same business importance.

| Remote | Criticality | Failure Strategy |
|---|---|---|
| Marketing banner | Low | Hide section |
| Recommendations | Low | Hide or fallback |
| Catalog | Medium/high | Show retry and fallback |
| Cart | High | Show safe fallback |
| Checkout | Critical | Alert, retry, rollback |
| Profile | Medium | Show domain fallback |
| Orders | Medium | Show retry/no-access fallback |

For low-risk remotes, graceful disappearance may be enough.

For checkout, failure should trigger alerts and rollback consideration.
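
The table above can live in the shell as data, so the failure strategy for a remote is looked up rather than hard-coded at each mount point. This is a sketch under assumptions: the remote keys, the action names, and the default policy for unlisted remotes are all illustrative.

```typescript
type Criticality = "low" | "medium" | "high" | "critical";
type FailureAction = "hide" | "fallback" | "retry" | "alert" | "rollback";

interface RemotePolicy {
  criticality: Criticality;
  onFailure: FailureAction[];
}

const remotePolicies: Record<string, RemotePolicy> = {
  marketingBanner: { criticality: "low", onFailure: ["hide"] },
  recommendations: { criticality: "low", onFailure: ["hide", "fallback"] },
  catalog: { criticality: "high", onFailure: ["retry", "fallback"] }, // table says medium/high
  cart: { criticality: "high", onFailure: ["fallback"] },
  checkout: { criticality: "critical", onFailure: ["alert", "retry", "rollback"] },
  profile: { criticality: "medium", onFailure: ["fallback"] },
  orders: { criticality: "medium", onFailure: ["retry", "fallback"] },
};

// Assumption: a remote with no entry gets a visible fallback rather than silence.
const defaultPolicy: RemotePolicy = { criticality: "medium", onFailure: ["fallback"] };

function policyFor(remoteName: string): RemotePolicy {
  return remotePolicies[remoteName] ?? defaultPolicy;
}

function needsAlert(remoteName: string): boolean {
  return policyFor(remoteName).onFailure.indexOf("alert") !== -1;
}
```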

13. Graceful Degradation

Graceful degradation means the product still provides value when part of it fails.

Example:

Recommendations fail.
Product page still loads.

Another example:

Cart badge fails.
Header still renders.
Cart page can still open.

Bad degradation:

Recommendation widget fails.
Entire product page crashes.

Good degradation:

Recommendation widget fails.
Product details remain usable.
Widget area hides safely.

The smaller and less critical the remote, the more it should degrade silently.

14. Retry Strategy

Retry can help with temporary network failures.

Retry is useful for:

  • Temporary CDN failure
  • Slow network
  • Transient chunk load issue
  • Temporary API issue

But retry should be controlled.

Bad:

Retry forever every 100ms.

Good:

Retry 1–2 times with backoff.
Then show fallback UI.

Retry policy example:

First failure → retry after 500ms
Second failure → retry after 1500ms
Third failure → fallback UI

Do not automatically retry actions that change state, such as placing an order or submitting a payment.

For checkout/payment, retry behavior must be carefully designed.
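
The retry policy above (500ms, then 1500ms, then fallback) might look like the following for loading a remote. This is a minimal sketch: `loadWithRetry` and the injectable `sleep` parameter are names chosen for illustration, and it is meant for idempotent loads only, not for actions such as payment submission.

```typescript
type Sleep = (ms: number) => Promise<void>;

const realSleep: Sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function loadWithRetry<T>(
  load: () => Promise<T>,
  delaysMs: number[] = [500, 1500],
  sleep: Sleep = realSleep, // injectable so tests do not have to wait
): Promise<T> {
  let lastError: unknown;
  // One initial attempt plus one retry per configured delay.
  for (let attempt = 0; attempt <= delaysMs.length; attempt++) {
    try {
      return await load();
    } catch (error) {
      lastError = error;
      if (attempt < delaysMs.length) {
        await sleep(delaysMs[attempt]);
      }
    }
  }
  // All attempts failed: the caller is expected to show the fallback UI.
  throw lastError;
}
```

With the default delays, a load that keeps failing is attempted three times in total and then rejects with the last error, which keeps the retry bounded.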

15. Circuit Breaker Pattern

A frontend circuit breaker prevents repeatedly loading a known-broken remote.

Example:

checkoutApp v1.4.3 fails for many users.

Circuit breaker behavior:

  • Mark version unhealthy.
  • Stop loading it temporarily.
  • Use fallback or previous version.
  • Alert owning team.

This can protect users during incidents.

A simple circuit breaker can be based on:

  • Failure rate
  • Timeout rate
  • Chunk load error rate
  • Runtime error rate
  • Fallback frequency

This is more advanced, but useful for large systems.
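
A simple failure-rate breaker could be sketched as follows, with one instance per remote version (for example `checkoutApp@1.4.3`). The thresholds, the window length, and the cooldown are placeholder assumptions, and the breaker here only tracks state in the current browser session; sharing health across users would need a backend signal.

```typescript
interface BreakerOptions {
  failureRateThreshold: number; // open when failures / attempts reaches this
  minSamples: number;           // do not judge a version on too few loads
  windowMs: number;             // only outcomes this recent are counted
  cooldownMs: number;           // how long loading stays blocked once open
  now: () => number;            // injectable clock for tests
}

const defaultOptions: BreakerOptions = {
  failureRateThreshold: 0.5,
  minSamples: 10,
  windowMs: 60000,
  cooldownMs: 30000,
  now: () => Date.now(),
};

class RemoteCircuitBreaker {
  private outcomes: { ok: boolean; at: number }[] = [];
  private openedAt: number | null = null;
  private readonly options: BreakerOptions;

  constructor(options: Partial<BreakerOptions> = {}) {
    this.options = { ...defaultOptions, ...options };
  }

  record(ok: boolean): void {
    const t = this.options.now();
    this.outcomes = this.outcomes.filter((o) => t - o.at <= this.options.windowMs);
    this.outcomes.push({ ok, at: t });
    const failures = this.outcomes.filter((o) => !o.ok).length;
    const enoughSamples = this.outcomes.length >= this.options.minSamples;
    if (enoughSamples && failures / this.outcomes.length >= this.options.failureRateThreshold) {
      this.openedAt = t; // mark the version unhealthy
    }
  }

  canLoad(): boolean {
    if (this.openedAt === null) return true;
    if (this.options.now() - this.openedAt >= this.options.cooldownMs) {
      // Cooldown elapsed: forget old outcomes and allow loading again.
      this.openedAt = null;
      this.outcomes = [];
      return true;
    }
    return false; // caller should use the fallback or a previous version
  }
}
```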

16. Rollback Strategy

Failure isolation should connect to rollback.

Example:

Cart Remote v1.8.5 causes runtime crashes.

Rollback flow:

  • Monitoring detects error spike.
  • Alert goes to Cart Team.
  • Manifest switches cartApp back to v1.8.4.
  • CDN cache is invalidated if needed.
  • Shell loads stable version.
  • Cart Team investigates v1.8.5.

Important principle:

A remote failure should not require redeploying the entire shell unless the shell itself is broken.

Manifest-based rollback makes this easier.
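
To illustrate manifest-based rollback, the sketch below treats the manifest as plain data that the shell reads at runtime. The manifest shape, the `lastStableVersion` field, and the URL layout (`baseUrl/version/remoteEntry.js`) are assumptions for this example and not a format defined by any particular tool.

```typescript
interface ManifestEntry {
  baseUrl: string;            // for example "https://cdn.company.com/cart"
  version: string;            // version the shell should load
  lastStableVersion?: string; // rollback target, if one is recorded
}

type RemoteManifest = Record<string, ManifestEntry>;

function remoteEntryUrl(entry: ManifestEntry): string {
  return `${entry.baseUrl}/${entry.version}/remoteEntry.js`;
}

// Returns a new manifest; the input is left untouched so the change can be reviewed or audited.
function rollBack(manifest: RemoteManifest, remoteName: string): RemoteManifest {
  const entry = manifest[remoteName];
  if (!entry || !entry.lastStableVersion) {
    throw new Error(`No rollback target recorded for ${remoteName}`);
  }
  return {
    ...manifest,
    [remoteName]: { baseUrl: entry.baseUrl, version: entry.lastStableVersion },
  };
}
```

Because only the manifest changes, the shell bundle itself does not need to be rebuilt or redeployed to move `cartApp` back to the previous version.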

17. Version-Aware Failure Handling

Every failure should include version data.

Log example:

{
  "remoteName": "cartApp",
  "remoteVersion": "1.8.5",
  "shellVersion": "3.2.0",
  "route": "/cart",
  "errorType": "ChunkLoadError",
  "teamOwner": "cart-team"
}

This helps answer:

  • Which remote failed?
  • Which version failed?
  • Which shell version was active?
  • Which team owns it?
  • Was this after a deployment?
  • Can we roll back?

Without version data, debugging is much slower.
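
As a small illustration of why the version fields matter, the sketch below groups failure logs by remote and version, which is enough to see whether failures cluster on a newly deployed version. The `RemoteFailureLog` type mirrors the log example above; the function name is an assumption.

```typescript
interface RemoteFailureLog {
  remoteName: string;
  remoteVersion: string;
  shellVersion: string;
  route: string;
  errorType: string;
  teamOwner: string;
}

// Counts failures per "remoteName@remoteVersion" key.
function countFailuresByVersion(logs: RemoteFailureLog[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const log of logs) {
    const key = `${log.remoteName}@${log.remoteVersion}`;
    counts[key] = (counts[key] ?? 0) + 1;
  }
  return counts;
}
```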

18. Observability for Failure Isolation

Track these signals:

  • Remote load failures
  • Remote load timeouts
  • Chunk load errors
  • Runtime render errors
  • Fallback UI frequency
  • Retry count
  • Remote version
  • Shell version
  • Route
  • User journey
  • API error rate
  • Business conversion drop

Example alert:

checkoutApp v1.4.3 fallback rate exceeded 3% for 5 minutes.

Good alerts identify:

  • Remote
  • Version
  • Route
  • Owner
  • Severity
  • Possible rollback target

Bad alert:

Frontend error increased.

That is not actionable enough.
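
An actionable alert can be derived from counts that the monitoring pipeline already has. The following sketch evaluates a fallback-rate rule and produces a message carrying the remote, version, route, owner, and rollback target. The `FallbackStats` shape and the 3% default threshold are assumptions taken from the example alert, not values from a real alerting system.

```typescript
interface FallbackStats {
  remoteName: string;
  remoteVersion: string;
  route: string;
  owner: string;
  rollbackTarget: string | null;
  loads: number;      // remote load attempts in the window
  fallbacks: number;  // how many of those ended in fallback UI
  windowMinutes: number;
}

// Returns an alert message, or null when the rate is within the threshold.
function fallbackAlert(stats: FallbackStats, thresholdPercent = 3): string | null {
  if (stats.loads === 0) return null;
  const ratePercent = (stats.fallbacks / stats.loads) * 100;
  if (ratePercent <= thresholdPercent) return null;
  return [
    `${stats.remoteName} v${stats.remoteVersion} fallback rate ${ratePercent.toFixed(1)}%`,
    `exceeded ${thresholdPercent}% for ${stats.windowMinutes} minutes`,
    `route=${stats.route} owner=${stats.owner}`,
    `rollbackTarget=${stats.rollbackTarget ?? "none"}`,
  ].join(" | ");
}
```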

19. Remote-Level API Failures

Not every failure is a JavaScript failure.

Example:

Orders Remote loads successfully.
Orders API returns 500.

The remote should handle this locally.

Possible UI:

We could not load your orders right now. Please try again.

This should not crash:

  • Shell
  • Header
  • Profile Remote
  • Other routes

Remote API failure handling should include:

  • Loading state
  • Error state
  • Retry
  • Empty state
  • Auth error handling
  • Observability

20. Auth Failure During Remote Load

Auth failures can happen while a remote is active.

Examples:

  • Session expires on /checkout.
  • Orders API returns 401.
  • Profile Remote detects forbidden access.

Recommended behavior:

  • 401 → auth refresh or login redirect
  • 403 → no-access state
  • Session expired → preserve the current route and ask the user to log in again

Do not show a generic remote crash fallback for normal auth states.

Auth errors should be handled intentionally.
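
The recommended behavior above can be expressed as a mapping from HTTP status to an explicit action, kept separate from the generic crash fallback. In this sketch the `/login` path and the `returnTo` query parameter are hypothetical; a real application would use its own auth routes.

```typescript
type AuthAction =
  | { kind: "refreshOrLogin"; returnTo: string } // 401: try refresh, else log in
  | { kind: "noAccess" }                         // 403: show a no-access state
  | { kind: "none" };                            // not an auth problem

function authActionFor(status: number, currentRoute: string): AuthAction {
  switch (status) {
    case 401:
      return { kind: "refreshOrLogin", returnTo: currentRoute };
    case 403:
      return { kind: "noAccess" };
    default:
      return { kind: "none" };
  }
}

// Hypothetical login URL that preserves the route the user was on.
function loginUrl(returnTo: string): string {
  return `/login?returnTo=${encodeURIComponent(returnTo)}`;
}
```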

21. Data Safety in Fallback UI

Fallback messaging should reassure users when possible.

Example: Cart

We are having trouble loading your cart. Your items are saved. Please try again.

Example: Checkout

Checkout is temporarily unavailable. Your cart is saved. Please try again shortly.

Avoid scary messages:

  • Your checkout failed.
  • Your cart is broken.
  • Something exploded.

Fallback UI is part of trust.

22. Shell Stability

The shell should be the most stable part of the system.

It should change far less often than the remotes.

The shell should be:

  • Small
  • Well-tested
  • Observable
  • Backward-compatible
  • Strict about remote contracts
  • Resilient to remote failures

If the shell fails, all remotes are affected.

So shell releases should be more conservative.

23. Testing Failure Isolation

Test failure scenarios directly.

Test cases:

  • remoteEntry.js returns 404.
  • Remote chunk fails to load.
  • Remote throws during render.
  • Remote times out.
  • Remote API returns 500.
  • Remote API returns 401.
  • Remote API returns 403.
  • Fallback UI renders.
  • Retry button works.
  • Navigation remains usable.
  • Error log includes remote name/version.

Do not test only happy paths.

A micro frontend system is production-ready only when failure paths are tested.

24. Failure Injection

Failure injection means intentionally simulating failures.

Examples:

  • Block remoteEntry.js in test.
  • Mock chunk load failure.
  • Force remote component to throw.
  • Delay remote load by 10 seconds.
  • Return 500 from API.
  • Return malformed manifest.

This helps verify that the shell is resilient.

In interviews, mentioning failure injection shows strong production thinking.
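
One lightweight way to inject failures in tests is to wrap the remote loader so a scenario can be switched on without touching the remote itself. The wrapper below is a sketch; the `InjectedFailure` variants and the error messages are illustrative assumptions, and the `ChunkLoadError` name is set by hand to imitate a bundler's chunk failure.

```typescript
type RemoteLoader<T> = () => Promise<T>;

type InjectedFailure =
  | { kind: "none" }
  | { kind: "notFound" }            // simulates remoteEntry.js returning 404
  | { kind: "chunkError" }          // simulates a child chunk failing to load
  | { kind: "delay"; ms: number };  // simulates a slow or hanging remote

function withInjectedFailure<T>(
  loader: RemoteLoader<T>,
  failure: InjectedFailure,
): RemoteLoader<T> {
  return async () => {
    switch (failure.kind) {
      case "notFound":
        throw new Error("Injected failure: remoteEntry.js returned 404");
      case "chunkError": {
        const error = new Error("Injected failure: chunk failed to load");
        error.name = "ChunkLoadError";
        throw error;
      }
      case "delay": {
        const ms = failure.ms;
        await new Promise<void>((resolve) => setTimeout(resolve, ms));
        return loader();
      }
      default:
        return loader();
    }
  };
}
```

A test can then mount the shell with, for example, `withInjectedFailure(loadCart, { kind: "notFound" })`, where `loadCart` stands for whatever loader the shell normally uses, and assert that the fallback UI renders and navigation still works.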

25. E-commerce Example

Suppose the e-commerce platform has:

  • Catalog Remote
  • Product Remote
  • Cart Remote
  • Checkout Remote
  • Profile Remote
  • Orders Remote

Failure behavior:

| Failure | Expected Behavior |
|---|---|
| Recommendations fail | Hide recommendations |
| Product images fail | Show placeholder |
| Catalog API fails | Show retry page |
| Cart remote fails | Show safe cart fallback |
| Checkout remote fails | Alert + rollback consideration |
| Profile API fails | Show retry state |
| Orders API returns 403 | Show no-access message |

The goal is to keep the rest of the journey usable.

26. Common Anti-Patterns

| Anti-Pattern | Why It Is Bad |
|---|---|
| One global error boundary only | Poor remote isolation |
| No remote loading fallback | Blank screens |
| No timeout | App waits forever |
| Generic fallback everywhere | Poor UX |
| No version data in logs | Hard debugging |
| No retry strategy | Temporary issues become user-facing |
| Infinite retry | Network/resource waste |
| No rollback path | Long incidents |
| Shell owns too much domain logic | Larger blast radius |
| No failure testing | Production finds bugs first |

27. Interview Questions

Q1. What is failure isolation in micro frontends?

Failure isolation means one remote failure should not crash the entire application. The shell should remain usable, show fallback UI for the failed remote, and log actionable failure details.

Q2. How do error boundaries help?

Error boundaries catch render-time errors from remote components and prevent them from taking down the full shell. They allow the app to show domain-specific fallback UI and keep navigation alive.

Q3. Are error boundaries enough?

No. Error boundaries do not handle every failure. You also need remote loading error handling, chunk load handling, timeouts, retry strategy, API error states, observability, and rollback.

Q4. How do you handle a remoteEntry.js failure?

The shell should detect the loading failure, show fallback UI, log the remote URL/name/version/error type, keep the rest of the app usable, and optionally retry or roll back through a manifest.

Q5. What happens if the checkout remote fails?

Checkout is critical, so the shell should show a safe fallback, reassure the user that their cart is saved, log the failure with version data, alert the checkout team, and consider rollback if the error rate crosses a defined threshold.

Q6. How do you test failure isolation?

Simulate remote loading failure, chunk load failure, render error, API failures, timeout, auth failures, and malformed manifest. Verify fallback UI, retry behavior, navigation stability, and error logging.

28. Strong Senior Answer

If an interviewer asks: "How do you prevent one micro frontend from crashing the whole app?"

I would isolate every remote at the shell boundary. The shell would wrap each remote with a remote-level error boundary and also handle remote loading failures separately. That includes remoteEntry failures, chunk load errors, timeouts, and runtime render errors.

If a remote fails, the shell should keep global layout and navigation alive and show a domain-specific fallback UI. For example, if the Cart Remote fails, the user should see a safe cart fallback instead of a blank screen.

I would also log remote failures with remote name, remote version, shell version, route, error type, and team owner. For critical remotes like checkout, high failure rates should trigger alerts and possibly rollback through a manifest.

Error boundaries are important, but they are only one layer. A production-ready solution also needs loading failure handling, retry, timeout, observability, tested fallback states, and rollback.

29. Final Failure Isolation Checklist

Before calling a micro frontend system reliable, check:

  • Each remote is wrapped in an error boundary.
  • Remote loading failures are handled.
  • Chunk load failures are handled.
  • Remote timeouts are defined.
  • Fallback UI is domain-specific.
  • Shell navigation remains usable after remote failure.
  • Retry strategy is controlled.
  • Critical remotes have alerting.
  • Rollback path exists.
  • Errors include remote name and version.
  • API failures are handled inside remotes.
  • Auth failures are handled correctly.
  • Failure scenarios are tested.
  • Monitoring tracks fallback frequency.
  • Shell remains lightweight and stable.

30. Summary

Failure isolation is one of the most important production capabilities in micro frontend architecture.

A good system ensures:

  • One remote failure does not break the whole app.
  • The shell remains stable.
  • Fallback UI is useful.
  • Errors are observable.
  • Critical failures trigger alerts.
  • Rollback is possible.
  • Failure scenarios are tested.

The strongest takeaway:

Micro frontends are not truly independent unless they can fail independently.

If one remote failure creates a blank page for the whole product, the architecture has not achieved real isolation.
