[Foreword] I have recently been looking into circuit-breaking techniques for services, and this post summarizes what I found.
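I. Shortlisted options
- Sentinel
- Hystrix
- resilience4j
  - GitHub repository
  - Documentation: https://resilience4j.readme.io/docs
  - A lightweight, simple circuit-breaking library with clear and thorough documentation. It is a replacement for Hystrix, follows the same design ideas, and is still being actively updated.
  - You have to integrate it with Micrometer, Prometheus, and Dropwizard Metrics yourself.
  - Modules:
    - CircuitBreaker: circuit breaking
    - Bulkhead: isolation
    - RateLimiter: QPS limiting
    - Retry: retries
    - TimeLimiter: timeout limiting
    - Cache: caching
- Roll your own (based on Guava)
  - Guava's token bucket makes it easy to rate-limit by QPS.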
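
To make the token-bucket idea behind the Guava option more concrete, here is a minimal plain-JDK sketch. It is not Guava's `RateLimiter` (which would be the natural choice in practice); the class name `SimpleTokenBucket` and its methods are hypothetical, and the time is passed in explicitly so the refill behavior is easy to follow.

```java
/** Minimal token bucket (hypothetical sketch): refills at a fixed rate, up to a fixed capacity. */
final class SimpleTokenBucket {
    private final double permitsPerSecond; // refill rate
    private final double capacity;         // maximum number of stored tokens
    private double tokens;                 // tokens currently available
    private long lastRefillNanos;          // time of the last refill

    SimpleTokenBucket(double permitsPerSecond, double capacity, long nowNanos) {
        this.permitsPerSecond = permitsPerSecond;
        this.capacity = capacity;
        this.tokens = capacity;            // start with a full bucket
        this.lastRefillNanos = nowNanos;
    }

    /** Tries to take one token at the given time; false means the caller should be limited. */
    synchronized boolean tryAcquire(long nowNanos) {
        long elapsed = Math.max(0L, nowNanos - lastRefillNanos);
        // Add the tokens earned since the last call, without exceeding the capacity
        tokens = Math.min(capacity, tokens + elapsed * permitsPerSecond / 1_000_000_000.0);
        lastRefillNanos = Math.max(lastRefillNanos, nowNanos);
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    /** Convenience overload that uses the system clock. */
    boolean tryAcquire() {
        return tryAcquire(System.nanoTime());
    }
}
```

In a request path you would call `tryAcquire()` once per request and return a "too many requests" style response whenever it yields `false`.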
 
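II. Technical comparison
| Feature | Sentinel | Hystrix | resilience4j | Guava-based implementation |
| --- | --- | --- | --- | --- |
| Isolation strategy | Semaphore isolation (limits the number of concurrent threads) | Thread-pool isolation / semaphore isolation | Semaphore isolation |  |
| Circuit-breaking and degradation strategy | Based on response time, exception ratio, exception count | Based on exception ratio | Based on exception ratio, response time |  |
| Real-time statistics | Sliding window (LeapArray) | Sliding window (based on RxJava) | Ring Bit Buffer | Token bucket |
| Dynamic rule configuration | Supports multiple data sources | Supports multiple data sources | Limited support |  |
| Extensibility | Multiple extension points | Plugins | Interfaces |  |
| Annotation support | Supported | Supported | Supported | Supported |
| Single-node rate limiting | QPS-based; supports limiting by call relationship | Limited support | Rate Limiter | QPS-based |
| Cluster flow control | Supported | Not supported | Not supported |  |
| Traffic shaping | Supports warm-up mode and uniform-rate queueing | Not supported | Simple Rate Limiter mode |  |
| Adaptive system protection | Supported | Not supported | Not supported |  |
| Hotspot detection / protection | Supported | Not supported | Not supported |  |
| Service Mesh support | Supports Envoy/Istio | Not supported | Not supported |  |
| Console | Out-of-the-box console for rule configuration, real-time monitoring, machine discovery, and so on | Simple monitoring view | No console; can be connected to other monitoring systems |  |
| Default rules | Not supported; rules must be configured per endpoint | Supported | Supported |  |
| Exception filtering | Supported per endpoint via annotation | Annotation and global default configuration | Annotation and global default configuration |  |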
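
The "real-time statistics" row is easier to picture with code. Below is a rough sketch of the count-based idea behind a ring bit buffer: one bit per call outcome, with the oldest outcome overwritten once the window is full. The class name `RingBitBuffer` and its methods are hypothetical, and this is not resilience4j's actual implementation.

```java
import java.util.BitSet;

/** Hypothetical sketch: failure rate over the last N calls, one bit per call. */
final class RingBitBuffer {
    private final BitSet bits;   // a set bit means the call failed
    private final int size;      // window size (number of calls tracked)
    private int next;            // slot that will be overwritten next
    private int recorded;        // calls recorded so far, capped at size
    private int failures;        // number of set bits in the window

    RingBitBuffer(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        this.size = size;
        this.bits = new BitSet(size);
    }

    /** Records one call outcome, evicting the oldest outcome when the window is full. */
    synchronized void record(boolean failed) {
        if (bits.get(next)) {
            failures--;          // the outcome being overwritten was a failure
        }
        bits.set(next, failed);
        if (failed) {
            failures++;
        }
        next = (next + 1) % size;
        if (recorded < size) {
            recorded++;
        }
    }

    /** Failure rate in percent, or -1 while fewer than 'size' calls have been recorded. */
    synchronized float failureRate() {
        if (recorded < size) {
            return -1f;
        }
        return failures * 100f / size;
    }
}
```

A circuit breaker built on top of this would compare `failureRate()` with its configured threshold after every recorded call.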
III. Adapting the application
3.1 Sentinel
3.1.1 Add the dependency

    <dependency>
        <groupId>com.alibaba.cloud</groupId>
        <artifactId>spring-cloud-starter-alibaba-sentinel</artifactId>
        <version>2.0.3.RELEASE</version>
    </dependency>
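3.1.2 Adapt the endpoint or the service layer
Annotate the method, for example with @SentinelResource(value = "allInfos", fallback = "errorReturn"). The annotation is defined as follows:

    @Target({ElementType.METHOD, ElementType.TYPE})
    @Retention(RetentionPolicy.RUNTIME)
    @Inherited
    public @interface SentinelResource {

        // Name of the Sentinel resource
        String value() default "";

        // Entry type (inbound or outbound)
        EntryType entryType() default EntryType.OUT;

        // Classification of the resource
        int resourceType() default 0;

        // Name of the method that handles BlockException
        String blockHandler() default "";

        // Class that holds the blockHandler method
        Class<?>[] blockHandlerClass() default {};

        // Name of the fallback method
        String fallback() default "";

        // Name of the default fallback method
        String defaultFallback() default "";

        // Class that holds the fallback method
        Class<?>[] fallbackClass() default {};

        // Exceptions that are traced
        Class<? extends Throwable>[] exceptionsToTrace() default {Throwable.class};

        // Exceptions that are ignored
        Class<? extends Throwable>[] exceptionsToIgnore() default {};
    }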

    @RequestMapping("/get")
    @ResponseBody
    @SentinelResource(value = "allInfos", fallback = "errorReturn")
    public JsonResult allInfos(HttpServletRequest request, HttpServletResponse response, @RequestParam Integer num) {
        try {
            if (num % 2 == 0) {
                log.info("num % 2 == 0");
                // Simulate a failure for even numbers
                throw new BaseException("something bad with 2", 400);
            }
            return JsonResult.ok();
        } catch (ProgramException e) {
            log.info("error");
            return JsonResult.error("error");
        }
    }
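3.1.3 Configure a circuit-breaking or rate-limiting method for the endpoint
By default, all Controller endpoints are intercepted.

    // Rate-limiting handler: same parameter list as the original method
    public JsonResult errorReturn(HttpServletRequest request, HttpServletResponse response, @RequestParam Integer num) throws BlockException {
        return JsonResult.error("error: rate limited " + num);
    }

    // Circuit-breaking handler: same parameters plus a trailing BlockException,
    // which is the parameter list Sentinel expects for a blockHandler method
    public JsonResult errorReturn(HttpServletRequest request, HttpServletResponse response, @RequestParam Integer num, BlockException b) throws BlockException {
        return JsonResult.error("error: circuit open " + num);
    }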
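Note that you can also skip the rate-limiting and circuit-breaking methods entirely and catch UndeclaredThrowableException or BlockException in a global exception handler instead, which saves a lot of development work.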
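
Why `UndeclaredThrowableException` shows up at all: a checked exception thrown through a proxied method that does not declare it typically arrives wrapped in that type, so a central handler usually unwraps it before deciding what to return. The plain-JDK sketch below illustrates only this unwrapping step. `BlockedException` is a hypothetical stand-in for Sentinel's `BlockException`, and `toMessage` stands in for whatever your global exception handler returns.

```java
import java.lang.reflect.UndeclaredThrowableException;

/** Hypothetical stand-in for Sentinel's BlockException (a checked exception). */
class BlockedException extends Exception {
    BlockedException(String message) {
        super(message);
    }
}

/** Hypothetical central handler that maps exceptions to response messages. */
final class GlobalBlockHandler {
    /** Strips proxy wrappers so the real cause can be inspected. */
    static Throwable unwrap(Throwable t) {
        while (t instanceof UndeclaredThrowableException && t.getCause() != null) {
            t = t.getCause();
        }
        return t;
    }

    /** Produces one message for blocked calls and another for every other failure. */
    static String toMessage(Throwable t) {
        Throwable real = unwrap(t);
        if (real instanceof BlockedException) {
            return "blocked: " + real.getMessage();
        }
        return "error: " + real.getMessage();
    }
}
```

In a Spring application the same logic would live in the global exception handler, with the message replaced by the project's own response type.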
3.1.4 Connect to the dashboard

    spring:
      cloud:
        sentinel:
          transport:
            port: 8719
            dashboard: localhost:8080

(Figure: Sentinel)
3.1.5 Rule persistence and dynamic updates
Connect a configuration center such as ZooKeeper, and distribute rules in push mode.
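
In push mode the configuration center notifies the application whenever the rules change, and the application swaps its in-memory copy. A minimal plain-JDK sketch of that flow follows. `RuleStore`, `onPush`, and the rule shape (resource name mapped to a QPS limit) are all hypothetical; in a real setup `onPush` would be called from the configuration center's listener or watcher.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical in-memory rule holder that is refreshed by pushes from a config center. */
final class RuleStore {
    // Current rules: resource name -> QPS limit. Replaced atomically on every push.
    private final AtomicReference<Map<String, Integer>> rules =
            new AtomicReference<>(Collections.<String, Integer>emptyMap());

    /** Called by the config-center listener when a new rule set is pushed. */
    void onPush(Map<String, Integer> newRules) {
        // Defensive copy, so later changes to the caller's map do not leak in
        rules.set(Collections.unmodifiableMap(new HashMap<>(newRules)));
    }

    /** Returns the limit for a resource, or the given default when no rule exists. */
    int limitFor(String resource, int defaultLimit) {
        Integer limit = rules.get().get(resource);
        return limit != null ? limit : defaultLimit;
    }
}
```

Replacing the whole map in one step means readers never observe a half-updated rule set.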
3.2 Hystrix
3.2.1 Add the dependencies

    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-netflix-hystrix-dashboard</artifactId>
        <version>2.0.4.RELEASE</version>
    </dependency>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-netflix-hystrix</artifactId>
        <version>2.0.4.RELEASE</version>
    </dependency>
3.2.2 Adapt the endpoint
Annotate the method, for example with @HystrixCommand(fallbackMethod = "timeOutError"). The annotation is defined as follows:

    @Target({ElementType.METHOD})
    @Retention(RetentionPolicy.RUNTIME)
    @Inherited
    @Documented
    public @interface HystrixCommand {
        // Key of the command group
        String groupKey() default "";

        // Key of this command
        String commandKey() default "";

        // Key of the thread pool this command runs in
        String threadPoolKey() default "";

        // Name of the fallback method
        String fallbackMethod() default "";

        // Per-command property overrides
        HystrixProperty[] commandProperties() default {};

        // Per-thread-pool property overrides
        HystrixProperty[] threadPoolProperties() default {};

        // Exceptions that are ignored
        Class<? extends Throwable>[] ignoreExceptions() default {};

        ObservableExecutionMode observableExecutionMode() default ObservableExecutionMode.EAGER;

        HystrixException[] raiseHystrixExceptions() default {};

        // Name of the default fallback method
        String defaultFallback() default "";
    }

    @RequestMapping("/get")
    @ResponseBody
    @HystrixCommand(fallbackMethod = "fallbackMethod")
    public JsonResult allInfos(HttpServletRequest request, HttpServletResponse response, @RequestParam Integer num) {
        try {
            if (num % 3 == 0) {
                log.info("num % 3 == 0");
                // Simulate a failure for multiples of 3
                throw new BaseException("something bad with 3", 400);
            }

            return JsonResult.ok();
        } catch (ProgramException exception) {
            // InterruptedException is no longer caught here: nothing in the try block
            // can throw it, and catching an unthrown checked exception does not compile
            log.info("error");
            return JsonResult.error("error");
        }
    }
3.2.3 Configure a circuit-breaking method for the endpoint

    // Fallback: same parameter list as the original method
    public JsonResult fallbackMethod(HttpServletRequest request, HttpServletResponse response, @RequestParam Integer num) {
        response.setStatus(500);
        log.info("Circuit breaker triggered!");
        return JsonResult.error("circuit open");
    }
3.2.4 Configure the default policy

    hystrix:
      command:
        default:
          execution:
            isolation:
              strategy: THREAD
              thread:
                # Time out after 15 seconds and call the fallback method
                timeoutInMilliseconds: 15000
          metrics:
            rollingStats:
              timeInMilliseconds: 15000
          circuitBreaker:
            # When at least 3 requests arrive within the rolling window (15 s above) and
            # the error rate reaches 50% (the default errorThresholdPercentage), open the
            # circuit, cut off the service, and call the fallback method
            requestVolumeThreshold: 3
            sleepWindowInMilliseconds: 10000
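
To see how `requestVolumeThreshold`, the error percentage, and `sleepWindowInMilliseconds` interact, here is a deliberately simplified plain-JDK state machine. It is a sketch rather than Hystrix's implementation: `SimpleBreaker` is a hypothetical name, the counters are reset on every state change instead of being kept in a rolling window, and the error threshold is passed in explicitly.

```java
/** Hypothetical, simplified circuit breaker with CLOSED, OPEN, and HALF_OPEN states. */
final class SimpleBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int requestVolumeThreshold; // minimum calls before the breaker may trip
    private final int errorThresholdPercent;  // failure percentage that trips the breaker
    private final long sleepWindowMillis;     // time to stay OPEN before allowing a trial call
    private State state = State.CLOSED;
    private int total;
    private int failed;
    private long openedAtMillis;

    SimpleBreaker(int requestVolumeThreshold, int errorThresholdPercent, long sleepWindowMillis) {
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.errorThresholdPercent = errorThresholdPercent;
        this.sleepWindowMillis = sleepWindowMillis;
    }

    /** Returns true if a call may proceed at the given time. */
    synchronized boolean allowRequest(long nowMillis) {
        if (state == State.OPEN && nowMillis - openedAtMillis >= sleepWindowMillis) {
            state = State.HALF_OPEN;          // sleep window elapsed: allow one trial call
            return true;
        }
        return state == State.CLOSED;
    }

    /** Records the outcome of a call that was allowed to proceed. */
    synchronized void record(boolean success, long nowMillis) {
        if (state == State.HALF_OPEN) {
            if (success) {
                close();                      // trial call succeeded
            } else {
                trip(nowMillis);              // trial call failed: start a new sleep window
            }
            return;
        }
        if (state == State.OPEN) {
            return;                           // ignore late results while open
        }
        total++;
        if (!success) {
            failed++;
        }
        if (total >= requestVolumeThreshold && failed * 100 >= errorThresholdPercent * total) {
            trip(nowMillis);
        }
    }

    synchronized State state() {
        return state;
    }

    private void close() {
        state = State.CLOSED;
        total = 0;
        failed = 0;
    }

    private void trip(long nowMillis) {
        state = State.OPEN;
        openedAtMillis = nowMillis;
        total = 0;
        failed = 0;
    }
}
```

With the values from the configuration above, `new SimpleBreaker(3, 50, 10000)` stays closed for the first two calls regardless of their outcome, because the volume threshold has not been reached yet.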
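3.2.5 Set up monitoring

(Figure: Hystrix)

(Figure: Hystrix diagram)
Curve: records the relative change in traffic over the last 2 minutes, so you can see whether traffic is trending up or down.
Cluster-level monitoring requires a service registry.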
3.3 resilience4j
3.3.1 Add the dependencies

    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>

    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
    </dependency>

    <dependency>
        <groupId>io.github.resilience4j</groupId>
        <artifactId>resilience4j-spring-boot2</artifactId>
        <version>1.6.1</version>
    </dependency>

    <dependency>
        <groupId>io.github.resilience4j</groupId>
        <artifactId>resilience4j-bulkhead</artifactId>
        <version>1.6.1</version>
    </dependency>

    <dependency>
        <groupId>io.github.resilience4j</groupId>
        <artifactId>resilience4j-ratelimiter</artifactId>
        <version>1.6.1</version>
    </dependency>

    <dependency>
        <groupId>io.github.resilience4j</groupId>
        <artifactId>resilience4j-timelimiter</artifactId>
        <version>1.6.1</version>
    </dependency>

The bulkhead, ratelimiter, timelimiter, and similar modules can be added as needed.
3.3.2 Adapt the endpoint

    @RequestMapping("/get")
    @ResponseBody
    @CircuitBreaker(name = "BulkheadA", fallbackMethod = "fallbackMethod")
    @Bulkhead(name = "BulkheadA", fallbackMethod = "fallbackMethod")
    public JsonResult allInfos(HttpServletRequest request, HttpServletResponse response, @RequestParam Integer num) {
        log.info("param----->" + num);
        try {
            // Each branch simulates a different kind of failure
            if (num % 2 == 0) {
                log.info("num % 2 == 0");
                throw new BaseException("something bad with 2", 400);
            }

            if (num % 3 == 0) {
                log.info("num % 3 == 0");
                throw new BaseException("something bad with 3", 400);
            }

            if (num % 5 == 0) {
                log.info("num % 5 == 0");
                throw new ProgramException("something bad with 5", 400);
            }
            if (num % 7 == 0) {
                log.info("num % 7 == 0");
                // Deliberately triggers an ArithmeticException
                int res = 1 / 0;
            }
            return JsonResult.ok();
        } catch (BufferUnderflowException e) {
            log.info("error");
            return JsonResult.error("error");
        }
    }
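3.3.3 Configure a circuit-breaking method for the endpoint

    // Fallback: same parameters as the original method plus the exception to handle.
    // With a BulkheadFullException parameter it is matched for bulkhead rejections.
    public JsonResult fallbackMethod(HttpServletRequest request, HttpServletResponse response, @RequestParam Integer num, BulkheadFullException exception) {
        return JsonResult.error("error: circuit open " + num);
    }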
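
The semaphore isolation behind a bulkhead can be pictured as a semaphore wrapped around the call, with the fallback returned when no permit is free. Here is a plain-JDK sketch under that assumption. `SemaphoreBulkhead` is a hypothetical class, and resilience4j's own `Bulkhead` does considerably more (events, metrics, wait durations).

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical sketch of semaphore isolation with a fallback. */
final class SemaphoreBulkhead {
    private final Semaphore permits;

    SemaphoreBulkhead(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    /** Runs the call if a permit is free; otherwise returns the fallback without waiting. */
    <T> T execute(Supplier<T> call, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get();
        }
        try {
            return call.get();
        } finally {
            permits.release();  // always hand the permit back, even if the call throws
        }
    }
}
```

Because the permit is taken without blocking, a saturated downstream call costs the caller almost nothing: excess requests go straight to the fallback.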
3.3.4 Configure the rules

    resilience4j.circuitbreaker:
      instances:
        backendA:
          registerHealthIndicator: true
          slidingWindowSize: 100
        backendB:
          registerHealthIndicator: true
          slidingWindowSize: 10
          permittedNumberOfCallsInHalfOpenState: 3
          slidingWindowType: TIME_BASED
          minimumNumberOfCalls: 20
          waitDurationInOpenState: 50s
          failureRateThreshold: 50
          eventConsumerBufferSize: 10
          recordFailurePredicate: io.github.robwin.exception.RecordFailurePredicate

    resilience4j.retry:
      instances:
        backendA:
          maxRetryAttempts: 3
          waitDuration: 10s
          enableExponentialBackoff: true
          exponentialBackoffMultiplier: 2
          retryExceptions:
            - org.springframework.web.client.HttpServerErrorException
            - java.io.IOException
          ignoreExceptions:
            - io.github.robwin.exception.BusinessException
        backendB:
          maxRetryAttempts: 3
          waitDuration: 10s
          retryExceptions:
            - org.springframework.web.client.HttpServerErrorException
            - java.io.IOException
          ignoreExceptions:
            - io.github.robwin.exception.BusinessException

    resilience4j.bulkhead:
      instances:
        backendA:
          maxConcurrentCalls: 10
        backendB:
          maxWaitDuration: 10ms
          maxConcurrentCalls: 20

    resilience4j.thread-pool-bulkhead:
      instances:
        backendC:
          maxThreadPoolSize: 1
          coreThreadPoolSize: 1
          queueCapacity: 1

    resilience4j.ratelimiter:
      instances:
        backendA:
          limitForPeriod: 10
          limitRefreshPeriod: 1s
          timeoutDuration: 0
          registerHealthIndicator: true
          eventConsumerBufferSize: 100
        backendB:
          limitForPeriod: 6
          limitRefreshPeriod: 500ms
          timeoutDuration: 3s

    resilience4j.timelimiter:
      instances:
        backendA:
          timeoutDuration: 2s
          cancelRunningFuture: true
        backendB:
          timeoutDuration: 1s
          cancelRunningFuture: false
Rules defined in configuration can be overridden in code.
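
As a small illustration of what the `backendA` retry settings in the configuration above amount to, the sketch below computes the waits between attempts for an exponential backoff. It is plain JDK, not resilience4j code: `BackoffSchedule` is a hypothetical helper, and it assumes that the attempt limit includes the initial call, so 3 attempts mean 2 waits.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that lists the waits of an exponential backoff. */
final class BackoffSchedule {
    /** Returns maxAttempts - 1 waits: initialWait, initialWait * m, initialWait * m^2, ... */
    static List<Duration> waits(int maxAttempts, Duration initialWait, long multiplier) {
        List<Duration> result = new ArrayList<>();
        Duration wait = initialWait;
        for (int attempt = 1; attempt < maxAttempts; attempt++) {
            result.add(wait);
            wait = wait.multipliedBy(multiplier);  // grow the wait for the next retry
        }
        return result;
    }
}
```

Calling `BackoffSchedule.waits(3, Duration.ofSeconds(10), 2)` yields a 10-second wait followed by a 20-second wait under these assumptions.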
3.3.5 Set up monitoring
For example, with Grafana.
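IV. Points to consider
- Whether some exceptions need to be filtered out
- Whether global default rules are needed
- Other middleware may have to be introduced
  - Kubernetes traffic control
- Rule storage and dynamic modification
- Cost of adapting existing code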
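[Afterword]
Personally, I would recommend Sentinel: it exposes many interfaces that make it easy for developers to extend, and I also find its dynamic rule updates convenient. Finally, the related sample code: monolithic application sample code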

薏米笔记