android retrofit 请求返回String数据中文乱码解决方案

## 问题来源

[focus](https://www.ihewro.com/archives/948/) 应用中核心的部分是，网络请求订阅的xml文件内容，然后解析xml文件存储到本地数据库。

这里网络请求我使用的是retrofit，返回的类型是String，所以使用的是`ScalarsConverterFactory`的解析器。

***就会出现中文乱码问题。**

## 解决方法

### 给okhttp添加拦截器

*EncodingInterceptor.java*

```java
package com.ihewro.focus.helper;

import com.blankj.ALog;
import com.ihewro.focus.util.StringUtil;

import java.io.IOException;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.net.SocketTimeoutException;
import java.util.logging.Logger;

import okhttp3.Headers;
import okhttp3.Interceptor;
import okhttp3.Request;
import okhttp3.Response;
import okhttp3.ResponseBody;

/**
 * <pre>
 *     time   : 2019/05/23
 *     desc   :
 *     version: 1.0
 * </pre>
 */
public class EncodingInterceptor implements Interceptor {

/**
     * 自定义编码
     */
    private String encoding;

public EncodingInterceptor(String encoding) {
        this.encoding = encoding;
    }

@Override public Response intercept(Interceptor.Chain chain) throws IOException {
        Request request = chain.request();
        Response response = chain.proceed(request);

settingClientCustomEncoding(response);
        return response;
    }

/**
     * setting client custom encoding when server not return encoding
     * @param response
     * @throws IOException
     */
    private void settingClientCustomEncoding(Response response) throws IOException {
        setBodyContentType(response);

}

/**
     * set body contentType
     * @param response
     * @throws IOException
     */
    private void setBodyContentType(Response response) throws IOException {
        ResponseBody body = response.body();
        // setting body contentTypeString using reflect
        Class<? extends ResponseBody> aClass = body.getClass();
        try {
            Field field = aClass.getDeclaredField("contentTypeString");
            field.setAccessible(true);
            String contentTypeString = String.valueOf(field.get(body));
            field.set(body, "application/rss+xml;charset=" + encoding);

} catch (NoSuchFieldException e) {
            throw new IOException("use reflect to setting header occurred an error", e);
        } catch (IllegalAccessException e) {
            throw new IOException("use reflect to setting header occurred an error", e);
        }
    }

}
```

然后在okhttp里面拦截一下：

```java
 
 builder.addInterceptor(new EncodingInterceptor("ISO-8859-1"));//全部转换成这个编码

```

> 说明一下为什么全部转换成这个编码ISO-8859-1，（对全部转换，听我的，全部听我的）**因为这个编码可以后面再无损的转换成gbk或者utf-8。相反gbk和utf-8编码直接是无法转换的。**

![](https://www.ihewro.com/usr/uploads/time/5d74f6cee34ca.png)

然后获取到response.body 之后，再进行对内容编码转换。

```java
        //获取xml文件的编码
        String encode = "UTF-8";//默认编码
        String originCode = "ISO-8859-1";
        String temp = xmlStr.substring(0,100);
        Pattern p = Pattern.compile("encoding=\"(.*?)\"");
        Matcher m = p.matcher(temp);
        boolean flag = m.find();//【部分匹配】，返回true or false，而且指针会移动下次匹配的位置
        if (flag){
            int begin = m.start()+10;
            int end = m.end();
            encode = temp.substring(begin,end-1);
            ALog.d("编码："+encode);
        }//否则就是文件没有标明编码格式，按照utf-8进行解码

//如果文件没有乱码，则不需要转换（为什么需要这部分后面分析会写到）
        if (!java.nio.charset.Charset.forName("GBK").newEncoder().canEncode(xmlStr.substring(0,Math.min(xmlStr.length(),3000)))){
            xmlStr = new String(xmlStr.getBytes(originCode),encode);
        }else {
        }
```

## 为什么会乱码

正常的网页返回的时候会有一个返回头信息：

![](https://www.ihewro.com/usr/uploads/time/5d74f6e0b71d4.png)

里面会标明返回字符串的编码。

**有的网页不仅不返回这个编码信息，而且使用gbk编码。这retrofit就默认按照utf-8编码进行读取字符串了。**

具体retrofit的执行流程如下：

```java
//业务代码，同步请求
call.execute();

//跳转 retrofit2.OkHttpCall#execute
 @Override public Response<T> execute() throws IOException {
    okhttp3.Call call;

...（对call的处理）

return parseResponse(call.execute());
  }

```

可以看到对请求是两步处理了：

```java
okhttp3.Response = call.execute()

parseResponse(okhttp3.Response)
```

**我们先分析`call.execute()`**

```java

//okhttp3.RealCall#execute

@Override public Response execute() throws IOException {

...
      Response result = getResponseWithInterceptorChain();
...   
}
  
  //okhttp3.RealCall#getResponseWithInterceptorChain
  Response getResponseWithInterceptorChain() throws IOException {
    // Build a full stack of interceptors.
    ...
    return chain.proceed(originalRequest);
  }
```

后面的`proceed`根据okhttp一开始绑定的拦截器，进行链式处理服务器返回的内容。

```java
public Response proceed(Request request, StreamAllocation streamAllocation, HttpCodec httpCodec,
    
    ...
    
    Interceptor interceptor = interceptors.get(index);
    Response response = interceptor.intercept(next);
    
    ...
 
    return response;
  }
```

我使用了`com.squareup.okhttp3:logging-interceptor:3.4.1`用来打印请求信息，所以会有下面的处理：

```java
//okhttp3.logging.HttpLoggingInterceptor#intercept
@Override public Response intercept(Chain chain) throws IOException {
    
    ...
        
    Charset charset = UTF8;
    MediaType contentType = requestBody.contentType();
    if (contentType != null) {
        charset = contentType.charset(UTF8);
    }
    
    ...
    
    return response;
  }
  
  //okhttp3.MediaType#charset(java.nio.charset.Charset)
    public @Nullable Charset charset(@Nullable Charset defaultValue) {
    try {
      return charset != null ? Charset.forName(charset) : defaultValue;
    } catch (IllegalArgumentException e) {
      return defaultValue; // This charset is invalid or unsupported. Give up.
    }
  }
```

其中`requestBody.contentType()`，就是根据服务器返回的header中的`content-Type`字段（如：content-type: text/html; charset=UTF-8）获取到结果编码。

如果这个字段中没有注明charset，则默认设置为`utf-8`。

关于`contentType()` 函数 如何从content-Type字段获取到编码信息的，可以看下面的源码：

[collapse title="contentType()源码分析" status="false"]  
  ```java
//rawbody.contentType()，跳转到okhttp3.internal.http.RealResponseBody
@Override public MediaType contentType() {
    return contentTypeString != null ? MediaType.parse(contentTypeString) : null;
  }
```

这里的contentTypeString，就是从服务器端返回的字符串，如“content-type: text/html; charset=UTF-8”。

而函数`MediaType.parse(contentTypeString)`返回的结果就是服务器返回数据的编码。

```java

//继续分析 okhttp3.Cache.CacheResponseBody#contentType
   @Override public MediaType contentType() {
      return contentType != null ? MediaType.parse(contentType) : null;
    }
//okhttp3.MediaType#parse
  public static @Nullable MediaType parse(String string) {
    try {
      return get(string);
    } catch (IllegalArgumentException ignored) {
      return null;
    }
  }

//okhttp3.MediaType#get
  public static MediaType get(String string) {
    Matcher typeSubtype = TYPE_SUBTYPE.matcher(string);
  
    ...（这部分不显示了，就是用正则匹配出来contentType字符串里面的charset）
    
    return new MediaType(string, type, subtype, charset);
  }
```
[/collapse]

**至于第二步的parseResponse是根据第一步的处理结果进一步处理（比如根据状态码，不同处理等）**。与问题无关则不再分析。

```java
//retrofit2.OkHttpCall#parseResponse
  Response<T> parseResponse(okhttp3.Response rawResponse) throws IOException {
    ResponseBody rawBody = rawResponse.body();

rawResponse = rawResponse.newBuilder()
        .body(new NoContentResponseBody(rawBody.contentType(), rawBody.contentLength()))
        .build();

...（请求结束的后续处理）
  }  
  ```

按照上面分析，我们只需要在一开始的拦截器，将服务器的`contentTypeString`，修改为编码为`ISO-8859-1`，然后再根据xml文件的`encoding`字段标明的编码重新编码即可。

**但是仍然有一种情况下会出现乱码：**

服务器中`content-type`字段包含utf-8编码信息，按照上面先转ISO-8858-1编码，再转回来，仍然乱码了。

经过debug，发现虽然设置ISO-8858-1的content-type字段，但是结果仍然是utf-8编码，这样的话按照ISO-8858-1解码再UTF-8编码一次就会出现乱码。

**所以为什么结果在未二次编码前不是ISO-8858-1编码，而自动变成了utf-8编码了呢？**

----

这一切要从`parseResponse()`函数，我们前面忽略的部分说起，因为这类编码器正是在这个函数中开始工作的：

```java
  Response<T> parseResponse(okhttp3.Response rawResponse) throws IOException {
    ResponseBody rawBody = rawResponse.body();

// Remove the body's source (the only stateful object) so we can pass the response along.
    rawResponse = rawResponse.newBuilder()
        .body(new NoContentResponseBody(rawBody.contentType(), rawBody.contentLength()))
        .build();

...

ExceptionCatchingResponseBody catchingBody = new ExceptionCatchingResponseBody(rawBody);
    try {
      T body = responseConverter.convert(catchingBody);
      return Response.success(body, rawResponse);
    } catch (RuntimeException e) {
      catchingBody.throwIfCaught();
      throw e;
    }
  }
```

我们使用的`ScalarResponseBodyConverter`太智能了，它能根据服务器数据的字节流判断是否是utf系列编码

```java

//retrofit2.converter.scalars.ScalarResponseBodyConverters.StringResponseBodyConverter#convert
@Override public String convert(ResponseBody value) throws IOException {
      return value.string();
}

//okhttp3.ResponseBody#string
  public final String string() throws IOException {
    BufferedSource source = source();
    try {
      Charset charset = Util.bomAwareCharset(source, charset());
      return source.readString(charset);
    } finally {
      Util.closeQuietly(source);
    }
  }

//okhttp3.internal.Util#bomAwareCharset
  public static Charset bomAwareCharset(BufferedSource source, Charset charset) throws IOException {
    if (source.rangeEquals(0, UTF_8_BOM)) {
      source.skip(UTF_8_BOM.size());
      return UTF_8;
    }
    if (source.rangeEquals(0, UTF_16_BE_BOM)) {
      source.skip(UTF_16_BE_BOM.size());
      return UTF_16_BE;
    }
    if (source.rangeEquals(0, UTF_16_LE_BOM)) {
      source.skip(UTF_16_LE_BOM.size());
      return UTF_16_LE;
    }
    if (source.rangeEquals(0, UTF_32_BE_BOM)) {
      source.skip(UTF_32_BE_BOM.size());
      return UTF_32_BE;
    }
    if (source.rangeEquals(0, UTF_32_LE_BOM)) {
      source.skip(UTF_32_LE_BOM.size());
      return UTF_32_LE;
    }
    return charset;
  }    
```

所以，我们需要在二次编码前判断`response.body()`是否是乱码，如果是，才二次编码，否则就不需要二次处理了。

问题来源

focus 应用中核心的部分是，网络请求订阅的xml文件内容，然后解析xml文件存储到本地数据库。

这里网络请求我使用的是retrofit，返回的类型是String，所以使用的是ScalarsConverterFactory的解析器。

*就会出现中文乱码问题。

解决方法

给okhttp添加拦截器

EncodingInterceptor.java

package com.ihewro.focus.helper;

import com.blankj.ALog;
import com.ihewro.focus.util.StringUtil;

import java.io.IOException;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.net.SocketTimeoutException;
import java.util.logging.Logger;

import okhttp3.Headers;
import okhttp3.Interceptor;
import okhttp3.Request;
import okhttp3.Response;
import okhttp3.ResponseBody;

/**
 * <pre>
 *     time   : 2019/05/23
 *     desc   :
 *     version: 1.0
 * </pre>
 */
public class EncodingInterceptor implements Interceptor {


    /**
     * 自定义编码
     */
    private String encoding;

    public EncodingInterceptor(String encoding) {
        this.encoding = encoding;
    }

    @Override public Response intercept(Interceptor.Chain chain) throws IOException {
        Request request = chain.request();
        Response response = chain.proceed(request);

        settingClientCustomEncoding(response);
        return response;
    }

    /**
     * setting client custom encoding when server not return encoding
     * @param response
     * @throws IOException
     */
    private void settingClientCustomEncoding(Response response) throws IOException {
        setBodyContentType(response);

    }
    

    /**
     * set body contentType
     * @param response
     * @throws IOException
     */
    private void setBodyContentType(Response response) throws IOException {
        ResponseBody body = response.body();
        // setting body contentTypeString using reflect
        Class<? extends ResponseBody> aClass = body.getClass();
        try {
            Field field = aClass.getDeclaredField("contentTypeString");
            field.setAccessible(true);
            String contentTypeString = String.valueOf(field.get(body));
            field.set(body, "application/rss+xml;charset=" + encoding);

        } catch (NoSuchFieldException e) {
            throw new IOException("use reflect to setting header occurred an error", e);
        } catch (IllegalAccessException e) {
            throw new IOException("use reflect to setting header occurred an error", e);
        }
    }

}

然后在okhttp里面拦截一下：

 
 builder.addInterceptor(new EncodingInterceptor("ISO-8859-1"));//全部转换成这个编码

说明一下为什么全部转换成这个编码ISO-8859-1，（对全部转换，听我的，全部听我的）因为这个编码可以后面再无损的转换成gbk或者utf-8。相反gbk和utf-8编码直接是无法转换的。

然后获取到response.body 之后，再进行对内容编码转换。

        //获取xml文件的编码
        String encode = "UTF-8";//默认编码
        String originCode = "ISO-8859-1";
        String temp = xmlStr.substring(0,100);
        Pattern p = Pattern.compile("encoding=\"(.*?)\"");
        Matcher m = p.matcher(temp);
        boolean flag = m.find();//【部分匹配】，返回true or false，而且指针会移动下次匹配的位置
        if (flag){
            int begin = m.start()+10;
            int end = m.end();
            encode = temp.substring(begin,end-1);
            ALog.d("编码："+encode);
        }//否则就是文件没有标明编码格式，按照utf-8进行解码


        //如果文件没有乱码，则不需要转换（为什么需要这部分后面分析会写到）
        if (!java.nio.charset.Charset.forName("GBK").newEncoder().canEncode(xmlStr.substring(0,Math.min(xmlStr.length(),3000)))){
            xmlStr = new String(xmlStr.getBytes(originCode),encode);
        }else {
        }

为什么会乱码

正常的网页返回的时候会有一个返回头信息：

里面会标明返回字符串的编码。

有的网页不仅不返回这个编码信息，而且使用gbk编码。这retrofit就默认按照utf-8编码进行读取字符串了。

具体retrofit的执行流程如下：

//业务代码，同步请求
call.execute();

//跳转 retrofit2.OkHttpCall#execute
 @Override public Response<T> execute() throws IOException {
    okhttp3.Call call;

    ...（对call的处理）

    return parseResponse(call.execute());
  }

可以看到对请求是两步处理了：

okhttp3.Response = call.execute()

parseResponse(okhttp3.Response)

我们先分析call.execute()


//okhttp3.RealCall#execute

@Override public Response execute() throws IOException {

...
      Response result = getResponseWithInterceptorChain();
...   
}
  
  //okhttp3.RealCall#getResponseWithInterceptorChain
  Response getResponseWithInterceptorChain() throws IOException {
    // Build a full stack of interceptors.
    ...
    return chain.proceed(originalRequest);
  }

后面的proceed根据okhttp一开始绑定的拦截器，进行链式处理服务器返回的内容。

public Response proceed(Request request, StreamAllocation streamAllocation, HttpCodec httpCodec,
    
    ...
    
    Interceptor interceptor = interceptors.get(index);
    Response response = interceptor.intercept(next);
    
    ...
 
    return response;
  }

我使用了com.squareup.okhttp3:logging-interceptor:3.4.1用来打印请求信息，所以会有下面的处理：

//okhttp3.logging.HttpLoggingInterceptor#intercept
@Override public Response intercept(Chain chain) throws IOException {
    
    ...
        
    Charset charset = UTF8;
    MediaType contentType = requestBody.contentType();
    if (contentType != null) {
        charset = contentType.charset(UTF8);
    }
    
    ...
    
    return response;
  }
  
  //okhttp3.MediaType#charset(java.nio.charset.Charset)
    public @Nullable Charset charset(@Nullable Charset defaultValue) {
    try {
      return charset != null ? Charset.forName(charset) : defaultValue;
    } catch (IllegalArgumentException e) {
      return defaultValue; // This charset is invalid or unsupported. Give up.
    }
  }

其中requestBody.contentType()，就是根据服务器返回的header中的content-Type字段（如：content-type: text/html; charset=UTF-8）获取到结果编码。

如果这个字段中没有注明charset，则默认设置为utf-8。

关于contentType() 函数如何从content-Type字段获取到编码信息的，可以看下面的源码：

contentType()源码分析

//rawbody.contentType()，跳转到okhttp3.internal.http.RealResponseBody
@Override public MediaType contentType() {
  return contentTypeString != null ? MediaType.parse(contentTypeString) : null;
}

这里的contentTypeString，就是从服务器端返回的字符串，如“content-type: text/html; charset=UTF-8”。

而函数MediaType.parse(contentTypeString)返回的结果就是服务器返回数据的编码。


//继续分析 okhttp3.Cache.CacheResponseBody#contentType
   @Override public MediaType contentType() {
      return contentType != null ? MediaType.parse(contentType) : null;
    }
//okhttp3.MediaType#parse
  public static @Nullable MediaType parse(String string) {
    try {
      return get(string);
    } catch (IllegalArgumentException ignored) {
      return null;
    }
  }

//okhttp3.MediaType#get
  public static MediaType get(String string) {
    Matcher typeSubtype = TYPE_SUBTYPE.matcher(string);
  
    ...（这部分不显示了，就是用正则匹配出来contentType字符串里面的charset）
    
    return new MediaType(string, type, subtype, charset);
  }

至于第二步的parseResponse是根据第一步的处理结果进一步处理（比如根据状态码，不同处理等）。与问题无关则不再分析。

//retrofit2.OkHttpCall#parseResponse
  Response<T> parseResponse(okhttp3.Response rawResponse) throws IOException {
    ResponseBody rawBody = rawResponse.body();

    rawResponse = rawResponse.newBuilder()
        .body(new NoContentResponseBody(rawBody.contentType(), rawBody.contentLength()))
        .build();

    ...（请求结束的后续处理）
  }

按照上面分析，我们只需要在一开始的拦截器，将服务器的contentTypeString，修改为编码为ISO-8859-1，然后再根据xml文件的encoding字段标明的编码重新编码即可。

但是仍然有一种情况下会出现乱码：

服务器中content-type字段包含utf-8编码信息，按照上面先转ISO-8858-1编码，再转回来，仍然乱码了。

经过debug，发现虽然设置ISO-8858-1的content-type字段，但是结果仍然是utf-8编码，这样的话按照ISO-8858-1解码再UTF-8编码一次就会出现乱码。

所以为什么结果在未二次编码前不是ISO-8858-1编码，而自动变成了utf-8编码了呢？

这一切要从parseResponse()函数，我们前面忽略的部分说起，因为这类编码器正是在这个函数中开始工作的：

  Response<T> parseResponse(okhttp3.Response rawResponse) throws IOException {
    ResponseBody rawBody = rawResponse.body();

    // Remove the body's source (the only stateful object) so we can pass the response along.
    rawResponse = rawResponse.newBuilder()
        .body(new NoContentResponseBody(rawBody.contentType(), rawBody.contentLength()))
        .build();

   ...

    ExceptionCatchingResponseBody catchingBody = new ExceptionCatchingResponseBody(rawBody);
    try {
      T body = responseConverter.convert(catchingBody);
      return Response.success(body, rawResponse);
    } catch (RuntimeException e) {
      catchingBody.throwIfCaught();
      throw e;
    }
  }

我们使用的ScalarResponseBodyConverter太智能了，它能根据服务器数据的字节流判断是否是utf系列编码


//retrofit2.converter.scalars.ScalarResponseBodyConverters.StringResponseBodyConverter#convert
@Override public String convert(ResponseBody value) throws IOException {
      return value.string();
}

//okhttp3.ResponseBody#string
  public final String string() throws IOException {
    BufferedSource source = source();
    try {
      Charset charset = Util.bomAwareCharset(source, charset());
      return source.readString(charset);
    } finally {
      Util.closeQuietly(source);
    }
  }
  

//okhttp3.internal.Util#bomAwareCharset
  public static Charset bomAwareCharset(BufferedSource source, Charset charset) throws IOException {
    if (source.rangeEquals(0, UTF_8_BOM)) {
      source.skip(UTF_8_BOM.size());
      return UTF_8;
    }
    if (source.rangeEquals(0, UTF_16_BE_BOM)) {
      source.skip(UTF_16_BE_BOM.size());
      return UTF_16_BE;
    }
    if (source.rangeEquals(0, UTF_16_LE_BOM)) {
      source.skip(UTF_16_LE_BOM.size());
      return UTF_16_LE;
    }
    if (source.rangeEquals(0, UTF_32_BE_BOM)) {
      source.skip(UTF_32_BE_BOM.size());
      return UTF_32_BE;
    }
    if (source.rangeEquals(0, UTF_32_LE_BOM)) {
      source.skip(UTF_32_LE_BOM.size());
      return UTF_32_LE;
    }
    return charset;
  }

所以，我们需要在二次编码前判断response.body()是否是乱码，如果是，才二次编码，否则就不需要二次处理了。

最后修改：2019 年 09 月 08 日

喜欢我的文章吗？
别忘了点赞或赞赏，让我知道创作的路上有你陪伴。

android retrofit 请求返回String数据中文乱码解决方案

问题来源

解决方法

给okhttp添加拦截器

为什么会乱码

1 条评论

发表评论取消回复
使用cookie技术保留您的个人信息以便您下次快速评论，继续评论表示您已同意该条款

handsome —— 一款typecho主题

Focus——不只是RSS订阅器

2019北邮考研我的个人经历

再见，2016

Leaf — A Typecho Theme

不必为难自己

放弃你，从此再无悲伤

想成为夏目一样温暖的人

jsDelivr 事故后的公共CDN故障自动切换思考

正确提问方式

android retrofit 请求返回String数据中文乱码解决方案

问题来源

解决方法

给okhttp添加拦截器

为什么会乱码

1 条评论

发表评论 取消回复 使用cookie技术保留您的个人信息以便您下次快速评论，继续评论表示您已同意该条款

android retrofit 请求返回String数据中文乱码解决方案

发表评论取消回复
使用cookie技术保留您的个人信息以便您下次快速评论，继续评论表示您已同意该条款