How to get a file's encoding in Java (page 2)

Posted by a user on 2022-05-17 11:44

1 answer

Answered by a helpful user on 2023-08-21 12:44

  So with this detector in place, the needs of most projects are usually met.
  If you still want more assurance, you can add a few more detectors, such as
  the ASCIIDetector and UnicodeDetector below.
---------------------------------------------------------------------------*/
        detector.add(JChardetFacade.getInstance()); // requires antlr.jar and chardet.jar
        // ASCIIDetector detects ASCII encoding
        detector.add(ASCIIDetector.getInstance());
        // UnicodeDetector detects the Unicode family of encodings
        detector.add(UnicodeDetector.getInstance());
        java.nio.charset.Charset charset = null;
        File f = new File(path);
        try {
            charset = detector.detectCodepage(f.toURI().toURL());
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        if (charset != null)
            return charset.name();
        else
            return null;
    }

    public static void main(String[] args) throws IOException {
        String path = "J:\\Unicode\\ub.txt";
        // On Windows, a Unicode file is detected as Windows-1252
        String encode = getFileEncode(path);
        if ("Windows-1252".equalsIgnoreCase(encode))
            encode = "Unicode";
        File file = new File(path);
        byte[] b = new byte[3];
        // Read the first three bytes; try-with-resources closes the stream even on error
        try (InputStream ios = new java.io.FileInputStream(file)) {
            ios.read(b);
        }
        // File header: EF BB BF (the UTF-8 byte order mark) as signed bytes
        if (b[0] == -17 && b[1] == -69 && b[2] == -65)
            System.out.println(file.getName() + ": encoding is UTF-8");
        else
            System.out.println(file.getName() + ": may be GBK, or some other encoding.");
        try (BufferedReader bufferedReader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), encode))) {
            System.out.println(encode);
            String line = bufferedReader.readLine();
            // Strip the file header from the first line only if the decoder left one there
            if (line != null && line.startsWith("\uFEFF"))
                line = line.substring(1);
            System.out.println(line);
        }
    }
}
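The three-byte comparison in main (-17, -69, -65) is the UTF-8 byte order mark EF BB BF written as signed Java bytes. A minimal JDK-only sketch of that check, extended to the two-byte UTF-16 marks, might look like the following. `BomSniffer` and its methods are hypothetical names introduced here for illustration; they are not part of cpdetector, and a file without a byte order mark simply yields null.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper (not part of cpdetector): guesses a charset from a byte order mark. */
final class BomSniffer {

    /** Returns the charset name implied by a leading BOM, or null if none is recognised. */
    static String detectBom(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        // Limitation: FF FE 00 00 is the UTF-32LE mark; this sketch does not tell the two apart.
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    /** Reads up to three bytes from the start of the file and checks them for a BOM. */
    static String detectBom(Path file) throws IOException {
        byte[] head = new byte[3];
        int filled = 0;
        try (InputStream in = Files.newInputStream(file)) {
            // read() may return fewer bytes than requested, so loop until full or end of file
            while (filled < head.length) {
                int n = in.read(head, filled, head.length - filled);
                if (n < 0) break;
                filled += n;
            }
        }
        return detectBom(Arrays.copyOf(head, filled));
    }

    /** Compares unsigned byte values, which avoids negative literals such as -17. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

Masking with `& 0xFF` lets the code compare against the familiar hexadecimal values instead of their signed equivalents.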
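The detector used in getFileEncode above can detect not only the encoding of a file but also the encoding of any input text stream. To do so, call its overloaded form:

    charset = detector.detectCodepage(InputStream in, int length);

The number of bytes to examine (length) is chosen by the programmer. The more bytes, the more accurate the result, although detection also takes longer. Note that the specified number of bytes must not exceed the total length of the text stream.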
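To make the idea of examining only the first length bytes of a stream concrete, here is a minimal JDK-only sketch. It is not cpdetector's algorithm: it swaps in a much simpler technique, strict UTF-8 validation of a stream prefix, and `StreamPrefixCheck` and `looksLikeUtf8` are hypothetical names. It also assumes the stream supports mark/reset so the caller can re-read it from the beginning afterwards.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a length-bounded check; not part of cpdetector. */
final class StreamPrefixCheck {

    /**
     * Reads at most `length` bytes from `in`, reports whether they are valid UTF-8,
     * then rewinds the stream so the caller can read it from the start.
     */
    static boolean looksLikeUtf8(InputStream in, int length) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(length);
        byte[] buf = new byte[length];
        int filled = 0;
        // Stop early if the stream is shorter than the requested length
        while (filled < length) {
            int n = in.read(buf, filled, length - filled);
            if (n < 0) break;
            filled += n;
        }
        in.reset();

        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer src = ByteBuffer.wrap(buf, 0, filled);
        // UTF-8 never decodes to more chars than it has bytes
        CharBuffer dst = CharBuffer.allocate(Math.max(1, filled));
        // endOfInput = false: a multi-byte sequence cut off by the limit is not an error
        CoderResult result = decoder.decode(src, dst, false);
        return !result.isError();
    }
}
```

A plain FileInputStream does not support mark/reset, so a caller would typically wrap it in a BufferedInputStream first. A small length keeps the check fast, but it can miss invalid bytes that appear later in the stream, which is the same accuracy-versus-time trade-off described above.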