For example:
'赵','1234'
'钱','2345''孙','3456'
'李','4567''周','56
78'
……
This is the content of my file. It holds five records in total:
赵 1234
钱 2345
孙 3456
李 4567
周 5678
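However, the format in the file is irregular, as shown above:
there is no separator between some records, e.g. the second and the third;
a field of a record may also be split across two lines, e.g. the fifth.
I want to read these records correctly and insert them into a database. How should I write the reading part?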
[code="java"]
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class Reader {
    public static void main(String[] args) throws IOException {
        // Read all physical lines (FileReader uses the platform default charset).
        // The backslash in "D:\\input.txt" must be escaped in a Java string literal.
        List<String> list = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new FileReader("D:\\input.txt"))) {
            String line;
            while ((line = br.readLine()) != null) { // readLine() returns null at EOF
                list.add(line);
            }
        }

        List<Bean> result = new ArrayList<>();
        Bean bean = new Bean();
        int count = 0;          // single quotes seen so far; odd means inside a value
        boolean addname = true; // true: filling the name, false: filling the code
        for (String line : list) {
            int index;
            while ((index = line.indexOf('\'')) != -1) {
                count++;
                if (count % 2 == 0) {
                    // Closing quote: the text before it completes the current field.
                    String tmp = line.substring(0, index);
                    if (addname) {
                        bean.name = (bean.name == null) ? tmp : bean.name + tmp;
                        addname = false;
                    } else {
                        bean.code = (bean.code == null) ? tmp : bean.code + tmp;
                        addname = true;
                        result.add(bean); // name and code complete: one record
                        bean = new Bean();
                    }
                }
                line = line.substring(index + 1); // continue after the quote
            }
            // Still inside quotes at end of line: the value continues on the next line.
            // (Text outside quotes, such as a trailing comma, is ignored.)
            String rest = line.trim();
            if (count % 2 == 1 && !rest.isEmpty()) {
                if (addname) {
                    bean.name = (bean.name == null) ? rest : bean.name + rest;
                } else {
                    bean.code = (bean.code == null) ? rest : bean.code + rest;
                }
            }
        }
        for (Bean b : result) {
            System.out.println(b.name + "\t" + b.code);
        }
    }
}

class Bean {
    public String name;
    public String code;
}
[/code]
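Here is one implementation; see whether it works for you.
Parsing like this is always case-specific. There is no truly general method; you handle each irregularity as you run into it.
I have handled the cases you mentioned. Get back to me if there are problems.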
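The quote-counting idea in the first answer can also be written as a small character-level state machine: join all lines, flip an `inQuote` flag at every single quote, keep only the characters between an opening and a closing quote as one value, then pair the values up as name/code. This is a sketch, assuming no value contains a single quote itself and every record has exactly two values; the class name `QuoteScanner` and method `parse` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteScanner {
    // Parses lines such as '赵','1234''钱','2345' into {name, code} pairs.
    // Assumption: values contain no single quotes and each record has two values.
    public static List<String[]> parse(List<String> lines) {
        List<String> values = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuote = false;
        // Joining without a separator glues a value split across lines ('56 + 78').
        for (char c : String.join("", lines).toCharArray()) {
            if (c == '\'') {
                if (inQuote) {
                    values.add(current.toString()); // closing quote ends a value
                    current.setLength(0);
                }
                inQuote = !inQuote;
            } else if (inQuote) {
                current.append(c); // only characters inside quotes are kept
            }
        }
        // Pair consecutive values as {name, code}.
        List<String[]> records = new ArrayList<>();
        for (int i = 0; i + 1 < values.size(); i += 2) {
            records.add(new String[] {values.get(i), values.get(i + 1)});
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("'赵','1234'", "'钱','2345''孙','3456'",
                "'李','4567''周','56", "78'");
        for (String[] r : parse(lines)) {
            System.out.println(r[0] + "\t" + r[1]);
        }
    }
}
```

With the sample from the question this prints the five records 赵 1234 through 周 5678, one per line. Commas and anything else outside quotes are simply skipped, so missing separators between records do not matter.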
[code="java"]
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

public class ReadTxt {
    public static void readFileByChars(String fileName) {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes both streams, even when an exception is thrown.
        try (Reader reader = new InputStreamReader(new FileInputStream(fileName));
             Writer writer = new OutputStreamWriter(new FileOutputStream("D:\\names1.txt"))) {
            int tempchar;
            while ((tempchar = reader.read()) != -1) {
                // Skip line breaks so a value split over two lines is joined again.
                if (tempchar != '\r' && tempchar != '\n') {
                    sb.append((char) tempchar);
                }
            }
            System.out.println(sb);
            // Records written back to back ('...''...') or separated by a space get a comma.
            String txtContent = sb.toString().replace("''", "','");
            txtContent = txtContent.replace("' '", "','");
            // Remove spaces and quotes, leaving name,code,name,code,...
            txtContent = txtContent.replace(" ", "");
            txtContent = txtContent.replace("'", "");
            String[] names = txtContent.split(",");
            for (int i = 0; i < names.length; i++) {
                if (i % 2 == 0) {
                    System.out.print(names[i] + "\t"); // name
                } else {
                    System.out.println(names[i]);      // code
                }
            }
            writer.write(txtContent);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        ReadTxt.readFileByChars("D:\\names.txt");
    }
}
[/code]
Read the content into a string first, then:
[code="java"]
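import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.regex.Pattern;

// Class and method names are illustrative; dat is the whole file content.
public class TokenParse {
    public static void print(String dat) {
        // Drop line breaks (rejoins '56 + 78'), commas and spaces; only quotes remain.
        String cleaned = dat.replaceAll("[\r\n]", "").replace(",", "").replace(" ", "");
        StringTokenizer s = new StringTokenizer(cleaned, "'");
        Map<String, String> map = new LinkedHashMap<>(); // keeps the file order
        String key = "";
        while (s.hasMoreTokens()) {
            String data = s.nextToken();
            if (Pattern.matches("[\u4E00-\u9FA5]", data)) {
                key = data; // a name (one Chinese character)
            } else if (map.containsKey(key)) {
                map.put(key, map.get(key) + " " + data); // same name again: append its code
            } else {
                map.put(key, data);
            }
        }
        for (Map.Entry<String, String> entry : map.entrySet()) {
            System.out.println("key = " + entry.getKey() + ", value = " + entry.getValue());
        }
    }
}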
[/code]
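Modify it a little: just delete the code that writes the file.
All the results are printed for you; adjust it to whatever you need and you're done!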
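Following the suggestion to drop the file-writing code, the replace-and-split steps of `ReadTxt` can be kept as a method that returns the name/code pairs, which is a more convenient shape for a later database insert. A minimal sketch, assuming the whole file has already been read into one string; `NormalizeSplit` and `toPairs` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class NormalizeSplit {
    // Same normalization as ReadTxt, but returns {name, code} pairs instead of writing a file.
    public static List<String[]> toPairs(String content) {
        String s = content.replace("\r", "").replace("\n", ""); // rejoin split values
        s = s.replace("''", "','");   // back-to-back records: '...''...' -> '...','...'
        s = s.replace("' '", "','");  // records separated only by a space
        s = s.replace(" ", "").replace("'", ""); // leaves name,code,name,code,...
        String[] fields = s.split(",");
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i + 1 < fields.length; i += 2) {
            pairs.add(new String[] {fields[i], fields[i + 1]});
        }
        return pairs;
    }

    public static void main(String[] args) {
        String content = "'赵','1234'\n'钱','2345''孙','3456'\n'李','4567''周','56\n78'";
        for (String[] p : toPairs(content)) {
            System.out.println(p[0] + "\t" + p[1]);
        }
    }
}
```

Like the original, this relies on names and codes never containing commas or spaces of their own.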
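One thing needs changing.
If each name is a single Chinese character,
if (Pattern.matches("[\u4E00-\u9FA5]", data)) {
works as is.
If names can be several characters long, the regex has to change,
and the condition becomes
if (Pattern.matches("[\u4E00-\u9FA5]+", data)) {
instead.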
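The difference between the two patterns is easy to check: `Pattern.matches` must match the entire token, so the version without `+` accepts exactly one character in the range U+4E00 to U+9FA5, while the `+` version accepts one or more. A small demonstration; the class name `NameRegex` and the helper methods are illustrative.

```java
import java.util.regex.Pattern;

public class NameRegex {
    // Exactly one CJK character in U+4E00..U+9FA5.
    static final String SINGLE = "[\u4E00-\u9FA5]";
    // One or more such characters, for multi-character names.
    static final String MULTI = "[\u4E00-\u9FA5]+";

    public static boolean isSingle(String token) {
        return Pattern.matches(SINGLE, token);
    }

    public static boolean isMulti(String token) {
        return Pattern.matches(MULTI, token);
    }

    public static void main(String[] args) {
        String[] tokens = {"赵", "欧阳", "1234"};
        for (String t : tokens) {
            // 赵: true/true, 欧阳: false/true, 1234: false/false
            System.out.println(t + "\tsingle=" + isSingle(t) + "\tmulti=" + isMulti(t));
        }
    }
}
```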
'赵','123456','12''钱','123
456','23'
Going by position, 23 would be read as a name, right?
Then what should its code be? Can it just be left empty?
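If every record in this example really has three values, the quoted values can be collected in order and grouped by a fixed field count instead of assuming name/code pairs. With three fields per record, `23` becomes the third value of the second record rather than a name, and a missing trailing value is left empty. A sketch, assuming values never contain a single quote; `FixedFieldGrouper`, `group` and the `fieldsPerRecord` parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FixedFieldGrouper {
    // One quoted value; [^']* also accepts an empty value ''.
    private static final Pattern QUOTED = Pattern.compile("'([^']*)'");

    public static List<String[]> group(String text, int fieldsPerRecord) {
        // Remove line breaks so '123 + 456' becomes '123456'.
        Matcher m = QUOTED.matcher(text.replaceAll("[\r\n]", ""));
        List<String> values = new ArrayList<>();
        while (m.find()) {
            values.add(m.group(1));
        }
        List<String[]> records = new ArrayList<>();
        for (int i = 0; i < values.size(); i += fieldsPerRecord) {
            String[] rec = new String[fieldsPerRecord];
            for (int j = 0; j < fieldsPerRecord; j++) {
                // Pad an incomplete last record with empty strings.
                rec[j] = i + j < values.size() ? values.get(i + j) : "";
            }
            records.add(rec);
        }
        return records;
    }

    public static void main(String[] args) {
        String text = "'赵','123456','12''钱','123\n456','23'";
        for (String[] r : group(text, 3)) {
            System.out.println(String.join("\t", r));
        }
    }
}
```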
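Assumptions:
every record has single quotes around its values, whether or not a value is empty;
the two values of each record are separated by a comma.
Then the approach can be:
character stream readLine() --> StringBuffer
and then just process the StringBuffer.
Not sure whether this is what you mean.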
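The readLine() into StringBuffer approach can be sketched without counting quotes by hand: once all lines are appended into one buffer, splitting on the single quote puts every quoted value at an odd index, which also handles multi-character names and codes of any length. Assumptions: every value is quoted, quotes are balanced, no value contains a quote, and records are name/code pairs. `BufferParse` and `read` are hypothetical names; taking a `BufferedReader` lets the input come from a file (`new BufferedReader(new FileReader("name.txt"))`) or from a string.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class BufferParse {
    public static List<String[]> read(BufferedReader in) throws IOException {
        // Step 1: readLine() -> StringBuffer; dropping line breaks rejoins split values.
        StringBuffer sb = new StringBuffer();
        String line;
        while ((line = in.readLine()) != null) {
            sb.append(line);
        }
        // Step 2: split on the quote (limit -1 keeps trailing empty parts);
        // with balanced quotes the values sit at odd indexes.
        String[] parts = sb.toString().split("'", -1);
        List<String> values = new ArrayList<>();
        for (int i = 1; i < parts.length; i += 2) {
            values.add(parts[i]);
        }
        // Step 3: pair the values as {name, code}.
        List<String[]> records = new ArrayList<>();
        for (int i = 0; i + 1 < values.size(); i += 2) {
            records.add(new String[] {values.get(i), values.get(i + 1)});
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        String sample = "'赵','1234'\n'钱','2345''孙','3456'\n'李','4567''周','56\n78'";
        try (BufferedReader in = new BufferedReader(new StringReader(sample))) {
            for (String[] r : read(in)) {
                System.out.println(r[0] + "\t" + r[1]);
            }
        }
    }
}
```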
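[code="java"]
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Class name added so the snippet compiles on its own.
public class ReadName {
    public static void main(String[] args) throws IOException {
        File f = new File("name.txt");
        StringBuffer sb = new StringBuffer();
        // readLine() drops the line break, so '56 + 78' is joined into '5678'.
        try (BufferedReader fis = new BufferedReader(new FileReader(f))) {
            String line;
            while ((line = fis.readLine()) != null) {
                sb.append(line);
            }
        }
        System.out.println(sb);
        // Assumes each name is one Chinese character and each code is exactly four digits.
        Pattern p = Pattern.compile("[\u4E00-\u9FA5]|[0-9]{4}");
        Matcher m = p.matcher(sb);
        int i = 0;
        while (m.find()) {
            System.out.print(m.group() + "\t");
            i++;
            if (i % 2 == 0) {
                System.out.println(); // a name and its code form one output line
            }
        }
    }
}
[/code]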