一长串读取目录下json文件，合并去重并保存一个json，另外转txt文本，为啥出错了？

# 对机器人的代码要求

我的json文件数据结构全部是复杂数据存储结构，将指定目录storage/emulated/0/下载/Document/下的所有 JSON 文件筛选出文件大小不超过 700MB 的文件。
将筛选后的 JSON 文件合并为一个文件，并去除重复数据。
将合并后的 JSON 文件保存到指定路径 /storage/emulated/0/0_培音/z.json。
保存的文件json如果文件名存在，我会给文件名加上时间戳，以确保文件名唯一性。
将保存的 JSON 文件，以对话或问答形式保存为一个txt文本文件。
保存的文本文件名，我会给文件名加上时间戳，以确保文件名唯一性。
每10个问答对话（包含完整问话与答案的，只有问题不存在答案，或者反过来的，只有答案没有提问的，不算一个完整问答对话，保留）为一章节，每200章节为一部分，格式为
第X章

[问题]

[答案]

[章节末尾时间戳]：……

txt文件末尾和开头统计我有多少章节，问答，总提问数，和总回答数，有效提问，和有效答案，部分，字数，总字数，提问总字数，答案总字数… 等

并以递归方式列出其基本信息。



根据您的要求，我将执行以下操作：

1. 从指定目录`storage/emulated/0/下载/Document/`下筛选出文件大小不超过 700MB 的所有 JSON 文件。
2. 将筛选后的 JSON 文件合并为一个文件，并去除重复数据。
3. 如果需要，我会为保存的 JSON 文件名添加时间戳，以确保文件名的唯一性，并将其保存到指定路径`/storage/emulated/0/0_培音/z.json`。
4. 将保存的 JSON 文件以对话或问答形式保存为一个 txt 文本文件。
5. 如果需要，我会为保存的文本文件名添加时间戳，以确保文件名的唯一性。
6. 每 10 个包含完整问答对话的章节（问题与答案都存在），作为一章节。每 200 章节作为一部分进行划分。格式如下：
   第 X 章
   
   [问题]
   
   [答案]
   
   [章节末尾时间戳]：……
   
7. 在 txt 文件的开头和末尾统计章节数、问答对数、总提问数、总回答数、有效提问数、有效答案数、部分数、字数、总字数、提问总字数、答案总字数等信息。
8. 最后，打开保存的文件夹。

请确认您想要执行以上操作，然后我将开始处理。

# 机器人的回答：请确定您的代码缩进是否正确。根据您的回答，以下是示例代码的修正：

import os
import json
from datetime import datetime

def filter_json_files(directory, max_size):
    # 获取目录下所有以 .json 结尾且文件大小不超过 max_size 的文件
    json_files = []
    for file in os.listdir(directory):
        file_path = os.path.join(directory, file)
        if file.endswith(".json") and os.path.getsize(file_path) <= max_size:
            json_files.append(file_path)
    return json_files

def merge_json_files(json_files):
    merged_data = {}

    # 遍历每个 JSON 文件，将其数据合并到 merged_data 字典中
    for file in json_files:
        with open(file, "r") as f:
            data = json.load(f)
            merge_data(merged_data, data)

    return merged_data

def merge_data(target, source):
    # 合并数据的辅助函数
    for key, value in source.items():
        if key not in target:
            target[key] = value
        else:
            if isinstance(value, dict) and isinstance(target[key], dict):
                merge_data(target[key], value)
            elif isinstance(value, list) and isinstance(target[key], list):
                target[key].extend(value)
            elif value != target[key]:
                target[key] = value

def save_merged_json(data, output_file):
    # 将合并后的数据保存为 JSON 文件
    with open(output_file, "w") as f:
        json.dump(data, f)

def convert_data_to_text(data, file, indent=0):
    # 将数据转换为文本格式的辅助函数
    if isinstance(data, dict):  # 如果是字典类型
        for key, value in data.items():
            file.write(f"[{key}]\n")  # 写入键名
            convert_data_to_text(value, file, indent+1)  # 递归处理值
            file.write("\n")  # 写入空行
    elif isinstance(data, list):  # 如果是列表类型
        for item in data:
            convert_data_to_text(item, file, indent+1)  # 递归处理列表项
            file.write("\n")  # 写入空行
    else:  # 其他类型，直接写入
        file.write(f"{' ' * (indent+1)}{data}\n")

def format_chapter(chapter_number, questions, answers):
    # 格式化章节内容
    formatted_chapter = f"第 {chapter_number} 章\n\n"
    
    for question, answer in zip(questions, answers):
        formatted_chapter += f"[问题]\n{question}\n\n"
        formatted_chapter += f"[答案]\n{answer}\n\n"
        
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    formatted_chapter += f"[章节末尾时间戳]: {timestamp}\n\n"
    
    return formatted_chapter

def save_as_txt_file(chapters, output_file):
    # 将章节保存为 txt 文本文件
    with open(output_file, "w") as f:
        chapter_count = len(chapters)
        question_count = sum(len(chapter["questions"]) for chapter in chapters)
        answer_count = sum(len(chapter["answers"]) for chapter in chapters)
        valid_question_count = sum(1 for chapter in chapters for question in chapter["questions"] if question)
        valid_answer_count = sum(1 for chapter in chapters for answer in chapter["answers"] if answer)
        text_count = sum(len(chapter["formatted_chapter"]) for chapter in chapters)
        
        f.write(f"章节数: {chapter_count}\n")
        f.write(f"问答对数: {question_count}\n")
        f.write(f"总提问数: {valid_question_count}\n")
        f.write(f"总回答数: {valid_answer_count}\n")
        f.write(f"有效提问数: {valid_question_count}\n")
        f.write(f"有效答案数: {valid_answer_count}\n")
        f.write(f"部分数: {chapter_count}\n")
        f.write(f"字数: {text_count}\n")
        f.write(f"总字数: {text_count}\n")
        f.write(f"提问总字数: {question_count}\n")
        f.write(f"答案总字数: {answer_count}\n\n")
        
        for chapter in chapters:
            f.write(chapter["formatted_chapter"])
                

# 指定目录路径和文件大小限制（单位：字节）
directory = "storage/emulated/0/下载/Document/"
max_file_size = 700 * 1024 * 1024

# 1. 筛选出符合条件的 JSON 文件
filtered_files = filter_json_files(directory, max_file_size)

if not filtered_files:
    print("没有符合条件的 JSON 文件！")
else:
    # 2. 合并 JSON 文件
    merged_data = merge_json_files(filtered_files)

    # 3. 添加时间戳并保存合并后的数据为 JSON 文件
    timestamp = datetime.now().strftime("%Y%m%d%H%M%S")
    json_output_file = f"storage/emulated/0/0_培音/z_{timestamp}.json"
    save_merged_json(merged_data, json_output_file)

    # 4. 将保存的 JSON 文件保存为 txt 文本文件
    chapters = []
    chapter_number = 1
    questions = []
    answers = []
    
    for item in merged_data:
        if "问题" in item or "答案" in item:  # 修改此行缩进
            if questions and answers:
                formatted_chapter = format_chapter(chapter_number, questions, answers)
                chapters.append({
                    "formatted_chapter": formatted_chapter,
                    "questions": questions,
                    "answers": answers
                })
                chapter_number += 1
                questions = []
                answers = []

            if "问题" in item:  # 修改此行缩进
                question = merged_data[item]
                questions.append(question)
            elif "答案" in item:  # 修改此行缩进
                answer = merged_data[item]
                answers.append(answer)

    if questions and answers:
        formatted_chapter = format_chapter(chapter_number, questions, answers)
        chapters.append({
            "formatted_chapter": formatted_chapter,
            "questions": questions,
            "answers": answers
        })

    txt_output_file = f"storage/emulated/0/0_培音/z_{timestamp}.txt"
    save_as_txt_file(chapters, txt_output_file)

    # 完成消息输出
    print("文件合并完成！")
    print(f"合并后的 JSON 文件保存路径：{json_output_file}")
    print(f"合并后的 txt 文件保存路径：{txt_output_file}")

这有个类似的问题, 你可以参考下: https://ask.csdn.net/questions/793222
这篇博客也不错, 你可以看下目标检测数据集转换 json文件转换为txt文件格式
除此之外, 这篇博客: 前端 json数据转txt文本并下载(前端自定义封装数据并下载文件)中的 前端 json数据转txt文本并下载 部分也许能够解决你的问题, 你可以仔细阅读以下内容或跳转源博客中阅读:

在工作中,有一次客户要求将图形化的数据转化为txt文件下载

本例中也可前端自定义封装数据并下载文件

其后端传来的数据为json格式,需要前端转化为text数据的形式,并且以列排列,提供下载

直接上代码(可以直接拿来使用的)

//这个一个a标签,用于点击下载
<a style='color: #0697B6' onclick='downloadTxt()' href='javascript:void(0);'>Download file</a>

//downloadTitle  文件标题
//downloadData  文件的数据(正文)
var downloadTitle=null;
var downloadData==new Array();
downloadData.push(******);
function downloadTxt(){
        startDownloadTxt(文件名,downloadTitle,downloadData);
}

function startDownloadTxt(fileName,title,data){

    function StringBuilder() {
        console.log(this);
        this._stringArray = new Array();
    }
    StringBuilder.prototype.append = function(str){
        this._stringArray.push(str);
    }
    StringBuilder.prototype.toString = function(joinGap){
        return this._stringArray.join(joinGap);
    }
    var txtContent = new StringBuilder();
    if(title!=null){
        txtContent.append(title);
    }
    for (let str of data) {
        txtContent.append(str);
    }
    var blob = new Blob([txtContent.toString("\n")], {
        type: "text/plain;charset=utf-8"
    });
    saveAs(blob, fileName);
}

这是一个js文件(直接引入FileSaver.js)

/*
* A saveAs() FileSaver implementation.
*
* By Eli Grey, http://eligrey.com
*
*/

// The one and only way of getting global scope in all environments
// https://stackoverflow.com/q/3277182/1008999
var _global = typeof window === 'object' && window.window === window
  ? window : typeof self === 'object' && self.self === self
  ? self : typeof global === 'object' && global.global === global
  ? global
  : this

function bom (blob, opts) {
  if (typeof opts === 'undefined') opts = { autoBom: false }
  else if (typeof opts !== 'object') {
    console.warn('Deprecated: Expected third argument to be a object')
    opts = { autoBom: !opts }
  }

  // prepend BOM for UTF-8 XML and text/* types (including HTML)
  // note: your browser will automatically convert UTF-16 U+FEFF to EF BB BF
  if (opts.autoBom && /^\s*(?:text\/\S*|application\/xml|\S*\/\S*\+xml)\s*;.*charset\s*=\s*utf-8/i.test(blob.type)) {
    return new Blob([String.fromCharCode(0xFEFF), blob], { type: blob.type })
  }
  return blob
}

function download (url, name, opts) {
  var xhr = new XMLHttpRequest()
  xhr.open('GET', url)
  xhr.responseType = 'blob'
  xhr.onload = function () {
    saveAs(xhr.response, name, opts)
  }
  xhr.onerror = function () {
    console.error('could not download file')
  }
  xhr.send()
}

function corsEnabled (url) {
  var xhr = new XMLHttpRequest()
  // use sync to avoid popup blocker
  xhr.open('HEAD', url, false)
  try {
    xhr.send()
  } catch (e) {}
  return xhr.status >= 200 && xhr.status <= 299
}

// `a.click()` doesn't work for all browsers (#465)
function click (node) {
  try {
    node.dispatchEvent(new MouseEvent('click'))
  } catch (e) {
    var evt = document.createEvent('MouseEvents')
    evt.initMouseEvent('click', true, true, window, 0, 0, 0, 80,
                          20, false, false, false, false, 0, null)
    node.dispatchEvent(evt)
  }
}

var saveAs = _global.saveAs || (
  // probably in some web worker
  (typeof window !== 'object' || window !== _global)
    ? function saveAs () { /* noop */ }

  // Use download attribute first if possible (#193 Lumia mobile)
  : 'download' in HTMLAnchorElement.prototype
  ? function saveAs (blob, name, opts) {
    var URL = _global.URL || _global.webkitURL
    var a = document.createElement('a')
    name = name || blob.name || 'download'

    a.download = name
    a.rel = 'noopener' // tabnabbing

    // TODO: detect chrome extensions & packaged apps
    // a.target = '_blank'

    if (typeof blob === 'string') {
      // Support regular links
      a.href = blob
      if (a.origin !== location.origin) {
        corsEnabled(a.href)
          ? download(blob, name, opts)
          : click(a, a.target = '_blank')
      } else {
        click(a)
      }
    } else {
      // Support blobs
      a.href = URL.createObjectURL(blob)
      setTimeout(function () { URL.revokeObjectURL(a.href) }, 4E4) // 40s
      setTimeout(function () { click(a) }, 0)
    }
  }

  // Use msSaveOrOpenBlob as a second approach
  : 'msSaveOrOpenBlob' in navigator
  ? function saveAs (blob, name, opts) {
    name = name || blob.name || 'download'

    if (typeof blob === 'string') {
      if (corsEnabled(blob)) {
        download(blob, name, opts)
      } else {
        var a = document.createElement('a')
        a.href = blob
        a.target = '_blank'
        setTimeout(function () { click(a) })
      }
    } else {
      navigator.msSaveOrOpenBlob(bom(blob, opts), name)
    }
  }

  // Fallback to using FileReader and a popup
  : function saveAs (blob, name, opts, popup) {
    // Open a popup immediately do go around popup blocker
    // Mostly only available on user interaction and the fileReader is async so...
    popup = popup || open('', '_blank')
    if (popup) {
      popup.document.title =
      popup.document.body.innerText = 'downloading...'
    }

    if (typeof blob === 'string') return download(blob, name, opts)

    var force = blob.type === 'application/octet-stream'
    var isSafari = /constructor/i.test(_global.HTMLElement) || _global.safari
    var isChromeIOS = /CriOS\/[\d]+/.test(navigator.userAgent)

    if ((isChromeIOS || (force && isSafari)) && typeof FileReader === 'object') {
      // Safari doesn't allow downloading of blob URLs
      var reader = new FileReader()
      reader.onloadend = function () {
        var url = reader.result
        url = isChromeIOS ? url : url.replace(/^data:[^;]*;/, 'data:attachment/file;')
        if (popup) popup.location.href = url
        else location = url
        popup = null // reverse-tabnabbing #460
      }
      reader.readAsDataURL(blob)
    } else {
      var URL = _global.URL || _global.webkitURL
      var url = URL.createObjectURL(blob)
      if (popup) popup.location = url
      else location.href = url
      popup = null // reverse-tabnabbing #460
      setTimeout(function () { URL.revokeObjectURL(url) }, 4E4) // 40s
    }
  }
)

_global.saveAs = saveAs.saveAs = saveAs

if (typeof module !== 'undefined') {
  module.exports = saveAs;
}

最后的效果图

在这里插入图片描述

以下回答由chatgpt基于相关博客总结生成:
在处理读取目录下的json文件并合并去重的过程中，可能会遇到以下问题：
1. 文件路径错误：读取文件需要提供正确的文件路径。如果文件路径错误，将无法读取到文件内容。可以通过使用绝对路径或相对路径来解决此问题。
2. 文件格式错误：如果读取的文件不是合法的json格式，会导致读取失败。可以通过校验json格式或捕获解析异常来处理此问题。
3. 文件编码问题：如果文件使用的编码格式与读取程序不一致，会导致乱码或无法读取内容。可以通过指定文件编码格式来解决此问题。
4. 文件权限问题：如果没有足够的权限读取文件，会导致读取失败。可以通过更改文件权限或使用管理员权限运行程序来解决此问题。
5. 文件不存在的问题：如果指定的文件不存在，会导致读取失败。可以先判断文件是否存在，再进行读取操作。
具体的解决方案取决于具体的情况和代码实现。请提供更多错误信息或代码示例，以便更准确地找到解决方案。