• Resolved creactivemind

    (@creactivemind)


    Hi, thanks for the plugin and the updates again.

    I’m learning how to fine-tune ChatGPT using the instructions, but I’m wondering whether the JSONL file always has to contain “prompt” and “completion” fields.

    I’ve created a JSON Lines file that looks like the following.

    Would something like this also be understood by ChatGPT? Thank you.

    {"Date":"2021-01-31","Total Gains":-981.6,"Agricultural":-14.2,"Manufacturing":-45.9,"Construction":-19.5,"Retail and Wholesale":-218.4,"Warehouse and Transportation":29.6,"Restaurants and Lodging":-367.1}
    {"Date":"2021-02-28","Total Gains":-473.2,"Agricultural":32.7,"Manufacturing":-26.5,"Construction":28.1,"Retail and Wholesale":-193.9,"Warehouse and Transportation":25.3,"Restaurants and Lodging":-231.8}
    {"Date":"2021-03-31","Total Gains":313.6,"Agricultural":-25.7,"Manufacturing":-11.4,"Construction":91.7,"Retail and Wholesale":-167.7,"Warehouse and Transportation":72,"Restaurants and Lodging":-27.9}
    {"Date":"2021-04-30","Total Gains":651.5,"Agricultural":-3,"Manufacturing":8.9,"Construction":140.8,"Retail and Wholesale":-182.3,"Warehouse and Transportation":106.6,"Restaurants and Lodging":61.4}
    {"Date":"2021-05-31","Total Gains":619.4,"Agricultural":-3.1,"Manufacturing":19.4,"Construction":131.6,"Retail and Wholesale":-135.5,"Warehouse and Transportation":90.7,"Restaurants and Lodging":3.5}
    {"Date":"2021-06-30","Total Gains":582,"Agricultural":15.7,"Manufacturing":-10.2,"Construction":139.8,"Retail and Wholesale":-164.2,"Warehouse and Transportation":89.4,"Restaurants and Lodging":11.8}
    {"Date":"2021-07-31","Total Gains":542.4,"Agricultural":11.5,"Manufacturing":6.2,"Construction":92.4,"Retail and Wholesale":-185.9,"Warehouse and Transportation":120.8,"Restaurants and Lodging":-11.9}
    {"Date":"2021-08-31","Total Gains":518,"Agricultural":36.6,"Manufacturing":-75.6,"Construction":123.3,"Retail and Wholesale":-113.4,"Warehouse and Transportation":107.1,"Restaurants and Lodging":-38.2}
    {"Date":"2021-09-30","Total Gains":671,"Agricultural":22.2,"Manufacturing":-36.5,"Construction":57,"Retail and Wholesale":-121.8,"Warehouse and Transportation":162.9,"Restaurants and Lodging":38.7}
    {"Date":"2021-10-31","Total Gains":652.3,"Agricultural":20,"Manufacturing":-12.5,"Construction":52,"Retail and Wholesale":-113.3,"Warehouse and Transportation":162.5,"Restaurants and Lodging":21.5}
    {"Date":"2021-11-30","Total Gains":553.3,"Agricultural":30.7,"Manufacturing":50.7,"Construction":15.9,"Retail and Wholesale":-123,"Warehouse and Transportation":147.5,"Restaurants and Lodging":-85.9}
    {"Date":"2021-12-31","Total Gains":772.8,"Agricultural":35,"Manufacturing":36.9,"Construction":39.6,"Retail and Wholesale":-80,"Warehouse and Transportation":126.7,"Restaurants and Lodging":66.3}
    {"Date":"2022-01-31","Total Gains":1134.5,"Agricultural":87.8,"Manufacturing":65.5,"Construction":100.1,"Retail and Wholesale":-55.6,"Warehouse and Transportation":120.7,"Restaurants and Lodging":128.1}
    {"Date":"2022-02-28","Total Gains":1036.9,"Agricultural":49.2,"Manufacturing":31.6,"Construction":64.7,"Retail and Wholesale":-46.8,"Warehouse and Transportation":134.5,"Restaurants and Lodging":54.8}
    {"Date":"2022-03-31","Total Gains":831.1,"Agricultural":34.9,"Manufacturing":99.9,"Construction":63.5,"Retail and Wholesale":-32,"Warehouse and Transportation":81.2,"Restaurants and Lodging":-20.3}
    {"Date":"2022-04-30","Total Gains":864.6,"Agricultural":68.4,"Manufacturing":131.9,"Construction":48.1,"Retail and Wholesale":-11.3,"Warehouse and Transportation":86.6,"Restaurants and Lodging":-27.1}
    {"Date":"2022-05-31","Total Gains":934.9,"Agricultural":121.6,"Manufacturing":106.8,"Construction":72,"Retail and Wholesale":-44.5,"Warehouse and Transportation":120.2,"Restaurants and Lodging":33.9}
    {"Date":"2022-06-30","Total Gains":841.2,"Agricultural":88.7,"Manufacturing":157.5,"Construction":50.2,"Retail and Wholesale":-37.3,"Warehouse and Transportation":126.2,"Restaurants and Lodging":27.5}
    {"Date":"2022-07-31","Total Gains":826.2,"Agricultural":93.2,"Manufacturing":176.4,"Construction":16,"Retail and Wholesale":-9.6,"Warehouse and Transportation":83.3,"Restaurants and Lodging":54.4}
    {"Date":"2022-08-31","Total Gains":807.3,"Agricultural":89.9,"Manufacturing":239.5,"Construction":-22.4,"Retail and Wholesale":-13.5,"Warehouse and Transportation":74.9,"Restaurants and Lodging":67.1}
    {"Date":"2022-09-30","Total Gains":706.5,"Agricultural":84.3,"Manufacturing":227,"Construction":-11.9,"Retail and Wholesale":-24.4,"Warehouse and Transportation":26,"Restaurants and Lodging":93.7}
    {"Date":"2022-10-31","Total Gains":677.1,"Agricultural":46.6,"Manufacturing":201.4,"Construction":12.1,"Retail and Wholesale":-59.8,"Warehouse and Transportation":4.7,"Restaurants and Lodging":152.8}
    {"Date":"2022-11-30","Total Gains":626.3,"Agricultural":59.1,"Manufacturing":100.7,"Construction":11,"Retail and Wholesale":-77.7,"Warehouse and Transportation":-11.9,"Restaurants and Lodging":231.3}
    {"Date":"2022-12-31","Total Gains":509.4,"Agricultural":-14,"Manufacturing":85.8,"Construction":-12.3,"Retail and Wholesale":-72.6,"Warehouse and Transportation":-13.5,"Restaurants and Lodging":215.8}
    
  • Plugin Author senols

    (@senols)

    Hi @creactivemind,

    Thanks for giving it a try. I am also super curious about how people will be fine-tuning their models.

    Now regarding the data format, the information below is taken directly from the OpenAI website. You need to convert your data into the prompt/completion format. OpenAI offers a free tool to help prepare your data as JSONL; you can give it a try here: https://beta.openai.com/docs/guides/fine-tuning/preparing-your-dataset
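
    For illustration only, here is a minimal Python sketch of one way to turn records like yours into prompt/completion pairs; the prompt wording, the choice of fields, and the file names are assumptions you would adapt to your own data, not something the plugin or OpenAI prescribes.

    # Hypothetical conversion (not part of the plugin): turn one record from the
    # data above into the prompt/completion format described below.
    import json

    def to_prompt_completion(record):
        record = dict(record)
        date = record.pop("Date")
        # The prompt ends with the fixed separator recommended in the guidelines.
        prompt = f"Report the sector gains for {date}\n\n###\n\n"
        body = ", ".join(f"{key}: {value}" for key, value in record.items())
        # The completion starts with a space and ends with a stop sequence.
        completion = f" {body}\n###"
        return {"prompt": prompt, "completion": completion}

    # "gains.jsonl" and "gains_prepared.jsonl" are placeholder file names.
    with open("gains.jsonl") as src, open("gains_prepared.jsonl", "w") as dst:
        for line in src:
            dst.write(json.dumps(to_prompt_completion(json.loads(line))) + "\n")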

    Currently my plugin only supports fine-tuning, but there are other methods like answer, search, and classification. I will be adding those features too.

    For example:

    Answer format:

    {"text": "2-1B (Too-Onebee) is a Light Side card from expansion set Hoth.", "metadata": "2-1B (Too-Onebee)"}
    {"text": "2-1B (Too-Onebee) is rarity R1.", "metadata": "2-1B (Too-Onebee)"}

    Fine-tune format:

    {"prompt": "2-1B (Too-Onebee) ->", "completion": " is a Light Side card from expansion set Hoth."}
    {"prompt": "2-1B (Too-Onebee) ->", "completion": " is rarity R1."}

    Search format:

    {"text": "2-1B (Too-Onebee) is a Light Side card from expansion set Hoth.\n2-1B (Too-Onebee) is rarity R1.\n2-1B (Too-Onebee) is a Character.\n2-1B (Too-Onebee) is a Character – Droid.\n2-1B (Too-Onebee) has uniqueness symbol *.\n2-1B (Too-Onebee) has lore of \"Made by Genetech. Unusually independent for a droid. Forced to serve a Moff on Firro, but was liberated by Tiree. Now dedicated to serving the Alliance.\".\n2-1B (Too-Onebee) has gametext \"Once per turn, one of your non-droid characters lost from same site may go to your Used Pile rather than your Lost Pile. Subtracts 2 from X on you Bacta Tank.\".\n2-1B (Too-Onebee) is destiny 1.\n2-1B (Too-Onebee) is power 0.\n2-1B (Too-Onebee) is deploy 2.\n2-1B (Too-Onebee) is forfeit 5.\n2-1B (Too-Onebee) is a unique Character.", "metadata": "2-1B (Too-Onebee)"}

    Data formatting

    To fine-tune a model, you’ll need a set of training examples that each consist of a single input (“prompt”) and its associated output (“completion”). This is notably different from using our base models, where you might input detailed instructions or multiple examples in a single prompt.

    • Each prompt should end with a fixed separator to inform the model when the prompt ends and the completion begins. A simple separator which generally works well is \n\n###\n\n. The separator should not appear elsewhere in any prompt.
    • Each completion should start with a whitespace due to our tokenization, which tokenizes most words with a preceding whitespace.
    • Each completion should end with a fixed stop sequence to inform the model when the completion ends. A stop sequence could be \n###, or any other token that does not appear in any completion.
    • For inference, you should format your prompts in the same way as you did when creating the training dataset, including the same separator. Also specify the same stop sequence to properly truncate the completion.
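
    As a rough sketch of the inference side (assuming the pre-1.0 openai Python package; the API key, model name, and prompt below are placeholders), you would format the prompt with the same separator used in training and pass the same stop sequence:

    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder

    # Same separator as in the training prompts.
    prompt = "Report the sector gains for 2022-12-31\n\n###\n\n"

    response = openai.Completion.create(
        model="ft-your-fine-tuned-model",  # placeholder fine-tune name
        prompt=prompt,
        max_tokens=200,
        temperature=0,
        stop="\n###",  # same stop sequence used in the training completions
    )
    print(response["choices"][0]["text"])
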
  • The topic ‘fine-tuning file format’ is closed to new replies.