LFX Workspace: Create a Wasm-based LLM app for financial analysts [Term-3] #3715
Comments
Hi @juntao, do we plan to use external libraries such as sec-edgar-downloader to download the filings? Also, regarding the database for the downloaded files, are we planning to host it in the cloud or elsewhere? It would be of great help if you could share some resources.
You can use any library (and programming language) to develop this script. Initially, the document database should run locally -- perhaps a local MySQL or SQLite server. I think we can easily connect to remote database servers later.
Dear @juntao, I tried making a Python script that fetches the SEC filings and saves them to SQL, but I wanted to ask one thing. The libraries primarily download the filings as files (text, PDF, or HTML) on disk first, which can then be saved to the SQL database as binary large objects (BLOBs). Do we need it this way, or should we parse them and save them directly to the database without writing to disk?
The PDF can be saved as a BLOB for sure. The text version needs to be inserted into a prompt later, so it needs to be clean text -- ideally markdown. I would suggest that you use an LLM to create the markdown file from the original text. Both the original text and the markdown text could be stored as BLOB or TEXT. The database table should have fields for the stock symbol and the start / end / publication time for each report so that we can search for them later.
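A minimal sketch of such a table in Python with SQLite; the field names and types here are illustrative assumptions, not a final schema:

```python
import sqlite3

conn = sqlite3.connect("filings.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS reports (
        id            INTEGER PRIMARY KEY AUTOINCREMENT,
        symbol        TEXT NOT NULL,   -- e.g. 'AAPL'
        report_type   TEXT NOT NULL,   -- e.g. '10-K', '10-Q'
        period_start  TEXT NOT NULL,   -- ISO-8601 dates for the reporting period
        period_end    TEXT NOT NULL,
        published_at  TEXT NOT NULL,   -- publication time of the report
        original_pdf  BLOB,            -- raw filing as downloaded
        original_text TEXT,            -- extracted plain text
        markdown_text TEXT             -- LLM-cleaned markdown
    )
""")
# Index on the symbol so we can look up the latest report for a ticker quickly.
conn.execute("CREATE INDEX IF NOT EXISTS idx_reports_symbol ON reports (symbol)")
conn.commit()
conn.close()
```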
Cool. Can you provide docs on how to run this? I mean, which files I need to edit, the meaning of the variables I need to change, and how to run it. Thanks.
Please use this. I will provide a better version of the code once you give your insights. Thanks.
Hey @juntao, are you able to run it? Could you please guide me on the next steps?
Never wait for me. According to the plan, the next step is to build the SQL database for the application to search for the latest reports. You will need to develop and test a special prompt that asks an LLM to extract all stock symbols from a paragraph of user questions or comments. For example, if the user mentions Nvidia and Apple, it should respond with NVDA and AAPL. You can use a standard Llama 3.1 model or a llama-3-groq tool-call model for this.
I was testing some prompts in this Colab. I tried some of the LLMs on GAIA and most of them gave long answers. I think using multiple LLMs / asking multiple times can help us extract the exact stock symbols. Also, I have implemented a tool-calling feature for adding SEC filings to SQL in this script. For the question-answer pairs and summary, do we add them to the database itself or keep them separately, just for the knowledge base?
Hmm, if the long answer is a problem, you could ask the model to return JSON format as in a tool call? This way, your client application can extract the JSON from the response. If no valid JSON is in the response, you could simply ask the LLM to re-generate.
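A minimal sketch of this extract-and-retry loop, assuming an OpenAI-compatible chat endpoint (the URL and model name are placeholders):

```python
import json
import requests

API_URL = "https://YOUR-NODE.gaia.domains/v1/chat/completions"  # placeholder endpoint

PROMPT = (
    "Extract all stock ticker symbols mentioned in the text below. "
    'Respond with ONLY a JSON array of strings, e.g. ["NVDA", "AAPL"].\n\n'
)

def extract_symbols(text: str, max_retries: int = 3) -> list[str]:
    """Ask the LLM for a JSON array of symbols; re-generate if parsing fails."""
    for _ in range(max_retries):
        resp = requests.post(API_URL, json={
            "model": "llama-3.1-8b",  # whichever model the node serves
            "messages": [{"role": "user", "content": PROMPT + text}],
        }, timeout=120)
        content = resp.json()["choices"][0]["message"]["content"]
        try:
            # Tolerate extra prose around the JSON array.
            start, end = content.index("["), content.rindex("]") + 1
            return json.loads(content[start:end])
        except ValueError:  # no valid JSON found -- ask the model again
            continue
    return []
```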
I think you need to write the client-side project so that it blocks until one request finishes before starting the next request. You cannot run multiple requests in parallel (e.g., async API calls) since the server only has one GPU.
The llama node you are using has an 80k context length. So, it is correct to say 81920. You are using it to generate QA for your chunk? That means the chunk is too big. You will need to sub-divide it.
Most RAG systems require chunks no greater than 300 tokens. We relaxed it to 81920 tokens -- nearly a 300x increase. But you are sending text much longer than that.
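A minimal sketch of sub-dividing text into small chunks; the ~4-characters-per-token rule is a rough assumption (a real tokenizer such as tiktoken would be more accurate):

```python
def split_into_chunks(text: str, max_tokens: int = 300) -> list[str]:
    """Greedily pack words into chunks under a rough token budget."""
    max_chars = max_tokens * 4  # heuristic: ~4 characters per token
    chunks, current = [], ""
    for word in text.split():
        if current and len(current) + 1 + len(word) > max_chars:
            chunks.append(current)
            current = word
        else:
            current = f"{current} {word}".strip()
    if current:
        chunks.append(current)
    return chunks
```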
But if I send more than 8000 tokens then I get a time-out error again and again. How do I find an appropriate chunk size that avoids the time-out error while staying within the 80k bound?
I think you need to: 1) use small chunks; 2) slow down the requests -- make each request blocking and wait for 5 seconds after each request before sending the next one. It could take 10+ hours to run through a large document.
If the time-out problem persists, we could start a new API server just for this. But before we do that, I think we need to try simple things. You could also pause your program for 5 minutes when you see a time-out and then retry.
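A minimal sketch of the blocking, rate-limited loop described above; `call_llm` is a hypothetical helper standing in for one blocking QA-generation request:

```python
import time
import requests

def generate_qa(chunks: list[str]) -> list[str]:
    """One blocking request at a time; back off 5 minutes on a time-out."""
    answers = []
    for chunk in chunks:
        while True:
            try:
                answers.append(call_llm(chunk))  # call_llm: hypothetical blocking request helper
                break
            except requests.exceptions.Timeout:
                time.sleep(300)  # pause 5 minutes, then retry the same chunk
        time.sleep(5)  # wait 5 seconds between successful requests
    return answers
```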
Okay sir, I was thinking of that, but I thought that the longer run times would be inconvenient. What about the gibberish-text part?
Shorter input would not produce gibberish. If your input is close to its context length, it tends to degrade.
Sir, I was getting gibberish text in some parts of the markdown that was parsed via LlamaParse.
Oh. Then it is a LlamaParse problem. :( Which backend LLM are you using with LlamaParse?
I don't know the backend since I am just using the free API. It might be Llama, I guess. You can view the parsed markdown for the Apple 10-K.
Perhaps try Or
I have been thinking about this project. Many people told me that when they search for financial information: 1) they are only concerned about the present -- historical data is interesting but not useful; 2) they know exactly which company / ticker symbol they'd like to learn more about. So, semantic search on past financial reports is probably not going to be useful. Given that, I'd like to make some changes to our approach. 1) We should focus on a SQL database that stores the latest financial statements, indexed by the stock symbol. We will use our tool to collect the financial reports, turn them into text, and save them in the database. 2) Let's build a web UI that allows the user to enter a stock symbol. It will create a page that has:
3) If the user enters a question about the stock symbol, it will use the news + latest financial statement as the context to answer it. What do you think? Here is an example we could learn from: https://github.com/bklieger-groq/stockbot-on-groq
Next week is great. Thank you!
Check out this
Dear Sir,
My apologies for the delay. Yeah, the CrewAI demo is cool. Can you replicate their work but switch the backend from OpenAI to a Gaia node? If that is successful, we need to add a web UI to it. I saw that they have RAG search in the workflow. We should try to improve that. I do not like the idea of breaking up financial documents into small pieces. So, perhaps we should just do a SQL search on the latest reports and put the comprehensive summary (from another LLM) into the agent context.
Screen.Recording.2024-10-26.at.1.13.45.PM.mov
How about this?
This looks great. Thank you. One thing I would like to do is to start with a standard HTML form input. We should just have a "search box" where the user must enter a stock symbol or company name. It will respond with the price chart + report summary + news. After that, the user can ask follow-up questions in the chat UI. For the initial search box, I think we should allow any natural description of the company the user wants to search for. For example, maybe "Elon Musk's car company" should map to the TSLA stock symbol. I am thinking that we could perform a search and let the LLM figure out the stock symbol based on the search results. https://www.google.com/search?q=%22elon+musk%27s+car+company%22+stock+ticker+symbol Then, in the follow-up conversation, we should have the report summary + latest stock price + news in the "system prompt". Bonus feature: we could set up a Gaia node with "knowledge" of financial terms (e.g., what is a P/E ratio?). This knowledge could be a public textbook or QA set, and we can create a vector DB from it for the node.
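A minimal sketch of assembling such a system prompt; `fetch_summary`, `fetch_price`, and `fetch_news` are hypothetical helpers standing in for the database and news lookups:

```python
def build_system_prompt(symbol: str) -> str:
    """Compose the chat system prompt from the data gathered for a symbol."""
    summary = fetch_summary(symbol)  # latest report summary from the SQL DB
    price = fetch_price(symbol)      # latest quote, e.g. via yfinance
    news = fetch_news(symbol)        # recent headlines
    return (
        f"You are a financial analyst assistant for {symbol}.\n"
        f"Latest stock price: {price}\n\n"
        f"Latest report summary:\n{summary}\n\n"
        f"Recent news:\n{news}\n\n"
        "Answer the user's questions using only this context."
    )
```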
For this initial search, do we perform a Google search? If we want to inject the latest data, beyond the knowledge of the LLM, then this would help us find the stock symbol even for a new company.
My current approach uses a tool call to fetch the ticker according to the query; it then proceeds by putting the (news + report summary + price chart) into the prompt and answering questions until I fetch a different ticker from the query. So if we want to add a search operation for a natural description, then maybe we could use one more API call to first analyse the query and predict a ticker. If the tool call is successful for that, then we can proceed via the above strategy.
Which model did you use to generate the tool call? My experience with tool calls is that they are not very reliable. I think asking the user to start by entering a stock symbol would be a faster and better experience.
If we directly start by entering a symbol, then do we really need a natural description? I used the llamatool Gaia node for tool calling and it was mostly working fine.
Perhaps we can simply provide a list of popular symbols on the UI for the users to click on and select. This is the first time the user interacts with our app, and we want it to be very reliable.
Then sir, what about the natural description of the company? I mean, if the user provides the symbol then we don't need to map to the symbol, right?
Yeah. I think for the initial MVP, we can ask for the symbol and focus on providing good information and engagement after we have the symbol. We can improve symbol lookup using NL later.
https://github.com/lazyperson1020/LFX-Updates
Can you make a single command to start both the database manager and the chatbot? For example, the chatbot could be started on port 8080 and the database manager on port 8081. Also, please read the parameters (e.g., database connection, LLM API endpoint, LlamaParse API key, etc.) from env vars or from command-line args. This way, I can run the entire app inside a Docker container. Also, I thought the app needs to search and summarize the latest news. Do you not need an API key for a search service?
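A minimal sketch of reading these parameters from env vars with command-line overrides; the variable names are illustrative assumptions, not a fixed contract:

```python
import argparse
import os

parser = argparse.ArgumentParser(description="Financial analyst app")
parser.add_argument("--db-url",
                    default=os.environ.get("DB_URL", "sqlite:///filings.db"))
parser.add_argument("--llm-endpoint",
                    default=os.environ.get("LLM_API_ENDPOINT"))
parser.add_argument("--llamaparse-key",
                    default=os.environ.get("LLAMAPARSE_API_KEY"))
parser.add_argument("--chatbot-port", type=int,
                    default=int(os.environ.get("CHATBOT_PORT", "8080")))
parser.add_argument("--dbmgr-port", type=int,
                    default=int(os.environ.get("DBMGR_PORT", "8081")))
args = parser.parse_args()  # CLI flags override env vars; env vars override defaults
```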
I have used yfinance for fetching the latest news. The problem is that the articles are protected content and sometimes can't be scraped. Which service do you want me to use instead?
Hmm, I thought we were using yfinance for charts and data, etc. And we can use a real search engine API, such as Tavily, DuckDuckGo, or Bing, to collect the news. But let's try your current version first. Let me know when you have a server version that I can run in Docker. Thanks.
Sir, please check if it is fine now.
Any updates, sir?
Sorry for the delay. Can you provide a Dockerfile or a Docker image that I can simply run with a docker run command? You can map the necessary ports to the host and pass arguments (e.g., API keys) on the docker command line. Example: https://github.com/GaiaNet-AI/gaianet-node/tree/main/docker Thanks!
Sir, check now and tell me if you face any problems.
Also, I would like the user to be able to enter anything in the search box, and it would suggest symbols in a dropdown menu for the user to select. For example, the user could enter "Tesla automobile" and the dropdown would show TSLA. Finally, can we support crypto symbols such as
They would not have SEC reports. Just the latest charts and news. But you cannot find them on Yahoo Finance. You could try the following
This may happen due to wrong database connection parameters.
Oh, right. Do you have a SQL script that I need to run to initialize the database? (i.e., create tables, etc.)
Sir, you just need to set the connection values (try it on a local SQL server) and put a name of your choice in the DB_Name variable. Also, how do I map the natural description to a stock?
I think the best way to map natural language to a stock symbol is to use an LLM. The system prompt should contain a list of stock symbols and their associated descriptions. Then, we ask the LLM to map the user input to the most likely stock symbol on that list.
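A minimal sketch of this mapping, assuming an OpenAI-compatible chat endpoint; the symbol list, URL, and model name are placeholders (a real list would come from the database):

```python
import requests

API_URL = "https://YOUR-NODE.gaia.domains/v1/chat/completions"  # placeholder endpoint

SYMBOLS = {  # in practice, loaded from the database
    "TSLA": "Tesla, Elon Musk's electric car company",
    "AAPL": "Apple, maker of the iPhone and Mac computers",
    "NVDA": "Nvidia, GPU and AI chip maker",
}

def map_to_symbol(user_input: str) -> str:
    """Ask the LLM to pick the most likely symbol from the listed descriptions."""
    listing = "\n".join(f"{sym}: {desc}" for sym, desc in SYMBOLS.items())
    resp = requests.post(API_URL, json={
        "model": "llama-3.1-8b",  # whichever model the node serves
        "messages": [
            {"role": "system", "content":
                "Map the user's description to the most likely stock symbol "
                f"from this list:\n{listing}\nRespond with the symbol only."},
            {"role": "user", "content": user_input},
        ],
    }, timeout=60)
    return resp.json()["choices"][0]["message"]["content"].strip()
```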
Dear Sir, please excuse the slight delay. I will update you in some time.
Sir, do I need to provide documentation/demonstration also?
I think your repo should suffice. Maybe change it to a different name -- e.g., But, I would like to see the two features we described: 1) suggest stock symbols (maybe in a drop-down menu) based on a natural language description; 2) support crypto symbols (BTC, ETH, SOL, etc.).
Project Title
Create a Wasm-based LLM app for financial analysts
Motivation
The WasmEdge community has developed a decentralised computing infrastructure named Gaianet, where users can create and deploy customized AI agents as nodes. Each Gaianet node can be fine-tuned with specific knowledge, allowing for personalized responses rather than generic AI outputs. In this project, we aim to develop a use case for the analysis of SEC (Securities and Exchange Commission) financial reports (such as 10-K and 10-Q) by an agent application.
Details
We download the SEC reports and press releases and store them in a database for retrieval. An open-source LLM is used to generate summaries and question-answer pairs on the documents for analysis, and a knowledge base is prepared from them for the deployment of a public node. Further, the agent app utilises the database/web, and an LLM function call is performed to generate answers for the users.
Milestones
M1 [1 Week]
M2 [2 Weeks]
M3 [1 Week]
M4 [1.5 Weeks]
M5 [1.5 Weeks]
M6 [2 Weeks]
M7 [2 Weeks]
Appendix
The project's milestones will be refined and updated as the work progresses.