AI - #3 (Hello World)
In the 2nd (#2) post of the series, we wrote some pseudo code (with AI help) to understand what is happening in the deep, dark underworld of AI. The initial code sample was to learn about the core components or elements of an LLM (Large Language Model). This effort produced a callback to every piece of software EVER written. Software consistently offers two basic characteristics, INPUT and OUTPUT. We'll dive further into how this helps us in our next example.
Of course, in later segments, we plan to expand things (some) by increasing the number of inputs, nodes, and data sets, and (perhaps) the complexity of the model training. My goal is to keep it as simple as possible and offer some entertainment value.
For this segment we will focus on the Input of our initial fundamental code sample (refer to Post #2). For this data input, we are going to create the all-time classic test app "Hello World". This app has been written in just about every conceivable language and is universal in its simplicity.
/********************************* HELLO World for AI LLMs *********************************/
Below is a sample data set to create a "hello world" LLM with a very simple neural network model:
Code sample
# Import the libraries the sample relies on
import numpy as np
import tensorflow as tf

# Input data: five numbers, each paired with a word ("hello" = class 0, "world" = class 1)
x = np.array([[1], [2], [3], [4], [5]], dtype=float)
words = ['hello', 'world', 'hello', 'world', 'hello']
y = tf.keras.utils.to_categorical([0 if w == 'hello' else 1 for w in words], num_classes=2)

# Create a neural network model
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1,)),
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(2, activation='softmax')  # one output per word class
])

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(x, y, epochs=10)

# Evaluate the model
model.evaluate(x, y)

# End Program
## NOTE from the author and brief explanation of the Output
Obviously, this data set is VERY simple; it is just enough to train a very simple neural network model to generate the text "hello world". The model is trained using the Adam optimizer and the categorical crossentropy loss function, and it is evaluated on the same data set it was trained on, so it should reach an accuracy at or near 100%. In other words, the model should learn to produce the correct word ("hello" or "world") for each of the input values.
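If you want to watch the trained model actually "say" something, a small prediction step like the sketch below (my addition, reusing the variable names from the sample above) maps the predicted class back to a word:

# Hypothetical usage: ask the trained model which word goes with a new input value.
words_by_class = ['hello', 'world']
probabilities = model.predict(np.array([[3.0]]))      # one probability per word class
predicted_word = words_by_class[int(np.argmax(probabilities))]
print(predicted_word)   # should print "hello" if the model learned the pattern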
Here are some notes on how the input data is processed.
A few details to better understand the code: What is "Keras"? Keras is a deep learning API written in Python that runs on top of the machine learning platform TensorFlow. It was developed with a focus on enabling fast experimentation; being able to go from an idea to a result as quickly as possible is key to doing good research.
What is TensorFlow, you ask? Don't. :>) It's an open-source machine learning software library that was developed by Google. More software with input and output. I already know your next question. Ok, well then what is "Adam"?
The Adam optimizer, as defined by the "book of knowledge", reads something like this ... "Adam is a stochastic gradient descent method that is based on adaptive estimation of first-order and second-order moments…" ... or whatever the hell that means. In English, in my most simplistic terms, “Adam” is an algorithm that steers how the model learns, working out "classification" rules and subtle "relationships" between the data elements or “nodes”. Sort of like how your brain works (making connections between ideas or objects), but not exactly. For our purposes, “Adam” is a black box that inputs “data” and outputs “knowledge”. I could bore you with more details, but neither of us would understand!
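If you are curious what the string 'adam' in model.compile() actually stands for, here is a tiny sketch (assuming the same Keras model as above): it is shorthand for an optimizer object whose knobs you can set explicitly. The learning rate shown is the Keras default and is only there to illustrate the kind of setting Adam exposes.

# 'adam' is shorthand for a configurable optimizer object.
# The learning rate below is the Keras default, shown only as an illustration.
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])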
Of course, “crossentropy” is a fancy word for a measure of the difference between two probability distributions. Now, I failed "prob & stats" at least three times, so good luck figuring this one out! It is basically a configuration parameter for how the model scores its mistakes while it “learns”.
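For the curious, here is a tiny back-of-the-envelope calculation (the numbers are mine, purely illustrative) of what categorical crossentropy does: it compares the model's predicted probabilities against the correct answer and produces a penalty that shrinks as the prediction gets better.

import numpy as np

# Hypothetical example: the correct answer is "hello" (class 0), one-hot encoded.
y_true = np.array([1.0, 0.0])
# The model's predicted probabilities for ["hello", "world"].
y_pred = np.array([0.8, 0.2])

# Categorical crossentropy: -sum(true * log(predicted)).
loss = -np.sum(y_true * np.log(y_pred))
print(loss)  # about 0.22; a perfect prediction of [1.0, 0.0] would score 0.0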
In software, there are a million ways to skin a cat. This example is one "hypothetical" way and was not compiled or tested -- so don't yell at me if your code outputs gibberish, "bad" words, or worse, gains sentience and takes over the world.
Finally, to summarize common data inputs into a specific LLM, I will share a more practical example below:
One real-world example is a project I have worked on recently. Imagine creating an LLM that begins with data inputs drawn from the interactions between a general practitioner and their patients (anonymized, of course). This data is collected via voice-to-text transcription, manual entry through the doctor's electronic medical record (EMR) system, or a combination of the two. Once ingested into the appropriate LLMs, the captured data could be used to optimize doctor/patient interactions (saving time), improve decisions by enabling more accurate and efficient diagnoses, provide consistent, proven treatment steps, and even ensure proper payment (that the correct CPT codes land in the EMR for accurate and complete billing). All of this is fine-tuned on the doctor's interactive patient history and supplemented by pre-generated templates for disease-state treatments, ensuring consistent treatment plans based on experience (data).
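To make the idea of a "data input" concrete, here is a purely hypothetical sketch of what one anonymized interaction record might look like before it is fed to such an LLM; every field name and value below is illustrative, not taken from any real system:

# Hypothetical example of a single anonymized training record (illustrative only).
interaction_record = {
    "visit_id": "visit-0001",          # anonymized identifier, no patient details
    "source": "voice_to_text",         # or "emr_manual_entry"
    "transcript": "Patient reports a sore throat and mild fever for two days...",
    "diagnosis": "acute pharyngitis",
    "treatment_plan": "rest, fluids, follow up in one week",
    "cpt_codes": ["99213"],            # billing codes captured for the visit (illustrative)
}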