Large Language Models – Links

Carnegie Mellon University

The L3 Lab course at Carnegie Mellon University (CMU) is an advanced study area combining Machine Learning, Language, and Logic. Key components of the course include:

  1. Principles of Machine Learning for Language: This aspect delves into the core principles of applying machine learning to language processing. It encompasses both theoretical foundations and practical applications, involving coding and mathematical concepts essential for understanding and developing language models.
  2. Machine Learning for High-Trust Applications: This module focuses on the use of machine learning in contexts where trust and reliability are paramount. This includes the verification of mathematical proofs and software, ensuring that machine learning applications are robust, secure, and reliable, particularly in critical environments.
  3. Enabling Self-Improving Machine Learning Systems: The course also emphasizes the development of machine learning systems that are capable of discovering new knowledge and improving autonomously over time. This includes techniques and methodologies for creating systems that can adapt, learn from new data, and evolve their capabilities without human intervention.

Overall, the L3 Lab at CMU offers a comprehensive and interdisciplinary approach to understanding and advancing the field of machine learning, especially in its application to language and logic in high-stakes scenarios.
Take a look at the link here: https://cmu-l3.github.io

Inflection

https://inflection.ai/company

AVA PLS

https://avapls.com

LANGCHAIN Blog

https://blog.langchain.dev/testing-fine-tuned-open-source-models-in-langsmith/

Brev.dev

Brev is a dev tool that makes it really easy to code on a GPU in the cloud. Brev does 3 things: provision, configure, and connect.
https://github.com/brevdev/notebooks

Hands On Train and Deploy ML

https://github.com/Paulescu/hands-on-train-and-deploy-ml

Brain-Computer Interfaces – Topics

https://schedule.gdconf.com/session/future-realities-summit-seven-ways-vr-will-change-the-world-of-mental-healthcare-/884675?_mc=sem_gdcsf_sem_x_le_tspr_brndca__2022

https://www.br41n.io/IEEE-SMC-2022

https://www.wiley.com/en-us/Brain+Computer+Interfaces+2%3A+Technology+and+Applications-p-9781848219632

https://compneuro.neuromatch.io/tutorials/intro.html

http://xrbiosense.org/

https://mailchi.mp/248bbf405e31/explore-the-neurotech-world

Precision Medicine and the Future

In recent years, there has been an amplified focus on the use of artificial intelligence [AI] in various domains to resolve complex issues. Likewise, the adoption of AI in healthcare is growing and radically changing the face of healthcare delivery. AI is being employed in a myriad of settings, including hospitals, clinical laboratories, and research facilities. AI approaches that let machines sense and comprehend data like humans have opened up previously unavailable or unrecognized opportunities for clinical practitioners and health service organizations. Some examples include using AI to analyze unstructured data such as photos, videos, and physician notes to enable clinical decision making; using intelligent interfaces to enhance patient engagement and compliance with treatment; and predictive modelling to manage patient flow and hospital capacity and resource allocation.

The continuous improvement in our understanding of the human genome, fused with AI, is making Precision Medicine increasingly viable and effective. Its intention is to provide a personalized solution to each individual health problem. Nevertheless, three main issues must be addressed to make Precision Medicine with AI a reality:

  1. The understanding of the huge amount of genomic data, spread out across hundreds of genome data sources with different formats and contents, whose semantic interoperability is a must;
  2. The development of information systems intended to guide the search of relevant genomic repositories related to a disease, the identification of significant information for its prevention, diagnosis and/or treatment, and its management in an efficient software platform;
  3. The high variability in the quality of the publicly available information, which can be addressed by
    1. using a precise conceptual schema of the human genome, and
    2. introducing a method to search, identify, load and adequately interpret the required data, assuring its quality during the entire process.
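As a toy sketch of issue 1, semantic interoperability: two sources may describe the same genomic variant with different field names and formats, and a shared conceptual schema lets them be reconciled. All field names and values below are invented for illustration; they do not come from any real genome data source.

```python
# Two hypothetical source records describing the same variant differently.
source_a = {"chrom": "17", "pos": 43094464, "ref": "A", "alt": "G"}
source_b = {"chromosome": "chr17", "position": "43094464",
            "reference": "A", "variant": "G"}

def normalize(record):
    """Map either source's fields onto one shared, invented schema."""
    return {
        "chromosome": str(record.get("chrom") or record.get("chromosome")).removeprefix("chr"),
        "position": int(record.get("pos") or record.get("position")),
        "ref": record.get("ref") or record.get("reference"),
        "alt": record.get("alt") or record.get("variant"),
    }

# Once normalized, the two records are recognizably the same variant.
print(normalize(source_a) == normalize(source_b))  # True
```

Real interoperability work involves ontologies and curated mappings rather than ad-hoc key renaming, but the principle is the same: agree on one schema, then translate every source into it.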

A Precision Medicine platform is also a gateway to a new paradigm of prevention, diagnosis and treatment of diseases. The Precision Medicine approach is based on the individuality of each human being. It considers genetic predisposition, lifestyle and the influence of the environment on health in order to make the right decisions for each patient. To succeed when applying Precision Medicine in clinical practice, it is necessary to integrate information coming from diverse areas of knowledge that have traditionally been studied independently. These are the so-called “omic sciences”, such as Genomics, Proteomics, Epigenomics and Pharmacogenomics. All these sciences have experienced great progress during the last two decades, especially Genomics.
Nevertheless, to take advantage of this progress, we must be able to provide mechanisms to enrich this knowledge with information coming from other research areas, such as fMRI, MRI, EEG, EHR, MEG and more. To achieve this aim, two main issues must be faced:

  • The available information is heterogeneous and dispersed: hundreds of different genomic data sources are publicly available, allowing biologists and clinicians to tackle complex diseases in a multidisciplinary way. However, they have been commonly developed ad hoc, focused on addressing specific knowledge requirements and not designed to share information among them.
  • The complexity of biological processes, the noisy nature of experimental data and the diversity of sequencing technologies result in great variability in the quality of the available information. That is why a huge amount of information is ready to be used, but only part of it is relevant for clinical purposes.

Personalized genomic medicine [PGM] is only one component of precision medicine, which, in the broadest sense, requires clinical care providers to combine genomic information with other types of information, such as fMRI, MRI, EHR, EEG, MEG, biochemical and physiological testing results, neuro-developmental history, environmental exposures, and psychosocial experiences. The most important goal is to provide more precise diagnosis, genetic counseling, management, prevention, and therapy. This effort has already begun, but plenty of work still needs to be done before PGM becomes integrated into medicine itself.

Conflict of interest

The author declares no conflict of interest.

What are Brain-Computer Interfaces, you ask?

For the past four years I have been working on and having many discussions about a new interaction technique which directly connects a human brain and a machine. Researchers in laboratories all over the world are using BCI devices to record and analyze brain signals, and this happens to be one of the most popular ways to read them. If you google BCI today you will get tons of hits on several products that are commercially available. These BCI systems are being used to control a variety of external devices, from cursors and avatars on computer screens to televisions and wheelchairs, to robotic arms and neuroprostheses. People with and without disabilities have tested these systems, and a few people who are severely disabled are already using them for important purposes in their daily lives. As mentioned earlier, the brain is the most complex of all the organs of the human body. Interestingly, it generates electrical signals to directly and indirectly control the entire body. The electrical activity of the brain, generated by millions of neurons, is recorded by a medical technique known as electroencephalography [EEG]. Over the last six years or so, there has been an explosion in BCI devices aimed at commercial use.
The EEG data usually consists of four waves, namely alpha, beta, delta and theta waves.

Delta Waves: These are obtained during sleep and extremely deep meditation. Their frequency ranges from 0.5 – 4 Hz.
Theta Waves: These waves occur during emotional stress, disappointment, daydreaming, drowsiness, etc. Their frequency ranges from 4 – 8 Hz.
Alpha Waves: These waves normally occur in relaxed conditions such as deep relaxation, imagination and intuitive thinking. Their frequency ranges from 8 – 12 Hz.
Beta Waves: Beta waves occur only when the person is in an active state.
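As a tiny sketch, the ranges above map directly to a simple classifier. Since no beta range is given above, treating everything at or above 12 Hz as beta is an assumption on my part:

```python
def eeg_band(freq_hz):
    """Map a frequency in Hz to the EEG band names described above."""
    if freq_hz < 0.5:
        return "sub-delta"   # below the ranges listed above
    if freq_hz < 4:
        return "delta"       # 0.5 - 4 Hz: sleep, deep meditation
    if freq_hz < 8:
        return "theta"       # 4 - 8 Hz: stress, daydreaming, drowsiness
    if freq_hz < 12:
        return "alpha"       # 8 - 12 Hz: relaxed conditions
    return "beta"            # active state (assumed to start at 12 Hz)

print(eeg_band(10))  # alpha
```

In practice the raw EEG signal is decomposed into these bands with band-pass filters or an FFT; this sketch only labels a single frequency.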

I frequently organize Hackathons and Meetups on this interesting topic. You can sign up here: https://www.meetup.com/NOVA-Brain-Computing/

I will have more interesting topics in the coming days and weeks. Stay tuned!


Base 64 to Image

Recently I embarked on a journey to explore the benefits of Base64 encoding for images, and image search using this technique. In this blog I would like to share my experience. Hope this helps.

Base64
Streaming raw bits and bytes over the cable is not a good mechanism for transporting binary data, because many transport media were made for streaming text. Some protocols may interpret your binary data as control characters, or your binary data can be corrupted because the protocol used to transport it assumes a particular character set.

Although encoding has its own disadvantages, users started encoding binary data into characters, and Base64 happens to be one of the basic types of encodings. Essentially, each 6 bits of the input is encoded using a 64-character alphabet. The “standard” alphabet uses A–Z, a–z, 0–9, + and /, with = as a padding character. So in essence, Base64 encoding is a way of taking binary data and turning it into text so that it’s more easily transmitted in things like e-mail and HTML form data.
Base64 often appears alongside cryptography, but it is not a security mechanism. Anyone can convert a Base64 string back to its original bytes, so it should not be used as a means of protecting data, only as a format to display or store raw bytes more easily.
Example: Base64 maps 3 bytes (3 × 8 = 24 bits) into 4 characters of 6 bits each (4 × 6 = 24 bits). The result looks something like “TWFuIGlzIGRpc3Rpb…”. Therefore the bloat is only 4/3 ≈ 1.33 times the original size.
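A quick way to see this 3-bytes-to-4-characters mapping is with Python’s standard base64 module:

```python
import base64

raw = b"Man is distributed"    # sample text, 18 bytes (a multiple of 3, so no padding)
encoded = base64.b64encode(raw)

print(encoded[:4])             # b'TWFu': the 3 bytes of "Man" become 4 characters
print(len(encoded) / len(raw)) # 1.333...: the 4/3 bloat from the example above
```

When the input length is not a multiple of 3, `=` padding brings the output up to a multiple of 4 characters, so the ratio is slightly above 4/3 for short inputs.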

Image to Base64 String:
“Web performance” can be critical to our application design. Steve Souders has dedicated his career to talking about improving website performance. It’s embedded in my head that extra HTTP requests add a lot of additional overhead, and we should explore new ways to dramatically decrease the load time of our web app. We explored image sprites, where a collection of images is crammed into a single image. A web page with many images can take a long time to load and generates multiple server requests; using image sprites reduces the number of server requests and saves bandwidth. The concept of image sprites is predominant in the video games industry. Viewing an image from a sprite amounts to arranging a “viewport” of sorts to show only a specific piece of that file at a time.
Although the “sprites” approach can save some resource time, since multiple requests are combined into one, there were drawbacks when it came to handling medical images such as “TIFF” files.
Sprites were hard to maintain and update, increased memory consumption, and caused bleedthrough [nearby images visibly bleeding through other elements].

Base64-Encoded Images – an Option to the Rescue:
You can embed an image as an inline HTML code using Base64 encoding.

Example:

<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAADSCAMAAABThmYtAAAAXVB">
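A small helper can generate such an inline source attribute from raw image bytes. Here is a minimal Python sketch; the bytes used are just a placeholder PNG header, not a real image:

```python
import base64

def to_data_uri(image_bytes, mime="image/png"):
    """Build an inline data URI from raw image bytes."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return "data:{};base64,{}".format(mime, b64)

# Placeholder payload; in practice, read the bytes from an actual .png file.
tag = '<img src="{}">'.format(to_data_uri(b"\x89PNG\r\n\x1a\n"))
print(tag)
```

The browser decodes the attribute itself, so no extra HTTP request is made for the image.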

Java Code to Convert an Image to Base64 String:

private static String encodeImageToBase64Binary(File file) {
    // Requires java.io.File, java.io.FileInputStream, java.io.IOException
    String encodedfile = null;
    // try-with-resources closes the stream even if an exception is thrown
    try (FileInputStream fileInputStreamReader = new FileInputStream(file)) {
        byte[] bytes = new byte[(int) file.length()];
        fileInputStreamReader.read(bytes);
        // java.util.Base64 encodes straight to a String; calling toString()
        // on an encoded byte[] would only print the array reference
        encodedfile = Base64.getEncoder().encodeToString(bytes);
    } catch (IOException e) {
        e.printStackTrace();
    }
    return encodedfile;
}
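Going the other direction, the direction in this post’s title, is just a decode. A minimal Python sketch of the round trip, using placeholder PNG header bytes rather than a real image:

```python
import base64

def base64_to_image_bytes(b64_string):
    """Decode a Base64 string back into the original image bytes."""
    return base64.b64decode(b64_string)

# Round trip: encode some bytes, then decode them back.
original = b"\x89PNG\r\n\x1a\n"   # placeholder PNG header bytes
restored = base64_to_image_bytes(base64.b64encode(original))
print(restored == original)  # True
```

Writing `restored` to a file with the right extension would recreate the image, which is exactly how an inline data URI gets turned back into pixels.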

 


Algorithms

The term algorithm is used in computer science to describe a finite, deterministic, and effective problem-solving method suitable for implementation as a computer program. Algorithms are the stuff of computer science: they are central objects of study in the field.
We can define an algorithm by describing a procedure for solving a problem in a natural language, or by writing a computer program that implements the procedure.
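For instance, Euclid’s greatest-common-divisor algorithm can be stated in natural language (if q is zero the answer is p; otherwise the answer is the gcd of q and the remainder of p divided by q) or written directly as a short program:

```python
def gcd(p, q):
    """Euclid's algorithm: the gcd of p and q equals the gcd of q and p % q."""
    if q == 0:
        return p
    return gcd(q, p % q)

print(gcd(1071, 462))  # 21
```

Both descriptions define the same algorithm; the program simply makes every step explicit enough for a machine to carry out.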

Algorithms are not confined to mathematics alone. When you cook bread from a recipe, you’re following an algorithm. When you knit a sweater from a pattern, you’re following an algorithm. When you put a sharp edge on a piece of flint by executing a precise sequence of strikes with the end of an antler—a key step in making fine stone tools—you’re following an algorithm. Algorithms have been a part of human technology ever since the Stone Age.

THE STUDY OF ALGORITHMS AND DATA STRUCTURES is fundamental to any computer-science curriculum, but it is not just for programmers and computer-science students. Everyone who uses a computer wants it to run faster or to solve larger problems.

From N-body simulation problems in physics to genetic-sequencing problems in molecular biology, algorithms have become essential in scientific research; from architectural modeling systems to aircraft simulation, they have become essential tools in engineering; and from database systems to internet search engines, they have become essential parts of modern software systems.

The descriptions of algorithms in this blog are based on complete implementations and on a discussion of the operations of these programs on a consistent set of examples. In addition to presenting pseudo-code, I like to work with real code, so that the programs can quickly be put to practical use. I tend to write the code in Java, Python or both, but in a style such that most of the code can be reused to develop implementations in other modern programming languages.

As Carl Sagan put it, “Science is a way of thinking much more than it is a body of knowledge.”
Happy learning!

Here is a list of my favorite Open Data Sources that I use with my students often: