Rafael Dias Ghiorzi, 21🇧🇷

Software Developer

Computer Science Undergraduate @ Universidade de Brasília

Education

Computer Science at Universidade de Brasília

Jul 2023 — Jul 2028

Experience

Data Scientist and Researcher at Instituto de Pesquisa Econômica Aplicada (Ipea)

Feb 2025 — Dec 2025

• Deployment, versioning, and maintenance of a Python package for structured information extraction from PDF documents, based on open-source Machine Learning projects
• Design and implementation of a data extraction pipeline, integrating multiple Machine Learning techniques to generate structured tables from historical documents
• Development of an intelligent agent system for the institutional knowledge repository, using state-of-the-art frameworks and RAG (Retrieval-Augmented Generation) architectures

Git
GitHub
GitLab
Python
Agno
GCS
CI/CD
YAML
HuggingFace
OCR
AI
RAG
Docling

Manager and Full-Stack Developer at Empresa Júnior de Computação - CJR

Oct 2023 — Feb 2025

• Leadership and coordination of development teams using Scrum, increasing operational efficiency and delivery predictability
• Development and maintenance of web applications with Next.js, NestJS, and PostgreSQL, with a stronger focus on front-end architecture and implementation
• Automation of internal processes through a member management system, reducing the execution time of organizational processes by up to half

Next.js
NestJS
React
TailwindCSS
Asana
Agile
PostgreSQL
Python
TypeScript
JavaScript

Projects

Foca na Gestão de Membros (FGM)

• Development of a complete web portal for managing internal processes of a junior enterprise’s members
• Implementation of data collection and analysis to support internal policy planning
• Design of a scalable architecture using modern front-end and back-end frameworks
• Integration with AWS services, including Amazon RDS and S3 for data persistence and storage

Next.js
NestJS
PostgreSQL
AWS
Python
Docker

RAG publicações

• Development of a RAG system for search and question answering in Ipea’s public knowledge repository, with support for tables, charts, and images
• Document ingestion pipeline using Docling, local computer vision models, and embedding generation
• Implementation of a multi-agent system using the Agno framework and the OpenAI API to handle different query profiles
• Deployment with Docker and Kubernetes; Streamlit interface and FastAPI backend

Python
HuggingFace
Agno
Qdrant
Docker

Name Extraction - DOE São Paulo

• Development of a hybrid pipeline (CV + OCR + LLM) for structured extraction of student data from scanned historical official documents
• Fine-tuning and validation of a YOLO model for robust multi-column layout detection, increasing name recall to up to 75%
• Integration of OCR with typographic metadata (font size, style) and reconstruction of educational hierarchy in noisy documents
• Application of LLMs for reliable semantic parsing of semi-structured text, replacing fragile regex-based approaches
• Empirical evaluation of multiple approaches (OCR-only, k-means, CV-only), including cost, error, and scalability analysis
• Generation of a longitudinal dataset for educational studies (1982–2001), enabling previously infeasible analyses

Python
YOLO
Google DocumentAI
OpenAI

Skills

Programming Languages
Python
Java
C++
HTML
CSS
TypeScript
JavaScript
SQL
Bash

Frameworks
Agno
Next.js
NestJS
FastAPI
PyTorch

Technologies
Linux
GitHub
PostgreSQL
Terraform
AWS
Git
Kubernetes
Docker
GCS
HuggingFace

Languages

Portuguese — Native
English — Fluent
Italian — Beginner
Spanish — Beginner