While explaining Optuna to a client in the context of hyperparameter tuning, and performing more research on the topic, I came across AutoGluon to perform "AutoML for images, text, and tabular data". After a quick scan of the documentation, I decided to give it a try and see how it performs on a simple project.
Saturday, February 3, 2024
Sunday, January 28, 2024
I recently attended the Esri Saudi Arabia User Conference and was amazed by the changes in the Kingdom. The capital city of Riyadh is booming and proliferating. During the conference, I presented on integrating GenerativeAI and GIS in the plenary session and led a session on BigData and GeoAnalytics Engine. GeoAnalytics Engine, based on Apache Spark, allows spatial operations on Spark data frames. We showcased a project called "A Day in the Life," which used historical traffic data from HERE to demonstrate traffic congestion during peak hours. Traffic is notoriously bad in the city, so this was a fitting example. My colleague Mahmoud H. presented a traditional workflow process in a Jupyter Notebook off a Google Cloud DataProc cluster, efficiently processing over 300 million records (this is relatively "small"). The processed traffic information was then displayed in ArcGIS Pro in a time-aware layer to reflect the congestion visually while activating a time slider. At the end of the presentation, we surprised the audience by using ChatGPT to translate Arabic sentences to SparkSQL code, and Azure OpenAI GPT4 handled the translation very well. Look here for code snippets. This form of interaction IS the future, and I am excited to invest more in this technology and in the following areas:
- Enhanced Visualization and Real-time Data Integration:
- Dynamic Visualization: Integrating real-time traffic data feeds into existing models. This will not only show historical congestion but also provide live updates. Dynamic heatmaps can be particularly effective in visualizing the intensity of traffic at different times.
- 3D Modeling: Utilize ArcGIS's 3D scene capabilities to give a more immersive view of traffic congestion and urban planning scenarios.
- Improved Data Analysis through Machine Learning:
- Predictive Analytics: Integrate machine learning models to predict future traffic patterns based on historical data, weather conditions, events, and other variables.
- Anomaly Detection: Implement anomaly detection algorithms to identify unusual traffic patterns, which can be crucial for incident response and urban planning.
- Enhancing User Interaction and Accessibility:
- Multilingual Support: While we showcased the translation of Arabic sentences to SparkSQL code, we should consider expanding this feature to include more languages, making your tool more accessible to a global audience.
- Voice Commands and Chatbots: Integrate voice command functionality and develop a chatbot using Azure OpenAI GPT4 for querying and controlling the GeoAnalytics Engine, making the system more interactive and user-friendly.
- Scalability and Performance Optimization:
- Optimization for Large Datasets: Continue to refine the efficiency of processing large datasets. Explore the latest advancements in distributed computing and in-memory processing to handle even larger datasets more efficiently.
- Cloud Integration: Ensure the solutions are cloud-agnostic and can be deployed on any public or private cloud provider, enhancing the system's scalability and reliability.
- Collaboration and Sharing:
- Collaborative Features: Develop features that allow multiple users to work on the same project simultaneously, including version control and change tracking for shared projects.
- Export and Sharing Options: Enhance the ability to export results and visualizations in various formats and share them across different platforms, facilitating easier collaboration and reporting.
- Ethical Considerations and Transparency:
- Data Privacy: Address data privacy concerns by implementing robust data encryption and anonymization techniques, ensuring that individual privacy is respected while analyzing traffic patterns.
- Algorithm Transparency: Provide clear documentation and explanations of the algorithms used, promoting transparency and trust in your system.
Saturday, January 27, 2024
Hello, everyone. It has been a while since my last post, and I wanted to explain my absence. I have been working on demanding client projects requiring confidentiality, so I couldn't share anything.
But now I'm back and excited to dive into something new and exciting.
Generative AI (GenAI) has gained much attention lately, but I'm taking it to a different level by merging Large Language Models (LLMs) with insights from geospatial analysis. It's GenAI with a GeoSpatial twist.
I'm thrilled to be back and can't wait to start this new journey with you. Keep an eye on this space for future updates, tips, and unique code snippets. Your feedback and questions are valuable, so please don't hesitate to reach out.
I'll see you in the next post, and as usual, you can check out the source code here.
Monday, August 24, 2020
Monday, August 3, 2020
Sunday, August 2, 2020
Note that the join is to a "small" spatial dataset that we can:
- Broadcast to all the spark workers.
- Brutly traverse it on each worker, as it is cheaper and faster to do so that spatially index it.
Saturday, August 1, 2020
Micropathing is the construction of a target's path from a limited set of a consecutive sequence of target points. Typically, the sequence is time-based, and the collection is limited to 2 or 3 target points. The following is an illustration of 2 micropaths derived from 3 target points:
Micropathing is different than path reconstruction, in such that the latter produced one polyline for the path of a target. Path reconstruction losses insightful in-path behavior, as a large number of attributes cannot be associated with the path parts. Some can argue that the points along the path can be enriched with these attributes. However, with the current implementations of Point objects, we are limited to only the extra M and Z to the necessary X and Y. You can also join the PathID and M to a lookup table and gain back that insight, but that joining is typically expensive and is difficult to derive from it the "expression" of the path using traditional mapping. A micropath overcomes today's limitations with today's traditional means to express the path insight better.
So, a micropath is a line composed of typically 2 points only and is associated with a set of attributes that describe that line. These attributes are typical enrichment metrics derived from its two ends. An attribute can be, for example, the traveled distance, time, or speed.
In this notebook, we will construct "clean" micropaths using SparkSQL. What do I mean by clean? As we all know, emitted target points are notoriously affected by noise, so using SparkSQL, we will eliminate that noise during the micropath construction.
Here is a result:
More to come...